Thursday, August 30, 2007

rsyslog v3 object model and message flow

I thought quite a while about the upcoming configuration file format today. I
have to admit, I am not very satisfied with the outcome. I have a few potential
formats, but either they are too brief (and quite complex) or there is a lot of
text. Also, I am not satisfied yet with the flexibility of the format. I guess
this is a harder nut to crack than I thought. Unfortunately, I can not yet post
any of the ideas, as I have done them on paper and they are probably unreadable
to anybody but me. Bear with me, I'll create an online version these days.


But to understand any config file formats, you need to know more about the
object model. A first version of that model has materialized on my mind, and
actually rsyslog 1.18.0 and above is already supporting parts of it. To aid
understanding (and get some discussion going), I have created a quick outline of
what I have in mind. It's textual, it's incomplete and it can not be formally
verified. Still, I think (sincerely hope), it is useful. So I decided to post
what I currently have. Besides the actual object model discussion, it will help
a great deal in understanding what needs to be in the config file (thus it's
currently a bit biased towards configurable objects and internal ones are almost
completely missing).


rsyslog v3 Objects


Rsyslog uses objects, even though it is written in C. This can quite well be
done. Only at some points (like inheritance) we need to fiddle a bit with the
language. But nothing to hard to be done. We try to keep object overhead very
low, so it should be more like traditional C than C++.


module


The module object is the base class for anything that even remotely looks
like a loadable object. In the long term, that might eventually be almost
everything. It handles the necessary plumbing like loading modules, keeping
track of module status, querying interface and all those things... It does not
by itself provide any module-specific actions.


input


An input gatheres messages (events) from external sources. Current typical
examples are UDP or TCP based syslog. However, there is no architectural limit,
so in the long term an input module may also gather SNMP traps or file lines.
Please note that an input does not necessarily parse the obtained event by
itself - this may be delegated to a parser module (this whole thing still needs
to be thought out).


output


An output module receives strings from the engine and writes them to some
ultimate destination. Popluar examples are files, databases or remote syslog
servers.


action


The action object is the "engine wrapper" around an output module. It
provides numerous servies to the output. Most importantly, it provides the core
plumbing behind restarting and queueing actions.


function


A function object (and lodable module) provides extensibility for internal
processing. Version 3.0 will support programming-like functions, which replace
and extend the property replacer options. For example, we may want to extract
characters 5 to 10 and convert them to lower case. With current rsyslog, this
works as follows: %msg:5:10:lower%. With Version 3, it will look much nicer (at
least I hope so): lower(substr(%msg, 5, 10)). The exact sytax and semantics of
how this is used in the config file is under development - but I think you can
get the idea from the example. The core improvement featuere-wise is that with
functions, rsyslog can very esaily be extended by just programming a small
plug-in with a new function. Do to the programming-like strucuture, the new
function can be combined with all exisitng ones.


Functions will be supported everywhere a string is supported, which means
everywhere. Examples are filters, output format (templates) and file name
generation.


Message Flow


This is a brief description of how a message (an event) flows inside rsyslog.


The message text is received by the input module. The input module, possibly
with the help of a parser module uses the message text to create a msg-in memory
structure. This msg structure enters rsyslog's main processing queue. As soon as
possible, the msg object is dequeued. Now the rule engine processes the rule set.
A rule set object contains multiple rule object, which are executed in order
until either all rules are processed or a discard action is encountered. If
there are multiple rule set objects bound to a single input module, they are
also executed in the configured order (the normal discard action will discard a
message for one rule set, but an additional rule set will  continue to
process the msg object. A special case of the discard action can be envisioned,
which discards a msg object for all bound rule sets.. Each rule object contains
filter objects and action objects. First, all filter objects are executed. That
means, their expressions are evaluated. If there are multiple filter objects,
their result is combined in a AND operation (other modes are not supported -
that could be easily done by crafting a specific filter expression, so we do not
encourage an additional set of complexity be allowing additional boolean
operators when multiple filter objects are used inside a single rule). If the
outcome of the overall filter object evaluation results in true ("to be
processed"), all action objects are called. The are called in the order of their
appearance in the config file (no exceptions here). The msg object is used to
generate that string required by the action (note that the msg object itself is
NOT passed to the action, this is a security and encapsulation boundary). For
string generation, function plug-ins may posibly be called (this is also the
case during filter procesing). The resulting strings are  passed to the
execute method of the action object. That object decides if queing,
rate-limiting or other functionality generically available to all actions is the
be carried out. All of this is done by the action object itself. Finally, the
strings are passed from the action object to the actual output object (a plug-in).
The output objct takes the strings and performs whatever processing needs to be
done. After all actions, rules and rule sets have been processed (or when a
discard action occurs), the msg object is destroyed. Please note that even than
a copy may be held in memory, because that might be needed for duplicate message
reduction and similiar features. Thus, msg is reference-counted and actually
only destroyed when the count reaches zero.


Please note that there are a number of utility objects involved. For clarify,
these have been omitted. Also, all function calls return a meaningful return
state. The caller will process that state with care. In some cases, however,
this may mean ignore a failed call, because this is the most appropriate thing
to do (e.g. when an action fails - if we'd abort processing the msg object, we'd
do much more harm).


Data flow


This is a brief and not so accurate picture of the data flow. It concentrates
on the role of plug-ins.


input module(parser) -> rule engine (custom functions) -> action (with
aggregated output module)


Known Open Issues with Plug-ins


The object model is not yet fully designed, so there is nothing bad about
open issues. I list them, so that the do not get lost.


Probably the number one question is how message-modifying plug-ins can be
created. I think about things like implementing syslog-sign or TLS security. It
is my current, unverified, view that we must have something like "filter-plug-ins"
which can be placed inside the message data flow to modify the message text and/or
the msg object.


Copyright © 2007 by
Rainer Gerhards

Last Updated: 2007-08-30

Post a Comment