Wednesday, December 19, 2007

modules, core functionality and rsyslog v3...

As I have written, I have begun to work on rsyslog v3 (and so far I am pleased to say that I have made quite good progress on it). One of the things with rsyslog v3 is that it will have an even more modular architecture, utilizing loadable modules for almost everything. I asked on the mailing list about backward compatibility and received this very good response from Michael Biebl:

One thing I was wondering:

If you intend to shift all (even core) functionality into loadable modules, how do you handle things like --help or available command line options like -m?

Do you want to hardcode it, or will you provide an interface through which rsyslog queries each module for its help message and available options?

I'm also still a bit uncertain whether moving everything, including core functionality, to modules is a good idea (for the problems you already mentioned). Imho having all core functionality in a single binary is simply much more robust and foolproof. For things like the SQL db output plugin, the module interface is great, because it avoids pulling in large library and package dependencies and allows installing them on an as-needed basis. For other functionality I have yet to see the benefits.

Rainer, could you roughly sketch how you envision breaking rsyslog into loadable modules in v3? Which kind of functionality would be loadable as a module, and which functionality do you plan to keep in the rsyslogd binary? A listing of all (planned) modules plus the provided functionality and requirements would really help.

Another thing: say you move the regexp support into a separate module. If a regexp is then used in rsyslog.conf, will you bail out with an error, simply print a warning (which could go unnoticed, leaving the poor administrator wondering why his regexp doesn't work), or load modules on demand?

For the latter you'd need some kind of interface to query the *.so files for their supported functionality. I.e., each module would export a list of the config directives it supports, and rsyslog could query each available module upon startup and create a map.

So, e.g., the ommysql module would export its support for the :ommysql: config directive. Whenever rsyslog finds such a config directive, it could/would load the module on demand.

The same could be done for the command line parameters. The imklog module would export that it supports the -m command line parameter. Whenever that command line parameter is used, rsyslog would know which module to load.

These are only rough ideas and there is certainly still much to consider. But what do you think about the basic idea?

This is a great response - it not only asks questions but offers some good solutions, too. It comes at a perfect time, too, because there is much that is not yet finalized for v3. For sure I have (hopefully good ;)) ideas, but all of them need to be proven in practice. The issues that come up here are a good example.
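To make Michael's capability-map idea concrete, here is a minimal sketch in Python (chosen only for brevity). Each module advertises the config directives and command line options it handles, and the core inverts that into a lookup table at startup. The manifest layout and the names `MODULE_MANIFESTS`, `build_capability_map` and `modules_to_load` are hypothetical; only the `:ommysql:` directive and the `-m`/imklog pairing come from the mail.

```python
# Hypothetical manifests: what each plugin would advertise to the core.
# Only ":ommysql:" and the "-m"/imklog pairing come from the mail.
MODULE_MANIFESTS = {
    "ommysql": {"directives": [":ommysql:"], "options": []},
    "imklog": {"directives": [], "options": ["-m"]},
}


def build_capability_map(manifests):
    """Invert the manifests into a feature -> module table at startup."""
    capability_map = {}
    for module, manifest in manifests.items():
        for feature in manifest["directives"] + manifest["options"]:
            if feature in capability_map:
                raise ValueError(
                    f"{feature!r} claimed by {capability_map[feature]} and {module}"
                )
            capability_map[feature] = module
    return capability_map


def modules_to_load(config_lines, argv, capability_map):
    """Return the sorted names of modules needed by a config and command line."""
    needed = set()
    for feature, module in capability_map.items():
        if feature.startswith(":"):
            # Config directive: look for it anywhere in the config text.
            if any(feature in line for line in config_lines):
                needed.add(module)
        elif feature in argv:
            # Command line option given as a separate argument.
            needed.add(module)
    return sorted(needed)
```

The duplicate check is the important design point: if two modules claimed the same directive, loading on demand would become ambiguous, so the sketch refuses to build the map at all.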

So, now let me give a rough sketch of what I envision v3 will do. Note that it is what I envision *today* - it may change if I get good reasoning for change and/or smarter solutions.

First, let me introduce two blog posts which you may want to read before continuing here:

And, most importantly, this post already has the root reasoning for pushing things out of the syslogd core:
Let me highlight the two most important parts from that later post:

This is exactly the way rsyslog is heading: we will try to provide an ultra-slim framework which offers just the basic things needed to orchestrate the plug-ins. Most of the functionality will indeed be available via plug-ins, dynamically loaded as needed.

... With that design philosophy, we can make rsyslog really universally available, even on low-powered devices (loading just a few plug-ins). At the high end, systems with a lot of plug-ins loaded will be able to handle the most demanding tasks.

And this is actually what the v3 effort is all about: rsyslog should become as modular as possible, with the least amount of code in the core linked binary and everything else provided via plugins. I still do not know exactly how that will happen; I am approaching it incrementally. Right now I am working on the input plugins and trying to get them right.

In the longer term, there will be at least three different types of plugins: output, input and "filter". I think I do not need to elaborate on the first two. Filter plugins will work together with expressions, another feature to come. Expressions will enhance the template and filter system to provide a rich expression capability supporting function calls. For example, a template may look like this in a future release:

$Template MyTemplate, substr(MSG, 5, 10) + "/" + tolower(FROMHOST) + "/"

and a filter condition may be:

:expr:substr(MSG, 5, 10) == "error" /var/log/errorlog

Don't bash me for the config format shown above, that will also change ;)

Regexp functionality will then be provided by something like a regexp() function. Functions will be defined in loadable modules; practically no function will be in the core. A module may contain multiple functions.
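A minimal sketch of such a function registry, assuming a design the post does not spell out: a loaded module registers one or more named functions with the core, and template evaluation looks them up by name. `register_functions`, `call`, the `strfuncs` module and the zero-based `substr(s, start, length)` semantics are all hypothetical. It also shows one possible answer to the regexp question from the mail: calling a function whose module is not loaded is a hard error, not a silent warning.

```python
# Core-side table: function name -> (providing module, callable).
# Everything here is a hypothetical illustration, not the v3 interface.
FUNCTION_REGISTRY = {}


def register_functions(module_name, functions):
    """Called when a module is loaded; one module may supply several functions."""
    for name, func in functions.items():
        if name in FUNCTION_REGISTRY:
            owner = FUNCTION_REGISTRY[name][0]
            raise ValueError(f"function {name!r} already provided by {owner}")
        FUNCTION_REGISTRY[name] = (module_name, func)


def call(name, *args):
    """Invoke a registered function; fail loudly if no loaded module provides it."""
    if name not in FUNCTION_REGISTRY:
        raise NameError(f"function {name!r} unknown; is its module loaded?")
    _module, func = FUNCTION_REGISTRY[name]
    return func(*args)


def _substr(s, start, length):
    # Assumed semantics: zero-based start offset plus a length.
    return s[start:start + length]


# A hypothetical "strfuncs" module registering two functions at load time.
register_functions("strfuncs", {"substr": _substr, "tolower": str.lower})


def render_my_template(msg, fromhost):
    """Evaluate substr(MSG, 5, 10) + "/" + tolower(FROMHOST) + "/"."""
    return call("substr", msg, 5, 10) + "/" + call("tolower", fromhost) + "/"
```

A regexp module would simply register a "regexp" function the same way; until it is loaded, any template or filter using it fails at evaluation time with a clear message.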

Bottom line: almost everything will be a loadable module. If you do not load modules, rsyslog will not do anything useful.

Now a quick look at the command line options: I don't like them. Take -r, for example. Sure, it allows you to specify a listener port and also signals that a listener should be started at all. But what about multiple instances? What about advanced configuration parameters? I think command line options are good for simple cases, but rsyslog will provide much more than simple cases can cover. I favor replacing all command line options with configuration file directives. That is the right place for them. The exception, of course, is things like where to look for the master configuration file.

Which brings up backward compatibility. As you know, I have begun to be puzzled about that. After all, rsyslog is meant to be a drop-in replacement for sysklogd. That means it should run with the same options as sysklogd - and should also enable administrators to build on their sysklogd knowledge. Tough call.

Thankfully, sur5r introduced the idea of having a compatibility mode. He suggested checking for the absence of an rsyslog.conf file and, if it is missing, concluding that we need to run in that mode. That is probably a good suggestion that I will pick up. It can also be extended: how about, for example, a "-c" command line switch? If it is absent, rsyslog uses compatibility mode. And it is guaranteed to be absent from command lines written for previous versions and for sysklogd, because it was not defined there.

Now let's think. If we know we need to provide compatibility, we can load a plugin implementing compatibility settings (again, moving that out of the core functionality). Once loaded, it could analyze the rest of the command line and load whatever modules are necessary to make rsyslogd correctly interpret a pre-v3 configuration file. That way we have a somewhat larger than necessary memory footprint, but all works well.
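The mode decision itself is simple. This Python sketch assumes the rule described above (no "-c" means compatibility mode) and a hypothetical `compat` plugin that turns legacy options into module loads. The option-to-module table is illustrative only: `-r` is tied to an assumed `imudp` listener module, and `-m` to imklog as in Michael's mail. sur5r's missing-rsyslog.conf check could be added as a further trigger but is left out, and option parsing is simplified (bundled flags such as `-rm` are not recognized).

```python
# Illustrative only: which module a legacy sysklogd-style option would pull in.
# "imudp" is an assumed module name; "-m" -> "imklog" follows the mail's example.
LEGACY_OPTION_MODULES = {
    "-r": "imudp",
    "-m": "imklog",
}


def select_modules(argv):
    """Return (mode, modules_to_load) for an rsyslogd command line.

    Simplified scan: options must be given separately (no bundling like -rm).
    """
    args = argv[1:]
    if "-c" in args:
        # Native mode: the admin lists modules via $ModLoad in rsyslog.conf.
        return "native", []
    # Compatibility mode: load the hypothetical compat plugin first; it then
    # turns the remaining legacy options into module loads.
    modules = ["compat"]
    for arg in args:
        # arg[:2] also matches options with an attached value, e.g. "-r514".
        module = LEGACY_OPTION_MODULES.get(arg[:2])
        if module is not None and module not in modules:
            modules.append(module)
    return "compat", modules
```

Note that in native mode the function deliberately loads nothing: everything beyond the core then comes from the configuration file.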

Then back to native mode. Here, indeed, I'd expect that the user loads each and every module needed. I assume, however, that for any typical package the maintainer will probably load all "core" functionality (like write to file, user message, several inputs, common filter functions, ...) right there in the default rsyslog.conf. This makes sense for today's hardware. It will also make the config quite foolproof. A good way to implement that would be to borrow the semantics of $IncludeConfig. How about:

$ModLoad /wherever/necessaryplugins/

which would load all plugins in that directory.
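A directory form of $ModLoad could be implemented by enumerating the directory and loading every plugin file in it. This Python sketch assumes plugins are `*.so` files and takes the loader as a parameter: in C that would be `dlopen()`, and in Python `ctypes.CDLL` does the equivalent job. `modload_directory` is a hypothetical name.

```python
import os


def modload_directory(path, load):
    """Load every *.so plugin in `path`; return the file names loaded.

    `load` receives the full path of each plugin, e.g. ctypes.CDLL.
    """
    loaded = []
    # Sorted so the load order is the same on every start.
    for name in sorted(os.listdir(path)):
        full_path = os.path.join(path, name)
        # Skip READMEs, subdirectories and anything else that is not a plugin.
        if name.endswith(".so") and os.path.isfile(full_path):
            load(full_path)
            loaded.append(name)
    return loaded
```

Passing the loader in keeps the directory scan testable without real shared libraries; with `ctypes.CDLL` as the loader, each file would actually be mapped into the process.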

The key point, however, is that in a limited environment, the very same binaries can be used. No recompilation required. Examples are embedded devices, or security-sensitive environments where only those components that are absolutely vital should run (which is good practice because it protects you from bugs in code that is not loaded).

I personally find it OK to handle the situation as described above. I don't like magic autoloading of modules.

This modular approach also has great advantages when it comes to maintaining the code and making sure it is as bug-free as possible. Modules tend to be small, and they should be independent of each other. So testing, and finding and fixing bugs that escaped testing, should be considerably easier than with the v2 code base. There are also numerous other advantages, but I think that goes too far for this post...

Comments are appreciated, especially if you do not like what I intend to do. Now is the time to speak up. A few weeks from now, things will probably have evolved too far to change some of the basics.
