Monday, December 31, 2007

rsyslog work log and future directions

Hi folks, probably the last rsyslog work log post for 2007. Thanks for sticking around - and hopefully I'll see you again in 2008. It'll become a very exciting year, with a lot of new features. I am eager to implement what is right now on my head, and I'll most probably will start with modifying the message queue, an endeavor that will ultimately lead to store-and-forward capability just like in syslog-ng's premium edition. And the good news is that I hope to finish that in January 2008 ;) -- what also means that I have made up my priorities. Was not an easy job, and I hope I got it right. So store-and-forward with enhanced output threading is first and the other things will follow later. To me, the hardest decision was to put off expressions, another feature at least I would like to see the sooner the better.

But now back to the work log:
2007-12-27
- added $UDPServerAddress config directive
- added capability to have multiple UDP listeners running concurrently
- applied cross-platform patch from darix to facilitate GSS-API compile
on more platforms
- some cleanup
- internal restructuring in omfwd.c - stage work for further modularization
I think I also fixed a bug as a side-effect - but not looked to much at it
2007-12-28
- took TCPSend() apart and made it generic via function pointers
- moved TCPSend() and frame building code to tcpsyslog.c
- omgssapi created
- removed gss-api code from omfwd.c

Friday, December 28, 2007

gss-api and rsyslog v2

I initially sent this message only to the mailing list. But now I think it make sense to reproduce it here. So there we go:

I am working on the modular structure of rsyslog v3. I am currently revisiting gss-api support. I notice that with the current omfwd, it will be extremely hard to separate gss-api support into its own module. Doing so will break backward compatibility to the configuration file.

GSS-API has been out only for a few days, and mostly over the holiday period. So it is much less of a concern if we introduce now some changes that will case rsyslog.conf format modifications. Much less trouble than when we release v2, a release expected to be in wide use for at least half a year, if not much longer. V2 released with the current syntax would require me to do some tricks in v3 to keep compatibility. Quite complex.

So I decided to create a omgssapi for v3 and extract the gss-api code from omfwd. It looks like this can be done without too much code duplication. There will be some duplicate code, but it will shrink as v3 continues to be developed. Once I have a good working version, which I expect very soon, I will backport that to the v1/2 source tree. I'll then do a new v1 release with a slightly incompatible gss-api config file syntax. After this is out for a few days, I hope I can than finally push out that version as v2.

I hope this is a good decision. I think it will save us major future trouble at the expense of a relatively slight disturbance in the late v1 timeline. I guess most user's won't even notice there is a change.

As always, Feedback is appreciated.

Thursday, December 27, 2007

Things to do in rsyslog...

I have made good progress with rsyslog's input modules. As it looks, the basic things are done and the input module interface has been proven to be both quite stable as well as very simple. It doesn't yet support different instances, but I begin to think that I do not even need them - also not in the long term.

Of course, most of the current input modules are not clean modules. They have a lot of dependencies to other parts of the code, which is not yet able to be dynamically loaded. But at least there is a foundation on which additional modules could be build. Getting the current input modules to be real clean modules will require further stage work. Many thanks need to be done.

So what to do next? It now comes down to both a matter of priorities and dependencies. I am writing this note here mostly for myself. It helps my clear up my thoughts and will also probably serve as a reference for quite a while. My thoughts may be hard to understand - sorry for that. But I thought I make them public when I write them down - even if they are not really targeted toward others. I still hope they may help you get some more background info.

So what's to do:

- find a way to handle global settings
- multi-threaded output modules
a perquisite for
- create queued outputs (write to queue if action fails and
restart when it resumed)
- re-write way config file is read
probably perquisite for:
- create expression support
- in templates
- in selector filters
- create interface for (loadable) user function modules
- create a system to allow loading "library" loadable modules
(e.g. network library for imudp, imtcp, ...)
- separate GSSAPI from plain TCP (requires libs and lib extension system)

xmas rsyslog work log

I've been a bit busy with rsyslog over the xmas break. Here are the changes:

2007-12-25
- moved some more net functionality out of syslogd.c - stage work
- fixed duplicate license text in syslogd.c - made it ambigious
- moved udp net code - again, stage work
- moved some of the udp input code to its right place
2007-12-26
- moved cross-platform define for AI_NUMERICSERV to net.h
- made udp code somewhat less dependable on global variables -- stage work
- removed omfwd code dependency on "finet"
- removed imudp code dependency on "finet"
- removed active INET code from syslogd.c - still some auxiliary things
remain
- fixed socket leak in omfwd.c
- removed global variable LogPort
- removed gloable variable AcceptRemote and external def of bFinished

Monday, December 24, 2007

Seasons Greetings to Everyone

My best wishes to everyone! Let me share this lovely impression:

fog and cool temperatures create a christmas wonderland - seen 2007 in GermanyI thought this image conveys much of the beauty of our planet earth and the hopefully peaceful holiday season. I wish all of you great holidays, nice gifts and time with your beloved ones.

In 2007, we've gone a long way. Both from an Adiscon perspective, with lots of new product releases and great features and also from the rsyslog point of view. And, of course, there were private highlights as well, for example my unforgettable trip to view space shuttle Discovery's sts-120 launch. Thanks everyone for your support and all the kind words I received!

Once again, a great holiday season to all of you!

PS: if you enjoyed the image above, you may want to have a look at my xmas 2007 impressions gallery.

Friday, December 21, 2007

rsyslog work log for 2007-12-21

It has been a good day today! Finally, the alarm() call has been deleted! :) That was something long on my agenda, but I couldn't do it without the redesign of the inputs. The alarm() was not really a big issue, but it became an annoyance to me because it was so hard to remove.

I would also like to mention that I will do only occasional work during the holiday period. So I do not expect more serious changes until early January. Some releases, however, are due next week (maybe 2.0.0).

Here is the detailed rsyslog worklog for today:


- removed no longer needed mutex from omfwd
- released a preview of 3.0.0 "as is" to mailing list - just to get the idea
- begun work on imtcp
- created first version of imtcp (still very much depending on syslogd.c for
configuration and a lot of other things)
- cleaned up code (resulting in some shuffeling from syslogd.c to the
"right" module)
- prepared for imudp
- created an initial version of imudp.c. The majority of UDP reception code
is now in that module and it is dynamically loadable. HOWEVER, that doesn't
mean it is a proper module. There are still many, many dependencies on
global variables, cross-module calls and such. However, havin the code base
separated allows me to carry out some other cleanup before I return to
create a really clean implementation of these modules. So it is kind of a
stage work. Just don't mistake it with "the real thing"...
- removed code no longer needed
- finally, alarm() has gone away :) -- this is now done by the main thread
- some cleanup

rsyslog work log for 2007-12-20

Yesterday was a very busy day for rsyslog. I am on a good path to input modularization, but the hardest part needs still be done ;)

Here is the log:

- bugfix: fixing memory leak when message queue is full and during
parsing. Thanks to varmojfekoj for the patch.
- working on a potential race condition on the new input module
interface. See newsgroup posting for details on the issue:
http://groups.google.com/group/comp.programming.threads/msg/330b9675f17a1ad6
I tried some mutex operations but came to the conclusion that this
does not really help. So I have now switched to plain thread
cancellation, which so far seems to be OK. Need more practical
experience with other input modules to make a final decision. Thus
I leave all code in and have just disabled the problematic
code.
- implemented $klogUseSyscallInterface config directive
- implemented $klogSymbolLookup config directive
- moved unix socket code to its own module (imuxsock)
- implemented $OmitLocalLogging config directive
- bugfix: memory leak in cfsysline.c/doGetWord() fixed
- implemented $SystemLogSocketName config directive
- implemented $AddUnixListenSocket config directive
- MILESTONE reached: imuxsock initial version done
- removed single-threading support for sending TCP messages; caused
simplyfication of output module interface as well as core syslog
processing.
- moved udp send code to its own function

Thursday, December 20, 2007

rsyslog work log...

Here is my recent rsyslog work log:

2007-12-18
- removed files from cvs that not belong there (thanks to Michael Biebl for
pointing that out)
- restructured #include's somewhat thanks to Michael Biebl
- code cleanups thanks to Michael Biebl
- applied Michael Biebl's patch to enhance $includeconfig to support
wildcard filenames
2007-12-19
- applied some more cleanup provided by Michael Biebl
- applied enhanced gss-api functionality provided by varmojfekoj
- GSS-API support for syslog/TCP connections was added. Thanks to
varmojfekoj for providing the patch with this functionality
- release 1.21.0
- added the -c option
- enhanced -c option support (some basics)
- bugfix: llDestroy() left the list with invalid root/last pointers

Wednesday, December 19, 2007

modules, core functionality and rsyslog v3...

As I have written, I have begun to work on rsyslog v3 (and so far I am pleased to say that I have made quite good progress on it). One of the things with rsyslog v3 is that it will have an even more module architecture, utilizing loadable modules for almost everything. I asked on the mailing list about backward compatibility and I received this very good response by Michael Biebl:

One thing I was wondering:

If you intend to shift all (even core) functionality into loadable modules, how do do you handle things like --help or available command line options like -m?

Do you want to hardcode it or will you provide an interface, where rsyslog will query the module about its help message and available options.

I'm also still a bit uncertain, if moving everything resp. core functionality to modules is a good idea (for problems you already mentioned). Imho having all core functionality in a single binary is simply much more robust and fool proof. For things like the SQL db output plugin, the module interface is great, because it avoids to pull in large library and package dependencies and allows to install them on a as need basis. For other functionality I still need to recognize the benefits.

Rainer, could you roughly sketch, how you envision to break rsyslog into loadable modules in v3. Which kind of functionality would be loadable as module, which functionality do you plan to keep in the rsyslogd binary. A listing of all (planned) modules + the provided functionality and requirements would really help.

Another thing: Say you move the regexp support into a separate module. If a regexp is then used in rsyslog.conf, will you bail out with an error, simply print a warning (which could go unnoticed and the poor administrator doesn't know why his regexp doesn't know) or load modules on demand.

For the latter you'd need some kind of interface to query the *.so files for their supported functionality. I.e. the modules would export a list of config directives it supports and rsyslog could upon startup query each available module and create a map.

So, e.g. the ommysql module would export its support for the :ommysql: config directive. Whenever rsyslog finds such a config directive it could/would load the modules on demand.

Same could be done for the command line parameters. The imklog module would export, that it supports the -m command line parameter. Whenever that commandline parameter is used, rsyslog would know which module to load.

There are only rough ideas and there is certainly still much to consider. But what do you think about the basic idea?

This is a great response - it not only asks questions but offers some good solutions, too. It comes at a perfect time, too, because there is much that is not yet finalized for v3. For sure I have (hopefully good ;)) ideas, but all of them need to be proven in practice. The issues that come up here are a good example.

So, now let me go into the rough sketch about I envision what v3 will do. Note that it is what I envision *today* - it may change if I get good reasoning for change and/or smarter solutions.

First, let me introduce two blog posts which you may want to read before continuing here:

And, most importantly, this post already has the root reasoning for pushing things out of the syslogd core:
Let me highlight the two most important parts from that later post:

This is exactly the way rsyslog is heading: we will try to provide an ultry-slim framework which offers just the basic things needed to orchestrate the plug-ins. Most of the functionality will indeed be available via plug-ins, dynamically loaded as needed.

... With that design philosophy, we can make rsyslog really universally available, even on low-powered devices (loading just a few plug-ins). At the high end, systems with a lot of plug-ins loaded will be able to handle the most demanding tasks.
And this is actually what the v3 effort is all about: rsyslog should become as modular as possible, with the least amount of code in the core linked binary and everything else provided via plugins. I still do not know exactly how that will happen, I am approaching it incrementally. I am now at the input plugins and trying to set them right.

In the longer term, there will be at least three different types of plugins: output, input and "filter". I think I do not need to elaborate about the first to. Filter plugins will provide work together with expressions, another feature to come. It will enhance the template and filter system to provide a rich expression capability supporting function calls. For example, a template may look like this in a future release:

$Template MyTemplate, substr(MSG, 5, 10) + "/" + tolower(FROMHOST) + "/"

and a filter condition may be

:expr:substr(MSG, 5, 10) == "error" /var/log/errorlog

Don't bash me for the config format shown above, that will also change ;)

Regexpt functionality will then be provided by something like a regexp() function. Functions will be defined in loadable modules. Pretty no function will be in the core. A module may contain multiple functions.

Bottom line: almost everything will be a loadable module. If you do not load modules, rsyslog will not do anything useful.

Now a quick look at the command line options: I don't like them. Take -r, for example. Sure, it allows you to specify a listener port and also allows to convey that a listener should be started at all. But how about multiple instances? How about advanced configuration parameters? I think command line options are good for simple cases but rsyslog will provide much more than can be done with simple cases. I favor to replace all command line options with configuration file directives. This is the right place for them to be. Except, of course, such things like where to look for the master configuration file.

Which brings up backward compatibility. As you know, I begin to be puzzled about that. After all, rsyslog is meant to be a drop-in replacement for sysklogd. That means it should run with the same options like sysklogd - and should also enable administrators to build on their knowledge with sysklogd. Tough call.

Thankfully, sur5r introduced the idea of having a compatibility mode. He suggested to look at the absence of a rsyslog.conf file and then conclude that we need to run in that mode. That probably is a good suggestion that I will pick up. It can also be extended: how about a, for example, "-c" command line switch. If absent it tells rsyslog to use compatibility mode. And it should absent in previous versions as well as sysklogd, because it was not defined there.

Now let's think. If we know we need to provide compatibility, we can load a plugin implementing compatibility settings (again, moving that out of the core functionality). Once loaded, it could analyze the rest of the command line and load whatever modules are necessary to make rsyslogd correctly interpret a post v3 configuration file. That way we have a somewhat larger then necessary memory footprint, but all works well.

Then back to native mode. Here, indeed, I'd expect that the user loads each and every module needed. I assume, however, that for any typical package the maintainer will probably load all "core" functionality (like write to file, user message, several inputs, common filter functions, ...) right there in the default rsyslog.conf. This make sense for today's hardware. It also will make the config quite foolproof. A good way to implement that would work on the semantics of $IncludeConfig. How about:

$ModLoad /whereever/necessrayplugins/

which would load all plugins in that directory.

The key point, however, is that in a limited environment, the very same binaries can be used. No recompilation required. This would be scenarios with e.g. embedded devices - or security sensitive environments where only those components that are absolutely vital should run (which is good practice because it protects you from bugs in the not-loaded code).

I personally find it OK to handle the situation as described above. I don't like magic autoloading of modules.

This modular approach has also great advantages when it comes to maintaining the code and making sure it is as bugfree as possible. Modules tend to be small, modules should be independent of each other. So testing and finding/fixing bugs that escaped testing should be considerably easier than with the v2 code base. There are also numerous other advantages, but I think that goes to far for this post...

Comments are appreciated. Especially if you do not like what I intend to do. Now is the time to speak up. In a few weeks from now, things have probably evolved too far to change some of the basics.

Tuesday, December 18, 2007

rsyslog changes for 2007-12-17

Yesterday's rsyslog changes:

2007-12-17
- fixed a potential race condition with enqueueMsg() - thanks to mildew
for making me aware of this issue
- created thread-class internal wrapper for calling user supplied thread
main function
- solved an issue when compiling immark.c on some platforms. LARGEFILE
preprocessor defines are changed in rsyslog.h, which causes grief
for zlib. As a temporary solution, I have moved rsyslog.h right at the
beginnng of the include order. It's somewhat dirty, but it works. I think
the real solution will be inside the autoconf files.
- moved thread termination code out to threads.c
- implemented $MarkMessagePeriod config directive
- command $ResetConfigVariables implemented for immark.c
- begun imklog, replacing klogd.c (finally we get rid of it...)
- implemented $DebugPrintKernelSymbols
- implemented afterRun input module interface function
- implemented $klogSymbolsTwice config directive

As you can see, it was quite a busy day. The input module interface has already materialized for the most part.

Tracking file deletions on Windows

Have you ever wondered why an important file magically disappeared? My co-worker Andre has worked with a couple of folks, who didn't like that scenario. As a corporate policy, deletion to some important file locations must be logged. Andre has created a nice guide for use under Windows. He utilizes MonitorWare Agent's event log monitoring capabilities together with its advanced rule engine.

That guide is not only a good source of information if you need to implement Windows file deletion tracking. It also shows nicely what can be done with MonitorWare. It definitely helps understanding the full potential and how to make best use of it.

I suggest you have a quick look at Andre's guide to tracking file and directory deletions under Windows.

Monday, December 17, 2007

rsyslog changes upto 2007-12-14

This is my worklog for rsyslog:

2007-12-12
- begun to shuffle the mark code to a separate module - that will take some
time and definitely require much more code shuffling. This is the begin
of the input module interface

2007-12-14
- created new branch for what will become 2.0.0 stable
- begin work on on immark, the first input module. In the long term
this will lead to a complete rewrite of the input system
- changed license to GPLv3 (for what is to become rsyslog v3)
- moved core threading helpers out of syslogd.c
- remove USE_PTHREADS macro from all sources except omfwd.c (I wait
for a gssapi patch from Red Hat, removing these macros would probably
cause unnecessary grief...)
- tried approach to terminate input module thread via pthread_kell() - so
far, seems to work ok
- begun to create input module interface and macros
- changed module interface to include function to query type
- milestone: can load input module dynamically, but can not do anything
with it - now I need to think about activating IMs...

Friday, December 14, 2007

begun working on rsyslog v3

I reproduce a note here that I sent out to the mailing list this morning. In the mean time, I have done most of the work in CVS.

As you know, I am looking at the way threading is supposed to work in future releases and, most importantly, looking at the inputs (like mark message generation).

Around summer, I wrote that I will probably need to release new major versions when we go into multithreading redesign. It looks like we have reached this stage. I tried to keep a single code base that still supports both single- and multi-threaded operations. I have looked into this the past days and I need to say that it creates a lot of complexity and hard to understand code.

For this reason, I think it is finally time to branch the code based and release some new versions.

Soon, I will create a branch for the current 1.20.1 code base. That will only receive bug fixes, but no new development (except, I guess, GSSAPI which I about to be contributed by Red Hat). When we are confident the last changes worked well and introduced no new bugs, there will be a version 2.0.0 stable release based on that code base.

CVS head, however, will then be rsyslog version 3. It will receive the new input module interface. It requires pthreads, because there is no way input modules and many more of the new desired features can be implemented without them. Consequently, I will remove all single-threading code from it, resulting in an easier to understand code base. Please note that I expect this code to change dramatically when it is being modified to be more modular (much like it was when I introduced modular outputs in summer). Please note that I will apply any non-bugfix patches to this code base, only.

I have somewhat bad feeling of going ahead with implementing a more sophisticated and more parallel multi-threading while we still have an issue with the segfault. However, I think by now we did everything imaginable to capture that rare bug. I have come to the conclusion that the best chance to find it is go ahead and implement the more sophisticated design. That will lead to a review, and rewrite, of much of the code in question, uncovering this we didn't think about before. The recently discovered race condition is an excellent sample.

One thing about the license: rsyslog 2 will stay with "GPL v2 and above" license, but rsyslog V3 will be licensed under "GPL v3 and above". I already wrote about that change. It is my firm believe that GPL v3 brings benefit to our freedom to use digital goods. I am a strong oppose of digital restrictions management (DRM) and software patens and I do not like the idea that rsyslog benefits anyone who encourages these things. I hope for your understanding.

I will set stage now for these changes and will do a web announcement soon. Please don't be surprised that rsyslog v3 will be available before v2, you now know the reason.

Wednesday, December 12, 2007

rsyslog changes upto 2007-12-12

It looks like I have become too lazy in reporting my changes. I'll try to be more quickly again in the future. Here is the part of the work log that is missing. Please note that it does not always mention my hard thinking about the new threading model ;)

2007-12-07
- applied patch from Michael Biebl to finally fix the -ldl cross-platform
issue
- fixed some type conversion warnings that appeared on 64 bit machines - these were in
debug statements, so indicated no real problem
- some code cleanup
- released 1.20.0 (finally ;))

2007-12-11
- When a hostname ACL was provided and DNS resolution for that name failed,
ACL processing was stopped at that point. Thanks to mildew for the patch.
Fedora Bugzilla: http://bugzilla.redhat.com/show_bug.cgi?id=395911
- fixed a small memory leak that happend when PostgreSQL date formatting
was used
- corrected a debug setting that survived release. Caused TCP connections
to be retried unnecessarily often.
- added expr.c, which has some thoughts on expression implementation
- fixed a potential race condition, see link for details:
http://rgerhards.blogspot.com/2007/12/rsyslog-race-condition.html
- added synchronization class to handle mutex-operations in the most
portable way.

2007-12-12
- handled selector flushing on termination (and hup) correctly. Could lose
some information before.
- done some more hard thinking on the threading model for upcoming
enhancements
- released 1.20.1

Tuesday, December 11, 2007

rsyslog race condition fixed

There is a race-condition when running rsyslog 1.20.0 or any previous release in multi-threaded mode. The probability for it to happen is very remote, but there definitely is a design flaw in it. Quick communication revealed, unfortunately, that this flaw can not be responsible for the hard to track segfault bug. The segfault occurs in a situation that does not match what I have found out. I discovered this problem when I worked on multi-threading re-design and focussed on input modules. Maybe my decision to hold off redesign until after the segfault bug has been found was wrong. Redesign forces me to look at more places from a very different angle and that may reveal even more (at lest I hope so).

Now here is the story on the "mark" race condition I discovered today:

fprintlog() duplicates the message object when we have a "last message repeated n times". It does this by saving the pointer to the message object in a temporary buffer, carries out its work and then checks if it needs to restore the saved pointer. This works well in single threading as well as in almost all cases when running multi-threaded. However, if we reach a mark interval domark() calls fprintlog() potentially concurrently to a call that is already in place. What can happen is:
  1. domark() calls fprintlog() on an action
  2. fprintlog() begin execution and saves the previous message to the buffer
  3. fprintlog() is preempted
  4. the worker thread now calls into fprintlog() with the exact same message
  5. fprintlog() processes the message and finishes (deletes it)
  6. now processing of that thread ends and our first thread is resumed
  7. fprintlog() performs its actions and restores the already freed message object pointer
Now, the action holds an invalid pointer. Depending on what happens next, it either aborts (if the buffer has been overwritten) or continues to function but does a double-free.

The root cause is that access to the action object is not synchronized. This was deemed unnecessary, because there could be no concurrent write operations be in place. The domark() processing, however, had been overlooked.

This analysis is still preliminary, but points into a good direction. It needs to be said, though, that the probability for this failure scenario is remote. I have confirmed this is a race condition.

If you think about it, the mark() processing as a whole does not make much sense if we have a full queue. It is awfully flawed. I don't like mark(): in the original sysklogd code, there was a similar issue: mark() was called by an alarm() timer and executed the full syslogd code during its processing. Given that lead to serious problems if some other message was being processed. I solved that issue by setting just a flag() in the alarm() handler. Then, actual mark() processing was started in the mainloop(). For single threading mode that works, because no other thread can be in the action processing at that time.

In multi-threaded mode, however, the mainloop() runs on a thread different from the work thread. So in fact, domark() can once again conflict with action processing. And if the queue is full, it does totally wrong things: because it uses whatever message is currently being processed as basis for emiting mark messages. This is seriously flawed! The root cause is that mark() processing does not go through the producer/consumer queue/gateway. This is what I now need to fix.

What mark() does is first to inject the "--mark--" message. That is no problem, because it is injected via the regular producer interface logmsgInternal(). But then, it calls domarkActions() on each action which in turn calls fprintlog(). It also accesses the messages then-current f_prevcount, which, with a full queue, has nothing to do with the last element being seen at that time.

The more I look at the code, the more I wonder what exact useful feature it is. I just checked the cuttren sysklogd source, and, yes, it still is there (even that domark() is being called in an alarm() handler is still there...). Interestingly, in both sysklogd and rsyslogd the "last message repeated n times" periodic display is turned off when mark messages are turned off. Is this intentional? I doubt so...

So what did the original sysklogd author think when he wrote that post? I guess he wanted to have an indication that messages had been dropped - and this not only when the next message with different text arrived, but after a set period (30 seconds with the current defines both in rsyslog and sysklogd). So message compression should indicate at least every 30 seconds that messages arrived, but were compressed. OK, that gives me something to think about.

Obviously, there is no point in emitting the "last message repeated n times" message if we have, let's say, 100 identical message sitting in the queue followed from at least one non-identical message. In that case, the queue will be processed as fast as possible and upon arrival at the non-identical message, the "repeat message" will be issued. No need to say anything in between. If, however, there is no such non-identical message, rsyslogd is left in a somewhat open state. The queue is depleted but still no output is written (but "n" messages have not been displayed). Only in this scenario it is appropriate to start a timeout timer that will ultimately lead to the "repeated message" when no further non-identical message arrives in the allocated time window.

To be more precisely, it is not actually a question of messages being in the queue. As we have sophisticated filtering with rsyslog, the question actually is if a message was processed (e.g. written to file) by an action within the "repeated message" time window (RMTW). When the last message has been processed can be quite different from action to action.

One way to track this is to keep track when each action last successfully was called. If the queue is reasonable full and the action is supplied with reasonable, to be processed, data on a steady rate, that time should never fall outside of the RMTW. And if it does, isn't that an indication that it is time to write a "repeated message" out, so that the operator is seeing at least one indication in every RMTW? Of course it is! So we can store this timer with each action and use it as a base check.

Now, we can periodically awake and check each action object; did it last process something outside of its RMTW AND does it have any repeated messages received? If so, it is time to emit the "repeated message" message. The fine difference to existing code is that we use the newly constructed timer here. Also, the action object must be locked, so that this background process and the worker thread(s) will not access the same vital data structures at the same time. The scenario from the top of this post would otherwise still apply. Finally, this processing should be de-coupled from the mark processing, because IMHO these are two totally separate things.

I will now go ahead and apply these changes and then we shall see where this bring us.

I have now done a first implementation. Actually, the code does not look that much different compared to before the change. The difference is that I have handled the timestamp thing a bit more transparently and, the biggie, I have used synchronization macros to guard the action object against the race condition. This code will become part of 1.20.1.

A design problem...

Folks, I am facing a design problem - and it looks so simple that I am pulling out all my hair ;)

I am currently preparing the next steps in modular rsyslog redesign. I am not sure yet, there are a couple of candidates what to do first. One is to add a real expression capability, another one is to add threaded inputs (which would be quite useful). In support of these enhancements, a number of things need to be changed in the current code. Remember, we are still running on large parts of the original sysklogd code, which was never meant to do all these advanced things (plus, it is quite old and shows its age). A cleanup of the core, however, requires some knowledge of what shall be done with it in the future.

My trouble is about a small detail. A detail I thought that should be easy to solve by a little bit of web search or doing a forum post or two. But... not only did I find the relevant information, I did not even find an appropriate place to post. May be I am too dumb (well possible).

OK, enough said. Now what is the problem? I don't know how to terminate a long-running "socket call" in a proper way under *nix. Remember, I have done most of my multithreading programming in the past ten years or so under Windows.

What I want to do: Rsyslog will support loadable input modules in the future. In essence, an input module is something that gets data from a data source (e.g. syslog/udp, syslog/tcp, kernel log, text file, whatever ...), parses it and constructs a message object out of it and injects that message object into the processing queue. Each input module will run on its own thread. Fairly easy and well-understood. The problem happens when it comes to termination (or config reload). At that instant, I need to stop all of these input module threads in a graceful way. The problem is that they are probably still in a long-lasting read call. So how to handle this right?

Under Windows, I have the WSACancelBlockingCall() API. Whenever I call that method, all threads magically wake up and their read and write calls return an error state. Sweet. I know that I can use signal() under Linux to do much of the same. However, from what I read on the web I have the impression that this is not the right thing to do. First of all it seems to interfere with the pthreads library in a somewhat unexpected way and secondly there is only a very limited set of signals available ... and none left for me?

The next approach would be to have each blocking call timeout after a relatively short period, e.g. 10 seconds. But that feels even worse to me. Performance wise, it looks bad. Design-wise it looks just plain ugly, much like a work-around. It looks like I needed to do something not knowing what the right thing is (which, as it turns out, is the right description at the time being ;)).

To make matters worse, I have a similar problem not only with the read and write calls but with other constructs as well. For example, I'd like to have a couple (well, one to get started) of background threads that handle periodic activity (mark messages immediately come to my mind). Again, I would need a way to awake them when it comes to termination time - immediately.

And, of course, I would prefer to have one mechanism to awake any sleeping thread. Granted, can't do that under Windows either, so I may need to use different constructs, here, too.

This is the current state of affairs. There is still enough work to do before the question MUST be answered in order to proceed. But that point in time approaches quickly. I would deeply appreciate any help on this issue. Be it either advise on how to actually design that part of the code - or be it advised where to ask for a solution! Really - a big problem is that I did not find an appropriate place to ask. Either the forum is not deeply technical enough, or there are some mailing lists where the topic is on something really different. If you know where to ask- please tell me!

[update] In the mean time, I have found a place to ask. Blieve it or not, I had forgotten to check for a dedicated newsgroup. And, of course, there is ;) The discussion there is quite fruitful.

Wednesday, December 05, 2007

going to view another shuttle launch...

... but this time not to Florida but to the Columbus Control Center located in Munic, Germany. The details are in my space blog for those that are curios ;) The bottom line is that I am away from my development machine from tomorrow afternoon until Friday at around noon.

Please keep your fingers crossed so that I can experience an on-time launch. This time, I can not stay any longer...

update on rsyslog changes

I have been a bit lazy in posting my recent rsyslog changes. This is because there was a lot of fine-tuning as well as design work. I also fear that I probably have forgotten to log a few less important changes. Nevertheless, the log should still provide you with a good idea of what's going on. If I do not get any more last minute nits, the 1.20.0 release can hopefully be released either tomorrow or on Friday.

Here are the changes now:

2007-11-27
- got new dev environment on Fedora 8 x64 up and running
- reviewed calls to MsgDestruct() - re-confirmed that there are no
double frees.
- moved syslog PRI code tables to msg.c as this is the only place where
they are still needed

2007-12-03
- adding sur5r's postgres module - many thanks for providing it!
- added Bartosz Kuzma patches for better compatibility with NetBSD - many
thanks for them!
- disabled GSS-API in configure.ac, as support for it is not yet complete
(Peter Vrabec told me to hold it until a further patch)
2007-12-04
- released 1.20.0 preview (via mailing list only, waiting on feedback)
2007-12-05
- applied patch from Bartosz Kuzma and Michael Biebl to fix compilation
problem on NetBSD (it is missing the -ldl lib, even though it provides
dlopen()).

Monday, December 03, 2007

rsyslog now does postgres SQL!

I have just finished the integration of a new postgres sql output module for rsyslog. The code was provided by sur5r. I am more than glad that rsyslog now supports the popular postgres sql engine. I am sure this is a very nice addition for the community.

Sur5r's able-coded code also provided some feedback about the output module interface in general. The good news is that the amount of code needed in the actual output module was as low as initially thought.

But it also proved that module design still is less than perfect: there was a new date format needed to support postgres sql. This could not be done in the output module. Instead, sur5r had to modify the core files. He did this in an excellent way, but the point is that it is not a good thing this was needed at all. It is a result of the old, non-modular syslogd design which we inherited from sysklogd and have not yet changed. In fact, we are at the very beginning of modularization. The next thing on the list is a loadable function interface. With that, such modifications will no longer be necessary, a custom (loadable) function can than be provided.

This enhancement requires a number of changes in the core, most importantly in the template subsystem. This is most probably the next area of work I am heading to.

But for the time being, let's celebrate that rsyslog now has postgres sql support! It will be released in 1.20.0, hopefully around the end of the week. A preview will be announced on the mailing list somewhat earlier. And if you can't wait, simply pull it out of anonymous CVS - it's in there as of this writing.

rsyslog now part of Fedora 8

Oops... I think I forgot to mention that rsyslog was recently included in Fedora 8. This is an important milestone for the rsyslog project, because it is the first time ever that it appears as the default syslogd in a wide-spread distribution.

Thankfully, a lot of other package maintainers have been found for other distros (I really appreciate your hard work, guys!), but everywhere else rsyslog is "just" an extra. In Fedora 8, it is the default logging daemon. I'd also like to add that Red Hat folks contributed quite a lot to rsyslog, including great testing.

Thanks to everyone who helped make rsyslog this capable. I think its appearance in Fedora 8 is rewarding for all of us!

rsyslog 1.19.12 released

I just released rsyslog 1.19.12. For the minor change I made in 1.19.11, I created a branch based on 1.19.10. I thought .11 would be the last release in that branch. Obviously, I was wrong. Working on two branches is definitely a lot of not really useful extra work. I'll try to get thing together this week, then again only working with one branch. That will be the feature-enhanced 1.20.0 release.

Wednesday, November 28, 2007

German UNAWE Committee ...

Last Saturday, I had the joy of attending a meeting that formally founded the German UNAWE Committee (which, as a side-note made me become a member of it).

UNAWE ("Universe Awareness for Young Children") is an internationally-recognized organization trying to educate young children on astronomy. The target age is rougly between 4 and 10 years. Besides astronomy, UNAWE is also about people (children) from different cultures talking to each others and sharing their experiences. This is a fantastic idea and I like it very much. There are already a number of UNAWE committees all around the world and I am eager to help grow that network.

If you are into astronomy, work with children, or both: consider contributing!

Did I find the rsyslog bug?

I have a new leading theory on the rsyslog segfault bug. Before I restructured everything to get rid of the alarm() calls, I did some more research in respect to the best threading model. More or less by accident, I found a nice note on glibc, rentrancy and the _REENTRANT preprocessor macro. That lead me to the "-pthread" compiler option...

May it be that we "just" have a compiler option problem? So far, we only call it with "-lpthreads", which only effects the linker. -pthread, if I understood it right, defines _REENTRANT, which in turn causes some reentrant versions of standard library functions to be used. If that isn't outdated information, it could pretty much be our problem.

I am not sure which functions of the run time library are affected by the _REENTRANT macro. So I do not know if I call them. However, non-reentrant run time library functions would be a good explanation for very seldom segfaults that only occur under heavy load and when running in multithreading mode. It would also explain why so many in-depth code reviews did not find anything...

This compiler option finding looks very promising. I'll probably do a special 1.19.11 release with just that patch and then we'll see what happens. In the mean time, I wait if somebody comes up with some more diagnostic information. But I finally have found a good explanation for what I see - at least in theory...

Tuesday, November 27, 2007

Is alarm() the culprit?

I have once again reviewed the threading. Now I have a weak hope. To handle mark messages, there is an alarm() call inside rsyslog. While alarm() typically does not play well with pthreads, rsyslog uses it in a very limited scope. Also, the alarm handler is activated only in one specific thread. But... "alarm() does not play well with pthreads"... In absence of any better solution, may this be the actual cause for this hard to hunt bug?

I will now see what is required to get rid of the alarm(). Actually, this is harder than it initially looks. For alarm() to go away, I need to set up a real background thread that does the timer ticks and that causes the mark messages. That, in turn, means that I have two concurrent messages sources, which involves quite a bit of synchronization. All of that is currently not needed as the alarm signal simply interrupts the select, which in turn leads to execution of nice sequential code. Oh, and yes: the alarm signal handler of course does (contrary to sysklogd) nothing then to set a global variable flag. So I'll have a look at all that...

found another part of the code ... that is not the problem...

I'd like to reproduce part of my conversation with Peter Vrabec over here. Together with others, Peter is very actively helping track down the nasty rsyslog segfault bug. He had quite a good idea of what could have caused it, resulting in me doing another focussed code inspection. I just explained to him why we have not yet found the problem. And there is a lot of detail in that description. Detail, I think, that others can benefit from, too. So here it comes:

> *** glibc detected *** rsyslogd: double free or corruption (!prev):
> 0x09cbd588
> ***
> (gdb) print (uchar *) 0x09cbd588
> $49 = (unsigned char *) 0x9cbd588 ""
>
> is it possible to call free on pszRcvFrom when it points at "" ?
> because I can see:
> msg.c:
> if(pM->pszRcvFrom == NULL)
> return "";
> else
> return (char*) pM->pszRcvFrom;
>
> same with getRawMsg/getUxTradMsg/..., places where mudflap
> screams in MsgDestruct().


Sorry - if I have not totally screwed up, it's impossible that this is the problem. I've done (yet another) thorough review. All of these functions are either called to supply another Set...() function (which simply copies over the "", but does not free it) or from MsgGetProp(). MsgGetProp() uses a variable (pbMustBeFreed) to track whether or not a buffer must be freed. In general, unmodified properties are never freed - this happens only at message destruction (MsgDestruct()). MsgDestruct() however, just checks the pointers and frees if they are non-NULL. So MsgDestruct() is not interested in getRawMsg() or another get() function returning an empty string except a NULL pointer.

I have also checked if MsgGetProp() and its helpers correctly handle pbMustBeFreed - and they do it right. So a buffer is only freed when it was dynamically allocated. Message properties are only created during message construction and are free()ed when the message is destroyed.

So I guess we have found another part of the code that does not cause the problem.

What I would like to re-iterate, though, is that the segfault problem seems to disappear under all circumstances if compiled with --disable-pthreads. IF SO (and this is my current assumption), it can not be a general logic error like a double free but must be a synchronization problem. Let me re-iterate: I have yet to find a single installation that has a segfault when running on a single thread! All folks with segfaults that compiled in single threading mode did no longer experience any problems. This is strong evidence to me.

But while I almost think I know it is threading related, I can not find anything wrong in that area, too. The threading model is very simple and construction/destruction of elements is spread in an easy way over the two threads. Basically, the listener thread creates the message and its properties, while the action thread destructs them. There are a few exceptions where properties are derived, but nothing really complex. Yet, I still think it is in the threading area - why does it work in single thread mode otherwise?

Hunting for the segfault...

Do you remember? We are still hunting for a segfault in rsyslog that is very hard to find. The biggest problem is that most people will never experience it. I do not experience it in lab, nor does anybody else who is currently working on the project. And without the ability to reproduce it, there is a lot of guesswork involved.

This is why we are asking for the help of our users (that means: you!). If you run rsyslog and experience a segfault, we would very much appreciate if you could run a specially instrumented version. Peter Vrabec created it and it contains debugging support as well mudflap support, which is a tool to track down nasty memory management errors.

The RPM can be found at:

http://people.redhat.com/pvrabec/rpms/rsyslog/rsyslog-1.19.10-2.mudflap.src.rpm

Please install it. You will notice that CPU usage is higher than normal, but in most cases this is harmless. If you are concerned, drop me a line with specifics and I'll happily address them.

If a segfault happens when you run this version, please send us
  • binary,

  • coredump and

  • /var/log/rsyslog.mudflap

  • hardware and OS information (which distro? multiprocessor? ...?)

Our sincere hope is that we receive enough reports to find something in common between them. So, please contribute your segfault info if you happen to have it. This is a great way to contribute to the project!

Wednesday, November 21, 2007

recent rsyslog changes

I am back to my routine of posting rsyslog changes. You may also imply that this means I am actually developing some things (and not just writing about it ;)). After I had a somewhat slow start today, things evolved quite nicely this afternoon. If I did not overlook anything important, I even managed to complete the "clean unload process" for loadable modules. That also brought me back to good working knowledge of the code. Actually, I am at least a day ahead of my schedule. But, of course, I'll check if I overlooked something - but that'll be tomorrow.

So on to the promised change log (it also covers some past days where I had not reported):

2007-11-19
- applied gssapi patch from varmojfekoj - gss-api is now supported
- added some debug message to ommysql
2007-11-20
- added user doc for gssapi patch from varmojfekoj - thanks!
- bumped version number to 1.20.0, because of new gss api functionality
2007-11-21
- begun to look at dynamic module unloading - this is currently a hack
and works with the mysql module only (which is the only one, so there
is no problem in practice. But it would be good to begin to do it right ;)
- added new modExit() entry point to loadable module interface
- added an identifier to command handler table - need to identify which
command handler entries need to be removed when module is unloaded
- added support so that linkedlist key can be used for owner handle
- enhanced llExecFunc to support deletion of list elements (on behalf of
user function being called, slight interface change)
- enhanced linkedlist class so that list elements can now be deleted based
on the key value they have
- created entry point so that CfSysLine handlers are removed on modExit()
- some cleanup
- modules are now correctly unloaded and de-initialized

going back to rsyslog coding...

As promised, I have started to look at the rsyslog code this morning again and done so in an effort to enhance it. My first target is unloading loadable modules in a "well done" way. So far, this is a hack that does only work because ommysql (and probably a postgres module basing on it soon) does not use some of the interface functionality. Namely cfSysLineHandlers do not work with the current code.

So what I am set now for is doing it right and making sure that a loadable module can be cleanly unloaded under all circumstances. That, of course, requires some interface changes, but nothing major (keep in mind the interface is not yet finalized!). This work provides the basis for upcoming work, which will utilize many more loadable modules for other functionality, too. So it is a critical task.

I have to admit, though, that I think I need another day or so to get fully re-acquainted to the code. There was really a toll from my absence and I begin to notice it ;)

Tuesday, November 20, 2007

rsyslog- what's next?

I posted an outline of my next actions on the rsyslog mailing list and would like to share it here as well:

I have thought about setting up a full lab for GSS-API before carrying on. For now, I have decided to NOT do that. I am sure that the contributors have tested it quite well and the code that I have reviewed looks excellent.

So I will pull it in as is and wait for some feedback from the field (with the assumption "no feedback" equals "OK").

I will then begin to look at the loadable module de-initialization. This is not really clean in the current release, but that's no problem because modules never get unloaded. However, in the long term we need this to be clean.

The mysterious segfault issue is still dangling. I was hesitant to do any larger-scale new development without fixing it. But given the fact that it is extremely hard to find, and obviously happens very seldom, I'll continue developing. I am right now looking into upgrading the dev machine to an x64 OS, where most of the problems happened. My hope is that I will see a segfault during further development work and then hopefully be able to tackle it. I still think that the segfault must be well understood and fixed before I go into some serious multithreading redesign. As such, unfortunately, this issue still holds some of the work scheduled for the next *major* version.

I thought I give you an update here in my end (will also post this to the blog for the others). Any feedback/suggestion is highly welcome.

Monday, November 19, 2007

syslog GSS-API usage notes

I copy over some usage notes from Peter Vrabec's intial announcement of the patch:

It adds a new commandline option '-g' to listen for a connection wrapped with gss-api and few new configuration directives:

for server:

$gsslistenservicename <service name>

for client:

$gssforwardservicename <service name>
$gssmode <encryption|integrity|none>

With gssmode set to "encryption" or "integrity" all tcp selectors will be forwarding messages via gss-api.


That's probably useful while I am getting up some real documentation.

GSS-API for Rsyslog

I just reviewed and integrated a patch from varmojfekoj into rsyslog - it provided GSS-API support. I have to admit that I have not yet fully understood what it does from a user's point of view. But I begin to have the feeling that this patch will probably be the most important addition to rsyslog in the later half of this year.

I also have not yet evaluated how this patch relates to the syslog-sec IETF syslog security effort, namely syslog-transport-tls. I guess they are related and the patch probably not only provided non-standard functionality but may even make it harder to implement the standard. HOWEVER, if we look at how slow moving that IETF WG is, I do not bother about any compliance problems. They can be dealt with later. What I find much more important is that we have a real-world answer to real-world security question and we do have this now. So what could be more important? ;)

I keep you updated on the progress.

being back...

I am now back in Germany for a week. I've now managed to get through all these nice "gifts" that were waiting for me while I was in Florida. Today, I think, I'll be able to resume work on rsyslog. First on my todo list will be solving a problem with ommysql (or, better said, providing some instrumentation to get hold of it) as well as integrating a very interesting patch sent over from the red hat folks. So ... stay tuned and expect updates to this blog to happen once again.

Thursday, November 08, 2007

New Travel Blog ...

While I am on the road, I thought it would be a good idea to create a travel blog. Said and done, it is already online -- of course with few content.

Those of you interested in rsyslog might like to hear that I will be back in the office next Monday. I'll then review what has happened and go for another round of enhancements.

Thursday, October 18, 2007

Getting off for launch viewing

A note to all fellow rsyslog users and contributors. I got a chance to view space shuttle Discovery's STS-120 launch next week. I will fly over to Florida tomorrow. I will be available via email, but my response time will be sluggish. My buddy Michael will handle things related to rsyslog in the mean time. Most importantly, he'll release 1.19.10 tomorrow while I am on the plane. Please keep thoughts and contributions flowing, they are very welcome.

recent changes

A couple of changes the last days. They are all now in cvs. Tomorrow we will release 1.19.10.

2007-10-15
- bumped version number
- changed ommsyql to allow for ":ommysql:" module specific action call method
instead of ">". This shall facilitate the creation of other plugins.
- added notes about new action module designator to module documentation
- updated user doc to new module action calling convention
2007-10-16
- updated rsyslogd doc set man page; now in html format
2007-10-17
- undid creation of a separate thread for the main loop -- this did not
turn out to be needed or useful, so reduce complexity once again.
- added doc fixes provided by Michael Biebl - thanks

Friday, October 12, 2007

rsyslog 1.19.9 released

I have just released rsyslog 1.19.9. It is now two weeks since the last release. I have taken some extra time to make sure that the release system (source tarball) now fits the packagers' and user's needs. Also, some time went into hunting the segfault, though this was still quite fruitless. However, mildew has identified that the segfault seems to occur only on 64 bit machines, which is a very good hint.

I'll now watch for comments on the 1.19.9 release and then see if I myself can do anything against the segfault early next week (I am setting up another x64 machine for testing). Later the week, I'll be heading for my space shuttle launch viewing trip, which will unfortunately mean that my focus will not be on rsyslog (by my co-workers will keep it up).

worklog for the past days...

Work on rsyslog was focussed on the new packaging and bug fixing. Most importantly, people had problems with the new tarball introduced in 1.19.8 and we have made sure that 1.19.9 will again meet the expectations. We've done a round of pre-releases on the mailing list and managed to get down to something that seems to be quite acceptable. The official 1.19.9 release will most probably be released later today.

Here are the details:

2007-10-08
- changed the threading to include one extra thread that runs the
mainloop. This was done to work around a problem with malloc/free. Note
that we are still running on two threads - the startup thread just waits
for the new one to finish. For a description of the problem I try to
work-around, please see:
http://rgerhards.blogspot.com/2007/10/could-i-really-reproduce-bug.html
Note that this is an experimental change, which will only stay if it
proves to fix the segfault issue we are dealing with.
2007-10-10
- changed the build system to use a single source tarball again (but
different makefiles for the main project and ommysql)
thanks to varmojfekoj for the patch
2007-10-11
- applied fixes from Michael Biebl:
1.) fix failing compilation of ommysql plugin (s/ommysql-config.h/config.h/)
2.) fix mysql configure check (although the default is no, we did
check for the mysql devel files)
3.) Create a separate Makefile.am for the doc files. This cleans up
the toplevel Makefile.am considerably and makes it much more readable
and maintainable.
3b) Assign the html doc files to html_DATA. This means, they are
installed to $(hmtdir), which by autoconf standards is
$(prefix)/share/doc/$packagename/.
4.) Reformat the SOURCES line to make it better readable and maintainable.
- integrated patch by varmojfekoj to tweak build system even further
- Peter Vrabec requested doc not to be "stored" in html_DATA - changed that

Tuesday, October 09, 2007

malloc/free anomaly cleared

Peter Vrabec provided very helpful information on the anomaly I experienced with malloc/free under mudflap instrumentation. See his report:


$ gcc -lmudflapth -lpthread -fmudflapth mud.c
.........----------
mud.c: In function ‘main’:
mud.c:27: warning: return type of ‘main’ is not ‘int’
./a.out
alloc p in thread: 0ea72530
alloc p in main thread: 0ea72460
freeing p from thread: 0ea72530
free done!
freeing p from main thread: 0ea72460
free done!
main thread exiting

gcc -lpthread -fmudflapth mud.c -lmudflapth
................................----------
mud.c: In function ‘main’:
mud.c:27: warning: return type of ‘main’ is not ‘int’
$ ./a.out
alloc p in thread: 1bffe3f0
alloc p in main thread: 1bffe440
freeing p from thread: 1bffe3f0
free done!
freeing p from main thread: 1bffe440
*** glibc detected *** ./a.out: double free or corruption (out):
0x000000001bffe440 ***
======= Backtrace: =========
/lib64/libc.so.6[0x32bde6e8a0]
/lib64/libc.so.6(cfree+0x8c)[0x32bde71fbc]
./a.out(__wrap_main+0x174)[0x400924]
/lib64/libpthread.so.0[0x32bea061b5]
/lib64/libc.so.6(clone+0x6d)[0x32bdecd39d]
======= Memory map: ========
bla bla bla

Note the position of the -lmudflapth argument. So, as it looks, the problem was really one of the instrumentation and not of rsyslogd itself. So, bad as it is, we are still back to hunting a bug that is hiding well. But hopefully we'll get somewhat closer when mudflap is now actually active... I'll see and post news as soon as I have them.

Monday, October 08, 2007

work-around for malloc/free issue created

I have now created a work-around for the malloc/free threading issue which creates an extra thread for the mainloop. The startup thread then simply waits for it to be finished. As such, there is never memory allocated in the "main" thread. At least for me, it works now with mudflap. I am still in doubt if that was the segfault issue (or just a bug in mudflap), but at least we can give it a try.

I will now see that I get some feedback. The next thing is change the packaging back to a single source tarball (by popular request ;)).

Friday, October 05, 2007

Could I really reproduce the bug...?

Today, I was able to actually test and debug rsyslog Not just looking at code and how it may work. No, real interaction and real crashes.

Things went well, but then I got stuck. Somehow, the segfault didn't make much sense. I found something that is related to the segfault user's are seeing. But is it really the actual segfault or just a side-effect of instrumentation?

With mudflap active, rsyslog crashes when freeing the message structure in the worker thread. The structure was allocated (malloc) in another thread, actually the "main" thread, that is the one rsyslog starts up in. Of course, I've first assumed I have messed up with the structure. But further analysis showed that I have not. So a bad feeling creeped in ... that there may be some thread safety issues with malloc/free. On the other hand, rsyslog is far from being my first multi-threaded program (but on a modern flavor of linux, I have to admit). I've used dynamic memory alloc in multithreaded apps for years now and without any problems. After all, dynamic memory is often a trouble-safer with multithreading.

Then, I have written a minimalistic program to check out threading functionality. Here it is:


#include <stdlib.h>
#include <pthread.h>

static char *pthrd;
static char *pmain;

static void *singleWorker1()
{
pthrd = malloc(32);
printf("alloc p in thread: %8.8x\n", pthrd);
pthread_exit(0);
}

static void *singleWorker2()
{
printf("freeing p from thread: %8.8x\n", pthrd);
free(pthrd);
printf("free done!\n");
printf("freeing p from main thread: %8.8x\n", pmain);
free(pmain);
printf("free done!\n");
pthread_exit(0);
}


void main()
{
int i;
pthread_t thrdWorker;

i = pthread_create(&thrdWorker, NULL, singleWorker1, NULL);
pthread_join(thrdWorker, NULL);
pmain = malloc(32);
printf("alloc p in main thread: %8.8x\n", pmain);
i = pthread_create(&thrdWorker, NULL, singleWorker2, NULL);
pthread_join(thrdWorker, NULL);
printf("main thread exiting\n");
}


Note: the code did originally contain sleep(1) in stead of the pthread_join()s now found in it. I was initially too lazy to do it right in this tester. I've been told this is bad, so I fixed it. The result, however, is unchanged.

... and now look at the output:

cc -O1 -fmudflapth threadtest.c -lpthread -lmudflapth
threadtest.c: In function ‘main’:
threadtest.c:27: warning: return type of ‘main’ is not ‘int’
[root@localhost rsyslog]# ./a.out
malloc: using debugging hooks
alloc p in thread: 095586d0
alloc p in main thread: 095587f8
freeing p from thread: 095586d0
free done!
freeing p from main thread: 095587f8
*** glibc detected *** ./a.out: free(): invalid pointer: 0x095587f8 ***

free done!
main thread exiting
*******
mudflap stats:
calls to __mf_check: 0
__mf_register: 5179 [524294B, 32B, 20981024B, 0B, 2365B]
__mf_unregister: 0 [0B]
__mf_violation: [0, 0, 0, 0, 0]
calls with reentrancy: 5132
lock contention: 0
lookup cache slots used: 0 unused: 1024 peak-reuse: 0
number of live objects: 5179
zombie objects: 0

As you can see, the free that is done on the memory malloc'ed in the thread I created manually works fine. But the freeing the memory malloc'ed in the main thread fails miserably (I've set MALLOC_CHECK_=1, for the records).

I am both stunned and puzzled. If that is really a problem, it is clear why rsyslog aborts.

... but can that really be? I have to admit I now suspect a problem with mudflap -- when it is compiled without it, everything works. But this applies only to the test program. Rsyslog doesn't as quickly abort compiled without mudflap, but it aborts in any case. So can there really be a problem in the way dynamic memory management is done and in which threads?

If you can contribute to the solution, please do. I really need any helping hand, this is probably one of the most strange situations I've ever seen [and, of course, all will clear up once I see where I have failed - as always ;)].

Feedback appreciated!

finally ... I can produce a segfault!

Thanks to mildew, I can now produce a rsyslog segfault. It even happens immediately at startup. Actually, it's a bit quick, I'd expected it to occur much later. But it looks similar enough to the bug reports I received, so I can hunt that one down (yes, it seems to occur persistently)! Chances are pretty good that, if it is fixed, the real culprit is found.

Ah ... and what was the magic? Mildew suggested to add mudflap to the project. I have to admit I was unaware of this capability. I guess, I'll now make it a standard part of the debug environment - but let's hunt the bug first...

homeland security can not secure its mailing list

This morning, I had in interesting find in my inbox. The department of homeland security seems not to be able (or not interested) to secure some of its public announcement lists. Read the original news here:

http://isc.sans.org/diary.html?storyid=3450

Of course, this is not a real thread - but I find it an interesting fact...

Friday, September 28, 2007

on ommysql packaging

I thought I share an interesting conversation that went over the rsyslog mailing list. Even though it buried in the mailing list archive, I think here is a better place to be displayed. Feedback is much appreciated.



> -----Original Message-----
> From: Michael Biebl
> Sent: Friday, September 28, 2007 10:23 AM
> To: rsyslog-users; Rainer Gerhards
> Subject: Re: [rsyslog] rsyslog 1.19.8 released
>
> 2007/9/27, Rainer Gerhards :
> > repeat message processing. The MySQL functionality is now taken out
> of
> > the core package, but its tarball is still contained in the main
> tarball
> > (so it is still a single download for everything). This is part of
> the
> > effort to fully support third-party plugins. Rsyslog 1.19.8 is a
>
>
> To be honest, I don't particularly like this change. It increases the
> work for package maintainers (like me). You now have to maintain two
> source packages. Having a non-standard tarball inside a tarball
> doesn't help there. It even worsens things, as stuff like "make dist"
> or "make distcheck" doesn't work anymore for a cvs checkout.
> There's also the problem of which version of the plugins will be
> compatible with the core version (upgrade scenarios, keeping them in
> sync, etc.)
> It also adds complexity for the developer (as he has to maintain an
> additonal set of build files).
> As you were talking about 3rd party plugins (with the emphasis on 3rd
> party) I don't understand the benefit of splitting out the mysql
> plugin as it is you who develops the mysql plugin, not a third party.
> Do you actually intend to create a separate tarball for each plugin in
> the future?
>
> What was wrong with the --enable-mysql configure switch? I don't see
> any benefits, only disadvantages. You know, if it ain't broken, why
> fix it ;-)
>
> Cheers,
> Michael

Hi Michael,

as I have blogged, I am not yet sure about how to handle the situation. I am also consulting with Autotools experts, any advise is appreciated.

Two packages seem useful, especially when more plugins become available (I myself think about email and a couple of other databases). Many folks also do not like the idea of having to have libmysql present on the system just to install rsyslog - with a core package and the plugin, those can only install the core (and that will probably the majority of cases).

What I have not yet found - and I have very limited expertise in this area - is how to do that in the best possible way.

Oh, and some more background: ones the plugin interace has matured (I expect this in 3 to 6 month), I intend to actually use ommysql with its own version number. Versioning will be handled by the interface (that part is already present, but no code for it yet as there is only a release 1 of it). So, in the medium to long term, ommysql *will* be a separate project - maybe one with a different maintainer, as I am no mysql guy.

Does this make sense? As I said, comments are much appreciated...

Rainer

Thursday, September 27, 2007

work log for 2007-09-26

work log for rsyslog:

2007-09-26
- applied patch provided by varmojfekoj to support building ommysql
in its own way (now also resides in a plugin subdirectory)
- fixed a bug in cvthname() that lead to message loss if part
of the source hostname would have been dropped

Wednesday, September 26, 2007

Did not manage to release 1.19.8 today...

Sorry folks, I didn't manage to release 1.19.8 today. There was simply too much other activity that required attention. So I couldn't finally decide on how to distribute from now on. Anyhow, it is my firm intention to release 1.19.8 tomorrow.

The good news, though, is that some folks are already testing that code. Maybe I get some feedback until tomorrow morning - and maybe that even helps me to get a better release...

needed to pull 1.19.7 release

Unfortunately, there is a serious bug in rsyslog 1.19.7 that can prevent UDP message reception. Totally. No message received at all. This somehow slipped through our own testing as well as test at some third parties.

The root cause of this problem is interesting: I changed an internal interface to make things more reliable. What I changed had some old code in them that did return's right in the middle of the code. I overlooked those returns and so an invalid state was returned.

The interesting fact is that the function now returns an enum type (rsRetVal). Previously, it returned an integer. For some reason, the compiler issued no warning when the old (accidentally remaining) code returned an integer. If I'd receive a "wrong type" warning, I'd probably spotted it before even doing testing at all.

Bottom line: what's wrong with my compiler settings?

Oh, and yes: I'll release the fixed version soon. It then will be 1.19.8 to avoid confusion. I now fight with the distribution system: I received a patch that allows ommysql to be build as a separate module. So now it is separate - even from a distribution point of view. That causes some grief for distributing it and can possibly break some things in distribution packages. I need to think how to tackle that in the best possible way...

worklog for 2007-09-25

2007-09-25
- changed ttyname() to ttyname_r() - not a real fix, as this part of the
code was single threaded, but better to be prepared for the future.
- changed localtime() to localtime_r()
- released 1.19.7
- applied contributed patch to improve repeated message processing
(see https://bugzilla.redhat.com/show_bug.cgi?id=303341)

Tuesday, September 25, 2007

rsyslog changes until 2007-09-24

Hi all,

I've been a bit lazy reporting what I've done with rsyslog. The primary reason is that it was a lot of review, which is quite boring to report in a blog. Today, I think, I'll do a new release, so at least here is my worklog for up until yesterday.

2007-09-17
- possibly found a bug in cvthname() that lead to a wrong size being specified
in a getnameinfo() API call - not sure, though, if it is "the" bug (actually,
it looks like it isn't). - this is EXPERIMENTAL
- fixed a bug that caused signal handlers in cvthname() not to be restored when
a malicious pointer record was detected and processing of the message been
stopped for that reason (this should be really rare and can not be related
to the segfault bug we are hunting).
2007-09-18
- split the function cvthname() for clarity. Also changed to using the rsRetVal
status return system
- removed some compiler warnings in regard to signed / unsigned comparison
2007-09-20
- code cleanup
- fixed a minor memory leak that occured when the %APPNAME% property was
used (I think nobody used that in practice)
2007-09-21
- more review and cleanup
- simplified code in shouldProcessThisMessage() for debug output
- changed strerror() calls to thread-safe strerror_r() variant

Thursday, September 20, 2007

What's going on with rsyslog?

I've not posted much the past days. That doesn't mean nothing has happened. I wanted to post a work log today, but I have to admit I have forgotten it on a machine I right now can not access. OK, first thing tomorrow morning...

In short words, we are still on the bug hunt. I am now again back to reviewing code, this time on a functional basis. I am checking everything based on the message flow, looking at functions as they are called. Today, I completed the review of the reception part (up to the point when it gets into the main message queue). Unfortunately, no serious problem found. I used the review, however, to clean out some nits, add a large number of new comments and even found a memory leak. The later one would currently most probably never occur in practice, but when syslog-protocol gets adopted, it would have hit.

Tomorrow, and probably then next two or three days I'll review the code that is executed once the message leaves to main queue. There is probably more meat for a bug in that part (its by far more complex).

I also think I'll release the cleaned-up version sometime soon - after all, it's better then what is currently released.

I keep you posted. Comments are always welcome.

Wednesday, September 19, 2007

a dedicated shuttle blog

Ah, the joys of blogging. When I started this blog a few years ago, I kept it focussed on one topic - and thus had a few other blogs for other topics. Then, these nice labels appeared in blogger. That looked like an ultimate solution to me: just put everything in a single blog and then use the labels (or tags, as others would call them) to generate topic-specific feeds. Now is the the first time that I really did this. And, guess what, it doesn't seem to work as nicely as I initially thought.

Mixing rsyslog/logging and a field trip to the space shuttle launch causes some confusion. I do not like confusion ;) So I have created another blog for my space shuttle launch viewing trip today.

I hope this will work out. I sincerely think it is in the best interest of all readers. The next days (as time permits, its obviously not a priority task...), I'll create some useful links to get the different pieces together if you are interested in the big picture. Also, I'll remove the shuttle posts here and change them to redirects.

Tuesday, September 18, 2007

paid services for rsyslog

As with a lot of open source projects, rsyslog funding is problematic. Of course, rsyslog currently is funded by my interest in it. And I am glad that Adiscon, my company, permits me to work on. The actual funding for all that action, kind of funny, mostly stems from closed source projects in the Windows world.

However, I'd like to see that over the years rsyslog can fund itself. In my point of view, funding should be provided by those that benefit most from it. So obviously, I do not expect any funding from private folks, people like you and me. Of course, everyone is invented to contribute and new code, doc and bug hunting is definitely a pro. On the other hand, companies (and other organizations, namely the government) take financial advantage of using it. Some of them also contribute in terms of time and code. This is great and much appreciated :)

For the others, I have begun to offer donations. Not surprisingly, this is not a real source of funding ;) But now I have taken an offer from sourceforge.net, which I think is interesting. They begin to offer a marketplace for open source services. So those folks that actually need help can be brought in contact with those that have experience. Sounds like a good idea to me and a fair way to fund projects.

I have created two test service offerings for rsyslog. One is to write a nice custom-created rsyslog configuration file. Well, with rsyslog's relative ease of use, I do not expect that many folks use that. But you never know and I am always curios.

The other one is especially targeted towards organizations who must prove they have "official" support for all software they use (I guess this includes at least a number of government agencies). For them, I have created a email support option for rsyslog. It guarantees responses to support questions and this is often needed for auditing and other purposes. Of course, we still answer all support emails and do not plan to stop that.

I have added one real goody, though: and that is that we will provide patches for past releases of rsyslog. As you may know, developers hate to fumble with old releases. And so there is very little motivation to look at an older release when there is a new one out. With the paid support option, however, there is some motivation to do that. So, again, I think this is fair: we are just offering a service that would otherwise never appear. It doesn't hurt any of the other users.

I am very interested to see how this works out. I would also be interested in feedback from the field. How do you like this idea? Do you have any other/additional suggestions?

space shuttle troubles...

Space Shuttle Strut Repair
Yesterday, some info leaked that the shuttle ... well, leaked ;) To be serious, there were reports that there is an unacceptable leak at a shuttle strut - hydraulics fluid seems to have been leaking. But yesterday the decision was made that a repair is actually needed.

Of course, it didn't take long for all sorts of rumors to appear. Some sources even said that the October mission would be canceled and moved to January - which is as far from being true as it only can be. In fact, the NASA source quoted above does not outrule there is a change in the launch date, but it is expected to be not a major hit. As far as I understand it, things may be moved a few days at most. There seem to be buffers all along the process, so I do not yet begin to panic ;)

... but I have to admit that this triggers bad memories. As I said, I flew in into Orlando last year to see the STS-115 launch (which I finally didn't make due to its long delay). With STS-115, all the trouble also started with launch delays, that time caused by a lightning strike. That, too, was quite some time before launch, at least if I correctly remember. It was not that early as the current problem, which leaves me with the firm hope that there is enough buffer time available even to launch on the target date of October, 23rd. Let's see...

Photo: In the Orbiter Processing Facility at NASA's Kennedy Space Center, workers secure the tool storage assembly unit into Discovery's payload bay. Photo credit: NASA/Amanda Diller

Monday, September 17, 2007

viewing a space shuttle launch..

Space Shuttle Launch
As some of you know, I am addicted to astronomy and space travel. Since long, I'd wanted to experience a space shuttle launch. Last summer, I got tickets for the STS-115 mission. I went down to Florida, went through a hurricane and ... had to leave without the shuttle being launched. Well, actually I could view it rocketing into space from far away (Fort Myers) when I had to leave home. That was really bad luck.

Unfortunately, I either could not go for the next missions or I didn't manage to get tickets (they sell out soooooo quickly that it is a real problem even if you type fast ;)).

Now, I was lucky enough to secure tickets and also manged to get enough vacancy to do a re-try. I am going to visit the STS-120 mission now. We'll fly in to Orlando and then move over to the Titusville/Cocoa Beach area. Getting it all together was far from being simple.

After I got the launch tickets, I needed to get flights and then find hotels. This list is sorted in descending order or rareness ;) While I had only a 3-minute shot at obtaining the tickets, the flights were quite complicated too. Hotels were available, but of course not those I hoped to find. Based on experience from fellow travelers, Titusville seems to have only one decent hotel. I can back this, as I found none on my previous trips. The one in question is the Hampton. Everybody seems to know, and I didn't get a room for the 22nd ;) But everybody also seems to expect the shuttle to take off on first launch attempt - because starting from the 23rd there were vacancies. I secured some of them ;)

I hope to have a really great launch experience. I hope I'll find time to post more of my experiences on the way to the launch here in the blog. I'll probably even start a STS-120 section on my site www.gerhards.net - let's see :)

Friday, September 14, 2007

recent rsyslog changes

Again, I am doing small changes, mostly review, at this time. So I have batched up some things. Probably I'll switch back to a more daily mode, as it allows to keep better in touch. We'll see (comments appreciated).

Here comes the log:

2007-09-10
- bumped version number to 1.19.6
- fixed a bug that in --enable-debug mode caused an assertion when the
discard action was used
2007-09-07
- added eCmdHdlrGetWord command handler
- added $ModDir config directive
- modified $ModLoad so that an absolute path may be specified as
module name (e.g. /rsyslog/ommysql.so)
- applied patch by varmojfekoj two fix two potential segfault situations
- cleaned up some signed/unsinged char issues
- released 1.19.5
2007-09-10
- bumped version number to 1.19.6
- fixed a bug that in --enable-debug mode caused an assertion when the
discard action was used
2007-09-11
- applied patch by varmojfekoj to change signal handling to the new
sigaction API set (replacing the depreciated signal() calls and its
friends.
- did a full review of all remaining TODO items in code - nothing of
importance found (but some minor nits like comments fixed)
- cleaned up compiler warnings
- applied patch by varmojfekoj to FIX a bug that could cause
segfaults if empty properties were processed using modifying
options (e.g. space-cc, drop-cc)
- fixed man bug: rsyslogd supports -l option
- checked -s/-l option and found that they work as expected - closed case
- added some comments in relation to -s/-l option
- released 1.19.6
- removed compiler warnings in non-debug mode
- fixed a bug that caused the CStr class to not honor the parameter to
return NULL on empty string - causes a mem leak and can create favourable
environment for other bugs (as it leads to empty hostnames)
- pulled 1.19.6 (no downloads so far), created new 1.19.6 with the bugfix,
re-released that (now tagged v1-19-6b in cvs)

Tuesday, September 11, 2007

bug hunting...

We have received feedback that the 1.19.x releases of rsyslog contain a bug that leads to a segfault after some time. Of course, this is a very bad thing to happen. After all, the primary goal for the rsyslog project is reliability and it should be able to survive even in the worst conditions (e.g. low system memory). So this bug is clearly unacceptable and has received highest priority.

Unfortunately, it is hiding very well. We are all looking into troubleshooting this beast. Thankfully, there is a lot of community support, especially with testing. What makes this bug so hard to find is, among others, the inability to reproduce it in lab. So the project is totally dependent on user feedback. As current reports have shown, we also currently assume that releases prior to 1.19.0 can also be affected - so it is not as easy as checking the changes.

I receive very valuable help by varmojfekoj. He is already a frequent and great contributor of patches. His skills and experience are extremely valuable and I am very glad to have him work on this project. Please give him a big hand. Thanks, varmojfekoj, the project would not be that far without your help!

While the bug itself is obviously a very bad thing, it has some good side-effects. Most importantly, the code is getting another round of very in-depth review. That review has already brought up some fixes for obscure situations. Situations that I'd expect to happen extremely infrequently (if at all) in reality. But now even those are fixed. And I am sure that the review will bring even more benefit as it continues. Of course, I'd be glad if we find the bug ASAP, but it is good to know that all that work also provides additional benefit (at least that keeps my depression level low ;)).

The bad thing, besides the bug itself, is that the bug hunt obviously defers other work. Most importantly, I refrain from making any changes that may not be related to clean up or bug fixing. I think it is not smart to introduce new code (read: complexity) at this stage of the project. And, of course, all resources should focus on fixing that bug, so there wouldn't even be time to enhance other parts of the code. So for all of you waiting for new features: please bear with us, the implementation schedule will slip a little bit.

If you experience stability problems, please report. Each reports helps us understand the problem somewhat better.

Friday, September 07, 2007

recent rsyslog changes

Now that I am back working on the rsyslog code, I will continue to provide information about what I am doing. In most cases, however, I will not provide daily logs. The reason is that I currently focus on new design and will leave rsyslog code mature. So changes will be relatively infrequent (at least I plan so ;]).

Here is the work log for the past days:

2007-09-04
- integrated patches from Michel Samia and varmojfekoj
- released 1.19.4
- changed some calls to CStr class to their "safe" counterpart - they could
cause program aborts if the object in question was an empty string
2007-09-05
- added some links to doc
- removed an invalid config sample from sample.conf
- changed part of the CStr interface so that better error tracking
is provided and the calling sequence is more intuitive (there were
invalid calls based on a too-weired interface)
- (hopefully) fixed some remaining bugs rooted in wrong use of
the CStr class. These could lead to program abort.
2007-09-06
- added forwarding information to sysklogd (requires special template)
to config doc

Busy at the moment...

Some might have noticed that I am not as active as usual on the rsyslog project . As this seems to turn out to keep at least for the upcomi...