Thursday, October 18, 2007

Getting off for launch viewing

A note to all fellow rsyslog users and contributors. I got a chance to view space shuttle Discovery's STS-120 launch next week. I will fly over to Florida tomorrow. I will be available via email, but my response time will be sluggish. My buddy Michael will handle things related to rsyslog in the mean time. Most importantly, he'll release 1.19.10 tomorrow while I am on the plane. Please keep thoughts and contributions flowing, they are very welcome.

recent changes

A couple of changes the last days. They are all now in cvs. Tomorrow we will release 1.19.10.

- bumped version number
- changed ommsyql to allow for ":ommysql:" module specific action call method
instead of ">". This shall facilitate the creation of other plugins.
- added notes about new action module designator to module documentation
- updated user doc to new module action calling convention
- updated rsyslogd doc set man page; now in html format
- undid creation of a separate thread for the main loop -- this did not
turn out to be needed or useful, so reduce complexity once again.
- added doc fixes provided by Michael Biebl - thanks

Friday, October 12, 2007

rsyslog 1.19.9 released

I have just released rsyslog 1.19.9. It is now two weeks since the last release. I have taken some extra time to make sure that the release system (source tarball) now fits the packagers' and user's needs. Also, some time went into hunting the segfault, though this was still quite fruitless. However, mildew has identified that the segfault seems to occur only on 64 bit machines, which is a very good hint.

I'll now watch for comments on the 1.19.9 release and then see if I myself can do anything against the segfault early next week (I am setting up another x64 machine for testing). Later the week, I'll be heading for my space shuttle launch viewing trip, which will unfortunately mean that my focus will not be on rsyslog (by my co-workers will keep it up).

worklog for the past days...

Work on rsyslog was focussed on the new packaging and bug fixing. Most importantly, people had problems with the new tarball introduced in 1.19.8 and we have made sure that 1.19.9 will again meet the expectations. We've done a round of pre-releases on the mailing list and managed to get down to something that seems to be quite acceptable. The official 1.19.9 release will most probably be released later today.

Here are the details:

- changed the threading to include one extra thread that runs the
mainloop. This was done to work around a problem with malloc/free. Note
that we are still running on two threads - the startup thread just waits
for the new one to finish. For a description of the problem I try to
work-around, please see:
Note that this is an experimental change, which will only stay if it
proves to fix the segfault issue we are dealing with.
- changed the build system to use a single source tarball again (but
different makefiles for the main project and ommysql)
thanks to varmojfekoj for the patch
- applied fixes from Michael Biebl:
1.) fix failing compilation of ommysql plugin (s/ommysql-config.h/config.h/)
2.) fix mysql configure check (although the default is no, we did
check for the mysql devel files)
3.) Create a separate for the doc files. This cleans up
the toplevel considerably and makes it much more readable
and maintainable.
3b) Assign the html doc files to html_DATA. This means, they are
installed to $(hmtdir), which by autoconf standards is
4.) Reformat the SOURCES line to make it better readable and maintainable.
- integrated patch by varmojfekoj to tweak build system even further
- Peter Vrabec requested doc not to be "stored" in html_DATA - changed that

Tuesday, October 09, 2007

malloc/free anomaly cleared

Peter Vrabec provided very helpful information on the anomaly I experienced with malloc/free under mudflap instrumentation. See his report:

$ gcc -lmudflapth -lpthread -fmudflapth mud.c
mud.c: In function ‘main’:
mud.c:27: warning: return type of ‘main’ is not ‘int’
alloc p in thread: 0ea72530
alloc p in main thread: 0ea72460
freeing p from thread: 0ea72530
free done!
freeing p from main thread: 0ea72460
free done!
main thread exiting

gcc -lpthread -fmudflapth mud.c -lmudflapth
mud.c: In function ‘main’:
mud.c:27: warning: return type of ‘main’ is not ‘int’
$ ./a.out
alloc p in thread: 1bffe3f0
alloc p in main thread: 1bffe440
freeing p from thread: 1bffe3f0
free done!
freeing p from main thread: 1bffe440
*** glibc detected *** ./a.out: double free or corruption (out):
0x000000001bffe440 ***
======= Backtrace: =========
======= Memory map: ========
bla bla bla

Note the position of the -lmudflapth argument. So, as it looks, the problem was really one of the instrumentation and not of rsyslogd itself. So, bad as it is, we are still back to hunting a bug that is hiding well. But hopefully we'll get somewhat closer when mudflap is now actually active... I'll see and post news as soon as I have them.

Monday, October 08, 2007

work-around for malloc/free issue created

I have now created a work-around for the malloc/free threading issue which creates an extra thread for the mainloop. The startup thread then simply waits for it to be finished. As such, there is never memory allocated in the "main" thread. At least for me, it works now with mudflap. I am still in doubt if that was the segfault issue (or just a bug in mudflap), but at least we can give it a try.

I will now see that I get some feedback. The next thing is change the packaging back to a single source tarball (by popular request ;)).

Friday, October 05, 2007

Could I really reproduce the bug...?

Today, I was able to actually test and debug rsyslog Not just looking at code and how it may work. No, real interaction and real crashes.

Things went well, but then I got stuck. Somehow, the segfault didn't make much sense. I found something that is related to the segfault user's are seeing. But is it really the actual segfault or just a side-effect of instrumentation?

With mudflap active, rsyslog crashes when freeing the message structure in the worker thread. The structure was allocated (malloc) in another thread, actually the "main" thread, that is the one rsyslog starts up in. Of course, I've first assumed I have messed up with the structure. But further analysis showed that I have not. So a bad feeling creeped in ... that there may be some thread safety issues with malloc/free. On the other hand, rsyslog is far from being my first multi-threaded program (but on a modern flavor of linux, I have to admit). I've used dynamic memory alloc in multithreaded apps for years now and without any problems. After all, dynamic memory is often a trouble-safer with multithreading.

Then, I have written a minimalistic program to check out threading functionality. Here it is:

#include <stdlib.h>
#include <pthread.h>

static char *pthrd;
static char *pmain;

static void *singleWorker1()
pthrd = malloc(32);
printf("alloc p in thread: %8.8x\n", pthrd);

static void *singleWorker2()
printf("freeing p from thread: %8.8x\n", pthrd);
printf("free done!\n");
printf("freeing p from main thread: %8.8x\n", pmain);
printf("free done!\n");

void main()
int i;
pthread_t thrdWorker;

i = pthread_create(&thrdWorker, NULL, singleWorker1, NULL);
pthread_join(thrdWorker, NULL);
pmain = malloc(32);
printf("alloc p in main thread: %8.8x\n", pmain);
i = pthread_create(&thrdWorker, NULL, singleWorker2, NULL);
pthread_join(thrdWorker, NULL);
printf("main thread exiting\n");

Note: the code did originally contain sleep(1) in stead of the pthread_join()s now found in it. I was initially too lazy to do it right in this tester. I've been told this is bad, so I fixed it. The result, however, is unchanged.

... and now look at the output:

cc -O1 -fmudflapth threadtest.c -lpthread -lmudflapth
threadtest.c: In function ‘main’:
threadtest.c:27: warning: return type of ‘main’ is not ‘int’
[root@localhost rsyslog]# ./a.out
malloc: using debugging hooks
alloc p in thread: 095586d0
alloc p in main thread: 095587f8
freeing p from thread: 095586d0
free done!
freeing p from main thread: 095587f8
*** glibc detected *** ./a.out: free(): invalid pointer: 0x095587f8 ***

free done!
main thread exiting
mudflap stats:
calls to __mf_check: 0
__mf_register: 5179 [524294B, 32B, 20981024B, 0B, 2365B]
__mf_unregister: 0 [0B]
__mf_violation: [0, 0, 0, 0, 0]
calls with reentrancy: 5132
lock contention: 0
lookup cache slots used: 0 unused: 1024 peak-reuse: 0
number of live objects: 5179
zombie objects: 0

As you can see, the free that is done on the memory malloc'ed in the thread I created manually works fine. But the freeing the memory malloc'ed in the main thread fails miserably (I've set MALLOC_CHECK_=1, for the records).

I am both stunned and puzzled. If that is really a problem, it is clear why rsyslog aborts.

... but can that really be? I have to admit I now suspect a problem with mudflap -- when it is compiled without it, everything works. But this applies only to the test program. Rsyslog doesn't as quickly abort compiled without mudflap, but it aborts in any case. So can there really be a problem in the way dynamic memory management is done and in which threads?

If you can contribute to the solution, please do. I really need any helping hand, this is probably one of the most strange situations I've ever seen [and, of course, all will clear up once I see where I have failed - as always ;)].

Feedback appreciated!

finally ... I can produce a segfault!

Thanks to mildew, I can now produce a rsyslog segfault. It even happens immediately at startup. Actually, it's a bit quick, I'd expected it to occur much later. But it looks similar enough to the bug reports I received, so I can hunt that one down (yes, it seems to occur persistently)! Chances are pretty good that, if it is fixed, the real culprit is found.

Ah ... and what was the magic? Mildew suggested to add mudflap to the project. I have to admit I was unaware of this capability. I guess, I'll now make it a standard part of the debug environment - but let's hunt the bug first...

homeland security can not secure its mailing list

This morning, I had in interesting find in my inbox. The department of homeland security seems not to be able (or not interested) to secure some of its public announcement lists. Read the original news here:

Of course, this is not a real thread - but I find it an interesting fact...