Tuesday, December 11, 2007

A design problem...

Folks, I am facing a design problem - and it looks so simple that I am pulling out all my hair ;)

I am currently preparing the next steps in the modular rsyslog redesign. I am not yet sure what to do first; there are a couple of candidates. One is to add a real expression capability, another is to add threaded inputs (which would be quite useful). In support of these enhancements, a number of things need to be changed in the current code. Remember, we are still running on large parts of the original sysklogd code, which was never meant to do all these advanced things (plus, it is quite old and shows its age). A cleanup of the core, however, requires some knowledge of what shall be done with it in the future.

My trouble is with a small detail, one I thought would be easy to solve with a little bit of web searching or a forum post or two. But... not only did I not find the relevant information, I did not even find an appropriate place to post. Maybe I am too dumb (quite possible).

OK, enough said. Now what is the problem? I don't know how to terminate a long-running "socket call" in a proper way under *nix. Remember, I have done most of my multithreading programming in the past ten years or so under Windows.

What I want to do: Rsyslog will support loadable input modules in the future. In essence, an input module is something that gets data from a data source (e.g. syslog/udp, syslog/tcp, kernel log, text file, whatever ...), parses it and constructs a message object out of it and injects that message object into the processing queue. Each input module will run on its own thread. Fairly easy and well-understood. The problem happens when it comes to termination (or config reload). At that instant, I need to stop all of these input module threads in a graceful way. The problem is that they are probably still in a long-lasting read call. So how to handle this right?

Under Windows, I have the WSACancelBlockingCall() API. Whenever I call that method, all threads magically wake up and their read and write calls return an error state. Sweet. I know that I can use signal() under Linux to do much of the same. However, from what I read on the web I have the impression that this is not the right thing to do. First of all it seems to interfere with the pthreads library in a somewhat unexpected way and secondly there is only a very limited set of signals available ... and none left for me?

The next approach would be to have each blocking call time out after a relatively short period, e.g. 10 seconds. But that feels even worse to me. Performance-wise, it looks bad. Design-wise, it looks just plain ugly, much like a work-around. It looks like I had to do something without knowing what the right thing is (which, as it turns out, is the right description at the time being ;)).
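The timeout approach can be sketched as follows, assuming poll() is used to bound the wait before each read. The function name `read_with_timeout`, the flag `shutting_down` and the return code `SHUTDOWN_REQUESTED` are hypothetical. The sketch also shows the drawback: in the worst case, the thread notices a shutdown request only after a full timeout interval has elapsed.

```c
#include <errno.h>
#include <poll.h>
#include <stdatomic.h>
#include <unistd.h>

/* Hypothetical names; not taken from rsyslog. */
#define SHUTDOWN_REQUESTED (-2)

static atomic_int shutting_down;       /* set by the controlling thread */

/* Returns the number of bytes read, 0 on EOF, -1 on error, or
 * SHUTDOWN_REQUESTED once the shutdown flag has been observed. */
static ssize_t read_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };

    while (!atomic_load(&shutting_down)) {
        int rc = poll(&pfd, 1, timeout_ms);
        if (rc > 0)
            return read(fd, buf, len); /* data, EOF or error is pending */
        if (rc < 0 && errno != EINTR)
            return -1;                 /* poll() itself failed */
        /* rc == 0 (timeout) or EINTR: loop around and re-check the flag */
    }
    return SHUTDOWN_REQUESTED;
}
```

With a timeout of 10 seconds, as in the example above, termination could be delayed by up to 10 seconds per thread; a shorter timeout reduces that delay at the cost of more wake-ups.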

To make matters worse, I have a similar problem not only with the read and write calls but with other constructs as well. For example, I'd like to have a couple (well, one to get started) of background threads that handle periodic activity (mark messages immediately come to mind). Again, I would need a way to wake them when it comes to termination time - immediately.

And, of course, I would prefer to have one mechanism to wake any sleeping thread. Granted, I can't do that under Windows either, so I may need to use different constructs here, too.

This is the current state of affairs. There is still enough work to do before the question MUST be answered in order to proceed. But that point in time approaches quickly. I would deeply appreciate any help on this issue, be it advice on how to actually design that part of the code or advice on where to ask for a solution! Really - a big problem is that I did not find an appropriate place to ask. Either the forum is not deeply technical enough, or the mailing lists are about something really different. If you know where to ask - please tell me!

[update] In the meantime, I have found a place to ask. Believe it or not, I had forgotten to check for a dedicated newsgroup. And, of course, there is one ;) The discussion there is quite fruitful.
