Monday, January 24, 2011

Multi-Threading rsyslog's TCP input

A forum thread made me aware that there seems to be an issue with rsyslog performance when TLS is used. Over the past two weeks, I have worked on a paper that looks in depth at rsyslog performance, and I came across a paper [1] that promotes writing servers in the "traditional" multi-threaded way (with a single thread per connection). It addressed some of my concerns, and I thought the approach was worth actually trying out (I had ruled it out several years ago and never looked at it again). As a result, I created an experimental module, imttcp, which works in this mode. I was eager to put it to the test, especially as this model would also lead to a much simpler programming paradigm.

Unfortunately, the performance results are devastating: while there is a very slight speedup with a low number of connections (close to the number of cores on the system), there is a dramatic slowdown when running with many threads. Even at only 50 connections, rsyslog is dramatically slower (80 seconds for the same workload that was processed in 60 seconds with the traditional imtcp or when running on a single connection). At 1,000 connections, the run was *extremely* slow. So this is definitely a dead end.

To be fair, von Behren, Condit and Brewer (the authors of [1]) claim that the problem lies in the current implementations of thread libraries. As one cure, they propose user-level threads. However, as far as I could find out, user-level threads do not seem to be much faster under Linux than kernel-level threads (which I used in my approach).
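To make the "traditional" model concrete, here is a minimal C sketch of a thread-per-connection input, assuming POSIX threads. It is not imttcp's actual code: `shared_queue`, `conn_reader` and `run_thread_per_conn` are hypothetical names, and socketpair() stands in for accepted TCP connections so that the example is self-contained. Each connection gets its own thread, which blocks in read() and takes the shared queue mutex for every chunk it receives.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_CONNS 64

/* Hypothetical stand-in for the main message queue: it only counts
 * bytes and enqueue operations, guarded by a single mutex. */
struct shared_queue {
    pthread_mutex_t mut;
    size_t bytes;
    size_t enqueues;   /* how often the mutex was taken */
};

struct conn_arg {
    int fd;
    struct shared_queue *q;
};

/* One thread per connection: block in read() until data arrives or the
 * peer closes, then push whatever was read into the shared queue. */
static void *conn_reader(void *p)
{
    struct conn_arg *a = p;
    char buf[4096];
    ssize_t n;

    while ((n = read(a->fd, buf, sizeof(buf))) > 0) {
        pthread_mutex_lock(&a->q->mut);   /* every reader contends here */
        a->q->bytes += (size_t)n;
        a->q->enqueues++;
        pthread_mutex_unlock(&a->q->mut);
    }
    close(a->fd);
    return NULL;
}

/* Spawn one reader thread per connection, send len bytes over each
 * connection from the "client" side, then wait for all readers.
 * Returns the total number of bytes received, or -1 on setup failure
 * (error cleanup is omitted for brevity). */
static long run_thread_per_conn(int nconns, const char *msg, size_t len,
                                struct shared_queue *q)
{
    pthread_t tids[MAX_CONNS];
    struct conn_arg args[MAX_CONNS];
    int client[MAX_CONNS];

    assert(nconns > 0 && nconns <= MAX_CONNS);
    pthread_mutex_init(&q->mut, NULL);
    q->bytes = 0;
    q->enqueues = 0;

    for (int i = 0; i < nconns; i++) {
        int sv[2];
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
            return -1;
        client[i] = sv[0];
        args[i].fd = sv[1];
        args[i].q = q;
        if (pthread_create(&tids[i], NULL, conn_reader, &args[i]) != 0)
            return -1;
    }
    /* Client side: send the payload, then close so the reader sees EOF. */
    for (int i = 0; i < nconns; i++) {
        size_t off = 0;
        while (off < len) {
            ssize_t w = write(client[i], msg + off, len - off);
            if (w <= 0)
                break;
            off += (size_t)w;
        }
        close(client[i]);
    }
    for (int i = 0; i < nconns; i++)
        pthread_join(tids[i], NULL);
    pthread_mutex_destroy(&q->mut);
    return (long)q->bytes;
}
```

The appeal is obvious: each reader is a simple blocking loop with no state machine. The price is one thread, and one wakeup plus one mutex acquisition per received chunk, for every single connection.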

Even more convincing, from the rsyslog point of view, is that there are clear reasons why a highly threaded input must be slower:
  • batch sizes are smaller, leading to much more per-message overhead
  • many more context switches are needed to switch between the various I/O handlers
  • more OS API calls are required, because in this model each thread is woken up more frequently for new incoming data, so less data is available to read on each call
  • more lock contention, because many more threads compete for the main queue mutex
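The first and last points can be illustrated with a little arithmetic. The following is a hypothetical C sketch, not rsyslog's queue code: it assumes the queue mutex is taken once per enqueued batch (the names `msg_queue`, `enqueue_batch` and `locks_needed` are made up for illustration) and counts how often that happens when the same number of messages arrives in batches of different sizes.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical queue: the mutex is taken once per batch, not per message. */
struct msg_queue {
    pthread_mutex_t mut;
    size_t nmsgs;
    size_t lock_acquisitions;
};

static void enqueue_batch(struct msg_queue *q, size_t nmsgs)
{
    pthread_mutex_lock(&q->mut);
    q->lock_acquisitions++;
    q->nmsgs += nmsgs;   /* real code would copy message pointers here */
    pthread_mutex_unlock(&q->mut);
}

/* Deliver `total` messages in batches of at most `batch_size` and return
 * how often the queue mutex had to be taken. */
static size_t locks_needed(size_t total, size_t batch_size)
{
    struct msg_queue q;

    assert(batch_size > 0);
    pthread_mutex_init(&q.mut, NULL);
    q.nmsgs = 0;
    q.lock_acquisitions = 0;

    for (size_t done = 0; done < total; ) {
        size_t n = (total - done < batch_size) ? total - done : batch_size;
        enqueue_batch(&q, n);
        done += n;
    }
    assert(q.nmsgs == total);
    pthread_mutex_destroy(&q.mut);
    return q.lock_acquisitions;
}
```

For 1,000 messages, batches of 100 take the mutex 10 times, while single-message batches take it 1,000 times. With many reader threads, each one wakes up with only a little data, so its batches drift toward the second case, and all of those acquisitions now contend for the same mutex.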
All in all, this means that the approach is not the right one, at least not for rsyslog (it may work better if the input can be processed totally independently, but I have not evaluated this). So I will look into an enhanced event-based model with a small set of input workers pulling data off the connections (I assume this is useful for e.g. TLS, as the TLS transport is much more compute-bound than other inputs, and this computation becomes a limiting factor for the overall processing speed under some circumstances; see [2]).
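The event-driven core of such a design can be sketched with POSIX poll(): a single thread watches all connections, and each wakeup drains every ready socket before handing one combined batch to the queue. This is a simplified illustration under stated assumptions: the function `event_loop` is hypothetical, counting wakeups stands in for queue lock acquisitions, and the small pool of input workers mentioned above is not shown. It only demonstrates why the event model naturally produces larger batches.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_CONNS 64

/* One event loop for all connections: a single poll() call reports every
 * ready socket, and the loop drains them all before "enqueueing" the
 * collected data once. Returns the total number of bytes read (-1 on
 * poll() failure); *nbatches receives the number of non-empty batches,
 * i.e. how often a real queue mutex would have been taken. */
static long event_loop(int *fds, int nconns, size_t *nbatches)
{
    struct pollfd pfd[MAX_CONNS];
    int open_conns = nconns;
    long total = 0;
    char buf[4096];

    assert(nconns > 0 && nconns <= MAX_CONNS);
    for (int i = 0; i < nconns; i++) {
        pfd[i].fd = fds[i];
        pfd[i].events = POLLIN;
    }
    *nbatches = 0;

    while (open_conns > 0) {
        if (poll(pfd, (nfds_t)nconns, -1) < 0)
            return -1;
        long batch = 0;
        for (int i = 0; i < nconns; i++) {
            if (pfd[i].fd < 0 ||
                !(pfd[i].revents & (POLLIN | POLLHUP | POLLERR)))
                continue;
            ssize_t n = read(pfd[i].fd, buf, sizeof(buf));
            if (n > 0) {
                batch += n;
            } else {             /* EOF or error: stop watching it */
                close(pfd[i].fd);
                pfd[i].fd = -1;  /* poll() ignores negative fds */
                open_conns--;
            }
        }
        if (batch > 0) {
            total += batch;
            (*nbatches)++;       /* one enqueue (one lock) per wakeup */
        }
    }
    return total;
}
```

If all 16 connections in the test below have data waiting, the first wakeup collects everything into a single batch, which is the opposite of the thread-per-connection case, where every thread enqueues its own small chunk.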

As a side note: for obvious reasons, I will not try to finish imttcp. However, I have decided to leave it in the source tree, so that a) someone else can build on it, if he sees value in that, and b) I can use it for other tests in the future.

[1] R. von Behren, J. Condit, and E. Brewer. Why Events Are a Bad Idea (for High-Concurrency Servers). In Proceedings of the 9th Conference on Hot Topics in Operating Systems, Volume 9, page 4. USENIX Association, 2003.

Thursday, January 13, 2011

New Mailing List for Log Normalization

Thankfully, interest in log normalization and the related libraries liblognorm and libee has increased. Up until now, I have handled discussions on these topics via the rsyslog mailing list. As conversations increase, this may become an unnecessary burden for those only interested in rsyslog. So I have created a new mailing list named lognorm. I chose this somewhat generic name because I intend to use it for both libraries. This saves me some overhead, and I strongly assume that anyone interested in liblognorm will also be interested in libee (but to a lesser extent the other way around).

Please subscribe to the new list. Log normalization development is currently in a very exciting phase, so getting involved is a great way to shape things the way you need them!