Monday, August 13, 2007

on reliability and the need to discard messages...

I am a bit involved in creating the next-generation on the wire syslog protocol with the IETF. Unfortunately, that activity had to starve a little while I was busy with getting rsyslog to its current state. However, I have not lost it out of my eye. And in fact the rsyslog implementation effort helped my sharpen my mind for what to suggest in the standardization area.

A recap: the IETF tries to standardize the syslog protocol for quite a while now. There were changing thoughts about what was needed, but this spring we agreed on bringing a draft up to the IESG. From there, we received a number of comments. Some easy ones, some of more substance. In July 2007, more or less one topic was left: that on what to do when a syslog sender blocks. I quote the relevant part of the IESG reply here (accentuation by me):

> First, this starts as an issue with TLS over TCP and the
> syslog base protocol.
> It can also arise teorethically for UDP, but as I understand
> not in practice for todays usage. When you are using TCP, in
> case the syslog sender generates events at an rate that is
> higher than available rate over the path used there will be
> build up of a queue. So I would like to have some words
> somewhere saying what you do when you build up a queue of
> messages that should be transmitted, but the queue simply
> keeps growing. What do I do? To me this situation implies
> that one starts discarding syslog messages and starts
> with the least important ones.
So I would like to have
> a paragraph on this.
> I also included UDP in this in the case that you actually
> have reserved or determined that syslog is allowed to use a
> particular amount of bandwidth, but not more. In this case it
> could be possible that one implements a rate limiter
> and run into exactly the same issue as for TCP
> Please do understand that if syslog was designed from scratch
> today it wouldn't get away without a congestion control that
> guarantees that it isn't missbehaving in any situation. But
> being an "old" protocol we are accepting less than that.
> However, we do require it to contain the limitations and
> assumptions that it can be safely operated with. Using it as
> it currently is used is not an issue because the networks it
> is used in has many magnitudes more bandwidth that syslog
> generates. However, what would happen if some one starts
> using syslog in low-powered, low-bitrate sensor network for
> carrying some information. In that situation syslog becomes a
> potential threat to the stability of the network as it
> doesn't react at all to congestion if run over UDP. Network
> failures are also sitation that are problematic to ensure
> that the inteded resources are available.
Thus we do
> like to protect the utility of what resources do exist.

And this sentence covers it all:

> Please seriously consider adding a paragraph about
> how one can thin a queue of syslog messages in
> the base protocol. This as I think it potentially applies
> to any transport.

This basic requirement caused some confusion in the working group (WG). I have to admit, it confused me at that time, too.

In the mean time, I kept working on rsyslogd. I didn't even realize that I had written code in 2005 (or so) to take care of what was requested by the IESG: if rsyslog runs in single thread mode and it forwards messages via TCP and the send call would block - then we have a problem. Because when we wait, we will potentially hang the whole system. This is unacceptable. So I coded rsyslog in such a way that it discards the message in such a situation.

When I moved rsyslog to a multi-threaded design, that restriction was removed. Now, there was a queue between the receiver and the (potentially blocking) sender. And, as we all know, queues are endless, so we'll never run into problems ;-]. Of course, that simplistic point of view survived only so long as we did not deliberately bash rsyslogd with lots of traffic. What then happens is pretty much the same as in the single-threaded case: rsyslog freezes. Mildew, a frequent contributor, discovered this problem and also sent a good fix: he throws away messages if the queue is full. Sounds familiar? Yes, there seems to be a real-world problem and there is also a well known cure for it.

In some essence, this is what the IESG is asking for - at least as of my understanding. But the IESG, obviously speaking with a lot of experience, goes a step further. What they recommend is not to discard newly arrived packages. The recommend a way to "thin out the queue". So we would not just blindly drop whatever comes when the queue is full. No, we should intelligently decide what to drop, and that based on the severity of the message. It sounds only fair that debugging messages get dropped in favor of emergency messages. And, yes, syslog has a field "severity", which is the right thing to use here.

I thought about implementing a "high watermark" in the rsyslog queuing engine. What that amount of the queue would be full, a filter should be applied to the message. So from that on, only messages with a given severity would be queued, while all others be dropped. Of course, those already in the queue would be sent, no matter how unimportant they are. So far, I have not implemented that algorithm. And I think I am now unlikely to do it that way. It would be much better to follow the advise and, once the queue becomes full, begin to drop messages based on their severity.

I can even envision the algorithm: go forward through the queue starting at the tail and look for messages with the least importance. Once you have found one, drop it. Then enqueue the (new) higher-severity message. When this is done, do not drop any more messages, because it may not be necessary. We can always drop if a newer higher priority message arrives. If, during our queue traversal, we could not find any message with lowest severity, retry with the next lowest severity. Run this algorithm until either a message to drop is found or the drop-severity has reached the severity of the to-be-queued message, which in this case is the youngest and thus the drop target. Of course, that algorithm needs to be tweaked in regard to performance, but basically it should work fine. I even think, a derivative of it will make it into a rsyslog feature request.

... and now back to the IETF. We still need wording for our upcoming standard (called an I-D). Now that I know what I look for, shouldn't it be easy? Well, to be honest I don't think its that easy. First of all, I am not sure if, in a protocol specification, I can demand such queue processing. It looks somewhat too algorithmic, too implementation specific to me. So I need to find a better way to phrase it. And of course, peer review will show if that makes sense or not (maybe we even get a better suggestion).

My first try at it is as follows:

In any implementation, a situation can arise in which an originator or relay would need to block sending messages. A common case is when an internal queue is full. This might happen due to rate-limiting or slow performance of the syslog application. In such cases, it is RECOMMENDED that the originator or relay drops messages of lower severity in favor of higher severity messages. Messages with a numerically lower SEVERITY value have a higher severity than those with a numerically higher one. To-be-dropped messages SHOULD simply be discarded. The syslog application may notify a collector or relay about the fact that it drops messages. If the underlying protocol supports congestion control, discarding of messages SHOULD be avoided.

I have to admit that this text is probably too lengthy and imprecise. But hopefully it is a starting point. I'll post it to the syslog IETF WG and we'll see...

Still, there is one important message to learn:
no matter how hard we try, there is always a chance that we must discard messages. It may be highly unlikely and it may be totally unexpected. But we can not absolutely avoid it. Not even an on-disk queue is large enough, that it can buffer any imaginable message queue. So loss of messages is inherent in any application, also in syslog and auditing. The question is, how we deal smartly with it and how we preserve as much evidence in our logs as possible (and keep the log valid in those cases!). This all sounds pretty basic and easy, we always need to remember ourself's of this fact...

1 comment:

Christopher Cashell said...

On a slightly related topic, have you considered some sort of disk-based buffering for TCP transported remote syslog?

Particularly if you are already logging the same thing remotely and locally, it would be really slick to track what logs have made it across the wire. In the case of connection failure, when the connection is restored, treat the local logfile as your log buffer, and replay any logs that haven't been sent yet.