Friday, June 13, 2008

rsyslog work log

Past day's rsyslog work log:
- somewhat improved plain tcp syslog reliability by doing a connection
check before sending. Credits to Martin Schuette for providing the
- made rsyslog tickless in the (usual and default) case that repeated
message reduction is turned off. More info:
- applied some patches from Michael Biebl - thanks!
- released 3.19.7
- changed default for $KlogSymbolLookup to "off" (in beta)

Monday, June 09, 2008

coding to save the environment...

Each one of us has ways to help save the environment. Interestingly, there are even some options exclusively available to coders ;)

I was made aware by Michael Biebl some time ago that the Linux community is trying to save power by disabling unnecessary wakeups inside the softwares. For example, the Linux kernel has become mostly tickless, that is does not necessarily wake up frequently. This is important, because modern CPUs have power save modes: simply said, the less is computed, the less power is needed. An idle CPU uses least power. However, when an application wakes up, the CPU resumes into modes that need more power. Of course, the CPU does not know if what it needs to be executed is actually useful or could be done more smartly. You can find more details at the site

But why am I telling you this? The reason is that rsyslog, too, had (and unfrotunately has) some wakeups which are not strictly necessary. For example, the current imfile implementation needs to poll input files. This is the most portable way to do things, but it is not power-effcient. An update to a more power-efficient solution (on the platforms that support it) is scheduled. Next, the --MARK-- message generator runs periodically. This one can probably never be avoided, because the actual functionality is the periodic activity. Any user should think if the functionality is neeed and, if not, not enable this plugin.

Finally, and the core rsyslog awoke every 30 seconds to check if repeated messages need to be flushed. However, repeated message reduction is no longer the default and I assume that most people do not use it. The default is to have it turned off. But even when it was turned off, rsyslog still awoke to do the (unnecessary) check. I have changed this, rsyslog now only initiates the wakeups when they are actually needed. To many (if not most) environments this means rsyslog will never again wakeup the system if there is no need to. It's a small saving, but every little bit counts toward the ultimate goal. And given the potentially large number of systems running rsyslog, the saving may actually be larger than one initially assumes.

In the future, repeated message reduction will probably be implemented based on a different paradigm, which enables us to not run on a tick even when it is turned on.

In any case, I hope the new feature will contribute to - you name it - slightly longer notebook battery life and less environent pollution. And, ultimately, it's a interesting fact to see how a software developer can affect environment health these days ;)

The new functionality will be available starting with rsyslog 3.19.7.

reliable plain tcp syslog - once again...

As said in my last blog post on plain tcp syslog reliability, I have now implemented Martin Schütte's idea in rsyslog's plain tcp sender. I created an interface inside the netstream driver layer specifically for this method. It shall not be used by any other transport, and so it is clean enough to not affect any protocol other than the one that has the problem.

I have done some initial testing. It confirms the findings from my last post: the method works well in local, low-traffic environments, but fails in others. However, even in the fail cases, the message loss is somewhat less than without the work-around. So I think it is useful to have it inside rsyslog.

One thing that it affects is the current retry logic. Rsyslog used the work-around to resend the last message, which was most probably lost, after a successful reconnect. As connection loss is now much more reliable detected in some cases, the probabilty of message duplication greatly increases in them. In other cases, however, it may still be desirable to resend the last message. To (somewhat) solve this issue, I have now added a new config directive ($ActionSendResendLastMsgOnReconnect on/off) so that the user can specify his preferrence. The default now is to NOT send these messages on reconnect.

This functionality will be part of rsyslog 3.19.7.

rsyslog work

Past day's rsyslog work log:
- released 3.19.6
- added doc on suggested TLS deployment (rough picture, configuration
sample still missing).
- added new property replacer option "date-subseconds" that enables
to query just the subsecond part of a high-precision timestamp
- fixed a bug with the new property replacer option in timereported
property - thanks to Elizabeth for reporting it

Friday, June 06, 2008

getting a bit more reliability from plain tcp syslog

As I have described in my recent post, plain TCP syslog is unreliable by its very nature. However, Martin Schütte had a good idea that helps improve the reliability, at least under some circumstances.

Martin showed that by doing a recv() call in front of the send(), reliability improves. This is because the local TCP stack is then forced to evaluate whatever the remote peer sent to it. Without that recv() call, the local stack may not evaluate the response until later. Of course, this is implementation specific, but seems to be common. From traces, I have seen that there seems to be good reason to do so: the number of packets sent is slightly lower when not doing the recv(), so it looks like the buffering is improved. On the other hand, this costs reliability as a remote shutdown is not immediately detected. I have not verified the buffering behaviour by looking at the tcp stack sources, but the traces are evidence enough. At least for practical purposes.

The recv() suggested by Martin forces the stack to evaluate packets received. If the remote end has closed the connection, that makes the local stack detect the condition. Consequently, loss of connection can be detected at this stage. Without the recv(), this needs at least one other send() call and thus causes the loss of one message. So we increase the reliability of plain tcp syslog by being able to detect the error state earlier.

However, this approach is racy. For example, if the remote peer initiated the shutdown after we did the recv() (but before the send()), this will not help. Also, it does not help if there is high traffic. In this case, the remote peer may be forced to shutdown the connection in the middle of a message. Keep in mind that a syslog server has no way of telling the client it intends to shut down. So if it must, it has no other chance than to terminate the connection forcefully (maybe after a short timeout, depending on the needs).

So, in short, the recv() trick often works if we have:
  • a network with likely no packet loss
  • a network with low latency
  • low network traffic
Fortunately, this is the situation we experience in many entry-level local networks. So for them, doing the recv() provides an improvement over not doing it. Most importantly, plain tcp syslog will deterministically lose one message in all environments without the recv(). With it, there is a chance the message will not be lost.

The larger the environment, the more slim this chance is. Unfortunately, I assume that in enterprise environments that really need reliability, the recv() trick does not contribute much to improving reliability (because they do not have the perquisites outlined above). So for enterprises, plain tcp syslog is still not an option when there is need for reliable logging (e.g. in an auditing environment). Protocols with app-level acknowledgement are required there (e.g. RFC 3195, RELP or SETP).

I have thought long how this affects rsyslog. I have now concluded I will give the recv() trick a try. I think it can improve the situation, at least in some practical scenarios. The problem is that it is kind of a hack. I need to find a well enough way to implement it into my netstream driver layer that doesn't force the code to do something where it is illogical while at the same time keeping the driver clean to be used with more sophisticated protocols. So it is not a matter of adding a simple API call. I'll probably introduce a driver call which may do the recv(), only if it makes sense for that type of driver, and does a dummy return for those drivers that do not (need) to support it.

For completeness, I reproduce some of my comments on Martin's blog just so that I have a stable reference within a domain a manage (things too often disappear too quickly ;)).

  1. Rainer Gerhards says:

    Martin, I have read your update. This use case isn’t fixed either, at least not reliably. It is racy. You get the same results be just doing the send() without the recv() - just give it a try. ;) — Rainer

  2. Rainer Gerhards says:

    OK, I am not just ignorant ;) I’ve given it another try, while racy it looks like it indeed solves the problem in the majority of simple cases. I just wonder what happens under the hood. I’ll have a look :) — Rainer

  3. Rainer Gerhards says:

    Interesting. I’ve now done a wireshark trace. As it looks, the IP stack does NOT process the incoming connection request until after it actually sends the message. If you do the recv(), this forces this evaluation. I have not verified this with the tcp sources and I don’t think it makes much sense to do so. It’s probably a bit platform-specific, depending on what the IP stack does. Anyhow, it seems to greatly reduce the problem in non-failure cases and does not cause much performance pain. So it is worth adding it. I’ll now apply it once again to the real rsyslog and see how it performs there. I keep you posted. Rainer

  4. Rainer Gerhards says:

    OK, I have done some further stress testing. Unfortunately, the solution does not work as soon as the traffic gets higher. In client.c, you do a sleep. This sleep makes sure that the remote end has time to do its close and notify the client before the next message is sent. I modified you examples so that the sleep() is no longer used and 1000 messages are sent by the client. Then, the disconnect happens 1 byte into message 423 (stream was partly received). The next message I see is 701. The client also notices the problem at 701. So I have lost around 280 messages without anybody noticing it. Of course, on a non-busy server that does not happen, but it proves the point that even in a nice clean close scenario without packet loss you can lose data on a server restart. There is no reliable cure. I’ll now evaluate if it makes sense to add your suggested recv() logic for those cases where it helps. I am just unsure if these are a large enough sample to justify this code change. Anyhow… Please let me know any further findings you may have (did I overlook something)?


rsyslog work log

As you can see below, I am currently more involved in non-coding activities, like getting most reliability out of plain tcp syslog, IETF work and pondering the phpLogCon team with all those nice little things that are hard to implement ;)

Past day's rsyslog work log:
- released 3.17.3
- (finally switched to Fedora 9 on main dev machine;))
- enabled Posix ERE expressions inside the property replacer
(previously BRE was permitted only)
- provided ability to specify that a regular expression submatch shall
be used inside the property replacer
- Worked on Martin's suggestion for a more reliable plain tcp transport
- implemented in property replacer: if a regular expression does not match,
it can now either return "**NO MATCH** (default, as before), a blank
property or the full original property text
- Worked on Martin's new suggestion for a more reliable plain tcp transport
This one looks promising, at least for simple cases
- worked on plain tcp syslog reliability
- begun working on TLS doc (finally ;))
- bugfix: part of permittedPeer structure was not correctly initialized
thanks to varmojfekoj for spotting this
- bugfix: off-by-one bug during certificate check
- bugfix: removed some memory leaks in TLS code
- enhanced property replacer to support multiple regex matches
- IETF work, syslog-transport-tls
- begun doc on unreliable tcp syslog and a work-around based on Martin's