Monday, December 08, 2008
After this is now settled, I sat back, relaxed and meditated a bit about the root cause of the issue. Acutally, I didn't need to think very hard. The problem was introduced when I implemented the netstream driver class. During that implementation, I shuffled a lot of code to the now-modular interfaces. Among them were the access control lists, whose roots were kept in global variables at this time.
I screwed up the first time when I allowed them to remain global variables. We all know that global variables are evil, especially when making publically accessible. Now that we moved to a proper interface, I should have replaced them by a function call. Doing that in the first place had prevented the problem. Why? Because I just initialized the now-interface specific global variable "representative" with the value at time of interface creation, which meant NULL in all cases. So whoever used the interface, always got an empty list, which meant no access control was configured.
Any user-configuration still hit the global variable, which caused the ACLs to be created, but no part of the code ever accessed it any longer. One may argue if that is a simple coding error, and there is some truth in it, but I'd still say its primarily a design issue (bad design promises to provide the quick solution, but it seldom does...).
And as it always takes at least two faults to really screw up, the next major issue wasn't around to far. Rsyslog had not - and still has not! - a formal test suite that you can simply run each time code changes. I have begun to employ some limited test cases via "make check", but they cover primarily exotic aspects and do not yet contain any serious test case that involves actually running rsyslogd against any serious number of messages. One of the reasons is that I had no good tool for doing so, or that I considered building the test suite to be too expensive (in comparison what else needed to do). As a small excuse I would like to mention that some others have encouraged this view. But I always new it is a lame excuse...
So it exactly happened what usually happens in such cases: the test case vital to discover this problem was not present in the series of test I ran against the new code. As usual, the programmer himself tests whatever he thinks needs testing. And, also as usual, this means that the programmer doesn't test those things that he can not think of being wrong. Usually, these are the real problems, because if the programmer did not think of a potential problem, he did not implement, or at least carefully check, for it. This is just another example, why external testers are needed.
In open source, users adopting the devel and beta releases are often considered to be these testers. Quite frankly, I could not afford a full testing lab and continue developing the project. I think this is true for most open source projects. "Free testing" by early adopters is a major advantage over closed source. But this time, this failed, too. Probably the (small) club of early adopters also did not think about this issue. Maybe that's because the more knowledgeable folks prefer to solve this problem with a firewall, which is the better approach to use for various reasons (not to be outlined here, see security advisory for details).
Finally, the issue came up in the form of a bug report. Unfortunately quite later, month after the initial release. But it was reported and so I could fix it as quickly as possible once I knew.
The important lesson to learn is that it usually takes more than one error to cause real problems. But these things happen!
I think the case also strengthens the need for good, systematic testing. Some time ago, I began to look into the DejaGnu testing suite and asked the mailing list if somebody had some experience in it. Unfortunately, nobody showed up. I'll now give it another shot. There have been too-often small problems that were rooted in things not being consistently tested. Most often, it were only really small issues, like missing files, or some variables not defined in some conditional path. Since I improved my "make distcheck" settings, many of these small items no longer appear. Even the small set of current exotic tests reveal a problem from time to time.
So I think it would be wise to try to expand the test cases that rsyslog runs on regular basis. Frankly, I will not be able to create a full suite from the ground up. But the idea is if I once manage to get DejaGNU - or something similar - up and running, and acquire the necessary knowledge, I could gradually add tests as I go along. So over time, the tests would increase and we could finally very much better, automatic, that existing functionality is no longer broken by new features.
I will try to get the focus for my next release steps on DejaGNU. Obviously, any help in doing so is appreciated.
Monday, November 24, 2008
Let me think of security as a probability of security breach. S_curr is the security of the reference system without a root jail. S_total is the security of a hypothetical system that is "totally secure" (knowing well that no such system exists). In other words, the probability S_total equals 0.
I think the common ground is that a root jail does not worsen security. Note that I do not say it improves security, only that it does not reduce a system's security. S_jail is the security of a system that is otherwise identical to the reference system, but with a root jail. Than S_jail <= s_curr, because we assume that the security of the system is not reduced.
I think it is also common ground that the probability of a security breach is reduced if the number of attack vectors is reduced, without any new attack vectors being added. [There is one generic "attack vector", the "thought of being secure and thus becoming careless" which always increases as risk is reduced - I will not include that vector in my thoughts]
We seem to be in agreement that a root jail is able to prevent some attacks from being successful. I can't enumerate them and it is probably useless to try to do so (because attackers invent new attacks each day), but there exist some attacks which can be prevented by a root jail. I do not try to weigh them by their importance.
For obvious reasons, there exist other attacks which are not affected by the root jail. Some of them have been mentions, like the class of in-memory based attacks, code injection and many more.
I tend to think that the set of attack vectors that can be prevented by a root jail is much smaller than the set of those which can not. I also tend to think that the later class contains the more serious attack vectors.
But even then, a root jail seems to remove a subset of the attack vectors that otherwise exist and so it reduces the probably of security breach. So it benefits security. We can only argue that it does not benefit security if we can show that in all cases we can think of (and those we can not), security is not improved. However, some cases have been show, where it improves, so it can not be that security is not improved in all cases. As such, a root jail improves security, or more precisely the probability of a security breach is
0 < S_jail < S_curr
We can identify the benefit we gain is the difference between the reference system's probability of security breach and the system with the jail. Be S_impr this improvement, than
S_impr = S_curr - S_jail
Now the root jail is just one potential security measure. We could now try to calculate S_impr for all kinds of security measures, for example a privilege drop. I find it hard to do the actual probability calculations, but I would guess that S_impr_privdop > S_impr_jail.
Based on the improvements, one may finally decide what to implement first (either at the code or admin level), all of this of course weighted with the importance of the numbers.
In any case, I think I have shown that both is correct:
- the root jail is a security improvement
- there exist numerous other improvements, many of them probably more efficient than the jail
Wednesday, November 19, 2008
I expect that the new code will de-stabilize the project for some time and so I have now a feature-rich v3-stable release which will be available to everyone, at the price of less performance. That doesn't mean v3 is slow, but v4 is even much faster. So I can finally begin to experiment a bit more with the new v4 branch and don't need to think too hard that I may be introducing changes that are hard to roll into a stable within a reasonable time frame.
As far as the first v4-stable is concerned, I do not expect one to surface before February 2009 and, obviously, it will be not as stable as v3-stable is.
So the game is starting once again, and I hope you enjoy it ;)
Friday, November 07, 2008
When Vista came out, we at Adiscon immediately introduced the new event log monitor V2 service in MonitorWare Agent and EventReporter. That service worked well, but not many customers ever used it (who really monitors the workstation event logs...).
With the rise of Windows 2008 Server, we saw a notable increase in interest. And we finally got questions! While we always supported all properties, some of the former event log monitor properties are not available under the Windows 2008 event system. A number of customers asked if we could map them. Makes an awful lot of sense, especially if you have log analysis scripts that expect those fields. So I went to development (no longer working on Windows myself these days) and asked what we could do. I thought it would be trivial. But it wasn't. Some mappings seem to have really hard, plus we got the impression (from lack of discussion and coverage) that we are probably among the first to ever work in this area). But nothing can stop a good programmer ;)
So I was quite happy to learn today that we have finally manage to include a full emulation of pre-Windows 2008 event properties. The code currently is in beta, and is available both for MonitorWare Agent and EventReporter.
I am especially happy as the new emulation also makes it far easier for phpLogCon to work with Windows events in a consistent way. Not to mention that I like happy customers ;)
Thursday, November 06, 2008
If this tool turns out to be useful (judging on comments and access count over the next weeks), I will probably do some other online tools aiding in config generation. This will be part of the overall effort to make it easier to unleash rsyslog's full potential (all too often people simply do not know what magic they could do ;)).
Tuesday, November 04, 2008
Wednesday, October 29, 2008
Instantiating this is much quicker then building a machine with scripts from scratch, not to mention that we do not need to have the hardware in stock. In fact, we think about moving such functionality even to data center servers and thus be able to quickly switch between them if there is need to.
My syslog appliance could play a similar role in disaster recovery. While it probably is not appropriate to lose data (depending on use case), it may make sense to set up a new temporary appliance, just to continue gather data and provide analysis while the rest of the system is restored. Instant log analysis is probably a key thing you would like to have in your early recovery stages.
Tuesday, October 28, 2008
While we worked on the virtual syslog appliance, we have checked out various other appliances. They live up to this promise in very different ways. Some are really plug and play, while others are more a demo-type of a complicated system, where the user does not know what to do with the appliance unless he reads through a big manual. This is definitely not what people are after if they look for appliances.
With SyslogAppliance, I try hard to do things as simple as possible. I learned that I probably need to add some nice HTML start page, not only the plain phplogcon log analysis display. So I have now begun to do this appliance home page, just to see that displaying information is probably not sufficient.
I will need to do some basic configuration of the appliance, too. I was (and am) tempted to use something like webmin. But on the other hand, there are so many settings. I think most appliance user will never want to touch them. So a full config front-end is probably good for those in the know. But for the rest, a software appliance should come with the bare minimum of config options that are absolutely essential to do the job. For me, the "make everything configurable expert", this is a hard lesson to learn. Usability is top priority with appliances and usability means to present only those options that are useful to most folks (the rest will probably not use an appliance, at least not for anything but demo).
I thought I share this interesting thought on my way to creating great virtual software appliances. Besides logging, I have some other ideas (and all benefit from a great logging interface), but it is too early to talk about these, now.
Monday, October 27, 2008
However, rsyslog is much more capable than sysklogd. Among others, it is able to buffer messages that were received, but could not yet be processed. To remain compatible to the sysklogd of doing HUP, rsyslogd does a full daemon restart when it is HUPed. Among others, that means that messages from the queue are discarded, at least if the queue is configured with default settings. David Lang correctly stated that this may surprise some, if not most users. While I am still of the view that discarding the queue, under these circumstances, is the right thing to do, I agree it may be surprising (I added a hint to the man pages recently to reduce the level of surprise).
Still, there is no real need to do a full daemon restart in most cases. The typical HUP case is when logrotation wants to rotate files away and it needs to tell rsyslogd to close them. Actually, I asked if anybody knew any script that HUPs rsyslog to do a full config reload. The outcome was that nobody knew. However, some people liked to stick with the old semantics, and there may be reason to do so.
I have now implemented a lightweight HUP to address this issue. It is triggered via a new configuration directive, $HUPisRestart. If set to "on", rsyslogd will work as usual and do a (very, very expensive) full restart. This is the default to keep folks happy that want to keep things as backwards-compatible as possible. Still, I guess most folks will set it to "off", which is the new non-restart mode. In it, only output files are closed. Actually, the output plugin receive a HUP notification and can do whatever it likes. Currently, onle omfile acts on that and closes any open files. I can envision that other outputs, e.g. omfwd, can also be configured to do some light HUP action (for example close outbound connections).
The administrator needs to select either mode for the system. I think this is no issue at all and it safes me the trouble to define multiple signals just to do different types of HUP. My suggestion obviously is to use the new lightweight HUP for file closing, which means you need not to change anything for logrotate et al. Then, when you need to do a config reload, do a "real" restart by issuing a command like "/etc/init.d/rsyslogd restart". And if there really exists a script that requires a config-reload HUP, that should be changed accordingly.
Saturday, October 25, 2008
From a feature point of few, version 3 would have been good for at least three to four major releases, which I did not do just because to prevent you from coming scared by the pace with which we are moving ;)- so I think it now is a perfect spot to begin developing v4. I hope that we will see a first beta of that branch around xmas, which, I think, is a nice gift.
Friday, October 24, 2008
As a side-note, you may notice I am using a .de (German) domain. Thanks to the spammers, com, org and net domains are already used by spamming sites. And I thought it does not matter if we use a de domain. After all, we live in a time where domains from the Cocos Islands (cc) or Tuvalu (tv) are being abused for generic purposes, so why not use .de for a generic site, too?
Oh, and one interesting find: at least one person actually downloaded and tried the version of SyslogAppliance I uploaded yesterday. How do I know? I had forgotten to include the phpLogCon user in the README ;) [of course, this is fixed now ;)]
But you are probably interested in actual numbers. The current v3-stable (3.18.x) manages to process around 22.000 messages per second (mps) with DNS name resolution turned on and about double that value without. That's not bad, but obviously there is room for improvement.
Thanks to our combined effort, we have reached a state where we can process more than 100,000 mps and there is an experimental version (applying some lock-free algorithms) that goes well beyond 200,000 mps. I am not yet sure if we will pursue the lock-free algorithm. There are ample of additional ideas available and I am quite positive we can push the limit even further.
All numbers were tested with a minimal configuration (one udp input, one file output) on a capable multi-core machine. The numbers above are for sustained traffic rates. More messages can be accepted (and buffered) during bursts.
Thursday, October 23, 2008
For starters, I have created a very simple system. While I have a number of options for the operating platform, I started to Ubuntu JeOS mainly because it had good guides for getting started with an appliance quickly. Being based based on Debian also was a plus. Some may argue that the downside is that the log appliance currently requires VMWare. While I agree this may be an issue, it is not an extremely big one especially as VMWare server runs quite well under Linux and is free to use.
I will investigate Red Hat's AOS, but I think I need to get some results from the app point of view first and JeOS looks quite promising in this regard.
For now, I have even started with the stock rsyslog package, which is quite outdated on that platform. However, I'll do a couple of iterations in the next days and so will come up to the current release soon. But for what the appliance currently needs, the older version is not really a problem.
I am now very interested in feedback on this new offering. The appliance can be downloaded from
One of my next actions is to set up a dedicated site, which will make finding (and providing!) information on the appliance much easier. But one thing after the other...
Oh, and one thing on the licensing: the appliance is free for non-commercial use. However, we intend to request a moderate fee for commercial use, which I think is a fair policy. Of course, all appliance components are freely available.
If you try out the appliance, please provide feedback!. I have set up a dedicated forum at
syslog appliance forum
As I said, the initial version will probably not as "plug and play" as I hope, but I am very positive we are on a good path. Besides, it is an exciting project.
Tuesday, October 21, 2008
The interesting question, though, is if this forum will actually attract much attention. In German IT, there is a tendency to think that almost everyone speaks sufficiently well English so that he or she can obtain enough information to get the job done. If that proves true, there would be very little benefit in localizing any of the documentation into German language. So before seriously considering that, it is probably a good idea to do some testing. For the very same reason, my buddy Tom Bergfeld currently translates the rsyslog home page, mainly the announcements, into German. We will do a similar experiment to phpLogCon and evaluate both together after some time has progressed.
Please drop me a note if you have an opinion on this, or on localization at all.
Friday, October 17, 2008
One of the things we intend to investigate is create a logging appliance, using VMWare tools. We will set up a standard Linux, use MySQL and Apache with it and install rsyslog and phpLogCon. That in a "ready to use" fashion where only the devices need to be pointed to the right IP address of this virtual box.
This is one of my next projects and feedback on such an effort is very much appreciated.
Thursday, October 16, 2008
Download now : [some random crappy free webhost]
Send us an e-mail ,we will send you the working copy of software & as soon as you make the payment e-gold $50 1234567.We will give you the password .
Get the copy today
GoldTresor is a known software in E-Gold accessing since 2002. Now our software is very stronger than before. It accesses any E-Gold account in a few minutes and give you some ability such as transfer balance or know passphrase. Generally GoldTresor is used for those who forgot their passphrase. Also in new feature of E-Gold site that named AccSent, if you forgot your e-mail or you can't access e-mail account, this software helps you to find your e-mail address or transfer balance to other account.
Thank you very much
[some random gmail address]
In a conversation I had with a friend of mine who runs the syslog.org forum He pointed me to his site statistics: http://www.syslog.org/forum/index.php?action=stats. While we were talking about site usage, the top notch content item of this deeply technical site drew my attention. It is the spam you can see above.
Mutex, the site owner, once decided keep a sample of the what the forum spammer left for him. Very interestingly, over time this seems to have been evolved into a the top page for egold hack downloads. At least if you do a google search "hack egold download", his syslog site is right there in the top spot of the search results.
And what I find really interesting is that this spam accounts for nearly 10% of a quality site's traffic - a leftover just to show off the spammers. And, no, I don't think he gains much from that traffic, at least I tend to think that folks looking to break into e-gold suddenly turn out to be interested in the beauty of syslog ;)
Wednesday, October 15, 2008
I am also hunting another threading bug. This one manifests only when running on high-end hardware and seems to have to do with synchronization. Interestingly, the problem does not show up under valgrind (a memory and threading debugger), which usually points to flaws very well. Hard to find... Especially hard to find because I do not have the right hardware to reproduce it. I managed to get a faster box last week, but still it is not fast enough (and does not have enough cores...) to reliably trigger the problem. Since them, I have seen it a couple of times, but no indication yet of what is going on.
Also, I need to help some other Adiscon projects and do a bit of my consulting chores, so that the time allocated to rsyslog is a bit limited. But, hey, I have worked nearly full time on it this year and it has evolved very well. So I don't think it is a problem going at a somewhat lower pace for the time being (plus, of course, I hope some other sources of funding will appear which enable me to go back to "full time rsyslog" mode).
In any case, the future has exciting things to come up with. I personally would like to see a customizable message parser, which would enable users to work with different sender formats at the same time.
Tuesday, October 14, 2008
Suse now is doing an official rsyslog package. With that, I think there is now a rsyslog package for almost all major distributions available. Unfortunately, some of the current distributions do not have a good one, but it the next version will have. So over time things will be much better.
For now, let's welcome Novell/SuSe Linux to the camp of those that natively support rsyslog :)
Monday, September 29, 2008
I have now reviewed all of the discussion.
Let me start with the broken receiver. With 255 octets, any (generic)
fragmentation method would need to be ultra-compact which of course is
doable but not (with reasonable overhead) in the context we set up with
the version of syslog-protocol that will turn into a normative RFC.
Note, also, that RFC3164 will be superseded by that RFC once it is out.
As has been described here, fragmentation can either be done at the
protocol layer or at the application layer. In the later case, the
application needs to consider a sensible maximum and needs to emit
message sequences which are somewhat atomic. sendmail seems to do that,
and I know of some other examples, too. Some database servers seem to do
verbose logging in a similar way, in that they log parts of the
statement within different log messages. However, it is often quite
complicated to re-unite those application logs to the original message
(much of the complexity of log analysis stems from that).
A protocol based approach solves these issues. But as it has been
rejected by the syslog-sec WG it is not considered useful by the IETF
syslog community. We may want to try give it another shot, but that
should be done inside the framework layed out in the new IETF series and
as such it would not be a good solution for a broken receiver. Actually,
the new RFC series requires a minimum maximum length of 480 octets
(stemming back to IPv4 UDP available payload size). The recommended
minimum maximum length is 2K and more is permitted if sender and
receiver support it. There is no upper limit per se, but a receiver may
either truncate the message or even discard it as whole. If truncation
happens, it must truncate at the end and without paying any attention to
syntax and semantics of the message. This is specified in 's section
6.1 and it was the result of very elaborate discussions. Most
importantly, the syntax- and semantic-agnostic truncation was a
requirement out of this discussion.
As Tina mentioned, my company Adiscon and me personally are doing
Windows event log to syslog conversion for quite a while. Windows event
messages can be large and keep growing larger. We are converting them to
syslog for over 10 years now and at the time we started the only common
ground was the 1K limit already mentioned. Note that at that time some
implementations could experience serious malfunction if messages over
this size arrived (I remember the Solaris syslogd immediately
segfaulting as one sample of more). We thought about how to best address
the issue. We were tempted to do an app-level split, much as sendmail
does, but refused to do that for two reasons:
1) the messages to be logged did not originate from ourselves (other
than in the sendmail case). This implies that we do not exactly know
what makes sense to put together. While there is a potentially large set
where this can be properly concluded from context, there is also a set
where this is not the case. The later would have required a more
protocol-like generic approach and thus specialised parsers on the end
systems - something we did not really like.
2) this is somewhat similar to the parser problem. In general, log
analysis is even harder if a single logical log entry is distributed
over several physical records. Especially if you take into account that
the order of appearance does not necessarily (in practice almost never)
reflect the order of creation. So processing such a log requires a
consolidation phase. It is especially hard for a human reviewer do this
while reviewing logs and thus considered a big disadvantage.
So we looked at what was available at that time. While the 1K size limit
was universally accepted, most syslog receivers either supported larger
sizes by default, could configured to do so or being recompiled to
handle it (sysklogd, the then-omnipresent syslogd on Linux is a premier
example for the later - #define MAXLINE 1024 needed to be changed and
you were basically done [within UDP constranints]). The real limit
turned out to be the UDP max size, 64K in theory but with different
default/hard coded limits in various stacks. In 2005 a did a bit
research and found that 4K seems to be the typical real-world limit.
But even many years before, allmost every Windows event record fit into
4K (the exception being records with dumps in them...). Also, we already
had plain TCP-based syslog at that time, which did not experience any
So the practical solution for our Windows to syslog size problem was
simply to ignore it and tell customers how to configure/recompile their
syslogd. That worked on Windows at least for WinSyslog and Kiwi Syslog.
and on the *nix side at least for sysklogd, syslog-ng, some variants I
don't remember by name and now rsyslog. We recommended to switch to a
product that supported larger sizes where the stock solution did not do
that. Or we used interim, specialised, receivers who logged data into
separate databases or files.
When I was unable to convince the syslog-sec WG to specify fragmentation
as part of the syslog protocol itself, I was at least able to put that
spirit into the I-D: so we now have the ability to use large sizes if
everybody configures the systems correctly. Part of that spirit, funny
as it may sound, is to place important information early in the packet
as this improves chances it will actually be delivered.
I have to admit this is not a perfect way to do it, but at least it
works if everything is setup up correctly. The current main "problem" is
that RFC 3195 (somewhat vaguely) sets an upper limit of 1K for messages
and also does not talk about truncation. So if there is a RFC3195 system
inside a relay chain, the maximum size for the whole chain goes down to
1K - there is nothing we can do about this. This is also the reason why
a new revision of 3195 is needed. This is underway, as far as I know.
One should also note that this limitation is of no practical importance
for the time being (thus no real "problem"), because 3195 did not find
widespread support. To the best of my knowledge, the only commercially
available implementations are Cisco's and ours with us also providing
the only (more or less, due to low priority) fully supported 3195
implementation inside an open source syslogd. There was SDSC syslog, but
the project is to the best of my knowledge no longer alive. It also
never spread to become the default syslogd on any important Linux
distribution and can be considered "exotic" at best.
I hope this description is useful for you. The bottom line is that there
is no standard, and there is, at least was, no support for specifying
one. Even if we change that, the end-result will most probably not
support down-level reveivers below the 480 octet limit set forth in the
upcoming RFC series.
Thursday, September 04, 2008
As you probably expect, there was a bunch of work waiting for me when I returned. I am still suffering a bit from it. However, I managed to do some work on rsyslog. So I finally managed to get rid of the hardcoded syslog message size limit. This, of course, caused a lot of code to be touched. I did a pre-release on the mailing list, but I do not have the feeling that many tried it. Well, now it is the official devel and we'll see if we get into interesting parts of trouble.
The next thing on my agenda is the new documentation generation system. I got a lot of help from my friends at Red Hat Japan. Actually, I now need to fully understand the way docbook and the generation process at all works. I guess that will keep my occupied for a while. So please keep watching this blog, even though I may not have so many new posts for the time being.
Monday, August 04, 2008
To me, work (including rsyslog) is much more than just "doing something for a living". Of course, that aspect is involved, I can't deny that. But to be good at something, one must love what one does. So any work we conduct should ideally match our interests and be something we can be proud of (which also includes failing to deliver good work should make us ashamed and thus trying to fix the situation). Not everything, even well done, is "good work". Good work is work that benefits society at large. That doesn't mean I need to be Einstein - every garbageman also provides a useful service to society (and should be proud in what he does, provided he does it well). As a side-note, in that sense I do not see that any one work is more respectable than any other: people who try similarly hard to provide good service to society, each one with all the capabilities they have, deserve the same respect, no matter how large their contribution to society is being considered by other people. In fact, a highly educated scholar working on something light-hearted is in my opinion much less respectable than a garbageman who tries his very best in fulfilling his duties. As a side-note to the side-note, do not mistake respectability with material earnings: I see that some folks generate very large benefit for society at large. No matter if their motivations makes them respectable, they obviously deserve to earn more than others. Just keep in mind that respect and material things are two totally different aspects...
Back to the importance of work: I do not consider work to be something "external" to me. Instead, it is a very important part of my personality. Not the only one, and I don't try to assign priorities to different parts of my personality so I can't say if it is the most important one or not - but that doesn't really matter, I think. In that sense, if you help me succeeding in my work, you also help me succeeding in growing my personality. You help me being more proud of what I am doing because you help making it better, more well-known and, importantly, more valuable to society at large. And I hope that my contribution to your work (e.g. by providing some basis) will have a similar effect for you. What's more important is that the borders between "my work" and "your work" go away. So it becomes "our work", something we jointly work on, and something we are actually being tied together. And, in a sense, part of my personality becomes yours and vice versa. Doesn't that justify to also care a bit about the person who is behind that shared work? To me, I think so, even though we "know" each other only via electrons traveling a global network...
Friday, August 01, 2008
This is one of my few non-IT posts ;) Today is solar eclipse day. At my location (roughly 50N, 8E), we had a rough 10% occultation. But, as usual, the day started with a 100% occultation due to clouds. Nevertheless, I set up my PST solar scope, and a pair of eclipse "glasses". Finally, the skies cleared up around the time the eclipse began. Not great, still lots of clouds, but, hey, this time I wasn't after chasing fine detail. The moon could clearly be visible in front of the sun and this was what counted.
The h-alpha view in the PST was quite different, in my personal observation, from what I saw through the eclipse glasses. Most interestingly, when I initially looked I actually did not see a part of the sun being dark, but I rather "saw" the moon in front of it. I actually envisioned the moon's globe. I thought I could even see its structured extended into the dark of the sky (especially in the PST), as an extension from its image "inside" the sun. That was rather strange. In fact, it took me some effort to forget about the moon and "just" see a piece of the sun "bitten off". I guess this is again a good proof on how much our senses (or better our brain's interpretation of what the senses deliver) depend on our knowledge and expectation.
As a side-note, I gathered my family around the scope and we all had a great time viewing (though me was obviously the one who had most endurance). And I should probably also mention that I screwed a biz meeting by running out loudly telling "the clouds are gone" ;). And at another time a fellow astronomer called in remembering my that I should watch. Great when people care :). And, finally, after around 75 minutes (around three quarter of the eclipses duration), it just appeared to me that I had special solar-observing save plastic film, which I than quickly moved in front of my camera. So I even have a few shots of the eclipse (one of which you can see right here at the top).
While I could not see the initial phase of the eclipse, I could see the moon move away from the sun. Very interesting.
All in all, this was a very motivating day, a real highlight. And now on to the usual stuff...
Friday, July 25, 2008
I am very happy with this development. However, it also shows a downside. Rsyslog, for obvious reasons, "offers" much more chance for misconfiguration than its feature-bare ancestor sysklogd. While I tried hard to make configuration simple and intuitive, I managed to succeed only in the simple cases. If you configure rsyslog for complex needs, I have provided ample ways to screw up ;) That situation will hopefully get better with the new scripting engine, but even that will not be able to totally resolve the issue. To help people getting over this phase, rsyslog offers a myriad of diagnostics. Whenever something is wrong, it logs at least one message telling you (and often a couple of them, helping to identify the culprit).
With some frustration, however, I begin to see that many people never see these diagnostics. Many people try a configuration and just notice that it does not work. They never look at rsyslog's own detailed error message (and some even do not write them anywhere, so they actually have nothing to look at ;)). So it has become quite common that for questions raised on the forum, we go lengths through the process of obtaining (large) debug logs, just to see that the debug log contains the otherwise-ignored error message that explains it all.
A (too) simple approach would be to blame users: why, the heck, don't they pay attention to their system setup. Beside being lame to blame users, it doesn't help solve the issue. The root cause seems to be that people tend not to know where to look for help. One approach is to improve documentation. I've gone some length on that, but what comes into our way is that people (including me ;)) tend to ignore documentation until there is no way around it (at this point, almost always too late, the foolishness of that approach is proven ;)).
An additional complexity is that people need to have at least some working configuration, including permissions, to make rsyslog record its error messages. A while ago, I thought about an internal message buffer, for error messages and maybe others, that can be viewed via a web browsers (this implies a simple http server inside rsyslogd). So folks with problems could simply point their web browser at rsyslogd and see which diagnostics exists. This solves at least the problem of recording and finding the messages. It does not solve the issue of people not knowing to do that, but it improves the situation: it is easier to tell "just point your web browser to" then to instruct people on how to review log files or create and send debug logs.
Up until now, I have stayed back from this approach because of security concerns I have. Such a http server, especially if it would also enable to view some of the live log data and live state information, could be an ideal tool for attackers. Of course, I can disable it by default and limit its features. But that is counter-productive in regard to the simple troubleshooting case. So I stayed back from it.
Now, seeing that the support problem (and pain people experience) is becoming worse, I have taken another look at this approach. I now think there is a compromise: I can create a separate plugin, one that can be loaded via a single "$ModLoad", but is not by default. So the system is secure by default but it is easy to instruct people to activate advanced diagnostics capability. It is obviously also easy to disable them, but what if people forget (and we tend to, don't we?). That would leave the attack vector intact once it was enabled. I think I now have a cure: What if the plugin automatically disables itself after a given period of time. Looking at the use case, it would be sufficient for most cases if diagnostics are available for 10 minutes after rsyslogd restart, but not any longer. I think I will now take this route. While it leaves the attack vector open, it mitigates the risk to a time frame, which usually is very short compared to overall runtime. And, after all, this is just a second safety measure. In the first instance, people should disable diagnostics once they no longer need it. Enabled diagnostics probably warrant a (startup) log message in themselves, so someone who cares to get an error-free startup should not forget about that (and those who don't care, don't deserve any better but still have a safety belt...).
I think I will try to implement a testing plugin in the current devel branch. This was unplanned work, and obviously pushes away other work. In the long term, however, I think it is very important to help folks getting up and running easily, so changing the schedule to solve that need is justified. For my testing purposes, though, I will not start with a http server but with something much more simplistic - just to grasp the essence of this approach.
I will reproduce my posts/definitions here in this blog entry. To save me some time, I more or less copy and paste my text to this page. I think it should be sufficiently self-contained to grasp the idea. I reproduce my thoughts for two reasons: first of all, I would like to have a handy reference to my view of the logging world (so that I can refer anyone who asks to this page). Secondly, I think it is useful for all those of you who dig a bit deeper into rsyslog, phpLogCon and the rest of the MonitorWare line. All of these softwares are build around the definitions I am giving here.
Please note that I may update this page when I write new content. This helps to keep things tidy together. For you, unfortunately, this means you need to check this page from time to time...
As always, feedback is highly appreciated. You may either comment on this post or mail me.
So here now comes the "meat":
I actually tend to agree that an event is *always* *associated* with a state change. The state change, however, may be subtle and not always obvious.
Let's take this post excerpt as an example:
some logs which don't indicate a state change such as login failures,
At first, a login failure seems to have not changed any state. After all, nobody was logged in before the event and nobody is logged in after it. HOWEVER, here our view is just not granular enough.
Let me take the position of the login machinery. That machine obviously has states for "awaiting login", "authentication requested", "auth success" and "auth failure" (among many others). What happens in the failed login case is that the machine transits between these states:
"awaiting login" --> "auth req" --> "auth failure" --> "awaiting login"
Obviously, state in this subsystem has changed three times. An upper level (e.g. a user process creator) may not have seen a state change, because the transitions did not request one. So at the upper level, we have no state change.
Depending on the granularity of logging we would like to have, we need to look at state changes at different levels. If I am interested in security logging, I most probably should look at the login machinery. And there, I see three transitions. As such, I may report three different events.
In a somewhat more abstract form, I even think an event is *always* bound to a state change. If we look closely enough at an entity, and this entity does not change state, so what can we then report? Is it noteworthy to report that entity x remains in the same state a it had a while ago? And even if so, couldn't one say that the state of x has changed in a subtle way - time needs to be considered part of the entities "state set".
With the "state set" I mean anything that makes up and influences the state of the entity. It includes all properties that differentiate between different states. The state set is obviously defined by the entity, but also by the needs of the observer (so the same entity may have different state sets if different observers look at it). For a given observer (use case) and a given entity, there is exactly one state set, consisting of mutually exclusive states and a set of properties that identify these states.
In this point of view, an event is actually another word for a state transition. If there is no state transition, there is no event. And why should there be one? Let's assume the use case requires tracking of (identical) states over time. As such, elapsed time is part of the state set. Now let's assume all other properties of the state set remain identical, but time elapses. Now we still have a state transition, because the state set changes between time "than" and time "now". This justifies an event. If, however, the use case does not require tracking of continuity, elapsed time is not part of the state set. In the same example above, we do now not have a state change (all other properties remain identical) and consequently no event happens. This may sound like loss of information, but we defined that the observer is not interested in elapsed time. As such, from the POV of this observer, nothing happened. So we are correct in not raising an event.
From this I conclude:
a) there is no such thing like an "absolute event" - an event is always relative to the needs of the observer
b) thus different observers may have different perception of what justifies raising an event
c) what the observer is interested in is defined in the "state set"
d) an event is generated when a change in the "state set" is detected
Trying (and probably failing) to get a short grip on this, I would say "An event is a change in observer-relevant properties of an entity". Which, of course leads to the need to define observer, observer-relevant, property and entity...
I haven't looked at CEE enough to know if there are definitions of these (or similar) terms. Actually, when I looked at CEE some very long time ago I have to admit it didn't look very appealing. But I now have subscribed to the mailing list and will see if I can contribute (or use ;)) to/from the work.
An interesting side-note of a) is that one may (try to) define a superset of state sets of all possible observers. Actually, I think this is what auditing people try to define when they design audit points. The knobs that trigger specific events provide the ability to limit the actual (running) state set to the subset that the configuring observer is interested in. And if you ever wanted to audit anything but did not find an audit point in the software for it, auditing designers made a wrong assumption on the required state superset ;)
> -----Original Message-----
> From: email@example.com [mailto:loganalysis-
> firstname.lastname@example.org] On Behalf Of Anton Chuvakin
> Sent: Friday, July 25, 2008 6:14 AM
> To: email@example.com
> Cc: firstname.lastname@example.org
> Subject: Re: [logs] How to define Log, Event, and Alert?
> Good point. So:
> Event = something that happened on a system.
> Log = a TIMED record of the above occurence.
I'd replace TIME by "sequence identifier", with sequence identifier defined as being something that is monotonically increasing. A timestamp is an object that we think to be a natural example of a monotonically increasing function. HOWEVER, if we look at existing technology, this is not always the case. In fact, it is more often NOT the case than it is... If we have two systems a and b and these systems do not have time synchronized, and have a system c which is the event collector (and collects only events from a and b), then c may record time stamps inside its log that do not monotonically increase. For example, it may record:
02:00:00 event a1
01:00:00 event b1
02:01:00 event a2
01:01:00 event b2
Of course, this is still a TIMED record of occurrences. However, in this sense this is used, "TIMED" includes a sense of temporal order (at least to me). In the above log, we may not have the correct temporal order. We may be able to reconstruct it by sorting on the timestamp. That would be a valid approach if the timestamps are indeed correct (compared to universal time). But if a and/or b has incorrect time, we would create a wrong temporal order. Indeed, in this sense the monotonically increasing identity of the log in question would actually not be the timestamp but rather the *sequence of recording*, kind of a meta-property not directly contained in the property set of the individual event record (but rather obtained by its relationship to its predecessor in the log file).
Now let's assume a log without a timestamp. These things happens, e.g. in debug logs (and all too often in others I have seen).
If we define
> Log = a TIMED record of the above occurence.
such a "log" would obviously not be a log, because it does not fulfill the requirement to be timed.
If a log instead is "a record of events with a sequence identifier", that problem does not exist. The sequence identifier in that case would be the derived property I mentioned above.
The question remains if such a definition is actually useful. The sequence identifier is obviously something with very vague semantics. They depend on the observer as well as the correctness of the "sequence identifier generating function" on all systems in question.
Let's get back to the simple case of timestamps: as outlined above, the semantics of timestamps depend on time sync. Even if there is ntp timesync, timestamps (with reasonable precision) are always questionable. They are approximate, even on the same system. With standard syslog timestamps (second precision!) the problem is easy to see: one may receive hundreds of events within the same second. So even if time is correct, an observer is unable to detect any order of events. If looking just at the timestamps, one must conclude that all events happened at once. If looking at the semantics of the messages, one most often also can conclude this is impossible (e.g. how to delete a file before it is created?). Obviously, the timestamp alone is never sufficient to detect order of events, even on a single system. Granted, for practical purposes a high resolution timestamp (with good time synchronization) is most often a sufficiently well approximation of the time an event happened. But do you really trust it? ...always? Have a look at your own correlation engines: do they work on pure timestamps - or do they include some other properties, like the order of event log records inside the log?
Now let me try to define what I think a log actually is:
An EVENT is a set of properties that describe a state change (in the sense I have described state change yesterday). The contents of this set is depending on the entity who's state changes as well as on the observer. [so it may actually be a set of two sets: entity-related properties and observer-related properties]
An event is generated when a state changes.
An EVENT RECORD is the physical representation of an event inside an event log. It is a set of (at least) two sets: the event's set itself as well as a set of meta-properties which describe the event record (e.g. it's size, it's order inside an event log record, it's format, ...). I think this has a tendency to contain observer-related properties, too. Some Meta-properties are obviously event log related (like sequence of recording).
Finally, an (EVENT) LOG is a set of event records. Please note that according to this definition, the stream of syslog messages flowing from system a to system b is also an event log - not just log files or database tables. So an event log does not necessarily need to be persistend.
There is no inherent order inside event logs. In practice we experience a "natural order" (one record begins before another), but that actually is a meta-property of the event record. So we can order event records based on their meta-properties. It just happens that a text log is physically ordered by the sequence meta property.
[side-note: if we look at a UDP based transmission of an event log, we notice that the sender's perception of the event log is different from the receiver's: the physical order of event log records may be different, as well as their absolute number (if UDP discards messages)]
Finally, different log files, database tables, log stores in general are just partitions of the event log set. This also explains why we need to merge different log files if we would like to have a broader view of events that happened inside our system: We need to re-unit the partitions that contain things of interest for our needs so that we build a sufficiently large set for our analysis algorithm (that set being a complete partition of the log set in regard to what we are interested in).
[side-note: if we need to unite such sets, we often experience the problem that we do not have a truly identifying function for event log records, which makes it often really hard to create real supersets, where no two elements are the same]
Again, I see that my definitions are not really one-liners, but I think if we try to use too-simple definitions, we end up incorrectly describing the objects we are talking about. That will most probably lead us to wrong conclusions, and, in a very practical sense, to a) wrong implementations and b) wrong actions taken on the conclusions we think we have drawn (but were invalid).
Tuesday, July 22, 2008
Special thanks go to Michael Biebl. He maintains the rsyslog package on Debian. Without his hard work and very solid understanding of the Debian platform, rsyslog would never have been able to be ready for prime time on that platform.
For those who enjoy the good news as much as I do, here are two relevant links:
While it is joy to read about these developments, it is also a chore of the rsyslog project to live up to the trust that is put into it. We are well aware of this and will keep rsyslog in good shape for all who depend on it.
Monday, July 21, 2008
To me, it is an interesting experiment. It took me quite some effort to (try to) phrase things in a very generic way (mostly) without any technical vocabulary. I am now looking forward to the feedback I receive (if any). It will be most interesting to see if I can capture some momentum on this blog. If you understand German, you may want to have a look yourself - and be it only to see if I messed up ;)
Thursday, July 17, 2008
Thinking about it, this is actually a buffer that performs a delayed write, but while the data is already accessible to the user. Delayed writing as such may be useful in its own right. Given that rsyslog already has queues, it may be worth to extend it to a mode where data is kept in the queue until either a configured amount of messages is queued or a configured period of time has elapsed. That could be used to generate more efficient, burst disk writes. For example, we could say that log messages be flushed only if there are at least 10 or at least one is older than one minute.
Note that this is a use case quite different from what ramlog addresses. Actually, they could even be combined. If you like this idea, please tell me, so that I can consider implementing it.
Wednesday, July 02, 2008
I have not posted any work log for long. Even worse, I have not fully kept up with the log. So what's going on? Is rsyslog development pace going down?
No, definitely not! I am just currently concerned with helping users, doing some labs myself, working a bit on phpLogCon and thinking about how to best do the scripting engine.
Oh yes, the scripting engine... I am promising and promising it. The reason it takes so long is that I want to have a perfect solution on the first shot (well, as close to perfect as possible...). And I want to keep it backwards-compatible to what we currently have. This involves a lot of things under the hood, and it requires very careful planning to do it right. In fact, I am even re-evaluating some theoretical concepts for the scripting language, maybe it makes sense to adopt a few more thoughts. But, even though you do not see it, I am still very focussed on it. It is the next big thing to happen.
Just to get you an example of the subtle problems: how to *best* handle the "*.*" PRI selectors that are in the current config at the rule of almost all filters. If that is not mapped properly, it can have an enourmous performance hit. To support it, I may need to do a constant folding phase in my interpretative language, something that I so far did not consider as needed. And I am still trying to avoid this (it offers little benefit for almost all other cases).
Whith that said, on to the (partial) work log:
Past day's rsyslog work log:
- created autconf support for a rsyslog testbench
- created the first very rough beginnings of a RainerScript testbench
- disabled warnings caused by third-party libraries
- enhanced gtls driver to support fully async operation (there was
a bug before if gtls did issue EGAIN without getting any data)
- bugfix: comments after actions were not properly treated. For some
actions (e.g. forwarding), this could also lead to invalid configuration
applied to beta branch
... something missing, see git log ;) ...
- released 3.19.8
But a tag is only as good as the information that it helps to find. Consequently, I have started to describe error cases inside the knowledge base's event repository:
So far, there is only a limited set of messages available (to phrase it politely ;)), but I plan to increase it over time. Note that there is an interactive feature where questions to the message can directly be posted to the forum. I hope this is useful.
If you run into an error message that is not-yet described, let us know and we'll add an entry. In the long term, the new knowledge base part should be able to solve most problems.
Friday, June 13, 2008
- somewhat improved plain tcp syslog reliability by doing a connection
check before sending. Credits to Martin Schuette for providing the
- made rsyslog tickless in the (usual and default) case that repeated
message reduction is turned off. More info:
- applied some patches from Michael Biebl - thanks!
- released 3.19.7
- changed default for $KlogSymbolLookup to "off" (in beta)
Monday, June 09, 2008
I was made aware by Michael Biebl some time ago that the Linux community is trying to save power by disabling unnecessary wakeups inside the softwares. For example, the Linux kernel has become mostly tickless, that is does not necessarily wake up frequently. This is important, because modern CPUs have power save modes: simply said, the less is computed, the less power is needed. An idle CPU uses least power. However, when an application wakes up, the CPU resumes into modes that need more power. Of course, the CPU does not know if what it needs to be executed is actually useful or could be done more smartly. You can find more details at the site http://www.lesswatts.org/.
But why am I telling you this? The reason is that rsyslog, too, had (and unfrotunately has) some wakeups which are not strictly necessary. For example, the current imfile implementation needs to poll input files. This is the most portable way to do things, but it is not power-effcient. An update to a more power-efficient solution (on the platforms that support it) is scheduled. Next, the --MARK-- message generator runs periodically. This one can probably never be avoided, because the actual functionality is the periodic activity. Any user should think if the functionality is neeed and, if not, not enable this plugin.
Finally, and the core rsyslog awoke every 30 seconds to check if repeated messages need to be flushed. However, repeated message reduction is no longer the default and I assume that most people do not use it. The default is to have it turned off. But even when it was turned off, rsyslog still awoke to do the (unnecessary) check. I have changed this, rsyslog now only initiates the wakeups when they are actually needed. To many (if not most) environments this means rsyslog will never again wakeup the system if there is no need to. It's a small saving, but every little bit counts toward the ultimate goal. And given the potentially large number of systems running rsyslog, the saving may actually be larger than one initially assumes.
In the future, repeated message reduction will probably be implemented based on a different paradigm, which enables us to not run on a tick even when it is turned on.
In any case, I hope the new feature will contribute to - you name it - slightly longer notebook battery life and less environent pollution. And, ultimately, it's a interesting fact to see how a software developer can affect environment health these days ;)
The new functionality will be available starting with rsyslog 3.19.7.
I have done some initial testing. It confirms the findings from my last post: the method works well in local, low-traffic environments, but fails in others. However, even in the fail cases, the message loss is somewhat less than without the work-around. So I think it is useful to have it inside rsyslog.
One thing that it affects is the current retry logic. Rsyslog used the work-around to resend the last message, which was most probably lost, after a successful reconnect. As connection loss is now much more reliable detected in some cases, the probabilty of message duplication greatly increases in them. In other cases, however, it may still be desirable to resend the last message. To (somewhat) solve this issue, I have now added a new config directive ($ActionSendResendLastMsgOnReconnect on/off) so that the user can specify his preferrence. The default now is to NOT send these messages on reconnect.
This functionality will be part of rsyslog 3.19.7.
- released 3.19.6
- added doc on suggested TLS deployment (rough picture, configuration
sample still missing).
- added new property replacer option "date-subseconds" that enables
to query just the subsecond part of a high-precision timestamp
- fixed a bug with the new property replacer option in timereported
property - thanks to Elizabeth for reporting it
Friday, June 06, 2008
Martin showed that by doing a recv() call in front of the send(), reliability improves. This is because the local TCP stack is then forced to evaluate whatever the remote peer sent to it. Without that recv() call, the local stack may not evaluate the response until later. Of course, this is implementation specific, but seems to be common. From traces, I have seen that there seems to be good reason to do so: the number of packets sent is slightly lower when not doing the recv(), so it looks like the buffering is improved. On the other hand, this costs reliability as a remote shutdown is not immediately detected. I have not verified the buffering behaviour by looking at the tcp stack sources, but the traces are evidence enough. At least for practical purposes.
The recv() suggested by Martin forces the stack to evaluate packets received. If the remote end has closed the connection, that makes the local stack detect the condition. Consequently, loss of connection can be detected at this stage. Without the recv(), this needs at least one other send() call and thus causes the loss of one message. So we increase the reliability of plain tcp syslog by being able to detect the error state earlier.
However, this approach is racy. For example, if the remote peer initiated the shutdown after we did the recv() (but before the send()), this will not help. Also, it does not help if there is high traffic. In this case, the remote peer may be forced to shutdown the connection in the middle of a message. Keep in mind that a syslog server has no way of telling the client it intends to shut down. So if it must, it has no other chance than to terminate the connection forcefully (maybe after a short timeout, depending on the needs).
So, in short, the recv() trick often works if we have:
- a network with likely no packet loss
- a network with low latency
- low network traffic
The larger the environment, the more slim this chance is. Unfortunately, I assume that in enterprise environments that really need reliability, the recv() trick does not contribute much to improving reliability (because they do not have the perquisites outlined above). So for enterprises, plain tcp syslog is still not an option when there is need for reliable logging (e.g. in an auditing environment). Protocols with app-level acknowledgement are required there (e.g. RFC 3195, RELP or SETP).
I have thought long how this affects rsyslog. I have now concluded I will give the recv() trick a try. I think it can improve the situation, at least in some practical scenarios. The problem is that it is kind of a hack. I need to find a well enough way to implement it into my netstream driver layer that doesn't force the code to do something where it is illogical while at the same time keeping the driver clean to be used with more sophisticated protocols. So it is not a matter of adding a simple API call. I'll probably introduce a driver call which may do the recv(), only if it makes sense for that type of driver, and does a dummy return for those drivers that do not (need) to support it.
For completeness, I reproduce some of my comments on Martin's blog just so that I have a stable reference within a domain a manage (things too often disappear too quickly ;)).
Past day's rsyslog work log:
- released 3.17.3
- (finally switched to Fedora 9 on main dev machine;))
- enabled Posix ERE expressions inside the property replacer
(previously BRE was permitted only)
- provided ability to specify that a regular expression submatch shall
be used inside the property replacer
- Worked on Martin's suggestion for a more reliable plain tcp transport
- implemented in property replacer: if a regular expression does not match,
it can now either return "**NO MATCH** (default, as before), a blank
property or the full original property text
- Worked on Martin's new suggestion for a more reliable plain tcp transport
This one looks promising, at least for simple cases
- worked on plain tcp syslog reliability
- begun working on TLS doc (finally ;))
- bugfix: part of permittedPeer structure was not correctly initialized
thanks to varmojfekoj for spotting this
- bugfix: off-by-one bug during certificate check
- bugfix: removed some memory leaks in TLS code
- enhanced property replacer to support multiple regex matches
- IETF work, syslog-transport-tls
- begun doc on unreliable tcp syslog and a work-around based on Martin's
Thursday, May 29, 2008
I was more than interested. Should there be a solution I had overlooked? Martin's idea is that he queries the TCP stack if the connection is alive before sending any data. He outlined two approaches with the later being a non-blocking recv() right in front of the send(). The idea is that the recv() should detect a broken connection.
After I thought a bit about this approach I had a concern that it may be a bit racy. But in any case, new thoughts are worth evaluating. And a solution would be most welcome. So I quickly implemented some logic in my existing gtls stream driver. To make matter simple, I just did the recv() and sent the return status and errno to debug output (but did not try any reconnects based on it). And then the ugly happened: I always got the same EAGAIN return state (which is not an error), no matter in what state the connection was. I was even able to pull the receiver's network cable and the sender didn't care.
So, this approach doesn't work. And, if you think a bit more about it, it comes at no surprise.
Consider the case with the pulled network cable. When I plugged it in again a quarter hour later, TCP happily delivered the "in-transit" messages (that were sitting in the client buffer) after a short while. This is how TCP is supposed to work! The whole point is that it is designed to survive even serious network failures. This is why the client buffers messages in the first place.
What should the poor client do in the "pulled network cable" case. Assume the data is lost just because it can not immediately send it? To make it worse, let's assume the data had already left the system and successfully reached the destination machine. Now it is sitting in the destination's receive buffer. What now if the server application (for example due to a bug) does not pull this data but also does not abort or close the file descriptor? How at all should TCP detect these failures? The simple truth is it isn't and it is not supposed to be.
The real problem is the missing application level acknowledgment. The transport level ACK is for use by TCP. It shall not imply anything for the app layer. So if we depend on the TCP ack for an app-level protocol, we abuse TCP IMHO. Of course, abusing something may be OK, but we shouldn't wonder if it doesn't work as we expect.
Back to the proposed solution: the problem here is not that the send call does not fail, even though the stack knows the connection is broken. The real problem is that the TCP stack does not know it is broken! Thus, it permits us to send data on a broken connection - it still assumes the connection is OK (and, as can be seen in the plugged cable case, this assumption often is true).
As such, we can NOT cure the problem by querying the TCP stack if the connection is broke before we send. Because the stack will tell us it is fine in all those cases where the actual problem occurs. So we do not gain anything from the additional system call. It just reflects the same unreliable status information that the send() call is also working on.
And now let's dig a bit deeper. Let's assume we had a magic system call that told us "yes, this connection is fine and alive". Let's call it isalive() for easy reference. So we would be tempted to use this logic:
But, as thread programmers know, this code is racy. What, if the connection breaks after the isalive() call but before the send()? Of course, the same problem happens! And, believe me, this would happen often enough ;).
What we would need is an atomic send_if_alive() call which checks if the connection is alive and only then submits data. Of course, this atomicity must be preserved over the network. This is exactly why databases invented two-phase commits. It requires an awful lot of well thought-out code ... and a network protocol that supports it. To ensure this, you need to shuffle at least two network packets between the peers. To handle it correctly (the case where the client dies), you need four, or a protocol that works with delayed commits (as a side-note, RELP works along those lines).
Coming back to our syslog problem, the only solution to solve our unreliability problem without specifying app-layer acks inside syslog, is to define a whole new protocol that does these acks (quite costly and complex) out of band. Of course, this is not acceptable.
Looking at all of this evidence, I come to the conclusion that my former statement unfortunately is still correct: one can not implement a reliable syslog transport without introducing app-level acks. It simply is impossible. The only cure, in syslog terms, is to use a transport with acks, like RFC 3195 (the unbeloved standard) or RELP (my non-standard logging protocol). There are no other choices.
The discussion, however, was good insofar that we now have generally proven that it is impossible to implement a reliable TCP based protocol without application layer acknowledgment at all. So we do not need to waste any more effort on trying that.
As a funny side-note, I just discovered that I described the problem we currently see in IETF's transport-tls document back than on June, 16, 2006, nearly two years ago:
Tom Petch also voiced some other concerns, which still exist in the current draft:
As you can see, he also mentions the problem of TCP failures. The idea of introducing some indication of a successful connection was quickly abandoned, as it was considered too complex for the time being. But as it looks, plain technical fact does not go away by simply ignoring it ;)
UPDATE, half an hour later...
People tend to be lazy and so am I. So I postponed to do "the right thing" until now: read RFC 793, the core TCP RFC that still is not obsoleted (but updated by RFC3168, which I only had a quick look at because it seems not to be relevant to our case). In 793, read at least section 3.4 and 3.7. In there, you will see that the local TCP buffer is not permitted to consider a connection to be broken until it receives a reset from the remote end. This is the ultimate evidence that you can not build a reliable syslog infrastructure just on top of TCP (without app-layer acks).
Reliable TCP Reconect made Easy
In short, he tries to do a non-blocking recv() from the connection to see if the remote peer has shut it down. This may work and I will give it a try. However, as of my understanding it will NOT solve the issue of unreliability because of broken connections. I have to admit that I also think there is still a race condition (what if the server closes the connection after the client has done the recv() but before the send()...
I'll report back as soon as I have some real-life data. It's an interesting approach in any case and good to know somebody else is working on the same issues. That will hopefully bring us to a better overall solution :)
Wednesday, May 28, 2008
Tuesday, May 27, 2008
I have roughly worked the past three weeks on the new -12 version of transport-tls. First of all, it is important to keep in mind that I already had implemented the -11 version (minus the then-unclear authentication) in rsyslog 3.19.0. That meant I had just to implement the new authentication stuff. This was obviously a major time-saver.
The current implementation utilizes the GnuTLS library for all TLS operations. I would like to thank the GnuTLS folks for all their help they provided on the mailing list. This was extremely useful. GnuTLS in rsyslog works as a "network stream driver" and can theoretically be replaced with other libraries (support for at least NSS is planned). For obvious reasons, this implementation report includes a number of GnuTLS specifics.
It is not exactly specified whether a syslog message traveling over -transport-tls must strictly be in syslog-protocol format or not. This may lead to interoperability problems. For rsyslog, I have implemented that any message format is accepted. Any message received is simply fed into the general parser-selector, which looks at the message format and selects the most appropriate parser. However, this may not really be desirable from a security point of view. When sending, rsyslog also does not demand anything specific. Due to rsyslog design, message creation and transmission are quite separate parts. So even if the draft would demand -syslog-protocol format, I would not be able to enforce that in rsyslog (it would break too many application layers). Of course, rsyslog supports -syslog-protocol format, but it requires the proper template to be applied to the send rule.
Rsyslog implements even most optional features. However, I have not implemented IP-address-based authentication, which is a MUST in Joe's new proposed text (section 5.1). The reason is that we found out this option is of very limited practical experience. IP addresses are very seldomly found in certificates. Also, there are ample ways to configure rsyslog in client role so that it knows the server's identity. This was also brought up on the IETF syslog mailing list and it looks like this feature will be dropped. Should it be actually survive and go into the final standard, I will implement it, even though I do not see any use in practice. Thus I have deferred implementation until it is actually needed. Rsyslog user feedback may also show if there is a need for this feature in practice.
Each rsyslog instance is expected to have one certificate identifying it. There can be different certificates for different senders and receivers, but this is considered the unusual case. So in general, a single certificate identifies the rsyslog instance both as a client and server.
Rsyslog support the three authentication modes laid out in the draft: anonymous, fingerprints and subject names. Obviously, anonymous authentication is easy to do. This was a quick task without any problems.
Fingerprint authentication was somewhat problematic to implement. The core problem was that GnuTLS, by default, sends only those certificates to the server that are in the server's trusted CA list. With self-signed certs on both the client and the server, this is never the case and GnuTLS does not provide any certificate at all. I used kind of a hack to get around this. There is a function in GnuTLS that permits to provide certificates on an as-needed basis. I used this hook. However, I now no longer have the ability to provide only those certificates a server can verify. When I have multiple certificate stores and the server is in subject name authentication mode, this would be valuable. So far, I have ignored this problem. If practice shows it needs attention, I will further investigate. But here is definitely a potential future trouble spot. A core problem is that a sender does not (should not need to) know if the receiver is using fingerprint or subject name authentication. For the later, the GnuTLS defaults are quite correct and provide a very convenient interface. But I can not select different modes on the client as I do not know which one is right.
Subject name based authentication did not pose any such problems. This comes at no surprise, because this is the the usual mode of operations for the TLS library. One can assume this to be best-tested.
One disappointment with GnuTLS was that during the TLS handshake procedure only basic authentication can be done. Most importantly, there is no hook that enables an application to check the remote peer's certificate and authorize it or deny access during the handshake. Authorization can only be done after the handshake has completed. Form asking around, NSS seems to provide this ability. OpenSSL, on the other hand, seems NOT to provide that hook, too (I could not verify that, though). As such, rsyslog needs to complete the handshake and then verifies fingerprint's or validates the certificate chain, expiration dates and checks the subject name. If these checks show that we are not permitted to talk to the peer, all we can do is close the connection.
If a client is connecting to a server, this is a minor annoyance, as a connection is created and dropped. As we can not communicate the reason why we close the connection, the server is left somewhat clueless and currently logs a diagnostic warning of a freshly created connection being immediately closed. I will probably change that diagnostic in the future.
Quite more problematic is the case when a server fails to authenticate the client. Here, the client received the handshake and already begun to send data when the server closes the connection. As there is no application level acknowledgment in transport-tls, the client does not know when exactly the connection is closed by the server. In the end result the client experiences message loss and may even not notice the failed connection attempt until much later (in most cases, the first message is always successfully sent and only the second message, possible hours later, will see a problem). In the end result, this can lead to massive data loss, even to complete data loss. Note that this is not a direct cause of transport-tls, but of the underlying plain TCP syslog protocol. I have more details in my blog post on the unreliability of TCP syslog.
Please note that -transport-tls does not specify when peer authentication has to happen. It may happen during the handshake but it is also valid to do it after the handshake. As we have seen, doing it after the handshake causes serious problems. It may be good to at least mention that. If the draft is changed to mandate authentication during the handshake, some implementors will probably not implement it, because the library they use does not support it. Of course, one could blame the library, but for existing products/projects, that will probably not help.
The need to authenticate during the handshake is a major problem for anyone implementing -transport-tls. For rsyslog and for now, I have decided to live with the problem, because I do have the unreliability problem in any case. My long-term goal is to switch to RELP to address this issue and provide TLS support for RELP (RELP uses app-level acks, so there is no problem with authenticating after a successful handshake - I can still emit an "authentication failed" type of message). Please note that the transport-tls specific problem only occurs if the remote client fails to authenticate - this is what make it acceptable to me. I expect this situation to be solved quickly (either something is misconfigured or an attack is going on in those cases).
As a side-note, I may see if I can provide a patch for GnuTLS if this turns out to become a major problem.
Besides implementing the required methods, I have also thought about how to create a sufficiently secure system with the least possible effort.
In home environments where the "administrator" has little or no knowledge and uses rsyslog to receive message from a few low-end devices (typically a low-end router), it is hard to think of any good security settings. Most probably, anonymous "authentication" is the best choice here. It doesn't protect against man-in-the-middle attacks, but it at least provides confidentiality for messages in transit. The key point here is that it does not require any configuration except for enabling TLS and specifying the syslog server's address in the device GUI.
Another good alternative for these environments may actually be auto-generating a self-signed cert on first rsyslogd startup. This is based on the assumption that the device GUI provides a way to view and authorize this certificate (after it has talked to the server and obtained the cert9). However, I have to admit that I see only limited advantage in implementing this. After all, if the admin is not able to configure things correctly, do we really expect him to be able to interpret and sufficiently frequently review the system logs? I'd say this is at least doubtful and so I prefer to put my implementation efforts to better uses...
The anticipated common use case for rsyslog is an environment where the administrator is at least knowledgeable enough to carry out some basic configuration steps and create certificates if instructed on which tools to run. We do not assume that a full PKI infrastructure is present. As such, we suggest that each organization creates its own CA for rsyslog use. This involves creating one root CA certificate. That certificate is then used to create certificates for each instance of rsyslog that is to be installed. There is one instance per machine. To keep configuration simple, each machine's DNS name is to be used.
All clients shall forward via a @@hostname action, where hostname must be the actual DNS name (as specified in the certificate) and not an IP address or something else. To prevent DNS failures or unavailability of DNS during startup, this name and its IP address may be set in /etc/hosts. With that configuration, the client can validate the server's identity without any extra configuration.
To achieve a similar automatic identity check on the server side (server authenticating client), subject name wildcards are used. It is suggested that all syslog client are within the same domain. Then, the server can be instructed to accept messages from all of them with a single configuration setting enabling message reception from e.g. "*.example.com". This, together with the fact that the certificate must have been signed with the "rsyslog root CA"'s certificate provides sufficient proof of identification in most cases.
In more complex scenarios, more complex authentication can be used. There will be no specific guidelines within the rsyslog documentation on other policies. It is assumed that those who have need for such complex policies know what they need to have, so there is no point in providing advise. From the engine point of view, rsyslog already provides for many advanced uses (e.g. different certificate stores for different sessions) and can easily extended to provide for others. As of my understanding, the latest text proposed by Joe permits me to do all of this under the umbrella of -transport-tls, so the draft is no limiting factor.
The bottom line is that an enterprise-specific rsyslog root CA provides quite automatic configuration of peer credentials while being easy to implement. Wildcard subject name matches play a vital role, because they are the only way to permit a server with the ability to authorize a wide range of clients in a semi-automatic manner.
IMO, subject name based authentication is easier to setup than fingerprint authentication, at least in a rsyslog-to-rsyslog case. If it is easier to setup in a heterogeneous environment depends on the ability of all peers to either generate certificate requests and accept the certificate and/or import prefabricated .pem files. If that is simple enough, subject name based authentication can be used with very low administrative overhead (but integrating it into a full-blown PKI is still another thing...).
To really prove the implementation, of course, at least one other independent implementation is needed. Currently there is none, but as it looks NetBSD's syslogd will be enhanced as a Google Summer of Code project. I am keeping an eye on that project and will try to do interop testing as soon as such is possible. Having one implementation from the "device camp" (e.g. a router) would be extremely useful, though, as that would provide more insight on how easy it will be to configure things via such an administrative interface (not in theory, but in actual implementation - I expect a difference between the two as there are always constraints that must be observed, like the overall application framework and programming tool set).
To wrap things up, -syslog-transport-tls-12+ is sufficiently easy to implement and deploy. IMHO it also provides sufficient extensibility to implement complex scenarios. Some details could be improved (when to authenticate, message format) and a decision on IP based authentication should be finalized. But I don't see any reason to hold it much longer and look forward to it being finalized.
Some might have noticed that I am not as active as usual on the rsyslog project . As this seems to turn out to keep at least for the upcomi...
Did you ever use TCP to transfer syslog reliably? And do you think that makes you immune against message loss? If so, it's time to think...
I currently think about creating a very basic shipper for log files, but wonder if it really makes sense. I am especially concerned if good ...
As most of you know, rsyslog permits to pull multiple lines from a text file and combine these into a single message. This is done with the ...