Happy Holidays to Everyone!

           *             ,
_/^_
< >
* /.-. *
* `/&` *
,@.*;@,
/_o.I %_ *
* (`'--:o(_@;
/`;--.,__ `') *
;@`o % O,*`'`&
* (`'--)_@ ;o %'() *
/`;--._`''--._O'@;
/&*,()~o`;-.,_ `""`)
* /`,@ ;+& () o*`;-';
(`""--.,_0 +% @' &()
/-.,_ ``''--....-'`) *
* /@%;o`:;'--,.__ __.'
;*,&(); @ % &^;~`"`o;@(); *
/(); o^~; & ().o@*&`;&%O
jgs `"="==""==,,,.,="=="==="`
__.----.(-''#####---...___...-----._
'` )_`"""""`
.--' ')
o( )_-
 
 

Happy holidays to everyone! It was fun communicating with all of your and bringing new features online. This year, we did not only extend rsyslog, but worked an a couple of enhancement in related topics, like log normalization, the log store, better Windows support and many improvements in the web based viewer.

Special thanks to all of who contributed code, good bug reports, ideas and encouragement – or helped funding the project by purchasing support contracts or incidents.

It was a busy year and I will try to relax a bit the next days. So please bear with me if I do not respond as quickly as usual (I will try to even be a few days offline ;)).

Thanks again guys, have a great holiday season and all the best, most importantly health, for 2012!

Rainer

PS: if you enjoy the ASCII art, have a look at the artist’s web site!

Feedback Request for digitally signed log store

I have just written about how I plan to implement digital signatures in LogStore, the secure store used by LogTools. The log store digital signature proposal details how and when signatures are written and provides reasoning why it will probably happen this way. There are two goals for the proposal: one is to document how things will work and the other, probably more important, one is to draw some feedback. It is easy to get security tools wrong, and even those with highest experience in that area (which I have not!) can fail. So it would be very beneficial to have some other folks read the proposal and comment on weaknesses they find – or simply things they would do differently or add to the overall idea. With that said, please read the (small) paper and provide feedback ;-).

Please keep on your mind that his is not only related to syslog but can  be used with any text-based log (including binary logs that are converted to text, e.g. by base64 encoding them). So it can affect you even if you are not interested in syslog itself. My (mostly uneducated) assumption is that this could be a toolset of great use for computer forensics.

LogTools 0.1.0 Released

I finally managed to do the initial public release of LogTools, a set of useful utilities for log data processing. Their current most important feature is the initial version of LogStore, a tamper-proof way to store textual log data especially designed for long-term archival. Note that LogTools perfectly process syslog messages, but can be used for anything that is text-based (like Apache or other application text logs).

I am very happy to finally have the initial release ready. Full details can be found in the release announcement. I initially thought it would become available early last week, but I wanted to create some packages. As I had never done this before, it turned out to become a bit of a problem for me. For the time being, I have settled to do an experimental Debian package via checkinstall. While this is obviously a quick and dirty solution, it enables folks to obtain LogTools via the easy way. Also, I don’t think it is too problematic, because in essence only some user-tools are installed that do not affect anything else on the system. But, of course, I’ll think about better packaging as the project continues.

At this point, I would be very interested in feedback both on the current tools as well on what would be considered a plus in the future. Please let me know!

Announcing LogStore

While it probably is a bit early for a “real” announcement,  I wanted to tell a bit about the project I have been working on the past days, a dedicated storage for (sys)log messages. It will be available as part of LogTools, the actual project I am working with. A key feature of the LogStore format will be its tamper-proofness. I wanted to write such an improved storage system for quite a while. However, I have to admit that the recent journald proposal brought more life to it. While the journald proposal aims at building a kind of Window Event Log for Linux, the LogTools effort is more interested in traditional text log files (but I won’t outrule going beyond that in the future).

It is important to note that LogTools, and their storage format LogStore, can protect any kind of text file. Of course, it is great for syslog logs, but you may also secure things like Apache http logs or whatever else you have in text format.

You may probably remember that I was – and still am – very skeptic about the way journald tries to secure logs via hash chains (be sure to read the comments as well!). While the journald propsal has some technical deficits, I learned that many folks we interested in this kind of hash-chaining, even though they knew it would not be truly tamperproof (I followed a lot of forum posts). Seeing this, I thought it may be useful to provide this level of protection inside some simple to use tools, what gave birth to LogTools. Still, my assesment in regard to journald holds to the current LogStore format as well: it is far from being real cryptography, it is insecure and it may be counter-productive if it generates a false sense of security. However, if one knows the limits, it can provide some useful function. So be sure to know what you do when you use these tools!

For LogStore, I have also planned to employ some real cryptography and cryptographically sign the hash chain. This will actually make the log tamperproof in a very strong sense as long as the signing key is not compromised. That functionality will be addressed once the initial release is out.

The LogStore format itself is deliberately defined in a text-tool friendly way and well documented. For your initial review, I have included its man page below.

The LogTools project is currently available only via the LogTools git. I plan to finish the remaining man pages soon (at latest this week), and then create distribution tarballs (and hopefully some simple packages, but this needs to be seen).

Feedback on this effort is appreciated.


NAME

logstore – enhanced log message storage  

DESCRIPTION

The logstore is an enhanced log message storage. It can be used to store log messages in a way that secures their integrity. Currently, all data is stored in sequential files.
The logstore provides integrity checks by chaining of SHA1-checksums. Each log record (except the first) is hashed, together with the hash of the previous record. As such, manipulations inside the log store can be detected, as long as the checksums of all records are not also recomputed. Sequential logstore files are pure text files.
 

FORMAT

A sequential logstore is a text file containing variable-length records. Within each record, there is recordtype, cryptodesignator and content.

recordtype
A single character designating the type of record. Currently only “m” is defined, which specified original message text.
cryptodesignator
Variable length, terminated by a colon. The is printable data that has some cryptographic function. For example, for “m”-type recrods it is the message’s chained SHA1 hash.
content
Variable length content terminated by a LF (. For obvious reasons, LF is not permitted within the message. Also, the US-ASCII NUL character ( ) is forbidden in order to prevent trouble with text based tools. It is suggested that only printable characters are used inside the message, but this is currently not enforced.
The structure is recordtype Rsyslog has a modular design. Consequently, there is a growing number of modules. See the html documentation for their full description.

 

SECURITY

In its current form, logstore provides limited security. While it is possible to verify the correctness of the hash chain, an attacker may simply rewrite the complete file, computing new hashes. However, protection can be (manually) gained by saving the last hash inside the file to a separate location. If so, one can compare the last hash with this previously saved information and check if it is still valid. If someone mangeled the store, this will not be the case. Once the authenticy of the last hash has been proven, it is easy to verify the rest ot the file. The logreader (1) tool can be used to do that.
In the future, cryptographic signatures based on public key cryptography will be used to protect the hash chain.
 

SAMPLE

The following is a minimalistic 4-line sample of a sequential logstore file.

m04e3324670626451755aa2257a9b92395e26c2e4:line 1
m347d4500ea11fa41800a58972699e57e0c0d7cd7:line 2
m3c329d40e37ae20c475c06bfaab892892ef4579d:line 3
m807cc61d7b04cbc1f048810df9d3a652988d745e:line 4

Note that all records have “m” in the first postion, designating them as message records. The cryptographic hash in line one is a SHA1 hash of just line one’s content, whereas the hashes for lines n (with n>1) are taken after the concatenation of hash(n-1) and line(n), without the colon. So in order to obtain line two’s hash, the following string is hashed:
04e3324670626451755aa2257a9b92395e26c2e4line2
 

SEE ALSO

logreader(1), logwriter(1), liblogtools(3)
 

AUTHORS

Rainer Gerhards (rgerhards@adiscon.com)

Serious syslog problems?

In the paper introducing journald/Linux Journal a number of shortcommings in current syslog practice are mentioned. The authors say:

Syslog has been around for ~30 years, due to its simplicity and ubiquitousness it is an invaluable tool for administrators. However, the number of limitations are substantial, and over time they have started to be serious problems:

I have now taken some time to look at each of these claims in depth. But before I start, I need to tell that I am working in the IT logging field for nearly 15 years, have participated in a number of standards efforts and written a lot of syslog-related software with rsyslog being a prime example (some commercial tools I have been involved with can be found here). So probably I have a bias and my words need to be taken with a grain of salt. On the other hand, the journald authors also have a bias, so I guess that’s a fair exchange of arguments ;). 

In my analysis, I compare the journald effort with what rsyslog currently provides and leave closed source software out. It is also important to note that there is a difference between syslog, the protocol, a specific syslog application (like rsyslog) and a system log message store. Due to tradition, these terms are often used for different things and one must deduce from context, what is meant. The paper applies the same sloppiness in regard to terms. I use best effort to extract the proper meaning. I quote the arguments as they originally appeared inside the paper. However, I rearrange them a bit in order to put related things closer together. I retain the original numbering so that you can compare to the original paper. I also tried to be similar brief with my arguments. Now proof-reading the post, I see that I failed with that. Sorry, but that’s as brief as I can provide serious counterargument. I broadly try to classify arguments in various levels of “True” vs “Wrong”, so you may take this as an ultra-short reply. 

So let’s start with Arguments related to the log storage system. In general, the paper is right that there is no real log storage system (like, for example, the Windows Event Log). Keeping logs only in sequential text files definitely has disadvantages. Syslog implementations like rsyslog or syslog-ng have somewhat addressed this by providing the ability to use databases as storage backends (the commercial syslog-ng fork also has a proprietary log store). This has some drawbacks as well. The paper proposes a new proprietary indexed syslog message store. I kind of like this idea, have even considered to write something like this as an optional component for rsyslog (but had no time yet to actually work on it). I am not convinced, though, that all systems necessarily need such a syslog storage subsystem.

With that said, now let’s look at the individual arguments:

5. Reading log files is simple but very inefficient. Many key log operations have a complexity of O(n). Indexing is generally not available.

True. It just needs to be said that many tools inside the tool chain only need sequential access. But those that need random access have to pay a big price. Please note, however, that it is often only necessary to “tail” log files, that is act on the latest log entries. This can be done rather quickly even with text files. I know both the problems and the capabilities, because Adiscon LogAnalyzer, in which I am involved, is a web-based analysis and reporting tool capable of working on log files. Paging is simple, but searching is slow with large files (we recommend databases if that is often required). Now that I write that, a funny fact is that one of the more important reasons for creating rsyslog was that we were unhappy with flat text files (see rsyslog history doc). And so I created a syslogd capable of writing to databases. Things seem to be a bit cyclic, though with a different spin ;)

8. Access control is non-existent. Unless manually scripted by the administrator a user either gets full access to the log files, or no access at all.

Mostly True and hard to make any argument against this (except, of course, if you consider database back ends as log stores, but that’s not the typical case).

10. Automatic rotation of log files is available, but less than ideal in most implementations: instead of watching disk usage continuously to enforce disk usage limits rotation is only attempted in fixed time intervals, thus leaving the door open to many DoS attacks.

Partly True, at least in regard to current practice. Rsyslog, for example, can limit file sizes as they are written (“outchannel action”), but this feature is seldomly used and due to be replaced by a better one. The better one is partly implemented but received no priority because nobody in the community flagged this as an urgent requirement. As a side-note: Envision that journald intends to shrink the log and/or place stricter restrictions on rate-limiting when disk space begins to run low. If I were an attacker, I would simply begin to fill the disk then, and make journald swipe out the log store for me.

11. Rate limiting is available in some implementations, however, generally does not take the disk usage or service assignment into account, which is highly advisable.

It needs to be said what “rate limiting” means. I guess it means preventing an application from spamming the logs with frequently repeated messages. This feature is available  in rsyslog. It is right that disk usage is not taken into account (see comment above on implications). I don’t know what “service assignment” means in this context, so I don’t comment on that one. Rate limiting is more than run-away or spamming processes. It is a very complex issue. Rsyslog has output rate limiting as well, and much more is thinkable. But correct, current rate limiting looks at a number of factors but not the disk assignment. On the other hand, does that make sense, if e.g. a message is not even destined to go to the disk?

12. Compression in the log structure on disk is generally available but usually only as effect of rotation and has a negative effect on the already bad complexity behaviour of many key log operations.

Partly True. Rsyslog supports writing in zip format for at least one and a half year (I am too lazy to check the ChangeLog). This provides huge savings for those that turn on the feature. Without doubt, logs compressed in this way are much harder to process in real-time.

7. Log files are easily manipulable by attackers, providing easy ways to hide attack information from the administrator

Misleadingly True. If thinking of a local machine, only, this is true. However, all security best practices tell that it is far from a good idea to save logs on a machine that is publicly accessible. This is the reason that log messages are usually immediately sent do some back end system. It is right that this can not happen in some setup, especially very small ones.

My conclusion on the log store: there definitely is room for improvement. But why not improve it within the existing frameworks? Among others, this would have the advantage that existing methods could be used to decide what needs to be stored inside the log store. Usually, log contain noise events that administrators do not want to log at all, because of the overhead associated with them. The exists best practices for the existing tool chain on how to handle that.

Now on to the other detail topics:

1. The message data is generally not authenticated, every local process can claim to be Apache under PID 4711, and syslog will believe that and store it on disk.

9. The meta data stored for log entries is limited, and lacking key bits of information, such as service name, audit session or monotonic timestamps.

Mostly wrong. IMHO, both make up a single argument. At the suggestion of Lennart Poettering, rsyslog can force the pid inside the TAG to match the pid of the log message emitter – for quite a while now. It is also easy to add additional “trusted properties”. I made an experimental implementation in rsyslog yesterday. It took a couple of hours and the code is available as part of rsyslog 5.9.4. As a side-note, the level of “trust” one wants to have in such properties needs to be defined – for truly trusted trusted properties some serious cryptography is needed (this is not specified in the journald proposal nor currently implemented in rsyslog).

2. The data logged is very free-form. Automated log-analyzers need to parse human language strings to a) identify message types, and b) parse parameters from them. This results in regex horrors, and a steady need to play catch-up with upstream developers who might tweak the human language log strings in new versions of their software. Effectively, in a away, in order not to break user-applied regular expressions all log messages become ABI of the software generating them, which is usually not intended by the developer.

Trivial (I can’t commit myself to a “True” or “Wrong” on such a trivial finding). Finally, the authors have managed to describe the log analysis problem as we currently face it. This is not at all a syslog problem, it is problem of development discipline. For one, syslog has “solved” this issue with RFC5424 structured data. Use it and be happy (but, granted, the syslog() API currently is a bit problematic). The real problem is the missing discipline. Take, for example, the Windows Event Log. The journald proposal borrows heavily on its concepts. In Windows Event Log, there is a developer-assigned unique ID within the application’s reserved namespace available. The combination of both app namespace (also automatically created) and ID together does exactly the same thing as the proposed UUID. In Windows Event Log, there are also “structured fields” available, but in the form of an array (this is a bit different from name-value pairs but far from totally different). This system has been in place since the earliest versions of Windows NT, more than 15 years ago. So it would be a decent assumption that the problem described as a syslog problem does not exist in the Windows world, right (especially given the fact that Windows purposefully does not support syslog)? Well, have a look at the problems related to Windows log analysis: these are exactly the same! I could also offer a myriad of other samples, like WELF, Apache Log Format, … The bottom line is that developer discipline is not easy to achieve. And, among others, a taxonomy is actually needed to extract semantic meaning from the logged event. It probably is educating to read the FAQ for CEE, a standard currently in development that tries to somewhat solve the logging mess (wait a moment: before saying that CEE is a bunch of clueless morons, please have a look at the CEE Board Members first).

3. The timestamps generally do not carry timezone information, even though some newer specifications define support for it.

Partly Wrong. High-Precision timestamps are available for many years and default in rsyslog. Unfortunately, many distros have turned them off, because they break existing tools.  So in current practice this is a problem, but it could be solved by deleting one line in rsyslog.conf. And remember that if that causes trouble to some “vital” tool, journald will break that tool even more. Note that some distros, like Gentoo, already have enabled high precision timestamps.

4. Syslog is only one of many log systems on local machines. Separate logs are kept for utmp/wtmp, lastlog, audit, kernel logs, firmware logs, and a multitude of application-specific log formats. This is not only unnecessarily complex, but also hides the relation between the log entries in the various subsystems.

Rhetorically True – but what why is that the failure of syslog? In fact, this problem would not exist if developers had consistently used syslog. So the problem is not rooted in syslog but rather in the fact that syslog is not being used. Lesson learned: even if standards exist, many developers simply ignore them (this is also an interesting argument in regard to problem number #2, think about it…).

13. Classic Syslog traditionally is not useful to handle early boot or late shutdown logging, even though recent improvements (for example in systemd) made this work.

True – including that fact that systemd already solved that problem.

14. Binary data cannot be logged, which in some cases is essential (Examples: ATA SMART blobs or SCSI sense data, firmware dumps).

Wrong, the short answer is: it can be logged, but must be properly encoded. In the IETF syslog working group we even increased the max message sizes for this reason (actually, there is no hard limit anymore).

The longer, and more correct, answer is that this is a long-standing discussion inside the logging world. Using that view, it is hard to say if the claim is true or false; it often even is argued like being a religion. Fact is that the current logging toolset does not work well for binary data (even encoded). This is even the case for the Windows Event Log, which supports binary data. In my view, I think most logging experts lean towards the side that binary data should be avoided and, if unavoidable, must be encoded in a text-friendly way. A core problem with the usefulness of binary data is that it often is hard to decode, and even more to understand, on the non-native platform (remember that the system used during analysis is often not the system where the event was initially recorded).

6. The syslog network protocol is very simple, but also very limited. Since it generally supports only a push transfer model, and does not employ store-and-forward, problems such as Thundering Herd or packet loss severely hamper its use.

Wrong,  missing all improvement made in the past ten years. There is a new RFC series which supports TLS-secured reliable transmission of syslog messages and which permits to place fine-grain access control on who can talk with whom inside a relay chain. UDP syslog is still available and is so for good reason. I cannot dig into the details here, part of that reasoning is on the same grounds why we use audio more often over UDP than TCP. Using UDP syslog is strictly optional and there are few scenarios where it is actually needed. And, if so, the “problem” mentioned is actually a “solution” to a much more serious problem not even mentioned in the journald paper. For a glimpse at these problems, have a lock at my blog post on the “reliability problem”. Also, store-and-foward is generally available in rsyslog via action queues, failover handling and a lot of other things. I admit that setting up a complex logging system, sometimes requires an expert. On the “loss issue”, one may claim that I myself say that plain TCP syslog is not totally lossless. That claim is right, but the potential loss Window is relatively small. Also, one can use different protocols that solve the issue. In rsyslog, I introduced proprietary RELP for that very reason. There is also completely lossless RFC3195, which is a great protocol (but without future). It is supported by rsyslog, but only extremely few other projects implement it. The IETF (including me) assumes that RFC3195 is a failure – not from technical fact but in the sense that it was too far away from the usual logging practice to be picked up by enough folks. [Just to avoid mis-intepretation: contrary to RFC3195, RELP is well alive, well-accepted and in widespread use. It is only RFC3195 that is a failure.]

Concluding my remarks, I do not see anything so broken in syslog that it can only be fixed by a total replacement of technology. Right contrary, there is a rich tool set and expertise available. Existing solutions, especially in the open source world, are quite modular and can easily be extended. It is not even necessary to extend existing projects. A new log store, for example, could also be implemented by a new tool that imports a decent log format from stdin to a back end data store. That would be easily usable not only from rsyslog but from any other tool that is part of the current log tool chain. For example, it may immediately consume Apache or other application logs (of course, such a tool would require proper cryptography to be used for cryptographic tasks…). There is also need for a new logging API – the catch-all syslog() call is clearly insufficient (an interesting detail fact is that journald promises to retain syslog() as a first-class logging interface — that means journald can solve none of the issues associated with that API, especially in regard to claim #2).

So extending existing applications, or writing new ones that tightly integrate into the existing toolset is the right thing to do. One can view journald as such an extension. However, this extension is somewhat problematic as its design document tells that it intends to replace the whole logging system. Especially disturbing is that the reasoning, as outlined above, essentially boils down to a new log store and various well-known mostly political problems (with development discipline for structured formats right at the top of them). Finally, the proposal claims to provide more security, but fails to achieve at least the level that RFC5848 syslog is able to provide. Granted, rsyslog, for example, does not (yet) implement RFC5848. But why intends journald to implement some home-grown pseudo security system when a standard-based method designed by real crypto experts is available? I guess the same question can be applied to the reasoning for the journald project at large.

Let me conclude this posting with the same quote I started with:

Syslog has been around for ~30 years, due to its simplicity and ubiquitousness it is an invaluable tool for administrators. However, the number of limitations are substantial, and over time they have started to be serious problems:

Mostly Wrong. But it is true that syslog is an invaluable tool,especially in heterogeneous environments.

Trusted Properties in rsyslog

Today, I implemented “trusted (syslog) properties” inside rsyslog’s imuxsock module. The term “trusted” refers to the fact that these properties can not be faked by the logging application, creating an additional layer of log integrity protection. The idea is rooted in the journald proposal, where they are called “metadata” and “trusted fields”. Actually I liked the idea implied by “trusted”, but thought “property” would be a better name than “field”.

The concept is not totally new. Actually, for some month rsyslog can patch the PID field of the syslog TAG with the correct pid, so that this cannot be mangled with. This was based on an idea from Lennart Poettering, which I found nice and implemented quickly (I met him at Linux Tag 2010 in Nürnberg, Germany where we discussed this and other things). The core idea is to use SCM_CREDENTIALS so that the OS itself records pid, gid and uid. With the new feature, this is taken one step further. Now, we also query the /proc virtual file system for additional information like the location of the logging application’s binary. Undoubtedly, this provides some extra protection against faked messages. On the downside, it has some obvious overhead. A simple and immediate solution to this is to use rsyslog’s omfile in zip mode. In journald, overhead is tried to avoid via a proprietary binary format, its event log, which provides compression features (but for syslog transmission the journald event log obviously needs to be decompressed as well). Some restrictions exist with trusted properties, some obvious, some less obvious (see the trusted property doc for details; it also has the list of currently supported properties).

The current implementation is in experimental status. Based on feedback, some specifics may be changed in future versions. Also, the current implementation does not try to be standards-compliant. This will probably also change in the future. I hope that the new capability is useful to the logging community. As a side-note, the new feature, implemented in one morning, also shows that it often is easy to extend existing technology instead of writing everything new from scratch ;)

The actual release announcement will go out either today or tomorrow. The code is available via the v5-devel git branch right now.

What I don’t like about journald / Linux Journal

I heard of journald only a couple of hours ago (Tuesday?) and since then some intense discussion is going on. I had a chance to look at the journald material in more depth. I also had a quick look at journald’s source, but (as I know) Lennart and I are on the strictly opposite sides in regard to the amount of comment lines in source files (I put half the spec in, Lennart nothing at all ;-)). So I did not try too hard to make sense of the code and my impression is still primarily based on the initial paper (though I have to admit his code is probably simpler as he does not need to carry any legacy nor consider platforms other than recent Linux). 

The contra-syslog arguments can be classified in two classes: vaporware and correct fact. In the vaporware camp are things like the hash chaining “security urban legend”, the timezone argument (though he is right in regard to current practice inside distros), syslog network transport and compression (this list is not conclusive). Technically correct is the current store format, different log sources, and free-formedness of messages (I prefer the term “semi-structuredness”). This list is also not conclusive.

I think Lennart makes some good points, but discredits the paper somewhat by going overboard at times. It looks like he really needed some hard selling points (I also got this impression by his usage of the kernel.org breakage to promote this effort…). I think his paper would have been more useful if he had argued only those problems that actually exist. I am in full agreement that there are some spots that really deserve to be changed and addressed. Unfortunately, the paper is phrased in such terms that people not at least at the medium expert level on logging tend to believe everything that is stated.

The question is how the actual problems can best be fixed. Is it necessary to create a totally new infrastructure and throw away everything that exists? Maybe. I still prefer the alternative approach: why not extend existing technology? I modeled rsyslog specifically for this reason to be a highly modular system where extensions can easily be added. As far as I understand, syslog-ng has also moved to such a design in the recent v3 version. In rsyslog, I have accepted even experimental technology inside the source tree quickly. Getting a new log store was on my agenda for quite some month (the syslog-ng commercial fork already has it). I unfortunately had not enough time to implement it – and nobody else helped out with it. Wouldn’t it have been a good idea to contribute something to rsyslog instead of crafting something totally new?

Another thing that I strongly doubt is if the Linux journal idea will actually manage to solve the logging format dilemma. Microsoft’s event log is in place for 15+ years, and app developer still don’t use it correctly (as I initially wrote, the Linux Journal looks quite similar to the Windows Event Log). While I think the UUID idea is actually not a bad one, I seriously doubt all developers will understand and use it (correctly). This is a problem with the Windows Event log as well. One needs to know that a lot of high-profile folks are working for several years (10+) on solving this dilemma. Lennart may be a genius, but I have concerns that he over-promises (but I really wish he has success, that would be a very, very big advantage for the community).

One thing, I have to admit, that disappoints me is that Lennart never approached me with his proposal. He knows me (even personally) and we have worked together on systemd/rsyslog integration. I heard about journald first on a Google Alert and quickly after from some folks who asked me what went on. Then  I found out that the systemd development mailing list also never mentioned any work on journald. So, to me, it looks the idea was well hidden for a surprise at Kernel Summit. Well done, but not my style ;-) This missing openness concerns me. My decisions in regard to rsyslog were controversial at times and dictatorial at others (and for sure sometimes wrong). And we currently have some big and controversial discussion on rsyslog going on (partly fueled by the arrival of journald). But I have always played very open, communicated what I had on my mind (in advance), discussed and did never try to hide something. This, to be honest, is how I expect work to be carried out on an important system component. I also never met Lennart at any of the standard bodies work on logging. Not everyone runs Linux and probably not even everyone on Linux will run journald. So standards matter.

journald log hash chaining is broken

I promised to dig into some of the details of the journald announcement. One of the most hyped features is log hash chaining. Lennart describes this in his paper as follows (highlighting by me):

The Internet is a dangerous place. Break-ins on high-profile web sites have become very common. After a successful break-in the attacker usually attempts to hide his traces by editing the log files. Such manipulations are hard to detect with classic syslog: since the files are plain text files no cryptographic authentication is done, and changes are not tracked. Inspired by git, in the journal all entries are cryptographically hashed along with the hash of the previous entry in the file. This results in a chain of entries, where each entry authenticates all previous ones. If the top-most hash is regularly saved to a secure write-only location, the full chain is authenticated by it. Manipulations by the attacker can hence easily be detected.

For a moment, let’s assume he really means what he writes (I somewhat doubt that…). Then this is vaporware. You don’t get anything by providing a hash chain by itself. Let’s assume you have a log of 2,000 records. Now an attacker wants to remove record number 1,001 to 1,010. All he needs to do is seek to the proper location inside the (binary) file, and remove these 10 records, regenerating the hashes for record 1,011 to 2,000. Now let’s assume that you saved your initial hash to write only memory. First thing is that it probably is complicated to read the hash off from an unreadable location (write-only medium, mhhh ;)). Assuming you manage that, you can verify the whole log of now 1,990 records. You will not detect the missing records because the chain as such is perfectly well. This, by the way, is the main reason why I have not (yet) implemented such a simplistic method inside rsyslog.

This approach is “data sheet cryptography” at best. To do it right, you need some crypto experts. Bruce Schneier and John Kelsy have written a non-nonsene paper on securing computer audit logs (often called “Counterpane Paper”) in 1999. Note that John Kelsy and others have also written RFC5848, which describes how to securely sign syslog messages. This RFC went through numerous revisions and took a couple of years to complete. An interesting fact is that Albert Mietus reported the first implementation of syslog-sign (as RFC5848 was called these days) on EuroBSDCon in 2002! In his presentation “Securing Syslog on FreeBSD” he nicely describes what needs to be done.

I have not yet implemented this method in rsyslog because it has some serious issues when used in larger environments. When CEE discussed about signature chaining (note the difference to hash chaining!), I wrote a small paper about the issues with log signature chaining and remote logging. As I describe there, RFC5848 addresses only the less complex issues. This is not a failure of it’s authors, which for sure are real crypto experts – and me not. This is rooted in the fact that this is a very complex problem and a real good answer is still not known. As you can see, this is not something you can solve in with a few hours (or even days) of hacking.

Let me close with a quote from the journald paper: “The Internet is a dangerous place.”. And, indeed, it is. The most dangerous thing in my experience is a false sense of security. I guess black hats will *absolutely love* journald and its crypto stuff ;)

Update: Lennart’s non standard (for the logging comminity) use of the term top vs. bottom caused some confusion. Please be sure to read the comments attached to this posting. I probably need to blog about the issue again, but right now there are so many things going on. Again, read the comments, they have all information.

Quick update on log normalization

I just wanted to give a heads up on the status of log normalization. We have just released updated versions of libee and liblognorm. These provide important new features, like the capability to annotate events based on classification. This, among others, brings the libraries more back inline with recent CEE developments. Also, the log normalizer tool is nearly ready for prime time. The “only” thing that is missing is a decent set of rule bases. Thankfully, sagan already has a couple. I guess besides programming obtaining rule bases is a major thing to target.

As soon as I find time (I hope soon!), I’ll finalize some lose ends on the software side and get doc online on how to use the normalizer tool. I think with that a great tool for everyday use in log analysis will become available. Feedback and collaboration is always appreciated!

funding rsyslog development

To be honest, funding the rsyslog project is not easy these days. It never was, but has seen an extra hit by the current economic crisis. Rsyslog, in its initial phase, has been sponsored exclusively by Adiscon as part of its open source involvement. In 2007, we added rsyslog professional services with things like support contracts or custom development. While some customers used these services, Adiscon was still required to sponsor the project and is so until now. Unfortunately, professional services are not doing extremely well (to phrase it politely) and the global crisis is having a hit on Adiscon’s customers. As a consequence, I have been more involved with paid work during the past weeks and could not work as much on rsyslog as I had liked to. The shift in Linux logging that probably will be brought by journald (read blog posting) doesn’t strengthen my position inside Adiscon either and works as an accelerator for change…

We have been discussing for quite some while how to improve this situation. While I don’t like the idea, we probably need to think about a dual licensing approach for rsyslog. Please keep reading, you can be upset when I have made the rest of my argument ;-). First of all, I really don’t like dual-licensing. In fact, syslog-ng’s dual licensing approach was one reason that made me start working on rsyslog (blog post). I also know that rsyslog’s simple GPL license was one of the major “buying points” that made rsyslog become the default syslogd on Fedora and later many other distributions. In order to permit reuse of rsyslog technology in some other tools, in 2008 we created a licensing model that puts the so-called runtime – a large part of rsyslog – under LGPL (see “licensing rsyslog” and a previous blog post outlining the change). Syslog-ng later cloned this licensing model, but it seems like they put a couple of more things under LGPL than we did (so there seem to be rather weak “product driver” with most of the “real meat” being under LGPL – in rsyslog larger parts are GPL, only). There is an interesting article on lwn.net that tells about this development, and does so from a syslog-ng point of view. The most interesting fact I got from this article was that syslog-ng faced quite the same problems we have with rsyslog — and could not solve them without a commercial fork. Bare other options, it looks like this is a path that rsyslog needs to go, too. If so, of course this needs to be done as careful as possible.

After dual-licensing finally surfaced as something hard to avoid yesterday evening, I have done git log review today. I have to admit it was a bit scary: we have had some excellent and larger code contributions by Fedora folks in rsyslog’s infancy (and continuous support since them), we have had some larger chunks of code in form of modules contributed and there is Michael Biebl, who not only creates great Debian packages but always helps with autotools and smoothing some edges. Finally, we have a couple of folks who sent in very specific patches. But I have to admit that the very vast majority of code was written by myself ;) As of today, we have 2819 git commits. Out of them 2676 were made by me (and another 50 or so by other Adiscon folks). These number need to be taken with a grain of salt: rsyslog was initially kept in a CVS archive, and all contributions at that time were logged with my user account. The early Fedora patches were in that timeframe. That have been around 20 or so. Also, my commit count is a bit higher due to automatic merges. On the other hand, the difference in code lines is probably even a bit higher than the difference in commit count. I have not done any in-depth analysis, bu an educated guess is that more than 98% of code lines were written by me (after all, I have worked a couple of years on this project…).

I am now tasked with actually looking at the code. I will try to differentiate addon user contributions (like omoracle) from core files. This is useful anyway, because it makes clearer to users what is directly supported by the project and what not. Then, I will probably look into contributions and see which code remains at which locations. After that is know, I need to have another set of talks with my peers at Adiscon (and probably the top contributors) and see where we can head from here.


This is, honestly, how the state of affairs in regard to the rsyslog project currently is. Most probably we need to move to some commercial licensing model. I know this is not ideal. I know many of you will not really like it. On the other hand, it is plain fact that many for-profit organizations greatly benefit from rsyslog without ever contributing anything. While they can continue to do so, it is probably a good idea to help them find an offering that funds the project. As final remark for today, let me introduce you to a blog post that IMHO very nicely describes the problems, and needs, around dual licensing. I am not affiliated with the author, do not even know him.

I hope that the ideas described here will enable us to keep pushing forward with rsyslog technology, something I would really like to do!