Friday, November 25, 2011

journald log hash chaining is broken

I promised to dig into some of the details of the journald announcement. One of the most hyped features is log hash chaining. Lennart describes this in his paper as follows (highlighting by me):
The Internet is a dangerous place. Break-ins on high-profile web sites have become very common. After a successful break-in the attacker usually attempts to hide his traces by editing the log files. Such manipulations are hard to detect with classic syslog: since the files are plain text files no cryptographic authentication is done, and changes are not tracked. Inspired by git, in the journal all entries are cryptographically hashed along with the hash of the previous entry in the file. This results in a chain of entries, where each entry authenticates all previous ones. If the top-most hash is regularly saved to a secure write-only location, the full chain is authenticated by it. Manipulations by the attacker can hence easily be detected.
For a moment, let's assume he really means what he writes (I somewhat doubt that...). Then this is vaporware. You don't get anything by providing a hash chain by itself. Let's assume you have a log of 2,000 records. Now an attacker wants to remove records 1,001 to 1,010. All he needs to do is seek to the proper location inside the (binary) file, remove these 10 records, and regenerate the hashes for records 1,011 to 2,000. Now let's assume that you saved your initial hash to write-only memory. First of all, it is probably complicated to read the hash back from an unreadable location (a write-only medium, mhhh ;)). Assuming you manage that, you can verify the whole log of now 1,990 records. You will not detect the missing records because the chain as such is perfectly valid. This, by the way, is the main reason why I have not (yet) implemented such a simplistic method in rsyslog.
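To make this concrete, here is a minimal sketch of the attack (a toy hash chain, not journald's actual on-disk format): if only the *oldest* hash is anchored externally, deleting records from the middle and rehashing the tail produces a chain that still verifies.

```python
import hashlib

def chain(records, seed=b"\x00" * 32):
    """Toy hash chain: each entry's hash covers the record plus the
    previous hash. Returns the list of chained hashes."""
    hashes = [seed]
    for rec in records:
        hashes.append(hashlib.sha256(hashes[-1] + rec).digest())
    return hashes

records = [b"record %d" % i for i in range(1, 2001)]
original = chain(records)
anchored_first = original[1]          # only the OLDEST hash was saved

# Attacker removes records 1,001-1,010 and simply rehashes the tail:
tampered = records[:1000] + records[1010:]
rechained = chain(tampered)

# The shortened chain is internally consistent and still starts with
# the anchored hash, so the deletion goes undetected:
assert rechained[1] == anchored_first
print("tampered chain of", len(tampered), "records verifies")
```

The point: anchoring only the first link authenticates nothing that comes after it, because the attacker can regenerate every later link himself.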

This approach is "data sheet cryptography" at best. To do it right, you need some crypto experts. Bruce Schneier and John Kelsey wrote a no-nonsense paper on securing computer audit logs (often called the "Counterpane Paper") in 1999. Note that John Kelsey and others have also written RFC5848, which describes how to securely sign syslog messages. This RFC went through numerous revisions and took a couple of years to complete. An interesting fact is that Albert Mietus reported the first implementation of syslog-sign (as RFC5848 was called in those days) at EuroBSDCon in 2002! In his presentation "Securing Syslog on FreeBSD" he nicely describes what needs to be done.

I have not yet implemented this method in rsyslog because it has some serious issues when used in larger environments. When CEE discussed signature chaining (note the difference to hash chaining!), I wrote a small paper about the issues with log signature chaining and remote logging. As I describe there, RFC5848 addresses only the less complex issues. This is not a failure of its authors, who certainly are real crypto experts - which I am not. It is rooted in the fact that this is a very complex problem and a really good answer is still not known. As you can see, this is not something you can solve with a few hours (or even days) of hacking.

Let me close with a quote from the journald paper: "The Internet is a dangerous place.". And, indeed, it is. The most dangerous thing in my experience is a false sense of security. I guess black hats will *absolutely love* journald and its crypto stuff ;)

Update: Lennart's non-standard (for the logging community) use of the terms top vs. bottom caused some confusion. Please be sure to read the comments attached to this posting. I probably need to blog about the issue again, but right now there are so many things going on. Again, read the comments, they have all the information.


michich said...

I don't follow your reasoning in the example.
Yes, the attacker can delete the 10 records and recalculate the hashes of the following records. And yes, the hash chain as such will be fine afterwards. But the previously recorded hash of the 2000th record (in a write-once location) will then be nowhere to be found in the new hash chain, will it?

Rainer said...

That's right, but this means the write-once location must be continuously updated, as each record is written. Or maybe every ten minutes, if it is OK that 10 minutes of attack can go undetected. Also, if the hash is saved that often, the write-once memory seems to be accessible, so as an attacker, why not add the new hash there, too? Also, the journald paper says "If the top-most hash is regularly saved to a secure write-only location", with top-most usually meaning the "oldest". Anyhow, even if you save the youngest hash at short intervals, this is not how crypto works. Read the papers to which I linked to see how it is done decently.

Rainer said...

Oh, and as a side-note: what's the difference then to writing all log entries to a write-once media? You can do this with syslog today ;-)

michich said...

Lennart was "inspired by git". In git the "top-most commit" usually means the HEAD, i.e. the newest one. I am reading the papers, thanks for the links.

Rainer said...

Well, in a paper on logging, it probably is a good idea to use logging terminology. I find it a bit inappropriate to talk to a community and not even try to understand the community's terms...

Anyhow, so what does the proposal (if meant as you say) offer over directly writing to a write-once medium (as data centers today do with important logs)? (I have to admit I thought something novel had been in the approach...)

Frank Ch. Eigler said...

"Anyhow, so what does the proposal (if meant as you say) offer over directly writing to a write-once medium (as data centers today do with important logs)? "

Only hypothetical space savings.

gasche said...

You completely missed the point of the proposed security method. He assumes that you have regularly saved the top-most hash. When checking whether a given message chain may have been manipulated by an attacker, you have to:

1. Check that the top-most hash you remember is present somewhere in that chain

2. Check that the sub-chain that ends with this hash is validly hashed (e.g. each hashing step is correct).

If you perform this method (and your hash function is secure enough), you are sure that no malicious modification was done in the sub-chain that ends with this hash. (Indeed, you don't know about the newer part, because you haven't saved any information on that.)

In your post, you only used method (2). Indeed this check is meaningless without check (1). The fact that you completely missed step (1) and didn't even mention it shows that you did not understand the proposed security measure.
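[Editorial note: the two checks described above can be sketched as follows. This is a minimal toy illustration of the verification procedure, assuming the *newest* (top-most, in git terminology) hash was saved off-box; it is not journald's actual format.]

```python
import hashlib

def chain(records, seed=b"\x00" * 32):
    # Toy hash chain: each hash covers the previous hash plus the record.
    hashes = [seed]
    for rec in records:
        hashes.append(hashlib.sha256(hashes[-1] + rec).digest())
    return hashes

def verify_prefix(records, saved_head):
    """The two checks: (1) the remembered top-most (newest) hash must
    appear in the recomputed chain; (2) the sub-chain ending at that
    hash must recompute correctly. Returns the number of records
    covered by the saved hash, or None if tampering is detected."""
    hashes = chain(records)          # check (2): recompute every step
    for i, h in enumerate(hashes):
        if h == saved_head:          # check (1): find the saved hash
            return i                 # records[0:i] are authenticated
    return None                      # saved hash not found: manipulated

records = [b"record %d" % i for i in range(1, 2001)]
saved_head = chain(records)[-1]      # hash of record 2,000, saved off-box

# Deleting records in the middle now breaks verification, because the
# saved hash no longer appears anywhere in the recomputed chain:
tampered = records[:1000] + records[1010:]
assert verify_prefix(records, saved_head) == 2000
assert verify_prefix(tampered, saved_head) is None
```

Note that this only authenticates records written *before* the hash was saved; anything newer than the last saved hash is still unprotected, which is the core of the frequency/permissions concern raised in the comments above.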

This is not meant to be an attack: I read your "Serious syslog problems" post with great interest and completely agree with you that anything security-related needs to be confirmed by actual security expert. I came reading this post assuming that you had discovered a real flaw in the proposed technique, and am simply disappointed.

No harm meant, but still the rather triumphal tone of your post hurts, I think, your credibility.

Rainer said...

You are right with the observation, but please read the other comments -- they explain both the basic misunderstanding and why the proposed method still does not solve the issue (at least not more than the traditional methods already used). It is also important to read the links to the papers I posted.

I should probably write a new posting on the issue. I have also learned that many folks seem to find hash chaining useful (even though it has serious flaws). For that reason, I have written a prototype that does hash chaining, to be released soon. That effort, however, will be complemented by some cryptographically sound method.

On the tone: I had written this shortly after reading the journald proposal. Maybe I imported some of their tone ;)

Rainer said...

I have to add to the tone: when I read what was proposed, I really thought "how silly can someone be to actually propose that?" and I seem to have conveyed that thought ;-)

Again, the hash chaining is still silly, especially as no point is made about how, how often, when, and with which permissions hashes are saved. I still think this is "datasheet cryptography", and it is very dangerous if you think you are secure just from using journald.

I have also updated the post with the information that one should read the comments before coming to a final conclusion.

gasche said...

Thanks for your answer. I think you are handling the "issue" gracefully -- which is nothing serious anyway, making occasional mistakes is human.

I think Frank Ch. Eigler is spot on: saving a hash is more convenient than saving the logs because its space usage is constant, as well as the time needed to check its correctness (in the "chain" case). This allows you to save the validity data in more places and check it more often.

That said, your concerns are also valid: you need a secure process for how/when/where to save the chain -- though I'm not sure it's something cryptography experts would handle more precisely -- and you will eventually also need to save the full logs in a secure place anyway. It's more of a complementary quick-and-simple checksumming technique.

Rainer said...

In general, the discussion seems to show that people like the idea of hash chains (no matter which issues exist ;)). If so, then the idea is probably useful not only for journald, but also for syslog AND for other log files as well. Consequently, I have begun to work on a tool that will hash-chain any log file (plus a small lib that offers this functionality). I intend to also make it self-verifiable via real signatures in the somewhat longer term. I'll post an announcement when the first version is ready (it's a simple standard *nix filter).
