Introducing liblognorm

Hi folks,

With this posting, I introduce a new project of mine: liblognorm. This library shall help to make sense out of syslog data, or, actually, any event data that is present in text form.

In short words, one will be able to throw arbitrary log message to liblognorm, one at a time, and for each message it will output well-defined name-value pairs and a set of tags describing the message.

So, for example, if you have traffic logs from three different firewalls, liblognorm will be able to “normalize” the events into generic ones. Among others, it will extract source and destination ip addresses and ports and make them available via well-defined fields. As the end result, a common log analysis application will be able to work on that common set and so this backend will be independent from the actual firewalls feeding it. Even better, once we have a well-understood interim format, it is also easy to convert that into any other vendor specific format, so that you can use that vendor’s analysis tool.

So liblognorm will be able to provide interoperability between systems that were never designed to be interoperable.

This sounds bold, so why am I thinking I can do this?

Well, I am working for quite some years in this field and have accumulated a lot of experience including a sufficient number of implementations where I failed in one way or another. You can read about this in my previous blog post on “syslog normalization“. So know-how is available. The core technology is actually not that complex. I hope to code the core parts of the lib within one month (but it may take some more real-world time as I can not yet work on it full time). Also, very importantly, there is the Common Event Expression (CEE) standard coming up. Its aim is nothing less than to provide the plumbing that is needed for log normalization. CEE is initiated by the same folks that made so successful standards like CVE alive — so it is something we can hope to succeed. Thankfully, I have recently become a member of the editorial board so I have good insight of what it takes to write a library that utilizes CEE.

This all sounds good, but there is one ingredient missing: liblognorm will not be able to magically know the format of each message. We must teach it the formats. This involves some community effort as a single person can not compile all message formats this IT world has to offer. Not even a small group can do. I hope that we can get sufficient log samples from the rsyslog community and hopefully later from a growing liblognorm community. I have thought long about how this can be done. A core finding is that it must be dumb easy to do. This is how it is intended to work:

When you feed liblognorm a message that it can not recognize, it will spit out that message. Let’s assume we have the following message:

AAA credentials rejected: reason = reason: server = server_IP_address: user = user

Then the user just needs to tell which fields it contains and do so via a simple syntax. A hypothetical sample could be:

reject,AAA:AAA credentials rejected: reason = %reason_msg%: server = %sourceIP%: user = %user%

The strings “reject” and “AAA” are tags. Tags will be placed in comma-delimited format in front of the actual message sample and are terminated by a colon. Everything after the colon is actual message text. The field between percent signs reflect some well-known properties (which are taken from the CEE base def and/or are custom defined). The syntax will be taken from a data dictionary, so the user does not need to bother about that in most cases. So creating a message sample out of an unknown message type should be fairly easy.

The idea now is that we gather these one-line message samples and compile them into a central repository. Then, all users can download fresh versions of the “sample base” and use them to normalize their tags — much like a virus scanner works. Of course, there are a number of questions in regard of how trustworthy samples are. But this is definitely something we can solve. To get started, we could simply do some manual review first, much like code integration. At later stages, some trust level policy could be applied. Well-known technology…

None of this is cast in stone as I am in a very early stage of this project. Your feedback definitely makes a difference and so be sure to provide ample ;)

That’s it for today, but I plan to do a series of blog posts on the new system within the next couple of days. Please provide feedback whenever you have it. It’s definitely something very useful to have. I’ll also make sure that I set up a new list for this effort, but initially I will be abusing the rsyslog mailing list, as I assume folks on that list will definitely be interested in liblognorm as well. And, of course, once the library is ready I’ll utilize it in various ways inside rsyslog.