Thursday, December 02, 2010

log normalization with rsyslog

I just wanted to give you a quick heads-up on my current development efforts: I have begun to work heavily on a message modification module for rsyslog which will support liblognorm-style normalization inside rsyslog. In git there already is a branch "lognorm", which I will hopefully complete and merge into master soon. It provides some very interesting shortcuts for pulling specific information out of syslog messages. I'll probably promote it some more when it is available. IMHO it's the coolest and potentially most valuable feature I have added in the past three years. Once I have enabled tags in liblognorm/libee, you can even very easily classify log messages based on their content.
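
Purely as a teaser, usage from rsyslog.conf might eventually look something like the sketch below. Note that the module name and the directives shown here are placeholders of my own invention; nothing about the interface is fixed yet while the branch is still being worked on.

# hypothetical sketch - module and directive names are placeholders only
$ModLoad mmnormalize
$mmnormalizeRuleBase /etc/rsyslog.d/samples.db
# run normalization on all messages, then act on the extracted fields
*.* :mmnormalize: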
 

Wednesday, November 17, 2010

normalizing Apache Access logs to JSON, XML and syslog

I like to make up my mind based on examples, especially for complex issues. For a discussion we had on the CEE editorial board, I wanted a real-world example of a log file with many empty fields. An easy-to-grasp, well-understood and easy-to-parse example of such is the Apache access log. Thanks to Anton Chuvakin and his Public Security Log Sharing Site I also had a few research samples at hand.

Apache common log format is structured data, so there is no point in running it through a free-text normalization engine like liblognorm. Of course it could process the data, but why use such complex technology? Instead, the decoder is now part of libee and receives a simple string describing which fields are present. It's called like this:

$ ./convert  -exml -dapache -D "host identity user date request status size f1 useragent" < apache.org > apache.xml

Options specify the encoder and decoder, and the string after -D tells the converter which field names are present and in which order. But now let's let the input and output speak for themselves:

The converter works by calling the decoder, which creates an in-memory representation of the log data in a CEE-like object model. Then, that object model and the called-for encoder are used to write the actual output format. That way, the conversion tool can convert between any structured log formats, as long as the necessary encoders and decoders are available. This greatly facilitates log processing in heterogeneous environments.
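
To make this a bit more concrete, here is a made-up illustration: the access-log record is invented and the JSON rendering is simplified (the real encoders may lay things out differently), but the field names correspond to the -D string shown above.

# hypothetical input record (Apache common log format plus referer and user agent)
192.0.2.10 - jdoe [17/Nov/2010:06:25:17 +0100] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"

# simplified idea of a JSON rendering of that record
{ "host": "192.0.2.10", "identity": "-", "user": "jdoe",
  "date": "17/Nov/2010:06:25:17 +0100", "request": "GET /index.html HTTP/1.1",
  "status": "200", "size": "2326", "f1": "-", "useragent": "Mozilla/5.0" }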

Note that liblognorm works similarly and, from libee's point of view, can be viewed as a decoder for unstructured text data (or, more precisely, for text data where the structure is well hidden ;)).

Friday, November 12, 2010

log normalization - first results

At the beginning of this week I was pretty confident that I would not make my self-set deadline of one month to implement a first rough proof of concept of liblognorm, a log normalizing library. Fortunately, I made extremely good progress over the past two days and I am now happy to say that I have such a proof of concept available. All of this can be seen by pulling from Adiscon's public git server: you need libestr, libee and liblognorm to make it work.

Right now, I'd like to provide a glimpse at how things work. Thanks to Anton Chuvakin and his Public Security Log Sharing Site I got a couple of examples to play with (but I am still interested in more log samples, especially from Cisco devices). Out of the many, I took a random messages.log file written by sysklogd. This is my input file and can be seen here.

To normalize events, liblognorm needs to know which fields are present at which positions of the input file. It learns this via so-called "samples". Samples are very similar to the patterns used by virus scanners: just as virus patterns describe how a specific virus looks, log samples describe how a specific log line looks. Unlike virus patterns, I have crafted a format that is hopefully easy (enough) to understand by sysadmins, so that everyone can add relevant samples themselves. To support this, samples look relatively similar to actual log lines, and this is the reason I have termed them "log samples". Like log files, samples are stored in simple text files. For the initial test, I used a very small set of samples, available here. A production system will have many more samples, and I envision systems that have many (ten?) thousand samples loaded at the same time. If you look at the samples, pay special attention to entities enclosed in '%' - these are field definitions, the rest is literal text.
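
Just to illustrate the idea (this is not taken from the actual sample files; the field names and the exact syntax here are made up), a sample for a failed ssh login might look roughly like this:

# literal text is matched as-is, %...% marks a field to be extracted
Failed password for %user% from %src-ip% port %src-port% ssh2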

The actual normalization is performed by the library's engine, which parses log lines, based on the samples, into fields. This creates an in-memory representation of the event, which can then be processed by the driving application or be written to some other media or the network.

Liblognorm will come with a small tool called "the normalizer". It is a minimal library user: it loads a sample database, reads log lines from standard input, creates the in-memory event representation and then writes this representation to standard output in a standardized format. So far, it supports formats as they are expected for the upcoming CEE standard.
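
An invocation could then look roughly like the following; the option handling shown here is made up and the actual tool may use different switches:

$ normalizer samples.db < messages.log > events.out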

The result of a normalizer run on my test input file, based on the provided sample base, can be seen here. The output is actually a bit more verbose than described above, because it lists the to-be-normalized line as well. If you look at the properties I extracted, you'll probably notice that some do not make too much sense (maybe...). Also, a classification of the message is missing. Don't worry about these aspects right now: this is a proof of concept and these things will be addressed by future development (the classification, for example, will be based on the CEE taxonomy via tags).

I hope I was able to convey some of the power that is available with liblognorm. Of course, a "little bit" more work and time will be required to get it production-ready. Unfortunately, I will be unavailable for larger parts of the next two weeks (other work now pressing, plus a long-awaited seminar ;)), but I will try to get liblognorm into the best shape possible as quickly as possible. In the meantime, if you like, feel free to have a look at its code or play with it. All of what I wrote can actually be done with the versions available in git.

Call for Log Samples

My log normalization effort made good progress and I have a very rough first proof of concept available. It will take a log sample database, and transform input log files to a CEE-like output format.

Now I am looking at ways to practice-test it. So I'd appreciate it if you could point me to some sources of log files. It doesn't need to be terabytes, but the files should be anonymized and usable on the public Internet. For obvious reasons, it would be good if they came from widely deployed devices.

I would use a subset of these samples to extract usable sample database entries and see how they run through the normalizer.

Thanks,
Rainer

Tuesday, November 09, 2010

libee status update


Yesterday I reached some important milestones:
  • defined an internal simple event representation format "int" to create test cases
     http://doc.libee.org/classee__int.html
  • wrote a decoder "int" -> CEE
  • wrote a syslog encoder CEE->syslog
    (with enough escapes to be testable, but without all details)
  • wrote a small program (int2syslog) that ties these pieces together and can be used to visualize test cases in syslog representation
 All is available from the libee git at http://git.adiscon.com/?p=libee.git;a=summary

Thursday, November 04, 2010

libestr git now available

The public libestr git repository is now online. I have also added enough code so that I can try to use it in libee ;)

introducing libestr

I tried to avoid this, but it looks like I need to start a new project. I need a simple counted-byte string "class" for the CEE effort. All of that is already present in rsyslog, but using its runtime library is overkill. I could have copied the string code, but I really don't like to have multiple copies of the same code around. So I'll now create a new small library just for that purpose. Well, while I am at it, I'll probably also add basic hashtable support, as this can help keep libee small for cases when no XML-specific functionality is desired. And I don't want to start yet another library for that (we already have lib inflation... ;)).

I named it libestr, for lack of a better name. I thought a while about the name, but could not find anything really decent. "estr" means "essentials for string handling" and is probably descriptive enough. Quite honestly, I'd really like to gain some speed in coding again instead of just creating skeletons and thinking about names...

Wednesday, November 03, 2010

logging and the C NUL problem

Again, I ran into the "C NUL problem", that is, the way C strings are terminated. Unfortunately, the creators of C represented strings as variable arrays of char without an explicitly stated size. Instead of a size property, a C string is terminated by a US-ASCII NUL character ('\0'). This works well enough in most cases, but has one serious drawback: the NUL character is a reserved character that cannot be part of any C string. So, for example, strlen("A\0B") equals one and not three as one would usually expect.

CERT has a good presentation of some of the more important problems associated with the standard C string handling functions. I do not intend to reproduce this here or elaborate on any further details, except that we get into big trouble if NUL characters are used e.g. in logging data sets. We had this problem in the IETF syslog WG, where we permitted NUL to be part of the syslog message text, but permitted a receiver to escape it. This is far from being an ideal solution, but we considered it good enough, especially as it permits staying compatible with existing tool libraries.

Now, in CEE, we face the same challenge: the question is whether the in-memory representation of event fields should permit NUL characters. The correct technical answer to this question is "yes, of course", but unfortunately it has a serious drawback that can affect adoption: if NULs are permitted, none of the string handling functions of the C runtime library can be used. This is, as said above, because the C runtime library is not able to handle NULs inside "standard" C strings. A potential solution would be to escape NULs when they enter the system. However, that would require an additional reserved character to do the escaping. So in any case, we'll end up with a string that is different from what the "usual" runtime library routines expect.

Of course, this problem is not new, and many solutions have already been proposed. The obvious one is to replace the standard C string handling functions with octet-counting functions that do not require any reserved characters.
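
To illustrate what "octet counting" means in practice, here is a small, self-contained C sketch (it is not the API of any of the libraries mentioned below): the length is stored explicitly, so an embedded NUL is just ordinary data.

#include <stdio.h>
#include <string.h>

/* a minimal counted-byte string: the length is stored explicitly,
 * so NUL bytes are ordinary data - a sketch of the idea only */
struct cstr {
    size_t len;
    const char *data;   /* may contain '\0' anywhere */
};

static size_t cstr_len(const struct cstr *s) { return s->len; }

int main(void)
{
    const char buf[] = { 'A', '\0', 'B' };
    struct cstr s = { sizeof(buf), buf };

    /* strlen() stops at the first NUL and reports 1 ... */
    printf("strlen: %zu\n", strlen("A\0B"));
    /* ... while the counted representation keeps all three bytes */
    printf("cstr_len: %zu\n", cstr_len(&s));
    return 0;
}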

A short, non-weighted list of replacement string libraries is:
Note that some of them try to mimic standard C strings as part of their API. I consider this highly dangerous, because it provides a false sense of security. While such a library can handle strings with embedded NUL characters (like "A\0B"), all parts of the string after the first NUL will be discarded if it is passed to a "regular" C runtime library string function (like printf!). So IMO this is a mis-feature. A replacement library must explicitly avoid compatibility with the C runtime library in order to save the user from subtle issues, many of them resulting in security problems (think: information hiding).

Unfortunately, I could not identify any globally accepted string replacement library that is in widespread use. Despite their deficits, C programmers tend to use the plain old string functions present in the standard C runtime library.

So we are back to the original issue:

If CEE supports NUL characters inside strings, the C standard string library can not be used, and there are also problems with a potentially large number of other toolsets. This can lead to a low acceptance rate.

But if CEE forbids NUL characters, data must be carefully validated when it enters the system. Most importantly, a string value like "A\0B" must NOT be accepted when it is put in via an API. Experience tells us that implementors sometimes simply overlook such restrictions. So this mode opens up a number of very subtle bug (and security) issues.

I am very undecided which route is best. Obviously, a sound technical solution is what we want. However, the best technical solution is irrelevant if nobody actually uses it. In that light, the second-best solution might be better. Comments, anyone?

Thursday, October 21, 2010

libee - first peek preview available

I have just published a first preview of the libee library API. This work is obviously far from being finished, but I think the preview provides at least some idea of how it will materialize.

To tell a bit about the status in general: I have completed a first round of evaluation of CEE objects and concepts, based on the draft standard. Note that the draft standard is far from being finished, and will probably undergo serious changes. As such, I have not literally taken object definitions from there. Instead, I have mostly pulled in what I am currently interested in and made some additions here and there, as I need them to do what I intend to do.

My primary goal is to get some proof of concept working first, then re-evaluate everything and probably do a rewrite of some parts. For the same reason, performance is currently not on my radar (but of course I always think about it ;)).

I would also like to note that the libee git is also available -- and a good way to follow libee development for those interested. Comments are always appreciated and especially useful in this early phase.

Tuesday, October 19, 2010

Sagan and rsyslog [Guest Posting]

Hi everyone,

I am happy to feature a guest post written by Champ Clark III, the author of Sagan, a real time, "snort like" event and system log sniffing tool. Champ will explain a bit about it and how he made it work with rsyslog.  I think it is a very interesting project and I am glad it now has full rsyslog support.

But enough of my words, enjoy the real thing ;)

Rainer

I admit it, I'm a recent convert to rsyslog. I've known about rsyslog for years, but have only recently started using rsyslog in production environments. The primary reason for looking into rsyslog is users of Sagan are requesting support for it. I'm very glad they pushed me in that direction. I knew how popular rsyslog was,
but the 'hassles' of changing our core logging facilities seemed like a pain.

I can tell you, it was easy and seamless. Also, after reading Rainer Gerhards' excellent, "rsyslog: going up from 40K messages per second to 250K", I knew that I liked this project.

So I bit the bullet, and started working with Sagan and rsyslog. I haven't looked back since.

I work in the network & computer security field. I've known for years the importance of log management. One thing that I had noticed was a lack of open source log & packet level correlation engines. This is essentially what Sagan does. Sagan is commonly compared to Cisco's MARS. Sagan reads in your logs and attempts to correlate the information with the intrusion detection/prevention system's packet level information.

At Softwink, Inc, my place of employment, we monitor security events for various clients. At the packet level, we use Snort to inspect for 'bad events' (security related). Snort 'watches' the network connections and sends out an 'alert' when it sees nefarious traffic. We configure Snort to send the 'alert' to a MySQL database for further analysis. We can then monitor these Snort sensors for 'bad events/attacks' in real time.

However, we found that we were missing the 'bigger picture' without logs. This is where rsyslog and Sagan come into play. Essentially, we take the logs from all machines and equipment on a network and forward them to a centralized server. Rsyslog is the 'receiver', and sometimes the sender, of these log messages. In many cases, we find that centralized secure logging is a requirement for clients. With rsyslog, we have the ability to store log information in a MySQL database for archive purposes. We can then give the client access to the log information via Loganalyzer for easy, simple retrieval.

How does Sagan fit into this picture? For security analysis, we only want key, typically security related, events from the logs. Manually searching databases for 'security related' events is prone to error. It is easy to 'miss' key events. Sagan is the 'eyes on the logs' watching for security related events in real time. First, Sagan has to have access to the logs coming into the network. This is very simple with Rsyslog:

# rsyslog.conf file.
#
# As rsyslog receives logs from remote systems,  we put them into a format
# that Sagan can understand:
#

$template sagan,"%fromhost-ip%|%syslogfacility-text%|%syslogpriority-text%|%syslogseverity-text%|%syslogtag%|%timegenerated:1:10:date-rfc3339%|%timegenerated:12:19:date-rfc3339%|%programname%|%msg%\n"

# We now take the logs,  in the above format,  and send them to a 'named pipe'
# or FIFO.

*.*     |/var/run/sagan.fifo;sagan


Sagan can now 'read' the logs as they come into rsyslog from /var/run/sagan.fifo (a named pipe/FIFO) in real time. rsyslog actually performs double duty for us; logging to our MySQL database for archival purposes and handing Sagan log information for analysis.

Overall, there is nothing really new about this concept. However, Sagan does something a bit different than other log analysis engines. When Sagan sees a 'bad event', Sagan will log it to your Snort IDS/IPS MySQL/PostgreSQL database. What does this mean? Packet level security events and log events reside in the same database for correlation. There are several advantages; for one, we can now have a single, unified console for log and IDS/IPS events! Second, we can now take advantage of Snort front-end software to view log events. For example, if you use BASE or Snorby to view packet level IDS/IPS events, you can use the same software to view log level Sagan events. Maybe your shop uses report generation tools that query the Snort database to show 'bad packet events' in your network. Guess what: you can use those same reporting tools for your log information as well. I've posted some example screen shots of Snort & Sagan working together here. The idea is that we take advantage of the Snort community's work on consoles.

Correlation with Sagan and Snort, at the engine level, works in several different ways. First, Sagan can in some cases pull network information directly from the log message and use that for correlation in the SQL database. For example, let's say an attacker is probing your network and is attempting to get information on the SMTP port. The attacker sends your SMTP server 'expn root'. Your IDS/IPS engine will 'detect' this traffic and store it. It'll record the source IP, destination IP, packet dump, time stamp, etc. Sagan will do the same at the log level. Sagan will 'extract' as much information as possible from the log message for further correlation with the packet level.

Recently, Rainer announced liblognorm (early liblognorm website). This is an exciting project. The idea is to "normalize" log information into a nice, standard, usable format. I plan on putting as much support and effort as I can into this project, because it's an important step. For Sagan, it means we will be able to better correlate information. In the time I have had to ponder it since its recent announcement, I can see liblognorm being extremely useful for many different projects.

Sagan also shares another feature with Snort; it uses the same rule sets. Sagan rule sets are very much 'Snort like'. Here is an example rule (this is a single line, broken just for readability):

alert tcp $EXTERNAL_NET any -> $HOME_NET 22 (msg:"[OPENSSH] Invalid or illegal user";
pcre: "/invalid user|illegal user/i"; classtype: attempted-user;
program: sshd; parse_ip_simple; parse_port_simple; threshold:type limit, 
track by_src, count 5, seconds 300; reference: 
url,wiki.softwink.com/bin/view/Main/5000022; sid:5000022; rev:4;) 

If you're already a Snort user, then the format and syntax should be very simple to understand. We use 'pcre' (regular expressions) to 'look' for a message from the program 'sshd' that contains the term 'invalid user' or 'illegal user' (case insensitive). We set the classifications, just as Snort does (for further correlation). We can 'threshold' the rule, so we don't get flooded with events.

Sagan uses this format for a variety of reasons. For one, it's a well-known format in the security field. Second, we can now take advantage of Snort rule maintenance software, for example 'oinkmaster' or 'pulled pork'. The idea is that with Sagan, you don't need to 're-tool' your network in order for it to work.

Using Sagan with your Snort-based IDS/IPS system is just one feature of Sagan. Sagan can operate independently from Snort databases, and offers the normal bells and whistles you'd expect in a SIEM (e-mailing alerts, etc.).

To tie all this together, it means we can simply monitor packet level threats and log level events from a unified console. We can monitor just about everything in a network from the log level standpoint. We can monitor Cisco gear, Fortigate firewalls, Linux/*nix servers, wireless access points, etc.

Sagan is a relatively new project and still under development. Like rsyslog, Sagan is built from the ground up with performance in mind. Sagan is multi-threaded and written in C with the thought that it should be as efficient with memory and the CPU(s) as possible. Rsyslog seems to follow the same philosophy, yet another reason I made the switch.

The more information you know about a threat to your network/system, the better off you'll be. That is what the mix of rsyslog and Sagan offers. Throw in IDS/IPS (Snort) monitoring, and you can get a complete view about 'bad things' happening in your environment.

For more information about Sagan, please see http://sagan.softwink.com.

Thursday, October 14, 2010

CEE library will be named libee

After some discussions, we have finally decided to name the CEE part of the original liblognorm project "libee". Note the missing "c" ;) We originally thought that libcee would be a smart name for a library implementing an API on top of the upcoming CEE standard. However, there were some complexities associated with that because "libcee" sounds much like *the* official library. While I hope I can finish a decent implementation of the full CEE standard, I think it is a bit bold to try to write the standard lib when the standard is not even finished.

So the compromise is to drop the "c". The "ee" in libee means "Event Expression", much as CEE means "Common Event Expression". If CEE evolves as we all hope and if I manage to actually create a decent implementation of it, we may reconsider calling the library libcee. But that's not important now and definitely something that will involve a lot of people.

In the meantime, I wanted to make sure you all understand what I mean when I talk about "libee". I hope to populate its public git with at least a skeleton soon (liblognorm has reached a point where I need some CEE definitions, so this becomes more important).

Wednesday, October 13, 2010

liblognorm site comes online

While there is not yet much content, the liblognorm site has been put online today. Over time, this will become an important place to both learn about liblognorm AND share log samples. It will most probably also contain the area that you can use to download new log samples (much like you download virus patterns for a scanner). But for now, I just wanted to share the good news.

liblognorm will use passive Unicode mode (UTF-8)

I thought a while about how to support Unicode in liblognorm. The final decision is to use passive mode (UTF-8), which is a very popular option under Linux. A core driver behind this decision is the ability to save lots of space (and thus also cache space, and so processing time as well), as the majority of log content is written in US-ASCII. This is even the case in Asian countries, where large parts of the log message are usually ASCII and only a few select fields are in the local language (like names). Even if the message itself is in the local language, there is a lot of punctuation and there are numbers in it, so I think the overall result will not use up notably more space than a UTF-16 implementation. I18N-wise, it must also be noted that a 16-bit encoding without surrogates covers only a small (but important) subset of full Unicode, whereas UTF-8 gives us the ability to encode the full UCS-4 range should there be need to do so.

The same decision will apply to the CEE library (whatever it will be named). This is also nicely in line with libxml2, which I intend to use for XML parsing.

Tuesday, October 12, 2010

first glimpse of liblognorm API available

While this is not yet much, I have populated liblognorm's git repository with some early (and very incomplete) code. Most importantly, it contains a first glimpse of the liblognorm API, which I tried to keep as simple as possible.

All of this material is, for obvious reasons, very preliminary and incomplete. But it may be of interest for those folks interested in the library.

Will my CEE library be named libcee?

After my thinking about splitting off a CEE library from the liblognorm project, Michael Biebl suggested that the CEE part of it be named "libcee". This sounds very natural and decent. However, CEE is a trademark of the Mitre corporation, which helps establish the standard. I have asked the folks there if it is possible to use this name. Not surprisingly, the answer will take a little while.

I need to make progress while waiting. So I have decided to name the library libcee for now, but rename it if Mitre decides this is not possible. That should not be much of a problem, as that decision will probably be made much more quickly than I can write the first releasable version ;)

Monday, October 11, 2010

splitting up the normalization library

I have dug into the design of my upcoming event/log normalization library. As it will be based on CEE, I intend to pull in CEE definitions for types defined there, like tags or field types. Also, I thought about what the library should output. An obvious choice for many use cases is an in-memory object model describing the normalized form of the event that was passed in. This is probably most convenient for applications that want to do further processing on the event.

However, it also seems useful to have the ability to serialize this data in the form of a text string. That string could be stored in a file for later reference, forensics or to feed some other tool capable of understanding the file format. And as the in-memory object model will be CEE-based, and CEE defines such serialization formats, it seems obvious that the library should be able to generate serializations based on the CEE-defined and supported formats (note that this does not necessarily mean XML; it may be JSON or syslog structured data as well).

Looking at all this, the normalization library seems to consist of two largely independent (but co-operating) parts:
  • the parser engine itself, the part that is used to actually normalize the input string according to the provided sample base and CEE definitions
  • a CEE support library, which provides the plumbing for everything that is defined in CEE (like tags, field types and serialization formats)
Now consider that I intended to create the normalization feature as a separate library, instead of an rsyslog plugin, because I hope that other projects can reuse it. Looking at the above bullet points, it seems natural to also split the core parser from the CEE functionality. Again, there seems to be a broader potential user base for generic CEE functionality than for normalization. For example, a CEE support library could also be used by projects that natively support CEE. It would hopefully save them the hassle of coding all CEE functionality just to do some basic things. Think, for example, of some application that would "just" like to emit a well-formed CEE log record (a very common case, I guess). With a library, it could simply generate (via the library) a proper in-memory representation of the event and then have the library process it. The library could then also check whether it is syntactically correct and contains all the necessary fields to conform to a specific CEE profile.
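
To make the "just emit a well-formed CEE record" use case a bit more tangible, here is a tiny, self-contained C sketch. It is a toy only: the structures and function names are invented for illustration and are not the libee/libcee API (which does not exist in final form yet). The point is merely the flow: build an in-memory event from name/value pairs, check it against a (here trivial) profile, then render it.

#include <stdio.h>
#include <string.h>

#define MAX_FIELDS 16

/* toy in-memory event: a flat list of name/value pairs */
struct event {
    int nfields;
    const char *name[MAX_FIELDS];
    const char *value[MAX_FIELDS];
};

static void event_add(struct event *ev, const char *n, const char *v)
{
    if (ev->nfields < MAX_FIELDS) {
        ev->name[ev->nfields] = n;
        ev->value[ev->nfields] = v;
        ev->nfields++;
    }
}

/* check that all fields required by a (toy) profile are present */
static int event_conforms(const struct event *ev, const char *const req[], int nreq)
{
    for (int i = 0; i < nreq; i++) {
        int found = 0;
        for (int j = 0; j < ev->nfields; j++)
            if (strcmp(req[i], ev->name[j]) == 0)
                found = 1;
        if (!found)
            return 0;
    }
    return 1;
}

/* render as simple name="value" pairs - a stand-in for a real CEE encoder */
static void event_render(const struct event *ev)
{
    for (int i = 0; i < ev->nfields; i++)
        printf("%s%s=\"%s\"", i ? " " : "", ev->name[i], ev->value[i]);
    putchar('\n');
}

int main(void)
{
    static const char *const required[] = { "action", "user" };
    struct event ev = { 0 };

    event_add(&ev, "action", "login");
    event_add(&ev, "user", "jdoe");
    event_add(&ev, "src-ip", "192.0.2.1");

    if (event_conforms(&ev, required, 2))
        event_render(&ev);
    return 0;
}

A real CEE support library would of course do much more (tags, field types, the official serialization formats), but the calling application's side of the picture would look about this simple.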

The more I think about it, the more useful it appears. So I'll probably split the core normalization library from the CEE part. This is not much effort, but opens up additional uses. I'll call the normalization part liblognorm (or libeventnorm) and the CEE part libcee -- or something along these lines. In this light, liblognorm may actually be a better name, because the parser part is more concerned with logs and log files than with generic events (which often come in other formats).

Again, feedback is appreciated!

Sunday, October 10, 2010

liblognorm or libeventnorm?

Names are always important. While thinking about liblognorm's design, it occurred to me that the name may be "too limited". In fact, the library will make heavy use of CEE, the Common Event Expression standard. In CEE, the term "log" was deliberately avoided in favor of the broader term "event". And thinking about this, my library is much more about normalizing all types of event representations than it is about normalizing "just" logs. So libeventnorm would probably be a better name, and for now I think I will use the new name.

Any feedback on this issue is appreciated. I am also open to new name suggestions. But I think I will populate a public git early next week, so we should have a fixed name by then ;)

Thursday, October 07, 2010

Introducing liblognorm

Hi folks,

With this posting, I introduce a new project of mine: liblognorm. This library shall help to make sense out of syslog data, or, actually, any event data that is present in text form.

In short, one will be able to throw arbitrary log messages at liblognorm, one at a time, and for each message it will output well-defined name-value pairs and a set of tags describing the message.

So, for example, if you have traffic logs from three different firewalls, liblognorm will be able to "normalize" the events into generic ones. Among other things, it will extract source and destination IP addresses and ports and make them available via well-defined fields. As the end result, a common log analysis application will be able to work on that common set, and so this backend will be independent of the actual firewalls feeding it. Even better, once we have a well-understood interim format, it is also easy to convert that into any vendor-specific format, so that you can use that vendor's analysis tool.

So liblognorm will be able to provide interoperability between systems that were never designed to be interoperable.

This sounds bold, so why am I thinking I can do this?

Well, I have been working in this field for quite some years and have accumulated a lot of experience, including a sufficient number of implementations where I failed in one way or another. You can read about this in my previous blog post on "syslog normalization". So know-how is available. The core technology is actually not that complex. I hope to code the core parts of the lib within one month (but it may take some more real-world time as I can not yet work on it full time). Also, very importantly, there is the Common Event Expression (CEE) standard coming up. Its aim is nothing less than to provide the plumbing that is needed for log normalization. CEE is initiated by the same folks that brought to life such successful standards as CVE -- so it is something we can hope will succeed. Thankfully, I have recently become a member of the editorial board, so I have good insight into what it takes to write a library that utilizes CEE.

This all sounds good, but there is one ingredient missing: liblognorm will not be able to magically know the format of each message. We must teach it the formats. This involves some community effort, as a single person can not compile all the message formats this IT world has to offer. Not even a small group can. I hope that we can get sufficient log samples from the rsyslog community, and hopefully later from a growing liblognorm community. I have thought long about how this can be done. A core finding is that it must be dead easy to do. This is how it is intended to work:

When you feed liblognorm a message that it can not recognize, it will spit out that message. Let's assume we have the following message:

AAA credentials rejected: reason = reason: server = server_IP_address: user = user

Then the user just needs to tell liblognorm which fields it contains, and does so via a simple syntax. A hypothetical sample could be:

reject,AAA:AAA credentials rejected: reason = %reason_msg%: server = %sourceIP%: user = %user%

The strings "reject" and "AAA" are tags. Tags will be placed in comma-delimited format in front of the actual message sample and are terminated by a colon. Everything after the colon is actual message text. The field between percent signs reflect some well-known properties (which are taken from the CEE base def and/or are custom defined). The syntax will be taken from a data dictionary, so the user does not need to bother about that in most cases. So creating a message sample out of an unknown message type should be fairly easy.

The idea now is that we gather these one-line message samples and compile them into a central repository. Then, all users can download fresh versions of the "sample base" and use them to normalize their logs -- much like a virus scanner works. Of course, there are a number of questions in regard to how trustworthy samples are. But this is definitely something we can solve. To get started, we could simply do some manual review first, much like code integration. At later stages, some trust level policy could be applied. Well-known technology...

None of this is cast in stone as I am in a very early stage of this project. Your feedback definitely makes a difference and so be sure to provide ample ;)

That's it for today, but I plan to do a series of blog posts on the new system within the next couple of days. Please provide feedback whenever you have it. It's definitely something very useful to have. I'll also make sure that I set up a new list for this effort, but initially I will be abusing the rsyslog mailing list, as I assume folks on that list will definitely be interested in liblognorm as well. And, of course, once the library is ready I'll utilize it in various ways inside rsyslog.

Wednesday, October 06, 2010

Linux Kongress 2010 rsyslog paper online

I was able to place my conference paper "rsyslog: going up from 40K messages per second to 250K" online. It provides a good drill-down of what we did during the first performance tuning effort. Hopefully, it is also useful to other developers enhancing userland single-threaded applications to multi-threading.

I do not focus only on what we did well, but also provide quite some insight into where we (I!!) failed.

Tuesday, October 05, 2010

rsyslog now supports Hadoop's HDFS

I will be releasing rsyslog 5.7.1 today, a member of the v5-devel branch. With this version, omhdfs debuts. This is a specially-crafted output module to support Hadoop's HDFS file system. The new module was a sponsored project and is useful for folks expecting enormous amounts of data or having high processing time needs for analysis of the logs.

The module has undergone basic testing and is considered (almost) production-ready. However, I myself have limited test equipment as well as limited needs for and know-how of Hadoop, so it will probably be interesting to see how real-world users perceive this module. I am looking forward to any experiences, be they good or bad!

One thing that is a bit bad at the moment is the way omhdfs is built: Hadoop is Java-based, and so is HDFS. There is a C library, libhdfs, available to handle the details, but it uses JNI. That seems to result in a lot of dependencies on environment variables. Not knowing better, I ask the user to set these before ./configure is called. If someone has a better way, I would really appreciate hearing about it.

Please also note that it was almost impossible to check omhdfs under valgrind: the Java VM created an enormous amount of memory violations under the debugger and made the output unusable. So far I have not bothered to create a suppression file, but may try this in the future.

All in all, I am very happy we now have native output capability for HDFS as well. Adding the module also proved how useful the idea of a rsyslog core paired up with relatively light-weight output/action modules is.

comments on this blog are re-enabled

Finally, Google has created some useful tools to help fight comment spam. As a result, I was able to quickly delete the remaining 100 (or so) spam comments, and so I can now show comments on this blog again. This is useful, as they contain a lot of insight. Also, I hope that my readers will now comment again!

Thursday, September 30, 2010

Some results from LinuxKongress

I had a great time both attending the talks at Linux Kongress 2010 in Nuremberg and giving my presentation on rsyslog performance enhancements. Even more, the social events helped drive things forward. Among others, I met Lennart Poettering, who currently works on systemd. He had a couple of good suggestions for rsyslog and I hope to implement at least some of them. On the list for the next v5-devel release are definitely improvements for imuxsock, like the ability to obtain the process id of the process that actually emitted the log message (via SCM_CREDENTIALS). Once this is known, we can do a couple of "interesting things" with it.

Monday, August 23, 2010

rsyslog queues, reliability and forwarding

I would like to explain a little bit how rsyslog queues and forwarding interact. First of all, let me state that the queue engine and the action part of rsyslog, where the forwarding happens, are strictly separated. There is no interface between the two that permits an action to affect queue behaviour. For example, I was asked if a forwarding action could initiate disk queue mode when forwarding fails. The root reason for this request was that messages should not be endangered while a remote server fails.

This is not possible with the current design and would involve a far-from-trivial design change. However, I do not think that the functionality is actually needed. When talking about reliability, rsyslog works on the importance of messages, not on the importance of actions.

So in rsyslog we can configure the level of message loss that is acceptable for a given use case. This can be done on an action-by-action basis (or once at the ruleset/main queue level for all actions -- usually not a good idea, but from time to time it may be). The extreme ends are a) no message loss at all is permitted and b) message loss of any magnitude is acceptable. For a), this means we need to configure a disk-only queue with full sync of queue files and management information after each message processed (with message input batches of one). For b), this means we do not need to place any restrictions. Obviously, a) is rather slow while b) is extremely fast. As extremes are not usually what is needed, there are many configuration possibilities between these two. For example, one may define a queue that goes to disk if more than 1,000 messages are kept in main memory, but only then, and that fsyncs queue files every 10 messages (a big performance saver). That means that at any instant, at most 1,010 messages are at risk and can potentially be lost. The queue then monitors these predicates and switches to disk mode only when required. This is a very big performance saver.
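
As an illustration, a queue along the lines just described (stay in memory up to roughly 1,000 messages, spool to disk only beyond that, and sync queue housekeeping data in groups of 10) could be configured roughly as shown below. The directive names are the legacy queue directives, but please check the queue documentation of your version for the exact semantics before relying on this sketch.

# illustrative sketch - verify exact semantics against the rsyslog queue doc
# keep up to ~1,000 messages in memory, spool to disk only above that mark,
# write queue housekeeping information every 10 messages
$ActionQueueType LinkedList
$ActionQueueFileName fwdq
$ActionQueueHighWaterMark 1000
$ActionQueueCheckpointInterval 10
$ActionQueueSaveOnShutdown on
*.* @@central.example.net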

Now let's shift the perception of things a bit: let's go with our starting example and say you want to go to disk only when the remote system is down. But what if the remote system is up, but can not accept messages quickly enough? Let's assume a backlog of 10,000 messages builds up. Is it then acceptable to keep these in main memory, just because the remote system is accepting data? If this risk is acceptable, why would it be unacceptable if the remote system is not accessible? If we say one case is acceptable but the other is not, we somewhat contradict ourselves: it is almost random whether the remote system is accepting messages, so why does it make a difference in the risk we permit?

This contradiction is the core reason why rsyslog does not look at external events or action failure causes but rather works on the basis of "acceptable risk". Let's say it is acceptable to lose 1,000 messages. Then, it is irrelevant whether we lose these while the remote system is accepting or not. Consequently, rsyslog enforces disk mode if the remote system is down and there are more than 1,000 messages inside the queue. But it does not enforce this if there are only 500 messages waiting to be sent. Why should it? After all, the user has specified that a loss of 1,000 messages is acceptable, and so we do not try to guard these messages more than required by this policy. Note, of course, that if rsyslog is terminated in such a situation, a disk queue with 500 messages is created. We do not voluntarily lose messages, and if we terminate, we can no longer hold them in main memory. Consequently, they must be written out (again depending on configuration, of course). So the in-memory queue is retained across rsyslog restarts. But it is important to point out that unexpected aborts - like a sudden loss of power - can cause message loss in such scenarios. This is no different from a sudden loss of power with an accepting remote system and a queue of 500. If such a risk is unacceptable, we have what I initially described in scenario a).

As a side note, rsyslog queues provide very high reliability. Every message is removed from the queue only after the action acknowledges that it has been processed. This kind of reliability is used in some very demanding audit-grade environments (which I, as usual, am not permitted to name...).

To sum up, rsyslog protects message integrity not by external events but by user-configurable "acceptable risk" policies.

We consider this a superior approach, as external events are somewhat unreliable when it comes to protecting against traffic bursts. Relying on external events has a number of anomalies, as hopefully explained above.

Wednesday, August 11, 2010

getting serious with rsyslog v6

I have just created a new v5-devel branch based on the current master AND have then merged the newconf branch into master. This "officially" means the beginning of rsyslog v6. So I thought that justifies a short blog post.

As already elaborated, v6 will focus on a better configuration language. The current version already has scoping for actions, but no doc for it yet. I will try to add the doc, so that I can hopefully officially release the devel version this week. I'd also like to work a bit more on imptcp, so that I have some common base functionality in all versions that support it.

Thursday, August 05, 2010

Starting new v5-beta branch

I will start a new v5-beta branch soon. I have seen that a number of things have changed since the last stable release. Most importantly, some of these changes fix bugs that are hard to fix in the current stable version. This is because v5 initially did not get much exposure to reality, but the adoption rate seems to be increasing and so we have been able to iron out some issues while doing refactoring on the latest development release.

I have thought about backporting the bug fixes. However, this doesn't seem to make too much sense: it is a lot of work, and, when done, the current v5-stable would have code very close to the development branch. So I decided to accept the bugs for the time being and instead see that we can get a new 5.6.0 stable release as soon as possible. The first step on that route is to create a new beta. Based on past experience, I think we can promote it to stable in September (as we already got some good feedback on it).

Anybody with problems in the current v5-stable should simply update to the beta as soon as it is available.

Friday, July 23, 2010

coding new config format begun

After a long discussion about potential new config formats for rsyslog, we came to the conclusion that the current format is not as bad as I thought and just needs scoping (OK, the whole story is longer, but I can't repeat that lengthy discussion in a single blog post; see the mailing list archive for details).

After some thinking, I finally started coding today.

I have begun to implement action scoping. A snapshot with partial functionality is available at

http://git.adiscon.com/?p=rsyslog.git;a=shortlog;h=refs/heads/newconf

It does NOT yet include the necessary output plugin interface change (or updated plugins), but it implements

$Begin action
$End action
$StrictScoping [on/off] -- off default

So if you want to play a bit with it, feel free to do so. Note that it disallows global statements within action scope and overall has somewhat better detection of user errors (these are easier to detect when scoping is used).

Note that scoping is purely optional: if not enabled, it will not be used. So any current rsyslog.conf remains valid.
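
To give an idea of what this looks like in a config file, here is a small, hypothetical illustration; the exact rules for what may appear inside a scope are still being worked out in the branch, so treat this as a sketch only.

# hypothetical illustration of action scoping
$Begin action
$ActionQueueType LinkedList
*.* @@central.example.net
$End action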

I will see that I change the project's output plugins next week, and will possibly then make an experimental release.

Thursday, July 01, 2010

Why I think Lua is no solution for rsyslog

During our discussion about the new rsyslog config format, it was suggested a couple of times to use Lua inside rsyslog. So far, I have rejected this idea for a couple of reasons, and I thought it is time to write them up.

But first of all let me explain that I do not dislike Lua. I think it is a very good tool, and it can be extremely useful to embed it. What I question is if Lua is the right thing for rsyslog.

Let's first think about what rsyslog is: it is a very high-speed, very scalable message processor that handles message processing via fine-tuned algorithms and with a lot of concurrency. As one important detail, the underlying engine can be seen as a specialized SIMD (single instruction, multiple data) computer: a machine that is able to execute the same instruction on multiple data elements concurrently. Speaking in rsyslog terms, this means that a single rule will process multiple messages "at once" in a concurrent operation (to be precise, the level of concurrency depends on a large number of factors, but let's stick with the simplified view). Also, rsyslog is able to execute various parts of a ruleset concurrently to other parts. Some of these work units can be delayed for a very long time.

So a rsyslog configuration is a partially ordered set of filters and actions, which are executed in parallel based on this partial ordering (thus the concurrency). Each of the parallel execution threads then works in a SIMD manner, with the notable fact that we have a variable number of data elements. All communication between the various parts is via a message passing mechanism, which provides very high speed, very high reliability and different storage drivers (like memory and disk). Finally, speed is gained by state-of-the-art non-blocking synchronization and proper partitioning (quite honestly, not much now, but this work is scheduled).

Then, there is Lua. Reading about the implementation of Lua 5.0, I learn that Lua employs a virtual machine and all code is interpreted. Also, unlike rsyslog's engine, Lua's VM is a VM for a programming language, not a message processor (not surprisingly...). Thus, its optimized statements are control-of-flow instructions. I don't see anything that permits SIMD execution. That alone, based on rsyslog experience, means a 400 to 800% drop in performance -- just use a rsyslog v3 engine, which did not have this mode. Note that this is the difference between the ability to process e.g. 200,000 messages per second (mps) vs. 50,000 mps. That in turn is already argument enough for me to reject the idea of Lua inside rsyslog.

But it gets even worse: not surprisingly for something that claims to be simple and easy, Lua's threading is limited and makes it somewhat hard to integrate with threaded code (which would not be an issue, as the core part of rsyslog would be replaced by Lua if I used it). So coroutines would probably be the way to go. Reading the coroutines tutorial, I learn:

"Coroutines in Lua are not operating system threads or processes. Coroutines are blocks of Lua code which are created within Lua, and have their own flow of control like threads. Only one coroutine ever runs at a time,"

Which is really bad news for rsyslog, which tries to fully utilize highly parallel hardware.

"and it runs until it activates another coroutine, or yields (returns to the coroutine that invoked it). Coroutines are a way to express multiple cooperating threads of control in a convenient and natural way, but do not execute in parallel, and thus gain no performance benefit from multiple CPU's."

That actually does not need a comment ;)

"However, since coroutines switch much faster than operating system threads and do not typically require complex and sometimes expensive locking mechanisms, using coroutines is typically faster than the equivalent program using full OS threads."

Well, with a lot of effort going into threading, rsyslog is much faster on multiple CPUs than on a single one.

So weighing this, we are back to how the rsyslog v1 engine worked. I don't even have performance numbers for that at hand, it was too long ago.

Looking at the Lua doc, I do not find any reference to trying to match the partial order of execution into something that gains concurrency effects (and given the single-threadedness, there is no point in doing so...).

I have not dug deeper, but I am sure I will also find no concept of queues, message passing etc. If at all, I would expect such concepts in an object oriented language, which Lua claims not to be primarily.

So using Lua inside rsyslog means that I would remove almost all of the things that helped me build a high performance syslogd, and it would also remove lots of capabilities from the product. The only solution then would be to heavily modify Lua. And even if I did, I wonder if its maintainers would like the direction I would need to take, as this would add a lot of extra code to Lua, code which supposedly is not needed for the typical simple (read: not highly parallel) applications that use Lua.

This is why I reject Lua, as well as other off the shelf script interpreters for rsyslog. They do not reflect the needs a high-speed, high-concurrency message processor has.

Of course, I'd like to be proven wrong. If you can, please do. But please do not offer generalities; rather tell me how exactly I can gain SIMD capability and lock-free multi-threaded synchronization with the embedded language of your choice. Note that I am not trying to discourage you. The problem is that I have so often received suggestions that "this and that" is a great replacement for the config, invested quite some time in the evaluation and always ran into the same problem. So before I do that again, I'd like to have some evidence that the time to evaluate the solution is well spent. And support for rsyslog's concurrency requirements is definitely something we need.


I would also like to add some notes from one (of the many) mailing list post about Lua. These comments, I hope, provide some additional information about the concerns I have:

David Lang stated:
> If speed or security are not major issues, having a config language be
> a
> snippet of code is definantly convienient and lets the person do a huge
> number of things that the program author never thought of (see simple
> event correltator for an example of this), but in rsyslog speed is a
> significant issue (processing multiple hundreds of thousands of logs
> per
> second doesn't leave much time) and I don't think that an interpreter
> is
> up to the task. Interpreted languages also usually don't support
> multi-threaded operation well.

And I replied:

This is a *very* important point. And it is the single reason why I
re-thought RainerScript and tend not to use it. While (in design) it
can do anything I ever need, the interpretation is too slow -- at least as
far as the current implementation is concerned. I have read up on Lua, and
there seem to be large similarities between how Lua works and how
RainerScript actually (in filters!) works. Let me assume that Lua is far
more optimized than RainerScript. Even then, it is a generic engine, and
running that engine to actually process syslog data is simply too slow.

In order to gain the high data rates we have reached: using my test lab as an
example, we are currently at ~250,000 mps. The goal for my next performance
tuning step will be to double that value (I don't know yet when I will start
with that work). Overall, the design shall be that rsyslog scales almost
linearly with the number of CPUs (and network cards) added. I've made a couple
of design errors in that regard in the past, but now I am through with that,
have done a lot of research and think that I can achieve this nearly-linear
speedup in the future. That means there will no longer be an actual upper
limit on the number of messages per second rsyslog can process. Of course,
even on a single processor, we need *excellent* performance.

For the single-processor, this means we need highly optimized, up to the task
algorithms that don't do many things generically.

For the multi-processor, that means we need to run as many of these tasks
truly concurrently.

For example, in the last performance tuning step, I radically changed the way
rules are processed. Rather than thinking in terms of messages and the steps to
be done on them, I now have an implementation that works, semi-parallel, on
the batch as a whole and (logically) computes sub-steps of message processing
concurrently to each other (to truly elaborate on this would take a day or
more to write, thus I spare you the details).

I don't think any general language can provide the functionality I need to do
these sorts of things. This was also an important reason that led to
RainerScript, a language where I could define the level of granularity
myself. The idea is still not dead, but the implementation effort was done
wrongly. But I have become skeptical about whether a language is the right
approach at all.

Also note the difference between the config and the runtime engine. Whatever
library / script / format / language we use for the config will, for the
reasons given above, NOT be used during execution. It can only be used as a
meta-language to specify what the actual engine will do.

So if we go for Lua (for example), we could use Lua to build the rsyslog
config objects. But during actual execution, we will definitely not use Lua.
So we would need a way to express rsyslog control flow in Lua, which probably
would stretch the spec too far. Note that a Lua "if then" would not be
something that the engine uses, but would rather be used to build a config
object. So we still have the issue of how to specify an "rsyslog engine if
then" inside a Lua script -- except, of course, if you think that Lua can do
the regular processing, which I ruled out with the argument above.


And a bit later I added:
The order of execution of the tasks inside a given configuration
can be viewed as a partially ordered set. Some of the tasks need to be
preceded by others, but a (large) number of tasks have no relationship. To
gain speed and scalability, the rsyslog engine tries to identify this partial
order and tries to run those tasks in parallel that have no dependency on each
other. Also, one must note that a config file is written with the assumption
of a single message traversing the engine, which is a gross simplification.
In practice, we now have batches (multiple messages at once) traversing
the engine, where a lot of things are done concurrently and far
differently from what one would expect when looking at the config file (but in
a functionally equivalent way). It is this transformation of the in-sequence,
single-message view into a partial-execution-order, parallel view that provides
the necessary speedup to be able to serve demanding environments.

Thursday, June 24, 2010

rsyslog performance improvement rather impressive

I (think I ;)) have finished this round of rsyslog performance tuning. The result is rather impressive:

On my virtualized 4-core development environment (not exactly a high-end environment), I went from approx. 70,000 messages per second (mps) to approx. 280,000 mps. Note that these numbers do not necessarily represent a real-world configuration, but I think they show the magnitude of the optimization. Also note that some complex configurations see far lower gains, because some things (like executing an action only n times within m seconds, or "message repeated n times" processing) require serialization of the message flow, and there is little we can gain in that case.

I plan to do an "official release" in the not so distant future. Next, I will first see which patches I have in my queue and then I'll focus on the config language enhancement. That's much more complex than just the format -- I'll blog the details hopefully either later today or tomorrow morning.

Tuesday, June 15, 2010

Getting Top Performance out of rsyslog

Rsyslog is lightning fast. However, the configuration greatly influences speed. This blog post tells you what offers optimal performance with the most recent v5 version.

I will update this blog post whenever there is news to share (at least this is the plan). This information will also hopefully flow into the rsyslog doc at some time.

  • do not use more than one ruleset within a single queue
  • do not use rate limiting unless absolutely necessary
  • use array-based queue modes
  • do not use
  • send data from different inputs to multiple queues
  • use "$ActionWriteAllMarkMessages on" for all actions where you can afford it (it really makes a difference!)


The following blog post also has some solid information on performance-influencing parameters: rsyslog evaluation. Note that it talks about a somewhat older rsyslog release; while it already quotes 250,000 messages per second, rsyslog 5.5.6 is quite a bit faster.

Wednesday, June 09, 2010

what are actions and action instance data?

On the rsyslog mailing list, the question of what actions are and in which way they are kept single-threaded from the POV of the output module came up again. I try to summarize the most important points and terms here.

David Lang gave the following example configuration:

*.* file1
*.* file2
*.* @ip1
*.* @ip2
*.* @@ip3
*.* @@ip4

and asked how many different actions/entities these are. Here is my answer:

An *action* is a specific instance of some desired output. The actual processing carried out is NOT termed "action", even though one could easily do so. I have to admit I have not defined any term for that, so let's call it "processing". That actual processing is carried out by the output module (and the really bad thing is that the entry point is named "doAction", which somewhat implies that the output module is the action, which is not the case).

Each action can use the service of exactly one output module. Each output module can provide services to many actions. So we have a N:1 relationship between actions and output modules.

In the above samples, 3 output modules are involved, where each output module is used by two actions. We have 6 actions, and so we have 6 action locks.

So the output module interface does not serialize access to the output module, but rather to the action instance. All action-specific data is kept in a separate, per-action data structure and passed into the output module at the time the doAction call is made. The output module can modify all of this instance data as if it were running on a single thread. HOWEVER, any global data items (in short: everything not inside the action instance data) are *not* synchronized by the rsyslog core. The output module must take care of synchronization itself if it desires concurrent access to such data items. All current output modules do NOT access global data other than for config parsing (which is serial and single-threaded by nature).

Note that the consistency of the action instance data is guarded by the rsyslog core by actually running the output module processing on a single thread *for that action*. But the output module code itself may be called concurrently if more than one action uses the same output module. That is a typical case. If so, each of the concurrently running instances receives its private instance data pointer but shares everything else.
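
To illustrate the point, here is a simplified sketch - this is NOT the real output module interface; the struct layout and the do_action name are invented for this example. Per-action instance data can be modified without any locking because the core serializes calls per action, while module-global data needs the module's own synchronization:

#include <pthread.h>
#include <stdio.h>

/* per-action instance data: the core serializes access per action,
 * so the module needs no lock for these fields */
typedef struct {
    const char   *filename;
    unsigned long msgs_written;
} instance_data_t;

/* module-global data: NOT guarded by the core; the module must
 * synchronize it itself if several actions may touch it concurrently */
static unsigned long   total_msgs;
static pthread_mutex_t total_mut = PTHREAD_MUTEX_INITIALIZER;

/* stand-in for the per-message entry point (not the real signature) */
static void do_action(instance_data_t *pdata, const char *msg)
{
    pdata->msgs_written++;                    /* instance data: lock-free */
    printf("writing to %s: %s\n", pdata->filename, msg);

    pthread_mutex_lock(&total_mut);           /* global data: lock needed */
    total_msgs++;
    pthread_mutex_unlock(&total_mut);
}

int main(void)
{
    /* two actions using the same output module, each with its own instance data */
    instance_data_t file1 = { "file1", 0 };
    instance_data_t file2 = { "file2", 0 };
    do_action(&file1, "test message");
    do_action(&file2, "test message");
    printf("total messages written: %lu\n", total_msgs);
    return 0;
}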

Monday, June 07, 2010

further improving tcp input performance

As one of the next things, I will be further improving rsyslog's tcp syslog input performance. As you know, rsyslog already has excellent performance (some sources, for example, quote 250,000 msgs per second). But, of course, there is room for improvement.

One such area is imtcp, the tcp syslog input module. It uses a single polling loop to obtain data from all senders. It is worth noting that the actual input module does NOT do very much, but hands the majority of work off to queue worker threads. However, it pulls the data from operating system buffers into our user space and also fills in some basic properties (like time of reception, remote peer and so on). Then, the message is pushed to the message queue, and at the other side of the queue the majority of processing happens (including such things as parsing the message, which some would assume to happen inside the receiving thread).

As can be seen in practice, this design scales pretty well in most cases. However, on a highly parallel system, it obviously limits the process of pulling data "off the wire" to a single CPU. If the rule set is then not very complex (and thus fast to process), the single-threadedness of the initial receiver becomes a bottleneck. On a couple of high-performance systems, we have seen this to be the bottleneck, and I am now trying to address it.

Right now, I am looking for a good solution. There are two obvious ones:

a) start up a single thread for each connection
b) do a hybrid approach of what we currently do and a)

Even with 64bit machines and NPTL, approach a) probably does not work well for a very large number of active sessions. Even worse, receiving messages from two different hosts would then require at least one context switch, and repeatedly so. Context switches are quite expensive in terms of performance, and so are better avoided. Note that the current approach needs no context switch at all (for the part it does). On a system with many connections, I would be close to betting that the runtime required by the context switching of approach a) alone is more than what we need to do the processing with our current approach. So that seems to be a dead end.

So it looks like b) is a route to take, combining a (rather limited) number of threads with a reception-event-driven loop. But how to best do that? A naive approach is to have one thread running the epoll() loop and have a pool of worker threads that actually pull the data off the wire. So the epoll loop would essentially just dispense to-be-processed file descriptors to the workers. HOWEVER, that also implies one context switch during processing, namely when the epoll loop thread activates a worker. Note that this situation is by far not as bad as in a): as we have a limited number of workers, and they are activated by the epoll thread, and that thread blocks when no workers are available, we have limited the level of concurrency. Note that limiting the concurrency level roughly to the number of CPUs available makes a lot of sense from a performance point of view (but not necessarily from a program simplicity and starvation-avoidance point of view - these concerns will be looked at, but now I have a focused problem to solve).
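
To make approach b) more concrete, here is a rough, illustrative sketch - not imtcp code; names are invented and error handling, message framing and shutdown are omitted. One epoll thread only dispenses ready file descriptors, a small worker pool pulls the data off the wire, and the small queue bounds the level of concurrency, so the epoll thread blocks once every worker already has work pending. Real code would also use EPOLLONESHOT and re-arm descriptors after a worker is done:

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/epoll.h>

#define NWORKERS 4
#define QSIZE    NWORKERS   /* small queue: bounds the level of concurrency */

static int             fdq[QSIZE];
static int             qhead, qtail, qcount;
static pthread_mutex_t qmut      = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  qnonempty = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  qnonfull  = PTHREAD_COND_INITIALIZER;

static void enqueue_fd(int fd)
{
    pthread_mutex_lock(&qmut);
    while (qcount == QSIZE)          /* epoll thread blocks: all workers busy */
        pthread_cond_wait(&qnonfull, &qmut);
    fdq[qtail] = fd;
    qtail = (qtail + 1) % QSIZE;
    qcount++;
    pthread_cond_signal(&qnonempty);
    pthread_mutex_unlock(&qmut);
}

static int dequeue_fd(void)
{
    pthread_mutex_lock(&qmut);
    while (qcount == 0)
        pthread_cond_wait(&qnonempty, &qmut);
    int fd = fdq[qhead];
    qhead = (qhead + 1) % QSIZE;
    qcount--;
    pthread_cond_signal(&qnonfull);
    pthread_mutex_unlock(&qmut);
    return fd;
}

/* worker: pulls the data off the wire for one ready connection */
static void *worker(void *arg)
{
    char buf[4096];
    (void)arg;
    for (;;) {
        int fd = dequeue_fd();
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n > 0)
            printf("worker read %zd bytes from fd %d\n", n, fd);
        /* real code: frame messages, re-arm fd (EPOLLONESHOT), handle close */
    }
    return NULL;
}

/* epoll loop: never reads itself, it only dispenses ready fds to workers */
static void epoll_loop(int epfd)
{
    struct epoll_event ev[16];
    for (;;) {
        int n = epoll_wait(epfd, ev, 16, -1);
        for (int i = 0; i < n; ++i)
            enqueue_fd(ev[i].data.fd);
    }
}

int main(void)
{
    int epfd = epoll_create1(0);
    pthread_t w[NWORKERS];
    for (int i = 0; i < NWORKERS; ++i)
        pthread_create(&w[i], NULL, worker, NULL);
    /* listener and session fds would be added here via epoll_ctl() */
    epoll_loop(epfd);
    return 0;
}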

One approach to this problem could be that I further reduce the amount of work done in imtcp: if it no longer pulls data off the wire, but just places the file descriptor into a "message" object and submits that to the overall queue, modified queue processing could then take care of the rest. However, there are many subtle issues, including how to handle system shutdown and restart as well as disk queues. In short: that probably requires a full redesign, or at least considerable change. Anything less than that would probably result in another processing stage in front of the rule engine, as outlined initially (and thus require additional context switches).

So I focused back on the optimal way to partition this problem. One (simple) approach is to partition the problem by tcp listeners. It would be fairly easy to run multiple listeners concurrently, but each of the listeners would have its own (epoll/take data off the wire) loop that runs on the listener's single thread. So in essence, it would be much like running two or more rsyslog instances, using the current code, concurrently. That approach obviously causes no additional context switches. But it has a major drawback: if the workload is spread unevenly between listeners, it may not provide sufficient parallelism to keep all CPU cores busy. However, if the workload is spread evenly enough, the approach can prevent starvation between listeners - but not between sessions of one listener. This problem is also not addressed by the current code, and there has never been any user complaint about that (or its potential effects). So one may conclude starvation is not an issue.

It looks like the usefulness of this approach depends strongly on the spread of workload between different listeners. Looking at a busy system, we need to focus on the number of highly active listeners in relation to the number of expectedly idle CPU cores i. That number i obviously must take into consideration any other processing requirements, both from rsyslog (parsing, rule processing, ...) as well as all other processes the system is intended to run. So, in general, the number i is probably (much) lower than the total number of cores inside the system. If we now have a number l of listeners, we must look closely: if, among all listeners, l_h is the number of high-activity listeners, then it is sufficient to have i equal l_h: a few occasional wakeups from low-activity listeners do not really matter. However, if l_h is lower than i, or even just one, then we cannot fully utilize the system hardware. In that case, we would need to provide partitioning based on sessions, and there we see a similar scheme based on the view of low- and high-activity sessions.

But the real question is whether we can assume that most busy systems have a sufficient number of high-activity listeners, so that per-listener concurrency is sufficient to fully utilize the hardware. If that is the case, we can drastically improve potential message processing rates and still be able to keep the code simple. More concretely, the question is whether we are sufficiently sure this approach works well enough that we should implement it. Doing so could save considerable development effort, which could be put to better uses (like speeding up queue processing). BUT that development effort is wasted time if for a large enough number of systems we cannot see any benefit. And note that single-listener systems are not uncommon, a case where we would gain NO benefit at all.

I am actually somewhat undecided and would appreciate feedback on that matter.

Thanks in advance to all who provide it.
Rainer

Update: there is a long and very insightful discussion about this post on the rsyslog mailing list. All interested parties are strongly advised to read through it, it will definitely enhance your understanding. Please also note that based on that discussion the development focus shifted a bit.

Friday, June 04, 2010

rsyslog string generators ... done :)

An rsyslog string generator is what I had previously called a "template module" - in essence, a facility to generate a template string with some custom native C code. I have decided to name it a bit differently, because at some later stage there may be other uses for these types of modules as well. Specifically, I am thinking about adding custom name-value pairs to the message object, and then a string generator (or strgen for short) could be used to generate such a value as well.

Implementation went smoothly. I implemented both the interface and a small set of important core strgens, those that are frequently used to write files or forward messages to remote machines. I did not touch any others, as that is probably not really necessary -- and could easily be done any time the need arises.
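
Conceptually, a strgen is just a C function that builds the output line for one message in a single pass. The following is a minimal illustration only, not the actual strgen interface; the msg_t fields and the function name are invented for this example:

#include <stdio.h>

typedef struct {
    const char *timestamp;   /* e.g. "Jun  4 10:15:00" */
    const char *hostname;
    const char *syslogtag;
    const char *msg;
} msg_t;

/* build the traditional file format "TIMESTAMP HOSTNAME TAGMSG\n" in one pass */
static size_t strgen_file_default(const msg_t *m, char *buf, size_t len)
{
    return (size_t)snprintf(buf, len, "%s %s %s%s\n",
                            m->timestamp, m->hostname, m->syslogtag, m->msg);
}

int main(void)
{
    msg_t m = { "Jun  4 10:15:00", "host1", "app[123]:", " a test message" };
    char line[2048];
    strgen_file_default(&m, line, sizeof(line));
    fputs(line, stdout);
    return 0;
}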

The new interface also provides a capability to third-parties that enables them to create their own high speed parsers. The performance impact can be dramatic, just think about cases where multiple regular expression calls can be replaced by a single call and some C logic.

Finally, these modules may even provide a way to fund rsyslog development. Adiscon can probably sell them for some small amount (I guess around $500 based on what needs to be done, in some cases maybe less, in some maybe a bit more). I guess that would be attractive for anyone who needs both high speed and a custom format and runs rsyslog for profit. Getting into all the details to develop such a thing oneself probably costs more than our whole implementation effort. I hope we will get some orders for these, and I hope that folks will contribute the strgens back to the project. Their plus is then that we maintain them for free, and the plus for the community is that, in the long term, we will hopefully get a large directory of ready-to-use custom strgens (OK, that sidesteps the funding process a bit, but... ;)).

I have also managed to write some basic doc on the new capability, to be seen here:
What is now missing is some feedback from the field, including from someone who actually uses this to create a custom format.

The code has been merged into v5-devel (the master branch) and will most probably be released early next week. Then, it will undergo the usual devel/beta cycle, so that availability in a stable v5 release can be expected towards the end of summer 2010. Special thanks go to David Lang, who provided good advice that helped me create the new functionality.

Tuesday, June 01, 2010

rsyslog template plugins

As I have written yesterday, I am evaluating the use of "template modules" in rsyslog.

In that post, I mentioned that I'd expect a 5% speedup as proof that the new plugin type was worth considering. As it turns out, this method seems to provide a speedup of 5 to 6 percent, so it seems to be useful in its own right.

After I had written yesterday's post, I checked what it would take to create a test environment. It turned out that it was not too hard to change the engine so that I could hardcode one of the default templates AND provide a vehicle to activate that code via the configuration file. Of course, we do not yet have full loadable modules, but I was able to create a proof of concept in a couple of hours and do some (mild) performance testing on it. The current code provides a vehicle to use a C-function-based template generator. It is activated by saying

$template tpl,=generator

where the equal sign indicates that a C generator is to be used instead of the usual template string. The name that follows the equal sign will probably later become the actual module name, but is irrelevant right now. I then implemented a generator for the default file format in a very crude way, but I would expect that a real loadable module will not take considerably more processing time (just a very small amount of calling overhead after the initial config parsing stage). So with that experimental code, I could switch between the template-based default file format and the generator-based format, with the outcome being exactly the same.

Having that capability, I ran a couple of performance tests. I have to admit I did not go to a real test environment, but rather used my (virtualized) standard development machine. Also, I ran the load generator inside the same box. So there were a lot of factors that influenced the performance, and this for sure was not a totally valid test. To make up for that, I ran several incarnations of the same test, with 1 to 10 million test messages. The results quite consistently reported a speedup between 5 and 6 percent achieved by the C template generator. Even though the test was crude, this consistently seen speedup is sufficient proof for me that native template generators actually have value in them. I have to admit that I had expected improvements in the 1 to 2 percent area, so 5 and more percent is considerable.

I committed the experimental branch to git, so everyone is free to review and test it themselves.

Now that I am convinced this is a useful addition, my next step will be to add proper code for template plugins (and, along that way, decide if they will actually be called template plugins -- I guess library plugins could be used as well and with somewhat less effort and greater flexibility). Then, I will convert the canned templates into such generators and include them statically inside rsyslog (just like omfile and a couple of other modules are statically included inside rsyslog). I hope that in practice we will also see this potential speedup.

Another benefit is that any third party can write new generator functions. Of course, there is some code duplication inside such functions. But that should not be a big issue, especially as generator functions are usually expected to be rather small (but of course need not be so). If someone intends to write a set of complex generator functions, these can be written with a common core module whose utility functions are accessed by each of the generators. But this is not one of my concerns as of now.

Note that I will probably use very simple list data structures to keep track of the available generators. The reason is that after the initial config file parsing, access to these structures is no longer required and so there is no point in using a more advanced method.
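
A sketch of what I mean by a simple list (illustrative only, all names invented): registered generators go into a plain singly linked list that is searched by name only while the config file is parsed, so there is no need for anything fancier than a linear lookup.

#include <stdlib.h>
#include <string.h>

typedef size_t (*strgen_fn)(const void *msg, char *buf, size_t len);

struct strgen_entry {
    const char          *name;
    strgen_fn            fn;
    struct strgen_entry *next;
};

static struct strgen_entry *strgen_list;

/* called when a module registers its generator */
void strgen_register(const char *name, strgen_fn fn)
{
    struct strgen_entry *e = malloc(sizeof(*e));
    e->name = name;
    e->fn   = fn;
    e->next = strgen_list;   /* push onto the head of the list */
    strgen_list = e;
}

/* called only during config parsing, e.g. when a "$template x,=name" line is seen */
strgen_fn strgen_lookup(const char *name)
{
    for (struct strgen_entry *e = strgen_list; e != NULL; e = e->next)
        if (strcmp(e->name, name) == 0)
            return e->fn;
    return NULL;
}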

I expect my effort to take a couple of days at most, but beware that Thursday is a public holiday over here in Germany and I may not work on the project on Thursday and Friday (depending, I have to admit, a little bit on the weather ;)).

Monday, May 31, 2010

rsyslog speed & output formatting

I'd like to reproduce a mailing list post here, both because I would like to retain it for easy reference and because I consider it important enough to deserve better visibility. It originated from a question about output formatting options, but took us down to important roadmap questions for rsyslog.

For simplicity, I simply copy over the text of my relevant posting:

> On Mon, 31 May 2010, david@lang.hm wrote:
>
> > On Mon, 31 May 2010, Rainer Gerhards wrote:
> >
> >>> I agree that doing it in the output would be far better in many
> ways,
> >>> but
> >>> since there isn't a way to do a plugin there (at least not as far
> as I
> >>> know, it would be good to get confirmation or a better idea)
> >>
> >> David, can you tell me what you have on your mind for this
> functionality? I
> >> have thought a bit about it, and I probably have one approach
> myself. But I
> >> would prefer to hear your idea before I push you into a direction.
> >
> >
> > two options
> >
> > 1. something that would work similar to the existing format
> > string, but would call a C subroutine that could read the existing
> > properties and would create the output string in a buffer
> >
> > 2. something that could also modify the exisitng properties (more
> > powerful, but also more dangerous and could involve locking to
> prevent
> > other things from trying to read properties at the same time)
> >
> > we haven't gone too far down the road of researching the output
> > performance (since the input and queue locking has dominated so far),
> but
> > it is clear that the output currently takes significantly more CPU
> time
> > than input, it may be that being able to use C to define the output
> format
> > instead of interpreting the format string may be a noticable
> improvement.
> > Is there a relativly easy way to test this? (say, hard-code a format
> or
> > two and test writes to file and network with the hard-coded format vs
> a
> > format string that produces the same output?)
>
> for the traditional output formats the difference may not be that much,
> but if there is extensive parsing involved (as the initial poster is
> doing, or what I would expect is common for specific log types into a
> database) the difference can be much more significant since it can
> replace
> multiple regex statements with a much faster single pass that looks for
> word breaks and inserts standard filler in those spots.
>
> With the new syslog format where the data is 'supposed to be' in a
> series of name=value tuples, something like this would be a pretty
> efficiant way of extracting particular portions of the data to be
> output
> (although the properties could be extended to do this sort of thing by
> providing something similar to a perl hash)

You are looking in the same direction I am, and I think this is good news ;)

The current engine supports functions coded in C, but not yet as real plugins
nor in an easy to see way. It is done via a crude function interface library
module, and only within the script engine. My original plan (over a year, or
even two, ago) was to generalize these library plugins, so that it is easy to
add new code and load them as plugins. Actually, making them available as
plugins should not be too much work given the already existing
infrastructure. There already exist a handful of "function modules", the
control structure is just statically created during compile time, much as
some of the output plugins are statically linked.

Then the original plan was to enable templates to call scripts and enable
scripts to define templates (kind of). Unfortunately, I got distracted by
more important things before I could complete all of this.

HOWEVER, at this time performance was not a major concern. With what has
evolved in the mean time, I do not like the original approach that much any
longer. At least the script engine must become much faster before I can take
a real look at that capability. Right now, scripts generate an interim code
that then is interpreted by a (kind of) virtual machine. A script invocation
inside a template would mean that a VM must be instantiated, the script
interpreted and the resulting string be used as template contents. Clearly,
this is not for high-performance use. Still, however, it may be useful to
have that capability for those cases, where performance is not the #1
consideration. But given that everything would need to be implemented, it
does make limited sense to look into something known to be too slow in the
long run. BTW, this is one reason that I have not yet continued to work on
the script engine, knowing that some larger redesign is due to fit it into
the now much tighter runtime constraints.

On the performance of the output system: I think the system in general is
quite fast and efficient, with only ONE important exception: that is, if
multiple replacements need to happen. Still, the algorithm is quite
efficient, but it is generic and needs to run through a number of steps. Of
course, it is definitely faster to permit a C plugin to look at the message
and then format, in an "atomic" way the resulting custom string. Thus, you
need to write multiple C codes instead of using a generic engine, but can do
so in a much higher performance way. I would assume, however, that this
approach cannot beat the simple templates we usually use (maybe by less than
5% and, of course, there may be cases where this matters).

As you know, my current focus is speed, together with some functional
enhancements. I was looking at queue operations improvements, but the
potential output speed improvements may be more interesting than the queue
mode improvements (and apply to more use cases). So it may make sense to look
into these, first. My challenge here is to find something that is

a) generic enough to be useful in various (usual) cases
b) specific enough to be rather fast

and it should also be possible to implement within a few weeks at most, because I
can probably not spend much more time on a single feature/refactoring.

One solution may be to create "template modules". I could envision a template
module to be something that generates the template string *as a whole* from
the input message.

That is, we would have

$template current-style,"%msg%\n"

but also (**)

$modload tplcustom
$template custom,tplcustom

where tplcustom generates the template string.

While this sounds promising, we have some issues. One immediately pops up in my
mind: we will probably be able to use the same template for file writing or
forwarding, but for file writing we need a LF at the end, while for
forwarding we do not need it.

So the most natural way would be to have the ability to embed a "custom
template" into a regular template, like suggested by this syntax:

$template both,"%=tplcustom%\n"

however, this brings us down to the slippery slope of the original design. As
a next thing to be requested, I could ask for using not the msg object (with
its fixed unmodified properties), but rather a transformation of the
message object. So we would end up with something like this:

$template cmplx,"%=tplcustom(syslogtag & msg)%"

Which would require a much more complex logic working behind the scenes.

Of course, depending on the format used, the engine could select different
processing algorithms. Doing this on the fly seems possible, but requires
more work than I can commit in one sequence.

Also, it would be useful to have the ability to persist already-generated
properties with the message while it is continued to be processed in the rule
engine. So far, we do not have this ability, and the reason is processing
time (plus, as usual, implementation effort): for that, we would need to
maintain a list (or hash, ...) of name/value pairs, store them to disk for
disk queues and shuffle them through the rule engine as processing is carried
out. As I said, quite doable, but another big addition.

So I am somewhat stuck with things that sound interesting, but are a bit
interdependent. Doing them all together is too big to be useful, and it will
probably fail because I can probably not keep focus on all of them for the
next, say, 9 to 12 months that it would require to complete everything.

So I am again down to picking what is most useful. Out of this discussion, it
looks like the idea I marked with (**), the plain C template generator, could
be a useful route to take. I am saying this under the assumption that it
would be relatively easy to implement and cause at least some speedup in
standard cases (contrary to what I expect, I have to admit...). But that
approach is highly specialized, requiring a C module for each custom format.
So does it really serve the rsyslog community well - or just some very
isolated use cases?

Thinking more about it, it would probably be useful if it is both

a) relatively easy to implement and
b) causes some speedup in standard cases

But b) cannot be proven without actually implementing the interface. So, in
practice, the question boils down to what we *expect* about the usefulness
of this utility.

Having said that, I'd appreciate feedback, both on the concrete question of
the usefulness of this feature as well as any and all comments on the
situation at large. I am trying to put my development resources, which
thankfully have been somewhat increased nowadays :), to the area where they
provide the greatest benefit.
