Wednesday, December 23, 2015

rsyslog and liblognorm will switch to libfastjson as replacement for json-c

We have been using json-c for quite a while now and had good success with it. However, recent problem reports and analysis indicate that we need to replace it in the future. Don't get me wrong: json-c is a solid piece of software, but we most probably use it much more intensely than the json-c developers ever anticipated. That's probably the actual root cause why we need to switch.

A main problem spot is performance: various experiments, profiler runs, code review and experimental implementations have proven that json-c is a severe bottleneck. For example, in the evaluation of liblognorm v2 performance, we found out that json-c calls accounted for more than 90% of processing time. Once we saw this, we dug down into the profiler and saw that the hashtable hash calculation as well as memory allocations took a large amount of overall processing time. We have submitted an initial performance enhancement PR to json-c which also got merged. That already reduced processing time considerably. We continued on that path, resulting in a quite large second enhancement PR, which I withdrew due to disagreement with the json-c development lead.

A major problem for our application of json-c is that the hash table implementation beneath it is not a good match to our needs. We have been able to considerably speed it up by providing a new hash function (based on perl's hash function), but to really get the performance we need, we would need to replace that system. However, that is not possible, because json-c considers the hash tables part of its API. Actually, json-c considers each function, even internal ones, as part of the API, so it is very hard to make any changes at all.
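To give a feel for the kind of hash function meant here, the following is a minimal sketch of a multiplicative string hash in the style of older Perl versions (multiply by 33, add the next byte). It is an illustration only: the function names, the seed value of 1 and the 32-bit mask are assumptions of this sketch, and it is not claimed to be the exact code that went into json-c or libfastjson.

```python
def perl_like_hash(key: str, seed: int = 1) -> int:
    """Multiplicative string hash: h = h * 33 + byte (illustrative sketch)."""
    h = seed
    for byte in key.encode("utf-8"):
        # Keep the value within 32 bits, as an unsigned C integer would.
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h


def bucket_index(key: str, table_size: int) -> int:
    """Map a key to a slot of a hash table with `table_size` buckets."""
    return perl_like_hash(key) % table_size
```

The attraction of such a function is that it does very little work per byte, which matters when field names are hashed for every single log message.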

Json-c also aims at becoming fully JSON compliant. It currently is not due to improper handling of NUL bytes, but the longer-term plan is to support NUL bytes. While this is a good thing to do for a general json library, it is a performance killer for our use case. I know, because I faced the same problems with the libee implementation years ago, where we ditched it later in accordance with the CEE standards body board. I admit I also have some doubts if that change in json-c will actually happen, as it IMHO requires a total revamp of the API.

Also, the json-c project releases far too infrequently (have a look at recent json-c releases, the last one was April, 2014). And then, it takes the usual additional time lag for distros to pick up the new version. So even if we could successfully submit further performance-enhancing PRs to json-c, it would take an awful lot of time before we could actually use them. I would definitely not like to create private packages for rsyslog users, as this could break other parts of a system.

Finally, json-c contains a really bad race bug in reference counting, which can cause rsyslog to segfault under some conditions. A proposed fix was unfortunately not accepted by the json-c development lead, so this is an open issue. Even if it were accepted, it would probably take a long time until the release of the fixed version and its availability in standard packages.
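The class of bug is easy to picture: incrementing or decrementing a reference count is a read-modify-write operation, and if two threads do it at the same time without synchronization, an update can get lost and an object may be freed while it is still in use. The sketch below shows the general idea of serializing those updates. It is written in Python for brevity, all names are made up for this example, and it is not the actual fix that went into libfastjson (in C one would typically reach for atomic operations or a mutex).

```python
import threading


class RefCounted:
    """Hypothetical reference-counted object with serialized count updates."""

    def __init__(self, payload):
        self.payload = payload
        self.refs = 1            # the creator holds the first reference
        self.freed = False
        self._lock = threading.Lock()

    def get(self):
        """Take an additional reference."""
        with self._lock:
            self.refs += 1
        return self

    def put(self):
        """Drop one reference; returns True if this was the last one."""
        with self._lock:
            self.refs -= 1
            last = self.refs == 0
            if last:
                self.freed = True   # stands in for releasing the memory
        return last


def hammer(obj, n_threads=8, rounds=1000):
    """Take and drop references from many threads at once."""
    def worker():
        for _ in range(rounds):
            obj.get()
            obj.put()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

After `hammer()` has finished, the count is back at exactly one, so only the creator's final `put()` releases the object.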

In conclusion and after a lot of thinking, we decided that it is best to fork json-c, which we then did. The new project is named libfastjson. As the name suggests, its focus is on performance. It will not try to be 100% JSON compliant. We will not support NUL characters in a standards-conformant way. I expect this not to be a big deal, as json-c also never did this, and the number of complaints seems to be very low. So libfastjson will not aim to be a general-purpose json library, but one that offers high performance at some functionality cost and works well in highly threaded applications. Note that we will also reduce the number of API functions and especially remove those that we do not need and that cost performance. Also, the data store will probably be changed from the current hashtable-only system to something more appropriate to our tasks.

Libfastjson already includes many performance enhancement changes and a solid fix for the reference counting bug. Up until that bug, we planned to release in the Feb..April 2016 time frame, together with liblognorm v2. Now this has changed, and we actually did a kind of emergency release (0.99.0) because of the race bug. The source tarball is already available. We are working on packages in the rsyslog repositories (Ubuntu is already done). Rsyslog packages are not yet built against it, but we may do a refresh after the holiday period.

Rsyslog 8.15.0 optionally builds against libfastjson (it is preferred if available). Due to the race bug, we have decided that rsyslog 8.16.0 will require libfastjson.

A side-note is due: we have been thinking about a replacement for the variable subsystem since summer or so. We envision capabilities even beyond what libfastjson can do. So we are still considering this project and think it is useful. In regard to liblognorm, however, we need to provide a more generic interface, and libfastjson is a good match here. Also, we do not know how long it will take until we replace the variable system. We don't even know if we actually can do it time-wise.

Friday, December 18, 2015

rsyslog release policy issues

The usual end of the year release policy discussion has begun on the rsyslog mailing list and I wanted to post some thoughts here for a broader audience and easy access in the future. Enjoy ;)

Up until ~15 months ago, we released when there was a need to. Need was defined as

- important enough (set of bugfixes)
- new functionality

This resulted in various releases. We had the stable/devel releases. Stable releases were rare, devel frequent.

Now, we have scheduled releases. Actually, a release is triggered when we hit a certain calendar date, irrespective of whether or not there is need to release (there are always one or two minor fixes, so we will probably never experience a totally blank release). We also have switched to stable releases only, and done so without grief (basically because a) we have improved testing and b) users didn't use devel at all).

I just dug into the old discussion. A good entry point is probably this here, where we talk about patches:

The new system works reasonably well. It has its quirks, though. Let's look at a concrete example:

8.14.0, to me, was an absolutely horrible release. The worst we have done in the past 2 to 3 years. I worked hard on fixing some really bad race issues with JSON variables. Friday before the release I was ready to release that work, which would be really useful for folks that make heavy use of those variables. Then, over the weekend and Monday, it turned out that we may get unwanted regressions that weren't detected earlier (NO testbench can mimic a heavily used production system, so let's not get into "we need better tests" blurb). The end result was that I pulled the plug on release day, and what we finally released was 8.13.0 plus a few small things. All problems with variables persisted. If I had had half a week to a week (don't remember exactly) more, we could have done a real release instead of the 8.13 re-incarnation. But, hey, we run on a schedule.

Now 8.15.0 fixes these problems (except for the json-c induced segfault, which we cannot fix in rsyslog). It also has all other "8.14" enhancements and fixes and so is actually worth 3 months of work. It is a *very heavy* release. Usually, I'd never release such a fat release shortly before the holiday period. Not that I distrust it, and we really got some new testing capabilities (really, really much better), so it is probably the most solid release for a longer time (besides the small quirk with the missing testbench files). But in general I don't like to do releases when I know there are very limited resources available to deal with problems. That's the old datacenter guy in me. But, again, hey, we run on a schedule.

There have been similar occasions in the past 14 months. That's the downside. Still, due to the 6-week cycle, things usually do not get really bad.

The scheduled model has a lot of good things as well. First of all, everyone (users and contributors) knows when the next release will be. This also means you can promise to include something into a specific release. However, usually users know when the release happens, but not what will be part of it, so in a sense it's not much better than before IMO. The new model has advantages for me: fewer releases mean less work. Also, I no longer really need to think about when to do a release, which feature is important enough and so on. I just look at the calendar and know that, for example, in 2016, November 15th we will have a release, no matter if I am present, no matter what is done code-wise etc (we actually had, for the first time ever, a release while I was on vacation and it went really well as I learned later). That really eases my task.
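The "just look at the calendar" part really is plain date arithmetic. As a minimal sketch, assuming a strict 6-week step with no exceptions (the function name and its parameters are made up for this example):

```python
from datetime import date, timedelta


def scheduled_releases(anchor, count, weeks=6):
    """Return `count` release dates, starting at `anchor`, `weeks` apart."""
    step = timedelta(weeks=weeks)
    return [anchor + i * step for i in range(count)]


# Starting from the November 15th, 2016 date mentioned above.
upcoming = scheduled_releases(date(2016, 11, 15), 3)
```

A real schedule may of course deviate from such a calculation, for example around holidays.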

All of this is based on the "we release every 6 weeks, interim releases happen only for emergencies and anything else may be pulled as patches" policy. If we now begin to say "this problem is inconvenient to ..{pick somebody}, we need to do a re-release", we get into trouble. I wonder which groups of "somebody" are important enough to grant non-emergency releases. Are only distro maintainers important enough? Probably not. So enterprise users? Mmmm.. maybe small enterprises as well? Who judges this? So let's assume every user is as important as every other (an idea I really like). If I then look at my change logs, I think I would need to release more frequently. In essence, I would need to release again when it is needed, which is, surprise, the as-needed schedule.

Rsyslog is not a project big enough to do an even more complex release schedule. To keep things manageable for me, I need to release either

a) as-needed

b) on schedule (except for *true* emergencies)

And *that all* is the reason for my reluctance to break the release policy because this time distro maintainers experience the bug versus end users.

I am currently tempted to switch back to "as-needed" mode, even though this means more work for me. 

Wednesday, August 26, 2015

moving towards liblognorm v2

The initial version of liblognorm v2 is almost ready. It offers many new features, like custom data types, a much easier rule description language, and potentially even greater performance (we have not yet verified this). As some of you know, I have worked very hard on liblognorm during the past weeks. I have now reached a very important milestone and will switch the git master branch to use the new version. If things go smoothly enough, the initial release of liblognorm v2 will go along with the next rsyslog release. Daily builds will have it very soon.

Liblognorm v2 also contains the full v1 engine and thus is fully compatible with the previous versions, as far as rulebases are concerned. For more on the compatibility, please read the compatibility document. In fact, by default the v1 engine is used. To opt in for the new features, you need to add a line


to the top of your rulebases. That is when you really need to check the compatibility document. This is also what brings you the enhancements.

A couple of notes are due: while we are approaching the initial release, not all design goals have been met yet. Most importantly, we are feeding back user comments into the development process. As such, the v2 feature set is not 100% finalized yet. This means that we cannot yet fully guarantee that all constructs you use will remain compatible with versions released later.  But those that know us also know that this risk is minimal and, if it happens, will be easy to fix. The core concepts are ready and unlikely to change. Note that I will continue to actively work on v2 and more features will be upcoming in the next weeks.

The online doc should be updated in two days at the latest (the actual update date depends on when I can switch the git branch and how this interacts with the automatic doc generating scripts). I invite you to use the new version and am sure it will be much easier to use and more powerful.

Note to developers: the v2 and v1 engine are very different. V2 is a complete rewrite of the core components. Nevertheless, v1 and v2 share some of the same file names. For many reasons, this means I need to rewrite the git master branch with the new version.

Also note that as of now, no new development happens on v1; this version is essentially dead. Very important fixes to the v1 engine will be applied to the v1 subsystem of v2.

Feedback on v2 is appreciated, please post issues or feature requests directly to liblognorm's github trackers (if possible).

Tuesday, April 28, 2015

liblognorm's "rest" parser now more useful

The liblognorm "rest" parser was introduced some time ago, to handle cases where someone just wants to parse a partial message and keep all the "rest of it" in another field. I never was a big fan of this type of parser, but I accepted it because so many people asked. Practice, however, showed that my concerns were right: the "rest" parser has a very broad match and those that used it often got very surprising results.

A key cause of this issue was that the rest parser had the same priority as other parsers, and most importantly a higher priority than a simple character match. So it was actually impossible to match some constant text that was at the same location as the "rest" parser.

I have now changed this so that the rest parser is always called last, if no other thing matches - neither any parser nor any constant text. This will make it work much more like you expect. Still, I caution against using this parser as it continues to provide a very broad match.
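To make the new ordering concrete, here is a small sketch of the idea: at a given position, all alternatives other than "rest" are tried first, and "rest" only gets its turn if none of them matched. This is an illustration of the principle, not liblognorm's actual code; the matcher functions and the (name, matcher, is_rest) tuple layout are assumptions of this example.

```python
import re


def match_number(text):
    """Match a leading run of digits, or return None."""
    m = re.match(r"\d+", text)
    return m.group(0) if m else None


def match_rest(text):
    """The very broad match: accepts whatever is left."""
    return text


def match_literal(literal):
    """Build a matcher for constant text."""
    def matcher(text):
        return literal if text.startswith(literal) else None
    return matcher


def first_match(text, alternatives):
    """Try all alternatives at this position, with 'rest' always last.

    `alternatives` is a list of (name, matcher, is_rest) tuples.
    """
    # sorted() is stable and False sorts before True, so the original
    # order is kept except that "rest" entries move to the end.
    ordered = sorted(alternatives, key=lambda alt: alt[2])
    for name, matcher, _ in ordered:
        value = matcher(text)
        if value is not None:
            return name, value
    return None


alternatives = [
    ("rest", match_rest, True),
    ("literal", match_literal("error"), False),
    ("number", match_number, False),
]
```

With these alternatives, constant text such as "error" wins over "rest" even though "rest" is listed first, which is the behaviour described above.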

Note that the way I have implemented this is not totally clean from a software engineering point of view, but very solid. A cleaner solution will occur during the scheduled rewrite of the algorithm (later in spring/summer).

Note that existing rulebases using "rest" may behave differently with the new algorithm. However, previously the result was more or less random, so any other change to the rulebase could also have caused different behaviour. So this is no compatibility break as there really is no compatibility to retain.

This will be released with 1.1.2, probably in early May. If you need it urgently, you can use a daily build.

Monday, March 30, 2015

Call for Log Samples

There is one big problem in research for better logging methods: no good logging sample repositories exist. Well, not even bad ones... I am currently doing some preliminary steps towards a new, better log normalization system. Among others, it will contain a structure analyzer which will remove much of the manual burden of creating normalization rules. But, guess what: while the project looks very promising, lack of log samples is a real big problem!

To solve that problem, I have set up a public log ingestor that you can simply send logs to. The system is reachable as follows:

port: 514
protocol: any flavor of syslog or other text data

If you run rsyslog, you can add this snippet to /etc/rsyslog.conf:


How did this idea materialize? During my talk at the German Unix User's Group FFG 2015 conference last week in Stuttgart, I mentioned that problem and Dirk Wetter had the idea to provide a log receiver that makes it very easy for people to contribute. There were some concerns that this may open up my server for DoS, and that of course is true. Nevertheless, I liked the idea and so we set up a machine today. It may be DDoS'ed and other bad things may happen, but then we will have gained more experience. It's split from the main systems, so that shouldn't cause much harm.

For log contributors, please keep in mind that you send data to a public service, so it is probably not a great idea to do this for sensitive systems. But if we get enough data from uncritical systems, we can still gain a lot from that. Most importantly, it helps us gain insight into structural log mining methods -- which will also lead to the above-mentioned tool. All logs gathered by this method will be placed in the research log repository, which currently is hosted on github. It is licensed under BSD 2-clause in the hope that a sufficiently large and diverse data set is also of great value for other researchers (did I mention it is ultra-hard to find any log sample data sets?). If you are interested in contributing logs, but would want to do so under NDA, that's of course also possible. In that case, please just drop me an email to see how to best go forward with that.

Friday, February 27, 2015

looking for Java stacktrace samples

I am currently working on log normalization as well as improvements for rsyslog's imfile. Among the things that regularly come up on the rsyslog mailing list is support for multi-line logs and Java stack traces in general.

I would like to see what I can do to improve processing of these. To do so, I need a set of samples of such logs. As such, I look for people who would like to contribute log records for my research.

Please contact me (in any way you prefer) to contribute log samples. Please let me know if it is OK if I put them into the public log repository for research or if you would like me to keep them private.

All types of multi-line logs are appreciated; this is not limited to Java stacktraces.

Your support is greatly appreciated.

Monday, January 26, 2015

rsyslog daily builds and tarballs

rsyslog daily builds on Launchpad
The past days, I have worked on making rsyslog daily builds and tarballs a reality. I hope this will enable users to rapidly deploy the latest features as well as make it easier to help with testing the current development system. Daily builds are what the scheduled v8-devel builds were under the previous release paradigm. Consequently, the archives are named v8-devel.

Right now, builds are only supported for Ubuntu. Users of other platforms are advised to use the daily tarballs to build from source. Depending on feedback on and success of the daily builds, I will make them available for more platforms. 

A daily build is based on the latest git master version. So it really is at the [b]leading edge of technology. So why create them?

A top reason is that I often fix a bug for someone, and that someone then is unable to build from source. As a result, we have a bugfix, but there is no external confirmation that it really fixed the bug when I merge it into the next release. I hope that now those users can simply pick the daily build and check if that solves their problem.

Also, in general I hope that some users will use the daily tarballs to get not only the latest and greatest but contribute to the project by doing some testing.

Finally, and quite important, with daily builds we will see build problems as early as possible. In the past, we often saw problems only after source release (or very close to it), which was obviously problematic. Now, this should no longer happen. For obvious reasons, the final release build is now more or less a copy of a daily build.

As a technical side-note, daily builds are identified by the git master branch head hash that was used to build them. As a fourth version component, they have the first 12 digits of that hash (an example is ""). This enables us to track error reports to the right version. The packages have a different version name, based on the build date. The reason is that the hash does not increment and so newer versions (with lower hash values) are considered as "old" by Launchpad. We avoid this by using an always incrementing package version. Also note that the package changelog just contains a "daily build" entry -- anything else makes limited sense.
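The ordering problem can be shown in a few lines. This is a minimal sketch under stated assumptions: the base version "8.8.0", both hashes and the date-based format are made up for illustration, and plain string comparison merely stands in for the package manager's own version ordering.

```python
from datetime import date


def tarball_version(base, git_hash):
    """Append the first 12 hex digits of the git hash as fourth component."""
    return f"{base}.{git_hash[:12]}"


def package_version(base, build_date):
    """Hypothetical date-based package version that always increases."""
    return f"{base}.{build_date:%Y%m%d}"


# Two made-up commit hashes; the second build is the newer one.
older_hash = "f3a9c1d2e4b5" + "0" * 28
newer_hash = "0a1b2c3d4e5f" + "0" * 28

older_tarball = tarball_version("8.8.0", older_hash)
newer_tarball = tarball_version("8.8.0", newer_hash)

older_package = package_version("8.8.0", date(2015, 1, 25))
newer_package = package_version("8.8.0", date(2015, 1, 26))
```

Here the newer tarball version sorts below the older one, because git hashes carry no ordering, while the date-based package versions sort in build order.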

I hope you enjoy this new feature! Feedback is appreciated.

Thursday, January 15, 2015

what's next with rsyslog?

Now that we have released version 8.7.0, planning for 8.8.0 is in full force. I thought I'd share some of those things that made it to the top of the todo list:

I already have begun some experimental research work on a pull model for rsyslog. Scenarios where that would have been extremely helpful have surfaced on the mailing list and support forums for a long time. While never asked for vehemently, I think it is time to explore that option. The first pilot implementation will probably be very simplistic, but has a big impact: if it works for simple syslog, it will work for other pull protocols as well. That would open up some wholly new use cases. But be careful: it's still unclear if and how fast we can realize such a method.

Secondly, we have received a grant from GuardTime which enables us to improve the signature-related tooling. While this, too, is a bit of a large project, I will definitely begin to work on it in the 8.8 timeframe.

Finally, the ability to reparse messages is on the list. That's another biggie, and it may be one that requires a handful of release cycles. To make this happen in a clean way, we need to change some of the internal interfaces as well as some of the processing philosophy. It will also need some good discussions on the mailing list.

Note well that these three topics won't necessarily show up in 8.8, but at least they are something we strongly intend to work on - as said, I already started with the pull model.

Besides these three topics, there will be a number of minor improvements and bug fixes. I will also keep some focus on automated testing, but the most urgent need has been solved by the system I set up in Q4 2014. If all goes well, I'll also get some inhouse help on expanding the testbench, which would be a really great plus.

That's it for now, and as always: priorities may shift as needs arise ;)

Monday, January 12, 2015

rsyslog branches and git history

There was a lengthy mailing list discussion in November and December of 2014 on whether or not to avoid git merge entries. There was also an intermingled discussion on QA and CI. The idea was to trim the git history and make sure tests are run as quickly as possible. As a result of that discussion, I added more automated testbench runs, which also required a new branch master-candidate, which is used as a staging area to run the test, and from which changes are (manually) migrated to master when all testbench runs are OK. In order to avoid merge entries in git log, I made master-candidate the default github branch and also asked contributors to file PRs against that branch.

I've now tried all of this for a couple of weeks. That approach works, but it creates a lot of overhead and quite some confusion for a lot of folks. Some users have voiced they don't really care if there is a merge entry. Fewer have voiced they don't like them. Michael Biebl has pointed out that it is easy to make them disappear from "git log", via the --no-merges command line switch.

After careful consideration and some frustration, I conclude that avoiding merge entries is unnecessary overhead for me. Being the 90%+ contributor for this project, I conclude that avoiding merge entries is unnecessary overhead for the project. As such, I will no longer try to avoid them at all costs. I will, however, try to keep the git history as neat as possible .. but not any more.

As such, I'll reset the default branch on github to "master" and will accept pull requests to master. Internally, everything still needs to go through master-candidate, as the new testbench setup requires this. If someone doesn't like this approach to the testbench, I am open to changes, BUT I then ask that someone to actually contribute running code to make that change happen. Good advice alone is nice, but doesn't help getting things done at this stage. We already know the advice, we just have nobody who has time to implement it!

I would like to thank all users for their comments. I think they have considerably helped us move forward. Sorry that I could not accept all suggestions. I guess it's like always in life: not everybody can be fully happy. But I hope we have achieved a sufficient level of overall happiness :-)