Thursday, March 15, 2012

JSON and rsyslog templates

Rsyslog already supports JSON parsing and formatting (for all cee properties). However, the way formatting is currently done is unsatisfactory to me. Right now, we just take the cee properties as they are and format them as JSON. In this mode, we have no way to specify which fields to use, and no way to modify the field contents (e.g. pick substrings or do case conversions). These are exactly the use cases rsyslog invented templates for.

One way to handle the situation is to have the user write the JSON code inside the template and just inject the data fields where desired. This almost works (and I know Brian Knox is exploring that route). It only "almost" works because there is currently no property replacer option to ensure proper JSON escaping. Adding this option is not hard. However, I don't feel this approach is the right route to take: making the admin craft the JSON string is error-prone and very user-unfriendly.
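
To make the escaping problem concrete, here is a small Python sketch. It is not rsyslog code: the placeholder syntax and the names `render_naive`, `render_escaped` and `is_valid_json` are made up for illustration. It injects a message into a hand-written JSON template, first as-is and then with JSON string escaping applied to the field.

```python
import json

# Hand-written JSON template with a placeholder, similar in spirit to an
# rsyslog template string. The placeholder handling here is illustrative only.
TEMPLATE = '{"msg":"%msg%"}'

def render_naive(msg: str) -> str:
    """Inject the field as-is, with no escaping."""
    return TEMPLATE.replace("%msg%", msg)

def render_escaped(msg: str) -> str:
    """Inject the field after JSON string escaping."""
    # json.dumps() returns a quoted JSON string; strip the outer quotes
    # because the template already supplies them.
    return TEMPLATE.replace("%msg%", json.dumps(msg)[1:-1])

def is_valid_json(text: str) -> bool:
    """Return True if text parses as JSON."""
    try:
        json.loads(text)
        return True
    except ValueError:
        return False

message = 'user "bob" logged in'
print(is_valid_json(render_naive(message)))    # the inner quotes break the document
print(is_valid_json(render_escaped(message)))  # escaping keeps it parseable
```

Even with escaping in place, the admin still has to get the surrounding braces, quotes and commas right by hand, which is the part I consider error-prone.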

So I wonder what would be a good way to specify fields that shall go into a JSON format. As a limiting factor, the method should be possible within the limits of the current template system - otherwise it will probably take too long to implement it. The same question also arises for outputs like MongoDB: how best to specify the fields (and structure!) to be passed to the output module?

Of course, both questions are closely related. One approach would be to solve the JSON encoding and say that outputs like MongoDB are passed JSON. Unfortunately, this has strong performance implications. In a nutshell, it would mean formatting the data to JSON and then re-parsing it inside the plugin. This process could be somewhat simplified by passing the data structure (the underlying tree) itself rather than the JSON encoding. However, this would still mean that a data structure specific to this use would need to be created. That obviously involves a lot of data copying. So it would probably be useful to have a capability to specify fields (and replacement options) that are just passed down to the module for its use (that would probably limit the required amount of data copying, at least in common cases). Question again: what would be a decent syntax to specify this?
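
The trade-off can be sketched in Python. This is purely illustrative: rsyslog itself is written in C, and `FIELD_SPEC`, `select_fields`, `via_json` and `via_field_list` are hypothetical names, not an existing or planned interface. The first path formats to JSON and re-parses inside the output; the second hands the selected fields to the output directly.

```python
import json

# Hypothetical message properties as the core might hold them.
event = {"hostname": "web01", "programname": "sshd",
         "msg": "Accepted password for bob", "pri": 38}

# Hypothetical field specification: (output name, source property, transform).
FIELD_SPEC = [
    ("host", "hostname", str.upper),
    ("app", "programname", None),
    ("message", "msg", None),
]

def select_fields(props: dict) -> dict:
    """Build only the fields named in FIELD_SPEC, applying transforms."""
    out = {}
    for name, source, transform in FIELD_SPEC:
        value = props[source]
        out[name] = transform(value) if transform else value
    return out

def via_json(props: dict) -> dict:
    """Core formats JSON text; the output plugin parses it again."""
    text = json.dumps(select_fields(props))  # encode in the core
    return json.loads(text)                  # decode in the plugin

def via_field_list(props: dict) -> dict:
    """Core passes the selected fields straight to the plugin."""
    return select_fields(props)

# Both paths hand the plugin the same document; the second one skips the
# encode/decode round trip.
assert via_json(event) == via_field_list(event)
print(via_field_list(event))
```

The sketch says nothing about syntax, which is the open question; it only shows what information such a field specification would need to carry.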

Suggestions are highly welcome. I need to find at least an interim solution urgently, as this is an important building block for the MongoDB driver and all work that will depend on it. So please provide feedback (note that I may try out a couple of things to finally settle on one - so any idea is highly welcome ;)).

4 comments:

Brian said...

For my immediate needs, being able to JSON escape properties as a property option is satisfactory. I'm experimenting with templates such as:

$template cee_enhanced,"@cee: {\"Event\":{\"p_proc\":\"%programname%\",\"p_sys\":\"%hostname%\",\"time\":\"%timestamp:::date-rfc3339%\"},\"Msg\":{\"raw_msg\":\"%rawmsg%\"}}\n"
*.* /var/log/cee_events.log;cee_enhanced

This works for a lot of my messages; the escaping is the only issue. Adding a JSON escape property option would allow me to move pretty quickly as I prototype things.

Longer term, having some sort of JSON call where I could pass a property list would be nice... although the templates do give me the flexibility of specifying nested document structures, which might get trickier. I'll think more about it.

Rainer Gerhards said...

Lol... some months ago, I was smart enough to add a ",json" option to the template system, which seems to do what you urgently need. It's not a real JSON encoder, but it handles the quote char. So:

$template cee_enhanced,"@cee: {\"Event\":{\"p_proc\":\"%programname%\",\"p_sys\":\"%hostname%\",\"time\":\"%timestamp:::date-rfc3339%\"},\"Msg\":{\"raw_msg\":\"%rawmsg%\"}}\n",json
*.* /var/log/cee_events.log;cee_enhanced

Note the end of the $template line!

... should do what you need to get started. This is very experimental code, and I think it will not work properly with already JSON-escaped data (like in $!all-json!), and it will also not do proper full JSON escaping. But it really looks like it is what you need right now for your PoC ;)

But please expect syntax changes, so you probably need to adjust your rsyslog.conf later. I do not like to be forced to keep supporting this even if we go into a totally different direction (and I think this experimental solution is not clean in the light of what has changed since I implemented it).

Radu Gheorghe said...

I'm quite a newbie at all this, so excuse me if I'm a bit naive. Anyway, here's how I see it:

As a user, I think it would be just right if I could write my JSON like any other template, and rsyslog would do the escaping for me. Like in Brian's example. A usability enhancement here would be the ability to enclose the template in single quotes, so that you can leave your double-quotes in the template itself unescaped.

An easier-to-use and less error-prone (though limited) version would be to define key:value pairs in a different sort of property replacer, and let rsyslog do both escaping and formatting. Something like this:

$json_template date_and_message_only,"date_to_second","%timereported:1:19:date-rfc3339%"
$json_template date_and_message_only,"message","%msg%"
*.* /var/log/jsonned;date_and_message_only

which would produce something like:
{"date_to_second":"2012-03-15T14:41:58", "message":"This is a test message"}

For MongoDB, an option would be to convert the JSON into BSON. I just hope that this would be faster than parsing the whole message a second time.

On the other hand, once we have a JSON template, we could have rsyslog modules that pipe logs into anything that accepts JSON documents via a REST interface, for example. This would include MongoDB, Elasticsearch, CouchDB and I suppose many more. Which I guess brings up the performance topic, where I have a question (with an attached opinion):

Suppose I have a bunch of machines running rsyslog that are normally sending logs to a "central" rsyslog which puts those logs in a database. Wouldn't it be better for performance and scalability if the machines would use rsyslog to put logs into the database using the REST interface?

Rainer Gerhards said...

Just tying together discussions: there is also good information on the rsyslog mailing list. Be sure to follow the complete thread.
