Tuesday, April 10, 2012

Using different templates with rsyslog's ElasticSearch plugin

Recently, an experimental ElasticSearch output plugin, omelasticsearch, has been added to rsyslog. Like all other output plugins, it comes with a canned template, which specifies a default "schema". However, the template engine lets you use a completely different set of fields. In this blog post, I'll briefly describe how this is done.

Note: all work is based on the current (April 10th, 2012) implementation of both omelasticsearch and the rsyslog core. Future implementation changes are expected in an effort to make things more intuitive. This may even break what I describe here. So if you come to this blog post at a later time, it is probably best to check whether things have changed by "now" (especially if the procedure does not work ;)).

The current implementation ties into the template system. The default template looks as follows (it must be all on one line in the actual configuration):

$template JSONDefault, "{\"message\":\"%msg:::json%\",\"fromhost\":\"%HOSTNAME:::json%\",\"facility\":\"%syslogfacility-text%\",\"priority\":\"%syslogpriority-text%\",\"timereported\":\"%timereported:::date-rfc3339%\",\"timegenerated\":\"%timegenerated:::date-rfc3339%\"}"

The '\"' sequence is needed to represent a quote character inside the template. For JSON this is pretty ugly, but that's the way the template processor currently works (and one reason why it is under review). As you can see, the JSON is actually "hand-crafted": the template supplies the structure, and the "json" option specifies that the property text needs to be escaped so that the result is well-formed JSON. If that option is specified, the template processor does the necessary escaping. Note that not all properties have the "json" option. This is purely for performance reasons. For example, timestamps never include characters that need to be escaped. Consequently, the "json" option is not used there (but it could be, e.g. "date-rfc3339,json").
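To illustrate why the escaping matters, here is a minimal Python sketch. It is not rsyslog code: the helper names (json_escape, hand_crafted, is_valid_json) and the sample message are made up for this illustration, and Python's json module merely stands in for whatever the template processor does internally. It builds a JSON document by plain string concatenation, the way a hand-crafted template does, once without and once with escaping.

```python
import json


def json_escape(text):
    """Escape text for use inside a JSON string literal (no surrounding quotes)."""
    # json.dumps returns the quoted literal; strip the outer quotes.
    return json.dumps(text)[1:-1]


def hand_crafted(msg):
    """Build a JSON document by string concatenation, as a template does."""
    return '{"message":"' + msg + '"}'


def is_valid_json(doc):
    """Return True if doc parses as JSON, False otherwise."""
    try:
        json.loads(doc)
        return True
    except ValueError:
        return False


# Hypothetical message text containing quote characters.
raw_msg = 'login failed for user "bob"'

unescaped = hand_crafted(raw_msg)             # like %msg%
escaped = hand_crafted(json_escape(raw_msg))  # like %msg:::json%

print(is_valid_json(unescaped))  # the embedded quotes break the document
print(is_valid_json(escaped))    # the escaped variant parses fine
```

The unescaped variant is exactly the kind of document that would be shipped if the "json" option were left off a property that can contain quotes.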

So now let's define a different set of fields to be used. Let's say we just want to have the date from the syslog message and the MSG part of the message itself. That would be as follows:

$template miniSchema, "{\"message\":\"%msg:::json%\",\"timereported\":\"%timereported:::date-rfc3339,json%\"}"

Note: I requested JSON escaping for the timestamp in this example just to prove the point that it can be done. Don't use it in a real deployment, as it is nonsense that costs CPU cycles ;)
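To get a feeling for what a document built from the miniSchema template looks like, the following Python sketch imitates the property replacer in a very simplified way. Everything in it is an assumption made for illustration: the render function, the regular expression and the sample property values are mine, only the "json" option is acted upon, and the timestamp is assumed to be already in RFC 3339 format. The real template processor is implemented differently and supports far more.

```python
import json
import re

# The miniSchema template text, without the config-file quoting.
MINI_SCHEMA = ('{"message":"%msg:::json%",'
               '"timereported":"%timereported:::date-rfc3339,json%"}')

# Matches %name% or %name:from:to:options% (simplified).
PROPERTY = re.compile(r'%([^%:]+)(?::([^%:]*):([^%:]*):([^%]*))?%')


def render(template, props):
    """Replace each %property% reference with its value from props."""
    def substitute(match):
        value = props[match.group(1)]
        options = set((match.group(4) or "").split(","))
        if "json" in options:
            # Escape for use inside a JSON string literal.
            value = json.dumps(value)[1:-1]
        return value
    return PROPERTY.sub(substitute, template)


# Hypothetical property values for one message.
sample = {
    "msg": 'user said "hi"\n',
    "timereported": "2012-04-10T12:00:00+02:00",
}

document = render(MINI_SCHEMA, sample)
print(document)
print(json.loads(document)["message"] == sample["msg"])
```

Parsing the rendered text back with json.loads is a cheap way to check that a template yields well-formed JSON for a given message.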

Now I need to use the template within the omelasticsearch action. This is done as follows:

*.*     action(type="omelasticsearch" template="miniSchema")

Note: the "all" ("*.*") filter can of course be replaced with a different type of filter.

That's all that needs to be done. Please note that you can define several templates and use them in several different ElasticSearch output actions, exactly as with any other action. Also keep in mind that the property replacer gives access to a wide range of message properties. Most importantly, normalized properties or CEE-enhanced syslog properties can be accessed via the CEE family of property names (essentially "$!cee-name" style). Just be sure to include the "json" option in any property that may contain unsafe characters (which means almost all of the fields). The current engine does not do this automatically, and invalid characters can lead to strange problems, even aborts of ElasticSearch itself!

In the future, a more intuitive syntax is planned for JSON template definitions. Until then, the current code permits full customization, but requires you to take care of the details yourself.