Friday, April 20, 2012

MongoDB, BSON and templates...

After improving the template system yesterday, I have begun to think again about integrating custom templates (actually field lists) with ommongodb (rsyslog's MongoDB output plugin). A "problem" with the mongo interface is that it does not support native JSON but rather BSON, its binary equivalent. So the textual JSON representation needs to be converted to BSON before it can be stored in MongoDB. Given that the JSON representation must be built with the property replacer, this looks like a waste of encoding/decoding effort. Assuming that I would take JSON and transform it to BSON (all this in ommongodb), the workflow would be as follows:

text properties -> encode JSON -> decode JSON -> generate BSON

The "encode JSON" step would happen inside the template processor, the "decode JSON" part in ommongodb. In essence, this looks like a quite flexible but rather slow approach. After all, it would serialize to JSON just for interim needs. What I am actually looking for is this workflow:

text properties -> generate BSON

Here we would replace the JSON format with some internal format. In a sense, that internal format already exists in array passing mode. In this mode, the property text is passed in via an array. As a side note, some transformations are necessary and desired even in the internal format, as the property replacer permits using not only the raw properties themselves but also substrings, case conversions, regexes and the like. The problem with array passing mode is that it provides just the plain values. However, for BSON (and MongoDB) we also need to know the field name and the type information. The latter is probably easy, as rsyslog usually deals with text only, so we could stick to strings except maybe for dates. The field name has been available inside the template structure since yesterday. However, there currently is no way for a plugin to access this information.
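To make the difference concrete, here is a small Python sketch (not rsyslog code) of the two paths discussed so far. The property names and values are made up, and `json_detour` and `array_passing` are hypothetical helpers that only mimic the data flow: the first serializes to JSON and parses it again, the second hands over plain values the way array passing mode does, which shows where the field names get lost.

```python
import json

# Hypothetical message properties; names and values are invented for illustration.
props = {"hostname": "web01", "msg": "disk full", "severity": "3"}

def json_detour(properties):
    """text properties -> encode JSON -> decode JSON."""
    encoded = json.dumps(properties)  # would happen in the template processor
    return json.loads(encoded)        # would happen again inside the output plugin

def array_passing(properties, field_order):
    """Mimics array passing mode: only the plain values arrive, in template order."""
    return [properties[name] for name in field_order]

fields = json_detour(props)
values = array_passing(props, ["hostname", "msg", "severity"])

print(fields)  # names and values survive, at the cost of an encode/decode round trip
print(values)  # values only; the plugin cannot tell which field is which
```

The detour keeps everything but pays for a serialization that is thrown away immediately; the array keeps the cost low but drops exactly the metadata that BSON needs.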

So it looks like the sensible thing is to create a new interface that passes a (description, value) pair to the plugin. The description most probably could be the template structure (or some abstraction of it, if we feel bad about tying things too deeply together). That would avoid the detour via JSON, but still provide otherwise full capabilities. The bad thing, however, is that an already complex interface gets yet another option (maybe it is time for a general cleanup?).
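As a rough illustration of what a plugin could do with such pairs, the following Python sketch builds a BSON document directly from (description, value) tuples, without any JSON in between. `FieldDesc` is a hypothetical stand-in for whatever the template structure would expose; it is not rsyslog's actual data structure, and the real plugin would be written in C against the MongoDB client library. Only the BSON string type is handled, in line with the assumption that most properties are text.

```python
import struct
from collections import namedtuple

# Hypothetical description of one template field (name plus BSON type).
FieldDesc = namedtuple("FieldDesc", ["name", "bson_type"])

BSON_STRING = 0x02  # element type code for a UTF-8 string in the BSON spec

def encode_string_element(name, value):
    """Encode one BSON string element: type byte, cstring name, int32 length, data, NUL."""
    data = value.encode("utf-8") + b"\x00"
    return (bytes([BSON_STRING])
            + name.encode("utf-8") + b"\x00"
            + struct.pack("<i", len(data))  # length includes the trailing NUL
            + data)

def build_bson(pairs):
    """text properties -> generate BSON, from (description, value) pairs."""
    body = b""
    for desc, value in pairs:
        if desc.bson_type != BSON_STRING:
            raise ValueError("this sketch only handles string fields")
        body += encode_string_element(desc.name, value)
    total = 4 + len(body) + 1  # int32 size prefix + elements + terminating NUL
    return struct.pack("<i", total) + body + b"\x00"

doc = build_bson([
    (FieldDesc("hostname", BSON_STRING), "web01"),
    (FieldDesc("msg", BSON_STRING), "disk full"),
])
print(doc.hex())
```

Dates would need their own element type and encoder, which is where the type part of the description would come into play.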

Feedback on this issue is appreciated.
