Wednesday, October 13, 2010

liblognorm will use passive Unicode mode (UTF-8)

I thought a while on how to support Unicode in liblognorm. The final decision is to use passive mode, which is a very popular option under Linux. A core driver behind this decision is the ability to safe lots of space (and thus also cache space and so processing time as well) as the majority of log content is written in US-ASCII. This is even the case in Asian countries, where large parts of the log message are usually ASCII but contain a few select fields in local language support (like names). Even if the message itself is in local language, there is a lot of punctuation and numbers in them, so I think the overall result will not use up notably more space than a UTF-16 implementation. I18N-wise, it must also be noted that UTF-16 is a very small (but important) subset of full unicode, so using UTF-8 gives us the ability to encode full 32-bit UCS-4 characters should there be need to do so.

The same decision will apply to the CEE library (whatever it will be named). This is also nicely in line with libxm2, which I intend to use for XML parsing.

No comments:

Introducing new team member

Good news: we have some new folks working on the rsyslog project. In a small mini-series of two blog postings I'd like to introduce the...