Wednesday, April 30, 2008

rsyslog work log

Hi all,

Again, I have been quite busy the past few days. A lot of it was rsyslog-related, but phpLogCon also kept me busy. I am still working on TLS support, with some occasional bug-hunting in between.

TLS support grew into a much larger project than it initially looked. In short, it caused a more or less complete rewrite of the base networking classes. Quite complex work, even though it doesn't involve lots of code (I am always surprised how few code changes actually happen). I am going slowly but steadily through it and hope that we will have an even better abstraction of the networking classes when I am done. In the long term, it will probably also affect gssapi (in a positive way), which could finally lead to a fully generic, driver-based and highly flexible networking model. Well worth the wait, I'd say... ;)

So here is the past day's rsyslog work log:
- worked on netstrm abstraction (server side)
- got socket abstraction to work for server part
- fixed newly introduced memory leaks
- released 3.16.0
- -c option no longer needs to be the first option - thanks to varmojfekoj
for the patch
- added missing copyright statements (thanks to Michael Biebl for noticing)
- added select() driver for GnuTLS
- made gtls server driver work in plain tcp mode
- added $DefaultNetstreamDriver config directive
- added $ActionSendStreamDriverMode config directive
- worked on klogd problem on gentoo
- moved netstrms, netstrm and nssel into a single loadable module
because they belong together
- fixed "loadable module leak"
- fixed problem with module unload sequence

Thursday, April 24, 2008

finally... phpLogCon v2

Finally, we have released phpLogCon 2.1.0, the first official beta version of the v2 branch. phpLogCon has been completely rewritten from scratch. It now offers a modern, state-of-the-art user interface and is also able to work with log files, not just databases. For example, it can be used to view a remote server's log files over the web (proper authentication settings highly recommended). It will evolve into a very capable search, reporting and analysis frontend for syslog data.

Let me stress the point that it can work with log files directly. For example, we have set it up on one of our mail relays so that we can review mail logs without the need to log in to that machine. Obviously, this functionality should only be available to authenticated users, but then it is quite useful. I would appreciate hearing any further thoughts on how this tool can be put to good use.


We are currently setting up the infrastructure (mailing list et al) for phpLogCon. I'll do one more announcement here when this is completed. In the meantime, I suggest subscribing to freshmeat's announcements, which we maintain in a timely way. You can do so at:

Tuesday, April 22, 2008

work log

Past day's rsyslog work log:
- abstracted a driver level for netstream
- converted netstrm into generic netstrm and the nsd_ptcp driver
- prepared everything so that I can finally begin to implement the
TLS sender part (in plain tcp output)
- bugfix: a recent change effectively disabled error messages
- implemented a first working version of a TLS-enabled plain TCP
sender (but, of course, the implementation is insecure as it is)
- primarily phpLogCon design

A good blog post on syslog reliability

I found a good blog post describing the problems that go along with reliable logging. It also offers most of the options to resolve them:

The author missed one thing: a buffer does not necessarily need to exist only in main memory. When the in-core buffer runs out of space, you may also use a disk-based buffer, which offers much more capacity. Of course, even the largest disk-based buffer may eventually be exhausted, at which point one needs to resort to other strategies. But a disk-based buffer is an excellent solution for temporary (but lengthy) receiver outages.
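To make the memory-then-disk idea concrete, here is a minimal C sketch of a FIFO that keeps messages in a small in-memory ring buffer and spills to a temporary file once that is full. All names (da_queue, da_push, da_pop) and the tiny capacity are made up for illustration; rsyslog's actual queue engine is considerably more elaborate.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MEM_CAPACITY 4   /* deliberately tiny so the spill is easy to see */
#define MAX_MSG 256      /* messages assumed shorter than this, without '\n' */

/* Hypothetical disk-assisted FIFO. Invariant: every message held in
 * memory is older than every message held on disk, so FIFO order
 * survives the spill. */
typedef struct {
    char mem[MEM_CAPACITY][MAX_MSG];
    size_t head, count;   /* ring buffer state */
    FILE *disk;           /* spill file, created on first overflow */
    long read_off;        /* offset of the next unread record on disk */
    size_t disk_count;    /* records still waiting on disk */
} da_queue;

void da_init(da_queue *q) {
    memset(q, 0, sizeof *q);
    q->disk = NULL;
}

int da_push(da_queue *q, const char *msg) {
    assert(q != NULL && msg != NULL);
    if (q->disk_count == 0 && q->count < MEM_CAPACITY) {
        /* room in memory and nothing older waiting on disk */
        size_t tail = (q->head + q->count) % MEM_CAPACITY;
        snprintf(q->mem[tail], MAX_MSG, "%s", msg);
        q->count++;
        return 0;
    }
    if (q->disk == NULL && (q->disk = tmpfile()) == NULL)
        return -1;
    /* append one newline-terminated record at the end of the file */
    if (fseek(q->disk, 0, SEEK_END) != 0 || fprintf(q->disk, "%s\n", msg) < 0)
        return -1;
    q->disk_count++;
    return 0;
}

/* Copies the oldest message into out; returns 0, or -1 if empty. */
int da_pop(da_queue *q, char *out, size_t outlen) {
    assert(q != NULL && out != NULL && outlen > 0);
    if (q->count > 0) {             /* the oldest messages live in memory */
        snprintf(out, outlen, "%s", q->mem[q->head]);
        q->head = (q->head + 1) % MEM_CAPACITY;
        q->count--;
        return 0;
    }
    if (q->disk_count == 0)
        return -1;                  /* queue is empty */
    char line[MAX_MSG + 1];
    if (fseek(q->disk, q->read_off, SEEK_SET) != 0 ||
        fgets(line, sizeof line, q->disk) == NULL)
        return -1;
    q->read_off = ftell(q->disk);
    line[strcspn(line, "\n")] = '\0';
    snprintf(out, outlen, "%s", line);
    q->disk_count--;
    return 0;
}

void da_close(da_queue *q) {
    if (q->disk != NULL)
        fclose(q->disk);
    q->disk = NULL;
}
```

The one rule that matters is that new messages keep going to disk while anything is still waiting there; otherwise a newer message could overtake older, spilled ones. The spill file is never truncated in this sketch, which a real implementation would have to address.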

Rsyslog has implemented everything mentioned in the paper, plus more. You can directly apply the knowledge you got from this paper to rsyslog. And then, you can dig down into the dirty details:

Friday, April 18, 2008

rsyslog work log

Yesterday's rsyslog work log:
- some cleanup
- created a global data object
- moved "family" variable to global data pool
- moved "bDropMalPTRMsgs" variable to global data pool
- moved "option_DisallowWarning" variable to global data pool
- moved "DisableDNS" variable to global data pool
- moved host/domain-name related variables to global data pool
- moved "glblModPath" variable inside global data pool (but
still as a variable, not part of glbl object)
- added the ability to specify an error log function for the
- removed dependency of core runtime on dirty.h
- imported tcp module from librelp as basis for new stream class;
we got permission from the librelp copyright holders to include
the tcp module
- done some forward-compatibility work on librelp
- brought netstrm to a (hopefully) somewhat usable state
- partly rewritten and improved omfwd
- some (small) cleanup of omgssapi
- optimized omfwd, now loads TCP code only if this is actually necessary

Thursday, April 17, 2008

rsyslog work log

Yesterday's rsyslog work log:
- more or less finished im3195, but need changes in liblogging
to complete this work - does not compile yet
- moved files to runtime library part
- some cleanup
- provided ability to initialize the runtime
- some more cleanup; reduced dependencies, moved non-runtime
files to their own directory except for some whose status
is unclear
- completed im3195 including some documentation
- changes due to restructuring in 3.17.2 have big bug potential;
beta 3.15.x has almost no bug potential; thus I initiated a
shift of devel -> beta -> v3-stable; devel will restart at 3.19.0
- prevented segfault during runtime library init phase

Wednesday, April 16, 2008

rsyslog work log

Yesterday's rsyslog work log:
- added imklog doc
- began LGPL change for a select set of files (core runtime)
- merged in bsd-port and klogd changes
- released 3.17.1
- worked on rsyslog runtime library
- worked a bit on phpLogCon
- worked on liblogging 0.7.0
- began re-integrating rfc3195 in rsyslog

Tuesday, April 15, 2008

on the rsyslog runtime

I had a conversation on the new runtime design for rsyslog. I think it contains some good (and brief) technical information, so I reproduce it here:

> The new design is:
> rsyslog core (GPL)
> rsyslog runtime (LGPL)
> modules (whatever)

Yes, actually I intended to say that ;)

> what is the interface between
> rsyslog core/syslog-ng <> runtime? Pipe/linked
> runtime <> module? Pipe/linked
The runtime always needs to be linked. A few cases:

rsyslog <> runtime --> always linked
syslog-ng/whatever <> runtime... linked
runtime <> module -> linked

BUT the interesting case is:

rsyslog <> plugin -> linked or pipe
syslog-ng/whatever <> plugin -> linked or pipe

In these cases, it depends on build parameters (all of this of course
not yet implemented). I would anticipate these combinations to be found
in practice:

rsyslog <> plugin -> linked [pipe for new functionality on old engine]
syslog-ng/whatever <> plugin -> pipe

So with rsyslog <> plugin, that would just be a fallback if you need it for
some reason (e.g. run a v3 engine with a v4 plugin WITHOUT the need to
backport it [requires a v3 and v4 runtime to be present on the system]).

For non-rsyslog syslogds I expect that pipe is always used, because I
do not think they'll change their environment to adapt to the runtime.

rsyslog will provide wrappers for either interface. They will come as
separate binaries. There will be an input and output plugin to allow any
process (not necessarily rsyslog technology) to utilize a standard unix
pipe interface for producing and consuming messages.

Library modules will never use the pipe interface - it's too slow and
too loosely coupled.

Linking offers much greater performance and is lossless.
A pipe is comparatively slow and may lose some messages.

A pipe is obviously a much simpler and thus more universal interface than the
plugin interface.
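As a rough illustration of the lossy pipe side, the following C sketch shows a writer that sends one message per line with no acknowledgement and no retry, so a message is simply dropped if the reader has gone away. The function names are hypothetical and not part of any rsyslog interface.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* 512 bytes is the POSIX minimum for PIPE_BUF, so one write() of a line
   this size is atomic: lines from several writers never interleave. */
#define MAX_LINE 512

/* Call once at start-up: a vanished reader then makes write() fail with
   EPIPE instead of killing the writer with SIGPIPE. */
void pipe_send_init(void) {
    signal(SIGPIPE, SIG_IGN);
}

/* One message becomes one newline-terminated line. On any failure the
   message is dropped and -1 is returned so the caller may count losses. */
int pipe_send(int fd, const char *msg) {
    assert(msg != NULL);
    size_t len = strlen(msg);
    char buf[MAX_LINE];
    if (len + 1 > sizeof buf)
        return -1;                        /* too long: dropped */
    memcpy(buf, msg, len);
    buf[len] = '\n';
    ssize_t n;
    do {
        n = write(fd, buf, len + 1);
    } while (n < 0 && errno == EINTR);    /* interrupted, not failed */
    return n == (ssize_t)(len + 1) ? 0 : -1;
}
```

The loss is the price for simplicity: there is no back channel, so the writer never learns whether a line was actually processed.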

rsyslog work log

Past day's rsyslog work log:
- bugfix: omsnmp had a too-small buffer for hostname+port. This
could not lead to a segfault, as snprintf() was used, but could cause
some trouble with extremely long hostnames.
- removed dependency on MAXHOSTNAMELEN as much as it made sense.
GNU/Hurd does not define it (because it has no limit), and we now take
care of cases where it is undefined. However, a very few places
remain where IMHO it currently is not worth fixing the code. If it is
not defined, we use a generous value of 1K, which is above the hostname
length limits in the IETF RFCs. The memory consumption is no issue, as
there are only a handful of these buffers allocated *per run* -- that's
also the main reason why we consider it not worth fixing any further.
- worked on tls support (as part of libsci)
- thought about modularization
- wrapped up modularization problem --> suggested rsyslog-runtime
- some cleanup
- enhanced legacy syslog parser to handle slightly malformed messages
(with a space in front of the timestamp) - at least HP procurve is
known to do that and I won't rule out that others also do it. The
change looks quite unintrusive and so we added it to the parser.
- implemented high precision timestamps for the kernel log. Thanks to
Michael Biebl for pointing out that the kernel log did not have them.
- provided ability to discard non-kernel messages if they are present
in the kernel log (seems to happen on BSD)
- cleanup of imklog
- implemented $KLogInternalMsgFacility config directive
- implemented $KLogPermitNonKernelFacility config directive
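The two buffer-related items at the top of this log can be illustrated with a small C sketch: a MAXHOSTNAMELEN fallback for systems such as GNU/Hurd, and a host:port formatter that checks snprintf()'s return value for truncation. fmt_host_port and HOSTPORT_BUFLEN are hypothetical names; this is not the actual omsnmp code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/param.h>   /* defines MAXHOSTNAMELEN on most, not all, systems */

/* GNU/Hurd has no hostname length limit and thus no MAXHOSTNAMELEN;
   fall back to the generous 1K value from the log entry above. */
#ifndef MAXHOSTNAMELEN
#define MAXHOSTNAMELEN 1024
#endif

/* room for "host:port": the hostname, ":65535" and the terminating NUL */
#define HOSTPORT_BUFLEN (MAXHOSTNAMELEN + sizeof(":65535"))

/* Returns 0 on success, -1 if the result did not fit into buf. */
int fmt_host_port(char *buf, size_t buflen, const char *host, unsigned port) {
    assert(buf != NULL && buflen > 0 && host != NULL);
    int n = snprintf(buf, buflen, "%s:%u", host, port);
    /* snprintf() never overruns buf, but it truncates silently; a cut-off
       name would address the wrong host, so treat truncation as an error */
    if (n < 0 || (size_t)n >= buflen)
        return -1;
    return 0;
}
```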

Friday, April 11, 2008

TLS, loosely coupled modules, a runtime and licensing...

Again, I reproduce a mailing list post in the hope of reaching the broadest audience and also keeping a permanent record here in my rsyslog history.

Hi folks,

I am sorry, this will be a long mail. But I would appreciate it if you'd
read it in full and comment on it. This mail covers a really important
decision for rsyslog and will probably even influence whether the project
succeeds in the long term. Package maintainers and code contributors are
especially requested to *really* read it. Though I try hard to provide
all relevant facts (that's why it is getting long), I will probably miss
some and not properly convey others. Please feel free to ask.

Let me start with explaining that the rsyslog project conceptually
consists of three parts:

- the modules
- "helper" functions
- rsyslog-specific functionality

Modules are actually projects in their own right, just being distributed
with the rsyslog tarball for convenience. A module may be released under
any license. Note that modules call both rsyslog-specific functionality
(e.g. to submit a message) as well as helper functions (e.g. to handle
tcp sessions).

The "helper" functions are a growing set of generic objects. Examples
are the module loader, the queue engine, networking support, the script
engine and virtual machine, ... - you get the idea: Things that are used
inside rsyslog but are not necessarily of use only for rsyslog.
Actually, this could be called an "rsyslog runtime library".

Rsyslog-specific functionality is primarily rsyslogd and everything it
takes to glue together helpers and plugins to build the working syslog
daemon.

Let's stick with that for a brief while. Now let me explain the idea of
loosely coupled modules. This stems back to JF's effort to convince me
of the Unix philosophy of "small tools that work well together". We had
another good discussion yesterday (on the blog) and it made me change my
mind a bit (well, probably not 100% convinced yet, but he managed to
seed the thought ;)). While I still think that there are things that
need to be really tightly coupled to the rsyslog core, there are others
which do not necessarily need to be. Let me call the latter "loosely coupled
modules", in contrast to the (tightly coupled) plugins that actually
become part of the rsyslogd process during runtime. The analysis plugins
I have in mind could become such loosely coupled modules. As an
interface, the usual Unix "send it and forget it" pipe could be used,
and it would probably be acceptable to allow for minor message loss
during shutdown and plugin failure (anything else would require a pipe
application protocol, e.g. relp over pipe, which sounds scary).

The benefit of doing so would be the ability to use those plugins in
configurations where rsyslog is not present (e.g. driven by syslog-ng or
a detached message generator [fed from a log file]). Done right, one
could even select the (slightly lossy) pipe interface or the full-blown
plugin interface as a compile-time switch. If you think it through, we may
even end up with an abstraction layer where each module can be compiled for
either the plugin interface or the pipe (no promises, though).
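A hypothetical C sketch of such a compile-time switch: the module's logic is written once, and defining BUILD_AS_PIPE (a made-up macro name) wraps it in a small program that reads one message per line from stdin. Without the macro, the same function would be exported through the plugin interface, which is deliberately not shown.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The module's actual work, shared by both builds. Hypothetical example
   analysis: count messages that mention "failure". */
static unsigned long hits;

int analyze_msg(const char *msg) {
    assert(msg != NULL);
    if (strstr(msg, "failure") != NULL)
        hits++;
    return 0;
}

unsigned long analysis_hits(void) {
    return hits;
}

#ifdef BUILD_AS_PIPE
/* Pipe build: a standalone program fed one message per line on stdin,
   e.g. by syslog-ng or by a tool replaying a log file. */
int main(void) {
    char line[2048];
    while (fgets(line, sizeof line, stdin) != NULL) {
        line[strcspn(line, "\n")] = '\0';
        analyze_msg(line);
    }
    printf("%lu\n", analysis_hits());
    return 0;
}
#else
/* Plugin build: the host process calls analyze_msg() directly through
   its plugin entry points (omitted; a real module interface is richer). */
#endif
```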

One problem with this approach is that modules call into the rsyslog
helpers. For example, rsyslog's network support needs to be available for
all those modules that do something over the net. That's not a problem
if I have a tightly coupled plugin as today (the rsyslog core makes the
necessary bindings). It would become more problematic if I move the
module to a pipe interface, because I now need to find a way to use the
rsyslog objects. But that's still doable (though pretty ugly). It
becomes really problematic if the same module, using a pipe interface,
is to be used with e.g. syslog-ng. I don't think that syslog-ng will be
able to provide it with an emulated rsyslog "net" object.

Let's stick with this problem for a moment. Coincidentally, we had
another discussion on the mailing list yesterday - on the TLS support
wrapper for rsyslog and librelp. That discussion centered around
licenses. Technically, there are also a number of issues. I have now
involved myself enough with GnuTLS and a bit of NSS so that I am able to
try to draft a first abstraction layer. I thought hard, and the "right
solution" involves encapsulating stream network access. So the right
thing to do is to have one object that handles network streams. That
object then is configured to use either plain tcp, TLS (via whatever
library) or even GSS-API. Nice and clean. It gets dirty if I think about
the details. If I do it that way, it makes rsyslog depend on this object
(so-far codenamed libsci). However, that would mean that any rsyslog
installation would need to pull in libsci. Not a big deal, except,
right, except if the crypto libraries are also pulled in by libsci. So
would every desktop system running rsyslog need to have the crypto
libraries installed? Scary... unacceptable.
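The "one object that handles network streams" idea boils down to a table of driver functions selected by configuration. The C sketch below is an assumption about the general shape, not libsci's or rsyslog's actual API: netstrm, nsd_ops and both toy drivers are invented, they write into a memory buffer instead of a socket, and "rot13" merely stands in for a transforming driver such as TLS.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* what every driver must provide */
typedef struct {
    const char *name;
    int (*send)(void *ctx, const char *data, size_t len);
} nsd_ops;

/* toy transport: collects everything "sent" into a string */
typedef struct { char out[256]; size_t used; } mem_sink;

static int sink_append(mem_sink *s, char c) {
    if (s->used + 1 >= sizeof s->out)
        return -1;
    s->out[s->used++] = c;
    s->out[s->used] = '\0';
    return 0;
}

static int plain_send(void *ctx, const char *data, size_t len) {
    for (size_t i = 0; i < len; i++)
        if (sink_append(ctx, data[i]) != 0)
            return -1;
    return 0;
}

static int rot13_send(void *ctx, const char *data, size_t len) {
    for (size_t i = 0; i < len; i++) {
        char c = data[i];
        if (isalpha((unsigned char)c)) {
            char base = islower((unsigned char)c) ? 'a' : 'A';
            c = (char)(base + (c - base + 13) % 26);
        }
        if (sink_append(ctx, c) != 0)
            return -1;
    }
    return 0;
}

static const nsd_ops drivers[] = {
    { "plain", plain_send },
    { "rot13", rot13_send },
};

/* the stream object callers use; it never knows which driver it has */
typedef struct { const nsd_ops *drv; void *ctx; } netstrm;

/* Selects a driver by name, as a config directive might; -1 if unknown. */
int netstrm_init(netstrm *ns, const char *driver, void *ctx) {
    for (size_t i = 0; i < sizeof drivers / sizeof drivers[0]; i++) {
        if (strcmp(drivers[i].name, driver) == 0) {
            ns->drv = &drivers[i];
            ns->ctx = ctx;
            return 0;
        }
    }
    return -1;
}

int netstrm_send(netstrm *ns, const char *msg) {
    assert(ns != NULL && ns->drv != NULL && msg != NULL);
    return ns->drv->send(ns->ctx, msg, strlen(msg));
}
```

In a real build, each driver (and the crypto library behind it) could live in its own loadable module, so only systems that configure TLS would ever pull it in.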

In rsyslog, we had the same problem a few months ago (at that time the
mysql client libraries were the problem). The solution was the rsyslog
loader, which dynamically loads other libraries (and their dependencies)
on demand. The loader is what enables rsyslogd to be installed
everywhere, but only with minimal core requirements (and have everything
else in separate packages). So if libsci would be part of rsyslog, we
would not have any problem at all. After all, the necessary plumbing is
ready at hand in form of the rsyslog helper objects.
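The loader mechanism can be sketched with the POSIX dlopen()/dlsym() calls. load_module_entry and the path and symbol names used with it are hypothetical; the point is that a module (and the libraries it depends on, such as a database client) only enters the process when the configuration asks for it, and a missing module is reported rather than fatal. On older glibc versions this needs -ldl at link time.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Loads the module at path and returns the address of its entry symbol,
   or NULL with a message in err. The core binary itself never links
   against the module's dependencies. */
void *load_module_entry(const char *path, const char *entry,
                        char *err, size_t errlen) {
    assert(path != NULL && entry != NULL && err != NULL && errlen > 0);
    err[0] = '\0';
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        snprintf(err, errlen, "%s", dlerror());
        return NULL;              /* missing module: report it, keep running */
    }
    dlerror();                    /* clear any stale error state */
    void *sym = dlsym(handle, entry);
    const char *msg = dlerror();
    if (msg != NULL) {
        snprintf(err, errlen, "%s", msg);
        dlclose(handle);
        return NULL;
    }
    /* the handle is intentionally kept open: the module stays loaded */
    return sym;
}
```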

This is where we come back to loosely coupled modules. Do you notice it is
the same problem? Both they and libsci would need to call the
rsyslog helpers.

Now let's come a bit to licensing. In order to understand that, we need
to talk about rsyslog funding first. Obviously, I am spending full time
(and a bit more than that) on rsyslog for quite a while now. I even
intend to do that for some more months as rsyslog is currently maybe 55%
of what I would like it to be. Somehow I must get funding - for the
time, for the hardware and for all the other things ;) What made the
rsyslog project possible, and still funds 99.9% of it, is Adiscon, the
company that provides (of course ;)) the best-ever logging solutions on
Windows. Actually, the closed-source Windows products pay for the rsyslog
project. While we hope to find other sources of funding in the future, I
cannot ignore that fact. One thing we would like to do at Adiscon is
include select parts of the technology I am now developing into the
closed source applications, too. The most prominent example is the RELP
protocol. I obviously find this a fair policy - after all, the
alternative would be to do it in closed source only and I was able to
convince my folks at Adiscon that it is far better to contribute to the
open source world.

There is one drawback in this requirement: licensing. Of course, we
could pick a BSD style license and every problem would be solved. But, I
have to admit, we do not like to give everything to our competitors in
the *closed source* space. We have had very bad experiences with folks
building on our technology and even turning it against us. I won't get
agreement from Adiscon to use a BSD license for everything (plus I
personally, too, don't like to see that effect).

We already discussed this on the mailing list here as part of
dual-licensing in the past. The solution was that the technology in
question was created as its own, dual-licensed, project. This led to
the creation of librelp. Rsyslog itself was left under GPLv3 (which I
sincerely believe in because of its anti-patent, anti-drm clauses - even
though the license obviously causes me some trouble).

Dual-licensing librelp led to some duplicate code and made me not use
some features which I could have used if I had access to all rsyslog
helper objects. For librelp, that is not yet a big deal, because it is
quite unique and needs very few of the rsyslog helpers. With TLS,
however, the situation changes and we get the dangling issue of rsyslog
helpers in librelp, too.


If I put all of this together, I think I have taken a (slightly? ;))
wrong path. The core problem is monolithic design from a very high point
of view. I have to admit I think this is what JF and some others were
pointing out, but I didn't realize it quickly enough. Sure, rsyslog is
quite modular by now. But everything always requires rsyslog as a
whole. It is very hard to do any rsyslog-related work without the
rsyslog core. While rsyslog has a carefully crafted set of helper
objects, these are not exposed to the outside world. And the licensing
issues associated with that design begin to screw up everything in the
long run.

I think we need change. The obvious solution seems to be extracting the
rsyslog helpers out of the rsyslog core project and create a "rsyslog
runtime". That runtime could then be installed individually and be put
under a different license (bear with me, explanation follows below).

Let's consider a complicated case with the runtime. Assume we have a
plugin "NeverBeforeSeenAnalysis". Let's say someone wants to use it with
syslog-ng (!). With the runtime, all that would be needed is to compile it for
the pipe interface and install rsyslog-runtime and the module onto the
system.

Now let's consider Adiscon's MonitorWare products on Windows. When they
implement RELP, they need librelp and can pull in the rsyslog-runtime
(for network access including TLS).

For rsyslogd itself nothing really happens, the runtime is now just its
own library - linking to it does not need to be modified. So for rsyslogd,
the change would be transparent.

Technically, this indeed solves the issues. Let me stress the point that
it leads to code reuse, where I currently need to rewrite things (which
increasingly concerns me, especially from a maintenance point of view).

Now, on to the licensing. Obviously, the MonitorWare use case above
would be totally incompatible with GPLv3. So the rsyslog-runtime would
need to be under a different license. It could be dual-licensed, but I
think that would probably do more harm than good. I think I can convince
Adiscon to go with LGPL for the runtime part. Granted, it introduces
risk of closed source competitors pulling it in, but the advantages
should outweigh this risk.

As for the ability to put this work under a different license, I think I
am in good shape: most of the helper objects are freshly written and
have only received limited patches (if at all) from contributors. I can
contact them and ask for permission to change the license. Where I don't
get permission, I think I can re-implement the contribution. Again, most
of the code in question has been written in the past 4 months and is
99.999% non-contributed. There may be some few runtime objects which
stem back to sysklogd. There, a license change is impractical. I'll have
to live with the fact that those cannot go into a re-licensed runtime.
Depending on how important the functionality is, I either need to
rewrite or drop it (for non-rsyslogd use). In any case, this looks
(pending detail analysis) quite possible.

Big question number one is what you think of this runtime approach? Have
I overlooked something? Do you object to it for some reason? If so, why?

The next question is how to package this inside the source tree.
Remember, currently rsyslog and the plugins (considered separate
projects) are all packed together inside a single tarball. This is very
convenient, both for me as well as for package maintainers and users.
The question is if we split rsyslog into the rsyslogd and the
rsyslog-runtime, will we continue to deliver the runtime as part of the
rsyslog package? Or would it be better to move it to a librsyslog
project? Unlike with the plugins, we actually would have two
different licenses, so it may be confusing to have both of them in the
same project (but I have seen that GnuTLS uses exactly this approach,
with the main library being LGPL and the - included - extras library GPL).
So that's the next question, obviously depending on the first: how to
pack projects if we do a runtime split?

I know this is a long and dense mail. My apologies for this. But I think
the discussion is needed. I honestly believe that a number of
discussions in the past weeks actually circled around this theme, we
just didn't actually get down to the point.

Please note that I will put TLS development on hold until we have reached
consensus on the runtime/licensing topic. The reason is that if we don't do a
runtime split, I need to do things considerably differently than when we
do one (much more code, probably yet another external library). So,
obviously, I currently have a bias towards the split. However, experience
shows that I (as everyone ;)) tend to overlook or misunderstand things.
That is why your feedback is so important. I don't like the idea of going
back and forth on such an important topic as licensing and high-level
modularization, so I would like to get it done "the right way" now
and keep it stable for at least the foreseeable future. Given
that the decision affects rsyslog's development as a whole, I
would even appreciate quick feedback.

In this spirit: please let the comments flow ;)


phpLogCon - why the next version

I am in an interesting email discussion and would like to share something on phpLogCon that's probably of interest for others, too:

A major reason for phpLogCon v2 is enhanced functionality. We'll switch away from a pure database paradigm. We'll be able to work with log files (much faster and sufficient in many cases), with databases of course, and in the long term also with a specialized, not-yet-written, logging-specific datastore. We also want to build a community around the new phpLogCon and, as an important step, set up a public troubleshooting database and of course connect it to existing ones on the web. So the core idea of phpLogCon v2 is to create a system where users can analyze their logs but at the same time collaborate on finding solutions to the issues they see (in the long term we may even get to a point where we can identify problems based on patterns - but that is far, far away ;)). So the phpLogCon idea has gotten a bit broader (I should probably post that on the site, too).

Of course, rsyslog will also contribute to this vision - the overall idea is to create a great monitoring and auditing system that not only helps with compliance but enables you to fix any upcoming trouble before it really hurts.

Thursday, April 10, 2008

rsyslog work log

Yesterday's rsyslog work log:
- began to work a bit on BSD portability issues
- changed imklog to a driver interface
imklog now uses os-specific drivers. The initial "set" contains
the linux driver. This is a prerequisite for BSD klog, which can
now be implemented on that driver interface.
- improved detection of modules being loaded more than once
thanks to varmojfekoj for the patch (v3-stable)
- released 3.14.2
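A hypothetical C sketch of what such an OS driver interface can look like: the generic input code calls through a small table, and the preprocessor picks the Linux or BSD entry at build time. The struct and function names are made up and do not reflect imklog's actual code; /proc/kmsg and /dev/klog are the usual kernel log sources on Linux and the BSDs.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>

typedef struct {
    const char *name;
    const char *path;                   /* kernel log source, if any */
    int (*open_log)(const char *path);  /* returns an fd or -1 */
} klog_driver;

int klog_open_readonly(const char *path) {
    return open(path, O_RDONLY);
}

#if defined(__linux__)
static const klog_driver klog_drv = { "linux", "/proc/kmsg", klog_open_readonly };
#elif defined(__FreeBSD__) || defined(__NetBSD__) || defined(__OpenBSD__)
static const klog_driver klog_drv = { "bsd", "/dev/klog", klog_open_readonly };
#else
static int open_unsupported(const char *path) {
    (void)path;
    return -1;                          /* no driver for this platform yet */
}
static const klog_driver klog_drv = { "none", NULL, open_unsupported };
#endif

const klog_driver *klog_driver_get(void) {
    return &klog_drv;
}

/* Generic code: knows nothing about the platform, only the table. */
int klog_open(void) {
    const klog_driver *d = klog_driver_get();
    assert(d->open_log != NULL);
    return d->open_log(d->path);
}
```

Adding a new platform then means adding one table entry and its functions; the generic module code stays untouched.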

Wednesday, April 09, 2008

Why is native email capability an advantage for a syslogd?

Following up on my post on rsyslog's new native email capability, an interesting conversation arose. I'd like to share it with you:

> > I promise to listen very carefully and try to implement anything that is
> > doable and makes sense in the rsyslog context.
> >
> One thing springs to mind - I think "sendmail" support is more important
> than you give it credit.
> What if you've got an alert rule in rsyslog to email you when your
> network link fails - but your SMTP server is at the other end of the
> link? :-) If you used sendmail - you get requeuing and retrying for free
> - I don't think you want to have to add that to your SMTP support...

Well, that's actually not an issue at all in rsyslog. The rsyslog core
engine is reliable [to be precise: can be configured to be reliable,
it's not by default] in a way that exactly handles this situation. In
rsyslog, any action, including now mail, can run on its own queue. When
an action fails, it tells the rsyslog core that it could not
successfully complete. Then, the rsyslog core schedules retries until it
finally succeeds. While doing so, the messages are kept inside a queue.
This queue is in memory as long as that's sufficient and is moved to
disk if there is demand (e.g. rsyslog shutdown, running out of
configured in-memory queue space). A sample of such a configuration
(this time with the database writer) can be found at:

Bottom line - rsyslog is designed to work with failing destinations and
automatically recover from their failures. So there is nothing special needed to make
it handle a failing smtp connection.
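The retry behavior described above can be modeled in a few lines of C. This is a hypothetical sketch (act_result, deliver_with_retries and flaky_action are invented names, not rsyslog's API): the action reports "suspended", and the caller retries while the message stays pending. A real engine would wait between attempts and keep the message in its queue, possibly on disk, in the meantime.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { ACT_OK, ACT_SUSPENDED } act_result;
typedef act_result (*action_fn)(const char *msg, void *ctx);

/* Delivers msg, retrying up to max_retries times after the first attempt
 * (negative: retry forever). Returns the number of attempts on success,
 * or -1 if the action remained suspended. The pause between attempts and
 * the queue holding msg meanwhile are omitted. */
int deliver_with_retries(action_fn act, void *ctx, const char *msg, int max_retries) {
    assert(act != NULL && msg != NULL);
    for (int attempt = 0; max_retries < 0 || attempt <= max_retries; attempt++) {
        if (act(msg, ctx) == ACT_OK)
            return attempt + 1;
        /* suspended: the message stays pending and is tried again */
    }
    return -1;
}

/* Toy action standing in for an SMTP send: it is suspended for the
 * first *ctx calls and succeeds afterwards. */
act_result flaky_action(const char *msg, void *ctx) {
    int *failures_left = ctx;
    (void)msg;
    if (*failures_left > 0) {
        (*failures_left)--;
        return ACT_SUSPENDED;
    }
    return ACT_OK;
}
```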

In fact, I consider the SMTP direct mode more reliable than the sendmail
mode, exactly because of that feature. With sendmail, I hand over the
message to an external entity but do not know if delivery succeeded.
With SMTP direct, I know at least it made its way to the SMTP server.
Granted, I don't know if the SMTP server will ultimately deliver it, but
I have a bit more control over what's going on.

For example: rsyslog also has a mode where it can use backup actions if
things fail (after n retries). So let's consider the example above.
Let's say we have an urgent alert, but the smtp server is down. With
sendmail, I hand the message over to sendmail but do not know that
sendmail actually queues it. With smtp direct, I *know* that the smtp
server is unresponsive. Depending on the urgency, I may either do a few
retries or I may immediately switch to another delivery method. For
example, I may then try SNMP. Or I may do another email action in
this case and try to contact an email-to-sms gateway so that this can be

Please note that in rsyslog one can have multiple actions chained
together. So a probable scenario to handle such a case could be:

1. try to email via the corporate server
2. if that fails, try to email via a public gateway
3. if that fails, start a program to do some automagic action

All of this is possible because I do not use sendmail. But, again, I
of course do not know if the mail server I used with rsyslog succeeds in
its delivery attempt. One weak spot always remains ;)
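The three-step chain listed above amounts to "try each action in order and move on only when the previous one is suspended", which is the spirit of $ActionExecOnlyWhenPreviousIsSuspended. A hypothetical C sketch with stand-in steps (none of these names come from rsyslog):

```c
#include <assert.h>
#include <stddef.h>

typedef enum { ACT_OK, ACT_SUSPENDED } act_result;

typedef struct {
    const char *name;
    act_result (*run)(const char *msg);
} chain_step;

/* Runs the steps in order until one succeeds. Returns the index of the
   successful step, or -1 if every step was suspended. */
int run_chain(const chain_step *steps, size_t n, const char *msg) {
    assert(steps != NULL && msg != NULL);
    for (size_t i = 0; i < n; i++) {
        if (steps[i].run(msg) == ACT_OK)
            return (int)i;
        /* suspended: fall through to the next (backup) step */
    }
    return -1;
}

/* Toy steps: the corporate server is down, the others work. */
static act_result corporate_smtp(const char *m) { (void)m; return ACT_SUSPENDED; }
static act_result public_gateway(const char *m) { (void)m; return ACT_OK; }
static act_result run_program(const char *m)    { (void)m; return ACT_OK; }

const chain_step alert_chain[] = {
    { "corporate SMTP server", corporate_smtp },
    { "public mail gateway",   public_gateway },
    { "automagic program",     run_program },
};
```

Retrying a step a few times before moving on would wrap each run() call in a retry loop; that is left out to keep the failover logic visible.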

To use yesterday's sample, one could use a backup SMTP server with just
a little bit of configuration as follows:

$ModLoad ommail
$template mailSubject,"disk problem on %hostname%"
$template mailBody,"RSYSLOG Alert\r\nmsg='%msg%'"

# primary action
$ActionMailSubject mailSubject
# make sure we receive a mail only once in six
# hours (21,600 seconds ;))
$ActionExecOnlyOnceEveryInterval 21600
# the if ... then ... mailBody must be on one line!
if $msg contains 'hard disk fatal failure' then :ommail:;mailBody

# begin backup action, carried out if primary fails
$ActionExecOnlyWhenPreviousIsSuspended on
# point the backup action at a different SMTP server (placeholder name)
$ActionMailSMTPServer backup-smtp.example.net
$ActionMailSubject mailSubject
$ActionExecOnlyOnceEveryInterval 21600
& :ommail:;mailBody

phpLogCon design decisions...

I had a nice chat with my buddy Andre today on some open issues with phpLogCon v2. With his permission, I am posting it here. Others may find it useful (or may not ;)). If you'd like to voice your opinion, please simply do...

Rainer Gerhards [15:10]:
on the windows event log format inside syslog messages
Rainer Gerhards [15:10]:
we could go with the syslog-protocol format:
Rainer Gerhards [15:10]:
see section 6.3.5 - samples
Andre Lorbach [15:11]:
ok one second
Rainer Gerhards [15:12]:
I had this use case on my mind when I crafted the draft
Andre Lorbach [15:12]:
looks good to me
Rainer Gerhards [15:12]:
an elaborate sample a bit later on page 19
Rainer Gerhards [15:12]:
<165>1 2003-10-11T22:14:15.003Z
evntslog - ID47 [exampleSDID@0 iut="3" eventSource=
"Application" eventID="1011"] BOMAn application
event log entry...
Rainer Gerhards [15:12]:
tell it all
Rainer Gerhards [15:12]:
so it could actually be
Rainer Gerhards [15:13]:
[winevent@ iut="3" eventSource="Application" eventID="1011"]
Rainer Gerhards [15:13]:
question is... do we need more than iut, eventSource and eventID to classify the message sufficiently?
Andre Lorbach [15:14]:
hrm well there is eventCategory, eventLogType and eventUser which might be important
Rainer Gerhards [15:14]:
what was type again?
Andre Lorbach [15:14]:
EventLogType is "Application" or "System" for example
Rainer Gerhards [15:15]:
isn't that source?
Andre Lorbach [15:15]:
source is the source machine
Andre Lorbach [15:15]:
or you mean eventsource ?
Rainer Gerhards [15:15]:
eventsource... yepp, that was what I wrote
Andre Lorbach [15:15]:
eventsource is "Adiscon WinSyslog" or "Adiscon Eventreporter"
Rainer Gerhards [15:16]:
doesn't matter
Rainer Gerhards [15:16]:
I meant the log type "system", "application", ...
Andre Lorbach [15:17]:
ok it could be eventSource="Application\AdisconEventReporter"
Andre Lorbach [15:17]:
for example
Rainer Gerhards [15:17]:
let's use the defined values
Rainer Gerhards [15:17]:
I'd say we transfer in structured data:
Rainer Gerhards [15:18]:
id, user, netventlogtype
Andre Lorbach [15:18]:
but sourceproc shouldn't be missing
Rainer Gerhards [15:18]:
and maybe severityid and, yes, sourceproc
Rainer Gerhards [15:19]:
you are right, that should be required
Andre Lorbach [15:19]:
severityid is mapped to syslog priority
Rainer Gerhards [15:19]:
but it's a lossy mapping
Andre Lorbach [15:19]:
ok thats a point
Rainer Gerhards [15:19]:
e.g. audit success becomes info
Rainer Gerhards [15:20]:
it would be best if the parser would understand all properties
Rainer Gerhards [15:20]:
but accept if some were missing
Rainer Gerhards [15:20]:
too much work for an initial effort?
Andre Lorbach [15:20]:
it could be done easily with preg_match if we order the properties from mandatory to optional
Andre Lorbach [15:21]:
for example
Rainer Gerhards [15:21]:
that we can require
Andre Lorbach [15:21]:
id, user, sourceproc, netventlogtype are mandatory
Rainer Gerhards [15:21]:
let's keep it stupid simple for the beginning
Andre Lorbach [15:21]:
but it can also be

id, user, sourceproc, netventlogtype, severityid
id, user, sourceproc, netventlogtype, severityid, category
id, user, sourceproc, netventlogtype, severityid, category, bdata

Rainer Gerhards [15:22]:
bdata is toooo much
Rainer Gerhards [15:22]:
I'd drop that idea
Andre Lorbach [15:22]:
ok leave bdata out
Rainer Gerhards [15:22]:
it scares me from the syslogd side
Andre Lorbach [15:22]:
then I would have 3 different preg_match calls for all
Rainer Gerhards [15:22]:
it won't survive the 2k limit in any case
Rainer Gerhards [15:22]:
sounds good
Rainer Gerhards [15:22]:
so I'd say that's decided
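
Andre's "mandatory first, optional last" ordering can be sketched as follows. The sketch is in C rather than phpLogCon's PHP. It assumes a hypothetical wire format in which the values arrive as one comma-separated string in the order discussed above: id, user, sourceproc and netventlogtype, then optionally severityid and category (bdata is left out, as agreed). Instead of three separate preg_match calls, a single POSIX extended regular expression with nested optional groups accepts the four mandatory fields plus zero, one or two optional ones. Values containing commas are not handled.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

#define NFIELDS   6   /* id, user, sourceproc, netventlogtype, severityid, category */
#define FIELD_MAX 64  /* arbitrary per-field buffer size for this sketch */

/* Four mandatory fields, then optional ",severityid" and optional ",category".
 * Optional fields can only appear in this order, which is what makes the
 * mandatory-to-optional ordering easy to parse. */
static const char *const EVT_PATTERN =
    "^([^,]*),([^,]*),([^,]*),([^,]*)(,([^,]*)(,([^,]*))?)?$";

/* Parses line into fields[]; absent optional fields become "".
 * Returns the number of fields present (4..6), or -1 if line does not match. */
int parse_event_fields(const char *line, char fields[NFIELDS][FIELD_MAX])
{
    /* regex group holding each field (groups 5 and 7 are only wrappers) */
    static const int group_of[NFIELDS] = { 1, 2, 3, 4, 6, 8 };
    regex_t re;
    regmatch_t m[9];
    int i, present = 0;

    assert(line != NULL);
    if (regcomp(&re, EVT_PATTERN, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, line, 9, m, 0) != 0) {
        regfree(&re);
        return -1;
    }
    for (i = 0; i < NFIELDS; i++) {
        regmatch_t g = m[group_of[i]];
        size_t len;

        fields[i][0] = '\0';
        if (g.rm_so == -1)          /* optional group did not participate */
            continue;
        len = (size_t)(g.rm_eo - g.rm_so);
        if (len >= FIELD_MAX)
            len = FIELD_MAX - 1;    /* truncate overlong values */
        memcpy(fields[i], line + g.rm_so, len);
        fields[i][len] = '\0';
        present = i + 1;
    }
    regfree(&re);
    return present;
}
```

A line such as "4624,bob,svchost,Security,2" yields five fields, with category left empty. A production version would compile the pattern once instead of on every call.
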
Andre Lorbach [15:22]:
ok, I will note it down in my to-do list ... one moment
Rainer Gerhards [15:23]:
we should also see that MonitorWare has a few default definitions for this
Rainer Gerhards [15:23]:
would be useful and prevent users from getting frustrated - plus not a big deal to do
Andre Lorbach [15:23]:
there should be a button for it, like for other products
Andre Lorbach [15:23]:
we made before
Rainer Gerhards [15:23]:
oh, one thing
Rainer Gerhards [15:23]:
if we go the -syslog-protocol path, we must be a bit careful
Rainer Gerhards [15:24]:
there are side-effects that could negatively affect us
Rainer Gerhards [15:24]:
I'd say we create the templates and see what happens
Rainer Gerhards [15:24]:
if it is too bad, we may use an interim template
Rainer Gerhards [15:24]:
e.g. replace [ by {
Rainer Gerhards [15:25]:
(the problem I see is that e.g. rsyslog *knows* syslog-protocol and does NOT put structured data into log files by default - it was not meant for humans)
Rainer Gerhards [15:25]:
*that* would be problematic
Rainer Gerhards [15:25]:
of course, a proper rsyslog configuration gets that, but it's another point that can get the user frustrated
Rainer Gerhards [15:26]:
ok, let's move on (I may need to be afk for a quick while, but I'll let you know then)
Andre Lorbach [15:27]:
one minute ...
Rainer Gerhards [15:28]:
ok, now I need a couple of minutes
Andre Lorbach [15:28]:
ok back
Andre Lorbach [15:28]:
lol ok ...
Rainer Gerhards [15:28]:
will ping you
Rainer Gerhards [15:43]:
so what's the next point?
Andre Lorbach [15:43]:
ok lets begin from top of the list
Andre Lorbach [15:43]:
- syslogtag, source filtered with OR like facility, priority

Andre Lorbach [15:43]:
this is how I implemented the filtering for now
Rainer Gerhards [15:44]:
I think that's fine
Rainer Gerhards [15:44]:
I got it
Andre Lorbach [15:44]:
because AND filtering didn't make much sense
Rainer Gerhards [15:44]:
of course, it depends on user feedback
Rainer Gerhards [15:44]:
but it sounds intuitive and useful
Andre Lorbach [15:44]:
- Add predefined searches into config? If yes, how?
Andre Lorbach [15:44]:
next point
Andre Lorbach [15:44]:
we said we wanted to have a predefined selection of "searches" next to the search field
Rainer Gerhards [15:44]:
I'd say "yes", but for the time being, these must be on a system-wide basis
Rainer Gerhards [15:45]:
because we do not yet have the user profiles
Andre Lorbach [15:45]:
ok ... we wanted system wide predefined searches anyway
Rainer Gerhards [15:45]:
so I'd simply add them as config variables
Andre Lorbach [15:45]:
so we just add user defined later
Rainer Gerhards [15:45]:
that I would suggest
Andre Lorbach [15:46]:
my idea for saving them in the configuration is to simply have another array with two strings, one for the name of the search and one containing the filter string.
Rainer Gerhards [15:46]:
sounds perfect to me
Andre Lorbach [15:46]:
kk I will just note this down ...
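
A minimal sketch of that configuration layout follows. It is in C for illustration only, since phpLogCon itself is PHP. It holds a system-wide table of name/filter pairs and looks a filter up by its display name. The struct, the entries and the filter strings are hypothetical placeholders, not phpLogCon's actual config keys or filter syntax.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical system-wide predefined search: display name + filter string. */
struct predefined_search {
    const char *name;    /* shown in the selection next to the search field */
    const char *filter;  /* filter string applied when the entry is chosen */
};

/* Placeholder entries; the filter strings are illustrative only. */
static const struct predefined_search predefined_searches[] = {
    { "Messages containing error",   "error" },
    { "Messages containing warning", "warning" },
};

/* Returns the filter string for a search name, or NULL if none matches. */
const char *lookup_predefined_filter(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof(predefined_searches) / sizeof(predefined_searches[0]); i++)
        if (strcmp(predefined_searches[i].name, name) == 0)
            return predefined_searches[i].filter;
    return NULL;
}
```

Per-user searches could later be stored in the same name/filter shape once user profiles exist.
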
Andre Lorbach [15:47]:
- Making displayed columns configurable, how and which fields are available?
Rainer Gerhards [15:47]:
again... this time a system default
Rainer Gerhards [15:48]:
later taken from the user's profile
Andre Lorbach [15:48]:
So we go for a fixed set of columns for now ?
Andre Lorbach [15:48]:
and make them configurable later ?
Rainer Gerhards [15:48]:
no, I'd go for configurable
Rainer Gerhards [15:48]:
but again, on a system-wide basis
Rainer Gerhards [15:48]:
(which also means the plumbing is there, but we "just" do not yet have the user-settings)
Andre Lorbach [15:50]:
ok ... my best idea for now is also to use an array again; a single-dimension array might be sufficient. Each array entry contains the internal property name of the column
Rainer Gerhards [15:50]:
I concur on the array. I'd just use two values inside it
Rainer Gerhards [15:51]:
the name and also the POSITION
Rainer Gerhards [15:51]:
so where it is shown inside the table first column, second and so on
Rainer Gerhards [15:51]:
or do you take that simply from the array index?
Rainer Gerhards [15:51]:
lol... probably
Andre Lorbach [15:52]:
the name could be taken automatically by having the properties associated with names internally, and the position can be defined by simply moving the array entries up and down
Rainer Gerhards [15:52]:
so, yes just the name and it is ordered in order of appearance of the index (provided the index is monotonically incrementing)
Andre Lorbach [15:52]:
but this only works on a single-dimension array
Rainer Gerhards [15:52]:
I got it
Andre Lorbach [15:52]:
$col[] = "prio";
$col[] = "fac";
$col[] = "msg";

Andre Lorbach [15:52]:
you just move them as you need
Rainer Gerhards [15:52]:
sounds good
Andre Lorbach [15:54]:
next point: - split configuration files, or everything in one FILE?
Andre Lorbach [15:54]:
I would say for now one file, or ?
Rainer Gerhards [15:55]:
one file - and I honestly don't see any use for multiple files in the future
Rainer Gerhards [15:55]:
(except for includes )
Rainer Gerhards [15:55]:
why would they be useful? What's the idea behind it?
Andre Lorbach [15:56]:
no real idea, just to separate db config from user config and so on... but I would also rather use one config file for now
Rainer Gerhards [15:56]:
let's go for one and see if there actually ever pops up a real need to have multiple files
Andre Lorbach [15:56]:
next point: - Add Help Page
Rainer Gerhards [15:56]:
would be very nice to have
Andre Lorbach [15:57]:
maybe something for Florian later ?
Rainer Gerhards [15:57]:
but it's pretty intuitive, so I don't think it's a showstopper
Rainer Gerhards [15:57]:
well... whoever
Rainer Gerhards [15:57]:
let's just keep the todo item
Rainer Gerhards [15:57]:
(and just to make sure: we are talking about a one-pager, with maybe 200 lines or so?)
Andre Lorbach [15:58]:
I would go for one page, maybe with anchors in it... like the SimpleMail manual, but with the full phplogcon framework of course
Rainer Gerhards [15:59]:
I concur - just wanted to evaluate required effort a bit
Rainer Gerhards [15:59]:
so its not a big deal in any case
Andre Lorbach [15:59]:
nope, it isn't
Rainer Gerhards [15:59]:
I'd say we should bring it up once we know what exactly needs to go in
Andre Lorbach [16:00]:
then to the next point:
- What does the "I'm feeling sad ..." button do?
Rainer Gerhards [16:01]:
I like that
Rainer Gerhards [16:01]:
first of all, would you object if we rename it to either "I'd like to feel sad" or "I am feeling too lucky"
Andre Lorbach [16:01]:
an image could show up, showing the linux penguin shooting bill gates ?
Rainer Gerhards [16:01]:
with the rename
Rainer Gerhards [16:02]:
I'd do a search for "error" inside msg
Rainer Gerhards [16:02]:
something that helps you feel sad ;=)
Rainer Gerhards [16:02]:
do we have a deal on that?
Andre Lorbach [16:02]:
Ok so the button actually would do a search for error messages
Andre Lorbach [16:02]:
sounds good to me xD
Rainer Gerhards [16:02]:
messages with "error" in them, to be precise
Rainer Gerhards [16:02]:
but the button must be renamed
Rainer Gerhards [16:03]:
and, to get serious again,
Rainer Gerhards [16:03]:
I'd do this via another set of config options: the text for the button as well as the query to execute
Rainer Gerhards [16:03]:
so one can actually set it to something that is used really frequently
Rainer Gerhards [16:03]:
with the joke just being the default (of course )
Andre Lorbach [16:03]:
So it is a kind of a custom shortcut button
Rainer Gerhards [16:04]:
I personally would find such a shortcut very useful
Andre Lorbach [16:04]:
I like "I'd like to feel sad"
Rainer Gerhards [16:04]:
or how about "make me feel sad"?
Andre Lorbach [16:05]:
lol ok
Rainer Gerhards [16:05]:
no, the other one's better
Andre Lorbach [16:05]:
I'd like to feel sad ?
Andre Lorbach [16:05]:
last discussion point:
- TimeFilter, continue reading logfile if timefilter failed, so if

Andre Lorbach [16:06]:
let me explain this further
Andre Lorbach [16:06]:
for example I want to view the events from last 24 hours.
Andre Lorbach [16:06]:
and we have the logfile stream
Andre Lorbach [16:07]:
the stream will be fully read if there are not more than 30 messages from the last 24 hours.
Andre Lorbach [16:07]:
Can we assume that a logfile will always be consistent in the date and time from beginning till the end ?
Rainer Gerhards [16:07]:
I'd say .... no
Rainer Gerhards [16:08]:
it very much depends on how the file is written
Andre Lorbach [16:08]:
only if we assume this would it make sense to abort reading once the time falls outside the last 24 hours
Rainer Gerhards [16:08]:
by default, timestamps are taken from the message
Rainer Gerhards [16:08]:
as such, they can largely diverge
Andre Lorbach [16:08]:
otherwise we have to live with the overhead to read the whole log
Rainer Gerhards [16:08]:
maybe we should set a property in the stream's config
Rainer Gerhards [16:08]:
because the overhead may be high
Rainer Gerhards [16:09]:
and with a proper template, I can ensure that the time in the log records is always advancing and consistent
Rainer Gerhards [16:09]:
so someone who tweaks the system could use that template
Andre Lorbach [16:09]:
I would say for the first beta we live with this overhead and add a stream config property later
Rainer Gerhards [16:09]:
and then define the stream to be "with consistent time"
Rainer Gerhards [16:09]:
makes sense
Rainer Gerhards [16:09]:
just add a TODO note... It's easily forgotten
Andre Lorbach [16:10]:
I'll put it on my TO DO after BETA ;)
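
The trade-off discussed above can be sketched as follows, in C and with hypothetical names. The sketch assumes the viewer reads records newest-first and stops once a page (30 entries in the discussion) is full. Without a "consistent time" flag, every record has to be examined. With the flag set for a stream, reading can stop at the first record older than the cutoff.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical minimal log record: just a timestamp and the message text. */
struct log_record {
    time_t ts;
    const char *msg;
};

/* Counts records with ts >= cutoff, reading recs[] in order (newest first)
 * until page_size matches are found or the stream ends.
 * If time_consistent is nonzero, the stream is trusted to have decreasing
 * timestamps in reading order, so the first record older than the cutoff
 * ends the scan early. *scanned receives how many records were read. */
size_t filter_time_window(const struct log_record *recs, size_t n,
                          time_t cutoff, size_t page_size,
                          int time_consistent, size_t *scanned)
{
    size_t i = 0, matches = 0;

    assert(scanned != NULL);
    while (i < n && matches < page_size) {
        const struct log_record *r = &recs[i++];
        if (r->ts >= cutoff)
            matches++;
        else if (time_consistent)
            break;  /* everything after this is assumed to be older still */
    }
    *scanned = i;
    return matches;
}
```

Take timestamps 100, 90, 80, 10, 5, 95 with a cutoff of 50. The unflagged scan reads all six records and finds four matches. The flagged scan stops after the fourth record and misses the out-of-order 95. This is why the flag should only be set when the template really guarantees advancing timestamps.
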
Rainer Gerhards [16:10]:
btw: isn't that something that we could track as an enhancement request in bugzilla? or is it too early for that right now?
Andre Lorbach [16:11]:
I would start doing this after the first beta
Rainer Gerhards [16:11]:
whatever you like most
Rainer Gerhards [16:13]:
so we are done for now? :)
Andre Lorbach [16:13]:
ok, these were all my discussion topics for now; then I have some to-dos before the BETA to reconcile with you
Rainer Gerhards [16:14]:
go ahead
Andre Lorbach [16:14]:
I'll just post them; if it is fine, you just answer with yes or ok, and I go on.
Andre Lorbach [16:14]:
- Add Basic configuration variables into config file.
Rainer Gerhards [16:14]:
I don't fully get the point here
Andre Lorbach [16:14]:
ok, this one is obvious
Andre Lorbach [16:14]:
- Create installer script.

Rainer Gerhards [16:15]:
a lot of work?
Andre Lorbach [16:15]:
Guess not now
Andre Lorbach [16:15]:
I have a good sample of my installer from my stats project
Rainer Gerhards [16:15]:
ok, then it's probably something to have from day one (it's a must-have anyhow)
Andre Lorbach [16:15]:
I think I can easily adapt it
Rainer Gerhards [16:15]:
excellent :D
Andre Lorbach [16:15]:
+ for now we do not have to configure a database, so it's even easier
Andre Lorbach [16:15]:
it will have to write a default config.php
Andre Lorbach [16:16]:
maybe we can decide what will be configured in the install procedure
Andre Lorbach [16:16]:
I would say, one default source for syslog messages. The default language
Rainer Gerhards [16:16]:
Well... I'd not try to do too much magic in the first place
Rainer Gerhards [16:17]:
after all, folks need to know what they are doing
Rainer Gerhards [16:17]:
else it doesn't play nicely with their syslog system
Andre Lorbach [16:17]:
I think the syslog source is mandatory
Rainer Gerhards [16:17]:
there are also a number of security things...
Andre Lorbach [16:18]:
a big fat warning will be shown of course if the install script is still available after installation
Rainer Gerhards [16:18]:
well... what I want to convey... I think those who will try the initial version will be able to change the settings inside the config file as long as they are sufficiently well pointed at it
Andre Lorbach [16:18]:
yes sure
Andre Lorbach [16:18]:
it just will help to easily create a default config
Andre Lorbach [16:19]:
let me show you what the stats installer looks like
Andre Lorbach [16:20]:
you will see the screenshots in this thread
Rainer Gerhards [16:20]:
already browsing...
Rainer Gerhards [16:21]:
looks very good
Rainer Gerhards [16:21]:
and already proven in practice
Rainer Gerhards [16:21]:
I like that ;)
Andre Lorbach [16:21]:
the install system is similar to the postnuke one
Rainer Gerhards [16:22]:
so it's also along the lines of what people expect?
Rainer Gerhards [16:22]:
that would be even better...
Andre Lorbach [16:22]:
it's a common step-by-step install
Rainer Gerhards [16:23]:
so its agreed upon
Andre Lorbach [16:25]:
very well
Andre Lorbach [16:26]:
- Implement DB Driver

Rainer Gerhards [16:26]:
ohhhhh... yes
Rainer Gerhards [16:26]:
last thing before initial release
Rainer Gerhards [16:26]:
and maybe first thing after it
Rainer Gerhards [16:26]:
I'd say....
Andre Lorbach [16:26]:
- Fix Pager

that's more of a note for me
Rainer Gerhards [16:26]:
let me quickly go back
Rainer Gerhards [16:26]:
as I already said, I'd expect that the db driver is quite trivial
Rainer Gerhards [16:27]:
I'd *NOT* go for optimization in the first step
Rainer Gerhards [16:27]:
so in essence it is just pulling the records from the db, sorted on the uid, plus a bit of glue to avoid pulling a full result set every time
Rainer Gerhards [16:27]:
do you think it is much bigger?
Andre Lorbach [16:28]:
depends on how dynamic it has to be ... but I think it will become clear when I just start doing the DB Driver
Rainer Gerhards [16:28]:
yeah.. that's right
Rainer Gerhards [16:28]:
let's postpone the discussion
Andre Lorbach [16:28]:
only minor points left
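
Rainer's "pull sorted on the uid, plus a bit of glue" idea amounts to keyset paging. The viewer remembers the last uid shown and fetches the next few rows after it, instead of re-reading the whole result set. The sketch below is in C, with an in-memory array standing in for the database. The query in the comment uses a hypothetical table and column naming, not phpLogCon's actual schema.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical row: unique, increasing id plus message text. */
struct db_row {
    unsigned long uid;
    const char *msg;
};

/* In-memory stand-in for a query like
 *   SELECT uid, msg FROM logs WHERE uid > ? ORDER BY uid LIMIT ?
 * table[] must be sorted by ascending uid. Copies up to limit rows with
 * uid > after_uid into page[] and returns how many were copied. */
size_t fetch_page(const struct db_row *table, size_t n,
                  unsigned long after_uid, struct db_row *page, size_t limit)
{
    size_t i = 0, got = 0;

    assert(page != NULL || limit == 0);
    while (i < n && table[i].uid <= after_uid)
        i++;                        /* rows already shown on earlier pages */
    while (i < n && got < limit)
        page[got++] = table[i++];
    return got;
}
```

For the next page, the caller passes the uid of the last row on the current page as after_uid. A real database would do the skipping via an index on uid rather than a linear scan.
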
Andre Lorbach [16:29]:
- Add Text Highlight Feature
Andre Lorbach [16:29]:
- Show Details + Full Message on MouseOver on Message field

Andre Lorbach [16:29]:
like in php-syslogng
Rainer Gerhards [16:29]:
that sounds like a good feature
Rainer Gerhards [16:29]:
one point
Rainer Gerhards [16:29]:
I think we do not currently have that
Rainer Gerhards [16:29]:
we have the list view
Rainer Gerhards [16:29]:
did we already discuss detail view?
Rainer Gerhards [16:30]:
that is what should be displayed when somebody clicks on a row in list view
Andre Lorbach [16:30]:
we haven't discussed any yet
Rainer Gerhards [16:30]:
much like in phpLogCon v1
Rainer Gerhards [16:30]:
IMHO it's not necessarily a must-have for the initial release
Rainer Gerhards [16:30]:
but shortly after that
Rainer Gerhards [16:31]:
a biggie?
Andre Lorbach [16:31]:
don't think so
Rainer Gerhards [16:31]:
puuuh... ;)
Andre Lorbach [16:32]:
ok that is all on my list
Rainer Gerhards [16:32]:
one thought on the full message
Andre Lorbach [16:32]:
I will send you what I have written down for verification
Rainer Gerhards [16:32]:
I just thought that this may be quite a lot of data
Rainer Gerhards [16:32]:
in the long term, there should be an option to restrict the displayed size (first n chars) or turn it off altogether
Rainer Gerhards [16:33]:
I think that would make sense
Andre Lorbach [16:33]:
you mean in the list view ?
Rainer Gerhards [16:33]:
the popup
Andre Lorbach [16:33]:
ah ok
Rainer Gerhards [16:33]:
but also in listview, you are right
Rainer Gerhards [16:33]:
think of an NT event message
Andre Lorbach [16:33]:
it's done in the list view already
Rainer Gerhards [16:33]:
2k in size...
Andre Lorbach [16:33]:
but it should be configurable
Andre Lorbach [16:33]:
the char limit
Rainer Gerhards [16:33]:
by all means
Rainer Gerhards [16:34]:
and while we are at it, we may also add an option to turn the popup off
Rainer Gerhards [16:34]:
that's not much work ;)
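
The configurable character limit can be sketched in C as follows; the names are hypothetical. The message is cut to max_chars bytes, and an ellipsis marks the cut. The sketch counts bytes, so a multi-byte UTF-8 character could be split, and a real implementation would need to respect character boundaries. Turning the popup off altogether would be a separate boolean setting.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes at most max_chars bytes of msg into out (of size out_size),
 * followed by "..." if anything was cut off. Counts bytes, not characters. */
void truncate_for_display(const char *msg, size_t max_chars,
                          char *out, size_t out_size)
{
    size_t len = strlen(msg);
    int cut = len > max_chars;

    assert(out != NULL && out_size > 0);
    snprintf(out, out_size, "%.*s%s",
             (int)(cut ? max_chars : len), msg, cut ? "..." : "");
}
```

With a limit of, say, 200, a 2k NT event message would be shown as its first 200 bytes plus "...".
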
Rainer Gerhards [16:34]:
once again: do you have any concerns if I post the chatlog for others to see?
Andre Lorbach [16:35]:
no problem

rsyslog work log

Yesterday's rsyslog work log:
- finished mail functionality (ommail)
- added $ActionExecOnlyOnceEveryInterval config directive
- did a couple of klogd-related bugfixes
- cleaned up v2-stable --> v3-stable git branches so that I can
merge changes in the future
- some minor bug fixes
- released 3.17.0
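
As I understand the new $ActionExecOnlyOnceEveryInterval directive, an action that fires again before the interval has elapsed is simply skipped. This is handy, for example, for ommail, so that a burst of events does not flood a mailbox. The following C sketch shows that gating logic under this assumption. The struct and function names are hypothetical, and this is not rsyslog's code.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical per-action state for "execute at most once every interval". */
struct action_gate {
    time_t interval;   /* minimum seconds between two executions */
    time_t last_exec;  /* when the action last ran */
    int has_run;       /* 0 until the first execution */
};

/* Returns 1 if the action may run at time now (and records it), 0 to skip. */
int action_should_run(struct action_gate *g, time_t now)
{
    assert(g != NULL);
    if (g->has_run && now - g->last_exec < g->interval)
        return 0;                  /* too soon: skip this execution */
    g->last_exec = now;
    g->has_run = 1;
    return 1;
}
```

With a 60-second interval, events at t=100, 130, 160 and 161 would run, skip, run and skip.
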

Tuesday, April 08, 2008

rsyslog work log

Past day's rsyslog work log:
- added RELP doc to man pages
- removed the 32 character size limit (from RFC3164) on the
tag. This had bad effects on existing environments, as sysklogd didn't
obey it either (probably another bug in RFC3164...). We now receive
the full size, but will modify the outputs so that only 32 characters
max are used by default. If you need large tags in the output, you need
to provide custom templates.
- changed command line processing. -v, -M, -c options are now parsed
and processed before all other options. Inter-option dependencies
have been relieved. Among other things, this permits specifying the initial module
load path via -M only (not the environment) which makes it much
easier to work with non-standard module library locations. Thanks
to varmojfekoj for suggesting this change. Matches bugzilla bug 55.
- bugfix: zero-length strings were not supported in object
- bugfix: some messages were emitted without hostname
- began work on the mail output plugin (there are open questions on TLS, so I'll see
if I can put this in between) - does not yet work in the least ;)
- released 3.14.1 (v3 stable branch)
- bugfix: segfault in expression-based filter
- converted to git
- continued working on ommail

Friday, April 04, 2008

rsyslog work log

Past day's rsyslog work log:
- disabled atomic operations for the time being because they introduce some
cross-platform trouble - need to see how to fix this in the best
possible way
- added librelp check via PKG_CHECK thanks to Michael Biebl's patch
- added more descriptive error codes to module loader
- added more meaningful error messages to rsyslogd (when some error
happens during startup)
- updated status information and syslog-ng comparison
- began working on time-window-based dequeueing
- bugfix: memory leaks in script engine
- properties are now case-insensitive everywhere (script, filters,
- added the capability to specify a processing (actually dequeue)
timeframe with queues - so things can be configured to be done
at off-peak hours
- bugfix: a memory leak when queue is running in disk mode

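The time-window dequeueing mentioned above boils down to checking whether the current hour lies in the configured processing window. This includes windows that wrap past midnight, e.g. 22:00 to 06:00 for off-peak processing. The following C sketch uses hypothetical names and assumes a half-open window [from, to) in whole hours. It is not rsyslog's implementation and does not show its config directives.

```c
#include <assert.h>

/* Returns 1 if hour (0-23) lies in the half-open window [from, to), else 0.
 * from > to means the window wraps past midnight; from == to means "always". */
int in_dequeue_window(int hour, int from, int to)
{
    assert(hour >= 0 && hour < 24);
    if (from == to)
        return 1;
    if (from < to)
        return hour >= from && hour < to;
    return hour >= from || hour < to;   /* e.g. 22..6 covers 22, 23, 0..5 */
}
```

Outside the window, messages would simply stay in the (possibly disk-assisted) queue until the window opens.
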
Wednesday, April 02, 2008

On the (un)reliability of plain tcp syslog...

Did you ever use TCP to transfer syslog reliably? And do you think that makes you immune against message loss? If so, it's time to think again.

There is a subtle, yet important problem with plain tcp syslog. It is a very simple protocol and it works without any app-level acknowledgment. Well, "why care", you may say, "we have the TCP ack". That, of course, is right, but that low-level ack won't help you in all cases. The TCP ack is fine while everything goes well. But if the connection breaks, there is no mechanism in TCP that tells the sender immediately. In fact, "not immediately" actually means "after about two hours" (and only if you have keepalive active; otherwise it's even later). This is not a flaw in TCP but desired behavior. To be fair, if the server aborts and is restarted, the client will see an error as soon as the server is restarted. On a busy system, this may still be many messages later. So when the client receives the error status, it has no clue at all which messages were lost and which finally made it to the server.

Why that? The problem here is that (by design!) the TCP send() API returns success as soon as the message has been copied into the local send buffer; it does not wait until the data has reached the remote host. This behavior is fine - it enables TCP to be used reliably even when a network path is temporarily down. The downside is that the sender never knows if and when the data was received by the remote peer. Of course, there is an upper limit on the buffer size. It depends on TCP window negotiations, and if I remember correctly the ultimate upper limit is around 240K. I haven't checked, but the lower limit is probably around 1.5K, the size of an Ethernet packet.

If we now assume a generous syslog message size of 150 bytes (many are much smaller), we can have between 10 and 1,600 messages sitting in this buffer! So we may lose up to 1,600 messages if the server goes down - without even noticing it. Frightening, isn't it?
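
The effect is easy to observe with the following C sketch, which uses POSIX sockets; the function name is mine. It connects to a listener on 127.0.0.1 whose application never calls recv(), then sends 150-byte messages until the kernel refuses more. Every counted send() reported success, yet the receiving application did not read a single message. Had it crashed at that point, all of them would have been lost without the sender noticing. The sketch uses loopback rather than a real network, and the exact count depends on the operating system's socket buffer settings, so treat any particular number as illustrative.

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns how many msg_size-byte messages send() accepted while the peer
 * application never read anything, or -1 if the sockets could not be set up. */
long count_unread_sends(size_t msg_size)
{
    char msg[2048];
    struct sockaddr_in addr;
    socklen_t addr_len = sizeof(addr);
    int lsock = -1, csock = -1, ssock = -1;
    long sent = -1;

    assert(msg_size > 0 && msg_size <= sizeof(msg));
    memset(msg, 'x', msg_size);
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                       /* kernel picks a free port */

    if ((lsock = socket(AF_INET, SOCK_STREAM, 0)) < 0
        || bind(lsock, (struct sockaddr *)&addr, sizeof(addr)) < 0
        || listen(lsock, 1) < 0
        || getsockname(lsock, (struct sockaddr *)&addr, &addr_len) < 0
        || (csock = socket(AF_INET, SOCK_STREAM, 0)) < 0
        || connect(csock, (struct sockaddr *)&addr, sizeof(addr)) < 0
        || (ssock = accept(lsock, NULL, NULL)) < 0)
        goto done;

    /* Non-blocking, so send() fails with EAGAIN instead of waiting once the
     * local send buffer and the peer's receive buffer are full. */
    if (fcntl(csock, F_SETFL, fcntl(csock, F_GETFL) | O_NONBLOCK) < 0)
        goto done;

    sent = 0;
    for (;;) {
        ssize_t n = send(csock, msg, msg_size, 0);
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            break;                           /* buffers full */
        if (n < 0) { sent = -1; break; }     /* unexpected error */
        if ((size_t)n < msg_size)
            break;                           /* partial write: buffers full */
        sent++;                              /* "success", yet nobody read it */
    }

done:
    if (ssock >= 0) close(ssock);
    if (csock >= 0) close(csock);
    if (lsock >= 0) close(lsock);
    return sent;
}
```

Calling count_unread_sends(150) returns a positive count of messages that the sender believes are delivered but that exist only in kernel buffers.
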

I have been aware of this problem since the early days of plain tcp syslog. I even documented it as part of the SELP (simple event logging protocol) discussion on the loganalysis mailing list: see the selp spec, section 2.4. I have to admit that I never took it too seriously. After all, it only happens when the server goes down ... and there were so many other occasions on which syslog messages could get lost. The most prominent one is probably running out of buffer space on the local syslogd, or simply restarting it.

Then I began to implement more and more reliability into rsyslog (my *nix syslogd implementation, for those not in the know). And the more reliable that engine got, the more the reliability problems of plain tcp syslog surfaced. A typical sample of TCP unreliability is included in this post, which was also the thing that finally reminded me of the root cause of all that (if you search the forum, you'll find some other examples of "the tcp problem").

Once the core engine was reliable, the unreliability of TCP, so far well hidden by the unreliability of the rest of the system, quickly surfaced. No advanced flow control, no tricks, nothing helped. We cannot build a reliable solution out of plain tcp syslog. It's simply a no-go. At this point, it is probably good to add that this is not an rsyslog problem. It's a protocol issue, and as such all software implementing plain TCP syslog has the same shortcoming!

My quest for a highly reliable and tamper-proof logging system got shaky at that point. So what to do? Of course, a different, more reliable protocol was needed. It must support app-level acks, so that the client (sender) knows what was processed by the server and what not. If so, it can resend the right messages in case of a connection failure. My initial thoughts were to use liblogging and thus RFC 3195, which is based on BEEP (RFC 3080/3081). However, after some thinking, I came to the conclusion that RFC 3195 is far too little accepted and quite a bit too complicated for the job. I ended up designing a new, reliable and extensible protocol which I called RELP in honor of the initial SELP work. RELP stands for reliable event logging protocol. That implies two things: it's reliable and it's NOT only about syslog. RELP can carry a variety of payloads and will probably do so in the future. [You can find some more of my thought process in a previous blog post.]

I have now finished the initial alpha implementation of librelp, the RELP core library. I have also released the first version of rsyslog (3.15.0) with RELP support. There are still a few nits with RELP. Most importantly, it may duplicate some messages in extreme network failure cases (documented inside rsyslog's imrelp module). But, barring bugs, RELP will never again lose a message ... nearly. Now the ball is back in rsyslog's court. If the network connection breaks, rsyslog had sent some yet-unacknowledged messages just before it broke, and rsyslog is terminated in that state ... then we may indeed still lose a handful of messages. The bad thing is that this can happen; the good thing is that there is a cure, it "just" needs to be implemented. So the reliability quest is not over. It just went into the next round. But, again, the probability of message loss is now very much lower than with plain tcp syslog.

So just let me repeat: plain tcp syslog is *not* a reliable solution. It works well as long as everything is working well, but it screws up when the network or the server breaks... And, yes, you don't really need to care about that until you try to make the rest of the system fully reliable.


I have received a good comment via mail and provided an in-depth answer, probably interesting to others, too. So I reproduce it here (but snip much of the original mail):

> There is very expensive software out there that can do guarenteed packet
> deliver over tcp...

Well... RELP can *really* do the trick. It needs some help in the rsyslog engine to work under all cases, that is when the relp stack is terminated (when the client rsyslog is shut down). But that isn't rocket science and has just been pushed back by more urgent work (securing the channel).

So far, relp does single acks, that is, the server acks every packet back to the client. The client discards packets only after the ack. So when a session breaks, we know what the server has processed. If, however, some of the acks get lost, we resend packets that the server already processed, resulting in some mild message *duplication* (we currently have a 128-packet app-layer max window, and acks typically come in rather quickly). To work around this, the client must ack the server's acks (double-acking). That sounds scary, but is not. It's also not performance-intensive, and the protocol is designed so that those with ultra-slim bandwidth can turn it off and live with the message duplication scenario. If double-acks are active, relp client and server will go into a recovery phase after reconnect negotiation, in which mutually acked packet numbers are exchanged. To do so, the client must provide a session cookie back to the server. This is a potential attack vector and thus I'd like to have secure transport in place before I do it. Once the recovery negotiation is done, client and server know, and have discarded, what the other peer has processed. The remaining packets are re-sent, but under the same reliability settings. So if the connection breaks at this point again (or during the renegotiation), we'll simply go into another recovery phase until we finally succeed. It is of course important that session caches be persisted to disk when an engine stops - that will be part of rsyslog. The recovery procedure itself is in librelp; rsyslog needs to add just a slim persistence capability.

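The single-ack scheme described above can be sketched as client-side bookkeeping of unacknowledged transaction numbers, limited to the 128-packet window. This is a C illustration with hypothetical names, not librelp's API. A message leaves the window only when its ack arrives, and after a reconnect everything still pending is resent. That resend is exactly where duplicates come from if the server had processed a message whose ack was lost.

```c
#include <assert.h>
#include <stddef.h>

#define RELP_WINDOW_MAX 128   /* app-layer max window mentioned above */

/* Hypothetical client-side state: transaction numbers sent but not yet acked. */
struct relp_window {
    unsigned unacked[RELP_WINDOW_MAX];
    size_t count;
    unsigned next_txnr;   /* start at 1; 0 signals "window full" */
};

/* Records a new outgoing message. Returns its transaction number, or 0 if
 * the window is full and the caller must wait for acks first. */
unsigned window_send(struct relp_window *w)
{
    assert(w->next_txnr != 0);
    if (w->count == RELP_WINDOW_MAX)
        return 0;
    w->unacked[w->count++] = w->next_txnr;
    return w->next_txnr++;
}

/* The server acked txnr: only now may the client discard the message.
 * Returns 1 if it was pending, 0 otherwise. */
int window_ack(struct relp_window *w, unsigned txnr)
{
    size_t i, j;

    for (i = 0; i < w->count; i++) {
        if (w->unacked[i] == txnr) {
            for (j = i + 1; j < w->count; j++)
                w->unacked[j - 1] = w->unacked[j];
            w->count--;
            return 1;
        }
    }
    return 0;
}

/* After a reconnect, everything still pending must be resent, in order.
 * Copies the pending numbers into out[] and returns how many there are. */
size_t window_pending(const struct relp_window *w, unsigned *out)
{
    size_t i;

    for (i = 0; i < w->count; i++)
        out[i] = w->unacked[i];
    return w->count;
}
```

Suppose the client sends 1, 2 and 3 and receives only the ack for 2. A reconnect then resends 1 and 3. If the server had in fact processed 1 and only its ack was lost, 1 is now delivered twice, which is the case the double-ack recovery phase is meant to close.
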
Once this is done, and with proper rsyslog queue settings, sufficiently stable hardware and sufficient disk space, I guarantee (not in a lawyer's sense, though ;)) that you'll never lose a message nor get a duplicate.

This is when I think we have achieved our reliability goal. If all goes well... summer? ;)

ANOTHER UPDATE (2008-05-29)

Someone brought up an idea for circumventing this problem. I tried it out, saw it fail, and could also prove that it is simply impossible to have a reliable plain tcp syslog protocol without application-level acknowledgment.

rsyslog work log

OK, I am guilty of laziness... I worked quite a bit on RELP, and while doing so, I always thought "it's not much for the rsyslog work log, so let's do it tomorrow" ;) Well, tomorrow is now two weeks old and I think it is a good plan to come back to my habit of posting work logs. This time, of course, a bit more than usual.

Please note that the initial version with relp support, 3.15.0, has been released. I am currently evaluating the next focus feature, which probably is native TLS support. If so, I'll probably again need to put some work into a utility library, which will not directly be rsyslog related. But I hope I'll still continue to post something useful here. In the meantime, please be reminded that any implementation reports are still most welcome.

So here we go, the past ca. two-weeks worth of rsyslog work log:
- added flow control options to other input sources
- worked on librelp
- bugfix: some minor memory leaks
- bugfix: some slightly invalid memory accesses
- made librelp and rsyslog relp system send the first messages
to the remote peer (but it then discards them ;))
- made debug module free some memory on exit to make memory debugger
- added capability to receive RELP messages and forward them to the
main message queue to imrelp (not yet fully finished)
- cleanup of omrelp
- prepared omrelp for real "relp action"
- changed queue's discard severities default value to 8 (do not discard)
to prevent unintentional message loss
- removed a no-longer-needed callback from the output module
interface. Results in reducing code complexity.
- Greatly enhanced rsyslogd's file write performance by disabling
file syncing capability of output modules by default. This
feature is usually not required, not useful and an extreme performance
hit (both to rsyslogd as well as the system at large). Unfortunately,
most users enable it by default, because it was most intuitive to enable
it in plain old sysklogd syslog.conf format. There is now a new config
setting which must be enabled in order to support syncing. By default it
is off. So even if the old-format config lines request syncing, it is
not done unless explicitly enabled. I am sure this is a very useful
change and not a risk at all. I need to think if I undo it under
compatibility mode, but currently this does not happen (I fear a lot of
lazy users will run rsyslogd in compatibility mode, again bringing up
this performance problem...).
- added $ActionfileEnableSync config directive
- bugfix: internally generated messages had "FROMHOST" property not set
- bugfix: continue parsing if tag is oversize (discard oversize part) - thanks
to for the patch
- added $HHOUR and $QHOUR system properties - can be used for half- and
quarter-hour logfile rotation
- bugfix: QHOUR and HHOUR properties were wrongly calculated
- made relp modules use new relpengine-provided feature selection functions
- bugfix: fixed memory leaks in stream class and imfile
- bugfix: $ModDir did invalid bounds checking, potential overflow in
dbgprintf() - thanks to varmojfekoj for the patch
- changed default for "last message repeated n times", which is now
off by default
- implemented backward compatibility commandline option parsing
- bugfix: -t and -g legacy options max number of sessions had a wrong
and much too high value
- automatically generated compatibility config lines are now also
logged so that a user can diagnose problems with them
- added compatibility mode for -a, -o and -p options
- MILESTONE: compatibility mode processing finished
- updated man pages
- changed default file output format to include high-precision timestamps
- added a built-in template for previous syslogd file format
- added new $ActionFileDefaultTemplate directive
- added support for high-precision timestamps when receiving legacy
syslog messages
- added new $ActionForwardDefaultTemplate directive
- added new $ActionGSSForwardDefaultTemplate directive
- added built-in templates
- fixed small memory leak in tcpclt.c
- bugfix: fixed small memory leak in template regular expressions
- bugfix: regular expressions inside property replacer did not work
- removed --enable-mudflap, added --enable-valgrind ./configure setting
- bugfix: tcp receiver could segfault due to uninitialized variable
- docfix: queue doc had a wrong directive name that prevented max worker
threads from being set correctly
- worked a bit on atomic memory operations to support problem-free
- added a --enable/disable-rsyslogd configure option so that
source-based packaging systems can build plugins without the need
to compile rsyslogd
- released 3.13.0-dev0
- begun 3.17.0 [focus: TLS for plain tcp syslog]
- bugfix: rsyslogd was no longer built by default; man pages are
only installed if corresponding option is selected. Thanks to
Michael Biebl for pointing these problems out.
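
The new $HHOUR and $QHOUR properties split the hour into halves and quarters. As I understand them, $HHOUR is 0 for minutes 0-29 and 1 for minutes 30-59, and $QHOUR runs from 0 to 3. A minimal C sketch of that calculation follows. It also builds a hypothetical file name from the result, the way a dynamic-file template for quarter-hour rotation might. The names are illustrative, and this is not rsyslog's code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Half-hour slot within the hour: 0 for minutes 0-29, 1 for 30-59. */
int hhour(int minute)
{
    assert(minute >= 0 && minute < 60);
    return minute / 30;
}

/* Quarter-hour slot within the hour: 0..3. */
int qhour(int minute)
{
    assert(minute >= 0 && minute < 60);
    return minute / 15;
}

/* Writes a hypothetical quarter-hourly log file name such as "messages-14-3"
 * (hour 14, last quarter). Returns what snprintf() returns. */
int quarter_hour_name(char *buf, size_t size, int hour, int minute)
{
    return snprintf(buf, size, "messages-%02d-%d", hour, qhour(minute));
}
```

A message logged at 14:47 would land in "messages-14-3", and one logged at 15:02 in "messages-15-0", which gives one file per quarter hour.
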