Friday, February 06, 2009

When does rsyslog close output files?

I had an interesting question on the rsyslog mailing list that boils down to when rsyslog closes output files. So I thought I'd talk a bit about it on my blog, too.

What we need to look at is when a file is closed.
It is closed when there is a need to close it. So, when is there a need? There are currently three cases where the need arises:

a) HUP or restart
b) output channel max size logic
c) change in filename (for dynafiles, only)

I think a) needs no further explanation. Case b) should also be self-explanatory: if an output channel is set to a maximum size, and that size is reached, the file is closed and a new one re-opened. So for the time being let's focus on case c):

I simplified a bit. Actually, the file is not closed immediately when the file name changes. Instead, it is kept open in a kind of cache. When the very same file name is used again, the file descriptor is taken from the cache and there is no need to call the open and close APIs, which are very time-consuming. The usual case is that something like HOSTNAME or TAG is used in dynamic filename generation. In these cases, it is quite common that messages are written to a small set of different filenames. So with the cache logic, we can ensure good performance no matter in what order messages come in (generally, the order appears random, so on a sufficiently busy system there is a large probability that the next message will go to a different file). A file is actually closed only if the cache runs out of space (or if case a) or b) above happens).
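To make the cache idea concrete, here is a minimal Python sketch. It is not rsyslog's actual code (rsyslog is written in C). The class name FileHandleCache, the max_open parameter and the demo() helper are all made up for illustration. I also assume that a full cache evicts the least recently used file, which is a detail not spelled out above.

```python
import os
import tempfile
from collections import OrderedDict


class FileHandleCache:
    """Hypothetical sketch: keep up to max_open files open at once."""

    def __init__(self, max_open):
        self.max_open = max_open
        self._handles = OrderedDict()  # path -> open file object, oldest first
        self.opens = 0
        self.closes = 0

    def write(self, path, line):
        handle = self._handles.get(path)
        if handle is None:
            # Cache miss: make room if needed, then open the file.
            if len(self._handles) >= self.max_open:
                # Assumption: the least recently used file is evicted.
                _, oldest = self._handles.popitem(last=False)
                oldest.close()
                self.closes += 1
            handle = open(path, "a", encoding="utf-8")
            self._handles[path] = handle
            self.opens += 1
        else:
            # Cache hit: no open/close call, just mark as recently used.
            self._handles.move_to_end(path)
        handle.write(line + "\n")

    def close_all(self):
        """Close every cached file, roughly what cases a) and b) force."""
        for handle in self._handles.values():
            handle.close()
        self.closes += len(self._handles)
        self._handles.clear()


def demo():
    """Write to three 'hosts' with room for only two open files."""
    with tempfile.TemporaryDirectory() as logdir:
        cache = FileHandleCache(max_open=2)
        for host, msg in [("A", "M1"), ("B", "Ma"), ("A", "M2"), ("C", "Mx")]:
            cache.write(os.path.join(logdir, host), msg)
        opens, closes_before = cache.opens, cache.closes
        cache.close_all()
        with open(os.path.join(logdir, "A"), encoding="utf-8") as f:
            content_a = f.read()
        return opens, closes_before, content_a


if __name__ == "__main__":
    print(demo())  # (3, 1, 'M1\nM2\n')
```

With room for only two files, the message for host C pushes file B out of the cache, so there are three opens and one close before the final cleanup. File A stays open the whole time and receives both of its messages.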

Let's look at how this works. We have the following message sequence:


Host  Msg
A     M1
A     M2
B     Ma
A     M3
B     Mb


For simplicity, we have a filename template that consists of only %HOSTNAME%. With the first message, the file "A" is opened, and messages M1 and M2 are written to it. Now Ma comes in from host B. The file name is evaluated anew, so file "B" is opened and Ma is written to it. Then M3 goes to file A again and Mb to file B.

As you can see, the messages are put into the right files, and these files are only opened once. So far, they have not been closed (and will not be until either a) or b) happens), because we have just two file descriptors and those can easily be kept in the cache (the current default for the cache size is, I think, 100).
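The walkthrough can also be replayed in a few lines of Python that only count open and close calls, without touching real files. This is a rough model, not rsyslog code: the function name count_opens_and_closes is hypothetical, and it again assumes that a full cache evicts the least recently used entry.

```python
from collections import OrderedDict


def count_opens_and_closes(filenames, cache_size):
    """Count open/close calls for a stream of target file names.

    Assumption for illustration: a full cache evicts the least
    recently used entry.
    """
    cache = OrderedDict()  # file name -> placeholder, oldest first
    opens = closes = 0
    for name in filenames:
        if name in cache:
            cache.move_to_end(name)  # hit: the file is already open
            continue
        if len(cache) >= cache_size:
            cache.popitem(last=False)  # evict, which closes that file
            closes += 1
        cache[name] = True  # miss: the file must be opened
        opens += 1
    return opens, closes


# File names produced by the %HOSTNAME% template for M1, M2, Ma, M3, Mb.
SEQUENCE = ["A", "A", "B", "A", "B"]

if __name__ == "__main__":
    print(count_opens_and_closes(SEQUENCE, 100))  # (2, 0)
    print(count_opens_and_closes(SEQUENCE, 1))    # (4, 3)
```

With a roomy cache, each file is opened once and never closed. With space for only a single file, every switch between hosts A and B costs a close and a fresh open, which is exactly the overhead the cache is there to avoid.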

I hope this is useful information.
