Rsyslog's omelasticsearch plugin now supports bulk mode. With bulk
mode, message processing is much faster, especially if large loads are
to be processed.
Bulk mode works with rsyslog's
batching capabilities, so it is probably a good idea to refresh some of
the batching concepts. The core idea is that while we would like to
process many messages at once, we do NOT want to hold up processing
messages "just" because they are too few. So with batching, you set an
upper limit on the batch size (number of messages inside a batch). Let's
say the batch size is set to 32. When a new batch is to be processed,
the queue worker tries to pull 32 messages off the queue. If there are
32 or more present, this is nice and all 32 are taken from the queue.
But now let's assume there are only 10 messages present inside the
queue. In that case, the queue worker does not try to guess when the
next 22 messages will arrive, and it does not wait for them even if
that time would be short. Instead, it just pulls the 10 already-present
messages off the queue and these form the batch. When new messages
arrive, they will be part of the next batch.
Now let's look at the startup
of a busy system. Lots of messages come in. Let's assume they are
submitted one-by-one (many sources submit multiple messages, but let's
focus on those cases that do not). If so, the first message is submitted
and the queue worker is activated. Assuming this happens immediately
and before any other message is submitted (actually unlikely!), it will
initially create a batch of exactly one message and process that. In the
meantime, more messages arrive and the queue fills. So when the first
batch is completed, there are ample messages inside the queue. As such,
the queue worker will pull the next set of 32 messages off the queue and
form a new batch out of them. This continues as long as there are
sufficient messages. Note that in practice the first batch will usually
be larger than one and often be the max batch size, thanks to various
inner workings I would not like to elaborate on in detail in this
article. Large batch sizes with more than 1024 messages are not bad at
all and may even help improve performance. When looking at a system
performing with such large batches, you will likely see that partial
batches are being created, simply for the reason that the queue does not
contain more messages. This is not an indicator of a problem but shows
that everything works perfectly!
The max batch size can be configured via
$ActionQueueDequeueBatchSize
and
$MainMsgQueueDequeueBatchSize
Note that the default sizes are very conservative (read: low), so you
probably want to adjust them to some higher value. The best value
depends on your workload, but 256 is probably a good starting point. If
the action queue runs asynchronously (e.g. in linkedlist mode, or any
other non-direct mode), the action queue batch size specifies the upper
limit for the elasticsearch bulk submission.
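As a minimal sketch, the settings discussed above could look like this in
legacy directive syntax. The value 256 is just the starting point suggested
above, and LinkedList is one example of a non-direct queue type; neither is
a tuned recommendation. Keep in mind that legacy action queue directives
configure the action defined after them, and depending on the rsyslog
version they may not carry over to an action() statement, so check the
documentation for your release.

```
# main message queue: dequeue up to 256 messages per batch
$MainMsgQueueDequeueBatchSize 256

# action queue: run asynchronously (any non-direct queue type will do)
$ActionQueueType LinkedList
# dequeue up to 256 messages per batch; with bulk mode on, this is the
# upper limit for a single elasticsearch bulk submission
$ActionQueueDequeueBatchSize 256
```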
To activate bulk mode, use
*.* action(type="omelasticsearch"
... other params ...
bulkmode="on")
The default is the more conservative "off". Note that the action can of
course be used with any type of filter, not just the catch-all "*.*",
which is only used as a sample here.
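For illustration, here is a sketch of what a filled-in action might look
like. It assumes an rsyslog version whose action() statement accepts
queue.* parameters. The server value is a placeholder, and the queue
settings are example values rather than recommendations.

```
# Assumptions for this sketch:
#   server                  placeholder host, adjust to your setup
#   queue.type              any non-direct type makes the action queue asynchronous
#   queue.dequeuebatchsize  upper limit for a single bulk submission
*.* action(type="omelasticsearch"
           server="localhost"
           queue.type="linkedlist"
           queue.dequeuebatchsize="256"
           bulkmode="on")
```

With a setup along these lines, a bulk request carries at most 256 messages,
and fewer whenever the queue holds fewer at dequeue time.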
This Blog is about many things Rainer is interested in. This happens to include syslog, astronomy and other fun things.