Monday, April 27, 2009

Levels of reliabilty

We had a good discussion about reliability in rsyslog this morning. On the mailing list, it started with a question about the dynafile cache, but quickly morphed into something else. As the mailing list thread is rather long, I'll try to do a quick excerpt of those things that I consider vital.

First a note on RELP, which is a reliable transport protocol. This was the relevant thought from the discussion:

I've got relp set up for transfer - but apparently I discovered
that relp doesnt take care of a "disk full" situation on the receiver
end? I would have expected my old entries to come in once I had cleared the disk space, but no... I'm not complaining btw - just remarking that this was an unexpected behaviour for me.


That has nothing to do with RELP. The issue here is that the file output writer (in v3) uses the sysklogd concept of "if I can't write it, I'll throw it away". This is another issue that was "fixed" in v4 (not really a fix, but a conceptual change).

If RELP gets an ack from the receiver, the message is delivered from the RELP POV. The receiving end acks, so everything is done for RELP. Some thing if you queue at the receiver and for some reason lose the queue.

RELP is reliable transport, but not more than that. However, if you need reliable end-to-end, you can do that by running the receiver totally synchronous, that is all queues (including the main message queue!) in direct mode. You'll have awful performance and will lose messages if you use anything other than RELP for message reception (well, plain tcp works mostly correct, too), but you'll have synchronous end-to-end. Usually, reliable queuing is sufficient, but then the sender does NOT know when the message was actually processed (just that the receiver enqueued it, think about the difference!).

This explanation triggered further questions about the difference in end-to-end reliability between direct queue mode versus disk based queues:

The core idea is that a disk-based queue should provide sufficient reliability for most use cases. One may even question if there is a reliability difference at all. However, there is a subtle difference:

If you don't use direct mode, than processing is no longer synchronous. Think about the street analogy:


http://www.rsyslog.com/doc-queues_analogy.html


For synchronous, you need the u-turn like structure.

If you use a disk-based queue, I'd say it is sufficiently reliable, but it is no longer an end-to-end acknowledgement. If I had this scenario, I'd go for the disk queue, but it is not the same level of reliability.

Wild sample: sender and receiver at two different geographical locations. Receiver writes to database, database is down.

Direct queue case: sender blocks because it does not receive ultimate ack (until database is back online and records are committed).

Disk queue case: sender spools to receiver disk, then considers records committed. Receiver ensures that records are actually committed when database is back up again. You use ultra-reliable hardware for the disk queues.

Level of reliability is the same under almost all circumstances (and I'd expect "good enough" for almost all cases). But now consider we have a disaster at the receiver's side (let's say a flood) that causes physical loss of receiver.

Now, in the disk queue case, messages are lost without the sender knowing. In direct queue case we have no message loss.

And then David Lang provided a perfect explanation (to which I fully agree) why in practice a disk-based queue can be considered mostly as reliable as direct mode:


> Level of reliability is the same under almost all circumstances (and I'd
> expect "good enough" for almost all cases). But now consider we have a
> disaster at the receiver's side (let's say a flood) that causes physical loss
> of reciver.

no worse than a disaster on the sender side that causes physical loss of the sender.

you are just picking which end to have the vunerability on, not picking if you will have the vunerability or not (although it's probably cheaper to put reliable hardware on the upstream reciever than it is to do so on all senders)

> Now, in the disk queue case, messages are lost without sender knowing. In
> direct queue case we have no message loss.

true, but you then also need to have the sender wait until all hops have been completed. that can add a _lot_ of delay without nessasarily adding noticably to the reliability. the difference between getting the message stored in a disk-based queue (assuming it's on redundant disks with fsync) one hop away vs the message going a couple more hops and then being stored in it's final destination (again assuming it's on redundant disks with fsync) is really not much in terms of reliability, but it can be a huge difference in terms of latency (and unless you have configured many worker threads to allow you to have the messages in flight at the same time, throughput also drops)

besides which, this would also assume that the ultimate destination is somehow less likely to be affected by the disaster on the recieving side than the rsyslog box. this can be the case, but usually isn't.


That leaves me with nothing more to say ;)

1 comment:

Petro said...

I'm not on the mailing list, and I'm new to rsyslog (as in I found out about it 12 hours ago) so forgive me for intruding, but in the reliablity case you're talking about you wind up with three cases:

1) Reciever commits log entry(ies) to a disk based queue, notifies sender recieved and then gets blown up by a 120mm rocket (I'm doing SA work in Baghdad. One rocket CAN ruin your whole datacenter. Not that it's happened here. Yet).

2) Reciever commits log entries to memory queue, notifies sender I got it, then gets hit by a mortar.

3) Reciever gets log, notifies sender I got it, SUCCEEDS in stuffing it into a database on a server up in Balad and then the Balad datacenter gets hit by a 120mm rocket.

If you want really reliable logging get two syslog servers in different legal jurisdictions with completely separate network paths to them.