After I had written my writeup on the rsyslog data race, I had another very useful discussion with Lorenzo Catucci. We touched some points that are not yet included in the write-up. I also explained some of the rsyslog internals that you probably find no other source yet. In short: I think this discussion is useful for anyone interested in more details. I am not sure when (if) I will be able to include it into the original post. So rather than lose it, I asked Lorenzo if he agreed to making it public and thankfully he did.
So here it is - very authentic and totally unedited, with all the typos and sometimes non-sequence of a chat discussion as we all like it. Have fun! Comments are appreciated.
Rainer: writeup finally finished: http://blog.gerhards.net/2009/01/rsyslog-data-race-analysis.html ;)
So here it is - very authentic and totally unedited, with all the typos and sometimes non-sequence of a chat discussion as we all like it. Have fun! Comments are appreciated.
Rainer: writeup finally finished: http://blog.gerhards.net/2009/01/rsyslog-data-race-analysis.html ;)
now lets hope for the best...
but with the barrier, I actually meant the real thing: http://en.wikipedia.org/wiki/Memory_barrier
it forces the cpu to in-order memory operations
I thought it is sufficient.. but obviously not
I didn't have the exact failure scenario on my mind this morning
that evolved during the analysis
and was clarified by the write-up
in fact I started a new lab in the midst of the writeup
which again proves the point that it pays to write things down
I didn't explain that all
rsyslog shall run great on uniprocessor
so the message mutex is enabled depending on whether it is needed or not
similarly, when the message is constructed, it is done in a single thread
the mutex is disabled while doing so
only after it is constructed, the object is finalized, which then (and only then) enables the mutex
so it is ok that part of the objhects lifetime it is not synchronized
maybe never, depending on the config
that gains quite some performance
even on MP, because of the construction phase
of course, it doesn't ease troubleshooting...
the message is first finaliuzed, then it is enqueued
from then on, the sequence is always the same (there is only a single call to do so)
well... is it.. good point
will check reality vs. theory tomorrow ;)
that's why I love discussion :)
Sent at 6:49 PM on Wednesday
Sent at 9:47 AM on Thursday
remember: with -d option active.
yeah... verification will be a long road
at some point we should switch back to production code
I missed to write that down
should add it
09:45:01 0 216 0.70 0.59 0.49
I know that it is only probable under high load
indeed
do you concur to this POV?
Sent at 9:54 AM on Thursday
This was overdone. If I reduced the concurrency, I was again able to reproduce. For every machine, there seems to be a mix that probably triggerst the failure scenario. I think this was one of the reasons why it was so hard to find. Not knowing the root cause, the bug appeared very random and was extremely hard to track down to a config/environment.
BTW: do you mind if I post a copy of this conversation on the blog? I will probably not be able to update the analysis today, but our conversation should be very helpful to everyone interested.
there is one mutex that guards each individual action. No action (output module) may be called concurrently - that's part of the module spec (for good reason with many modules)
so there is one mutex for each action and it is aquired before an action is entered and released when it is done.
The other mutex class guards each individual message. that is the one described in the analysis.
if two different action invocations process a message, they may concurrently try to modify the message (that depends on the actual config, most importantly on the templates used and order of action execution)
So any such modifcation - AFTER the message has been finalized (think yesterday evening) is guarded by the message mutex.
(side-note: if n is 1, it can not be used multiple times, and this is - I think - one of the cases where the message mutex is "turned off")
I needed to set stage ;)
with that design, all actions use exactly the same sequence to increment and decrement the reference counter.
so this will always be used in a consistent manner. That part of the logic was absolutely right. BUT.... as I wrote, the msgAddRef used atomics, while msgDestruct used the message mutex.
yes, exactly
exactly!
and this is what the final fix will look like: it'll check if atomics are avaiable and, if so, use them. If they are unavailable, we will use the mutex (which is much slower)
The problem is that we for now have just a indication, I wouldn't say its a confirmation yet.
indeed
my point is we do not have enough proof yet
but... we have a sound theory
plus we know for sure (as we can show a failure case) that this was a bug
so code has improved
in that sense, however, theory and experiment go together
and this indeed makes me feel good :)
I'll see that I get a new master branch together today that includes everything needed to address the failure scenario AND enabling atomics again.
I need to be careful as the atomics version is ugly and I need to verify I do everything needed in a clean version, too.
It would be excellent if you could then run that new master branch. And then we see what happens. I think a debug build would be good at first (as this was the repro environment) and once this works out well (a day or two without aborts should be sufficent), we can switch the production build.
In parall, I hope that others who experienced the problem will also try the new code.
so over the next couple of days, maybe weeks, we should get a good feeling if there still is an issue.
I have to admit it is quite complex code, so I wouldn't outrule there is something else...
so I think we should get some feedback from there, too. But the problem is that the problem occured very sporadic in that case. I'd say anything less than two weeks of undisturbed operation is not even an indication
I will also ask the folks on the rsyslog mailing list to apply patches .. as soon as new versions are out.
0 comments:
Post a Comment