I have put my conference paper "rsyslog: going up from 40K messages per second to 250K" online. It provides a good drill-down into what we did during the first performance tuning effort. Hopefully, it is also useful to other developers who are converting single-threaded userland applications to multi-threaded ones.
I focus not only on what we did well, but also provide quite a bit of insight into where we (I!!) failed.