Monday, June 15, 2009

high-peformance, low-precision time API under linux?

This time, I raise a question in my blog. Suggestions, tips and full answers are very welcome.

In rsyslog, there are various situations where I only need low resolution timestamps. With low resolution, I precise within a second. Of course, this thing is provided by the time() API. However, time() is very slow - far too slow for many things I do in rsyslog. So far, I have been able to work around this problem by doing a time() call only every n-th time where I run in tight loops and know that this will not bring me outside of me 1-second window (well, to be precise, this is at least very unlikely and thus acceptable).

However, this approach does not work for all work that I am doing. Now I am facing the challenge once gain, but this time in an area where the "query only n-th time" approach does not work. I need the time in order to schedule asynchronous activities (like writing so far unwritten buffers to disk). With them, there is no tight loop that provides me with some sense of timing, and so I simply do not know if half a second or half an hour has elapsed between calls - except when I do one of these costly time() calls.

A good work-around would be to define my own interval timer, awaking me e.g. every seconds. So I would not need absolute time but could do things based on these timer ticks. However, there is lot of evil in this approach, too: most importantly: this means rsyslogd will be active whenever the system is up, and running on a tick will prevent the operating system from switching the CPU to power saving modes. So this option looks very dirty, too.

So what to do now? Is there any (decently portable) way to get a second-resolution current timestamp (or a tick counter) without actually running on a tick?

If I don't find a better solution, I'll probably be forced to run rsyslogd on a tick, which would not be a good thing from a power consumption point of view.

As I already said, feedback is greatly appreciated...

Edit: in case my description was a bit unclear: it is not so important that the timestamp is of low resolution. Of course, I prefer higher resolution, but I would be OK with lower resolution if that is faster.

The problem with time() and gettimeofday() is that they are quite slow. As an example, I can only do around 250,000 time()/gettimteofday() calls per second on my current development system. So each API call takes around 4ms on that system. While this sounds much, it adds considerable runtime to each messages being processed - especially if multiple calls are required thanks to modular structure.

I have also thought about a single "lowres system time getter" inside rsyslog. However, that brings up problems with multi-threading. If one would like to be on the safe side, its entry points need to be guarded by mutexes, another inherently slow operation (depending on circumstances, overhead can be even worse then time()). With atomic operations, things may improve. But even then, we run into the issue that we do not know if the last call was half a second or half an hour ago...

Another edit:
This is a recording from a basic test I did on one lab system:

[rgerhards@rf10up tests]# cat timecaller.c

int main(int argc, char* argv[])
time_t tt;
struct timeval tp;
int i;

for(i = 0 ; i < atoi(argv[1]) ; ++i) {
// time(&tt);
gettimeofday(&tp, NULL);
[rgerhards@rf10up tests]# cc timecaller.c
[rgerthards@rf10up tests]# time ./a.out 100000

real 0m0.309s
user 0m0.004s
sys 0m0.294s

The runtime for the time() call is roughly equivalent (especially giving the limited precision of the instrumentation). Please also note that we identified the slowness of the time() calls in autumn 2008, when doing performance optimization with the help of David Lang. David was the first to point to the time-consuming time() calls in strace. Reducing them made quite a difference.

Since them, I try to avoid time() calls at all costs.


SEJeff said...


You can call out to x86 ASM and use rdtsc() to get what you're wanting. I remember someone talking about the HPET writing to shm and then userspace reading from that somehow. In practice I've always found gettimeofday() calls much slower using HPET over TSC.

SEJeff said...

Robert Horn said...

rdtsc() has two problems:
1) On multi-core systems, each core has it's own independent timer. You need to lock the process to the core before you can use rdtsc.
2) There is no way to reliably detect wrap around on the timer. If you have some other means of ensuring that wrap around is detected (e.g., a timed wait of 200ms) then you can use that timer.

On slow single processor systems found in some embedded devices, rdtsc is very effective. It's really fast and doesn't trigger an OS reschedule.

Robert Horn said...

Take a look at the itimer calls. I think that these have much less overhead. My recollection from the last time looking inside the code was that the time() code consumes a lot of resources creating the proper time_tm structure. The itimer calls work on raw counts.

You have to be somewhat careful. Itimer also triggers interrupts that will abort system calls. The usual timing technique was to set an itimer for some reasonably long period like 10 seconds. Each time some incoming event happens, you get the itimer value to find the elapsed time, then set the itimer back up to 10 seconds. You manage time yourself.

If things get quiet, the itimer goes off and needs to be re-started, but that's not too big a deal. Every so often, you need to readjust the self-managed time to correct for ticks that get dropped while you are doing the get/set itimer stuff.

Most Unix family OS have itimer. You'll need to check how their performance varies

simplifying rsyslog JSON generation

With RESTful APIs, like for example ElasticSearch, you need to generate JSON strings. Rsyslog will soon do this in a very easy to use way. ...