I found an interesting post "Algorithm Performance in Academic Papers and Practice" by fellow security blogger Steve Burnett on the SecurityBloggersNetwork.
Steve questions whether theoretical performance gains, often given based on big O notation, can really be realized in practice. He claims they often can not.
I tend to agree to his point, at least in broad terms. One must first remind oneself that big O notation is quite coarse, it tells you about the potential extremes. In practice, however, these extreme cases may routinely never hit. Even further, I think it depends very much on what the algorithm in question is actually doing. If it is "just" doing calculations on its own, theoretical performance benefits can much better be predicted than if there is any external reference.
The obvious case is code that needs to do some library or operating system calls inside the algorithm. A good example is the performance optimization I and David Lang did on rsyslog in fall of 2008. Here, and I have to admit partly to my surprise, it turned out that optimizing rsyslog algorithms actually had almost no effect in boosting the daemon's performance. Why? Simply because a hand full of system calls, many time-related, used up the majority of execution time. So rather than optimizing the algorithms used, we optimized out OS calls and that had a very big effect (and even after that initial optimization, there is still much room for improvement just by looking at the APIs). Of course, this is an extreme sample, because usual syslog server processing is not at all computational and the frequent context switches themselves are performance intense. The picture for a graphics application is probably totally different.
However, many less obvious cases exist, and I guess lots of them have to do with the fact that multiple processes and/or thread are executed. So resources are shared. On such a system, for example, theoretical performance gains may be lost due to task switches which purge vital data off the CPU cache. Even worse, a theoretically optimized algorithm may require additional main memory, which may, in practice, force cache purges because the cache size now is insufficient. Funny, eh?
Wrap-up: big O notation is useful, but for practical needs, it needs to be taken with a grain of salt. For real world deployments, actual performance testing is quite important. And as a side-note, test results on a dedicated system may be quite different from practical performance on a system where other things are also executed...