P245 of [[Optimizing Linux Perf]] (c2005) points out this “intriguing” scenario. Here are the investigation steps to identify it:
First, we use _oprofile_ to identify the function (or functions) taking up the most application time. In other words, the process is spending a large portion (I would imagine 5%+) of its CPU cycles in this function. The cycles could be spent in one lengthy call or a million quick calls; either way, this is a hotspot. Then we run oprofile / valgrind (cachegrind) / kcachegrind on the same process, and check whether the hot function generates a high cache-miss rate.
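As a rough sketch of those two steps on the command line (a how-to fragment, not from the book; `operf`/`opreport` are the modern oprofile front-ends rather than the 2005-era `opcontrol`, and `./myapp` is a placeholder for the process under investigation):

```shell
# Step 1: find the hot function -- opreport ranks functions by CPU samples
operf ./myapp
opreport --symbols

# Step 2: check whether that hot function also misses the cache heavily
valgrind --tool=cachegrind ./myapp
cg_annotate cachegrind.out.<pid>   # per-function D1/LL read+write miss counts
```

If the function that tops the opreport sample ranking also dominates the cachegrind miss columns, you have likely found the scenario the book describes.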
The punch line: the high cache misses could be the root cause of the observed CPU hogging. I assume the author has experienced this himself, but I'm not sure how rare or common the scenario is.
Some optimization guy in an HFT shop told me main memory is now treated like IO, so cache misses are taken seriously. http://programmers.stackexchange.com/questions/142864/writing-low-latency-java mentions that “Cache misses are your biggest cost to performance. Use algorithms that are cache friendly.”
By the way, an instruction cache miss is generally worse than a data cache miss: the pipeline has nothing to execute until the instructions arrive, whereas out-of-order execution can often overlap a data miss with other work.