P245 of [[Optimizing Linux Perf]] (2005) points out this "intriguing" scenario. Here are the investigation steps to identify it:
First, we use _oprofile_ to identify the function (or functions) taking up the most application time. In other words, the process spends a large portion (I would imagine 5%+) of its CPU cycles in this function. The cycles could be spent in one lengthy invocation or in a million quick re-entries; either way, this is a hotspot. Then we run oprofile / valgrind (cachegrind) / kcachegrind on the same process and check whether the hot function generates a high cache-miss rate.
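The two steps above can be sketched as shell commands. This is a rough sketch from my reading of the oprofile and valgrind docs; `./app` is a placeholder for the process under investigation, and the exact output filename varies:

```shell
# Step 1: find the hotspot function(s) with oprofile
operf ./app              # profile one run of the application
opreport --symbols       # rank functions by CPU samples

# Step 2: check whether the hot function has high cache misses
valgrind --tool=cachegrind ./app   # simulate the cache hierarchy
cg_annotate cachegrind.out.<pid>   # per-function miss counts
# The same output file can be opened in kcachegrind for a GUI view.
```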
The punch line: the high cache misses could be the root cause of the observed CPU hogging. I assume the author has seen this in practice, but I'm not sure how rare or common the scenario is.
Some optimization guy in an HFT shop told me that main memory is now treated like IO, so cache misses are taken seriously. http://programmers.stackexchange.com/questions/142864/writing-low-latency-java mentions that "Cache misses are your biggest cost to performance. Use algorithms that are cache friendly."
By the way, an instruction cache miss is said to be worse than a data cache miss, since the pipeline front end stalls until the instructions arrive. My friend Shanyou said the same.