inline perf can Backfire ! #Google#i-cache

As illustrated below, without inline, instruction cache system could hit ChangeOfFlow twice as it enters/exits your function aa(). If aa() is actually inlined and embedded in a hostFunc, then the instruction cache system can often load entire hostFunc, eliminating COF. This helps instruction cache, but excessive inlining can increase executable footprint (code bloat).

google c++ guide points out that

  • inline can either increase or decrease (for tiny functions) executable footprint. In general, smaller footprint improves running time due to instruction cache efficiency
  • virtual functions are inlined (i.e. defined in class body) primarily for convenience/maintainability, not performance

See also https://www.eetimes.com/document.asp?doc_id=1275470 and https://www.eetimes.com/document.asp?doc_id=1275474

As an example, consider the function call tree flow in Figure 1. Suppose function F2 is linked near function F1, but function F3 is not linked near F1. When function F1 calls F2, it is possible that F2 is already in the cache and that there will be no cache miss. (The likelihood that F2 is in the cache depends on the sizes of F1 and F2, as well as the location of the call inside F1.) In contrast, there will probably be a cache miss when F1 calls F3. Because F3 is located far away from F1 in memory, it is not likely to be in the cache when F1 calls it.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s