The top answer to https://stackoverflow.com/questions/1950878/c-for-loop-indexing-is-forward-indexing-faster-in-new-cpus is concise. I think its specific observations may no longer be relevant in a few years, but the principles will be.
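- adjacent cache line (ACL) prefetcher: simple to understand, it fetches the neighbouring cache line along with the one that was requested
- the CPU can detect streams of memory accesses in either the forward or the backward direction and prefetch ahead of them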
Note that the L1/L2/L3 caches are considered part of the CPU even if some of them are physically outside the microprocessor.
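
To see whether direction matters on a given machine, a rough sketch like the following can help. It sums the same array once with ascending and once with descending indices and times each pass. The names `sum_forward`, `sum_backward`, `DirectionTimings` and `compare_directions` are made up for illustration, and a single timed pass is only a crude measurement, so treat the numbers as a hint rather than a result.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sum the elements from index 0 upward (ascending addresses).
std::uint64_t sum_forward(const std::vector<std::uint32_t>& v) {
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        total += v[i];
    }
    return total;
}

// Sum the elements from the last index downward (descending addresses).
// `i-- > 0` tests i before decrementing, so it is safe for unsigned i
// and for an empty vector.
std::uint64_t sum_backward(const std::vector<std::uint32_t>& v) {
    std::uint64_t total = 0;
    for (std::size_t i = v.size(); i-- > 0;) {
        total += v[i];
    }
    return total;
}

// Hypothetical result type: wall-clock time of each pass plus the sum.
struct DirectionTimings {
    double forward_ms;
    double backward_ms;
    std::uint64_t sum;
};

// Time one forward and one backward pass over n elements.
// Assumption: n should be much larger than the last-level cache for the
// prefetchers to matter; a careful benchmark would also repeat the runs
// and alternate the order of the two passes.
DirectionTimings compare_directions(std::size_t n) {
    std::vector<std::uint32_t> data(n);
    for (std::size_t i = 0; i < n; ++i) {
        data[i] = static_cast<std::uint32_t>(i & 0xFF);
    }

    using clock = std::chrono::steady_clock;
    const auto t0 = clock::now();
    const std::uint64_t fwd = sum_forward(data);
    const auto t1 = clock::now();
    const std::uint64_t bwd = sum_backward(data);
    const auto t2 = clock::now();

    // Both directions must visit exactly the same elements.
    assert(fwd == bwd);

    const std::chrono::duration<double, std::milli> forward = t1 - t0;
    const std::chrono::duration<double, std::milli> backward = t2 - t1;
    return DirectionTimings{forward.count(), backward.count(), fwd};
}
```

Calling `compare_directions` with something like `std::size_t{1} << 26` elements (about 256 MiB of `uint32_t`) and printing `forward_ms` and `backward_ms` gives a first impression. If the stream detection described above works in both directions, the two times should come out close to each other, although the compiler's optimisation level and vectorisation can affect them as well.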