kernel time > 25% – linux tuning

Here's a rule of thumb from P244 of [[optimizing linux performance]].

The “time” command can show this split. If kernel time exceeds 25% of total CPU time, it's excessive and warrants investigation. The investigation is relatively standard – use strace to rank the most time-consuming system calls.
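The same user-vs-kernel split can be read in-process. A minimal sketch using Python's resource module (Unix-only; the syscall-heavy workload below is my own toy example, not from the book):

```python
import os
import resource

# resource.getrusage reports CPU time split into user (ru_utime) and
# kernel/system (ru_stime) -- the same split the "time" command prints.
def kernel_time_fraction():
    ru = resource.getrusage(resource.RUSAGE_SELF)
    total = ru.ru_utime + ru.ru_stime
    return ru.ru_stime / total if total else 0.0

# A syscall-heavy toy workload to generate some kernel time:
for _ in range(50_000):
    os.getpid()

frac = kernel_time_fraction()
print(f"kernel time fraction: {frac:.0%}")
if frac > 0.25:
    print("over the 25% rule of thumb -- rank the syscalls with strace")
```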

strace, ltrace, truss, oprofile, gprof – random notes

[[optimizing Linux performance]] has usage examples of ltrace.
I think truss is the oldest and most well-known.
Q: what values do the others add?
truss, strace and ltrace all show function arguments, though pointed-to objects will not be “dumped”. (Incidentally, I guess apptrace has a unique feature to dump arguments of struct types.)
strace/ltrace are similar in many ways…
ltrace is designed for shared library tracing, but can also trace syscalls.
truss is designed for syscalls, but “-u” covers shared libraries.
oprofile — can measure time spent and hit rates on library functions

oprofile – phrasebook

root – privilege is required to start/stop the daemon, but the query tools don’t need root

dtrace – comparable. I think these two are the most powerful profilers on solaris and linux respectively.

statistical – sampling-based, so results can be partially wrong. Example – the call graph.

per-process – profiling is possible. I think the default is system-wide.

CPU counters – hardware counters drive the sampling. Therefore, impact on running apps is low – lower than “attachment” profilers.

userland – or kernel : both can be profiled

recompile – not required. Some other profilers (eg: gprof) require recompiling.

kernel support – must be compiled in.

oprofiled – the daemon. Note there’s no executable named exactly “oprofile”.

[[Optimizing Linux performance]] has detailed usage examples of oprofile. [[linux programmer’s toolbox]] has decent coverage too.

sharing a huge memory chunk between 2 processes

Based on P244 of [[linux sys programming]]

I am not quite sure about the use cases[1], but let's say a huge file needs to be loaded into memory by 2 processes. If read-only, then savings are possible by sharing the memory pages between the 2 processes. Basically, the 2 virtual address spaces map to the same physical pages.

Now suppose one of them – Process-A – needs to write to that memory. Copy-on-write takes place so Process-B isn't affected. The write is “intercepted” by the kernel, which transparently creates a copy of the page before committing the write to the new page.

The fork() system call is another user of the copy-on-write technique.
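A minimal sketch of the visible semantics after fork(), using os.fork (POSIX-only; the names are mine, not from the book). The child's write is private – the page copying itself happens transparently inside the kernel:

```python
import os

data = bytearray(b"original")     # lives in pages both processes share after fork

r, w = os.pipe()
pid = os.fork()
if pid == 0:                      # child: the write below gets a private copy
    os.close(r)
    data[0:8] = b"mutated!"
    os.write(w, bytes(data))      # report what the child sees
    os._exit(0)
else:                             # parent: its view is untouched
    os.close(w)
    child_view = os.read(r, 64)
    os.waitpid(pid, 0)
    print("child wrote :", child_view.decode())
    print("parent sees :", data.decode())
```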

[1] Use cases?

* perhaps a large library, where the binary code must be loaded into memory

* memory-mapped file perhaps
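By contrast, an explicitly shared mapping stays shared across the 2 processes – no copy-on-write. A sketch using Python's mmap module (on Unix, an anonymous `mmap.mmap(-1, size)` mapping is MAP_SHARED by default; POSIX-only example of my own):

```python
import mmap
import os

shared = mmap.mmap(-1, 4096)      # anonymous mapping, shared across fork
shared[:5] = b"hello"

pid = os.fork()
if pid == 0:                      # child writes into the same physical pages
    shared[:5] = b"HELLO"
    os._exit(0)

os.waitpid(pid, 0)
print(shared[:5].decode())        # parent observes the child's write
```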

algorithm efficiency at HFT shops

I was told Google engineers discuss algorithm efficiency every day, including at lunch time. I guess the intensity could be even higher at HFT shops.

I feel latency due to the software algorithm might be a small component of overall latency. However, the bigger portions – network latency, disk writes(?), serialization for transmission(?), … – may be unavoidable, so the only part we could tune might be the software algorithm.

Further, it's also possible that all the competitors are already using the same tricks to minimize network latency. In that case the competitive advantage is in the software algorithm.

I feel algorithm efficiency could be more permanent and fundamental than threading. If I compare 2 algorithms A1 and A2 and find A2 running at twice A1's speed, then no matter what threading or hardware solutions I use, A2 still beats A1.
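A toy illustration of that point (my own example, nothing like real HFT code): a linear scan (A1) vs a hash lookup (A2) over the same data – the asymptotic edge survives whatever constant-factor tuning A1 gets.

```python
import time

data = list(range(100_000))
needles = list(range(0, 100_000, 1000))   # 100 values, all present

def a1(xs, ns):                   # A1: linear scan, O(n) per lookup
    return sum(1 for n in ns if n in xs)

lookup = set(data)
def a2(xs, ns):                   # A2: hash lookup, O(1) per lookup
    return sum(1 for n in ns if n in lookup)

for f in (a1, a2):
    t0 = time.perf_counter()
    hits = f(data, needles)
    print(f"{f.__name__}: {hits} hits in {time.perf_counter() - t0:.4f}s")
```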

expertise demand in SG: latency imt quant/HFT

Q: which skill in Singapore looks set to grow in terms of salary, and less importantly, # of jobs?
A: latency.

Q: which fields in S'pore have a skill shortage based on your recent observations?
A: possibly latency. Not so obvious in the quant field – there are many real quants in Singapore. There is no shortage of generic c++/java.

Q: which fields in S'pore present an entry barrier in terms of tech?
A: threading, latency, data structure mastery (eg: iterator invalidation); green-field ground-up design in general; high-volume market data system design in particular.

Q: which fields in S'pore present an entry barrier in terms of domain knowledge?
A: FX (…..), P/Y conversion, duration, IRS pricing, bare-bones concepts of a lot of major derivatives.
A: I feel a lot of employers want relevant biz experience. That experience amounts to a few weeks (or months) of on-the-job learning, but the lack of it casts a long shadow.