label – tuning
Here's a rule of thumb from P244 [[optimizing linux performance]].
The “time” command can show the split between user and kernel time. If kernel time exceeds 25% of total CPU time, it's excessive and warrants investigation. The investigation is relatively standard – use strace (e.g. strace -c) to rank the most time-consuming system calls.
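As a rough sketch of that 25% rule (my own illustration, not from the book), os.times() in Python exposes the same user/kernel CPU split that the “time” command reports:

```python
import os

# Generate some kernel time (system calls) and some user time
# (pure computation), then compute the kernel-time fraction
# that the 25% rule of thumb refers to.
for _ in range(100_000):
    os.urandom(16)              # getrandom() syscalls -> kernel time

total = sum(i * i for i in range(3_000_000))  # computation -> user time

t = os.times()
user, kernel = t.user, t.system
busy = user + kernel
kernel_fraction = kernel / busy if busy else 0.0
print(f"user={user:.2f}s kernel={kernel:.2f}s kernel fraction={kernel_fraction:.0%}")
```

If the printed fraction were persistently above 25%, the rule says to reach for strace next.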
root – privilege required to start/stop the daemon, but the query tools don’t need root
dtrace – comparable. I think these two (oprofile and dtrace) are the most powerful profilers on Solaris/Linux.
statistical – sampling-based, so results can be partially wrong. Example – the call graph.
Per-process – profiling is possible, though I think the default is system-wide.
CPU counters – uses the hardware performance counters, so the impact on running apps is low, lower than “attachment” profilers.
userland or kernel – both can be profiled.
recompile – not required, unlike some other profilers (e.g. gprof needs -pg instrumentation).
kernel support – must be compiled in.
oprofiled – the daemon. Note there’s no executable named exactly “oprofile”.
[[Optimizing Linux performance]] has detailed usage examples of oprofile. [[linux programmer’s toolbox]] has decent coverage too.
Based on P244 [[linux sys programming]]
I am not quite sure about the use cases, but suppose a huge file needs to be loaded into memory by 2 processes. If it's read-only, savings are possible by sharing the memory pages between the 2 processes: the 2 virtual address spaces map to the same physical pages.
Now suppose one of them (Process-A) needs to write to that memory. Copy-on-write takes place so Process-B isn't affected: the write is “intercepted” by the kernel, which transparently creates a copy of the page before committing the write to the new copy.
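A minimal demonstration of this (my own sketch, Unix-only, not from the book) uses a MAP_PRIVATE file mapping, which is copy-on-write: the first store copies the page, and the file on disk stays untouched:

```python
import mmap
import os
import tempfile

# Sketch: MAP_PRIVATE gives a copy-on-write mapping, so writes go
# to private page copies and never reach the underlying file.
fd, path = tempfile.mkstemp()
os.write(fd, b"original")

m = mmap.mmap(fd, 8, flags=mmap.MAP_PRIVATE)
m[0:8] = b"modified"            # first write -> kernel copies the page

with open(path, "rb") as f:
    file_content = f.read()     # still b"original" on disk

print(m[:8].decode(), "/", file_content.decode())  # modified / original
m.close()
os.close(fd)
os.unlink(path)
```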
The fork() system call is another user of the copy-on-write technique.
 Use cases?
* perhaps a large library, where the binary code must be loaded into memory
* memory-mapped file perhaps
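To illustrate the fork() case, here's a hypothetical sketch (Unix-only, my own example): after fork(), parent and child share physical pages until one of them writes, and the child's write is served from its own copied page, leaving the parent's data unchanged:

```python
import os

# Sketch of fork()-driven copy-on-write: the child's write below
# lands on a private copy of the page, not the parent's page.
data = bytearray(b"shared")
pid = os.fork()
if pid == 0:                   # child process
    data[0:6] = b"child!"      # triggers copy-on-write
    os._exit(0)
os.waitpid(pid, 0)             # parent process
print(data.decode())           # still "shared"
```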
I was told Google engineers discuss algorithm efficiency every day, including at lunch time. I guess the intensity could be even higher at HFT shops.
I feel latency due to the software algorithm might be a small component of overall latency. The bigger portions may be unavoidable – network latency, disk write(?), serialization for transmission(?), … – so the only part we could tune might be the software algorithm.
Further, it's also possible that all the competitors are already using the same tricks to minimize network latency. In that case the competitive advantage is in the software algorithm.
I feel algorithm efficiency could be more permanent and fundamental than threading. If I compare 2 algorithms A1 and A2 and find A2 running at twice A1's speed, then no matter what threading or hardware solutions I apply, A2 still beats A1.
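As a toy illustration of that claim (my example, not the author's): membership in a list is O(n) while membership in a set is O(1), and at any realistic size the asymptotically better structure wins on any machine:

```python
import timeit

# Compare O(n) list scan vs O(1) set lookup for the same membership test.
n = 100_000
as_list = list(range(n))
as_set = set(as_list)
target = n - 1                  # worst case for the linear scan

t_list = timeit.timeit(lambda: target in as_list, number=200)
t_set = timeit.timeit(lambda: target in as_set, number=200)
print(f"list: {t_list:.4f}s  set: {t_set:.4f}s  ratio: {t_list / t_set:.0f}x")
```

Faster hardware shrinks both numbers but not the ratio, which is the sense in which the algorithmic edge is "permanent".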
Important authors often describe key techniques in memory efficiency as well as speed efficiency.
Trading is mostly about speed. Memory is important in other domains.
Q: which skill in Singapore looks set to grow in terms of salary, and less importantly, # of jobs?
Q: which fields in S'pore have a skill shortage based on your recent observations?
A: possibly latency; not so obvious in the quant field, since there are many real quants in Singapore. There's no shortage of generic C++/Java.
Q: which fields in S'pore present an entry barrier in terms of tech?
A: threading, latency, data structure mastery (eg: iterator invalidation); green_field_ground_up design in general; high-volume market data system design in particular.
Q: which fields in S'pore present an entry barrier in terms of domain knowledge?
A: FX (…..), P/Y conversion, duration, IRS pricing, bare-bones concepts of a lot of major derivatives.
A: I feel a lot of employers want relevant biz experience. That experience amounts to just a few weeks (or months) of on-the-job learning, but the lack of it casts a long shadow.
800+ G4 Linux machines (HP), 4-8 cores / 16G RAM each, 32-bit OS.
Later consolidated to
400+ G7 Linux machines, 32 cores / 144G RAM each, 64-bit OS.
Roughly 6 of these machines are dedicated Coherence machines.
The database is not part of this market data and real-time risk engine.
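As a back-of-envelope check on the consolidation (my own arithmetic, using the upper quoted figures; the real counts are only "800+"/"400+"), the aggregate capacity roughly doubled in cores and more than quadrupled in RAM:

```python
# Rough aggregate capacity before and after the consolidation,
# taking 8 cores/16G per G4 box and the quoted G7 specs.
g4 = {"machines": 800, "cores_per_box": 8, "ram_gb_per_box": 16}
g7 = {"machines": 400, "cores_per_box": 32, "ram_gb_per_box": 144}

totals = {}
for name, spec in (("G4", g4), ("G7", g7)):
    totals[name] = (
        spec["machines"] * spec["cores_per_box"],
        spec["machines"] * spec["ram_gb_per_box"],
    )
    print(f"{name}: ~{totals[name][0]} cores, ~{totals[name][1]} GB RAM")
# G4: ~6400 cores, ~12800 GB RAM
# G7: ~12800 cores, ~57600 GB RAM
```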