[[java performance]] by Scott Oaks

–[[java performance]] by Scott Oaks


best of breed..see chapter details on

[jvm] heap memory

[jvm] threading

[jvm] instrumentation



lambda, stream  (java 8 interviews!)


The Introduction chapter outlines 3 broad aspects

* JVM – like memory tuning

* java language – like threading, collections

* Java API — like xml parser, JDBC, serialization, Json


JVM tuning is done by “system engineers” who may not be developers.



which socket/port is hijacking bandwidth

I guess some HFT machine might be dedicated to one (or few) process, but in general, multiple applications often share one host. A low latency system may actually prefer this, due to the shared memory messaging advantage.  In such a set-up, It’s extremely useful to pinpoint exactly which process, which socket, which network port is responsible for high bandwidth usage.

Solaris 10? Using Dtrace? tough? See [[solaris performance and tools]]

Linux? doable

# use iptraf to see how much traffic flowing through a given network interface.
# given a specific network interface, use iptraf to see the traffic break down by individual ports. If you don’t believe it, [[optimizing linux perf ]] P202 has a iptraf screenshot showing the per-port volumes
# given a specific port, use netstat or lsof to see the process PID using that port.
# given a PID, use strace and /proc/[pid]/fd to drill down to the socket (among many) responsible for the traffic. Socket is seldom shared (see other posts) between processes. I believe strace/ltrace can also reveal which user functions make those socket system calls.

empty while(true) loop hogging CPU

Someone (barclays?) gave me a basic tuning quiz —

Q: what cpu utilization will you see if a program executes an empty while(1==1){} ?

A: On a 16-core machine, 3 instances of this program each take up 4% of the aggregate CPU according to windows taskmgr. I think 4% means hogging one entire core.

A: on my RedHat linux, the same program has 99.8% CPU usage meaning one entire core.

A: On a dual-core, if I start 2 instances, each takes up about 50% i.e. one entire core, keeping all other processes off both cores. With 3 instances, system becomes visibly slow. Taskmgr itself becomes a bit hard to use and reports about 30%-40% by each instance.

I think this would count as cpu intensive job.

let’s find out What the system is doing

A real, practical challenge in a low-latency, market-data system is to quickly find out “What’s the system doing”. Log files usually have a lot of details, but we also want to know what files/sockets our process is accessing, what kind of data it is reading and writing.

truss -s or -r can reveal actual data transferred(??)

if write() syscall is stuck, then perhaps disk is full.

lsof reads /proc to get open sockets/files