low-complexity topics #eg: jvm GC

java GC is an example of “low-complexity domain”. Isolated knowledge pearls. (Complexity would be high if you delve into the implementation.)

Other examples

  • FIX? slightly more complex when you need to debug source code. java GC has no “source code” for us.
  • tibrv set-up
  • socket programming? relatively small number of variations and combinations.
  • stateless feed parser against an exchange spec. Can become more complex when the code size increases.

java GC frequency and duration

Actually, GC overhead is more important than GC frequency or duration, except in low latency systems. This blog has many posts on overhead.


Could be Every 10 sec , as documented in my blog

–stop-the-world duration: (Concurrent collection probably doesn’t worry us as much.)

100 msec duration is probably good enough for most apps but too long for latency sensitive apps, according to my blog.


2 GC algos for latency^throughput

https://databricks.com/blog/2015/05/28/tuning-java-garbage-collection-for-spark-applications.html is a 2015 Intel blog.

Before the G1 algo, Java applications typically use one of two garbage collection strategies: Concurrent Mark Sweep (CMS) garbage collection and ParallelOld garbage collection.

The former aims at lower latency, while the latter is targeted for higher throughput. Both strategies have performance bottlenecks: CMS GC does not do compaction, while Parallel GC performs only whole-heap compaction, which results in considerable pause times.

For applications with real-time response, we generally recommend CMS GC; for off-line or batch programs, we use Parallel GC. In my experience, the 2nd scenario has less stringent requirements so no need to bother.

java collection in a static field/singleton can leak memory

Beware of collections in static fields or singletons. By default they are unbounded, so by default they pose a risk of unexpected growth leading to memory leak.

Solution — Either soft or weak reference could help.

Q: why is soft reference said to support memory/capacity-sensitive cache?
A: only when memory capacity becomes a problem, will this soft reference show its value.

Q: is WeakHashMap designed for this purpose?
A: not an ideal solution. See other posts about the typical usage of WHM

Q: what if I make an unmodifiable facade?
A: you would need to ensure no one has the original, read-write interface. So if everyone uses the unmodifiable facade, then it should work.

GC tuning: will these help@@

Q: have a small nursery generation, so as to increase the frequency of GC collections?

AA: An optimal nursery size for maximum application throughput is such that as many objects as possible are garbage collected by young collection rather than old collection. This value approximates to about half of the free heap.

Q: have maximum heap size exceeding RAM capacity.
A: 32-bit JVM won’t let you specify more than 4G even with 32 GB RAM. Suppose you use 64-bit JVM, then actually JVM would start and would likely use up all available RAM and starts paging.

suggest(hint) a GC run – CLR ^ JVM

Java’s gc() method is a suggestion/plea to JVM at run-time. (There’s no way to force an immediate GC cycle. I often call gc() some 4000 times in a few loops to coax the GC to start.)

CLR Collect() method with Optimized mode is exactly the same.

CLR Collect() method with Forced mode would force an immediate collection. No such feature in JVM.

All of these techniques are discouraged. When are they justified?

why java object always allocated on heap, again

Short answer #1: Because java objects are passed by reference.

Suppose a holder object (say a collection, or a regular object holding a field) holds a Reference to a student1 object. If student1 object were allocated on stack, and de-allocated when stack frame unwinds, the holder would hold a stray pointer. Java is allergic to and won’t tolerate a single stray pointer.

Java primitive entities are always passed by value (pbclone). No stray pointer – no pointer involved at all.

Short answer #2: all pbref entities live safer on heap.

How about c# and C++? Somewhat more complicated.

stop-the-world inevitable — either minor or major GC

Across All of Sun’s GC engines so far (2012), young generation (eden + survivors) algorithm is _always_ STW. The algo changed from single-threaded to parallel, but remains STW. Therefore, during a minor GC ALL application threads are suspended. Usually a short pause.

Across All of Sun’s GC engines so far (2012), oldgen is at best low-pause but _never_ no-pause. For the oldgen, there is _always_ some pause, due to an inevitable STW phase.

Note one of the oldgen algorithms (i.e. CMS) is mostly-concurrent, meaning the (non-concurrent) STW phase is a brief phase — low-pause. However, young gen algorithm has always been STW throughout, without any concurrent phase.

jvm stack ^ C stakck in JNI?

Background — we know the jvm heap and the C heap are separate. Among other things, GC can relocate and clean up stuff in jvm stack only.

Q: is the stack segment divided into the jvm stack and the C stack?

%%A: yes and no. Stack segment is naturally divided into individual call stacks. If there are 33 jvm stacks and 11 of them extend into JNI, and there are 5 threads created in C, then naturally a shared stack area and 2 unshared stack areas exist in the stack segment.

Even the heap can be divided per thread, if you realize your objects aren't shared across threads. I was told subheap-per-thread speeds up heap memory allocation, because the default allocator (malloc and derivatives) must synchronize to access the free list