Goal: eliminate Java garbage collection (JGC) completely.
Design 1: I want Order.java to use primitive fields only and avoid reference fields at all costs, so the total footprint of an Order is known in advance. Say it's 100 bytes. I will create 10M dummy Order instances, possibly scattered across the heap rather than adjacent as in C++, and hold their 10M addresses in an Order array: about 1GB for the Order objects plus 80MB for the array of 8-byte pointers.
(Note I reuse these Order instances in this object pool and never let them get garbage-collected.)
Then I need a few subscripts to identify the “activeRegion” of the ring, but what about released slots enclosed within it?
Timestamps will be ints; symbolIDs and clientIDs are ints; short ASCII strings will use 64-bit longs (8 characters per long); free-form strings must be allocated off-site :(
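Here is a minimal sketch of the Design 1 pool as a strict FIFO ring, assuming orders are released in the order they were acquired. The Order fields, the OrderRing name and its methods are all made up for illustration. Two subscripts (head and tail) mark the active region, and nothing is allocated after the constructor returns.

```java
// Hypothetical Order with primitive fields only (field names are illustrative).
final class Order {
    int symbolId;
    int clientId;
    int timestamp;
    int quantity;
    long price; // assumed fixed-point price

    void clear() {
        symbolId = 0;
        clientId = 0;
        timestamp = 0;
        quantity = 0;
        price = 0L;
    }
}

// Preallocated pool used as a FIFO ring. Every Order is created once, up front.
final class OrderRing {
    private final Order[] slots;
    private long head = 0; // sequence number of the oldest active slot
    private long tail = 0; // sequence number of the next slot to hand out

    OrderRing(int capacity) {
        slots = new Order[capacity];
        for (int i = 0; i < capacity; i++) {
            slots[i] = new Order();
        }
    }

    /** Hands out the next pooled Order, or null if the whole ring is active. */
    Order acquire() {
        if (tail - head == slots.length) {
            return null;
        }
        Order o = slots[(int) (tail % slots.length)];
        tail++;
        return o;
    }

    /** Releases the oldest active Order; returns false if nothing is active. */
    boolean releaseOldest() {
        if (head == tail) {
            return false;
        }
        slots[(int) (head % slots.length)].clear();
        head++;
        return true;
    }

    int activeCount() {
        return (int) (tail - head);
    }
}
```

This sketch can only release the oldest slot, so it does not answer the question about slots released in the middle of the active region.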
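A sketch of how 8 ASCII characters could fit into one 64-bit value, so a short string becomes a primitive field. AsciiPack, pack and unpack are hypothetical names, and the layout (first character in the highest byte, zero padding on the right) is just one possible choice.

```java
final class AsciiPack {
    /** Packs up to 8 ASCII characters into one long, first character in the highest byte. */
    static long pack(String s) {
        if (s.length() > 8) {
            throw new IllegalArgumentException("at most 8 characters");
        }
        long packed = 0L;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == 0 || c > 127) {
                throw new IllegalArgumentException("non-zero ASCII only");
            }
            packed |= ((long) c) << (56 - 8 * i);
        }
        return packed;
    }

    /** Unpacks into a caller-supplied buffer (length >= 8) and returns the character count. */
    static int unpack(long packed, char[] out) {
        int n = 0;
        while (n < 8) {
            int b = (int) ((packed >>> (56 - 8 * n)) & 0xFF);
            if (b == 0) {
                break; // zero byte marks the end of a short string
            }
            out[n++] = (char) b;
        }
        return n;
    }
}
```

Unpacking into a reusable char[] avoids creating a String on the hot path. With this layout, comparing two packed longs numerically should give the same order as comparing the strings character by character.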
Design 2a: To avoid the “scatter” and place the Order instances side by side, can we use a serialized byte array to represent one Order? Can we use one gigantic off-heap byte array to hold all Orders, eliminating the 80MB footprint? See Java off-heap memory.
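One way Design 2a might look, using a direct ByteBuffer from java.nio as the off-heap byte array. This is a sketch under assumptions: the OffHeapOrders class, the field offsets and the accessor names are invented, and only two of the fields are shown. An Order is then just a slot number, and each field lives at slot * RECORD_SIZE + offset.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class OffHeapOrders {
    static final int RECORD_SIZE = 100; // the 100-byte Order estimate

    // Hypothetical field offsets within one record.
    static final int SYMBOL_ID = 0;  // int
    static final int CLIENT_ID = 4;  // int
    static final int TIMESTAMP = 8;  // int
    static final int QUANTITY = 12;  // int
    static final int PRICE = 16;     // long

    private final ByteBuffer buf;
    private final int capacity;

    OffHeapOrders(int capacity) {
        this.capacity = capacity;
        // multiplyExact throws on int overflow; a single ByteBuffer is int-indexed.
        int bytes = Math.multiplyExact(capacity, RECORD_SIZE);
        this.buf = ByteBuffer.allocateDirect(bytes).order(ByteOrder.nativeOrder());
    }

    private int base(int slot) {
        if (slot < 0 || slot >= capacity) {
            throw new IndexOutOfBoundsException("slot " + slot);
        }
        return slot * RECORD_SIZE;
    }

    void setSymbolId(int slot, int v) { buf.putInt(base(slot) + SYMBOL_ID, v); }
    int symbolId(int slot)            { return buf.getInt(base(slot) + SYMBOL_ID); }

    void setPrice(int slot, long v)   { buf.putLong(base(slot) + PRICE, v); }
    long price(int slot)              { return buf.getLong(base(slot) + PRICE); }
}
```

10M records of 100 bytes come to 1,000,000,000 bytes, which still fits under the int index limit of one ByteBuffer. The JVM's direct-memory limit would probably need raising (for example with -XX:MaxDirectMemorySize) to allocate that much.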
Design 2b: https://blog.bramp.net/post/2015/08/26/unsafe-part-2-using-sun.misc.unsafe-to-create-a-contiguous-array-of-objects/ shows a contiguous array of java objects, like std::vector<MyObject>
Design 2c: https://www.ibm.com/support/knowledgecenter/en/SSYKE2_7.1.0/com.ibm.java.lnx.71.doc/user/packed_optimizing.html is a feature in the IBM JVM
A ring buffer is good if the object lifetimes are roughly equal, giving us FIFO behavior. This occurs naturally in market data or message-passing gateways. Otherwise, we may need a linked list (free list) of released slots in addition to a pair of subscripts to identify the active region.
It might be better to allocate a dedicated buffer for each thread to avoid contention. The drawback: one buffer may get exhausted while another stays unused.
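A sketch of the free-list part on its own, assuming slots are identified by int index. Instead of allocating linked-list nodes (which would bring garbage back), the list is threaded through a preallocated int array. SlotFreeList and its methods are hypothetical names, and the sketch is single-threaded and does not detect a double release.

```java
final class SlotFreeList {
    static final int NONE = -1;

    private final int[] next; // next[i] = the free slot that follows slot i, or NONE
    private int freeHead;     // first free slot, or NONE when the pool is exhausted
    private int freeCount;

    SlotFreeList(int capacity) {
        next = new int[capacity];
        for (int i = 0; i < capacity; i++) {
            next[i] = (i + 1 < capacity) ? i + 1 : NONE;
        }
        freeHead = (capacity > 0) ? 0 : NONE;
        freeCount = capacity;
    }

    /** Pops a free slot index, or returns NONE if every slot is in use. */
    int acquire() {
        if (freeHead == NONE) {
            return NONE;
        }
        int slot = freeHead;
        freeHead = next[slot];
        next[slot] = NONE;
        freeCount--;
        return slot;
    }

    /** Pushes a released slot back; any slot can be released in any order. */
    void release(int slot) {
        next[slot] = freeHead;
        freeHead = slot;
        freeCount++;
    }

    int freeCount() {
        return freeCount;
    }
}
```

The returned index can address either an Order[] pool or an off-heap record. The most recently released slot is reused first, which may help cache locality but gives up the FIFO ordering of the ring.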
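A rough sketch of the per-thread idea using ThreadLocal. PerThreadPool, Slab and the tiny SLOTS_PER_THREAD value are assumptions for illustration, and the slab here only hands out slot numbers without supporting release.

```java
final class PerThreadPool {
    static final int SLOTS_PER_THREAD = 4; // hypothetical, deliberately tiny

    /** One thread's private buffer; never shared, so no locking is needed. */
    static final class Slab {
        final long[] data = new long[SLOTS_PER_THREAD];
        private int used = 0;

        /** Returns the next unused slot, or -1 when this thread's slab is exhausted. */
        int acquire() {
            return (used < SLOTS_PER_THREAD) ? used++ : -1;
        }
    }

    // Each thread lazily gets its own Slab on first access.
    private static final ThreadLocal<Slab> LOCAL = ThreadLocal.withInitial(Slab::new);

    static Slab current() {
        return LOCAL.get();
    }
}
```

If one thread uses up its four slots, acquire() returns -1 for that thread even though another thread's slab may be completely idle, which is exactly the drawback above.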