Requirement: Each worker thread operates independently on its own symbols, never interfering. Admin thread may read/write all symbols.
My friend CSY’s design resembles ConcurrentHashMap — split a big “graph-container” into 32 independent sub-graphs, to be locked individually.
For my 1st design, I briefly considered one lock per symbol. I think 100,000 locks is fine if symbols are consecutive integers. I simply use a symbol as array index to locate the corresponding lock. Every worker thread and the admin thread need to acquire the lock for symbol X before updating the AVL tree of X.
— Here’s my 2nd design, based on a single global lock:
- Regular writer threads only checks, never acquires, the lock. If lock is in-use (i.e. taken by the admin thread) then wait in a 1-second-sleep loop.
- admin thread immediately and unconditionally takes the lock and wait for a grace period before writing. I assume this “write” can take up to 1 minute.
- how to “check” the lock?
- TryLock solution [Design#1] — Just trylock and immediately unlock.
- LockFree solution [Design #2] — atomic boolean to be set only on the admin thread. I think this is one the simplest usage scenarios of atomic boolean. We are not even relying on the atomicity. We only need the visibility feature.
 This grace period is designed to be sufficient for a batch of updates on the worker thread. If we know a batch can write 100 orders and take X microsecond, then set grace period to X. We require worker routine to recheck the lock after each batch.
My friend CSY raised the concern that mid-stream the update could be suspended by kernel scheduler. I think in this case, a simple integer update could take ages, as in the GC stop-the-world scenario. So the design requires some realtime kernel support to guarantee timely response.
read-copy-update lockfree +! retry #RCU mentions a grace period used in RCU api
— Here’s a bigger lockfree design [Design #3]
- without realtime kernel
- Two global variables — in addition to the atomic boolean flag, I will add an atomic int “activeWorkerCnt”
- Two non-reentrant routines (to avoid incremental acquisition)