Requirement: Each worker thread operates independently on its own symbols, never interfering. Admin thread may read/write all symbols.
My friend CSY’s design resembles ConcurrentHashMap — split a big “graph-container” into 32 independent sub-graphs, to be locked individually.
For my 1st design, I briefly considered one lock per symbol. I think 100,000 locks is fine if symbols are consecutive integers. I simply use a symbol as array index to locate the corresponding lock. Every worker thread and the admin thread need to acquire the lock for symbol X before updating the AVL tree of X.
— Here’s my 2nd design, based on a single global lock:
- Regular writer threads only checks, never acquires, the lock. If lock is in-use (i.e. taken by the admin thread) then wait in a 1-second-sleep loop.
- admin thread immediately and unconditionally takes the lock and wait for a grace period[1] before writing. I assume this “write” can take up to 1 minute.
- how to “check” the lock?
- TryLock solution [Design#1] — Just trylock and immediately unlock.
- LockFree solution [Design #2] — atomic boolean to be set only on the admin thread. I think this is one the simplest usage scenarios of atomic boolean. We are not even relying on the atomicity. We only need the visibility feature.
[1] This grace period is designed to be sufficient for a batch of updates on the worker thread. If we know a batch can write 100 orders and take X microsecond, then set grace period to X. We require worker routine to recheck the lock after each batch.
My friend CSY raised the concern that mid-stream the update could be suspended by kernel scheduler. I think in this case, a simple integer update could take ages, as in the GC stop-the-world scenario. So the design requires some realtime kernel support to guarantee timely response.
read-copy-update lockfree +! retry #RCU mentions a grace period used in RCU api
— Here’s a bigger lockfree design [Design #3]
- without realtime kernel
- Two global variables — in addition to the atomic boolean flag, I will add an atomic int “activeWorkerCnt”
- Two non-reentrant routines (to avoid incremental acquisition)