condition.signalAll Usually requires locking

Across languages,

  • notify() doesn’t strictly need the lock;
  • wait() always requires the lock.
    • c++ wait() takes the lock as argument
    • old jdk uses the same object as lock and conditionVar
    • jdk5 makes the lock beget the conditionVar

—- original jdk
100% — notify() must be called within synchronized block, otherwise exception. See

—- jdk 5 onwards
99% — says that An implementation may (and typically does) require the lock acquired by the notifying thread.

Not 100% strict.

— compare await():
100% — On the same page, the await() javadoc is 100% strict on this requirement!

0% — notify_all() method does NOT require holding the lock.  See

–compare wait()
100% — wait(myLock) requires a lock as argument!

100% — Similar to old JDK, the notifying thread must hold the lock. See


read-copy-update lockfree without retry

RCU is an advanced concurrency construct, not implemented in a regular application, but valuable for lock-free interviews. is a simple java implementation

The Wikipedia article is accessible until the paragraph introducing read-side-critical-section and grace-period. This is the first paragraph on the implementation. I found it hard. Therefore a few background pointers:

· There must be multiple threads, typically many readers and few writers.

· There must a shared mutable data structure, probably on heap.

· Not all data structures are suitable for RCU. So which subset are? I would say pointer-graph including hash tables.

In the simplified model, we are talking about

· A reader thread R1 executing a block of code that has started reading the shared mutable data structure. This code block is the critical section

· A writer thread W1 executing a block of code attempting to update the same data.

· After the so-called grace period, a GC thread would reclaim the obsolete data that R1 has finished reading.

· A reader R2 enters the critical section after the update is done. R2 would see the new data

GC need to know when it’s safe to reclaim. It needs the wait-for-readers operation and the grace period.

Q: What if R1 gets a reference to the “obsolete node”, performs some slow task, reads the node, then exits the critical section? This node is reclaimable only after R1 exits critical section. The grace period would be long?

Q: Among 9999 nodes, how does system keep track which each node is reclaimable?

%%A: I feel kernel access to CPU registers might be needed.

Q: How does it compare with copy-on-write?

A: COW probably copies the entire data structure. Also, the modified copy is not visible to subsequent readers (like R2)

Q: How does it compare with read/write lock?

A: readlock can block

Q: How does it relate to lock-free concepts?

A: reader threads are lock-free and even better — not required to retry.

A GC threads need to wait.

A: Writer thread (like W1) ? not sure. Might be lock-free in some situations.

where are jvm locks stored@@ #cf Windows

A 2017 Morgan Stanley on-site interviewer asked…

  • %%A: can’t be on stack, since multiple threads need access to it
  • %%A: probably kernel objects. On Windows, most of the synchronization objects are wrappers over kernel objects. See [[Optimized c++]] P288.
    • If java threads maps to kernel threads then I would think the same.
    • If java threads are implemented in userland, then kernel scheduler is unaware and can’t use kernel objects to control them. I would think JVM must have its own mutex objects without kernel support. explains that only embedded or low-power devices use “green threads” i.e. userland threads. Linux and Windows JVM probably don’t support green threads, since the kernel provide “better” thread support.

jargon: mutex^lock^monitor

All these terms appear in function names and class names. They must have well-defined technical meanings. Problem is, each platform has its own definitions. [[Optimized c++]] P288 singled out “Semaphore” jargon as rather different animals across operation systems.

–In Java

  • Monitor refers to the “combo” object serving as both lock and condition variable. Java 5 introduced an alternative family of objects, where a lock object gives birth to condition objects. Most java developers still use the simpler combo object.
  • I feel lock vs mutex mean the same.

–in c++

  • Lock is an RAII object, based on a mutex.
  • C++ doesn’t use “monitor” jargon as far as I know.

–On Win32


readLock without writeLock – same as no lock@@

Q9: With regard to shared-mutable access semantics, using a readLock and discarding the writeLock (i.e. unable to use it) is similar to using no lock, but are they semantically the same?

This is a theoretical discussion. The implementations are not familiar to me and not relevant. Before we answer  Q9, let’s look at

Q1: in the absence of shared-mutable, do we need any lock?
A1: no

Q2: what if we use readLock without writeLock?
A2: should be fine.

Q3″ what if we do use a lock in a shared function?
A3: system behavior is modified — serialized access, which is unnecessary

A9: should be semantically identical. Everyone gets a free ticket at entrance.

mutex^condition top2constructs across platforms#except win32 is one more system programming manual that asserts the fundamental importance of mutex and condition variable.

NSPR started with java in mind but primary application is supporting clients written entirely in C or C++.

I said many times in this blog that most thread synchronization features are built on top these duo.

However, on Windows the fundamental constructs are locks and waitHandles…

more threads won’t help throughput if I/O bound

To keep things more concrete. You can think of the output interface in the I/O.

The paradox — given an I/O bound busy server, the conventional wisdom says more thread would increase CPU utilization but the work queue for CPU is already pretty much empty, whereas the I/O queue is constantly full, as the I/O subsystem keeps working at full capacity.

Ideal is simultaneous saturation. Suggestion: We could “steal” a cpu core from this engine and use it for other tasks. Additional threads or processes basically achieve that purpose. In other words, the cpu cores aren’t dedicated to this purpose.

Assumption — adding more I/O hardware is not possible. (Perhaps scaling out to more nodes could help.)

If the CPU cores are dedicated, then there’s no way to improve things without adding more I/O capacity. There’s simply too much CPU overcapacity.