— https://en.wikipedia.org/wiki/Hash_table#Separate_chaining_with_other_structures is a language-neutral discussion.
— https://digitalmars.com/d/archives//640.html#N803 shows an early implementation: a hashed array of binary trees. My experience with them is that they are very efficient.
— JEP 180 is the white paper that introduced a self-balancing tree as an alternative to the linked list.
https://dzone.com/articles/hashmap-performance shows that with one million entries in a HashMap, a single lookup takes 20 CPU cycles, or less than 10 nanoseconds. Another benchmark demonstrates O(logN) get(key) in java8, but O(N) in java7, as traditionally known.
If two hashCode() values are different but end up in the same bucket (due to rehashing and bucketing), one is considered bigger and goes to the right. If hashCodes are identical (as in a contrived hashCode() implementation), HashMap hopes that the keys are Comparable, so that it can establish some order. Comparable is not a requirement for HashMap keys.
If hashCodes are mostly identical (rare) and keys are not comparable, don’t expect any performance improvement under heavy hash collisions. Here’s my analysis of this last case:
- if your Key.equals() is based on address and hashCode() is mostly the same, then the RBTree ordering can and should use the address (in practice, System.identityHashCode()). You won’t be able to look up using a “clone” key.
- if you have customized Key.hashCode() then you ought to customize equals(); but suppose you don’t implement Comparable — you are still allowed to look up using a clone key. Since there’s no real ordering among the tree nodes, the only way to look up is running equals() on every node.
A given bucket contains both Node (linked list) and TreeNode (red-black tree). Oracle decided to use both data structures with the following rules:
- If a given index (bucket) in the inner table holds more than 8 nodes, the linked list is transformed into a red-black tree
- If a given bucket shrinks to 6 or fewer nodes, the tree is transformed back into a linked list
With the self-balanced tree replacing the linked list, worst-case lookup, insert and delete are no longer O(N) but O(logN), provided the keys are mutually Comparable.
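A minimal sketch of the collision scenario above, using my own hypothetical CollidingKey class (not from the JDK). Every key returns the same hashCode(), forcing all entries into one bucket; because the key is Comparable, the java8 HashMap can still treeify the bucket and serve lookups via the red-black tree:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical key with a deliberately constant hashCode(), so every entry
// lands in the same bucket. Comparable gives the RBTree a real ordering.
public class CollidingKey implements Comparable<CollidingKey> {
    final int id;
    CollidingKey(int id) { this.id = id; }

    @Override public int hashCode() { return 42; }  // contrived: all keys collide
    @Override public boolean equals(Object o) {
        return o instanceof CollidingKey && ((CollidingKey) o).id == id;
    }
    @Override public int compareTo(CollidingKey other) {
        return Integer.compare(id, other.id);       // ordering for the tree bin
    }

    public static void main(String[] args) {
        Map<CollidingKey, Integer> map = new HashMap<>();
        for (int i = 0; i < 10_000; i++) map.put(new CollidingKey(i), i);
        // lookup with a "clone" key still works, via compareTo + equals
        System.out.println(map.get(new CollidingKey(9_999))); // prints 9999
    }
}
```

Without the compareTo(), the same program still works but the tree bin has to fall back on identity-hash tie-breaking plus equals() scans.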
This technique, albeit new, is one of the best simple ideas I have ever seen. Why has nobody thought of it earlier?
Need to develop more low-level insights as QQ…
- Out of the four kinds of method refs, only AA) static-method and BB) specific-instance kinds are popular. The other two types are obscure.
- q[ :: ] is used in all four kinds
- I think the static-method kind is most readable and most intuitive. The javadoc tutorial features a BB example that should be implemented as a static method IMHO.
- a lambda expression has parameter types; a method ref has none and must be converted to a lambda expression. Where does the compiler infer the parameter types from? I think they are inferred from the calling context.
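For my own reference, here is one example of each of the four kinds. AA and BB follow the labels above; the other two (obscure) kinds are the “arbitrary-instance” and constructor references. The target types are standard java.util.function interfaces:

```java
import java.util.function.Function;
import java.util.function.Supplier;

public class MethodRefKinds {
    public static void main(String[] args) {
        // AA) reference to a static method
        Function<String, Integer> parse = Integer::parseInt;

        // BB) reference to an instance method of a specific instance
        String greeting = "hello";
        Supplier<Integer> len = greeting::length;

        // CC) reference to an instance method of an arbitrary instance of a
        // type -- the first inferred parameter becomes the receiver
        Function<String, String> upper = String::toUpperCase;

        // DD) reference to a constructor
        Supplier<StringBuilder> maker = StringBuilder::new;

        System.out.println(parse.apply("42"));  // 42
        System.out.println(len.get());          // 5
        System.out.println(upper.apply("abc")); // ABC
    }
}
```

In every case the parameter types come from the target functional interface, which supports the “calling context” guess above.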
Before java8, a key type used in a HashMap was not required to be Comparable, but now it should be if hash collision is likely. See RBTree used inside java8 hashmap
Note cgroup is also usable beyond JVM and Docker, but I will just focus on a JVM running in a Docker container.
Based on https://jaxenter.com/nobody-puts-java-container-139373.html
CPU shares are the default CPU isolation (or sharing??) mechanism and basically provide a priority weighting across all CPU time slots across all cores.
The default weight of any process is 1024, so if you start a container as follows q[ docker run -it --rm -c 512 stress ] it will receive fewer CPU cycles than a default process/container.
But how many cycles exactly? That depends on the overall set of processes running at that node. Let us consider two cgroups A and B.
sudo cgcreate -g cpu:A
sudo cgcreate -g cpu:B
sudo cgset -r cpu.shares=768 A # cgroup A: 75%
sudo cgset -r cpu.shares=256 B # cgroup B: 25%
Cgroup A has CPU shares of 768 and B has 256. That means that if nothing else is running on the system, A is going to receive 75% of the CPU time and B the remaining 25%.
If we remove cgroup A, then cgroup B ends up receiving 100% of the CPU time.
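The arithmetic behind those percentages is simply each group’s share divided by the sum of shares across all runnable groups. A tiny sketch (my own helper, purely illustrative):

```java
// Hypothetical helper: a cgroup's CPU entitlement is its cpu.shares value
// over the sum of cpu.shares across all runnable cgroups on the node.
public class CpuShares {
    static double entitlement(int ownShares, int... otherShares) {
        int total = ownShares;
        for (int s : otherShares) total += s;
        return (double) ownShares / total;
    }

    public static void main(String[] args) {
        System.out.println(entitlement(768, 256)); // A vs B -> 0.75
        System.out.println(entitlement(256, 768)); // B vs A -> 0.25
        System.out.println(entitlement(256));      // B alone, after A is removed -> 1.0
    }
}
```

This also shows why shares are a relative weighting, not a hard cap: the denominator changes as cgroups come and go.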
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/resource_management_guide/sec-cpu has more precise details.
https://scoutapp.com/blog/restricting-process-cpu-usage-using-nice-cpulimit-and-cgroups compares q[ nice ], cpulimit and cgroups. It provides more precise info on cpu.shares.
cpulimit can be used on an existing PID 1234:
cpulimit -l 50 -p 1234 # limit process 1234 to 50% of CPU time slots; the remaining slots can go to other processes or go to waste
- #1 Most important – modular jars featuring declarative module-descriptors i.e. requires and exports
- #2 linux cgroup support. For one example, see Docker/java9 cpu isolation/affinity
- #3 G1 becoming the default JGC; CMS JGC deprecated in java9
- REPL JShell
- private interface methods, either static or non-static
- Minor: C++11 style collection factory methods like
List<String> strings = List.of("first", "second");
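The same factory style works for Set and Map. A quick sketch; note the returned collections are immutable, so add() or put() throws UnsupportedOperationException (requires java9+):

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

public class FactoryDemo {
    public static void main(String[] args) {
        List<String> strings = List.of("first", "second");
        Set<Integer> ints = Set.of(1, 2, 3);
        Map<String, Integer> ages = Map.of("alice", 30, "bob", 25);

        // all three are immutable views
        System.out.println(strings.size() + " " + ints.size() + " " + ages.get("bob"));
    }
}
```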
It’s unbelievable but not uncommon in Java history —
- Java9 release introduced significantly fewer and less impactful features than java8.
- Similarly, java5 overshadows java6 and java7 combined
Recap — A QQ topic is defined as a “hard interview topic that’s never needed in projects”.
Background — I used to feel that as new versions of an old language get adopted, the QQ interview topics don’t change much. I can see that in java7, c#, perl6, python3.
To my surprise, compared to java7/8, c++0x has more disruptive impact on QQ questions. Why? Here are my guesses:
- Reason: low-level — c++ is more low-level than java, at least in terms of interview topics. Both java8 and c++0x introduced many low-level changes, but the java interviewers don’t care that much.
- Reason: performance — c++0x changes have performance impact, esp. latency impact, which is the hot focus of my target c++ employers. In contrast, java8 doesn’t have much performance impact, and java employers are less latency-sensitive.
- Reason: template — the c++0x feature set has a disproportionate amount of TMP features, which are very hard. No such “big rock” in java.
- move/forward, enable_if, type traits
Q: if that’s the case, for my career longevity, is c++ a better domain than java?
A: I’m still biased in favor of low-level languages
Q: is that a form of technology churn?
A: yes since the c++11 QQ topics are likely to show up less over the years, replaced by newer features.
https://jaxenter.com/nobody-puts-java-container-139373.html is a 2018 article with some concrete examples demonstrating cpu isolation.
a Docker cgroup can specify a cpu-set (like core0 + core3 + core14) and limit itself to this cpu-set. Performance motivation — preventing a process from hopping between cores.
The “cpu-set” scheme provides conceptually simpler CPU isolation, but it is less popular than the “cpu-share” scheme.
Java9 offers support for CPU isolation if you adopt the cpu-set scheme but not the cpu-share scheme, as explained succinctly in the article.
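For reference, the cpu-set scheme is exposed on the Docker command line via the --cpuset-cpus flag; the core list below just mirrors the example above (the stress image name is from the earlier cpu-share example):

```shell
# pin the container to cores 0, 3 and 14 (cpu-set scheme)
docker run -it --rm --cpuset-cpus="0,3,14" stress
```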
A historical note — In 2006 (Mansion/Strategem) I spoke to a Sun Microsystems consultant. An individual Solaris “zone” can specify which cpu core to use. That was my first encounter with CPU isolation/affinity.
I feel the most strategic changes in java9 are about cloud, such as
- Docker container
- memory footprint reduction
https://qconsf.com/sf2016/sf2016/presentation/java-se9-cloud.html is a 2016 article
When cloud computing becomes an even bigger driver than it is today, some legacy technologies will gain mind-share while others will lose mind-share. Java 9 team wants to be a winner.
Incidentally, NIO buffers also live in native memory.
Java9/10 default GC is G1. CMS is officially deprecated in Java 9.
Java8/7 default GC is ParallelGC (not CMS). See https://stackoverflow.com/questions/33206313/default-garbage-collector-for-java-8
Note parallelGC uses
- parallel collection in the young generation
- serial collection in the old generation
…whereas parallelOldGC uses parallel collection in all generations.
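A quick way to see which collectors the current JVM actually selected is the standard GarbageCollectorMXBean API (the exact names printed depend on the JVM and its flags):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class ShowGC {
    public static void main(String[] args) {
        // lists the young-gen and old-gen collectors chosen by this JVM,
        // e.g. "G1 Young Generation" / "G1 Old Generation" on a G1 JVM
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans())
            System.out.println(gc.getName());
    }
}
```

Running this under java8 (default) vs java9+ (default) shows the ParallelGC-to-G1 switch directly.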
Q: why is CMS deprecated?
A: one blogger seems to know the news well. He said the JVM engineering team needs to focus on new GC engines and let go of the most high-maintenance yet outdated codebase — CMS. As a result, new development will cease on CMS, but the CMS engine is likely to remain available for a long time.
In terms of sorting performance, Arrays.sort(primitiveArray) is a few times faster than Collections.sort() even though both are O(N logN). My learning notes:
- Arrays.sort(int[]) is a dual-pivot quicksort, probably using random access
- Arrays.sort(Object[]) is a mergesort (TimSort)
- Collections.sort(List) defers to List.sort()
- List.sort() is a Java8 default method in the List.java interface. It copies the data to an array, runs a mergesort, then copies back
- ArrayList.java overrides the default method, so no copying for ArrayList from java8 onwards
The RandomAccess marker interface (which ArrayList implements) is completely irrelevant here. That’s because any List.java subtype providing random access can simply override (at source-code level) the default method, as demonstrated in ArrayList.java. This is cleaner than checking RandomAccess at runtime. Either design could potentially be JIT-compiled to remove the runtime check.
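A rough illustration (not a rigorous benchmark — no warm-up, single run) of the primitive-vs-boxed sorting gap described above:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class SortDemo {
    public static void main(String[] args) {
        Random rnd = new Random(42);
        int n = 1_000_000;
        int[] prim = new int[n];
        List<Integer> boxed = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            prim[i] = rnd.nextInt();
            boxed.add(prim[i]);        // auto-boxes each int into an Integer
        }

        long t0 = System.nanoTime();
        Arrays.sort(prim);             // dual-pivot quicksort on primitives
        long t1 = System.nanoTime();
        Collections.sort(boxed);       // defers to List.sort -> TimSort on objects
        long t2 = System.nanoTime();

        System.out.printf("primitive: %d ms, boxed: %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }
}
```

Both are O(N logN); the constant-factor difference comes from boxing, pointer chasing and cache misses on the Integer objects.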
Compared to c#, java language design was cleaner and simpler, at the cost of lower power and flexibility. C++ is the most flexible, powerful and complex among the trio.
There was never a strong reason to disallow static methods in an interface, but they were presumably disallowed (till java 7) for the sake of simplicity — “methods in an interface are always abstract”, no ifs and buts about it.
With “default methods”, java 8 finally broke from tradition. Java 8 has to deal with Multiple Inheritance issue. See other blog posts.
In total, there are now 4 “deviations” from that simple rule, though some of them are widely considered irrelevant to the rule.
- a concrete nested class in an interface can have a concrete method, but it is not really a method _on_ that interface.
- Suppose an interface MyInterFace re-declares toString() method of Object.java. That method isn’t really abstract.
- There are very few reasons to do this.
- static methods
- default methods — the only real significant deviation from the rule
static methods in an interface (SIM for short) are a minor feature of java8, fairly low-level and only interesting to java language students like me.
Two noteworthy points are raised in [[mastering lambdas]] P172 footnote (yes the footnote)
- A “traditional” static method (i.e. defined in a class) is inherited, but a SIM is not. Consequently …
- A “traditional” static method can be invoked using myObj.staticMeth1() but SIM can only be invoked using myInterface.staticMeth()
These two restrictions remove “loose” syntax around traditional static methods.
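A small sketch of both restrictions, using my own hypothetical MyIntf/Base/Child names. The commented-out lines are the invocations that compile for a traditional static method but not for a SIM:

```java
interface MyIntf {
    static String sim() { return "from interface"; }  // a SIM
}
class Base {
    static String trad() { return "from class"; }     // a traditional static method
}
class Child extends Base implements MyIntf { }

public class SimDemo {
    public static void main(String[] args) {
        System.out.println(Base.trad());        // normal invocation
        System.out.println(Child.trad());       // inherited -- "loose" syntax
        System.out.println(new Child().trad()); // via an instance -- also "loose"

        System.out.println(MyIntf.sim());       // the ONLY way to invoke a SIM
        // Child.sim();         // compile error: SIM is not inherited
        // new Child().sim();   // compile error: cannot invoke a SIM via an instance
    }
}
```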
Among the java8 features, I think the default method is a more drastic and fundamental change than lambda or stream, in terms of language design.
In my HSBC interview a London interviewer (Wais) challenged me and said that he thought default methods are designed for backward compatibility. I now think he was wrong.
—- Based on P171 [[Mastering Lambdas]]
Note — the rare cases of incompatibility are an obscure (almost academic) concern. More important are the rules of method resolution when default methods are among the candidates. This topic is similar in spirit to popular interview questions around overriding vs overloading.
Backward compatibility (BWC) means that when an existing interface like Collection.java gains a brand-new default method, the existing “customer” source code should work as before. Default methods have a few known violations of BWC.
- simplest case: all (incl. default) methods in an interface must be public. No ifs or buts. Suppose Java7 MyConcreteClass has private m2() and implements MyIntf. What if MyIntf is now updated with a default method m2()? Compilation error!
- a more serious case: the java overriding rule (similar to c++) is very strict, so m(int) vs m(long) is always overloading, never overriding. Consider a method call myObj.m(33). Originally, this binds to the m(long) declared in the class. Suppose the new default method is m(int) … an overload! At compile time, this is seen as a better match, so it is selected by the compiler (not the JVM at runtime)… a silent, unexpected change in business logic and a violation of BWC!
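A hypothetical sketch of that second case. Api.m(int) plays the role of the newly added default method; Legacy.m(long) is the pre-existing method that m(33) used to bind to:

```java
interface Api {
    // the "new" default method added to the interface in a later release
    default String m(int x) { return "default m(int)"; }
}
class Legacy implements Api {
    // the pre-existing method; before Api.m(int) existed, m(33) bound here
    public String m(long x) { return "legacy m(long)"; }
}
public class OverloadPitfall {
    public static void main(String[] args) {
        Legacy obj = new Legacy();
        // m(int) is now an exact match, so the compiler silently prefers it
        System.out.println(obj.m(33));        // prints "default m(int)"
        // the legacy binding is still reachable with an explicit widening
        System.out.println(obj.m((long) 33)); // prints "legacy m(long)"
    }
}
```

Crucially, the business-logic change happens only on recompile; already-compiled legacy classes keep their original m(long) binding.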
This refreshingly thin book gives 2 more examples. Its last example is a serious backward incompatibility issue but I am not convinced it is technically possible. Here’s my rationale —
Any legacy code relying on putIfAbsent() must have an implementation of putIfAbsent() somewhere in some legacy java7 class. Due to “class-wins-over-interface” rule, a new default method putIfAbsent() will never be chosen when compiling the legacy code using java8 tool chain.
(See also post on market data and size of Object)
High volume market data systems deal with primitives like ints, char-arrays and occasionally bit-arrays. These often need to go into collections, like a vector of int. These systems avoid DAM allocation like the plague.
Specifically, Java collections’ auto-boxing results in excessive memory allocation — a new object for every primitive item. This is a major justification for the c++ STL, according to a CS veteran.
Sun also says autoboxing is inappropriate for high performance.
C# generic collections are free of autoboxing, just like STL; so are the primitive streams in java8. But I don’t know whether market data systems have actually embraced primitive streams.
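To make the contrast concrete, a small sketch: IntStream keeps everything as primitive int/long, while .boxed() manufactures one Integer object per element — exactly the allocation pattern market data systems avoid:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class BoxingDemo {
    public static void main(String[] args) {
        // primitive pipeline: no Integer objects created for the elements
        long sum = IntStream.rangeClosed(1, 1_000_000)
                .asLongStream()   // widen before summing to avoid int overflow
                .sum();

        // boxed pipeline: one Integer allocation per element
        List<Integer> boxed = IntStream.rangeClosed(1, 1_000_000)
                .boxed()
                .collect(Collectors.toList());

        System.out.println(sum);           // prints 500000500000
        System.out.println(boxed.size());  // prints 1000000
    }
}
```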