STL container=#1 common resource owner #RAII

Stroustrup said every resource (usually heapy thingies) needs to have an owner, which will eventually return the resource.

By his definition, every resource has an acquire/release protocol. These resources include locks, DB connections and file handles. The owner is the Object responsible for the release.

  • The most common resource owner is the STL container. When a std::vector or std::unordered_multimap … is destructed, it releases/returns its resource to the heap memory manager.
  • The best-known resource-owner is the family of smart pointers.
  • You can also combine them as a container of smart pointers.
  • All of these resource owners rely on RAII (see the sketch below).
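
Below is a minimal sketch of the last combination — a container of smart pointers — assuming a toy Trade class (my own name, purely for illustration):

#include <memory>
#include <vector>

struct Trade { int qty = 0; }; // hypothetical payload class

int main(){
  // the vector owns its heap buffer; each shared_ptr owns one Trade
  std::vector<std::shared_ptr<Trade>> book;
  book.push_back(std::make_shared<Trade>());
  book.push_back(std::make_shared<Trade>());
} // RAII: book's dtor runs here, releasing every Trade back to the heap memory manager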

y C++ will live on #in infrastructure

I feel c++ will continue to dominate the “infrastructure” domains, while application-developer jobs will continue to shift towards modern languages.

Stroustrup was confident that the sheer volume of source code out there basically ensures the c++ compiler will still be needed 20 years out. When I asked him “What competitors do you see in 20 years?”, he estimated there are billions of lines of c++ source code in existence.

I said C would surely survive, and he dismissed it. Apparently, many of the hot new domains rely on c++. My examples below all fall under the “infrastructure” category.

  • mobile OS
  • new languages’ runtimes such as the dotnet CLR, JVM
  • blockchain mining — compute intensive
  • TensorFlow
  • AlphaGo
  • google cloud
  • Jupyter for data science
  • Most deep learning base libraries are written in c++, probably for efficiency

kernels (+lang extensions) don’t use c++

Q: why are kernels usually written in C, not c++? This question underpins the value and longevity of both languages.

I asked Stroustrup. He clearly thinks c++ can do the job. As to why C still dominates, he cited a historical reason: kernels were written long before C++ was invented.

Aha — I think there are conventions and standard interfaces (POSIX is one of them)… always in C.

I said “The common denominator among various languages is always a C API”. He said that’s also part of what he meant.

many modern languages rely@c++4heavy-lifting

Stroustrup said the jvm, tensorflow, javascript engines, … can be considered c++ applications. They make use of the c++ compiler.

The c++ compiler is more flexible, more complex, more powerful and more heavily engineered. Those other languages’ compilers lack those advanced features, so they leverage the c++ compiler.

I would not say “most modern languages” rely on c++ for heavy-lifting.

benchmark c++^newer languages

C++ can be 5x faster than java if both programs are well-tuned — a ball-park estimate given by Stroustrup.

In such benchmarks, the c++ code is often written like java code, using lots of pointers and virtual functions, no inlining, perhaps with too many heap allocations (STL containers) rather than strictly-stack variables.

Many other benchmarks are similarly questionable. These new languages are usually OO and rely on GC plus pointer indirection. If you translate their code literally to C++, the resulting c++ code is horribly inefficient, not taking advantage of the c++ compiler’s powers. An expert c++ developer would rewrite everything to avoid virtual functions and favor local variables and inlining, possibly using compile-time programming. The rewritten binary usually becomes comparable in the benchmark. In fact, the c++ compiler is more sophisticated and has more optimization opportunities, so it usually produces faster code.
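As a hedged sketch of such a rewrite, the template version below lets the compiler see the concrete type and inline the call that the virtual version must dispatch indirectly (all class names here are made up):

#include <cstdio>

// java-style: virtual dispatch through a base-class reference
struct Base { virtual int apply(int x) const { return x; } virtual ~Base() = default; };
struct VDoubler : Base { int apply(int x) const override { return 2 * x; } };
int runVirtual(const Base& b, int x){ return b.apply(x); } // indirect call via vtable

// expert-c++ style: compile-time polymorphism, no vptr, inlinable
struct Doubler { int apply(int x) const { return 2 * x; } };
template <typename Op>
int run(const Op& op, int x){ return op.apply(x); } // concrete type known; candidate for inlining

int main(){
  std::printf("%d %d\n", runVirtual(VDoubler{}, 21), run(Doubler{}, 21)); // 42 42
}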

local_var=c++ strength over GC languages

Stroustrup told me c++ code can use lots of local variables, whereas garbage-collected languages put most objects on the heap.

I hypothesized that whether I have 200 local variables in a function or none at all, the runtime cost of stack allocation is the same. He said it’s on the nanosecond scale, basically free. In contrast, with heap objects the biggest cost is allocation; deallocation is also costly.

Aha — at compile time, the compiler already knows how many bytes are needed for a given stack frame.

insight — I think local variables don’t need pointers. GC languages rely heavily on “indirect” pointers. Since GC often relocates objects, the pointer content needs to be translated to the current address of the target object. I believe this translation has to be done at run time. This is what I mean by an “indirect” pointer.

insight — STL containers almost always use the heap, so they are not strictly “local variables” in the memory sense.
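
A minimal sketch of the contrast (function names are mine):

int stacky(){
  int a[200];            // frame size fixed at compile time; "allocation" is one stack-pointer adjustment
  a[0] = 1;
  return a[0];
}

int heapy(){
  int* p = new int[200]; // run-time call into the userland allocator
  p[0] = 1;
  int r = p[0];
  delete[] p;            // deallocation is also costly
  return r;
}

int main(){ return stacky() - heapy(); } // returns 0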

heap allocation: java can beat c++

  • case 1 (standard java): you allocate heap memory. After you finish with it you wait for the java GC to clean it up.
  • case 2 (low latency java): you allocate heap memory but disable java GC. Either you hold on to all your objects, or you leave unreachable garbage orbiting the earth forever.
  • case 3 (c++): you allocate heap memory with the expectation of releasing it, so the compiler sets up housekeeping in advance for the anticipated delete(). This housekeeping overhead is somewhat similar to the try/catch scaffolding before c++11 noexcept.

Stroustrup suggested that #2 will be faster than #3, but #3 is faster than #1. I asked “But can’t c++ emulate the allocation the way the jvm does?” Stroustrup said C++ is not designed for that. I have seen online posts about this “emulation”, but I would trust Stroustrup more.

  • case 4 (C): C/c++ can sometimes use local variables to beat heap allocation. C programmers use rather few heap allocations, in my experience.

Note the jvm allocator and malloc are both userland allocators, not part of the kernel and usually not issuing a system call per allocation. You can substitute your own malloc.

https://stackoverflow.com/questions/18268151/java-collections-faster-than-c-containers top answer by Kanze is consistent with what Stroustrup told me.

  • no dynamic allocation is always faster than even the fastest dynamic allocation — similar to Case 4
  • jvm allocation (without the GC clean-up) can be 10 times faster than c++ allocation. Similar to Case 2^3
    • Q: Is there a free list in JVM allocator?

https://softwareengineering.stackexchange.com/questions/208656/java-heap-allocation-faster-than-c claims

  • c++ custom allocators managing a pool of fixed-sized objects can beat the jvm
  • jvm allocation often requires little more than one pointer addition, which is certainly faster than typical C++ heap allocation algorithms
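
To make the “one pointer addition” concrete, here is a hedged sketch of a bump allocator in the spirit of Case 2 — not the jvm’s actual implementation:

#include <cstddef>
#include <cstdlib>

class BumpArena {
  char* base; char* next; char* end;
public:
  explicit BumpArena(std::size_t bytes)
    : base(static_cast<char*>(std::malloc(bytes))), next(base), end(base + bytes) {}
  ~BumpArena(){ std::free(base); }        // release the whole arena at once
  void* allocate(std::size_t n){
    if (next + n > end) return nullptr;   // arena exhausted
    void* p = next;
    next += n;                            // the "one pointer addition"
    return p;                             // no per-object deallocation, like disabled GC
  }
};

int main(){
  BumpArena arena(1 << 20);
  int* x = static_cast<int*>(arena.allocate(sizeof(int)));
  if (x) *x = 42;
}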

new lang2challenge c++ on efficiency@@

I asked Stroustrup — efficiency is the traditional strength of C and C++, both memory efficiency and speed. Is that still true? He immediately said yes.

I think it was clear in his mind that c/c++ were still the most efficient languages around. He did say Fortran is optimized for scientific computing.

I later asked him — any new language he watches out for? He said none, without real hesitation, no ifs or buts.

Recalling that conversation, I feel new languages are usually higher-level and easier to use. They are more likely to use the heap to provide a “consistent interface” and avoid the complexities of a low-level language.

If I’m right, then these languages can’t and won’t optimize for efficiency as a top priority. Efficiency is possibly a 2nd priority.

scaffolding around try{} block #noexcept

[[ARM]] P358 says that all local non-static objects on the current call stack that have been fully constructed since the start of the try-block are “registered” for stack unwinding. The registration is fine-grained, supporting partial destruction —

  • for an array with 3 out of 9 elements fully constructed, stack unwinding would destruct only those 3
  • for a half-constructed composite object, all fully constructed sub-objects will be destructed
  • any half-constructed object is not registered, since running its dtor would be unsafe

I guess this registration is an overhead at run time.

For the stack objects created in a noexcept function, this “registration” is not required, so the compiler may or may not call their destructors if an exception does escape.

— In http://www.stroustrup.com/C++11FAQ.html#noexcept Stroustrup hints at this scaffolding:

  • noexcept is an efficiency feature — widely and systematically used in the standard library to improve performance
  • noexcept is crude and “very efficient”
  • dtor may not be invoked upon stack unwinding
  • stack unwinding may not happen at all
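
A minimal sketch of the standard semantics (if an exception does escape a noexcept function, std::terminate is called):

#include <vector>

void fill(std::vector<int>& v) noexcept {
  std::vector<int> scratch(10);  // local object
  for (int i = 0; i < 1000; ++i)
    v.push_back(i);              // if this threw, std::terminate() would run;
                                 // scratch's dtor may or may not be invoked
}

int main(){
  std::vector<int> v;
  fill(v);
}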


3 real overheads@vptr #inline

Suppose your class Trade has virtual functions and a comparable class Order has no virtual functions. What are the specific runtime overheads of the vptr/vtable usage?

  1. cpu cache efficiency — the memory footprint of the vptr in each object. Java is affected too! If you have a lot of Trade objects with only one char data field, then the vptr greatly expands the footprint and you waste cache lines.
    • [[ARM]] singles out this factor as a justification for -fno-rtti… see RTTI compiler-option enabled by default
    • [[moreEffC++]] P116 singles out vptr footprint as the biggest performance penalty of vptr
  2. runtime indirection — the non-virtual Order usage is “a few memory references more efficient” [1]
  3. inlining inhibition is the most significant overhead. P209 [[ARM]] says inline virtual functions make perfect sense, so it is best to bypass the vptr and call the virtual function directly, if possible.

[1] P209 [[ARM]] wording

Note a virtual function unconditionally introduces the first overhead, but the #2/#3 overheads can sometimes be avoided by a smart compiler.
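
A hedged sketch of overhead #1 — exact sizes are implementation-dependent:

#include <iostream>

struct Order { char side; };                              // no virtual functions
struct Trade { char side; virtual ~Trade() = default; };  // compiler adds a vptr

int main(){
  std::cout << sizeof(Order) << '\n'; // typically 1
  std::cout << sizeof(Trade) << '\n'; // typically 8 or 16 on 64-bit: vptr + alignment padding
}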


std::weak_ptr phrasebook

We ALWAYS need to compare weak_ptr with raw ptr + shared_ptr, to understand its usage context, motivations and justifications.

http://www.stroustrup.com/C++11FAQ.html#std-weak_ptr is concise.

— Based on the chapter in [[effModernC++]]:

#1 feature — detect dangle

  • use case — a subject that keeps track of its observers who might become dangling pointers
  • use case — objects A and B pointing to each other with ref counts … leading to an unreachable island. Using raw pointers exclusively is possible but requires explicit deletion, as pointed out on P84 [[Josuttis]]
  • In both use cases, raw ptr won’t work since the dangle would go unnoticed.
  • Achilles’ heel of the #1 feature — a manual “delete” on the raw ptr is beneath the radar of reference counting, and leads to chaos and subversion of ownership control, as illustrated —
#include <iostream>
#include <memory>
using namespace std;

void f1(){
  auto p = new int(55);
  shared_ptr<int> sp(p);
  weak_ptr<int> wp(sp);

  cout<<boolalpha;
  cout<<"expired()? "<<wp.expired()<<endl; // false
  cout<<"deleting from down below\n";
  delete p; // the correct alternative: sp.reset();
  cout<<"expired()? "<<wp.expired()<<endl; // still false!
  // at end of this function, shared_ptr double-deletes, because the manual
  // delete is beneath the radar of reference counting :(
}
int main(){
  f1();
}

std::sort() beating ANSI-C qsort() #inline

Stroustrup was the first one to tell me c++ std::sort() can beat C qsort() easily.

https://travisdowns.github.io/blog/2019/05/22/sorting.html says:

Since the qsort() code is compiled ahead of time and is found inside the shared libc binary, there is no chance that the comparator function, passed as a function pointer, can be inlined.

https://martin-ueding.de/articles/qsort-vs-std-sort/index.html says:

For qsort(), since the function is passed as a pointer, and the elements are passed as void pointers as well, it means that each comparison costs three indirections and a function call.

In C++, the std::sort is a template algorithm, so that it can be compiled once for each type. The operator< of the type is usually baked into the sort algorithm as well (inlining), reducing the cost of the comparison significantly.
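
A minimal sketch of the two call styles side by side — the lambda’s body is visible at the std::sort call site, so it is typically inlined, while qsort must call through the function pointer:

#include <algorithm>
#include <cstdlib>
#include <vector>

int cmpInts(const void* a, const void* b){
  return *static_cast<const int*>(a) - *static_cast<const int*>(b);
}

int main(){
  std::vector<int> v{3, 1, 2};
  std::qsort(v.data(), v.size(), sizeof(int), cmpInts);              // indirect calls, void* casts
  std::sort(v.begin(), v.end(), [](int a, int b){ return a < b; });  // comparator inlinable
}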

c++condVar 2 usages #timedWait

A related post of mine, “poll() as timer ] real time C : industrial-strength #RTS”, is somewhat similar.

http://www.stroustrup.com/C++11FAQ.html#std-condition singles out two distinct usages:

1) notification
2) timed wait — often forgotten

https://en.cppreference.com/w/cpp/thread/condition_variable/wait_for shows std::condition_variable::wait_for() takes a std::chrono::duration parameter, which has nanosec precision.

Note java wait() also has nanosec precision.

std::condition_variable::wait_until() can be useful too; it is featured in my proposal “RTS pbflow msg+time files #wait_until”.
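
A hedged sketch showing both usages in one program (the millisecond values are arbitrary):

#include <chrono>
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>

std::mutex mu;
std::condition_variable cv;
bool ready = false;

int main(){
  std::thread producer([]{
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    { std::lock_guard<std::mutex> lk(mu); ready = true; }
    cv.notify_one();                                   // usage 1: notification
  });

  std::unique_lock<std::mutex> lk(mu);
  // usage 2: timed wait — give up after 200ms even if never notified
  bool notified = cv.wait_for(lk, std::chrono::milliseconds(200), []{ return ready; });
  std::cout << (notified ? "notified" : "timed out") << '\n';
  producer.join();
}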

unordered_map^map performance: unpredictable

Small halo… the theories and explanations are subject to change.

Many people report that unordered_map can be slower than traditional std::map for small collections. I feel it’s implementation-dependent. The result may NOT apply to java or c#.

http://www.stroustrup.com/C++11FAQ.html#std-unordered (by Stroustrup) says “For larger numbers of elements (e.g. thousands), lookup in an unordered_map can be much faster than for a std::map.” In the same paragraph Stroustrup also says unordered_map “lookup involves a single call of a hash function and one or more equality operations”.

https://blog.dubbelboer.com/2012/12/04/lru-cache.html is mostly about lookup speed. It shows that string key size hurts the hash table, while sample size hurts the RBTree.

Many interviewers asked me:

  • Q: why for a small string collection, RBTree can outperform hash table?
  • A: String hashing takes time proportional to string size
  • A: Poor hashing => Hash collision => linear search is another potential problem. String hashing is unpredictable compared to integer hashing. No one can guarantee the next sample will be “uniform” according to our hash function.
  • A: rehash

Cache efficiency of the bucket array (an array of pointers) is another reported weakness of hash tables. However, I guess a small bucket array can be held entirely in cache, and each linked list usually holds only 1 node, so the cache miss is on that 1 node. For a tree, logN nodes are accessed per lookup and every one of them could be a cache miss.

Practical suggestion — it’s easy to swap the two in a given codebase, so just swap and benchmark. Either can be faster. If your data set changes over time, you may need to re-benchmark.
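
A minimal sketch of that swap-and-benchmark routine (timings are illustrative only and depend on the implementation and data set):

#include <chrono>
#include <iostream>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

template <typename Map>
void timeLookups(const char* label, const Map& m, const std::vector<std::string>& keys){
  auto t0 = std::chrono::steady_clock::now();
  long long hits = 0;
  for (const auto& k : keys) hits += m.count(k);
  auto t1 = std::chrono::steady_clock::now();
  std::cout << label << ": " << hits << " hits in "
            << std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count()
            << " us\n";
}

int main(){
  std::vector<std::string> keys;
  for (int i = 0; i < 1000; ++i) keys.push_back("key" + std::to_string(i));

  std::map<std::string,int> rbt;           // RBTree: ~logN node visits per lookup
  std::unordered_map<std::string,int> ht;  // hash table: hash + bucket probe per lookup
  for (const auto& k : keys){ rbt[k] = 1; ht[k] = 1; }

  timeLookups("map", rbt, keys);
  timeLookups("unordered_map", ht, keys);
}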

c++ uninitialized "static" objects ^ stackVar

By “static” I mean global variables or local static variables. These are Objects with addresses, not mere symbols in source code. Note some static objects are _implicitly_ static — P221 [[effC++]].

Rule 1: uninitialized local and class fields of primitive types (char, float…) aren’t automatically initialized. See [[programming]] by Stroustrup.

Rule 2: uninitialized local or file-scope statics are automatically initialized to 0 bits at a very early stage of program loading. See https://stackoverflow.com/questions/1597405/what-happens-to-a-declared-uninitialized-variable-in-c-does-it-have-a-value

Rule 3: all class instances are “initialized”, either explicitly or implicitly via the default ctor. Beware… reconsider Rule 1 — is a field of primitive type within the class initialized? I don’t think so. Therefore, “initialized” means a ctor is called on the new-born instance, but not all fields therein are necessarily initialized. The ctor can simply ignore a primitive-typed field.
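
A minimal sketch of the three rules (variable names are mine; reading the truly-uninitialized ones would be undefined behavior, so the code only prints the zero-initialized statics):

#include <iostream>

int fileScopeStatic;          // Rule 2: zero-initialized in BSS, before main() runs

struct Widget {
  int field;                  // Rule 3 caveat: the implicit default ctor ignores this primitive field
};

int main(){
  int local;                  // Rule 1: truly uninitialized; reading it is undefined behavior
  static int localStatic;     // Rule 2: zero-initialized
  Widget w;                   // "initialized" ctor-wise, but w.field holds garbage
  (void)local; (void)w;       // silence unused-variable warnings
  std::cout << fileScopeStatic << ' ' << localStatic << '\n'; // prints: 0 0
}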

Uninitialized static Objects (stored in the BSS segment) don’t take up space in the object file. Also, by grouping together all the symbols that are not explicitly initialized, they can be easily zeroed out at once. See
http://stackoverflow.com/questions/9535250/why-is-the-bss-segment-required

Any static Object explicitly initialized by programmer is considered an “initialized” static object and doesn’t go into BSS.

P136 [[understanding and using c pointers]] uses a real example to confirm that a pointer field in a C struct is uninitialized. C has no ctor!

P261 [[programming]] by Stroustrup has a half-pager summary:

  • globals are default-initialized
  • locals and fields are truly uninitialized …
    • … unless the data type is a custom class having a default ctor. In that case, you can safely declare the variable without initialization, and the content will be a pre-set value.

divide-by-0: c++ no excp; java throws..why

https://stackoverflow.com/questions/8208546/in-java-5-0-statement-doesnt-fire-sigfpe-signal-on-my-linux-machine-why explains best.

http://stackoverflow.com/questions/6121623/catching-exception-divide-by-zero — the c++ standard says division by zero results in undefined behavior (just like deleting a Derived object via a Base pointer without a virtual dtor). Therefore the programmer must assume the responsibility to prevent it.

A compliant c++ compiler could generate object code to throw an exception (nice:) or do something else (uh :-() like a core dump.

If you are like me you wonder why no exception. Short answer — c++ is a low-level language. Stroustrup said, in “The Design and Evolution of C++” (Addison Wesley, 1994), “low-level events, such as arithmetic overflows and divide by zero, are assumed to be handled by a dedicated lower-level mechanism rather than by exceptions. This enables C++ to match the behavior of other languages when it comes to arithmetic. It also avoids the problems that occur on heavily pipelined architectures where events such as divide by zero are asynchronous.”.

C doesn’t have exceptions and handles division-by-zero with some kind of run time error (http://en.wikibooks.org/wiki/C_Programming/Error_handling). C++ probably inherited that in spirit. However, [[c++primer]] shows you can create your own divideByZero subclass of a base Exception class.

java has no “undefined behavior” here and throws an ArithmeticException instead (for integer division; floating-point division by zero yields Infinity/NaN).
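
Following the [[c++primer]] idea, a minimal sketch of guarding the division ourselves (the exception type here is my own choice, not from the book):

#include <iostream>
#include <stdexcept>

int safeDivide(int a, int b){
  if (b == 0)
    throw std::runtime_error("divide by zero"); // our own exception; the language won't throw one
  return a / b;  // an unguarded a/0 would be undefined behavior (SIGFPE on x86 Linux, or anything else)
}

int main(){
  try {
    safeDivide(1, 0);
  } catch (const std::runtime_error& e) {
    std::cout << "caught: " << e.what() << '\n';
  }
}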

RAII^ContextManager^using^java-AutoCloseable

1) Stroustrup commented that c++ doesn’t support finally{} because it has RAII dtor. See
http://www.stroustrup.com/bs_faq2.html#finally

Both deal with exceptional exits.
Both are robust.
Both are best practices.

However, try{} etc has a performance cost, so much so that some c++ compilers can be configured to disable exceptions. C++ memory management relies heavily on RAII; using try/finally-style cleanup for that would be too costly.
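
A hedged sketch of the RAII idiom that makes finally{} unnecessary — FileHandle is a made-up wrapper, not a standard class:

#include <cstdio>
#include <stdexcept>

class FileHandle {            // hypothetical RAII owner of a FILE*
  std::FILE* f;
public:
  explicit FileHandle(const char* path) : f(std::fopen(path, "r")) {
    if (!f) throw std::runtime_error("open failed");
  }
  ~FileHandle(){ std::fclose(f); }  // release on every exit path, normal or exceptional
  std::FILE* get() const { return f; }
};

void parse(){
  FileHandle fh("some.txt");  // illustrative path
  // ... work that may throw; no finally{} needed, the dtor runs regardless
}

int main(){
  try { parse(); } catch (const std::runtime_error&) { /* open or parse failed */ }
}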

2) python ContextManager protocol defines __enter__() and __exit__() methods

Keyword “with” required …

3) Java uses finally{}. Note finally{} becomes implicit in java7 try-with-resources.

AutoCloseable interface is needed in try-with-resource. See https://docs.oracle.com/javase/tutorial/essential/exceptions/tryResourceClose.html

4) c# — the Achilles’ heel of java GC is its non-deterministic timing. C#’s answer is q(using). C# provides both USING and try/finally. Under the hood USING calls try/finally.

I feel c# USING is, evolution-wise, a closer cousin to RAII (while try/finally is a distant cousin). Both use variable (not obj) scope to manage object (not var) lifetime.

USING uses Dispose() method, which is frequently compared to the class dtor/Finalize(). For the difference between c# Dispose() vs dtor vs Finalize, see other blog post(s).

As you can see, c# borrowed all the relevant techniques from c++ and java. So it’s better to first understand the c++/java constructs before studying c# constructs.