Stroustrup told me c++ code can use lots of local variables, whereas garbage collected languages put most objects on the heap.
I hypothesized that whether a function has 200 local variables or none, the runtime cost of stack allocation is the same. He said it’s nanosec scale, basically free. In contrast, with heap objects the biggest cost is allocation, and deallocation is also costly.
Aha: at compile time, the compiler already knows how many bytes are needed for a given stack frame, so reserving space for all the locals amounts to one stack-pointer adjustment on function entry.
insight: I think local variables don’t need pointers. GC languages rely heavily on “indirect” pointers. Since GC often relocates objects, the pointer content needs to be translated to the current address of the target object. I believe this translation has to be done at run time. This is what I mean by “indirect” pointer.
insight: STL containers almost always keep their elements on the heap, so they are not strictly “local variables” in the memory sense, even when the container object itself is a local.
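The point about STL containers can be checked with a rough sketch that counts calls to the global operator new. The counter g_heap_allocs and the two helper functions are hypothetical names introduced here for illustration; the exact count for the vector depends on the standard library, so only "zero" versus "at least one" is meaningful.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical counter, bumped by the replacement operator new below.
static std::size_t g_heap_allocs = 0;

// Replacing the global operator new is permitted by the standard.
// This version only counts calls and forwards to malloc.
void* operator new(std::size_t size) {
    ++g_heap_allocs;
    if (void* p = std::malloc(size ? size : 1)) return p;
    throw std::bad_alloc{};
}
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// 200 ints in the stack frame: no call to operator new is expected.
std::size_t allocs_for_local_array() {
    const std::size_t before = g_heap_allocs;
    int values[200];
    for (int i = 0; i < 200; ++i) values[i] = i;
    assert(values[199] == 199);
    return g_heap_allocs - before;
}

// The vector object is a local, but its 200 elements live on the heap.
std::size_t allocs_for_local_vector() {
    const std::size_t before = g_heap_allocs;
    std::vector<int> values(200);
    int* volatile escape = values.data();  // discourage the optimizer from eliding the allocation
    (void)escape;
    return g_heap_allocs - before;
}
```

Calling allocs_for_local_array() should report zero heap allocations, while allocs_for_local_vector() should report at least one.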
- case 1 (standard java): you allocate heap memory. After you finish with it you wait for the java GC to clean it up.
- case 2 (low latency java): you allocate heap memory but disable java GC. Either you hold on to all your objects, or you leave unreachable garbage orbiting the earth forever.
- case 3 (c++): you allocate heap memory with the expectation of releasing it, so the allocator sets up bookkeeping in advance (for example recording the block size) for the anticipated delete. This housekeeping overhead is somewhat similar to the try/catch overhead before c++11 ‘noexcept’.
Stroustrup suggested that #2 will be faster than #3, but #3 is faster than #1. I said “But c++ can emulate the allocation as jvm does?” Stroustrup said C++ is not designed for that. I think he meant impractical/invalid. I have seen online posts about this “emulation” but I would trust Stroustrup more.
- case 4 (C): C/c++ can sometimes use local variables to beat heap allocation. C programmers use rather few heap allocations, in my experience.
Note the jvm allocator and malloc are both userland allocators. They are not part of the kernel and usually don’t make a system call per allocation. You can substitute your own malloc.
— https://stackoverflow.com/questions/18268151/java-collections-faster-than-c-containers top answer by Kanze is consistent with what Stroustrup told me.
- zero dynamic allocation (Similar to Case 4) is always faster than even the fastest dynamic allocation.
- jvm allocation (without the GC clean-up) can be 10 times faster than c++ allocation. This matches Case 2 versus Case 3.
- Q: Is there a free list in JVM allocator? Yes
— https://softwareengineering.stackexchange.com/questions/208656/java-heap-allocation-faster-than-c claims
- c++ Custom allocators managing a pool of fixed-sized objects can beat jvm
- jvm allocation often requires little more than one pointer addition, which is certainly faster than typical C heap allocation algorithms in malloc
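The "one pointer addition" style of allocation can be sketched in c++ as a bump allocator over a fixed buffer. This is only an illustration of the idea, not how any particular jvm implements it; BumpArena and its members are hypothetical names, and the sketch deliberately has no per-object free.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical bump ("pointer addition") allocator over a caller-supplied buffer.
class BumpArena {
public:
    BumpArena(unsigned char* buffer, std::size_t size)
        : begin_(buffer), cur_(buffer), end_(buffer + size) {}

    // align must be a power of two. Returns nullptr when the buffer is exhausted.
    void* allocate(std::size_t bytes,
                   std::size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0);
        const auto addr = reinterpret_cast<std::uintptr_t>(cur_);
        const std::size_t padding = (align - addr % align) % align;
        const std::size_t remaining = static_cast<std::size_t>(end_ - cur_);
        if (padding > remaining || bytes > remaining - padding) return nullptr;
        unsigned char* result = cur_ + padding;
        cur_ = result + bytes;  // the allocation itself: one pointer addition
        return result;
    }

    // Releases everything at once; individual objects are never freed.
    void reset() { cur_ = begin_; }

    std::size_t used() const { return static_cast<std::size_t>(cur_ - begin_); }

private:
    unsigned char* begin_;
    unsigned char* cur_;
    unsigned char* end_;
};
```

The fast path has no free list to search and no block header to write, which is roughly why this style can beat a general-purpose malloc. The price is that memory only comes back when the whole arena is reset.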
I now recall that when I programmed in C, my code never used malloc() directly.
The library functions probably used malloc to some extent, but malloc was an advanced feature. Alexandrescu confirmed my experience and said that C programmers usually make rather few malloc() calls, each time requesting a large chunk. Instead of malloc, I used mostly local variables and static variables. In contrast, C++ uses heap much more:
- STL containers are 99% heap-based
- virtual functions require pointer, and the target objects are usually on heap, as Alexandrescu said on P78
- pimpl idiom i.e. private implementation requires heap object, as Alexandrescu said on P78
- the c++ reference is used mostly for pass-by-reference. Pass-by-reference usually works with heap objects.
So where C code requests a few large chunks, C++ code tends to request many small chunks of heap memory.
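The pimpl item in the list above can be illustrated with a minimal sketch. Account and its members are hypothetical names; in a real project the Impl definition would sit in the .cpp file, hidden from clients.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical class using the pimpl idiom.
class Account {
public:
    explicit Account(std::string owner);
    ~Account();  // defined below, where Impl is a complete type
    const std::string& owner() const;
    void deposit(long cents);
    long balance() const;

private:
    struct Impl;                  // incomplete type in the "header" part
    std::unique_ptr<Impl> impl_;  // always points at a heap object
};

// Everything below would normally live in the .cpp file.
struct Account::Impl {
    explicit Impl(std::string o) : owner(std::move(o)) {}
    std::string owner;
    long balance_cents = 0;
};

Account::Account(std::string owner)
    : impl_(std::make_unique<Impl>(std::move(owner))) {}  // one small heap allocation per Account
Account::~Account() = default;

const std::string& Account::owner() const { return impl_->owner; }

void Account::deposit(long cents) {
    assert(cents >= 0);
    impl_->balance_cents += cents;  // every access goes through the pointer
}

long Account::balance() const { return impl_->balance_cents; }
```

Even an Account declared as a local variable keeps its real state in a small heap block, which is the kind of small-chunk allocation the list describes.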
Across languages, heap usage is slow because
- In general OO programming uses more pointers, more indirection and more heap objects
- heap allocation is much slower than stack allocation, as Stroustrup explained to me
- using a heap object always requires a runtime indirection. The heap object has no name, only an address!
- In Garbage-Collected languages, there’s one more indirection.
I see various evidence that industry practitioners consider the default allocator too slow.
I don’t think system call is the issue. System calls are very infrequent with malloc.
Heap allocation is extremely slow compared to other operations.
For a given class C with a derived class D,
C::operator new(……); // is inheritable by D; implicit static; allocates raw memory
C::operator delete(…); // is inheritable by D; implicit static; deallocates raw memory i.e. return to the “pool”
C(……..); // never inherited; never static; turns allocated raw memory into object
~C(void); // never inherited; never static; turns object into raw memory, to be deallocated.
All of them can be private.
As explained in another blog post, the new expression like “new D()” can’t be overloaded. This expression invokes the operator new and the ctor.
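The division of labour between operator new and the ctor can be sketched as follows. The static counters and the two helper functions are hypothetical, added only so the calls can be observed; the class names C and D follow the text above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

struct C {
    static int allocs;      // hypothetical counters for illustration
    static int frees;
    static int ctor_calls;

    // Implicitly static even without the keyword; inherited by D.
    void* operator new(std::size_t size) {
        ++allocs;
        if (void* p = std::malloc(size)) return p;
        throw std::bad_alloc{};
    }
    void operator delete(void* p) noexcept {
        if (p != nullptr) ++frees;
        std::free(p);
    }

    C() { ++ctor_calls; }    // turns raw memory into an object
    virtual ~C() = default;  // turns the object back into raw memory
};
int C::allocs = 0;
int C::frees = 0;
int C::ctor_calls = 0;

struct D : C {
    int payload[8] = {};
};

// The new expression: calls the inherited C::operator new(sizeof(D)), then the ctors.
void new_and_delete_derived() {
    C* p = new D();
    delete p;  // dtor first, then C::operator delete
    assert(C::allocs == C::frees);
}

// The same steps spelled out by hand.
void two_steps_by_hand() {
    void* raw = D::operator new(sizeof(D));  // step 1: raw memory
    D* d = ::new (raw) D();                  // step 2: ctor via global placement new
    d->~D();                                 // object back to raw memory
    D::operator delete(raw);                 // raw memory back to the pool
    assert(C::allocs == C::frees);
}
```

Note the `::new` in the second helper: because C declares its own operator new, an unqualified placement new on a D would look in class scope first and not find a matching placement form.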
[[effC++]] has a lengthy chapter on how to customize new/delete.
P139 [[understanding and using c pointers]] has a short chapter on how to avoid the overhead of frequent malloc/free operations. Note this is NOT replacing malloc with our own malloc.
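One way to avoid frequent malloc/free without replacing malloc is to recycle released objects on a free list. The sketch below is my own illustration, not code from the book; Order and OrderRecycler are hypothetical names.

```cpp
#include <cassert>
#include <cstdlib>

struct Order {         // hypothetical payload type
    int id;
    double price;
    Order* next_free;  // link used only while the node sits on the free list
};

// Keeps released Orders on a free list and hands them out again,
// falling back to malloc only when the list is empty.
class OrderRecycler {
public:
    OrderRecycler() = default;
    OrderRecycler(const OrderRecycler&) = delete;
    OrderRecycler& operator=(const OrderRecycler&) = delete;

    ~OrderRecycler() {
        while (free_list_ != nullptr) {
            Order* node = free_list_;
            free_list_ = node->next_free;
            std::free(node);
        }
    }

    // May return nullptr if malloc fails.
    Order* acquire() {
        if (free_list_ != nullptr) {
            Order* node = free_list_;
            free_list_ = node->next_free;
            ++reuse_count_;
            return node;
        }
        ++malloc_calls_;
        return static_cast<Order*>(std::malloc(sizeof(Order)));
    }

    // Instead of free(), push the node onto the list for later reuse.
    void release(Order* order) {
        assert(order != nullptr);
        order->next_free = free_list_;
        free_list_ = order;
    }

    int malloc_calls() const { return malloc_calls_; }
    int reuse_count() const { return reuse_count_; }

private:
    Order* free_list_ = nullptr;
    int malloc_calls_ = 0;
    int reuse_count_ = 0;
};
```

Objects still checked out when the recycler is destroyed are the caller's responsibility; this sketch frees only what is sitting on the list.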
P70 [[understanding and using c pointers]] has a one-pager super-simple wrapper over free()
Someone speculated that “in any programming language, heap memory is allocated always using the C function malloc(). There’s no alternative.” Nigel (Citi) disagrees. If a language is not based on C, then it can use its own heap-management library.
The heap-mgmt library is a wholesaler/retailer. For efficiency this library requests large blocks of memory and gives out small chunks to the application. Probably many languages have a heap-mgmt library. C’s heap library (in glibc) exposes malloc(). Nigel felt C# has its own heap-mgmt library and may have a malloc-equivalent. JVM is written in C/C++ but it could re-implement the heap-mgmt library with its own malloc-equivalent.
Everyone must file tax returns with the same government, but through different tax consultants. Tax consultants are not part of the government. Similarly, the heap-mgmt library is one level above system calls. It makes system calls (perhaps brk()/sbrk() or mmap()) to request the large blocks from the OS. Every language must use system calls to request memory, but possibly through its own heap-mgmt library.
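The wholesaler/retailer split can be sketched as below. This is a toy, not how glibc or any jvm works: RetailHeap is a hypothetical name, std::malloc stands in for the wholesale request that a real library would make through a system call, and there is no per-chunk free.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical "retailer": buys large blocks wholesale and hands out small chunks.
class RetailHeap {
public:
    explicit RetailHeap(std::size_t block_size) : block_size_(block_size) {
        assert(block_size_ >= 16 && block_size_ % 16 == 0);
    }
    RetailHeap(const RetailHeap&) = delete;
    RetailHeap& operator=(const RetailHeap&) = delete;

    ~RetailHeap() {
        for (void* block : blocks_) std::free(block);  // return everything at once
    }

    // Hands out `bytes` rounded up to a multiple of 16.
    // Returns nullptr if the request exceeds a block or the wholesale request fails.
    void* allocate(std::size_t bytes) {
        const std::size_t need = (bytes + 15) & ~std::size_t{15};
        if (need > block_size_) return nullptr;
        if (blocks_.empty() || used_ + need > block_size_) {
            void* block = std::malloc(block_size_);  // the rare "wholesale" request
            if (block == nullptr) return nullptr;
            blocks_.push_back(block);
            used_ = 0;
        }
        void* chunk = static_cast<unsigned char*>(blocks_.back()) + used_;
        used_ += need;  // the frequent "retail" sale
        return chunk;
    }

    std::size_t wholesale_requests() const { return blocks_.size(); }

private:
    std::size_t block_size_;
    std::size_t used_ = 0;      // bytes handed out from the newest block
    std::vector<void*> blocks_;
};
```

With a 1024-byte block, 64 allocations of 16 bytes cost a single wholesale request, and the 65th triggers the second. That ratio is why system calls stay infrequent even when the application allocates constantly.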