Q: if I declare a huge int array in static memory, will the object file become huge?
This is possibly a QQ nlg pearl, a halo zbs, not GTD
Advantage 1: faster allocation, as explained in other blogposts
Advantage 2: the programmer can "carelessly" create a "local" Object in any method1, pass the object (by reference) into other methods, and happily forget about freeing the memory.
In this extremely common set-up, the reference itself is a stack variable in method1, but the heapy thingy is "owned" by the GC.
In contrast, c/c++ requires some "owner" to free the heap memory, otherwise memory would leak. There’s also the risk of double-free. Therefore, we absolutely need clearly documented ownership.
Stroustrup said every resource (usually a heapy thingy) needs to have an owner, who will eventually return the resource.
By his definition, every resource has an acquire/release protocol. These resources include locks, DB connections and file handles. The owner is the Object responsible for the release.
Deepak CM described a pre-allocated free-list used in his telecom system. I think https://www.embedded.com/dynamic-memory-and-heap-contiguity/ briefly describes the same scheme.
https://github.com/tiger40490/repo1/blob/cpp1/cpp/lang_66mem/fixedSizedFreeList.cpp is my self-tested implementation. Am proud of the low-level details that I had to nail down one by one.
He said his system initialization could make 400,000 new() calls to allocate 400,000 dummy objects and put them into a linked list. My design /emulates/ it with a single malloc() call. This all happens at startup time.
During the day, each new msg will overwrite [1] a linked node retrieved at the Head of the slist.
[1] using operator=(). Placement-new would be needed if we use a single malloc()
Every release() will link-in the node at the Tail of the slist. Can we link it in at the Head? I think so. Benefit — it would leave a large chunk of contiguous free space near the tail, reducing fragmentation.
Illustration — Initially the physical addresses in the slist are likely consecutive like addr1 -> addr2 -> addr3…. After some release() calls, it would look like a random sequence.
Using return-to-Head, I would get
— The API usage:
Technique: reinterpret_cast
Technique: memcpy
Technique: pre-allocated buffer as a local static object … can be better than "stack" allocation, but only for a large buffer.
Unlike local variables, local Static objects can be returned by pointer — major flexibility in low-level memory management.
https://stackoverflow.com/questions/3835857/in-c-does-using-static-variables-in-a-function-make-it-faster?rq=1 discusses the runtime cost of static local vs local
Q: Your resume says “memory mgmt” for c++, so what did you do personally, not as a team?
A: I will talk about the features in my system. Most of them are not written by me, honestly
Q: java
In general, we can’t use memcpy to clone the internal array of a vector.
If we have a vector of simple value types like int or float, memcpy works and is probably simpler.
If there’s any pointer member data, then we are in trouble. For example, if there’s any std::string in the internal array, memcpy will bypass the string copy operations. The internal pointer in each string will be copied incorrectly, so when the original array content gets deleted, our cloned strings will be messed up.
There are many other element types that involve pointers:
Q: For vector of int, is memcpy faster than copying element by element?
A: Probably not. Both are O(N). A for-loop copying an array of integers may compile to the same binary as a memcpy.
By the way, you can also allocate memory without calling any ctor. Both malloc() and q[ operator new ] can do that, as explained in [[moreEffC++]].
[[Pro .net performance]] P92 has 2 pages on C/c++ free-list. Concise, in-depth.
#define PACKED __attribute__ ((packed))
This is used in our parsers. https://stackoverflow.com/questions/11770451/what-is-the-meaning-of-attribute-packed-aligned4 explained that this is a compiler feature.
https://stackoverflow.com/questions/27728142/c11-what-is-its-gc-interface-and-how-to-implement
GC interface is partly designed to enable
The probe program listed in the URL shows that as of 2019, all major compilers provide trivial support for GC.
Q: why does c++ need GC, given RAII and smart pointers?
A: to allow system-managed automatic GC, as an alternative to manual deallocation and smart pointers.
I hit many errors with memcpy, but memset can show the same error more easily:
int main(){
    static size_t aa = 0;
    static unsigned char buf[1];
    memset(buf, 1, 16); // overflow: writes 15 bytes beyond buf
    cout<<aa<<endl; // prints 72340172838076673 i.e. 0x0101010101010101
}
In many cases, people need to store addresses in a container. Let’s use std::vector for example. Both smart ptr and raw ptr are common and practical
Popular IV topic. P41 [[more effective c++]] has an excellent summary:
The book has examples of Case 2 and Case 3.
Note it’s common to construct an object directly on the stack or in the global area simply by defining it, but on the heap, placement-new is the only way to construct an object at a pre-determined address.
Placement-new is a popular interview topic (Jump, DRW and more …), rarely used in common projects.
P9 [[c++game development primer]] has a short implementation without using heap. The memory pool comes from a large array of chars. The allocator keeps track of allocated chunks but doesn’t reuse reclaimed chunks.
It showcases the header associated with each allocated chunk. This feature is also part of a real heap allocator.
reinterpret_cast is used repeatedly.
I find the chapter fairly practical and up to date, better than the specific techniques in [[effC++]]
This domain is full of bite-sized knowledge pearls — my advantage. Here are some examples:
….
Sound bite — packing is for structs, but alignment and padding are Not only for structs
Longer version — alignment and padding apply to Any object placement, whether that object is a field of a struct or an independent object on stack. In contrast, the packing technique is specific to a struct having multiple fields.
https://github.com/tiger40490/repo1/blob/cpp1/cpp1/memAlign.cpp has my test code to reveal the rules:
http://stackoverflow.com/questions/11108328/double-alignment Wug’s answer echoes Ashish’s comment that tiny fields like char should be grouped together, due to Rule #4. This applies to stack layout as well [1]. However, compiler optimization can be subversive:
“Not only can the compiler reorder the stack layout of the local variables, it can assign them to registers, assign them to live sometimes in registers and sometimes on the stack, it can assign two locals to the same slot in memory (if their live ranges do not overlap) and it can even completely eliminate variables.”
[1] See https://stackoverflow.com/questions/1061818/stack-allocation-padding-and-alignment
See also post about MSDN overview on mem mgmt…
For all simple scenarios, I feel the default and standard idiom to manage the delete is RAII. This means the delete is inside a dtor, which in turn means there’s a wrapper class, perhaps some smart ptr class.
It also means we create stack instances of
– the wrapper class
– a container holding the wrapper objects
– an umbrella holding the wrapper object
Should every departure/deviation be documented?
I feel it’s not the best idea to pass a pointer into some cleanup function and, inside the function, delete the passed-in pointer. What if the pointee is already deleted, or still /in active service/, or not on the heap, or embedded in a container or umbrella… See P52 [[safe c++]]
Deallocation is explained in https://stackoverflow.com/questions/13944886/is-stdvector-memory-freed-upon-a-clear
I now think there are various overheads with DMA (dynamic memory allocation):
* search the free list for a suitable chunk
* fragmentation
* If an allocation is needed where the heap is almost used up, glibc must grab more memory from the kernel, then hand out a slice of it.
Linux alloca() and variable-length arrays both avoid some of this overhead. See P267 [[linux sys programming]].
If a low-latency module does a ton of malloc(), then alloca() might easily outperform malloc(). We should benchmark.
The op-new and op-new[] operators both return an address. If you received the address without knowing which new operator was used, then it’s impossible to know how to delete it. As [[effC++]] points out, deleting the wrong way is UndefinedBehavior.
On a similar note, if your function receives a char*, you would have no idea whether it’s a single char or a string.
Note there's NOT a 2nd parameter to indicate length of the array.
I might have blogged about this…. [[more effC++]] by Scott Meyers has a chapter on this. The basic ideas, expressed in my own language for maximum absorption —
case 1) it’s relatively easy to prevent qq( new MyClass ). P157 basically says: customize a new-operator as a private static class member and leave it undefined.
— below cases 2 and 3 are non-issues in “our” projects, where team is small and guys are expected to read the source code comments —
case 2) MyClass could be subclassed to Der. If Der is beyond my control, then MyClass could (against my will) be instantiated on heap as a subobject
case 3) MyClass could be a field in class Umbrella. If Umbrella is beyond my control, then MyClass could (against my will) be instantiated on heap as a subobject
I feel in many enterprise app teams, Cases 2/3 are rare and impractical, because Der and Umbrella would be owned by myself or my own team.
Therefore, in many real-world contexts it is feasible to prevent heap instantiation of my class, which reduces memory leaks.
http://msdn.microsoft.com/en-us/library/vstudio/hh438473.aspx is a brief but decent overview, illustrated with code snippets. I re-read in Nov 2017.
In general, I feel mem leak is the worst of all memory bugs.
Wild pointers, double-delete … usually (though not always) cause immediate failure, so they tend to be uncovered early. In contrast, a memory leak is too often a well-hidden bug.
See RAII phrasebook
Smart pointer is an example of RAII for memory management.
Item 8 of [[more eff C++]] has a good coverage. Here’s my summary.
Q: Which of these does the compiler allow you to invoke directly in your source code?
OPERATOR placement-new? Not sure. Perhaps never needed.
placement-new EXPRESSION? Of course. P40
—
placement delete EXPRESSION? NO SUCH THING (a placement operator delete can be declared, but only the runtime invokes it, when a ctor throws during placement-new)
dtor? yes. P42. I also saw it in some C++ FAQ.
operator new? yes P39. Rather similar to how you call malloc()
operator delete? Yes p41
—
delete[] EXPRESSION? Of course. That’s the standard way to de-allocate an array of “heapy thingies”
OPERATOR delete[]? Not sure.
—
operator new[] ? Not sure. I feel we can always avoid invoking this explicitly. We can invoke operator-new with a sizeof(targetObject)*arraySize. Both operators return a void pointer to a chunk of raw memory of the correct size.
————Here are the legitimate pairings. “Inter-faith marriage” leads to undefined behaviour ———–
malloc === free
operator new === operator delete // similar to malloc/free, but very rarely needed
new expression === delete expression
new[] expression === delete[] expression
placement-new expression === explicit dtor call followed by some special form of de-allocation. There’s no placement-delete expression
————
Q: which of these can we customize?
placement-new EXPRESSION? no
OPERATOR placement new? yes P40
OPERATOR new[] and operator delete[]? Yes P43
Every single STL container uses allocators, including the (implicit!) default allocator. It’s probably worthwhile to see a real allocator. [[STL tutorial]] has some incomplete code. [[effC++]] Items 8,9,10 present a customized op-new, which seems to be the basis of a reusable allocator class.
An allocator keeps its own free list, like malloc. Wholesale vs retail.
I believe an allocator is tightly integrated to customized op-new.
I was told in multi-threaded apps, the default op-new hits the same heap every time and can be a synchronization hotspot. I guess if there’s a single free-list then it needs synchronization protection. By giving a mini-heap to each thread, op-new must be customized to hit the mini-heaps.
– some operations must be piecemeal
– some operations must be one-gulp
* ctor and dtor are always piece-meal. Reason — each subclass and embedded component class can define its own ctor and dtor.
** For example, dtor execute in the DCBC sequence — http://bigblog.tanbin.com/2010/01/d-c-b-destructor-execution-order.html
* “operator new” (I don’t mean a new-expression) is always one-gulp. Reason — the memory (“real-estate”) of an entire object must be a contiguous piece of “real estate”, but one allocation for one part of the real estate and another allocation for another part will be dis-contiguous.
** There’s an important implication on P157 [[more eff C++]]
Background —
+ if you are a derived class instance, your base class instance will occupy part of “your” real estate
+ if you have a nonref[1] field of another class, that instance occupies part of “your” real estate
(Much of this write-up is based on http://www.cprogramming.com/tutorial/operator_new.html) It’s useful if we can restrict a special class’s instantiation to the stack only, and disallow heap allocation. Overload op-new for your class, and make it throw exception unconditionally? Alternatively, can we declare it but leave it unimplemented like —
class Myclass {
public:
    void* operator new(size_t);   // declared but never defined,
    void operator delete(void*);  // so q(new Myclass) fails at link time
};
Both of them are implicitly static members. Overloading can be used for many purposes. For example, we may need to alter the exception thrown in case of failure (std::bad_alloc) and throw something else.
Don’t be confused by the fact that there is both a new keyword and an operator new. When you write:
MyClass *x = new MyClass;
there are actually two things that happen: memory allocation and object construction. The new keyword is responsible for both. One step in the process is to call operator new in order to allocate memory; the other step is to actually invoke the constructor. Operator new lets you change the memory allocation method, but has no responsibility for calling the constructor. That’s the job of the new keyword.
It is possible to invoke the constructor without calling operator new. That’s the job of placement new.
For a given class C with a derived class D,
C::operator new(……); // is inheritable by D; implicit static; allocates raw memory
C::operator delete(…); // is inheritable by D; implicit static; deallocates raw memory i.e. return to the “pool”
C(……..); // never inherited; never static; turns allocated raw memory into object
~C(void); // never inherited; never static; turns object into raw memory, to be deallocated.
All of them can be private.
As explained in another blog post, the new expression like “new D()” can’t be overloaded. This expression invokes the operator new and the ctor.
[[effC++]] has a lengthy chapter on how to customize new/delete.
P139 [[understanding and using c pointers]] has a short chapter on how to avoid the overhead of frequent malloc/free operations. Note this is NOT replacing malloc with our own malloc.
P70 [[understanding and using c pointers]] has a one-pager super-simple wrapper over free()
update — Exactly which code module performs the wholesale-retail …? [[linux sys programming]] P256 reveals glibc.
One of the shortest definitions of Operating System is — “software-controlling-access-to-hardware-devices”. OS encapsulates hardware resources and presents a facade to competing userland applications.
The most important hardware resource is memory. For memory, the OS control point is the _allocation_ of heap memory aka free-store, shared among competing applications. Once a particular memory cell is allocated to a user (i.e. an application or process), the OS doesn’t intervene further. The application freely reads/writes the memory location.
Q: At run time, when JVM calls new SomeClass(), does the thread go into kernel and executes a system call to grab a chunk of heap memory?
%%A: I would say NO.
Q: But first, let’s look at a malloc() call at run time… does the thread go into kernel and executes a system call to grab a chunk of heap memory?
A: answer is NO.
See the divvy-up “sawing” diagram on P188 [[Michael Daconta]]. It’s too _inefficient_ to make so many “small” syscalls. Instead when you call malloc(4), you call into glibc functions namely malloc() and friends. Synchronously, the library functions call into the kernel (brk()/sbrk() perhaps?) to grab a large chunk wholesale, then gives 4 bytes to you before your malloc() returns. This is one malloc() scenario, the wholesale scenario, when there’s insufficient memory “in the library”.
The more common scenario is the retail scenario — malloc() finds a free block of 4 bytes in one of the “large chunks” grabbed earlier. So malloc() marks those 4 bytes as ALLOCATED and gives you the starting address.
By the way, malloc() typically stores the size requested (i.e. 4) in the header block before returning the address. The size information is needed by free(). Since the header block is in addition to the requested 4 bytes, malloc() _loses_ more than 4 bytes.
http://en.cppreference.com/w/cpp/language/new shows you can hit memory-leak if you simply forget to call delete. Good illustrations to a newbie.
#1
int* p = new int(7); // dynamically allocated int with value 7
p = somethingElse; // memory leak, because the new int object’s address is lost
#2
void f(){
int* p = new int(7);
} // memory leak – forgot to delete
We know c++ keeps retrying, in a loop, when a “new” operation encounters an OOM condition (there’s no OutOfMemoryError exception in c++, unlike java).
Q: what are the very few (therefore important) ways to break out of the loop?
A1: more memory becomes available and retry succeeds
A1a: new-handler releases a secret “stash” of memory
A1b: install a more resourceful new-handler, which releases some special memory block
A2: new-handler turns out null -> std::bad_alloc -> abort()
A2a: set new-handler to null
A2b: directly throw bad_alloc and pass on the hot potato
A2c: directly call abort()
Suppose a 32-bit pointer p2 contains address 0x1234, and 22 == sizeof(Derived)
******* q[Base* p2 = new Derived()] implements —
– grab 22 bytes from freelist
– initialize by ctor
– return 0x1234 i.e. starting address of the 22 bytes
–> from now on, “system” won’t give those 22 bytes to other malloc requests, since those are ALLOCATED to us.
****** q(delete p2) performs
– call dtor, possibly virtual
– determine size of the (polymorphic) pointee object, possibly via vtbl. Suppose it’s 22.
(The compile-time declared type of p2 doesn’t matter. If we did q[Base* p2 = new Derived()], then deletion via the virtual dtor will reclaim all the mem cells of Derived.)
– mark that many bytes starting from Address 0x1234
– return these bytes to freelist
–> Any time from now on, those 22 bytes can be allocated to anyone. Note p2 continues to be seated at 0x1234. The deletion performs no state change in the 32-bit object p2. If you read/write/delete on p2, you die.
Rule — after you call q(delete p2), always fill the 32-bit object p2 with zeros, by calling q(p2 = 0)
Heap memory allocators (java GC is actually among them) make a great effort to avoid (heap) fragmentation. Having many holes in the free store is wasteful and hurts allocation speed, to say the least
The fastest allocation-time algorithm is to simply carve out a piece of real estate from THE “frontier” and ignore any usable holes. Done naively, this could lead to huge waste of usable “hole” real estate – imagine a 100MB array gets freed and becomes a hole but ignored during all subsequent allocations.
The frontier is simply marked by a forward-moving pointer (“bump-the-pointer”). In JVM, the popular mark-and-sweep (also mark-compact?) and young-generation copying GC carefully maintain such a frontier.
In C, there’s no garbage collector so holes are perhaps inevitable. I think the standard solution is the free-list. This hurts allocation-time. To satisfy a 2KB allocation “request” we must search the free-list for a hole, before trying the frontier.
Both the free-list and the “frontier” solutions entail concurrency control.
http://www.flipcode.com/archives/How_To_Find_Memory_Leaks.shtml is a dated (2000) but readable and detailed treatment. A home-made new/delete overload, using malloc and free rather than qq[[ ::operator new ]] as advised by Scott Meyers.
don’t explicitly clean up a stack object — let system do it.
“Clean up” includes both destruction and deletion. Don’t explicitly delete a pointer to a stack object or call the destructor on a stack object. http://www.cembedded.com/2009/07/c-c-interview-questions-part-6.html says