unordered_set implementation note

https://www.gamedev.net/forums/topic/582613-c-how-is-unordered_set-implemented/ says

“unordered_set is typically implemented as a linked list that holds all the elements and a hash table stores pointers to the linked list nodes.”

LinkedHashMap?

Can remember insertion order.

Advertisements

RVO^move on return value

Let’s set the stage. A function returns a local Trade object “myTrade” by value. Will RVO kick in or move-semantic kicks in? Not both!

I had lots of confusions about these 2 features. P23 [[c++stdLib]] has 2 lines

  1. if Trade class has a suitable copy or move ctor, then compiler may choose to “elide the copy”. This was long implemented as RVO optimization in most compilers before c++11. https://github.com/tiger40490/repo1/blob/cpp1/cpp/rvr/moveOnlyType_pbvalue.cpp is my experiment.
  2. otherwise, if Trade class has a move ctor, the myTrade object is robbed

So if condition for RVO is present, then most likely your move-ctor will NOT run.

q[inline] to avoid a.. jump^stackFrame

Dino of BBG FX team asked me — when you mark a small f1()function inline (like manually copying the code into main()), you save yourself a jump or a new stack frame?

It turns out a new stack frame would require a jump, because after the new stack frame is created, thread jumps to the beginning of f1().

However, there’s something to set up before the jump — Suppose f1() is on Line 5 in main(), then Line 6’s address has to be saved to CPU register, or the thread has no idea where to jump back after returning from f1(). According to my codebashing training, this Line 6’s address is saved in the old stack frame, not the f1() stack frame!

Note the Line 6’s address is not a heap address not a stack address but an address in the code area.

concurrent lazy singleton using static-local var

https://stackoverflow.com/questions/26013650/threadsafe-lazy-initialization-static-vs-stdcall-once-vs-double-checked-locki has 12 upvotes and looks substantiated. It also considers double-checking, std::call_once, atomics, CAS…

GCC uses platform-specific tricks to ensure a static local variable is initialized only once, on first use. 

If it works, this is the easiest solution.

As explained in another blog post, static local is a shared mutable.

pbclone large obj(eg:vector)rely`@move

This is impressive in QQ interviews + coding questions

GotW #90 Solution: Factories

has a good illustration of move semantics put to good use.

  • Before c++11, a function returning a large vector (or any large object) by value incurs expensive deep copying of all vector elements.
  • With c++11 move features added to std::vector class, returning a vector by value is cheap and recommended.
  • RVO may kick in but (i feel) less reliable than move semantic. For the specific rules see RVO^move-semantics

housekeeping^payload fields: vector,string,shared_ptr

See also std::string/vector are on heap; reserve() to avoid re-allocation

std::vector — payload is an array on heap. Housekeeping fields hold things like size, capacity, pointer to the array. These fields are allocated either on stack or heap or global area depending on your variable declaration.

  • Most STL (and boost) containers are similar to vector in terms of memory allocation
  • std::string — payload is a char-array on heap, so it can expand both ways. Housekeeping data includes size…
  • shared_ptr — payload includes a ref counter and a raw-pointer object [1] on heap. This is the control-block shared by all “club members”. There’s still some housekeeping data (pointer to the control block), typically allocated on stack if you declare the shared_ptr object on stack and then use RAII.

If you use “new vector” or “new std::string”, then the housekeeping data will also live on stack, but I find this practice less common.

[1] this is a 32-byte pointer object, not a pure address. See 3 meanings of POINTER + tip on q(delete this)

memset: a practical usage #Gregory

  • memset is a low-level C function.
  • memset takes a void pointer.
  • Fast and simple way to zero out an array of struct, having primitive data members. No std::string please. No ptr please. Use sizeof to get the byte count.
  • Useful in low level wire coding
// illustrates packed and memset
#include <iostream>
using namespace std;

struct A{
  unsigned int i1; //4 bytes
  bool b; //1 byte
  char cstr[2];
  int* ptr; //8 bytes
} __attribute__((packed));
size_t const cnt = 3;
A arr[cnt];
int main(){
  cout<<sizeof(A)<<endl;
  size_t sz = sizeof(arr);
  cout<<sz<<endl;
  memset(arr, 0, sz);
  for(size_t i=0; i<cnt; ++i){
    A* tmp = &arr[i];
    cout<<"i1 = "<<tmp->i1<<"; b = "<<tmp->b<<" ; cstr[1] = "<<(int)tmp->cstr[1]<<" ptr = "<<tmp->ptr<<endl;
  }
}

memory leak demo; memset(); std::string

valgrind proves the leak.

using namespace std;
size_t const len=15;
struct A{
        string s4;
        A(string s="default std::string"): s4(s){ cout<<"ctor"<<endl; }
};
size_t const  cnt=2;
size_t siz=cnt * sizeof(A);
A arr[cnt], ar2[cnt];

char fname[] = "/tmp/,.dat";
void leakDemo1() {
        {
                arr[0]=A("grin");
                arr[1]=A("frown"); //somehow skipping this causes core dump
        }
        cout<<"before write()"<<endl;

        int fd = open(fname, O_CREAT | O_WRONLY, S_IRUSR | S_IWUSR);
        write(fd, arr, siz);
        close(fd);

        if(1){
                int fd2 = open(fname, O_RDONLY);
                read(fd2, ar2, siz);
                close(fd2);
        }
        for (int idx = 0; idx < cnt; ++idx){
                A * tmp = ar2 + idx;
                cout<<tmp->s4<<endl;
        }
}
void leakDemo2(){
        int * intp = new int(11); //not deleted. valgrind detected the leak
        memset(&intp, 0, 8); //overwrite the 8-byte pointer object
        delete intp;  //deleting on the fake address from memset
}
void leakDemo3(){
        string s="hello";
        cout<<"sie of string == size of pointer = " << sizeof(string)<<endl; //inside the string object, there's nothing but a poitner!
        memset(&s, 0, 1); // overwite pointer object itself, so it now points to somewhere else ... leak

        //somehow, memset(....,2) causes seg fault?
}
int main() {
        leakDemo1();
}

c++variables: !! always objects

Every variable that holds data is an object. Objects are created either with static duration (sometimes by defining rather than declaring the variable), with automatic duration (declaration alone) or with dynamic duration via malloc().

That’s the short version. Here’s the long version:

  • heap objects — have no name, no host variable, no door plate. They only have addresses. The address could be saved in a “pointer-object”, which is a can of worm.
    • In many cases, this heap address is passed around without any pointer object.
  • stack variables (including function parameters) — each stack object has a name (multiple possible?) i.e. the host variable name, like a door plate on the memory location. This memory is allocated when the stack frame is created. When you clone a stack variable you get a cloned object.
    • Advanced — You could create a reference to the stack object, when you pass the host variable by-reference into another function. However, you should never return a stack variable by reference.
  • static Locals — the name myStaticLocal is a door plate on the memory location. This memory is allocated the first time this function is run. You can return a reference to myStaticLocal.
  • file-scope static objects — memory is allocated at program initialization, but if you have many of them scattered in many files, their order of initialization is unpredictable. The name myStaticVar is a door plate on that memory location, but this name is visible only within this /compilation-unit/. If you declare and define it (in one step!) in a shared header file (bad practice) then you get multiple instances of it:(
  • extern static objects — Again, you declare and define it in one step, in one file — ODR. All other compilation units  would need an extern declaration. An extern declaration doesn’t define storage for this object:)
  • static fields — are tricky. The variable name is there after you declare it, but it is a door plate without a door. It only becomes a door plate on a storage location when you allocate storage i.e. create the object by “defining” the host variable. There’s also one-definition-rule (ODR) for static fields, so you first declare the field without defining it, then you define it elsewhere. See https://bintanvictor.wordpress.com/2017/05/30/declared-but-undefined-variable-in-c/

Note: thread_local is a fourth storage duration, after 1) dynamic, 2) automatic and 3) static