git hooks: learning notes

— post-commit hook runs after the commit is completed and closed. So I don’t think it can modify or cancel the commit.

— pre-commit hook can’t distinguish a Amend vs a regular Append.

[19] q[extern] idiosyncracies

Despite my numerous summaries, I could never reach the bottom of this keyword.

http://www.goldsborough.me/c/c++/linker/2016/03/30/19-34-25-internal_and_external_linkage_in_c++/ has good details like

int x;          // is the same as
extern int x{}; // which will both likely cause linker errors if placed in header.

extern int x;   // while this only declares the integer, which is ok.

Can C implement basic containers@@

Templates — I think template is unavailable in C, and I don’t know how C could emulate it, except using macros.

C doesn’t support exceptions but luckily very few exceptions in STL.

Most STL container classes have relative few member functions. I think C can implement free functions with “this” as an additional parameter.

Bulk of the basic operations on container are global functions in the form of STL algorithms. I think C can implement them.

Iterators are tricky. In STL I think they are usually nested classes. I think I would just give up. C supports no operator overloading so there’s no point providing an iterator.

reference(instead of ptr) to smart ptr instance

I usually pass smart pointers by value (copy-constructor or move-constructor), just like copying a raw ptr.  Therefore the code below looks unnatural:

unique_ptr<Trade> & ref2smartPtr

Well, my “pbclone” intuition was incorrect.  Actually pbref is rather common because

  • As Herb Sutter suggested, when we need to put pointer into containers, we should avoid raw ptr. Unique ptr is the default choice, and the first choice, followed by shared_ptr
  • I often use unique_ptr as map value . The operator[] return type is a reference to the value type i.e. reference to unque_ptr
  • I may need to put unique_ptr into a vector…. ditto for vector operator[]

pink sheets #learning notes

The pink sheets are a stock quotation service on unlisted stocks.

  • Many are penny stocks, trading for extremely low prices,
  • some are legitimate foreign companies that don’t wish to file reports with the SEC.
  • … There’s less regulation, less transparency, more risk of fraud in these stocks.

OTC Markets Group offers this service.

PinkSheet stocks are considered non-hedgeable in some swap dealer systems. I guess liquidity is too low.

https://www.fool.com/knowledge-center/what-are-the-pink-sheets.aspx is good intro.

std::reference_wrapper usages

std::reference_wrapper is a class template that wraps a reference in a copyable, assignable object. Usages:

  • std::reference_wrapper is primarily used to store references inside standard containers which cannot hold references. In my projects, no one ever stores references in containers. Pointless.
  • std::reference_wrapper is also used to pass objects by reference to the constructor of std::thread, or the helper functions std::make_pair() and std::make_tuple()

Helper functions std::ref and std::cref() are often used to generate std::reference_wrapper objects.

See https://en.cppreference.com/w/cpp/utility/functional/reference_wrapper

–details on the std::thread usage

https://github.com/tiger40490/repo1/blob/cpp1/cpp/thr/takeTurn.cpp shows that without std::ref(), the object passed to std::thread ctor is a clone, so the original object is not updated by the thread routine.

My https://github.com/tiger40490/repo1/blob/cpp1/cpp/thr/functorToThrCtor.cpp uses std::ref() too. Same code also shows another adapter wrapper for a function

https://en.cppreference.com/w/cpp/thread/thread/thread says std::ref() is likely needed.

https://thispointer.com/c11-multithreading-part-3-carefully-pass-arguments-to-threads/ has detailed sample code

Empty base-class optimization

This optimization only zeros out the runtime [1] size of baseclass subobject [2]. All other sizes of empty types are the same — one byte.

Suppose Der subclasses B1 and B2.

Note on [1]:
If B1 is empty, then the compile-time operator sizeof() shows sizeof(B1) == 1 not zero, but at run time, the Der object size shows the B1 sub-object has zero size, due to EBCO.

Note on [2]:
If all of Der, B1 and B2 are empty, we know sizeof(Der) == 1, but how about rutime sizeof(an_Der_instance)? Also one, not zero, because this instance is not a baseclass subobject.

childThr.get_id() after join()

Not sure about pthreads but here is c++11 std::thread::get_id() behavior:
“If the thread object is not joinable, the function returns a default-constructed object of member type thread::id.”
I believe after you join a childThr, that thread is no longer joinable, SO get_id() will return a meaningless boilerplate value.

Solution: to use that id, you need to save it before joining
https://github.com/tiger40490/repo1/blob/cpp1/cpp/thr/takeTurn.cpp is my experiment

pure virtual dtor

pure virtual dtor is a low-value obscure topic in 1) interview 2) zbs. So I won’t spend too much time on it.

https://stackoverflow.com/questions/1219607/why-do-we-need-a-pure-virtual-destructor-in-c addresses the rationale and justification for pure virtual dtor

https://www.geeksforgeeks.org/pure-virtual-destructor-c/ explains this dtor must be defined somewhere or compiler/linker would complain.

unordered_set implementation note

https://www.gamedev.net/forums/topic/582613-c-how-is-unordered_set-implemented/ says

“unordered_set is typically implemented as a linked list that holds all the elements and a hash table stores pointers to the linked list nodes.”, echoed on https://stackoverflow.com/questions/31112852/how-stdunordered-map-is-implemented

LinkedHashMap.java?

contents to keep in .C rather than .H file

1) Opening example — Suppose a constant SSN=123456789 is used in a1.cpp only. It is therefore a “local constant” and should be kept in a1.cpp not some .H file.  Reason?

The .H file may get included in some new .cpp file in the future. So we end up with multiple .cpp files dependent (at compile-time) on this .H file. Any change to the value or name of this SSN constant would require recompilation to not only a1.cpp but unnecessarily to other .cpp files 😦

2) #define and #include directives — should be kept in a1.cpp as much as possible, not .H files. This way, any change to  the directives would only require recompiling a1.cpp.

The pimpl idiom and forward-declaration use similar techniques to speed up recompile.

3) documentation comments — some of these documentations are subject to frequent change. If put in .H then any comment change would trigger recompilation of multiple .cpp files

std::dynamic_pointer_cast on a shared_ptr

Returns a copy (of the given shared_ptr) instantiated in the same “club”.

The returned type must be a derived type of the original… equivalent to a dynamic_cast() on a raw ptr.

http://www.cplusplus.com/reference/memory/dynamic_pointer_cast/ example looks lame as it doesn’t have virtual function, so dynamic_cast() isn’t applicable !

https://github.com/tiger40490/repo1/blob/cpp1/cpp/template/shPtrDownCast.cpp is my own experiment.

[13]arrayName^ptrVar differences tabulated

See other posts why an array name ((including c-str) is a permanent name plate on a permanently allocated room in memory. Once you understand that, you know
– can’t move this name plate to another room
– can’t put this array name on the LHS of assignment

I feel overall, in most everyday contexts you can treat the array name in this example as if it’s a variable whose type is int*. However, the differences lurk in the dark like snakes, and once bitten, you realize there are many many differences. The 2 constructs are fundamentally different.

Q: how to edit the below html table structure (content is easy)?
A: edit the html
A: copy paste to ms-word

ptr to heap array ptr-var array-name (array not on heap)
declaration array-new int* p//allocate 32bit int arr[3]//allocate 3 ints
declared as a struct/class field same as ptr-var 4-bytes onsite entire array embedded onsite
declared as func param var no such thing common converted a ptr-var by compiler
initialize probably not supported char* p=”abc”;// puts the literal string in RO memory char arr[]=”xyz”; //copying the literal string from RO memory to stack/heap where the array is allocated. The new copy is editable.
object passed as func arg same as ptr-var common converted to &arr[0]
dereference same as ptr-var common gives the first int element value. P119[[primer]]
address of same as ptr-var double ptr &arr[0]==&arr==arr. See headfirstC post and also
http://publications.gbdirect.co.uk
/c_book/chapter5
/arrays_and_address_of.html
rebind/reseat same as ptr-var common syntax error
add alias to the pointee same as ptr-var common unsupported
as LHS (not as func param) same as ptr-var common. Reseat syntax error

2 Active connections on 1 TCP server IP/port

This is not the most common design, but have a look at the following output:

remote          local        state
*:*           - 4.3.2.1:5000 LISTENING
1.2.3.4:12345 - 4.3.2.1:5000 CONNECTED
4.5.6.7:83247 - 4.3.2.1:5000 CONNECTED

What needs to be unique, is the 5-tuple (protocol, remote-ip, remote-port, local-ip, local-port)… so this situation can exist. [[tcp/ip sockets in C]] P100 has a full section on this topic.

http://stackoverflow.com/questions/11129212/tcp-two-different-sockets-sharing-a-port also says “Multiple worker-sockets on the same TCP server can share the same server-side IP/Port pair as long as they are associated with different client-side IP/Port pairs”. This “accept, move the connection to a dedicated server socket, then go back to accept()” is the common design — On each incoming connection, the listening TCP server will start a new thread/task/process  using a new “worker” socket on the server side. Note the new worker socket shares server ip:port with the original listening socket

https://github.com/tiger40490/repo1/blob/py1/py/sock/1sock2server.py is my experiment. It pushes the concept of “sharing” further  — two TCP serves sharing a single socket not just a single ip:port endpoint!

read-copy-update lockfree +! retry #RCU

RCU is an advanced concurrency construct, not implemented in a regular application, but valuable for lock-free interviews.

http://concurrencyfreaks.blogspot.sg/2016/09/a-simple-userspace-rcu-in-java.html is a “simple Userspace” java implementation. I didn’t read it in detail and assume it’s rather different from the linux kernel RCU

http://www.modernescpp.com/index.php/aba-a-is-not-the-same-as-a (by a published author) has a brief mention of Userspace RCU solution to ABA problem. Also touches on Garbage Collection.

https://lwn.net/Articles/262464/ — by RCU inventor. I can see RCU is non-trivial !

The Wikipedia article is accessible until the paragraph introducing read-side-critical-section and grace-period. This is the first paragraph on the implementation. I found it hard. Therefore a few background pointers:

  • · There must be multiple threads, typically many readers and few writers.
  • · There must a shared mutable data structure, probably on heap.
  • · Not all data structures are suitable for RCU. So which subset are? I would say pointer-graph including hash tables.

In the simplified model, we are talking about

  • · A reader thread R1 executing a block of code that has started reading the shared mutable data structure. This code block is the critical section
  • · A writer thread W1 executing a block of code attempting to update the same data.
  • · After the so-called grace period, a GC thread would reclaim the obsolete data that R1 has finished reading.
  • · A reader R2 enters the critical section after the update is done. R2 would see the new data

GC need to know when it’s safe to reclaim. It needs the wait-for-readers operation and the grace period.

Q: What if R1 gets a reference to the “obsolete node”, performs some slow task, reads the node, then exits the critical section? This node is reclaimable only after R1 exits critical section. The grace period would be long?

Q: Among 9999 nodes, how does system keep track which each node is reclaimable?
%%A: I feel kernel access to CPU registers might be needed.

Q: How does it compare with copy-on-write?
A: COW probably copies the entire data structure. Also, the modified copy is not visible to subsequent readers (like R2)

Q: How does it compare with read/write lock?
A: readlock can block

Q: How does it relate to lock-free concepts?
A: reader threads are lock-free and even better — not required to retry.
A GC threads need to wait.
A: Writer thread (like W1) ? not sure. Might be lock-free in some situations.

order routing inside an exchange #Nsdq

A major stock exchange’s job spec mentions that the exchange actually runs an internal SOR.

I thought only trading houses maintain SOR. Now I know that when an order (probably a market order) hits a stock exchange NDQ, NDQ may need to route it somewhere else. Not sure if it’s internal or external destination.

My friend Alan said all 13 U.S. exchange receive a live NBBO feed. Based on that, ABC can “forward” the order to the “best” venue.

empty while(true) loop hogging CPU

Someone (barclays?) gave me a basic tuning quiz —

Q: what cpu utilization will you see if a program executes an empty while(1==1){} ?

A: On a 16-core machine, 3 instances of this program each take up 4% of the aggregate CPU according to windows taskmgr. I think 4% means hogging one entire core.

A: on my RedHat linux, the same program has 99.8% CPU usage meaning one entire core.

A: On a dual-core, if I start 2 instances, each takes up about 50% i.e. one entire core, keeping all other processes off both cores. With 3 instances, system becomes visibly slow. Taskmgr itself becomes a bit hard to use and reports about 30%-40% by each instance.

I think this would count as cpu intensive job.

use STL Map to emulate a Set#as JDK does

Q1: is it practical to use a STL Map to emulate the Set class-template, at comparable efficiency?

We know a Map stores PAIRs in a red-black tree. Can we just put an empty thing in the PAIR.second field?

Q2: how small can the “PAIR.second” footprint be? 0 byte? 1 byte? Java is less memory efficiency … where everything is heapy thingy..
%%A: i don’t think it can be 0 byte. sizeof(PAIR) is known when you specialize the PAIR template with T1 and T2. sizeof(PAIR) depends on sizeof(T1) + sizeof(T2). Compiler needs to know sizeof(PAIR) in order to allocate memory to a new PAIR object.

Q3: is there any type T2 such that sizeof(T2) == 0?
%%A: I don’t know any.

Q4: we know (confirmed) java HashSet is implemented using HashMap physically. Is that different from C++?
A: different

Q5: java TreeSet.java source code shows a physical implementation using a map. Is that different from STL?
A: different

std::copy-print array of pointers

Suppose you already have a friend operator<<(ostream&, YourClass &) for YourClass, but you need to print out an array of pointers —

YourClass* array[99]

Here’s a first attempt —

copy(array, array+99, ostream_iterator(cout,” “); // prints the addresses

Simple solution —

Simply overload operator<<(ostream&, YourClass* const) using the existing operator<<

array-name can’t be LHS #almost

An array name myArr (including c-str) looks like a variable but no no no. It is the name plate on a room door.

  • It represents an allocated room of fixed size
  • It is permanently tied to a fixed address
  • You can’t put this name plate on another door later on.
  • (obscure) You can’t put a 2nd c-array name plate on the same room door like q[ cstr2=existingCstr ], either as initialization or assignment on cstr2.
    • you can use q[ char * ptr2=existingCstr ], though ptr2 is not a c-str not an array.

An array-name in source code is like a const ptr (not ptr-to-const) with a value like 0xFF82, i.e. an alias for Address 0xFF82. Such an Address is not an Lvalue. Therefore array-name isn’t an Lvalue. Array-name can’t be LHS —

int aa[3];
int b[3];
aa=b; // error

That’s trivial, but look at this —

f(aa); // f was declared f(int pa[])

When you pass the array-name “aa” to a function, the array-name is effectively on RHS (not breaking the LHS rule). The LHS is the function parameter-variable “pa”, which is always treated as pointer Variable even though you declare the function parameter-variable “pa” as an array-name! Compiler converts it to f(int * pa).

In summary,
* Rule 1 — array name can’t be LHS of an assignment
* Rule 2 — function param-var is effectively the LHS of an assignment. You can declare a function param-var as array type, which seems to violates the 2 rules above, but
* Rule 2b — array param-var is converted to pointer param-var

Now let’s look at c-string, an important special case , since the “=” is initialization, not assignment.

char const * s = “literal”; // compiler turns a blind eye if you remove the “const” but DON’T
char s8[] = “literal”;

Note the above creates an char-array of length 7+1=8 including the null char, but the array below has length 7 only:

char s7[] = {‘l’, ‘i’, ‘t’, ‘e’, ‘r’, ‘a’, ‘l’};

// To modify such a literal string,
char s[] = “literal”;
s[0]=’A’;
cout<<s<<endl;

4 Scopes for operator-overloads #new()

  1. non-static member operators are very common, such as smart ptr operator++(), operator< () in iterators
  2. static member operator new() is sometimes needed. ARM explains why static.
  3. friend operators are fairly common, such as operator<<()
  4. class specific free standing operator is recommended by Sutter/Andrei, to be placed in the same namespace as the target class. Need to understand more. Advanced technique.

fail fast:fundamental+Practical principle #java

Fail-fast is one of those low-level coding habits that deserve its place among the valuable habits to be adopted by practicing app developers in industry. In contrast, library developers (including open-source authors) generally adopt fail-fast by default.

Principle — prefer crashing the entire program. Don’t keep going. Don’t hope the dubious condition will be tolerated. Don’t hope the corrupted data will be left alone and left untouched. Sooner or later what you fear will happen. In such a case it can be very hard to find the root cause. The crash site might be far away from the buggy code.

Example — when dealing with DAM issues, there are specific tools to make the program crash as soon as detected. I think they intercept, replace or integrate with malloc/free.

Example — if null pointer can cause problem then check as early as possible. In java and c#, a lot of error messages simply say null pointer (like “unknown exception”) . It can take hours to find out the real cause.

Example — fail-fast iterators.

pure virtual with/out implementation: popular@IV

Item 44 in [[EffC++]] offers a summary that basically says

1) a pure virtual means only interface is inherited (since parent provides no implementation)
2) a “simple/traditional virtual” means interface plus a default implementation is inherited
3) a non-virtual means interface plus a mandatory implementation is inherited and subclasses are advised to keep the implementation. C++ actually allows subclass to /deviate/ by redefining and hiding the parent implementation. Java has a cleaner solution in the “final” keyword.

I’d like to add

1b) a pure-virtual-with-an-implementation tells the subclass author

“Inherit this interface. By the way I also offer an implementation upon request, but uninheritable.”

This differs from (2). Both are illustrated in Item 36.

Author of a subclass of an abstract base class (featuring pure2()) can choose one of three options:

  • 1. don’t declare pure2() at all.  As the default and most popular usage, the subclass is also abstract (pun intended) by virtual of the inherited pure2().
  • 2. define pure2(), and becoming a non-abstract class
  • … so far, exactly same as java syntax
  • 3. redeclare the same pure2() without implementation — an error. See P215 [[ARM]]

operator << must be a friend function #tested

P 331 [[absolute c++]] gives the standard signatures of operator <<

— sound bytes —
friend — It has to be a friend function. Without “friend” keyword, compiler complains.
free function — required
method — is not possible. For a method, host object must be on the LHS of the operator, but q[ operator<< ] needs cout on LHS

named pipe simple eg #unix socket

$ mkfifo –mode=0666 /tmp/myfifo # create a named pipe, with a file name

$ cat /etc/passwd > /tmp/myfifo # writer will block until some reader comes up on the receiving end.

Now open another terminal

$ cat < /tmp/myinfo # or

$ tail -f /tmp/myfifo

http://developers.sun.com/solaris/articles/named_pipes.html shows pros and cons. http://www.linuxjournal.com/article/2156 is a simple tutorial.

Motivation? Allow totally unrelated programs to communicate with each other

A side note to be elaborated in another blog — name pipe is a FIFO stream, whereas unix domain socket can be data gram or stream (like TCP)

fd_set in select() syscall, learning notes

First thing to really understand in select() is the fd_set. Probably the basis of the java Selector.

A fd_set is a struct holding a bunch[2] of file descriptors. (I don’t think there’s any boolean flag in it). A fd_set instance is used as an in/out parameter to select().
– upon entry, it carries the list of sockets to check
– upon return, it carries the subset of those sockets found “dirty” [1]

FD_SET(fd, fdSet) adds a file descriptor “fd” to fdSet. Used before select().
FD_ISSET(fd, fdSet) checks if fd is part of fdSet. Used after select().

—-
Now we understand fd_set, let’s look at …
First parameter to select() is max_descriptor. File Descriptors are numbered starting at zero, so the max_descriptor parameter must specify a value that is one greater than the largest descriptor number that is to be tested. I see a lot of confusion looking at how programmers populate this parameter.

See http://publib.boulder.ibm.com/infocenter/iseries/v5r3/index.jsp?topic=%2Frzab6%2Frzab6xnonblock.htm

[1] “ready” is the better word
[2] if you don’t have any file descriptor, then you should pass in NULL, not an empty fd_set

select() syscall lesson1: fd_set …

A fd_set instance is a struct (remember — C API) holding a bunch of file descriptors. I don’t think there’s any boolean flag in it.

A fd_set instance is used as an in/out parameter to select(). Only pointer arguments support in/out.
– upon entry, it carries the list of sockets to check
– upon return, it carries the subset of those sockets found “dirty” [1]

FD_SET(fd, fdSet) is a free function (remember — C API) which adds a file descriptor “fd” into a fdSet. Used before select().

FD_ISSET(fd, fdSet) is a free function (remember — C API) which checks if fd is part of fdSet. Used after select().

First parameter to select() is max_descriptor. File Descriptors are numbered starting at zero, so the max_descriptor parameter must specify a value that is one greater than the largest descriptor number that is to be tested.

See http://publib.boulder.ibm.com/infocenter/iseries/v5r3/index.jsp?topic=%2Frzab6%2Frzab6xnonblock.htm

[1] “ready” is the better word

concretized template class≠ordinary class

This topic is largely personal  curiosity, not needed for IV or project.

Many people say “use a concrete templ class just as an ordinary class” but nonono. A whale is a mammal, but not just another ordinary mammal. Concretized classes (i.e. concretized template class) differ from ordinary classes because

* when you declare a variable of the new type, your type must obey the “policy”
* a concrete template class’s behavior is specified in multiple places — the policy classes, the class of each template argument. A regular class’s behavior is specified in that class only. Look at the allocator param in a parametrized container.
* debugger for concrete templ classes is harder than ordinary classes
* casting is complicated
* when you plug in a functor type, the specialized template class’s instance instantiates a functor object. In short, you specify functor TYPE only — instantiation is implicit.
* a type involving templates interplays with ptr/ref in more complex ways than non-template types. (typedef often needed). When you add ptr (or ref) symbol to a non-template type, you just stick the qq(*) in the correct places. But how about qq[ map<char**, list*> & ] as a “first part” inside a parenthesis?

c++ stream format flags and manipulators – key words

bitVector — probably there’s a hidden bitArray of boolean flags for each stream instance. http://www.cplusplus.com/reference/iostream/ios_base/flags/

1-to-1 — one flag for one manipulator function

Concrete eg — http://www.cplusplus.com/reference/iostream/manipulators/showpos/

transformer — a typical manipulator is a function accepting a stream by reference, and returning the same stream object by reference

Thread object as a lock object: myThr.wait()sometimes useful

Someone (HSBC?) asked me what happens if I call myThread.wait().

MyThread thrd = new MyThread();
thrd.start();
synchronized(thrd){
 thrd.wait(); // similar to thrd.join()
}

Using a Thread.java object as a monitor object is legal, but not advisable. Unlike c++, java has no undefined behavior, so the behavior is probably “defined” in the java language spec. I guess the behavior is that main thread waits in the “waiting room” of the thrd object, but who will notify main thread?

My experiment shows that main thread actually gets notified when thrd ends. Before we uncover the mysterious notifier, let’s remember that it’s unusual, confusing and never necessary to use a Thread.java object as a lock. There’s always a simpler alternative.

If you really use a Thread.java object as a lock/conditionVar, there are hidden issues. Here’s one issue I know –

As a condition variable, a Thread.java object has the undocumented behavior of calling this.notifyAll() when it dies. Someone speculated — “In fact this is how the join() method is implemented, if I remember correctly: it waits on the thread object until it dies.” Note join() is an instance method of an Thread.java object. When main thread calls thrd2.join(), it treats thrd2 as a conditionVar and waits on it until another thread calls thrd2.notifyAll(). Notifying thread is actually the thrd2 itself.

user-dict^idict ] python #speculations

Never tested in IV and never needed in projects…

There are various terms for the same concept
– registry — generic concept across languages. A class has a registry and an instance has a registry too.
– namespace — has special meaning in python, but some veterans say the various registries are implemented same as a namespace
– idic i.e. internal dict, but exposed for runtime introspection

2 types of dictionaries in python —

A) user dictionary are the well-understood, simple dictionaries you create
B) “idic” meaning “internal dict” is a building block of python objects, (possibly) including user dictionary objects.

A user dict is usually taught as an object with methods and fields. I hope that’s not over-simplification.

A namespace for a module (or class or anything) is implemented as an idic. Accessed as whatever.__dict__. In a twist for consistency and coherence, this idic can be *used* as a regular user dictionary. You can do whatever.__dict__.keys(). You may rightly feel that a __dict__ is implemented the same as a user dictionary, but I doubt it. It could be implemented as a complicated symbol table but exposed as a simple dict.

Even if a __dict__ (i.e. an idic) is implemented the same way as a user dict, I strongly believe beginners better treat them as 2 distinct types unrelated to each other, though the syntax looks similar.

Internal dict is the basis of python objects[3].  I guess a idic is more like a small expandable hash table [2]. Must be small because a *concrete* idic instance exists in EVERY parent class, subclass, and EVERY instance[1] of EVERY class.

[1] It’s not too wrong to say it resembles the C struct.
[2] a perl object IS nothing but a hashtable or associative array. See the camel book. In php, an object is implemented as an associative array (or simply an “array”).
[3] at least user defined objects