standard SQL to support pagination #Indeed

Context — In the Indeed SDI, each company page shows the latest “reviews” or “articles” submitted by members. When you scroll down (or hit the Next-10 button) you see the next most-recent articles.

Q: What standard SQL query can support pagination? Suppose each record is an “article” and page size is 10 articles.

I will assume each article has an auto-increment id (easier than a unique timestamp) maintained in the database table. This id enables the “seek” method. The first page (usually the latest 10 articles) is sent to the browser. Then the “fetch-next” command from the browser would contain the last id fetched. When this command hits the server, should we (AA) return the next 10 articles after that id, or (BB) re-query the latest articles and skip the first 10? I prefer AA. BB can wait until the user refreshes the web page.

The SQL-2008 industry standard supports both the (XX) top-N feature and the (YY) offset feature, but for several reasons [1], only XX is recommended:

select * from Articles where id < lastFetchedId order by id desc fetch first 10 rows only

[1] http://www.use-the-index-luke.com clearly explains that the “seek” method is superior to the “offset” method. The BB scenario above is one source of confusion with the offset method. Performance is also problematic when the offset value is high: fetching the 900,000th page is roughly 900,000 times slower than fetching the first page.


convert 512bit int to bit-vector #python=best

In coding questions, int-to-bit_vector conversion is common. I now think python offers a simple clean solution, applicable to integers of any size 🙂

'{0:032b} <-32|8-> {0:08b}'.format(myInt) # renders myInt as a 32-char binary string and, simultaneously, an 8-char one
  • first zero refers to argument #0 (we have only one argument).
  • The zero after colon means zero-padding
  • The last ‘b’ means binary
  • Don’t forget your (dental) braces 🙂

Note this is the str.format() method, not the global builtin function format()

The return value is a python string, but easily convertible to a vector (list) of bits

What if the number is too large? My {0:08b} example shows that if the int is above 255, then python does the safe thing — extending the output beyond 8 characters, rather than truncating 🙂

To convert a bit-vector back to an int, first join it into a string, then use int(bitStr, 2) # 2 is the base i.e. binary

pass generator output to next generator

I think this technique can be extremely time-saving in coding tests.

https://github.com/tiger40490/repo1/blob/py1/py/algo_combo_perm/1fromEachSet.py my code demos:

for myset in pool:
    output = list(gen(output, myset))

The gen() function uses yield. For the first call to gen(), we exhaust all of its items and save them into a list named “output”.

Then we pass this list into the second gen(), this time with a different myset

Realistically ideal next SG job: neglected factors

Essential factors discussed elsewhere: Reasonable coworker benchmark (within my grasp); Reasonable boss, salary, commute…

Here are some of the neglected factors:

  • Ideally: enough spare time in office — for blogging and tech exploration, AFTER I clear the initial hump.
    • wordpress access — may be a temptation for distraction
  • Mainstream and low-churn tech — like java, c++, py. In contrast, GO, javascript, .. would create discontent and other negativity in me.
  • Ideally: mainstream dnlg — to keep me engaged for a few months. Hopefully some math, some low-level complexities
  • Too much monotonous, mindless work could get me disengaged intellectually. Rare. Any past examples? OC approval-seeking
  • gym in office — could really make a difference
  • culture of knowledge sharing
  • respect for attention to details, but MY type of details (vague)

checked STL^checked java Collections

jdk checkedList, checkedMap etc are designed to catch type errors — checking that any newly added item has the correct type for the collection. See P 246 [[java generics]]

STL checked containers check very different coding errors. See http://www.informit.com/articles/article.aspx?p=373341, which is extracted from the book [[c++codingStd]]

## GP+Ashish: 随遇而安@PIP #GS

grandpa’s advice is 随遇而安 (“take things as they come”) — “Do your best. If they decide it’s a role mismatch then look for another job”. I will expand on his advice and add relevant tips and observations

  • academic self-image .. fragile — Ashish pointed out I was academically too successful and unable to cope with put-downs
  • best effort — I don’t need to bend over backward and sacrifice family
  • no shame
  • no fear of stigma — sounds impossible but it is possible !
  • no regret
  • guilt — the guilt should be on employer for making a wrong hire and creating hardship in my life.
  • stay positive — there’s a chance I can survive for 1-2 years
  • peer caliber — Ashish said those guys aren’t rock stars
  • Saurabh attitude — I believe at a high salary or as the first technology hire for Julian, expectation would be rather high. Can I withstand the pressure as Saurabh did?
  • GS pressure cooker — I survived there, so I should be able to survive anywhere else.
  • learning to cope — At GS/Qz/Macq, did I learn coping strategies to manage the pressure? I hope so.

The pressure to perform would likely create real stress in the family, as I’m not as ‘carefree’ as in Bayonne. I feel some of the past stigmas would come back to haunt me.

See also choose job for respect, stigma, benchmark

 

TCP_NODELAY to improve latency #unclear

https://stackoverflow.com/questions/3761276/when-should-i-use-tcp-nodelay-and-when-tcp-cork

Nagle’s algorithm helps chatty applications like telnet by coalescing many small packets. However, it can increase latency when sending streaming data. Worse, if the receiver implements the delayed-ACK policy, the two policies interact badly and can stall the connection temporarily, each side waiting on the other. In such cases, disabling Nagle’s algorithm is the better option.
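Below is a minimal sketch (my addition, not from the linked discussion) of disabling Nagle with setsockopt(); sockFd is assumed to be an already-connected TCP socket:

#include <sys/socket.h>
#include <netinet/in.h>  // IPPROTO_TCP
#include <netinet/tcp.h> // TCP_NODELAY

int disableNagle(int sockFd){ // sockFd: a connected TCP socket
  int flag = 1; // nonzero means "disable Nagle"
  return setsockopt(sockFd, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(flag)); // 0 on success
}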

recv()/send() with timeout #CSY

I was right — timeouts are supported not only in select()/poll(), but also in read()/recv()/send() etc.

SO_RCVTIMEO and SO_SNDTIMEO

Timeouts only have effect for system calls that perform socket I/O (e.g., read(2), recvmsg(2), send(2), sendmsg(2)); timeouts have no effect for select(2), poll(2), epoll_wait(2), and so on. These latter functions only check socket status… no I/O.

I believe this socket option has no effect if used on non-blocking sockets. A nonblocking socket can be created with either

  • O_NONBLOCK in fcntl() or
  • SOCK_NONBLOCK in socket()
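Below is a minimal sketch (my addition) of putting a 3-second receive timeout on a blocking socket. Afterwards, a blocked recv() returns -1 with errno set to EAGAIN/EWOULDBLOCK when the timer expires:

#include <sys/socket.h>
#include <sys/time.h> // struct timeval

int setRecvTimeout(int sockFd){ // sockFd: a blocking socket
  struct timeval tv;
  tv.tv_sec = 3;  // 3-second timeout
  tv.tv_usec = 0;
  return setsockopt(sockFd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)); // 0 on success
}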

hardware mutex, based@XCHG instruction

[[OS by William Stallings]] P214 is concise and clear.

The atomic XCHG instruction on Intel processors can swap a local variable (held in cpu register) with a main memory location visible to all threads.

This XCHG instruction alone can implement a simple mutex usable by multiple threads on multiple processors sharing the same main memory.

Another atomic instruction, test-and-set, can also implement a mutex by itself.
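A modern-C++ sketch of the same idea (my addition): std::atomic_flag::test_and_set() typically compiles down to an XCHG-style atomic instruction, and busy-waiting on it gives exactly this primitive spin-mutex:

#include <atomic>

std::atomic_flag lockWord = ATOMIC_FLAG_INIT; // the shared main-memory location

void lock(){
  while (lockWord.test_and_set(std::memory_order_acquire)){
    // busy-wait i.e. spin until the previous holder clears the flag
  }
}
void unlock(){ lockWord.clear(std::memory_order_release); }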

Both mutexes have only academic value because their limitations are severe:

  • even with a single mutex, deadlock is possible at the hardware level — ThrA in critical section can get interrupted and preempted — perfectly normal. Now ThrA has to wait for high-priority ThrH to complete, but ThrH is busy-waiting (wheel-spinning) to enter the same critical section 😦
    • Note in this example there’s no “lock object” per-se, only a critical section in code !
  • by default, unsuccessful threads like ThrH are stuck in busy-wait — wasting CPU and generating heat 😉

 

STL containers +! pointer

  1. std::array
  2. std::string SSO i.e. small-string optimization

These containers

  1. do not require heap storage
    • trivial special case — if such a container is a nonref field of a Shape object, then q(new Shape) would still allocate the container on heap.
    • This is similar to “Are java primitives on heap/stack?”
  2. therefore contain no pointer.
  3. therefore their move-ctor runs in linear [1] time, compared to the constant-time move-ctor of other STL containers
    • std::array probably offers a move-ctor and move-assignment

[1] P205 [[effModernC++]] also addresses this point.

Q: What happens when you move a std::array container with 77 Acct elements?
A: 77 individual Acct objects are moved to the new container, in linear time, invoking the move-ctor of Acct class
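A tiny sketch (my addition) to make this concrete: moving the std::array must move each element individually, so the element type’s move-ctor fires 77 times.

#include <array>
#include <utility>

struct Acct {
  Acct() = default;
  Acct(Acct&&) noexcept { /* element move-ctor: runs once per element */ }
};

int main(){
  std::array<Acct, 77> a;
  std::array<Acct, 77> b = std::move(a); // linear time: 77 Acct move-ctors, no pointer swap
}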

anon classes^lambda: java perf

Based on [[JavaPerm]] P381

  • An anon class requires an actual *.class file created by javac compiler and loaded from serialized form (usually disk).
  • No such class file for a lambda.

More than half the interview questions are about fancy theoretical knowledge, so this is knowledge valuable to interviews.

This difference has very limited performance impact as of java 8. This performance perspective is unimportant to interviews, IMHO.

 

versioned-queue problem

I think this problem is mostly about data structure, not algorithm.

Q: Design and implement a Version-Queue. A Version-Queue maintains a version number along with normal Queue functionality. Every version is a snapshot of the entire queue. Every operation[Enqueue/Dequeue] on the Queue increments its version.

Implement the following functions:

1. Enqueue – appends an element at the end of the queue.
2. Dequeue – returns the top element of the queue.
3. Print – it takes a version number as input and prints the elements of the queue of the given version. The version number input can also be an old/historical version number.

E.g. if the current version number of the queue is 7 and the input to this function is 5, then it should print the elements of the queue when its version number was 5.

For simplicity, assume the elements are integers.

We expect you to write a helper program to test the above data structure which will read input from stdin and prints output to stdout.

Input format:
First line should have an integer n (number of operations). This should be followed by n number of lines, each denoting one operation.
e.g.
6
e 1
e 4
d
e 5
p 2
p 4

‘e’ stands for enqueue

— My design —
In addition to the regular queue data structure, we need a few helper data structures.

All current + historical queue elements are saved as individual elements in a “Snapshot” vector, in arrival order. This vector never decreases in size even during dequeue. Two pointers represent the trailing edge (dequeue) and leading edge (enqueue).

(minor implementation detail — since it’s a vector, the two pointers are best implemented as 2 integer index variables, because real pointers and iterators would get invalidated by automatic resizing.)

Every enqueue operation moves the right pointer one step to the right;
every dequeue operation moves the left pointer one step to the right.
(Neither pointer ever decrements.)

With this vector, I think we can reconstruct each snapshot in history.

Every pointer increment is recorded in a Version table, which is a vector of Version objects {Left pointer, Right pointer}. For example, if Version 782 has {L=22, R=55} then the snapshot #782 is the sub-array from Index 22 to 55.

Additional space costs:
O(K) where K = number of operations

Additional time costs:
Enqueue operation — O(1). Insert at end of the Snapshot vector and Version vector
Dequeue operation — O(1). Move a pointer + insert into Version vector
Print operation — O(M) where M = maximum queue capacity

Minor point — To avoid vector automatic resize, we need to estimate in advance the limit on K i.e. number of operations. If you tell me you get millions of operations a microsecond, then over a day there
could be trillions of “versions” so the Versions vector and the Snapshot vector need sufficient initial capacity.

Note we could get 9000 enqueue operations in a row.
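Below is a minimal C++ sketch of this design (my addition; names are made up). The snapshot vector plus the {left, right} version table implement exactly the scheme above; version 0 is the empty queue and every operation appends one Version entry:

#include <cstddef>
#include <cstdio>
#include <vector>

class VersionedQueue {
  std::vector<int> snapshot;               // all current + historical elements, in arrival order
  struct Version { size_t left, right; };  // half-open range [left, right) into snapshot
  std::vector<Version> versions{ {0, 0} }; // version 0 = empty queue
public:
  void enqueue(int x){
    snapshot.push_back(x);
    Version v = versions.back();
    ++v.right;                             // advance the leading edge
    versions.push_back(v);
  }
  int dequeue(){                           // assumes a non-empty queue
    Version v = versions.back();
    int front = snapshot[v.left];
    ++v.left;                              // advance the trailing edge
    versions.push_back(v);
    return front;
  }
  void print(size_t ver) const {           // O(M), M = queue size at that version
    for (size_t i = versions[ver].left; i < versions[ver].right; ++i)
      std::printf("%d ", snapshot[i]);
    std::printf("\n");
  }
};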

master the yield-generator

https://github.com/tiger40490/repo1/tree/py1/py/algo_combo_perm uses python q[ yield ] to implement classic generators of

  • combinations
  • permutations
  • abbreviations
  • redraw

Key features and reasons why we should try to memorize it

  1. very brief code, not too dense (cryptic), .. helps us remember.
  2. reliability — brevity means fewer hiding places for a bug. I added automated tests for basic quality assurance, and published the solution.
  3. versatile — one single function to support multiple classic generators.
  4. yield functions’ suspend-resume feature is particularly powerful and possibly a power drill. This is my first opportunity to learn it.
  5. instrumentation — relatively easy to add prints to see what’s going on.
  6. halo — yield-based generator functions are a halo and also a time-saver
  7. elegant — brief, simple, relatively easy to understand
  8. recursive — calling itself recursively in a loop therefore fairly brief but powerful
  9. useful in take-home assignments
  10. identity-aware — careful: each call to myYieldFunc(44) actually creates a brand-new generator object, even with the same argument (i.e. 44). If you keep a reference to one generator object and it has reached the end of its execution, you won’t get any more output from it.

— How I might try to memorize it

  1. If needed, we just need to reproduce it quickly from memory a few times
  2. I added comments to help me understand and memorize it

const vector<string> won’t let you modify strings #ChengShi

Point #1: a const vector only gives out const references. In this case, reference to const string objects.

My friend ChengShi said he has never seen const vector<const string>, because const vector<string> is good enough.

Point #2: In fact, my experiment of q[vector<const string>] can’t compile. Why?

https://stackoverflow.com/questions/17313062/vector-of-const-objects-giving-compile-error explains that in vector<T>, T must be assignable!

reference(instead of ptr) to smart ptr instance

I usually pass smart pointers by value (copy-constructor or move-constructor), just like copying a raw ptr.  Therefore the code below looks unnatural:

unique_ptr<Trade> & ref2smartPtr

Actually it is rather common because

  • As Herb Sutter suggested, when we need to put pointer into containers, we should avoid raw ptr. Unique ptr is the default choice, and the first choice, followed by shared_ptr
  • I often use unique_ptr as a map value. The operator[] return type is a reference to the value type, i.e. a reference to unique_ptr
  • I may also put unique_ptr into a vector…. ditto for vector operator[]

std::reference_wrapper usages

std::reference_wrapper is a class template that wraps a reference in a copyable, assignable object. Usages:

  • std::reference_wrapper is primarily used to store references inside standard containers which cannot hold references. In my projects, no one ever stores references in containers. Pointless.
  • std::reference_wrapper is also used to pass objects by reference to the constructor of std::thread, or the helper functions std::make_pair() and std::make_tuple()

Helper functions std::ref and std::cref() are often used to generate std::reference_wrapper objects.

See https://en.cppreference.com/w/cpp/utility/functional/reference_wrapper

–details on the std::thread usage

https://github.com/tiger40490/repo1/blob/cpp1/cpp/thr/takeTurn.cpp shows that without std::ref(), the object passed to std::thread ctor is a clone, so the original object is not updated by the thread routine.

My https://github.com/tiger40490/repo1/blob/cpp1/cpp/thr/functorToThrCtor.cpp uses std::ref() too. Same code also shows another adapter wrapper for a function

https://en.cppreference.com/w/cpp/thread/thread/thread says std::ref() is likely needed.

https://thispointer.com/c11-multithreading-part-3-carefully-pass-arguments-to-threads/ has detailed sample code

blockingMutex implementation ] kernel!!userland

Background — Linux kernel provides two types of locks — spinlock and blocking mutex, as in https://www.kernel.org/doc/htmldocs/kernel-locking/locks.html . Here I focus on the mutex. I think this is far more useful to userland applications.

https://lwn.net/Articles/575460/ has good pointers:

  • I believe context switch is expensive since CPU cache has to be replaced. Therefore, optimistic spin is beneficial.

https://github.com/torvalds/linux/blob/master/kernel/locking/mutex.c shows

  • a blocking mutex used in kernel, perhaps not directly used by userland apps
  • implemented using spin lock + some wait_lock
  • maintains a wait_list. Not visible to any userland app.

 

translation lookaside buffer #stats

  • a.k.a Address Translation Cache. The TLB lets the processor very quickly convert virtual addresses to physical addresses.
  • TLB is a cache for the big, slow page table
  • A typical entry in TLB is a pair of {virtual -> physical addresses}. In contrast,
  • A typical entry in a L1 cache is mapping of {physical address -> payload}.
  • You can hit both caches!
  • Both caches sit between processor and main memory
  • each hardware system has one or more TLBs
  • TLB-miss can be handled in hardware or kernel
  • typical miss probability — 0.01% to 1%
  • typical miss latency (penalty) — 10 to 100 clock cycles to read the page table
  • typical hit latency: 0.5 to 1 clock cycle

If a TLB hit takes 1 clock cycle, a miss takes 30 clock cycles, and the miss rate is 1%, the effective memory cycle rate is an average of 1 × 0.99 + (1 + 30) × 0.01 = 1.30 i.e. 1.30 clock cycles per memory access.

https://stackoverflow.com/questions/5440128/thread-context-switch-vs-process-context-switch says TLB favors thread rather than process context-switch. TLB gets flushed during process rather than thread context-switch.

 

accept()+select() : multiple persistent worker-sockets

I feel it is not that common. https://stackoverflow.com/questions/3444729/using-accept-and-select-at-the-same-time is very relevant.

The naive design — a polling-thread (select/poll) to monitor new data on 2 worker-sockets + accept-thread to accept on the listening socket. The accept-thread must inform the polling thread after a worker-socket is born.

The proposed design —

  1. a single polling thread to watch two existing worker sockets W1/W2 + listening socket LL. select() or poll() would block.
  2. When LL is seen “ready”, select() returns, so the same thread will run accept() on LL and immediately get a 3rd worker-socket W3. No blocking:)
  3. process the data on the new W3 socket
  4. go back to select() on W1 W2 W3 LL
  • Note if any worker socket has data our polling thread must process it quickly. If any worker socket is hogging the polling thread, then we need another thread to offload the work.
  • Note all worker sockets, by definition, have identical local (i.e. server-side) port, since they all inherit the local port from LL.

[[tcp/ip socket programming in C]] shows a select() example with multiple server ports.
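A minimal single-threaded sketch of the proposed design (my addition; error handling elided). The newborn worker is simply added to the watch list and picked up on the next select() round:

#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>
#include <vector>

void pollLoop(int listenFd){                 // listenFd = LL, already listening
  std::vector<int> workers;                  // W1, W2, ... worker sockets
  for(;;){
    fd_set readSet;
    FD_ZERO(&readSet);
    FD_SET(listenFd, &readSet);
    int maxFd = listenFd;
    for (int w : workers){ FD_SET(w, &readSet); if (w > maxFd) maxFd = w; }
    select(maxFd + 1, &readSet, nullptr, nullptr, nullptr); // blocks here
    if (FD_ISSET(listenFd, &readSet))        // LL is "ready", so accept() won't block
      workers.push_back(accept(listenFd, nullptr, nullptr)); // newborn worker, e.g. W3
    for (int w : workers)
      if (FD_ISSET(w, &readSet)){
        char buf[4096];
        ssize_t n = recv(w, buf, sizeof buf, 0); // process quickly; don't hog the loop
        if (n <= 0) close(w);                // should also remove w from workers
      }
  }
}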

 

bbg: allocate half the players to NY^SF

Always good to be concrete, so without loss of generality, let’s say J=99 i.e. 198 players.

Q1: Given 2J players, we need to allocate exactly half of them to NY and the rest to San Francisco, to form two teams for an inter-city contest. We receive the NY fare and SF fare for each player, as 2J (i.e. 198) pairs of integers. How can we minimize the total fares?

Q2(bonus): what if 2J+1 players, and you can decide which city gets an extra player?

— Analysis: how many possible allocations? 2J-choose-J, which grows roughly like 4^J. With such a huge number of allocations, brute force is impractical.

If both fares of a player go up by the same amount (say $9800), our algorithm doesn’t change at all. Therefore, we only care about the fare difference (NY − SF) for each player.

— solution: I will assume most players live near SF so SF fares are lower. I tabulate and sort the 198 fare differences “NY fare – SF fare” and suppose indeed at least 99 (half) are positive[1]. Therefore, my “base” is SF i.e. my base allocation is everyone-allocated-to-SF. Next I must pick 99 (half) of them and allocate to NY.

I will avoid the higher values in the table.

I simply pick the lower 99 (half) in the table, because these are the 99 “extra” costs I will incur. I want to minimize the total of these 99 values, whether they are mostly positive or mostly negative.

  • Note the highest number in the table indicates the player most expensive for NY relative to SF. If even this number is negative, then SF is more expensive than NY for all players, so follow the rule in [1]; still, he is the least expensive player to keep in SF relative to NY. Regardless of positive/negative sign, we want to keep this person in SF.
  • Note the lowest number in the table indicates the least expensive player for NY relative to SF. If this number is negative, then SF is more expensive than NY for this player — normal. Regardless, we want to allocate her to NY.

[1] if proven otherwise (by the data), then we could easily treat NY as base to keep the solution intuitive. Actually even if we don’t change the base, the algorithm still works, albeit unintuitively.

A2: Say J=99, so 199 players in total. We already picked 99 for NY. Do we pick one more for NY or keep the remaining 100 in SF?

Just look at the next lowest value in the table. If positive, then NY is more expensive for him, so keep him in SF.
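A short sketch of the greedy algorithm (my addition; names are made up):

#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// fares[i] = {NY fare, SF fare} for player i; fares.size() == 2J
long minTotalFare(std::vector<std::pair<int,int>> fares){
  long total = 0;
  std::vector<int> diffs;                       // NY fare - SF fare, per player
  for (auto& f : fares){
    total += f.second;                          // base allocation: everyone goes to SF
    diffs.push_back(f.first - f.second);
  }
  std::sort(diffs.begin(), diffs.end());
  for (size_t i = 0; i < fares.size() / 2; ++i) // the J smallest "extra" costs
    total += diffs[i];                          // move those J players to NY
  return total;
}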

## notable linux system calls: QQ question

https://syscalls.kernelgrok.com can sort the functions by function id

http://asm.sourceforge.net/syscall.html is ordered by function id

  • fork()
  • waitpid()
  • open() close() read() write()
  • –socket
  • socket() connect() accept()
  • recvfrom() sendto()
  • shutdown() is for socket shutdown and is more granular than the generic close()
  • select()
  • epoll family
  • –memory
  • brk
  • mmap

## G5 algoIV constructs x-lang #dataStruct++

  1. elegant, legit simplifications
  2. Judicious use of global -vs- argument -vs- local variables
  3. iterator and pointer in node/VO class
  4. Construct tree, hashmap or sorted array as auxiliary, or for memoisation
  5. Recursion
  6. sentinel node trick: array/slists

The tools below are more specialized and less widely relevant

  • Graph traversal, recursive or iterative..
  • Permutations etc, recursive or iterative ..
  • Std::lower_bound etc
  • sorting

The topics below are Not really for a coding test:

  • DP
  • concurrency

Q: Just when do App (!! lib) devs write std::move()

I feel move ctor (and move-assignment) is extremely implicit and “in-the-fabric”. I don’t know of any common function with an rvr parameter. Such a function is usually in some library, but I don’t know any std lib function like that. Consequently, in my projects I have not seen any user-level code that writes “std::move(…)”.

Let’s look at move ctor. “In the fabric” means it’s mostly rather implicit i.e. invisible. Most of the time move ctor is picked by compiler based on some rules, and I have basically no influence over it.

https://github.com/tiger40490/repo1/blob/cpp1/cpp1/rvrDemo.cpp shows when I need to call move() but it’s a contrived example — I have some object (holding a resource via heap pointer), I use it once then I don’t need it any more, so I “move” its resource into a container and abandon the crippled object.

Conclusion — as app developers I seldom write code using std::move.

  • P20 [[c++ std lib]] shows myCollection.insert(std::move(x)); // where x is a local nonref variable, not a heap pointer!
    • in this case, we should provide a wrapper function over std::move() named getRobberAliasOf()
    • I think you do this only if x has part of its internal storage allocated on heap, and only if the type X has a move ctor.

I bet that most of the time when an app developer writes “move(…)”, she doesn’t know if the move ctor will actually get picked by compiler. Verification needed.

–Here’s one contrived example of app developer writing std::move:

string myStr=input;
vectorOfString.push_back(std::move(myStr)); //we promise to compiler we won’t use myStr any more.

Without std::move, a copy of myStr is constructed in the vector. I call this a contrived example because

  • if input is a char-array, then emplace_back() is more efficient
  • if input is another temp string, then we can simply use push_back(input)

after fork(): threads,sockets.. #Trex

I have read about fork() many times without knowing these details, until a Trex interviewer asked!

–based on http://man7.org/linux/man-pages/man2/fork.2.html

The child process is created with a single thread — the one that called fork(). The entire virtual address space of the parent is replicated in the new process, including the states of pthread mutexes, pthread condition variables, and other pthreads objects. In particular, if in the parent process a lock was held by some other thread t2, then the child process has only the main thread (the one that called fork()) and no t2, yet the lock remains unavailable. This is a common problem, addressed in http://poincare.matf.bg.ac.rs/~ivana/courses/ps/sistemi_knjige/pomocno/apue/APUE/0201433079/ch12lev1sec9.html.

The very 1st instruction executed in Child is the instruction after fork() — as proven in https://github.com/tiger40490/repo1/blob/cpp1/cpp1/fork3times.cpp

The child inherits copies of the parent’s set of open file descriptors, including stdin/stdout/stderr. The child process should usually close the ones it doesn’t need.

Special case — socket file descriptor inherited. See https://bintanvictor.wordpress.com/2017/04/29/socket-shared-between-2-processes/

 

ADL #namespace

  • “Put the function in the same namespace as the classes it operates on.” is a one-liner summary
  • If you want to write a function that needs only a class’s public interface — then that function doesn’t have to be a (static/non-static) member. The function can become a free function placed in the same namespace as the class. This increases encapsulation and data hiding, and reduces coupling, as only the public interface is needed. P79 [[c++codingStd]]
  • It’s also an idiom/pattern to impress interviewers. The published experts all seem to view ADL as a library feature mainly for overloaded operators. Read cppreference and [[c++codingStd]]. I’m not interested in that usage.
  • boost seems to use a lot of ADL
  • I feel namespace loves ADL more than any other c++ feature 🙂
  • ADL is an endorsed, prized compiler feature but still with criticisms [1]. To eliminate the confusion, simply fully qualify the function call.

[1] https://stackoverflow.com/questions/8111677/what-is-argument-dependent-lookup-aka-adl-or-koenig-lookup has the best quickguide with simple examples.

https://softwareengineering.stackexchange.com/questions/274306/free-standing-functions-in-global-namespace is a short, readable discussion.

My annotations on the formal, long-winded definition — ADL governs look-up of unqualified function names in function-call expressions (including operator calls). These function names are looked up in the namespaces of their arguments in addition to the usual namespace-based lookup.
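A minimal illustration of that one-liner (my addition): the unqualified call is resolved via the argument’s namespace.

#include <iostream>

namespace animals {
  struct Dog {};
  void speak(const Dog&){ std::cout << "woof\n"; } // free function, same namespace as Dog
}

int main(){
  animals::Dog d;
  speak(d); // unqualified call: ADL finds animals::speak via d's namespace
}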

STL: few exceptions #no virtual

side note — virtual functions are also very rare in STL classes. I don’t know of any.


STL functions seldom throw exceptions. See P 248 [[c++standard library]]

  1. at() method of vector/string/map… throws, since it’s the “checked” version of the subscript operator
  2. reserve() method throws

Besides these two unusual functions, STL would only throw the standard low-level exceptions like memory allocation failures. All other error conditions are undefined-behavior, such as

  • top()/pop() on priority queue
  • std::string operator[index] with invalid index

However, Payload data types going into STL container can create exceptions:

  • If a payload class ctor/assignment throws, then it would propagate.
  • Payload destructors should never throw. [[c++coding standard]] says it’s forbidden by the STL standard.
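A tiny demo of the checked accessor (my addition):

#include <iostream>
#include <stdexcept>
#include <vector>

int main(){
  std::vector<int> v{1, 2, 3};
  try {
    v.at(99) = 0;  // checked subscript: throws std::out_of_range
  } catch (const std::out_of_range& e) {
    std::cout << "caught: " << e.what() << '\n';
  }
  // v[99] = 0;    // unchecked subscript: undefined behavior, no exception guaranteed
}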

pbclone large obj(eg:vector)rely`@move

This is impressive in QQ interviews + coding questions

GotW #90 Solution: Factories has a good illustration of move semantics put to good use.

  • Before c++11, a function returning a large vector (or any large object) by value incurs expensive deep copying of all vector elements.
  • With c++11 move features added to std::vector class, returning a vector by value is cheap and recommended.
  • RVO may kick in but (I feel) it is less reliable than move semantics. For the specific rules see RVO^move-semantics
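A minimal factory sketch in the spirit of that GotW (my addition):

#include <vector>

std::vector<int> makeBigVector(){       // returns by value
  std::vector<int> v(1000000, 42);
  return v;                             // RVO, or at worst a cheap move: 3 pointers change hands
}

int main(){
  std::vector<int> v = makeBigVector(); // c++11 and later: no deep copy of the 1M elements
}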

discover cpu count ] my machine #lscpu

NIC — There’s another blog post https://bintanvictor.wordpress.com/2018/03/19/discover-nic-my-machine-lspci/

$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 1
Core(s) per socket: 8
CPU socket(s): 2
NUMA node(s): 2
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 20480K


$ less /proc/cpuinfo|egrep 'core id|processor'  # on the same machine shows

16 “processors”, but #0 and #1 have the same core-id! There are 8 distinct core-id values in total. I think this is because each of the two sockets has 8 cores with unique core-id values.

dmesg | grep rocessor # doesn’t work any more

dxdiag # windows

dxdiag /t c:\deldel.txt
find "CPU" c:\deldel.txt # copy-paste may not work

2obj files compiled@different c++toolchains can link@@

(many interviewers asked…)

Most common situation — two static libs pre-compiled on toolchain A and B, then linked. Usually we just try our luck; if it doesn’t work, we re-compile all source files on the same toolchain.

Toolchain A and B could differ by version, by compiler brand, or by C vs C++ … I guess the question boils down to whether the two toolchains share the same Application Binary Interface (ABI).

https://gcc.gnu.org/onlinedocs/libstdc++/manual/using_dual_abi.html says that it’s possible (“straightforward”) to link C++03 and C++11 code together.

##fastest container choices: array of POD #or pre-sized vector

relevant to low-latency market data.

  • raw array is “lean and mean” — the most memory efficient; vector is very close, but we need to avoid reallocation
  • std::array is less popular but should offer similar performance to vector
  • all other containers are slower, with bigger footprint
  • For high-performance, avoid container of node/pointer — Cache affinity loves contiguous memory. After accessing 1st element, then accessing 2nd element is likely a cache-hit
    • set/map, linked list suffer the same

demo: static method declare/define separately n inherited

Low complexity in this demo, but if you lack a firm grip on the important details here, they will add to the complexity in a bigger code base.

  • When the subclass source file is compiled and linked on its own, the toolchain complains about undefined sf(), since the compiler sees only the declaration. You need “g++ der.cpp base.cpp”.
  • Note the static method is inherited automatically, so you could call der::sf().
#include <iostream>
struct base{
  static void sf();
};
///////// end of header /////////
#include "base.h"
using namespace std;
void base::sf(){ // no "static" please
  cout<<"sf"<<endl;
}
///////// end of base class /////////
#include "base.h"
using namespace std;
struct der: public base{};
int main(){
  der::sf();
}
///////// end of sub class /////////

##shared_ptr thr-safety: 3 cases

This topic is worthwhile as it is about two high-value topics … threading + smart ptr

At least 3 interviewers pointed out thread safety issues …

http://stackoverflow.com/questions/14482830/stdshared-ptr-thread-safety first answer shows many correct MT usages and incorrect MT usages. Looking at that answer, I see at least 3 distinct “objects” that could be shared-mutable:

  1. control block — shared by different club members. Any one club member, like global_instance, could be a shared mutable object. Concurrent access to the control block is internally synchronized by the shared_ptr implementation (the reference counts are atomic), so it is thread-safe.
  2. pointee on heap — is shared mutable. If 2 threads call mutator methods on this object, you can hit a race condition.
  3. global_instance variable — a shared mutable instance of shared_ptr. race condition 😦
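A minimal sketch contrasting the 3 cases (my addition; global_instance is the name used in that SO answer):

#include <memory>

std::shared_ptr<int> global_instance = std::make_shared<int>(0);

void case1_ok(){                 // only the control block is shared
  std::shared_ptr<int> local = global_instance; // private copy; atomic ref-count bump is thread-safe
}
void case2_race_on_pointee(){    // the pointee is shared mutable
  ++*global_instance;            // two threads doing this race on the int
}
void case3_race_on_shared_ptr(){ // the shared_ptr object itself is shared mutable
  global_instance = std::make_shared<int>(1); // unsynchronized assignment from 2 threads = race
}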


some Technicalities for QnA IV

We often call these “obscure details”. They are a small subset of a large mass of details at the same level of obscurity, so we can’t possibly remember them all 😦

Surprisingly, interviewers show certain patterns when picking which technicality to ask. Perhaps these “special” items aren’t at the same level as the thousands of other obscure details??

These topics are typical of QQ i.e. tough topics for IV only, not tough topics for GTD.

  • archetypical: which socket syscalls are blocking and when
  • $LD_LIBRARY_PATH
  • hash table theoretical details? too theoretical to be part of this discussion
  • select() syscall details
  • vptr, vtable

 

c++base class method accessing subclass field #RTS IV

Surprisingly, a non-virtual base class method can never know that it’s actually operating within a subclass instance. It always behaves as a strictly-baseclass method and can’t access subclass data. The non-virtual method getId() is compiled into the base class binary, so it only knows the fields of the base class. When a subclass adds a field of the same name, that field is not accessible to getId(). A few workarounds:

  1. curiously recurring template pattern
  2. virtual function in subclass
  3. down cast the pointer and directly access the field
#include <iostream>
using namespace std;
struct B {
	char id = 'B';
	virtual ~B() {}
	char getId() {
		return id;
	}
} b;
struct C: public B{
	char id = 'C';
} c;
int main() {
	B* ptr = new C();
	C* ptr2 = dynamic_cast<C*>(ptr);

	cout << "base class nonref: " << b.getId() << endl;
	cout << "sub class nonref: " << c.getId() << endl; //still base class method. 
	// The fact that host object is subclass doesn't matter.

	cout << "base class ptr to subclass instance: " << ptr->getId() << endl; 
	cout << "downcast ptr non-virtual method: " << ptr2->getId() << endl; //B
	cout << "downcast ptr direct access: " << ptr2->id << endl; //C
	return 0;
}

 

##10 c++coding habits to optimize perf

Many of these suggestions are based on [[optimized c++]]

  • #1 Habit – in c++ at least, ++counter performance is strictly “better or equal to” counter++. If there’s no compelling reason, I would prefer the former.
  • #2 Habit – in a for loop, one of the beginning and ending values is more expensive to evaluate. Choose the more expensive one as the beginning value, so you don’t evaluate it over and over. Some people object that the compiler can cache the more expensive end value, but 2016 tests show otherwise.
  • If a method can be static, then make it static. Good for performance and semantics.
  • For a small if-else-if block, put the most likely scenario first. This may affect readability. Worthwhile only in a hot spot.
  • For a long if-elif-elif-elif-elif block, a switch statement’s performance is strictly “greater or equal”.
  • A for-loop starts by checking the condition (the 2nd component in the header). If this initial check is redundant (as it often is), then use a do-while loop.
  • Call a loop in a function, rather than calling a function in a loop. Another micro-optimization.

wait() needs while-loop #spurious

Across all languages, I have never seen any exception. You must enclose wait() in a while-loop.

C++11 has a syntactic sugar — wait(myLock, myPredicate), hiding the while loop.

Q: why is spurious wake unavoidable? [[Josuttis]] P1004 has a concise answer:
A: the thread library (not the operating system) can’t reliably ensure notifications are delivered, so to play safe, the thread library may wake the waiting thread even without a notification.
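A minimal C++11 sketch (my addition) showing both forms, the explicit while-loop and the predicate overload that hides it:

#include <condition_variable>
#include <mutex>

std::mutex mtx;
std::condition_variable cv;
bool ready = false; // the condition, protected by mtx

void waiter(){
  std::unique_lock<std::mutex> lk(mtx);
  while (!ready)  // while-loop guards against spurious wake-up
    cv.wait(lk);
  // equivalent syntactic sugar: cv.wait(lk, []{ return ready; });
}
void notifier(){
  { std::lock_guard<std::mutex> lk(mtx); ready = true; }
  cv.notify_one();
}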

— I wrote about java:

wait() is normally awakened by a notification sent to the monitor object.

Between the notifying thread releasing the lock and the waking thread acquiring it, a third thread can modify some shared data[1], and the condition you were waiting for disappears as a result.

Therefore the waking thread needs to recheck the condition. if() won’t do[2]; while() is required.

If you know there is no 3rd thread messing with the condition, then the waking thread can assume the condition is still intact. However, this design unnecessarily weakens code extensibility and re-usability. It’s fairly easy to put a while() loop around a wait(). No good reason not to do it.

[2] This kind of bug is intermittent.
[1] in a database for example (a crude example)

9tips: hacker rank test #cout-macro

similar to Codility…

  • I may need a macro for cout, so I can disable all cout quickly before uploading my source.
    • #define ss if(1>2)cout //1>0
  • I keep my own container data dumper function for instrumentation.
  • you can submit multiple times. So as soon as you can pass some tests, please submit.
  • 🙂 You can look at multiple questions at the same time, so feel free to skip tough or time-consuming questions.
  • You absolutely need your own local compiler and ECT environment.
    • I put a small dos window (for g++ and a.exe) over a notepad++ editor. I use F2 to save

The online editor is too slow and provides only a single screen. You can copy-paste code from a local IDE, but you can’t copy-paste test data. You could try downloading test data, but you need to be really efficient there. Setting up tests can take 5 minutes out of an average 20 minutes per question.

  • There could be hidden test cases. Don’t worry too much about them. It’s a real achievement if you can pass all the visible test cases.
  • Ignore integer overflow. If there’s a hidden test for it, you will see the failure
  • Don’t ever worry about minor inefficiency. You won’t have the time. Just get it to work first.
  • pass by clone by default. Easier to debug
  • avoid long variable names. Use std abbreviations like li, vec, ma (for map)
  • I maintain many (unnecessary) global variables of generic names like i1, s2
  • 😦 lost connectivity? timer keeps ticking

 

## insertion sort: phrasebook

  • nearly sorted — for a nearly-sorted sample, beats all other sorts, including counting sort
  • small — for a small sample, beats MSD radix sort and all comparison sorts
    • switch — many comparison-sort implementations switch to insertion-sort when the sub-array becomes small
  • poker — intuitively, this is how players sort poker cards
    • shift — requires shifting part of the sorted portion.

[[algo in a nutshell]] bucket sort, insertion sort..

== bucket sort
This book asserts that bucket sort is the #1 top dog when data is uniformly distributed (not normally distributed!) — i.e. when it can be uniformly partitioned using a fast hashing function. The number of buckets created equals the input data size, just like in a standard hash table. The book shows bench-marking on large samples of random floating point numbers. However, I disagree:
  1. for special data types like strings and small ints, radix sort is probably faster than bucket sort
  2. bucket sort is #1 for large data volumes, beating quick sort, heap sort etc., but for nearly-sorted data, insertion-sort is faster.

Other notes

  • Linked list — used inside each bucket. Arrays are a possible alternative, but 50% slower.
  • –Relation to other sorts:
  • Can be seen as generalization of counting sort
  • A cousin of radix sort in the most-to-least significant digit (MSD) flavor i.e. top-down radix sort
–hash sort, a variation of bucket sort, customized for random strings.
The book shows benchmarking on random strings. Hash sort is #1 for large data volumes, beating quick sort, heap sort etc.

I guess if string distribution is unconstrained/unknown (without guarantee of randomness), hash sort will not have any advantage

In both cases, the hash code must be strictly “ordered”, so bucket #1 must hold the lowest data items.

== insertion sort
P100. for nearly-sorted data, insertion sort is faster than all other sorts (including bucket sort, radix sorts), often by an order of magnitude

https://stackoverflow.com/questions/736920/is-there-ever-a-good-reason-to-use-insertion-sort shows insertion sort is better than divide-n-conquer for 1) nearly-sorted or 2) small collection

https://www.cs.princeton.edu/~rs/AlgsDS07/18RadixSort.pdf shows that MSD radix sort also need to switch to insertion-sort when sample size is small.

== counting sort
P315. “fastest” sort for the right data set. I think a single English word is a suitable data item. However, for nearly sorted data, insertion sort is still faster.

== Binary search tree vs hash table
P130 discusses when to use which

some Design questions at my interviews

Q: when you start a green field project and multi-threading is required, what are the key points you keep in mind about multi-threading.
A (my Answer): keep things really simple. For example, immutables; fewer locks; less granular locks;
A (now i think): have an idea where MT can really help and where it has marginal impact. In many cases ST works better, without any data races or locks.
A: It’s also possible to go for multi-processing instead of MT — lower risk (fewer MT hazards); simpler; better-understood
A: sometimes it makes sense to leverage on existing MT features in a database such as running code in the DB

Q: in your sybase/java/perl system at GS, what are the performance bottlenecks? How did you address them?
A: DB query.

Q (MS): how do you define scalability, in your specific contexts?
%%A: generically, it means nearly linear throughput improvement without code change.

Q: what strengths/complaints do you know about C++ STL?
A (my Answer): STL is more powerful than anything before or after it, but too complex. For example, I can have a vector of pointers to pointers to int. No such thing is allowed in java.
A(my Answer): STL debugging is harder
A (now i think): not much OO, but heavy usage of templates. Some people like/dislike it
A (now i think): thread-safety is minimal
A (now I think): storing pointers in containers is widely required but not well-supported without smart pointers.
A (now I think): storing interface or base types in containers is widely required in Java; in C++ it is a similar situation as above
A: performance very good.

Q(BGC): key differences between Perl and c++?

msg overflow ] JMS^tibrv #60sec

Message overflow is a frequent headache in MOM. Basically consumption rate (possibly 0) is slower than production rate, eating up storage capacity, either in memory or on disk.

  • eg: JMS Topic with NDS (non-durable-subscriber) only — almost free of overflow. Bonnie said the broker will discard. Jerry said the broker keeps track of alive subscribers; every incoming message is guaranteed to be delivered (by retry) to each. The broker must persist pending messages. When a subscriber disconnects gracefully, the broker no longer needs to keep any pending message for her.
  • eg: chatrooms don’t persist data but what type of MOM is that? probably NDS topic
  • eg: JMS queue. While any number of producers can send messages to the queue, each message is guaranteed to be delivered and consumed by one consumer. If no consumers are registered to consume the messages, the queue holds them (on disk) until a consumer registers to consume them.
  • eg: JMS temp queue??
  • eg: regular queue? In SIM upfront, unread messages –persist–
  • eg: mailing list (topic with durable subscribers)? –Persist–
  • eg: JMS topic without durable subscriber? see post on message retention
  • eg: tibrv CM? Storage is the ledger file. –persist–
  • eg: tibrv non-CM? market data feed? Discarded, but how soon? 60 sec by default? Yes, in a slow-consumer fast-producer scenario, message loss is possible — http://arun-tibco.blogspot.com/2010/09/tibco-rv-faqs.html

 

operator << must be a friend function #tested

P 331 [[absolute c++]] gives the standard signatures of operator <<

— sound bites —
friend — it has to be a friend function when it accesses private members; without the “friend” keyword, the compiler complains.
free function — required
method — is not possible. For a method, host object must be on the LHS of the operator, but q[ operator<< ] needs cout on LHS
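The standard signature in a minimal sketch (my addition; the Trade class is made up):

#include <iostream>

class Trade {
  int qty = 55; // private field, hence the friend declaration below
  friend std::ostream& operator<<(std::ostream& os, const Trade& t);
};

std::ostream& operator<<(std::ostream& os, const Trade& t){ // free function; cout sits on the LHS
  return os << "Trade(qty=" << t.qty << ")";
}

int main(){ std::cout << Trade() << std::endl; }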

real-world OMS c++data structure for large collection of orders

class OrderMap{
  struct data_holder {/*fields only. no methods*/};
  unordered_map<int, shared_ptr<data_holder> > orders; // keyed by orderID
  … //methods to access the orders
};

Each data_holder is instantiated on heap, and the new pointer goes into a shared_ptr. No need to worry about memory.

Do you see ANY drawback, from ANY angle?

JGC overhead quick calc

(Based on P120 [[JavaPerformance]] by Charlie Hunt) There are many GC related command line options. These two have quite specific meanings —

C) -XX:+PrintGCApplicationConcurrentTime
S) -XX:+PrintGCApplicationStoppedTime

With these 2, the GC log will show something like

“app time: 0.527905 seconds” # due to (C)
[GC…]
“total time for which app threads were stopped: 0.0453124 sec” # due to (S)

If you divide the (S) duration by the preceding (C) duration, you get a good estimate of GC overhead — .04531/.5279 ≈ 8%

Warning — If you see another “app time: …” immediately after, that marks another “uneventful period”, which will be followed by a minor/major GC.

It’s critical to find out if the (S) duration is a minor or major GC. You can find such details in the intervening messages in [GC…]

concurrent JGC does occasional stop-the-world

Folks often mention concurrent GC as the opposite of stop-the-world (STW) GC, but in fact in Java lingo “concurrent” actually involves short stop-the-world pauses.

* STW means always-STW.
* Concurrent means occasional-STW.

STW is simpler because concurrent marking can miss some reachable objects while mutator threads keep changing the object graph; that is why even “concurrent” collectors need short STW pauses. See https://java.sun.com/j2se/reference/whitepapers/memorymanagement_whitepaper.pdf for the exact details.

Even in single-threaded STW you could design finalize() to perform resurrection like http://www.xyzws.com/Javafaq/what-is-resurrection-in-garbage-collection/47