context switch: 1-10 microsec

See also #CPU-cycles — 1 time slice ~ millions of cycles #AshS

In an MS interview, Deepak was asked about the (typical) duration of a TaskSwitch or context switch. I think 20 millisec is way too long, and other sites suggest “below 10 microseconds”.

— One of the most interesting and popular focus points is the difference between thread switching and process switching

Does the L1 cache (private to a CPU core) get reloaded when the core switches to another process?


array/deque-based order book

For HFT + mkt data + internal matching or market making .. this is a favorite interview topic, kind of similar to retransmission.

==== For a first design, look at … has a brief design doc, referenced by

  • TLB?
  • cache efficiency

— no insert/delete in the array

Order cancel and full fill both result in deletion of an array element, which means shifting. Random inserts mid-stream also require shifting in the array. To preempt shifts, the design adopted in engine.c is “one array element per price point, even if not in use”.

  1. when an existing order gets wiped out, its quantity becomes zero. It’s also possible to use a flag, but zero quantity is probably more space efficient.
  2. when we get an order to hit a new price of 9213, we don’t insert but update the dummy price point object in-situ.

What if the price points in use are only 0.01% of the price points allocated? We need some estimates of the likely price levels and the footprint of an array element. Luckily, price levels are essentially integers, not floating-point numbers — a key constraint in the requirement.

  • An array element can be very space efficient — a nullable pointer.
  • Alternatively, it can be an integer index into another array of “received price points”. Dummy price point would be represented by “0”.
  • Likely price levels could range from 30D min to 30D max plus some margin. Such a range would be up to 10,000 price levels.
    • But what if the price plunges at SOD or mid-day? “Not sure how my company’s system was designed, but here’s my idea —” we would need to allocate and then initialize a new array of price levels. A deque would help.
  • Unlikely price levels (for outlier orders) would be hosted in a separate data structure, to support a super low bid (or super high ask). These outlier orders can tolerate longer delays.
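A minimal sketch of this first design (all names are mine, not from engine.c): one pre-allocated element per price point, with quantity zero marking the dummy/unused levels, so cancels, fills and fresh prices never shift the array —

```cpp
#include <cstdint>
#include <vector>

// One array element per price point, even if not in use.
// quantity == 0 marks a dummy/unused price level -- no shifting on
// cancel/full-fill, no insertion when a fresh price arrives.
struct PricePoint { uint64_t quantity = 0; };

class ArrayBook {
    int64_t minPrice_;                 // lowest pre-allocated price level
    std::vector<PricePoint> levels_;   // index = price - minPrice_
public:
    ArrayBook(int64_t minPrice, int64_t maxPrice)
        : minPrice_(minPrice), levels_(maxPrice - minPrice + 1) {}

    // new order at any price: update the dummy price point in-situ
    void add(int64_t price, uint64_t qty) {
        levels_[price - minPrice_].quantity += qty;
    }
    // cancel or fill: reduce quantity, never delete an element
    void remove(int64_t price, uint64_t qty) {
        levels_[price - minPrice_].quantity -= qty;
    }
    uint64_t qtyAt(int64_t price) const {
        return levels_[price - minPrice_].quantity;
    }
};
```

Outlier prices outside the [minPrice, maxPrice] window would go to the separate slower structure mentioned above; this sketch ignores them.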

==== vector insert as a second design

A fairly complete reference implementation for Nasdaq’s ITCH protocol supports order lookup, order reduction (partial fill or full fill), order cancel and new order. A new order at a fresh price level uses vector::insert —
  • If this insertion happens near the end of the vector (top of book), then the elements shifted would be very few
  • If the market slowly declines, then the best bid would be wiped out, shrinking the vector at the back end. New price levels would be inserted at lower price points, but again the elements above them would be few.
  • If this insertion happens far from the back end, as in an outlier order, then lots of elements shifted, but this scenario is rare.

Note a single element’s footprint can be very small, as explained earlier.
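A sketch of this vector-insert design (my own code, not the reference implementation): bids kept sorted ascending, so the top of book sits at the back and a near-top insert shifts almost nothing —

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// One Level per in-use price point; the vector stays sorted by price.
struct Level { int64_t price; uint64_t qty; };

// New order: vector::insert at the sorted position. Near the back (top
// of book) the shift cost is tiny; a far-from-back outlier insert shifts
// many elements, but that scenario is rare.
inline void addLevel(std::vector<Level>& bids, int64_t price, uint64_t qty) {
    auto it = std::lower_bound(bids.begin(), bids.end(), price,
        [](const Level& l, int64_t p) { return l.price < p; });
    if (it != bids.end() && it->price == price)
        it->qty += qty;                      // existing level: update in place
    else
        bids.insert(it, Level{price, qty});  // fresh level: shift a few
}
```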

==== switch to an RBTree if abnormal price movement is detected. This idea may sound crazy, but consider the RBTree used inside the Java 8 HashMap.

Q: if an optional is empty, will it remain forever empty?

— An Optional variable could be null but should never be, as the instance needs state to hold at least the boolean isPresent.

If a method is declared to return Optional<C>, then the author needs to ensure she doesn’t return a null Optional! This is not guaranteed by the language. A simple rule — use a local var retVal throughout, then, at the very last moment, return Optional.ofNullable(retVal). This way, retVal can be null but the returned reference is never null.

If needed, an Optional variable should be initialized to Optional.empty() rather than null. There are pitfalls in returning Optional from a method.

I feel this is like borrowing from a loan shark to avoid accidental credit card interest charge.

If you are careful, it can help you avoid the NPE of the traditional practice, but if you are undisciplined (like most of us), then this new strategy is even worse —

Traditional return type is something like String, but caller has to deal with nulls. New strategy is supposed to remove the risk/doubt of a null return value, but alas, caller still needs to check null first, before checking against empty!

— immutability is tricky

  1. the referent object is mutable
  2. the Optional reference can be reseated, i.e. not q[ final ]
  3. the Optional instance itself is immutable.
  4. Therefore, I think an Optional == a mutable ptr to a const wrapper object enclosing a regular ptr to a mutable java object.

Similarity to — [B/C]

Compare to shared_ptr instance — [A] is true.

  • C) In contrast, a shared_ptr instance has mutable State, in terms of refcount etc
  • B) I say Not-applicable as I seldom use a pointer to a shared_ptr

— get() can throw exception if not present

— not serializable

— My motivation for learning Optional is 1) QQ 2) simplifying my design in a common yet simple scenario — there’s a demo featuring … flatMap()!!

max rectangle formed using any 4 points #Google #Rahul 50%

Q (Google onsite): given N Points defined by x,y coordinates, find the largest possible rectangle formed with four corner points drawn from the N points.


worst input? many points line up along the north and west of a rectangle?

From N points, we get (roughly) NN line segments; the actual count is up to N(N-1)/2.

Any 3 points forming a right-angle corner is represented as a Corner object with 5 fields (or more)

  • 3 Points: clockwise start point, corner point, clockwise end point
  • 2 line segments

A FloatingLineSegment object has 2 fields {length, slope}

Any 2 line segments of equal length and slope forming a parallelogram is represented as a Para object having fields (or more):

  • 4 Points
  • 4 line segments
  • 2 FLSs
  • a single point representing the central intersection

Each line segment has one FLS and one slope. So we scan the NN line segments to populate some hashmaps.

A hashmap of {FLS -> set of lineSegments}

A hashmap of {slope -> set of FLS }. From any one slope value, we can look up the FLSs of that slope + the FLSs of the opposite slope.

Scan the NN line segments. For each line segment L1, compute the opposite slope. Use slope hashmap to get the FLSs. Use these FLSs to get the line segments perpendicular to L1. Discard any of those line segments not joining L1. This is how to populate .. A hashmap of {lineSegment -> set of Corners}

Scan the NN line segments. For each line segment L1, compute FLS. Use the FLS hashmap to get the other line segments. This is how to populate … A hashmap of {lineSegment -> set of Paras}

Are there more Corners or more Paras? I hope at least one of them is O(NN)

(From NN line segments, we could get a large number of Corners objects. I think the count of Corners is too big if all points line up along some big rectangle. I feel we should start from one corner and try all possible rectangles rooted there, then remove this corner from the search space.)

Solution 1: At any point C1, there are N-1 relevant line segments. Using a hashmap {slope ->?}, it takes up to O(NN) time to identify all Corners at C1. If N-1 is 8, then at most 4×4 such Corners, so worst case O(NN) such corners. For each Corner, check if the opposite corner point exists. This check is O(1). This entire process at C1 takes O(NN).

Overall O(NNN) since there are N points like C1. Average case is much better.

I feel O(NN logN) is possible.

— O(NNN) by Rahul — pick any 3 points, check if they form a corner. If yes, compute the opposite corner and check its existence.
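Rahul’s O(NNN) idea is short enough to sketch (the hash-set membership check, the point encoding and the function name are my own choices; coordinates assumed to fit comfortably in 32 bits):

```cpp
#include <cmath>
#include <unordered_set>
#include <vector>

struct Pt { int x, y; };

// pack (x,y) into one hashable key -- my own encoding
inline long long key(int x, int y) { return (long long)x * 2000003 + y; }

// Pick any 3 points; if they form a right angle at b, compute the
// opposite corner D = A + C - B and check its existence in O(1).
inline double maxRectArea(const std::vector<Pt>& pts) {
    std::unordered_set<long long> seen;
    for (const Pt& p : pts) seen.insert(key(p.x, p.y));
    long long best2 = 0;                     // best squared area so far
    const int n = (int)pts.size();
    for (int b = 0; b < n; ++b)              // b = the right-angle corner
      for (int a = 0; a < n; ++a)
        for (int c = a + 1; c < n; ++c) {    // a < c: arms are unordered
            if (a == b || c == b) continue;
            long long ux = pts[a].x - pts[b].x, uy = pts[a].y - pts[b].y;
            long long vx = pts[c].x - pts[b].x, vy = pts[c].y - pts[b].y;
            if (ux * vx + uy * vy != 0) continue;   // not a right angle at b
            if (!seen.count(key(pts[a].x + (int)vx, pts[a].y + (int)vy)))
                continue;                           // 4th corner missing
            long long area2 = (ux * ux + uy * uy) * (vx * vx + vy * vy);
            if (area2 > best2) best2 = area2;
        }
    return std::sqrt((double)best2);         // 0.0 if no rectangle found
}
```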

–Idea 3: for the N(N-1)/2 line segments, sort by slope, length, lowest x-coordinate, lowest y-coordinate…?
sorting takes O(NN logN)

–idea 4: group N(N-1)/2 line segments by slope and length.

Within each group, there are up to N points and O(NN) line segments.

— O(NN) solution … wrong!

Pick any 2 points as a pair-of-opposite-corners. Work out the locations of the other 2 corners and check if they exist. This check is O(1). (The flaw: a diagonal alone doesn’t determine a unique rectangle — the other two corners can be any antipodal pair on the circle whose diameter is that diagonal — unless we restrict to axis-aligned rectangles.)

There are O(NN) such pairs, so overall O(NN).


G9 workhorse-algos ] speedCoding #XR

XR said we should take speed coding contests to find out what basic algos are needed.

Opening example — parent/child pairs→tree algos #Indeed is one example of a realistic problem that requires common comp-science constructs…

Need to memorize enough to implement on the spot, as these are basic “comp science constructs” needed for “realistic problems”

  • dft/bft, with levels
  • BST find predecessor
  • Binary search in array
  • Merge-sort/insertion-sort/partitioning
  • .. realistic string problems:
  • .. realistic int array problems:
  • max profit; max subarray sum
  • .. realistic matrix problems:
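One of the workhorse constructs listed above, written out as a refresher — a minimal sketch of binary search in a sorted array, the kind of thing to implement on the spot:

```cpp
#include <vector>

// binary search in a sorted int array: returns the index of target,
// or -1 if absent
inline int binarySearch(const std::vector<int>& a, int target) {
    int lo = 0, hi = static_cast<int>(a.size()) - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;   // avoids overflow of lo + hi
        if (a[mid] == target) return mid;
        if (a[mid] < target) lo = mid + 1;
        else hi = mid - 1;
    }
    return -1;
}
```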

Below are basic algos for hackerrank/codility but NOT applicable to realistic problems typical of Indeed/FB

  • Linked list: remove dupes
  • string: collapse consecutive chars

77 c++IV paperTigers

Avoid spending too much time on this list…. These c++ topics appeared non-trivial (perhaps daunting, intimidating) for years, until I started cracking the QnA interviews. Then I realized in-depth expertise isn’t required.

  1. TMP
  2. make_shared
  3. std::forward()
  4. fwd declaration
  5. … these are some new items to be sorted…
  6. — real tigers i.e. non-trivial nlg is quizzed
  7. [A] CRTP, SFINAE — real tigers. I got asked about these around 5 times, sometimes in-depth
  8. socket: non-blocking
  9. — Now the paper tigers
  10. open source or commercial instrumentation for memory, dependency instrumentation. See blogpost to Ashish
  11. [s] what debuggers and memory leak detectors are you familiar with?
  12. [a] singleton, factory, pimpl and other design patterns
  13. [s] c++ regex
  14. [s] stream I/O and manipulators
  15. —sockets # many more paper tigers to be listed
  16. udp, multicast, select()
  17. [s] socket buffers
  18. [a] byte alignment
  19. endianness
  20. TCP flow control
  21. TCP handshake and disconnect
  22. ack numbers
  23. partial send
  24. close() vs shutdown()
  25. — STL # many more paper tigers to be listed
  26. [s] STL binders
  27. [s] STL allocators
  28. adapters for containers, iterators and functors
  29. [a] iterator invalidation rules
  30. [s] how is deque different from a vector
  31. RBtree
  32. —concurrency # many more paper tigers to be listed
  33. [A] atomic types
  34. pthread functions
  35. [A] IPC mutex
  36. mutex in shared memory
  37. recursive lock
  38. read-write lock
  39. what if a thread dies while holding a lock?
  40. [s] RAII scoped lock
  41. — multi-file build
  42. forward class declarations (as required in pimpl) and their limitations
  43. [s] C/C++ integration, extern-C — heavily quizzed, but no in-depth
  44. [s] what C features are not supported by c++ compiler
  45. circular dependency between libraries — confusing. real tiger but seldom quizzed
  46. [As] shared lib vs static lib
  47. —integration and data exchange
  48. [A] shared memory
  49. [A] CORBA, RPC
  50. [a] serialization in compact wire format — only primitive data types!
  51. [s] OS system calls vs std library (glibc) — sounds daunting to most developers
  52. —exception
  53. catch by ref or by value or by pointer?
  54. [s] exception guarantees
  55. [s] stack unwinding due to exception
  56. throwing destructors — various questions
  57. —memory
  58. which part of memory do static data members go? How about file scope static variables? How about global variables
  59. [s] preventing heap/stack allocation of my class
  60. [s] custom new/delete,  set_new_handler()
  61. [s] intrusive smart ptr, weak_ptr
  62. [s] ref counting
  63. custom deleter in shared_ptr
  64. [s] reinterpret_cast  # always on pointers
  65. [A] custom allocators
  66. [A] free list in the free store
  67. what if you call delete on a pointer that’s created by array-new?
  68. placement new
  69. out-of-memory in operator-new
  70. —inheritance
  71. dynamic_cast
  72. [A] multiple inheritance
  73. [s] virtual inheritance… which base class ctor gets called first? See
  74. [a] slicing problem
  75. [a] private inheritance
  76. [s] pure virtual
  77. —other low level topics
  78. [s] setjmp, jmp_buf… See the dedicated blog post jmp_buf/setjmp() basics for IV #ANSI-C
  79. [s] cpu cache levels
  80. [s] translation lookaside buffer
  81. [s] what data structures are cache-friendly?
  82. [a] memcpy, memset
  83. [s] ++myItr vs myItr++ how are they implemented differently?
  84. —other language features
  85. [s] RAII
  86. [s] operator-overload
  87. [A] template specialization — part of the STL fabric but largely transparent
  88. [s] ptr to member (function) — seldom used outside library code. I tried the syntax in my binary tree serializer
  89. [A] std::forward() std::move(), rvalue-ref
  90. const and constexpr
  91. [a] lambda with capture
  92. [a] double-pointer .. why do you need it?
  93. —-
  94. [s == shallow book knowledge is enough]
  95. [a == actually not that deep, IMHO]
  96. [A == actually deep topic]

At what juncture would kernel scheduler run@@

The kernel scheduler has an algorithm and is therefore implemented as a sequence of instructions. You can think of it as some black-box function/routine.

I think it is not really a long-running background process. In Linux, I believe it is an on-demand routine, but not run on behalf of any process.

Background — Many familiar on-demand kernel routines do run on behalf of an “owner process” —

  • accessing a file or socket
  • accessing some device such as NIC
  • accessing memory

However, other on-demand kernel routines (often interrupt handlers) do not have an “owner process”. Here are some routines —

  • reacting to timer interrupts
  • reacting to low-level emergency hardware interrupts like …. ?

So the scheduler is a classic example. I think scheduler can get triggered by timer interrupts. See P 350 [[linux kernel]]

make_shared() cache efficiency, forward()

This low-level topic is apparently important to multiple interviewers. I guess there are similarly low-level topics like lockfree, wait/notify, hashmap, const correctness.. These topics are purely for theoretical QQ interviews. I don’t think app developers ever need to write forward() in their code. A few low-level optimizations are worth noting. Suppose you follow Herb Sutter’s advice and write a factory accepting a Trade ctor arg and returning a shared_ptr<Trade>,

  • your factory’s parameter should be a universal reference. You should then std::forward() it to make_shared(). See the gcc source code for make_shared().
  • make_shared() makes a single allocation for a Trade and an adjacent control block, with cache efficiency — any read access on the Trade pointer will cache the control block too
  • I remember reading online that one allocation vs two is a huge performance win….
  • if the arg object is a temp object, then the rvr would be forwarded to the Trade ctor. Scott Meyers says the lvr (the named parameter) would be cast to an rvr. The Trade ctor would need to move() it.
  • if the runtime object is carried by an lvr (arg object not a temp object), then the lvr would be forwarded as is to Trade ctor?

Q: What if I omit std::forward()?
AA: Trade ctor would receive always a lvr. See ScottMeyers P162 and my github code is my experiment.

— in summary, advantages of make_shared
* code size is smaller and more i-cache friendly
* one allocation fewer. I think one allocation is thousands of times slower than a simple calc
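A sketch of the Herb Sutter-style factory described above (Trade’s member and the name makeTrade are illustrative, not from any real codebase):

```cpp
#include <memory>
#include <string>
#include <utility>

// hypothetical Trade with a ctor taking one argument
struct Trade {
    std::string symbol;
    explicit Trade(std::string s) : symbol(std::move(s)) {}
};

// factory: universal (forwarding) reference parameter pack, perfectly
// forwarded into make_shared -- one allocation for the Trade object plus
// its adjacent control block
template <typename... Args>
std::shared_ptr<Trade> makeTrade(Args&&... args) {
    return std::make_shared<Trade>(std::forward<Args>(args)...);
}
```

If the caller passes a temporary string, std::forward preserves its rvalue-ness so Trade’s ctor can move from it; with std::forward omitted, the named parameter would always arrive as an lvalue and get copied.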


## optimize code for i-cache: few tips

I don’t see any ground-breaking suggestions. I think only very hot functions (confirmed by oprofile + cachegrind) require such micro-optimization.

I like the function^code based fragmentation framework (3 parts)

  • inline: footprint+perf can backfire. Can be classified as embedding
  • use table lookup to replace “if” ladder — minimize jumps
  • branching — refactor a lengthy-n-corner-case (not “hot”) code chunk out to a function, so 99% of the time the instruction cache (esp. the pre-fetch flavor) doesn’t load a big chunk of cold stuff.
    • this is the opposite of embedding !
  • Trim the executable footprint. Reduce code bloat due to inlining and templates?
  • loop unrolling to minimize jumps. I think this is practical and time-honored — at aggressive optimization levels some compilers actually perform loop unrolling! Programmers can do it manually.
  • Use array (anything contiguous) instead of linked list or maps to exploit d-cache + i-cache
  • A 2014 Intel paper suggests splitting a huge function if it’s invoked in a loop.


check a receiver socket is alive: tcp^udp

Q: given a TCP receiver socket, how do you tell if it’s connected to a session or disconnected?

Shanyou said that when you recv() on the socket and get 0, it means disconnected. A recv() return value of 0 indicates a dead connection, i.e. disconnected. Another technique uses getsockopt().
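The recv()-returns-0 convention can be captured in a small helper (the enum and function names are mine; the semantics are standard POSIX — 0 means orderly peer shutdown, -1 with EAGAIN/EWOULDBLOCK on a non-blocking socket just means no data yet):

```cpp
#include <cerrno>
#include <sys/types.h>   // ssize_t

enum class TcpPeerState { kData, kDisconnected, kWouldBlock, kError };

// Interpret the result of a recv() call on a TCP socket.
// rc == 0 is the classic "peer performed an orderly shutdown" signal.
inline TcpPeerState classifyRecv(ssize_t rc, int savedErrno) {
    if (rc > 0)  return TcpPeerState::kData;
    if (rc == 0) return TcpPeerState::kDisconnected;   // peer closed
    if (savedErrno == EAGAIN || savedErrno == EWOULDBLOCK)
        return TcpPeerState::kWouldBlock;              // alive, no data yet
    return TcpPeerState::kError;
}
```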

Q: given a UDP multicast receiver socket, how do you tell if it still has a live subscription to the multicast group?

%%A: I guess you can use getsockopt() to check socket /aliveness/. If alive but no data, then the group is quiet

TCP/UDP: partial or multiple messages in one buffer

This is often mentioned in IV. At least you can demonstrate your knowledge.

What if the UDP datagram is too big for recv(), i.e. the specified buffer length is too small? P116 [[tcp/ip sockets in C]] seems to say the oversize message is silently truncated.

UDP recv() will only return a single “logical” message [1]. I believe TCP can put partial or multiple messages into one “buffer” for recv().

Q: if my buffer is big enough, will my UDP recv() ever truncate a msg?
%%A: never

Note IP would always deliver a whole msg or miss a whole msg, never a partial msg. See P 329 [[comp networking]]

[1] a logical msg is the payload from one send()
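One note on detecting the silent truncation above: with recvmsg() (rather than plain recv()), the kernel sets MSG_TRUNC in msg_flags when a datagram was bigger than the buffer and the excess was discarded — a sketch, assuming POSIX/Linux semantics:

```cpp
#include <sys/socket.h>

// After recvmsg() on a UDP socket, MSG_TRUNC in msg_flags reveals that
// the datagram exceeded the supplied buffer and was truncated --
// information plain recv() does not surface.
inline bool wasTruncated(const msghdr& msg) {
    return (msg.msg_flags & MSG_TRUNC) != 0;
}
```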

generate paths in ternary matrix #Promethean TomJerry


Similar to binary-matrix cluster count #DeepakM #DST+fullScan but the connection is modeled differently.

–Original requirements: Input matrix has an object in each cell

  • 1 means wall or barrier
  • 0 means passage
  • 2 means cheese station

Jerry is at [x][y]. Tom starts at [0][0] and must collect all cheese and deliver to Jerry. Find the shortest path. Tom can only move one (U/D/L/R) step at a time. My half-finished code builds a graph connecting all the cells (each a Node object). It’s able to check feasibility. Todo:

  • breadth-first walk from Tom to Jerry, without revisiting any node. Hopefully we can discover all possible paths (i.e. collecting all cheese)
  • determine the shortest.
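The breadth-first-walk building block can be sketched as below (my own code, not the half-finished version mentioned above). It finds the shortest Tom→target step count through the grid, treating 1 as wall and 0/2 as passable; the harder collect-all-cheese requirement (a TSP-like search) is not attempted here:

```cpp
#include <queue>
#include <utility>
#include <vector>

// BFS shortest step count from (sr,sc) to (tr,tc); -1 if unreachable.
// Cell values: 1 = wall/barrier, 0 = passage, 2 = cheese (passable).
inline int bfsSteps(const std::vector<std::vector<int>>& g,
                    int sr, int sc, int tr, int tc) {
    const int R = (int)g.size(), C = (int)g[0].size();
    std::vector<std::vector<int>> dist(R, std::vector<int>(C, -1));
    std::queue<std::pair<int, int>> q;
    dist[sr][sc] = 0;
    q.push({sr, sc});
    const int dr[] = {1, -1, 0, 0}, dc[] = {0, 0, 1, -1};  // U/D/L/R
    while (!q.empty()) {
        auto [r, c] = q.front(); q.pop();
        if (r == tr && c == tc) return dist[r][c];
        for (int k = 0; k < 4; ++k) {
            int nr = r + dr[k], nc = c + dc[k];
            if (nr < 0 || nr >= R || nc < 0 || nc >= C) continue;
            if (g[nr][nc] == 1 || dist[nr][nc] != -1) continue;  // wall/seen
            dist[nr][nc] = dist[r][c] + 1;
            q.push({nr, nc});
        }
    }
    return -1;
}
```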


memory fencing: terminology

Let’s standardize our terminology in this blog to ease searching — “fence instruction” and “memory-fencing” are more flexible words than “barrier”… also shorter.

Visibility-of-update is part of memory fencing.

Q: does memory fencing imply flush to main memory and bypassing register?
A: I think memory fencing is more about statement reordering and less about “flush to main memory”. All of these affect global visibility of a shared mutable.
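A minimal C++ illustration of the ordering-plus-visibility point (my own sketch): the release-store/acquire-load pairing forbids reordering the payload write past the flag write, and also guarantees the reader who sees the flag sees the payload —

```cpp
#include <atomic>
#include <thread>

// release/acquire demo: the plain write to payload is "published" by the
// release store; the acquire load that observes ready==true synchronizes
// with it, so reading payload afterwards is race-free and sees 42.
inline int runOnce() {
    int payload = 0;
    std::atomic<bool> ready{false};
    std::thread producer([&] {
        payload = 42;                                  // plain write
        ready.store(true, std::memory_order_release);  // publish
    });
    while (!ready.load(std::memory_order_acquire)) {}  // spin until published
    int seen = payload;   // guaranteed 42 by the release/acquire pairing
    producer.join();
    return seen;
}
```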


locate a pair with targetSum==55 #bbg IV #Morris


Q: any O(1) space sort?

Q: From an unsorted array of positive integers, is it possible to find a pair of integers that sum up to a given sum?

Constraints: This should be done in O(n) and in-place without any external storage like arrays, hash-maps, but you can use extra variables/pointers.

If this is not possible, can there be a proof given for the same?

—–Initial analysis—-
I wish I were allowed to use a hash table of “wanted” values. (Iterate once and build the hashtable. For each new value encountered, check if it is in the “wanted” list…)
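The wished-for hashtable approach, sketched to make it concrete (it is O(n) time but O(n) extra space, which is exactly the external storage this problem forbids):

```cpp
#include <unordered_set>
#include <vector>

// one pass: for each value v, check whether some earlier value "wanted"
// it; otherwise record target - v as a newly wanted value
inline bool hasPairWithSum(const std::vector<int>& a, int target) {
    std::unordered_set<int> wanted;   // the disallowed external storage
    for (int v : a) {
        if (wanted.count(v)) return true;   // v completes an earlier value
        wanted.insert(target - v);          // now we "want" target - v
    }
    return false;
}
```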

I feel this is typical of west coast algorithm quiz.

I feel it’s technically impossible, but proof?  I don’t know the standard procedure to prove O(n) is impossible. Here’s my feeble attempt:

Worst case — the pair happens to be the 1st and last in the array. Without external storage, how do we remember the 1st element?  We can only use a small number of variables, even if the array size is 99999999. As we iterate the array, I guess there would be no clue  that 1st element is worth remembering. Obviously if we forget the 1st element, then when we see the last element we won’t recognize they are the pair.
—–2nd analysis—-
If we can find an O(n) in-place sort then the problem is solvable [1]. Let’s look at radix sort, one of the most promising candidates; there’s an implementation for DNA strings.

Assumption 1: the integers all have a maximum “size” in terms of digits. Let’s say 32-bit. Then yes, radix is O(n), but I’m not sure about space complexity. Now, with any big-O analysis we impose no limit on the sample size. For example, we could have 999888777666555444333 integers. Now, 32 bits give about 4 billion distinct “pigeon-holes”, so by the pigeonhole principle most integers in our sample have to be duplicates!

Therefore, Assumption 1 is questionable. In fact, some programming languages impose no limit on integer size. One integer, be it 32 thousand bits or 32 billion bits, could use up as much memory as there is in the system. Therefore, Assumption 1 is actually untenable.

Without Assumption 1, and if we allow our sample to be freely distributed, we must assume nothing about the maximum number of digits. I would simply use

Assumption 2: maximum number of digits is about log(n). In that case radix sort is O(n log(n)), not linear time:(

—– new analysis as of Jun 2017 —-
Can the Morris algo sort an array without external storage? If yes, then use the 2-moving-pointer algo in locate a pair whose diff=55

However, Morris needs a tree not an array.

[1] locate a pair with targetSum=55 : pre-sorted #O(N) 1-pass
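The 2-moving-pointer pass referenced in [1], sketched for an already-sorted array — O(N), one pass, only two index variables:

```cpp
#include <vector>

// precondition: sorted ascending. Move lo up when the sum is too small,
// hi down when it is too big; they meet without finding a pair otherwise.
inline bool pairInSorted(const std::vector<int>& sorted, int target) {
    int lo = 0, hi = static_cast<int>(sorted.size()) - 1;
    while (lo < hi) {
        int sum = sorted[lo] + sorted[hi];
        if (sum == target) return true;
        if (sum < target) ++lo;   // need a bigger sum
        else --hi;                // need a smaller sum
    }
    return false;
}
```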

## c++0x features I understand as significant

Here is the subset of C++11 features I understand as significant. There are some significant c++11 features I don’t appreciate, such as lambda.

#1 move semantics
– std::move and forward

#2 Lib: smart pointers
– make_shared() etc

#3 Lib: thread library
– atomic<int>

#4 type traits

nested class has special capabilities ] C++^java

Q: Can nested class (NN) methods access a private member of the enclosing class EE? See c++nested class accessing host private member

Can EE methods access NN’s private members? Yes. Tested. See below.

Below is based on P186 ARM, perhaps outdated.
Q: in terms of syntax, how do Inner class methods refer to the Outer class’s non-static[1] fields?
A: no special access. Unlike java, there’s no hidden “this” field pointing to the Outer instance.
A: yes, if the Inner method has a pointer, reference or nonref variable of an Outer class instance.

Java’s nested class N1 is in the same “private club” along with E2(), an Enclosing-class private method.
– N1 methods can access private members of E2
– E2 methods can access private members of N1
–> Not in C++

Q: So what special capabilities does a nested class have? Why make a class nested at all?
A: to mention the “Inner” type name, you need to prefix it as “Outer::Inner”
A: another specialness — inner class methods can access outer private members. As pointed out elsewhere —
— in the original C++98 the inner class has no special privileges accessing the outer class. With a C++98 compiler you’d either have to give the inner class the necessary privileges (friendship) or expose the outer members x and y as public. However, this situation was classified as a defect in C++98, and it was decided that inner classes should have full access to outer class members (even private ones).

[1] static is fine — Outer::f
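A sketch of the inner→outer private access, plus the explicit-instance point (class and member names are mine):

```cpp
// Post-C++98-defect-fix rules: the nested class has full access to the
// enclosing class's members, even private ones. But unlike Java, there
// is still no hidden "this" pointing at an Outer instance -- one must be
// passed in explicitly.
class Outer {
    int secret_ = 7;                      // private to Outer
public:
    class Inner {                         // mentioned as Outer::Inner
    public:
        // legal: inner class reads the outer class's private member,
        // via an explicitly supplied Outer instance
        int peek(const Outer& o) const { return o.secret_; }
    };
    Inner makeInner() const { return Inner{}; }
};
```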