max-sum non-adjacent subSequence: signedIntArr#70%

Q: given a signed int array a[], find the best sub-sequence sum. Any Adjacent elements in a[] should not both show up. O(N) time and O(1) space.


— O(1) idea 2:

Two simple variables needed: m_1 is the m[cur-1] and m_2 is m[cur-2]. So compare a[cur] + m_2 vs m_1. is self-tested DP solution.


pre-allocated array as backing store for graph nodes #java No

I think any graph node can use the same technique, but here I present a simple yet interesting use case — a linked list with each node allocated from an array. shows three home-made implementations:

  1. backing array of dummy link nodes, pre-allocated at compile time
  2. backing array of dummy link nodes, pre-allocated from free store aka DMA
  3. backing array is a byte array on heap or data section. Each link node is constructed via placement-new.

Here are a few Advantages that I consider minor because linked list is seldom needed in low-latency

  1. d-cache efficiency
  2. eliminates runtime load on heap allocator, since memory is pre-allocated. See malloc=long considered costly

Advantage #3: For c++ algo questions, this set-up has an interesting advantage — The node address is now an index into the backing array. This index is a natural auto-increment ID , based on creation order.

Now, the biggest advantage of linked list over vector is mid-stream insert/delete. One of the biggest disadvantages is lack of random-access. If nothing happens mid-stream (as in coding questions), then we can achieve random-access-by-id using array as backing store.

If nothing happens mid-stream, then this linked list is physically similar to an array with extra capacity.

This technique won’t work in java because java array of Node is array-of-pointers.

power-of-2 sized hash tables: implications  and explain that java HashMap uses bucket count equal to power-of-two. A few implications

  • the modulo operation is a fast bit-masking i.e. select the lowest N bits: hash & (length-1)
  • an unlucky hashCode() may not offer dispersion among the lower bits. Assuming 16 buckets so only the lowest 4 bits matter, but hashCode() returns predominantly multiples of 8. Then two out of 16 buckets would be white hot. To guard against it, HashMap performs a rehash on the hashCode() output.

I implemented the same feature in

Note rehash is ineffective if hashCode() returns very few distinct values. In java8, there’s one more protection against it — RBTree as alternative to linked list… See RBTree used inside java8 hashmap

post-order walk: create valuable subtree statistics is rather simple and short, showing stats like

  • max path sum from current node to any subtree node. Note a path must be non-empty, so this max can be negative
    • We can even save in each node this actual path
  • subtree height,
  • subtree size,
  • d2root — tested but doesn’t use subtree statistics
  • — can be easily implemented:
  • subtree payload sum
  • subtree max payload… These stats are powerful AuxDS

Applicable on k-ary tree

array of sd::byte^Unsigned-char as “byte array”

Background: Suppose you need to store or transfer arbitrary binary data.

In RTS, we used char-array. Some say we should use unsigned char. It is the only data type that is guaranteed (by the ANSI C Standard) to have no padding bits. So all 8 bits in an unsigned char contribute to the value. None of them is a padding bit. I think std::uint8_t is similar but less common.

Contrary to some online posts, unsigned-char type is different from “char” —

In C++17, std::byte is probably preferred because only the bitwise operations are defined. I believe you can reinterpret_cast a pointer to std::ptr. In, I used none of the above — I used placement-new 🙂

In java we use the primitive “byte” type — an 8-bit signed integer.

python bisect #cod`IV

The bisect module is frequently needed in coding tests esp. codility. In this write-up, I will omit all other function parameters.

* bisect.bisect_right(x)  # returns an position “i” after the first hit, if any
all values are <= x from a[lo] to a[i-1]) for the left side and all values are > x from a[i] to a[hi-1]) for the right side. So the “hit” items are on the left, unconditionally 🙂
* bisect.bisect_left(x) # returns an index i before or on the first hit:
all values are < x from a[lo] to a[i-1]) for the left side and all values are >= x from in a[i] to a[hi-1]) for the right side.

My sound bytes:

  • bisect_left(needle) returns the first index above or matching needle.
  • bisect_right(needle) returns the first index above needle, unconditionally.

A few scenarios:

  1. If No perfect hit, then same value returned by both functions.
    • Common scenario: if needle is higher than all, then “i” would both be the last index + 1.
    • Common scenario: if the needle is lower than all, then “i” would both be 0
    • in all cases, You can always insert Before this position
  2. If you get a perfect hit on a list values, bisect_left would return that “perfect” index, so bisect_left() is more useful than bisect_right(). I feel this is similar to std::lower_bound
    • This is confusing, but bisect_right() would return a value such that a[i-1] == x, so the returned “i” value is higher. Therefore, bisect_right() would never return the “perfect” index.
  3. If you have a lower-bound input value (like minimum sqf) that might hit, then use bisect_left(). If it returns i, then all list elements qualify from i to end of list
  4. If you have an upper-bound input value that might hit, then use bisect_left(). If it returns i, then all list values qualify from 0 to i. I never use bisect_right.
  5. Note the slicing syntax in python a[lo] to a[i-1] == a[lo:i] where the lower bound “lo” is inclusive but upper bound “i” is exclusive.
import bisect
needle = 2
float_list = [0, 1, 2, 3, 4]
left = bisect.bisect_left(float_list, needle)
print 'left (should be lower) =', left # 2

right = bisect.bisect_right(float_list, needle)
print 'right (should be higher) =', right # 3

generate combinationSum compositions #backtrack up]trie #2review


Given a set of unique candidate numbers and a target number, find all unique combinations of candidates, where each combination sums to target. Each candidate may be used repeatedly.

My solution is , showing a reusable backtracking technique described below. It’s clever and brief. However, the efficiency is questionable. Memoization might be applicable here.

This backtracking relies on a key insight. Suppose we have target = 7 and X=2,Y=3,Z=4 as the candidates.

  • when we try a Y, we don’t need to try any more Xs. For example, If we are trying XY, then all XX* solutions are already handled by earlier recursive calls.
  • each combo sequence is naturally pre-sorted internally.
  • Also, “lower” formulas are generated before “higher” formulas
          x        y     z
      /   |  \
     x    y   z       
    / \   | \  \
 xxx  xxy | xyz \ 
         xyy     xzz
void //can return something if needed

recurs( solutionsFound &, //growing
        curPartialSolution &, 
// above collections could be global variables, to simplify things

        remainingCandidates, /*probably an immutable global array. 
If we need the remaining array to shrink, we can still rely on startIndex to skip used candidates.*/

        startIndex, //used to select from remaining candidates

Inside this function, we scan remaining candidates starting from startIndex. Typically in one iteration

  1. we add a new candidate into curPartialSolution
  2. we call recurs
  3. we remove the last added candidate from curPartialSolution to restore the original curPartialSolution — backtracking up the tree.
  4. move on to the next candidate

I feel this is more greedy than DP

serialize binTree #funptr

struct Node{
  Node * left, right;
  int data;

… If you simply output as “data,leftId,rightId | data,leftId,rightId | …” then the graph structure i.e. link is lost. Therefore, I feel the stream need to identify each node: “Id,data,leftId,rightId|…” Next question is how to assign the id values. Easiest — use node address as a string id! Suppose there’s a treewalk algo walk(Node * root, void ( *callback ) (Node *)), i will call it like

walk(root, serialize1node); // where serialize1node is the callback.

Note the position of the asterisk:

  • It’s on the right of type name Node — read left to right .. pointer to Node
  • It’s on the left of variable name callback — read left to right .. callback is a pointer to … is my tested solution I feel the challenge is ECT and syntax, not the idea. —- elegant home-grown two-pass solution Assign incremental IDs to each node in an in-order walk. Then run pre-order walk to output each ID.

  • 🙂 applicable for any binTree, but 3-way trees  or higher
  • 🙂 No null node needed
  • 🙂 no space overhead. Instead of serializing address, I serialize my generated ID of each node, which are usually smaller and never bigger than a pointer. I think the serialized byte array is the smallest possible.
  • Key insight — from a given preorder output, there’s exactly one possible BST we construct.

—- solution: standard graph representations — AdjMatrix + edgeSet, applicable on any graph —- solution : BFT -> list including nulls —-idea from my friend Deepak: pre-order tree walk and output one line per node {payload, direction from root, direction from Level 1 ancestor, direction from Level 2 ancestor..}. The technique is reusable even though overall efficiency and complexity is not optimal. We don’t need to be optimal.

  • first node in the file is always the root
  • the output file always includes the higher level nodes before the lower-level nodes
  • delimiter (\n) needed between nodes
  • demerit — data bloat… too verbose if tree is sparse but 99 levels deep

—-If the tree is a BST, then this idea is based on the fact that a reading a series of unsorted nodes can construct a BSTree.

For this solution, we have no requirement on the payloads. They can be booleans.

first pass — in-order walk to build a hash table of {node address -> integer id}. If we treat the ID’s as node keys, then the tree is a BST.

2nd pass — BFT or pre-order walk the original tree. For each node, we write node ID’s to the file, without writing the links or addresses. File looks like T,5|T,4|T,6|T,2|… When we reconstruct the tree, we read each node from the file, and insert that node into our new tree, using the id value as key to decide where.

If the original tree nodes have keys, I will just treat it as payload, and use my assigned ID as key

Kyle’s simple trie ] python is Usable for some rare coding IV. By Kyle Stewart. has a slightly more complete requirement.

I think I can modify (not write from scratch) Kyle’s code to add the word search.

Looks similar to the 10-billion phone number problem

pass generator output to next generator

I think this technique can be extremely time-saving in coding tests. my code demos:

for myset in pool:     output = list(gen(output, myset))

The gen() function uses yield. For the first call to gen(), we exhaust all of its items and save into a list named “output”.

Then we pass this list into the second gen(), this time with a different myset

2 threads taking turn #std::ref #CSY

Vanguard Q: write a program containing exactly 3 threads printing 1/3/5/7/… 2/4/6/8… respectively, but in lock steps, as if passing a token between them. is my solution.

I managed to avoid sleep() and condVar. The idea — all thread run the same function which

  1. check the shared mutable variable Next. If it’s “my” turn then
  2. grab lock, print my next number, update Next, release lock
  3. end-if
  4. yield and exit

I used an atomic<char> “Next” set to lastOutput%something, that’s visible to all threads, even if not holding a lock.


POSIX semaphores do grow beyond initial value #CSY

My friend Shanyou pointed out a key difference between c++ binary semaphore and a simple mutex — namely ownership. Posix Semaphore (binary or otherwise) has no ownership, because non-owner threads can create permits from thin air, then use them to enter the protected critical section. Analogy — NFL fans creating free tickets to enter the stadium. is one of the better online discussions.

  • ! to my surprise, if you initialize a posix semaphore to 1 permit, it can be incremented to 2. So it is not a strict binary semaphore. is my experiment using linux POSIX semaphore. Not sure about system-V semaphore. I now think a posix semaphore is always a multi-value counting semaphore with the current value indicating current permits available.

  • ! Initial value is NOT “max permits” but rather “initial permits”

I feel the restriction by semaphore is just advisory, like advisory file locking. If a program is written to subvert the restriction, it’s easy. Therefore, “binary semaphore” is binary IIF all threads follow protocol. claims “Mutex can be released only by thread that had acquired it, while you can signal semaphore from any non-owner thread (or process),” This does NOT mean a non-owner thread can release a toilet key owned by another thread/process. It means the non-owner thread can mistakenly create a 2nd key, in breach of exclusive, serialized access to the toilet.–-part-1-semaphores/ is explicit saying that a thread can release the single toilet key without ever acquiring it.


[[DougLea]] P220 confirms that in some thread libraries such as pthreads, release() can increment a 0/1 binary semaphore value to beyond 1, destroying the mutual exclusion control.

However, java binary semaphore is a mutex because releasing a semaphore before acquiring has no effect (no “spare key”) but doesn’t throw error.

2 nodes] binTree-with-cycle: locate common ancestor

Q (Leetcode #236): given 2 valid nodes (and root) of a binary tree, find the lowest common ancestor. A node can be a (direct/indirect) descendant of itself. All values distinct. No uplink.

classic problem:)

Q2 (my own requirement): what if cycle is possible?

My idea — Just run a lazy-dft to find the two paths-from-root. On each path, if we detect a cycle we terminate that path. Before terminating any path, need to check if we hit both nodes, so after finding one node we must go all the way to the leaf node or the one of the 2 given node.

As soon as we find the 2 paths we terminate DFT.

IIF two CPUs are given, my dft will use two threads — one left to right; the other right to left. This will more quickly locate the 2 target nodes if they appear near extremities. is my self-tested code, not tested on Leetcode

master The yield-generator uses python q[ yield ] to implement classic generators of

  • combinations
  • permutations
  • abbreviations
  • redraw

Key features and reasons why we should try to memorize it

  1. very brief code, not too dense (cryptic), .. helps us remember.
  2. reliability — brief means fewer hiding place for a bug. I added automated tests for basic quality assurance. Published solution
  3. versatile — one single function to support multiple classic generators.
  4. yield functions’ suspend-resume feature is particular powerful and possibly a power drill. This is my first opportunity to learn it.
  5. instrumentation — relatively easy to add prints to see what’s going on.
  6. halo — yield-based generator functions are a halo and also a time-saver
  7. elegant — brief, simple, relatively easy to understand
  8. recursive — calling itself recursively in a loop therefore fairly brief but powerful
  9. useful in take-home assignments
  10. identity-aware — The second time you call myYieldFunc(44), it would usually return the same generator object as the first time you call it with that same argument (i.e. 44). If that generator object has reached end of it’s execution, then you won’t get any more output.

— How I might try to memorize it

  1. If needed, we just need to reproduce it quickly from memory a few times
  2. I added comments to help me understand and memorize it

std::defer_lock #kill1@4deadlock factors

Only in the lock() call does the thread grabs both locks together. This breaks “incremental acquisition” , one of the four deadlock conditions.

Sample code from

  std::unique_lock<std::mutex> ulock1(mutex1,std::defer_lock);
  std::unique_lock<std::mutex> ulock2(mutex2,std::defer_lock);

  // now grab both locks

std::weak_ptr phrasebook

ALWAYS need to compare with raw ptr + shared_ptr, to understand the usage context, motivations and justifications is concise.

— Based on the chapter in [[effModernC++]]:

#1 feature — detect dangle

  • use case — a subject that keeps track of its observers who might become dangling pointers
  • use case — objects A and B pointing to each other with ref count … leading to island. Using raw pointers exclusively is possible but requires explicit deletion, as pointed out on P 84 [[Josuttis]]
  • In both use cases, Raw ptr won’t work since dangle becomes unnoticed.
  • Achilles’ heel of the #1 feature — manual “delete” on the raw ptr is beneath the radar of reference counting, and leads to chaos and subversion of ownership control, as illustrated —
#include <iostream>
#include <memory>
using namespace std;

void f1(){
  auto p = new int(55);
  shared_ptr<int> sp(p);
  weak_ptr<int> wp(sp);

  cout<<"expired()? "<<wp.expired()<<endl; // false
  cout<<"deleting from down below\n";
  delete p; // sp.reset();
  cout<<"expired()? "<<wp.expired()<<endl; // still false!
  // at end of this function, shared_ptr would double-delete as the manual delete 
// is beneath the radar of reference counting:(
int main(){

both base classes”export”conflicting typedef #MI

Consider class Der: public A, public B{};

If both A and B expose a public member typedef for Ptr, then C::Ptr will be ambiguous. Compiler error message will explicit highlight the A::Ptr and B::Ptr as “candidates”!

Solution — inside Der, declare

typedef B::Ptr Ptr; //to exclude the A::Ptr typedef
// This solution works even if B is a CRTP base class like

class Der: public A, public B{
  typedef B::Ptr Ptr;

generate loopfree paths: graph node A→B

Q1: given 2 nodes in a graph containing N (eg 121) nodes, potentially with cycles, generate all simple paths between the pair. A simple path has no cycle. (In other words, for a simple path, length + 1 ==  # unique nodes)

I think there are classic math algorithms for it, because this is part of basic graph theory. Here are some applications of this type of algorithms —

  • Q1b (special case of Q1): given 2 nodes in a C-by-R rectangular grid, where every node is connected to (up to) four neighbors, generate all cycle-free paths between the pair.
    • Below, I solved this problem in python
  • Q2 (simple application of Q1 algo): generate all simple paths between ALL node pair in a graph. The shortest simple path has length=0. Longest simple path can potentially visit every node exactly once.
  • A: first generate all 121-Choose-2 node pairs. For each pair, solve Q1. Lastly generate the 121 trivial paths of length=0.
    • repetitive 😦
  • Q2b (special case of Q2): In a C-by-R rectangular grid, where every node is connected to (up to) four neighbors, generate all cycle-free paths between ALL pairs.
  • Q2c (easy one based on Q2): given a binary tree containing no cycles, generate all paths.

— A1b: my DFT implementation (probably not 100% correct) , where each “trail” either fails or becomes a path.

  1. from NodeA start a breadcrumb/trail. We can’t revisit any node already visited on current breadcrumb,
    1. if this is a matrix, then instead of a hashtable, we can also use a shadow matrix, but the breadcrumb is much smaller than a shadow matrix
  2. if we reach a node surrounded by nodes on the same breadcrumb, then the trail fails at a dead-end
  3. else we will reach NodeB 🙂 Print the breadcrumb

By construction, we won’t see duplicate paths 🙂 is the implementation

–BFT? I don’t think BFT can print each unique path

shortest path:2nodes]binary matrix #BFT

Q: given 2 cells in a binary matrix (1=black, 0=white=blocked), check the pair are connected and if yes return the shortest path. There exists a path of length 1 between any 2 cells IFF both are side by side or stack atop.

count paths between 2 bTree nodes #PimcoQ9 Ashish is arguably harder than this problem, but this problem allows moving in four directions.

binary-matrix island count #DeepakM technique is more applicable. A BFT path should work.

  • every reachable node is painted Green (like 2)
  • we give up after our queue is empty is the implementation, briefly tested.

max-sum path Down binTree #self-tested

Q1: Given a non-empty binary tree of signed integers, find the maximum path sum. For this problem, a path is defined as any sequence of nodes from any starting node to any node in the tree along the parent->child connections. The path must contain at least one node and does not need to go through the root. No uplink. No cycle.

Luckily, there’s no published solution for this modified leetcode problem 🙂


My solution — DFT. Along each root-to-leaf path, use the max-subarray (Kadane) algo and store maxSumEndingHere value in each node, for reuse.

Q: is there any duplicate work?
A: I hope not, thanks to memoization i.e. Node::meh field

Q: do we visit every path?
A: I think so.

I simplified the idea further in

Time complexity is .. O(V+E) = O(N), since I visit every node and follow each edge once only.

There might be algorithmically superior solutions on leetcode but I don’t want it to affect my joy, motivation and momentum.

Longest Parentheses run with multiple hierarchies

Q (Leetcode): Given a string containing nothing but the characters ‘(‘ and ‘)’, find the length of the longest valid (well-formed) parentheses substring. is my solution 100% tested on Leetcode

–My Single-iteration solution:

Challenge is data structure. I ended up with 2 data structures to be updated during the iteration

  1. A stack (holding openers’ index values) to locate the matching openers
  2. an array to save “scores”

For each closer, I will record the position of the matching opener, then compute the distance (minimum two).



reverse slist in K-groups is the problem I tried today, not a classic problem. Challenge is not the algorithm per-se but the Edit-Compile-Test-Debug cycle. I think some of us can come up with a conceptual algorithm quickly, but to implement it correctly took me hours.

Similarly, the problems below are not tough due to algorithm but the ECTD cycle can take hours, sometimes due to c++ iterator pitfalls, sometimes because we can’t easily visualize the data structure .. I wrestled with all of these problem, so please feel free to try them and discuss with me.

* print any tree (you can start with a binary) by level, in zigzag sequence
* given a linked list, write a function to remove all nodes greater than 55 (or any user input). Return the head of the modified list.

As decided last week, I didn’t bother to run the Leetcode test suit. They make me feel frustrated, worthless, defeated, inferior, weakling, quitter…. Without these tests I ran my own tests and I feel like a joyful hacker.

Even though I may not pass all Leetcode tests, I feel my code is reasonable quality and I’m proud of it.

—-Problem is well-defined but not very common.

Given a linked list, reverse the nodes of a linked list k at a time and return its modified list.

k is a positive integer and is less than or equal to the length of the linked list. If the number of nodes is not a multiple of k then left-out nodes in the end should remain as it is. O(1) space. Hopefully O(N) time.

—-My sol1: use my existing O(1) solution but now keep a count.

The first group and the last group are both tricky and can take up hours.

merge K presorted lists #O(what)

Q: Merge k sorted linked lists and return it as one sorted list. Analyze and describe its complexity.

Note K could be much larger than N. is my solution.

I feel this is mostly an optimization challenge. I can think of a few solutions

–Sol1: merge 2nd list into first. Then merge 3rd list into first … shows that this has higher runtime cost than the brackets solution.

Reason is, each 2-merge-to-1 must visit every node in both lists. So the first list nodes get visited K times!

–Sol1b: brackets.

There are only (log K) levels in the bracket so any list gets visited that many times.

–Sol3: in-place (inefficient)

We maintain K node-pointers for the K lists (K teams)

We also maintain a pointer to the last-added node in the merged list.

first node in K lists are put into a min-heap. Winner (smallest) team would be the “current list”. Now the winner team offers next node and add it into the heap. Winning team ..

What if N=1 and K is 1 billion?

specify(by ip:port) multicast group to join has zipped sample code showing

mc_addr.sin_port = thePort;

bind(sock, (struct sockaddr *) &mc_addr, sizeof(mc_addr) ) // set the group port, not local port!
mc_req.imr_multiaddr.s_addr = inet_addr(“”);

(void*) &mc_req, sizeof(mc_req) // set the IP by sending a IGMP join-request

Note setsocopt() actually sends a request!

====That’s for multicast receivers.  Multicast senders use a simpler procedure —

mc_addr.sin_addr.s_addr = inet_addr(“”);
mc_addr.sin_port = htons(thePort);

sendto(sock, send_str, send_len, 0, (struct sockaddr *) &mc_addr, …

max-profit buy 1 sell any-bought #pimco java Q8

Pimco Java HackerRank Q8

Q8: Each minute, your trading platform allows you to either buy one share, sell any number of shares that you own (short sell forbidden), or not make any transaction at all. Your task is to find the maximum profit you can obtain with an optimal trading strategy.

I remember having issues with some HackerRank test cases. Should use 64-bit int rather than the default java int.

This problem appears to be very similar to day-trading in hindsight #Nsdq but requires drastically different thinking:

  • a single scan to the left, starting from the last price point.
  • any time there’s a left-ward drop, we have a trading opportunity! is my tested solution, peer-reviewed by Ashish.

bash: split long command to multiple lines

I managed to split a huge g++ command line  to about 300 lines… much more readable.

The trick:

  • terminate each line with a space, a backslash and … NO trailing space
  • my perl one-liner must user four backslashes to insert that single backslash, then another \n
  • total there are 5 backslashes in a row.

Here’s the command

perl -pe "s/ -/ \\\\\n-/g"

prefer std::deque when you need stack/queue

  • std::stack is unnecessarily troublesome for dumping and troubleshooting … see
  • std::queue has a similar restriction.

Philosophically, std::stack and std::queue are container adapters designed to deny random access, often for safety. They are often based on an underlying “crippled deque”. As soon as you need to dump the container content, you want a full-fledged deque.

std::reference_wrapper usages

std::reference_wrapper is a class template that wraps a reference in a copyable, assignable object. Usages:

  • std::reference_wrapper is primarily used to store references inside standard containers which cannot hold references. In my projects, no one ever stores references in containers. Pointless.
  • std::reference_wrapper is also used to pass objects by reference to the constructor of std::thread, or the helper functions std::make_pair() and std::make_tuple()

Helper functions std::ref and std::cref() are often used to generate std::reference_wrapper objects.


–details on the std::thread usage shows that without std::ref(), the object passed to std::thread ctor is a clone, so the original object is not updated by the thread routine.

My uses std::ref() too. Same code also shows another adapter wrapper for a function says std::ref() is likely needed. has detailed sample code

buggy RAII-powered smart_ptr #SIG

Q1: given a well-behaved class X, someone wrote a RAII wrapper below. What problems do you see?

Q1b: how would you fix them?

%%A1: memory leak of x1, due to the synthesized op=()
%%A1: double-delete on x2

Q1c: OK Good. Now what if X has subclass Y,  all with virtual dtor?
%%A1c: X and Y each create virtual clone() to return a pointer to the host type

Q2: what’s the outcome of double-delete?
AA: undefined behavior. You are lucky if it crashes.

struct A{
  A(X* x): myX_(x){}
  ~A(){delete myX_;}
  X* myX_;
void demo(){
  A a1(new X()); //unnamed x1 object on heap
  A a2(new X()); //unnamed x2 object on heap
  a1 = a2;
/////////// above is original code; below is my Q1b answer -- Add op=()
A & operator=(A const & rhs){
  if (this != &rhs){
    auto tmp = this->myX_; //delete only after successful "new"
    this->myX_ = new X(*rhs.myX_); //clone the X object of rhs
    // if new or ctor throws, the memory is not really allocated 🙂
    delete tmp; //let's assume no exception here.
  return *this;

[18]fastest threadsafe queue,minimal synchronization #CSY

I got this question in a 2017 Wells white-board coding interview, and discussed with my friend Shanyou. We hoped to avoid locks and also avoid other synchronization devices such as atomic variables..

Q1: only a single producer thread and a single consumer thread and no other threads.

I put together a java implementation that can enqueue without synchronization, most of the time … See

Q1b: Is it possible to avoid synchronization completely, i.e. single-threaded mode?
A: No. Consumer thread would have absolutely NO idea whatsoever how close it is to the producer end. No. We asneed a memory barrier at the very least.

Q2: what if there are multiple producer/consumer threads?

I believe we can use 2 separate locks for the two ends, rather than a global lock. This is more efficient but invites the tricky question “how to detect when the two ends meet“. I am not sure. I just hope the locks enforce a memory barrier.

Alternatively, we could use CAS on both ends, but see lockfree queue #popular IV


given arbitrary value X,get nearest2nodes in std::map

Basically, find the left/right neighbor nodes.

! Don’t use upper_bound since lower_bound is enough.

  • If perfect match, then lower_bound return value is all you need. No need for 2 nodes:)
  • If no perfect match, then lower_bound() and prev(lower_bound)
  • if X too low, then begin() alone is all we can get
  • if X too high then prev(end()) alone is all we can get

See interface offers four methods .. see closestMatch in sorted-collection: j^python^c++

lower_bound() means touchUpperBound #4 scenarios

  • std::upper_bound() should be named strictUpperBound()
  • std::lower_bound() should be named touchUpperBound() since it can return an element touching the target value

If no perfect hit, then both returns the same node — lowest node above the target

  1. if target is too high, both return end(), which is a fake element
  2. if target is too low, both return begin(), which is a real element
  3. if target is matching one of the nodes, then perfect hit
  4. if target is between 2 nodes, then … this is the most common. caters to all four scenarios.

isPreorderBST(randomArray) #hackerrank

This was asked in a 2018 hacker rank interview, not as important as an on-site coding question. However, I see this question as a classic.

Should study some examples ! has a tested solution but it’s too cryptic. I added instrumentation to help myself understand it. See my github code.

My analysis — After we have seen some number of nodes, there’s exactly one possible tree we could construct, so let’s construct it.

Note during the construction, each new node has only one place to go (key insight) and after that we can check if it breaks pre-order. is my tested code, probably less elegant than the G4G solution, but I’m still proud of it.

I don’t think any solution (including the G4G) is really O(N). My solution is not inferior. It has a readability advantage. It’s longer but not necessarily slower.


unbounded queue for 1-Producer-1-Consumer #Wells

———- Forwarded message ———
From: Bin TAN (Victor) <>
Subject: demo of my design that first interviewer didn’t see

Hi Angelo,

During my interview with Jun, I spent the last 20 minutes locked in a rigorous debate — given an unbounded queue designed for single-threaded mode, can we provide a thread-safe facade so a producer thread and a consumer thread can operate on it safely, but without using locks in every operation.
Note In JDK and STL there are many collections designed for single-threaded mode, because such designs are simpler and significantly faster.
I argued valiantly that there exist designs where most of the time we can avoid both locking and CompareAndSwap. I failed to convince Jun. Jun believed firmly that unsynchronized access by two threads is always unsafe so we must use lock or CompareAndSwap every single time.
I just spent 20 minutes at home posting an implementation in
I would also like to point out the java ConcurrentHashMap lets two threads (possibly a writer thread and a reader thread) access the shared data structure concurrently, usually without locking or CompareAndSwap , when the two threads happen to access two distinct segments. Note there can be a large number of segments, up to a few hundred at least, so the chance of two threads hitting distinct segments is very high (99.999% chance for a 333-segments map). Therefore, contrary to what Jun said, for reader/writer threads to access the same data structure,  we don’t need locking in every read and write access.
concurrencyLevel – the estimated number of concurrently updating threads. The implementation may use this value as a sizing hint.
I wrote this (like most of my concurrent designs) in Java since Java memory model is better documented and better understood.

check array@0-N +! auxDS #Nsdq#contrived

— Q1: Given a size-5 array of integers. For every element x: 0<= x <=4. Array would be considered Good if all elements are unique, i.e. all the numbers 0 to 4 show up exactly once each. Please write a program to check if the array is good. If Bad, then print every number that’s showing more than once. O(N) time and O(1) space please.

We will denote array size as sz, so N == sz-1.  N=4 and sz=5 in the opening example.

— comments:

I feel this question (even without the bonus) is too hard to complete on the spot, unless you have done it before.

I made a mistake that would be unforgivable in west coast and Bloomberg interviews — failed to clarify requirements and assumed input array is immutable.

— Q1b (bonus): Also print every missing number. (Requires iterating entire input.)

Example A {1,2,0,2,0} has repeating 0 and 2, and missing 3 and 4. Example B {3,1,4,2,0} is Good.

— comments:

If you feel the O() complexity requirements are daunting, then work out a less efficient solution first, and improve on it.

I feel in real projects, we always have enough memory to build a separate array, so the O(1) space requirement is contrived. A O(N) time and O(N) space solution is easier. Therefore, I rate this coding question as contrived and relatively non-standard. Yet the required techniques are “universal” in many high-end interviews.

has my 99% solution. The unfinished part is trivial.



max-profit #Nsdq short-sell

Q1: given a time series of price points within a past day, there are many ways to trade the day — one buy-sell, five buy-sells, or do-nothing … Short-sell allowed, but you must start and end the day holding no shares. For example, if you sell 9 times then you must buy 9 times, each time 1 lot (100 shares) exactly. Write a function to trade the day and produce the highest profit.  Note you can analyze the price points with perfect hindsight.

Interviewer asked for real code. Very little time given, so I believe the right solution is short, and much simpler than the question on “array of 0-N” (where interviewer asked for pure algorithm). is my buggy”greedy” solution. is my new solution.

Q2: now you are allowed to buy UP-TO 100 shares each time. All other rules remain. When if ever is it optimal to buy/sell fewer than 100 shares (a.k.a. an odd lot)?
%%A: never

Q3: same as Q1 but now your holding must always be long 1 lot, short 1 lot or zero holding.

–(broken?) brute force solution I gave on the spot:

Start with just four price points 1,2,3,4. Name every pair A=1>2; B=1>3; C=1>4; D=2>3; E=2>4; F=3>4. Each pair represents a buy/sell round trip, with a profit (ignore it if unprofitable).

How many “plays” i.e. how many ways to trade the day? 2^6 plays.

Just evaluate each play and find the best. Beware that ABD is same as B x 2 and should be scaled down by 2. Similarly, AEBFC == C x 3

detect cycle in binary tree

Q1: (Adapted from a real interview) Given a binary tree, where each node has zero or 1 or 2 child nodes but no “uplink” to parent, and given the root node, detect any cycle. is my tested implementation of 1a.

Solution 1a: Three web sites all point at DFT with hashset. I guess the hashset is shrunk whenever we return from a recursion

Solution 1b: I will first write a classic BFT, where each node is processed by processNode(). In this case, my processNode(me) function will start another BFT to traverse my subtree. If I hit myself, then that’s a cycle. I think the additional space required is the size of the queue, which is up to O(N). The (non-recursive) call stack at any time is at most 2 (or 3?) levels.

bft solution 1c: Upon append, each node keeps a parent-node-set, represented by hash table. Too much memory needed

Q2: how about constant space i.e. don’t use O(N) additional space?

I think any binary tree traversal requires more than O(1) additional space, except Morris. But can Morris even work if there are cycles?


CVA c++IV 1

Q: when would you pass (in/out of functions) by ptr rather than non-const ref?
%%A: if the argument can be null, or void ptr
A: if I need to pass a double ptr
A: if I need to pass a raw array
%%A: A factory often returns by ptr or smart ptr, seldom by reference

Q: do you catch exception by value, ptr or reference?
A: slicing

Q: iterating vector by int index vs iterator… which is faster?
A: iterator wins since vec[i] needs a O(1) pointer arithmetic operation to locate the element

Coding question

childThr.get_id() after join()

Not sure about pthreads but here is c++11 std::thread::get_id() behavior:
“If the thread object is not joinable, the function returns a default-constructed object of member type thread::id.”
I believe after you join a childThr, that thread is no longer joinable, SO get_id() will return a meaningless boilerplate value.

Solution: to use that id, you need to save it before joining is my experiment

given int array,find triplets: x divides y;y divides z #Trex

Within a sequence of unique natural numbers, count the occurrence of so-called triplets of (x,y,z) where x divides y and y divides z.

I feel this is not a classic, but the technique is somewhat reusable. is the improved solution after lots of hints.

— a useful technique:

Construct each {factor, multiple} pair and save it in a collection. Any new pair constructed will be checked against all existing pairs. The “check” is the trick. My 25 Apr 2018 commit shows the most efficient “check” I can think of.

list of pairs -> list of triplet -> list of quadruplet -> list of quintuplet … Build up pairs to construct triplets. Use triplets to construct quadruplets. Use quadruplets to construct quintuplets.

The pairs collection doesn’t need to be fully built before we start constructing triplets.

iterator invalidated{erase: opaque

See also

I had multiple encounters with STL iterator invalidation, esp. after erase().

  • sometimes I get correct result
  • sometimes I get segfault
  • sometimes I get incorrect result

Opaque i.e. no obvious clues, so no keyword to google

Luckily my code was 20-lines

Luckily i could reproduce it once a while.

This is another example of tough bugs to hunt down.

3-way sorter/classifier #FB

— Requirement — You are give a list of possibly repeating integers. Each belongs to exactly one category, either category Low, Medium or High. There is a function to return the category for any integer. Please write a constant-space function to output all category-L members, followed by all the category-M members, and finally the category-H members.

If input list has 9 trillion integers, output will have 9 trillion intergers too.
Within category Low, you can print in any order.
Within category Medium, you can print in any order.
Within category High, you can print in any order.

Personally, I would suggest (to make things concrete) that L means single-digit; M means double-digit and H means 3-digit or higher.

I failed to clarify requirements — input is array or linked list?

=== solutions

— 1st design — edit the same list in-place. Extract-insert each L or M element to group all elements into three sections. L on the left, H on the right. First design uses array, So I maintain 2 markers (itegers).

Invariant — the mm marker points to start of the M section. hh marker points to start of the H section, and a current marker points to start of unclassified section. L section is on the left of mm.

  • Every time I see a L element, I move it to beginning of array, and increment mm. (It’s more efficient to move it to left of mm.)
  • Every time I see a M element, I move it to the left of hh, and increment hh
  • Every time I see a H element, I do nothing.

— Second design avoids the inefficient array shift, using STL linked list.

Q: When to adjust mm and hh iterators?
A: Only when erase() hits mm or hh. This is important and easily missed. There’s no error message. Result may not show any difference. As a general rule

“Always check every iterator (except unneeded) when you do a list::erase()”

I plan to implement 1st design in python list (vector internally) and 2nd design in std::list.

–3rd design swap-based. If we only have an array and not a linked list, then swap-based design is efficient is my tested solution

c++get current command line as str

#include <fstream>

std::string const & get_command_line() {
  static std::string ret;
  if (ret.empty())
        std::string path="/proc/" + std::to_string((long long)getpid()) + "/cmdline";
        std::cerr<<"initializing rtsd parsername from "<<path<<" ..\n";
        std::ifstream myfile(path);
        if (!myfile) return ret;
        getline(myfile, ret);
  return ret;

move/forward used beyond argument-passing@@ rare

See also arg casting: #1 usage@move/forward

I believe move() and forward() are most often used when passing argument into a “worker” function (such as a ctor). I see no exception with forward() and only two questionable exceptions with move():

1) immediate steal:

Badstr && alias=move(passedIn);
do_something_with(alias.ptrField); //I don’t need the move() in this case
alias.ptrField = NULL;

2) move-assignment:

existingBadstr = move(passedIn); //could be written as
existingBadstr. operator=(move(passedIn)) //NOT really an exception to the norm

Tested in



Q1: generate one solution

Q2: generate all solution to the 4-queen problem without
A: there are up to 4^4 = 256 possible solutions so just iterate over all, but let’s say we want to extend the Q1 solution is my tested solution. The only important function is placeOnRow(), which took me 17 minutes to write on white-board, but I missed something very important — restoring Mat[r][c] to UNFILLED before continue

4×4 sudoku: backtracking #FB

4-queen is one of the simplest problems in the backtracking category, but I will look at a mini sudoku. Sudoku was once asked in a Facebook 45-min white-board coding test. is my tested implementation.

I name the four values as color1  color2  color3  color4. A placement is {a color, row index, column index}.

Basic idea — at each empty cell, we try all four possible colors. If color1 is allowed, then we call the same function recursively to fill the next empty cell. Then (crucially) we check the status. If OK, then game over. If NOK, then we try another color. If no more color to try, then (crucially) we reset current cell to unfilled and return NOK

Key to the algo — backtracking is implemented by the restoration and returning NOK

Key to the algo — each empty cell would start a new recursive stack frame. But what data is saved on that stack frame?

Key to the implementation — know what variables go into the loop, what go to the stack and what can be global (simplest). In this case the returned next cell indices CANNOT be globals because each stack frame must remember “next-cell-after-me”

Key to the implementation — simplify state. Each stack frame doesn’t need to remember the “breadcrumb” of all previous placements. Each layer of recursive call only need to remember the colors already tried for its own current cell.

Key to the implementation — factor out simple utilities if they are well-defined and easy to implement. See comment in code.

The important function requires no more than 10 lines! So an interviewer could say “This is something a candidate can write in 45 minutes, assuming all the simple utilities are already available to him, namely failedSomething() and findNextCell2fill().

Truth is, most of us can’t write these 10 lines in an hour on the white-board if we don’t have the correct idea. My initial white-board implementation took 18 minutes and missed one important part —

Mat[r][c] = UNFILLED_COLOR # restore

I then took about an hour to make it work.

binary-matrix island count #DeepakCM

Q: has my solution. I don’t want to spend the time passing all leetcode tests! Diminishing return is conceptually identical, though using a different criteria for “connected” — diagonal neighbors are “connected”

Hi Deepak

I realize for “connected” problems, there’s definitely a graph underneath, so graph traversal is required. I guess BFT or DST will both find all the nodes connected to “me”.

Given a 1000 x 1000 all-black matrix, I think DFT recursion will go into exactly 1,000,000 levels and overflow stack space.

A high-level (vague) idea is

  • Scan in type-writer fashion. Suppose there are 10×10 = 100 cells either black or write. I paint each cell brown [1] once visited. I also use a integer counter to keep track how many cells already visited.
  • During the scan. If a cell is already visited, I will ignore it and move on
  • Any time I find a black cell, I will start a BFT (I dislike DST) to find all connected black cells.  Once I find all the black cells connected to “me”, I resume the type-writer scan.
  • Once my counter hits 100, I have visited all cells and will exit.

[1] you paint it white, which is probably better.

I thought this problem was too hard and not worth study, but you convinced me that it may come up and I can at least give a vague solution  … better than no solution.

Compared to a binary tree, walking over a matrix is /operationally/ (not conceptually) harder because

  • need to check array bounds. Error-prone and time consuming in coding tests
  • The (invisible) links will not cover all cells. To avoid missing a cell, we still need a full matrix scan. The integration of tree walk + matrix scan is unintuitive, to put it mildly.
  • During the tree-walk, you don’t want to visit any cell too many times or get lost in an endless loop. A brute-force technique — shadow matrix to remember all visited cells.
  • … However, once you get over these nitty-gritty operational complexities, then tree-walk in matrix is not really harder. These algo questions can therefore frustrate and fail many candidates untrained on The technique.

## CRTP usageS #template Method

This blog post describes two usages of CRTP

I briefly read the excellent blog I feel CRTP is advanced for almost all the contexts I can think of. (In contrast,  I can see some “necessary” usages of SFINAE, such as the one in my little AddOrder.h) shows a simple [1] example of CRTP to replace virtual functions. How much efficiency improvement does it make? Questionable. I always say that if you want the lowest latency, then write selected modules in assembly language, and store it in hardware like FPGA.

[1] if there is anything simple in template meta-programming.

I have heard of several techniques to avoid virtual functions, but I believe the actual evidence (in terms of measured improvement in latency) is likely unconvincing or insignificant. Therefore, if CRTP is used to eliminate virtual function latency, then I am not sure how much value it adds.

There are other techniques to avoid “virtual”. I feel they are easier to understand than CRTP.

Sometimes CRTP is the only choice — if the virtual function needs its own template parameters, then compiler will complain that “function template can’t be virtual”. Rahul hit this complaint.

Q: for a true virtual method v1(), the derived class is not yet written when the base class is compiled. Later, Only at run time can the “system” pick the right implementation of v1(). How about CRTP?
A: base class is Not compiled ahead of the derived class. Each derived class includes a header defining the base class template.


Beside this “virtual-elimination” use case, CRTP has other applications (am still unfamiliar with), but if I’m asked in interviews I will only mention this one use case. One of the common “other usages” is TemplateMethod with compile time (not runtime) resolution, the 1st usage in the excellent blog . In the classic template method pattern, the “procedure” is published and finalized in the base class. Individual steps of the procedure are virtual methods, resolved at runtime. In the CRTP version, superclass methods call subclass methods, safely and cleanly. Superclass using subclass is a taboo in most contexts, but both traditional TemplateMethod and CRTP-TemplateMethod are notable exceptions.

The article didn’t clearly highlight a key point about this usage — The base class NumericalFunctions is general purpose, designed to be subclassed by anyone.  I could write a Temperature class to subclass NumericalFunctions too. This way, the code in NumericalFunctions is available for reuse. is working code. Key points to remember about the code sample:

  • base-class — is a template with a dummy type “Sub”
  • derived classes — have the form “class Sub1 public Base<Sub1>”
  • the static dispatch (non-virtual) function in Base always static_cast “this” to *Sub.


c++matrix using deque@deque #python easier

My own experiment shows

  • had better default-populate with zeros. Afterwards, you can easily overwrite individual cells without bound check.
  • it’s easy to insert a new row anywhere. Vector would be inefficient.
  • To insert a new column, we need a simple loop

For python, # zero-initialize a 5-row, 8-column matrix:
width, height = 8, 5
Matrix = [[0 for x in range(width)] for y in range(height)]

In any programming language, the underlying data structure is a uniform pile-of-horizontal-arrays, therefore it’s crucial (and tricky) to understand indexing. It’s very similar to matrix indexing — Mat[0,1] refers to first row, 2nd element.

Warning: 2d array is hard to pass in c++, based on personal experience 😦 You often need to specify the size in the receiving function declaration. Even if this is feasible, it’s unwanted legwork.

[1] Warning — The concept of “column” is mathematical (matrix) and non-existent in our implementation, therefore misleading! I will avoid any mention of it in my source code. No object no data structure for the “column”!

[2] Warning — Another confusion due to mathematics training. Better avoid Cartesian coordinates. Point(4,1) is on 2nd row, 5th item, therefore arr[2][5] — so you need to swap the subscripts.

1st subscript 2nd subscript
max subscript  44  77
height #rowCnt 45 #not an index value  <==
width #rowSz [1]  ==> 78 #not an index value
example value 1 picks 2nd row 4 picks 5th item in the row
variable name rowId, whichRow subId [1]
Cartesian coordinate[2] y=1 (Left index) x=4

FB: spiral number pattern

This is a fairly popular coding question.

Q: given an positive integer N, print all integers from 1 to N2 in a N-by-N matrix, following a spiral pattern, starting from top-left. Please implement it in

int[][] spiral(int n); //to output

9 8 7
2 1 6
3 4 5

Facebook focuses on corner cases, complexity analysis. is my implementation — iterative, deque<deque> matrix.

detect loop in slist #Part 2

Note there’s an open question at the end

Don’t use fast-slow iterators. These 3 solutions below each has advantages over the fast-slow iterators solution.

–solution: list-reversal, based on and implemented in
Basically, iterate from aa to bb to cc … and reverse each arrow. Suppose dd points to bb initially forming a loop. At some point the snapshot is

  • null <- aa <- bb <- cc <- dd  while currentNode is dd and nextNode is bb.

If we go on, we would be back at aa, proving the existence of a loop.

Benefit of this solution — O(1) space complexity and O(N) time complexity

constraint — wont work on read-only lists .

–solution: count nodes against memory usage — my invention
We can get a reliable estimate of the memory usage of the entire data structure, and divide it by per-node footprint, so we know within good approximation how many nodes exist. If we count more than that, there must be duplicates. shows one way to estimate the memory usage — keep track of the maximum and minimum addresses among all visited nodes.

The low level languages usually provide addresses. The higher level languages usually provide memory usage. In realistic applications, there is always a way to get an estimate of memory usage.

–solution: Brent. See my code

If we keep count of total number of increments, we should see that every time we double “power”, that total count is a power of 2. If we have incremented 32 times then we know { mu + lambda } must exceed 32… Advantages over the simple-2-iterator:

  1. tortoise is much faster, so if mu (merge point, or loop starting point) is far out, then we will find it faster. With the standard solution, tortoise takes a long time to reach the merge point.
  2. can return loop size

I now believe I can predict in which “phase” we will detect the loop — If lambda is between 2^5 and 2^6, then we will detect the loop in Phase 6, and we can’t detect in Phase 5

I suspect power-of-3 is also working:

  • If the slower iterator stops moving completely, then the algorithm is broken because the slower iterator could remain outside the loop.
  • If the slower iterator tailgates the faster iterator and the following distance is always always within a static limit (like 99), then the algorithm is broken because the loop size could exceed 99, so the two iterators would have no chance of rendezvous.
  • ==> If the following distance grows gradually, AND the follower is never stuck forever, I think eventually they will meet —
    • Suppose right after a tortoise jump, powOfN bumps up to 81 but actual loop length is shorter, like 80. Now within the next 81 iterations, the hare would move step by step while the tortoise remains. They would rendezvous for sure.
  • —- pros and cons for power-of-3 over power-of-2 —-
  • pro: if a big loop involves root, tortoise won’t have to jump too early too frequently ==> Fewer comparison needed.
  • con: if there’s a tiny loop far out, tortoise would end up jumping too late?

hackerrank IV: full query

/*requirement: given 99 sentences and 22 queries, check each query against all 99 sentences. All the words in the query must show up in the sentence to qualify.
bool check(string & q, string & sen){
    istringstream lineStream(q);
	string qword;
    while(getline(lineStream, qword, ' '))
	  if ( string::npos == sen.find(qword)) return false;
	return true;
void textQueries(vector <string> sentences, vector <string> queries) {
  for (string & qr: queries){
	string output1query;
    for(int i=0; i<sentences.size(); ++i){
	  if (check(qr, sentences[i]))	    output1query.append(to_string(i)+" ");
	if (output1query.empty()) output1query = "-1";
int main(){
  vector<string> sen={"a b c d", "a b c", "b c d e"};
  vector<string> que={"a b", "c d", "a e"};
  textQueries(sen, que);

RVO^move : on return value

Let’s set the stage. A function returns a local Trade object “myTrade” by value. Will RVO kick in or move-semantic kicks in? Not both!

I had lots of confusions about these 2 features.[[effModernC++]] P176 has a long discussion and an advice — do not write std::move() hoping to “help” compiler on a local object being returned from a function

  • If the local object is eligible for RVO then all compilers would elide the copy. Your std::move() would hinder the compiler and back fire
  • if the local object is ineligible for RVO then compiler are required to return an rvalue object, often implicitly using st::move(), so your help is unneeded.
    • Note local object returned by clone is a naturally-occurring temp object.

P23 [[c++stdLib]] gave 2-line answer:

  1. if Trade class has a suitable copy or move ctor, then compiler may choose to “elide the copy”. This was long implemented as RVO optimization in most compilers before c++11. is my experiment.
  2. otherwise, if Trade class has a move ctor, the myTrade object is robbed

So if condition for RVO is present, then most likely your move-ctor will NOT run.

c++compiler select`move-ctor

This is a __key__ part of understanding move-semantics, seldom quizzed. Let’s set the stage:

  • you overload a traditional insert(Amount const &)  with a move version insert(Amount &&)
    • by the way, Derrick of TrexQuant introduced a 3rd alternative emplace()
  • without explicit std::move, you pass in an argument into insert()
  • (For this example, I want to keep things simple by avoid constructors, but the rules are the same.)

Q1: When would the compiler select the rvr version?

P22 [[c++stdLib]] has a limited outline. Here’s my illustration

  • if I pass in a temporary like insert(originalAmount + 15), then this argument is a rvalue obj so the rvr version is selected
  • if I pass in a regular variable like insert(originalAmount), then this argument is an lvalue obj so the traditional version is selected
  • … See also my dedicated blogpost in c++11overload resolution TYPE means..

After we are clear on Q1, we can look at Q2

Q2: how would std::move help?
A: insert(std::move(originalAmount)); // if we know the object behind originalAmount is no longer needed. shows when we need to use std::move() and when we don’t need.

Q: Just when do App (!! lib) devs write std::move()

I feel move ctor (and move-assignment) is extremely implicit and “in-the-fabric”. I don’t know of any common function with a rvr parameter. Such a function is usually in some library, but I don’t know any std lib function like that. Consequently, in my projects I have not seen any user-level code that shows “std::move(…)”

Let’s look at move ctor. “In the fabric” means it’s mostly rather implicit i.e. invisible. Most of the time move ctor is picked by compiler based on some rules, and I have basically no influence over it. shows when I need to call move() but it’s a contrived example — I have some object (holding a resource via heap pointer), I use it once then I don’t need it any more, so I “move” its resource into a container and abandon the crippled object.

Conclusion — as app developers I seldom write code using std::move.

  • P20 [[c++ std lib] shows myCollection.insert(std::move(x)); // where x is a local nonref variable, not a heap pointer!
    • in this case, we should provide a wrapper function over std::move() named getRobberAliasOf()
    • I think you do this only if x has part of its internal storage allocated on heap, and only if the type X has a move ctor.

I bet that most of the time when an app developer writes “move(…)”, she doesn’t know if the move ctor will actually get picked by compiler. Verification needed.

–Here’s one contrived example of app developer writing std::move:

string myStr=input;
vectorOfString.push_back(std::move(myStr)); //we promise to compiler we won’t use myStr any more.

Without std::move, a copy of myStr is constructed in the vector. I call this a contrived example because

  • if input is a char-array, then emplace_back() is more efficient
  • if input is another temp string, then we can simply use push_back(input)

depth-first-traversal, height-aware


struct Node {
    int data;
    Node *left, *right, *next;
    Node(int x, Node * le = NULL, Node * ri = NULL) : data(x), left(le), right(ri), next(NULL) {}
    Node _15(15);
    Node _14(14);
    Node _13(13);
    Node _12(12);
    Node _11(11);
    Node _10(10);
    Node _9(9);
    Node _8(8);
    Node _7(7, &_14, &_15);
    Node _6(6, NULL, &_13);
    Node _5(5, &_10, NULL);
    Node _4(4, NULL, &_9);
    Node _3(3, &_6,  &_7);
    Node _2(2, &_4,  &_5);
    Node root(1, &_2, &_3);

int maxD=0;
void recur(Node * n){
  static int lvl=0;
  if (lvl>maxD) maxD = lvl;
  if (n->left){ recur(n->left); }
  cout<<n->data<<" processed at level = "<<lvl<<endl;
  if (n->right){ recur(n->right); }
int maxDepth(){
int main(){

gdb to show c++thread wait`for mutex/condVar shows my experiment using gdb supplied by StrawberryPerl.

On this g++/gdb set-up, “info threads” shows thread id number 1 for main thread, “2” for the thread whose pthread_self() == 2 … matching 🙂

The same “info-threads” output also shows

  • one of the worker threads is executing sleep() while holding lock (by design)
  • the other worker threads are all waiting for the lock.
  • At the same time, the main thread is waiting in a conditional variable, so info-threads shows it executing a different function.

convert non-null-terminated char-array to std::string

std::string ccy (ptr->ccy, ptr->ccy+3); //using a special string() ctor

my ptr->ccy is the address of a 3-char array, but it’s immediately followed by other chars belonging to another field, in a tightly packed struct without padding. If you simply pass ptr->ccy to string() ctor, your string will take in many extra chars until a null terminator.

RBTree range count #enum,auto

// demo range counting with lower_bound. I don't know any faster algorithm
// demo auto keyword
// demo enum
// demo upper_bound different from lower_bound!
#include <iostream>
#include <climits>
#include <set>
#include <assert.h>
using namespace std;

set<int> s;
//typedef set<int>::iterator It; // ---> auto is easier
enum Mode { NoMin, NoMax, BothGiven };

size_t countWithInclusiveLimits(int limit1, int limit2, Mode m = BothGiven){
  if (s.empty()) return 0;
  auto it  = s.begin();
  auto it2 = s.end(); //it2 is the node past the last wanted node.

  if (m != NoMin) it = s.lower_bound(limit1);
  if (m != NoMax){
    it2 = s.upper_bound(limit2);
    assert(*it2 != limit2 && "it2 initial value should be consistent with end()");

  size_t ret = 0;
  for (; it != it2; ++it){
    cout<<*it<<" ";
  cout<<" --> "<<ret<<endl;
  return ret;
int main(){
  for(int i=-4; i<=9; ++i) s.insert(i*10);
  countWithInclusiveLimits(11, 55);
  countWithInclusiveLimits(0, 50, NoMin);
  countWithInclusiveLimits(10, 0, NoMax);

bbg-Eq: longest abbreviation #easier

Given a long word with N non-unique characters, we know there can be 2^N abbreviations (including the original word, and including the empty string).

Requirement: I give you a list of strings and you suspect some of them could be an abbreviation of your word. Find the longest string among them that’s a valid abbreviation. Optimize for time complexity.

I feel this problem appears simple but not easy to complete quickly is my python solution, different from the c++ solution below. Not sure which has more bugs.

I started with string::find() and hit the problem of non-unique characters. Interviewer hinted char-matching — the key point I needed for this type of problem.

#include <iostream>
#include <algorithm>
#include <vector>
using namespace std;

typedef string candidate;

vector<candidate> v{"monkey","aaaaaaaaaaaaaaaaaa", "mamalon", "monk"};
string const haystack{"mamalonkey"};
size_t hsize=haystack.size();

//Tip: can use typdef alias in argument:)
bool lenComp(candidate const & a, candidate const & b){
  return a.size()<=b.size();
ostream & operator<<(ostream & os, vector<candidate> const & c){
  size_t const sz = c.size();
  for (int i=0;i<sz; ++i){
        os<<c[i]<<" ";
  os<<endl; return os; } //single-pass 🙂 

bool check1candate(candidate const & c){ 
  if (c.size() > hsize) return false;

  for(int i=0, h=0; h<hsize; ++h){
        if (c[i] == haystack[h]){
          if (i==c.size()-1) return true;
  return false;
int main(){
  sort(v.begin(), v.end(), lenComp);
  cout<<v; for (int i=v.size()-1; i>=0; --i){
    if (check1candate(v[i])){
        cout<<"winner is "<<v[i];
        return 0;


bbg-FX: in-place array shuffle #splice

Update: In-Place usually requires 1) splice 2) swap.

Requirement: given a list like

{1,2,3,4,  25,29,44,33,  159,139,155,150,  177,176,169,180} which describes four Persons
{id1,id2,..age1,age2, weight1,weight2,…height1,height2,..}

Write a program to output each person’s full attributes like

{1, 25, 139, 177,   2, 29, 129, 176,   3, 24, 135, 169,   4, 33, 140, 180}

Note the number of attributes (in this case 4) is an input parameter to your program. You don’t know (or care about) the name of each attribute. If you are told there are 37 attributes in each person, just output attr1 .. attr37.

Note the input and output sequence are important.

Q (Bonus question): do it in-place without creating a new collection. You can still have a small number of variables in your code.
A: one way to “cheat” is to append each output person to end of the original vector. It will double in size, so we will truncate the first half.
A: I believe I can use the “homeless/displacement” technique.
A: I now feel many in-place array reshuffle can be done in place with a single temp variable but this may not be one of them.
A: how about starting with a linked list? has ..

bbg-FX: check if a binary tree is BST #no deep recursion

Binary search tree has an “ascending” property — any node in my left sub-tree are smaller than me.

Q1: given a binary tree, check if it’s a BST. (No performance optimization required.) Compile and run the program.

Suppose someone tells you that the lastSeen variable can start with a random (high) initial value, what kind of test tree would flush out the bug? My solution is below, but let’s Look at Q1b.

Q1b: what if the tree could be deep so you can’t use recursion?
A: use BFT. When we actually “print” each node, we check left/right child nodes.

I made a mistake with lastSeen initial value. There’s no “special” initial value to represent “uninitialized”. Therefore I added a boolean flag.

Contrary to the interviewer’s claim, local statics are automatically initialized to zero, but zero or any initial value is unsuitable (payload can be any large negative value), so we still need the flag.

#include <iostream>
#include <climits>
using namespace std;

struct Node {
    int data;
    Node *le, *ri;
    Node(int x, Node * left = NULL, Node * right = NULL) : data(x), le(left), ri(right){}
/*    5
  2       7
 1 a
Node _7(7);
Node _a(6);
Node _1(1);
Node _2(2, &_1, &_a);
Node _5(5, &_2, &_7); //root

bool isAscending = true; //first assume this tree is BST
void recur(Node * n){
//static int lastSeen; // I don't know what initial value can safely represent a special value

  //simulate a random but unlucky initial value, which can break us if without isFirstNodeDone
  static int lastSeen = INT_MAX; //simulate a random but unlucky initial value
  static bool isFirstNodeDone=false; //should initialize to false

  if (!n) return;
  if (n->le) recur(n->le);

  // used to be open(Node *):
  if (!isAscending) return; //check before opening any node
  cout<<"opening "<<n->data<<endl; if (!isFirstNodeDone){ isFirstNodeDone = true; }else if (lastSeen > n->data){
        return; //early return to save time
  lastSeen = n->data;

  if (n->ri) recur(n->ri);

int main(){
  cout<< (isAscending?"ok":"nok");

private dtor restricts instantiation to heap only

My test proves that declaration (even without definition) of a stack instance or global instance is illegal because compiler generates a (illegal) call to the private dtor!

Friend and static method can both also call the private dtor.

One usage of such a class — force all users to go trough factory to instantiate this class.

class PrivDtor{
  static void destroy(PrivDtor* p){p->~PrivDtor();}
  static void del(PrivDtor* p){delete p;}

//PrivDtor globalInstance;  //won't compile
int main(){
  //PrivDtor stackInstance; //won't compile

  PrivDtor * p = new PrivDtor;
  //PrivDtor::del(p); // works!
  PrivDtor::destroy(p);  //works!


3rd effect@volatile in java5

A Wells Fargo java interviewer said there are 3 effects. I named

  1. load/store to main memory on the target variable
  2. disable statement reordering

I think interviewer mentioned a 3rd effect about memory barrier.

This quote from Java Concurrency in Practice, chap. 3.1.4 may be relevant:

The visibility effects of volatile variables extend beyond the value of the volatile variable itself. When thread A writes to a volatile variable and subsequently thread B reads that same variable, the values of all variables that were visible to A prior to writing to the volatile variable become visible to B after reading the volatile variable. So from a memory visibility perspective, writing a volatile variable is like exiting a synchronized block and reading a volatile variable is like entering a synchronized block.

— “volatile” on a MyClass field provides special concurrency protection when the MyClass field is constructed :

java reference-type volatile #partially constructed has good details. address my doubt about “other writes“. Nialscorva’s answer echoes the interviewer:

Before java 1.5, the compiler can reorder the two steps

  1. construction of the new object
  2. assigning the new address to the variable

In such a scenario, other threads (unsynchronized) can see the address in the variable and use the incomplete object, while the construction thread is preempted indefinitely like for 3 hours!

So in java 1.5, the construction is among the “other writes” by the volatile-writing thread! Therefore, the construction is flushed to memory before the address assignment. Below is my own solution, using a non-static volatile field:

public class DoubleCheckSingleton {
	private static DoubleCheckSingleton inst = null;
	private volatile boolean isConstructed = false;
	private DoubleCheckSingleton() {
		/* other construction steps */
		this.isConstructed = true; //last step
	DoubleCheckSingleton getInstance() {
		if (inst != null && inst.isConstructed) return inst;
		synchronized(DoubleCheckSingleton.class) {
			if (inst != null && inst.isConstructed) return inst;
/**This design makes uses of volatile feature that's reputed to be java5
* Without the isConstructed volatile field, an incomplete object's 
* address can be assigned to inst, so another thread entering getInstance()
* will see a non-null inst and use the half-cooked object 😦
* The isConstructed check ensures the construction has completed
			return inst = new DoubleCheckSingleton();

first N prime Fibonacci: cache server #Thesys

The challenge is not in Fib or prime, but in the algo around them.

//This version is memory efficient as it only remembers the last
//two Fib numbers + all previous prime Fib numbers

//Requirement: For a given input number X, print
//the first X prime Fib numbers. Each output is a Fib and prime
//This version is memory efficient as it only remembers the last
//two Fib numbers + all previous prime Fib numbers
//The typedef of Fib is nice documentation
//My version of isPrime() is simpler
#include <iostream>
#include <string>
#include <vector>
#include <cstdlib>
using namespace std;

typedef unsigned int Fib;
bool isPrime(int i){
        if (i <= 1) return false;
        for(int j=2; j<=i/2; ++j) {
            if(i%j==0) return false;
        return true;

pair<Fib,bool> generate1Fib(){
  static Fib parent = 1;
  static Fib grandparent = 0;
  Fib newFib = parent + grandparent;
  grandparent = parent;
  parent = newFib;
  return make_pair(newFib, isPrime(newFib));

void print1stXFibPrimes(size_t const X){
    static vector<Fib> fibPrimes;
    int count = 0;
    for(int j=0; j<fibPrimes.size(); ++j){
                cout<<fibPrimes[j]<<" ";
                if (++count >= X) return;

    while( count<X ) {
          pair<Fib,bool> new1 = generate1Fib();
          if (new1.second){
                cout<<new1.first<<" ";
inline bool isInteger(const string & s) {
   if(s.empty() || ((!isdigit(s[0])) && (s[0] != '-') && (s[0] != '+'))) return false ;
   char * p ;
   strtol(s.c_str(), &p, 10) ;
   return (*p == 0) ;
int main(int argc, char *argv[]){
  for(int i=1; i<argc; ++i){
    string arg(argv[i]);
    if (isInteger(arg)){
      cout<<"calculate the first "<<arg<<" prime fibs"<<endl;
      print1stXFibPrimes( stoi(arg)) ;
      cout<<arg<<" is not an integer\n";

declare iterator]function template #gotcha

(Needed in some coding interviews and also in GTD!)

Update: With c++11, you can use the “auto” keyword and avoid the complexity.

If you drop the “typename” from the for-loop header, then compiler is confused

error: dependent-name ‘std::multiset::iterator’ is parsed as a non-type, but (template) instantiation yields a type
note: say ‘typename std::multiset::iterator’ if a type is meant

Basically, we need to be extra explicit to the confused compiler.

template<typename T> ostream & operator<<(ostream & os, multiset<T> const & l){
  for(typename multiset<T>::iterator it = l.begin(); 
      it != l.end(); ++it){
        os<<*it<<" ";

populate map<string,list> no list cloning #make_shared

I feel a few things are fairly impressive for a short coding interview:

  • no leak.
  • no explicit new(). make_shared allocates the control block efficiently

In c++11, we could probably return the list by value, and rely on move semantics.

Here’s an inefficient nested container. Saving pointer-to-list in the map is more common, because inserting or reading the map doesn’t need to clone the list!

//tested with -std=c++0x
unordered_map<string, list<size_t> > lookup;

shared_ptr<list<size_t> > myfunc(){
  shared_ptr<list<size_t> > ret= make_shared<list<size_t> >();
  return ret; //return shared_ptr by clone, cheaply

lookup.insert(make_pair(word, *myfunc())); //dereference to get the list

top N active stocks #bbg#cache,VO

Requirement: implement top_n(int N): top N most active stocks

Requirement 2: add a top_n_last_x_minute(int N, int X): top N that are last traded in the last X millisec

This program also demonstrates how to avoid storing the same string twice, as map key and also as field of the map value

#include <unordered_map> //requires -std=c++0x
#include <iostream>
#include <vector>
#include <string>
#include <algorithm>
#include <chrono>
#include <assert.h>
using namespace std;

struct Rec{
  string const * strPtr; //a best-effort to avoid storing duplicate strings
  int cumvol; //cum vol
  long lastUpd; //millsec since epoch

  Rec(int vol): strPtr(NULL), cumvol(vol),
    lastUpd( (chrono::system_clock::now().time_since_epoch()).count()) { }

  bool operator<(Rec const & other) const{
        return this->cumvol < other.cumvol;
  friend ostream & operator<<(ostream & os, Rec const & r){
        os<<r.lastUpd<<"-"<<*(r.strPtr)<<"  : "<<r.cumvol;
        return os;

typedef unordered_map<string, Rec>::iterator ITR;

class App{
  unordered_map<string, Rec> lookup;
  vector<Rec> v;
  bool dirty;
  App(): dirty(true){}
  void update(string const & symbol, int vol){
    this->dirty = true;
    ITR it = lookup.find(symbol);
    if (it == lookup.end()){
        pair<ITR, bool> pair = lookup.insert(make_pair(symbol, Rec(vol)));
        assert(pair.first->second.strPtr == NULL);
        pair.first->second.strPtr = //second is the valye in the pair
          & pair.first->first; //the key of the new pair
        assert(it->second.strPtr != NULL);
        it->second.cumvol += vol;
        it->second.lastUpd = (std::chrono::system_clock::now().time_since_epoch()).count();
  void top_n(int N, unsigned long X=-1){ //unsigned -1 gives the largest positive value!
    size_t sz=lookup.size();
    if (dirty){
      cout<<"resorting due to updates to database ..."<<endl;
      for(ITR i=lookup.begin(); i!= lookup.end(); ++i){
      sort(v.begin(), v.end());
      dirty = false;
    long now = (std::chrono::system_clock::now().time_since_epoch()).count();
    //cout<<now<<" is now"<<endl;
    for(int k=sz-1, count=0; k>=0; --k){
        if (now - v[k].lastUpd > X) continue;
        if (++count >= N) break;
int main(){
  App app;
  app.update("ibm", 1000);
  app.update("gs", 700);
  app.update("gs", 500);
  app.update("goog", 600);
  app.update("msft", 800);
  app.update("ge", 500);
  app.update("ibm", 160);
  app.update("c", 400);

make_shared syntax #Gotcha’s

Needed for take-home coding test.

#include <iostream>
#include <memory> // make_shared

struct C{
  float f;
  C(float arg): f(arg){}
int main() {
  std::shared_ptr<int> a(new int(13));
  std::shared_ptr<int> b = std::make_shared<int>(12);

  std::shared_ptr<C> c = std::make_shared<C>(1.1); //must pass in ctor args
  //shared_ptr<C> b= make_shared<C>(new C(1.1)); //can't use "new" with make_shared.


detect loop in slist #Part 1

Q: A singly-linked list (slist) contains a loop. You are given nothing but the head node. With O(1) space complexity, how do you locate the join node? For example,

0(head)->1->2->…101->102->103->4(again), so #4 is the merge point

Here’s Deepak’s ingenious trick

  1. first use the 2-pointer trick to find any node inside the loop.
  2. find the length (say, 55, denoted LL) of the loop using a single moving pointer, starting from that node
  3. now we cant discard that node
  4. Now start a pointer B from head and move LL steps.
  5. Now put pointer A at head
  6. Now move A and B in locksteps. They will meet for sure, at the merge point. Here’s the proof:

Suppose the merge point is at MM, i.e. MM steps from head. When A and B starts moving in locksteps,

  • how far is B from MM? MM steps!
  • how far is A from MM? MM steps too.

Note LL value could be small like 1.

struct Node{
  Node const * p;
  int const data;
  Node(int _data, Node const * _p): p(_p), data(_data){}
  friend ostream & operator<<(ostream &os, Node const & node){
        os<<<<" / "<<&node<<" -> "<<node.p<<endl;
        return os;

Node _9(9, NULL);
Node _8(8, &_9);
Node _7(7, &_8);
Node _6(6, &_7);
Node _5(5, &_6);
Node _4(4, &_5);
Node _3(3, &_4);
Node _2(2, &_3);
Node _1(1, &_2);
Node _0(0, &_1);
Node & root = _0;
Node const * mergePoint = &_1;

//how many distinct nodes in the loop
size_t getLoopLen(Node const & root){
  Node const * brunner = &root;
  Node const * frunner = &root;
        frunner = frunner->p->p;
        brunner = brunner->p;
        if (frunner == brunner) break;
  cout<<"now the two runners have met somewhere in the loop: "<<*frunner ;
  for(int ret = 1; ;++ret){
        frunner = frunner->p ;
        if (frunner == brunner) return ret;

Node const * getMergePoint(Node const & root){
  size_t LL = getLoopLen(root);
  cout<<"# of nodes in loop = "<<LL<<endl;
  Node const * frunner = &root;
  for(int i = 0; i<LL; ++i){ frunner = frunner->p; }
  //cout<<"front runer is "<<*frunner;
  Node const * brunner = &root;
  for(;frunner != brunner;){
        brunner = brunner->p;
        frunner = frunner->p;
  return frunner;

int main(){
  _9.p = mergePoint;
  Node const * ret = getMergePoint(root);
  cout<<"Merge point is "<<*ret;
  assert(ret == mergePoint);

check if a word can reshuffle to palindrome

requirement: With minimal time and space complexity, the corner cases are tricky. I decided to add a null terminator:)

int main() {
 string S;
  vector<char> v(S.begin(), S.end());
  sort(v.begin(), v.end());
  v.push_back(0); //null char
  char last=v[0];
  size_t occur=0;
  bool isOddFound = false;

  for(int i=0; i<v.size(); ++i) {
    bool isNew = v[i] != last;
    //cout<<"--> "<<v[i]<<" isNew = "<<isNew<<endl;
    if (!isNew){
        //cout<<last<<" occured "<<occur<<" times"<<endl;
    //cout<<last<<" occured total "<<occur<<" times"<<endl;
    if (occur % 2){
        return 0 ;
      isOddFound = true;
    last = v[i];
    occur = 1;

milePerGallon #DeepakCM SIG

Similar to GS Tick Query question but without optimization requirements.

–Requirement: Your Choice of Editor, OS, Compiler & Programming Language. Total time allotted for this  test is 75 mins, The further rounds of Interview will be based on this programming test.

A group of N people go for driving. When ever they refill the fuel in their vehicles, they update a text file with the following fields

<Person>, <Car>, <No of Miles driven on that day, before the fill>, <NO of Gallons filled i.e. gallons used for those miles>,<Date of fill>

All the fields are comma separated. A sample such file is given below

John, Ford, 350, 20, 20160921
John, Ford, 220, 13, 20160920
John, Ford, 230, 35, 20160918
John, Ford, 300, 22.5, 20161112
Jonathan, Mercedes GLA, 220, 13, 20160920
Mishal, Ford Escort, 230, 35, 20160919

Write a function with the following parameters
GetMPG(name_of_person, Start_date, End_date)
such that it should return the following information for each User/driver
1) Name of Car
2) Miles per Gallon for the mentioned period for each of the Car

A person can have more than one car but never share his cars. If there is no record for the mentioned date range, the function should not return anything for that specific car.

–analysis is my python solution.

next_perm@N color socks #complex #100%

A common IDE coding challenge — given x pairs socks of various colors, generate all the permutations, in ascending order. Each color has a value.

–Solution 1: std::next_permutation() and prev_permutation()

–solution 2: I can probably write my own next_perm(). Using This function we can generate an ascending sequence of permutations starting from the current content of a vector. is my iterative solution, but should use lower_bound() is my python solution, overcoming the lower_bound problem.

next abbreviation@word with repeat char #2^N #100%

This is a real coding question at 10gen/mongoDB is a simple iterative py solution. Not ascending.

Q1: Given a word with unique letter (like “abc”), Can you write c++ program to generate all abbreviations in any order?
Q2: same question, but don’t use any external storage beside character arrays.
Q3: same question, but output in ascending order? Expected output:

  1. a
  2. ab
  3. abc
  4. ac
  5. b
  6. bc
  7. c

It’s daunting to generate this sequence without recursion. is my recursive solution to Q3.

Q4: what if the word has non-unique letters, like “mongo”? The only solution I know relies on an external lookup device.

find anagram groups]file #1-pass, FB

Requirement: if any 2 words are anagrams, that’s a group. Within a file, identify all such groups.

Normalize each word using counting sort on the characters of the word. There’s no competing alternative in this isolated part of the solution. Main part of the solution is sorting all words based on the normalized version. There are many alternative solutions to this sort.

desirable — memory efficiency. create no or few objects besides the strings read in from the file.

desirable — avoid the need to create/allocate linkNode or wrapper objects around original words. In c++, such a wrapper is smaller than in java, probably a pointer to the original word + a link pointer

Solution 1 — insert into a sorted set, using a customized comparison (subclass binary_function) based on the normalized string and something unique, such as line number or object address. After all the inserts, when you iterate the set, an anagram group would show up in a row. We can also keep a separate array or iterators (pointers) that point to the first word in each group — emulating a multimap.

Without the something unique, the set would drop words.

I think insertion is inefficient since a normalized string is regenerated for the same word over and over. To overcome this, we could create a wrapper object with 2 fields — normalized string + a pointer to the original word. However, if “SAT” is a very common normalized string then we create multiple copies of it.

Solution 2 — separate chaining. Maintain a hashed map (faster) or sorted map keyed by the normalized strings, just a single copy for each normalized string. Value of each map entry is a linked list of the original words.

#include <iostream>
#include <iomanip>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
#include <map>
#include <algorithm>
using namespace std;

map<vector<char>, size_t> m;
template<typename T> ostream & operator<<(ostream & out , vector<T> const & v){
        for(int i=0; i<v.size(); ++i) out<<setw(1)<<v[i];
        out<<" ";
int main(){
  ifstream fs("words.txt");
  string line, word;
  while(getline(fs, line)){
     stringstream ss(line);
     while(getline(ss, word, ' ')){
        int endpos = word.find_last_not_of(" ");
        if (endpos < string::npos) word = word.substr(0, endpos+1);
        if (word.empty()) continue;
        //lower case
        transform(word.begin(), word.end(), word.begin(), ::tolower);
        vector<char> sorted(word.begin(), word.end()); sort(sorted.begin(), sorted.end());

        int count = ++m[sorted]; //1st time we get 1 🙂
        cout<<sorted<<"\t<- "<<word<<" :" <<count<<endl;


next Combo@3,using5distinct chars,permitting redraw

Modified from Leetcode problem 17. Suppose there is nothing but one red button. and there are L (like 5) letters on it.

Q: With 4 (=C) draws from 3 (=L) letters (same phone pad button), how many permutations? L^C.
Q: With 4 (=C) draws from 3 (=L) letters, how many combinations?

For C=1, Ans=3
For C=2, Ans=6
For C=3, Ans=10= 6 + 3 + 1?
For C=4, Ans=15=10+(10-6)+1
For C=5, Ans=21=15+(15-10)+1

–My explanation of the count:

Key: Each combo is represented as a sorted vector (ascending). There’s one-to-one mapping between such a vector and a combo.

let’s say L=3 and C grows from 1 to 3 .. For C=2, I put all combos on 3 rows (3 vectors or lists)

  • aa ab ac #all that starts with a
  • bb bc #all that start with b
  • cc # lone wolf

For C=3, Ans

  • first container is populated by prepending ‘a’ to all 3 “old” containers of items
  • 2nd container is populated by prepending ‘b’ to 2nd-and-lower old containers
  • 3rd container is always a lone wolf

Both solutions below are simple, reusable and implemented in

–sol-1 efficient incremental solution — for C=3, given one combo as a string, find the “next” combo. For example after bbc, we should get bcc. Bonus feature — output is naturally sorted 🙂

–sol 2: append the next char ..

next Perm@3boys out@5, non-recursive #complex

Latest --

// Requirement: generate all permutations of C(like 3) chars
//from N(like 5). The count should be N!/(N-C)!
// Bonus: generate in ascending order, where 'ascending' is
//defined on the basis that within the original word, a char
//has lower value than any char on its right. This is more clear
//when the word itself is a sorted string, but actually it's
//not needed.
//global collection is simpler than stack variables. Easier to visualize
//and uses less memory
#include <iostream>
#include <sstream>
#include <vector>
#include <deque>
#include <algorithm>
#include <iomanip>
#include <assert.h>
using namespace std;

string _tmp="abcd"; vector<char> const pool(_tmp.begin(), _tmp.end());
vector<size_t> poolIndex;
size_t const C = 3, N = pool.size(); //wanted: all permutations of C, out of the pool of items

//global collection, to be updated in each recursive layer.
vector<vector<size_t> > coll;
// Note each item (like a char or a color or a studentId) is
// represented in the global collection by an unsigned integer.
// This is the positional index in the original "pool" of items.
// This scheme ensures the permutations produced is ascending

string show(vector<size_t> const & v, string headline="", bool isPrint=true){
  stringstream ss;
  for (int i=0; i<v.size(); ++i){
  if (isPrint) cout<<ss.str()<<" <-- "<<headline<<v.size()<<endl; return ss.str(); 
void dump(string headline="", bool isAssert=true){ size_t const cnt = coll.size(); assert(cnt); size_t const len = coll[0].size(); size_t exp = 1; for (int t=N; t>N-len; --t) exp *= t; //loop len times
  assert(cnt == exp);

  stringstream ss;
  string last = "";
  for(int i=0; i<cnt; ++i){
    string item=show(coll[i], "", false);
    if (!isAssert) continue;
    assert(last<item && "should be strictly asending");
    last = item;
  cout<<headline<<"\t size "<<cnt<<endl<<ss.str()<<endl;

//RVO should kick in to skip the copying upon return
vector<size_t> const find_unused(vector<size_t> const & perm, size_t const len){
      vector<size_t> tmp4set_diff(perm); //permutations are by defnition unsorted.
      sort(tmp4set_diff.begin(), tmp4set_diff.end());
      vector<size_t> unused(N-len);
      set_difference(poolIndex.begin(), poolIndex.end(), tmp4set_diff.begin(),tmp4set_diff.end(),unused.begin());
      //show(tmp4set_diff, "tmp4set_diff"); show(poolIndex, "poolIndex"); show(unused, "unused");
      return unused;

//RVO should kick in to skip the copying upon return
vector<size_t> const new_perm(vector<size_t> const & perm, size_t unused){
        vector<size_t> newPerm(perm);
        //show(newPerm, "newPerm");
        return newPerm;
//This algo is considerably more complex than many recursive algos
//we have written recently, largely due to the set_difference()
void next_perm_from_pool_iterative(){
  for(size_t loop=0;loop<9 /*useful during dev to control when to exit*/;++loop){
    if (coll.empty()){
      for(size_t j=0; j<pool.size(); ++j){
        coll.push_back(vector<size_t>(1, j));
      // dump("^^^^ after initilization of coll ^^^^");
    size_t const len=coll[0].size();
    assert(loop+1 == len);
    if (len == C) {
    vector<vector<size_t> > newcoll;
    for(int kk=0; kk<coll.size(); ++kk){
      vector<size_t> const & perm = coll[kk];
      vector<size_t> unused(find_unused (perm, len));

      //now unused contains the pool items not in the current perm.
      //Suppose there are 3 unused items, we will create 3 new permutations
      //by appending each one to the current perm
      for(vector<size_t>::iterator it=unused.begin(); it != unused.end(); ++it){
        newcoll.push_back(new_perm(perm, *it));
    coll = newcoll;
    dump("^^^^ end of iteration ^^^^");

int main(){

next_Perm@3boys out@5 #80%

algo-practice: generate permutation@3, using5distinct chars

Such a skill might be needed in some coding IV sooner or later. Let’s write this in py or c++. Formally,

Q1: generate all permutations of 3, from 5 distinct chars, in any order.
Q2: generate all permutations of 3, from 5 distinct chars, in ascending order. You can sort the 5 chars first.
Q2b: once you have generated one permutation, how do you identify The next?

Note the same solution is a generalization of std::next_permutation(), so once you have this solution you have that solution.

How many? 5-choose-3 * 3! = 5!/(5-3)! = 5*4*3 = 60, denoted Answer(3). Paradoxically, Answer(4) == Answer(5) = 120

–algorithm 1 for Q1

  • 1st generate 5 single-char strings;
  • then for each generate 4 two-char strings. We get 20 strings.

–algorithm 1b for Q1: rec(5,3) will call rec(5,2). rec(5,2) has 20 items, each can generate 3 items for rec(5.3), because each item has 2 characters and a void to be filled by the 3 unused chars.

The 20 items returned should be a pair{vector of 2, vector of 3}

This produces a sorted collection:)

See my code




next_Combo@3 boys out@5 #recursive descending

//This recursive version suffers from stack overflow
//but it's able to print out combinations in Descending order and
//maintains the relative positions between any 2 items
//This version reduces vector cloning by growing/shrinking the prefix vector
#include <iostream>
#include <sstream>
#include <vector>
#include <iomanip> //setw
#include <algorithm>  //sort
#include <assert.h>
using namespace std;
size_t calls=0, combos=0;
size_t const C=3; //how many in each combination
vector<char> emptyV;

template<typename T> void dump(vector<T> const & p, string const & s){
  cout<<"------------ "<<s<<" ------------ size = "<<p.size()<<endl;
  for(int i=0; i<p.size(); ++i) cout<<setw(5)<<p[i];
  if (p.size()) cout<<endl;
template<typename T> int show(vector<T> const * p, vector<T> const * v = NULL){
  ++ combos;
  stringstream ss;
  for(int i=0; i<p->size(); ++i) ss<<setw(5)<<(*p)[i];
  if (v){
    for(int i=0; i<v->size(); ++i) ss<<setw(5)<<(*v)[i];
  static string last("ZZZZZ");
  string combo=ss.str();
  cout<<"combo: "<<combo<<endl; assert(last >= combo);
  last = combo;

template<typename T> int recurs(vector<T> & prefix, vector<T> const & pool){// size_t const suffix ){
  dump(prefix, "entering ... prefix"); dump(pool, "pool");
  if (prefix.size()            == C) return show(&prefix);
  if (prefix.size()+pool.size()== C) return show(&prefix, &pool);
  assert (!pool.empty());
  vector<T> const newPool(pool.begin()+1, pool.end());
  recurs(prefix, newPool);
  recurs(prefix, newPool);//this function returns only after all the layer are done

int main() {
  string tmp = "abcde";
  vector<char> v(tmp.begin(), tmp.end());
  assert(C <= v.size());
  recurs(emptyV, v);
  cout<<calls<<"  calls to the recursive funcion to generate "<<combos<<endl;

next_Combo@3 boys out@5 #recursive

//This recursive version suffers from stack overflow but
//it's able to print out combinations in ascending order and
//maintains the relative positions between any 2 items
//This version reduces vector cloning by growing/shrinking
//global objects but with higher complexity
//However, global variables actually simplify the logic!
#include <iostream>
#include <sstream>
#include <deque>
#include <iomanip> //setw
#include <algorithm>  //sort
#include <assert.h>
//#define DEBUG
using namespace std;
size_t calls=0, combos=0;
size_t const C=3; //how many in each combination
string tmp = "abcde";
deque<char> pool(tmp.begin(), tmp.end());
deque<char> prefix;

template<typename T> void dumpDeque(deque<T> const & p, string const & headline){
  cout<<"-- "<<headline<<" -- size = "<<p.size()<<endl;
  for(int i=0; i<p.size(); ++i) cout<<setw(5)<<p[i];
template<typename T> int showCombo(deque<T> const * p){
  ++ combos;
  stringstream ss;
  for(int i=0; i<p->size(); ++i) ss<<setw(5)<<(*p)[i];
  static string last;
  string combo=ss.str();
  cout<<"combo: "<<combo<<endl;
  assert(last <= combo && "should be ascending");
  last = combo;

template<typename T> int recurs(){
#ifdef DEBUG
  cout<<"-------------\nentering "; dumpDeque(prefix, "prefix"); dumpDeque(pool, "pool");
  if (prefix.size() == C) return showCombo(&prefix);
  if (pool.empty()) return 0;
  T poolhead = pool.front(); pool.pop_front();

  prefix.push_back(poolhead); //add poolhead to prefix

  //this 1st recursive function call starts a rather deep call stack and prints
  //all combinations with the given (new longer) prefix
  recurs<T>();//use the longer prefix and the shorter pool
  prefix.pop_back();//restore prefix
  pool.push_front(poolhead); //restore pool, needed by the 2nd call in the parent stack
#ifdef DEBUG
  cout<<"^^^^^^ restored before returning "; dumpDeque(prefix, "prefix"); dumpDeque(pool, "pool");

int main() {
  assert(C <= pool.size());
  cout<<calls<<"  calls to the recursive function to generate "<<combos<<endl;

next_Combo@3 boys out@5 #iterative complex

//Without loss of generality, each combination is internally represented
//as a sorted vector (ascending).
//There's one-to-one mapping between such a vector and a combination
#include <iostream>
#include <vector>
#include <iomanip> //setw
#include <algorithm>  //sort
#include <assert.h>
using namespace std;
size_t changes=0, calls=0;
size_t const C=3; //how many in each combination

template<typename ITR> bool isAscending (ITR const b, ITR const end){
  for (ITR last = b, i = b; i!=end; ++i){
    if (*last > *i) {
      cout<<*last<<" should be lower (or equal) but is higher than "<<*i<<endl;
      return false;
  return true;
template<typename T> void dump(vector<T> & v,  bool isAssert = true){
  for(int i=0; i<v.size(); ++i) {
    if (i == C-1) cout<<"  unused:";
  for(int i=0; i<v.size(); ++i){
    if (i == C-1) cout<<"  unused:";
    assert(isAscending(v.begin(), v.begin()+C) && "1st C elements should be ascending after next_combo (not next_perm)");
    assert(isAscending(v.begin()+C+1, v.end()) && "unused section should be ascending");

template<typename T> bool reshuffle(vector<T> & v, int p2u){
//      cout<<"after swap"<<endl; dump(v);
        sort(v.begin()+p2u+1, v.end());
//      cout<<"after sorting everyting to the right of p2u"<<endl; dump(v);

        if (p2u == C-1){
                sort(v.begin()+C, v.end());
                return true;
        //now reset everything to my right
        //now locate the best_man (next item after the new p2u) .. can use binary search
        for(int i=p2u+1; i<v.size() ; ++i){
          if (i==v.size()){ //p2u is highest possible!
                //cout<<"p2u is already highest"<<endl;
                sort(v.begin()+C, v.end());
                return true;
          if (v[p2u]<v[i]){
                //cout<<"best man = "<<i<<endl;
                for(int j=0; p2u+1+j<=C-1; ++j){
                  swap(v[p2u+1+j], v[i+j]);
                sort(v.begin()+C, v.end());
                return true;
        // now must return!

        assert(1==0 && "should never reach here");
        cout<<"after best_man search"<<endl; dump(v);

// reshuffles vector to the next higher combo
//Assuming 5-choose-3, the 1st 3 chars represent the combination,
//and the remaining characters at the end of the vector are
//unused in the current combination.
template<typename T> bool next_combo(vector<T> & v){
  dump(v );
  if (v.size() == C) return false; // C-choose-C == 1 !

  for(int p2u=C-1; /*last position*/ p2u >=0 ;--p2u){
    for (int unusedItem=C; unusedItem<v.size(); ++unusedItem){ //scan the unused section of the array
        if (v[p2u] < v[unusedItem]) {
          swap(v[p2u], v[unusedItem]);  //p2u should not change further
        //cout<<"identified "<<p2u<<" as position to upgrade... Will reset subsequent positions, and return"<<endl;
          return reshuffle(v, p2u);
    // no p2u identified yet. move p2u marker to the left
  cout<<"no more higher combo. This is the end"<<endl;
  return false;
int main() {
//  vector<float> v{111,222,333,444,555,666};
  string tmp = "abcdefg";
  vector<char> v(tmp.begin(), tmp.end());
  assert(C <= v.size());
  for(; calls<9992; ){
        if (!next_combo(v)){
         cout<<changes<<" changes performed till the highest combo; next_combo() call count = "<<calls<<endl;
         return 0;

multiple hits: lower_bound gives Earliest

Looking for lower_bound (2) in {0,1,2,2,3,4}, you get the earliest perfect hit among many, i.e. the left-most “2”.

No such complexity in upper_bound since upper_bound never returns the perfect hit.

No such complexity in set.lower_bound since it won’t hold duplicates.

int main(){
  vector<int> s{0,1,2,2,3,4};
  vector<int>::iterator it = lower_bound(s.begin(), s.end(), 2);
  cout<<"to my left: "<<*(it-1)<<endl;
  cout<<"to my right: "<<*(it+1)<<endl;
  cout<<"to my right's right: "<<*(it+2)<<endl;

lower_bound() may return end() #gotcha

lower_bound() may return end().

If your target value is too high and nothing qualifies, all 6 functions return the right end of the range. If you look at the (key value in) return value i.e. end-of-range,

  • This end-of-range node is a dummy. Never read its key value.
  • After lower_bound or upper_bound, always validate before reading the return value

I spent hours puzzled by the wrong data returned after lower_bound()->first. Specifically, if the map/set is keyed by integer, then the end()->first can be normal int value even when it fails and returns map.end()!

Consistent across 6 functions:

  • std::lower_bound
  • std::upper_bound
  • set/map methods

What if the target value is too low? Easier — upper bound should return left boundary iterator, and lower_bound returns the same iterator! See


std::dynamic_pointer_cast on a shared_ptr

Returns a copy (of the given shared_ptr) instantiated in the same “club”.

The returned type must be a derived type of the original… equivalent to a dynamic_cast() on a raw ptr. example looks lame as it doesn’t have virtual function, so dynamic_cast() isn’t applicable ! is my own experiment.

python to dump binary data in hex digits

Note hex() is a built-in, but I find it inconvenient. I need to print in two-digits with leading 0.

Full source is hosted in

def Hex(data): # a generator function
  for code in map(ord,data):
    yield "%02x " % code
    i += 1
    if i%8==0: yield ' '

print ''.join(Hex("\x0a\x00")); exit(0)

tail-recursion Fibonacci # tricky]python

Tail recursion is a “halo” skill in coding interviews. It turns out that most recursive functions can be reworked into the tail-call form, according to

The same author also demonstrates

  1. python recursion stack depth is about 1000 only, so deep recursion is unpopular in python
  2. python doesn’t support tail recursion
  3. some decorator trick can simulate tail recursion in python


Easiest demo problem is factorial(N). For Fibonacci, has a very short python implementation (though I suspect python doesn’t optimize tail recursion). Let me rephrase the question:

Q: Given f(firstVal=0, secondVal=1, length=0) returns 0, f(0,1,1) returns 1, can you implement f(0,1,N) using recursion but in O(N) time and O(1) space? Note Fib(N) ==f(0,1,N)

Key points in the python solution:

  • Start with iterative algo, then convert it to tail recursion.
  • use 2 extra arguments to hold last two intermediate values like Fib(2) Fib(3) etc
  • We saw in the iterative solution that memory usage is O(1), a good sign that tail recursion might be possible.
  • if you observe the sequence of Fib() values computed in the blackbox, actually, you see Fib(2), Fib(3) … up to Fib(N), exactly like the iterative solution.
  • solution is extremely short but non-trivial is my very brief implementation

split std::string on Custom delimiter #practice every6M

See also post on csv string parse…

For a longer delimiter, you may need string.find() has my own tested solution parsing individual tree node details from a stringstream

ifstream f1(fileName.c_str());
string line;
while(getline(f1, line)){
  for(int i=1; ;++i){
        int pos = line.find_first_of("\t");
        string token = line.substr(0,pos);
        cerr<<i<<" : " <<token<<endl;
        if (line == token) break; //there's no more tab in the line
        line = line.substr(pos + 1);

///// a simpler method:
istringstream lineStream("denmark sweden   india us"); //consecutive spaces are Not treated as one
string outputToken;
int main(){
  while (
    getline(lineStream, outputToken, ' ')) // <-- the only thing to remember
        cout << outputToken << endl;

shared_ptr dislikes private ctor/dtor in payload class

1) make_shared can’t call a private ctor. If your ctor is private, you have to use

shared_ptr sp(new MyClass); # within your class??

2) If your MyClass dtor is private, you simply can’t hold it in shared_ptr.

#include <memory>
using namespace std;
class C{
  //~C(){cout<<"dtor\n"; } // breaks shared_ptr<C>
  C(){cout<<"ctor\n"; }
  static shared_ptr<C> mk(){
    //shared_ptr<C> sp = make_shared<C>(); //won't compile if ctor is private
    return shared_ptr<C>(new C());
int main(){ shared_ptr<C> sp = C::mk(); }

custom delimiter for cin operator>> #complicated

Tested but is too hard to remember. Better use the getline() trick in

struct comma_is_space : std::ctype<char> { //use comma as delimiter
  comma_is_space() : std::ctype<char>(get_table()) {}
  static mask const* get_table() {
    static mask rc[table_size];
    rc[','] = std::ctype_base::space;
    return &rc[0];

istringstream iss(line);
iss.imbue(locale(cin.getloc(), new comma_is_space));

binary search in sorted vector of Tick pointer

Note the mismatched args to the comparitor functions.

(I was unable to use a functor class.)

std::vector<Tick const*> vec;
int target;
bool mylessFunc(Tick const * tick, unsigned int target) {
     //cout<<tick->ts<<" against "<<target<<endl; 
     return tick-ts < target;
lower_bound(vec.begin(),vec.end(),target, mylessFunc);

bool mygreaterFunc(unsigned int target, Tick const * tick){
     //cout<<a->ts<<" against "<<target<<endl; 
     return tick->ts > target;
upper_bound(vec.begin(),vec.end(),target, mygreaterFunc)

demo: static method declare/define separately n inherited

Low complexity in this demo, but if you lack a firm grip on the important details here, they will add to the complexity in a bigger code base.

  • When subclass is compiled, compiler complains about undefined sf() since it sees only the declaration. You need “g++ der.cpp base.cpp”.
  • Note the static method is inherited automatically, so you could call der::sf().
#include <iostream>
struct base{
  static void sf();
///////// end of header /////////
#include "base.h"
using namespace std;
void base::sf(){ // no "static" please
///////// end of base class /////////
#include "base.h"
using namespace std;
struct der: public base{};
int main(){
///////// end of sub class /////////

LRU cache #Part 2 std::map holding slist itr

Java offers LinkedHashMap-based solution. Below is a simple c++ version written by

    • Basic idea is nice — list as primary storage; and map value points to list nodes.  Therefore, By design, the lookup operation never inserts new nodes.
      • (Even if I can think of the same idea, I would be slow writing up this code in real time.)
    • This code showcases a handful of useful methods of list and map. For example,
    • list.splice() : Transfers the element pointed to by 3rd arg (itr), from 2nd arg (another list), into *this. The element is inserted before the element pointed to by 1st arg (itr)
    • To reproduce this code in real time, you need to memorize list.splice()! I don’t have the time.
    • The map value is an iterator — essential!
class LRUCache{
    size_t m_capacity;
    unordered_map<int,  list<pair<int, int>>::iterator> m_map; //m_map_iter->first: key, m_map_iter->second: list iterator;
    list<pair<int, int>> m_list;                               //m_list_iter->first: key, m_list_iter->second: value;
    LRUCache(size_t capacity):m_capacity(capacity) {
    int get(int key) {
        auto found_iter = m_map.find(key);
        if (found_iter == m_map.end()) //key doesn't exist
            return -1;
        m_list.splice(m_list.begin(), m_list, found_iter->second); //move the node corresponding to key to front
        return found_iter->second->second;                         //return value of the node
    void set(int key, int value) {
        auto found_iter = m_map.find(key);
        if (found_iter != m_map.end()) //key exists
            m_list.splice(m_list.begin(), m_list, found_iter->second); //move the node corresponding to key to front
            found_iter->second->second = value;                        //update value of the node
        if (m_map.size() == m_capacity) //reached capacity
           int key_to_del = m_list.back().first; 
           m_list.pop_back();            //remove node in list;
           m_map.erase(key_to_del);      //remove key in map
        m_list.emplace_front(key, value);  //create new node in list
        m_map[key] = m_list.begin();       //create correspondence between key and node

SFINAE #sizeof#ptr2member as template param is relatively simple, concise. Shows how to test T has method1() is shorter and uses the same sizeof trick. is another illustration

–all 3 resource above use sizeof and function template (not class template) — useful demo of my own code in production, powering the nyse+nyseAmerican real time market data parsers behind most of the biggest financial data sites.

When the compiler evaluates sizeof, which is a compile-time task, it would try one of the 3 func() overloads and check the winner’s return type[1] . Always exactly one of the 3 overloads can compile.

When T is AddRefreshOrderStruct, the compiler tries 1st overload, where AddRefreshOrderStruct needs a long field, and AddRefreshOrderStruct needs a sendTimeNS field. Pass! So the return type is int.

When T is EmptyStruct, the compiler tries the 1st overload, where EmptyStruct needs a long field … failed:( Only the last overload, the default overload, passes.

[1] the size of the return type is used to initialize the static const field!

The asterisk at end of the func declarations is needed as the func() argument will be NULL pointer. NULL pointer can match a pointer to any type.

c++get stacktrace@particular call site #core dump

Not needed in any interview…

It’s easier to trigger core dump in order to get a stack trace. I was successful with raise(SIGABRT)

Here’s a simple demo generating a back trace in a c++ source code. Not sure if it would work in a big code base, but it requires -rdynamic!

#ifdef usage_demo

g++ -g -rdynamic backtrace.cpp # -rdynamic needed
./a.out | perl -pe 's/.*\((.*)\+.*\[(.*)\]/ `c++filt $1` . `addr2line $2` /e;'


#include <execinfo.h>
#include <stdexcept>
using namespace std;
int f2(){
                throw 1;
                void *array[10];
                size_t size = backtrace(array, 10);
                backtrace_symbols_fd(array, size, STDOUT_FILENO);
int f1(){
        return f2();
int main(){

show free slots between meetings #bbg c++11

Q: You are given a list of teleconference booking like {9,10} meaning from 9am to 10am. We can multi-task, so overlap is fine. Can you print out all the free slots within the day? You can safely assume all the input data are sensible and between 9 to 18, i.e. our workday.

I solved this problem on the spot. My weakness is still the ECT cycle esp. the compiler errors.

I was familiar with the inverse problem of listing all busy sessions. So I decided to *focus* on that. I took a calculated risk that once I get that to work, I will have the cool confidence to solve the original problem.

corner case: two adjacent meetings. I added comment to specify exactly how to check for that
corner case: what if the entire day is one busy session?

int const SOD = 0, EOD=24;

struct Intv{ //can be a single meeting, a joined session or a free window
  int bgn, end; //end = 10 means 10 o'clock sharp. Will become float
  Intv(int a, int b): bgn(a), end(b){}
  bool operator<(Intv const & other) const { return this->bgn < other.bgn;}

  //can be a free function if no private fields
  friend ostream & operator<<(ostream & os, Intv const & a){
        return os;
template<typename T> ostream & operator<<(ostream & os, vector<T> const & v){
        for(int i = 0; i<v.size(); ++i) os<<v[i]<<"  ";
void printBusy(vector<Intv> const & arg){
  if (arg.empty()) return;
  vector<Intv> & v = const_cast<vector<Intv>&> (arg);
  sort(v.begin(), v.end());
  size_t const sz = v.size(); //cache
  Intv growable = v[0];
  for(size_t i = 1; i<sz; ++i){
        Intv & req = v[i];
        if (growable.end <{
          cout<<growable<<endl; //close current session
          growable = req;//start new session
        growable.end = max(growable.end, req.end);
  cout<<"last session : "<<growable<<endl;
void printWindow(vector<Intv> const & arg){ //subsequent exercise
        if (arg.empty()){
                cout<<"free slot "<<Intv(SOD, EOD)<<endl;
        vector<Intv> &v = const_cast<vector<Intv> &>(arg); //avoid vector deep cloning
        sort(v.begin(), v.end());
        if (v[0].bgn > SOD){
          cout<<"SOD free slot "<<Intv(SOD, v[0].bgn)<<endl;
        Intv growing(v[0]); //current busy period
        for(int i=1; i<v.size(); ++i){
          //Heart of the algo. each new meeting either extends the current
          //busy period or prints a free window before it
          Intv & meeting = v[i];
          //cout<<i<<" a meeting: "<<meeting<<endl;
          //cout<<"growing = "<<growing<<endl;
          if (growing.end < meeting.bgn){
                cout<<"free slot "<<Intv(growing.end, meeting.bgn)<<endl;
                growing = meeting;
                growing.end = max(growing.end, meeting.end);
        if (growing.end < EOD)
                cout<<"EOD free slot "<<Intv(growing.end, EOD)<<endl;
int main() {
        //pass in initializer list to initliaze a const vector
        printBusy({Intv(15,17), Intv(7,9), Intv(5,7)});

memory leak demo: memset() std::string

valgrind proves the leak.

using namespace std;
size_t const len=15;
struct A{
        string s4;
        A(string s="default std::string"): s4(s){ cout<<"ctor"<<endl; }
size_t const  cnt=2;
size_t siz=cnt * sizeof(A);
A arr[cnt], ar2[cnt];

char fname[] = "/tmp/,.dat";
void leakDemo1() {
                arr[1]=A("frown"); //somehow skipping this causes core dump
        cout<<"before write()"<<endl;

        int fd = open(fname, O_CREAT | O_WRONLY, S_IRUSR | S_IWUSR);
        write(fd, arr, siz);

                int fd2 = open(fname, O_RDONLY);
                read(fd2, ar2, siz);
        for (int idx = 0; idx < cnt; ++idx){
                A * tmp = ar2 + idx;
void leakDemo2(){
        int * intp = new int(11); //not deleted. valgrind detected the leak
        memset(&intp, 0, 8); //overwrite the 8-byte pointer object
        delete intp;  //deleting on the fake address from memset
void leakDemo3(){
        string s="hello";
        cout<<"sie of string == size of pointer = " << sizeof(string)<<endl; //inside the string object, there's nothing but a poitner!
        memset(&s, 0, 1); // overwite pointer object itself, so it now points to somewhere else ... leak

        //somehow, memset(....,2) causes seg fault?
int main() {

remove subsequent duplicates in slist #SocGen

Q (SocGen hackerrank): given the head node of a linked list that might have duplicate values,  return the head node of a sanitized linked list without duplicates.

eg: Node@0x7089 (value=3) ->Node@0x6481 (value=2) ->Node@0x7921 (value=3) ->Node@0x6308 (value=7) ->null

would become Node@0x7089 (value=3) ->Node@0x6481 (value=2) ->Node@0x6308 (value=7) ->null

Note the first node (value=3) is still in the list, but all subsequent nodes with (value=3) are discarded. is my tested solution

  • The two-pointer solution sounds simple but is over-complicated and error-prone during implementation.
  • fake-link-head technique is again useful

file (de)serialization, for array of simple structures #Gregory

Not needed for IV..

// simple and fast (de)serialization to file given an array of structures

#include <fcntl.h>
#include <iostream>
using namespace std;
size_t const len=15;

struct A{
        int i1;
        char cstr[len];
        //string s4; //doesn't really work
        A(int i=999, string cs="default c-string", string s="default std::string"): i1(i){
                strncpy(cstr, cs.c_str(), len);
size_t const  cnt=2, siz=cnt * sizeof(A);
A arr[cnt], ar2[cnt];

char fname[] = "/tmp/,.dat";
int main() {
        arr[0]=A(1,  "grin", "backbone");
        arr[1]=A(2,  "frown", "try/except/else");

        int fd = open(fname, O_CREAT | O_WRONLY, S_IRUSR | S_IWUSR);
        write(fd, arr, siz);

        int fd2 = open(fname, O_RDONLY);
        read(fd2, ar2, siz);

        for (int idx = 0; idx < cnt; ++idx){
                A * tmp = ar2 + idx;
                cout<<tmp->i1<<" ; "<<tmp->cstr<<" ; "<<endl; //tmp->s4<<endl;

ssh host1 ssh host2 q[cmd1; cmd2]

This simple automation script demonstrates how to ssh 2 layers into a machine to run a command.

Obviously you need to set up authorized_keys.

set -x
ssh -q uidev1 ssh -q bxbrdr2 "tar cfvz $tgz /data/mnt/captures/tp5/lfeeds/nysemkt/nysemkt-primary.2017${date}_03*"
ssh -q uidev1 ssh -q bxbrdr2 "hostname; ls -l ~/$tgz"
ssh -q uidev1 scp -pq bxbrdr2:$tgz .
set +x
ssh -q uidev1 "hostname; ls -l ~/$tgz"
scp -pq uidev1:$tgz .
ls -l $tgz

%%rwlock using only 1 mutex #Ashish

Q: implement a simple read/write lock using a basic mutex provided by a basic thread library.

— Target Usage Scenario —
Imagine a data table that any thread can read in the absence of concurrent updates, but only one writer thread at a time can update.

I feel some minimum fairness is needed. You don’t want a writer thread to keep waiting for a long time while readers come and go, as if the writer is a second-class citizen. In fact, this implementation favors writers to readers, assuming write is more urgent. Reader starvation is considered tolerable.

— implementation notes–

Since there are 2 mutexes involved, we will always acquire the police first. Helps reduce chance of deadlocks. Still, it’s better to use a single mutex, as shown in [[pthreads]]. One simple suggestion is replacing this->_mutex with a bool this->isWriting. I feel this is a common simplification to many synchronization designs.

After reading P84 [[pthreads]], the mutex field can be a nonref rather than a pointer. In that case, we should never copy this object. It should be created at time of construction.

Ashish pointed out a bug, so now, all 4 methods must acquire _police from onset, before reading/writing  internal shared data.

#include <cassert>
using namespace std;
class mutex{ //this class should be provided to us
  void release();
  void acquire();

class raiiLock{//when an instance goes out of scope on the stack, dtor will release the lock
  mutex const _lock; 

class rwlock{
  rwlock(rwlock const & other); //disabled
  rwlock & operator =(rwlock const & other); //disabled

  // data members
  mutex  _mutex; //used by writers only, never by readers
  mutex  _police; // controls modifications to internal data members
  size_t const _permits; // max number of concurrent readers allowed in theory
  size_t _available; // available permits, controlled by the police
  size_t _waitingWriters; // controlled by the police
  // note dtor will be synthesized, which will do nothing if all data members are primitives or pointers
  // note no-arg ctor is not allowed.
  rwlock(size_t maxPermits):
    _permits  (maxPermits),

  void release1permit(){
    raiiLock autoLock(_police); //blocking briefly
    _available ++;
    if (_available == _permits && _waitingWriters){
      //wake up one of the writers, perhaps using a conditional variable tied to _police mutex
  void releaseWriterLock(){
    raiiLock autoLock(_police); //blocking briefly
    _available = _permits;
    if (_waitingWriters){
      //wake up one of the writers, perhaps using a conditional variable tied to _police mutex
  char try_acquire1permit(){ // non-blocking
    //As Ashish pointed out, _police need to be acquired before 
    //checking _waitingWriters and _available. Any access to shared internal
    //data needs police protection.
    raiiLock autoLock(_police); //blocking briefly
    if (_waitingWriters) return 'w'; //immediately reject this
    // reader's request if there's any waiting writer.
    // Therefore, any subsequent reader requests will not affect the waiting writers.
    if (_available == 0 ) return 'a';

    assert( _available > 0 && _waitingWriters == 0);
    _available --;
    return '0'; // 0 means safe to read the protected data table
  void acquireWriterLock(){ // blocking
    raiiLock autoLock(_police); //blocking briefly
    _waitingWriters ++ ; //Subsequent readers will be rejected, to avoid writer starvation
    while(_available < _permits){
      //block in some kind of waiting room perhaps using a conditional variable tied to _police mutex
    assert (_available == _permits && "all permits should be released and the writer, if any, is done");
    _mutex.acquire(); // should never block
    _available = 0;
    _waitingWriters --;
    // now the current thread is free to update the table

3groupsOf3digits #YH #li.remove().. is the original blog post by my friend

Q: Find three 3-digit numbers with a ratio of 1:2:3;
These numbers pick their 3 digits from a range of 1 to 9;
All digits that form these numbers must be completely unique, i.e. you must use up all distinct 9 digits. For example, a set of 123:246:369 doesn’t qualify. is my solution, not necessarily smart.

Shows list.remove(anyValue)
shows default return value is None
shows sys.exit()
shows no main() needed, to save screen real estate

I see two distinct constraints.
* The obvious — number2 and number3 must be 2x and 3x and they must use the remaining 6 digits.
* The other constraint — number1 must use 3 distinct digits.

Real challenge is the first step — iteration to generate the first 3-digit number without checking the 2x and the 3x numbers. My implementation of it is basically the code outside the functions.

Now I think the simpler iteration is to take each numbers 123..329 and check each the number the same way now. I would reuse the “check” routine — better code reuse. Performance would only be significantly slower when there are many many disqualified integers between 123 and 329 i.e. a sparse array.

In contrast my iteration won’t waste time on those disqualified. So I consider it a little bit clever. My iteration took 20-40 minutes to write. Once this is done, the check on the 2x and 3x is much simpler.