TreeNode lock/unlocking #Rahul

Q: given a static binary tree, provide O(H) implementation of lock/unlocking subject to one rule — Before committing locking/unlocking on any node AA, we must check that all of AA’s subtree nodes are currently in the “free” state, i.e. already unlocked. Failing the check, we must abort the operation.

H:= tree height


— my O(H) solution, applicable on any k-ary tree.

Initial tree must be fully unlocked. If not, we need pre-processing.

Each node will hold a private hashset of “locked descendants”. Every time after I lock a node AA, I will add AA into the hashset of AA’s parent, AA’s grand parent, AA’s great grandparent etc. Every time after I unlock AA, I will do the reverse i.e. removal.

I said “AFTER locking/unlocking” because there’s a validation routine

bool canChange(node* AA){return AA->empty(); } # to be used before locking/unlocking.

Note this design requires uplink. If no uplink available, then we can run a pre-processing routine to populate an uplink lookup hash table { node -> parent }

— simplified solution: Instead of a hashset, we may make do with a count, but the hashset provides useful info.



count lower ints to my right is labelled “hard”.

Q: given an integer array nums , return a new counts array, wherein counts[i] is the number of smaller elements to the right of nums[i]


Order statistics tree  (i.e. an augmented RBTree) should make O(N logN). However, the actual algorithm is not clear to me.

One scan from right. Insert each node into this tree. Before inserting a node of value like 22, we will query the tree getRank(22).

Implementation wise, it’s hard to create a self-balancing BST from scratch. So I think an unbalanced BST might do.

Also, there might be some alternative solutions, like mergesort??

longest descending path through matrix #60%

Q: given an int-matrix that allows 4-way moves,  find longest path with strictly descending nodes. Lateral move disallowed.


I have code to generate all paths… generate loopfree paths: graph node A→B

— Solution 1: O(N logN)
O(N) First pass  to construct “waterflow” graph — visit N nodes. For each, check the four neighbors. Add an outgoing edge (in a Node.edgeList) to a neighbor node if it is strictly lower.

Now put all nodes into a min-heap, according to node height. Can be part of first pass.

Second pass we visit each item in the min-heap. For each node, compute longest descending path starting therefrom. Record this path length as Node.score. Note if a node is surrounded by higher or equal nodes, then it has score 0, as in the lowest node.

A higher node AA is always upstream to some “computed” nodes. (We don’t care about any incoming edges into AA). So pick the max of (up to four) lower neighbors’ scores. Add 1 to get AA’s score. This computation is O(1) but the heap pop() makes second pass O(N logN)

Note the lowest node may not be part of the longest path, if it is surrounded by the highest nodes.

getByRank() in sorted matrix: priorityQ^RBTree


recombinant binTree pyramid, where “water” flows east or south.

  • first level has one node .. lowest value. Add it to pq (i.e. priorityQ)
  • pop the pq and insert the two downstream nodes
  • total K pops, each pop is followed by up to 2 inserts

Heap will grow to up to K items, so each pop will be up to logK

Total O(K logK). To achieve this time complexity, we can also use a RBTree. The tree nodes can come from a pre-allocated array.

doubly-linked list as AuxDS for BST

I find it a very natural auxDS. Every time the BST gets an insert/delete, this list can be updated easily.

Q: How about the self-adjustment after an insert or delete?
%%A: I think this list is unaffected

With this list, in-order walk becomes really easy. is the first place I saw this simple technique.

post-order walk: create valuable subtree statistics is rather simple and short, showing stats like

  • max path sum from current node to any subtree node. Note a path must be non-empty, so this max can be negative
    • We can even save in each node this actual path
  • subtree height,
  • subtree size,
  • d2root — tested but doesn’t use subtree statistics
  • — can be easily implemented:
  • subtree payload sum
  • subtree max payload… These stats are powerful AuxDS

Applicable on k-ary tree

append+maxInRangeK #Rahul

Q: given a sequence of integers, implement two operations

  1. Operation 1: append another integer. Size becomes N
  2. Operation 2: given two arbitrary subscripts into the sequence, (they define a sub-array of size K), find the max

Rahul saw discussions without any optimal solution. I think a simple vector-based algo can achieve amortized O(1) for Append and O(logK) for Max.

AuxDS required.

— idea 1: segment the sequence into fixed segments


max-sum path up+down binTree #FB

Q2 (Leetcode “hard” Q124): find max path sum, where a ” path ” has minimum one node A and can include A’s left child and/or A’s right child. No uplink available.


I see two types of paths

  1. down the tree, starting at some upstream node A ending at a node in A’s subtree
  2. up-down the tree, starting at some A’s left subtree, ending somewhere in A’s right subtree.

For 1) has my tested solution

For 2: post-order walk to update each node a (signed) max-path-sum-from-here. See The 2 values from left+right children can solve this sub-problem

Q: can we use the 2) solution for 1)? I think so

#1(reusable)AuxDS for algo challenges

Here’s a Reusable data structure for many pure algo challenges:

Pre-process to construct a static data store to hold a bunch of “structs” in linked list -OR- growing vector -OR- growing RBTree , all O(1) insertion :). Then we can build multiple “indices” pointing to the nodes

Here are a few O(1) indices: (Note O(1) lookup is the best we can dream of)

  • hashtable {a struct field like a string -> iterator into the data store}
  • array indexed by a struct field like small int id, where payload is an iterator from the data store
  • If the structs have some non-unique int field like age, then we can use the same key lookup to reach a “group”, and within the group use one (or multiple) hashtable(s) keyed by another struct field

I think this is rather powerful and needed only in the most challenging problems like LRU cache.

balloon burst #DP optimization #50%

Q [ Leetcode 312]: not really classic : Given n (up to 500) balloons, indexed from 0 to n-1. Each balloon is painted with a number on it represented by array “nums”. You are asked to burst all the balloons one by one. If the you burst balloon i you will get nums[left] * nums[i] * nums[right] coins. Here left and right are adjacent indices of i. After the burst, the left and right then becomes adjacent. Find the maximum coins you can collect by bursting the balloons wisely.

If you burst a leftmost balloon, you collect 1*it*rightNeighbor coins. In other words, when multiplying 3 numbers, any absentee is a one.

0 ≤ nums[i] ≤ 100

Example: Input: [3,1,5,8]
Output: 167
Explanation: nums = [3,1,5,8] –> [3,5,8] –> [3,8] –> [8] –> []
coins = 3*1*5 + 3*5*8 + 1*3*8 + 1*8*1 = 167
int-array optimization problem.
Might be related to some classic problem.

Let’s define a generic math-function of 3 balloon IDs score(myle, me, myri). In this problem, score() is simply “return myle*me*myri “, but in the next problem, score() could be any math function of the three inputs.

I see each possible snapshot (having K balloons, i.e. at level K) as a graph node. Exactly 2^N nodes in the grid, i.e. 2^N possible snapshots i.e. 2^N combinations of these N balloons.

Every edge has a score. To compute the score, we only need the two nodes (snapshots) of the edge to identify the 3 balloons for score().

Pyramid — Let’s assume at bottom is “origin” i.e. snapshot of the original array ..Level 500; on top is “phi” i.e. snapshot of the empty array .. Level 0.

The problem transforms into a max path sum problem between these 2 nodes.

–solution-1 DP
From origin to any given node, there are many distinct paths each with a total score up to that node. If a node has 55 paths to it, the max sum among the 55 paths would be the uprank (upward rank) of the node.

If the node also has 44 paths from phi, the max sum among the 44 paths would be the downrank (downwrd rank) of the node. This is an interesting observation, but not needed in this solution since every edge is evaluated exactly once.

To our delight, uprank of a node AA at Level-5 depends only on the six Level-6 parent node upranks, so we don’t need to remember all the distinct paths to AA:). Our space complexity is the size of previous level + current level.

We just need to compute the uprank of every node at Level 6, then use those numbers to work out Level 5…. the Level 4 … all the way to phi.

If there are x nodes at Level 6 and y nodes at level 5, then there are 6x==5y edges linking the two levels.

Time complexity is O(V+E) i.e. visit every edge.

Level n: 1 node
Level n-1: n nodes
Level n-2: nc2 nodes

Level 2: nc2 nodes
Level 1: n nodes
Level 0: 1 node

Each node at level K has K child nodes above. This graph now suggests the max-path-sum algo (with edge scores), but it might be the only way to solve the problem, like the bbg odometer.

consider a DP algo to update the score at each node at level K, ie the max sum from root till here, via one of the K-1 nodes at level K-1

But Level 2 has too many (N-choose-2) nodes. Can We prune the tree, from either origin or phi?

%%AuxDS strength ] coding test #algoQQ

Update — Indeed experience..

  1. when a tough problem requires us to construct an explicit and non-trivial auxiliary data structures (AuxDS), i often have an intuitive feel for the data structures needed, not always the optimal.
  2. when a tough problem requires a simpler AuxDS + unusual algo, I don’t have an edge.
  3. When a tough problem requires only an unusual algo, I don’t have an edge.
    * eg: edit distance
    * eg: sliding window max
    * eg: maximal sub-matrix

    • Suggestion: invest in algoQQ. I feel they will stick with me.

intervalTree: classic RBTree to find overlapp`event

Notable detail — non-balancing BST won’t give logN performance, since the tree depth may degrade

Q1: given N event objects {start, end, id}, use a RBTree to support a O(logN) query(interval INT) that returns any one event that overlaps with INT, or NULL if none.

If the end time == start time in the input INT, then it becomes the focus today :

Q2: given N event objects {start, end, id}, use a RBTree to support a O(logN) query(timestamp ii) that returns any one event that’s ongoing at time ii, or NULL if none.

P312 [[intro to algorithms]] describes an elegant solution.

The text didn’t highlight — I think the end time of the input interval INT is … ignored. (Therefore, in my Q2, input ii is a single timestamp, not an event.) Precisely because of this conscious decision, the tree is sorted by event start time, and the additional payload (in each node N) is the subtree end time i.e. last end time of all events started before N (including N itself). By construction, N’s payload is equal or higher than N’s start time. (Note Our input ii can fall into gaps between the event intervals.)

Eg: suppose N starts at 2:22 and left-child payload says 3:33, then we know for sure there at least one ongoing even during the interval 2:22 to 3:33.  Not sure if this is useful insight.

The text illustrates and proves why this design enables a clean and simple binary search, i.e. after each decision, we can safely discard one of “my” two subtrees. Here’s my (possibly imperfect) recall:

Let’s suppose each time the current node/event is not “ongoing”. (If it is then we can return it 🙂  ) So here’s the gist of the algorithm:

Suppose i’m the “current node”, which is root node initially. Compare ii to my left child L’s payload (not my payload).

As defined, L’s payload is the highest end times of L + all its sub nodes. In other words, this payload is the highest end time of all events starting before me.

Note the algo won’t compare L’s start time with ii
Note the algo won’t compare my start time with ii, though the scenario analysis below considers it.

  • case 1: If payload is lower than ii, then we know all events on my left have ended before ii, so we can discard the left subtree completely. Only way to go is down the right subtree.
  • the other case (case 2): if payload is higher or equal to ii, then we know my left subtree contains some events (known as “candd” aka candidates) that will end after ii.
  • case 2a: suppose my start time is before ii, then by BST definition every candidate has started before ii. We are sure to find the “candd” among them.
  • case 2b: suppose my start time is after ii, then by BST definition, all right subtree events have not started. Right subtree can be discarded
  • In both cases 2a and 2b, we can discard right subtree.


O(1)getRandom+add+del on Bag #Rahul

Q: create an unordered multiset with O(1) add(Item), del(Item) and a getRandom() having the  probability of returning any item  based on the PMF.

Rahul posed this question to our Princeton PhD candidate, who needed some help on the Bag version.

====my solution:
On the spot, I designed a vector<Item> + hashmap<Item, hashset<Pos>>. The hashset records the positions within the vector.

Aha — Invariant — My vector will be designed to have no empty slots even after many del(). Therefore vec[ random() * vec.size() ] will satisfy getRandom() PMF.

add() would simply (in O(1)) append to the vector, and to the hashset.

— del() algo is tricky, as Rahul and I agreed. Here’s an illustration: Let’s say Item ‘A’ appears at positions 3,7,8,11,16 and B appears at positions 2,5,31 (the last in the vector). del(A) needs to remove one of the A’s and move the B@31 into that A’s position.

  1. Suppose the PMF engine picks vec[11] which is an A.
  2. unconditionally O(1) find the item at the last position in vector. We find a B, which is different from our ‘A’
  3. Here’s how to physically remove the A from position 11:
  4. O(1) replace ‘A’ with ‘B’ at position 11 in the vector
  5. O(1) remove 11 from A’s hashset and add 11 into B’s hashset, so A’s hashset size decrements.
  6. O(1) remove 31 from B’s hashset, so B’s hashset size remains

max all-black subMatrix #ZR

Same problem as

Q: Given a 2D binary matrix (L by N) filled with white(0) and black(1) cells, find the largest all-black rectangle. See raiserchu’s mail on 12 Sep 13. There is a clever DP solution, probably O(LN).


Worst case — A standard chess board? We can’t do better than O(LN) since there are LN cells to read.

–O(LN) leetcode solution based on histogram .. is latest code with my adaptations and my detailed comments.

— sol5:

First scan O(LN) to record, in each cell {bar height; horizontalBarStart/End}.

— idea 4unfinished

Scan #1 O(LN): build a shadow matrix “histogram” where each integer in the cell is the height (possibly 0) of the bar anchored therein.

Scan #2 O(LN) for each cell, remember the currentRunStart column index i.e. from that column until current column, we have an all-black box of height == current bar height

— sol3 O(LNS) new idea based on max rectangle ] histogram treat top 2 (denote J:=2) rows as a histogram. Find the max rectangle therein. Then J:=3 …

  • Scan #1 O(LN): build a shadow matrix “histogram” where each integer in the cell is the height (possibly 0) of the bar anchored therein. In other words, if a cell value=5 then there are exactly 4 consecutive black cells above this (black) cell. Build it incrementally, level by level, top to bottom.
  • Scan #2a: for each row in the shadow matrix, we run the proven algo in O(NS), Note there’s no help from previous row:(
    • S:= #unique heights, N:= matrix width 
  • Scan #2 := the entire scan of L rows. so worst case we hit O(LNS)

Q: Can we do better by reducing scan #2a complexity to O(N), by making use of the previous row results?

— My brute force solution 1: Each rectangle is identified by 2 vertices, i.e 4 integers. Without loss of generality, We require the “high” corner to have higher x-coordinate and higher y-coordinate than the “low” corner. (We can assume y-axis run upward.) With this O(N^4) nested loop we can iterate over all possible rectangles:

Lock low corner
Move high corner in typewriter (zigzag) steps i.e.
  hold highY and move highX step by step
  process the (series of) resulting rectangles
  increment highY and repeat
Move the lower corner in typewriter steps and repeat

Key observation: any “bad pixel” disqualifies every rectangle containing it.

— Here’s my partial solution:
We can effectively ignore all the “good pixels”.

1) Look at the x coordinates of all bad pixels. Sort them into an array. Find the largest gap. Suppose it’s between x=22 and x=33. Our candidate rectangle extends horizontally from 23 to 32, exactly. Notice there’s no bad pixel within this vertical band [1].
2) Look at the y coordinates of all bad pixels. Sort them into an array. Find the largest gap. Suppose it’s between y=15 and y=18. Our candidate rectangle extends vertically from 16 to 17, exactly.
[1] This candidate rectangle can expand All the way vertically, though it may give a bigger rectangle
Ditto horizontally.

SCB-FM stack-based FIFO in O(1)amortized

Q: given a hardware-based stack API consisting of 3 functions {pop/push/isEmpty}, please implement a queue api consisting of 3 functions {enqueue/dequeue/isEmpty} is similar


service dequeue from a hidden stack.

When hidden stack is empty, pop all nodes from visible stack to hidden stack. Amortized O(1) pop()

isEmpty() must add up two sizes.

[[python cookbook]] P658 implements this classic algo in 9 lines.

LFU cache #cf.LRU #72%

Q LFU (Least-Frequently-Used) cache to support the following operations: get and put in O(1)
* get(key) – Get the value (will always be positive) of the key if the key exists in the cache, otherwise return -1.
* put(key, value) – Set or insert the value if the key is not already present. When the cache reaches its capacity, it should invalidate the least frequently used item before inserting a new item. For the purpose of this problem, when there is a tie (i.e., two or more keys that have the same frequency), the least recently used key would be evicted.


  1. dstruc — centry i.e. CacheEntry node {key, value, hitCount, lastHit (timestamp), (optional)ptr to host LinkNode}, to be used in an inner linked list.
    • invariant: hitCount can only increase
  2. dstruct — inner minilist of centry nodes
    • invariant: list always sorted by lastHit. We can remove any intermediate node, but incoming node is always added to the Tail
  3. dstruct — fixed-sized (rehash-free) hashtable {key -> ptr to centry}, needed for mid-stream laser-removal
  4. dstruct — LinkNode {level, minilist-of-centry} where all centry objects share the same hitCount denoted “level”.
  5. dstruct — outer list of LinkNodes, always sorted by level

“bubble-up” operation — Whenever a centry gets a cache-hit, its hitCount increments. It immediately and unconditionally bubbles up to the LinkNode one level higher (to be created in O(1) if necessary) ((
* [o1] query the hashtable and follow ptr to remove the centry from the minilist in an old LinkNode
* [o1] insert the centry to the new level, at Tail of minilist. The new LinkNode could be non-existent but Never empty!
* [o1] optionally, new host LinkNode’s address is saved in the centry

  • Get() hit — relatively easy. Update the hitCount and bubble up
  • Get() miss — trivial
  • Put() Update — similar to get-hit
  • Insertion (possibly after deletion) — [o1] append to the minilist Tail in the Level-1 LinkNode (to be created if necessary) and add to hashtable
  • Deletion — always from list to hashtable, never the converse
    • [o1] identify lowest level present, then delete the head (i.e. eviction target) of minilist
    • when a linkNode becomes empty, it must disappear from the outer list, to prevent build-up of consecutive empty LinkNodes leading to linear search for eviction target. Imagine aaaaa bbbbb c[Now need to evict an “a”]. Therefore, array of LinkNode is unacceptable.

sequence@{peak within sliding window} O(N) no dequeue

Q (leetcode hard problem 239): Given an array nums, there is a sliding window of size k which is moving from the very left of the array to the very right. You can only see the k numbers in the window. Each time the sliding window moves right by one position, return the max item in it. Total run time should be O(N) but I think within each window we may not always get O(1). is the same problem stated more clearly.


Since there exists a O(N) solution i’m confident I can figure out a O(N) solution

–idea — first scan to identify all the troughs. Items around trough to be removed, but always keep original subscripts of remaining nodes

–latest idea, not using dequeue therefore likely original —

Update — consider a rising sequence before going any further. Virtually all items are showmen…

in my showman-stack, every element {idx,height} will show up on stage at least one episode. I would say each showman goes on stage either when it first enters the sliding window, or before it leaves the sliding window. If that’s the case at each slide i only need to consider the head and tail.

(Now i think this is in the right direction but is messier than it needs to be.)

first forward scan populates this stack and ensures only showmen are included. Let’s work out some examples. First part of first scan would simply find the max among first K items. 2nd scan might be combined with first scan, but for now, my goal is clear and focused — produce a clean stack of showmen-only

  • degenerate case — monotonic sequence requires virtually all elements be included
  • rules on the forward scan —
    • every new item needs to be pushed to the stack as it could be higher than all followers. The tricky logic is “what before the push”
    • Before we examine a new item, top of the stack is always the immediate predecessor. Need to add assert.

pseudo code:

O(N) pre-scan to find all troughs. For each of N items, record the preceding trough. No stack yet.

First scan: check if the most recent trough is within range. If NO then nothing to pop from stack, so simply push new item and advance. Now assuming YES.

while new item >= top of stack __&&__ the 2nd top is within range (distance <=K):

keep pop

while new item >= top of stack && distance between is below K && there would still exist an earlier higher item within range:
//at end of the while loop, we could actually produce some output, but I would postpone it to 2nd scan

(For the last K items in array, we need some O(K) special handling.)

I think both conditions are necessary for popping the stack. Are these two conditions alone sufficient justification to remove the top?

—- segment-based solution #elegant shows an elegant segment-based algo.

Suppose windows size := 10 and subscripts start at 0.
A) Conceptually we split the array into fixed segments of length 10. (The stub at the end is LG2.) Note unlike the ring algorithm, no physical data structure is implemented for this purely conceptual segments, but this conceptual data structure is essential to this clever solution.
B) Pre-scan – pre-scan the array twice .. rightward and leftward to populate two physical data structures, the LR-lookup and RL-lookup i.e. segment-wise max-till-here. For example,

RL[22] := max(#29 to #22) := Leftward max-till-22 within the enclosing egment@20 enclosing #20 to #29.

With the two auxiliary data structures RL lookup and LR lookup, the solution is mind-boggling simple and elegant. For example,

C) window@22 will overlap segment@20 and segment@30. The 29-to-22 section is covered by RL[22] and the 30-to-31 section is covered by LR[31]. So simply compare RL[22] vs LR[31].

I find this solution more intuitive, more visual than the ring solution. After the pre-scans, there’s no data structure to maintain so the iterations are stateless and extremely straightforward.

Q: what if increasing sequence?
Q: what if decreasing sequence?

—- The same webpage also has a concise explanation of the deque-based solution, kind of similar to but cleaner than my showman idea

• I will use a ring buffer of size=W as a ring is allocated at start-up time (at compile time if W is compile-time constant) and requires no dynamic allocation.
• Ring is FIFO. Terminology — I will say the earlier elements are removed from the Head like a ticketing queue; new elements are appended at the Tail.
• Invariant — I think within this ring, payloads are descending from head to tail.
• Ring holds subscripts, not payloads. Payloads could be strings or floats.
• At each iteration, 0 or 1 head item is removed if it is out of reach.
• At each iteration, incoming item kicks out all smaller tail items, as they are too close to the incoming giant. This operation enforces the invariant.

As a result, head of the ring (as a subscript) always points to the current max payload.


sum@arbitrarySubArray, mutable int #Rahul#segmentTree

Q: given an array of mutable integers, implement subArraySum(le, ri), and updateElement(idx, newVal)

This is data-structure heavy. You need correct data structure to support efficient update/query.

Assumption A: Without loss of generality, i will assume original array length is a power of two, such as 8

— Idea 1: carefully segment the array. Maintain array of Segment object {le, sum}

The segments can shrink/expand based on heuristics. For now, I will assume “Segment.le” is immutable.

Every update() will update the Segment.sum in exactly one segment per level.

At the leaf level, there are 8 segments of length one or two. (Given Assumption A, it would be two.)

Next level I will have 4 segments. Each segment at this level consists of exactly 2 leaf segments. Similar to Fenwick tree and segmented binary tree, update() and query() are both O(log N)

AuxDS ] each node2simplify algo #shadow matrix

In some algo questions, I like to add a tiny “metadata” field to the VO.

eg: BFT showing level

Q: how likely is interviewer to accept it?

A: I feel west coast interviews tend to entertain crazy ideas since algo challenges are more free-flowing, exploratory, thought-provoking. I remember the Bbg FX interviewer (Karin?) said my “in-place” is OK if I append to the existing vector then truncate the original portion.

A: I feel if it’s clever then they may appreciate it.

Space/time trade-off. The metadata field

  • can be a pointer to parent or next node (like LinkedHashMap)
  • can be an iterator into another (static) data structure
  • can be the element’s own address if not already available
  • can be a sequence id as sorting key
    • insertion id is popular,
  • can be a bunch of flags packed into 8-bit integer


top N active stocks #bbg#cache,VO

Requirement: implement top_n(int N): top N most active stocks

Requirement 2: add a top_n_last_x_minute(int N, int X): top N that are last traded in the last X millisec

This program also demonstrates how to avoid storing the same string twice, as map key and also as field of the map value

#include <unordered_map> //requires -std=c++0x
#include <iostream>
#include <vector>
#include <string>
#include <algorithm>
#include <chrono>
#include <assert.h>
using namespace std;

struct Rec{
  string const * strPtr; //a best-effort to avoid storing duplicate strings
  int cumvol; //cum vol
  long lastUpd; //millsec since epoch

  Rec(int vol): strPtr(NULL), cumvol(vol),
    lastUpd( (chrono::system_clock::now().time_since_epoch()).count()) { }

  bool operator<(Rec const & other) const{
        return this->cumvol < other.cumvol;
  friend ostream & operator<<(ostream & os, Rec const & r){
        os<<r.lastUpd<<"-"<<*(r.strPtr)<<"  : "<<r.cumvol;
        return os;

typedef unordered_map<string, Rec>::iterator ITR;

class App{
  unordered_map<string, Rec> lookup;
  vector<Rec> v;
  bool dirty;
  App(): dirty(true){}
  void update(string const & symbol, int vol){
    this->dirty = true;
    ITR it = lookup.find(symbol);
    if (it == lookup.end()){
        pair<ITR, bool> pair = lookup.insert(make_pair(symbol, Rec(vol)));
        assert(pair.first->second.strPtr == NULL);
        pair.first->second.strPtr = //second is the valye in the pair
          & pair.first->first; //the key of the new pair
        assert(it->second.strPtr != NULL);
        it->second.cumvol += vol;
        it->second.lastUpd = (std::chrono::system_clock::now().time_since_epoch()).count();
  void top_n(int N, unsigned long X=-1){ //unsigned -1 gives the largest positive value!
    size_t sz=lookup.size();
    if (dirty){
      cout<<"resorting due to updates to database ..."<<endl;
      for(ITR i=lookup.begin(); i!= lookup.end(); ++i){
      sort(v.begin(), v.end());
      dirty = false;
    long now = (std::chrono::system_clock::now().time_since_epoch()).count();
    //cout<<now<<" is now"<<endl;
    for(int k=sz-1, count=0; k>=0; --k){
        if (now - v[k].lastUpd > X) continue;
        if (++count >= N) break;
int main(){
  App app;
  app.update("ibm", 1000);
  app.update("gs", 700);
  app.update("gs", 500);
  app.update("goog", 600);
  app.update("msft", 800);
  app.update("ge", 500);
  app.update("ibm", 160);
  app.update("c", 400);

binary search in sorted vector of Tick pointer

Note the mismatched args to the comparitor functions.

(I was unable to use a functor class.)

std::vector<Tick const*> vec;
int target;
bool mylessFunc(Tick const * tick, unsigned int target) {
     //cout<<tick->ts<<" against "<<target<<endl; 
     return tick-ts < target;
lower_bound(vec.begin(),vec.end(),target, mylessFunc);

bool mygreaterFunc(unsigned int target, Tick const * tick){
     //cout<<a->ts<<" against "<<target<<endl; 
     return tick->ts > target;
upper_bound(vec.begin(),vec.end(),target, mygreaterFunc)

sorted vector^multiset for tick query

I see no advantage to using a tree.

Note the key (timestamps) is repeating, so only multiset is valid.

Space — vector is more compact leading to better cache affinity. There’s a price to pay — Vector should use reserve() to prevent reallocation, if possible. Tree doesn’t need it.

Query by lower_bound/upper_bound — I guess both are similar O(log N)

Initial insertions — Remember our data are ticks from an exchange loaded from a large file.

  1. 1. For one insert (assuming no re-balancing no reallocation), vector is O(1) but tree is O(log N) 😦
  2. 2. If data comes in randomly, then tree re-balancing is less frequent but more than once. For vector, it’s a single quicksort after all insertions. I would say vector is faster mostly due to the previous factor
  3. 3. if data comes in (almost) by ascending timestamp, then vector requires no or minimal sorting, but the tree would require frequent re-balancing — presorted data is probably the worst input sequence for one-by-one-insertion.
  4. 4. If no re-balancing is done, then the tree would have depth = N, and query would be linear search 😦

LRU cache #Part 1

I believe my friend got this question in a bbg phone interview.

Design and implement a data structure for Least Recently Used (LRU) cache. It should support the following operations:

get(key) – Get the value (will always be positive) of the key if the key exists in the cache, otherwise return -1.
put(key, value) – Set or insert the value if the key is not already present. When the cache reached its capacity, it should invalidate the least recently used item before inserting a new item.

Follow up:
Could you do both operations in O(1) time complexity?


Hash table to support lookup. The “value” is a pointer to a link node. Link node also has key/value — 2-way linkage.

slist is a FIFO and grows at tail for every new key/value pair, so head is the earliest pair. Every time a key/value is accessed via hash table, we move the node to the tail.

When capacity is reached, we would remove the tail node. Using the key in that node, we also remove from hash table.

min-stack #bbg

My friend Abhinav (not his real name, to protect his privacy) got this question at Bloomberg internship interview. I added some details to make it clearer:

Q: Design a stack that supports push, pop, top, and retrieving the minimum element all with O(1) time complexity in the worst case.

There exists a function compare(Item A, Item B) that returns 1 if A is greater, 0 if equal, and -1 if A is smaller.

  • getMin() — Retrieve the minimum element in the stack.
  • push(x) — Push element x onto stack.
  • pop() — Removes the element on top of the stack.
  • top() — Get the top element.

==== analysis =====

The most efficient getMin() data structure is the binary heap, but insert/delete runs in O(logN). Therefore I felt the requirements here are computationally impossible. But in this context, we only need to support deleting the last added item 🙂

Key insight — popping is a constrained form of deletion. It’s hard to hit O(1) while supporting unconstrained deletions, BUT with a constraint on deletions, all operations can be O(1).

I need a regular stack + a helper data structure. A linked list or vector can support the stack operations

— The helper — a naturally sorted stack (or vector or deque) to hold record-breakers and record-matchers.

IFF a new minimum (record breaker) or another item matching the existing minimum (record-matcher) is added, we push it to the sorted stack.

After every pop(), we check the popped item. If equal to top of sorted stack, then pop the sorted stack.

At any time, top of the sorted stack is the current minimum.

Vector and deque are actually fine. Interviewer may feel they are inferior to a stack, but with a modified requirement, they may become very useful.

— Here’s a design to beat binary_heap, based on finite-sized (32 or 64-bit int) keys

Assuming the comparison is based on 32-bit integer key (string or floats can also use radix sort). I will use a radix array structure. Perhaps 4 arrays of 256-elements.  Or perhaps a 4×256 matrix, Ultimately the payload stored in the data structure are pointers to stack nodes. This saves memory since a stack node may be 99MB.

Every time we push a node, we derive the new key value in O(1), use the key value to locate its ‘home’ in the radix structure and store the new node’s address in a linked list therein. After the push(), we can compare and update (in constant time) a global variable pointing to the minimum node.

Each stack node also has a back-pointer to the iterator into the list, so before pop() we can use the iterator to locate the object in the radix structure and delete it from the host linked list. We will also update the global variable.

find the line passing the most points #solved might be similar

Q: Given N points with positive integer coordinates, find the straight line passing through the most points
A: For each of (N*N-N)/2 pairs of points, compute a Line object identified by 2 numbers:
* a slope S = (y2 – y1)/(x2 – x1)
* intercept on y-axis.

So these 2 numbers can be computed easily from any pair.

Save each Line object as key in a hashmap. When a pair gives a Line that’s already seen, increment its count.

Intercept formula y_inter(int, int, int, int) can be assumed to exist. Writing this function isn’t relevant to a coding interview:

Suppose this value is y3, so the incept point is (0,y3), so

(y3-y1)/(0-x1) = S, so y3 = y1 – S x1

[15] EPI300 skyline #event-handler

Reusable technique – SQL

PROBLEM STATEMENT (equivalent to Leetcode Q 218)
We have to design a program which helps drawing the skyline of a two-dimensional city given the locations of the rectangular buildings in the city. Each building B_i is represented by a triplet of (L_i, R_i, h_i) where L_i and R_i are the left and right coordinates of the ith building, and h_i is the height. In the diagram below there are 8 buildings, represented from left to right by the triplets (1, 5, 11), (2, 7, 6), (3, 9, 13), (12, 16, 7), (14, 25, 3), (19, 22, 18), (23, 29, 13) and (24, 28, 4).

The input of the program is a sequence of building triplets. The triplets are sorted by L_i (the left coordinate of the building) in ascending order.


A data structure challenge. Once I had the correct data structure … 迎刃而解

Q1 —– For any given x, determine the height in the skyline.
Note If x == R_i, then the ith building doesn’t count. In other words, If you look at the first building, the 1-to-5 range it covers is a half-open interval, sometimes written as [1,5) as this range includes 1 but excludes 5. You can think of [1,5) as [1, 4.99999999] approximately.

A1(brute force): look at each building and decide if it “covers” the point X. Given the pre-sort, most buildings aren’t relevant. A complete solution would be

Select max(h) from myTable t where t.l =< x < t.r

To support this solution, the objects could be stored in two sorted data structures, one sorted by L_i and one sorted by R_i.

Q2 —- draw the skyline.
A2: evaluate Q1 (i.e. get the height) at every L_i and R_i value. This solution is probably suboptimal.

Q2b (Leetcode Q218) —- draw the skyline by outputting a sequence of {x,h} pairs showing the change in height (represented by a new height h) at each vertical edge (marked by x).

Is it possible to do this in one scan after preprocessing the triplets with some clever data structures? [[EPI300]] may have a solution. Here’s my own proposal —

Pre-sort the N buildings into a list by L_i, and sort the same buildings into another list by R_i, and merge the 2 lists into a big sorted list of 2N pointers to N unique objects. Each building shows up twice. Each of the 2N entries consists of {building_object_id, x, boolean flag left_or_right }. We will go through this big list one-pass.

I convert this list into a sequence of “events” as we move along the x-axis. That’s why all left/right edges must be sorted into a single sequence.

Main scan — As we hit the left edge of a building, we include this building in the Alive container (probably a BST). In Alive, we keep the buildings sorted by height. We also maintain a lookup table { building_id, pointer/iterator into Alive}. As we hit the right edge of a building, we remove it from Alive. (No need to remove from lookup since we won’t see the same building again.)

As we hit any edge, we need to determine the impact on the skyline if any. This is when we make use of the Alive system. For any edge, there’s impact iff the target building is taller than all other in Alive.

data structure to hold spreadsheet content

Q: Assume string content in each cell. As a start let’s target 1,000,000 rows by 1,000,000 columns. Goals
– memory
– random access
– growable

%%A: Intermediate solution (good habit): a 3-column DB table with a composite key (row,col). Now how do we represent this in a data structure. First attempt would be a hashmap whose key is a customized big integer. Higher 64 bits represent row numbr, while lower half col number. Value is a reference with Copy-on-write. Hash access with a given (row,col) is …. considered random access with O(1).

Now I think a second try would be a hashmap where
main key -> row number
value -> sub-hashmap, where
sub-key -> col number

The other of the twin hashmap keeps col numbers as main keys. Every update to the spreadsheet requires updates to both.