BST: post-order as serialization

Q: if I backscan a post-order output (unsorted sequence), is there only a single BST we can build?

Intuitively, I think this is like fwd scanning a pre-order output so yes.

If I keep a webcam at each tree node
* During a fwd scan of pre-order output, every node’s left child is born before right child.
* During a backscan of post-order output, every node’s right child is born before left child. During the actual post-order walk, my left child is always printed before my right child.


For a given BST, the pre-order output is unique; the post-order output is also unique. However,

Can two BSTs produce the same pre-order output? I think impossible
Can two BSTs produce the same post-order output? I think impossible

Like the pre-order, the post-order sequence is also a serialization but only for a BST.


count lower ints to my right is labelled “hard”.

Q: given an integer array nums , return a new counts array, wherein counts[i] is the number of smaller elements to the right of nums[i]


Order statistics tree  (i.e. an augmented RBTree) should make O(N logN). However, the actual algorithm is not clear to me.

One scan from right. Insert each node into this tree. Before inserting a node of value like 22, we will query the tree getRank(22).

Implementation wise, it’s hard to create a self-balancing BST from scratch. So I think an unbalanced BST might do.

Also, there might be some alternative solutions, like mergesort??

RBTree O(1)insert quite common ] coding IV

— for single-insert is precise saying that hinted insert is O(1).
— for mass insert
Special case — If we know the input sequence is pre-sorted and going to be “appended” to existing tree nodes, then each insert will always hit the end(). We can rely on hinted insert to achieve O(N) .  is even more encouraging — ” If N elements are inserted, Nlog(size+N) in general, but linear in size+N if the elements are already sorted” so as long as pre-sorted, then we get O(N)

For 64-bit integer or float inputs, we can radix sort in O(N) then mass-insert in O(N).

— for construction from a pre-sorted sequence confirms it’s O(N) if the input range is already sorted.

getByRank() in sorted matrix: priorityQ^RBTree


recombinant binTree pyramid, where “water” flows east or south.

  • first level has one node .. lowest value. Add it to pq (i.e. priorityQ)
  • pop the pq and insert the two downstream nodes
  • total K pops, each pop is followed by up to 2 inserts

Heap will grow to up to K items, so each pop will be up to logK

Total O(K logK). To achieve this time complexity, we can also use a RBTree. The tree nodes can come from a pre-allocated array.

RBTree used inside java8 hashmap is a language-neutral discussion. shows an early implementation

they are implemented as a hashed array of binary trees. 
My experience with them is that they are very efficient.

— JEP 180 is the white paper to introduce a self-balanced tree as an alternative to linked list. shows with one million entries in a HashMap a single lookup taken 20 CPU cycles, or less than 10 nanoseconds. Another benchmark test demonstrates O(logN) get(key) in java8, but O(N) in java7, as traditionally known.

If two hashCode() values are different but ended up in the same bucket (due to rehashing and bucketing), one is considered bigger and goes to the right. If hashCodes are identical (as in a contrived hashCode() implementation), HashMap hopes that the keys are Comparable, so that it can establish some order. This is not a requirement of HashMap keys.

If hashCodes are mostly identical (rare) and keys are not comparable, don’t expect any performance improvements in case of heavy hash collisions. Here’s my analysis of this last case:

  • if your Key.equals() is based on address and hashCode() is mostly the same, then the RBTree ordering can and should use address. You won’t be able to look up using a “clone” key.
  • if you have customized Key.hashCode() then you ought to customize equals(), but suppose you don’t implement Comparable, then you are allowed to lookup using a clone key. Since there’s no real ordering among the tree nodes, the only way to look up is running equals() on every node. says

A given bucket contains both Node (linked list) and TreeNode (red-black tree). Oracle decided to use both data structures with the following rules:

  • If for a given index (bucket) in the inner table there are more than 8 nodes, the linked list is transformed into a red black tree
  • If for a given index (bucket) in the inner table there are less than 6 nodes, the tree is transformed into a linked list

With the self-balanced tree replacing the linked list, worst-case lookup, insert and delete are no longer O(N) but O(logN) guaranteed.

This technique, albeit new, is one of the best simple ideas I have ever seen. Why has nobody thought of it earlier?

RBTree: technical notes #Realtime

Red–black trees offer … worst-case guarantees, valuable in real-time applications. The Completely Fair Scheduler used in current Linux kernels and epoll system call implementation[19] uses red–black trees.

This valuable guarantee is valid only on always-balanced trees, but no need to be strictly balanced. In fact, AVL is more rigidly balanced but lacks this guarantee.

In contrast, hash table doesn’t offer worst-case guarantee in the face of hash collision. In fact, Java 8 HashMap uses RBTree in additional to linked list… see my blogpost RBTree used in java hashmap.

After an insert or delete, restoring the red-black properties requires a small number (O(log n) or amortized O(1)) of color changes (which are very quick in practice) and no more than three tree rotations (two for insertion). Although insert and delete operations are complicated logically, their times remain O(log n).

The AVL tree is another structure supporting O(log n) search, insertion, and removal. AVL trees can be colored red-black, thus are a subset of RB trees. Worst-case height is better than the worst-case height of RB trees, so AVL trees are more rigidly balanced. However, Mehlhorn & Sanders (2008) point out: “AVL trees do not support constant amortized deletion costs”, but red-black trees do.[25]

doubly-linked list as AuxDS for BST

I find it a very natural auxDS. Every time the BST gets an insert/delete, this list can be updated easily.

Q: How about the self-adjustment after an insert or delete?
%%A: I think this list is unaffected

With this list, in-order walk becomes really easy. is the first place I saw this simple technique.

BST: finding next greater node is O(1) #Kyle shows that in a std::map, the time complexity of in-order iterating every node is O(N) from lowest to highest.

Therefore, each step is O(1) amortized. In other words, finding next higher node in such a BST is a constant-time operation. confirms that std::next() is O(1) if the steps to advance is a single step. Even better, across all STL containers, iterator movement by N steps is O(N) except for random-access iterators, which are even faster, at O(1).

multimap implementation

Given a multimap of {name -> Acct}, here’s a practical question:

Q: how do you save two different Acct objects having the same name?

I would use a linked list to hold all the different Acct objects for a given name. The tree node would hold the linked list.

std::multimap CAN hold different values under the same key, but std::multimap has other constraints and may not be able to use my simple idea.


sorted_Array|_List ⇒ balanced_BST

Q1 (easy): Given an array where elements are sorted in ascending order, convert it to a height balanced BST, where the depth of the two subtrees of every node never differ by more than 1.

Q2 (medium): How about a sorted slist?

==== analysis

I feel this “easy” problem should be medium. Perhaps there are elegant solutions.

I need not care about the payloads in the array. I can assume each payload equals the subscript.

There are many balanced BSTs. Just need to find one

— idea 1: construct an sequence of positions to probe the array. Each probed value would be added to the tree. The tree would grow level by level, starting from root.

The sequence is similar to a BFT. This way of tree building ensures

  • only the lowest branch nodes can be incomplete.
  • at all branch levels, node count is 2^L

I think it’s challenging to ensure we don’t miss any position.

Observation — a segment of size 7 or shorter is easy to convert into a balanced subtree, to be attached to the final BST.

When a subarray (between two probed positions) is recognized as such a segment we can pass it to a simple routine that returns the “root” of the subtree, and attach it to the final BST. This design is visual and a clean solution to the “end-of-iteration” challenge.

The alternative solution would iterate until the segment sizes become 0 or 1. I think the coding interviewer probably prefers this solution as it is shorter but I prefer mine.

— idea 2

Note a complete binary tree can be efficiently represented as an array indexed from one (not zero).

— Idea 3 (elegant) dummy payload — For the slist without using extra storage, I can construct a balanced tree with dummy payload, and fill in the payload via in-order walk

All tree levels are full except the leaf level — guarantees height-balanced. We can leave the right-most leaf nodes missing — a so-called “complete” binary tree. This way the tree-construction is simple — build level by level, left to right.

— idea 4 STL insert() — For the slist, we can achieve O(N) by inserting each element in constant time using STL insert() with hint

— For the slist without using an array, I can get the size SZ. (then divide it by 2 until it becomes 7,6,5 or 4.) Find the highest 3 bits (of SZ) which represent an int t, where 4 <= t <= 7. Suppose that value is 6.

We are lucky if every leaf-subtree can be size 6. Then we just construct eight (or 4, or 16 …) such leaf-subtrees

If SZ doesn’t give us such a nice scenario, then some leaf-subtrees would be size 5.  Then my solution is to construct eight (or 4, or 16 …) leaf-trees of type AAAAA (size 6) then BBB (size 5).

Lucky or unlucky, the remaining nodes must number 2^K-1 like 3,7,15,31 etc.  We then construct the next level.

–For the slist without using an array, I can propose a O(N logN) recursive solution:
f(slist, sz){
locate the middle node of slist. Construct a tree with root node populated with this value.
Then cut the left segment as a separated slist1, compute its length as sz1. node1 = f(slist1,sz1); set node1 as the left child of root.
Repeat it for the right segment
return the root

us`lowest N natural nums,count BST#70%#Rahul

how do you make use of the previous results for N=2 to tackle N=3?
f(z) denotes count of unique BSTs consisting of the fist z natural number

For N=21, we can have 1 node on the left side, 19 nodes on the right side

  • for odd z, express it as z:=2y+1 where y > 0
    f(2y+1)=[ f(a=0)*f(2y) + f(a=1)f(2y-1) + f(2)f(2y-2)… +f(y-1)f(y+1) ]*2 + f(y)f(y)

Note a should increment from 0 up to y-1.

  • for even z, express it as z:=2y where y > 0
    f(2y)=[ f(0)*f(2y-1) + f(1)f(2y-2) + f(2)f(2y-3)… +f(y-1)f(y) ]*2

Let’s try this formula on f(2)…= 2 good
Let’s try this formula on f(3)…=5 y=1

if ‘1’ is root, then there are f(2) BSTs
if ‘3’ is root, then there are f(2) BSTs
if ‘2’ is root, then there are f(1) * f(1) BSTs
if ‘1’ is root, there are f(8) BSTs
if ‘9’ is root, there are f(8) BSTs
if ‘4’ is root, there are f(3) left-subtrees and f(5) right-subtrees, giving f(3)*f(5) BSTs

This is not coding challenge but math challenge.

Q2: output all BSTs. We need a way to represent (serialize) a BST?

CIV: I still prefer RBtree #Rahul

  • goal #1 in CIV — speedy completion of an optimal solution in terms of O().
  • j4 #1 — RBTree is more intuitive more natural to my problem solving, more than priorityQ and sorted vector. Given my #1 goal, I would go for my favorite tools.
  • j4 — if the interviewer gives a new requirement, my RBtree may become useful (51% chance) or become unusable (49% chance)
  • Drawback #1 of RBtree — not supported in python
  • Drawback  — array sorting can achieve O(N) using radix sort or counting sort, esp. in the contrived contexts of Leetcode problems.

Q: what if interviewer feels RBtree is overkill and over-complicated?
A: I would say overall bigO is not affected.
A: I would say RBTree supports future features such as delete and dynamic data set.  Realistic reasons are powerful arguments.

Q: what if interviewer gives a follow-up request to simplify the design?
A: if I already have an optimal solution, then yes I don’t mind replacing my RBTree

avoid lopsided BST

Self-balanced is not a theoretical nicety but an essential BST feature. Without it a BST could easily become lopsided and all operations will slow down.

If for any reason (I don’t know any) we can’t use a AVL or RBT, then we could randomize the input and insert them (without deletion) and get a fairly balanced BST.

intervalTree: classic RBTree to find overlapp`event

Notable detail — non-balancing BST won’t give logN performance, since the tree depth may degrade

Q1: given N event objects {start, end, id}, use a RBTree to support a O(logN) query(interval INT) that returns any one event that overlaps with INT, or NULL if none.

If the end time == start time in the input INT, then it becomes the focus today :

Q2: given N event objects {start, end, id}, use a RBTree to support a O(logN) query(timestamp ii) that returns any one event that’s ongoing at time ii, or NULL if none.

P312 [[intro to algorithms]] describes an elegant solution.

The text didn’t highlight — I think the end time of the input interval INT is … ignored. (Therefore, in my Q2, input ii is a single timestamp, not an event.) Precisely because of this conscious decision, the tree is sorted by event start time, and the additional payload (in each node N) is the subtree end time i.e. last end time of all events started before N (including N itself). By construction, N’s payload is equal or higher than N’s start time. (Note Our input ii can fall into gaps between the event intervals.)

Eg: suppose N starts at 2:22 and left-child payload says 3:33, then we know for sure there at least one ongoing even during the interval 2:22 to 3:33.  Not sure if this is useful insight.

The text illustrates and proves why this design enables a clean and simple binary search, i.e. after each decision, we can safely discard one of “my” two subtrees. Here’s my (possibly imperfect) recall:

Let’s suppose each time the current node/event is not “ongoing”. (If it is then we can return it 🙂  ) So here’s the gist of the algorithm:

Suppose i’m the “current node”, which is root node initially. Compare ii to my left child L’s payload (not my payload).

As defined, L’s payload is the highest end times of L + all its sub nodes. In other words, this payload is the highest end time of all events starting before me.

Note the algo won’t compare L’s start time with ii
Note the algo won’t compare my start time with ii, though the scenario analysis below considers it.

  • case 1: If payload is lower than ii, then we know all events on my left have ended before ii, so we can discard the left subtree completely. Only way to go is down the right subtree.
  • the other case (case 2): if payload is higher or equal to ii, then we know my left subtree contains some events (known as “candd” aka candidates) that will end after ii.
  • case 2a: suppose my start time is before ii, then by BST definition every candidate has started before ii. We are sure to find the “candd” among them.
  • case 2b: suppose my start time is after ii, then by BST definition, all right subtree events have not started. Right subtree can be discarded
  • In both cases 2a and 2b, we can discard right subtree.


priorityQ: 2 advantages over RBtree#O(1)add #Part2

RBTree O(1) insert is quite common in coding questions.

[1] Contrary to popular belief, RBTree mass insert (including mass-construction) is O(logN) rather than O(1) per node in the general case. However, see link above.

See lecture notes and SOF post on

killer app@RBTree #priorityQ,sortedVector

A common usage context is query on some data set after pre-processing . In these contexts, BST competes with sorted vector and priority queue.

  • Killer app against vector: incremental insert or live updates
  • Killer app against vector: if there’s even occasional delete(), then sorted vector would suffer
  • Killer app against vector: update on one item can be implemented as delete/reinsert. Vector requires binary search -> mid-stream insert
  • minor advantage over sorted vector: vector sorting requires swapping, so value-semantics is problematic
  • Killer app against priority queue: range query, approximate query,
  • Killer app against priority queue: both max() and min()
  • Application: if each BST node needs additional data. Binary heap doesn’t expose those nodes(?)

It’s important to remember the advantages of vector

  1. cache efficiency
  2. runtime malloc cost

Java, c++ and c# all provide self-balancing BST in their standard libraries, from the very beginning. In my projects, I use these containers on an everyday basis. However, after talking to a few friends I now agree that most coding problems don’t need self-balancing BST because

  1. These problems have no incremental insertion/deletion, so we can simply sort an array of items at O(N logN) as pre-processing
  2. In some cases, priorityQ is a viable alternative
  3. Python doesn’t have this container at all and all coding problems must support python.

versionTable^BST to support binary search #Indeed

My son’s exam grading table says
A: 85 and above
B: 62 and above
C: 50 and above

One solution to support efficient getGrade(score) is a deque of record {qualifyingScore, grade}. Binary search on the qualifyingScore field is O(logN)

Now a subtly different problem. Suppose you update your resume title periodically,

“developer” in version 1 or later
“SrDev” in version 5 or later
“Consultant” in version 9 or later
“Architect” in version 13 or later

This problem is “subtly” different because new versions always get higher version numbers.

I have always avoided the deque-table data structure, in favor of the more powerful RedBlack trees. Current conclusion — RBTree is still superior, but IFF we receive the table content (high volume) in sorted order, then deque-table is simpler at least conceptually. Since python and javascript don’t offer RBTree, many interviewers aren’t familiar with it.

  • prepend/append is the advantage of deque-table. However, RBTree insert-with-hint is amortized O(1), comparable to deque table.
  • mid-stream Insert in general is more efficient on RBTree because deque is O(N)
  • Delete is similarly more efficient in RBTree.
  • Lookup is all O(logN).

Delete is often simplified away in coding tests, but important in practice, to correct mistakes or prune overgrown data stores. Realistic Binary search systems seldom operate on “immutable” data store.

isPreorderBST(randomArray) #G4G

This was asked in a 2018 hacker rank interview, not as important as an on-site coding question. However, I see this question as a classic.

— G4G solution based on sorted stack has a tested solution but it’s too cryptic. I added instrumentation to help myself understand it. See my github code.

Insight — at any time, the stack is sorted with the top being the smallest value . If the new item is highest item so far, then stack is wiped clean !

The algo is fairly simple —

  • initialize stack with arr[0] i.e. root
  • for every new item xx, start looping
    • if xx falls below stack top or stack is empty, then push xx, otherwise pop and loop
  • so when would we fail. It relies on the “root” variable which holds the top of the stack if non-empty, but frozen (the real tree root) once stack wiped clean !

— My solution 2

My analysis — After we have seen some number of nodes, there’s exactly one possible tree we could construct, so let’s construct it.

Note during the construction, each new node has only one place to go (key insight) and after that we can check if it breaks pre-order. is my tested code, probably less elegant than the G4G solution, but I’m still proud of it.

I don’t think any solution (including the G4G) is really O(N). My solution is not inferior. It has a readability advantage. It’s longer but not necessarily slower.


RBTree range count #enum,auto

// demo range counting with lower_bound. I don't know any faster algorithm
// demo auto keyword
// demo enum
// demo upper_bound different from lower_bound!
#include <iostream>
#include <climits>
#include <set>
#include <assert.h>
using namespace std;

set<int> s;
//typedef set<int>::iterator It; // ---> auto is easier
enum Mode { NoMin, NoMax, BothGiven };

size_t countWithInclusiveLimits(int limit1, int limit2, Mode m = BothGiven){
  if (s.empty()) return 0;
  auto it  = s.begin();
  auto it2 = s.end(); //it2 is the node past the last wanted node.

  if (m != NoMin) it = s.lower_bound(limit1);
  if (m != NoMax){
    it2 = s.upper_bound(limit2);
    assert(*it2 != limit2 && "it2 initial value should be consistent with end()");

  size_t ret = 0;
  for (; it != it2; ++it){
    cout<<*it<<" ";
  cout<<" --> "<<ret<<endl;
  return ret;
int main(){
  for(int i=-4; i<=9; ++i) s.insert(i*10);
  countWithInclusiveLimits(11, 55);
  countWithInclusiveLimits(0, 50, NoMin);
  countWithInclusiveLimits(10, 0, NoMax);

bbg-FX: check if a binary tree is BST #no deep recursion

Binary search tree has an “ascending” property — any node in my left sub-tree are smaller than me.

Q1: given a binary tree, check if it’s a BST. (No performance optimization required.) Compile and run the program.

Suppose someone tells you that the lastSeen variable can start with a random (high) initial value, what kind of test tree would flush out the bug? My solution is below, but let’s Look at Q1b.

Q1b: what if the tree could be deep so you can’t use recursion?
A: use BFT. When we actually “print” each node, we check left/right child nodes.

I made a mistake with lastSeen initial value. There’s no “special” initial value to represent “uninitialized”. Therefore I added a boolean flag.

Contrary to the interviewer’s claim, local statics are automatically initialized to zero, but zero or any initial value is unsuitable (payload can be any large negative value), so we still need the flag.

#include <iostream>
#include <climits>
using namespace std;

struct Node {
    int data;
    Node *le, *ri;
    Node(int x, Node * left = NULL, Node * right = NULL) : data(x), le(left), ri(right){}
/*    5
  2       7
 1 a
Node _7(7);
Node _a(6);
Node _1(1);
Node _2(2, &_1, &_a);
Node _5(5, &_2, &_7); //root

bool isAscending = true; //first assume this tree is BST
void recur(Node * n){
//static int lastSeen; // I don't know what initial value can safely represent a special value

  //simulate a random but unlucky initial value, which can break us if without isFirstNodeDone
  static int lastSeen = INT_MAX; //simulate a random but unlucky initial value
  static bool isFirstNodeDone=false; //should initialize to false

  if (!n) return;
  if (n->le) recur(n->le);

  // used to be open(Node *):
  if (!isAscending) return; //check before opening any node
  cout<<"opening "<<n->data<<endl; if (!isFirstNodeDone){ isFirstNodeDone = true; }else if (lastSeen > n->data){
        return; //early return to save time
  lastSeen = n->data;

  if (n->ri) recur(n->ri);

int main(){
  cout<< (isAscending?"ok":"nok");

seek successor of a given node in a BST # no uplink

Input: root node and an arbitrary node A.

We can’t start from A because by moving left/right, we may not be able to locate the successor. So we start from root and will encounter A.

I think this is a simple, barebones in-order walk entering at root. We will encounter A, and the next node encountered would be A’s successor.

Note this barebones walk requires no uplink.

seek successor of a given node in a BST #root node unknown

EPI300 Q14.2.

I feel code is hard to write, as it’s hard to visualize or build the tree.

In-order walk must enter at root. If we only have the current node as the only input, then I assume there’s an uplink in each node. Here’s my algo:

Case 1: if i have Right child, then descend there and then descend Left all the way to a leaf node and return. Easy case.

Case 2: now we know i have no Right child. Am i a Left child? If yes then return my parent. Easy case.

Case 3: Now we know I am the Right child, without my own right child. This is the worst case.   My left child (if any) is irrelevant, so effectively I am a leaf node. A right leaf node. Solution: Move up step by step. After the first up-right move,  descend Left all the way.

For predecessor, I found this algo online, but it can’t move up so unable to support Case 2 and 3.

  1. /* Find the inorder predecessor of current */
  2. pre = current->left;
  3. while (pre->right != NULL && pre->right != current)
  4.    pre = pre->right;

Based on that, here’s my successor algo:

  1. /* Find the inorder sucessor of current */
  2. pre = current->right;
  3. while (pre->left != NULL && pre->left != current)
  4.    pre = pre->left;

priorityQ^RBtree, part 1

No expert here… Just a few pointers.

I feel binary heap is designed for a Subset of the “always-sorted” requirement on sorted trees. The subset is “yield the largest item on demand”.

Therefore, a binary heap is simpler (and faster) than a red-black tree.

As a solution, sorted data structures (like RB tree) are more in demand than priority queues (like heap). For example, the classic exchange order book is more likely a sorted list, though it can be a priority queue.

A binary heap hides all but the maximum node, so we don’t know how those nodes are physically stored.

A binary heap is a binary tree but not sorted, not a BST.

binary search tree

defining property: any node to my left is smaller or equal. left child = parent. Both children is allowed to be equal to parent — binary search tree is completely unbiased.

The defining property defined visually: For each node, draw a vertical line (long shadow) through the tree.
* Left descendants’ v-lines are on or inside my v-line.
* Right descendants’ v-lines are on or outside my v-line.

When nodes move up or down the three, their shadows shift. I think this may help us visualize a BST successor, deletion and insertion.

the same group of items can build 2 different bs-trees, with different tree heights. How? The same entries could arrive out of sequence, just like IP packets.

The deeper the tree, the more iterations for a tree walk, the slower the sort.

Worst case bst has a tree height equal to input size, with one branch at each node?