merge 2 binTrees by node position

Q (leetcode 617): https://leetcode.com/problems/merge-two-binary-trees/submissions/

==== Analysis

https://github.com/tiger40490/repo1/blob/py1/py/algo_tree/merge2Tree.py is a solution fully tested on leetcode.com but hard to run offline.

Insight — Challenge is implementation. Input and return type of DFT function are tricky but server as useful implementation techniques.

Labelled as easy, but pretty hard for me.

— Idea 1: BFT. When a node has any null child, put null into queue. Not so simple to pair up the two iterations

— Solution 2: DFT on both trees. Always move down in lock steps. When I notice a node in Tree A is missing child link that Tree B has, then I need to suspend the Tree A DFT?

My DFT function would have two parameters nodeInA and nodeInB. One of them could be null , but the null handling is messy.

Aha — I set the input parameters to to dummy objects, to avoid the excessive null check. In this problem, this technique is not absolutely necessary, but very useful in general

 

Advertisements

BST: post-order as serialization

Q: if I backscan a post-order output (unsorted sequence), is there only a single BST we can build?

Intuitively, I think this is like fwd scanning a pre-order output so yes.

–insights
If I keep a webcam at each tree node
* During a fwd scan of pre-order output, every node’s left child is born before right child.
* During a backscan of post-order output, every node’s right child is born before left child. During the actual post-order walk, my left child is always printed before my right child.

 

For a given BST, the pre-order output is unique; the post-order output is also unique. However,

Can two BSTs produce the same pre-order output? I think impossible
Can two BSTs produce the same post-order output? I think impossible

Like the pre-order, the post-order sequence is also a serialization but only for a BST.

count lower ints to my right #52%

https://leetcode.com/problems/count-of-smaller-numbers-after-self/ is labelled “hard”.

Q: given an integer array nums , return a new counts array, wherein counts[i] is the number of smaller elements to the right of nums[i]

====analysis

Order statistics tree , i.e. an augmented RBTree should make O(N logN).

One scan from right. Insert each node into this tree. Before inserting a node of value like 22, we will query the tree getRank(22).

Implementation wise, it’s hard to create a self-balancing BST from scratch. So I think an unbalanced BST might do.

Also, there might be some alternative solutions.

RBTree O(1)insert quite common ] coding IV

— for single-insert
https://en.cppreference.com/w/cpp/container/map/insert is precise saying that hinted insert is O(1).
— for mass insert
Special case — If we know the input sequence is pre-sorted and going to be “appended” to existing tree nodes, then each insert will always hit the end(). We can rely on hinted insert to achieve O(N) .

http://www.cplusplus.com/reference/map/map/insert/  is even more encouraging — ” If N elements are inserted, Nlog(size+N) in general, but linear in size+N if the elements are already sorted” so as long as pre-sorted, then we get O(N)

For 64-bit integer or float inputs, we can radix sort in O(N) then mass-insert in O(N).

— for construction from a pre-sorted sequence
https://en.cppreference.com/w/cpp/container/map/map confirms it’s O(N) if the input range is already sorted.

getByRank() in sorted matrix: priorityQ^RBTree

https://leetcode.com/problems/kth-smallest-element-in-a-sorted-matrix/

====analysis

recombinant binTree pyramid, where “water” flows east or south.

  • first level has one node .. lowest value. Add it to pq (i.e. priorityQ)
  • pop the pq and insert the two downstream nodes
  • total K pops, each pop is followed by up to 2 inserts

Heap will grow to up to K items, so each pop will be up to logK

Total O(K logK). To achieve this time complexity, we can also use a RBTree. The tree nodes can come from a pre-allocated array.

RBTree used inside java8 hashmap

https://en.wikipedia.org/wiki/Hash_table#Separate_chaining_with_other_structures is a language-neutral discussion.

https://digitalmars.com/d/archives//640.html#N803 shows an early implementation

they are implemented as a hashed array of binary trees. 
My experience with them is that they are very efficient.

— JEP 180 is the white paper to introduce a self-balanced tree as an alternative to linked list.

https://dzone.com/articles/hashmap-performance shows with one million entries in a HashMap a single lookup taken 20 CPU cycles, or less than 10 nanoseconds. Another benchmark test demonstrates O(logN) get(key) in java8, but O(N) in java7, as traditionally known.

If two hashCode() values are different but ended up in the same bucket (due to rehashing and bucketing), one is considered bigger and goes to the right. If hashCodes are identical (as in a contrived hashCode() implementation), HashMap hopes that the keys are Comparable, so that it can establish some order. This is not a requirement of HashMap keys.

If hashCodes are mostly identical (rare) and keys are not comparable, don’t expect any performance improvements in case of heavy hash collisions. Here’s my analysis of this last case:

  • if your Key.equals() is based on address and hashCode() is mostly the same, then the RBTree ordering can and should use address. You won’t be able to look up using a “clone” key.
  • if you have customized Key.hashCode() then you ought to customize equals(), but suppose you don’t implement Comparable, then you are allowed to lookup using a clone key. Since there’s no real ordering among the tree nodes, the only way to look up is running equals() on every node.

http://coding-geek.com/how-does-a-hashmap-work-in-java/#JAVA_8_improvements says

A given bucket contains both Node (linked list) and TreeNode (red-black tree). Oracle decided to use both data structures with the following rules:

  • If for a given index (bucket) in the inner table there are more than 8 nodes, the linked list is transformed into a red black tree
  • If for a given index (bucket) in the inner table there are less than 6 nodes, the tree is transformed into a linked list

With the self-balanced tree replacing the linked list, worst-case lookup, insert and delete are no longer O(N) but O(logN) guaranteed.

This technique, albeit new, is one of the best simple ideas I have ever seen. Why has nobody thought of it earlier?

RBTree: technical notes #Realtime

Red–black trees offer … worst-case guarantees, valuable in real-time applications. The Completely Fair Scheduler used in current Linux kernels and epoll system call implementation[19] uses red–black trees.

This valuable guarantee is valid only on always-balanced trees, but no need to be strictly balanced. In fact, AVL is more rigidly balanced but lacks this guarantee.

In contrast, hash table doesn’t offer worst-case guarantee in the face of hash collision. In fact, Java 8 HashMap uses RBTree in additional to linked list… see my blogpost RBTree used in java hashmap.

After an insert or delete, restoring the red-black properties requires a small number (O(log n) or amortized O(1)) of color changes (which are very quick in practice) and no more than three tree rotations (two for insertion). Although insert and delete operations are complicated logically, their times remain O(log n).

The AVL tree is another structure supporting O(log n) search, insertion, and removal. AVL trees can be colored red-black, thus are a subset of RB trees. Worst-case height is better than the worst-case height of RB trees, so AVL trees are more rigidly balanced. However, Mehlhorn & Sanders (2008) point out: “AVL trees do not support constant amortized deletion costs”, but red-black trees do.[25]

doubly-linked list as AuxDS for BST

I find it a very natural auxDS. Every time the BST gets an insert/delete, this list can be updated easily.

Q: How about the self-adjustment after an insert or delete?
%%A: I think this list is unaffected

With this list, in-order walk becomes really easy.

https://leetcode.com/problems/kth-smallest-element-in-a-bst/solution/ is the first place I saw this simple technique.

T.isLikeSubtree(S) #60%

Q (Leetcode 572): Given two non-empty binary trees s and t, check whether tree t has exactly the same structure and node values as a subtree of s. A subtree of s is a tree consists of a node in s and all of this node’s descendants. The tree s could also be considered as a subtree of itself.

====analysis
https://leetcode.com/problems/subtree-of-another-tree/solution/ relies (as “primary check”) on the payloads of  each tree node, but in some trees, all payloads are empty or all payloads are either True or False. In these cases, the comparison of payload is only usable as a secondary check. The primary check must be structural. See key in a tree node

The O(N+K) is plain wrong.

I guess the 2nd solution (and possibly 1st solution) would compare every node in S to the root of T. I think there are more efficient solutions using subtree size and subtree height as secondary checks – more reliable than payload check.

My solution below uses BFT + pre/post/in-order walk !

— Preliminary step: post-order walk to get subtree-size, subtree-height at each S node + T root. (I will skip other T nodes.). Suppose T is size 22 height 4. We will look for any node Y of size 22 and height 4 and a matching payload. This would eliminate lots of S nodes:

If T height is more than 2, then lots of low-level S nodes are eliminated.
If T height is 2 or 1, then T size would be at most 3. Most high-level S nodes are eliminated.

— Solution 1: For both T and S, We take in-order walk to assign incremental IDs, then take pre-order walk to produce an array of IDs that represent the tree structure.

Can We run a level-aware BST. Only one level need to be examined … wrong!

I think the in-order walk itself is what I need. Find any node Y in S that matches size+height+payload of T root. Suppose ID(Y)=44 but ID(T root) = 4, then simply shift down by 40 and do a linear scan. Must check payload, but not height/size.

 

reverse post-order mirrors pre-order #diagram+Dot

For a given binTree the pre-order sequence is mirrored by the reverse-post-order sequence.

“Reverse” := “right_child then left_child”

I had an “Aha” moment when reading P 389 [[discrete mathematics]]. Insightful illustrated path-tracer diagram showing that pre-order = ABCDEFGHIJ with root as first visited, and J=right most leaf node. The reverse-post-order = JIHGFEDCBA with root as the last visited, due to post-order. Note the DOT is on the Left for both reverse-post and pre.

(https://en.wikipedia.org/wiki/Tree_traversal#Depth-first_search has standard diagrams with a DOT on the left for pre-order; post-order has dot on the right of each node)

Note any post-order curve is less intuitive less visual (than pre-order) because the curve hits a root of subtree upon leaving ! The earlier encounters are touch-n-go. The dot can reinforce this key insight. The dot can help bridge the conceptual gap.

Similarly post-order sequence is mirrored by reverse-pre-order. For the same binTree above, reverse-pre-order = AFGHJIBDEC (not mentioned in the book). Post-order = CEDBIJHGFA, as printed in the middle of P 389. Note root is last visited due to post-order.

max-sum path up+down binTree #FB 60%

Q2 (Leetcode “hard” Q124): find max path sum, where a ” path ” has minimum one node A and can include A’s left child and/or A’s right child. No uplink available.

====analysis

I see two types of paths

  1. down the tree, starting at some upstream node A ending at a node in A’s subtree
  2. up-down the tree, starting at some A’s left subtree, ending somewhere in A’s right subtree.

For 1) https://bintanvictor.wordpress.com/wp-admin/post.php?post=23360&action=edit has my tested solution

For 2: post-order walk to update each node a (signed) max-path-sum-from-here. See https://bintanvictor.wordpress.com/wp-admin/post.php?post=31601&action=edit. The 2 values from left+right children can solve this sub-problem

Q: can we use the 2) solution for 1)? I think so

BST: finding next greater node is O(1) #Kyle

https://stackoverflow.com/questions/11779859/whats-the-time-complexity-of-iterating-through-a-stdset-stdmap shows that in a std::map, the time complexity of in-order iterating every node is O(N) from lowest to highest.

Therefore, each step is O(1) amortized. In other words, finding next higher node in such a BST is a constant-time operation.

https://en.cppreference.com/w/cpp/iterator/next confirms that std::next() is O(1) if the steps to advance is a single step. Even better, across all STL containers, iterator movement by N steps is O(N) except for random-access iterators, which are even faster, at O(1).

serialize binTree #funptr

struct Node{
  Node * left, right;
  int data;
}

… If you simply output as “data,leftId,rightId | data,leftId,rightId | …” then the graph structure i.e. link is lost. Therefore, I feel the stream need to identify each node: “Id,data,leftId,rightId|…” Next question is how to assign the id values. Easiest — use node address as a string id! Suppose there’s a treewalk algo walk(Node * root, void ( *callback ) (Node *)), i will call it like

walk(root, serialize1node); // where serialize1node is the callback.

Note the position of the asterisk:

  • It’s on the right of type name Node — read left to right .. pointer to Node
  • It’s on the left of variable name callback — read left to right .. callback is a pointer to …

https://github.com/tiger40490/repo1/blob/cpp1/cpp/binTree/serialize_bbg.cpp is my tested solution I feel the challenge is ECT and syntax, not the idea. —- elegant home-grown two-pass solution Assign incremental IDs to each node in an in-order walk. Then run pre-order walk to output each ID.

  • 🙂 applicable for any binTree, but 3-way trees  or higher
  • 🙂 No null node needed
  • 🙂 no space overhead. Instead of serializing address, I serialize my generated ID of each node, which are usually smaller and never bigger than a pointer. I think the serialized byte array is the smallest possible.
  • Key insight — from a given preorder output, there’s exactly one possible BST we construct.

—- solution: standard graph representations — AdjMatrix + edgeSet, applicable on any graph —- solution : BFT -> list including nulls —-idea from my friend Deepak: pre-order tree walk and output one line per node {payload, direction from root, direction from Level 1 ancestor, direction from Level 2 ancestor..}. The technique is reusable even though overall efficiency and complexity is not optimal. We don’t need to be optimal.

  • first node in the file is always the root
  • the output file always includes the higher level nodes before the lower-level nodes
  • delimiter (\n) needed between nodes
  • demerit — data bloat… too verbose if tree is sparse but 99 levels deep

—-If the tree is a BST, then this idea is based on the fact that a reading a series of unsorted nodes can construct a BSTree.

For this solution, we have no requirement on the payloads. They can be booleans.

first pass — in-order walk to build a hash table of {node address -> integer id}. If we treat the ID’s as node keys, then the tree is a BST.

2nd pass — BFT or pre-order walk the original tree. For each node, we write node ID’s to the file, without writing the links or addresses. File looks like T,5|T,4|T,6|T,2|… When we reconstruct the tree, we read each node from the file, and insert that node into our new tree, using the id value as key to decide where.

If the original tree nodes have keys, I will just treat it as payload, and use my assigned ID as key

multimap implementation

Given a multimap of {name -> Acct}, here’s a practical question:

Q: how do you save two different Acct objects having the same name?

I would use a linked list to hold all the different Acct objects for a given name. The tree node would hold the linked list.

std::multimap CAN hold different values under the same key, but std::multimap has other constraints and may not be able to use my simple idea.

 

sorted_Array|_List ⇒ balanced_BST

Q1 (easy): Given an array where elements are sorted in ascending order, convert it to a height balanced BST, where the depth of the two subtrees of every node never differ by more than 1.

Q2 (medium): How about a sorted slist?

==== analysis

I feel this “easy” problem should be medium. Perhaps there are elegant solutions.

I need not care about the payloads in the array. I can assume each payload equals the subscript.

There are many balanced BSTs. Just need to find one

— idea 1: construct an sequence of positions to probe the array. Each probed value would be added to the tree. The tree would grow level by level, starting from root.

The sequence is similar to a BFT. This way of tree building ensures

  • only the lowest branch nodes can be incomplete.
  • at all branch levels, node count is 2^L

I think it’s challenging to ensure we don’t miss any position.

Observation — a segment of size 7 or shorter is easy to convert into a balanced subtree, to be attached to the final BST.

When a subarray (between two probed positions) is recognized as such a segment we can pass it to a simple routine that returns the “root” of the subtree, and attach it to the final BST. This design is visual and a clean solution to the “end-of-iteration” challenge.

The alternative solution would iterate until the segment sizes become 0 or 1. I think the coding interviewer probably prefers this solution as it is shorter but I prefer mine.

— idea 2

Note a complete binary tree can be efficiently represented as an array indexed from one (not zero).

— Idea 3 (elegant) dummy payload — For the slist without using extra storage, I can construct a balanced tree with dummy payload, and fill in the payload via in-order walk

All tree levels are full except the leaf level — guarantees height-balanced. We can leave the right-most leaf nodes missing — a so-called “complete” binary tree. This way the tree-construction is simple — build level by level, left to right.

— idea 4 STL insert() — For the slist, we can achieve O(N) by inserting each element in constant time using STL insert() with hint

— For the slist without using an array, I can get the size SZ. (then divide it by 2 until it becomes 7,6,5 or 4.) Find the highest 3 bits (of SZ) which represent an int t, where 4 <= t <= 7. Suppose that value is 6.

We are lucky if every leaf-subtree can be size 6. Then we just construct eight (or 4, or 16 …) such leaf-subtrees

If SZ doesn’t give us such a nice scenario, then some leaf-subtrees would be size 5.  Then my solution is to construct eight (or 4, or 16 …) leaf-trees of type AAAAA (size 6) then BBB (size 5).

Lucky or unlucky, the remaining nodes must number 2^K-1 like 3,7,15,31 etc.  We then construct the next level.

–For the slist without using an array, I can propose a O(N logN) recursive solution:
f(slist, sz){
locate the middle node of slist. Construct a tree with root node populated with this value.
Then cut the left segment as a separated slist1, compute its length as sz1. node1 = f(slist1,sz1); set node1 as the left child of root.
Repeat it for the right segment
return the root
}

Preorder+Inorder list → binTree #50%

https://leetcode.com/problems/construct-binary-tree-from-preorder-and-inorder-traversal/ says — Q: Given preorder and inorder traversal of a tree, construct the binary tree. You may assume that duplicates do not exist in the tree.

For example, given

preorder = [3,9,20,15,7]
inorder = [9,3,15,20,7]

Return the following binary tree…

====analysis

The int values in the int arrays are meaningless and useless. I can improve things by assigning incremental ranks [1] to each node in the inorder list. Then use those ranks to rewrite the preorder list. After that, walk the preorder list:

  • first node is root
  • any subsequent node can be placed precisely down the tree.

I solved a similar problem of converting preorder list into BST — https://bintanvictor.wordpress.com/wp-admin/post.php?post=19713&action=edit

identical binTree

Q: Given two binary trees, write a function to check if they are the same or not. Two binary trees are considered the same if they are structurally identical and the nodes have the same key.

====analysis:

Look at hints? No need !

— solution 1: I can use the serialization solution and compare the two serialized strings in real time.

— solution 2: BFT but ensure each branch level has strictly 2^n items including nulls

Intuition — If both sides match up on every level, then identical

This solution works if each tree node has a fixed number of child nodes.

— solution 2b: a null node at level N will reduce Level N+1 nodes by two.

— solution 3: recursion
bool diff(node1, node2){
if diff(node1.left, node2.left) or diff(node1.right, node2,right): return false
}
There might be some implementation issues

— Idea 9: BFT but each queue item is {payload, childrenCount}

This solution doesn’t work for binTree as it can’t detect “null leftChild” vs “null rightChild”.

This solution works if each node can have arbitrary number of child nodes, but I don’t know how common this is.

us`lowest N natural nums,count BST#70%#Rahul

how do you make use of the previous results for N=2 to tackle N=3?
f(z) denotes count of unique BSTs consisting of the fist z natural number
f(0)=f(1)=1

For N=21, we can have 1 node on the left side, 19 nodes on the right side

  • for odd z, express it as z:=2y+1 where y > 0
    f(2y+1)=[ f(a=0)*f(2y) + f(a=1)f(2y-1) + f(2)f(2y-2)… +f(y-1)f(y+1) ]*2 + f(y)f(y)

Note a should increment from 0 up to y-1.

  • for even z, express it as z:=2y where y > 0
    f(2y)=[ f(0)*f(2y-1) + f(1)f(2y-2) + f(2)f(2y-3)… +f(y-1)f(y) ]*2

Let’s try this formula on f(2)…= 2 good
Let’s try this formula on f(3)…=5 y=1

–N=3:
if ‘1’ is root, then there are f(2) BSTs
if ‘3’ is root, then there are f(2) BSTs
if ‘2’ is root, then there are f(1) * f(1) BSTs
–N=9:
if ‘1’ is root, there are f(8) BSTs
if ‘9’ is root, there are f(8) BSTs
if ‘4’ is root, there are f(3) left-subtrees and f(5) right-subtrees, giving f(3)*f(5) BSTs

This is not coding challenge but math challenge.

Q2: output all BSTs. We need a way to represent (serialize) a BST?

CIV: I still prefer RBtree #Rahul

  • goal #1 in CIV — speedy completion of an optimal solution in terms of O().
  • j4 #1 — RBTree is more intuitive more natural to my problem solving, more than priorityQ and sorted vector. Given my #1 goal, I would go for my favorite tools.
  • j4 — if the interviewer gives a new requirement, my RBtree may become useful (51% chance) or become unusable (49% chance)
  • Drawback #1 of RBtree — not supported in python
  • Drawback  — array sorting can achieve O(N) using radix sort or counting sort, esp. in the contrived contexts of Leetcode problems.

Q: what if interviewer feels RBtree is overkill and over-complicated?
A: I would say overall bigO is not affected.
A: I would say RBTree supports future features such as delete and dynamic data set.  Realistic reasons are powerful arguments.

Q: what if interviewer gives a follow-up request to simplify the design?
A: if I already have an optimal solution, then yes I don’t mind replacing my RBTree

a standard representation of binTree

I think on Leetcode there’s now a standard representation of simple binTree. https://leetcode.com/problems/convert-sorted-array-to-binary-search-tree/ shows one example.

see also (de)serialize binary tree #funptr ECT

Let’s not spend too much time here. This representation is simple but not very visual. Can’t deal with recombinant trees. It is useful only for Binary trees not general graphs. Graphs need adjacency matrix or edgeSet.

— My own idea is probably similar:

Can we make do without id and design the serialization sequence to save/reconstruct the tree structure?
I can output layer by layer.  If 2nd layer has no missing node, then 3rd layer consists of exactly 4 nodes, using a sentinel value for any NUL.  If 2nd node is NUL like [A/NUL/C/D], then the next layer would have “2 child nodes for A, 2 child nodes for C, 2 child nodes for D”, where the 2 child nodes for A could be NUL/NUL

What if all integer values are valid values? Trivial problem. I could add a one-bit flag to each serialized payload.

The id/edgeList based solution is general purpose and efficient. The technique above are 雕虫小技. Venkat of OC said something similar about regex state machine. In the same vein, disjoint set is a general solution though many problems have simplified union-find solutions.

pre-order: Polish notation #any binTree

  • Pre-order walk can create a expression tree in Polish notation
  • Post-order walk can create a expression tree in reverse-Polish notation

https://en.wikipedia.org/wiki/Tree_traversal#Uses has a concise example.

This usage is fairly popular in my past coding interviews.

In fact, the wikipedia article goes on to say that pre/post order walk can create a “representation  of a binary tree” … see my blogposts for details.

avoid lopsided BST

Self-balanced is not a theoretical nicety but an essential BST feature. Without it a BST could easily become lopsided and all operations will slow down.

If for any reason (I don’t know any) we can’t use a AVL or RBT, then we could randomize the input and insert them (without deletion) and get a fairly balanced BST.

intervalTree: classic RBTree to find overlapp`event

Notable detail — non-balancing BST won’t give logN performance, since the tree depth may degrade

Q1: given N event objects {start, end, id}, use a RBTree to support a O(logN) query(interval INT) that returns any one event that overlaps with INT, or NULL if none.

If the end time == start time in the input INT, then it becomes the focus today :

Q2: given N event objects {start, end, id}, use a RBTree to support a O(logN) query(timestamp ii) that returns any one event that’s ongoing at time ii, or NULL if none.

P312 [[intro to algorithms]] describes an elegant solution.

The text didn’t highlight — I think the end time of the input interval INT is … ignored. (Therefore, in my Q2, input ii is a single timestamp, not an event.) Precisely because of this conscious decision, the tree is sorted by event start time, and the additional payload (in each node N) is the subtree end time i.e. last end time of all events started before N (including N itself). By construction, N’s payload is equal or higher than N’s start time. (Note Our input ii can fall into gaps between the event intervals.)

Eg: suppose N starts at 2:22 and left-child payload says 3:33, then we know for sure there at least one ongoing even during the interval 2:22 to 3:33.  Not sure if this is useful insight.

The text illustrates and proves why this design enables a clean and simple binary search, i.e. after each decision, we can safely discard one of “my” two subtrees. Here’s my (possibly imperfect) recall:

Let’s suppose each time the current node/event is not “ongoing”. (If it is then we can return it 🙂  ) So here’s the gist of the algorithm:

Suppose i’m the “current node”, which is root node initially. Compare ii to my left child L’s payload (not my payload).

As defined, L’s payload is the highest end times of L + all its sub nodes. In other words, this payload is the highest end time of all events starting before me.

Note the algo won’t compare L’s start time with ii
Note the algo won’t compare my start time with ii, though the scenario analysis below considers it.

  • case 1: If payload is lower than ii, then we know all events on my left have ended before ii, so we can discard the left subtree completely. Only way to go is down the right subtree.
  • the other case (case 2): if payload is higher or equal to ii, then we know my left subtree contains some events (known as “candd” aka candidates) that will end after ii.
  • case 2a: suppose my start time is before ii, then by BST definition every candidate has started before ii. We are sure to find the “candd” among them.
  • case 2b: suppose my start time is after ii, then by BST definition, all right subtree events have not started. Right subtree can be discarded
  • In both cases 2a and 2b, we can discard right subtree.

 

priorityQ: 2 advantages over RBtree#O(1)add #Part2

RBTree O(1) insert is quite common in coding questions.

[1] Contrary to popular belief, RBTree mass insert (including mass-construction) is O(logN) rather than O(1) per node in the general case. However, see link above.

See lecture notes https://courses.cs.washington.edu/courses/cse373/02au/lectures/lecture11l.pdf and SOF post on
https://stackoverflow.com/questions/6147242/heap-vs-binary-search-tree-bst

2 nodes] binTree-with-cycle: locate common ancestor

Q (Leetcode #236): given 2 valid nodes (and root) of a binary tree, find the lowest common ancestor. A node can be a (direct/indirect) descendant of itself. All values distinct. No uplink.

classic problem:)

Q2 (my own requirement): what if cycle is possible?

My idea — Just run a lazy-dft to find the two paths-from-root. On each path, if we detect a cycle we terminate that path. Before terminating any path, need to check if we hit both nodes, so after finding one node we must go all the way to the leaf node or the one of the 2 given node.

As soon as we find the 2 paths we terminate DFT.

IIF two CPUs are given, my dft will use two threads — one left to right; the other right to left. This will more quickly locate the 2 target nodes if they appear near extremities.

https://github.com/tiger40490/repo1/blob/cpp1/cpp/binTree/commonAncestor_Cycle.cpp is my self-tested code, not tested on Leetcode

max-sum path Down binTree #self-tested

Q1: Given a non-empty binary tree of signed integers, find the maximum path sum. For this problem, a path is defined as any sequence of nodes from any starting node to any node in the tree along the parent->child connections. The path must contain at least one node and does not need to go through the root. No uplink. No cycle.

Luckily, there’s no published solution for this modified leetcode problem 🙂

====analysis====

My solution — DFT. Along each root-to-leaf path, use the max-subarray (Kadane) algo and store maxSumEndingHere value in each node, for reuse.

Q: is there any duplicate work?
A: I hope not, thanks to memoization i.e. Node::meh field

Q: do we visit every path?
A: I think so.

I simplified the idea further in

https://github.com/tiger40490/repo1/blob/cpp1/cpp/algo_binTree/maxPathSum.cpp

Time complexity is .. O(V+E) = O(N), since I visit every node and follow each edge once only.

There might be algorithmically superior solutions on leetcode but I don’t want it to affect my joy, motivation and momentum.

isSymetric(root of a binary tree)

Leetcode published solution has a clever BFT solution, easier to implement.

–idea 1 (untested): BFT to list all nodes at a given level
For each node’s enqueue() (including the null nodes), record the path-from-root as a list. As a lighter alternative to this “list”, the path can degenerate to the last step, as a le/ri flag.

Now scan each level from both ends. The left item’s path should mirror the right item’s path.

(Before the scan. confirm the node count is even.)

–in-order dft
first scan records the paths to each leaf node. (Optionally, each path can includes east/west directions).
2nd scan does the same but always visits right child first.

The two outputs should match

killer app@RBTree #priorityQ,sortedVector

A common usage context is query on some data set after pre-processing . In these contexts, BST competes with sorted vector and priority queue.

  • Killer app against vector: incremental insert or live updates
  • Killer app against vector: if there’s even occasional delete(), then sorted vector would suffer
  • Killer app against vector: update on one item can be implemented as delete/reinsert. Vector requires binary search -> mid-stream insert
  • minor advantage over sorted vector: vector sorting requires swapping, so value-semantics is problematic
  • Killer app against priority queue: range query, approximate query,
  • Killer app against priority queue: both max() and min()
  • Application: if each BST node needs additional data. Binary heap doesn’t expose those nodes(?)

It’s important to remember the advantages of vector

  1. cache efficiency
  2. runtime malloc cost

Java, c++ and c# all provide self-balancing BST in their standard libraries, from the very beginning. In my projects, I use these containers on an everyday basis. However, after talking to a few friends I now agree that most coding problems don’t need self-balancing BST because

  1. These problems have no incremental insertion/deletion, so we can simply sort an array of items at O(N logN) as pre-processing
  2. In some cases, priorityQ is a viable alternative
  3. Python doesn’t have this container at all and all coding problems must support python.

versionTable^BST to support binary search #Indeed

My son’s exam grading table says
A: 85 and above
B: 62 and above
C: 50 and above

One solution to support efficient getGrade(score) is a deque of record {qualifyingScore, grade}. Binary search on the qualifyingScore field is O(logN)

Now a subtly different problem. Suppose you update your resume title periodically,

“developer” in version 1 or later
“SrDev” in version 5 or later
“Consultant” in version 9 or later
“Architect” in version 13 or later

This problem is “subtly” different because new versions always get higher version numbers.

I have always avoided the deque-table data structure, in favor of the more powerful RedBlack trees. Current conclusion — RBTree is still superior, but IFF we receive the table content (high volume) in sorted order, then deque-table is simpler at least conceptually. Since python and javascript don’t offer RBTree, many interviewers aren’t familiar with it.

  • prepend/append is the advantage of deque-table. However, RBTree insert-with-hint is amortized O(1), comparable to deque table.
  • mid-stream Insert in general is more efficient on RBTree because deque is O(N)
  • Delete is similarly more efficient in RBTree.
  • Lookup is all O(logN).

Delete is often simplified away in coding tests, but important in practice, to correct mistakes or prune overgrown data stores. Realistic Binary search systems seldom operate on “immutable” data store.

isPreorderBST(randomArray) #hackerrank

This was asked in a 2018 hacker rank interview, not as important as an on-site coding question. However, I see this question as a classic.

Should study some examples !

https://www.geeksforgeeks.org/check-if-a-given-array-can-represent-preorder-traversal-of-binary-search-tree/ has a tested solution but it’s too cryptic. I added instrumentation to help myself understand it. See my github code.

My analysis — After we have seen some number of nodes, there’s exactly one possible tree we could construct, so let’s construct it.

Note during the construction, each new node has only one place to go (key insight) and after that we can check if it breaks pre-order.

https://github.com/tiger40490/repo1/blob/py1/py/algo_tree/isPreOrderBST.py is my tested code, probably less elegant than the G4G solution, but I’m still proud of it.

I don’t think any solution (including the G4G) is really O(N). My solution is not inferior. It has a readability advantage. It’s longer but not necessarily slower.

 

detect cycle in binary tree

Q1: (Adapted from a real interview) Given a binary tree, where each node has zero or 1 or 2 child nodes but no “uplink” to parent, and given the root node, detect any cycle.

https://github.com/tiger40490/repo1/blob/cpp1/cpp/binTree/cycleInBinTree.cpp is my tested implementation of 1a.

Solution 1a: Three web sites all point at DFT with hashset. I guess the hashset is shrunk whenever we return from a recursion

Solution 1b: I will first write a classic BFT, where each node is processed by processNode(). In this case, my processNode(me) function will start another BFT to traverse my subtree. If I hit myself, then that’s a cycle. I think the additional space required is the size of the queue, which is up to O(N). The (non-recursive) call stack at any time is at most 2 (or 3?) levels.

bft solution 1c: Upon append, each node keeps a parent-node-set, represented by hash table. Too much memory needed

Q2: how about constant space i.e. don’t use O(N) additional space?

I think any binary tree traversal requires more than O(1) additional space, except Morris. But can Morris even work if there are cycles?

 

RBTree range count #enum,auto

// demo range counting with lower_bound. I don't know any faster algorithm
// demo auto keyword
// demo enum
// demo upper_bound different from lower_bound!
#include <iostream>
#include <climits>
#include <set>
#include <assert.h>
using namespace std;

set<int> s;
//typedef set<int>::iterator It; // ---> auto is easier
enum Mode { NoMin, NoMax, BothGiven };

size_t countWithInclusiveLimits(int limit1, int limit2, Mode m = BothGiven){
  if (s.empty()) return 0;
  auto it  = s.begin();
  auto it2 = s.end(); //it2 is the node past the last wanted node.

  if (m != NoMin) it = s.lower_bound(limit1);
  if (m != NoMax){
    it2 = s.upper_bound(limit2);
    assert(*it2 != limit2 && "it2 initial value should be consistent with end()");
  }

  size_t ret = 0;
  for (; it != it2; ++it){
    cout<<*it<<" ";
    ++ret;
  }
  cout<<" --> "<<ret<<endl;
  return ret;
}
int main(){
  for(int i=-4; i<=9; ++i) s.insert(i*10);
  countWithInclusiveLimits(11, 55);
  countWithInclusiveLimits(0, 50, NoMin);
  countWithInclusiveLimits(10, 0, NoMax);
}

bbg-FX: check if a binary tree is BST #no deep recursion

Binary search tree has an “ascending” property — any node in my left sub-tree are smaller than me.

Q1: given a binary tree, check if it’s a BST. (No performance optimization required.) Compile and run the program.

Suppose someone tells you that the lastSeen variable can start with a random (high) initial value, what kind of test tree would flush out the bug? My solution is below, but let’s Look at Q1b.

Q1b: what if the tree could be deep so you can’t use recursion?
A: use BFT. When we actually “print” each node, we check left/right child nodes.


I made a mistake with lastSeen initial value. There’s no “special” initial value to represent “uninitialized”. Therefore I added a boolean flag.

Contrary to the interviewer’s claim, local statics are automatically initialized to zerohttps://stackoverflow.com/questions/1597405/what-happens-to-a-declared-uninitialized-variable-in-c-does-it-have-a-value, but zero or any initial value is unsuitable (payload can be any large negative value), so we still need the flag.

#include <iostream>
#include <climits>
using namespace std;

struct Node {
    int data;
    Node *le, *ri;
    Node(int x, Node * left = NULL, Node * right = NULL) : data(x), le(left), ri(right){}
};
/*    5
  2       7
 1 a
*/
Node _7(7);
Node _a(6);
Node _1(1);
Node _2(2, &_1, &_a);
Node _5(5, &_2, &_7); //root

bool isAscending = true; //first assume this tree is BST
void recur(Node * n){
//static int lastSeen; // I don't know what initial value can safely represent a special value

  //simulate a random but unlucky initial value, which can break us if without isFirstNodeDone
  static int lastSeen = INT_MAX; //simulate a random but unlucky initial value
  static bool isFirstNodeDone=false; //should initialize to false

  if (!n) return;
  if (n->le) recur(n->le);

  // used to be open(Node *):
  if (!isAscending) return; //check before opening any node
  cout<<"opening "<<n->data<<endl; if (!isFirstNodeDone){ isFirstNodeDone = true; }else if (lastSeen > n->data){
        isAscending=false;
        return; //early return to save time
  }
  lastSeen = n->data;

  if (n->ri) recur(n->ri);
}

int main(){
  recur(&_5);
  cout<< (isAscending?"ok":"nok");
}

seek successor of a given node in a BST # no uplink

Input: root node and an arbitrary node A.

We can’t start from A because by moving left/right, we may not be able to locate the successor. So we start from root and will encounter A.

I think this is a simple, barebones in-order walk entering at root. We will encounter A, and the next node encountered would be A’s successor.

Note this barebones walk requires no uplink.

seek successor of a given node in a BST #root node unknown

EPI300 Q14.2.

I feel code is hard to write, as it’s hard to visualize or build the tree.

In-order walk must enter at root. If we only have the current node as the only input, then I assume there’s an uplink in each node. Here’s my algo:

Case 1: if i have Right child, then descend there and then descend Left all the way to a leaf node and return. Easy case.

Case 2: now we know i have no Right child. Am i a Left child? If yes then return my parent. Easy case.

Case 3: Now we know I am the Right child, without my own right child. This is the worst case.   My left child (if any) is irrelevant, so effectively I am a leaf node. A right leaf node. Solution: Move up step by step. After the first up-right move,  descend Left all the way.

For predecessor, I found this algo online, but it can’t move up so unable to support Case 2 and 3.

  1. /* Find the inorder predecessor of current */
  2. pre = current->left;
  3. while (pre->right != NULL && pre->right != current)
  4.    pre = pre->right;

Based on that, here’s my successor algo:

  1. /* Find the inorder sucessor of current */
  2. pre = current->right;
  3. while (pre->left != NULL && pre->left != current)
  4.    pre = pre->left;

binary tree in-order walk : notes

This is a favorite algo interview topic.

  • I believe in-order walk will print a binary search tree in left-to-right order.
  • De-confuse — visiting is not same as printing. I believe we need to visit a node P to access its left child, but we must not print P so early!
  • Binary tree is the only thing that uses in-order
  • De-confuse — “node.parent” is not available. Singly-linked tree, not doubly-linked.

Let’s write some python code to

Morris in-order walk]O(N) #O(1)space

I feel this is too hard and unlikely to come up in coding interviews. Also, most trees have up-links, so the no-uplink constraint is artificial and unrealistic. In reality, Morris complexity is usually unnecessary.

–threaded binary tree is the basis of Morris

https://en.wikipedia.org/wiki/Threaded_binary_tree

The Morris algo need to construct the threaded BST, walk and remove the extra links, all without recursion.

  • Tip: For my ascending walk, I only add right-up thread link to bring me to my ancestors. I don’t need any leftward thread link.
  • Tip: All the thread links point upward
  • How many right-up links? Let’s define two clubs Honey and White.
    • every node HH having a left child will get a incoming right-up link pointing to it! After printing HH’s left subtree, this link is used to revisit HH.
    • Except the very last node, every node WW without a right child needs to add a right-up link, so we don’t get stuck at WW.
    • If the Honey club has H members, and White club has W members, then I think we create H right-up links. W +1 = H
    • A node can be both Honey and White i.e. it has a right-up link and is also target of a right-up link.
  • Tip: the sequence of Honey nodes to try has to be top-down. We must create the uplink to every higher Honey node first, as insurance that we can always come back.  While a higher Honey node is locked down and we search for its predecessor in its left subtree, we ignore any lower Honey node
    • First time we hit a Honey node, I’m sure it is not uplinked. After we get it uplinked, we must descend Left.
    • 2nd time we hit a Honey node, I believe its left subtree is completely fixed, so if we descend left again, we will go to a dead end. We must move right, either down or up.
  • Tip: should really use a few variables instead of “cur”. This “cur” is like a “temp1” variable used in first pass, then temp2 variable in 2nd pass etc. These variables are not related at all.

https://github.com/tiger40490/repo1/blob/cpp1/cpp1/binTree/MorrisInOrder.cpp is my c++ implementation with a concise/cryptic implementation and an “unfolded” elaborate implementation

priorityQ^RBtree, part 1

No expert here… Just a few pointers.

I feel binary heap is designed for a Subset of the “always-sorted” requirement on sorted trees. The subset is “yield the largest item on demand”.

Therefore, a binary heap is simpler (and faster) than a red-black tree.

As a solution, sorted data structures (like RB tree) are more in demand than priority queues (like heap). For example, the classic exchange order book is more likely a sorted list, though it can be a priority queue.

A binary heap hides all but the maximum node, so we don’t know how those nodes are physically stored.

A binary heap is a binary tree but not sorted, not a BST.

binary search tree

defining property: any node to my left is smaller or equal. left child = parent. Both children is allowed to be equal to parent — binary search tree is completely unbiased.

The defining property defined visually: For each node, draw a vertical line (long shadow) through the tree.
* Left descendants’ v-lines are on or inside my v-line.
* Right descendants’ v-lines are on or outside my v-line.

When nodes move up or down the three, their shadows shift. I think this may help us visualize a BST successor, deletion and insertion.

the same group of items can build 2 different bs-trees, with different tree heights. How? The same entries could arrive out of sequence, just like IP packets.

The deeper the tree, the more iterations for a tree walk, the slower the sort.

Worst case bst has a tree height equal to input size, with one branch at each node?