T.is_subtree(S) #60%

Q (Leetcode 572): Given two non-empty binary trees s and t, check whether tree t has exactly the same structure and node values as a subtree of s. A subtree of s is a tree consists of a node in s and all of this node’s descendants. The tree s could also be considered as a subtree of itself.

====analysis
https://leetcode.com/problems/subtree-of-another-tree/solution/ relies (as “primary check”) on the payloads of  each tree node, but in some trees, all payloads are empty or all payloads are either True or False. In these cases, the comparison of payload is only usable as a secondary check. The primary check must be structural.

The O(N+K) is plain wrong.

I guess the 2nd solution (and possibly 1st solution) would compare every node in S to the root of T. I think there are more efficient solutions using subtree size and subtree height as secondary checks – more reliable than payload check.

My solution below uses BFT + pre/post/in-order walk !

— Preliminary step: post-order walk to get subtree-size, subtree-height at each S node + T root. (I will skip other T nodes.). Suppose T is size 22 height 4. We will look for any node Y of size 22 and height 4 and a matching payload. This would eliminate lots of S nodes:

If T height is more than 2, then lots of low-level S nodes are eliminated.
If T height is 2 or 1, then T size would be at most 3. Most high-level S nodes are eliminated.

— Solution 1: For both T and S, We take in-order walk to assign incremental IDs, then take pre-order walk to produce an array of IDs that represent the tree structure.

Can We run a level-aware BST. Only one level need to be examined … wrong!

I think the in-order walk itself is what I need. Find any node Y in S that matches size+height+payload of T root. Suppose ID(Y)=44 but ID(T root) = 4, then simply shift down by 40 and do a linear scan. Must check payload, but not height/size.

 

Advertisements

## BST images curated

Some useful BST diagrams up to 5 levels.

 

4-level complete tree 5-level irregular
Image result for example binary search tree diagrams Related image
Image result for example binary search tree diagrams Image result for example binary search tree diagrams
5-level skewed 4-level irregular
4-level 4-level
Image result for example binary search tree diagrams Image result for example binary search tree diagrams
Image result for skewed bst diagrams Image result for skewed bst diagrams
5-level skewed 5-level skewed
4-level small
Image result for skewed bst diagrams

 

Image result for example binary search tree diagrams

reverse post-order mirrors pre-order

For a given binTree the pre-order (output) sequence is mirrored by the reverse-post-order sequence.

“Reverse” == “right_child then left_child”

P 389 [[discrete mathematics]] has an insightful illustrated path-tracer diagram showing that pre-order = ABCDEFGHIJ with root as first visited, and J=right most leaf node. The reverse-post-order = JIHGFEDCBA with root as the last visited, due to post-order.

Note any post-order curve is less intuitive less visual (than pre-order) because the curve hits a root of subtree upon leaving ! The earlier encounters are touch-n-go.

Similarly post-order sequence is mirrored by reverse-pre-order. For the same binTree above, reverse-pre-order = AFGHJIBDEC (not mentioned in the book). Post-order = CEDBIJHGFA, as printed in the middle of P 389. Note root is last visited due to post-order.

print height@subtree at every node #post-order

For these problems, elegant implementation is expected.

Note in-order is for binary tree, but post-order is for any tree.

== Q: given Any tree without cycle, print the size of subtree at every node

I think post-order walk is usable.

== Q: given Any tree without cycle, print the height of subtree at every node. Also print the node address just for identification

Intuition — dft post-order walk. Need to become more familiar with the detailed steps. https://www.techiedelight.com/calculate-height-binary-tree-iterative-recursive/ has details.

== Q (Leetcode): Given a binary tree, determine if it is height-balanced — a binary tree in which the depth of the two subtrees of every node never differ by more than 1.

I can try it on Leetcode .. no need to set up test harness myself but I might be frustrated

Intuition– post-order walk. A simple recursive algo — for any node, find the its subtree height including itself and return it. Before returning, also compare the left vs right subtree heights

BST: finding next greater node is O(1) #Kyle

https://stackoverflow.com/questions/11779859/whats-the-time-complexity-of-iterating-through-a-stdset-stdmap shows that in a std::map, the time complexity of in-order iterating every node is O(N) from lowest to highest.

Therefore, each step is O(1) amortized. In other words, finding next higher node in such a BST is a constant-time operation.

https://en.cppreference.com/w/cpp/iterator/next confirms that std::next() is O(1) if the steps to advance is a single step. Even better, across all STL containers, iterator movement by N steps is O(N) except for random-access iterators, which are even faster, at O(1).

serialize binTree #funptr

struct Node{
  Node * left, right;
  int data;
}

… If you simply output as “data,leftId,rightId | data,leftId,rightId | …” then the graph structure i.e. link is lost. Therefore, I feel the stream need to identify each node: “Id,data,leftId,rightId|…
Next question is how to assign the id values. Easiest — use node address as a string id!

Suppose there’s a treewalk algo walk(Node * root, void ( *callback ) (Node *)), i will call it like

walk(root, serialize1node); // where serialize1node is the callback.

Note the position of the asterisk:

  • It’s on the right of type name Node — read left to right .. pointer to Node
  • It’s on the left of variable name callback — read left to right .. callback is a pointer to …

https://github.com/tiger40490/repo1/blob/cpp1/cpp/binTree/serialize_bbg.cpp is my tested solution

I feel the challenge is ECT and syntax, not the idea.

—- elegant home-grown solution : two-pass

Assign incremental IDs to each node in an in-order walk. Then run pre-order walk to output each ID.

  • 🙂 applicable for any binTree
  • 🙂 No null node needed
  • 🙂 no space overhead. Instead of serializing address, I serialize the ID of each node, which are usually smaller and never bigger than a pointer. I think the serialized byte array is the smallest possible.
  • Key insight — from a given preorder output, there’s exactly one possible BST we construct.

—- solution: standard graph representations — AdjMatrix + edgeSet 

—- solution : BFT -> list including nulls

—-idea from my friend Deepak: pre-order tree walk and output one line per node {payload, direction from root, direction from Level 1 ancestor, direction from Level 2 ancestor..}. The technique is reusable even though overall efficiency and complexity is not optimal. We don’t need to be optimal.

  • first node in the file is always the root
  • the output file always includes the higher level nodes before the lower-level nodes
  • delimiter (\n) needed between nodes
  • demerit — data bloat… too verbose if tree is sparse but 99 levels deep

—-If the tree is a BST, then this idea is based on the fact that a reading a series of unsorted nodes can construct a BSTree. For this solution, we have no requirement on the payload values. They can be booleans.

first pass — in-order walk to build a hash table of {node address -> integer id}. If we treat the Id’s as payload, then the tree is a BST.

2nd pass — BFT the original tree. For each node, we write original payload, id to the file, without writing the links or addresses. File looks like T,5|T,4|T,6|T,2|…

When we reconstruct the tree, we read each node from the file, and insert that node into our new tree, using the id value to decide where.

sorted_Array|_List ⇒ balanced_BST

Q1 (easy): Given an array where elements are sorted in ascending order, convert it to a height balanced BST, where the depth of the two subtrees of every node never differ by more than 1.

Q2 (medium): How about a sorted slist?

==== analysis

I feel this “easy” problem should be medium. Perhaps there are elegant solutions.

I need not care about the payloads in the array. I can assume each payload equals the subscript.

There are many balanced BSTs. Just need to find one

— idea 1: construct an sequence of positions to probe the array. Each probed value would be added to the tree. The tree would grow level by level, starting from root.

The sequence is similar to a BFT. This way of tree building ensures

  • only the lowest branch nodes can be incomplete.
  • at all branch levels, node count is 2^L

I think it’s challenging to ensure we don’t miss any position.

Observation — a segment of size 7 or shorter is easy to convert into a balanced subtree, to be attached to the final BST.

When a subarray (between two probed positions) is recognized as such a segment we can pass it to a simple routine that returns the “root” of the subtree, and attach it to the final BST. This design is visual and a clean solution to the “end-of-iteration” challenge.

The alternative solution would iterate until the segment sizes become 0 or 1. I think the coding interviewer probably prefers this solution as it is shorter but I prefer mine.

— idea 2

Note a complete binary tree can be efficiently represented as an array indexed from one (not zero).

— Idea 3 (elegant) dummy payload — For the slist without using extra storage, I can construct a balanced tree with dummy payload, and fill in the payload via in-order walk

All tree levels are full except the leaf level — guarantees height-balanced. We can leave the right-most leaf nodes missing — a so-called “complete” binary tree. This way the tree-construction is simple — build level by level, left to right.

— idea 4 STL insert() — For the slist, we can achieve O(N) by inserting each element in constant time using STL insert() with hint

— For the slist without using an array, I can get the size SZ. (then divide it by 2 until it becomes 7,6,5 or 4.) Find the highest 3 bits (of SZ) which represent an int t, where 4 <= t <= 7. Suppose that value is 6.

We are lucky if every leaf-subtree can be size 6. Then we just construct eight (or 4, or 16 …) such leaf-subtrees

If SZ doesn’t give us such a nice scenario, then some leaf-subtrees would be size 5.  Then my solution is to construct eight (or 4, or 16 …) leaf-trees of type AAAAA (size 6) then BBB (size 5).

Lucky or unlucky, the remaining nodes must number 2^K-1 like 3,7,15,31 etc.  We then construct the next level.

–For the slist without using an array, I can propose a O(N logN) recursive solution:
f(slist, sz){
locate the middle node of slist. Construct a tree with root node populated with this value.
Then cut the left segment as a separated slist1, compute its length as sz1. node1 = f(slist1,sz1); set node1 as the left child of root.
Repeat it for the right segment
return the root
}

Preorder+Inorder list → binTree #50%

https://leetcode.com/problems/construct-binary-tree-from-preorder-and-inorder-traversal/ says — Q: Given preorder and inorder traversal of a tree, construct the binary tree. You may assume that duplicates do not exist in the tree.

For example, given

preorder = [3,9,20,15,7]
inorder = [9,3,15,20,7]

Return the following binary tree…

====analysis

The int values in the int arrays are meaningless and useless. I can improve things by assigning incremental ranks [1] to each node in the inorder list. Then use those ranks to rewrite the preorder list. After that, walk the preorder list:

  • first node is root
  • any subsequent node can be placed precisely down the tree.

I solved a similar problem of converting preorder list into BST — https://bintanvictor.wordpress.com/wp-admin/post.php?post=19713&action=edit

identical binTree

Q: Given two binary trees, write a function to check if they are the same or not. Two binary trees are considered the same if they are structurally identical and the nodes have the same value.

====analysis:

Look at hints? No need !

— solution 1: I can use the serialization solution and compare the two serialized strings in real time.

— solution 2: BFT but ensure each branch level has strictly 2^n items including nulls

Intuition — If both sides match up on every level, then identical

This solution works if each tree node has a fixed number of child nodes.

— solution 2b: a null node at level N will reduce Level N+1 nodes by two.

— solution 3: recursion
bool diff(node1, node2){
if diff(node1.left, node2.left) or diff(node1.right, node2,right): return false
}
There might be some implementation issues

— Idea 9: BFT but each queue item is {payload, childrenCount}

This solution doesn’t work for binTree as it can’t detect “null leftChild” vs “null rightChild”.

This solution works if each node can have arbitrary number of child nodes, but I don’t know how common this is.

us`lowest N natural nums,count BST#70%#Rahul

how do you make use of the previous results for N=2 to tackle N=3?
f(z) denotes count of unique BSTs consisting of the fist z natural number
f(0)=f(1)=1

* for odd z, express it as z:=2y+1 where y > 0
f(2y+1)=[ f(a=0)*f(2y) + f(a=1)f(2y-1) + f(2)f(2y-2)… +f(y-1)f(y+1) ]*2 + f(y)f(y)

Note a should increment from 0 up to y-1.

* for even z, express it as z:=2y where y > 0
f(2y)=[ f(0)*f(2y-1) + f(1)f(2y-2) + f(2)f(2y-3)… +f(y-1)f(y) ]*2

Let’s try this formula on f(2)…= 2 good
Let’s try this formula on f(3)…=5 y=1

–N=3:
if 1 is root, then there are f(2) BSTs
if 3 is root, then there are f(2) BSTs
if 2 is root, then there are f(1) * f(1) BSTs
–N=9:
if 1 is root, there are f(8) BSTs
if 9 is root, there are f(8) BSTs
if 4 is root, there are f(3) left-subtrees and f(5) right-subtrees, giving f(3)*f(5) BSTs

This is not coding challenge but math challenge.

Q2: output all BSTs. We need a way to represent (serialize) a BST?

a standard representation of binTree

I think on Leetcode there’s now a standard representation of simple binTree. https://leetcode.com/problems/convert-sorted-array-to-binary-search-tree/ shows one example.

see also (de)serialize binary tree #funptr ECT

Let’s not spend too much time here. This representation is simple but not very visual. Can’t deal with recombinant trees. It is useful only for Binary trees not general graphs. Graphs need adjacency matrix or edgeSet.

— My own idea is probably similar:

Can we make do without id and design the serialization sequence to save/reconstruct the tree structure?
I can output layer by layer.  If 2nd layer has no missing node, then 3rd layer consists of exactly 4 nodes, using a sentinel value for any NUL.  If 2nd node is NUL like [A/NUL/C/D], then the next layer would have “2 child nodes for A, 2 child nodes for C, 2 child nodes for D”, where the 2 child nodes for A could be NUL/NUL

What if all integer values are valid values? Trivial problem. I could add a one-bit flag to each serialized payload.

The id/edgeList based solution is general purpose and efficient. The technique above are 雕虫小技. Venkat of OC said something similar about regex state machine. In the same vein, disjoint set is a general solution though many problems have simplified union-find solutions.

simple tree applied to DP++

This is part of my thick->thin effort, to connect the dots and pattern-recognition. May or may not be highly effective, but no harm trying

By my own definition, A “simple tree” is non-recombinant and cycle-free, and every node has one uplink only. I find these trees very visual. They often help me visualize tough Dynamic Programming and other algorithms:

  • decision tree
  • (Rahul pointed out) DP algos are often recursive, and recursive algos often have a call-tree structure
  • recursive call within each loop iteration — this powerful algo can often be represented as a call-tree or decision-tree
  • backtracking — often uses a tree
  • generating all solutions (paths, formulas..) — often uses a tree
    • paths-from-root — each path often maps to a solution
    • eg: punctuation — to print out all sentences
    • eg: AQR factorization problem may be solvable using the comboSum algo.

parent/child pairs→tree algos #Indeed

As a speed-coding test, this problem requires you to apply common computer science constructs to a realistic problem, and then challenges you

“How many seconds do you need to build a working directed graph from raw data, and run BFT/DFT on it?”

45 minutes given. Target is to complete first 2 questions with working code. Can some candidates complete all 3 questions? I guess so.

Q: You are given some raw data like

parent_child_pairs = [ (1, 3), (2, 3), (3, 6), (5, 6), (5, 7), (4, 5), (4, 8), (8, 10), (11,2) ]

Suppose we have some input data describing a graph of relationships between parents and children over multiple generations. The data is formatted as a list of (parent, child) pairs, where each individual is assigned a unique integer identifier.

For example, in this diagram, 3 is a child of 1 and 2, and 5 is a child of 4:

  11
   \
1   2   4
 \ /   / \
  3   5   8
   \ / \   \
    6   7   10

Q1: write a function to output all individuals having no parents(like 1 and 4) in a list, and another list of individuals having a single parent (like 8 and 7)

Q2: write a bool function to determine if two named individuals have any common ancestor. 3 and 6 yes; 3 and 1 no!

I wrote a DFT solution .. https://github.com/tiger40490/repo1/blob/py1/py/tree/commonAncestor_Indeed.py Not very efficient but I really should care less about that since the real challenge is .. timely completion. I was not used to writing DFT on the spot within minutes but I hacked it together under ticking clock, first time in my career !

To find if two sets intersect, I was forced to make a quick judgment call to write my own loop.

  • I didn’t know if there’s a simple and reliable solution online
  • i didn’t know how much effort is required to search online and understand it
  • i didn’t know how much effort is required to adapt standard solution to suit my needs
  • My own loop gives more legwork but more control if requirements turns out to be non-standard.

Q3: (original wording) Write a function that, for a given individual in our dataset, returns their earliest known ancestor — the one at the farthest distance from the input individual. If there is more than one ancestor tied for “earliest”, return any one of them. If the input individual has no parents, the function should return null (or -1).

Sample input and output:

findEarliestAncestor(parentChildPairs, 8) => 4
findEarliestAncestor(parentChildPairs, 7) => 4
findEarliestAncestor(parentChildPairs, 6) => 11
findEarliestAncestor(parentChildPairs, 1) => null or -1

avoid lopsided BST

Self-balanced is not a theoretical nicety but an essential BST feature. Without it a BST could easily become lopsided and all operations will slow down.

If for any reason (I don’t know any) we can’t use a AVL or RBT, then we could randomize the input and insert them (without deletion) and get a fairly balanced BST.

intervalTree: classic RBTree to find overlapp`event

Notable detail — non-balancing BST won’t give logN performance, since the tree depth may degrade

Q1: given N event objects {start, end, id}, use a RBTree to support a O(logN) query(interval INT) that returns any one event that overlaps with INT, or NULL if none.

If the end time == start time in the input INT, then it becomes the focus today :

Q2: given N event objects {start, end, id}, use a RBTree to support a O(logN) query(timestamp ii) that returns any one event that’s ongoing at time ii, or NULL if none.

P312 [[intro to algorithms]] describes an elegant solution.

The text didn’t highlight — I think the end time of the input interval INT is … ignored. (Therefore, in my Q2, input ii is a single timestamp, not an event.) Precisely because of this conscious decision, the tree is sorted by event start time, and the additional payload (in each node N) is the subtree end time i.e. last end time of all events started before N (including N itself). By construction, N’s payload is equal or higher than N’s start time. (Note Our input ii can fall into gaps between the event intervals.)

Eg: suppose N starts at 2:22 and payload says 3:33, then at we know for sure there at least one ongoing even during the interval 2:22 to 3:33.  Not sure if this is useful insight.

The text illustrates and proves why this design enables a clean and simple binary search, i.e. after each decision, we can safely discard one of “my” two subtrees. Here’s my (possibly imperfect) recall:

Let’s suppose each time the current node/event is not “ongoing”. (If it is then we can return it 🙂 So here’s the gist of the algorithm:

Suppose i’m the “current node”, which is root node initially. Compare ii to my left child L’s payload.

As defined, this payload is the highest end times of L + all its sub nodes. In other words, this payload is the highest end time of all events starting before me.

Note the algo won’t compare L’s start time with ii
Note the algo won’t compare my start time with ii, though the scenario analysis below considers it.

  • case 1: If payload is lower than ii, then we know all events on my left have ended before ii, so we can discard the left subtree completely. Only way to go is down the right subtree.
  • the other case (case 0): if payload is higher or equal to ii, then we know my left subtree contains some events (known as “candd” aka candidates) that will end after ii.
  • case 0a: suppose my start time is before ii, then by BST definition every candidate has started before ii. We can pick any candidate to return.
  • case 0b: suppose my start time is after ii, then by BST definition, all right subtree events have not started. Right subtree can be discarded
  • In both cases 0a and 0b, we can discard right subtree.

 

priorityQ: 2 advantages over RBtree#O(1)add #Part2

[1] Contrary to popular belief, RBTree mass insert (including mass-construction) is O(logN) rather than O(1) per node in the general case. However, it is O(1) if input is presorted.

See lecture notes https://courses.cs.washington.edu/courses/cse373/02au/lectures/lecture11l.pdf and SOF post on
https://stackoverflow.com/questions/6147242/heap-vs-binary-search-tree-bst

2 nodes] binTree-with-cycle: locate common ancestor

Q (Leetcode #236): given 2 valid nodes (and root) of a binary tree, find the lowest common ancestor. A node can be a (direct/indirect) descendant of itself. All values distinct. No uplink.

classic problem:)

Q2 (my own requirement): what if cycle is possible?

My idea — Just run a lazy-dft to find the two paths-from-root. On each path, if we detect a cycle we terminate that path. Before terminating any path, need to check if we hit both nodes, so after finding one node we must go all the way to the leaf node or the one of the 2 given node.

As soon as we find the 2 paths we terminate DFT.

IIF two CPUs are given, my dft will use two threads — one left to right; the other right to left. This will more quickly locate the 2 target nodes if they appear near extremities.

https://github.com/tiger40490/repo1/blob/cpp1/cpp/binTree/commonAncestor_Cycle.cpp is my self-tested code, not tested on Leetcode

generate combinationSum compositions #backtrack up] trie+tree

Q: https://leetcode.com/problems/combination-sum/description/

Given a set of unique candidate numbers and a target number, find all unique combinations of candidates, where each combination sums to target. Each candidate may be used repeatedly.

My solution is https://github.com/tiger40490/repo1/blob/cpp1/cpp/combo_permu/comboSum.cpp , showing a reusable backtracking technique described below. It’s clever and brief. However, the efficiency is questionable. Memoization might be applicable here.

This backtracking relies on a key insight. Suppose we have target = 7 and X=2,Y=3,Z=4 as the candidates.

  • when we try a Y, we don’t need to try any more Xs. For example, If we are trying XY, then all XX* solutions are already handled by earlier recursive calls.
  • each combo sequence is naturally pre-sorted internally.
  • Also, “lower” formulas are generated before “higher” formulas
                 root
          x        y     z
      /   |  \
     x    y   z       
    / \   | \  \
 xxx  xxy | xyz \ 
         xyy     xzz
void //can return something if needed

recurs( solutionsFound &, //growing
        curPartialSolution &, 
// above collections could be global variables, to simplify things

        remainingCandidates, /*probably an immutable global array. 
If we need the remaining array to shrink, we can still rely on startIndex to skip used candidates.*/

        gapFromTarget, 
        startIndex, //used to select from remaining candidates
) 

Inside this function, we scan remaining candidates starting from startIndex. Typically in one iteration

  1. we add a new candidate into curPartialSolution
  2. we call recurs
  3. we remove the last added candidate from curPartialSolution to restore the original curPartialSolution — backtracking up the tree.
  4. move on to the next candidate

I feel this is more greedy than DP

max path sum through binTree #self-tested

Q1: Given a non-empty binary tree of signed integers, find the maximum path sum. For this problem, a path is defined as any sequence of nodes from any starting node to any node in the tree along the parent->child connections. The path must contain at least one node and does not need to go through the root. No uplink. No cycle.

Q2 (Leetcode “hard” Q124): uplink provided. No clue yet

====analysis====

My solution — DFT. Along each root-to-leaf path, use the max-subarray (Kadane) algo and store maxSumEndingHere value in each node, for reuse.

Q: is there any duplicate work?
A: I hope not, thanks to memoization i.e. Node::meh field

Q: do we visit every path?
A: I think so.

I simplified the idea further in

https://github.com/tiger40490/repo1/blob/cpp1/cpp/algo_binTree/maxPathSum.cpp

Time complexity is .. O(V+E) = O(N), since I visit every node and follow each edge once only.

There might be algorithmically superior solutions on leetcode but I don’t want it to affect my joy, motivation and momentum.

isSymetric(root of a binary tree)

Leetcode published solution has a clever BFT solution, easier to implement.

–idea 1 (untested): BFT to list all nodes at a given level
For each node’s enqueue() (including the null nodes), record the path-from-root as a list. As a lighter alternative to this “list”, the path can degenerate to the last step, as a le/ri flag.

Now scan each level from both ends. The left item’s path should mirror the right item’s path.

(Before the scan. confirm the node count is even.)

–in-order dft
first scan records the paths to each leaf node. (Optionally, each path can includes east/west directions).
2nd scan does the same but always visits right child first.

The two outputs should match

killer app@RBTree #priorityQ,sortedVector

A common usage context is query on some data set after pre-processing . In these contexts, BST competes with sorted vector and priority queue.

  • Killer app against vector: incremental insert or live updates
  • Killer app against vector: if there’s even occasional delete(), then sorted vector would suffer
  • Killer app against vector: update on one item can be implemented as delete/reinsert. Vector requires binary search -> mid-stream insert
  • minor advantage over sorted vector: vector sorting requires swapping, so value-semantics is problematic
  • Killer app against priority queue: range query, approximate query,
  • Killer app against priority queue: both max() and min()
  • Application: if each BST node needs additional data. Binary heap doesn’t expose those nodes(?)

It’s important to remember the advantages of vector

  1. cache efficiency
  2. runtime malloc cost

Java, c++ and c# all provide self-balancing BST in their standard libraries, from the very beginning. In my projects, I use these containers on an everyday basis. However, after talking to a few friends I now agree that most coding problems don’t need self-balancing BST because

  1. These problems have no incremental insertion/deletion, so we can simply sort an array of items at O(N logN) as pre-processing
  2. In some cases, priorityQ is a viable alternative
  3. Python doesn’t have this container at all and all coding problems must support python.

versionTable^BST to support binary search #erase

My son’s exam grading table says
A: 85 and above
B: 62 and above
C: 50 and above

One solution to support efficient getGrade(score) is a deque of record {qualifyingScore, grade}. Binary search on the qualifyingScore field is O(logN)

Now a subtly different problem. Suppose you update your resume title periodically,

“developer” in version 1 or later
“SrDev” in version 5 or later
“Consultant” in version 9 or later
“Architect” in version 13 or later

This problem is “subtly” different because new versions always get higher version numbers.

I have always avoided the deque-table data structure, in favor of the more powerful RedBlack trees. Current conclusion — RBTree is still superior, but IFF we receive the table content (high volume) in sorted order, then deque-table is simpler at least conceptually.

  • prepend/append is the advantage of deque-table. However, RBTree insert-with-hint is amortized O(1), comparable to deque table.
  • mid-stream Insert in general is more efficient on RBTree because deque is O(N)
  • Lookup is all O(logN).
  • Delete is similarly more efficient in RBTree.

Delete is often simplified away in coding tests, but crucial in practice, to correct mistakes or prune overgrown data stores. Realistic Binary search systems seldom operate on “immutable” data store.

isPreorderBST(randomArray) #hackerrank

This was asked in a 2018 hacker rank interview, not as important as an on-site coding question. However, I see this question as a classic.

Should study some examples !

https://www.geeksforgeeks.org/check-if-a-given-array-can-represent-preorder-traversal-of-binary-search-tree/ has a tested solution but it’s too cryptic. I added instrumentation to help myself understand it. See my github code.

My analysis — After we have seen some number of nodes, there’s exactly one possible tree we could construct, so let’s construct it.

Note during the construction, each new node has only one place to go (key insight) and after that we can check if it breaks pre-order.

https://github.com/tiger40490/repo1/blob/py1/py/algo_tree/isPreOrderBST.py is my tested code, probably less elegant than the G4G solution, but I’m still proud of it.

I don’t think any solution (including the G4G) is really O(N). My solution is not inferior. It has a readability advantage. It’s longer but not necessarily slower.

 

detect cycle in binary tree

Q1: (Adapted from a real interview) Given a binary tree, where each node has zero or 1 or 2 child nodes but no “uplink” to parent, and given the root node, detect any cycle.

https://github.com/tiger40490/repo1/blob/cpp1/cpp/binTree/cycleInBinTree.cpp is my tested implementation of 1a.

Solution 1a: Three web sites all point at DFT with hashset. I guess the hashset is shrunk whenever we return from a recursion

Solution 1b: I will first write a classic BFT, where each node is processed by processNode(). In this case, my processNode(me) function will start another BFT to traverse my subtree. If I hit myself, then that’s a cycle. I think the additional space required is the size of the queue, which is up to O(N). The (non-recursive) call stack at any time is at most 2 (or 3?) levels.

Q2: how about constant space i.e. don’t use O(N) additional space?

I think any binary tree traversal requires more than O(1) additional space, except Morris. But can Morris even work if there are cycles?

 

depth-first-traversal, height-aware

Latest: https://github.com/tiger40490/repo1/blob/cpp1/cpp1/binTree/DFT_show_level.cpp


struct Node {
    int data;
    Node *left, *right, *next;
    Node(int x, Node * le = NULL, Node * ri = NULL) : data(x), left(le), right(ri), next(NULL) {}
};
    Node _15(15);
    Node _14(14);
    Node _13(13);
    Node _12(12);
    Node _11(11);
    Node _10(10);
    Node _9(9);
    Node _8(8);
    Node _7(7, &_14, &_15);
    Node _6(6, NULL, &_13);
    Node _5(5, &_10, NULL);
    Node _4(4, NULL, &_9);
    Node _3(3, &_6,  &_7);
    Node _2(2, &_4,  &_5);
    Node root(1, &_2, &_3);

int maxD=0;
void recur(Node * n){
  static int lvl=0;
  ++lvl;
  if (lvl>maxD) maxD = lvl;
  if (n->left){ recur(n->left); }
  cout<<n->data<<" processed at level = "<<lvl<<endl;
  if (n->right){ recur(n->right); }
  --lvl;
}
int maxDepth(){
    recur(&root);
    cout<<maxD;
}
int main(){
   maxDepth();
}

bbg-FX: check if a binary tree is BST #no deep recursion

Binary search tree has an “ascending” property — any node in my left sub-tree are smaller than me.

Q1: given a binary tree, check if it’s a BST. (No performance optimization required.) Compile and run the program.

Suppose someone tells you that the lastSeen variable can start with a random (high) initial value, what kind of test tree would flush out the bug? My solution is below, but let’s Look at Q1b.

Q1b: what if the tree could be deep so you can’t use recursion?
A: use BFT. When we actually “print” each node, we check left/right child nodes.


I made a mistake with lastSeen initial value. There’s no “special” initial value to represent “uninitialized”. Therefore I added a boolean flag.

Contrary to the interviewer’s claim, local statics are automatically initialized to zerohttps://stackoverflow.com/questions/1597405/what-happens-to-a-declared-uninitialized-variable-in-c-does-it-have-a-value, but zero or any initial value is unsuitable (payload can be any large negative value), so we still need the flag.

#include <iostream>
#include <climits>
using namespace std;

struct Node {
    int data;
    Node *le, *ri;
    Node(int x, Node * left = NULL, Node * right = NULL) : data(x), le(left), ri(right){}
};
/*    5
  2       7
 1 a
*/
Node _7(7);
Node _a(6);
Node _1(1);
Node _2(2, &_1, &_a);
Node _5(5, &_2, &_7); //root

bool isAscending = true; //first assume this tree is BST
void recur(Node * n){
//static int lastSeen; // I don't know what initial value can safely represent a special value

  //simulate a random but unlucky initial value, which can break us if without isFirstNodeDone
  static int lastSeen = INT_MAX; //simulate a random but unlucky initial value
  static bool isFirstNodeDone=false; //should initialize to false

  if (!n) return;
  if (n->le) recur(n->le);

  // used to be open(Node *):
  if (!isAscending) return; //check before opening any node
  cout<<"opening "<<n->data<<endl; if (!isFirstNodeDone){ isFirstNodeDone = true; }else if (lastSeen > n->data){
        isAscending=false;
        return; //early return to save time
  }
  lastSeen = n->data;

  if (n->ri) recur(n->ri);
}

int main(){
  recur(&_5);
  cout<< (isAscending?"ok":"nok");
}

Breadth-first-traversal, level-aware

BST visits root level (one node only), and 2nd generation, and 3rd generation etc. The BST below is aware of the height or “generation” currently visited.

Latest: https://github.com/tiger40490/repo1/blob/cpp1/cpp1/binTree/BST_show_level.cpp

struct Node {
    int data;
    Node *left, *right, *next;
    Node(int x, Node * le = NULL, Node * ri = NULL) : data(x), left(le), right(ri), next(NULL) {}
};
    Node _15(15);
    Node _14(14);
    Node _13(13);
    Node _12(12);
    Node _11(11);
    Node _10(10);
    Node _9(9);
    Node _8(8);
    Node _7(7, &_14, &_15);
    Node _6(6, NULL, &_13);
    Node _5(5, &_10, NULL);
    Node _4(4, NULL, &_9);
    Node _3(3, &_6,  &_7);
    Node _2(2, &_4,  &_5);
    Node root(1, &_2, &_3);
    Node marker(-1); //to be added to queue to mark new level

Node * dequeue(queue<Node*> & q){
    assert(!q.empty());
    Node * n = q.front();
    q.pop();
    return n;
}
void BFT(){
  queue<Node *> q;
  size_t height=0;

  for( q.push(&marker), q.push(&root); !q.empty();){
    Node * n = dequeue(q);
    if (n == &marker){
        if (q.empty()){
                cout<<"  ^^^^^^^^^^"<<endl;
                break;
        }
        n = dequeue(q);
        q.push(&marker); //move the marker to end of queue
        cout<<"starting height # "<<++height<<endl;
    }
    int dataL = 0, dataR=0;
    if (n->left){
        cout<<"pushing "<<n->left->data<<endl; q.push(n->left);
        dataL = n->left->data;
    }
    if (n->right){
        cout<<"pushing "<<n->right->data<<endl; q.push(n->right);
        dataR = n->right->data;
    }
    cout<<dataL<<" <- "<<n->data<<" -> "<<dataR<<endl;
// "  #  next is "<<dataN<<endl;
  }
}
int main() {
    BFT();
}

BFT^DFT, pre^in^post-order #trees++

Here are my own “key” observations, possibly incomplete, on 5 standard tree walk algos, based on my book [[discrete math]]

  • in-order is defined for binary trees only. Why? in-order means “left-subtree, parent, right-subtree”, implying only 2 subtrees.
  • In contrast, BFS/DFS are more versatile. Their input can be any graph. If not a tree, then BFS/DFS could produce a tree — you first choose a “root” node arbitrarily.
  • The 3 *-order walks are recursively defined. I believe they are three specializations of DFS. See https://www.cs.bu.edu/teaching/c/tree/breadth-first/ and https://en.wikipedia.org/wiki/Tree_traversal
  • The difference of the 3 *-order walks are visually illustrated via the black dot on  https://en.wikipedia.org/wiki/Tree_traversal#Pre-order_(NLR)

—-That’s a tiny intro. Below are more advanced —

BFT uses an underlying queue. In contrast DFS is characterized by backtracking, which requires a stack. However, you can implement the queue/stack without recursion.

BFT visits each node twice; DFT visits each node at least twice — opening (to see its edges) and leaving (when all edges are explored)

DFT backtrack in my own language —

  1. # start at root.
  2. At each node, descend into first [1] subtree
  3. # descend until a leaf node A1
  4. # backtrack exactly one level up to B
  5. # descend into A1’s immediate next sibling A2’s family tree (if any) until a leaf node. If unable to move down (i.e. no A2), then move up to C.

Visually — if we assign a $val to each node such that these nodes form a BST, then we get the “shadow rule”. In-order DFS would probably visit the nodes in ascending order by $value

[1] the child nodes, of size 1 or higher, at each node must be remembered in some order.

seek successor of a given node in a BST # no uplinks

Input: root node and an arbitrary node A.

We can’t start from A because by moving left/right, we may not be able to locate the successor. So we start from root and will encounter A.

I think this is a simple, barebones in-order walk entering at root. We will encounter A, and the next node encountered would be A’s successor.

Note this barebones walk requires no uplink.

seek successor of a given node in a BST #root node unknown

EPI300 Q14.2.

I feel code is hard to write, as it’s hard to visualize or build the tree.

In-order walk must enter at root. If we only have the current node as the only input, then I assume there’s an uplink in each node. Here’s my algo:

Case 1: if i have Right child, then descend there and then descend Left all the way to a leaf node and return. Easy case.

Case 2: now we know i have no Right child. Am i a Left child? If yes then return my parent. Easy case.

Case 3: Now we know I am the Right child, without my own right child. This is the worst case.   My left child (if any) is irrelevant, so effectively I am a leaf node. A right leaf node. Solution: Move up step by step. After the first up-right move,  descend Left all the way.

For predecessor, I found this algo online, but it can’t move up so unable to support Case 2 and 3.

  1. /* Find the inorder predecessor of current */
  2. pre = current->left;
  3. while (pre->right != NULL && pre->right != current)
  4.    pre = pre->right;

Based on that, here’s my successor algo:

  1. /* Find the inorder sucessor of current */
  2. pre = current->right;
  3. while (pre->left != NULL && pre->left != current)
  4.    pre = pre->left;

binary tree in-order walk : notes

This is a favorite algo interview topic.

  • I believe in-order walk will print a binary search tree in left-to-right order.
  • De-confuse — visiting is not same as printing. I believe we need to visit a node P to access its left child, but we must not print P so early!
  • Binary tree is the only thing that uses in-order
  • De-confuse — “node.parent” is not available. Singly-linked tree, not doubly-linked.

Let’s write some python code to

Morris in-order walk]O(N) #O(1)space

I feel this is too hard and unlikely to come up in coding interviews. Also, most trees have up-links, so the no-uplink constraint is artificial and unrealistic. In reality, Morris complexity is usually unnecessary.

–threaded binary tree is the basis of Morris

https://en.wikipedia.org/wiki/Threaded_binary_tree

The Morris algo need to construct the threaded BST, walk and remove the extra links, all without recursion.

  • Tip: For my ascending walk, I only add right-up thread link to bring me to my ancestors. I don’t need any leftward thread link.
  • Tip: All the thread links point upward
  • How many right-up links? Let’s define two clubs Honey and White.
    • every node HH having a left child will get a incoming right-up link pointing to it! After printing HH’s left subtree, this link is used to revisit HH.
    • Except the very last node, every node WW without a right child needs to add a right-up link, so we don’t get stuck at WW.
    • If the Honey club has H members, and White club has W members, then I think we create H right-up links. W +1 = H
    • A node can be both Honey and White i.e. it has a right-up link and is also target of a right-up link.
  • Tip: the sequence of Honey nodes to try has to be top-down. We must create the uplink to every higher Honey node first, as insurance that we can always come back.  While a higher Honey node is locked down and we search for its predecessor in its left subtree, we ignore any lower Honey node
    • First time we hit a Honey node, I’m sure it is not uplinked. After we get it uplinked, we must descend Left.
    • 2nd time we hit a Honey node, I believe its left subtree is completely fixed, so if we descend left again, we will go to a dead end. We must move right, either down or up.
  • Tip: should really use a few variables instead of “cur”. This “cur” is like a “temp1” variable used in first pass, then temp2 variable in 2nd pass etc. These variables are not related at all.

https://github.com/tiger40490/repo1/blob/cpp1/cpp1/binTree/MorrisInOrder.cpp is my c++ implementation with a concise/cryptic implementation and an “unfolded” elaborate implementation

binary tree in-order walk #famed bbg Q

Is there a way to design an iterator over a binary search tree with the following properties?

1. Elements are visited in ascending order (i.e. an in-order traversal)
2. next() and hasNext() queries run in O(1) time.
3. Memory usage is O(1)

——-
I feel O(1) memory is impossible since this is recursive (though all recursions can convert to iteration with a queue bigger than O(1)), with a stack data structure of size H := height of tree.

Similarly, hasNext() need to work through the stack of size H, so O(1) run time impossible

Morris may meet the requirement.

priorityQ vs red-black tree, part 1

No expert here… Just a few pointers.

I feel binary heap is designed for a Subset of the “always-sorted” requirement on sorted trees. The subset is “yield the largest item on demand”.

Therefore, a binary heap is simpler (and faster) than a red-black tree.

As a solution, sorted data structures (like RB tree) are more in demand than priority queues (like heap). For example, the classic exchange order book is more likely a sorted list, though it can be a priority queue.

A binary heap hides all but the maximum node, so we don’t know how those nodes are physically stored.

A binary heap is a binary tree but not sorted, not a BST.

store a billion phone numbers for fast lookup #Barc

The solution I managed to recall in 2015: tree. Each node has the count as payload, and up to 10 child nodes.

Q: efficiently store a billion phone numbers for fast lookup.

I assume realistic usage is read-mostly rather than read-only, but this solution is reasonably efficient in read-write scenarios too. My home-grown idea happens to resemble http://en.wikipedia.org/wiki/Trie.

Assumption — phone numbers across the world have at most 15 digits i.e. short strings. Our trie would be at most 15 levels deep. Fan-out at any parent node is at most 10.

( This 15-digit assumption is not as restrictive as it may appear. The distribution of any realistic “population” of interest to an information system is often /bounded/ by some important constraints. We seldom need to store data of arbitrary length or arbitrary precision. )

Compared to a naïve array data structure (converting each phone number to integer and treat it as subscript) which requires 10^15 array elements (about a million billion elements), our trie would create only a billion leaf nodes and a comparable number [1] of intermediate nodes.

[1] I’d estimate this number: on 14th level, there can be only a few hundred million nodes. On the 13rd lowest, even fewer. Phone numbers share area codes, so on the 4th or 5th lowest level, we have clusters.

Search is O(1) given the Assumption above — just follow the pointers. I feel performance is more predictable than hash table and not susceptible to bad hashing or skewed distribution.

Insert is O(1) given the Assumption above.

How about a hash table? Most standard implementations use a fill factor below 100%, so we need a bucket array of 1billion+. Memory efficient. I’m not sure about quality of hash code distribution. I feel it’s possible to get many collisions.

recursive tree walk -> iterative #barcap

Suppose there’s a deep directory tree (no hard/soft links) with millions of files and directories. Write a tree walker to list files excluding directories. For memory efficiency, turn the recursive algorithm into iterative.

I feel recursive thinking is far more powerful than iterative, because it solves problems otherwise unsolvable. Most classic algo problems fall into this category. I always try to train myself to think recursively. In this regard, It’s good to study and compare iterative algorithms. What kind of recursive algo can be converted to iterative? I feel only simple ones.

Inspired by the article in the link, here are my rules for converting recursion to iteration

Rule: one push (on the stack) per call of the recursive func.

Rule: a stack to hold the arg of the recursive calls, in the correct order. Assuming m(1) -> m(2) -> m(3), then we should push 1,2,3 on the stack. Ideally, the order in our stack should match the recursive call stack.

Rule: since it’s a stack, you remove from the top and add to the top too.

Rule: first step when you reenter the while(not_empty) loop, you simulate a new recursive call by _popping_ the stack. The popped item is no longer a “pending” todo item.

Next Question: how to parallelize? You don’t know how deep each sub-tree is.


public class LowLatencyTreeWalker {
static java.util.LinkedList envelopesToOpen = new java.util.LinkedList();

static void factorial() { // a simple iterative solution to compute
// factorials
envelopesToOpen.add(6);
long ret = 1;
while (!envelopesToOpen.isEmpty()) {
int currNum = envelopesToOpen.removeLast();
System.out.println(ret = ret * currNum);
if (currNum > 1)
envelopesToOpen.add(currNum - 1);
}
}

static void walkTree(String args[]) { // a harder iterative algorithm for
// tree-walk
envelopesToOpen.add(new File("C:/"));
while (!envelopesToOpen.isEmpty()) {
// once we open an envelope, we remove from the todo stack
File f = envelopesToOpen.removeLast();
System.out.println(f.getAbsolutePath());
if (f.listFiles() != null)
envelopesToOpen.addAll(Arrays.asList(f.listFiles()));
}
}
}

binary search tree

defining property: any node to my left is smaller or equal. left child = parent. Both children is allowed to be equal to parent — binary search tree is completely unbiased.

The defining property defined visually: For each node, draw a vertical line (long shadow) through the tree.
* Left descendants’ v-lines are on or inside my v-line.
* Right descendants’ v-lines are on or outside my v-line.

When nodes move up or down the three, their shadows shift. I think this may help us visualize a BST successor, deletion and insertion.

the same group of items can build 2 different bs-trees, with different tree heights. How? The same entries could arrive out of sequence, just like IP packets.

The deeper the tree, the more iterations for a tree walk, the slower the sort.

Worst case bst has a tree height equal to input size, with one branch at each node?

trie: IS-A n-way tree #usage@@

  • descendants (not only “children”) from a particular ancestor (i did not say “parent”) share a prefix. In fact, the prefix is often labelled on the ancestor node’s uplink (umbilical cord). The technical name of a link is an “edge”.
  • height of a trie is the length of the longest key (length=k) in the trie. Example: The key could be 52-character long
  • worst case search? O(k) ?
  • sort by what kind of tree walk (remember all keys are strings)? preorder
  • one node for every common prefix (http://www.nist.gov/dads/HTML/trie.html). one-to-one mapping

storing strings

associative arrays

space-efficient, since most (if not all) prefixes are shared and labelled on the uplinks

digitalSearchTree^trie^radixTree

  • designed for string (ie alphanumeric) keys. Simplest example could be decimal integers
  • sort order is the English dictionary sort order.
  • nearly optimal performance with low implementation effort
  • In insert time (or tree-build time), each digit (or character) of the key is used at the next branching node to decide to move left or right.
  • first digit is used to compare with the root node’s first digit? 2nd digit is used to compare with the 2nd digit of the 2nd generation branching node?
  • Unlike binary search tree, nodes on my left are not always smaller, but why???
  • comparable insert/search performance to read-black tree, but with an implementation as easy as binary search tree.
  • A trie forms the fundamental data structure of Burst-sort, which (in 2007) WAS the fastest string sorting algorithm. A faster (some radix sort) — fastest sort for:one str; arrayOfStr; 32bit ints: non-comparison 
  • Digit can also be a Bit, but still different from a binary search tree. To understand, you may need a reasonable understanding of binary search tree first.
  • BST requires key-comparison at each branching node; DST uses quicker digit-comparison?
  • DST is not a common term. Could be related to the more common radix-tree and “trie”