generate combinationSum compositions #backtrack up] trie+tree

Q: https://leetcode.com/problems/combination-sum/description/

Given a set of unique candidate numbers and a target number, find all unique combinations of candidates, where each combination sums to target. Each candidate may be used repeatedly.

My solution is https://github.com/tiger40490/repo1/blob/cpp1/cpp/combo_permu/comboSum.cpp , showing a reusable backtracking technique described below. However, the backtracking relies on a key insight. Suppose we have target = 7 and X=2,Y=3,Z=4 as the candidates.

  • when we try a Y, we don’t need to try any more Xs. For example, If we are trying XY, then all XX* solutions are already handled by earlier recursive calls.
  • each combo sequence is naturally pre-sorted. A blessing … but also a curse when X+X+Y anre Y+X+X are considered two distinct formulas. Latest code in github can support this requirement too.
                 root
          x        y     z
      /   |  \
     x    y   z       
    / \   | \  \
 xxx  xxy | xyz \ 
         xyy     xzz
void //can return something if needed

recurs( solutionsFound &, //growing
        curPartialSolution &, 
// above collections could be global variables, to simplify things

        remainingCandidates, /*probably an immutable global array. 
If we need the remaining array to shrink, we can still rely on startIndex to skip used candidates.*/

        gapFromTarget, 
        startIndex, //used to select from remaining candidates
) 

Inside this function, we scan remaining candidates starting from startIndex. Typically in one iteration

  1. we add a new candidate into curPartialSolution
  2. we call recurs
  3. we remove the last added candidate from curPartialSolution to restore the original curPartialSolution — backtracking up the tree.
  4. move on to the next candidate
Advertisements

maximum path sum through binTree #untested 70%

Q: (Leetcode “hard” Q124) Given a non-empty binary tree of signed integers, find the maximum path sum. For this problem, a path is defined as any sequence of nodes from any starting node to any node in the tree along the parent->child connections. The path must contain at least one node and does not need to go through the root. No uplink

====analysis====

optimal solution(?) — DFT. Along each root-to-leaf path, use the max-subarray (Kadane) algo and store maxSumEndingHere value in each node, for reuse.

priority queue 2 advantage over RBtree

  • binary heap is based on array — no memory footprint of pointer attributes/fields. Also cache friendly.
  • https://en.cppreference.com/w/cpp/algorithm/push_heap shows individual insert and delete_max are both logN in worst case
  • For mass-insert, per node is O(1) in heap. For RBtree, it takes logN tries to find the right insert position.
  • Heap reading max() is O(1). RBTree can achieve the same — we can locate the next max right after delete(), so delete() is still O(N logN), but max() would be reduced to O(1).

I think removing max is O(logN) for both.

See lecture notes https://courses.cs.washington.edu/courses/cse373/02au/lectures/lecture11l.pdf and SOF post on
https://stackoverflow.com/questions/6147242/heap-vs-binary-search-tree-bst

(de)serialize binary tree #funptr ECT

struct Node{
  Node * left, right;
  int data;
}

… If you simply output as “data,leftId,rightId | data,leftId,rightId | …” then the graph structure i.e. link is lost. Therefore, I feel the stream need to identify each node: “Id,data,leftId,rightId|…
Next question is how to assign the id values. Easiest — use node address as a string id!

Suppose there’s a treewalk algo walk(Node * root, void ( *callback ) (Node *)), i will call it like

walk(root, serialize1node); // where serialize1node is the callback.

Note the position of the asterisk:

  • It’s on the right of type name Node — read left to right .. pointer to Node
  • It’s on the left of variable name callback — read left to right .. callback is a pointer to …

https://github.com/tiger40490/repo1/blob/cpp1/cpp/binTree/serialize_bbg.cpp is my tested solution

I feel the challenge is ECT and syntax, not the idea.

—-idea from my friend Deepak: pre-order tree walk and output one line per node {payload, direction from root, direction from Level 1 ancestor, direction from Level 2 ancestor..}

  • first node in the file is always the root
  • the output file always includes the higher level nodes before the lower-level nodes
  • delimiter (\n) needed between nodes
  • demerit — data bloat… too verbose if tree is sparse but 99 levels deep

—-If the tree is a BSTree, then this idea is based on the fact that a reading a series of unsorted nodes can construct a BSTree. For this solution, we have no requirement on the payload values. They can be booleans.

first pass — in-order walk to build a hash table of {node address -> integer id}. If we treat the Id’s as payload, then the tree is a BST.

2nd pass — BFT the original tree. For each node, we write original payload, id to the file, without writing the links or addresses. File looks like T,5|T,4|T,6|T,2|…

When we reconstruct the tree, we read each node from the file, and insert that node into our new tree, using the id value to decide where.

Can this array be preorder-BST#hackerrank

This was asked in a 2018 hacker rank interview, not as important as an on-site coding question. However, I see this question as a classic.

https://www.geeksforgeeks.org/check-if-a-given-array-can-represent-preorder-traversal-of-binary-search-tree/ has a tested solution but it’s too cryptic. I added instrumentation to help myself understand it. See my github code.

My analysis — After we have seen N (say 7) nodes, there’s exactly one possible BST we could construct, so let’s construct it. The next node would have only one place to go, and we can check if it obeys pre-order. https://github.com/tiger40490/repo1/blob/py1/py/tree/canBePreOrderBST.py is my tested code, probably less elegant than the above, but I’m still proud of it.

I don’t think any solution (including the G4G) is really O(N). My solution is not inferior. It has a readability advantage. It’s longer but not necessarily slower.

 

detect cycle in binary tree

Q1: (Adapted from a real interview) Given a binary tree, where each node has zero or 1 or 2 child nodes but no “uplink” to parent, and given the root node, detect any cycle.

Solution 1a: Three web sites all point at DFT with hashset. I guess the hashset is shrunk whenever we return from a recursion

Solution 1b: I will first write a classic BFT, where each node is processed by processNode(). In this case, my processNode(me) function will start another BFT to traverse my subtree. If I hit myself, then that’s a cycle. I think the additional space required is the size of the queue, which is up to O(N). The (non-recursive) call stack at any time is at most 2 (or 3?) levels.

Idea 1c: reverse each link, but each node has only 2 fields for two children, and zero field for uplink

Q2: how about constant space i.e. don’t use O(N) additional space?

I think any binary tree traversal requires more than O(1) additional space, except Morris. But can Morris even work if there are cycles?

depth-first-traversal, height-aware

Latest: https://github.com/tiger40490/repo1/blob/cpp1/cpp1/binTree/DFT_show_level.cpp


struct Node {
    int data;
    Node *left, *right, *next;
    Node(int x, Node * le = NULL, Node * ri = NULL) : data(x), left(le), right(ri), next(NULL) {}
};
    Node _15(15);
    Node _14(14);
    Node _13(13);
    Node _12(12);
    Node _11(11);
    Node _10(10);
    Node _9(9);
    Node _8(8);
    Node _7(7, &_14, &_15);
    Node _6(6, NULL, &_13);
    Node _5(5, &_10, NULL);
    Node _4(4, NULL, &_9);
    Node _3(3, &_6,  &_7);
    Node _2(2, &_4,  &_5);
    Node root(1, &_2, &_3);

int maxD=0;
void recur(Node * n){
  static int lvl=0;
  ++lvl;
  if (lvl>maxD) maxD = lvl;
  if (n->left){ recur(n->left); }
  cout<<n->data<<" processed at level = "<<lvl<<endl;
  if (n->right){ recur(n->right); }
  --lvl;
}
int maxDepth(){
    recur(&root);
    cout<<maxD;
}
int main(){
   maxDepth();
}

bbg-FX: check if a binary tree is BST #no deep recursion

Binary search tree has an “ascending” property — any node in my left sub-tree are smaller than me.

Q1: given a binary tree, check if it’s a BST. (No performance optimization required.) Compile and run the program.

Suppose someone tells you that the lastSeen variable can start with a random (high) initial value, what kind of test tree would flush out the bug? My solution is below, but let’s Look at Q1b.

Q1b: what if the tree could be deep so you can’t use recursion?
A: use BFT. When we actually “print” each node, we check left/right child nodes.


I made a mistake with lastSeen initial value. There’s no “special” initial value to represent “uninitialized”. Therefore I added a boolean flag.

Contrary to the interviewer’s claim, local statics are automatically initialized to zerohttps://stackoverflow.com/questions/1597405/what-happens-to-a-declared-uninitialized-variable-in-c-does-it-have-a-value, but zero or any initial value is unsuitable (payload can be any large negative value), so we still need the flag.

#include <iostream>
#include <climits>
using namespace std;

struct Node {
    int data;
    Node *le, *ri;
    Node(int x, Node * left = NULL, Node * right = NULL) : data(x), le(left), ri(right){}
};
/*    5
  2       7
 1 a
*/
Node _7(7);
Node _a(6);
Node _1(1);
Node _2(2, &_1, &_a);
Node _5(5, &_2, &_7); //root

bool isAscending = true; //first assume this tree is BST
void recur(Node * n){
//static int lastSeen; // I don't know what initial value can safely represent a special value

  //simulate a random but unlucky initial value, which can break us if without isFirstNodeDone
  static int lastSeen = INT_MAX; //simulate a random but unlucky initial value
  static bool isFirstNodeDone=false; //should initialize to false

  if (!n) return;
  if (n->le) recur(n->le);

  // used to be open(Node *):
  if (!isAscending) return; //check before opening any node
  cout<<"opening "<<n->data<<endl; if (!isFirstNodeDone){ isFirstNodeDone = true; }else if (lastSeen > n->data){
        isAscending=false;
        return; //early return to save time
  }
  lastSeen = n->data;

  if (n->ri) recur(n->ri);
}

int main(){
  recur(&_5);
  cout<< (isAscending?"ok":"nok");
}

Breadth-first-traversal, height-aware

BST visits root level (one node only), and 2nd generation, and 3rd generation etc. The BST below is aware of the height or “generation” currently visited.

Latest: https://github.com/tiger40490/repo1/blob/cpp1/cpp1/binTree/BST_show_level.cpp

struct Node {
    int data;
    Node *left, *right, *next;
    Node(int x, Node * le = NULL, Node * ri = NULL) : data(x), left(le), right(ri), next(NULL) {}
};
    Node _15(15);
    Node _14(14);
    Node _13(13);
    Node _12(12);
    Node _11(11);
    Node _10(10);
    Node _9(9);
    Node _8(8);
    Node _7(7, &_14, &_15);
    Node _6(6, NULL, &_13);
    Node _5(5, &_10, NULL);
    Node _4(4, NULL, &_9);
    Node _3(3, &_6,  &_7);
    Node _2(2, &_4,  &_5);
    Node root(1, &_2, &_3);
    Node marker(-1); //to be added to queue to mark new level

Node * dequeue(queue<Node*> & q){
    assert(!q.empty());
    Node * n = q.front();
    q.pop();
    return n;
}
void BFT(){
  queue<Node *> q;
  size_t height=0;

  for( q.push(&marker), q.push(&root); !q.empty();){
    Node * n = dequeue(q);
    if (n == &marker){
        if (q.empty()){
                cout<<"  ^^^^^^^^^^"<<endl;
                break;
        }
        n = dequeue(q);
        q.push(&marker); //move the marker to end of queue
        cout<<"starting height # "<<++height<<endl;
    }
    int dataL = 0, dataR=0;
    if (n->left){
        cout<<"pushing "<<n->left->data<<endl; q.push(n->left);
        dataL = n->left->data;
    }
    if (n->right){
        cout<<"pushing "<<n->right->data<<endl; q.push(n->right);
        dataR = n->right->data;
    }
    cout<<dataL<<" <- "<<n->data<<" -> "<<dataR<<endl;
// "  #  next is "<<dataN<<endl;
  }
}
int main() {
    BFT();
}

BFT^DFT, pre^in^post-order #beyond trees

Here are my own “key” observations, possibly incomplete, on 5 standard tree walk algos, based on my book [[discrete math]]

  • pre/in/post-order is defined for binary trees only. Why? in-order means “left-subtree, parent, right-subtree”, implying only 2 subtrees.
  • In contrast, BFS/DFS are much more versatile. Their input can be any graph. If not a tree, then BFS/DFS could *produce* a tree — you first choose a “root” node arbitrarily.
  • The 3 *-order walks are recursively defined. I believe they are three specializations of DFS. See https://www.cs.bu.edu/teaching/c/tree/breadth-first/

—-That’s a tiny intro. Below are more advanced —

BFS uses an underlying queue. In contrast DFS is characterized by backtracking, which requires a stack. However, you can implement the queue/stack without recursion.

BFS feels like the most intuitive algo, recursion-free. Example — send a mailer to all linkedin contacts:

# send to each direct contact
# send to each second-degree contacts …

DFS backtrack in my own language —

  1. # start at root.
  2. At each node, descend into first [1] subtree
  3. # descend until a leaf node A1
  4. # backtrack exactly one level up to B
  5. # descend into A1’s immediate next sibling A2’s family tree (if any) until a leaf node. If unable to move down (i.e. no A2), then move up to C.

Visually — if we assign a $val to each node such that these nodes form a BST, then we get the “shadow rule”. In-order DFS would probably visit the nodes in ascending order by $value

[1] the child nodes, of size 1 or higher, at each node must be remembered in some order.

seek successor of a given node in a BST # no uplinks

Input: root node and an arbitrary node A.

We can’t start from A because by moving left/right, we may not be able to locate the successor. So we start from root and will encounter A.

I think this is a simple, barebones in-order walk entering at root. We will encounter A, and the next node encountered would be A’s successor.

Note this barebones walk requires no uplink.

seek successor of a given node in a BST #root node unknown

EPI300 Q14.2.

I feel code is hard to write, as it’s hard to visualize or build the tree.

In-order walk must enter at root. If we only have the current node as the only input, then I assume there’s an uplink in each node. Here’s my algo:

Case 1: if i have Right child, then descend there and then descend Left all the way to a leaf node and return. Easy case.

Case 2: now we know i have no Right child. Am i a Left child? If yes then return my parent. Easy case.

Case 3: Now we know I am the Right child, without my own right child. This is the worst case.   My left child (if any) is irrelevant, so effectively I am a leaf node. A right leaf node. Solution: Move up step by step. After the first up-right move,  descend Left all the way.

For predecessor, I found this algo online, but it can’t move up so unable to support Case 2 and 3.

  1. /* Find the inorder predecessor of current */
  2. pre = current->left;
  3. while (pre->right != NULL && pre->right != current)
  4.    pre = pre->right;

Based on that, here’s my successor algo:

  1. /* Find the inorder sucessor of current */
  2. pre = current->right;
  3. while (pre->left != NULL && pre->left != current)
  4.    pre = pre->left;

binary tree in-order walk : notes

This is a favorite algo interview topic.

  • I believe in-order walk will print a binary search tree in left-to-right order.
  • De-confuse — visiting is not same as printing. I believe we need to visit a node P to access its left child, but we must not print P so early!
  • Binary tree is the only thing that uses in-order
  • De-confuse — “node.parent” is not available. Singly-linked tree, not doubly-linked.

Let’s write some python code to

Morris in-order walk]O(N) #O(1)space

I feel this is too hard and unlikely to come up in coding interviews. Also, most trees have up-links, so the no-uplink constraint is artificial and unrealistic. In reality, Morris complexity is usually unnecessary.

–threaded binary tree is the basis of Morris

https://en.wikipedia.org/wiki/Threaded_binary_tree

The Morris algo need to construct the threaded BST, walk and remove the extra links, all without recursion.

  • Tip: For my ascending walk, I only add right-up thread link to bring me to my ancestors. I don’t need any leftward thread link.
  • Tip: All the thread links point upward
  • How many right-up links? Let’s define two clubs Honey and White.
    • every node HH having a left child will get a incoming right-up link pointing to it! After printing HH’s left subtree, this link is used to revisit HH.
    • Except the very last node, every node WW without a right child needs to add a right-up link, so we don’t get stuck at WW.
    • If the Honey club has H members, and White club has W members, then I think we create H right-up links. W +1 = H
    • A node can be both Honey and White i.e. it has a right-up link and is also target of a right-up link.
  • Tip: the sequence of Honey nodes to try has to be top-down. We must create the uplink to every higher Honey node first, as insurance that we can always come back.  While a higher Honey node is locked down and we search for its predecessor in its left subtree, we ignore any lower Honey node
    • First time we hit a Honey node, I’m sure it is not uplinked. After we get it uplinked, we must descend Left.
    • 2nd time we hit a Honey node, I believe its left subtree is completely fixed, so if we descend left again, we will go to a dead end. We must move right, either down or up.
  • Tip: should really use a few variables instead of “cur”. This “cur” is like a “temp1” variable used in first pass, then temp2 variable in 2nd pass etc. These variables are not related at all.

https://github.com/tiger40490/repo1/blob/cpp1/cpp1/binTree/MorrisInOrder.cpp is my c++ implementation with a concise/cryptic implementation and an “unfolded” elaborate implementation

binary tree in-order walk #famed bbg Q

Is there a way to design an iterator over a binary search tree with the following properties?

1. Elements are visited in ascending order (i.e. an in-order traversal)
2. next() and hasNext() queries run in O(1) time.
3. Memory usage is O(1)

——-
I feel O(1) memory is impossible since this is recursive (though all recursions can convert to iteration with a queue bigger than O(1)), with a stack data structure of size H := height of tree.

Similarly, hasNext() need to work through the stack of size H, so O(1) run time impossible

Morris may meet the requirement.

priority queue vs red-black tree, part 1

No expert here… Just a few pointers.

I feel binary heap is designed for a Subset of the “always-sorted” requirement on sorted trees. The subset is “yield the largest item on demand”.

Therefore, a binary heap is simpler (and faster) than a red-black tree.

As a solution, sorted data structures (like RB tree) are more in demand than priority queues (like heap). For example, the classic stock exchange order book is a sorted list, not a priority queue.

A priority queue hides all but the maximum node, so we don’t know how those nodes are physically stored.

store a billion phone numbers for fast lookup #Barc

The solution I managed to recall in 2015: tree. Each node has the count as payload, and up to 10 child nodes.

Q: efficiently store a billion phone numbers for fast lookup.

I assume realistic usage is read-mostly rather than read-only, but this solution is reasonably efficient in read-write scenarios too. My home-grown idea happens to resemble http://en.wikipedia.org/wiki/Trie.

Assumption — phone numbers across the world have at most 15 digits i.e. short strings. Our trie would be at most 15 levels deep. Fan-out at any parent node is at most 10.

( This 15-digit assumption is not as restrictive as it may appear. The distribution of any realistic “population” of interest to an information system is often /bounded/ by some important constraints. We seldom need to store data of arbitrary length or arbitrary precision. )

Compared to a naïve array data structure (converting each phone number to integer and treat it as subscript) which requires 10^15 array elements (about a million billion elements), our trie would create only a billion leaf nodes and a comparable number [1] of intermediate nodes.

[1] I’d estimate this number: on 14th level, there can be only a few hundred million nodes. On the 13rd lowest, even fewer. Phone numbers share area codes, so on the 4th or 5th lowest level, we have clusters.

Search is O(1) given the Assumption above — just follow the pointers. I feel performance is more predictable than hash table and not susceptible to bad hashing or skewed distribution.

Insert is O(1) given the Assumption above.

How about a hash table? Most standard implementations use a fill factor below 100%, so we need a bucket array of 1billion+. Memory efficient. I’m not sure about quality of hash code distribution. I feel it’s possible to get many collisions.

recursive tree walk -> iterative #barcap

Suppose there’s a deep directory tree (no hard/soft links) with millions of files and directories. Write a tree walker to list files excluding directories. For memory efficiency, turn the recursive algorithm into iterative.

I feel recursive thinking is far more powerful than iterative, because it solves problems otherwise unsolvable. Most classic algo problems fall into this category. I always try to train myself to think recursively. In this regard, It’s good to study and compare iterative algorithms. What kind of recursive algo can be converted to iterative? I feel only simple ones.

Inspired by the article in the link, here are my rules for converting recursion to iteration

Rule: one push (on the stack) per call of the recursive func.

Rule: a stack to hold the arg of the recursive calls, in the correct order. Assuming m(1) -> m(2) -> m(3), then we should push 1,2,3 on the stack. Ideally, the order in our stack should match the recursive call stack.

Rule: since it’s a stack, you remove from the top and add to the top too.

Rule: first step when you reenter the while(not_empty) loop, you simulate a new recursive call by _popping_ the stack. The popped item is no longer a “pending” todo item.

Next Question: how to parallelize? You don’t know how deep each sub-tree is.


public class LowLatencyTreeWalker {
static java.util.LinkedList envelopesToOpen = new java.util.LinkedList();

static void factorial() { // a simple iterative solution to compute
// factorials
envelopesToOpen.add(6);
long ret = 1;
while (!envelopesToOpen.isEmpty()) {
int currNum = envelopesToOpen.removeLast();
System.out.println(ret = ret * currNum);
if (currNum > 1)
envelopesToOpen.add(currNum - 1);
}
}

static void walkTree(String args[]) { // a harder iterative algorithm for
// tree-walk
envelopesToOpen.add(new File("C:/"));
while (!envelopesToOpen.isEmpty()) {
// once we open an envelope, we remove from the todo stack
File f = envelopesToOpen.removeLast();
System.out.println(f.getAbsolutePath());
if (f.listFiles() != null)
envelopesToOpen.addAll(Arrays.asList(f.listFiles()));
}
}
}

binary search tree

defining property: any node to my left is smaller or equal. left child = parent. Both children is allowed to be equal to parent — binary search tree is completely unbiased.

The defining property defined visually: For each node, draw a vertical line (long shadow) through the tree.
* Left descendants’ v-lines are on or inside my v-line.
* Right descendants’ v-lines are on or outside my v-line.

When nodes move up or down the three, their shadows shift. I think this may help us visualize a BST successor, deletion and insertion.

the same group of items can build 2 different bs-trees, with different tree heights. How? The same entries could arrive out of sequence, just like IP packets.

The deeper the tree, the more iterations for a tree walk, the slower the sort.

Worst case bst has a tree height equal to input size, with one branch at each node?

trie: IS-A n-way tree #usage@@

  • descendants (not only “children”) from a particular ancestor (i did not say “parent”) share a prefix. In fact, the prefix is often labelled on the ancestor node’s uplink (umbilical cord). The technical name of a link is an “edge”.
  • height of a trie is the length of the longest key (length=k) in the trie. Example: The key could be 52-character long
  • worst case search? O(k) ?
  • sort by what kind of tree walk (remember all keys are strings)? preorder
  • one node for every common prefix (http://www.nist.gov/dads/HTML/trie.html). one-to-one mapping

storing strings

associative arrays

space-efficient, since most (if not all) prefixes are shared and labelled on the uplinks

digitalSearchTree^trie^radixTree

  • designed for string (ie alphanumeric) keys. Simplest example could be decimal integers
  • sort order is the English dictionary sort order.
  • nearly optimal performance with low implementation effort
  • In insert time (or tree-build time), each digit (or character) of the key is used at the next branching node to decide to move left or right.
  • first digit is used to compare with the root node’s first digit? 2nd digit is used to compare with the 2nd digit of the 2nd generation branching node?
  • Unlike binary search tree, nodes on my left are not always smaller, but why???
  • comparable insert/search performance to read-black tree, but with an implementation as easy as binary search tree.
  • A trie forms the fundamental data structure of Burst-sort, which (in 2007) WAS the fastest string sorting algorithm. A faster (some radix sort) — fastest sort for:one str; arrayOfStr; 32bit ints: non-comparison 
  • Digit can also be a Bit, but still different from a binary search tree. To understand, you may need a reasonable understanding of binary search tree first.
  • BST requires key-comparison at each branching node; DST uses quicker digit-comparison?
  • DST is not a common term. Could be related to the more common radix-tree and “trie”