topN,N-th ranked within a bag: quickselect

For any top-N problem, and for any problems about N-th ranked item,

Fairly efficient solution is the quickselect algo

Advertisements

count lower ints to my right

https://leetcode.com/problems/count-of-smaller-numbers-after-self/ is labelled “hard”.

Q: given an integer array nums , return a new counts array, wherein counts[i] is the number of smaller elements to the right of nums[i]

====analysis

Order statistics tree  (i.e. an augmented RBTree) should make O(N logN). However, the actual algorithm is not clear to me.

One scan from right. Insert each node into this tree. Before inserting a node of value like 22, we will query the tree getRank(22).

Implementation wise, it’s hard to create a self-balancing BST from scratch. So I think an unbalanced BST might do.

Also, there might be some alternative solutions, like mergesort??

print height@subtree at every node #post-order

For these problems, elegant implementation is expected.

Note in-order is for binary tree, but post-order is for any tree.

== Q: given Any tree without cycle, print the size of subtree at every node

I think post-order walk is usable.

== Q: given Any tree without cycle, print the height of subtree at every node. Also print the node address just for identification

Intuition — dft post-order walk. Need to become more familiar with the detailed steps. https://www.techiedelight.com/calculate-height-binary-tree-iterative-recursive/ has details.

== Q (Leetcode): Given a binary tree, determine if it is height-balanced — a binary tree in which the depth of the two subtrees of every node never differ by more than 1.

I can try it on Leetcode .. no need to set up test harness myself but I might be frustrated

Intuition– post-order walk. A simple recursive algo — for any node, find the its subtree height including itself and return it. Before returning, also compare the left vs right subtree heights

max-sum path up+down binTree #FB

Q2 (Leetcode “hard” Q124): find max path sum, where a ” path ” has minimum one node A and can include A’s left child and/or A’s right child. No uplink available.

====analysis

I see two types of paths

  1. down the tree, starting at some upstream node A ending at a node in A’s subtree
  2. up-down the tree, starting at some A’s left subtree, ending somewhere in A’s right subtree.

For 1) https://bintanvictor.wordpress.com/wp-admin/post.php?post=23360&action=edit has my tested solution

For 2: post-order walk to update each node a (signed) max-path-sum-from-here. See https://bintanvictor.wordpress.com/wp-admin/post.php?post=31601&action=edit. The 2 values from left+right children can solve this sub-problem

Q: can we use the 2) solution for 1)? I think so

serialize binTree #funptr

struct Node{
  Node * left, right;
  int data;
}

… If you simply output as “data,leftId,rightId | data,leftId,rightId | …” then the graph structure i.e. link is lost. Therefore, I feel the stream need to identify each node: “Id,data,leftId,rightId|…” Next question is how to assign the id values. Easiest — use node address as a string id! Suppose there’s a treewalk algo walk(Node * root, void ( *callback ) (Node *)), i will call it like

walk(root, serialize1node); // where serialize1node is the callback.

Note the position of the asterisk:

  • It’s on the right of type name Node — read left to right .. pointer to Node
  • It’s on the left of variable name callback — read left to right .. callback is a pointer to …

https://github.com/tiger40490/repo1/blob/cpp1/cpp/binTree/serialize_bbg.cpp is my tested solution I feel the challenge is ECT and syntax, not the idea. —- elegant home-grown two-pass solution Assign incremental IDs to each node in an in-order walk. Then run pre-order walk to output each ID.

  • 🙂 applicable for any binTree, but 3-way trees  or higher
  • 🙂 No null node needed
  • 🙂 no space overhead. Instead of serializing address, I serialize my generated ID of each node, which are usually smaller and never bigger than a pointer. I think the serialized byte array is the smallest possible.
  • Key insight — from a given preorder output, there’s exactly one possible BST we construct.

—- solution: standard graph representations — AdjMatrix + edgeSet, applicable on any graph —- solution : BFT -> list including nulls —-idea from my friend Deepak: pre-order tree walk and output one line per node {payload, direction from root, direction from Level 1 ancestor, direction from Level 2 ancestor..}. The technique is reusable even though overall efficiency and complexity is not optimal. We don’t need to be optimal.

  • first node in the file is always the root
  • the output file always includes the higher level nodes before the lower-level nodes
  • delimiter (\n) needed between nodes
  • demerit — data bloat… too verbose if tree is sparse but 99 levels deep

—-If the tree is a BST, then this idea is based on the fact that a reading a series of unsorted nodes can construct a BSTree.

For this solution, we have no requirement on the payloads. They can be booleans.

first pass — in-order walk to build a hash table of {node address -> integer id}. If we treat the ID’s as node keys, then the tree is a BST.

2nd pass — BFT or pre-order walk the original tree. For each node, we write node ID’s to the file, without writing the links or addresses. File looks like T,5|T,4|T,6|T,2|… When we reconstruct the tree, we read each node from the file, and insert that node into our new tree, using the id value as key to decide where.

If the original tree nodes have keys, I will just treat it as payload, and use my assigned ID as key

Roman-to-integer converter

https://leetcode.com/problems/roman-to-integer/

Roman numerals are usually written largest to smallest from left to right. However, the numeral for four is not IIII. Instead, the number four is written as IV. Because the one is before the five we subtract it making four. The same principle applies to the number nine, which is written as IX. There are six instances where subtraction is used:

  • I can be placed before V (5) and X (10) to make 4 and 9.
  • X can be placed before L (50) and C (100) to make 40 and 90.
  • C can be placed before D (500) and M (1000) to make 400 and 900.

Input: “MCMXCIV” Output: 1994 Explanation: M = 1000, CM = 900, XC = 90 and IV = 4.

==== analysis

Too complicated for a speed coding test?

Reusable technique — One back scan might be enough. Normally the rank of letters encountered would increase. A decrease (only one position) means subtraction. See code in https://leetcode.com/problems/roman-to-integer/discuss/6547/Clean-O(n)-c%2B%2B-solution

Reusable technique — hardcode the 6 “subtraction” cases to simplify the if/else logic.



			

clone connected graph given any one node #!!need4stdSol

There are a few ways to phrase the same question.

  1. For a connected graph of colored nodes (say, only a few colors or black/white), you are given one node only. Please clone entire graph.
  2. Given one node in a connected graph, visit every node and recreate entire graph in another computer.
  3. Given one node in a directed graph, serialized it, transmit to another computer and deserialize.

===== analysis

I feel this is one of the most practical algo problems. I feel my solution is fairly efficient for binary tree too.

Not too hard conceptually, if your mental picture is clear. I would need to assign node IDs (based on addresses [1]) to differentiate the black nodes. These IDs are serialized. So are the edge lists [1].

— For serialization, Basic idea is a BFT (or DFT) to visit every node. Each time a node address is encountered again (probably among the edge list of some node), we would simply skip it.

Within each node, the this->edges field is “translated” and serialized as a list or treeSet of node_IDs. The serialized file looks like

  • ID=1, edges=3,4,5,6
  • ID=2, edges=3,5,7
  • ID=3, edges=5,7,9
  • ID=4, edges=6,8

I think serialization is O(V+E). I think same O() for deserialization.

— For deserialization, we can simply construct all node objects in an array, using ID as index.  The edge list can stay as is. Optionally, we translate each edge from an ID into an address in the array.

[1] these are basic algoQQ pointers

compute any sum@subMatrix by O(1) hashmap lookup

suppose origin cell is at northwest [1,1] not [0,0]

Pre-processing — We can incrementally compute each the rooted-submatrix SumToOrigin[r,c] in linear time. Save them in a s2o hashmap (shadow matrix is actually better)

Now Consider the submatrix identified by [2,2] and [3,4] enclosing 6 cells. This submatrix has sum =

s2o[3,4]+s2o[1,1]-s2o[1,4]-s2o[3,1]

12 cells + 1 cell  – 4 cells – 3 cells

So any submatrix sum can be computed in O(1). To go through all NNMM of them requires O(NNMM) where N and M are the board dimensions

us`lowest N natural nums,count BST#70%#Rahul

how do you make use of the previous results for N=2 to tackle N=3?
f(z) denotes count of unique BSTs consisting of the fist z natural number
f(0)=f(1)=1

For N=21, we can have 1 node on the left side, 19 nodes on the right side

  • for odd z, express it as z:=2y+1 where y > 0
    f(2y+1)=[ f(a=0)*f(2y) + f(a=1)f(2y-1) + f(2)f(2y-2)… +f(y-1)f(y+1) ]*2 + f(y)f(y)

Note a should increment from 0 up to y-1.

  • for even z, express it as z:=2y where y > 0
    f(2y)=[ f(0)*f(2y-1) + f(1)f(2y-2) + f(2)f(2y-3)… +f(y-1)f(y) ]*2

Let’s try this formula on f(2)…= 2 good
Let’s try this formula on f(3)…=5 y=1

–N=3:
if ‘1’ is root, then there are f(2) BSTs
if ‘3’ is root, then there are f(2) BSTs
if ‘2’ is root, then there are f(1) * f(1) BSTs
–N=9:
if ‘1’ is root, there are f(8) BSTs
if ‘9’ is root, there are f(8) BSTs
if ‘4’ is root, there are f(3) left-subtrees and f(5) right-subtrees, giving f(3)*f(5) BSTs

This is not coding challenge but math challenge.

Q2: output all BSTs. We need a way to represent (serialize) a BST?

xp: pure algo question often require QQ 打通脉络

  • eg: bbg 2017 pure algo question on tree serialization required a standard BFS. Just a few lines but I struggled for a long time. If you memorize the key parts and practice enough, then the *knowledge* would show as a algo skill
  • eg: facebook regex is very hard to anyone I know, unless they worked on a similar problem before, and *know* the tricks.
  • eg: generate permutations, combinations. Can be tricky unless you *know* the key points.
  • eg: DP bottom-up vs top-down with memoization
  • eg: SCB-FM IV: stack-based queue
  • .. many more examples

打通知识脉络 is the Chinese phrase

#1(reusable)AuxDS for algo challenges

Here’s a Reusable data structure for many pure algo challenges:

Pre-process to construct a static data store to hold a bunch of “structs” in linked list -OR- growing vector -OR- growing RBTree , all O(1) insertion :). Then we can build multiple “indices” pointing to the nodes

Here are a few O(1) indices: (Note O(1) lookup is the best we can dream of)

  • hashtable {a struct field like a string -> iterator into the data store}
  • array indexed by a struct field like small int id, where payload is an iterator from the data store
  • If the structs have some non-unique int field like age, then we can use the same key lookup to reach a “group”, and within the group use one (or multiple) hashtable(s) keyed by another struct field

I think this is rather powerful and needed only in the most challenging problems like LRU cache.

%%AuxDS strength ] coding test #algoQQ

Update — Indeed experience..

  1. when a tough problem requires us to construct an explicit and non-trivial auxiliary data structures (AuxDS), i often have an intuitive feel for the data structures needed, not always the optimal.
  2. when a tough problem requires a simpler AuxDS + unusual algo, I don’t have an edge.
  3. When a tough problem requires only an unusual algo, I don’t have an edge.
    * eg: edit distance
    * eg: sliding window max
    * eg: maximal sub-matrix

    • Suggestion: invest in algoQQ. I feel they will stick with me.

O(1)space,O(1)search,O(N)sort: tricks

  • every problem asking “top N” — quick-select gives average linear time. Real upper bound is O(NN)
  • Every time I see O(1) space required on an array problem, I think of …. swapping.
  • Every time I see O(1) space required on small int arrays, I think of … save (2D) array indices in the array as payload
  • Every time I see O(1) space required on a list problem, I ask is it a ….. linked list.
  • Every time I see O(N) time required on an array problem, I think of … radix sort, applicable to 64-bit integers, 64-bit floats and strings.
  • Every time I see O(1) search, I think of … hash table and radix …?
  • Every time I see O(1) operation on a sorted data structure, I think of …. append on BST/priorityQ, deque

count paths between 2 binTree nodes #PimcoQ9 Ashish,Pinsky

The DP idea — compare matrix-path-counter and EditDistance

  • showcasing efficient queue in python.
  • showcasing using (x,y) coordinates as dictionary key
  • showcasing find max value in a dictionary

— Requirement: (https://leetcode.com/problems/unique-paths-ii/description/ is similar)
See Q9.pdf in the email to Ashish. Here are some key points:

Consider a maze mapped to a 2d-grid with an upper left corner at coordinates (row, column) = (0, 0). Any movement must be in increasing row or column direction. You must determine the number of distinct paths through the maze. You will always start at position (0, 0), the top left, and end up at (max(row), max(column)), the bottom right.
1 1 0 1
1 1 1 1
As an example, consider the above matrix where 1 indicates an open cell and 0 indicates blocked. You can only travel through open cells, so no path can go through the cell at (0, 2). There are two distinct paths to the goal. As a 2nd example, matrix below has 10 paths:
1 1 1 1
1 1 1 1
1 1 1 1

https://github.com/tiger40490/repo1/blob/py1/py/2d/classic_pathCountQ9.py is my solution.

==== Analysis ====
I see a binary tree, where each node is a cell. Cell (0,0) has down/left node (1,0) + right node (0,1). I feel this is similar to a more generic problem “count paths between 2 nodes in a binary tree”

— DFT:
😦 I implemented something like a DFT but it was too slow with some test cases
😦 can overflow stack
🙂 DFT can print each unique path. I think BFT can’t.
🙂 DFT is easier to implement than BFT

— DynamicProgramming BFT solution from origin, as I described to Bill Pinsky:
This solution is more versatile than the one from Ashish. It can handle any directed graph.

Mark each node as “computed” once we have computed a score denoting how many paths-from-root there are. Save the score in a shadow matrix.

Origin node has score 1. Blocked cell has score 0.

Start BFT from origin. The two immediate neighbors are set to 1 (or 0 if blocked). Every node can be computed by adding up above-score + left-score. (This algo is simpler than the EditDistance algo.)

Memoization Performance note — My implementation was extremely slow (albeit correct) until I added an “if already computed, then continue” early in the loop

This Layered DP algo is also described in max-path-sum problems

Q: why is score[1,1] accessed 4 times?
A: node[1,1] is added to the queue twice. Each dequeue would need one check.
A: Also score[1,2] need to access score[1,1] as a parent. Ditto score[2,1]

— Ashish Singh gave me a much simpler solution, as shown in my github code.

isPreorderBST(randomArray) #G4G

This was asked in a 2018 hacker rank interview, not as important as an on-site coding question. However, I see this question as a classic.

— G4G solution based on sorted stack

https://www.geeksforgeeks.org/check-if-a-given-array-can-represent-preorder-traversal-of-binary-search-tree/ has a tested solution but it’s too cryptic. I added instrumentation to help myself understand it. See my github code.

Insight — at any time, the stack is sorted with the top being the smallest value . If the new item is highest item so far, then stack is wiped clean !

The algo is fairly simple —

  • initialize stack with arr[0] i.e. root
  • for every new item xx, start looping
    • if xx falls below stack top or stack is empty, then push xx, otherwise pop and loop
  • so when would we fail. It relies on the “root” variable which holds the top of the stack if non-empty, but frozen (the real tree root) once stack wiped clean !

— My solution 2

My analysis — After we have seen some number of nodes, there’s exactly one possible tree we could construct, so let’s construct it.

Note during the construction, each new node has only one place to go (key insight) and after that we can check if it breaks pre-order.

https://github.com/tiger40490/repo1/blob/py1/py/algo_tree/isPreOrderBST.py is my tested code, probably less elegant than the G4G solution, but I’m still proud of it.

I don’t think any solution (including the G4G) is really O(N). My solution is not inferior. It has a readability advantage. It’s longer but not necessarily slower.

 

next_perm@N color socks #complex #100%

A common IDE coding challenge — given x pairs socks of various colors, generate all the permutations, in ascending order. Each color has a value.

–Solution 1: std::next_permutation() and prev_permutation()

–solution 2: I can probably write my own next_perm(). Using This function we can generate an ascending sequence of permutations starting from the current content of a vector.

https://github.com/tiger40490/repo1/blob/cpp1/cpp/combo_permu/nextPerm.cpp is my iterative solution, but should use lower_bound()

https://github.com/tiger40490/repo1/blob/py1/py/combo_perm/nextPermOfNSocks.py is my python solution, overcoming the lower_bound problem.

next abbreviation@word with repeat char #2^N #100%

This is a real coding question at 10gen/mongoDB

https://github.com/tiger40490/repo1/blob/py1/py/combo_perm/abbrIterative.py is a simple iterative py solution. Not ascending.

Q1: Given a word with unique letter (like “abc”), Can you write c++ program to generate all abbreviations in any order?
Q2: same question, but don’t use any external storage beside character arrays.
Q3: same question, but output in ascending order? Expected output:

  1. a
  2. ab
  3. abc
  4. ac
  5. b
  6. bc
  7. c

It’s daunting to generate this sequence without recursion. https://github.com/tiger40490/repo1/blob/cpp1/cpp1/combo_permu/abbr_ascendRecursive.cpp is my recursive solution to Q3.

Q4: what if the word has non-unique letters, like “mongo”? The only solution I know relies on an external lookup device.

next Combo@3,using5distinct chars,permitting redraw

Modified from Leetcode problem 17. Suppose there is nothing but one red button. and there are L (like 5) letters on it.

Q: With 4 (=C) draws from 3 (=L) letters (same phone pad button), how many permutations? L^C.
Q: With 4 (=C) draws from 3 (=L) letters, how many combinations?

For C=1, Ans=3
For C=2, Ans=6
For C=3, Ans=10= 6 + 3 + 1?
For C=4, Ans=15=10+(10-6)+1
For C=5, Ans=21=15+(15-10)+1

–My explanation of the count:

Key: Each combo is represented as a sorted vector (ascending). There’s one-to-one mapping between such a vector and a combo.

let’s say L=3 and C grows from 1 to 3 .. For C=2, I put all combos on 3 rows (3 vectors or lists)

  • aa ab ac #all that starts with a
  • bb bc #all that start with b
  • cc # lone wolf

For C=3, Ans

  • first container is populated by prepending ‘a’ to all 3 “old” containers of items
  • 2nd container is populated by prepending ‘b’ to 2nd-and-lower old containers
  • 3rd container is always a lone wolf

Both solutions below are simple, reusable and implemented in https://github.com/tiger40490/repo1/blob/cpp1/cpp/combo_permu/nextComboOf3redrawFrom5.cpp

–sol-1 efficient incremental solution — for C=3, given one combo as a string, find the “next” combo. For example after bbc, we should get bcc. Bonus feature — output is naturally sorted 🙂

–sol 2: append the next char ..

next_Perm@3boys out@5 #80%

algo-practice: generate permutation@3, using5distinct chars

Such a skill might be needed in some coding IV sooner or later. Let’s write this in py or c++. Formally,

Q1: generate all permutations of 3, from 5 distinct chars, in any order.
Q2: generate all permutations of 3, from 5 distinct chars, in ascending order. You can sort the 5 chars first.
Q2b: once you have generated one permutation, how do you identify The next?

Note the same solution is a generalization of std::next_permutation(), so once you have this solution you have that solution.

How many? 5-choose-3 * 3! = 5!/(5-3)! = 5*4*3 = 60, denoted Answer(3). Paradoxically, Answer(4) == Answer(5) = 120

–algorithm 1 for Q1

  • 1st generate 5 single-char strings;
  • then for each generate 4 two-char strings. We get 20 strings.

–algorithm 1b for Q1: rec(5,3) will call rec(5,2). rec(5,2) has 20 items, each can generate 3 items for rec(5.3), because each item has 2 characters and a void to be filled by the 3 unused chars.

The 20 items returned should be a pair{vector of 2, vector of 3}

This produces a sorted collection:)

See my code  https://github.com/tiger40490/repo1/blob/py1/py/combo_perm/nextPerm%403boysFrom5.py

 

 

 

next_Combo@3 boys out@5 # 100%

Given, abcde, How many combinations of 3? 5-choose-3 = 10 standard formula, but let’s implement in py or c++.

It’s useful to define a score for a combination — sort the combination into a string. The string is the score. Now,

Q1: generate all combinations@3 from 5 distinct chars, in ascending score.
Q2 (more importantly) given any combination of 3, generate the next higher combination of 3.  Output each combination as a sorted string to keep things clean. Extremely effective constraint and simplification.

1) https://github.com/tiger40490/repo1/blob/py1/py/combo_perm/nextComboOf3from6boys.py is a decent recursive.

2) fixed-length abbreviation generator also generates exactly the same  sequence!

3) Perhaps we can modify the iterative generate random perm@3boys out@5 algorithm:

Key: Scan from end of string to identify position2upgrade. Compare current (i.e.last) position with the unused characters. If can’t upgrade me [eg abe], then move left which is guaranteed to be a lower character. Now compare the new “me” with only the unused characters (in this algo we don’t allow swap within the string) If can’t upgrade “me”, then move left.

If current position is upgradeable, then set it as p2u. Upgrade to the lowest unused character. Then reset all characters on my right and return.

  1. abc
  2. abd
  3. abe
  4. acd
  5. ace
  6. ade //last 2 letters are the max. p2u = 0
  7. bcd
  8. bce
  9. bde
  10. cde

 

Breadth-first-traversal, level-aware

BST visits root level (one node only), and 2nd generation, and 3rd generation etc. The BST below is aware of the height or “generation” currently visited.

Latest: https://github.com/tiger40490/repo1/blob/cpp1/cpp1/binTree/BST_show_level.cpp

struct Node {
    int data;
    Node *left, *right, *next;
    Node(int x, Node * le = NULL, Node * ri = NULL) : data(x), left(le), right(ri), next(NULL) {}
};
    Node _15(15);
    Node _14(14);
    Node _13(13);
    Node _12(12);
    Node _11(11);
    Node _10(10);
    Node _9(9);
    Node _8(8);
    Node _7(7, &_14, &_15);
    Node _6(6, NULL, &_13);
    Node _5(5, &_10, NULL);
    Node _4(4, NULL, &_9);
    Node _3(3, &_6,  &_7);
    Node _2(2, &_4,  &_5);
    Node root(1, &_2, &_3);
    Node marker(-1); //to be added to queue to mark new level

Node * dequeue(queue<Node*> & q){
    assert(!q.empty());
    Node * n = q.front();
    q.pop();
    return n;
}
void BFT(){
  queue<Node *> q;
  size_t height=0;

  for( q.push(&marker), q.push(&root); !q.empty();){
    Node * n = dequeue(q);
    if (n == &marker){
        if (q.empty()){
                cout<<"  ^^^^^^^^^^"<<endl;
                break;
        }
        n = dequeue(q);
        q.push(&marker); //move the marker to end of queue
        cout<<"starting height # "<<++height<<endl;
    }
    int dataL = 0, dataR=0;
    if (n->left){
        cout<<"pushing "<<n->left->data<<endl; q.push(n->left);
        dataL = n->left->data;
    }
    if (n->right){
        cout<<"pushing "<<n->right->data<<endl; q.push(n->right);
        dataR = n->right->data;
    }
    cout<<dataL<<" <- "<<n->data<<" -> "<<dataR<<endl;
// "  #  next is "<<dataN<<endl;
  }
}
int main() {
    BFT();
}

BFT^DFT, pre^in^post-order #trees++

Here are my own “key” observations, possibly incomplete, on 5 standard tree walk algos, based on my book [[discrete math]]

  • in-order is defined for binary trees only. Why? in-order means “left-subtree, parent, right-subtree”, implying only 2 subtrees.
  • In contrast, BFS/DFS are more versatile. Their input can be any graph. If not a tree, then BFS/DFS could produce a tree — you first choose a “root” node arbitrarily.
  • The 3 *-order walks are recursively defined. I believe they are three specializations of DFS. See https://www.cs.bu.edu/teaching/c/tree/breadth-first/ and https://en.wikipedia.org/wiki/Tree_traversal
  • The difference of the 3 *-order walks are visually illustrated via the black dot on  https://en.wikipedia.org/wiki/Tree_traversal#Pre-order_(NLR)

—-That’s a tiny intro. Below are more advanced —

BFT uses an underlying queue. In contrast DFS is characterized by backtracking, which requires a stack. However, you can implement the queue/stack without recursion.

BFT visits each node twice; DFT visits each node at least twice — opening (to see its edges) and leaving (when all edges are explored)

DFT backtrack in my own language —

  1. # start at root.
  2. At each node, descend into first [1] subtree
  3. # descend until a leaf node A1
  4. # backtrack exactly one level up to B
  5. # descend into A1’s immediate next sibling A2’s family tree (if any) until a leaf node. If unable to move down (i.e. no A2), then move up to C.

Visually — if we assign a $val to each node such that these nodes form a BST, then we get the “shadow rule”. In-order DFS would probably visit the nodes in ascending order by $value

[1] the child nodes, of size 1 or higher, at each node must be remembered in some order.