flip binTree #ez

Q (google): given the root, invert a binary tree, possibly asymmetrical

====analysis:

It’s better to visualize north/south child pointers — everything would be turned upside-down.

— solution 1:
swap the two child pointers in each node

BFT: enqueue each node; when popping a node, swap its 2 child pointers.

DFT also works: post-order or pre-order.
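
Here's a minimal sketch of both traversals (the Node class is my own, not from the original question):

from collections import deque

class Node:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def invert_bft(root):
    # BFT: pop each node off the queue and swap its two child pointers
    q = deque([root] if root else [])
    while q:
        node = q.popleft()
        node.left, node.right = node.right, node.left
        if node.left:  q.append(node.left)
        if node.right: q.append(node.right)
    return root

def invert_dft(root):
    # DFT: post-order shown here; pre-order works equally well
    if root:
        invert_dft(root.left)
        invert_dft(root.right)
        root.left, root.right = root.right, root.left
    return root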

versioned-queue problem

I think this problem is mostly about data structure, not algorithm.

Q: Design and implement a Version-Queue. A Version-Queue maintains a version number along with normal Queue functionality. Every version is a snapshot of the entire queue. Every operation[Enqueue/Dequeue] on the Queue increments its version.

Implement the following functions:

1. Enqueue – appends an element at the end of the queue.
2. Dequeue – returns the top element of the queue.
3. Print – it takes a version number as input and prints the elements of the queue of the given version. The version number input can also be an old/historical version number.

E.g. if the current version number of the queue is 7 and the input to this function is 5, then it should print the elements of the queue when its version number was 5.

For simplicity, assume the elements are integers.

We expect you to write a helper program to test the above data structure, which reads input from stdin and prints output to stdout.

Input format:
First line should have an integer n (number of operations). This should be followed by n number of lines, each denoting one operation.
e.g.
6
e 1
e 4
d
e 5
p 2
p 4

‘e’ stands for enqueue, ‘d’ for dequeue, ‘p’ for print

— My design —
In addition to the regular queue data structure, we need a few helper data structures.

All current + historical queue elements are saved as individual elements in a “Snapshot” vector, in arrival order. This vector never decreases in size even during dequeue. Two pointers represent the trailing edge (dequeue) and leading edge (enqueue).

(minor implementation detail — since it’s a vector, the two pointers can be implemented as 2 integer index variables; real pointers and iterators would get invalidated by automatic resizing.)

Every enqueue operation moves the right pointer to the right;
Every dequeue operation moves the left pointer to the right;
(Neither pointer ever decrements.)

With this vector, I think we can reconstruct each snapshot in history.

Every pointer increment is recorded in a Version table, which is a vector of Version objects {Left pointer, Right pointer}. For example, if Version 782 has {L=22, R=55} then the snapshot #782 is the sub-array from Index 22 to 55.

Additional space costs:
O(K) where K = number of operations

Additional time costs:
Enqueue operation — O(1). Insert at end of the Snapshot vector and Version vector
Dequeue operation — O(1). Move a pointer + insert into Version vector
Print operation — O(M) where M = maximum queue capacity

Minor point — To avoid vector automatic resize, we need to estimate in advance the limit on K i.e. number of operations. If you tell me you get millions of operations a microsecond, then over a day there
could be trillions of “versions” so the Versions vector and the Snapshot vector need sufficient initial capacity.

Note we could get 9000 enqueue operations in a row.
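
Here's a minimal sketch of this design (class and method names are my own; the stdin test driver is omitted):

class VersionQueue:
    def __init__(self):
        self.snap = []              # all current + historical elements, arrival order
        self.versions = [(0, 0)]    # version v -> (left, right) indices into snap
        self.left = 0               # trailing edge (dequeue side)

    def enqueue(self, x):           # right pointer moves right
        self.snap.append(x)
        self.versions.append((self.left, len(self.snap)))

    def dequeue(self):              # left pointer moves right
        x = self.snap[self.left]
        self.left += 1
        self.versions.append((self.left, len(self.snap)))
        return x

    def print_version(self, v):     # v can be a historical version number
        lo, hi = self.versions[v]
        print(" ".join(map(str, self.snap[lo:hi])))

# the sample input above: e 1; e 4; d; e 5; p 2; p 4
vq = VersionQueue()
vq.enqueue(1); vq.enqueue(4); vq.dequeue(); vq.enqueue(5)
vq.print_version(2)   # 1 4
vq.print_version(4)   # 4 5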

given unsorted ints, find median in O(N)

Classic problem — Q: Given an unsorted int array, find its median in O(N) time. For simplicity, we can first assume array size is odd.

I think we can use any data structure. We can scan the array multiple times.

—-simple partition algo

Pick a random item in the array as pivot value and partition the array. Let’s say we are unlucky and get 12% low vs 88% high. So we discard the 12% and repeat on the high section. Suppose the next partition splits it 66% vs 22% (still measured against the original size). The median (the overall 50% mark) falls inside the 66% portion, so we discard the 22%.

So we are likely to require about 2N “visits”; I don’t think it would degrade to O(N logN).
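
Here's a minimal quickselect sketch of this partition idea (random pivot; a three-way split for clarity rather than an in-place partition):

import random

def quickselect(a, k):
    # return the k-th smallest (0-based) element of list a, expected O(N)
    pivot = random.choice(a)
    lows   = [x for x in a if x < pivot]
    pivots = [x for x in a if x == pivot]
    highs  = [x for x in a if x > pivot]
    if k < len(lows):
        return quickselect(lows, k)                      # discard the high side
    if k < len(lows) + len(pivots):
        return pivot
    return quickselect(highs, k - len(lows) - len(pivots))  # discard the low side

def find_median(arr):
    assert len(arr) % 2 == 1        # odd size, as assumed above
    return quickselect(list(arr), len(arr) // 2)

print(find_median([5, 3, 1, 4, 2]))   # 3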

—-fancy idea — weighted pivot

In one scan find the max and min. Calculate the mid value. Say this is 23.5, not even an integer. I will use this as a pivot value to partition the array into 2 segments.

Suppose N=101, so I’m looking for #51 item. Suppose left segment has 10 and right segment has 91, so I discard left segment. Now I’m looking for the #41 in the remaining array.

Now I find max and min again (if necessary). Instead of taking the midpoint between them, I will use a weighted pivot of (10*min + 91*max)/101, so the pivot is shifted to the right, based on the suspicion of a right-heavy histogram.

–complexity?

On average, I should get N+N/2+N/4 … <2N

Worst case? The earlier illustration is rather unlucky since that histogram happens to be right-heavy. In such a case, my “weighted pivot” idea should alleviate it.

SCB-FM algo Q2a 2 slists

Q: Given two well-formed singly-linked lists (loop-free), that might be merged at some node, locate the merge point. If not merged, return null.

Note every node has at most one child node, but a node may have two parent nodes (at the merge point).

====analysis====

Suppose list A has size P=55 and list B has size Q=32

–SolH: hashtable O(N+M) — optimal in time but not space

First populate hashtable with the short list’s 32 nodes. Then iterate the longer list and check each node’s address.

–SolR: reverse one list … how?

–SolA: array-based. Construct a simple 55-array of A node pointers, and another 32-array of B node pointers. Then compare the two final elements of the two arrays. If they are the same node, the two arrays share an identical suffix from the merge point onward, so binary search in B’s array for the first position whose node matches the tail-aligned position in A’s array. O(N+M) + O(log M)

–Sol2: 2-pointer algo O(1) space, usable on read-only lists 🙂

first scan to find the 2 end nodes and remember the sizes like 55 vs 32.

IFF the end nodes match, then 2nd scan:

skip the first P-Q (23) nodes in the longer list, then advance 2 iterators in lock step, comparing the 2 iterators at each step.
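
Here's a minimal sketch of Sol2 (the ListNode class is my own):

class ListNode:
    def __init__(self, val, next=None):
        self.val, self.next = val, next

def merge_point(a, b):
    # return the first shared node, or None if the lists never merge
    if a is None or b is None:
        return None

    def length_and_tail(node):
        n = 1
        while node.next:
            node, n = node.next, n + 1
        return n, node

    la, tail_a = length_and_tail(a)
    lb, tail_b = length_and_tail(b)
    if tail_a is not tail_b:          # different end nodes => never merged
        return None
    if la < lb:                       # make 'a' the longer list
        a, b, la, lb = b, a, lb, la
    for _ in range(la - lb):          # skip the first P-Q nodes
        a = a.next
    while a is not b:                 # lock-step advance
        a, b = a.next, b.next
    return a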

SCB-FM stack-based FIFO in O(1) amortized

Q: given a hardware-based stack API consisting of 3 functions {pop/push/isEmpty}, please implement a queue api consisting of 3 functions {enqueue/dequeue/isEmpty}

https://leetcode.com/problems/implement-queue-using-stacks/description/ is similar

====analysis====

Service dequeue() from a hidden stack; enqueue() pushes onto the visible stack.

When the hidden stack is empty, pop all nodes from the visible stack onto the hidden stack. This gives amortized O(1) dequeue().

isEmpty() must add up two sizes.

[[python cookbook]] P658 implements this classic algo in 9 lines.
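
Here's a minimal sketch of the same classic algo, using two Python lists as the stacks:

class StackQueue:
    # FIFO queue built on two LIFO stacks; amortized O(1) per operation
    def __init__(self):
        self.visible = []   # receives every enqueue
        self.hidden = []    # serves every dequeue

    def enqueue(self, x):
        self.visible.append(x)

    def dequeue(self):
        if not self.hidden:                 # refill only when hidden is empty
            while self.visible:
                self.hidden.append(self.visible.pop())
        return self.hidden.pop()            # raises IndexError if truly empty

    def is_empty(self):
        return not self.visible and not self.hidden   # add up two sizes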

edit distance

The DP idea — compare with the matrix-path-counter problem, which is more visual and easier than this one.

Q72 on Leetcode: Given two words word1 and word2, find the minimum number of operations required to convert word1 to word2. You have the following 3 operations permitted on a word:

  1. Insert a character
  2. Delete a character
  3. Replace a character

Comment — Top 100, naturally occurring. I won’t bother to pass all Leetcode tests esp. the load tests. If I pass all non-load tests I would consider my solution decent.

https://github.com/tiger40490/repo1/tree/py1/py/str has my implementation based on a DP idea online, and a spreadsheet illustration. The idea is elegant once you wrap your mind around it.

===analysis===
Starting with the small string (length S), the challenge is to project as many of the S chars as possible onto the large string (length L). If we can project 5 chars at most, then … (wrong — the remaining S-5 chars need replacement, and the other L-S chars need insertion.)

–idea2: draw all the projection arrows from S to L. In a good-projection, every arrow on the right should be more slanted than every arrow on the left. We want the largest good-projection. In the opening example, the largest would have 5 arrows, …

—-
None of these ideas has proven effective.
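
For reference, here's a minimal sketch of the standard bottom-up DP (the idea behind the repo implementation, though not necessarily the same code): dp[i][j] is the min number of ops to convert the first i chars of word1 into the first j chars of word2.

def edit_distance(word1, word2):
    m, n = len(word1), len(word2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i                 # delete all i chars
    for j in range(n + 1):
        dp[0][j] = j                 # insert all j chars
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if word1[i - 1] == word2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = 1 + min(dp[i - 1][j],      # delete
                                   dp[i][j - 1],      # insert
                                   dp[i - 1][j - 1])  # replace
    return dp[m][n]

print(edit_distance("horse", "ros"))   # 3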

shortest path:2nodes]binary matrix #BFT

Q: given 2 cells in a binary matrix (1=black, 0=white=blocked), check whether the pair is connected and if yes return the shortest path. There exists a path of length 1 between any 2 cells IFF they are side by side or stacked atop each other.

The “count paths between 2 bTree nodes #PimcoQ9 Ashish” problem is arguably harder than this one, but this problem allows moving in four directions.

The “binary-matrix island count #DeepakM” technique is more applicable. A BFT should work.

  • every reachable node is painted Green (like 2)
  • we give up after our queue is empty

https://github.com/tiger40490/repo1/blob/py1/py/grid/classic_connectedPair.py is the implementation, briefly tested.
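
Here's a minimal BFT sketch along the same lines (4-directional moves, 1 = walkable; the parent map doubles as the visited/Green set):

from collections import deque

def shortest_path(grid, src, dst):
    # return a shortest list of cells from src to dst (inclusive), or None
    rows, cols = len(grid), len(grid[0])
    parent = {src: None}
    q = deque([src])
    while q:
        cell = q.popleft()
        if cell == dst:                  # rebuild the path by walking parents
            path = []
            while cell:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 1 and (nr, nc) not in parent):
                parent[(nr, nc)] = cell
                q.append((nr, nc))
    return None                          # queue exhausted: not connected

grid = [[1, 1, 0],
        [0, 1, 1],
        [1, 0, 1]]
print(shortest_path(grid, (0, 0), (2, 2)))   # [(0, 0), (0, 1), (1, 1), (1, 2), (2, 2)]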

generate all abbr starting from longest.. +! recursion

I won’t insist on relative ordering among the shortest.

Idea 1 — Start with the longest abbreviation, i.e. the original string S, assuming 5 characters (see the sketch after this list).

  1. populate the smallHM with the original word
  2. copy every char except the first; save into bigHM, then print/process this abbreviation.
  3. copy every char except the 2nd and print
  4. ..
  5. copy every char except the last. Now we have 5 strings in bigHM (a Level-4 hashmap), each length S-1=4
  6. make smallHM point to bigHM object; point bigHM to an empty hm
  7. now take a string from smallHM (Level-4 collection) and generate 4 shorter strings and save them in bigHM (a Level-3 collection), each length S-2=3
  8. now take 2nd string from Level-4 …
  9. After we finish Level-4, we have generated 20 strings in Level-3, but there are only 10 distinct items! so we need a L3 hashmap.
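
Here's a minimal sketch of Idea 1, with a plain set per level standing in for smallHM/bigHM:

def all_abbreviations(word):
    # yield every distinct subsequence of `word`, longest level first
    level = {word}                      # smallHM: the current level
    while level:
        for s in sorted(level):         # order within a level doesn't matter
            yield s
        shorter = set()                 # bigHM: the next (shorter) level, deduped
        for s in level:
            for i in range(len(s)):
                shorter.add(s[:i] + s[i+1:])   # drop the i-th char
        shorter.discard("")             # stop at length 1
        level = shorter

print(list(all_abbreviations("abc")))
# ['abc', 'ab', 'ac', 'bc', 'a', 'b', 'c']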

 

merge K presorted lists #O(what)

Q: Merge k sorted linked lists and return it as one sorted list. Analyze and describe its complexity.

Note K could be much larger than N.

https://github.com/tiger40490/repo1/blob/py1/py/linklist/merge4lists.py is my solution.

I feel this is mostly an optimization challenge. I can think of a few solutions

–Sol1: merge 2nd list into first. Then merge 3rd list into first …

https://leetcode.com/problems/merge-k-sorted-lists/solution/ shows that this has higher runtime cost than the brackets solution.

The reason is that each 2-merge-to-1 must visit every node in both lists, so the first list’s nodes get visited about K times!

–Sol1b: brackets.

There are only (log K) levels in the bracket so any list gets visited that many times.

–Sol3: in-place (inefficient)

We maintain K node-pointers for the K lists (K teams)

We also maintain a pointer to the last-added node in the merged list.

The first node of each of the K lists is put into a min-heap. The winner (smallest) team becomes the “current list”. The winner team then offers its next node, which is added into the heap, and we pick the next winning team …
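
Here's a minimal heapq sketch of Sol3 (the ListNode class is my own; the sequence number i breaks ties so heapq never compares nodes directly):

import heapq

class ListNode:
    def __init__(self, val, next=None):
        self.val, self.next = val, next

def merge_k_lists(lists):
    # merge K sorted singly-linked lists; O(N log K) with a K-sized min-heap
    heap = [(node.val, i, node) for i, node in enumerate(lists) if node]
    heapq.heapify(heap)                     # first node of each non-empty list
    dummy = tail = ListNode(None)
    while heap:
        _, i, node = heapq.heappop(heap)    # winner = smallest front node
        tail.next = node                    # append to the merged list
        tail = node
        if node.next:                       # winning list offers its next node
            heapq.heappush(heap, (node.next.val, i, node.next))
    return dummy.next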

What if N=1 and K is 1 billion?

staircase problem #CSY@Bbg

Q (bbg question posed to CSY): given a staircase of height N (eg 3), you can reach the top by three steps 1,1,1 or two steps 1,2 or 2,1, or a single step of 3. Generate all paths for a given N.

I think the solution(s) need more review and internalization.

This problem has many classic solutions

  • bottom-up
  • top-down recursion with memoization
    • yield-based generator function
  • BFT without recursion without duplicates

I feel this is simpler than AQR factorization and the CombinationSum problems.

Is this same as N-boy-split? No… split is harder. With the split, ABC can have a “step” of AC.

easy to test —  https://github.com/tiger40490/repo1/blob/cpp1/cpp/algo_comboPermu/staircase_CSY.cpp was my non-recursive BFT solution.

https://github.com/tiger40490/repo1/blob/py1/py/algo_combo_perm/staircase_CSY.py is my yield-based recursive solution. Only 5 lines in the implementation. I added lots of tests.

Q: why is the path count == 2^(n-1)?
Answer from CSY: f(1)=1, f(2)=2, f(n) = f(n-1)+f(n-2)+…+f(1)+1 = 2^(n-1)
A: For a staircase of n, you can take a first step of 1 and have f(n-1) paths, or a first step of 2 and have f(n-2) paths… or take the whole staircase in a single step of n, which accounts for the final +1.

—jargon file:

a STEP from LEVEL 0 to 2 has LENGTH=2, and skips level 1; a PATH consists of 1 or more STEPS; a FORMULA is a complete PATH, and always starts at level 0 and may hit intermediate levels #2, #5, #6 …

— key observations:
* Observation 1: Every formula starts with firstStepOf1, or firstStepOf2, (or firstStepOf3…) with no duplicates across these groups!
* Observation 2: After taking firstStepOf1, the remaining steps are really a formula for staircase of N-1, a smaller problem. This invites a bottom-up DP solution.

—BFT solution. suppose N = 5. We model each path as a directory path from root. Each queue item is a path represented by a linked list or vector of capacity N.

I hope this solution avoids duplicates… Yes it does:)

  1. enqueue all “first levels” /1; /2; /3; /4; /5
  2. then dequeue and visit the first path i.e. “/1”. In this case, there are 4 levels remaining in the staircase, so enqueue /1/1;  /1/2;  /1/3;  /1/4
  3. then dequeue and visit 2nd path in the queue i.e. “/2”. enqueue /2/1; /2/2;  /2/3

Every time (after dequeue) we see a path has no remaining level, we have a formula.

—DP (bottom-up) iterative solution to be implemented

  1. first, build the formulas for a length-2 staircase. They are /1/1 + /2 i.e. two formulas saved in FormulaArrayForStaircase2 i.e. “fa2”
  2. then build the formulas for a length-3 staircase — fa3 = firstStepOf2 -> fa1 + firstStepOf1 -> fa2 + the single step /3
  3. then build the formulas for a length-4 — fa4 = firstStepOf3 -> fa1 + firstStepOf2 -> fa2 + firstStepOf1 -> fa3 + the single step /4

— recursive: CSY gave a recursive solution and interviewer asked “Can you find a non-recursive solution”?

I think the recursive solution repeats the same sub-problem (such as a 3-level staircase) over and over again, but we can make the recursive function more efficient with memoization — it should short-circuit when the input already has a solution saved in a global hashtable.

—- python yield/generator recursive solution

My github code shows that a simple generator function without memoization is about half the speed of the bottom-up version. With memoization, it’s 20% faster than bottom-up.
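
Here's a minimal sketch of the yield-based generator idea, without memoization (not necessarily identical to the repo code):

def paths(n):
    # yield every formula for a staircase of height n, as a list of steps
    if n == 0:
        yield []                      # the empty formula: already at the top
        return
    for first in range(1, n + 1):     # firstStepOf1, firstStepOf2, ...
        for rest in paths(n - first): # formulas for the remaining n-first levels
            yield [first] + rest

# usage: staircase of 3 -> [1, 1, 1], [1, 2], [2, 1], [3]
for p in paths(3):
    print(p)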

max palindrome substring

https://leetcode.com/problems/longest-palindromic-substring/ seems to be the same problem.

Deepak CM received this problem in a real IV.

https://github.com/tiger40490/repo1/blob/py1/py/str/longestPalindrome.py is one solution, not O(N) at all.

— linear time solutions are on wikipedia, but probably not so intuitive so I give up.

— my simple O(N^2) solution 1: first {O(N)} identify each “core”, which is defined as

  • “ABA”
  • a run of at least 2 of the same char, like AA

Then {O(N)} for each core, scan both ways in lock steps until we see a mismatch. Then we know the length of this palindrome.
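
Here's a minimal O(N^2) sketch of this idea, using the standard single-char (odd) and adjacent-pair (even) centers, which cover the “ABA” and “AA” cores described above:

def longest_palindrome(s):
    best = s[:1]
    for i in range(len(s)):
        # odd core centred at i, and even core at (i, i+1)
        for lo, hi in ((i, i), (i, i + 1)):
            while lo >= 0 and hi < len(s) and s[lo] == s[hi]:
                lo, hi = lo - 1, hi + 1        # scan both ways in lock step
            if hi - lo - 1 > len(best):        # s[lo+1:hi] is the palindrome
                best = s[lo + 1:hi]
    return best

print(longest_palindrome("babad"))   # bab (aba is equally long)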

https://leetcode.com/problems/longest-palindromic-substring/solution/ shows an O(N^2) DP solution

— my one-scan (original) idea 2, but now I feel unsure.

We stop at every character on our forward scan. When we encounter any seed, we need to keep growing it, as one of these seeds will surely grow into the longest palindrome. However, how do we simultaneously grow so many seeds? We won’t, for efficiency’s sake.

Instead, I grow the earliest (oldest) seed only. Any seed encountered afterwards will be shorter, that is, until the oldest seed stops growing and gets archived. After the archiving, I start a “manhunt” — I look at the next-oldest [1] seed and determine if it can grow to the current position. If it can’t, then it is surely inferior to the just-archived oldest. If it can, then we end the manhunt right there and keep scanning forward.

Guarantee to find it — every seed is queued. The only way for a seed to get archived is if it stops growing, i.e. we have computed its full length.

[1] FIFO container is needed

One of the worst test cases is a long black/white string containing only length-1 palindromes. My algo would archive many short seeds… and achieves O(N)

N-queen

Q1: generate one solution

Q2: generate all solutions to the 4-queen problem without
A: there are only 4^4 = 256 candidate placements (one queen per row), so we could just iterate over all of them, but let’s say we want to extend the Q1 solution

https://github.com/tiger40490/repo1/blob/py1/py/2d/classic_nQueens.py is my tested solution. The only important function is placeOnRow(), which took me 17 minutes to write on the white-board, but I missed something very important — restoring Mat[r][c] to UNFILLED before the continue
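
Here's a minimal sketch of the same row-by-row backtracking; the helper and variable names below are my own, not necessarily the repo's:

N, UNFILLED = 4, '.'
mat = [[UNFILLED] * N for _ in range(N)]
solutions = []

def safe(r, c):
    # check the column and both diagonals against queens already placed above row r
    for i in range(r):
        for j in range(N):
            if mat[i][j] == 'Q' and (j == c or abs(j - c) == r - i):
                return False
    return True

def place_on_row(r):
    if r == N:                          # all rows placed: record one solution
        solutions.append(["".join(row) for row in mat])
        return
    for c in range(N):
        if safe(r, c):
            mat[r][c] = 'Q'
            place_on_row(r + 1)
            mat[r][c] = UNFILLED        # the crucial restore before continuing

place_on_row(0)
print(len(solutions))                   # 2 solutions for N=4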

4×4 sudoku: backtracking #FB

4-queen is one of the simplest problems in the backtracking category, but I will look at a mini sudoku. Sudoku was once asked in a Facebook 45-min white-board coding test. https://github.com/tiger40490/repo1/blob/py1/py/2d/sudoku4.py is my tested implementation.

I name the four values as color1, color2, color3, color4. A placement is {a color, row index, column index}.

Basic idea — at each empty cell, we try all four possible colors. If color1 is allowed, then we call the same function recursively to fill the next empty cell. Then (crucially) we check the status. If OK, then game over. If NOK, then we try another color. If no more colors are left to try, then (crucially) we reset the current cell to unfilled and return NOK.

Key to the algo — backtracking is implemented by the restoration and returning NOK

Key to the algo — each empty cell would start a new recursive stack frame. But what data is saved on that stack frame?

Key to the implementation — know what variables go into the loop, what goes on the stack and what can be global (simplest). In this case the returned next-cell indices CANNOT be globals because each stack frame must remember “next-cell-after-me”.

Key to the implementation — simplify state. Each stack frame doesn’t need to remember the “breadcrumb” of all previous placements. Each layer of the recursive call only needs to remember the colors already tried for its own current cell.

Key to the implementation — factor out simple utilities if they are well-defined and easy to implement. See comment in code.

The important function requires no more than 10 lines! So an interviewer could say “This is something a candidate can write in 45 minutes, assuming all the simple utilities are already available to him, namely failedSomething() and findNextCell2fill().”

Truth is, most of us can’t write these 10 lines in an hour on the white-board if we don’t have the correct idea. My initial white-board implementation took 18 minutes and missed one important part —

Mat[r][c] = UNFILLED_COLOR # restore

I then took about an hour to make it work.
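
Here's a minimal sketch of the core function plus the two utilities, the way I imagine them (my own guesses at failedSomething() and findNextCell2fill(), not the repo code):

N, UNFILLED = 4, 0
COLORS = (1, 2, 3, 4)

def find_next_cell_to_fill(mat):
    # return (r, c) of some empty cell, or None if the board is full
    for r in range(N):
        for c in range(N):
            if mat[r][c] == UNFILLED:
                return r, c
    return None

def failed_something(mat, r, c, color):
    # True if placing `color` at (r, c) clashes with its row, column or 2x2 box
    if color in mat[r] or color in (mat[i][c] for i in range(N)):
        return True
    br, bc = r - r % 2, c - c % 2
    return any(mat[i][j] == color for i in (br, br + 1) for j in (bc, bc + 1))

def solve(mat):
    cell = find_next_cell_to_fill(mat)
    if cell is None:
        return True                       # game over: board completely filled
    r, c = cell
    for color in COLORS:                  # try all four colors
        if not failed_something(mat, r, c, color):
            mat[r][c] = color
            if solve(mat):                # recurse into the next empty cell
                return True
            mat[r][c] = UNFILLED          # restore -- the backtracking step
    return False                          # no color works: NOK

board = [[0] * N for _ in range(N)]
solve(board)          # fills the empty board with a valid 4x4 sudoku
print(board)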

live updated hitCount over last5s#presumably Indeed

I hit a similar question in NY, possibly LiquidNet or CVA

Q: Make a system (perhaps a function?) that returns the average number of hits per minute from the past 5 minutes.

I will keep things simple by computing the total hit count over the last 300 seconds. (Same complexity if you want the average order amount at Amazon over the last 5 minutes.)

Let’s first build a simple system before expanding it for capacity.

Let’s first design a ticking system that logs an update every time there’s a new record. The log can be displayed or broadcast like a “notice board”, or we can update a shared atomic<int>.

Whenever we get a new record (a hit), we save it in a data structure stamped with an expiry date (datetime). At any time, we want to quickly find the earliest unexpired record i.e. the blue record. There’s only one blue at any time.

What data structure? A RingBuffer with enough capacity to hold the last 5 minutes’ worth of records.

I will keep the address of the current blue record, which is defined as the earliest unexpired record at the last update. When a new record comes in, I check “Is the blue expired?” If NO, then easy: this new record is too close to the last new record, so I simply update my “notice board” in O(1). If YES, then we run a binary search for the new blue. Once we find it, we have to compute a new update in O(W), where W is the minimum of two counts: A) recently expired records, B) still unexpired records. After the update, we remove the expired items from our data structure.
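
Here's a much-simplified sketch of this first design: a plain list stands in for the RingBuffer, expired items are never trimmed, and the “notice board” is just the returned count.

import bisect, time

WINDOW = 300                       # 5 minutes, in seconds

class HitCounter:
    def __init__(self):
        self.expiries = []         # expiry timestamps, in arrival order (so already sorted)
        self.blue = 0              # index of the earliest unexpired record

    def hit(self, now=None):
        now = time.time() if now is None else now
        self.expiries.append(now + WINDOW)
        if self.expiries[self.blue] <= now:   # is the blue expired?
            # yes: binary-search for the new blue among the later records
            self.blue = bisect.bisect_right(self.expiries, now, lo=self.blue)
        return self.count()        # the "notice board" value

    def count(self):
        return len(self.expiries) - self.blue   # unexpired records

hc = HitCounter()
print(hc.hit(now=0.0))     # 1
print(hc.hit(now=100.0))   # 2
print(hc.hit(now=350.0))   # 2 -- the hit at t=0 has expired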

–That concludes my first design. Now what if we also need to update the notice board even when there is no new record?

I would need an alarm set to the expiry time of the current blue.

–Now what if the updates are too frequent? I can run a scheduled update job. I need to keep the address of a yellow record, defined as the newest record at the last update.

When triggered, routine is familiar. I check “Is the blue expired?” If NO then easy… If YES then binary-search for the new blue.

coin problem #all large-enough amounts are decomposable

This is not really an algo IV question, but more like a brain-teaser problem.

Based on https://en.wikipedia.org/wiki/Coin_problem — For example, the largest amount that cannot be obtained using only coins of 3 and 5 units is 7 units. The solution to this problem for a given set of coin denominations is called the Frobenius number of the set. The Frobenius number exists as long as the set of coin denominations has no common divisor greater than 1.

Note if a common divisor exists as in {2,4} then all the odd amounts will be non-decomposable.

Q: why a very large amount is always decomposable ? Give an intuitive explanation for 2 coin values like 3 and 5.

Here’s an incomplete answer — 15 (=3*5), 16, 17 are all decomposable. Any larger number can be reached by adding 3’s.

In fact, it has been proven that any amount greater than (not equal to) [xy-x-y] is always decomposable. So if we are given 2 coin values (like 4,5, where x is the smaller value) we can easily figure out a range

xy-x-y+1 to xy-y

where each amount is decomposable. Note this range has x distinct values, so any higher amount is easily solved by adding x’s.

Also note xy-y is obviously decomposable as (x-1)y.
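
Here's a quick brute-force check of the claim, assuming two coprime coin values greater than 1:

from math import gcd

def decomposable(amount, x, y):
    # can `amount` be written as a*x + b*y with a, b >= 0?
    return any((amount - a * x) % y == 0 for a in range(amount // x + 1))

def frobenius(x, y):
    # largest non-decomposable amount for coprime x, y >= 2; should equal x*y - x - y
    assert gcd(x, y) == 1
    return max(n for n in range(x * y) if not decomposable(n, x, y))

print(frobenius(3, 5))   # 7
print(frobenius(4, 5))   # 11 == 4*5 - 4 - 5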