modified BFT to check binTree is symmetric

https://leetcode.com/problems/symmetric-tree/solution/ iterative solution is elegant adaptation of BFT.

longest run@same char,allow`K replacements #70%

https://leetcode.com/problems/longest-repeating-character-replacement/

Q: Given a string s that consists of only uppercase English letters, you can perform at most k operations on that string. In one operation, you can choose any character of the string and change it to any other character. Find the length of the longest sub-string containing all repeating letters you can get after performing the above operations.

Here’s my description of the same problem:

Q: Suppose we stand by a highway and watch cars of each color. Only 26 possible colors. Cars pass fast, so sometimes we miscount.

My son says “I saw 11 red cars in a row in the fast lane”.
My daughter says “I saw 22 blue cars in a row in the middle lane”
We allow kids to miss up to 3 cars in their answer. In other words, my son may have seen only 8, 9 or 10 red cars in a row.

When we review the traffic video footage of N cars in a single lane, determine the max X cars in a row of the same color, allowing k mistakes. K < N.
====analysis
Suppose k is 3

— solution 1: O(N) use 2 variables to maintain topFrq and w i.e. winSize

Within a sliding window of size w, maintain a frq table. initialize w to a good conservative value of 4 (i.e. k+1).

If we notice top frq is 2, better than (w-k) i.e. w-k<=topFrq , then lucky we can be less conservative and we can expand the current window backward (possibly safer than fwd).

After expansion, immediately try further expansion. IFF impossible i.e. w - topFrq > k, then slide the window.

If correct answer is 11 i.e there’s a 11-substring containing 8 reds, I feel my sliding window will not miss it.
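
A minimal Python sketch of the sliding-window idea. It is the common forward-growing variant, not the backward expansion described above, and it keeps a historical topFrq that is never lowered when the window slides. The function name is hypothetical.

```python
from collections import Counter

def longest_run_with_replacements(s, k):
    """Length of the longest window that becomes one repeated letter after at most k replacements."""
    frq = Counter()      # frq table for the current window
    top_frq = 0          # highest count ever seen inside a window
    left = 0
    for right, ch in enumerate(s):
        frq[ch] += 1
        top_frq = max(top_frq, frq[ch])
        # w - topFrq > k means too many replacements: slide instead of grow
        if (right - left + 1) - top_frq > k:
            frq[s[left]] -= 1
            left += 1
    # the window never shrinks, so its final size is the best size
    return len(s) - left
```

Because the window size never decreases, a stale top_frq can only keep the window at a size that was already proven achievable.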

how many ways to decode #60%

Q(Leetcode 91): A message containing letters from A-Z is being encoded to numbers using the following mapping:

‘A’ -> 1, ‘B’ -> 2, … ‘Z’ -> 26
Given a non-empty string containing only digits, determine the total number of ways to decode it.

====analysis

I think this is similar to the punctuation problem.

–my botup solution

At each position in the string, keep a “score” number that represents “how many ways to decode a left-substring ending here”

Useful — define my convenient jargon: we will say the encoding 10 to 26 are “high letters”, and the encoding 1 to 9 are “low letters”. If there are 95 ways to decode a string, i will call them 95 “formulas”.

At Position 33, i will look at score[31] (say equal to 95) and score[32] (say, equal to 97). if the two-char substring str[32:34] is between 10 and 26, then score[33] should include the 95 ways to decode str[:32]. Those 95 “formulas” can grow one high letter.

If str[33] is not ‘0’, then score[33] should also include the 97 ways to decode str[:33], because those 97 “formulas” can grow one low letter.

Aha — The 95 and the 97 formulas are all distinct because of the ending letter

I think we only need two variables to hold the previous two scores, but it’s easier to code with a score[] array.
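
A short Python sketch of the bottom-up score[] idea, under the assumption that the input holds only digits. The function name is hypothetical.

```python
def num_decodings(s):
    """score[i] = how many ways to decode the left-substring ending at position i."""
    if not s:
        return 0
    score = [0] * len(s)
    for i, ch in enumerate(s):
        # grow one "low letter" (1 to 9) on every formula ending at i-1
        if ch != "0":
            score[i] += score[i - 1] if i >= 1 else 1
        # grow one "high letter" (10 to 26) on every formula ending at i-2
        if i >= 1 and 10 <= int(s[i - 1:i + 1]) <= 26:
            score[i] += score[i - 2] if i >= 2 else 1
    return score[-1]
```

The two additions never double-count, since one group of formulas ends in a low letter and the other in a high letter.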

zero out rows/columns +! auxDS

Q (Leetcode 73): Given a m x n matrix, if an element is 0, set its entire row and column to 0. Do it in-place in O(1) space. No time complexity requirement.

I will assume all signed ints.

====analysis
I find this problem very well-defined, but the O(1) space is highly contrived. I think it only needs some clever technique, not really reusable technique.

Reusable technique — for array of integers with stringent space complexity, we can save indices in the array

Aha — I only need to know the full set of rowIDs and columnIDs.

— My O(minimum(m,n)) space solution 1:
zeroRowCnt:=how many rows to be zeroed out
zeroColCnt  :=how many columns to be zeroed out

Compare the two. Suppose zeroRowCnt == 11 is smaller. I will save the 11 rowID’s in a collection. Then scan horizontally to zero out by column. Then use the rowIDs to zero out by row

–My O(1) space idea 2 — more elaborate than the published solution.

Aha — Can we save the 11 rowID’s in a column to be zeroed out?

Compare zeroRowCnt and zeroColCnt as earlier. Get first rowID among the 11. Suppose it’s Row #3.

Now we know Row#3 has some zeros, so find the first column having a zero. It might be the last column (farthest east). Wherever it is, we pick that column as our “bookkeeper column”.

Visual insight — Suppose bookkeeper is Column #33. Then a[3,33] would be the first zero if we scan entire matrix by-row-and-internally-by-column

We scan row by row again (since we don’t remember those 11 rowIDs), starting after that first rowID. For every rowID found, we will zero out one corresponding cell in bookkeeper column.

Insight — We should end up with exactly 11 zeros in that column. Can’t exceed 11 (only 11 rows having zeros). Can’t fall below 11 (we save all 11 rowIDs)

From now on, freeze that column until further notice. Now zero out each Column to be zeroed out, but leave out our bookkeeper column.

Lastly, follow our bookkeeper column to zero out every “dirty row”.
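
A simplified Python sketch of the bookkeeper-column idea. As an assumption for brevity it always bookkeeps rowIDs in a column and skips the zeroRowCnt/zeroColCnt comparison. The function name is hypothetical.

```python
def zero_out(a):
    """Zero every row and column that holds a 0, in place, with O(1) extra space."""
    if not a or not a[0]:
        return
    m, n = len(a), len(a[0])
    # the first zero in a by-row scan picks the bookkeeper column
    book = next((c for r in range(m) for c in range(n) if a[r][c] == 0), None)
    if book is None:
        return
    # save each dirty rowID as a zero in the bookkeeper column
    for r in range(m):
        if any(v == 0 for v in a[r]):
            a[r][book] = 0
    # zero out each dirty column, leaving the bookkeeper column frozen
    for c in range(n):
        if c != book and any(a[r][c] == 0 for r in range(m)):
            for r in range(m):
                a[r][c] = 0
    # follow the bookkeeper column to zero out every dirty row
    for r in range(m):
        if a[r][book] == 0:
            for c in range(n):
                a[r][c] = 0
    # the bookkeeper column itself held an original zero, so it goes last
    for r in range(m):
        a[r][book] = 0
```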

count lower ints to my right

https://leetcode.com/problems/count-of-smaller-numbers-after-self/ is labelled “hard”.

Q: given an integer array nums , return a new counts array, wherein counts[i] is the number of smaller elements to the right of nums[i]

====analysis

Order statistics tree  (i.e. an augmented RBTree) should make O(N logN). However, the actual algorithm is not clear to me.

One scan from right. Insert each node into this tree. Before inserting a node of value like 22, we will query the tree getRank(22).

Implementation wise, it’s hard to create a self-balancing BST from scratch. So I think an unbalanced BST might do.

Also, there might be some alternative solutions, like mergesort??
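
A Python sketch of the right-to-left scan with an unbalanced BST, where each node carries the size of its left subtree so that the rank query comes for free during insertion. Class and function names are hypothetical. Being unbalanced, it degrades to O(N^2) on sorted input.

```python
class _Node:
    def __init__(self, val):
        self.val = val
        self.dup = 1          # how many copies of val were inserted
        self.left_size = 0    # node count of the left subtree, copies included
        self.left = None
        self.right = None

def _insert(root, val):
    """Insert val; return (root, rank), rank = count of existing values strictly lower than val."""
    if root is None:
        return _Node(val), 0
    rank, node = 0, root
    while True:
        if val == node.val:
            node.dup += 1
            return root, rank + node.left_size
        if val < node.val:
            node.left_size += 1
            if node.left is None:
                node.left = _Node(val)
                return root, rank
            node = node.left
        else:
            rank += node.left_size + node.dup
            if node.right is None:
                node.right = _Node(val)
                return root, rank
            node = node.right

def count_lower_to_right(nums):
    counts = [0] * len(nums)
    root = None
    for i in range(len(nums) - 1, -1, -1):   # one scan from the right
        root, counts[i] = _insert(root, nums[i])
    return counts
```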

longest descending path through matrix #60%

https://leetcode.com/problems/longest-increasing-path-in-a-matrix/

Q: given an int-matrix that allows 4-way moves,  find longest path with strictly descending nodes. Lateral move disallowed.

====analysis

I have code to generate all paths… generate loopfree paths: graph node A→B

— Solution 1: O(N logN)
O(N) First pass  to construct “waterflow” graph — visit N nodes. For each, check the four neighbors. Add an outgoing edge (in a Node.edgeList) to a neighbor node if it is strictly lower.

Now put all nodes into a min-heap, according to node height. Can be part of first pass.

Second pass we visit each item in the min-heap. For each node, compute longest descending path starting therefrom. Record this path length as Node.score. Note if a node is surrounded by higher or equal nodes, then it has score 0, as in the lowest node.

A higher node AA is always upstream to some “computed” nodes. (We don’t care about any incoming edges into AA). So pick the max of (up to four) lower neighbors’ scores. Add 1 to get AA’s score. This computation is O(1) but the heap pop() makes second pass O(N logN)

Note the lowest node may not be part of the longest path, if it is surrounded by the highest nodes.
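
A Python sketch of Solution 1 using heapq. As a simplification it checks the four neighbors directly instead of storing a Node.edgeList, and it returns the node count (score + 1) to match the Leetcode convention. The function name is hypothetical.

```python
import heapq

def longest_descending_path(grid):
    """Node count of the longest strictly descending 4-way path."""
    if not grid or not grid[0]:
        return 0
    m, n = len(grid), len(grid[0])
    heap = [(grid[r][c], r, c) for r in range(m) for c in range(n)]
    heapq.heapify(heap)
    score = [[0] * n for _ in range(m)]   # edges on the longest path starting here
    best = 0
    while heap:
        height, r, c = heapq.heappop(heap)      # lowest unprocessed node
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            # a strictly lower neighbor was popped earlier, so its score is final
            if 0 <= rr < m and 0 <= cc < n and grid[rr][cc] < height:
                score[r][c] = max(score[r][c], score[rr][cc] + 1)
        best = max(best, score[r][c])
    return best + 1
```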

getByRank() in sorted matrix: priorityQ^RBTree

https://leetcode.com/problems/kth-smallest-element-in-a-sorted-matrix/

====analysis

recombinant binTree pyramid, where “water” flows east or south.

  • first level has one node .. lowest value. Add it to pq (i.e. priorityQ)
  • pop the pq and insert the two downstream nodes
  • total K pops, each pop is followed by up to 2 inserts

Heap will grow to up to K items, so each pop will be up to logK

Total O(K logK). To achieve this time complexity, we can also use a RBTree. The tree nodes can come from a pre-allocated array.
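
A Python sketch of the priorityQ version. It assumes rows and columns are sorted ascending and k does not exceed the cell count. A seen-set (my addition, not mentioned above) stops a cell from being inserted twice, since each cell is downstream of two cells. The function name is hypothetical.

```python
import heapq

def get_by_rank(matrix, k):
    """Return the k-th smallest item (1-based) of a row- and column-sorted matrix."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    pq = [(matrix[0][0], 0, 0)]      # first level: the lowest value
    seen = {(0, 0)}
    for _ in range(k - 1):           # after k-1 pops the top is the answer
        _, r, c = heapq.heappop(pq)
        for rr, cc in ((r + 1, c), (r, c + 1)):   # water flows south or east
            if rr < n_rows and cc < n_cols and (rr, cc) not in seen:
                seen.add((rr, cc))
                heapq.heappush(pq, (matrix[rr][cc], rr, cc))
    return pq[0][0]
```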

lowest missing+ve int#Codility #70%

Q: Write a function int solution(int[] A);  that, given an array A of N integers, returns the smallest natural number that does not occur in A. For example, given A = [1, 3, 6, 4, 1, 2], the function should return 5.
Given A = [1, 2, 3], the function should return 4.
Given A = [−1, −3], the function should return 1.

• each element of array A is a 32-bit signed int
• expected worst-case time complexity is O(N);
• expected worst-case space complexity is O(N).
* Elements of input arrays can be modified.

https://leetcode.com/problems/first-missing-positive/description/ is similar but O(1) space and average O(N) time!

—— my analysis —–

The mutable and O(1) space hints at — saving array indices as payload !

—- Idea A:

first scan to swap non-positives to the end and remember the new boundary (say #111).

In the same scan also determine min and max. Suppose min=55.

Another scan to reduce all payloads by 55 so they fall in the range of 0 to max-55.

Now use CSY technique .. check array@0-N in-situ #Nsdq#contrived
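
A Python sketch of the "array index as payload" technique in its common swap-to-home-slot form. It is not a line-by-line rendering of Idea A (no min/max shifting), and the function name is hypothetical.

```python
def lowest_missing_positive(a):
    """Smallest natural number absent from a; mutates a, O(1) extra space."""
    n = len(a)
    for i in range(n):
        # keep sending a[i] to its home slot a[i]-1 until slot i holds a misfit
        while 1 <= a[i] <= n and a[a[i] - 1] != a[i]:
            j = a[i] - 1
            a[i], a[j] = a[j], a[i]
    for i in range(n):
        if a[i] != i + 1:      # earliest slot not holding its own number
            return i + 1
    return n + 1
```

Each swap puts one value into its home slot for good, so the total number of swaps stays within N.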

—-solutionB: make_heap O(1) space but O(N logN) time. Build a min-heap based on the array in O(1) space, then keep calling min().

  • make_heap shows random access container (vector used in demo), and “rearrange”, implying O(1) space
  • make_heap shows O(N) to construct the heap

min() is O(1) but delete_min() is O(log N), so overall we probably get O(N logN)

—-solutionC: radix sort. https://en.wikipedia.org/wiki/Radix_sort#In-place_MSD_radix_sort_implementations shows an in-place binary radix sort.

First pass to transform all negative numbers to 0. Then iterate the sorted array and check for the earliest “gap”. Worst case — you get 1,2,3… without gap, so answer is the 1+ largest array element.

O(W*N) where W is width of the largest integer. If we assume 64-bit then W is a constant:)

http://www.drdobbs.com/parallel/parallel-in-place-radix-sort-simplified/229000734 is another in-place radix sort.

get_majority_elem]unsorted array,O(1)space #90%

Q: Given an array of size n, find the majority element. The majority element is the element that appears more than ⌊ n/2 ⌋ times. You may assume that the array is non-empty and the majority element always exists in the array.

====analysis
worst input: odd length, only X and Y. X occurs once more than Y.

hash table solution needs O(N) space since there can be N/2 distinct values. To improve space complexity, how about quick select? Discard the smaller side and pick another random pivot.

Median-finder algorithm can solve this problem, using std::nth_element() which uses QuickSelect… O(1) space despite recursive set-up.

— idea 3 O(1) space: random pick then verify
Random pick and count the occurrence of this pick. Is it more than N/2? Within a few trials we should find a good pick.
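
A tiny Python sketch of idea 3. It assumes, as the question does, that a majority element exists; otherwise the loop would never end. The function name is hypothetical.

```python
import random

def get_majority_elem(a):
    """Random pick then verify; each trial succeeds with probability above 1/2."""
    while True:
        pick = random.choice(a)
        if sum(1 for x in a if x == pick) > len(a) // 2:
            return pick
```

Since each pick hits the majority element more than half the time, the expected number of trials is below 2.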

T.isLikeSubtree(S) #60%

Q (Leetcode 572): Given two non-empty binary trees s and t, check whether tree t has exactly the same structure and node values as a subtree of s. A subtree of s is a tree consisting of a node in s and all of this node’s descendants. The tree s could also be considered as a subtree of itself.

====analysis
https://leetcode.com/problems/subtree-of-another-tree/solution/ relies (as “primary check”) on the payloads of  each tree node, but in some trees, all payloads are empty or all payloads are either True or False. In these cases, the comparison of payload is only usable as a secondary check. The primary check must be structural. See key in a tree node

The O(N+K) is plain wrong.

I guess the 2nd solution (and possibly 1st solution) would compare every node in S to the root of T. I think there are more efficient solutions using subtree size and subtree height as secondary checks – more reliable than payload check.

My solution below uses BFT + pre/post/in-order walk !

— Preliminary step: post-order walk to get subtree-size, subtree-height at each S node + T root. (I will skip other T nodes.). Suppose T is size 22 height 4. We will look for any node Y of size 22 and height 4 and a matching payload. This would eliminate lots of S nodes:

If T height is more than 2, then lots of low-level S nodes are eliminated.
If T height is 2 or 1, then T size would be at most 3. Most high-level S nodes are eliminated.

— Solution 1: For both T and S, We take in-order walk to assign incremental IDs, then take pre-order walk to produce an array of IDs that represent the tree structure.

Can We run a level-aware BST. Only one level need to be examined … wrong!

I think the in-order walk itself is what I need. Find any node Y in S that matches size+height+payload of T root. Suppose ID(Y)=44 but ID(T root) = 4, then simply shift down by 40 and do a linear scan. Must check payload, but not height/size.
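
A Python sketch of the preliminary step only: a post-order walk that shortlists S nodes by size, height and payload, followed by a plain node-by-node comparison on the shortlist. The ID-shifting scan is not attempted here. Class and function names are hypothetical.

```python
class Node:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def _measure(node, candidates, want=None):
    """Post-order walk returning (size, height); shortlist nodes matching want."""
    if node is None:
        return 0, 0
    l_size, l_height = _measure(node.left, candidates, want)
    r_size, r_height = _measure(node.right, candidates, want)
    size, height = l_size + r_size + 1, max(l_height, r_height) + 1
    if want == (size, height, node.val):
        candidates.append(node)
    return size, height

def _same(a, b):
    """True if the two trees match in structure and payload."""
    if a is None or b is None:
        return a is b
    return a.val == b.val and _same(a.left, b.left) and _same(a.right, b.right)

def is_like_subtree(s, t):
    t_size, t_height = _measure(t, [])
    candidates = []
    _measure(s, candidates, (t_size, t_height, t.val))
    return any(_same(y, t) for y in candidates)
```

Shortlisted subtrees all have T's size and cannot nest inside each other, so the comparisons add up to O(size of S) at most.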

 

Pacific/Atlantic #51%

https://leetcode.com/problems/pacific-atlantic-water-flow/

Q: Given an m*n matrix of non-negative integers representing the height of each unit cell in a continent, the “Pacific ocean” touches the left and top edges of the matrix and the “Atlantic ocean” touches the right and bottom edges. Water can only flow in four directions (up, down, left, or right) from a cell to another one with height equal or lower.

Print all cells where water can flow to both the Pacific and Atlantic ocean.

====Analysis
peek? not yet
— solution 1:
Create m * n nodes in a 2D grid. Two more nodes:
All north|west boundary nodes have an edge From Pacific node
All south|east boundary nodes have an edge From Atlantic node

Between two adj cells, there’s 1 or 2 directed edges, from low to high. Intuition — tracing pollutant upstream. An arrow from P to Q means Q is suspect.

How do I represent the graph? Each node can hold a list of “suspect neighbors” (higher or equal).

Run BFT or DFT from Pacific node and turn on flag in each reachable node.

Run the same from Atlantic. Every reachable node with the flag set is a double-suspect to be printed.
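
A Python sketch of solution 1 with BFT. Instead of materializing the two ocean nodes and the suspect lists, it starts from the boundary cells and checks "higher or equal" on the fly. The function name is hypothetical.

```python
from collections import deque

def pacific_atlantic(h):
    """Cells (row, col) from which water can reach both oceans, sorted."""
    if not h or not h[0]:
        return []
    m, n = len(h), len(h[0])

    def suspects(starts):
        # trace the pollutant upstream: move only to higher-or-equal neighbors
        seen = set(starts)
        queue = deque(seen)
        while queue:
            r, c = queue.popleft()
            for rr, cc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if (0 <= rr < m and 0 <= cc < n and (rr, cc) not in seen
                        and h[rr][cc] >= h[r][c]):
                    seen.add((rr, cc))
                    queue.append((rr, cc))
        return seen

    pacific = [(0, c) for c in range(n)] + [(r, 0) for r in range(m)]
    atlantic = [(m - 1, c) for c in range(n)] + [(r, n - 1) for r in range(m)]
    return sorted(suspects(pacific) & suspects(atlantic))
```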

clean-up: non-overlapping intervals #70%

Q (L435): Given a collection of intervals, find the minimum number of intervals you need to remove to make the rest of the intervals non-overlapping.

Note:

  1. You may assume the interval’s end point is always bigger than its start point.
  2. Intervals like [1,2] and [2,3] have borders “touching” but they don’t overlap each other.

==== analysis:

I think this is the same as the greedy room scheduler. Schedule the earliest-ending task, so as to maximize accepted meetings.

A deselected meeting is an interval removed.
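
A short Python sketch of the earliest-ending greedy, assuming each interval is a (start, end) pair. The function name is hypothetical.

```python
def min_removals(intervals):
    """Fewest intervals to remove so that the rest do not overlap."""
    kept = 0
    last_end = float("-inf")
    for start, end in sorted(intervals, key=lambda iv: iv[1]):
        if start >= last_end:      # touching borders do not overlap
            kept += 1
            last_end = end
    return len(intervals) - kept   # every deselected meeting is a removal
```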

recent algo questions: mostly presented in intArray+binTree

binary tree + int_array represent 60% of my recent algo questions. With string and matrix included, we get 80%.

  • “int_array” also includes array of int_tuples. Other arrays are rarely quizzed
  • In the real world, binTree is less common than arrays, but trees are more common than binTree. I think interview questions tend to get watered down to the binary species. K-ary trees are rarely quizzed
  • matrix — 95% of Matrix problems are 2D grid problems or integer array problems.
  • graph — Most graph problems are presented as matrix or binary trees, occasionally with cycle.
  • slist — shows up in 5-8% of problem descriptions. Some problems need slist as auxDS.

Leetcode is representative of a constellation of websites.

iterate K pre-sorted uneven immutable lists #FB

Interviewer (Richard) was kind enough to offer me a tip early enough. so we didn’t waste time (which could easily result in out-of-time)

Q: given K pre-sorted immutable lists, each up to N items, return an iterator that on demand yields each of the (up to K*N) items in sorted sequence.

Estimate time and space complexities.

====analysis

— I first proposed pair-wise merge. Since there are logK passes, Time complexity == O(NK logK)

Space complexity is tricky. Very first pass i have to create a single list up to NK items. Then I can reuse this list in each merge. so space complexity == NK [1], but I said NK*logK. Interviewer wasn’t familiar with this solution and didn’t correct me.

[1] See https://www.geeksforgeeks.org/sort-array-two-halves-sorted/. Suppose 8 lists to merge. I will merge A+B into first quarter of the big array (size NK), then C+D into 2nd quarter… In next pass, I will merge AB + CD in-place using the first half of my big array.

The advantage of this solution — once I create a single merged array, each iteration is O(1). This can be valuable if run-time iteration speed is crucial but initial warm-up speed is unimportant.

bigO insight — merging N pre-sorted arrays is O(N logN), same as merge sort?

— Then interviewer suggested iterating over the K lists so I implemented the solution in https://github.com/tiger40490/repo1/blob/py1/py/88miscIVQ/itr_K_presortedLists_FB.py

  • Space complexity = K
  • Time complexity:
    • next() O(logK) due to popping. I said Fib heap has O(1) insert
    • init() O(K)
    • hasNext() O(1)
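
A Python sketch of the heap-based iterator with the complexities listed above. It is written from the description, not copied from the linked github file, and the class name is hypothetical.

```python
import heapq

class MergedIterator:
    """Yield items of K pre-sorted immutable lists in sorted sequence, on demand."""

    def __init__(self, lists):
        self._lists = lists
        # one heap entry per non-empty list: (value, listID, position)
        self._heap = [(lst[0], k, 0) for k, lst in enumerate(lists) if lst]
        heapq.heapify(self._heap)               # O(K)

    def has_next(self):
        return bool(self._heap)                 # O(1)

    def next(self):
        val, k, pos = self._heap[0]
        if pos + 1 < len(self._lists[k]):
            # swap the popped item for its successor in the same list: O(logK)
            heapq.heapreplace(self._heap, (self._lists[k][pos + 1], k, pos + 1))
        else:
            heapq.heappop(self._heap)
        return val
```

Usage: build it once, then loop with `while it.has_next(): it.next()`. The heap never holds more than K entries, so space stays at K.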

How about a RBTree instead of heap? Space complexity is unchanged.  Time complexity:

  • next() O(logK) for both remove and insert
  • init()  O(K logK), worse than priorityQ
  • hasNext() O(1)

— FB interviewer asked why I prefer to keep state in global var rather than a per-record field

%%A: Runtime heap allocation is slower if the record is bigger. In contrast, the global dictionary is tiny and likely lives in L1-cache

find offending section ] originally-sorted array

I rephrase Leetcode 581 Q: There was an ascending int array containing duplicates. Someone reshuffled some elements so it is no longer sorted. Write a function to determine the shortest subarray that needs reshuffling. Before or after this subarray, no change needed.

O(N) time and O(1) space.

==== analysis
Not “easy” as labelled.

I want to find lowest integer that’s out of place. (Do the same on the other end and problem completed.)

first scan to find global min and max values, and ensure they are correctly placed.

Next scan from beginning on the rising slope. At first down-turn, treat the array as two partitions. In the right partition after the first peak, we determine the lowest value, say, -55. Then we will find the correct position for -55, within the Left partition! We will look for the last item equal-or-lower-than -55…. This is the key constraint of this entire problem.

Insight — We know -55 must move left and we don’t care about any other item in the right partition.

Insight — As a concrete illustration, if within left partition, -55 should stand after #7, then we know all Earlier items up to #7 are already correctly placed. Why? All Subsequent items in the left section are strictly higher than -55; and all other items in right section are all (equal or)higher than -55.

— my idea 1 in O(N) space: radix sort a clone of the array, then compare to the input array.
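
A Python sketch of idea 1. For brevity it uses the built-in sorted(), which is O(N logN), in place of a radix sort. The function name is hypothetical.

```python
def offending_section(a):
    """Return (first, last) index of the shortest subarray to reshuffle, or None if sorted."""
    clone = sorted(a)                 # stand-in for radix-sorting a clone
    lo = 0
    while lo < len(a) and a[lo] == clone[lo]:
        lo += 1
    if lo == len(a):
        return None                   # already sorted
    hi = len(a) - 1
    while a[hi] == clone[hi]:
        hi -= 1
    return lo, hi
```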

wildcard matching #more tractable than regex_FB

Q (Leetcode44): Given an input string (s) and a pattern (p), implement wildcard pattern matching with support for ‘?’ and ‘*’. The matching should cover the entire input string (not partial).

‘?’ Matches any single character.
‘*’ Matches any sequence of characters (including the empty sequence). The star is not a quantifier. I think a single star is like the “.*” in perl.

no space allowed. I think a wildcard can even sit at end of string. I think we could see two wildcards in a row. Two stars can be treated as one star.

=====analysis

My topdn-memoization solution (on github) was accepted at Leetcode.

I feel this is simpler than the regex_FB.py problem, therefore easier to absorb. I think either DP or my original recursive solution should work.
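
A Python sketch of a top-down memoization approach, written from the problem statement and not copied from the github solution. The function name is hypothetical, and very long inputs may need a higher recursion limit.

```python
from functools import lru_cache

def is_wildcard_match(s, p):
    """True if pattern p ('?' and '*' wildcards) covers the entire string s."""
    @lru_cache(maxsize=None)
    def match(i, j):
        # does p[j:] cover s[i:] ?
        if j == len(p):
            return i == len(s)
        if p[j] == "*":
            # the star swallows nothing, or swallows s[i] and stays in play
            return match(i, j + 1) or (i < len(s) and match(i + 1, j))
        return i < len(s) and p[j] in ("?", s[i]) and match(i + 1, j + 1)

    return match(0, 0)
```

There are at most len(s)+1 times len(p)+1 distinct (i, j) states, and each does O(1) work after memoization.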

Accounts merge #disjoint set

https://leetcode.com/problems/accounts-merge/

Q: .. too long

====Analysis
Union-find + BFT

— solution 1:
Data structure:

  • Acct object {name, vector of Email objects, isVisited flag }
  • Email object {addr, set of Accts }
  • hash table {address -> Email objects}

First scan to Create one acct object for each row. When populating the vector, check if the address exists in the hash table. If yes, then save the existing email object in the vector. Also the Email object’s set is updated with the new Acct object.

Second scan to print everything. Iterate over the accounts. Need to merge all the emails belonging to John —

Scan the vector. If any Email  object has more than one Acct (named John), visit each Acct following BFT, until all John accounts are visited. Collect all of these accounts’ emails in one tree-set, then print them.
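
For comparison, a Python sketch of the union-find route named at the top of this analysis. It replaces the Acct/Email objects and the BFT with a parent map keyed by address, and it assumes each input row is [name, email, email, ...]. The function name is hypothetical.

```python
def accounts_merge(accounts):
    """Merge rows that share any address; return sorted [name, sorted emails...] rows."""
    parent = {}   # address -> parent address (disjoint set)
    owner = {}    # address -> account name

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for name, *emails in accounts:
        for addr in emails:
            parent.setdefault(addr, addr)
            owner[addr] = name
            # union every address of this row with the row's first address
            parent[find(addr)] = find(emails[0])

    groups = {}
    for addr in parent:
        groups.setdefault(find(addr), []).append(addr)
    return sorted([owner[root]] + sorted(addrs) for root, addrs in groups.items())
```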

 

max-product subArray #51% untested

Q: Leetcode 152: given signed int array, find the subarray (minimum size 1) with highest product. Return the product.

Note if there’s 0 in your subarray you must return 0.

====analysis
peek? Not yet

— solution 1: O(N)

One scan to identify all the zeros. They partition the array.  For each partition, we need algorithm-A:

Count neg elements. If even, then return product of entire array. Else

Accumulate from left until last neg element -> prodL.
Accumulate from right until last neg element -> prodR. Compare the two.

trivial edge case — if the zeros create only single-neg partitions, then zero is the best.

trivial edge case — If subarray has only one element, be it zero or negative, then that’s the best for that subarray.
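
A compact Python sketch equivalent to algorithm-A: running products from the left and from the right, restarted after each zero, cover both prodL and prodR as well as the whole-partition product. It assumes a non-empty array, and the function name is hypothetical.

```python
def max_product_subarray(nums):
    """Highest product over all non-empty contiguous subarrays."""
    best = max(nums)                       # covers single-element answers and zero
    for seq in (nums, nums[::-1]):         # accumulate from left, then from right
        prod = 1
        for x in seq:
            prod *= x
            best = max(best, prod)
            if prod == 0:                  # a zero ends the partition, restart
                prod = 1
    return best
```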

 

 

FizzBuzz in O(N)

Q (Leetcode ) Write a program that outputs the string representation of numbers from 1 to n. But for multiples of three it should output “Fizz” instead of the number and for the multiples of five output “Buzz”. For numbers which are multiples of both three and five output “FizzBuzz”.

What if the requirement has not two divisors but many more, each a prime number?

====analysis

In the standard solution, there’s a hashtable O(NK) algo — for every int 1 to N, check the K divisors.  Here’s a O(N) solution:

  • first populate the return vector with the basic strings
  • Then make a strided iteration to append “Fizz” on every multiple of 3.
  • Then make a strided iteration to append “Buzz” on every multiple of 5.

The fact that prime numbers become very high very soon helps our bigO:

N/3 + N/5 +… + N/31 + N/37… will be lower than 5N
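
A Python sketch of the strided iteration. As a small deviation it starts from empty strings and fills in the plain number last, so a word never has to overwrite a number. The function and parameter names are hypothetical.

```python
def fizzbuzz(n, words=((3, "Fizz"), (5, "Buzz"))):
    """Strings for 1..n; words is a sequence of (divisor, word) pairs."""
    out = [""] * (n + 1)                    # slot 0 is unused
    for divisor, word in words:
        for i in range(divisor, n + 1, divisor):   # strided iteration
            out[i] += word
    return [out[i] or str(i) for i in range(1, n + 1)]
```

With more divisors, pass a longer words sequence, for example `fizzbuzz(100, ((3, "Fizz"), (5, "Buzz"), (7, "Bazz")))`.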

generate combinationSum compositions #backtrack up]trie #2review

Q: https://leetcode.com/problems/combination-sum/description/

Given a set of unique candidate numbers and a target number, find all unique combinations of candidates, where each combination sums to target. Each candidate may be used repeatedly.

My solution is https://github.com/tiger40490/repo1/blob/cpp1/cpp/combo_permu/comboSum.cpp , showing a reusable backtracking technique described below. It’s clever and brief. However, the efficiency is questionable. Memoization might be applicable here.

This backtracking relies on a key insight. Suppose we have target = 7 and X=2,Y=3,Z=4 as the candidates.

  • when we try a Y, we don’t need to try any more Xs. For example, If we are trying XY, then all XX* solutions are already handled by earlier recursive calls.
  • each combo sequence is naturally pre-sorted internally.
  • Also, “lower” formulas are generated before “higher” formulas
                 root
          x        y     z
      /   |  \
     x    y   z       
    / \   | \  \
 xxx  xxy | xyz \ 
         xyy     xzz
void //can return something if needed

recurs( solutionsFound &, //growing
        curPartialSolution &, 
// above collections could be global variables, to simplify things

        remainingCandidates, /*probably an immutable global array. 
If we need the remaining array to shrink, we can still rely on startIndex to skip used candidates.*/

        gapFromTarget, 
        startIndex, //used to select from remaining candidates
) 

Inside this function, we scan remaining candidates starting from startIndex. Typically in one iteration

  1. we add a new candidate into curPartialSolution
  2. we call recurs
  3. we remove the last added candidate from curPartialSolution to restore the original curPartialSolution — backtracking up the tree.
  4. move on to the next candidate
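
The four steps above can be sketched in Python as follows. This is an illustration and not the linked C++ code. It assumes all candidates are positive, and the names are hypothetical.

```python
def combination_sum(candidates, target):
    """All unique combinations (repeats allowed) of candidates summing to target."""
    solutions_found = []
    cur_partial_solution = []

    def recurs(gap_from_target, start_index):
        if gap_from_target == 0:
            solutions_found.append(list(cur_partial_solution))
            return
        for i in range(start_index, len(candidates)):
            cand = candidates[i]
            if cand > gap_from_target:
                continue
            cur_partial_solution.append(cand)        # 1. add a new candidate
            recurs(gap_from_target - cand, i)        # 2. recurse; i, not i+1, allows repeats
            cur_partial_solution.pop()               # 3. backtrack up the tree
            # 4. the loop moves on to the next candidate

    recurs(target, 0)
    return solutions_found
```

Passing i as the new start index is what guarantees that after trying a Y we never go back to an X.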

I feel this is more greedy than DP

concretize asterisk among brackets #Okao

https://leetcode.com/problems/valid-parenthesis-string/

Q (Leetcode 678): Given a string containing only three types of characters: ‘(‘, ‘)’ and ‘*’, write a function to check whether this string is valid.  An asterisk can concretize to empty string or a left or right bracket

====analysis

Kyle reviewed an O(N) solution, but advanced. Not really a “Medium” question if you want optimal. However in an interview perhaps a less optimized solution is good enough?

Worst data set will point out the constraint and structure, as explained in CIV: we wish all corner cases+worst data sets r given and clarify_requirement means

Need to use spreadsheet to work out a good sample data.

I feel confident to crack it if I get a good sample including some worst data.

— published O(N) solution

One scan. At each position ask the question “How many extra openers can there be up to this char?” When even the highest possible answer is negative, we give up and return false, but usually the answer is a range like 0-2.

Use two global variables to record the lowest possible answer (lo) and the highest possible answer (hi).

  • When we hit a ‘(‘, lo++ and hi++
  • when we hit a ‘)’, lo-- and hi--
  • when we hit a ‘*’, then range expands,
    • lo--. Lowest possible extra opener count is one lower now, since the asterisk can be a closer
    • hi++. Highest possible extra opener count is one higher now, since the asterisk can be an opener

If lo drops below zero, reset it to zero, because a negative count of extra openers is never achievable.

I think at end of the scan, lo should be zero, i.e. zero-extra-opener is at least possible.
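
A Python sketch of the lo/hi scan. It clamps lo at zero after every character, the usual detail needed for the range to stay meaningful. The function name is hypothetical.

```python
def is_valid_with_asterisk(s):
    """True if every '*' can concretize to '(', ')' or empty so that s is balanced."""
    lo = hi = 0        # lowest and highest possible count of extra openers
    for ch in s:
        if ch == "(":
            lo += 1
            hi += 1
        elif ch == ")":
            lo -= 1
            hi -= 1
        else:          # asterisk: closer lowers lo, opener raises hi
            lo -= 1
            hi += 1
        if hi < 0:     # even the most generous reading has too many closers
            return False
        lo = max(lo, 0)
    return lo == 0
```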

touch not cross: path between 2 corners

Q1: from p 183 [[discrete math]]: given a n x n grid. Start from north west corner moving south or east each step, towards the south east corner. The diagonal connecting them can be touched from north, but not crossed. print all paths

easier to treat origin as [0,0] and end as [N,N]

DFT will require deep recursion.
BFT (with color) where each node remembers all paths-from-root? Kinda brute force

Insight — Actually this is not necessarily a graph problem though it can be solved that way.

Q2 (accepted@leetcode ): Given n pairs of parentheses, write a function to generate all combinations of well-formed parentheses. For example, given n = 3, a solution set is:

[ “((()))”,
“(()())”,
“(())()”,
“()(())”,
“()()()” ]

====analysis
Four related problems — These two problems are related to the abbreviation generator i.e. the combination generator.

However, the abbr/combo generators are more versatile and possibly overkill for this problem. These two problems can use a bit array to represent the output. I think my solution on github is probably considered inelegant but I don’t care. BigO insight — number of paths (or valid strings) is the N-th Catalan number, roughly 4^N / N^1.5, so no solution that prints them all can beat that.
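
A Python sketch for Q2 using plain backtracking, which is a swapped-in technique and not the bit-array solution on github. The "touch but not cross" rule of Q1 shows up as closed <= opened. The function name is hypothetical.

```python
def generate_parentheses(n):
    """All well-formed strings of n bracket pairs, in lexical order of '(' < ')'."""
    out = []

    def grow(prefix, opened, closed):
        if closed == n:
            out.append(prefix)
            return
        if opened < n:                 # one more step along the first axis
            grow(prefix + "(", opened + 1, closed)
        if closed < opened:            # may touch the diagonal, never cross it
            grow(prefix + ")", opened, closed + 1)

    grow("", 0, 0)
    return out
```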

In general, our own ideas are often inefficient. If efficient, then often inelegant by some arbitrary interview standard. Still more valuable than learning standard solutions. One of the biggest values is xRef which helps build insight, intuition, thick->thin.

lone-wolf hidden ] AAABBBCCC unsorted #52%

Q[Lv] 60%: Given array of integers, every element appears three times except for one, which appears exactly once. Find that single one in O(N) time. Could you implement it without using extra memory?

====analysis

peek? Not yet

well-defined problem:)
greedy?
O(1) space probably means swapping
— Idea 1: with more space, I can use a hashtable of count to achieve O(N)
— Idea 2: with O(1) space I can also sort in O(N logN)
— Idea 9: My O(N) algo, applicable for any-size integers and also other than “three”.

Pick a random pivot and partition in O(N) time and O(1) space. Also keep track how many repetitions of the pivot value (probably 3). Exclude the pivot value, count size of both partitions and discard the one whose size=3X. Repeat.

N+N/2+N/4+N/8 …= 2N
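
A Python sketch of Idea 9 with an in-place three-way partition. It mutates the input and assumes the data is well-formed (every value but one appears `reps` times). The function and parameter names are hypothetical.

```python
import random

def lone_wolf(a, reps=3):
    """Find the value appearing once while all others appear reps times; O(1) space."""
    lo, hi = 0, len(a)            # half-open range still hiding the lone wolf
    while True:
        pivot = a[random.randrange(lo, hi)]
        # partition a[lo:hi] into  < pivot | == pivot | > pivot
        lt, i, gt = lo, lo, hi
        while i < gt:
            if a[i] < pivot:
                a[lt], a[i] = a[i], a[lt]
                lt += 1
                i += 1
            elif a[i] > pivot:
                gt -= 1
                a[gt], a[i] = a[i], a[gt]
            else:
                i += 1
        if (gt - lt) % reps != 0:     # pivot count is not a multiple of reps
            return pivot
        if (lt - lo) % reps != 0:     # left side is not size 3X, keep it
            hi = lt
        else:                         # left side is size 3X, discard it
            lo = gt
```

The 2N total is an average-case figure, as with quick select; an unlucky run of pivots costs more.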

3-sum #52%

Q[L] Given an array nums of K integers, are there elements a, b, c in nums such that a + b + c = 0? Find all unique triplets in the array which gives the sum of zero. Doable in O(KK)

====analysis

— Solution 1: separate into (M) non-negatives and (N) negatives. M+N=K.

— solution 1a: given M targets and N negative items,

if M > N, then check O(NN) pairs among negatives …. and look up the sum in a pre-populated hash table of M items.
If M < N, then use the two-sum algo O(MN)

Total O(MN + MM) for small M
Total O(MN + NN) for small N

Both come out to O(MN+max(M,N)), smaller than O( [N+M]^2 )
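
A Python sketch of the negatives/non-negatives split. As a simplification it checks value pairs on both sides against a hash table of the other side, so it costs O(MM + NN) and does not pick the cheaper side as solution 1a does. The function name is hypothetical.

```python
from collections import Counter

def three_sum(nums):
    """Sorted list of unique triplets (as sorted tuples) that sum to zero."""
    neg = Counter(x for x in nums if x < 0)     # N negatives
    non = Counter(x for x in nums if x >= 0)    # M non-negatives
    found = set()
    if non[0] >= 3:
        found.add((0, 0, 0))     # the only triplet with no negative item

    def pairs(src, lookup):
        # pick two items from src, look up the balancing item on the other side
        keys = sorted(src)
        for i, a in enumerate(keys):
            for b in keys[i:]:
                if a == b and src[a] < 2:
                    continue
                if -(a + b) in lookup:
                    found.add(tuple(sorted((a, b, -(a + b)))))

    pairs(neg, non)
    pairs(non, neg)
    return sorted(found)
```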

Note some bigO students don’t like too many tokens and may simplify it to O(KK), but I think it’s imprecise… see when to introduce new tokens in O()

— solution 1c: given M targets and N negative items,

I can sort the N-array. For each target, I can binary-search, but this is possibly O(MN logN)

max-sum path up+down binTree #FB

Q2 (Leetcode “hard” Q124): find max path sum, where a “path” has minimum one node A and can include A’s left child and/or A’s right child. No uplink available.

====analysis

I see two types of paths

  1. down the tree, starting at some upstream node A ending at a node in A’s subtree
  2. up-down the tree, starting at some A’s left subtree, ending somewhere in A’s right subtree.

For 1) https://bintanvictor.wordpress.com/wp-admin/post.php?post=23360&action=edit has my tested solution

For 2: post-order walk to update each node a (signed) max-path-sum-from-here. See https://bintanvictor.wordpress.com/wp-admin/post.php?post=31601&action=edit. The 2 values from left+right children can solve this sub-problem

Q: can we use the 2) solution for 1)? I think so
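
A Python sketch of the post-order walk for Leetcode 124. It is written from the description here, not from the linked posts. Each call returns the signed max-path-sum-from-here, and the up-down candidate is scored at every node. Class and function names are hypothetical.

```python
class Node:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def max_path_sum(root):
    """Highest sum over all paths with at least one node and no uplink reuse."""
    best = float("-inf")

    def down(node):
        # best sum of a path that starts at node and only goes down
        nonlocal best
        if node is None:
            return 0
        left = max(down(node.left), 0)      # drop a branch with negative sum
        right = max(down(node.right), 0)
        best = max(best, node.val + left + right)   # up-down path peaking here
        return node.val + max(left, right)

    down(root)
    return best
```

Type 1 paths are the special case where one of left/right is dropped, so the same walk covers both types.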

sorted_Array|_List ⇒ balanced_BST

Q1 (easy): Given an array where elements are sorted in ascending order, convert it to a height balanced BST, where the depth of the two subtrees of every node never differ by more than 1.

Q2 (medium): How about a sorted slist?

==== analysis

I feel this “easy” problem should be medium. Perhaps there are elegant solutions.

I need not care about the payloads in the array. I can assume each payload equals the subscript.

There are many balanced BSTs. Just need to find one

— idea 1: construct a sequence of positions to probe the array. Each probed value would be added to the tree. The tree would grow level by level, starting from root.

The sequence is similar to a BFT. This way of tree building ensures

  • only the lowest branch nodes can be incomplete.
  • at all branch levels, node count is 2^L

I think it’s challenging to ensure we don’t miss any position.

Observation — a segment of size 7 or shorter is easy to convert into a balanced subtree, to be attached to the final BST.

When a subarray (between two probed positions) is recognized as such a segment we can pass it to a simple routine that returns the “root” of the subtree, and attach it to the final BST. This design is visual and a clean solution to the “end-of-iteration” challenge.

The alternative solution would iterate until the segment sizes become 0 or 1. I think the coding interviewer probably prefers this solution as it is shorter but I prefer mine.

— idea 2

Note a complete binary tree can be efficiently represented as an array indexed from one (not zero).

— Idea 3 (elegant) dummy payload — For the slist without using extra storage, I can construct a balanced tree with dummy payload, and fill in the payload via in-order walk

All tree levels are full except the leaf level — guarantees height-balanced. We can leave the right-most leaf nodes missing — a so-called “complete” binary tree. This way the tree-construction is simple — build level by level, left to right.
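
A Python sketch of the dummy-payload idea. As a simplification it builds the shape by halving the size recursively instead of level by level, which is also height-balanced though not always "complete". It assumes the size is known up front, and the names are hypothetical.

```python
class TreeNode:
    def __init__(self, val=None):
        self.val, self.left, self.right = val, None, None

def _shape(n):
    """Balanced tree of n nodes with dummy payloads; subtree sizes differ by at most 1."""
    if n == 0:
        return None
    node = TreeNode()
    node.left = _shape(n // 2)
    node.right = _shape(n - 1 - n // 2)
    return node

def _fill(node, items):
    """In-order walk that pours the sorted items into the dummy nodes."""
    if node is None:
        return
    _fill(node.left, items)
    node.val = next(items)
    _fill(node.right, items)

def sorted_to_bst(sorted_items, size):
    """sorted_items may be any iterable, e.g. a walk over a sorted slist."""
    root = _shape(size)
    _fill(root, iter(sorted_items))
    return root
```

The sorted input is consumed strictly front to back, which is why this suits an slist with no random access.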

— idea 4 STL insert() — For the slist, we can achieve O(N) by inserting each element in constant time using STL insert() with hint

— For the slist without using an array, I can get the size SZ. (then divide it by 2 until it becomes 7,6,5 or 4.) Find the highest 3 bits (of SZ) which represent an int t, where 4 <= t <= 7. Suppose that value is 6.

We are lucky if every leaf-subtree can be size 6. Then we just construct eight (or 4, or 16 …) such leaf-subtrees

If SZ doesn’t give us such a nice scenario, then some leaf-subtrees would be size 5.  Then my solution is to construct eight (or 4, or 16 …) leaf-trees of type AAAAA (size 6) then BBB (size 5).

Lucky or unlucky, the remaining nodes must number 2^K-1 like 3,7,15,31 etc.  We then construct the next level.

–For the slist without using an array, I can propose a O(N logN) recursive solution:
f(slist, sz){
locate the middle node of slist. Construct a tree with root node populated with this value.
Then cut the left segment as a separated slist1, compute its length as sz1. node1 = f(slist1,sz1); set node1 as the left child of root.
Repeat it for the right segment
return the root
}

in-place-remove all dupes from sorted array #100%

https://leetcode.com/problems/remove-duplicates-from-sorted-array/ fully tested.

Simplification theme — minimize state maintenance during the scan. Eliminate every loop variable that can be derived.

variable: lastGoodPos — up to this position, all unique
variable: cur — the front pointer position under examination

My algo — During the scan, if a[cur] == a[cur-1] then just advance cur;
else i.e. a[cur] is bigger, so save this bigger item in a[lastGoodPos+1] and advance lastGoodPos.
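
Here is a minimal Python sketch of the scan above. The function name and the snake_case variable names are my own for illustration; it assumes the input is a sorted list.

```python
def remove_duplicates(a):
    """Compact the sorted list `a` in place; return the count of unique items."""
    if not a:
        return 0
    last_good = 0  # a[0..last_good] holds the unique items found so far
    for cur in range(1, len(a)):
        if a[cur] != a[cur - 1]:   # a[cur] is bigger than its predecessor
            last_good += 1
            a[last_good] = a[cur]  # save the bigger item right after the good zone
    return last_good + 1


nums = [1, 1, 2, 2, 3]
count = remove_duplicates(nums)
print(count, nums[:count])  # 3 [1, 2, 3]
```

Only two loop variables survive, in line with the simplification theme.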

Roman-to-integer converter

https://leetcode.com/problems/roman-to-integer/

Roman numerals are usually written largest to smallest from left to right. However, the numeral for four is not IIII. Instead, the number four is written as IV. Because the one is before the five we subtract it making four. The same principle applies to the number nine, which is written as IX. There are six instances where subtraction is used:

  • I can be placed before V (5) and X (10) to make 4 and 9.
  • X can be placed before L (50) and C (100) to make 40 and 90.
  • C can be placed before D (500) and M (1000) to make 400 and 900.

Input: “MCMXCIV” Output: 1994 Explanation: M = 1000, CM = 900, XC = 90 and IV = 4.

==== analysis

Too complicated for a speed coding test?

Reusable technique — One back scan might be enough. Normally the rank of letters encountered would increase. A decrease (only one position) means subtraction. See code in https://leetcode.com/problems/roman-to-integer/discuss/6547/Clean-O(n)-c%2B%2B-solution

Reusable technique — hardcode the 6 “subtraction” cases to simplify the if/else logic.
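
A minimal sketch of the back-scan technique, assuming the input is a valid Roman numeral. The names VALUE and roman_to_int are mine, not taken from the linked discussion.

```python
VALUE = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(s):
    """Scan right to left; a letter ranked lower than its right neighbour is subtracted."""
    total = 0
    prev = 0  # value of the letter just to the right
    for ch in reversed(s):
        v = VALUE[ch]
        if v < prev:
            total -= v  # a decrease during the back scan means subtraction
        else:
            total += v
        prev = v
    return total


print(roman_to_int("MCMXCIV"))  # 1994
```

With this scan there is no need to hardcode the 6 subtraction cases, though hardcoding them may be easier to explain in a speed coding test.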



			

invalid/unbalanced brackets: kernel #62%

Q: given a string of N left+right brackets, you can find out how many invalid chars to remove to make it balanced. Say it’s R. There are up to N-choose-R ways to make it balanced. Print all unique balanced strings in compact form.

====analysis

subproblem: minimum how many invalid chars to remove

Useful preprocessing technique — on the left end, any closers must be removed. Same on the right end. No choice 🙂 We would end up with a valid char on each end. This is nice but optional in my Idea3 below.

— my idea 3 to count minimum cuts

Aha — There will surely be some “kernels” i.e. opener-then-closer in a row. First scan I will remove them, then remove more kernels. This is always optimal if we aim to minimize the cuts

  • [] ] [ [] ] [ [] ] becomes ]
  • []][[] becomes ][
  • [] [ [] [] [] [] becomes [
  • [[[][]][[]] becomes [
  • [[][]]][[]]][[] becomes ]][
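
The repeated kernel removal can be done in one pass with a stack. This is a sketch under my own naming (unmatched_brackets, residue_string); it ignores any char that is not a bracket, so the spaced-out examples above work as written.

```python
def unmatched_brackets(s):
    """Return (position, char) for every bracket left after all kernels are removed."""
    stack = []  # brackets not yet cancelled, in original order
    for pos, ch in enumerate(s):
        if ch == "]" and stack and stack[-1][1] == "[":
            stack.pop()  # opener-then-closer: a kernel, remove both
        elif ch in "[]":
            stack.append((pos, ch))
    return stack


def residue_string(s):
    return "".join(ch for _, ch in unmatched_brackets(s))


print(residue_string("[[][]]][[]]][[]"))        # ]][
print(len(unmatched_brackets("[]][[]")))        # 2, the minimum number of cuts
```

The residue always looks like closers followed by openers, which matches the three cases discussed next. The saved positions are the ones to remember.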

What remain are the positions of bad chars. I need to remember these positions.

Case: closers only. Let’s look at one position like #55. We can cut #55 or another closer at an earlier position.

Case: openers only. Similar to the above.

Case: closers-openers. The original string is partitioned into exactly two sections, each similar to the above cases.

merge 2 binTrees by node position

Q (leetcode 617): https://leetcode.com/problems/merge-two-binary-trees/submissions/

==== Analysis

https://github.com/tiger40490/repo1/blob/py1/py/algo_tree/merge2Tree.py is a short, simple solution fully tested on leetcode.com but hard to run offline. Elegant in terms of implementation.

Insight — Challenge is implementation. Input and return type of DFT function are tricky but serve as useful implementation techniques.

Labelled as easy, but pretty hard for me.

— Idea 1: BFT. When a node has any null child, put null into queue. Not so simple to pair up the two iterations

— Solution 2: DFT on both trees. Always move down in lock steps. When I notice a node in Tree A is missing child link that Tree B has, then I need to suspend the Tree A DFT?

My DFT function would have two parameters nodeInA and nodeInB. One of them could be null , but the null handling is messy.

Aha — I set the input parameters to dummy objects, to avoid the excessive null check. In this problem, this technique is not absolutely necessary, but very useful in general

 

identical binTree

Q: Given two binary trees, write a function to check if they are the same or not. Two binary trees are considered the same if they are structurally identical and the nodes have the same key.

====analysis:

Look at hints? No need !

— solution 1: I can use the serialization solution and compare the two serialized strings in real time.

— solution 2: BFT but ensure each branch level has strictly 2^n items including nulls

Intuition — If both sides match up on every level, then identical

This solution works if each tree node has a fixed number of child nodes.

— solution 2b: a null node at level N will reduce Level N+1 nodes by two.

— solution 3: recursion
bool diff(node1, node2){
if both are null: return false
if only one is null or node1.key != node2.key: return true
return diff(node1.left, node2.left) or diff(node1.right, node2.right)
}
There might be some implementation issues

— Idea 9: BFT but each queue item is {payload, childrenCount}

This solution doesn’t work for binTree as it can’t detect “null leftChild” vs “null rightChild”.

This solution works if each node can have arbitrary number of child nodes, but I don’t know how common this is.

nth largest element in unsorted array #QuickSelect

Q: Find the kth largest element in an unsorted array. Note that it is the kth largest element in the sorted order, not the kth distinct element.

I think this is mostly a bigO algoQQ problem.

std::nth_element is linear on average .. https://stackoverflow.com/questions/11068429/nth-element-implementations-complexities talks about QuickSelect algo

— idea 6: priority Q (Fib-heap) of size k
if any item is higher than the min, then pop min O(logK) and insert in O(1)
— idea 6: priority Q
Linear time to build it
— idea 5: 95th percentile problem from NYSE
— idea 4: multiple scans
— idea 3: segments
— Sol2: O(N). use the O(N) algo in the blog on “given int array, find median ] O(N)”. Then discard one of the two segments. Then repeat.
Note: Each time the target element must exist in one of the 2 segments.

O(N) + O(N/2) + O(N/4) … -> O(N)

— Sol2a: Use the O(N) qsort partition algo to anchor a (random) pivot element to create two segments. Our target must exist in one of the two, so discard the other by adjusting the le/ri boundaries.

This idea is same as the most voted solution in leetcode discussion.
O(N) on average — we get O(N)+O(N/2) + O(N/4) + … < O(2N)
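
A sketch of Sol2a, assuming 1 <= k <= len(nums). I use the Lomuto partition with a random pivot; the function name is mine, and the input is copied so the caller's array is untouched.

```python
import random


def kth_largest(nums, k):
    """QuickSelect: anchor a random pivot, then discard the segment without the target."""
    a = list(nums)
    target = len(a) - k  # index of the answer if the array were sorted ascending
    le, ri = 0, len(a) - 1
    while True:
        pivot_index = random.randint(le, ri)
        a[pivot_index], a[ri] = a[ri], a[pivot_index]  # park the pivot at the right end
        pivot = a[ri]
        store = le
        for i in range(le, ri):
            if a[i] < pivot:
                a[i], a[store] = a[store], a[i]
                store += 1
        a[store], a[ri] = a[ri], a[store]  # pivot is now anchored at its final position
        if store == target:
            return a[store]
        if store < target:
            le = store + 1  # target is in the right segment
        else:
            ri = store - 1  # target is in the left segment


print(kth_largest([3, 2, 1, 5, 6, 4], 2))  # 5
```

Average cost is linear as argued above; an unlucky pivot sequence (or many equal items with this simple partition) can degrade it to O(NN).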

Note average complexity is acceptable in hashtable!

in-place merge: 2 sorted arrays

Q: Given two sorted integer arrays nums1 and nums2, merge nums2 into nums1 as one sorted array.

  • The number of elements initialized in nums1 and nums2 are m and n respectively.
  • You may assume that nums1 has enough space (size that is greater or equal to m + n) to hold additional elements from nums2.

I would add a requirement — O(1) additional space. so you can’t create another array. This can be realistic if allocation is strictly controlled to prevent fragmentation in embedded environment.

====analysis:

Rather contrived, so I won’t spend too much time

–Idea: use nums1’s right portion as the “new” array.

Suppose the allocated array of nums1 has capacity k >= m + n. I will call it array KK. Note the right portion of KK is currently unused, so I can wipe it clean with some dummy value.

( If no dummy value is possible, then I probably can still solve the problem but with less clarity. )

Now backscan both arrays and put the highest value in KK[m+n-1] , filling KK leftward. The spare capacity to the right of this position will remain unused forever.

Implementation note — We need a back-scanner pointer into nums1 as “cur” + another pointer to the right, “lastPicked”… meaning the item at this position has been copied to KK.

(We may not need the lastPicked pointer, but it is less ambiguous, clearer and easier to reason with.  You may say it’s a device for analysis and communication, not necessarily for coding.)

We also need such a pointer into nums2.
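
A sketch of the back scan, with my own pointer names (cur1, cur2, write). It skips the optional lastPicked pointers and the dummy-value wipe, since the write position alone is enough in code.

```python
def merge_in_place(nums1, m, nums2, n):
    """Merge nums2 into nums1 from the back, using O(1) extra space."""
    cur1, cur2 = m - 1, n - 1  # back scanners into nums1 and nums2
    write = m + n - 1          # next slot to fill in KK, moving leftward
    while cur2 >= 0:           # once nums2 is used up, the rest of nums1 is already in place
        if cur1 >= 0 and nums1[cur1] > nums2[cur2]:
            nums1[write] = nums1[cur1]
            cur1 -= 1
        else:
            nums1[write] = nums2[cur2]
            cur2 -= 1
        write -= 1


kk = [1, 2, 3, 0, 0, 0]
merge_in_place(kk, 3, [2, 5, 6], 3)
print(kk)  # [1, 2, 2, 3, 5, 6]
```

The write pointer never overtakes cur1, so no unread item of nums1 gets clobbered.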

staircase:1or2 each step

Q: You are climbing a stair case. It takes n steps to reach to the top. Each time you can either climb 1 or 2 steps. In how many distinct ways can you climb to the top?

I think this problem is not “easy”. similar to Fib:

f(3) = f(1) + f(2) where f(1) = how many unique paths after taking a double-step; f(2) how many unique paths after taking a single step.
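
The same recurrence as a bottom-up loop. A minimal sketch, assuming n >= 0 and taking f(0) = 1 (one way to stand still).

```python
def climb_stairs(n):
    """f(n) = f(n-1) + f(n-2): the last move is either a single step or a double-step."""
    prev, cur = 1, 1  # f(0), f(1)
    for _ in range(2, n + 1):
        prev, cur = cur, prev + cur
    return cur


print([climb_stairs(n) for n in range(1, 6)])  # [1, 2, 3, 5, 8]
```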

word ladder: build graph +! test`every pair

Leetcode Q 127: word ladder — Given two words (beginWord and endWord), and a word list, find the length of shortest transformation sequence from beginWord to endWord, such that:

* Only one letter can be changed at a time.
* Each transformed word must exist in the word list. Note that beginWord is not a transformed word.
* Return 0 if there is no such transformation sequence.
* All words have the same length.
* All words contain only lowercase alphabetic characters.
* You may assume no duplicates in the word list.

Example 1:
beginWord = “hit”,
endWord = “cog”,
wordList = [“hot”,”dot”,”dog”,”lot”,”log”,”cog”]
Output: 5
Explanation: As one shortest transformation is “hit” -> “hot” -> “dot” -> “dog” -> “cog”,
return its length 5.

Example 2:
beginWord = “hit”
endWord = “cog”
wordList = [“hot”,”dot”,”dog”,”lot”,”log”]
Output: 0

==== analysis:
First scan O(NN) to build the bidirectional edges of the graph. Given an existing graph of N words (N can be 1), a new word is compared against each to construct the edge list.

N(N+1)/2 comparisons. No need to be extra clever as this simple O(NN) algo is optimal for small N. No need to worry about the visualization of the graph either because the edge list is a proven representation

Aha — depending on the relative size of N and S (the standard length of every word), the optimal algo is different !

— Idea 2

Now I realize there’s a more efficient algo to build the graph, based on the neglected fact that all strings have equal length.

  • for a new word (always same length S), like abc, create S(=3) “patterns” — *bc, a*c, ab*.
  • each pattern will try to join an existing club or create a new club. Clubs are maintained in a hashtable of {pattern -> list of original words} In java or python, each word is basically a pointer to an immutable global object.
  • If joining an existing club, then all existing club members are linked to the new word, so new word will now hold a reference to this club as an “edge list”
  • I think this algo is O(N*S). If S is small, then this algo is more efficient than O(NN)

At the end of this phase, if beginWord and endWord belong to disjoint sets then return 0. However I see no simple implementation of disjoint set. Therefore, I will run 2nd scan O(N+E) BFS. But there are many cycles, so we need a hashset “Seen”, or array of size N.
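
A sketch of Idea 2 plus the BFS second scan. The function name and the clubs/seen variables follow the wording above but the code is my own illustration; instead of storing explicit edge lists on each word, it looks up the clubs during BFS.

```python
from collections import defaultdict, deque


def ladder_length(begin_word, end_word, word_list):
    words = set(word_list)
    if end_word not in words:
        return 0
    # Phase 1: each word joins S clubs, one per pattern like *bc, a*c, ab*
    clubs = defaultdict(list)
    for word in words | {begin_word}:
        for i in range(len(word)):
            clubs[word[:i] + "*" + word[i + 1:]].append(word)
    # Phase 2: BFS; "seen" guards against the many cycles
    seen = {begin_word}
    queue = deque([(begin_word, 1)])
    while queue:
        word, steps = queue.popleft()
        if word == end_word:
            return steps
        for i in range(len(word)):
            pattern = word[:i] + "*" + word[i + 1:]
            for neighbour in clubs[pattern]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append((neighbour, steps + 1))
            clubs[pattern] = []  # every member of this club is now seen
    return 0


words = ["hot", "dot", "dog", "lot", "log", "cog"]
print(ladder_length("hit", "cog", words))  # 5
```

Here "seen" is a hashset. The array version would need each word mapped to an index first.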

Insight — Array is more elegant than hash table in this case.

To compare two words, char-wise subtraction should give all zero except one char. This last routine can be extracted to a “simple routine to be implemented later”, so no need to worry about it in a white board session.

friend-circle #Union-Find#60%

Q (Leetcode 547 union-find): There are N students in a class. Some of them are friends, while some are not. If A is a direct friend of B, and B is a direct friend of C, then A is an indirect friend of C. We define a friend circle as a group of students who are direct or indirect friends.

Given a N*N matrix M representing the friend relationship between students in the class. If M[i][j] = 1, then the ith and jth students are direct friends with each other, otherwise not. And you have to output the total number of friend circles among all the students.

— analysis:
Rated “medium” on leetcode but my Design #1 is easier than many “easy” questions. Clearly this is a data-structure question … my traditional stronghold.

Challenge is merging.

— design 3: island count by BFS, but I think DFS might be easier

— design 1:
lookup map{studentId -> circleId}
Circle class{ circleId, presized vector of studentId}

When we merge two circles, the smaller circle’s students would /each/ update their circleId. This merge process has modest performance but simple.

In reality, students outnumber circles, so here’s an alternative ..

— design 2:
map remains same (Not optional!) .
Circle class is now {circleId, parentCircleId (default -1)}

The swallowed circle will have this->parentCircleId set to a top-level circleId… Path-compression as described in disjoint set.
The merge would only update this one field in one or more Circles. O(H) i.e. height of tree. H is usually very small because at any time, each circle’s parentCircleId is either -1 or a top-level circle — I hope to maintain this invariant.

Scenario:

  1. circles AA, BB, CC created
  2. circle a2 acquired by AA
  3. circle a3 acquired by a2 ultimately “branded” by AA
  4. circle b2 and b3 acquired by BB
  5. a2 swallows b2 –> need to update BB as acquired. When we try to update b2.parentCircleId, we realize it’s already set, so we follow the uplink to trace to the top-level node BB, and update ALL nodes on the path, including b2 as b2 is on the “path” to BB, but do we have to update b3 which is off the path? Suppose I don’t. I think it’s safe.
  6. circle c2 acquired by CC
  7. c2 now swallowed by b3. Now c2 will get branded by AA, and so should the nodes on the path ( b3 -> BB -> AA) This chain-update would speed up future mergers. Should C2’s old parent (CC) also get branded by AA? I think so.

After the data structures are fully updated, we simply return the count of top-level circles. (Each time a top-level circle gets created or disappears, we update that count.)
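
A simplified sketch of design 2. Assumptions of mine: each student starts as his own top-level circle, so the student index doubles as circleId, and a top-level circle points to itself rather than to -1.

```python
def count_circles(M):
    n = len(M)
    parent = list(range(n))  # parent[i] == i marks a top-level circle

    def find(i):
        root = i
        while parent[root] != root:  # follow the uplinks to the top-level circle
            root = parent[root]
        while parent[i] != root:     # path compression: brand every node on the path
            nxt = parent[i]
            parent[i] = root
            i = nxt
        return root

    circles = n  # count of top-level circles, updated on every merge
    for i in range(n):
        for j in range(i + 1, n):
            if M[i][j] == 1:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri  # rj is swallowed: one field updated
                    circles -= 1
    return circles


print(count_circles([[1, 1, 0], [1, 1, 0], [0, 0, 1]]))  # 2
```

As in the scenario above, only nodes on the path get re-branded; off-path nodes are fixed lazily the next time find() passes through them.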

Additional field in Circle: The vector of studentId is needed only if we need to output the individual students in a given circle.

balloon burst #DP optimization #50%

Q [ Leetcode 312]: not really classic : Given n (up to 500) balloons, indexed from 0 to n-1. Each balloon is painted with a number on it represented by array “nums”. You are asked to burst all the balloons one by one. If you burst balloon i you will get nums[left] * nums[i] * nums[right] coins. Here left and right are adjacent indices of i. After the burst, the left and right then become adjacent. Find the maximum coins you can collect by bursting the balloons wisely.

If you burst a leftmost balloon, you collect 1*it*rightNeighbor coins. In other words, when multiplying 3 numbers, any absentee is a one.

0 ≤ nums[i] ≤ 100

Example: Input: [3,1,5,8]
Output: 167
Explanation: nums = [3,1,5,8] –> [3,5,8] –> [3,8] –> [8] –> []
coins = 3*1*5 + 3*5*8 + 1*3*8 + 1*8*1 = 167
==analysis:
int-array optimization problem.
Might be related to some classic problem.

Let’s define a generic math-function of 3 balloon IDs score(myle, me, myri). In this problem, score() is simply “return myle*me*myri “, but in the next problem, score() could be any math function of the three inputs.

I see each possible snapshot (having K balloons, i.e. at level K) as a graph node. Exactly 2^N nodes in the grid, i.e. 2^N possible snapshots i.e. 2^N combinations of these N balloons.

Every edge has a score. To compute the score, we only need the two nodes (snapshots) of the edge to identify the 3 balloons for score().

Pyramid — Let’s assume at bottom is “origin” i.e. snapshot of the original array ..Level 500; on top is “phi” i.e. snapshot of the empty array .. Level 0.

The problem transforms into a max path sum problem between these 2 nodes.

–solution-1 DP
From origin to any given node, there are many distinct paths each with a total score up to that node. If a node has 55 paths to it, the max sum among the 55 paths would be the uprank (upward rank) of the node.

If the node also has 44 paths from phi, the max sum among the 44 paths would be the downrank (downward rank) of the node. This is an interesting observation, but not needed in this solution since every edge is evaluated exactly once.

To our delight, uprank of a node AA at Level-5 depends only on the upranks of its Level-6 parent nodes (N-5 of them, one per burst balloon that could be put back), so we don’t need to remember all the distinct paths to AA:). Our space complexity is the size of previous level + current level.

We just need to compute the uprank of every node at Level 6, then use those numbers to work out Level 5…. the Level 4 … all the way to phi.

If there are x nodes at Level 6 and y nodes at level 5, then there are 6x == (N-5)y edges linking the two levels, since each Level-6 node has 6 children and each Level-5 node has N-5 parents.

Time complexity is O(V+E) i.e. visit every edge.
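
A sketch of the uprank DP over snapshots, for small N only since it stores all 2^N snapshots (it ignores the two-level space saving mentioned above). Each snapshot is a bitmask of the remaining balloons; the names are mine.

```python
def max_coins_by_snapshots(nums):
    n = len(nums)
    origin = (1 << n) - 1           # snapshot with every balloon present
    uprank = [-1] * (1 << n)        # max path sum from origin to each snapshot
    uprank[origin] = 0
    # A child snapshot is numerically smaller than its parent, so a descending
    # scan finalizes every parent before its children are read.
    for mask in range(origin, 0, -1):
        remaining = [i for i in range(n) if (mask >> i) & 1]
        for pos, i in enumerate(remaining):
            left = nums[remaining[pos - 1]] if pos > 0 else 1  # absentee is a one
            right = nums[remaining[pos + 1]] if pos + 1 < len(remaining) else 1
            child = mask ^ (1 << i)                            # burst balloon i
            score = uprank[mask] + left * nums[i] * right      # edge score
            if score > uprank[child]:
                uprank[child] = score
    return uprank[0]  # uprank of phi, the empty snapshot


print(max_coins_by_snapshots([3, 1, 5, 8]))  # 167
```

This confirms the graph model on the example, but it is hopeless for n = 500, which is why pruning is needed.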

Level n: 1 node
Level n-1: n nodes
Level n-2: nc2 nodes

Level 2: nc2 nodes
Level 1: n nodes
Level 0: 1 node

Each node at level K has K child nodes above. This graph now suggests the max-path-sum algo (with edge scores), but it might be the only way to solve the problem, like the bbg odometer.

consider a DP algo to update the score at each node at level K, ie the max sum from root till here, via one of the K-1 nodes at level K-1

But Level 2 has too many (N-choose-2) nodes. Can We prune the tree, from either origin or phi?

Alien dictionary

https://leetcode.com/problems/verifying-an-alien-dictionary/

Suppose Total C characters, and N words

====analysis

Mostly implementation challenge.

insight — Published solution has mediocre performance as it scans each word exactly TWICE, but luckily “twice” doesn’t affect bigO — O(total char count across all words)

— idea 1: maintain a linked list of “clusters”. Each cluster is {pos, startWordID, optional lastWordID} Each cluster has words with the same prefix up to pos.

copy first letter of N words into an N-array. verify this array is sorted. Now separate the words into up to 26 clusters. Suppose we have a cluster of 55 words. This cluster is the payload of a link node. When we look at 2nd char within this cluster, we see up to 26 sub-clusters, so we replace the big cluster with these sub-clusters.

Invariant — the original order among the N words is never changed.

Even if this idea is overkill, it can be useful in other problems.

the verify(arrayOf1stChar) is a util function.

— Idea 4: convert each word to an English word, in O(C).

Then sort the collection. What’s the O()? O(N logN C/N) = O(C logN)

— idea 5: compute a score for each word and check the words are sorted in O(N)
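
A sketch in the spirit of Idea 4 and idea 5: translate each word in O(C) total, then check neighbours instead of sorting. I translate into lists of ranks rather than English letters, which is my own simplification; the signature follows Leetcode's (words, order).

```python
def is_alien_sorted(words, order):
    rank = {ch: i for i, ch in enumerate(order)}  # alien letter -> position in alien alphabet
    translated = [[rank[ch] for ch in word] for word in words]
    # Python compares lists lexicographically, and a proper prefix ranks lower
    return all(a <= b for a, b in zip(translated, translated[1:]))


print(is_alien_sorted(["hello", "leetcode"], "hlabcdefgijkmnopqrstuvwxyz"))  # True
```

Each word is scanned once for translation and at most twice in the neighbour comparisons.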

O(1)getRandom+add+del on Bag #Rahul

Q: create an unordered multiset with O(1) add(Item), del(Item) and a getRandom() having the probability of returning any item based on the PMF.

Rahul posed this question to our Princeton PhD candidate, who needed some help on the Bag version.

====my solution:
On the spot, I designed a vector<Item> + hashmap<Item, hashset<Pos>>. The hashset records the positions within the vector.

Aha — Invariant — My vector will be designed to have no empty slots even after many del(). Therefore vec[ random() * vec.size() ] will satisfy getRandom() PMF.

add() would simply (in O(1)) append to the vector, and to the hashset.

— del() algo is tricky, as Rahul and I agreed. Here’s an illustration: Let’s say Item ‘A’ appears at positions 3,7,8,11,16 and B appears at positions 2,5,31 (the last in the vector). del(A) needs to remove one of the A’s and move the B@31 into that A’s position.

  1. Suppose the PMF engine picks vec[11] which is an A.
  2. unconditionally O(1) find the item at the last position in vector. We find a B, which is different from our ‘A’
  3. Here’s how to physically remove the A from position 11:
  4. O(1) replace ‘A’ with ‘B’ at position 11 in the vector
  5. O(1) remove 11 from A’s hashset and add 11 into B’s hashset, so A’s hashset size decrements.
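  6. O(1) remove 31 from B’s hashset, so B’s hashset size remains unchanged
  7. O(1) pop the now-redundant last slot (31) off the vector, so the vector still has no empty slot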
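
A Python sketch of the vector + hashmap-of-hashsets design. The class and method names are mine; delete() removes an arbitrary occurrence of the item rather than one picked by the PMF engine, and it also covers the case where the last slot holds the same item.

```python
import random


class RandomBag:
    def __init__(self):
        self.vec = []        # no empty slots, ever
        self.positions = {}  # item -> set of positions within vec

    def add(self, item):
        self.positions.setdefault(item, set()).add(len(self.vec))
        self.vec.append(item)

    def delete(self, item):
        slots = self.positions.get(item)
        if not slots:
            return False
        last_pos = len(self.vec) - 1
        last_item = self.vec[last_pos]
        if last_item == item:
            slots.remove(last_pos)         # simply drop the last slot
        else:
            hole = slots.pop()             # any position holding `item`
            self.vec[hole] = last_item     # move the last item into the hole
            self.positions[last_item].remove(last_pos)
            self.positions[last_item].add(hole)
        self.vec.pop()
        if not slots:
            del self.positions[item]
        return True

    def get_random(self):
        return random.choice(self.vec)     # uniform over slots, so it follows the PMF
```

Every step is a hashset or vector-tail operation, so add/del/getRandom are all O(1) on average.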

2 nodes] binTree-with-cycle: locate common ancestor

Q (Leetcode #236): given 2 valid nodes (and root) of a binary tree, find the lowest common ancestor. A node can be a (direct/indirect) descendant of itself. All values distinct. No uplink.

classic problem:)

Q2 (my own requirement): what if cycle is possible?

My idea — Just run a lazy-dft to find the two paths-from-root. On each path, if we detect a cycle we terminate that path. Before terminating any path, need to check if we hit both nodes, so after finding one node we must go all the way to the leaf node or one of the 2 given nodes.

As soon as we find the 2 paths we terminate DFT.

IFF two CPUs are given, my dft will use two threads — one left to right; the other right to left. This will more quickly locate the 2 target nodes if they appear near extremities.

https://github.com/tiger40490/repo1/blob/cpp1/cpp/binTree/commonAncestor_Cycle.cpp is my self-tested code, not tested on Leetcode

rangeAND: bitwise AND of all ints in range

Q: Given a continuous range [m, n] where 0 <= m <= n <= 2147483647, return the bitwise AND of all numbers in this range, inclusive.

  • the lowest bit will most likely see 0 and 1 so … becomes zero
  • (turns out to be a minor tip) if the range has size > 8 then lowest 3 bits all zeroed out
  • imagine the bit array object incrementing from m to n. We want to find out if there’s a stable higher portion
  • Key technique — look at some examples. We can convert m and n to two bit images. we can look at some examples below.
  • we can compare the two bit images left to right to find the “higher portion”. All lower bits are probably zeroed out in the result

10110101
10110100
10110011
10110010
10110001
—–
10110000

–my solution not tested on Leetcode: https://github.com/tiger40490/repo1/blob/cpp1/cpp/rangeAnd.cpp
* compare m and n left to right. If m is shorter, then return 0.
* if same length, then compare left to right until a difference is found. Until that bit, all left-end bits are “retained”.
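
A sketch of the "stable higher portion" idea using shifts instead of comparing bit images left to right. This is my own variation, not the code in the repo linked above.

```python
def range_and(m, n):
    """Bitwise AND of all ints in [m, n], assuming 0 <= m <= n."""
    shift = 0
    while m != n:  # drop low bits until only the common higher portion remains
        m >>= 1
        n >>= 1
        shift += 1
    return m << shift  # the retained left-end bits, with all lower bits zeroed


print(bin(range_and(0b10110001, 0b10110101)))  # 0b10110000
```

If m is shorter than n, the loop shifts both down to 0, which agrees with the "return 0" rule.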

 

longest substring+!repeating chars #60%#peek

Q(leetcode #3): Given a string, find the longest substring without repeating characters.

–Sol1 O(N):
keep a never-shrinking sliding window + a “hashmap” of chars in it. Actually this HM can be a 26-element integer array of frequencies.

Every time the lagging edge of the window moves by one, by definition one char drops out, so we remove that char from the HM, by decrementing its frequency. If hitting 0 then we also decrement a global var uniqCnt := sizeof the HM.

IFF uniqCnt == windowSz then window is clean.

Every time we see a clean window and it’s longer than the longest clean window, we update our record.
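
A sketch of Sol1 with my own names. A dict stands in for the 26-element frequency array so that any char set works; the window grows when clean and otherwise slides, so it never shrinks.

```python
def longest_clean_window(s):
    freq = {}     # the "HM": char -> frequency inside the window
    uniq_cnt = 0  # number of distinct chars inside the window
    left = 0      # lagging edge
    best = 0
    for right, ch in enumerate(s):  # leading edge advances by one
        freq[ch] = freq.get(ch, 0) + 1
        if freq[ch] == 1:
            uniq_cnt += 1
        window_sz = right - left + 1
        if uniq_cnt == window_sz:   # clean window, and longer than the record
            best = window_sz
        else:                       # not clean: slide, i.e. move the lagging edge too
            out = s[left]
            freq[out] -= 1
            if freq[out] == 0:
                uniq_cnt -= 1
            left += 1
    return best


print(longest_clean_window("abcabcbb"))  # 3
```

After every iteration the window size equals the record, so the only window worth testing is the one that is longer by one.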

max all-black subMatrix #ZR

Same problem as https://leetcode.com/problems/maximal-rectangle/description/

Q: Given a 2D binary matrix (L by N) filled with white(0) and black(1) cells, find the largest all-black rectangle. See raiserchu’s mail on 12 Sep 13. There is a clever DP solution, probably O(LN).

—Analysis:

Worst case — A standard chess board? We can’t do better than O(LN) since there are LN cells to read.

–O(LN) leetcode solution based on histogram

https://github.com/tiger40490/repo1/blob/cpp1/cpp/algo_2d/maxRectangle.cpp .. is latest code with my adaptations and my detailed comments.

— sol5:

First scan O(LN) to record, in each cell {bar height; horizontalBarStart/End}.

— idea 4unfinished

Scan #1 O(LN): build a shadow matrix “histogram” where each integer in the cell is the height (possibly 0) of the bar anchored therein.

Scan #2 O(LN) for each cell, remember the currentRunStart column index i.e. from that column until current column, we have an all-black box of height == current bar height

— sol3 O(LNS) new idea based on max rectangle ] histogram

Treat the top 2 (denote J:=2) rows as a histogram. Find the max rectangle therein. Then J:=3 …

  • Scan #1 O(LN): build a shadow matrix “histogram” where each integer in the cell is the height (possibly 0) of the bar anchored therein. In other words, if a cell value=5 then there are exactly 4 consecutive black cells above this (black) cell. Build it incrementally, level by level, top to bottom.
  • Scan #2a: for each row in the shadow matrix, we run the proven algo in O(NS), Note there’s no help from previous row:(
    • S:= #unique heights, N:= matrix width 
  • Scan #2 := the entire scan of L rows. so worst case we hit O(LNS)

Q: Can we do better by reducing scan #2a complexity to O(N), by making use of the previous row results?
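
For comparison, here is a sketch where scan #2a is the well-known stack-based largest-rectangle-in-histogram routine. That is a technique I am swapping in, not the O(NS) routine above; it runs in O(N) per row without help from the previous row, giving O(LN) overall. Names are mine.

```python
def largest_in_histogram(heights):
    stack = []  # indices of bars with increasing heights
    best = 0
    for i, h in enumerate(heights + [0]):  # trailing 0 flushes the stack
        while stack and heights[stack[-1]] >= h:
            top = stack.pop()
            width = i if not stack else i - stack[-1] - 1
            best = max(best, heights[top] * width)
        stack.append(i)
    return best


def max_black_rectangle(matrix):
    if not matrix:
        return 0
    heights = [0] * len(matrix[0])  # one row of the shadow matrix
    best = 0
    for row in matrix:              # build level by level, top to bottom
        for col, cell in enumerate(row):
            heights[col] = heights[col] + 1 if cell == 1 else 0
        best = max(best, largest_in_histogram(heights))
    return best


grid = [[1, 0, 1, 0, 0],
        [1, 0, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 0, 0, 1, 0]]
print(max_black_rectangle(grid))  # 6
```

Only one row of the shadow matrix is kept at a time, so extra space is O(N).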

— My brute force solution 1: Each rectangle is identified by 2 vertices, i.e 4 integers. Without loss of generality, We require the “high” corner to have higher x-coordinate and higher y-coordinate than the “low” corner. (We can assume y-axis run upward.) With this O(N^4) nested loop we can iterate over all possible rectangles:

Lock low corner
Move high corner in typewriter (zigzag) steps i.e.
  hold highY and move highX step by step
  process the (series of) resulting rectangles
  increment highY and repeat
Move the lower corner in typewriter steps and repeat

Key observation: any “bad pixel” disqualifies every rectangle containing it.

— Here’s my partial solution:
We can effectively ignore all the “good pixels”.

1) Look at the x coordinates of all bad pixels. Sort them into an array. Find the largest gap. Suppose it’s between x=22 and x=33. Our candidate rectangle extends horizontally from 23 to 32, exactly. Notice there’s no bad pixel within this vertical band [1].
2) Look at the y coordinates of all bad pixels. Sort them into an array. Find the largest gap. Suppose it’s between y=15 and y=18. Our candidate rectangle extends vertically from 16 to 17, exactly.
[1] This candidate rectangle can expand All the way vertically, though it may give a bigger rectangle
Ditto horizontally.

SCB-FM stack-based FIFO in O(1)amortized

Q: given a hardware-based stack API consisting of 3 functions {pop/push/isEmpty}, please implement a queue api consisting of 3 functions {enqueue/dequeue/isEmpty}

https://leetcode.com/problems/implement-queue-using-stacks/description/ is similar

====analysis====

service dequeue from a hidden stack.

When hidden stack is empty, pop all nodes from visible stack and push them onto hidden stack. Amortized O(1) dequeue()

isEmpty() must add up two sizes.
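
A sketch of the two-stack queue, with Python lists standing in for the hardware stack (append = push, pop = pop). The class name is mine; dequeue() on an empty queue raises IndexError in this sketch.

```python
class StackQueue:
    def __init__(self):
        self.visible = []  # receives every enqueue
        self.hidden = []   # services every dequeue

    def enqueue(self, item):
        self.visible.append(item)

    def dequeue(self):
        if not self.hidden:
            while self.visible:  # reverse the order once, on demand
                self.hidden.append(self.visible.pop())
        return self.hidden.pop()

    def is_empty(self):
        return not self.visible and not self.hidden  # must consult both stacks


q = StackQueue()
for x in (1, 2, 3):
    q.enqueue(x)
print(q.dequeue(), q.dequeue())  # 1 2
```

Each node is pushed and popped at most twice overall, hence the amortized O(1).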

[[python cookbook]] P658 implements this classic algo in 9 lines.

LFU cache #cf.LRU #72%

Q LFU (Least-Frequently-Used) cache to support the following operations: get and put in O(1)
* get(key) – Get the value (will always be positive) of the key if the key exists in the cache, otherwise return -1.
* put(key, value) – Set or insert the value if the key is not already present. When the cache reaches its capacity, it should invalidate the least frequently used item before inserting a new item. For the purpose of this problem, when there is a tie (i.e., two or more keys that have the same frequency), the least recently used key would be evicted.

====Analysis

  1. dstruct — centry i.e. CacheEntry node {key, value, hitCount, lastHit (timestamp), (optional)ptr to host LinkNode}, to be used in an inner linked list.
    • invariant: hitCount can only increase
  2. dstruct — inner minilist of centry nodes
    • invariant: list always sorted by lastHit. We can remove any intermediate node, but incoming node is always added to the Tail
  3. dstruct — fixed-sized (rehash-free) hashtable {key -> ptr to centry}, needed for mid-stream laser-removal
  4. dstruct — LinkNode {level, minilist-of-centry} where all centry objects share the same hitCount denoted “level”.
  5. dstruct — outer list of LinkNodes, always sorted by level

“bubble-up” operation — Whenever a centry gets a cache-hit, its hitCount increments. It immediately and unconditionally bubbles up to the LinkNode one level higher (to be created in O(1) if necessary) ((
* [o1] query the hashtable and follow ptr to remove the centry from the minilist in an old LinkNode
* [o1] insert the centry to the new level, at Tail of minilist. The new LinkNode could be non-existent but Never empty!
* [o1] optionally, new host LinkNode’s address is saved in the centry
))

  • Get() hit — relatively easy. Update the hitCount and bubble up
  • Get() miss — trivial
  • Put() Update — similar to get-hit
  • Insertion (possibly after deletion) — [o1] append to the minilist Tail in the Level-1 LinkNode (to be created if necessary) and add to hashtable
  • Deletion — always from list to hashtable, never the converse
    • [o1] identify lowest level present, then delete the head (i.e. eviction target) of minilist
    • when a linkNode becomes empty, it must disappear from the outer list, to prevent build-up of consecutive empty LinkNodes leading to linear search for eviction target. Imagine aaaaa bbbbb c[Now need to evict an “a”]. Therefore, array of LinkNode is unacceptable.
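
A compact sketch of the design, with two substitutions of mine: an OrderedDict plays the minilist (insertion order replaces the lastHit timestamp), and a dict keyed by level plus a min_level counter replaces the outer list of LinkNodes.

```python
from collections import OrderedDict, defaultdict


class LFUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}                       # key -> (value, hitCount)
        self.levels = defaultdict(OrderedDict)  # hitCount -> keys, least recent first
        self.min_level = 0                      # lowest level present

    def _bubble_up(self, key):
        value, level = self.entries[key]
        del self.levels[level][key]             # laser-removal from the old minilist
        if not self.levels[level]:              # an empty level must disappear
            del self.levels[level]
            if self.min_level == level:
                self.min_level = level + 1
        self.levels[level + 1][key] = None      # append at Tail of the new level
        self.entries[key] = (value, level + 1)

    def get(self, key):
        if key not in self.entries:
            return -1
        self._bubble_up(key)
        return self.entries[key][0]

    def put(self, key, value):
        if self.capacity <= 0:
            return
        if key in self.entries:                 # update: similar to get-hit
            self.entries[key] = (value, self.entries[key][1])
            self._bubble_up(key)
            return
        if len(self.entries) >= self.capacity:  # evict head of the lowest level
            victim, _ = self.levels[self.min_level].popitem(last=False)
            if not self.levels[self.min_level]:
                del self.levels[self.min_level]
            del self.entries[victim]
        self.entries[key] = (value, 1)
        self.levels[1][key] = None
        self.min_level = 1
```

The min_level counter works because a bubble-up moves exactly one level, and every insertion resets the lowest level to 1.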

max-profit at-most-2K trades #proven but untested

Q(Leetcode): Say you have a price history as array. Design an algorithm to find the maximum profit. You may complete at most 2K transactions, consisting of exactly K (eg 2) buys and K sells. You may not engage in multiple transactions at the same time (i.e. you must sell the stock before you buy again). No short sell please.

No O() requirement.

====analysis=====

I feel first challenge is to list all (not more than 10) scenarios. This step has taken me a few days, even though I drew many examples.

–Idea 3 for 2K, based on Leetcode discussion

f[2k, ii] represents the max profit up until prices[ii] using at most 2k transactions. 
f[2k, ii] = max(f[2k, ii-1], prices[ii] + max_for_all_jj(f[2k-2, jj-1] - prices[jj]))

Two possibilities:
  1. the optimal solution at [2k,ii] doesn’t involve the price point ii, so solution is f[2k,ii-1]
  2. the optimal solution at [2k,ii] involves a Sell at price point ii. In this scenario, the last buy is at some previous price point jj, and before jj we have an optimal solution at [2k-2, jj-1]
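
A sketch of the recurrence. My parameter "pairs" is the number of buy-sell pairs, i.e. the text's 2k transactions correspond to pairs = k. The inner max over jj is maintained as a running value (best_buy) so each level costs O(N).

```python
def max_profit(prices, pairs):
    n = len(prices)
    if n < 2 or pairs <= 0:
        return 0
    prev = [0] * n  # f[2k-2, *], the level with one fewer pair
    for _ in range(pairs):
        cur = [0] * n
        # best_buy = max over jj of (f[2k-2, jj-1] - prices[jj]); f[*, -1] is 0
        best_buy = -prices[0]
        for ii in range(1, n):
            cur[ii] = max(cur[ii - 1],              # price point ii not involved
                          prices[ii] + best_buy)    # sell at price point ii
            best_buy = max(best_buy, prev[ii - 1] - prices[ii])
        prev = cur
    return prev[-1]


print(max_profit([1, 18, 2, 19, 15, 16], 2))  # 34
```

The printed case is the "hardest case" listed further down: the best single pair 1/19 has to be broken into 1/18 and 2/19.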

–solution 1 (O(NN) brute force): construct all possible pairs, rank them and pick top 2.

–solution 2 (O(N) only works for K=2)

  1. Identify all the turning points so we end up with HLHLHL… We can eliminate or ignore the other points.
  2. * identify the best pair using the max-profit algo. denote them as L1/Hj
  3. * In the subarray before L1, find the best pair
  4. * in the subarray after Hj, find the best pair
  5. pick the better of the two and denote it as p2
  6. Now look into the subarray L1 to Hj. If there’s no enclosed pairs within then we have a simple case — use L1/Hj and p2. But let’s assume there are at least 2 nodes enclosed. I will denote entire subarray as L1 H1 L2 H2 … Lj Hj (where L1-Hj is the max-profit)
  7. * use max-profit algo to find the worst loss from H1 to Lj. Suppose it’s H3 to L5.
  8. If this loss exceeds p2, then we will return L1/H3 and L5/Hj. Otherwise, return L1/Hj and p2

This solution uses the watermark algo 4 times (*).

I feel basic decision is to break the best pair or keep it

case: need to break the highest pair into 2 pairs,
case: best pair + another pair outside. I think this is easy..
case: 1,18,2,19,15,16. Perhaps the hardest case to solve.

–other ideas, for K > 2

can we use a matrix?

We can keep track of all profitable pairs i.e. le/ri indices, and also a pointer to the current best pair that’s not overlapping with “me”.

After creating 2nd pair, IFF no overlap, then we update the pointers in both instances.

After creating 7th pair, if it doesn’t overlap with the #3 highest pair, then check-update the pointer in #3.

I think if we can efficiently keep track of these then it should work.

reverse slist in K-groups

https://leetcode.com/problems/reverse-nodes-in-k-group/description/ is the problem I tried today, not a classic problem. Challenge is not the algorithm per-se but the Edit-Compile-Test-Debug cycle. I think some of us can come up with a conceptual algorithm quickly, but to implement it correctly took me hours.

Similarly, the problems below are not tough due to algorithm but the ECTD cycle can take hours, sometimes due to c++ iterator pitfalls, sometimes because we can’t easily visualize the data structure .. I wrestled with all of these problems, so please feel free to try them and discuss with me.

* print any tree (you can start with a binary) by level, in zigzag sequence
* given a linked list, write a function to remove all nodes greater than 55 (or any user input). Return the head of the modified list.
* https://www.geeksforgeeks.org/zigzag-or-diagonal-traversal-of-matrix/
* https://www.geeksforgeeks.org/create-a-matrix-with-alternating-rectangles-of-0-and-x/
* https://bintanvictor.wordpress.com/2018/02/06/spiral-number-printer/

As decided last week, I didn’t bother to run the Leetcode test suite. Those tests make me feel frustrated, worthless, defeated, inferior, weakling, quitter…. Without these tests I ran my own tests and I feel like a joyful hacker.

Even though I may not pass all Leetcode tests, I feel my code is reasonable quality and I’m proud of it.

—-Problem is well-defined but not very common.

Given a linked list, reverse the nodes of a linked list k at a time and return its modified list.

k is a positive integer and is less than or equal to the length of the linked list. If the number of nodes is not a multiple of k then left-out nodes in the end should remain as it is. O(1) space. Hopefully O(N) time.

—-My sol1: use my existing O(1) solution but now keep a count.

https://github.com/tiger40490/repo1/blob/cpp1/cpp/linkedList/linkedListReverseInK_group.cpp

The first group and the last group are both tricky and can take up hours.
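For comparison with the C++ version linked above, here is a short Python sketch of one way to handle the first and last groups, assuming a singly linked Node class that I define here for illustration. A dummy node in front of the head removes the special case for the first group, and counting k nodes ahead before touching any pointer leaves a short last group untouched.

```python
class Node:
    def __init__(self, val, nxt=None):
        self.val = val
        self.next = nxt

def from_list(items):
    head = None
    for val in reversed(items):
        head = Node(val, head)
    return head

def to_list(head):
    out = []
    while head:
        out.append(head.val)
        head = head.next
    return out

def reverse_k_group(head, k):
    dummy = Node(None, head)
    group_prev = dummy                 # node just before the current group
    while True:
        kth = group_prev
        for _ in range(k):             # look ahead for a full group
            kth = kth.next
            if kth is None:
                return dummy.next      # fewer than k nodes left: keep as is
        group_next = kth.next
        prev, cur = group_next, group_prev.next
        while cur is not group_next:   # reverse the k nodes in place
            nxt = cur.next
            cur.next = prev
            prev, cur = cur, nxt
        old_first = group_prev.next    # becomes the tail of the reversed group
        group_prev.next = kth
        group_prev = old_first
```

Usage: to_list(reverse_k_group(from_list([1, 2, 3, 4, 5]), 2)) walks the list once and uses O(1) extra space.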

edit distance

The DP idea — compare matrix-path-counter, which is more visual and easier than this one.

Q72 on Leetcode: Given two words word1 and word2, find the minimum number of operations required to convert word1 to word2. You have the following 3 operations permitted on a word:

  1. Insert a character
  2. Delete a character
  3. Replace a character

Comment — Top 100, naturally occurring. I won’t bother to pass all Leetcode tests esp. the load tests. If I pass all non-load tests I would consider my solution decent.

https://github.com/tiger40490/repo1/tree/py1/py/str has my implementation based on a DP idea online, and a spreadsheet illustration. The idea is elegant once you wrap your mind around it.
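As a reminder of what that DP looks like, here is a minimal sketch of the textbook table-filling version. It is not necessarily identical to the code in my repo; the function name edit_distance is chosen for illustration. Cell dp[i][j] holds the fewest operations to turn the first i chars of word1 into the first j chars of word2.

```python
def edit_distance(word1, word2):
    rows, cols = len(word1) + 1, len(word2) + 1
    dp = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dp[i][0] = i                      # delete all i chars
    for j in range(cols):
        dp[0][j] = j                      # insert all j chars
    for i in range(1, rows):
        for j in range(1, cols):
            if word1[i - 1] == word2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]          # chars match, no cost
            else:
                dp[i][j] = 1 + min(dp[i - 1][j],      # delete
                                   dp[i][j - 1],      # insert
                                   dp[i - 1][j - 1])  # replace
    return dp[-1][-1]
```

Time and space are both O(len(word1) * len(word2)).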

===analysis===
Starting with the small string (length S), the challenge is to project as many of the S chars as possible onto the large string (length L). If we can project 5 chars at most, then … (wrong — the remaining S-5 chars need replacement, and the other L-S chars need insertion.)

–idea2: draw all the projection arrows from S to L. In a good-projection, every arrow on the right should be more slanted than every arrow on the left. We want the largest good-projection. In the opening example, the largest would have 5 arrows, …

—-
None of these ideas has proven effective.

size-N array find The duplicate int #1~N+1#Abhinav

https://leetcode.com/problems/find-the-duplicate-number/description/: Given an immutable int array nums containing n + 1 elements where each element is between 1 and n (inclusive), prove that at least one duplicate number must exist. You are guaranteed that there is only one duplicate number, find the duplicate value in O(1) space, below O(NN) time. The culprit may repeat many times.

I didn’t bother to write the code.

===== analysis =====

contributed by a user and highly contrived:(
many likes:)

–bisection solution in O(N logN) time and O(1) space. I came up with this solution within a minute.

  1. Divide the full range [1 to n] into 2 almost-equal ranges (i.e. if n = 2K+1, then i use [1 to K] and [K+1 to n] as 2 ranges)
  2. Count how many nodes are in each range. Clearly one of the two ranges must have too many elements.
  3. Remember the boundary of that bad range so from now on we will ignore those nodes falling into the good range. We will use 2 variables to update/improve the boundary, until they coincide.
  4. within the bad range, repeat Step 1.

Key insight — progressive bisection.. non-recursive.

Key insight — applying the pigeon-hole principle, we split the conceptual range. The more common (but ineffective) technique would split the physical array.
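Since I didn't write the code at the time, here is a minimal sketch of the bisection steps above. The name find_duplicate and the variables lo/hi (the two boundary variables of the bad range) are my own choices.

```python
def find_duplicate(nums):
    lo, hi = 1, len(nums) - 1          # conceptual value range [1, n]
    while lo < hi:
        mid = (lo + hi) // 2
        # Count the elements whose value falls into the lower half [lo, mid].
        count = sum(1 for x in nums if lo <= x <= mid)
        if count > mid - lo + 1:       # too many pigeons for these holes
            hi = mid
        else:                          # so the upper half must be the bad one
            lo = mid + 1
    return lo
```

Each pass scans the immutable array once and there are about log N passes, so O(N logN) time with O(1) space. It also works when the culprit repeats many times.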

fewest jumps to reach right end #triple jump

Q(Leetcode 45): Given an array of non-negative integers, you are initially positioned at the first index of the array. Each element in the array represents the maximum permitted jump length from that position.

==== analysis =====
https://github.com/tiger40490/repo1/blob/py1/py/array/tripleJump.py is my solution, NOT tested on Leetcode. I won’t bother to test on leetcode. Protect my joy, momentum, absorbency

My solution not only determines feasibility but also finds the fewest jumps.

Typical DP+greedy algorithm. I will jump leftward starting from the right end [1].

Suppose there are N=99 nodes in the array. I will pre-scan the N nodes to build a shadow array of integer records, each a BestLefNode. (The first record is unused.)

Eg: If BestLefNode[44] == 33, it means that based on known data so far, the left-most (furthest) node we can jump to from Node #44 is Node #33.

Suppose the original array shows that from Node #7 we can jump 11 steps ahead. When we visit Node #7 during the rightward scan, we will update (up to) 11 BestLefNode records. These 11 records are located at #8 onwards. Each record, will be updated with “7” if appropriate.

As soon as we update BestLefNode[N-1] i.e. right-most record, we exit the initial scan since the optimal solution is now available. For example, if rightmost BestLefNode has value #88, that means the furthest node we can reach from the right end is Node #88, so we will jump to #88 and then check the best destination From #88.
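The scan and the leftward jumps can be sketched as below. This is a simplified illustration and not the tested code in my repo; fewest_jumps, best_left and next_unfilled are names I picked here. Because the scan goes rightward, the first node to fill a record is automatically the left-most one, so each record is written once.

```python
def fewest_jumps(arr):
    """Return the fewest jumps from index 0 to the last index, or -1."""
    n = len(arr)
    if n <= 1:
        return 0
    best_left = [None] * n             # best_left[0] is unused
    next_unfilled = 1                  # first record not yet written
    for i in range(n):
        if i > 0 and best_left[i] is None:
            return -1                  # node i is unreachable, so is the end
        far = min(n - 1, i + arr[i])
        while next_unfilled <= far:    # fill records i+1 .. far
            best_left[next_unfilled] = i
            next_unfilled += 1
        if best_left[n - 1] is not None:
            break                      # right-most record is known
    jumps, pos = 0, n - 1
    while pos > 0:                     # jump leftward from the right end
        pos = best_left[pos]
        jumps += 1
    return jumps
```

For [2, 3, 1, 1, 4] the records become [unused, 0, 0, 1, 1], so the leftward walk is 4, 1, 0, i.e. two jumps.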

[1] why not start from left end and jump rightward? No I think there’s no symmetry in this problem. From Node 1 the immediately-reachable nodes are a continuous region.

longest consecutive ints]O(N) #zebra

Popularity — 1000+ likes on Leetcode … possibly popular

Q(Leetcode #128): Given an unsorted array of integers, find the longest consecutive element sequence, in O(N) time. Eg: given [100, 4, 200, 1, 3, 2] return [1,2,3,4]

I call this the zebra problem because  every consecutive sequence of int is a black stripe and the gaps between them are white stripes. We want the widest black stripe. Obviously, each stripe has minimum size 1.

https://github.com/tiger40490/repo1/blob/py1/py/array/zebra.py is my O(N) solution, not tested on Leetcode.

========

What’s UnionFind? A reusable technique?

Like inserting interval #merging #80% done, I  feel this is a data structure problem,

To keep things simple, i will first run one iteration to remove all duplicate items.

I will use a hashtable where the key is a known item. The value is a pointer to a “segment” object.

A segment stores the min and max values. All integers within [min, max] of the segment are always known-items during my scan of input array.

When a new item is either min-1 or max+1, we expand the segment by adjusting the extremes…

The trick is joining two segments, without link pointers. After joining, we don’t really adjust the min/max fields. We only update the max-length global variable if needed.

To keep the hashtable small, I can optionally delete from it but we don’t want to do a range delete within the loop — O(NN)
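A lighter variant of the segment idea, as a sketch: instead of segment objects, the hashtable stores a segment length per item, kept accurate only at the two ends of each segment. This is an assumption-laden simplification and not the code in zebra.py; longest_consecutive and seg_len are illustrative names.

```python
def longest_consecutive(nums):
    seg_len = {}       # item -> length of its segment (accurate at the ends)
    best_len, best_start = 0, None
    for x in nums:
        if x in seg_len:
            continue                       # duplicate item, skip
        left = seg_len.get(x - 1, 0)       # segment ending at x-1, if any
        right = seg_len.get(x + 1, 0)      # segment starting at x+1, if any
        total = left + 1 + right
        seg_len[x] = total
        seg_len[x - left] = total          # new left end
        seg_len[x + right] = total         # new right end
        if total > best_len:
            best_len, best_start = total, x - left
    if best_start is None:
        return []
    return list(range(best_start, best_start + best_len))
```

Joining two segments is just the left + 1 + right sum, with no range delete and no link pointers, so the whole scan is O(N) assuming O(1) hashtable operations.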

max-sum path Down binTree #self-tested

Q1: Given a non-empty binary tree of signed integers, find the maximum path sum. For this problem, a path is defined as any sequence of nodes from any starting node to any node in the tree along the parent->child connections. The path must contain at least one node and does not need to go through the root. No uplink. No cycle.

Luckily, there’s no published solution for this modified leetcode problem 🙂

====analysis====

My solution — DFT. Along each root-to-leaf path, use the max-subarray (Kadane) algo and store maxSumEndingHere value in each node, for reuse.

Q: is there any duplicate work?
A: I hope not, thanks to memoization i.e. Node::meh field

Q: do we visit every path?
A: I think so.

I simplified the idea further in

https://github.com/tiger40490/repo1/blob/cpp1/cpp/algo_binTree/maxPathSum.cpp

Time complexity is .. O(V+E) = O(N), since I visit every node and follow each edge once only.
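The Kadane-along-each-path idea fits in a few lines of Python. This is only a sketch of the idea, not a port of maxPathSum.cpp; TreeNode and max_down_path_sum are illustrative names, and the second tuple element plays the role of the meh (maxSumEndingHere) value.

```python
class TreeNode:
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

def max_down_path_sum(root):
    """Max sum over all downward (parent->child) paths in a non-empty tree."""
    best = root.val
    stack = [(root, root.val)]          # (node, max sum of a path ending here)
    while stack:
        node, meh = stack.pop()
        best = max(best, meh)
        for child in (node.left, node.right):
            if child:
                # Kadane step: extend the parent's path only if it helps.
                stack.append((child, child.val + max(0, meh)))
    return best
```

Each node is pushed and popped exactly once, which matches the O(N) estimate above.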

There might be algorithmically superior solutions on leetcode but I don’t want it to affect my joy, motivation and momentum.

inserting interval #merging

Q (Leetcode): Given a set of non-overlapping intervals, insert a new interval into existing intervals (merge if necessary) and print updated list of intervals. Intervals were a vector sorted according to their start times.

–analysis–

Now I feel the #1 main data structure is a doubly linked list (dlist) of Segment objects:

  • { segment_left_mark,
  • ptr to next node, ptr to prev node
  • optionally a (bool or) enum having A/B, where A means current segment is AboveWater (an interval) or BelowWater i.e. a gap}.

Every time this dlist is modified, we would update a “helper container” — a tree of node pointers, sorted by the segment_left_mark value. Tree to help successive inserts. However, if each insert(vector intervals) has a sorted vector then we can binary search the vector and don’t need the tree.

First, binary search to locate the left mark among all existing marks. Ditto right mark. Based on these 2 results, there are many cases.

  1. done — Case (simple) both fall into the same existing interval. No op
  2. done — case (simple) both fall into the same gap segment. Create 2 new segments and insert into the dlist
  3. done — case (simple) one boundary falls into a gap the other falls into a adjacent interval — just adjust the segment_left_mark without inserting new segment
  4. done — case — bridge: both boundaries fall into different intervals. Adjust segment_left_mark of 2 affected segments, then link up the two to skip the intermediate segments
  5. done — case — wipeout: both boundaries fall into different gaps, wiping out at least 1 interval.
  6. done — case (most complex) — one falls into an interval, the other into a non-adjacent gap.
  7. case — incoming interval left boundary is lower than all boundaries, but right boundary falls into some segment
  8. case — incoming interval is very low
  9. case (special) — if an interval becomes adjacent to another, then merge the two.

Need a sorted tree of all marks + array of segments. Redundant but helpful.

Each segment (interval or gap) is represented by {left mark, right mark} where left <= right. I will save the segment objects into (a linked list and) an array. Even elements are interval objects and odd elements are gap objects. Now superseded by dlist.

I think this problem is all about corner cases. Perhaps start with the complex cases which will take care of the simpler cases. No need to pass Leetcode tests. Due to the pointer complexity, I prefer python.
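For a baseline to test the corner cases against, here is a much simpler linear-scan sketch. Note this is a different technique from my dlist design: it assumes the input is a sorted list of [start, end] pairs, merges only intervals that overlap or touch the new one, and the name insert_interval is made up.

```python
def insert_interval(intervals, new):
    start, end = new
    merged = []
    i = 0
    # 1) intervals entirely to the left of the new one
    while i < len(intervals) and intervals[i][1] < start:
        merged.append(intervals[i])
        i += 1
    # 2) intervals overlapping the new one get absorbed (bridge / wipeout)
    while i < len(intervals) and intervals[i][0] <= end:
        start = min(start, intervals[i][0])
        end = max(end, intervals[i][1])
        i += 1
    merged.append([start, end])
    # 3) intervals entirely to the right
    merged.extend(intervals[i:])
    return merged
```

It is O(N) per insert, so it does not help with many successive inserts, but every case in the list above collapses into the three loops.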

https://github.com/tiger40490/repo1/blob/py1/py/linklist/insertInterval.py is my solution but I dare not test on Leetcode

max rectangle ] histogram

Q: https://leetcode.com/problems/largest-rectangle-in-histogram/description/. Given N possibly recurring non-negative integers representing the histogram’s bar heights, and given the width of each bar is 1, find the area of largest rectangle in the histogram.

Visually well-defined problem. Kind of naturally-occurring. Very simple data structure. No O() requirement, so I will just try my own solution.

https://github.com/tiger40490/repo1/blob/py1/py/array/maxHistoBox.py is my solution. 100% passed on Leetcode.

==== analysis — heavy on data structure design.

Key insight — one scan to update a clever data structure.

key insight — data structure is not per bar, but per height!

For every bar J, there exists an enclosing max-rectangle of J’s height. We can just compare all of these rectangles.

We might start with two extreme candidates
1) the peak — whose enclosing rectangle is likely slender — O(N) one scan to find all the peaks
2) the lowest bar — whose enclosing rectangle has width N — O(N)

If we paint the histogram as a binary matrix, then this is equivalent to another problem, max all-black submatrix #DP #zhurong, but I think there exist better solutions like O(N logN) or O(N*S) …

–homegrown algo with O[N*S] where S:= #unique heights. The binary search doesn’t show up as logS.

A pre-scan to get all distinct heights. For each distinct height, we maintain a RunRecord object {bestRun, currentRunStart, height}, in a sorted map {height -> record}. In py, I can use a pre-sorted vector of Records, sorted on height

In main scan, As we encounter a new bar of height J, we update these records.

  • if not falling or rising
    • record-J and each record-H below J must have a current run … extend that run (no-op)
  • if rising from height H
    • each record up to H must have a current run … extend that run by no-op
      • iterate the treemap up to H
    • iterate treemap from H+1 to J. start a new run for each record
  • if falling from height P to J
    • record-J and each record-H (where H <J) must have a current run … extend that run
    • iterate treemap from J+1 to P … each record-K must have a current run, indicated by a valid currentRunStart, then this record’s current run has just ended. We update bestRun and put an invalid value into currentRunStart.

At end of the main scan, every record has a bestRun i.e. the duration. I can then calc the area under each bestRun and return the max.
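A compact sketch of the per-height run records, under simplifying assumptions: instead of iterating only the affected treemap range, it loops over all S distinct heights for every bar, which is still O(N*S). The names max_histo_box, run_start and best_run are for illustration and are not taken from maxHistoBox.py.

```python
def max_histo_box(bars):
    heights = sorted({h for h in bars if h > 0})   # pre-scan: distinct heights
    best_run = {h: 0 for h in heights}
    run_start = {h: None for h in heights}         # None means no current run
    # A sentinel bar of height 0 at the end closes every open run.
    for i, bar in enumerate(list(bars) + [0]):
        for h in heights:
            if h <= bar:
                if run_start[h] is None:
                    run_start[h] = i               # rising: start a new run
            elif run_start[h] is not None:
                # falling below h: the current run of record-h has just ended
                best_run[h] = max(best_run[h], i - run_start[h])
                run_start[h] = None
    return max((h * best_run[h] for h in heights), default=0)
```

For bars 2,1,5,6,2,3 the record for height 5 ends with bestRun 2, which gives the winning area.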

find min substr contain`all my fav chars

Update — a similar sliding window is used in longest substring without repeating chars

Q (leetcode): Given a string Haystack and a string T, find the minimum window in Haystack which contains (at least) all the characters in T according to the frequencies. Time complexity O(n). Eg: minWindow(ccbabccbabcb, bbc)==bcb

If there is such a window, you are guaranteed that there will always be only one unique minimum window in Haystack. <– I thought this guarantee means something but it doesn’t.

Without loss of generality, I will assume the chars are a-z. I believe those Leetcode corner cases will use only 3 chars

—analysis—

For single-string problem, use array indexed by ascii code. I can convert T to such an array to store the required frequencies (reqFrq)

I can construct a shadow array, same length as Haystack with these payloads:

  • if the hay is not in reqFrq, then payload is a special value like nullptr
  • if the hay is in reqFrq, then….?

–SolSW: sliding-window based

  1. Scan Haystack from left and keep count of actual frequency (check against reqFrq each time). I will inevitably find the earliest good window. By construction, both ends of this window are in reqFrq.
    • Note the entire haystack is more than a good window.
  2. Now I slide the fixed-sized window. If I find another good window, with extra chars on the left, then I have found a shorter window, so I truncate my window on the left
  3. continue Step 2
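Here is a sketch of the common expand-then-shrink sliding window. It follows the spirit of SolSW (find the earliest good window, then keep truncating on the left) but the window size is not fixed; min_window, req and missing are illustrative names.

```python
from collections import Counter

def min_window(haystack, t):
    if not t:
        return ""
    req = Counter(t)          # reqFrq; goes negative for surplus chars
    missing = len(t)          # how many required chars are still uncovered
    best = None               # (left, right) of the shortest good window
    left = 0
    for right, ch in enumerate(haystack):
        if req[ch] > 0:
            missing -= 1
        req[ch] -= 1
        if missing == 0:
            # Truncate surplus chars on the left of the good window.
            while req[haystack[left]] < 0:
                req[haystack[left]] += 1
                left += 1
            if best is None or right - left < best[1] - best[0]:
                best = (left, right)
            # Give up the left-most required char and keep scanning.
            req[haystack[left]] += 1
            missing += 1
            left += 1
    return haystack[best[0]:best[1] + 1] if best else ""
```

Both ends of the window only move rightward, so the scan is O(n).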

Longest Parentheses run with multiple hierarchies

Q (Leetcode): Given a string containing nothing but the characters ‘(‘ and ‘)’, find the length of the longest valid (well-formed) parentheses substring.

https://github.com/tiger40490/repo1/blob/cpp1/cpp/str/maxParensRun.cpp is my solution 100% tested on Leetcode

–My Single-iteration solution:

Challenge is data structure. I ended up with 2 data structures to be updated during the iteration

  1. A stack (holding openers’ index values) to locate the matching openers
  2. an array to save “scores”

For each closer, I will record the position of the matching opener, then compute the distance (minimum two).
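The two data structures can be sketched in Python as follows. This is an illustration, not a port of maxParensRun.cpp; I assume score[i] means the length of the valid run ending at closer i, which lets adjacent runs like ()() chain together.

```python
def longest_valid_parens(s):
    openers = []              # stack of index values of unmatched openers
    score = [0] * len(s)      # score[i] = length of valid run ending at i
    best = 0
    for i, ch in enumerate(s):
        if ch == '(':
            openers.append(i)
        elif openers:
            j = openers.pop()             # position of the matching opener
            run = i - j + 1               # distance, minimum two
            if j > 0:
                run += score[j - 1]       # glue on a run ending just before j
            score[i] = run
            best = max(best, run)
    return best
```

A closer with no matching opener keeps score 0, which naturally breaks the chain.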

 

 

sequence@{peak within sliding window} O(N) no deque

Q (leetcode hard problem 239): Given an array nums, there is a sliding window of size k which is moving from the very left of the array to the very right. You can only see the k numbers in the window. Each time the sliding window moves right by one position, return the max item in it. Total run time should be O(N) but I think within each window we may not always get O(1).

https://www.interviewbit.com/problems/sliding-window-maximum/ is the same problem stated more clearly.

====analysis====

Since there exists a O(N) solution i’m confident I can figure out a O(N) solution

–idea — first scan to identify all the troughs. Items around trough to be removed, but always keep original subscripts of remaining nodes

–latest idea, not using deque therefore likely original —

Update — consider a rising sequence before going any further. Virtually all items are showmen…

in my showman-stack, every element {idx,height} will show up on stage at least one episode. I would say each showman goes on stage either when it first enters the sliding window, or before it leaves the sliding window. If that’s the case at each slide i only need to consider the head and tail.

(Now i think this is in the right direction but is messier than it needs to be.)

first forward scan populates this stack and ensures only showmen are included. Let’s work out some examples. First part of first scan would simply find the max among first K items. 2nd scan might be combined with first scan, but for now, my goal is clear and focused — produce a clean stack of showmen-only

  • degenerate case — monotonic sequence requires virtually all elements be included
  • rules on the forward scan —
    • every new item needs to be pushed to the stack as it could be higher than all followers. The tricky logic is “what to do before the push”
    • Before we examine a new item, top of the stack is always the immediate predecessor. Need to add assert.

pseudo code:

O(N) pre-scan to find all troughs. For each of N items, record the preceding trough. No stack yet.

First scan: check if the most recent trough is within range. If NO then nothing to pop from stack, so simply push new item and advance. Now assuming YES.

while new item >= top of stack __&&__ the 2nd top is within range (distance <=K):

keep pop

while new item >= top of stack && distance between is below K && there would still exist an earlier higher item within range:
//at end of the while loop, we could actually produce some output, but I would postpone it to 2nd scan

(For the last K items in array, we need some O(K) special handling.)

I think both conditions are necessary for popping the stack. Are these two conditions alone sufficient justification to remove the top?

—- segment-based solution #elegant
https://stackoverflow.com/questions/8031939/finding-maximum-for-every-window-of-size-k-in-an-array shows an elegant segment-based algo.

Suppose windows size := 10 and subscripts start at 0.
A) Conceptually we split the array into fixed segments of length 10. (The stub at the end is LG2.) Note unlike the ring algorithm, no physical data structure is implemented for these purely conceptual segments, but this conceptual data structure is essential to this clever solution.
B) Pre-scan – pre-scan the array twice .. rightward and leftward to populate two physical data structures, the LR-lookup and RL-lookup i.e. segment-wise max-till-here. For example,

RL[22] := max(#29 to #22) := Leftward max-till-22 within the enclosing segment@20 enclosing #20 to #29.

With the two auxiliary data structures RL lookup and LR lookup, the solution is mind-bogglingly simple and elegant. For example,

C) window@22 will overlap segment@20 and segment@30. The 29-to-22 section is covered by RL[22] and the 30-to-31 section is covered by LR[31]. So simply compare RL[22] vs LR[31].

I find this solution more intuitive, more visual than the ring solution. After the pre-scans, there’s no data structure to maintain so the iterations are stateless and extremely straightforward.
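Steps A to C can be sketched as below, with lr and rl standing for the LR-lookup and RL-lookup. The function name sliding_max_segments is my own; the segment boundaries are implied by i % k and never materialized.

```python
def sliding_max_segments(nums, k):
    n = len(nums)
    if n == 0 or k <= 0:
        return []
    lr = [0] * n    # lr[i] = max from the start of i's segment up to i
    rl = [0] * n    # rl[i] = max from i up to the end of i's segment
    for i in range(n):
        lr[i] = nums[i] if i % k == 0 else max(lr[i - 1], nums[i])
    for i in range(n - 1, -1, -1):
        at_segment_end = (i % k == k - 1) or (i == n - 1)
        rl[i] = nums[i] if at_segment_end else max(rl[i + 1], nums[i])
    # Window [i, i+k-1] overlaps at most two segments.
    return [max(rl[i], lr[i + k - 1]) for i in range(n - k + 1)]
```

Increasing and decreasing sequences need no special handling: in either case one of the two lookups already holds the window max.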

Q: what if increasing sequence?
Q: what if decreasing sequence?

—- The same webpage also has a concise explanation of the deque-based solution, kind of similar to but cleaner than my showman idea

• I will use a ring buffer of size=W as a ring is allocated at start-up time (at compile time if W is compile-time constant) and requires no dynamic allocation.
• Ring is FIFO. Terminology — I will say the earlier elements are removed from the Head like a ticketing queue; new elements are appended at the Tail.
• Invariant — I think within this ring, payloads are descending from head to tail.
• Ring holds subscripts, not payloads. Payloads could be strings or floats.
• At each iteration, 0 or 1 head item is removed if it is out of reach.
• At each iteration, incoming item kicks out all smaller tail items, as they are too close to the incoming giant. This operation enforces the invariant.

As a result, head of the ring (as a subscript) always points to the current max payload.
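A sketch of the ring rules above, using collections.deque as a stand-in for the pre-allocated ring buffer (an assumption made for brevity; a fixed array with head/tail subscripts would avoid dynamic allocation). The name sliding_max_deque is illustrative.

```python
from collections import deque

def sliding_max_deque(nums, k):
    ring = deque()            # holds subscripts; payloads descend head to tail
    out = []
    for i, x in enumerate(nums):
        if ring and ring[0] <= i - k:
            ring.popleft()    # head item is out of reach
        while ring and nums[ring[-1]] <= x:
            ring.pop()        # incoming giant kicks out smaller tail items
        ring.append(i)
        if i >= k - 1:
            out.append(nums[ring[0]])   # head points to the current max
    return out
```

Every subscript is appended once and removed at most once, so the total is O(N) even though a single slide may pop several items.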

 

isSymmetric(root of a binary tree)

Leetcode published solution has a clever BFT solution, easier to implement.

–idea 1 (untested): BFT to list all nodes at a given level
For each node’s enqueue() (including the null nodes), record the path-from-root as a list. As a lighter alternative to this “list”, the path can degenerate to the last step, as a le/ri flag.

Now scan each level from both ends. The left item’s path should mirror the right item’s path.

(Before the scan, confirm the node count is even.)
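A rough sketch of idea 1, with one simplification: instead of recording paths or le/ri flags, each level (null children included) is listed left to right and must read the same from both ends. TreeNode and is_symmetric are illustrative names, and I assume no real node carries the value None.

```python
class TreeNode:
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

def is_symmetric(root):
    level = [root]
    while level:
        # Payloads of this level, with None marking a null node.
        vals = [node.val if node else None for node in level]
        if vals != vals[::-1]:
            return False                 # level is not a mirror image
        # Next level: both children of every real node, nulls included.
        level = [child for node in level if node
                 for child in (node.left, node.right)]
    return True
```

Because the null placeholders are kept, a level like [None, 3, None, 3] fails the mirror test even though the payloads 3, 3 alone would pass.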

–in-order dft
first scan records the paths to each leaf node. (Optionally, each path can include east/west directions).
2nd scan does the same but always visits right child first.

The two outputs should match

tried 3″hard”leetcode Q’s #tests !! 100%

I tried Q4, Q10, Q23.

Observation — they are not really harder in terms of pure algo. I found some “medium” questions actually harder than Q4/Q23 in terms of pure algo.

Besides the algorithm, there are other factors that make a problem hard. For me and my peers, coding speed and syntax are a real problem. So the longer my program, the harder it becomes. Some of the “medium” questions require longer solutions than these “hard” problems.

Logistics of instrumentation is another factor. Some problems are easy to set up and easy to debug, whereas 3D, graph or recursive problems are tedious to set up and often confusing when you try to debug with print’s.

There’s another factor that can make any “medium” problem really hard

merge K presorted lists #O(what)

Q: Merge k sorted linked lists and return it as one sorted list. Analyze and describe its complexity.

Note K could be much larger than N.

https://github.com/tiger40490/repo1/blob/py1/py/linklist/merge4lists.py is my solution.

I feel this is mostly an optimization challenge. I can think of a few solutions

–Sol1: merge 2nd list into first. Then merge 3rd list into first …

https://leetcode.com/problems/merge-k-sorted-lists/solution/ shows that this has higher runtime cost than the brackets solution.

Reason is, each 2-merge-to-1 must visit every node in both lists. So the first list nodes get visited K times!

–Sol1b: brackets.

There are only (log K) levels in the bracket so any list gets visited that many times.

–Sol3: in-place (inefficient)

We maintain K node-pointers for the K lists (K teams)

We also maintain a pointer to the last-added node in the merged list.

first node in K lists are put into a min-heap. Winner (smallest) team would be the “current list”. Now the winner team offers next node and add it into the heap. Winning team ..
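The heap idea can be sketched with heapq. This is an illustration rather than the code in merge4lists.py; ListNode, from_list, to_list and merge_k_lists are names defined here. The list index in each heap entry breaks ties, so nodes themselves are never compared.

```python
import heapq

class ListNode:
    def __init__(self, val, nxt=None):
        self.val = val
        self.next = nxt

def from_list(items):
    head = None
    for val in reversed(items):
        head = ListNode(val, head)
    return head

def to_list(head):
    out = []
    while head:
        out.append(head.val)
        head = head.next
    return out

def merge_k_lists(heads):
    # One entry per non-empty team: (payload, team index, node).
    heap = [(node.val, idx, node) for idx, node in enumerate(heads) if node]
    heapq.heapify(heap)
    dummy = tail = ListNode(None)      # tail = last-added node in merged list
    while heap:
        _, idx, node = heapq.heappop(heap)   # winner (smallest) team
        tail.next = node
        tail = node
        if node.next:                  # winner team offers its next node
            heapq.heappush(heap, (node.next.val, idx, node.next))
    return dummy.next
```

The heap never holds more than K entries, so each of the nodes costs O(log K) and the nodes are relinked in place.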

What if N=1 and K is 1 billion?

numPad problem: generator

Q (Leetcode problem 17)… Given a string containing digits from 2-9 inclusive, return all possible letter combinations (not permutations) that the number could represent.

2: abc
3: def
4: ghi
5: jkl
6: mno
7: pqrs
8: tuv
9: wxyz

Example:

Input: “23”
Output: [“ad”, “ae”, “af”, “bd”, “be”, “bf”, “cd”, “ce”, “cf”].

Output need not be sorted but I would aim to print each word as sorted and also print all words in ascending order

—–
Group all the er’s into bag2, then all the qi’s into bag7… Generate the strings for each bag independently. After that, problem becomes

Q2: given N (say 11) Sets of unique strings, pick one from each set and concatenate the N strings as one output. Generate all output. I feel this can’t use the cookbook recipe since input is not one “pool” string but n sets. I think iterative is fine.

idea: Loop variable to keep the N indices (iterators) into each set
idea (dp + yield): generate the output for N=2. save to a collection. then take in next Set.
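The second idea (generate the output for the sets seen so far, then take in the next set) is short enough to sketch for Q2. The name concat_one_from_each is made up, and I assume each set is given as an ordered sequence of strings so the output order is predictable.

```python
def concat_one_from_each(sets):
    """Pick one string from each set, in order, and concatenate."""
    words = [""]                      # output for zero sets taken in so far
    for current_set in sets:
        # "append" approach: extend every partial word by each candidate.
        words = [word + piece for word in words for piece in current_set]
    return words

# The Leetcode example: digit 2 maps to "abc", digit 3 maps to "def".
print(concat_one_from_each(["abc", "def"]))
```

With ordered inputs the words come out in ascending order, as in the example for input “23”.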

–yield-generator redrawC() generates …
input “88” we have 6 combos? tt tu tv uu uv vv
input “888” we have 10 combos? ttt ttu ttv tuu tuv tvv uuu uuv uvv vvv

–we need good variable names.
For the 9 digits, every digit is immediately mapped to a name string like ‘2’ -> “er” and I hope not to use the digits any more.
Java would use enum

To minimize confusion, Create typedef LOB as alias for either vector<char> or the string form. Will have 8 const LOB instances. Java would use enum

struct Bundle{
set<vector<char>> clubOfWords;
size_t repeatOfThisButton;
LOB lob; //compile-time constant
};

The utility function would be
Bundle gen(vector<char> const & lob /*lettersOnOneButton*/ , int repeat). This function is fairly simple. for er_5, we have 3^5 possible words in the club

sort input into 222223444 then create map:
“er_5” -> a bundle
“san1” -> a bundle
“si_3” -> a bundle

A major Milestone is when the map is populated with the clubs. Now generate combos … better use “append” approach.

binary search in rotated sorted array

https://leetcode.com/problems/search-in-rotated-sorted-array/description/ has the requirement. I don’t want to look at other people’s solution, so I have reproduced the requirements below. I have not encountered this problem in any coding interview.

====Q: Suppose an array sorted in ascending order is rotated at some pivot unknown to you beforehand. (i.e., [0,1,2,4,5,6,7] might become [4,5,6,7,0,1,2]). You are given a target value to search. If found in the array return its index, otherwise return -1. You may assume no duplicate exists in the array. Your algorithm’s runtime complexity must be in the order of O(log n).

https://github.com/tiger40490/repo1/blob/cpp1/cpp/array/binSearchRoatedArr.cpp is my solution

–Solution 2: clean two-pass algo

first run a binary search to locate the pivot point. This pre-processing is a one-time investment.

Then run a O(1) test to discard one of the 2 segments. We are left with the remaining segment as a regular sorted array. Run binary search on it.

===Q (leetcode 153): find the pivot point, given the original array is ascending.

Not sure if I already solved this.

Compare a[0] vs a[1]. If descending, then a[1] is the answer.
Compare a[0] vs last. if a[0] is lower, then a[0] is the answer.

Initialize le at a[0] and ri at last. check the mid point

If mid is above a[0] then shift le
if mid is below last, then shift ri
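A sketch of the two-pass Solution 2 follows. The pivot search here compares the mid point against the right end only, a slight variation on the le/ri rules above; find_pivot and search_rotated are illustrative names and the code assumes no duplicates.

```python
import bisect

def find_pivot(a):
    """Index of the smallest item in a rotated ascending array."""
    lo, hi = 0, len(a) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] > a[hi]:
            lo = mid + 1      # pivot is to the right of mid
        else:
            hi = mid          # mid could be the pivot itself
    return lo

def search_rotated(a, target):
    if not a:
        return -1
    p = find_pivot(a)                     # one-time investment
    if p > 0 and target >= a[0]:          # O(1) test to pick one segment
        lo, hi = 0, p
    else:
        lo, hi = p, len(a)
    i = bisect.bisect_left(a, target, lo, hi)   # regular binary search
    return i if i < hi and a[i] == target else -1
```

Both passes are binary searches, so the total stays O(log n).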

sum@arbitrarySubArray, mutable int #Rahul#segmentTree

Q: given an array of mutable integers, implement subArraySum(le, ri), and updateElement(idx, newVal)

This is data-structure heavy. You need correct data structure to support efficient update/query.

Assumption A: Without loss of generality, i will assume original array length is a power of two, such as 8

— Idea 1: carefully segment the array. Maintain array of Segment object {le, sum}

The segments can shrink/expand based on heuristics. For now, I will assume “Segment.le” is immutable.

Every update() will update the Segment.sum in exactly one segment per level.

At the leaf level, there are 8 segments of length one or two. (Given Assumption A, it would be two.)

Next level I will have 4 segments. Each segment at this level consists of exactly 2 leaf segments. Similar to Fenwick tree and segmented binary tree, update() and query() are both O(log N)
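To make the levels concrete, here is a sketch of a flat-array sum tree, a common way to lay out a segment tree. The class name SumTree is hypothetical; the method names follow the question, and the leaves sit at positions n..2n-1 with each parent holding the sum of its two children.

```python
class SumTree:
    def __init__(self, data):
        self.n = len(data)
        self.tree = [0] * self.n + list(data)      # leaves at n .. 2n-1
        for i in range(self.n - 1, 0, -1):         # build parents bottom-up
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def updateElement(self, idx, newVal):
        i = idx + self.n
        self.tree[i] = newVal
        i //= 2
        while i >= 1:                              # one segment per level
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]
            i //= 2

    def subArraySum(self, le, ri):
        """Sum of elements le..ri inclusive."""
        lo, hi = le + self.n, ri + self.n + 1      # half-open leaf range
        total = 0
        while lo < hi:
            if lo & 1:                 # lo is a right child: take it alone
                total += self.tree[lo]
                lo += 1
            if hi & 1:                 # hi-1 is a left child: take it alone
                hi -= 1
                total += self.tree[hi]
            lo //= 2
            hi //= 2
        return total
```

Both operations touch O(log N) nodes. With 8 elements, updateElement walks 3 parent levels above the leaf.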

update each cell with d2nearest 0 #DeepakCM

(Deepak 2019) tough matrix problem: given a black/white but mostly black matrix, for each black cell, compute the least horizontal/vertical steps (shortest distance) to the nearest white cell.

Given a Matrix with 1’s and very few 0’s, replace all the 1’s in the matrix with the adjacent distance to nearest 0. There can be more than one ‘0’ in the matrix
Ex : Input: Matrix contains more than one ‘0’ Matrix = {
1, 1, 1, 1, 1,
1, 1, 1, 1, 1,
0, 1, 0, 1, 1,
1, 1, 1, 1, 1,
1, 1, 0, 1, 1 };
Output = {
2, 3, 2, 3, 4,
1, 2, 1, 2, 3,
0, 1, 0, 1, 2,
1, 2, 1, 2, 3,
2, 1, 0, 1, 2 }

——————-
Theoretical limit: O(N)

I will denote W := # white cells. Optimal solution should/might be O(N). For N = 1 quintillion, the run-time should NOT grow even when there are more white locations.

If we know for sure the scores in four neighbors, then it’s O(1) to work out my score. Which cells have known scores? those next to the zeros.

–idea 4: same as frontier but using a queue to hold frontier cells.

–idea 3 (frontier):
Does this algo work with any graph?
What if the white cells are on the perimeter and the frontier shrinks?
How would two frontiers join?

initial-scan #0a to initialize all non-white locations to -1 (indicating “green”). Save the number of greens in a “greenCount”
initial-Scan #0b to collect all white locations. For each white location F with some green neighbor, saves F in a “frontier” collection, perhaps a linked list.

When “saving” also cache (using a 4-bit integer) the nongreen neighbors, to avoid repeated memory access.

Also create an empty “new frontier” collection.

The initial scans can be combined but at the cost of simplicity.

Invariants before any subsequent update-scan —

  • Every frontier location has some green neighbor.
  • new frontier collection is empty.
  • greenCount is up to date.

update-scan #1 update each adjacent green location to the frontier. Set score to 1, finalized and no longer green. IFF a finalized location F has a green neighbor, then save F in the new frontier collection.

After the scan, assert the new and old frontier collections have no overlap. Now swap old and new frontier collections and clear the new collection.

update-scan #2 for each location in the frontier, update adjacent green locations to 2, finalized and no longer green. If such a finalized location F has a green neighbor, then save F in the new frontier collection.

Green count should now reduce. When it becomes 0 we are done.

big-O? At each scan, the new frontier is usually larger than the old frontier until an inflection point. If before each update-scan we keep a count of the frontier collection size, i think they won’t add up to exceed N. Therefore, total complexity is O(N) provided the fanout is a constant like 4. If fanout is unlimited, then possibly O(V+E) since we visit each node and each edge up to 3 times.
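The queue variant (idea 4) collapses the old/new frontier collections into one FIFO. Below is a sketch under the assumption that the matrix is a list of equal-length rows holding 0 or 1; distance_to_nearest_zero is an illustrative name, and -1 plays the role of “green”.

```python
from collections import deque

def distance_to_nearest_zero(matrix):
    rows, cols = len(matrix), len(matrix[0])
    dist = [[-1] * cols for _ in range(rows)]      # -1 means green
    frontier = deque()
    for r in range(rows):                          # initial scan
        for c in range(cols):
            if matrix[r][c] == 0:
                dist[r][c] = 0
                frontier.append((r, c))
    while frontier:
        r, c = frontier.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and dist[nr][nc] == -1:
                dist[nr][nc] = dist[r][c] + 1      # finalized, not green
                frontier.append((nr, nc))
    return dist
```

Each cell enters the queue once and looks at 4 neighbors, so the total is O(N) regardless of how many zeros there are. Two frontiers simply meet where neither finds a green neighbor.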

–idea 2 (shells):
scan #1 to save all white cell locations, and save all black cell locations in a shadow matrix (bool shadow matrix of the same size as original matrix) and a blacklist (hashtable indexed by cell location)
For each white, compute distance to center. At end of this scan, designate one white cell as red i.e. closest to center.

scan #1b[O(N)] update all black cells with d2red. Now we have some baseline values, to be improved
scan #2 [O(W)] for each white, update all cells around it with distance 1. Remove the updated cells from the “blacklist”

Also set the bool in the shadow matrix

Scan #3 for each white, update the 2nd shell

details?

If a black cell is updated by 5 white cells in the same iteration, then all 5 whites would be equally distant, so the first of them would remove the black cell from blacklist.

So each black cell is only updated once .. O(N)?

–idea 1 (DP) incomplete:
Scan#1 from top and left, update each cell with a “TL score” i.e. shortest dist ignoring the Bottom-Right quadrant of cells i.e. cells right-and-lower of current.

consider a typical cell on 2nd row. what’s tl score? Can compute using upper and left neighbors? will it ignore a white on the right?

Scan#2 from bottom right, to compute a BR score for each cell

Scan#3 (can be part of Scan#2) combine the data

Rationale — for each white cell, the shortest path can be in either BR quadrant (Scan2) or (Scan1) the other 3 quadrants.

max palindrome substring

https://leetcode.com/problems/longest-palindromic-substring/ seems to be the same problem.

Deepak CM received this problem in a real IV.

https://github.com/tiger40490/repo1/blob/py1/py/str/longestPalindrome.py is one solution, not O(N) at all.

— linear time solutions are on wikipedia, but probably not so intuitive so I give up.

— my simple O(NN) solution 1. first {O(N)} identify each “core” which is defined as

  • “ABA”
  • at least 2 count of the same char like AA

Then {O(N)} for each core, scan both ways in lock steps until we see a mismatch. Then we know the length of this palindrome.
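
The grow-from-each-core scan can be sketched as follows. This is an illustrative version, not the linked Python solution: it treats every single character and every adjacent pair as a candidate core (a pair that mismatches simply yields length 0), and the names `growFromCore` and `longestPalindrome` are my own.

```cpp
#include <algorithm>
#include <string>

// Grows outward in lock step from the core s[lo..hi]; returns the palindrome length.
// Returns 0 when the core itself is not a palindrome (e.g. two different chars).
static int growFromCore(const std::string& s, int lo, int hi) {
    const int n = static_cast<int>(s.size());
    while (lo >= 0 && hi < n && s[lo] == s[hi]) {
        --lo;
        ++hi;
    }
    return hi - lo - 1;
}

// O(N*N) time, O(1) extra space. Returns the first longest palindromic substring.
std::string longestPalindrome(const std::string& s) {
    const int n = static_cast<int>(s.size());
    int bestStart = 0, bestLen = 0;
    for (int i = 0; i < n; ++i) {
        const int oddLen = growFromCore(s, i, i);       // odd-length cores such as "ABA"
        const int evenLen = growFromCore(s, i, i + 1);  // even-length cores such as "AA"
        const int len = std::max(oddLen, evenLen);
        if (len > bestLen) {
            bestLen = len;
            bestStart = i - (len - 1) / 2;
        }
    }
    return s.substr(bestStart, bestLen);
}
```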

https://leetcode.com/problems/longest-palindromic-substring/solution/ shows an O(NN) DP solution

— my one-scan (original) idea 2, but now I feel unsure.

We stop at every character on our forward scan. When we encounter any seed, we need to keep growing it, as one of these seeds will surely grow into the longest palindrome. However, how do we simultaneously grow so many seeds? We won’t, for efficiency reasons.

Instead, I grow the earliest (oldest) seed only. Any seed encountered afterwards will be shorter, that is, until the oldest seed stops growing and gets archived. After the archiving, I start a “manhunt”: I look at the next oldest [1] seed and determine if it can grow to the current position. If it can’t, then it is surely inferior to the just-archived oldest. If it can, then we end the manhunt right there and keep scanning forward.

Guaranteed to find it: every seed is queued. The only way for a seed to get archived is if it stops growing, i.e. we have computed its full length.

[1] FIFO container is needed

One of the worst test cases is a long black/white string containing only length-1 palindromes. My algo would archive many short seeds… and achieves O(N)

find median@2sorted arrays #Trex untested

https://leetcode.com/problems/median-of-two-sorted-arrays/description/ is similar except X and Y can be unequal length. My solution solves the harder, generalized problem.

This “coding” question is really math problem. Once you work out the math techniques, the coding is simple.

Designate arr1 as the shorter array. compare med(arr1) vs med(arr2)

Suppose the former is lower. I can discard the lower half of arr1 (s items). Can I also discard the highest s items in arr2? I think so, because the upper half of arr2 cannot contain the median element, so any subset of it can be discarded.

Repeat until arr1 is completely discarded or reduced to a single element, which might be the final median. Now the answer is close to the median of the remaining arr2.

–For the equal-length problem, My own idea on the spot — find the median of X and median of Y. If med(X) < med(Y) then discard the lower portion of X i.e. the “XB group”, and higher portion of Y (“YA group”). Then repeat.

  • Note len(XB) == len(YA) == min(len(X), len(Y))/2 := K. So every iteration would shrink the shorter array by half (i.e. K), and shrink the longer array by K. K would drop in value in next iteration.
  • loop exit — When the shorter of the two (say it’s X) shrinks to length 1, we are lucky — find the numbers around median(Y) and adjust the answer based on X[0].

Insight: why can’t the final “winner” be somewhere in the XB group? Because XA + YA already constitute half the population, and all of them are higher.

I always like concrete examples. Suppose med(X) < med(Y). Suppose there are 512 items in the lower portion “XB group”, and the higher portion “XA” has 512 items. Suppose there are 128 items each in YB and YA groups. So in this iteration, we discard YA and the lowest 128 items in XB.

Definition of lower portion —
* all lower items up to but not including med(X) If len(X) is odd
* exactly the lower half of X if len(X) is even
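
Here is a hedged sketch of the discard-K-from-each-side idea for arrays of unequal length. It is untested beyond a few small cases and the details are my assumptions rather than the method above: K is taken as (n-1)/2 where n is the current length of the shorter array, the lower median of each array is compared, and the loop exit trims the longer array to its middle few items before a tiny merge. The name `medianOfTwoSorted` is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Median of the union of two sorted arrays. Precondition: not both empty.
double medianOfTwoSorted(const std::vector<int>& x, const std::vector<int>& y) {
    const std::vector<int>* s = &x;  // shorter array
    const std::vector<int>* l = &y;  // longer array
    if (s->size() > l->size()) std::swap(s, l);
    std::size_t sLo = 0, sHi = s->size();  // live range [sLo, sHi) of the shorter array
    std::size_t lLo = 0, lHi = l->size();  // live range [lLo, lHi) of the longer array

    // Each iteration drops K items known to sit below the overall median and
    // K items known to sit above it, so the overall median is unchanged.
    while (sHi - sLo > 2) {
        const std::size_t n = sHi - sLo, m = lHi - lLo;
        const std::size_t k = (n - 1) / 2;
        const int medS = (*s)[sLo + (n - 1) / 2];  // lower median of each live range
        const int medL = (*l)[lLo + (m - 1) / 2];
        if (medS <= medL) { sLo += k; lHi -= k; }  // drop low end of s, high end of l
        else              { sHi -= k; lLo += k; }  // drop high end of s, low end of l
    }

    // Loop exit: at most 2 items remain in the shorter array, so only the middle
    // few items of the longer array can matter. Trim t items off both of its ends.
    const std::size_t n = sHi - sLo;
    const std::size_t total = n + (lHi - lLo);
    const std::size_t mid = (total - 1) / 2;
    const std::size_t t = mid > n ? mid - n : 0;
    lLo += t;
    lHi -= t;

    // Merge the handful of survivors and read off the median directly.
    std::vector<int> rest;
    std::merge(s->begin() + sLo, s->begin() + sHi,
               l->begin() + lLo, l->begin() + lHi, std::back_inserter(rest));
    const std::size_t r = rest.size();
    return r % 2 ? rest[r / 2]
                 : (static_cast<double>(rest[r / 2 - 1]) + rest[r / 2]) / 2.0;
}
```

The shorter array roughly halves on every iteration, so the loop runs O(log min(len(X), len(Y))) times and the final merge touches at most six items.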

binary-matrix island count #DeepakCM

Q: https://leetcode.com/problems/number-of-islands/description/

https://github.com/tiger40490/repo1/blob/py1/py/grid/ has my solution. I don’t want to spend the time passing all leetcode tests! Diminishing return

https://www.geeksforgeeks.org/find-number-of-islands/ is conceptually identical, though it uses a different criterion for “connected”: diagonal neighbors also count as “connected”

Hi Deepak

I realize for “connected” problems, there’s definitely a graph underneath, so graph traversal is required. I guess BFT or DFT will both find all the nodes connected to “me”.

Given a 1000 x 1000 all-black matrix, I think DFT recursion will go into exactly 1,000,000 levels and overflow stack space.

A high-level (vague) idea is

  • Scan in type-writer fashion. Suppose there are 10×10 = 100 cells, each either black or white. I paint each cell brown [1] once visited. I also use an integer counter to keep track of how many cells have been visited.
  • During the scan, if a cell is already visited, I will ignore it and move on.
  • Any time I find a black cell, I will start a BFT (I dislike DFT) to find all connected black cells.  Once I find all the black cells connected to “me”, I resume the type-writer scan.
  • Once my counter hits 100, I have visited all cells and will exit.

[1] you paint it white, which is probably better.
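
The type-writer scan plus BFT can be sketched as below. This is not the code in the linked repo. Assumptions: 1 = black, 0 = white, only the 4 side neighbors are "connected" (as in the Leetcode version), visited black cells are painted white as footnote [1] suggests, and the visited-cell counter is dropped because the scan simply runs to the last cell. `countIslands` is an illustrative name.

```cpp
#include <queue>
#include <utility>
#include <vector>

// Assumed encoding (illustrative): 1 = black (part of an island), 0 = white.
// The grid is taken by value so that visited cells can be painted white safely.
int countIslands(std::vector<std::vector<int>> grid) {
    const int rows = static_cast<int>(grid.size());
    const int cols = rows ? static_cast<int>(grid[0].size()) : 0;
    const int dr[] = {-1, 1, 0, 0};
    const int dc[] = {0, 0, -1, 1};
    int islands = 0;

    for (int r = 0; r < rows; ++r) {       // type-writer scan
        for (int c = 0; c < cols; ++c) {
            if (grid[r][c] != 1) continue; // white, or already visited
            ++islands;
            grid[r][c] = 0;                // paint before enqueueing to avoid re-visits
            std::queue<std::pair<int, int>> frontier;
            frontier.push(std::make_pair(r, c));
            while (!frontier.empty()) {    // BFT over everything connected to "me"
                const std::pair<int, int> cur = frontier.front();
                frontier.pop();
                for (int i = 0; i < 4; ++i) {
                    const int nr = cur.first + dr[i], nc = cur.second + dc[i];
                    if (nr < 0 || nr >= rows || nc < 0 || nc >= cols) continue;
                    if (grid[nr][nc] != 1) continue;
                    grid[nr][nc] = 0;
                    frontier.push(std::make_pair(nr, nc));
                }
            }
        }
    }
    return islands;
}
```

Because the BFT uses an explicit queue rather than recursion, a 1000 x 1000 all-black matrix does not threaten the call stack.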

I thought this problem was too hard and not worth study, but you convinced me that it may come up and I can at least give a vague solution  … better than no solution.

Compared to a binary tree, walking over a matrix is /operationally/ (not conceptually) harder because

  • need to check array bounds. Error-prone and time consuming in coding tests
  • The (invisible) links will not cover all cells. To avoid missing a cell, we still need a full matrix scan. The integration of tree walk + matrix scan is unintuitive, to put it mildly.
  • During the tree-walk, you don’t want to visit any cell too many times or get lost in an endless loop. A brute-force technique — shadow matrix to remember all visited cells.
  • … However, once you get over these nitty-gritty operational complexities, then tree-walk in matrix is not really harder. These algo questions can therefore frustrate and fail many candidates untrained on The technique.

LRU cache #Part 1

I believe my friend got this question in a bbg phone interview.

Design and implement a data structure for Least Recently Used (LRU) cache. It should support the following operations:

get(key) – Get the value (will always be positive) of the key if the key exists in the cache, otherwise return -1.
put(key, value) – Set or insert the value if the key is not already present. When the cache reached its capacity, it should invalidate the least recently used item before inserting a new item.

Follow up:
Could you do both operations in O(1) time complexity?

==Analysis==

Hash table to support lookup. The “value” is a pointer to a link node. Link node also has key/value — 2-way linkage.

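The linked list is a FIFO that grows at the tail for every new key/value pair, so the head is the earliest (least recently used) pair. Every time a key/value is accessed via the hash table, we move the node to the tail. Unlinking a node from the middle in O(1) needs a doubly linked list rather than a singly linked one.

When capacity is reached, we remove the head node, which is the least recently used one. Using the key in that node, we also remove the entry from the hash table.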
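
A minimal sketch of this hash-table-plus-linked-list design, assuming int keys and values and using `std::list` (doubly linked) with `std::unordered_map`. The class name `LruCache` and its member names are illustrative. The tail of the list is the most recently used end, and eviction happens at the head.

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <unordered_map>
#include <utility>

class LruCache {
public:
    explicit LruCache(std::size_t capacity) : capacity_(capacity) {}

    // Returns the value, or -1 if the key is absent. A hit counts as a "use".
    int get(int key) {
        const auto it = index_.find(key);
        if (it == index_.end()) return -1;
        order_.splice(order_.end(), order_, it->second);  // O(1) move to tail
        return it->second->second;
    }

    void put(int key, int value) {
        const auto it = index_.find(key);
        if (it != index_.end()) {              // existing key: update and refresh
            it->second->second = value;
            order_.splice(order_.end(), order_, it->second);
            return;
        }
        if (capacity_ == 0) return;
        if (order_.size() == capacity_) {      // full: evict the head (least recently used)
            index_.erase(order_.front().first);
            order_.pop_front();
        }
        order_.emplace_back(key, value);
        index_[key] = std::prev(order_.end());
    }

private:
    typedef std::list<std::pair<int, int> > NodeList;  // each node holds (key, value)
    std::size_t capacity_;
    NodeList order_;                                     // head = oldest, tail = newest
    std::unordered_map<int, NodeList::iterator> index_;  // key -> node in order_
};
```

`splice` relinks the node without invalidating the iterator stored in the hash table, which is what keeps both operations O(1) on average.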

min-stack #bbg

My friend Abhinav (not his real name, to protect his privacy) got this question at Bloomberg internship interview. I added some details to make it clearer:

Q: Design a stack that supports push, pop, top, and retrieving the minimum element all with O(1) time complexity in the worst case.

There exists a function compare(Item A, Item B) that returns 1 if A is greater, 0 if equal, and -1 if A is smaller.

  • getMin() — Retrieve the minimum element in the stack.
  • push(x) — Push element x onto stack.
  • pop() — Removes the element on top of the stack.
  • top() — Get the top element.

==== analysis =====

The most efficient getMin() data structure is the binary heap, but insert/delete runs in O(logN). Therefore I felt the requirements here are computationally impossible. But in this context, we only need to support deleting the last added item 🙂

Key insight — popping is a constrained form of deletion. It’s hard to hit O(1) while supporting unconstrained deletions, BUT with a constraint on deletions, all operations can be O(1).

I need a regular stack + a helper data structure. A linked list or vector can support the stack operations

— The helper — a naturally sorted stack (or vector or deque) to hold record-breakers and record-matchers.

IFF a new minimum (record breaker) or another item matching the existing minimum (record-matcher) is added, we push it to the sorted stack.

After every pop(), we check the popped item. If equal to top of sorted stack, then pop the sorted stack.

At any time, top of the sorted stack is the current minimum.
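
The record-breaker / record-matcher helper can be sketched as below. For brevity it assumes plain int items compared with `<=` in place of the `compare(Item, Item)` function from the question, and it assumes callers never call `pop`, `top` or `getMin` on an empty stack. `MinStack` and its member names are illustrative.

```cpp
#include <stack>

class MinStack {
public:
    void push(int x) {
        items_.push(x);
        // Record breaker (x < current min) or record matcher (x == current min).
        if (minima_.empty() || x <= minima_.top()) minima_.push(x);
    }

    void pop() {
        // If the departing item is the current minimum, retire its record too.
        if (items_.top() == minima_.top()) minima_.pop();
        items_.pop();
    }

    int top() const { return items_.top(); }
    int getMin() const { return minima_.top(); }

private:
    std::stack<int> items_;   // the regular stack
    std::stack<int> minima_;  // non-increasing from bottom to top
};
```

Storing record matchers as well as record breakers is what keeps `getMin()` correct when the minimum value appears more than once.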

Vector and deque are actually fine. Interviewer may feel they are inferior to a stack, but with a modified requirement, they may become very useful.

— Here’s a design to beat binary_heap, based on finite-sized (32 or 64-bit int) keys

Assuming the comparison is based on 32-bit integer key (string or floats can also use radix sort). I will use a radix array structure. Perhaps 4 arrays of 256-elements.  Or perhaps a 4×256 matrix, Ultimately the payload stored in the data structure are pointers to stack nodes. This saves memory since a stack node may be 99MB.

Every time we push a node, we derive the new key value in O(1), use the key value to locate its ‘home’ in the radix structure and store the new node’s address in a linked list therein. After the push(), we can compare and update (in constant time) a global variable pointing to the minimum node.

Each stack node also has a back-pointer to the iterator into the list, so before pop() we can use the iterator to locate the object in the radix structure and delete it from the host linked list. We will also update the global variable.

rainfall: streaming mode #Ashish/CSY

In the static, non-streaming context, the optimal solution is perhaps in my gmail (need to check). Now I feel this is identical to the  “medium” leetcode water-container problem:

Q: Given an array of size N (>=2) of non-negative numbers representing N walls, find a pair of walls that can hold the most water.

https://github.com/tiger40490/repo1/blob/cpp1/cpp/array/rainfallMaxContainer.cpp is my solution + an equivalent but much shorter solution by CSY, basically same as the idea below !

The brute force would evaluate (N-1)*N/2 pairs. We can reduce that significantly. Start with 1st/last, which is possibly the final winner. Save its product as maxProductOfLoopA and compare a[0] vs a[9] (i.e. the last element, taking N=10).

  • Suppose a[0] > a[9], then 1/9 , 2/9 , 3/9 etc are automatically eliminated.
  • Suppose a[0] < a[9], then 0/8, 0/7, 0/6 etc are automatically eliminated.
  • if a[0] == a[9], then 0/8, 0/7 etc and 1/9, 2/9 etc are automatically eliminated

You can visualize it as removing an outer layer from a NxN matrix. Note the matrix is triangular and has exactly one outer row and one outer column at any moment. In the first step, you either remove the outer row or outer column, or both.

Suppose you removed the “*/9” column. In the 2nd step, we compute the 2nd product at 0/8, make the 2nd comparison of a[0] vs a[8], and remove another outer layer.

In about N steps we should reduce the matrix to 1 remaining cell. This cell could be the final winner so we must evaluate it.
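
The layer-peeling can be sketched with two indices moving inward. This is not the code in the linked repo; `maxContainer` is an illustrative name, and the result is the best product of distance and shorter wall.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Returns the largest (j - i) * min(a[i], a[j]) over all pairs i < j, in O(N).
long long maxContainer(const std::vector<int>& a) {
    if (a.size() < 2) return 0;
    std::size_t lo = 0, hi = a.size() - 1;
    long long best = 0;
    while (lo < hi) {
        const long long product =
            static_cast<long long>(hi - lo) * std::min(a[lo], a[hi]);
        best = std::max(best, product);
        if (a[lo] > a[hi]) {
            --hi;        // every remaining */hi pair is dominated: drop that column
        } else if (a[lo] < a[hi]) {
            ++lo;        // every remaining lo/* pair is dominated: drop that row
        } else {
            ++lo;        // equal walls: both the row and the column are eliminated
            --hi;
        }
    }
    return best;
}
```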

—-

Hi Ashish,

Let me change your question slightly. Input numbers come individually in a stream. Among all possible pairs of numbers, we want to compute and publish the latest maximum product:

    A(i, j) * B(i, j)

, where A(i, j) is the time lag j-i and B(i, j) is minimum(arr[i], arr[j])

NB: We will need to keep all array elements seen so far, since the winning pair might include the first few.

At any time there’s a current max. When we receive the 89th element, enter loop A:

compute the product for 1st/89th i.e. arr[0]/arr[88]
* If arr[0] > arr[88], then just exit the loop with this product as maxProductOfLoopA. No need to try 2nd/89th or 3rd/89th, as those products are all smaller than 1st/89th. (This is the genius of the solution you told me.)
* otherwise, compute the product for 2nd/89th. If it exceeds maxProductOfLoopA, then update maxProductOfLoopA. Now check if arr[1] > arr[88]. If yes just exit loop A with maxProductOfLoopA
* otherwise compute the product for 3rd/89th….

Once we exit loopA, update the current max product using maxProductOfLoopA.
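
Loop A for the streaming case might look like the sketch below. The class name `StreamingMaxContainer` and the method `onNext` are hypothetical. Each arrival is paired with earlier elements from the oldest onward, and the loop stops at the first earlier element that is taller than the newcomer.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

class StreamingMaxContainer {
public:
    // Feeds one new element and returns the max product seen so far.
    long long onNext(int value) {
        arr_.push_back(value);             // all elements are kept, as noted above
        const std::size_t j = arr_.size() - 1;
        long long maxProductOfLoopA = 0;
        for (std::size_t i = 0; i < j; ++i) {
            const long long product =
                static_cast<long long>(j - i) * std::min(arr_[i], value);
            maxProductOfLoopA = std::max(maxProductOfLoopA, product);
            // Later partners have a shorter time lag and a min() capped by the
            // new value, so none of them can beat this product.
            if (arr_[i] > value) break;
        }
        currentMax_ = std::max(currentMax_, maxProductOfLoopA);
        return currentMax_;
    }

private:
    std::vector<int> arr_;
    long long currentMax_ = 0;
};
```

The worst case for a single arrival is still O(N), for example when each new element is taller than everything before it.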

For streaming situation, I think this is one good solution, if not the optimal solution.

[15] EPI300 skyline #event-handler

Reusable technique – SQL

PROBLEM STATEMENT (equivalent to Leetcode Q 218)
We have to design a program which helps drawing the skyline of a two-dimensional city given the locations of the rectangular buildings in the city. Each building B_i is represented by a triplet of (L_i, R_i, h_i) where L_i and R_i are the left and right coordinates of the ith building, and h_i is the height. In the diagram below there are 8 buildings, represented from left to right by the triplets (1, 5, 11), (2, 7, 6), (3, 9, 13), (12, 16, 7), (14, 25, 3), (19, 22, 18), (23, 29, 13) and (24, 28, 4).

Input
The input of the program is a sequence of building triplets. The triplets are sorted by L_i (the left coordinate of the building) in ascending order.

====analysis====

A data structure challenge. Once I had the correct data structure, the rest fell into place.

Q1 —– For any given x, determine the height in the skyline.
Note If x == R_i, then the ith building doesn’t count. In other words, If you look at the first building, the 1-to-5 range it covers is a half-open interval, sometimes written as [1,5) as this range includes 1 but excludes 5. You can think of [1,5) as [1, 4.99999999] approximately.

A1(brute force): look at each building and decide if it “covers” the point X. Given the pre-sort, most buildings aren’t relevant. A complete solution would be

Select max(h) from myTable t where t.l <= x and x < t.r

To support this solution, the objects could be stored in two sorted data structures, one sorted by L_i and one sorted by R_i.

Q2 —- draw the skyline.
A2: evaluate Q1 (i.e. get the height) at every L_i and R_i value. This solution is probably suboptimal.

Q2b (Leetcode Q218) —- draw the skyline by outputting a sequence of {x,h} pairs showing the change in height (represented by a new height h) at each vertical edge (marked by x).

Is it possible to do this in one scan after preprocessing the triplets with some clever data structures? [[EPI300]] may have a solution. Here’s my own proposal —

Pre-sort the N buildings into a list by L_i, and sort the same buildings into another list by R_i, and merge the 2 lists into a big sorted list of 2N pointers to N unique objects. Each building shows up twice. Each of the 2N entries consists of {building_object_id, x, boolean flag left_or_right }. We will go through this big list one-pass.

I convert this list into a sequence of “events” as we move along the x-axis. That’s why all left/right edges must be sorted into a single sequence.

Main scan — As we hit the left edge of a building, we include this building in the Alive container (probably a BST). In Alive, we keep the buildings sorted by height. We also maintain a lookup table { building_id, pointer/iterator into Alive}. As we hit the right edge of a building, we remove it from Alive. (No need to remove from lookup since we won’t see the same building again.)

As we hit any edge, we need to determine the impact on the skyline, if any. This is when we make use of the Alive system. For any edge, there’s impact iff the target building is taller than all others in Alive.
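
A simplified sketch of the event scan. It departs from the proposal above in labeled ways: Alive is a `std::multiset` of heights rather than a container of building objects, so the lookup table of iterators is not needed, and left edges are encoded with a negated height so that a plain sort puts them ahead of right edges at the same x. Heights are assumed to be positive. `Building` and `skyline` are illustrative names.

```cpp
#include <algorithm>
#include <set>
#include <utility>
#include <vector>

struct Building { int left, right, height; };

// Returns the {x, h} pairs where the skyline height changes to h at x.
std::vector<std::pair<int, int> > skyline(const std::vector<Building>& buildings) {
    // Event encoding (an assumption of this sketch): (x, -h) is a left edge,
    // (x, +h) is a right edge.
    std::vector<std::pair<int, int> > events;
    for (const Building& b : buildings) {
        events.emplace_back(b.left, -b.height);
        events.emplace_back(b.right, b.height);
    }
    std::sort(events.begin(), events.end());

    std::multiset<int> alive;  // heights of buildings currently overhead
    alive.insert(0);           // ground level, so the set is never empty
    std::vector<std::pair<int, int> > result;
    int prevHeight = 0;
    for (const std::pair<int, int>& e : events) {
        if (e.second < 0) alive.insert(-e.second);    // left edge: building becomes alive
        else alive.erase(alive.find(e.second));       // right edge: remove one instance only
        const int curHeight = *alive.rbegin();        // tallest alive building
        if (curHeight != prevHeight) {                // the edge had an impact
            result.emplace_back(e.first, curHeight);
            prevHeight = curHeight;
        }
    }
    return result;
}
```

Sorting dominates, so the whole thing is O(N log N). For the 8 sample buildings in the problem statement it reports height changes at x = 1, 3, 9, 12, 16, 19, 22, 23 and 29.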

locate a pair with targetSum==55 #bbg IV #Morris

Update

Q: any O(1) space sort?

Q: From an unsorted array of positive integers, is it possible to find a pair of integers that sum up to a given sum?

Constraints: This should be done in O(n) and in-place without any external storage like arrays, hash-maps, but you can use extra variables/pointers.

If this is not possible, can there be a proof given for the same?

—–Initial analysis—-
I wish I were allowed to use a hash table of “wanted ” values. (iterate once and build hashtable. For Each new value encountered, check if it is in the “wanted” list…)

I feel this is typical of west coast algorithm quiz.

I feel it’s technically impossible, but proof?  I don’t know the standard procedure to prove O(n) is impossible. Here’s my feeble attempt:

Worst case — the pair happens to be the 1st and last in the array. Without external storage, how do we remember the 1st element?  We can only use a small number of variables, even if the array size is 99999999. As we iterate the array, I guess there would be no clue  that 1st element is worth remembering. Obviously if we forget the 1st element, then when we see the last element we won’t recognize they are the pair.
—–2nd analysis—-
If we can find an O(n) in-place sort then the problem is solvable [1]. Let’s look at radix sort, one of the most promising candidates. https://stackoverflow.com/questions/463105/in-place-radix-sort has an implementation for DNA strings.

Assumption 1: the integers all have a maximum “size” in terms of digits. Let’s say 32-bit. then yes radix is O(n) but not sure about space complexity. Now, with any big-O analysis we impose no limit on the sample size. For example we could have 999888777666555444333 integers. Now, 32-bit gives about 4 billion distinct “pigeon-holes”, so by the pigeon-hole principle most integers in our sample have to be duplicates!

Therefore, Assumption 1 is questionable. In fact, some programming languages impose no limit on integer size. One integer, be it 32 thousand bits or 32 billion bits, could use up as much memory as there is in the system. Therefore, Assumption 1 is actually superfluous.

Without Assumption 1, and if we allow our sample to be freely distributed, we must assume nothing about the maximum number of digits. I would simply use

Assumption 2: maximum number of digits is about log(n). In that case radix sort is O(n log(n)), not linear time:(

—– new analysis as of Jun 2017 —-
Can Morris algo sort an array (external storage)? If yes,  then use the 2-moving-pointer algo in locate a pair whose diff=55

However, Morris needs a tree not an array.

[1] locate a pair with targetSum=55 : pre-sorted #O(N) 1-pass
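
For reference, here is a sketch of the pre-sorted case mentioned in [1]. Two indices move inward, using O(1) extra space and a single pass. It assumes the array is already sorted in ascending order, which is exactly the part the analysis above could not achieve in O(n) in-place. `findPairWithSum` is a hypothetical name.

```cpp
#include <utility>
#include <vector>

// Precondition: `sorted` is in ascending order.
// Returns the indices (lo, hi) of a pair adding up to targetSum, or (-1, -1) if none.
std::pair<int, int> findPairWithSum(const std::vector<int>& sorted, long long targetSum) {
    int lo = 0;
    int hi = static_cast<int>(sorted.size()) - 1;
    while (lo < hi) {
        const long long sum = static_cast<long long>(sorted[lo]) + sorted[hi];
        if (sum == targetSum) return std::make_pair(lo, hi);
        if (sum < targetSum) ++lo;  // need a bigger sum: advance the low side
        else --hi;                  // need a smaller sum: retreat the high side
    }
    return std::make_pair(-1, -1);
}
```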

island rainfall problem

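#include <algorithm>
#include <cassert>
#include <cstdio>
#include <iostream>
#include <iterator>
#include <utility>
using std::cout; //not "using namespace std": the global "size" below would clash with std::size in C++17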
int const island[] = { 54, 50, 54, 54, 52, 55, 51, 59, 50, 56, 52, 50 };
///////////////   Pos # 0   1   2   3   4   5   6   7   8   9  10  11
int const size = sizeof(island) / sizeof(int);
int accu = 0;
//adapted from STL
template <typename ForwardIterator>
ForwardIterator max_element_last(ForwardIterator scanner, ForwardIterator const end) {
    ForwardIterator ret = scanner;
    if (scanner == end)
        return ret; //empty range, with zero element!
    while (++scanner != end)
        if (*ret <= *scanner) //"=" means find LAST
            ret = scanner;
    return ret;
}
//print height and position of a column
void print1(int const* const pos, char const * const label) {
    printf("%s=%d/%d ", label, *pos, static_cast<int>(pos - island));
}
void printAll(int const* const L, int const* const l, int const* const h,
        int const* const H) {
    if (l < h) {
        print1(L, "wallL");
        print1(l, "ptr");
        printf("  ");
        print1(h, "ptr");
        print1(H, "wallH");
    } else {
        print1(H, "wallH");
        print1(h, "ptr");
        printf("  ");
        print1(l, "ptr");
        print1(L, "wallL");
    }
    printf("%d=Accu\n", accu);
}
//Rule: move the lo-side pointer only
void onePassAlgo(){
    int*loptr; //moving pointer, moving-inward.
    int*wallLo, *wallHi; //latest walls
    int*h;

    //1st we ASSUME the first left side wall will be lower than the first right side wall
    wallLo = loptr = const_cast<int*>(island);
    wallHi = h = const_cast<int*>(island) + size - 1;
    //2nd, we validate that assumption
    if (*wallLo > *wallHi) {
        std::swap(wallLo, wallHi);
        std::swap(loptr, h);
    }
    // now lo is confirmed no higher than the hi side
    printAll(wallLo,loptr,h,wallHi);
    printf("All pointers initialized (incl. 2 walls)\n");
    while (loptr != h) {
        if (*loptr > *wallHi) {
            wallLo = wallHi;
            wallHi = loptr;
            std::swap(loptr, h);
            //printf("new wallHi:");
        } else if (*loptr >= *wallLo) {//see the >=
            wallLo = loptr;
            //printf("wallLo updated:");
        } else {
            assert (*loptr < *wallLo);
            accu += (*wallLo - *loptr);
            printf("adding %d liter of water at Pos_%d (%d=Accu)\n", *wallLo - *loptr,
                    static_cast<int>(loptr - island), accu);
        }
        printAll(wallLo,loptr,h,wallHi);
        // only by moving the loptr (not h) can we confidently accumulate water
        if (loptr < h)
            ++loptr; //lo side is on the left, move loptr right
        else
            --loptr; //lo side is on the right, move loptr left
    }
}
void twoPassAlgo() {//less convoluted
    int const* const peak = max_element_last(island, island + size);
    printf("highest peak (last if multiple) is %d, at Pos %d\n", *peak,
            static_cast<int>(peak - island));
    //std::copy(island, island + size, std::ostream_iterator<int>(cout, " "));

    //forward scan towards peak
    int* pos = const_cast<int*>(island); //left edge of island
    int* wall = pos;
    for (++pos; pos < peak; ++pos) {
        if (*wall > *pos) {
            accu += *wall - *pos; // accumulate water
            printf("adding %d liter of water at Pos#%d (T=%d)\n", *wall - *pos,
                    static_cast<int>(pos - island), accu);
            continue;
        }
        //ALL new walls must match or exceed previous wall.
        printf("found new wall of %d^ at Pos#%d\n", *pos, static_cast<int>(pos - island));
        wall = pos;
    }
    cout << "^^^ end of fwd scan ; beginning backward scan vvv\n";
    //backward scan
    pos = const_cast<int*>(island) + size - 1;
    wall = pos;
    for (--pos; pos > peak; --pos) {
        if (*wall > *pos) {
            accu += *wall - *pos; // accumulate water
            printf("adding %d liter of water at Pos#%d (T=%d)\n", *wall - *pos,
                    static_cast<int>(pos - island), accu);
            continue;
        }
        //Note all new walls must match or exceed previous wall.
        printf("found new wall of %d^ at Pos#%d\n", *pos, static_cast<int>(pos - island));
        wall = pos;
    }
}
int main(int argc, char *argv[]) {
    twoPassAlgo();
    accu = 0;
    cout<<"-----------------------------\n";
    onePassAlgo();
}
/*
 Requirement: a one-dimensional island is completely covered with columns of bricks.
 If between Column A (height 9) and Column B (height 10) all columns are lower, then we get
 a basin to collect rainfall. Watermark height (absolute) will be 9. We can easily calculate the
 amount of water. If I give you all the column heights, give me total rainfall collected.
 Code showcasing
 - stl algo over raw array
 - array/pointer manipulation
 - array initialization
 - array size detection
 - std::max_element modified
 - std::swap
 */