finding1st is easier than optimizing

  • Problem Type: iterate all valid choices without duplicates — sounds harder than other types, but usually the search space is quite constrained and tractable
    • eg: regex
    • eg: multiple-word search in matrix
  • Problem Type: find best route/path/combo, possibly pruning large subtrees in the search space — often the hardest type
  • Problem Type: find fist — usually easier

In each case, there are recurring patterns.

Advertisements

val@pattern_recognition imt reusable_technique

Reusable coding techniques include my classic generators, DP, augmented trees, array-element swap, bigO tricks like hash tables and medium-of-3 pivot partitioning.

  • One of the best examples of reusable technique — generate “paths/splits” without duplicates. Usually tough.

More useful than “reusable techniques” are pattern-recognition insight into the structure and constraints in a given coding problem. Without these insights, the problem is /impenetrable/intractable/.

Often we need a worst input to highlight the the hidden constraint. The worst input can sometimes quickly eliminate many wrong pathways through the forest and therefore help us see the right pathway.

However, in some context, a bigO trick could wipe out the competition, so much so that pattern recognition, however smart, doesn’t matter.

 

find any black-corner subMatrix #52%

https://www.geeksforgeeks.org/find-rectangle-binary-matrix-corners-1/

Q: given a black/white matrix, find any rectangle whose all four corners are black.
Q2: list all of them
Q3 (google): find the largest

— idea 2: record all black cell locations and look out for 3-corner groups

a Rec class with {northwest corner, northeast corner, southwest corner}

first pass, For each pair on the same row, create a pair object and save it in hashmap {northwest cell -> list of x-coordinates on my right} We will iterate the list later on.

2nd pass, scan each column. For each pair on the same col, say cell A and C, use A to look-up the list of northeast corners in the hashmap. Each northeast corner B would give us a 3-corner group. For every 3-corner pack, check if the forth corner exists.

— idea 1: row by row scan.
R rows and C columns. Assuming C < R i.e. slender matrix

For first row, record each ascending pair [A.x,B,x] (No need to save y coordinate) in a big hashset. If S black cells on this row, then O(SS).

In next row, For each new pair, probe the hashset. If hit, then we have a rectangle, otherwise add to the hashset. If T black cells on this row, then O(TT) probes and (if no lucky break) O(TT) inserts.

Note one single hashset is enough. Any pair matching an earlier pair identifies a rectangle. The matching would never occur on the same row 🙂 Optionally, We can associate a y-coordinate to each record, to enable easy calculation of area.

After all rows are processed, if no rectangle, we have O(SS+TT+…). Worst case the hashset can hold C(C-1)/2 pairs, so we are bound by O(CC). We also need to visit every cell, in O(CR)

If C > R, then we should process column-by-column, rather than row-by-row

Therefore, we are bound by O( min(CC,RR) + CR). Now min(CC,RR) < CR, so we are bound by O(CR) .. the theoretical limit.

— idea 4: for each diagonal pair found, probe the other corners
If there are H black cells, we have up to HH pairs to check 😦 In each check, the success rate is too low.

— Idea 5: Brute force — typewriter-scan. For each black cell, treat it as a top-left, and look rightward for a top-right. If found, then scan down for a pair of bottom-left/bottom-right. If no then complete the given row and discard the row.

For a space/time trade-off,

  • once I find a black cell on current row, I put it in a “horizontal” vector.

anagramIndexOf(): frqTable+slidingWindow

Q: Similar to java indexOf(string pattern, string haystack), determine the earliest index where a permutation of pattern starts.

====analysis

https://github.com/tiger40490/repo1/blob/py1/py/algo_str/anagramIndexOf.py is my tested solution featuring

  • O(H+P) where H and P are the lengths
  • elegant sliding window with frequency

Aha — worst input involves only 2 letters in haystack and pattern. I used to waste time on the “average” input.

On the spot I had no idea, so I told interviewer (pregnant) about brute force solution to generate all P! permutations and look for each one in the haystack

Insight into the structure of the problem — the pattern can be represented by a frequency table, therefore, this string search is easier than regex!

Then I came up with frequency table constructed for the pattern. Then I explained checking each position in haystack. Interviewer was fine with O(H*P) so I implemented it fully, with only 5 minutes left. Basically implementation was presumably slower than other candidates.

A few hours later, I realized there’s an obvious linear-time sliding window solution, but it would have taken more than the available time to implement. Interviewer didn’t hint at all there was a linear time solution.

— difficulty and intensity

I remember feeling extremely intense and extreme pressure after the interview, because of the initial panic. So on the spot I did my best. I improved over the brute force solution after calming down.

Many string search problems require DP or recursion-in-loop, and would be too challenging for me. This problem was not obvious and not easy for me, so I did my best.

I didn’t know how to deep clone a dict. The Indeed.com interviewers would probably consider that a serious weakness.

## Y revisit old algo Qn #Kyle

I find it hard to be interested in previously solved problems. I think many people (not only those in a hurry) simply skip such problems. However, we are not so “good at” previously solved problems actually.

  • Reason – there are often insights and repeating patterns in an old problem, that require slow and repeated digestion. Solving it quickly is usually not enough.
  • reason — our solutions may not be optimal.
  • reason — there are very smart solutions out there we don’t know.
  • reason — we forget.
  • reason — with a simple tweak the problem can become a new problem. How well and how fast we can solve the modified problem depends on our insight into the original problem.

Therefore, I feel folks tend to trivialize some well-known problems and fail to learn enough from them.

Well-known doesn’t mean well-understood.
Well-known doesn’t mean simple.

Revisiting old problems is boring to many programmers, but can often feels easier than tackling new problems. I don’t get tired easily when revisiting old problems. If you are like me, then it could be more beneficial than tackling new problems.

However, my visible progress is much slower in terms of #problems solved per week. I think Kyle also pointed out something related to this.

find offending section ] originally-sorted array

I rephrase Leetcode 581 Q: There was an ascending int array containing duplicates. Someone reshuffled some elements so no longer sorted. Write a function to determine the shortest subarray needs reshuffle. Before or after this subarray, no change needed.

O(N) time and O(1) space.

==== analysis
Not “easy” as labelled.

I want to find lowest integer that’s out of place. (Do the same on the other end and problem completed.)

first scan to find global min and max values, and ensure they are correctly placed.

Next scan from beginning on the rising slope. At first down-turn, treat the array as two partitions. In the right partition after the first peak, we determine the lowest value, say, -55. Then we will find the correct position for -55, within the Left partition! We will look for the last item equal-or-lower-than -55…. This is the key constraint of this entire problem.

Insight — We know -55 must move left and we don’t care about any other item in the right partition.

Insight — As a concrete illustration, if within left partition, -55 should stand after #7, then we know all Earlier items up to #7 are already correctly placed. Why? All Subsequent items in the left section are strictly higher than -55; and all other items in right section are all (equal or)higher than -55.

— my idea 1 in ON) space : radix sort a clone of the array, then compare to the input array.

subOptimal $$valuable solutions#CSY

My friend Shanyou consistently holds a position like “I don’t need to find those better solutions (forget the “optimal”). If my solution is brute force or non-optimal, at least it works, so it will be good enough for many interviews and good enough for my coding drill. Many candidates can’t find even a brute force solution in the given time, so if I strengthen my capabilities to this level it’s valuable.

Admirable !

That’s inner strength — 定力
That’s much-needed focus
That’s undervalued realism
That’s celebration of every progress
That’s a pat on our own back

The all-or-nothing, single-minded pursuit of optimal solution can often wipe out my motivation, joy, satisfaction, self-esteem.

It’s a common and fatal de-motivation to tell ourselves (like XR … ) “There’s no value in pursuing non-optimal solution.” Well there ARE a few:

  1. pride and achievement from completion of a tough challenge
  2. ECT, speed
  3. syntax idioms
  4. self-mastery, internal locus of control, similar to my yoga practice
  5. absorbency — demonstrate to myself; self confidence in absorbency
  6. joy, self-satisfaction — see exactly what I want{cod`drill beside immediate IV #Rahul
  7. … Other blogs provide other perspectives

!! Many published solutions are non-optimal in O()
!! Most solutions out there are non-optimal in terms of concise implementation, even if matching the best solutions in big-O.

Some interviewers only care about big-O and don’t care about too many loops; Other interviews want an elegant, simple solution; The most extreme interviews (Indeed, FB?) score candidates on syntax and code quality too. Which one do you target? Realistically, I would take Shanyou’s target.

Q: your real goals, motivations for coding drill? Absorbency? Burning pleasure? Satisfaction (as Rahul puts it)? Maintaining a hobby?
So defocus can help. The focus on “optimal” creates negativity.

— Many non-optimal solutions are worthwhile.

  • Some of them offer insight into the structure, constraints… but may not offer enough insight to solve the problem
  • Some of them are non-optimal but optimal solutions to modified problems.
  • Some of them are non-optimal but break down the original problem into smaller, familiar problems. This process builds the xRef I need for thick->thin. The breakdown technique can be reusable.
  • Some of them offer reusable techniques
  • Some of them are non-optimal but very short and clever
  • Some of them (my homemade) are unique and innovative … worthwhile achievements.
  • Some of them are good enough for “2nd-tier” coding interviews

 

concretize asterisk among brackets #Okao

https://leetcode.com/problems/valid-parenthesis-string/

Q (Leetcode 678): Given a string containing only three types of characters: ‘(‘, ‘)’ and ‘*’, write a function to check whether this string is valid.  An asterisk can concretize to empty string or a left or right bracket

====analysis

Kyle reviewed an O(N) solution, but advanced. Not really a “Medium” question if you want optimal. However in an interview perhaps a less optimized solution is good enough?

Worst data set will point out the constraint and structure, as explained in CIV: we wish all corner cases+worst data sets r given and clarify_requirement means

Need to use spreadsheet to work out a good sample data.

I feel confident to crack it if I get a good sample including some worst data.

— published O(N) solution

One scan. At each position ask the question “How many extra openers can there be up to this char” When the answer is negative, we give up and return false, but usually the answer is s a range like 0-2.

Use two global variables to record the lowest answer, and highest possible answer.

  • When we hit a ‘(‘, lo++ and hi++
  • when we hit a ‘)’, lo– and hi–
  • when we hit a ‘*’, then range expands,
    • lo—. Lowest possible extra opener count is one lower now, since the asterisk can be a closer
    • high++. Highest possible extra opener count is one higher now, since the asterisk can be an opener

I think at end of the scan, lo should be zero, i.e. zero-extra-opener is at least possible.

invalid/unbalanced brackets: kernel #62%

Q: given a string of N left+right brackets, you can find out how many invalid chars to remove to make it balanced. Say it’s R. There are up to N-choose-R ways to make it balanced. Print all unique balanced strings in compact form.

====analysis

subproblem: minimum how many invalid chars to remove

Useful preprocessing technique — on the left end, any closers must be removed. Same on the right end. No choice 🙂 We would end up with a valid char on each end. This is nice but optional in my Idea3 below.

— my idea 3 to count minimum cuts

Aha — There will surely be some “kernels” i.e. opener-then-closer in a row. First scan I will remove them, then remove more kernels. This is always optimal if we aim to minimize the cuts

  • [] ] [ [] ] [ [] ] becomes ]
  • []][[] becomes ][
  • [] [ [] [] [] [] becomes [
  • [[[][]][[]] becomes [
  • [[][]]][[]]][[] becomes ]][

What remain are the positions of bad chars. I need to remember these positions.

Case: closers only. Let’s look at one position like #55. We can cut #55 or another closer at an earlier position.

Case: openers only. Similar to the above.

Case: closers-openers. The original string is partitioned into exactly two sections, each similar to the above cases.

compute any sum@subMatrix by O(1) hashmap lookup

suppose origin cell is at northwest [1,1] not [0,0]

Pre-processing — We can incrementally compute each the rooted-submatrix SumToOrigin[r,c] in linear time. Save them in a s2o hashmap (shadow matrix is actually better)

Now Consider the submatrix identified by [2,2] and [3,4] enclosing 6 cells. This submatrix has sum =

s2o[3,4]+s2o[1,1]-s2o[1,4]-s2o[3,1]

12 cells + 1 cell  – 4 cells – 3 cells

So any submatrix sum can be computed in O(1). To go through all NNMM of them requires O(NNMM) where N and M are the board dimensions

value@xRef imt tricks@frontier_Qn: retention

My absorbency is limited. My absorbency is more limited than my time (that’s why I camp out in office when in the mood.) When I’m in the “absorbency” mood,….

… I need to apply my laser energy on the most valuable learning. Many coding drill guys would focus on unfamiliar problems, however, learning a few tricks on a few unfamiliar categories of problems  is unlikely an organic growth.. limited xRef.. not connecting the dots .. no thick->thin. Nowadays I think the more valuable learning is building xRef including

Even if my xRef leads to non-optimal solution, it is still extremely valuable. The non-optimal solutions is sometimes more versatile and can become more effective in a modified problem [3].

If you focus on optimal solution, you may experience wipe-out. Same wipe-out effect due to fixation on passing leetcode tests and fixation on “#new problems solved”

A shift from “frontier” exploration to xRef means slowing down and revisiting old problems.

In some coding problems, you really need some new technique [2], but across 90% of the coding problems, the number of constructs are below 10.

[2] any example?

[3] example:

  • find-all-paths
  • in-order-walk on a non-sorted tree to assign auto-increment nodeIDs, then use pre-order or post-order
  • reusable technique — RBTree as alternative to heap or sorted vector
  • reusable technique — yield-based cookbook
  • reusable technique — radix sort on ints, floats and fixed-length strings as a O(N) wall-blaster. Variable-length string can also achieve O(N) as described in my blogpost on “FastestSortFor”

[18] get ideas@G200 common Q #XR

Update — Now I realize my energy is limited so I need to allocate my spare time between

  • real coding
  • reading the key ideas

Also, I feel no positive feedback when reading the key ideas. I hit de-motivation.

Hi XR,

I now agree with your analysis — if I understand and memorize 1 key point about each problem, by reading leetcode, those points will help me significantly in a coding interview. This is one of the 4 benefits of coding practice #syntax,ECT,BP

  • If the problem is easy, then those key points will save me from running out of time.
  • If the problem is hard for everyone, then interviewer would have a lower expectation i.e. no need to complete it 100%. If I am on the right track (while other candidates are not) then i have an edge.
    1. case in point —- Last Friday I told you about shortest-path problems. I learned the technique from past interviews.
    2. case in point —- remember I told you about my own facebook question (3-way sorter/classifier #FB) Your swap technique is a valuable key point, even though it won’t help me finish the solution within 30 minutes
    3. case in point —- In 2018 I encountered comparable int array shuffling problems within 2 months at MS and LiquidNet. Key points are relevant.
    4. case in point —- In a 2018 interview, I learned to use N-by-N matrix to represent a N-node graph, then I used this key idea in another coding interview.

I wonder if I could get myself to follow a plan similar to yours:

  • every weekend (+ weekday evenings) spend 30 – 120 minutes.
  • First select some interesting” and popular question. If no idea, put it on back-burner
  • after a while, give up and read the solution to get the ideas
  • eg: edit distance
  • eg: punctuating continuous sentence
  • eg: regex
  • eg: maximum all-black sub-matrix

edit distance

The DP idea — compare matrix-path-counter, which is more visual and easier than This one.

Q72 on Leetcode: Given two words word1 and word2, find the minimum number of operations required to convert word1 to word2. You have the following 3 operations permitted on a word:

  1. Insert a character
  2. Delete a character
  3. Replace a character

Comment — Top 100, naturally occurring. I won’t bother to pass all Leetcode tests esp. the load tests. If I pass all non-load tests I would consider my solution decent.

https://github.com/tiger40490/repo1/tree/py1/py/str has my implementation based on a DP idea online, and a spreadsheet illustration. The idea is elegant once you wrap your mind around it.

===analysis===
Starting with the small string (length S), The challenge is to project as many of the S chars to the large string (length L). If we can project 5 chars at most, then … (wrong — the remaining S-5 chars need replacement, and the other L-S chars need insertion.)

–idea2: draw all the projection arrows from S to L. In a good-projection, every arrow on the right should be more slanted than every arrow on the left. We want the largest good-projection. In the opening example, the largest would have 5 arrows, …

—-
None of these ideas has proven effective.

punctuate ContinuousSentence

I believe https://leetcode.com/problems/word-break/description/ is identical.

Q (bbg): Given a dictionary of English words (containing no numbers no underscores) and a very long sentence, someone has removed all the spaces. Please write a program to restore it by adding a space after each word. If there’s no way to parse the sentence just return False, otherwise return True to indicate you found at least one way to parse the sentence.

The sentence can have repeated words.

I was allowed to use either white-board or computer.

Brute force? I struggled for a long time with some vague idea. Interviewer asked me to implement any solution I have, however inefficient , but I was unable to.

Key observation — This is a typical QQ type of algo question. If you remember one key point, then you have a advantage.

Key observation — syntax and ECT is not really hard. Only simple string operations. Data structure is simple. Challenge is an elegant (precious) pure algo.

Key observation — the recursive backtracking solution is working and clever-looking but actually not the very best. Still, recursive thinking is valuable and gave me a first solution. I’m not ashamed of my recursive solution.

Key observation — This problem is considered not too hard because …. the solution is short! But most of us can’t conceive such a solution! I won’t trust the difficulty level in leetcode.com

https://github.com/tiger40490/repo1/blob/py1/py/str/continuousSentenceBBG.py shows my

recursive backtracking  solution. Interviewers basically said it works.

There’s an inherent tree structure in this greedy backtracking algo. This tree has the power to meet the tougher requirement in Q2. As such, this algorithm is more powerful than then algo below.

— More efficient solution: non-recursive, no backtracking

Suppose the sentence has 99 chars ch[0], ch[1] … ch[98]. We maintain an “aboveWater[99]” bool array. A ch[33] is above water if there exists a 4-char bridge word from aboveWater character ch[29] ….

The bool array is assumed underwater initially. We scan it forward once to check and mark selected char as aboveWater.

Once the left section of the array is analyzed, we don’t need to revisit it by backtracking.

The simplest algo — forward iterate every position. At each position i,
look at the last 2 chars j-1~j
look at the last 3 chars j-2~j
look at the last 4 chars j-3~j
… If any is a bridge word, then mark j as aboveWater and move on.

— Q2(Leetcode): Given a non-empty string s and a dictionary wordDict containing a list of non-empty words, add spaces in s to construct a sentence where each word is a valid dictionary word. Return ALL such possible sentences.

%%A: My github solution should solve it even if not optimal. Need to test.

factorize a natural number #AQR

My friend Dilip received this question in a 2017 AQR on-site.

Q: given a natural number (like 8), write a function to output every factorization such as (2,4) (2,2,2). You can ignore or include the trivial factorization (1,8). You can use recursion if you want.
— (incomplete) Analysis

  1. I would first generate all the prime numbers up to sqrt(N)
  2. among them, i would find all the factors. Once I find a prime factor x, keep dividing by x so I know in total I have 3 x’s, 2 y’s and 1 z, or (x,x,x,y,y,z). I call them 6 non-distinct “prime factors”.

From there, I might be able to write a (recursive?) function to output the factorization formulas. The ideal algo automatically avoids duplicate factorizations but Here’s my non-ideal design: generate all 2-way “splits”, all 3-way splits… If I keep all my “splits” in a hashtable, I can detect duplicates. So just treat the 6 factors as 6 distinct factors. Now the problem is well-defined — next split@N boys .

— trie solution based on generate combinationSum compositions #backtrack up] trie+tree

Candidates are the non-distinct prime factors and their products, each a factor of the big number.

— recursive solution by CSY

  • no prime number needed! A major drawback — if the target number is odd, we would still keep testing 2, 4, 6, 8 as possible divisors!

https://github.com/tiger40490/repo1/blob/cpp1/cpp/algo_comboPermu/factorize_AQR.cpp is very short solution by CSY. Here’s my analysis —

  • I believe every time the factorize(60) function finds a small factor like 2, it pushes the factor onto a global stack, then run factorize() on the quotient i.e. 30 — wherein every factorization formula on 30 is “decorated” with the stack.

https://github.com/tiger40490/repo1/blob/py1/py/algo_combo_perm/factorize_AQR.py is my modified/improved python solution

  • I replaced the global vector with a local immutable list on each call stack. It helps me reason. This is also thread-friendly, if the target number is large.
  • It’s highly instructive to work out the expected output from the recursive loops, as in my comments.
  • Just like the continuousSentence problem, the recursive solution is clever-looking but not scalable.