I think some interviewers may appreciate the brevity.
Here’s a reusable data structure for many pure algo challenges:
Construct a data store holding a bunch of “structs” in a linked list or a growing vector (O(1) insertion). On top of that store, we can build multiple “indices” pointing to the nodes.
Here are a few average-O(1) indices:
(Note: average O(1) lookup is the best we can hope for.)
- hashtable keyed by a struct field like a string
- array indexed by a struct field like small int id
- If there’s a non-unique int field, we can use the same array lookup to reach a “group”, and within the group use one (or more) hashtables keyed by another field
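A minimal Python sketch of this layout (the Trade struct and all field names are my own illustration): one append-only list owns the nodes, and each “index” is just another O(1) lookup structure pointing at the same node objects.

```python
class Trade:
    def __init__(self, trade_id, symbol, qty):
        self.trade_id = trade_id   # small unique int
        self.symbol = symbol       # non-unique string
        self.qty = qty

store = []       # the growing vector: O(1) amortized insertion
by_id = {}       # index 1: unique int key (a dict here; could be a plain array)
by_symbol = {}   # index 2: non-unique key -> "group" of nodes

def insert(trade):
    store.append(trade)
    by_id[trade.trade_id] = trade
    by_symbol.setdefault(trade.symbol, []).append(trade)

insert(Trade(1, "EURUSD", 100))
insert(Trade(2, "EURUSD", 250))
insert(Trade(3, "USDJPY", 50))

assert by_id[2].qty == 250             # average O(1) by unique key
assert len(by_symbol["EURUSD"]) == 2   # O(1) to reach the group
```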
I find this notation useful.
A path consists of steps.
A step through a 2D grid (or matrix) can only be Up, Down, Left or Right.
An alternative notation is
- North – Up
- South – Down
- West – Left
- East – Right
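The four steps can be captured as (rowDelta, colDelta) pairs — a quick sketch, assuming the usual convention that row 0 is the top row, so Up/North decreases the row id:

```python
# each step as (rowDelta, colDelta)
STEPS = {
    "Up/North":   (-1,  0),
    "Down/South": ( 1,  0),
    "Left/West":  ( 0, -1),
    "Right/East": ( 0,  1),
}

def take_step(cell, name):
    dr, dc = STEPS[name]
    return (cell[0] + dr, cell[1] + dc)

assert take_step((2, 3), "Right/East") == (2, 4)
assert take_step((2, 3), "Up/North") == (1, 3)
```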
- In a stressful white-board test, I can factor out dump() and leave it unimplemented until I have time. If I use “print …” instead, I must implement it then and there.
- dump() intent is clearer than “print myList”
- later I could add dumpOneList() and dumpAllCategories()
- who knows when I might need to add logic into dump()
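A sketch of the tactic: dump() exists from the first minute but stays a stub, and the (hypothetical) call sites never change when it later gains real logic.

```python
def dump(lst, msg=""):
    pass  # unimplemented on the whiteboard; intent is already clear at call sites

# ... later, when time permits, flesh it out without touching any call site:
def dump(lst, msg=""):
    line = f"{msg} -> {list(lst)}"   # easy to add more logic here later
    print(line)
    return line

dump([3, 1, 4], "after first pass")
```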
In some algo questions, I like to add a tiny “metadata” field to the VO.
eg: BFT (breadth-first traversal) showing each node’s level
Q: how likely is interviewer to accept it?
A: I feel west-coast interviews tend to entertain unconventional ideas, since their algo challenges are more free-flowing, exploratory and thought-provoking. I remember the Bbg FX interviewer (Karin?) said my “in-place” solution was OK if I appended to the existing vector then truncated the original portion.
A: I feel if it’s clever then they may appreciate it.
Space/time trade-off. The metadata field
- can be a pointer to parent or next node (like LinkedHashMap)
- can be an iterator into another data structure
- can be the element’s own address if not already available
- can be a sequence id as sorting key
- insertion id is popular
- can be a bunch of flags packed into 8-bit integer
- eg: STL uses sentinel nodes; I believe container.end() returns one.
- eg: c-string uses \0 as sentinel value.
- eg: the binary-search needle can be too high, so we have to test whether the return value is a real element or the end-of-container; it’s extremely useful to add a sentinel that guarantees a successful search
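A sketch of the BFT-showing-level idea from above: the level field is metadata the core tree never needs, added purely to make per-level output trivial.

```python
from collections import deque

class Node:
    def __init__(self, val, kids=()):
        self.val = val
        self.kids = list(kids)
        self.level = None   # the tiny metadata field

root = Node(1, [Node(2, [Node(4)]), Node(3)])

# standard BFT, tagging each node with its level as we go
q = deque([root])
root.level = 0
order = []
while q:
    cur = q.popleft()
    order.append((cur.val, cur.level))
    for k in cur.kids:
        k.level = cur.level + 1
        q.append(k)

assert order == [(1, 0), (2, 1), (3, 1), (4, 2)]
```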
When solving timed coding questions on “clean” arrays, it can be useful to append a sentinel node. It can simplify some algorithms which take action after each segment.
P 90 [[Pearls]] introduced a loop coding idiom, applicable whenever we loop over a container and have an early-exit “break”. Such a loop has exactly two conditional tests per iteration, so it can run faster if we combine the two into one conditional test. This is a small halo idiom for latency, but beyond latency there are more interesting benefits, such as reduced cognitive complexity.
For example, consider the business logic after reaching the “normal” end of the container without hitting the early exit. Rarely can we “forget” this business logic, simply exit the function and rely on the implicit return. Instead, this normal-completion scenario requires careful handling, and is a cognitive burden. To remind myself, I often place a comment after the end of the loop. (Python provides an else clause for a for-loop.)
In some cases, the business logic after the end of the loop sits two pages away from the early-exit business logic, even though the two really belong in one module. I hate this situation so much that I always set a flag and use a “break”, so that both routines are written after the end of the loop.
In all these scenarios, it’s often simpler to append an artificial sentinel element at the end of the container! The sentinel is guaranteed to hit the early exit, so the loop never exhausts all elements and completes normally. Therefore we can combine the two conditional tests into one :)
I usually replace “break” with “exit”, further reducing the control-flow complexity. Such micro simplifications can pay huge dividends in high-pressure timed tests.
Right before the exit/break we may still need to handle the normal-completion scenario, by checking whether we are at the sentinel. Interesting scenario: now we can combine the logic of normal completion vs early exit. In many cases they are (nearly) identical, so we can express the logic in very structured code.
Whether they are identical or not, handling the two scenarios (early vs normal completion) in a centralized module is usually simpler and more structured, reducing the cognitive complexity.
What if there are three tests (two breaks)? The mental burden is even worse; the sentinel can reduce it.
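A sketch of the sentinel idiom in Python (the [[Pearls]] original is in C): appending the needle guarantees the early-exit branch fires, so the loop body needs a single conditional test, and both completion scenarios are handled together right before the exit.

```python
def find(arr, needle):
    arr.append(needle)          # the sentinel: a guaranteed hit
    i = 0
    while arr[i] != needle:     # ONE conditional test, no separate bounds check
        i += 1
    arr.pop()                   # restore the container
    # centralized handling of both scenarios, right before the "exit":
    # i == len(arr) means we only hit the sentinel, i.e. normal completion
    return i if i < len(arr) else -1

assert find([5, 7, 9], 7) == 1    # early exit: a real hit
assert find([5, 7, 9], 8) == -1   # "normal completion" via the sentinel
```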
My own experiment https://github.com/tiger40490/repo1/blob/cpp1/cpp1/miscIVQ/spiral_FB.cpp shows:
- better to default-populate with zeros; afterwards, you can easily overwrite individual cells without bound checks
- it’s easy to insert a new row anywhere (a vector would be inefficient)
- to insert a new column, we need a simple loop
# zero-initialize a 5-row, 8-column matrix:
width, height = 8, 5
matrix = [[0] * width for _ in range(height)]
In any programming language, the underlying data structure is a uniform pile-of-horizontal-arrays, so it’s crucial (and tricky) to understand the indexing. It’s very similar to matrix indexing — Mat[0,1] refers to the first row, 2nd element.
Warning: a 2D array is hard to pass around in C++, based on personal experience 😦 You often need to specify the size in the receiving function’s declaration. Even when feasible, that’s unwanted legwork.
Warning — the concept of “column” is mathematical (from matrices) and non-existent in our implementation, therefore misleading! I avoid any mention of it in my source code: no object, no data structure for the “column”.
Warning — another confusion due to mathematics training: better to avoid Cartesian coordinates. Point(4,1) is on the 2nd row, 5th item, i.e. arr[1][4] — you need to swap the subscripts.
|                      |1st subscript                 |2nd subscript                 |
|----------------------|------------------------------|------------------------------|
|height #rowCnt        |5 #a count, not an index value|                              |
|width #rowSz          |                              |8 #a count, not an index value|
|example value         |1 picks 2nd row               |4 picks 5th item in the row   |
|variable name         |rowId, whichRow               |subId                         |
|Cartesian coordinate  |y=1 (Left index)              |x=4                           |
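A four-line illustration of the subscript swap, reusing the 5×8 dimensions from the snippet above:

```python
grid = [[0] * 8 for _ in range(5)]   # 5 rows (height), 8 items per row (width)
x, y = 4, 1                          # Cartesian Point(4, 1)
grid[y][x] = 99                      # subscripts swapped: row id first, then item
assert grid[1][4] == 99              # 2nd row, 5th item
```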
while 1:
    if node is None: return None
    if node.key == k: return node.val
    node = node.next
The above exit condition is far more visible than in
while node is not None:
    if node.key == k: return node.val
    node = node.next
return None
Inside a loop, many people prefer if/elif/else. To them, it looks neat and structured.
However, I prefer the messier if…continue; if…continue; if…continue. Justification?
I don’t have to look past a pageful of if/elif/elif/…/else to see what else happens to my current item — I can ignore the rest of the loop body.
Besides the current item, I can also safely let go of (rather than keep tracking) all the loop-local variable values: they will all be wiped out and reset to new values in the next iteration.
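A sketch of the if…continue style (pick() and its filtering rules are made up for illustration): each guard dismisses the current item immediately, so nothing below the guard applies to it.

```python
def pick(nums):
    picked = []
    for n in nums:
        if n < 0:
            continue            # done with this item; ignore the rest of the body
        if n % 2:
            continue            # ditto — no need to read past this line
        picked.append(n)        # only the "survivors" reach here
    return picked

assert pick([-2, 3, 4, 7, 10]) == [4, 10]
```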
- eg: reverse linked list in K-groups — (CS algo challenge) assume there’s no stub, solve the real problem, then deal with the stub
- eg: consumer thread’s dequeue() method. When the queue is empty, it “should” wait for a notification, according to Jun of Wells Fargo, but a simpler design returns a special value to indicate empty. The thread can then do other things, return to the thread pool, or just die. I think the wait model is not ideal: it can waste a thread resource, and we could end up with lots of waiting threads.
- Eg: https://bintanvictor.wordpress.com/2017/09/10/lru-cache-concise-c-implementation/ — the requirement is daunting, so it’s important and completely legitimate to simplify lookup() so it doesn’t insert any data. The API is simpler, not incomplete.
- Eg: find every combination adding up to a given target #Ashish — permutation within each combination is a separate issue, not the main issue
- recursive solution is often a quicker route to working code. Once we cross that milestone, we could optimize away the recursion.
- eg: extract isPrime() as an unimplemented function and simply assume it’s easy to implement when I get around to it.
- Eg: show free slots between meetings #bbg I solved a similar and more familiar problem.
- eg: violation check for Sudoku is a tedious but simple utility function. We could assume it’s available
- eg: violation check for n-queens is likewise slightly harder
See also EPI300
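Sketching the isPrime() example from above: countPrimes() (a name I made up) is written first against the stub, and trial division is filled in only when time permits.

```python
def isPrime(n):
    # assumed easy; trial division is good enough for interview-sized inputs
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def countPrimes(nums):
    # the "real problem", written before isPrime() was implemented
    return sum(1 for n in nums if isPrime(n))

assert countPrimes(range(10)) == 4   # 2, 3, 5, 7
```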
With STL, py, java, c# and every other language, we need to develop fast fingers (ECT) over these basic tasks. Muscle memory is best.
- custom comparator for a payload Trade class; specify it in vector sorting or a BST
- custom hashcode is only popular in Java algoIV
- — custom tree/link node structs
- declare node class
- initialize a bunch of nodes
- — #1 vector of int (vector of other data type is rarely quizzed):
- populate with hard-coded data
- populate with default values to a fixed size
- copy the container — py/java require explicit ctor
- join 3 vectors
- slice — harder for c++
- reverse-copy and reverse-in-place
- append, prepend, insert
- binary search, lower_bound
- dump, iterate, size
- — #2 string:
- convert between string and vector<char>
- initialize with content
- prepend, append, trim
- insert_by_pos — not easy
- read-write char by index
- successive find of target string from left (or from right)
- comp, sort among strings
- split at custom delimiter
- join 3 strings
- dump, iterate each char, size
- — #3 lookup (usually dict or std::map)
- populate with hard-coded data
- strictly insert
- strictly update
- lookup, possibly with failure
- dump, iterate, size
- — tree map: lower_bound
- — Queue: enqueue, dequeue, front, back
- — Stack: push, pop, top
- — linked list? no special tasks
- — misc essential iterator tasks
- prev(), next(),
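A few of the checklist items drilled in Python (Trade is a toy payload class; everything else is standard library):

```python
import bisect

class Trade:  # toy payload class for the custom-comparator drill
    def __init__(self, sym, px):
        self.sym, self.px = sym, px

trades = [Trade("B", 3), Trade("A", 9), Trade("C", 1)]
trades.sort(key=lambda t: t.px)              # custom comparator via key function
assert [t.sym for t in trades] == ["C", "B", "A"]

v = [0] * 5                                  # populate with default values to a fixed size
v2 = list(v)                                 # copy the container via explicit ctor
v2[1:3] = [7, 8]                             # slice assignment
assert v2[::-1] == [0, 0, 8, 7, 0]           # reverse-copy

a = [10, 20, 20, 30]
assert bisect.bisect_left(a, 20) == 1        # lower_bound

d = {"x": 1}
d["y"] = 2                                   # insert
assert d.get("zz") is None                   # lookup, possibly with failure
```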
Quicksort is not the most efficient sort in many situations, and it’s not the implementation behind sort() in many popular languages. Yet it’s the most valuable sort to learn, primarily because of interviews. I think quicksort is a good test of coding ability. The challenges:
#1 high-level goal of first pass — hard to remember!
#2 the gist of the first pass
#3 implementation in any language — many pitfalls
- pivot object — the rightmost object in a section can be designated the pivot object. Some people use the 1st object or a random object within the section. To keep things simple, we will assume the values are unique.
- final resting place — goal is to put the pivot object into its final resting place within the section, using scan-and-swap
- swaps — are the mechanism of movements
- scan — you must scan in some direction
- partition — after the first pass, the pivot object is in its final resting place and all smaller objects are on its left.
First we need to understand the high-level goal. Then the next challenge is the partition algorithm. The only way to remember the implementation details is writing the code.
On P 146 [[introduction to algorithms]], the procedure partition(A,p,r) does the following on the section A[p..r] inclusive:
- it progressively shifts the rightmost (pivot) object from r to the grave/anchor position within the section
- it keeps the rightmost object value as the benchmark value throughout the procedure.
- returns the new index of this object. The index is defined within the entire array A.
- it shuffles 0 or more elements within the section
- it doesn’t try to sort any subsection
Upon receiving an unsorted section, the procedure simply puts the rightmost thingy into the grave position within the section.
Corollary: the first scene in the quicksort movie actually completes the job of putting the rightmost object into its final resting place as an anchor within the entire array. After that, we focus on sorting the “left-section” and the “right-section” (possibly in separate threads) without worrying about the first anchor object. Within the left-section, the first scene again completes the job of putting its rightmost object into its grave — a final resting place within the entire array.
Coding note — the recursion is not using a single function name like A calling A itself. Instead, qsort() calls partition() then qsort(). Most of the work is in partition().
Coding note — partition() function isn’t recursive in itself.
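A sketch of the CLRS-style partition (Lomuto scheme, rightmost element as pivot), assuming unique values as stated above — writing it out is the only way I remember the details:

```python
def partition(A, p, r):
    pivot = A[r]                    # benchmark value, kept throughout
    i = p - 1                       # boundary of the "smaller" zone
    for j in range(p, r):           # the scan, left to right
        if A[j] < pivot:
            i += 1
            A[i], A[j] = A[j], A[i] # swaps are the mechanism of movement
    A[i + 1], A[r] = A[r], A[i + 1] # pivot lands in its final resting place
    return i + 1                    # index defined within the ENTIRE array A

def qsort(A, p, r):
    if p < r:
        q = partition(A, p, r)      # most of the work happens here
        qsort(A, p, q - 1)          # left-section
        qsort(A, q + 1, r)          # right-section
    return A

assert qsort([3, 8, 2, 5, 1, 4, 7, 6], 0, 7) == [1, 2, 3, 4, 5, 6, 7, 8]
```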
For me at least, the sequencing of the 3-piece for-loop is sometimes trickier than I expect. The rules are supposedly simple, but sometimes I don’t get them exactly right. Can you always intuitively answer these simple questions? (Answers are scattered.)
A87: ALWAYS absolutely nothing
A29: many statements — the entire loop body runs between them.
Q1: how many times (minimum, maximum) does the #1 piece execute?
Q2: how many times (minimum, maximum) does the #2 piece execute?
Q3: how many times (minimum, maximum) does the #3 piece execute?
Q: Does the count in A2 always exceed the count in A3, or the reverse, or is there no always-rule?
Q29: what might happen between #2 and #3 statements?
Q30: what might happen between #3 and #2? I feel nothing could happen.
Q87: what might happen between #1 and #2 statements?
Q: what’s the very last statement (one of the 3 pieces, or something in the loop body) executed before loop exit? Is it an “always” rule?
If there’s a q(continue), then things get less intuitive. http://stackoverflow.com/questions/16598222/why-is-continue-statement-ignoring-the-loop-counter-increment-in-while-loop explains the subtle difference between while-loop vs for-loop when you use “continue”.
In contrast, the while-loop is explicit, and so is do-while. In projects, the for-loop is concise and often more expressive, but in coding interviews conditions are seldom perfect, simple and straightforward, so the for-loop is error-prone. White-board coding interviews (perhaps bbg too) are all about nitty-gritty details. The condition hidden in the for-loop header is not explicit enough! I would rather use for(;;), check the condition inside, and break.
The least error-prone style is for(;;) with breaks. Some coding interviewers may not like it, but the trickier the algorithm, the more we appreciate the simplicity of this coding style.
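The same style in Python spelling — while True with explicit breaks (first_negative() is a made-up example): every exit condition is a visible break, instead of being buried in the loop header.

```python
def first_negative(nums):
    i = 0
    while True:
        if i == len(nums):
            break               # normal completion, stated explicitly
        if nums[i] < 0:
            break               # early exit, equally explicit
        i += 1
    return i if i < len(nums) else -1

assert first_negative([4, 9, -2, 5]) == 2
assert first_negative([4, 9]) == -1
```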