scan an array{both ends,keep`min/max #O(N)

I feel a reusable technique is

  • scan the int array from the left and keep track of the minimum so far; optionally save it in an array
  • scan the int array from the right and keep track of the maximum so far; optionally save it in an array
  • save the element-wise difference of these two arrays

With these three shadow arrays, many problems can be solved visually and intuitively, in linear time.

eg: max proceeds selling 2 homes #Kyle

How about the classic max profit problem?

How about the water container problem?
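As an illustration (my own sketch, not from the original post), here's the shadow-array technique applied to the classic single-transaction max-profit question; minL/maxR are the two shadow arrays:

```python
def max_profit(prices):
    """Shadow-array sketch: min-so-far scanning from the left, max-so-far
    scanning from the right; the best single-transaction profit is the
    largest difference maxR[i] - minL[i]. All three passes are O(N)."""
    n = len(prices)
    minL = [0] * n            # minL[i] = min of prices[0..i]
    maxR = [0] * n            # maxR[i] = max of prices[i..n-1]
    lo = float('inf')
    for i, p in enumerate(prices):
        lo = min(lo, p)
        minL[i] = lo
    hi = float('-inf')
    for i in range(n - 1, -1, -1):
        hi = max(hi, prices[i])
        maxR[i] = hi
    return max(maxR[i] - minL[i] for i in range(n))
```

The same two shadow arrays also answer the water-container style questions with one more linear pass.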

sorting collection@chars|smallIntegers

Given a single (possibly long) string of characters, sorting it should never be O(N logN)!

Whenever an algo problem requires sorting a bunch of English letters, it’s always, always better to use counting sort: O(N) rather than O(N logN).

Similarly, in more than one problem we are given a bunch of integers bounded within a limited range. For example, array index values could be presented to us as a collection that we want to sort. Sorting such integers should always, always be counting sort.
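A minimal counting-sort sketch for lowercase English letters (my own illustration):

```python
def counting_sort_letters(s):
    """O(N) counting sort for a string of lowercase letters:
    count each of the 26 letters, then emit them in order."""
    counts = [0] * 26
    for ch in s:
        counts[ord(ch) - ord('a')] += 1
    return ''.join(chr(ord('a') + i) * c for i, c in enumerate(counts))
```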

find any black-corner subMatrix #52%

Q: given a black/white matrix, find any rectangle whose all four corners are black.
Q2: list all of them
Q3 (google): find the largest

— idea 2: record all black cell locations and look out for 3-corner groups

a Rec class with {northwest corner, northeast corner, southwest corner}

First pass: for each pair of black cells on the same row, create a pair object and save it in a hashmap {northwest cell -> list of x-coordinates to its right}. We will iterate these lists later.

Second pass: scan each column. For each pair on the same column, say cells A and C, use A to look up the list of northeast corners in the hashmap. Each northeast corner B gives us a 3-corner group. For every 3-corner group, check if the fourth corner exists.

— idea 1: row by row scan.
R rows and C columns. Assuming C < R i.e. slender matrix

For the first row, record each ascending pair [A.x, B.x] (no need to save the y coordinate) in a big hashset. If there are S black cells on this row, that's O(SS) pairs.

In the next row, for each new pair, probe the hashset. If hit, we have a rectangle; otherwise add the pair to the hashset. If there are T black cells on this row, that's O(TT) probes and (if no lucky break) O(TT) inserts.

Note one single hashset is enough. Any pair matching an earlier pair identifies a rectangle, and the match can never occur on the same row 🙂 Optionally, we can associate a y-coordinate with each record, to enable easy calculation of area.

After all rows are processed, if there is no rectangle, we have spent O(SS+TT+…). Worst case the hashset holds C(C-1)/2 pairs, so we are bound by O(CC). We also need to visit every cell, in O(CR).

If C > R, then we should process column-by-column, rather than row-by-row

Therefore, we are bound by O( min(CC,RR) + CR). Since min(CC,RR) <= CR, we are bound by O(CR) .. the theoretical limit.
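A sketch of idea 1 in Python (my own illustration; a truthy cell marks black):

```python
def has_black_rectangle(matrix):
    """Idea 1: row-by-row scan with one hashset of column pairs.
    A column pair seen on two different rows identifies a rectangle."""
    seen = set()
    for row in matrix:
        cols = [c for c, cell in enumerate(row) if cell]  # black cells
        for i in range(len(cols)):
            for j in range(i + 1, len(cols)):
                pair = (cols[i], cols[j])
                if pair in seen:
                    return True   # matches a pair from an earlier row
                seen.add(pair)
    return False
```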

— idea 4: for each diagonal pair found, probe the other corners
If there are H black cells, we have up to HH pairs to check 😦 and each check has a low success rate.

— Idea 5: brute force, typewriter-scan. For each black cell, treat it as a top-left corner and look rightward for a top-right. If found, scan down for a matching bottom-left/bottom-right pair. If not found, finish the given row and discard it.

For a space/time trade-off,

  • once I find a black cell on current row, I put it in a “horizontal” vector.

RBTree O(1)insert quite common ] coding IV

— for a single insert: the standard is precise in saying that a correctly-hinted insert is amortized O(1).
— for mass insert
Special case — if we know the input sequence is pre-sorted and will be “appended” after the existing tree nodes, then each insert always hits end(), and hinted insert achieves O(N) overall. The range-insert guarantee is even more encouraging — ”If N elements are inserted, Nlog(size+N) in general, but linear in size+N if the elements are already sorted” — so as long as the input is pre-sorted, we get O(N).

For 64-bit integer or float inputs, we can radix sort in O(N) then mass-insert in O(N).

— for construction from a pre-sorted sequence, the standard likewise confirms it’s O(N) if the input range is already sorted.

pre-allocated array as backing store for graph nodes #java No

I think any graph node can use the same technique, but here I present a simple yet interesting use case — a linked list with each node allocated from an array. Three home-made implementations:

  1. backing array of dummy link nodes, pre-allocated at compile time
  2. backing array of dummy link nodes, pre-allocated from free store aka DMA
  3. backing array is a byte array on heap or data section. Each link node is constructed via placement-new.

Here are a few advantages. I consider them minor because linked lists are seldom needed in low-latency work:

  1. d-cache efficiency
  2. eliminates runtime load on heap allocator, since memory is pre-allocated. See malloc=long considered costly

Advantage #3: for c++ algo questions, this set-up has an interesting advantage — the node address is now an index into the backing array. This index is a natural auto-increment ID, based on creation order.

Now, the biggest advantage of linked list over vector is mid-stream insert/delete. One of the biggest disadvantages is lack of random-access. If nothing happens mid-stream (as in coding questions), then we can achieve random-access-by-id using array as backing store.

If nothing happens mid-stream, then this linked list is physically similar to an array with extra capacity.

This technique won’t work in java because a java array of Node is an array-of-pointers.

QuickSelect: select nth key#O(1)space

Quickselect has average O(N) runtime but worst-case O(NN), if the partitioning goes bad repeatedly. A randomized pivot reduces the chance but I don’t think we can eliminate it.

Space complexity is O(1) despite the apparently recursive set-up: tail recursion optimization is usually available to reuse the same stack frame; otherwise, the recursion can be replaced by a loop.

std::nth_element() is linear on average but quadratic in the worst case, matching the QuickSelect algo.

Quickselect is by Tony Hoare, the inventor of quicksort.

iterate K pre-sorted uneven immutable lists #FB

The interviewer (Richard) was kind enough to offer me a tip early on, so we didn’t waste time (which could easily have meant running out of time).

Q: given K pre-sorted immutable lists, each up to N items, return an iterator that on demand yields each of the (up to K*N) items in sorted sequence.

Estimate time and space complexities.


— I first proposed pair-wise merge. Since there are logK passes, Time complexity == O(NK logK)

Space complexity is tricky. In the very first pass I have to create a single list of up to NK items. Then I can reuse this list in each merge, so space complexity == NK [1], but I said NK*logK. The interviewer wasn’t familiar with this solution and didn’t correct me.

[1] Suppose 8 lists to merge. I will merge A+B into the first quarter of the big array (size NK), then C+D into the 2nd quarter… In the next pass, I will merge AB + CD in-place using the first half of my big array.

The advantage of this solution — once I create a single merged array, each iteration is O(1). This can be valuable if run-time iteration speed is crucial but initial warm-up speed is unimportant.

bigO insight — merging N pre-sorted arrays is O(N logN), same as merge sort?

— Then the interviewer suggested iterating over the K lists, so I implemented that solution.

  • Space complexity = K
  • Time complexity:
    • next() O(logK) due to popping. I said Fib heap has O(1) insert
    • init() O(K)
    • hasNext() O(1)
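A Python sketch of this heap-based iterator (my own illustration; Python lists stand in for the immutable input lists):

```python
import heapq

class KListIterator:
    """Heap-based K-way iterator: O(K) space, O(K) init, O(logK) next()."""
    def __init__(self, lists):
        # init(): seed the heap with each non-empty list's head
        self.lists = lists
        self.heap = [(lst[0], i, 0) for i, lst in enumerate(lists) if lst]
        heapq.heapify(self.heap)   # O(K)

    def has_next(self):
        return bool(self.heap)     # O(1)

    def next(self):
        val, i, j = heapq.heappop(self.heap)   # O(logK) pop
        if j + 1 < len(self.lists[i]):
            heapq.heappush(self.heap, (self.lists[i][j + 1], i, j + 1))
        return val
```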

How about a RBTree instead of heap? Space complexity is unchanged.  Time complexity:

  • next() O(logK) for both remove and insert
  • init()  O(K logK), worse than priorityQ
  • hasNext() O(1)

— FB interviewer asked why I prefer to keep state in global var rather than a per-record field

%%A: Runtime heap allocation is slower if the record is bigger. In contrast, the global dictionary is tiny and likely lives in L1-cache

when2introduce new tokens into O()

The extra tokens like A, B, C in O(A+BB+logC) are not always independent dimensions.

  • Case: DFT/BFT are O(V+E), but some people may say V <= E, so why not remove V and say O(E)?

I think this simplification misses the point that E can greatly outnumber V. If another algo is O(V+logE), it would probably win.

  • Case: radix sort is O(WN), but some may say W <= N, so why not remove W and say O(NN)?

Well, W is typically log(unique count among the N), though W can be as large as half of N.

Some bigO students dislike having too many tokens and may simplify O(MN) to O(KK), but I think that’s imprecise. If M or N is very small, like logK, then O(MN) is much faster than O(KK).

bigO usable in Average | worst-case #CSY

bigO is about large input data SIZE or GROWTH, not about input data quality.

Therefore, both average-case and worst-case analyses can use bigO notation. Multiple sources specify “average case O(N logN)”.

big-Oh means upper-bound. big-Theta means tight-bound. They refer to the growth of running time as input data grows, regardless of data quality.

FizzBuzz in O(N)

Q (Leetcode ) Write a program that outputs the string representation of numbers from 1 to n. But for multiples of three it should output “Fizz” instead of the number and for the multiples of five output “Buzz”. For numbers which are multiples of both three and five output “FizzBuzz”.

What if the divisors in the requirement are not two but many more, each a prime number?


Among the standard solutions there’s a hashtable-based O(NK) algo — for every int 1 to N, check all K divisors. Here’s an O(N) solution:

  • first populate the return vector with the basic strings
  • Then make a strided iteration to append “Fizz” on every multiple of 3.
  • Then make a strided iteration to append “Buzz” on every multiple of 5.

The fact that prime numbers grow large very quickly helps our bigO:

N/3 + N/5 +… + N/31 + N/37… will be lower than 5N
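A sketch of the strided-iteration solution (my own illustration; the rules parameter generalizes to many prime divisors):

```python
def fizz_buzz(n, rules=((3, "Fizz"), (5, "Buzz"))):
    """Strided-iteration FizzBuzz: one pass per divisor, no per-number
    divisibility tests. Total cost N/3 + N/5 + ... stays O(N)."""
    out = [str(i) for i in range(1, n + 1)]       # basic strings first
    for divisor, word in rules:
        for i in range(divisor - 1, n, divisor):  # every multiple of divisor
            # replace a plain number; append to an earlier word
            out[i] = word if out[i].isdigit() else out[i] + word
    return out
```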

## halo knowledge to impress west coast interviews

My naive list of "halo" technical knowledge that might impress some west coast interviewers.

  • order stat tree
  • Morris tree walk – O(1) space
  • path compression in union-find
  • hash table worst case is no longer O(N), thanks to the java8 treeified buckets
  • shortest-path algorithms
  • Fibonacci and other advanced heaps — I discussed it at Facebook
  • tail recursion to reduce space complexity
  • worst case quick sort and randomized pivot selection
  • bucket sort
  • insertion sort as the fastest sort for small or nearly-sorted input
  • radix sort for floats and strings

BST: finding next greater node is O(1) #Kyle

In a std::map, in-order iterating every node from lowest to highest takes O(N) in total. Therefore, each step is O(1) amortized. In other words, finding the next higher node in such a BST is a constant-time (amortized) operation, and std::next() by a single step is likewise O(1) amortized. Even better, across all STL containers, iterator movement by N steps is O(N), except for random-access iterators, which are even faster, at O(1).

make unstable sorts stable

For any unstable sort, we can make it stable with this simple technique.

Aha — I have to assume two items of equal key have some visible difference, such as their address. If they are indistinguishable then sort stability is moot.

I will first build a hashtable {address -> origPos}.

After the unstable sort, all the items with the same key (say 77) will cluster together, but not in original order. Within each such segment of the output, I will run a mini-sort based on the origPos values.

This doesn’t affect time complexity. In terms of space complexity, it’s O(N).
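A Python sketch of the whole technique (my own illustration): quicksort below is deliberately unstable, and stabilize repairs the ordering with the recorded original positions, keyed by object identity in place of addresses.

```python
def quicksort(items, key):
    """Deliberately UNstable quicksort, for demonstration: it reverses
    each run of pivot-equal items."""
    if len(items) <= 1:
        return list(items)
    pivot = key(items[len(items) // 2])
    less = [x for x in items if key(x) < pivot]
    equal = [x for x in items if key(x) == pivot]
    more = [x for x in items if key(x) > pivot]
    return quicksort(less, key) + equal[::-1] + quicksort(more, key)

def stabilize(items, key, unstable_sort):
    """Record original positions ({address -> origPos}, here via id()),
    run any unstable sort, then mini-sort each equal-key run by origPos."""
    orig_pos = {id(item): pos for pos, item in enumerate(items)}
    out = unstable_sort(items, key)
    i = 0
    while i < len(out):
        j = i
        while j < len(out) and key(out[j]) == key(out[i]):
            j += 1
        out[i:j] = sorted(out[i:j], key=lambda x: orig_pos[id(x)])
        i = j
    return out
```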

CIV: I still prefer RBtree #Rahul

  • goal #1 in CIV — speedy completion of an optimal solution in terms of O().
  • j4 #1 — RBTree is more intuitive and natural to my problem solving than priorityQ or sorted vector. Given my #1 goal, I go for my favorite tools.
  • j4 — if the interviewer gives a new requirement, my RBtree may become useful (51% chance) or become unusable (49% chance)
  • Drawback #1 of RBtree — not supported in python
  • Drawback — array sorting can achieve O(N) using radix sort or counting sort, esp. in the contrived contexts of Leetcode problems.

Q: what if interviewer feels RBtree is overkill and over-complicated?
A: I would say overall bigO is not affected.
A: I would say RBTree supports future features such as delete and dynamic data set.  Realistic reasons are powerful arguments.

Q: what if interviewer gives a follow-up request to simplify the design?
A: if I already have an optimal solution, then yes I don’t mind replacing my RBTree

hash table time complexity: O(N)^O(logN)^O(1)

Q: When the interviewer asks about time complexity, shall I point out my hash table is O(logN), not O(1)?
A: if time permits, I would briefly mention O(N) and O(logN) worst case
A: if no time, then just say “typically O(1)”; I don’t think we would lose many points. A fairly authoritative answer exists. Basically, it is indeed O(N) worst case in general, but

  • can improve to O(logN) in many situations where java8 technique applies
  • in other cases it is O(N) in theory, but extremely rare in practice. The only known case is a deliberate DoS attack.

linkedHashSet as dynamic sorted collection: O(1)lookup/remove O(logN)insert

linkedHashSet as dynamic sorted collection with O(1) remove/lookup, but insert is O(logN)

Note the java class doesn’t support pinpoint insertion

— My idea:

One-time: build a red-black tree, then build a linkedHashSet from it. The tree is the real container; the linkedHashSet points into the tree nodes.

From now on, additional insert is O(logN) via the tree.

lookup/remove is still O(1) via the hashtable

nth largest element in unsorted array #QuickSelect

Q: Find the kth largest element in an unsorted array. Note that it is the kth largest element in the sorted order, not the kth distinct element.

I think this is mostly a bigO algoQQ problem.

std::nth_element is linear on average, implemented along the lines of the QuickSelect algo.

— idea 6: priority Q (Fib-heap) of size k
If a new item is higher than the current minimum, pop the min in O(logK) and insert the new item in O(1)
— idea 6b: priority Q over all N items
Linear time to build it
— idea 5: 95th percentile problem from NYSE
— idea 4: multiple scans
— idea 3: segments
— Sol2: O(N). Use the O(N) median-finding algo from the blog post on “given int array, find median ] O(N)”. Then discard one of the two segments and repeat.
Note: Each time the target element must exist in one of the 2 segments.

O(N) + O(N/2) + O(N/4) … -> O(N)

— Sol2a: use the O(N) qsort partition algo to anchor a (random) pivot element, creating two segments. Our target must exist in one of the two, so discard the other by adjusting the le/ri boundaries.

This idea is same as the most voted solution in leetcode discussion.
O(N) on average — we get O(N)+O(N/2) + O(N/4) + … < O(2N)

Note: average-case complexity is acceptable here, just as it is for hashtables!
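A sketch of Sol2a (my own illustration), using the random-pivot partition in a loop rather than recursion, so space stays O(1):

```python
import random

def quickselect(arr, k):
    """Return the k-th largest element (k=1 is the maximum).
    Average O(N): partition around a random pivot, then keep only
    the segment that must contain the target."""
    a = list(arr)
    target = len(a) - k   # index of the answer in ascending order
    lo, hi = 0, len(a) - 1
    while True:
        pivot_idx = random.randint(lo, hi)
        a[pivot_idx], a[hi] = a[hi], a[pivot_idx]   # move pivot aside
        pivot = a[hi]
        store = lo
        for i in range(lo, hi):                     # Lomuto partition
            if a[i] < pivot:
                a[store], a[i] = a[i], a[store]
                store += 1
        a[store], a[hi] = a[hi], a[store]           # anchor the pivot
        if store == target:
            return a[store]
        elif store < target:
            lo = store + 1   # discard the left segment
        else:
            hi = store - 1   # discard the right segment
```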

union-find O(1) data structure

[[CLRS]] has a chapter on disjoint sets. It shows a simple implementation using a linked list; another implementation is a tree (forest). I have also used a hashset.

The tree implementation is the one described in [[CLRS]]. This full-strength data structure is probably not needed in the union-find algo challenges.

The IV algo challenges are usually small scale. They can use a primitive, trivial version of disjoint-set without losing any meaningful performance. I think this attitude towards disjoint-set is similar to the prevailing attitude towards hash tables — just a means to achieve big-O demands. With or without this data structure the problem is solvable; the real challenge is the big-O requirement.

Both main operations can be optimized — 1) Union() can merge by rank or by size, and 2) Find() can use path compression. Both optimizations keep the tree height very low. High fan-out is desirable and achievable.

Using both path compression and union by rank or size ensures that the amortized time per operation is essentially O(1).
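A minimal disjoint-set sketch with both optimizations (my own illustration):

```python
class DisjointSet:
    """Forest with path compression (in find) and union by size."""
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # path compression: point every visited node directly at the root
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra            # union by size keeps trees shallow
        self.size[ra] += self.size[rb]
        return True
```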


#1(reusable)AuxDS for algo challenges

Here’s a Reusable data structure for many pure algo challenges:

Pre-process to construct a static data store holding a bunch of “structs”, in a linked list -OR- growing vector -OR- growing RBTree, all O(1) insertion :) (for the RBTree, via hinted append on pre-sorted input). Then we can build multiple “indices” pointing to the nodes.

Here are a few O(1) indices: (Note O(1) lookup is the best we can dream of)

  • hashtable {a struct field like a string -> iterator into the data store}
  • array indexed by a struct field like small int id, where payload is an iterator from the data store
  • If the structs have some non-unique int field like age, then we can use the same key lookup to reach a “group”, and within the group use one (or multiple) hashtable(s) keyed by another struct field

I think this is rather powerful and needed only in the most challenging problems like LRU cache.
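A tiny sketch of the idea (my own illustration; the Record fields and index names are made up for the example):

```python
class Record:
    def __init__(self, rid, name, age):
        self.rid, self.name, self.age = rid, name, age

class Store:
    """Append-only data store plus two O(1) indices into it."""
    def __init__(self, capacity):
        self.rows = []                    # the data store itself
        self.by_name = {}                 # hashtable index {name -> position}
        self.by_id = [None] * capacity    # array index keyed by small int id

    def add(self, rec):
        pos = len(self.rows)
        self.rows.append(rec)             # O(1) insertion
        self.by_name[rec.name] = pos      # O(1) index maintenance
        self.by_id[rec.rid] = pos

    def lookup_name(self, name):          # O(1) lookup via hashtable
        return self.rows[self.by_name[name]]

    def lookup_id(self, rid):             # O(1) lookup via array
        return self.rows[self.by_id[rid]]
```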

word ladder: build graph +! test`every pair

Leetcode Q 127: word ladder — Given two words (beginWord and endWord), and a word list, find the length of shortest transformation sequence from beginWord to endWord, such that:

* Only one letter can be changed at a time.
* Each transformed word must exist in the word list. Note that beginWord is not a transformed word.
* Return 0 if there is no such transformation sequence.
* All words have the same length.
* All words contain only lowercase alphabetic characters.
* You may assume no duplicates in the word list.

Example 1:
beginWord = “hit”,
endWord = “cog”,
wordList = [“hot”,”dot”,”dog”,”lot”,”log”,”cog”]
Output: 5
Explanation: As one shortest transformation is “hit” -> “hot” -> “dot” -> “dog” -> “cog”,
return its length 5.

Example 2:
beginWord = “hit”
endWord = “cog”
wordList = [“hot”,”dot”,”dog”,”lot”,”log”]
Output: 0

==== analysis:
First, an O(NN) scan to build the bidirectional edges of the graph: given an existing graph of N words (N can be 1), each new word is compared against every existing word to construct the edge list.

That’s N(N+1)/2 comparisons. No need to be extra clever, as this simple O(NN) algo is optimal for small N. No need to worry about visualizing the graph either, because the edge list is a proven representation.

Aha — depending on the relative sizes of N and S (the uniform length of every word), the optimal algo differs!

— Idea 2

Now I realize there’s a more efficient algo to build the graph, based on the neglected fact that all strings have equal length.

  • for a new word (always same length S), like abc, create S(=3) “patterns” — *bc, a*c, ab*.
  • each pattern will try to join an existing club or create a new club. Clubs are maintained in a hashtable of {pattern -> list of original words}. In java or python, each word is basically a pointer to an immutable global object.
  • If joining an existing club, then all existing club members are linked to the new word, so new word will now hold a reference to this club as an “edge list”
  • I think this algo is O(N*S). If S is small, then this algo is more efficient than O(NN)

At the end of this phase, if beginWord and endWord belong to disjoint sets, return 0. However, I see no simple way to check that here, so instead I will run a 2nd scan: an O(N+E) BFS. There are many cycles, so we need a “Seen” hashset, or an array of size N.

Insight — Array is more elegant than hash table in this case.

To compare two words, char-wise subtraction should give all zeros except at one position. This routine can be extracted as a “simple routine to be implemented later”, so no need to worry about it in a whiteboard session.
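Putting idea 2 and the BFS together (my own sketch, using a hashset-based “Seen”):

```python
from collections import defaultdict, deque

def ladder_length(begin, end, word_list):
    """Idea 2: group words into pattern 'clubs' in O(N*S), then BFS."""
    words = set(word_list)
    if end not in words:
        return 0
    clubs = defaultdict(list)             # {pattern -> club members}
    for w in words | {begin}:
        for i in range(len(w)):
            clubs[w[:i] + '*' + w[i+1:]].append(w)
    seen = {begin}                        # the "Seen" hashset
    queue = deque([(begin, 1)])
    while queue:
        word, steps = queue.popleft()
        if word == end:
            return steps
        for i in range(len(word)):        # neighbors = clubmates
            for nxt in clubs[word[:i] + '*' + word[i+1:]]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, steps + 1))
    return 0
```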

average_case ^ amortized

>> bigO (and big-Theta) is about input SIZE, therefore, usable on average or worst case data quality

There’s no special notation like bigX specifically for average case.

>> Worst vs average case … is about data quality.

With hash tables, we (novices) are permitted to blame the “quality of the hash function”, but this is technically imprecise. My book [[algorithms in a nutshell]] P93 states that IFF given a uniform hash function, then even in the worst case bucket sort runs in linear time O(N).

>> Amortized … is about one expensive step in a large number (≅ input size) steps like inserts. Amortized analysis makes no assumption about quality of data, and therefore is a worst-case analysis.

Note Sorting has no amortized analysis AFAIK.

eg: quicksort and quickSelect can degrade to O(NN); amortization doesn’t apply.

— worst-case amortized analysis — is a standard analysis
eg: vector expansion. The worst case doesn’t bother us.

— statistical-average amortized analysis — is a standard analysis
eg: Hashtable insert is O(1) based on both statistical average AND amortized analysis :

  • Some inserts will hit long chain, while most insertions hit short chain, so O(1) on average, statistically
  • Amortization is relevant IFF an insert triggers a hash table expansion.

What if the input is so bad that all hash codes are identical? Then amortization won’t help. Still, people don’t worry about worst-case data for hash tables, because a decent, usable hash function is always, always possible.

eg: iterating a RBTree. Data quality doesn’t bother us, as RBTree has guaranteed time complexity :). Traversing the entire tree is O(N), so the statistical average per-iteration cost is O(1), while the worst single iteration is O(log N). One source says “amortized analysis is a worst case analysis”, where “worst” refers to data quality.

average case per-operation — can be quite pessimistic
eg: vector insert

eg: RBTree iteration (see above)

worst case per-operation — not common; relevant to realtime systems only.

I feel this analysis fixates on the extreme far-outliers, so the per-operation cost is far worse than typical.

insert to sorted collection of N items: I can beat O(log N)

features/advantages of sorted collection

  • O(N) ascending iteration
  • O(log N) range query
  • O(log N) closest neighbor query above/below
  • O(log N) boolean contains() query
  • O(1) min() max() queries
  • deleteByKey is typically O(log N) but can be O(1) if we construct a hashtable of {key -> iterator} once the collection is built. This hashtable can also accept new items.

Q: insert is typically O(log N), but can we improve on it?

— A: if there’s only one insert, we can hold it in a hidden field of the collection, to be checked at the end of each operation above.

— A: if there are unlimited inserts but all keys are 64-bit integers, then we can treat each incoming integer as a short radix array (8 elements, one per byte). Take each element of the array as a lookup key into an array of subtrees.

Basically the entire sorted collection is an 8-level tree with fan_out=256.

You may say that 8 lookups are required, i.e. O(k) with k=8, but here the size of the radix array is a constant equal to eight. Note the collection size is unlimited — it can even exceed 2^64 if duplicates are permitted.
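A sketch of the 8-level, fan-out-256 idea (my own illustration; dicts stand in for the fixed-size child arrays, and leaves hold duplicate counts):

```python
class RadixTree64:
    """Each level consumes one byte of a 64-bit unsigned key, so
    insert() and contains() are 8 lookups each, i.e. O(1)."""
    def __init__(self):
        self.root = {}

    def insert(self, key):
        node = self.root
        for shift in range(56, 0, -8):          # 7 internal levels
            node = node.setdefault((key >> shift) & 0xFF, {})
        node[key & 0xFF] = node.get(key & 0xFF, 0) + 1   # leaf: dup count

    def contains(self, key):
        node = self.root
        for shift in range(56, 0, -8):
            node = node.get((key >> shift) & 0xFF)
            if node is None:
                return False
        return (key & 0xFF) in node

    def iter_sorted(self):
        """O(N) ascending iteration: visit each level's bytes in order."""
        def walk(node, prefix, depth):
            for byte in sorted(node):           # fan-out <= 256 per level
                if depth == 7:                  # leaf level
                    yield from ((prefix << 8) | byte for _ in range(node[byte]))
                else:
                    yield from walk(node[byte], (prefix << 8) | byte, depth + 1)
        yield from walk(self.root, 0, 0)
```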

priorityQ: 2 advantages over RBtree#O(1)add #Part2

RBTree O(1) insert is quite common in coding questions.

[1] Contrary to popular belief, RBTree mass insert (including mass-construction) is O(logN) rather than O(1) per node in the general case. However, see the hinted-insert note above.

See lecture notes and SOF post on

Alien dictionary

Suppose there are C characters in total, and N words.


Mostly implementation challenge.

insight — the published solution has mediocre performance as it scans each word exactly TWICE, but luckily “twice” doesn’t affect bigO — O(total char count across all words)

— idea 1: maintain a linked list of “clusters”. Each cluster is {pos, startWordID, optional lastWordID} and holds words sharing the same prefix up to pos.

Copy the first letter of the N words into an N-array, and verify this array is sorted. Now separate the words into up to 26 clusters. Suppose we have a cluster of 55 words; this cluster is the payload of a link node. When we look at the 2nd char within this cluster, we see up to 26 sub-clusters, so we replace the big cluster with these sub-clusters.

Invariant — the original order among the N words is never changed.

Even if this idea is overkill, it can be useful in other problems.

The verify(arrayOf1stChar) check is a util function.

— Idea 4: convert each word to an English word, in O(C).

Then sort the collection. What’s the O()? O(N logN * C/N) = O(C logN)

— idea 5: compute a score for each word and check the words are sorted in O(N)

O(1)getRandom+add+del on Bag #Rahul

Q: create an unordered multiset with O(1) add(Item), del(Item), and a getRandom() that returns each item with the probability given by the multiset’s PMF.

Rahul posed this question to our Princeton PhD candidate, who needed some help on the Bag version.

====my solution:
On the spot, I designed a vector<Item> + hashmap<Item, hashset<Pos>>. Each hashset records one item’s positions within the vector.

Aha — invariant — my vector is designed to have no empty slots, even after many del() calls. Therefore vec[ random() * vec.size() ] satisfies the getRandom() PMF.

add() simply appends (in O(1)) to the vector and to the item’s hashset.

— del() algo is tricky, as Rahul and I agreed. An illustration: say item ‘A’ appears at positions 3,7,8,11,16 and ‘B’ appears at positions 2,5,31 (31 being the last slot of the vector). del(A) needs to remove one of the A’s and move the B@31 into that A’s position.

  1. Suppose the PMF engine picks vec[11], which holds an ‘A’.
  2. Unconditionally, in O(1), look at the item in the last position of the vector. We find a ‘B’, which is different from our ‘A’.
  3. Now physically remove the A from position 11:
  4. O(1): overwrite position 11 in the vector with ‘B’
  5. O(1): remove 11 from A’s hashset, so A’s hashset size decrements
  6. O(1): remove 31 from B’s hashset and add 11, so B’s hashset size is unchanged; finally shrink the vector by one slot
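A Python sketch of the whole design (my own illustration):

```python
import random

class Bag:
    """Vector + {item -> set of positions}: O(1) add/del/getRandom."""
    def __init__(self):
        self.vec = []
        self.pos = {}     # item -> set of its positions in vec

    def add(self, item):
        self.pos.setdefault(item, set()).add(len(self.vec))
        self.vec.append(item)

    def get_random(self):
        # vec has no holes, so uniform position = multiset PMF
        return self.vec[random.randrange(len(self.vec))]

    def delete(self, item):
        if item not in self.pos:
            return False
        hole = self.pos[item].pop()       # position to vacate
        if not self.pos[item]:
            del self.pos[item]
        last = len(self.vec) - 1
        tail = self.vec[last]
        if hole != last:
            # move the tail item into the hole, updating its position set
            self.vec[hole] = tail
            self.pos[tail].discard(last)
            self.pos[tail].add(hole)
        self.vec.pop()                    # shrink the vector by one slot
        return True
```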

[19] sorting J integers, each in [1,N]

Q: What’s the time complexity of sorting J integer scores which are all [1, N], possibly non-unique?

This is classic counting sort. My book [[algo in a nutshell]] says O(J), which is incorrect when N is large.

My analysis below uses a total of three dimensions N, J and K, where N and J can be vastly different, and K <= min(N, J); K could be much smaller still.

The textbook counting sort is listed at the bottom. If N is smaller than J, then O(N+J) is dominated by O(J). Now suppose N is huge (like the revenue figures of a company).

Suppose there are K distinct scores among the J scores. I want a constant-time translation from each distinct score to a distinct counter. I wish there were a perfect hash from the K scores to K buckets, but that’s impossible since the K distinct values are not known in advance. Even with imperfect hashing, I can get O(1) translation from each score value to a distinct counter. I will iterate over the J scores. For each,

  • look up (on hash table) the corresponding counter.
  • If the score is new i.e. missing from the hash table,
    • then create a counter
    • add the “pair” {score -> counter} into the hash table.
    • Also insert the score into a min-heap
  • increment the counter

O(K logK): now pop the min-heap K times, each pop yielding the next distinct score in ascending order. Look up the score in the hash table to get its counter. If the counter for score 55 is 3, then output 55 three times. The output is a sorted sequence of the original J scores.

— comparing with alternatives: mine is O(J + K logK). If K*logK exceeds N, then I would fall back to standard counting sort.

A comparison-based sort would be O(J logJ), inferior to mine if K is much smaller than J.

The textbook counting sort is O(J + N), using N independent counters.
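A sketch of my O(J + K logK) variant (my own illustration; Python's dict plays the imperfect-hash role):

```python
import heapq

def sort_scores(scores):
    """Counting-sort variant: J = len(scores), K = distinct values.
    Hash each score to a counter; distinct scores also go in a min-heap."""
    counts = {}
    heap = []
    for s in scores:                  # O(J) counting pass
        if s not in counts:
            counts[s] = 0
            heapq.heappush(heap, s)   # each distinct score enters once
        counts[s] += 1
    out = []
    while heap:                       # K pops, O(logK) each
        s = heapq.heappop(heap)
        out.extend([s] * counts[s])   # emit the score count-many times
    return out
```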

topK,K-th ranked within a bag: quickselect

For any top-K problem, and for any problem about the K-th ranked item, a fairly efficient solution is the quickselect algo, in average O(N). O(N) is the theoretical lower bound, since you must examine each of the N elements.

For top-K, we can use quickselect to partition the array so the upper section becomes the top-K. This is O(N). In addition, I can think of two O(N logK) solutions.

Keep the container size at up to K. For each new item encountered, compare it to the minimum in the container. IFF bigger, remove the minimum then insert the newcomer.


With a binary heap, insert is O(1) on average but remove() is O(logK).

“If you are inserting a single node for every one you remove, then you lose the advantage of the asymptotic O(1) average insert that heaps provide as the delete would dominate, and you might as well use a BST.” according to


With a BST instead, each remove() and insert() is O(logK).
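A sketch of the size-K min-heap variant (my own illustration):

```python
import heapq

def top_k(nums, k):
    """O(N logK): keep a size-K min-heap; whenever a bigger item
    arrives, replace the heap's minimum."""
    heap = nums[:k]
    heapq.heapify(heap)                 # O(K) build
    for x in nums[k:]:
        if x > heap[0]:
            heapq.heapreplace(heap, x)  # pop-min + insert, O(logK)
    return sorted(heap, reverse=True)
```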

average O(): hashtable more IMperfect than qsort

In these celebrated algorithms, we basically accept the average complexity as if they were very likely in practice. Naive…

In comp science problems, hash tables’ usage and importance are about 10 times higher than qsort’s.

  • I would say qsort is faster than many O(N logN) sorts. Qsort can use a random pivot; it degrades only if extremely “lucky;)”, like getting a “6” on all ten dice.
  • In contrast, hash table performance depends mostly on programmer skill in designing the hash function, less on luck.

Performance compared to the alternatives — qsort is quite competitive in practice, but hash tables’ relative performance is often underwhelming compared to red-black trees or AVL trees. Recall RTS.

fastestSortFor: 1-str; arr@Str; 64bit floats #non-comparison

I think the big-O coding question may require this knowledge, because real-world sorting problems can often use key-based sorting, rather than comparison-based sorting.

Therefore, if your overall solution requires sorting, don’t assume it becomes O(J*K + M^2 + P*N logN) based on an O(N logN) sorting assumption. Here are some achievable linear-time sorts:

| sort | O(?) | data type | notes |
| --- | --- | --- | --- |
| counting | O(N) | chars | |
| radix | O(N) | 64-bit ints | O(NW) -> O(N) |
| radix | O(N) | floats [3][4] | IEEE format |
| radix | O(N) | variable-size strings [1/2] | #1 fastest |
| burst | O(N) | variable-size strings | #2 fastest; key-based, trie-based |
| radix | O(N) | date/time | convert to fixed-length string or fixed-size integers |
| bucket? | O(N) | random ints in python/perl | unbounded; no limit on #digits |
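As one concrete example, an LSD radix sort sketch for unsigned 64-bit ints (my own illustration; 8 byte-wide passes):

```python
def radix_sort_u64(nums):
    """LSD radix sort for unsigned 64-bit ints: 8 stable counting-sort
    passes of one byte each, so O(8N) = O(N)."""
    for shift in range(0, 64, 8):
        buckets = [[] for _ in range(256)]
        for x in nums:
            buckets[(x >> shift) & 0xFF].append(x)   # stable bucketing
        nums = [x for b in buckets for x in b]       # concatenate in order
    return nums
```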

O(1)space,O(1)search,O(N)sort: tricks

  • every problem asking “top N” — quick-select gives average linear time; the real worst-case bound is O(NN)
  • Every time I see O(1) space required on an array problem, I think of …. swapping.
  • Every time I see O(1) space required on small int arrays, I think of … save (2D) array indices in the array as payload
  • Every time I see O(1) space required on a list problem, I ask is it a ….. linked list.
  • Every time I see O(N) time required on an array problem, I think of … radix sort, applicable to 64-bit integers, 64-bit floats and strings.
  • Every time I see O(1) search, I think of … hash table and radix …?
  • Every time I see O(1) operation on a sorted data structure, I think of …. append on BST/priorityQ, deque

merge K presorted lists #O(what)

Q: Merge k sorted linked lists and return it as one sorted list. Analyze and describe its complexity.

Note K could be much larger than N (the max list length).

I feel this is mostly an optimization challenge. I can think of a few solutions

–Sol1: merge the 2nd list into the first, then the 3rd list into the first… This has higher runtime cost than the brackets solution.

The reason: each 2-merge-to-1 must visit every node in both lists, so the first list’s nodes get visited K times!

–Sol1b: brackets.

There are only logK levels in the bracket, so each list gets visited only that many times.

–Sol3: in-place (inefficient)

We maintain K node-pointers for the K lists (K “teams”), plus a pointer to the last-added node in the merged list.

The first node of each of the K lists is put into a min-heap. The winning (smallest) team supplies the next output node, then offers its next node into the heap, and the process repeats.

What if N=1 and K is 1 billion?
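A sketch of Sol3 (my own illustration; Python lists stand in for the linked lists):

```python
import heapq

def merge_k(lists):
    """K-way merge with a min-heap of 'team leaders': O(total * logK)."""
    heap = [(lst[0], i, 0) for i, lst in enumerate(lists) if lst]
    heapq.heapify(heap)
    out = []
    while heap:
        val, i, j = heapq.heappop(heap)   # winning team offers its node
        out.append(val)
        if j + 1 < len(lists[i]):         # team sends its next node
            heapq.heappush(heap, (lists[i][j + 1], i, j + 1))
    return out
```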

bucketSort: O(N+k) could be better or worse than O(N)

Based on the 53-vote top answer in

It’s obvious (by definition) that O(2N) is equivalent to O(N).

O(N+k) is worse than O(2N) when k is much larger, like 4000 times N. For a concrete illustration, say we have N = 5 million strings to sort, using k = 20 billion buckets.

  • It takes constant time to bucket each string, so the first step takes O(N) i.e. grows proportional to N.
  • 2nd step is now dominated by k, since all 20 billion buckets have to be examined. Time complexity here is O(N+k). This is explained in the stackoverflow answer.
  • Therefore, total complexity is O(N+k) i.e. cN + bk with two constant factors c and b. Even if b is small (the cost of looking at an empty bucket), as k grows, at some point the k term would dominate the N term.

In practice, k is often chosen to be comparable to N, so I don’t think this is a practical issue.
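For concreteness, here is a bucket-sort sketch (mine, assuming inputs uniformly distributed in [0,1)) where the two steps are visible: distributing N items is O(N), and scanning all k buckets — empty or not — is O(k).

```python
def bucket_sort(floats, k=None):
    """Bucket sort for values in [0,1). Step 1 is O(N), step 2 is O(k),
    so total is O(N + k) plus the tiny per-bucket sorts."""
    n = len(floats)
    if k is None:
        k = n                      # common choice: k comparable to N
    buckets = [[] for _ in range(k)]
    for x in floats:
        buckets[int(x * k)].append(x)   # O(1) "hashing" per item
    out = []
    for b in buckets:                   # every bucket examined, even empty ones
        out.extend(sorted(b))           # near O(1) each on uniform data
    return out
```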

min-stack #bbg

My friend Abhinav (not his real name, to protect his privacy) got this question at Bloomberg internship interview. I added some details to make it clearer:

Q: Design a stack that supports push, pop, top, and retrieving the minimum element all with O(1) time complexity in the worst case.

There exists a function compare(Item A, Item B) that returns 1 if A is greater, 0 if equal, and -1 if A is smaller.

  • getMin() — Retrieve the minimum element in the stack.
  • push(x) — Push element x onto stack.
  • pop() — Removes the element on top of the stack.
  • top() — Get the top element.

==== analysis =====

The most efficient getMin() data structure is the binary heap, but insert/delete runs in O(logN). Therefore I felt the requirements here are computationally impossible. But in this context, we only need to support deleting the last added item 🙂

Key insight — popping is a constrained form of deletion. It’s hard to hit O(1) while supporting unconstrained deletions, BUT with a constraint on deletions, all operations can be O(1).

I need a regular stack + a helper data structure. A linked list or vector can support the stack operations

— The helper — a naturally sorted stack (or vector or deque) to hold record-breakers and record-matchers.

Iff the new item is a new minimum (record breaker) or matches the existing minimum (record matcher), we push it onto the sorted stack.

After every pop(), we check the popped item. If equal to top of sorted stack, then pop the sorted stack.

At any time, top of the sorted stack is the current minimum.

Vector and deque are actually fine. Interviewer may feel they are inferior to a stack, but with a modified requirement, they may become very useful.
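The two-stack design above can be sketched as follows. This is my own minimal version using Python’s native comparisons in place of the compare() function from the problem statement; all four operations are O(1) worst case.

```python
class MinStack:
    """Stack with O(1) worst-case push/pop/top/getMin, using a helper
    stack that holds record-breakers and record-matchers (current minima)."""
    def __init__(self):
        self._main = []
        self._mins = []   # naturally sorted: non-increasing from bottom to top

    def push(self, x):
        self._main.append(x)
        if not self._mins or x <= self._mins[-1]:   # record breaker or matcher
            self._mins.append(x)

    def pop(self):
        x = self._main.pop()
        if self._mins and x == self._mins[-1]:      # retire the record
            self._mins.pop()
        return x

    def top(self):
        return self._main[-1]

    def get_min(self):
        return self._mins[-1]
```

Note the record-matcher rule (`<=`, not `<`) is what keeps pop() correct when duplicates of the minimum are pushed.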

— Here’s a design to beat binary_heap, based on finite-sized (32 or 64-bit int) keys

Assuming the comparison is based on a 32-bit integer key (strings or floats can also use radix sort), I will use a radix array structure: perhaps 4 arrays of 256 elements each, or a 4×256 matrix. Ultimately the payload stored in the data structure is pointers to stack nodes. This saves memory since a stack node may be 99MB.

Every time we push a node, we derive the new key value in O(1), use the key value to locate its ‘home’ in the radix structure and store the new node’s address in a linked list therein. After the push(), we can compare and update (in constant time) a global variable pointing to the minimum node.

Each stack node also has a back-pointer to the iterator into the list, so before pop() we can use the iterator to locate the object in the radix structure and delete it from the host linked list. We will also update the global variable.

factorial is worst O(n!) > O(2^n)

Factorial complexity is (much) worse than exponential complexity. Look at n! and 53^n, without loss of generality. Model them as two snowballs rolling over n=1,2,3,…1000… Initially the exponential (E) snowball is bigger, but after we pass n=53, the factorial (F) snowball grows by a factor of 54 (then 55, 56, …) while snowball E always grows by a factor of 53. From then on, F grows faster and faster and will outgrow E sooner or later.

[[Discrete Mathematics]] puts Factorial as the worst algorithm, but both Factorial and Exponential are unacceptable, while Polynomial is acceptable.

[[algo in a nutshell]] bucket sort, insertion sort..

== bucket sort
This book asserts that bucket sort is #1 top dog when data is uniformly distributed (not normally distributed!) — i.e. can be uniformly partitioned using a fast hashing function. Number of buckets created equals the input data size, just like in a standard hash table. Book shows bench-marking on large samples of random floating point numbers. However, I disagree.
  1. For special data types like strings and small ints, radix sort is probably faster than bucket sort
  2. Bucket sort is #1 for large data volumes, beating quick sort, heap sort etc. But for nearly-sorted data, insertion sort is faster.

Other notes

  • Linked list — used inside each bucket. Arrays are a possible alternative, but 50% slower.
–Relation to other sorts:
  • Can be seen as a generalization of counting sort
  • A cousin of radix sort in the most-to-least significant digit (MSD) flavor i.e. top-down radix sort
–hash sort, a variation of bucket sort, customized for random strings.
Book shows benchmarking on random strings. Hash sort is #1 for large data volumes, beating quick sort, heap sort etc

I guess if string distribution is unconstrained/unknown (without guarantee of randomness), hash sort will not have any advantage

In both cases, the hash code must be strictly “ordered”, so bucket #1 must hold the lowest data items.

== insertion sort
P100. For nearly-sorted data, insertion sort is faster than all other sorts (including bucket sort and radix sort), often by an order of magnitude. The book also shows insertion sort beats divide-n-conquer sorts for 1) nearly-sorted or 2) small collections, and that MSD radix sort likewise needs to switch to insertion sort when the sample size is small.
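To see why nearly-sorted input is so fast: insertion sort does O(N + I) work where I is the inversion count (a standard analysis, not from the book), so an almost-sorted array with few inversions runs in near-linear time.

```python
def insertion_sort(a):
    """In-place insertion sort: O(N + I) where I = number of inversions,
    so nearly-sorted input runs in almost linear time."""
    for i in range(1, len(a)):
        x = a[i]
        j = i - 1
        while j >= 0 and a[j] > x:   # shift larger elements right
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x                 # drop x into its slot
    return a
```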

== counting sort
P315. “Fastest” sort for the right data set. I think a single English word is suitable. However, for nearly-sorted data, insertion sort is still faster.

== Binary search tree vs hash table
P130 discusses when to use which

O(n) sort – count`sort

#1 requirement: keys [note 1] must be very small integers, small enough to be array indices.
# 1.1 I think this requirement is met whenever you notice an enum of “possible values”

[1] “values of keys” is perhaps more accurate.

This requirement alone reduces its applicability to a “niche”, but an extremely important niche. In fact, I think every time this requirement is met, we should consider count`sort. How about sorting the letters in a single English word? Remember the “anagram” question?

If the values are small integers 0,1,2…k, then you need a temporary count array of size k+1.

count`sort is stable, and therefore usable in radix sort.
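A count`sort sketch for the single-English-word case (assuming lowercase a-z only — the simplest flavor, where the keys are the whole payload so stability is moot): O(N + 26), i.e. linear, beating the O(N log N) comparison-sort bound.

```python
def counting_sort_letters(word):
    """Sort a lowercase a-z string in O(N + 26) time via counting sort."""
    counts = [0] * 26                      # the temporary array of size k+1
    for c in word:
        counts[ord(c) - ord('a')] += 1     # tally each letter: O(N)
    # emit each letter as many times as it was seen: O(N + 26)
    return ''.join(chr(ord('a') + i) * n for i, n in enumerate(counts))
```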

a less ambiguous explanation of big O

–If you can only remember one phrase, remember “bound-from-above”
If we say “Algorithm R is O(lg n)”, then number-of-iterations are bound-from-above by lg n multiplied by a constant.

–If you can remember a 2nd phrase, remember “worst case upper bound”
Worst case upper bound is often different from average-case upper bound of a program’s running time. For quicksort, they are O (n-squared) vs. O(n lg n). If we talk of O() in the absence of *-case, it means worst-case-upper-bound. A really long translation of “Algorithm R is O(n-squared)” would be

“iterations are bound from above by n-squared even for the worst-case input array of size n.”

A shorter version: “At most n-squared iterations for the worst input”

Average-case-upper-bound calculation often (if not always) involves probability. Amortized analysis is probability-free.

O(1) constant complexity means ….

1st type of time-complexity to study is “linear growth” — O(N). As list size N triples, running time triples.

“constant complexity” means …. as list size N grows, running time doesn’t grow with N. It’s independent of input size. In practice running time may still vary as the list grows, but slower than even log N growth.

Example: hash search has (average, amortized) constant complexity for search and insert, independent of list size, provided the table is resized to keep the load factor bounded.

big O, in%%lang

O, o, Omega and theta all try to answer the question “for this algorithm, how many more iterations when N triples?” (use “triples” rather than “doubles” to avoid ambiguity with the base-2 logs), where N measures the input.

Example: For the quicksort algorithm, the answer is O(N * log N), meaning “roughly 3 times more iterations (precisely, a factor of 3·log(3N)/log N) when input triples”
Example: For the binary search algorithm, the answer is O(log N), meaning “only log 3 additional iterations (an additive constant) when input triples”

O(f) means “no more than a constant multiple of f iterations”.
O is an upper bound, not necessarily a tight bound