3 Aug coding drill


val@pattern_recognition imt reusable_technique

Reusable coding techniques include my classic generators, DP techniques, array-element swap, and bigO tricks like hash tables and median-of-3 pivot partitioning.

  • One of the best examples of reusable technique — generate “paths/splits” without duplicates. Usually tough.

More useful than “reusable techniques” is pattern-recognition insight into the structure and constraints of a given coding problem. Without these insights, the problem is impenetrable.

Often we need a worst input to highlight the hidden constraint. The worst input can sometimes quickly eliminate many wrong paths through the forest and therefore help us see the right pathway.

However, a bigO trick can sometimes wipe out the competition, so much so that pattern recognition, however smart, doesn’t matter.

 

vector used in hash table bucket

Summary — Same worst case time/space complexity as traditional chaining, but d-cache efficient during lookup, insert and removal.

Background — hash table worst case complexity is due to a bad input (and a weak hash function) leading to heavy collision and a single hot bucket. Load factor will not help, as the overall hash table size is still small. We end up with a linear search through a long linked list. This affects not only lookup and removal; insert also requires a lookup to prevent duplicates.

— Based on https://en.wikipedia.org/wiki/Hash_table#Separate_chaining_with_other_structures

  • This design makes more efficient use of CPU caching and the translation lookaside buffer(TLB), because slot entries are stored in sequential memory positions.
  • It also dispenses with the next pointers that are required by linked lists, which saves space.
  • Despite frequent array resizing, space overheads incurred by the operating system such as memory fragmentation were found to be small
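For illustration, here is a toy C++ sketch of my own (not from the wiki page) of separate chaining with std::vector buckets. FlatChainMap is a made-up name; resizing and load-factor handling are omitted.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Toy separate-chaining hash table; each bucket is a contiguous vector,
// so a collision-heavy lookup scans cache-friendly memory instead of chasing list nodes.
template <typename K, typename V>
class FlatChainMap {
    std::vector<std::vector<std::pair<K, V>>> buckets_;
    std::hash<K> hasher_;
public:
    explicit FlatChainMap(std::size_t bucketCount = 16) : buckets_(bucketCount) {}

    void insert(const K& k, const V& v) {
        auto& b = buckets_[hasher_(k) % buckets_.size()];
        for (auto& kv : b)                 // lookup first, to prevent duplicates
            if (kv.first == k) { kv.second = v; return; }
        b.push_back({k, v});               // amortized O(1) append
    }

    const V* find(const K& k) const {
        auto& b = buckets_[hasher_(k) % buckets_.size()];
        for (auto& kv : b)                 // linear scan over contiguous memory
            if (kv.first == k) return &kv.second;
        return nullptr;
    }
};
```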

speed^soft qualities: CIV scoring

If you are very strong, you can achieve both

  • complete the required number of problems in time with no serious bugs
  • demonstrate requirement clarification, edge cases, dry-run, thorough error handling, detailed complexity analysis, coding style.

In reality, some speed coding tests at FB, Indeed etc really favor speed over the other things.

When we take such a test, there is always internal tension between two motivations — speedy completion vs demonstrating other qualities.

Thinking out loud is a common recommendation but it can create unnecessary conversation and slow you down. However, some of these conversations may lead you to the right path.

multicast routing: efficient data duplication

The generic  routing algorithm in multicast has optimization features that offer efficient data duplication.

— optimized message duplication at routing layer (Layer 3?)

https://www.metaswitch.com/knowledge-center/reference/what-is-multicast-ip-routing explains that

Routers duplicate data packets and forward multiple copies wherever the path to recipients diverges. Group membership information is used to calculate optimal branch points i.e. the best routers at which to duplicate the packets to optimize the use of the network.

The multicast distribution tree of receiving hosts holds the route to every recipient that has joined the multicast group, and is optimized so that

  • multicast traffic does not reach networks that do not have any such recipients (unless the network is a transit network on the way to other recipients)
  • duplicate copies of packets are kept to a minimum.

##very few theoretical compSci constructs 4 CIV

Most comp science constructs are too advanced and too complicated for a 45-minute coding interview. So reading any comp science book is like my son reading science books not written for his exams. “What a candidate needs is drills targeted at interviews,” I told friends, based on Leetcode.

A few exceptional interviews (eg: Google) go beyond Leetcode, but still use only simple features of advanced comp science constructs. Here are a few notable comp science constructs to study, mostly advanced data structures:

  • [s] trie, suffix array, suffix tree
  • geometry (is dStruct-heavy domain) —
    • nearest neighbor query;
    • query: which (x,y) points fall within this rectangle?
    • line sweep
  • segment tree
  • topological sort – 2 algos
  • [s] disjoint set
  • relationships among in|pre|post-order binTree walks — these insights are valuable for some Leetcode problems.
  • self-balancing algo in RBTree
  • [s] B-tree
  • [s] shortest path algos like BFT
  • [s=only the simple ones are needed in CIV]
  • —- techniques
  • technique: augmented tree
  • technique: back tracking
  • technique: DP and greedy
  • technique bottom-up DP with memoization

Q: How about coding interview books?
A: They can help, but they are not as deep or wide-ranging as algorithm books.

Q: How about bigO on common data structures?
A: Yes, relevant, because interviewers insist on it.

 

minimize average distance to N cells]matrix #Kyle 60%

Q: given N red cells in a white matrix, find the best cell anywhere in the matrix, whose average distance to the N red cells is minimized. You can imagine N homes deciding to build a waterpoint.

Distance is the number of horizontal or vertical steps (Manhattan distance).

=====analysis

Insight — the vertical and horizontal dimensions are independent, so this is really two independent one-dimensional problems.

center of gravity?

Insight — rate of change should be zero at the trough…
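A minimal sketch, assuming Manhattan (step) distance: each dimension is solved on its own, and the 1-D optimum is the median of the red cells’ coordinates (the point where the rate of change crosses zero, matching the “trough” insight). bestCell is a made-up helper name.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Sketch (assumes Manhattan distance): treat rows and columns independently;
// the median of each coordinate list minimizes total (hence average) distance.
std::pair<int,int> bestCell(const std::vector<std::pair<int,int>>& redCells) {
    std::vector<int> rows, cols;
    for (auto& rc : redCells) { rows.push_back(rc.first); cols.push_back(rc.second); }
    auto median = [](std::vector<int>& v) {
        std::nth_element(v.begin(), v.begin() + v.size() / 2, v.end());
        return v[v.size() / 2];
    };
    return { median(rows), median(cols) };   // total distance is the sum over both dimensions
}
```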

5 constructs: c++implicit singletons

#1 most implicit singleton in c++ is the ubiquitous “file-scope variable”. Extremely common in my projects.

  • — The constructs below are less implicit as they all use some explicit keyword to highlight the programmer’s intent
  • keyword “extern” — file-scope variable with extern
    • I seldom need it and don’t feel the need to remember the details.. see other blogposts
  • keyword “static” — file-scope static variables
  • keyword “static” within function body — local static variables — have the nice feature of predictable timing of initialization
  • keyword “static” within a class declaration —  static field

~~~~~~  The above are the 5 implicit singleton constructs ~~~~~~
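For illustration, a minimal single-translation-unit sketch of the constructs above (all names are made up):

```cpp
#include <string>

std::string g_config = "default";            // #1 file-scope variable: the most implicit singleton
extern int g_sharedCounter;                  // "extern" — declares a singleton defined in another file
static double s_fileLocalRate = 0.05;        // file-scope static — internal linkage, one per translation unit

class App {
public:
    static int s_instanceCount;              // static field — one copy per class
};
int App::s_instanceCount = 0;

const std::string& sessionId() {
    static const std::string id = "session-001";  // local static — initialized on first call, predictable timing
    return id;
}
```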

Aha — it’s useful to recognize that when a data type is instantiated many many times i.e. non-singleton usage, it is usually part of a collection, or a local (stack) variable.

Sometimes we have the ambiguous situation where we use one of the constructs above, but we instantiate multiple instances of the class. It’s best to document the purpose like “instance1 for …; instance2 for …”

multicast over public internet

Many people say that in general, multicast over public internet is unsupported. I think many retail websites (including the dominant west coast firms) are technically unable to use multicast.

https://networkengineering.stackexchange.com/questions/47994/is-multicast-on-the-public-internet-possible-and-if-yes-how

As an end-user, you can multicast across the public Internet to another site by using a tunnel that supports multicast.

As a larger organization, like a video provider or an ISP, it is certainly possible to forward multicast packets across their domain boundary (i.e. across an Internet).

To forward those multicast packets to another ISP, you would need a peering agreement with them and use the Multicast Source Discovery Protocol (MSDP), configured on both ends.

While you won’t propagate your multicast across the global Internet, crossing network boundaries with multicast packets is not impossible.

27 Jul coding drill

— presumably solved:

  1. power of 3
  2. Fizz Buzz
  3. first unique char in a string
  4. longest common prefix
  5. grouping anagram
  6. in-place-remove all dupes from sorted array
  7. getByRank() in sorted matrix #51%
  8. longest descending path through matrix #60%
  9. count lower ints to my right #50%
  10. Accounts merge
  11. https://leetcode.com/problems/find-peak-element/
— already-solved problems reviewed
  1. Roman to int
  2. generate all loop-free paths between two nodes
  3. append+maxInRangeK #Rahul
  4. O(1)getRandom+add+del on Bag#Rahul
— pending
  1. intersection of two arrays

homemade clustered sparse array

O(1) lookup and random access are hard to achieve among sparse array implementations. Here my sparse array is a special subset of sparse arrays — the populated sections come in clusters. For this special subset, I hope random access is feasible. My design is a fragment table:

  • My fragment table is a std::deque to hold raw pointers to “fragment arrays” not vectors. I choose the word “fragment” since std::deque has its own internal “segment” that is completely irrelevant.
  • elements 0 to 99 —  live in a 100-array, whose address is saved in fragmentTable[0]
  • elements 100 to 199: live in a 100-array, whose address is saved in fragmentTable[1]
  • elements 200 to 299: live in a 100-array, whose address is saved in fragmentTable[2]
  • elements 500 to 599: all missing, so a special ptr is saved in fragmentTable[5]
  • elements 603 to 605 are present. A special ptr is saved in fragmentTable[6]. This fragment consists of a small-footprint 3-array. This fragment also has a field this->offset=3
  • Note the fragmentTable is a deque of void pointers. Each void pointer can be a regular plain old array or a special pointer.

— As a bonus, this design supports unlimited append. Here’s a limited prepend() procedure

  1. to prepend a value X in front of the current sparseArr[0], we first allocate a new 100-array as a new fragment whose offset=99. This new fragment supports 99 future prepend() operations, but each time offset needs decrementing.
  2. save X as the last element in the new fragment.
  3. record the new fragment’s address in a new fragmentTable[0]. This is possible because the fragmentTable is a std::deque.

Note I can also allocate a small-footprint new fragment, iFF no further prepend() is allowed.

— Cornerstone of my design — all fragments have identical range=100, inspired by std::deque and the ragged 2D array. This enables simple and fast random access. The underlying arrays are the same size, with a few exceptions — empty fragments, and tiny-n-frozen fragments.

If one of the “tiny fragments” is expected to get populated in the future i.e. not frozen, then it should be allocated as a full array with a bigger footprint. The space should not stay wasted for long.


Even if this design is not flexible and general-purpose, it meets my design goals so I’m proud of it.

Many implementations use linked fragments, where each fragment has a “next_fragment” pointer. I don’t see any benefit.

I don’t want to look at pure-java implementation as java array has many restrictions.
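To make the fragment-table design concrete, here is a minimal sketch of my own (ClusteredSparseArray is an illustrative name). It uses int* fragments with nullptr playing the role of the “special pointer”, and leaves out the offset field, small-footprint fragments and prepend() for brevity.

```cpp
#include <deque>
#include <stdexcept>

// Sketch of the fragment table: each fragment covers 100 consecutive indices.
// A nullptr slot means "all 100 elements missing"; a non-null slot points to a full 100-array.
class ClusteredSparseArray {
    static const int kRange = 100;
    std::deque<int*> fragmentTable;          // deque, so prepending new fragments stays cheap
public:
    ~ClusteredSparseArray() { for (int* f : fragmentTable) delete[] f; }

    void set(int idx, int value) {
        int frag = idx / kRange;
        while ((int)fragmentTable.size() <= frag) fragmentTable.push_back(nullptr);
        if (!fragmentTable[frag]) fragmentTable[frag] = new int[kRange]();  // allocate on demand
        fragmentTable[frag][idx % kRange] = value;
    }

    int get(int idx) const {                 // O(1) random access: two array indexings
        int frag = idx / kRange;
        if (frag >= (int)fragmentTable.size() || !fragmentTable[frag])
            throw std::out_of_range("missing element");
        return fragmentTable[frag][idx % kRange];
    }
};
```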

count lower ints to my right #52%

https://leetcode.com/problems/count-of-smaller-numbers-after-self/ is labelled “hard”.

Q: given an integer array nums , return a new counts array, wherein counts[i] is the number of smaller elements to the right of nums[i]

====analysis

An order-statistics tree, i.e. an augmented RBTree, should give O(N logN).

One scan from right. Insert each node into this tree. Before inserting a node of value like 22, we will query the tree getRank(22).

Implementation wise, it’s hard to create a self-balancing BST from scratch. So I think an unbalanced BST might do.

Also, there might be some alternative solutions.
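A sketch of that idea — an unbalanced BST whose nodes are augmented with left-subtree size and a duplicate count, so each insert also returns the rank, i.e. the number of strictly smaller keys already inserted (names are my own):

```cpp
#include <vector>
using namespace std;

struct Node {
    int key, dupCount = 1, leftSize = 0;     // leftSize = #nodes in the left subtree
    Node *left = nullptr, *right = nullptr;
    explicit Node(int k) : key(k) {}
};

// Insert 'key' and return how many existing keys are strictly smaller.
int insertAndCount(Node*& root, int key) {
    if (!root) { root = new Node(key); return 0; }
    if (key == root->key) { root->dupCount++; return root->leftSize; }
    if (key < root->key)  { root->leftSize++; return insertAndCount(root->left, key); }
    return root->leftSize + root->dupCount + insertAndCount(root->right, key);
}

vector<int> countSmaller(const vector<int>& nums) {
    vector<int> counts(nums.size());
    Node* root = nullptr;
    for (int i = (int)nums.size() - 1; i >= 0; --i)   // one scan from the right
        counts[i] = insertAndCount(root, nums[i]);
    return counts;
}
```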

longest descending path through matrix #60%

https://leetcode.com/problems/longest-increasing-path-in-a-matrix/

Q: given an int-matrix that allows 4-way moves,  find longest path with strictly descending nodes. Lateral move disallowed.

====analysis

I have code to generate all paths… generate loopfree paths: graph node A→B

— Solution 1: O(N logN)
O(N) First pass  to construct “waterflow” graph — visit N nodes. For each, check the four neighbors. Add an outgoing edge (in a Node.edgeList) to a neighbor node if it is strictly lower.

Now put all nodes into a min-heap, according to node height. Can be part of first pass.

Second pass we visit each item in the min-heap. For each node, compute longest descending path starting therefrom. Record this path length as Node.score. Note if a node is surrounded by higher or equal nodes, then it has score 0, as in the lowest node.

A higher node AA is always upstream to some “computed” nodes. (We don’t care about any incoming edges into AA). So pick the max of (up to four) lower neighbors’ scores. Add 1 to get AA’s score. This computation is O(1) but the heap pop() makes second pass O(N logN)

Note the lowest node may not be part of the longest path, if it is surrounded by the highest nodes.
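A sketch of Solution 1, substituting a plain sort for the min-heap (same O(N logN); the ordering from lowest to highest is what matters). Note the score here counts nodes rather than edges, so the lowest node scores 1 instead of 0.

```cpp
#include <algorithm>
#include <vector>
using namespace std;

// Process cells from lowest to highest; each cell's score = 1 + best score
// among its strictly lower 4-way neighbors, which are already finalized.
int longestDescendingPath(const vector<vector<int>>& m) {
    int R = m.size(), C = R ? (int)m[0].size() : 0;
    if (R == 0 || C == 0) return 0;
    vector<pair<int,pair<int,int>>> cells;            // {height, {r, c}}
    for (int r = 0; r < R; ++r)
        for (int c = 0; c < C; ++c)
            cells.push_back({m[r][c], {r, c}});
    sort(cells.begin(), cells.end());                  // lowest first
    vector<vector<int>> score(R, vector<int>(C, 1));
    int best = 1;
    const int dr[] = {1,-1,0,0}, dc[] = {0,0,1,-1};
    for (auto& cell : cells) {
        int r = cell.second.first, c = cell.second.second;
        for (int k = 0; k < 4; ++k) {
            int nr = r + dr[k], nc = c + dc[k];
            if (nr < 0 || nr >= R || nc < 0 || nc >= C) continue;
            if (m[nr][nc] < m[r][c])                   // lower neighbor already has a final score
                score[r][c] = max(score[r][c], score[nr][nc] + 1);
        }
        best = max(best, score[r][c]);
    }
    return best;
}
```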

## defaultdict tricks #matrix

https://github.com/tiger40490/repo1/blob/py1/py/algo_tree/commonAncestor_Indeed.py demos a few time-saving tricks with defaultdict

  • mydict[key] # without printing or using the return value … would trigger the dictionary to construct a default value for this key IFF missing
  • from key/value pairs, build dict {key -> valueList}

https://github.com/tiger40490/repo1/blob/py1/py/algo_combo_perm/staircase_CSY.py shows

  • defaultdict(int) as a frequency counter.
    • mydict[key] += 1 #works even when key was previously missing in mydict
  • defaultdict(lambda: ‘_’).
    • mydict[(row,col)] can implement a matrix. Negative (or other invalid) indices result in the default value of string ‘_’. I believe a dict key can be a tuple but not a list.
    • In contrast, the 2D array interprets negative indices as wrap-around !
  • defaultdict(lambda: [[],[]]) creates a list of 2 small lists for any “invalid” key. In contrast defaultdict(list) will create a monolithic list which hits runtime error when you access element of the nested list i.e. dict[someKey][0][anyIndex]

average_case ^ amortized

–> bigO (and big-Theta) is about input SIZE, therefore, usable on average or worst case data quality

There’s no special notation like bigX for average case.

–> Worst vs average case … is about quality of input data.

With hash table, we (novices) are permitted to say “quality of hash function” but this is incorrect.  My book [[algorithms in a nutshell]] P93 states that IFF given a uniform hash function, even in worst case bucket sort runs in linear time O(N).

–> Amortized … is about one expensive step in a long series of steps like inserts. Note Sorting has no amortized analysis AFAIK.

worst case amortized — is a standard analysis
Eg: vector expansion. Worst-case doesn’t bother us.

eg: quicksort+quickSelect can degrade to O(N²). Amortization doesn’t apply.

average case amortized — is a standard analysis
eg: hash table collision. A few operations happen to hit long chain, while most operations hit short chain, so O(1) on average.

Amortization is relevant iFF an insert triggers a hash table expansion.

What if the input is so bad that all hash codes are identical? Then amortization won’t help. For hash tables, people don’t worry about worst case, because a decent, usable hash function is always, always possible.

average case per-operation — can be quite pessimistic
eg: vector insert

worst case per-operation —

RBTree O(1)insert quite common ] coding IV

— for single-insert
https://en.cppreference.com/w/cpp/container/map/insert is precise, saying that hinted insert is amortized O(1) when the hint is correct.
— for mass insert
Special case — if we know the input sequence is pre-sorted and going to be “appended” to existing tree nodes, then each insert will always hit the end(). We can rely on hinted insert to achieve O(N).

http://www.cplusplus.com/reference/map/map/insert/  is even more encouraging — ” If N elements are inserted, Nlog(size+N) in general, but linear in size+N if the elements are already sorted” so as long as pre-sorted, then we get O(N)

For 64-bit integer or float inputs, we can radix sort in O(N) then mass-insert in O(N).

— for construction from a pre-sorted sequence
https://en.cppreference.com/w/cpp/container/map/map confirms it’s O(N) if the input range is already sorted.
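A small illustration of hinted insert (and range construction) on a pre-sorted input; the values are arbitrary.

```cpp
#include <cassert>
#include <map>
#include <vector>

int main() {
    std::vector<int> sorted = {10, 20, 30, 40, 50};   // pre-sorted input
    std::map<int, int> m;
    for (int k : sorted)
        m.emplace_hint(m.end(), k, k * k);  // hint = end(): amortized O(1) per insert when appended in order
    assert(m.size() == 5);

    // construction from a pre-sorted range is likewise linear
    std::map<int, int> m2(m.begin(), m.end());
    assert(m2.size() == 5);
}
```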

getByRank() in sorted matrix: priorityQ^RBTree

https://leetcode.com/problems/kth-smallest-element-in-a-sorted-matrix/

====analysis

recombinant binTree pyramid, where “water” flows east or south.

  • first level has one node .. lowest value. Add it to pq (i.e. priorityQ)
  • pop the pq and insert the two downstream nodes
  • total K pops, each pop is followed by up to 2 inserts

Heap will grow to up to K items, so each pop will be up to logK

Total O(K logK). To achieve this time complexity, we can also use a RBTree. The tree nodes can come from a pre-allocated array.
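A sketch of the pq-based getByRank(): pop K times in total, each pop pushing the east and south neighbors, with a visited set to avoid duplicate inserts (the helper name and details are my own).

```cpp
#include <queue>
#include <set>
#include <utility>
#include <vector>
using namespace std;

// Kth smallest in a row- and column-sorted matrix, via a min priority queue.
int kthSmallest(const vector<vector<int>>& matrix, int k) {
    int R = matrix.size(), C = matrix[0].size();
    using Cell = pair<int, pair<int,int>>;                     // {value, {r, c}}
    priority_queue<Cell, vector<Cell>, greater<Cell>> pq;      // min-heap keyed on value
    set<pair<int,int>> seen;
    pq.push({matrix[0][0], {0, 0}});
    seen.insert({0, 0});
    while (true) {
        auto [val, rc] = pq.top(); pq.pop();                   // K pops total, each O(logK)
        if (--k == 0) return val;
        int r = rc.first, c = rc.second;                       // push the two downstream cells
        if (r + 1 < R && seen.insert({r + 1, c}).second) pq.push({matrix[r + 1][c], {r + 1, c}});
        if (c + 1 < C && seen.insert({r, c + 1}).second) pq.push({matrix[r][c + 1], {r, c + 1}});
    }
}
```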

pre-allocated array as backing store for graph nodes #java No

I think any graph node can use the same technique, but here I present a simple yet interesting use case — a linked list with each node allocated from an array. https://github.com/tiger40490/repo1/blob/cpp1/cpp/lang_66mem/slistFromArray.cpp shows three home-made implementations:

  1. backing array of dummy link nodes, pre-allocated at compile time
  2. backing array of dummy link nodes, pre-allocated from free store aka DMA
  3. backing array is a byte array on heap or data section. Each link node is constructed via placement-new.

Here are a few advantages that I consider minor, because a linked list is seldom needed in low-latency systems

  1. d-cache efficiency
  2. eliminates runtime load on heap allocator, since memory is pre-allocated. See malloc=long considered costly

Advantage #3: For c++ algo questions, this set-up has an interesting advantage — The node address is now an index into the backing array. This index is a natural auto-increment ID , based on creation order.

Now, the biggest advantage of linked list over vector is mid-stream insert/delete. One of the biggest disadvantages is lack of random-access. If nothing happens mid-stream (as in coding questions), then we can achieve random-access-by-id using array as backing store.

If nothing happens mid-stream, then this linked list is physically similar to an array with extra capacity.

This technique won’t work in java because java array of Node is array-of-pointers.

power-of-2 sized hash tables: implications

https://dzone.com/articles/hashmap-internal  and http://coding-geek.com/how-does-a-hashmap-work-in-java/ explain that java HashMap uses bucket count equal to power-of-two. A few implications

  • the modulo operation is a fast bit-masking i.e. select the lowest N bits: hash & (length-1)
  • an unlucky hashCode() may not offer dispersion among the lower bits. Assuming 16 buckets so only the lowest 4 bits matter, but hashCode() returns predominantly multiples of 8. Then two out of 16 buckets would be white hot. To guard against it, HashMap performs a rehash on the hashCode() output.

I implemented the same feature in https://github.com/tiger40490/repo1/blob/py1/py/88miscIVQ/re_hash_table_LinearProbing.py

Note rehash is ineffective if hashCode() returns very few distinct values. In java8, there’s one more protection against it — RBTree as alternative to linked list… See RBTree used inside java8 hashmap

RBTree used inside java8 hashmap

https://en.wikipedia.org/wiki/Hash_table#Separate_chaining_with_other_structures is a language-neutral discussion.

https://digitalmars.com/d/archives//640.html#N803 shows an early implementation

they are implemented as a hashed array of binary trees. 
My experience with them is that they are very efficient.

— JEP 180 is the white paper to introduce a self-balanced tree as an alternative to linked list.

https://dzone.com/articles/hashmap-performance shows that with one million entries in a HashMap, a single lookup takes 20 CPU cycles, or less than 10 nanoseconds. Another benchmark test demonstrates O(logN) get(key) in java8, but O(N) in java7, as traditionally known.

If two hashCode() values are different but ended up in the same bucket (due to rehashing and bucketing), one is considered bigger and goes to the right. If hashCodes are identical (as in a contrived hashCode() implementation), HashMap hopes that the keys are Comparable, so that it can establish some order. This is not a requirement of HashMap keys.

If hashCodes are mostly identical (rare) and keys are not comparable, don’t expect any performance improvements in case of heavy hash collisions. Here’s my analysis of this last case:

  • if your Key.equals() is based on address and hashCode() is mostly the same, then the RBTree ordering can and should use address. You won’t be able to look up using a “clone” key.
  • if you have customized Key.hashCode() then you ought to customize equals(), but suppose you don’t implement Comparable, then you are allowed to lookup using a clone key. Since there’s no real ordering among the tree nodes, the only way to look up is running equals() on every node.

http://coding-geek.com/how-does-a-hashmap-work-in-java/#JAVA_8_improvements says

A given bucket contains both Node (linked list) and TreeNode (red-black tree). Oracle decided to use both data structures with the following rules:

  • If for a given index (bucket) in the inner table there are more than 8 nodes, the linked list is transformed into a red black tree
  • If for a given index (bucket) in the inner table there are less than 6 nodes, the tree is transformed into a linked list

With the self-balanced tree replacing the linked list, worst-case lookup, insert and delete are no longer O(N) but O(logN) guaranteed.

This technique, albeit new, is one of the best simple ideas I have ever seen. Why has nobody thought of it earlier?

RBTree: technical notes #Realtime

Red–black trees offer … worst-case guarantees, valuable in real-time applications. The Completely Fair Scheduler used in current Linux kernels and epoll system call implementation[19] uses red–black trees.

This valuable guarantee only requires the tree to remain balanced within a constant factor; it need not be strictly balanced. In fact, AVL is more rigidly balanced yet lacks the amortized O(1) rebalancing guarantee quoted below.

In contrast, a hash table doesn’t offer a worst-case guarantee in the face of hash collisions. In fact, Java 8 HashMap uses an RBTree in addition to the linked list… see my blogpost RBTree used in java hashmap.

After an insert or delete, restoring the red-black properties requires a small number (O(log n) or amortized O(1)) of color changes (which are very quick in practice) and no more than three tree rotations (two for insertion). Although insert and delete operations are complicated logically, their times remain O(log n).

The AVL tree is another structure supporting O(log n) search, insertion, and removal. AVL trees can be colored red-black, thus are a subset of RB trees. Worst-case height is better than the worst-case height of RB trees, so AVL trees are more rigidly balanced. However, Mehlhorn & Sanders (2008) point out: “AVL trees do not support constant amortized deletion costs”, but red-black trees do.[25]

doubly-linked list as AuxDS for BST

I find it a very natural auxDS. Every time the BST gets an insert/delete, this list can be updated easily.

Q: How about the self-adjustment after an insert or delete?
%%A: I think this list is unaffected

With this list, in-order walk becomes really easy.

https://leetcode.com/problems/kth-smallest-element-in-a-bst/solution/ is the first place I saw this simple technique.

lowest missing+ve int#Codility #70%

Q: Write a function int solution(int[] A);  that, given an array A of N integers, returns the smallest natural number that does not occur in A. For example, given A = [1, 3, 6, 4, 1, 2], the function should return 5.
Given A = [1, 2, 3], the function should return 4.
Given A = [−1, −3], the function should return 1.

• each element of array A is a 32-bit signed int
• expected worst-case time complexity is O(N);
• expected worst-case space complexity is O(N).
* Elements of input arrays can be modified.

https://leetcode.com/problems/first-missing-positive/description/ is similar but O(1) space and average O(N) time!

—— my analysis —–

The mutable and O(1) space hints at — saving array indices as payload !

—- Idea A:

first scan to swap non-positives to the end and remember the new boundary (say #111).

In the same scan also determine min and max. Suppose min=55.

Another scan to reduce all payloads by 55 so they fall in the range of 0 to max-55.

Now use CSY technique .. check array@0-N in-situ #Nsdq#contrived
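For reference, a sketch of the index-as-payload idea in its classic O(1)-space form (cyclic placement, as commonly used for the Leetcode variant). This is my own sketch, not the CSY write-up.

```cpp
#include <utility>
#include <vector>
using namespace std;

// Cyclically place each value v into cell v-1; afterwards, the first cell whose
// value != index+1 reveals the smallest missing positive.
int firstMissingPositive(vector<int>& a) {
    int n = a.size();
    for (int i = 0; i < n; ++i)
        while (a[i] > 0 && a[i] <= n && a[a[i] - 1] != a[i])
            swap(a[i], a[a[i] - 1]);          // each swap sends one value home, so O(N) overall
    for (int i = 0; i < n; ++i)
        if (a[i] != i + 1) return i + 1;
    return n + 1;
}
```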

—-solutionB: make_heap O(1) space but O(N logN) time. Build a min-heap based on the array in O(1) space, then keep calling min().

  • make_heap shows random access container (vector used in demo), and “rearrange”, implying O(1) space
  • make_heap shows O(N) to construct the heap

min() is O(1) but delete_min() is O(log N), so overall we probably get O(N logN)

—-solutionC: radix sort. https://en.wikipedia.org/wiki/Radix_sort#In-place_MSD_radix_sort_implementations shows an in-place binary radix sort.

First pass to transform all negative numbers to 0. Then iterate the sorted array and check for the earliest “gap”. Worst case — you get 1,2,3… without gap, so answer is the 1+ largest array element.

O(W*N) where W is width of the largest integer. If we assume 64-bit then W is a constant:)

http://www.drdobbs.com/parallel/parallel-in-place-radix-sort-simplified/229000734 is another in-place radix sort.

java array footprint: hypothetical calc

In low-latency or high-volume java apps (like exchange-facing), I think array of primitives is quite popular due to its efficiency advantage. Here I review a few potential footprint issues. Suppose we have an array AA of 12 ints, each 32-bit.

  • AA carries housekeeping data such as the array length (i.e. 12). AA also has the “standard” object header, of twelve (or eight) bytes… See size of Object.java, and how does it add up@@
  • The 12 ints take up 48 bytes, allocated in contiguous memory… lucky. For 12 Trade objects, we would get 12 scattered allocations.. expensive for runtime allocation and cache-unfriendly.
  • So we have 12 + 48 = 60 bytes so far. There might be padding for alignment. 64-bit CPU probably uses 8-byte alignment as shown on https://stackoverflow.com/questions/44468639/memory-alignment-of-java-classes
  • Finally, footprint is 64 bytes, but I have not verified it.

ask`lower base2reduce mgr expectation

This plan didn’t work out at Macq. Expectation was still too high.

The logic is, if my coworkers get total comp 200k and I ask only 160k, then I’m more likely to get some bonus. Even if I underperform them, I would still hit somewhere below 200k.

Now I think if I qualify to stay, then there will be some bonus even if my base is, say, 190k. Hiring managers would not agree to a 200k base and run the risk of paying a doughnut bonus to a qualified employee.

## Y revisit old algo Qn #Kyle

I find it hard to be interested in previously solved problems. I think many people (not only those in a hurry) simply skip such problems. However, we are actually not as good at previously solved problems as we think.

  • Reason – there are often insights and repeating patterns in an old problem, that require slow and repeated digestion. Solving it quickly is usually not enough.
  • reason — our solutions may not be optimal.
  • reason — there are very smart solutions out there we don’t know.
  • reason — we forget.
  • reason — with a simple tweak the problem can become a new problem. How well and how fast we can solve the modified problem depends on our insight into the original problem.

Therefore, I feel folks tend to trivialize some well-known problems and fail to learn enough from them.

Well-known doesn’t mean well-understood.
Well-known doesn’t mean simple.

Revisiting old problems is boring to many programmers, but it often feels easier than tackling new problems. I don’t get tired easily when revisiting old problems. If you are like me, then it could be more beneficial than tackling new problems.

However, my visible progress is much slower in terms of #problems solved per week. I think Kyle also pointed out something related to this.

oldtimers trapped ] pressure-cooker #Davis

See also G3 survival capabilities #health;burn rate

Q: Why are so many old timers unable to get out of a shitty job in a pressure-cooker company?

  • A small number of old timers change job but some of them regret as new job turns out worse. Other old timers hear those war-stories and prefer the certainty of known pain in the current job. They don’t feel confident they can find a better job.
  • bench time — Many old timers can’t afford bench time due to low reserve and high burn rate.
    • Old timers tend to be very unused to unstable income.
    • Damien (Macq) relied on his bonus payout..
    • Some had no savings at all.
  • Most old timers won’t even consider a 6M contract.
  • Most old timers will never consider a voluntary pay cut to get a more comfortable job
  • golden handshake — Some old timers (UBS?) can’t let go of the promised compensation package. They would not resign and give up on a 100k promised windfall
  • some old timers worry about job loss so much that they work their ass off to keep the job, even if the workload is unreasonable/unfair. I think in GS I was like them. I have a friend in a similar situation. I think the boss wants to let him go but may feel merciful, so they keep him but give him shitty tasks and demand fast turnaround at good quality…. unreasonable/unfair workload.
  • Some old timers may have a wife who completely opposes job change due to potential impact on family.

Q: why XR and I can afford to quit a job, stay home, and find a lower job?
A: Supportive wife, burn rate, passive income and savings.

##deeply felt Priorities b4 U.S.→SG@45 #big-picture

  1. — priorities over the next 2-10Y horizon
  2. Career[a] longevity till 70, probably on wall st, not in Singapore or West Coast. A related keyword is “relevance” to the tech economy.
    1. On Wall St, I continue to keep a keen focus on robust technologies like core Java, cpp, SQL, sockets, core threading, common data structures. Outside Wall st, jxee (and possibly web stacks) offers the best market depth and demand.
    2. Compared to Wall St, West coast is possibly low priority for now as I don’t see long term visibility.
  3. my wellness — Need to guard against PIP-hell, trapped… A “stability factor” arguably more impactful than GC, housing, schooling… One of the key signs of wellness is weight and calorie count.
  4. boy’s education — I don’t know if U.S. system is better for him
  5. GC — a G5 priority in my current plan, primarily on the back of the longevity factor.
  6. — 2nd tier
  7. increase precious time with grand parents — fly business class to NY.
  8. preparing for war at new job — short-term, immediate but actionable.
  9. wife’s and daughter’s life-chances in U.S. — important but not much I can do now
  10. more passive income to reduce the cash flow stress
  11. Saving up for U.S. housing — not much I can do now.
  12. I now have a deep desire but limited hope to keep up my 细水长流 motivation for coding drill and QQ learning. Burning pleasure; self-esteem; satisfaction; absorbency.
  13. leaving a good impression with MS manager

[a] I didn’t say “income”. I think more important to me is my marketability (+ relevance, in-demand ..). Career longevity is the basis of entire family’s well-being for 15Y until kids start working.

Salary — is kinda secondary because the room for improvement is negligible in my biased mental picture.

j8 MethodRef #cheatsheet

Need to develop more low-level insights as QQ…

  • Out of the four kinds of method refs, only AA) static-method and BB) specific-instance kinds are popular. The other two types are obscure.
  • q[ :: ] is used in all four kinds
  • I think the static-method kind is most readable and most intuitive. The javadoc tutorial features a BB example that should be implemented as static method IMHO.
  • a lambda expression has parameter types. A method ref has none and must be converted to a lambda expression. Where does the compiler infer the parameter types from? I think they are inferred from the calling context.

QuickSelect: select nth key#O(1)space

Quickselect has average O(N) runtime but worst-case O(N²), if the partitioning goes bad repeatedly. A randomized pivot can reduce the chance but I don’t think we can eliminate it.

Space complexity is O(1) despite the apparent recursive set-up. Tail recursion optimization is usually available to reuse the same stack frame. Otherwise, recursion can be replaced by a loop.

std::nth_element() is linear on average but quadratic in worst case — https://stackoverflow.com/questions/11068429/nth-element-implementations-complexities explains QuickSelect algo

Quickselect was devised by the same inventor as quicksort.

https://en.wikipedia.org/wiki/Quickselect
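A tiny usage example of std::nth_element for rank selection (values are arbitrary):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

int main() {
    std::vector<int> v = {7, 2, 9, 4, 1, 8, 3};
    std::size_t k = 3;                                     // 0-based rank
    std::nth_element(v.begin(), v.begin() + k, v.end());   // average O(N), O(1) extra space
    assert(v[k] == 4);                                     // 4 is the 4th smallest
}
```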

get_majority_elem]unsorted array,O(1)space #90%

Q: Given an array of size n, find the majority element. The majority element is the element that appears more than ⌊ n/2 ⌋ times. You may assume that the array is non-empty and the majority element always exists in the array.

====analysis
worst input: odd length, only X and Y. X occurs once more than Y.

hash table solution needs O(N) space since there can be N/2 distinct values. To improve space complexity, how about quick select? Discard the smaller side and pick another random pivot.

Median-finder algorithm can solve this problem, using std::nth_element() which uses QuickSelect… O(1) space despite recursive set-up.

— idea 3 O(1) space: random pick then verify
Random pick and count the occurrence of this pick. Is it more than N/2? Within a few trials we should find a good pick.
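A sketch of idea 3 (random pick, then verify by counting); each trial succeeds with probability above 1/2, so the expected number of trials is below 2.

```cpp
#include <cstdlib>
#include <vector>
using namespace std;

// O(1) space: pick a random element and verify it is the majority by counting.
int majorityElement(const vector<int>& nums) {
    for (;;) {
        int candidate = nums[rand() % nums.size()];
        size_t count = 0;
        for (int x : nums)
            if (x == candidate) ++count;
        if (count > nums.size() / 2) return candidate;   // a majority element is guaranteed to exist
    }
}
```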

K-palindrome #careercup 20%

Q (Facebook India, careercup): A k-palindrome is a string which transforms into a palindrome on removing at most k characters. Implement check(str, k). str has at most 20,000 characters. 0<=k<=30

Sample Test Case: Input : abdxa 1 -> false
====analysis
It’s crucial to zoom into the worst input early on — binary string with the correct “kills” well-dispersed.

Insight — once we group the chars by ascii code, I can see a successful palindrome is really an A-palindrome interleaved with a B-palindrome and C-palindrome…

This problem is related to the (tougher) problem of “max palindrome sub-sequence”

Eg k=33.
–idea 3: per-club distribution table
each char gets a dedicated club/distro, where we record all positions occupied by this char.
Since K is small, then every distro should be almost “balanced” around the midpoint.
— idea 7
There can be only up to 33 possible midpoints, including those in-between. (If all kills hit the upper end, then midpoint drops by 33/2.) Let’s try each possible midpoint.

Insight — once we pick a midpoint, we know how many we can kill above/below. For example, if we pick the highest midpoint, then all kills must hit lower half. The extreme midpoints are easier due to stringent constraints.

If we pick a central midpoint, then we can kill about 33/2 in each half, but the kills must match — nice constraint.

For each midpoint picked, we run Algo B:
At init-time, we compute the defects in each club/distro using Algo B1:
given the midpoint and the positions of all J’s, the defect count is the number of kills (killing a J or non-J) to balance J’s distro…

We will compile all 26 defect counts. Now we start growing our seed at the chosen midpoint. We expand both ways. If the next 2 positions belong to different clubs J vs L, we evaluate the two kill options using Algo B2
One of the two choices will reduce overall defects. We first look at J’s distro. The J kill would improve or worsen J club’s defect.

If there’s an efficient way to evaluate the two kills or update the defect counts, then we have a decent solution.

–idea 9: frq table. If there are 35 odd clubs, then 33 of them can become even but 2 odd clubs remain –> immediate failure. However, this idea fails with the binary string.

## coding drill 19 Jul

  1. https://leetcode.com/problems/factorial-trailing-zeroes/
  2. https://bintanvictor.wordpress.com/wp-admin/post.php?post=31671&action=edit — merge sort in-place

— presumably solved

  1. https://leetcode.com/problems/group-anagrams/
  2. https://leetcode.com/problems/merge-two-sorted-lists/
  3. https://wordpress.com/post/bintanvictor.wordpress.com/31650 — pacific/atlantic
  4. https://wordpress.com/post/bintanvictor.wordpress.com/31657 — max-product subarray
  5. https://leetcode.com/problems/contains-duplicate/
  6. https://leetcode.com/problems/majority-element/
  7. https://wordpress.com/post/bintanvictor.wordpress.com/21764 find min in rotated array
  8. https://bintanvictor.wordpress.com/wp-admin/post.php?post=31592&action=edit — T.isSubtree(S)
  9. https://bintanvictor.wordpress.com/wp-admin/post.php?post=31792&action=edit — longest run@same char
  10. 3-sum

— previously solved problem forgotten and refreshed

  1. Leetcode #139 word-break. .. not really “medium”
  2. https://bintanvictor.wordpress.com/wp-admin/post.php?post=31745&action=edit — max sub difference
  3. https://bintanvictor.wordpress.com/wp-admin/post.php?post=31749&action=edit — min range to include all clubs
  4. https://bintanvictor.wordpress.com/wp-admin/post.php?post=31736&action=edit — find lone-wolf among AAABBBCCC
  5. https://bintanvictor.wordpress.com/wp-admin/post.php?post=31755&action=edit — flip binTree
  6. https://bintanvictor.wordpress.com/wp-admin/post.php?post=31758&action=edit — serialize linked list: each node points to a random node

— unsolved:

  1. https://leetcode.com/problems/valid-parenthesis-string/
  2. https://bintanvictor.wordpress.com/wp-admin/post.php?post=31825&action=edit — K-palindrome

 

T.isLikeSubtree(S) #60%

Q (Leetcode 572): Given two non-empty binary trees s and t, check whether tree t has exactly the same structure and node values as a subtree of s. A subtree of s is a tree consists of a node in s and all of this node’s descendants. The tree s could also be considered as a subtree of itself.

====analysis
https://leetcode.com/problems/subtree-of-another-tree/solution/ relies (as “primary check”) on the payloads of  each tree node, but in some trees, all payloads are empty or all payloads are either True or False. In these cases, the comparison of payload is only usable as a secondary check. The primary check must be structural. See key in a tree node

The O(N+K) is plain wrong.

I guess the 2nd solution (and possibly 1st solution) would compare every node in S to the root of T. I think there are more efficient solutions using subtree size and subtree height as secondary checks – more reliable than payload check.

My solution below uses BFT + pre/post/in-order walk !

— Preliminary step: post-order walk to get subtree-size, subtree-height at each S node + T root. (I will skip other T nodes.). Suppose T is size 22 height 4. We will look for any node Y of size 22 and height 4 and a matching payload. This would eliminate lots of S nodes:

If T height is more than 2, then lots of low-level S nodes are eliminated.
If T height is 2 or 1, then T size would be at most 3. Most high-level S nodes are eliminated.

— Solution 1: For both T and S, We take in-order walk to assign incremental IDs, then take pre-order walk to produce an array of IDs that represent the tree structure.

Can we run a level-aware BFT? Only one level needs to be examined … wrong!

I think the in-order walk itself is what I need. Find any node Y in S that matches size+height+payload of T root. Suppose ID(Y)=44 but ID(T root) = 4, then simply shift down by 40 and do a linear scan. Must check payload, but not height/size.

 

min int_range2include all clubs#presorted 60% careercup

Q (careercup): You have k uneven lists of pre-sorted integers, N ints in total. Find the smallest range that includes at least one number from each of the k lists. For example,
List 1: [4, 10, 15, 24, 26]
List 2: [0, 9, 12, 20]
List 3: [5, 18, 22, 30]
The smallest range here would be [20, 24] as it contains 24 from list 1, 20 from list 2, and 22 from list 3
==== Analysis
Looks rather contrived but very popular.
— solution 1 O(N): moving window algo:
O(N) merge all K clubs’ items into a big array, where payload includes the original clubId of each item. We end up with
[0/2nd 4/1st 5/3rd 9/2nd 10/1st …]

Once I come up with this big array, I feel confident to conceive a O(N) moving-window algo.

A ‘basic window’ is any subarray having representatives from each club.
A ‘tight window’ is a basic window that can’t shrink further, i.e. the 1st and last items are the only reps of their clubs.

Now findInitialWindow(), similar to findNextWin().
Simple procedure —
As we build the window, we update a hashtable like {club1: position 1,3,6; club2: position 0,5; club3: position 2,4,..}
Basically an array of queues.
We update this table each time we increment the front pointer in search of the next basic window.
When we find a basic window, this table should have all K queues non-empty and at least one of them singular, probably the last updated queue.

Now shrink this window via shrink(). Here’s a fast algo but more tricky and no O() impact —
For each queue, find the last added i.e. right-most position.
Put these K positions in a container(actually no container needed).
Now out of these K positions, find the minimum. Say 22. I claim 22 is the left edge of a smaller window.
Keep shrinking if we can but I doubt we can.

Here’s a slow shrink algo —
For the current window, look at the clubId of the left edge (back ptr).
Use the clubId to locate a queue in the table.
Head item in the queue should match the position of back ptr.
If not empty, pop this queue and increment the back ptr by one; else exit shrink()

Now we have a tight window, we remember its size (and update current winner).

After shrink(), we call findNextWin() —
Suppose the basic window’s earliest item is a clubX.
Now we move back ptr (clubX queue is now empty).
Now we move front pointer i.e. right edge of the window, looking for the next clubX item.

* findNextWin() is O(1) per front ptr increment. Only touches the tail of queues
* shrink() is O(1) per back ptr increment. Only touches the head of queues
* therefore, at most N front ptr moves and N back ptr moves.. O(N)

PriorityQ can speed up some, but won’t improve bigO

## reasons4salary_diff ] ibank #Davis,Jay

Primarily supply-demand driven

* Reason (demand-side): age discrimination — Davis pointed out that many ibanks and other employers accept (as a reality) paying a higher salary to a fresh grad than to an old-timer, because these employers long for fresh blood

* Reason (demand-side): core dev teams — are more central than peripheral teams like devops (eg QAPM), QA, DBA, reporting,.. Consider Ling, the QAPM devops scripting guru.

* Reason (demand-side): Jay Hu pointed out that hard-core dev skillset — is perceived as more valuable, harder than “tools” including reporting tools, devops tools, automation tools, vendor products that encapsulate the underlying complexities.

Similarly, c++ skillset is often perceived as harder than coreJava
Similarly, coreJava skillset is often perceived as harder than jxee
Similarly, c++java skillset is often perceived as harder than scripting

On the supply side, there are fewer hardcore c++ developers than coreJava developers, and a lot more jxee developers. However, there’s also the curious case of quant dev — tight supply, but dwindling demand.

If you have some brain power (like the 华中理工 compScience friend of Jay Hu), then don’t be put off by hard-core dev, even though it’s dry, boring, tough, unglamorous. See my blogpost on my geek profile.

Pacific/Atlantic #51%

https://leetcode.com/problems/pacific-atlantic-water-flow/

Q: Given an m*n matrix of non-negative integers representing the height of each unit cell in a continent, the “Pacific ocean” touches the left and top edges of the matrix and the “Atlantic ocean” touches the right and bottom edges. Water can only flow in four directions (up, down, left, or right) from a cell to another one with height equal or lower.

Print all cells where water can flow to both the Pacific and Atlantic ocean.

====Analysis
peek? not yet
— solution 1:
Create m * n nodes in a 2D grid. Two more nodes:
All north|west boundary nodes have an edge From Pacific node
All south|east boundary nodes have an edge From Atlantic node

Between two adj cells, there’s 1 or 2 directed edges, from low to high. Intuition — tracing pollutant upstream. An arrow from P to Q means Q is suspect.

How do I represent the graph? Each node can hold a list of “suspect neighbors” (higher or equal).

Run BFT or DFT from Pacific node and turn on flag in each reachable node.

Run the same from Atlantic. Every reachable node with the flag set is a double-suspect to be printed.

clean-up: non-overlapping intervals #70%

Q (L435): Given a collection of intervals, find the minimum number of intervals you need to remove to make the rest of the intervals non-overlapping.

Note:

  1. You may assume the interval’s end point is always bigger than its start point.
  2. Intervals like [1,2] and [2,3] have borders “touching” but they don’t overlap each other.

==== analysis:

I think this is the same as the greedy room scheduler. Schedule the earliest-ending task, so as to maximize accepted meetings.

A deselected meeting is an interval removed.
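A sketch of that greedy: sort by end time, keep the earliest-ending interval, and count every interval that overlaps the kept one as removed (the function name follows the Leetcode signature).

```cpp
#include <algorithm>
#include <vector>
using namespace std;

// Greedy "room scheduler" view: keep the interval that ends earliest,
// drop anything that overlaps it; the number dropped is the answer.
int eraseOverlapIntervals(vector<vector<int>>& intervals) {
    if (intervals.empty()) return 0;
    sort(intervals.begin(), intervals.end(),
         [](const vector<int>& a, const vector<int>& b) { return a[1] < b[1]; });
    int removed = 0, lastEnd = intervals[0][1];
    for (size_t i = 1; i < intervals.size(); ++i) {
        if (intervals[i][0] < lastEnd) ++removed;   // overlaps the kept interval -> remove it
        else lastEnd = intervals[i][1];             // touching borders ([1,2],[2,3]) are fine
    }
    return removed;
}
```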

recent algo questions: mostly presented in intArray+binTree

binary tree + int_array represent 60% of my recent algo questions. With string and matrix included, we get 80%.

  • “int_array” also include array of int_tuples. Other arrays are rarely quizzed
  • In the real world, binTree is less common than arrays , but trees are more common than binTree. I think interview questions tend to get watered down to the binary species. K-ary trees are rarely quizzed
  • matrix — 95% of Matrix problems are 2D grid problems or integer array problems.
  • graph — Most graph problems are presented as matrix or binary trees, occasionally with cycle.
  • slist — shows up in 5-8% of problem descriptions. Some problems need slist as auxDS.

Leetcode is representative of a constellation of websites.

iterate K pre-sorted uneven immutable lists #FB

Interviewer (Richard) was kind enough to offer me a tip early on, so we didn’t waste time (which could easily have resulted in running out of time).

Q: given K pre-sorted immutable lists, each up to N items, return an iterator that on demand yields each of the (up to K*N) items in sorted sequence.

Estimate time and space complexities.

====analysis

— I first proposed pair-wise merge. Since there are logK passes, Time complexity == O(NK logK)

Space complexity is tricky. In the very first pass I have to create a single list of up to NK items. Then I can reuse this list in each merge, so space complexity == NK [1], but I said NK*logK. The interviewer wasn’t familiar with this solution and didn’t correct me.

[1] See https://www.geeksforgeeks.org/sort-array-two-halves-sorted/. Suppose 8 lists to merge. I will merge A+B into first quarter of the big array (size NK), then C+D into 2nd quarter… In next pass, I will merge AB + CD in-place using the first half of my big array.

The advantage of this solution — once I create a single merged array, each iteration is O(1). This can be valuable if run-time iteration speed is crucial but initial warm-up speed is unimportant.

bigO insight — merging N pre-sorted arrays is O(N logN), same as merge sort?

— Then interviewer suggested iterating over the K lists so I implemented the solution in https://github.com/tiger40490/repo1/blob/py1/py/88miscIVQ/itr_K_presortedLists_FB.py

  • Space complexity = K
  • Time complexity:
    • next() O(logK) due to popping. I said Fib heap has O(1) insert
    • init() O(K)
    • hasNext() O(1)

How about a RBTree instead of heap? Space complexity is unchanged.  Time complexity:

  • next() O(logK) for both remove and insert
  • init()  O(K logK), worse than priorityQ
  • hasNext() O(1)

[19] zbs cf to GTD^compiler+syntax expertise

Why bother — I spend a lot of time accumulating zbs, in addition to QQ halos and GTD know-how

I have t_zbs99 and other categories/tags on my blogposts showcasing zbs (真本事/real expertise) across languages. Important to recognize the relative insignificance of zbs

  • GTD — localSys or std tools … goal is PIP, stigma, helping colleagues
  • QQ — goal is mobility. See the halo* tags. However, I often feel fake about these QQ halos.
  • zbs — goal is self-esteem, respect and “expert” status
    • Sometimes I can convert zbs knowledge pearls to QQ halos, but the chance lower than I wished, so I often find myself overspending here.

I also have blog categories on (mostly c++) build quirks + syntax tricks. These knowledge pearls fall under GTD or zbs.

post-order walk: create valuable subtree statistics

https://github.com/tiger40490/repo1/blob/cpp1/cpp/algo_binTree/subTreeStats.cpp is rather simple and short, showing stats like

  • max path sum from current node to any subtree node. Note a path must be non-empty, so this max can be negative
    • We can even save in each node this actual path
  • subtree height,
  • subtree size,
  • d2root — tested but doesn’t use subtree statistics
  • — can be easily implemented:
  • subtree payload sum
  • subtree max payload… These stats are powerful AuxDS

Applicable on k-ary tree

reverse post-order mirrors pre-order #diagram+Dot

For a given binTree the pre-order sequence is mirrored by the reverse-post-order sequence.

“Reverse” := “right_child then left_child”

I had an “Aha” moment when reading P 389 [[discrete mathematics]]. Insightful illustrated path-tracer diagram showing that pre-order = ABCDEFGHIJ with root as first visited, and J=right most leaf node. The reverse-post-order = JIHGFEDCBA with root as the last visited, due to post-order. Note the DOT is on the Left for both reverse-post and pre.

(https://en.wikipedia.org/wiki/Tree_traversal#Depth-first_search has standard diagrams with a DOT on the left for pre-order; post-order has dot on the right of each node)

Note any post-order curve is less intuitive less visual (than pre-order) because the curve hits a root of subtree upon leaving ! The earlier encounters are touch-n-go. The dot can reinforce this key insight. The dot can help bridge the conceptual gap.

Similarly post-order sequence is mirrored by reverse-pre-order. For the same binTree above, reverse-pre-order = AFGHJIBDEC (not mentioned in the book). Post-order = CEDBIJHGFA, as printed in the middle of P 389. Note root is last visited due to post-order.

max proceeds selling 2 homes #Kyle

Q: you own a residential and an investment property. You have the historical price time series on both. You want to see the max aggregate amount you could have sold both, provided you must sell the investment property before the residential property. You can pick a time to sell the investment and a later time to sell the residential.

I created this question for

find offending section ] originally-sorted array

I rephrase Leetcode 581. Q: There was an ascending int array containing duplicates. Someone reshuffled some elements, so it is no longer sorted. Write a function to determine the shortest subarray that needs reshuffling. Before and after this subarray, no change is needed.

O(N) time and O(1) space.

==== analysis
Not “easy” as labelled.

I want to find lowest integer that’s out of place. (Do the same on the other end and problem completed.)

first scan to find global min and max values, and ensure they are correctly placed.

Next scan from beginning on the rising slope. At first down-turn, treat the array as two partitions. In the right partition after the first peak, we determine the lowest value, say, -55. Then we will find the correct position for -55, within the left partition! We will look for the last item same or lower than -55. This is the key constraint of this entire problem.

Insight — We know -55 must move left and we don’t care about any other item in the right partition.

Insight — if, within the left partition, -55 should go after #7, then we know all earlier items up to #7 are already correctly placed. Why? All subsequent items in the left section are strictly higher than -55, and all other items in the right section are (equal or) higher than -55.

— my idea 1 in O(N) space: radix sort a clone of the array, then compare it to the input array.

count unique squares ] matrix #70%

Q1: Count unique squares within an R*C matrix. A unique square is identified by two opposite corners. For example, [0,0] and [1,1] identify a 4-cell square; [2,3] and [2,3] identify a 1-cell square.

Q2: print out all of them.

====analysis

Q1 feels like a math problem, but printing is an algo question

— solution 1: Let’s focus on height-2 for now. I will denote the subset of squares having height-2 as “2-subset”. (The 1-subset is trivial.)

Insight — 2-subset is naturally non-overlapping with the 3-subset, 4-subset etc. These non-overlapping subsets constitute the whole solution. This fact is extremely valuable to the count algorithm.

For each height-2 square, I will focus on the north-west corner i.e. the corner among the four having the lowest r and c IDs. We can divide the 2-subset by rowID. For a 55-row/44-column matrix, the NW corner’s colID ranges over 0..42, so there are 43 height-2 squares with rowID=0. Same count with rowID=1, up to rowID=53.

Aha — each square can be identified by the north-west corner {rowID, colID} + height. So at the higher level I divide the whole solution set by height. At the lower level, I sub-divide the set by the “low rowID”

bigO CIV: clarify_requirement means..worst_data

———- Forwarded message ———
Date: Tue, 9 Jul 2019 at 23:10

Big-O means “worst” case complexity. We need to be doubly sure what “worst” case data look like. (Note it doesn’t mean ridiculously rare data like integer overflow.)

On high-end coding interviews, these “worst data set” frequently shed light on the structure of the original problem, and often point to the correct direction for an efficient solution. (Note a nice-looking solution is not efficient if it exhibits poor time complexity in the face of a worst yet realistic data set. Such a system may not scale well in practice, given its Achilles’ heel.)

If interviewer wants a solution optimized for bigO time complexity, you can waste precious time tackling the wrong problem. The wrong problem is the problem defined by your idea of “worst data set”, but the real worst data set is very different.

Sometimes, the worst data is _subtly_ different but it points to a different algorithm direction. It may point to an iterative solution rather than recursion, for example.

The original interview question may not be that hard iFF we identify the worst data set early on, but sadly, we run out of time.

For some tough problems, the first (sometimes the only) challenge is quickly identifying worst data set. Interviewer always gives you the simple data set, NEVER the worst data set. It’s your job to identify the worst data set… often the key insight into the problem.

It’s no exaggeration to say — identifying the worst data set early or too late can make-or-break your chance at this employer. You may kiss good-bye to this job opportunity exactly because you are too slow to identify the worst data set. I know what this means. Other candidates probably got shot by this arrow on the heel, without knowing what happened.

Time is ticking as soon as the problem is presented to you. If interviewer says time is not a factor… take your time.. we want quality and thought process … I just ignore it. Our speed is always part of assessment. Therefore, a very common failure scenario is —

.. tractable problem but candidate runs out of time after wasting time on preliminaries like using incorrect worst data set to reason about the (wrong) problem.

print height@subtree at every node #post-order

For these problems, elegant implementation is expected.

Note in-order is for binary tree, but post-order is for any tree.

== Q: given Any tree without cycle, print the size of subtree at every node

I think post-order walk is usable.

== Q: given Any tree without cycle, print the height of subtree at every node. Also print the node address just for identification

Intuition — dft post-order walk. Need to become more familiar with the detailed steps. https://www.techiedelight.com/calculate-height-binary-tree-iterative-recursive/ has details.

== Q (Leetcode): Given a binary tree, determine if it is height-balanced — a binary tree in which the depth of the two subtrees of every node never differ by more than 1.

I can try it on Leetcode .. no need to set up test harness myself but I might be frustrated

Intuition — post-order walk. A simple recursive algo — for any node, find its subtree height including itself and return it. Before returning, also compare the left vs right subtree heights
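A sketch of that recursion, returning -1 as an “unbalanced” sentinel so the walk can stop early (TreeNode is a minimal stand-in):

```cpp
#include <algorithm>
#include <cstdlib>

struct TreeNode { int val = 0; TreeNode *left = nullptr, *right = nullptr; };

// Post-order: return the subtree height, or -1 as soon as any subtree is unbalanced.
int heightOrFail(TreeNode* node) {
    if (!node) return 0;
    int lh = heightOrFail(node->left);
    if (lh < 0) return -1;
    int rh = heightOrFail(node->right);
    if (rh < 0) return -1;
    if (std::abs(lh - rh) > 1) return -1;      // compare left vs right before returning
    return 1 + std::max(lh, rh);
}

bool isBalanced(TreeNode* root) { return heightOrFail(root) >= 0; }
```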

array of std::byte^unsigned-char as “byte array”

Background: Suppose you need to store or transfer arbitrary binary data.

In RTS, we used char-array. Some say we should use unsigned char. It is the only data type that is guaranteed (by the ANSI C Standard) to have no padding bits. So all 8 bits in an unsigned char contribute to the value. None of them is a padding bit. I think std::uint8_t is similar but less common.

Contrary to some online posts, the unsigned-char type is distinct from plain “char”.

In C++17, std::byte is probably preferred because only the bitwise operations are defined. I believe you can reinterpret_cast a pointer to std::byte*. In https://github.com/tiger40490/repo1/blob/cpp1/cpp/lang_66mem/slistFromArray.cpp, I used none of the above — I used placement-new 🙂

In java we use the primitive “byte” type — an 8-bit signed integer.

realities@Canada universities #Davis

Davis agreed the quality of education is good in Canada universities, but students are less aggressive (I would say hungry) than in the U.S. He studied in a lesser-known college in Canada. His (and his classmates’) salary grew but remained much lower than in the U.S., even pre-tax.

I feel U.S. pays high salary to software engineers. Canada may not.

(Same can be said about U.K. and France.)

Davis said “no investment banking jobs” (Same can be said about Australia.)

He gave examples to show — job opportunities are much fewer there.

(Same can be said about Australia.)

U.S. college graduates can find internships + job in the U.S., but Canadian college grads can’t so they usually work in Canada, with much lower salaries. Like in other countries, only a minority can break into more lucrative job markets overseas. There’s clearly an invisible barrier between Canada and U.S. job market. I thought there was none. Now I think the barrier is visa —

Canadian graduates need a visa to work in the U.S. TN is usable but employers still need extra legwork. Overall, Davis perceives a real barrier of entry. In 2016 I experienced something similar — beyond the heavy door there’s a land of lucrative job opportunities. I wrote a blogpost about it.

  • I overestimated the “level playing ground” in U.S. job market. In reality, local candidates have an advantage bigger than I assumed.
  • I tend to overestimate my (or my kids’) raw talent. Without extraordinary talent we can kill the interview but we are not so extraordinary.
  • I underestimated the entry barrier.
  • Best example — I can see the advantage of Wall St as a job market relative to London (per Ram) and Singapore. Even a mediocre java guy can earn 150k. However, so many competent techies in Asia experience many obstacles when they try to break into Wall St. Most of them don’t bother to try.

hash table of integer_ratios #slope

It’s not reliable to store arbitrary ratios as floating-point keys in a hash table, due to precision. We hit this problem when matching line slopes.

https://docs.python.org/2/library/fractions.html#fractions.gcd provides a solution

  • the gcd(int1, int2) function finds the greatest common divisor of two integers. This is the key to our problem.
  • convert a string like ‘x/y’ to a simplified Fraction object
  • convert a decimal 
  • convert a float 

I think other languages also provide the same gcd functionality.
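A minimal python sketch of the slope-matching use case, letting Fraction do the gcd reduction internally (the point data is made up). Note in newer python3 releases fractions.gcd is gone; math.gcd is the replacement, but Fraction itself still reduces for you:

from fractions import Fraction
from collections import defaultdict

points = [(0, 0), (1, 2), (2, 4), (3, 5)]
origin = points[0]
slope_count = defaultdict(int)
for x, y in points[1:]:
    dx, dy = x - origin[0], y - origin[1]
    # Fraction(dy, dx) is auto-reduced via gcd, hashable and exact -- safe as a dict key
    key = 'vertical' if dx == 0 else Fraction(dy, dx)
    slope_count[key] += 1
print(slope_count)   # Fraction(2, 1) counted twice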

low_churn+low_stress domains

–high churn high expectation is worst but I have no experience

  • devops – churn due to integration with automation tools

–low churn high expectation

  • xp: Macq quant-dev domain
  • xp: PWM: SQL + Perl + java
  • xp: Qz

–high churn low expectation (low caliber)

  • xp: dotnet at OC
  • mediocre web shops

–low churn low expectation is ideal

  • xp: RTS socket + mkt data
  • xp: citi muni: bond math + coreJava

 

Minor QnA with Stroustrup

Q: memory efficiency is a traditional advantage of c++. Is that still true?
A: yes

Q: some of the top innovators in c++ including yourself, Andrei Alexandrescu, Herb Sutter, … are not contributing to java, c# or other languages. I wonder why.
A: Andrei is working on D.
%%A: GC languages have their constraints. Many of the c++ innovations don’t easily apply. However, the cross pollination is more natural between c and c++. Stroustrup said he might be one of the biggest contributors to C feature improvements.

Q: strongholds of c++?
A: complex code, close-to-hardware

Q: do you agree that the new crop of language features are used mostly for system programmers, library developers … not application developers?
A: See the stronghold answer

Q: what competitors do you see over the next 20 years? Go? Rust?
A: none

std::sort() beating qsort() #inline

Stroustrup was the first one to tell me c++ std::sort() can beat C qsort() easily.

https://travisdowns.github.io/blog/2019/05/22/sorting.html says:

Since the qsort() code is compiled ahead of time and is found inside the shared libc binary, there is no chance that the comparator function, passed as a function pointer, can be inlined.

https://martin-ueding.de/articles/qsort-vs-std-sort/index.html says

For qsort(), since the function is passed as a pointer, and the elements are passed as void pointers as well, it means that each comparison costs three indirections and a function call.

In C++, the std::sort is a template algorithm, so that it can be compiled once for each type. The operator< of the type is usually baked into the sort algorithm as well (inlining), reducing the cost of the comparison significantly.

QQ study=burning_joy

See also the pleasure^chore framework

Unbelievably, QQ study feels like “burning” pleasure. After I complete some chore like coding drill, tax, I long for some QQ study !

There are very few burning pleasures:

  • tech study including zbs, halo…
  • localSys hacking — camp_out
  • weekend coding assignments
  • jogging in stadium
  • learning Oracle or unix in my younger days — camp_out

signed-int: longest K-sum subArray #60%

Q: given a signed-int array, find the longest subarray having sum=K

realistic scenario: we know someone executed a buy and a sell and made $K profit. There are many possible buy-sell points when I review the price points. We know this trader held the position longest to gain exposure, so which buy-sell points did he use?

====analysis

First scan to build cumulative sum — the sum-array. Looks like a price chart. Initial value is the first original element.

Now scan this sum-array. At each element s[j],

  • I save s[j] in hashtable {s[j] -> j} iFF no such key yet.
  • Then I look for the (by design, earliest) element s[i] such that s[i] + K == s[j]. If found in the hash table, then compute hold:=j-i and update the global longest_hold if necessary.

Note the hash table’s values (positions) are all earlier than j, since we are scanning forward.

Can consolidate to one scan.

Tip — hash table is usable to reduce O() thanks to exactly-K.
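A one-scan python sketch of the above (my own illustration). One detail beyond the prose: seeding the hash table with {0: -1} covers subarrays that start at index 0.

def longest_subarray_with_sum(arr, K):
    first_seen = {0: -1}        # sentinel: empty prefix, so a qualifying subarray may start at index 0
    prefix = longest = 0
    for j, x in enumerate(arr):
        prefix += x
        first_seen.setdefault(prefix, j)    # keep only the earliest position of each prefix sum
        i = first_seen.get(prefix - K)      # earliest i with s[i] + K == s[j]
        if i is not None:
            longest = max(longest, j - i)
    return longest

print(longest_subarray_with_sum([1, -1, 5, -2, 3], 3))   # 4, i.e. subarray [1, -1, 5, -2]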

Q: what could derail work-till-70 plan

Q: what things can derail my work-till-75 plan. Let’s be open and include realistic and theoretical factors.

  • A: I think health is unlikely to be the derailer. In contrast, IV competition, age discrimination are more likely. The ruthless march of technology also means demand for my skillset will decline..
  • A: mental health? Look at GregM at RTS. See other blogposts on brain aging.
  • A: On the job, my energy, absorbency and sustained focus might go down (or up) with age. I wrote more in another blogpost — As I age, brown-field localSys will become harder to absorb. I may need to stay at one system longer.
    • On the other hand, if offered the chance to convert from contractor to FTE, I may need to resist and possibly move out.
  • A: On interviews, I think my QQ knowledge will remain competitive for many years.
  • A (pessimistic view): green field vs brown field — as I age, my capacity to handle green field may go down. My speed to learn brown field codebase may also go down but after I learn it, I may be able to retain the knowledge.
  • A1: #1 derailer is demand for my skills. In fact, beside doctors, Wall St tech might be one of the most enviable domains for work-till-70. Note “tech” also includes BAU, sysAdmin, projMgr, productMgr and other support functions.
  • A1b: If the rumor is true that the west coast is more competitive and age-unfriendly, then techies there in their 40’s may have more difficulty remaining hands-on than on Wall St. I have a natural bias towards the WallSt contract market. If the rumor is confirmed, then Wall St is better for older programmers.

a few common memory techniques in my HRT assignment

Technique: reinterpret_cast

Technique: memcpy

Technique: pre-allocated buffer as a local static object … can be better than "stack" allocation, but only for a large buffer.

Unlike local variables, local Static objects can be returned by pointer — major flexibility in low-level memory management.

https://stackoverflow.com/questions/3835857/in-c-does-using-static-variables-in-a-function-make-it-faster?rq=1 discusses the runtime cost of static local vs local

std::vector capacity reduction

Q: how would vector’s backing array reduce in size? In other words, how would the capacity ever reduce?
%%A: The only hope is shrink_to_fit(), which is a non-binding request to the implementation. The implementation may ignore it.

If capacity reduction does happen at runtime, then reallocation would probably happen.

I believe resize() assign() clear() etc will never reduce vector capacity.

[17] string(+vector) reserve()to avoid re-allocation #resize()wrong

See also std::vector capacity reduction and vector internal array: always on heap; cleaned up via RAII

string/vector use expandable arrays. Has to be allocated on heap not stack.

For vector, it’s very simple —

  • resize() creates dummy elements iFF upsizing
  • reserve() only allocates spare “raw” capacity without creating elements to fill it. See P279 [[c++stdLib]]

[[Optimized C++]] P70 confirms that std::string can get re-allocated when it grows beyond current capacity.

http://stackoverflow.com/questions/9521629/stdstringss-capacity-reserve-resize-functions compares std::string.resize() vs reserve().

  • Backgrounder: Capacity — allocated capacity is often uninitialized memory reserved for this string.
  • Backgrounder: size — length of the string that’s fully initialized and accessible. Some of the characters (nulls) could be invisible.
  • string.resize() can increase the string’s size by filling in space (at the end) with dummy characters (or a user-supplied char). See http://www.cplusplus.com/reference/string/string/resize/
    • After resize(55) the string size is 55. All 55 characters are part of the string proper.
    • changes string’s size
  • string.reserve() doesn’t affect string size. This is a request to increase or shrink the “capacity” of the object. I believe capacity (say 77) is always at least as big as the size (say 55). The 77 – 55 = 22 extra slots allocated are uninitialized! They only get populated after you push_back or insert to the string.

python bisect #cod`IV

The bisect module is frequently needed in coding tests esp. codility. In this write-up, I will omit all other function parameters.

* bisect.bisect_right(x)  # returns a position “i” after the first hit, if any:
all values are <= x from a[lo] to a[i-1] on the left side, and all values are > x from a[i] to a[hi-1] on the right side. So the “hit” items are on the left, unconditionally 🙂
* bisect.bisect_left(x) # returns an index i before or on the first hit:
all values are < x from a[lo] to a[i-1] on the left side, and all values are >= x from a[i] to a[hi-1] on the right side.

My sound bytes:

  • bisect_left(needle) returns the first index above or matching needle.
  • bisect_right(needle) returns the first index above needle, unconditionally.

A few scenarios:

  1. If No perfect hit, then same value returned by both functions.
    • Common scenario: if needle is higher than all, then “i” would both be the last index + 1.
    • Common scenario: if the needle is lower than all, then “i” would both be 0
    • in all cases, You can always insert Before this position
  2. If you get a perfect hit on a list value, bisect_left would return that “perfect” index, so bisect_left() is more useful than bisect_right(). I feel this is similar to std::lower_bound
    • This is confusing, but bisect_right() would return a value such that a[i-1] == x, so the returned “i” value is higher. Therefore, bisect_right() would never return the “perfect” index.
  3. If you have a lower-bound input value (like minimum sqf) that might hit, then use bisect_left(). If it returns i, then all list elements qualify from i to end of list
  4. If you have an upper-bound input value that might hit, then use bisect_left(). If it returns i, then all list values qualify from 0 to i. I never use bisect_right.
  5. Note the slicing syntax in python a[lo] to a[i-1] == a[lo:i] where the lower bound “lo” is inclusive but upper bound “i” is exclusive.
import bisect
needle = 2
float_list = [0, 1, 2, 3, 4]
left = bisect.bisect_left(float_list, needle)
print('left (should be lower) =', left)    # 2

right = bisect.bisect_right(float_list, needle)
print('right (should be higher) =', right) # 3

new languages lose limelight soon

New languages come into and go out of fashion, like hairstyle and cars.

  • Java has lost the limelight for 10 years but is still fairly popular due to the job market and high-profile, “cool” companies like google.
  • C and C++ are among the top 5 longest-living languages. They are unpopular among the young, but they remain important.
  • c# is a curious case. I think the really important windows applications are still written in c++.
  • Many high-level languages go out of fashion, including C#, D, ..
  • Most scripting or dynamic languages go out of fashion

 

G6 c++skills { Stroustrup chat

  1. memory mgmt – for time and space efficiency
  2. reusable techniques to use stack objects instead of heapy thingies
    • reduce reliance on dynamic containers
  3. virtual — reusable techniques to avoid virtual
  4. other compiler internals that impact latency
    1. knowledge of inline — to trade off latency against code size
  5. safe c++ techniques, to reduce the chance of run-time errors. C++ is particularly vulnerable as it is
    • more complex than C and others and
    • more close-to-hardware than Java and other languages.
  6. TMP — to trade off latency against complexity. But this zbs domain is too big for me

 

vector internal array: always on heap; cleaned up via RAII

Across languages, vector is the #1 most common and useful container. Hashtable might be #2.

I used to feel that a vector (and hashtable) can grow its “backing array” without heap. There’s global memory and stack memory and perhaps more. Now I think that’s naive.

  • Global area is allocated at compile time. The array would be fixed in size.
  • For an array to be allocated on stack, the runtime must know its size. Once it’s allocated on that stack frame, it can’t grow beyond that stack frame.
  • In fact, any array can’t grow at all.

The vector grows its backing array by allocating a bigger array and copying the payload. That bigger array can’t be in the global area because this type of on-demand reallocation is not compile-time.

Anything on heap is accessed by pointer. Vector owns this hidden pointer. It has to delete the array on heap as in RAII. No choice.

Now the painful part — heap allocation is much slower than stack allocation. Static allocation has no runtime cost at all after program starts.

kernels(+lang extensions)don’t use c++

Q: why kernels are usually written in C not c++? This is an underpinning of the value and longevity of the languages.

I asked Stroustrup. He clearly thinks c++ can do the job. As to why C still dominates, he cited a historical reason. Kernels were written long before C++ was invented.

Aha — I think there are conventions and standard interfaces (POSIX is one of them)… always in C.

I said “The common denominator among various languages is always a C API”. He said that’s also part of what he meant.

benchmark c++^newer languages

C++ can be 5x faster than java if both programs are well-tuned — A ball-park estimate given by Stroustrup.

In many such benchmarks, the c++ code is written like java code, using lots of pointers, virtual functions, no inlining, perhaps with too many heap allocations rather than stack variables.

Many other benchmarks are similarly questionable. These new languages out there are usually OO and rely on GC + pointer indirection. If you translate their code to C++, the resulting c++ code would be horribly inefficient, not taking advantage of the c++ compiler’s powers. An expert c++ developer would rewrite everything to avoid virtual functions and favor local variables and inline, and possibly use compile-time programming. The binary would usually become comparable in benchmark. The c++ compiler is more sophisticated and has more optimization opportunities, so it usually produces faster code.

append+maxInRangeK #Rahul

Q: given a sequence of integers, implement two operations

  1. Operation 1: append another integer. Size becomes N
  2. Operation 2: given two arbitrary subscripts into the sequence, (they define a sub-array of size K), find the max

Rahul saw discussions without any optimal solution. I think a simple vector-based algo can achieve amortized O(1) for Append and O(logK) for Max.

AuxDS required.

— idea 1: segment the sequence into fixed segments

 

A/Bob/C: cod`IV #vec for python list

Here are other variables names frequent needed in coding puzzles:

  • Once you declared a long, meaningful var name, you could use abbr of it afterwards.
  • if you have an array of composite objects like “frame” or “trade”, each represented by an inner array, then be careful with frame[1]. Better call it aFrame[1] or frames[0].
  • what about “sorted” which is a python function name:( …. I would use ascd by default.
  • B4 means “before”; Af means “after”
  • hm means hashtable; tm means treeMap or std::map
  • targetRepetition or tgtRep
  • “val” — payload data field in a Node class. This name is short, easy to type. Harder to global-replace but LG in a short program
  • “candd” — candidate
  • “End” for the last element in a container
  • li means list … vec means vector … arr means array
    • In python, we need extra caution to use “vec” not “li” for the python list !
  • cnt means count or counter
  • — for a matrix
  • M, m or mat for the matrix
  • r as the 1st index; c as the 2nd index. Avoid x/y
  • height/rows as number of rows; width/cols as number of columns
  • I will refer to each element as a “cell”

—It’s more useful to have personal names than mere “Person A”. Shorter names are easier to write. Gender doesn’t matter.

  1. Alan, Alice, Aaron,
  2. Bob, Ben (Big Ben), Bill,
  3. Cody,Chris, Charlie
  4. Dan, David
  5. Ed, Emily, Elizabeth (queen)
  6. Fred
  7. Greg, Gail
  8. Henry, Harry,
  9. Ilya, Ivy, Ian
  10. Jen, Jim, Joe, Jack, John, Jason,
  11. Ken, Kate, Kevin,
  12. Lucy, Lee, Lilly, Larry,
  13. Mike, Mary, Matt,
  14. Nick, Nancy,
  15. Oliver, Oscar, …
  16. Pam, Peter, Paul, Peggy
  17. Quincy
  18. Raj, Rob, Ray,
  19. Sam, Sue,
  20. Tom, Ted, Tim,
  21. Uday ….
  22. Vic, Venkat
  23. Wendy, Will
  24. Xavier, Xander/Zander …
  25. Yixin, Yang ….
  26. Zofia, Zack,

when2introduce new tokens into O()

The extra tokens like A, B, C in an O(A+BB+logC) are not always independent dimensions.

  • Case: DFT/BFT have O(V+E) but some people may say V <= E, so why not remove V and say O(E).

I think this estimate misses the point that E could greatly outnumber V. If another algo is O(V+logE) it would probably win.

  • case: radix sort is O(WN), but some may say W <=N, so why not remove W and say O(NN)

Well, W is typically log(unique count among N), though W can be half N.

Some bigO students don’t like too many tokens and may simplify O(MN) to O(KK) where K = max(M,N), but I think it’s imprecise. If M or N is very small, like logK, then O(MN) is much faster than O(KK).

Sunday night: if in the mood4localSys

Realistic scenario — I find myself in the mood for localSys learning on a Sunday night 11pm.

I think it’s better to sleep in office than to go home, but in Singapore, I had better go home and sleep, by taking taxi.

I think it’s better to work on the localSys till 2am (or later). Those 3 hours are precious engagement … Burning pleasure. I don’t get such 3-hour engagements in a week.

I used to feel “why not work on my QQ or coding practice now, and focus on work Monday morning?” It turns out that I’m usually less motivated on Monday morning, for whatever reasons.

Friday night is similar. I often find my highest appetite and absorbency on Friday nights (or eve of public holidays). So better go home late, sleep, then come back to office Saturday early morning to capture the mood.

[19] y WallSt_contract=my best Arena #Grandpa

Competition arenas … we all choose the arena to compete in. It’s a choice, either implicit or explicit. I would say it’s better to be conscious about this choice.

Not much new content in this blogpost. I feel very convinced to stick with WallSt contract market. Here is a ranking of the reasons why I consider it a rational decision, though our decisions are shaped by our deeply personal experiences and inherently irrational.

Beware of attachment !

  1. low stress, low expectation — my #1 reason as of 2019
  2. low-caliber competitors, mostly due to the “off-putting” factors listed below
  3. age friendly
  4. I get to practice interviews and keep a precious burning-pleasure for a month each year on average.
    1. In contrast, If I were an ibank VP I would have multiple obstacles on that front.
  5. — other reasons to prefer Wall St contract
  6. higher probability of greenfield projects
  7. leverage on domain knowledge
  8. I can easily explain my job hopping profile
  9. a number of firms to hop around

Now the downside, off-putting factors. Many bright or young competitors are put off, reducing the competition

  • salary can drop; furlough
  • frequent job change required — often forced to change job.
  • no job security — Employers tend to cut contractors first.
  • no compensation package
  • no growth, no bonus
  • no chance to move off hands-on dev and become prj mgr etc
  • no medical benefits
  • restricted to mostly ibanks — many young bright people are interested in buy-side or non-finance.

baseclass(template)use subclass field+!vptr

A very restrictive context —

  1. the base and sub classes represent market data messages, used in a reinterpret_cast context, with zero padding. Therefore, vptr would add a pointer and mess up reinterpret_cast.
  2. multiple (say 11) subclasses have a “price” field, so I don’t want code duplication 11 times
  3. The field order is fixed in struct definition. Without this restriction, I would define the “price” field in a base struct and also a getPrice() method. With a subclass instance, CRTP could probably work like static_cast<Subclass const*>(ptr)->getPrice() but the “price” field’s physical offset would be dictated by the compiler, not according to my struct definition

My technique uses CRTP but no SFINAE no enable_if.

My technique is easier to /internalize/, as it relies on simple overload resolution + simple type deduction. In contrast, the SFINAE technique used in my RTS codebase is off-putting and alien to me

wildcard matching #more tractable than regex_FB

Q (Leetcode44): Given an input string (s) and a pattern (p), implement wildcard pattern matching with support for ‘?’ and ‘*’. The matching should cover the entire input string (not partial).

‘?’ Matches any single character.
‘*’ Matches any sequence of characters (including the empty sequence). The star is not a quantifier. I think a single star is like the “.*” in perl.

no space allowed.

I think a wildcard can sit at end of string

I think we could see two wildcards in a row. Two stars can be treated as one star.

=====analysis

I feel this is simpler than the regex_FB.py problem, therefore easier to absorb. I think either DP or my original recursive solution should work.

Once we have a topdn-memoization solution (published on github) working, we can work on a botup.
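For reference, a bare-bones top-down memoization sketch of this matcher (my own, not the github version):

from functools import lru_cache

def is_match(s, p):
    @lru_cache(maxsize=None)
    def go(i, j):               # does s[i:] match p[j:] ?
        if j == len(p):
            return i == len(s)
        if p[j] == '*':
            # the star matches empty (skip it) or swallows one more char of s
            return go(i, j + 1) or (i < len(s) and go(i + 1, j))
        return i < len(s) and p[j] in ('?', s[i]) and go(i + 1, j + 1)
    return go(0, 0)

print(is_match('adceb', '*a*b'))   # True
print(is_match('cb', '?a'))        # False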

Accounts merge #disjoint set

https://leetcode.com/problems/accounts-merge/

Q: .. too long

====Analysis
Union-find + BFT

— solution 1:
Data structure:

  • Acct object {name, vector of Email objects, isVisited flag }
  • Email object {addr, set of Accts }
  • hash table {address -> Email objects}

First scan to Create one acct object for each row. When populating the vector, check if the address exists in the hash table. If yes, then save the existing email object in the vector. Also the Email object’s set is updated with the new Acct object.

Second scan to print everything. Iterate over the accounts. Need to merge all the emails belonging to John —

Scan the vector. If any Email  object has more than one Acct (named John), visit each Acct following BFT, until all John accounts are visited. Collect all of these accounts’ emails in one tree-set, then print them.
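As an alternative route to the BFT design above, here's a small union-find sketch keyed on email addresses (my own illustration; the sample rows are made up):

from collections import defaultdict

def accounts_merge(accounts):
    # accounts: list of [name, email1, email2, ...]
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    owner = {}                              # email -> account name
    for name, *emails in accounts:
        for e in emails:
            owner[e] = name
            union(e, emails[0])             # all emails of one row share a root

    groups = defaultdict(list)
    for e in owner:
        groups[find(e)].append(e)
    return [[owner[root]] + sorted(mails) for root, mails in groups.items()]

acc = [['John', 'j1@x.com', 'j2@x.com'], ['John', 'j2@x.com', 'j3@x.com'], ['Mary', 'm@x.com']]
print(accounts_merge(acc))   # John's three addresses merge into one group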

 

max-product subArray #51% untested

Q: Leetcode 152: given signed int array, find the subarray (minimum size 1) with highest product. Return the product.

Note if there’s 0 in your subarray you must return 0.

====analysis
peek? Not yet

— solution 1: O(N)

One scan to identify all the zeros. They partition the array.  For each partition, we need algorithm-A:

Count negative elements. If even, then return the product of the entire partition. Else

Accumulate from the left, up to but excluding the last negative element -> prodL.
Accumulate from the right, up to but excluding the first negative element -> prodR. Compare the two.

trivial edge case — if the zeros create only single-neg partitions, then zero is the best.

trivial edge case — If subarray has only one element, be it zero or negative, then that’s the best for that subarray.
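For reference, the standard O(N) running max/min variant (a different route from the zero-partition idea above, but short enough to memorize):

def max_product_subarray(nums):
    # track the max and min product of a subarray ending at each index;
    # a negative element can turn the running min into the new max
    best = cur_max = cur_min = nums[0]
    for x in nums[1:]:
        candidates = (x, cur_max * x, cur_min * x)
        cur_max, cur_min = max(candidates), min(candidates)
        best = max(best, cur_max)
    return best

print(max_product_subarray([2, 3, -2, 4]))   # 6
print(max_product_subarray([-2, 0, -1]))     # 0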

 

 

U.S.startups: often less selective,lower caliber

U.S. startups may represent a sweet spot among my job choices. I think they are less selective than HFT or top tech shops. They attract fewer top geeks.

  • Workload – presumably comparable to ibanks like MS/Baml/Barc. I guess some startups demand more, but probably fewer than 50% of them. Mithun described 3 startups but he has no firsthand experience.
  • Salary — Many startups pay on-par with ibanks, according to CYW
  • Leaves – possibly fewer than ibanks
  • Cowoker benchmark – possibly lower caliber
    • Respect – could be slightly better than in an ibank if you earn the respect

Some of the big tech shops are actually less selective than HFT – Amazon, Apple

in each treeNode, save d2root

My friend YH showed me a simple BFT algo where each parent node AA saves AA’s level+1 into the two children nodes AAL and AAR.

So every node remembers its own level from root 🙂

I think this technique can be used in pre-order DFT too. For post-order, it is possible but less natural, as shown in my github code subTreeStats.cpp.

Note pre|post-order walks can work on n-ary trees (e.g. B-trees), not only binary trees.

If nodes are immutable, we can use a hashmap {node address -> level}
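A minimal python BFT sketch of the level-saving trick (the Node class is made up):

from collections import deque

class Node:
    def __init__(self, left=None, right=None):
        self.left, self.right, self.d2root = left, right, None

def tag_depths(root):
    # BFT; each parent writes its own depth + 1 into its children,
    # so every node remembers its distance to the root
    root.d2root = 0
    q = deque([root])
    while q:
        node = q.popleft()
        for child in (node.left, node.right):
            if child:
                child.d2root = node.d2root + 1
                q.append(child)

leaf = Node()
root = Node(Node(leaf), Node())
tag_depths(root)
print(leaf.d2root)   # 2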

##IV Q:Name some big-impact projects U contributed to

Kyle pointed out that your change could be a 10-minute job with big impact, but only because there was already an infrastructure. You leverage the infrastructure to create the impact. For a story to be compelling, it has to show complexity and personal contributions

  1. NYSE integrated feed. Most of the financial data websites get nyse data from us
  2. Baml muni trade execution — rewrite from scratch
  3. billing/commission calc at PWM.
    1. eg: mortgage commissions from scratch
    2. eg: special investment (hedge funds) commission rewrite
  4. Citi muni re-offer engine. Biggest muni house in the U.S. Lots of retail investors buy muni bonds through their investment advisors. My changes are actually incremental.

always separate write^read traffic

Rahul pointed out my “simplistic” thinking. Now I feel there’s no good reason to create a web server to handle both read and write requests.

A Read server has a sizable data cache to service client requests. This cache gets updated ….

A Write server (“writer”) has no such data cache, but it might have an incoming request queue + a downstream queue.

The incoming queue introduces delay, but users who send updates often understand that writes take longer than reads.

The downstream queue is relatively new to me, so here’s my hypothesis —

Say 100 writers all need to get their records persisted in a central data store. The infrastructure at the central data store is now a bottleneck, so the 100 writers send their records in a queue rather than wait indefinitely. The writers can then handle other incoming requests.

FizzBuzz in O(N)

Q (Leetcode ) Write a program that outputs the string representation of numbers from 1 to n. But for multiples of three it should output “Fizz” instead of the number and for the multiples of five output “Buzz”. For numbers which are multiples of both three and five output “FizzBuzz”.

What if there are not two divisors but many more, each a prime number?

====analysis

In the standard solution, there’s a hashtable O(NK) algo — for every int 1 to N, check the K divisors.  Here’s a O(N) solution:

  • first populate the return vector with the basic strings
  • Then make a strided iteration to append “Fizz” on every multiple of 3.
  • Then make a strided iteration to append “Buzz” on every multiple of 5.

The fact that prime numbers become very high very soon helps our bigO:

N/3 + N/5 +… + N/31 + N/37… will be lower than 5N
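A small python sketch of the strided idea, generalized to a {divisor: word} map (my own illustration):

def fizz_buzz(n, words=None):
    # one strided pass per divisor, so the total work is N/3 + N/5 + ... rather than N*K
    words = words or {3: 'Fizz', 5: 'Buzz'}
    out = [str(i) for i in range(n + 1)]    # index 0 unused
    for d, w in sorted(words.items()):
        for i in range(d, n + 1, d):
            out[i] = w if out[i].isdigit() else out[i] + w   # replace the number, or append to an earlier word
    return out[1:]

print(fizz_buzz(15)[:5])    # ['1', '2', 'Fizz', '4', 'Buzz']
print(fizz_buzz(15)[-1])    # FizzBuzz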

subOptimal $$valuable solutions#CSY

My friend Shanyou consistently holds a position like “I don’t need to find those better solutions (forget the “optimal”). If my solution is brute force or non-optimal, at least it works, so it will be good enough for many interviews and good enough for my coding drill. Many candidates can’t find even a brute force solution in the given time, so if I strengthen my capabilities to this level it’s valuable.”

Admirable !

That’s inner strength — 定力
That’s much-needed focus
That’s undervalued realism
That’s celebration of every progress
That’s a pat on our own back

The all-or-nothing, single-minded pursuit of optimal solution can often wipe out my motivation, joy, satisfaction, self-esteem.

It’s a common and fatal de-motivation to tell ourselves (like XR … ) “There’s no value in pursuing non-optimal solution.” Well there ARE a few:

  1. pride and achievement from completion of a tough challenge
  2. ECT, speed
  3. syntax idioms
  4. self-mastery, internal locus of control, similar to my yoga practice
  5. absorbency — demonstrate to myself; self confidence in absorbency
  6. joy, self-satisfaction — see exactly what I want{cod`drill beside immediate IV #Rahul
  7. … Other blogs provide other perspectives

!! Many published solutions are non-optimal in O()
!! Most solutions out there are non-optimal in terms of concise implementation, even if matching the best solutions in big-O.

Some interviewers only care about big-O and don’t care about too many loops; Other interviews want an elegant, simple solution; The most extreme interviews (Indeed, FB?) score candidates on syntax and code quality too. Which one do you target? Realistically, I would take Shanyou’s target.

Q: your real goals, motivations for coding drill? Absorbency? Burning pleasure? Satisfaction (as Rahul puts it)? Maintaining a hobby?
So defocus can help. The focus on “optimal” creates negativity.

— Many non-optimal solutions are worthwhile.

  • Some of them offer insight into the structure, constraints… but may not offer enough insight to solve the problem
  • Some of them are non-optimal but optimal solutions to modified problems.
  • Some of them are non-optimal but break down the original problem into smaller, familiar problems. This process builds the xRef I need for thick->thin. The breakdown technique can be reusable.
  • Some of them offer reusable techniques
  • Some of them are non-optimal but very short and clever
  • Some of them (my homemade) are unique and innovative … worthwhile achievements.
  • Some of them are good enough for “2nd-tier” coding interviews

 

## halo knowledge to impress west coast interviews

My naive list of "halo" technical knowledge that might impress some west coast interviewers.

  • order stat tree
  • Morris tree walk – O(1) space
  • path compression in union-find
  • hash table worst case is no longer O(N), as of java8
  • shortest-path algorithms
  • Fibonacci and other advanced heaps — I discussed it at Facebook
  • tail recursion to reduce space complexity
  • worst case quick sort and randomized pivot selection
  • bucket sort
  • insertion sort as the fastest sort for small or nearly-sorted inputs
  • radix sort for floats and strings

good@pushing myself

I am good at pushing myself to the limit, sometimes beyond my limit, even if I may not be “good at” these activities in a competitive sense.

  • yoga
  • jogging
  • coding drill
  • camp out

Coding drill is arguably the most interesting case study. Other people (including students) may complete more problems, but I probably spend more time.

  • Ranked by number of problems, I’m just average
  • ranked by hours spent, I’m much higher
  • ranked by effort, I’m even higher, as I push myself harder than others

generate combinationSum compositions #backtrack up]trie #2review

Q: https://leetcode.com/problems/combination-sum/description/

Given a set of unique candidate numbers and a target number, find all unique combinations of candidates, where each combination sums to target. Each candidate may be used repeatedly.

My solution is https://github.com/tiger40490/repo1/blob/cpp1/cpp/combo_permu/comboSum.cpp , showing a reusable backtracking technique described below. It’s clever and brief. However, the efficiency is questionable. Memoization might be applicable here.

This backtracking relies on a key insight. Suppose we have target = 7 and X=2,Y=3,Z=4 as the candidates.

  • when we try a Y, we don’t need to try any more Xs. For example, If we are trying XY, then all XX* solutions are already handled by earlier recursive calls.
  • each combo sequence is naturally pre-sorted internally.
  • Also, “lower” formulas are generated before “higher” formulas
                 root
          x        y     z
      /   |  \
     x    y   z       
    / \   | \  \
 xxx  xxy | xyz \ 
         xyy     xzz
void //can return something if needed

recurs( solutionsFound &, //growing
        curPartialSolution &, 
// above collections could be global variables, to simplify things

        remainingCandidates, /*probably an immutable global array. 
If we need the remaining array to shrink, we can still rely on startIndex to skip used candidates.*/

        gapFromTarget, 
        startIndex, //used to select from remaining candidates
) 

Inside this function, we scan remaining candidates starting from startIndex. Typically in one iteration

  1. we add a new candidate into curPartialSolution
  2. we call recurs
  3. we remove the last added candidate from curPartialSolution to restore the original curPartialSolution — backtracking up the tree.
  4. move on to the next candidate

I feel this is more greedy than DP
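Here's a compact python rendition of the same backtracking idea (my own illustration, not the c++ solution linked above):

def combo_sum(candidates, target):
    # backtracking up and down the virtual trie: at each level we only look at
    # candidates[start:], so xy is tried but yx never is -- no duplicate combos
    candidates = sorted(candidates)
    solutions, partial = [], []

    def recurs(gap, start):
        if gap == 0:
            solutions.append(partial[:])    # snapshot of the current path
            return
        for i in range(start, len(candidates)):
            c = candidates[i]
            if c > gap:
                break                       # sorted, so bigger candidates can't fit either
            partial.append(c)               # descend
            recurs(gap - c, i)              # same i: a candidate may repeat
            partial.pop()                   # backtrack up the trie

    recurs(target, 0)
    return solutions

print(combo_sum([2, 3, 4], 7))   # [[2, 2, 3], [3, 4]]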

what could keep up%%coding drill]SG

  • I think the west coast salary would continue to stay high for a few years
  • I may see more examples of older programmers (like me) getting nice offers on the west coast.
  • I am sure to see more friends (in their 30’s) getting nice offers on the west coast.
  • Even in Singapore, there are a few west coast shops including Google and facebook.
  • Rich startups are always attractive to me. I was told some of them use similar coding tests.

Above are the “evidence” that I would see over the next 24 months. But how would I react to them?

Fundamentally, I feel confident I can improve on the algo and coding tests, if I work hard and keep up the effort. Internal locus of control. Self-efficacy

Fundamentally, I always want to stay relevant to the tech economy and various job markets. I never want to be sidelined.

You mentioned the lack of personal time. Big challenge. Another challenge is a new job where I must work extremely hard to survive. First X months I would focus on the new job and kids. Once I have some free cycles, I will try to resume coding drill.

I don’t know how, but I hope to find a bit of time each month (not each week) working on some challenging questions, or at least review some old questions.

suffix array for a haystack #learning notes

For an important usage, see find All hiding places of needle]haystack #suffixArray

— relationship between

  • Suffix array — lists all suffixes of a haystack in lexicographic order, each represented by its start position.
  • LCP array —- Contains the maximum length of prefix match between two consecutive suffixes, after they are sorted (as in a dictionary) and stored in the suffix array.

Each array element describes one suffix.

Both are integer arrays of length N (i.e. haystack length). LCP saves some lengths; Suffix array saves head positions within haystack

By definition, the LCP is always “helper” for the suffix array. In contrast, the suffix array can be useful by itself.

find All hid`places@needle]haystack #suffixArray

Now I think suffix array is probably the best solution. Suffix array can be built in O(N) and can help any needle1, needle2, needle3 etc. This O(N) cost of pre-processing is small if searched repeatedly.

The Aha — suffix is sorted by the (suffix) string content. Therefore, all the “similar” suffixes club together.

Using the example in https://en.wikipedia.org/wiki/Suffix_array#Applications, if I want all matches for needle “AN” in haystack BANANA, using the suffix array, even a simple ( inefficient ) linear scan would find A, AN, ANANA in a “cluster”. In general, O(logN) binary search is good.

This usage requires only suffix array, but many other usage scenarios require the helper LCP array.

Note the preprocessing is on the haystack not the needle.

Note building the suffix array is an one-time investment , for repeated usage. If used only once, then total cost is O(N) to build and O(logN) to use this array. I think it is comparable to, not worse than, brute-force.
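A toy python sketch (my own): a naive O(N^2 logN) build, nowhere near the O(N) construction mentioned above, plus the binary-search lookup on BANANA:

import bisect

def build_suffix_array(haystack):
    # naive build for illustration only; production code uses SA-IS / DC3 for near-linear build
    return sorted(range(len(haystack)), key=lambda i: haystack[i:])

def find_all(haystack, sa, needle):
    # binary-search to the first suffix >= needle; the matches then sit in one cluster
    suffixes = [haystack[i:] for i in sa]   # materialized for clarity; real code compares in place
    lo = bisect.bisect_left(suffixes, needle)
    hi = lo
    while hi < len(suffixes) and suffixes[hi].startswith(needle):
        hi += 1
    return sorted(sa[lo:hi])

hay = 'BANANA'
sa = build_suffix_array(hay)
print(sa)                        # [5, 3, 1, 0, 4, 2] i.e. A, ANA, ANANA, BANANA, NA, NANA
print(find_all(hay, sa, 'AN'))   # [1, 3]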

concretize asterisk among brackets #Okao

https://leetcode.com/problems/valid-parenthesis-string/

Q (Leetcode 678): Given a string containing only three types of characters: ‘(‘, ‘)’ and ‘*’, write a function to check whether this string is valid.  An asterisk can concretize to empty string or a left or right bracket

====analysis

Kyle reviewed an O(N) solution, but it’s advanced. Not really a “Medium” question if you want optimal. However, in an interview perhaps a less optimized solution is good enough?

Worst data set will point out the constraint and structure, as explained in CIV: we wish all corner cases+worst data sets r given and clarify_requirement means

Need to use spreadsheet to work out a good sample data.

I feel confident to crack it if I get a good sample including some worst data.

— published O(N) solution

One scan. At each position ask the question “How many extra openers can there be up to this char?” When the answer is negative, we give up and return false, but usually the answer is a range like 0 to 2.

Use two global variables to record the lowest answer, and highest possible answer.

  • When we hit a ‘(‘, lo++ and hi++
  • when we hit a ‘)’, lo-- and hi--
  • when we hit a ‘*’, the range expands:
    • lo--. The lowest possible extra-opener count is one lower now, since the asterisk can be a closer
    • hi++. The highest possible extra-opener count is one higher now, since the asterisk can be an opener

I think at end of the scan, lo should be zero, i.e. zero-extra-opener is at least possible.
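A python sketch of that published solution (my own wording). Two details the bullets gloss over: return false as soon as hi dips below zero, and clamp lo at zero along the way:

def check_valid_string(s):
    # one scan, tracking the range [lo, hi] of possible counts of unmatched '('
    lo = hi = 0
    for ch in s:
        if ch == '(':
            lo += 1; hi += 1
        elif ch == ')':
            lo -= 1; hi -= 1
        else:               # '*' can act as '(', ')' or empty
            lo -= 1; hi += 1
        if hi < 0:          # too many ')' no matter what the asterisks become
            return False
        lo = max(lo, 0)     # extra-opener count can't go below zero
    return lo == 0

print(check_valid_string('(*))'))   # True
print(check_valid_string(')('))     # False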

java off-heap memory #cf database,reinterpret_cast

  • justification — see “jGC” item
  • tradeoff — ?
  • java.nio.ByteBuffer — non-blocking IO api is very relevant. It has API to access memory-mapped files, among other things. More often, I think we probably use NIO to request a big chunk of memory from kernel, and bypass the java heap.
  • serialization — most (if not all) off-heap techniques require objects serialized and stored off-heap.
  • byte array — Unlike an object pool, off-heap memory solutions must serialize Trade objects into a binary byte array [1] and store it. It has to be transformed back to Trade objects before use. C would use reinterpret_cast() — no disk involved.
  • offloading jGC — With huge data moved off the heap, jGC is now much faster, and jitter-free. This is the main goal and justification of off-heap
  • bigger — off-heap storage can be bigger than on-heap storage.
  • slower — off-heap storage is slower (less immediate) than on-heap, but faster than disk or database[2].
  • performance boost — even though “slower”, it can boost performance of GC.
  • memory sharing — between two JVM’s is easier if using off-heap. I think memory-mapped file is one means.

[1] In this way, the NYSE XDP protocol could arguably be considered an off-heap serialization protocol.

[2] database is actually a competing alternative solution for some use cases like “shared-cache” or “large lukewarm cache”. I feel database solution is very proven, reliable but slower than off-heap.

Among the many disparate online articles about off-heap, here are my top 2 fav

  1. https://stackoverflow.com/questions/6091615/difference-between-on-heap-and-off-heap
  2. https://www.tutorialdocs.com/article/spark-memory-management.html

4 competing domains to Support dev salary@WSt

In terms of supply/demand, there are systemic forces that support the relatively high developer salary in investment banks.

  • #1 pure tech shops — generate high profit, attract hot investment
  • #2 buy-side shops including mutual funds
  • #3 small startups — only in U.S. can they offer competitive salary
  • #4 exchanges and fintech shops including bbg, Reuters, S&P. Some (Miami, NYSE..) seem to generate high profits
  • #5 package software vendors

Few of these categories exist in Singapore.

hashtable resizing ≅ jGC jitter #unpredictable

“Note because of the rehashing issue – a realtime applications and applications that need low latency- should not use a hash table as their data structure.” — said one guy on StackOverflow.

I would say if we have relative confidence about total data size (like below 1 million) then we can still use hash table. A benchmark test is not hard to set up, to compare against alternative designs such as RBTree. For large data sizes, I think hash table would win.

Note in latency sensitive systems, resizing (and stop-the-world GC) is likely to happen at busiest time.

Can we trigger a resize? Just send in more data to build up the table.

Can you trigger a pre-emptive GC? See can you force garbage collector to run@@

 

demanding mental power: localSys imt QQ

I told grandpa that I have noticed signs that my mental power is declining, mostly in localSys and local codebase absorbency. However, localSys has always been a weakness in me. The decline, if any, is very gradual.

QQ learning is still in good progress. Experience, accumulation, deepening.. thick->thin.

Note A.Brooks’ article talks about creativity, innovation, analytical … My issue is mostly memory capacity and speed. Recall the Indeed interview.

PIP@Macq: tough judge@%%design

If I were the judge, then Kevin’s solution may get rejected or rated mediocre.

I think the judgement can be unreasonably tough when the judge herself is a practitioner — consider Yang and Sundip Jangi.

On the other hand,

  • Yang liked my OO design in EOS
  • Sundip liked my personalization design

The outcome (PIP etc) doesn’t mean my work (i.e. output) is sub-standard. The outcome has many reasons and causes.

I need to be fair and impartial to myself. [[learned optimism]] uses the three P’s. One of them is Personal.

represent trees with unbounded fanout

When I first read it in [[CLRS]], I didn’t know why trees exist with unbounded fanout, until I read about https://en.wikipedia.org/wiki/Suffix_tree.

Insight —- If we don’t know how many child nodes a parent can have, then it’s not easy to define TreeNode fields to hold children. If you use a vector or linked list within a TreeNode (perhaps a java developer), your space efficiency will be poor.

The only way I know is a left-child-right-sibling scheme — TreeNode A has a pointer to A’s first child. If A is also one of the children of TreeNode P, then A will also have a pointer to a nextSibling , possibly null.

insight —- https://en.wikipedia.org/wiki/Left-child_right-sibling_binary_tree reveals that this scheme uses a binary tree to represent a k-ary tree

This is one of the few usages of linked list to my knowledge.
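A tiny python sketch of a left-child-right-sibling node (my own illustration):

class TreeNode:
    # two pointers per node, regardless of fanout
    def __init__(self, data):
        self.data = data
        self.first_child = None     # points down to the first child
        self.next_sibling = None    # points right to the next child of the same parent

    def add_child(self, child):
        child.next_sibling = self.first_child   # prepend to the sibling list
        self.first_child = child

    def children(self):
        c = self.first_child
        while c:
            yield c
            c = c.next_sibling

root = TreeNode('P')
for name in 'ABC':
    root.add_child(TreeNode(name))
print([c.data for c in root.children()])   # ['C', 'B', 'A'] -- prepended order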

touch not cross: path between 2 corners

Q1: from p 183 [[discrete math]]: given an n x n grid. Start from the north-west corner, moving south or east each step, towards the south-east corner. The diagonal connecting them can be touched from the north, but not crossed. Print all paths.

easier to treat origin as [0,0] and end as [N,N]

DFT will require deep recursion.
BFT (with color) where each node remembers all paths-from-root? Kinda brute force

Insight — Actually this is not necessarily a graph problem though it can be solved that way.

Q2 (accepted@leetcode ): Given n pairs of parentheses, write a function to generate all combinations of well-formed parentheses. For example, given n = 3, a solution set is:

[ “((()))”,
“(()())”,
“(())()”,
“()(())”,
“()()()” ]

====analysis
Four related problems — These two problems are related to the abbreviation generator i.e. the combination generator.

However, the abbr/combo generators are more versatile and possibly overkill for this problem. These two problems can use a bit array to represent the output. I think my solution on github is probably considered inelegant but I don’t care. BigO insight — the number of paths (or valid strings) is the Catalan number, roughly 4^N/N^1.5, so any solution that enumerates them all can’t beat that bound.

In general, our own ideas are often inefficient. If efficient, then often inelegant by some arbitrary interview standard. Still more valuable than learning standard solutions. One of the biggest values is xRef which helps build insight, intuition, thick->thin.

3reasons coding drill is insufficient for getting offers

Q: why do many students and professionals practice hundreds of coding questions and still fall below the bar?

— Reason: no similarity — many problems I got are not really similar to my past Leetcode questions. They are really new problems to me.

However, It’s possible that another candidate would say “Oh this is similar to problem #1012” because he has done 400% more problems.

Some employers avoid published questions altogether. They may tweak their problems to make them original.

— Reason: hidden similarity — when interviewer poses a coding question, it may be actually similar to a practice problem we did last month, but can we spot the hidden similarity? The similarity is often well-hidden. To really see it, we often need training and insight into the structure and key constraints in a given problem.

Without the right key, we can get misled into thinking it’s similar to the wrong problem. We all have war-stories where “somewhat similar” is very different from really similar.

To develop the insight into those structures, I find it necessary to compare past problems, and develop pattern-recognition. 读书从厚读到薄 or synthesis learning.

— Reason: forgetting — Without sufficient refresh, many candidates simply forget 50%+ of the past problems they practiced, esp. if they have done 200+. Unfortunately, knowing a few key points is often insufficient at high-end interviews like Facebook.

convert 512bit int to bit-vector #python=best

In coding questions, int-to-bit_vector conversion is common. I now think python offers a simple clean solution, applicable on integers of any size:)

'{0:032b} <-32|8-> {0:08b}'.format(myInt) # format the same int as a 32-char and an 8-char binary string
  • first zero refers to argument #0 (we have only one argument).
  • The zero after colon means zero-padding
  • The last ‘b’ means binary
  • Don’t forget your (dental) braces 🙂

Note this is the str.format() method, not the global bulitin function format()

Return value is a python string, but easily convertible to a vector

What if the number is too large? My {0:08b}  example shows that if the int is above 255, then python does the safe thing — extending the output beyond 8-element, rather than truncating 🙂

To convert a bit string back to an int, use int(bitStr, 2) # 2 is the base i.e. binary. (If you hold a list of ‘0’/‘1’ characters, join them into a string first.)

data-heavy big webpage: 99%content is received incrementally

Rahul introduced me to this concept briefly, so here’s my own guess.

Context — soccer world cup page with live comments. Heavy updates. Jargons

  • record — a comment, with some additional data fields
  • pull — browser-initiated, can be periodically scheduled
  • push — server-initiated
  • server-side cache — for each userID (or IP), server remembers what records already sent
  • client-side cache

Initially when a user visits this “frontpage”, the web server has to deliver quite a lot of records. Heavy load on the server.

Thanks to pagination, browser can immediately show the 10 latest comments right away. As user scrolls down, browser would read the client-side cache and display 10 more comments, without a round trip.

If user scrolls fast, then browser may (?) need to make another request. In any case, server’s response continues to trickle in carrying the #20+, #30+, #40+ records in the background, asynchronously. In my jargon, this is not server push, but rather the slow response to a client-pull request.

When user hits Refresh button, client sends a new pull. A smart server remembers the most recent record id, so its response would only carry more recent records. Browser would receive the payload, merge it with the client-side cache and use AJAX to redraw the screen.

Even without user’s refresh, the server also can push (or a periodic client-pull) latest updates. I feel server-push is more efficient than a periodic client-pull, and preempts a “run” on the server.

Without auto updates, the page would become static and outdated, like web 1.0. When user hits Refresh, server would receive a brand-new pull request — heavy load.

best2subArrays show`max diff in sums #70%

Q (careercup): Given an array of N signed integers. Find two disjoint contiguous sub-arrays such that the absolute difference between the two sub-array sums is maximized.  The sub-arrays should not overlap. O(n^2) algorithm was not accepted.

eg: [2 -1 -2 1 -4 2 8] ans: (-1 -2 1 -4) (2 8), diff = 16

==== analysis
int-array problems are my relative weakness but I have done quite a few. I feel this is more about the algo rather than the implementation.

Without loss of generality, let’s assume the best result includes a min subarray on the left and a max subarray on the right.

–idea 1: a partition point at 5 splits the array to 0-4 and 5-end
For a given partition point, we can find the min subarray on the left and also max subarray on the right.

A O(N^2) solution can be designed based on this idea — at each partition point, O(N) search of min subarray sum on the left and max subarray sum on the right

—idea 2: 2 colliding pointers le and ri.
0-le contains a min subarray, and ri-end contains a max subarray. But how do I move the 2 pointers?

— idea 3: O(N) scan fwd to record at each position “max sum ending here” and “min sum ending here” .. two numbers. Ditto reverse scan. Total 4 numbers per tuple.

A subarray is never empty, as produced from my minSubarraySum.py.

End up with N-array of tuples like {a,b,c,d} . Then scan this array in O(N)
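Here is my python reading of idea 3 (untested against the careercup judge): four Kadane-style sweeps, then one pass over the split points:

def max_abs_diff(arr):
    def sweep(a, pick):
        # pick = max or min; out[i] = best subarray sum within a[0..i]
        out, cur, best = [a[0]], a[0], a[0]
        for x in a[1:]:
            cur = pick(x, cur + x)      # Kadane step
            best = pick(best, cur)
            out.append(best)
        return out

    pre_max, pre_min = sweep(arr, max), sweep(arr, min)
    suf_max = sweep(arr[::-1], max)[::-1]   # suf_max[i] = max subarray sum within arr[i..]
    suf_min = sweep(arr[::-1], min)[::-1]

    return max(max(abs(pre_max[i] - suf_min[i + 1]),
                   abs(pre_min[i] - suf_max[i + 1]))
               for i in range(len(arr) - 1))       # split between i and i+1

print(max_abs_diff([2, -1, -2, 1, -4, 2, 8]))   # 16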

lone-wolf hidden ] AAABBBCCC unsorted #52%

Q[Lv] 60%: Given an array of integers where every element appears three times except for one, which appears exactly once, find that single one. Could you implement it in linear time without using extra memory?

====analysis

peek? Not yet

well-defined problem:)
greedy?
O(1) space probably means swapping
— Idea 1: with more space, I can use a hashtable of count to achieve O(N)
— Idea 2: with O(1) space I can also sort in O(N logN)
— Idea 9: My O(N) algo, applicable for any-size integers and also other than “three”.

Pick a random pivot and partition in O(N) time and O(1) space. Also keep track how many repetitions of the pivot value (probably 3). Exclude the pivot value, count size of both partitions and discard the one whose size=3X. Repeat.

N+N/2+N/4+N/8 …= 2N

3-sum #52%

Q[L] Given an array nums of K integers, are there elements a, b, c in nums such that a + b + c = 0? Find all unique triplets in the array which gives the sum of zero. Doable in O(KK)

====analysis

— Solution 1: separate into (M) non-negatives and (N) negatives. M+N=K.

— solution 1a: given M targets and N negative items,

if M > N, then check O(NN) pairs among negatives …. and look up the sum in a pre-populated hash table of M items.
If M < N, then use the two-sum algo O(MN)

Total O(MN + MM) for small M
Total O(MN + NN) for small N

Both come out to O(MN + min(M,N)^2), smaller than O( (N+M)^2 )

Note some bigO students don’t like too many tokens and may simplify it to O(KK), but I think it’s imprecise… see when to introduce new tokens in O()

— solution 1c: given M targets and N negative items,

I can sort the N-array. For each target, I can binary-search, but this is possibly O(MN logN)

stigma^trauma^gzPIP^esteem^1stAid

  • I would say t_trauma is a form of t_stigma, but deeper, more impactful
  • stigma — stems from PIP and bonus
  • “esteem” and 1stAid are positive, but 1stAid has immediate impact.

Many of these posts are a subset of PIP or Mgr^Contractor. My self-esteem crisis is invariably triggered by these two sources, but ..

.. in terms of severity PIP is 10x heavier than peer comparison.

 

max-sum path up+down binTree #FB 60%

Q2 (Leetcode “hard” Q124): find max path sum, where a ” path ” has minimum one node A and can include A’s left child and/or A’s right child. No uplink available.

====analysis

I see two types of paths

  1. down the tree, starting at some upstream node A ending at a node in A’s subtree
  2. up-down the tree, starting at some A’s left subtree, ending somewhere in A’s right subtree.

For 1) https://bintanvictor.wordpress.com/wp-admin/post.php?post=23360&action=edit has my tested solution

For 2) post-order walk to update, at each node, a (signed) max-path-sum-from-here. See https://bintanvictor.wordpress.com/wp-admin/post.php?post=31601&action=edit. The 2 values from the left+right children can solve this sub-problem.

Q: can we use the 2) solution for 1)? I think so
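A short python sketch of the post-order idea for type 2, which also covers type 1 (my own illustration; the N helper class is made up):

def max_path_sum(root):
    best = float('-inf')

    def down(node):
        # post-order: returns the best downward path sum starting at this node (type 1),
        # while the global best also considers the up-down path through this node (type 2)
        nonlocal best
        if node is None:
            return 0
        left = max(down(node.left), 0)      # a negative branch is better dropped
        right = max(down(node.right), 0)
        best = max(best, node.val + left + right)   # up-down path through this node
        return node.val + max(left, right)          # only one branch may continue upward

    down(root)
    return best

class N:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

print(max_path_sum(N(-10, N(9), N(20, N(15), N(7)))))   # 42, i.e. 15 + 20 + 7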

BST: finding next greater node is O(1) #Kyle

https://stackoverflow.com/questions/11779859/whats-the-time-complexity-of-iterating-through-a-stdset-stdmap shows that in a std::map, the time complexity of in-order iterating every node is O(N) from lowest to highest.

Therefore, each step is O(1) amortized. In other words, finding next higher node in such a BST is a constant-time operation.

https://en.cppreference.com/w/cpp/iterator/next confirms that std::next() is O(1) if the steps to advance is a single step. Even better, across all STL containers, iterator movement by N steps is O(N) except for random-access iterators, which are even faster, at O(1).

serialize binTree #funptr

struct Node{
  Node *left, *right; // both must be pointers; "Node * left, right;" would declare right as a (non-pointer) Node
  int data;
};

… If you simply output as “data,leftId,rightId | data,leftId,rightId | …” then the graph structure i.e. the links would be lost. Therefore, I feel the stream needs to identify each node: “Id,data,leftId,rightId|…”. Next question is how to assign the id values. Easiest — use the node address as a string id! Suppose there’s a treewalk algo walk(Node * root, void ( *callback ) (Node *)); I will call it like

walk(root, serialize1node); // where serialize1node is the callback.

Note the position of the asterisk:

  • It’s on the right of type name Node — read left to right .. pointer to Node
  • It’s on the left of variable name callback — read left to right .. callback is a pointer to …

https://github.com/tiger40490/repo1/blob/cpp1/cpp/binTree/serialize_bbg.cpp is my tested solution. I feel the challenge is ECT and syntax, not the idea.

—- elegant home-grown two-pass solution

Assign incremental IDs to each node in an in-order walk. Then run a pre-order walk to output each ID.

  • 🙂 applicable for any binTree, but not 3-way trees or higher
  • 🙂 No null node needed
  • 🙂 no space overhead. Instead of serializing addresses, I serialize my generated ID for each node, which is usually smaller and never bigger than a pointer. I think the serialized byte array is the smallest possible.
  • Key insight — from a given preorder output, there’s exactly one possible BST we construct.

—- solution: standard graph representations — AdjMatrix + edgeSet, applicable on any graph

—- solution: BFT -> list including nulls

—- idea from my friend Deepak: pre-order tree walk and output one line per node {payload, direction from root, direction from Level 1 ancestor, direction from Level 2 ancestor..}. The technique is reusable even though overall efficiency and complexity is not optimal. We don’t need to be optimal.

  • first node in the file is always the root
  • the output file always includes the higher level nodes before the lower-level nodes
  • delimiter (\n) needed between nodes
  • demerit — data bloat… too verbose if tree is sparse but 99 levels deep

—- If the tree is a BST, then this idea is based on the fact that reading a series of unsorted nodes can reconstruct a BST.

For this solution, we have no requirement on the payloads. They can be booleans.

first pass — in-order walk to build a hash table of {node address -> integer id}. If we treat the IDs as node keys, then the tree is a BST.

2nd pass — BFT or pre-order walk of the original tree. For each node, we write the node’s ID to the file, without writing the links or addresses. The file looks like T,5|T,4|T,6|T,2|… When we reconstruct the tree, we read each node from the file and insert that node into our new tree, using the ID value as the key to decide where it goes.

If the original tree nodes have keys, I will just treat them as payload, and use my assigned ID as the key.
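
Here is a hedged Python sketch of this two-pass idea (my own illustration, not the author’s tested code; for brevity the assigned ID is stored on the node instead of in a hash table). Pass 1 assigns in-order IDs, pass 2 dumps (payload, ID) pairs in pre-order, and reconstruction BST-inserts each record by its ID, relying on pre-order putting parents before children.

class Node:
    def __init__(self, payload, left=None, right=None):
        self.payload, self.left, self.right = payload, left, right
        self.id = None

def serialize(root):
    counter = [0]
    def in_order(n):                       # pass 1: assign incremental in-order IDs
        if n is None: return
        in_order(n.left)
        n.id = counter[0]; counter[0] += 1
        in_order(n.right)
    records = []
    def pre_order(n):                      # pass 2: dump (payload, ID) in pre-order
        if n is None: return
        records.append((n.payload, n.id))
        pre_order(n.left)
        pre_order(n.right)
    in_order(root)
    pre_order(root)
    return records

def deserialize(records):
    root = None
    for payload, nid in records:           # BST-insert each node by its ID
        node = Node(payload); node.id = nid
        if root is None:
            root = node
            continue
        cur = root
        while True:
            if nid < cur.id:
                if cur.left is None: cur.left = node; break
                cur = cur.left
            else:
                if cur.right is None: cur.right = node; break
                cur = cur.right
    return root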

Kyle’s simple trie ] python

https://github.com/tiger40490/repo1/blob/py1/py/algo_tree/trie_Kyle.py is usable for some rare coding IVs. By Kyle Stewart.

https://leetcode.com/problems/implement-trie-prefix-tree/ has a slightly more complete requirement.

I think I can modify (not write from scratch) Kyle’s code to add the word search.
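
Below is a minimal trie sketch in Python (my own, not Kyle’s code), covering insert plus the word search and prefix search that the leetcode question asks for; the method names here are my own choice.

class Trie:
    def __init__(self):
        self.root = {}                     # each node is a dict of child characters
    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node['$'] = True                   # '$' marks end-of-word
    def search(self, word):                # exact-word lookup
        node = self._walk(word)
        return node is not None and '$' in node
    def starts_with(self, prefix):         # prefix lookup
        return self._walk(prefix) is not None
    def _walk(self, s):
        node = self.root
        for ch in s:
            if ch not in node:
                return None
            node = node[ch]
        return node

t = Trie()
t.insert("apple")
print(t.search("apple"), t.search("app"), t.starts_with("app"))   # True False True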

Looks similar to the 10-billion phone number problem

collections.Counter == unordered_multiset

From Kyle

from collections import *
c = Counter()              # a new, empty counter
c = Counter('gallahad')    # a new counter from an iterable

c = Counter(a=4, b=2, c=0, d=-2)   # a new counter from keyword args
list(c.elements()) #['a', 'a', 'a', 'a', 'b', 'b']
Counter('abracadabra').most_common(3) #[('a', 5), ('r', 2), ('b', 2)]

sum(c.values())                 # total of all counts
c.clear()                       # reset all counts
list(c)                         # list unique elements
set(c)                          # convert to a set
dict(c)                         # convert to a regular dictionary
c.items()                       # convert to a list of (elem, cnt) pairs
Counter(dict(list_of_pairs))    # convert from a list of (elem, cnt) pairs
c.most_common()[:-n-1:-1]       # n least common elements
c += Counter()                  # remove zero and negative counts

 

## same-complexity optimization in algo design

A lot of algorithm optimizations do not reduce the time/space complexity, but offer other benefits. I will use “ECO” to denote such optimizations. (E can stand for Equivalent.)

  • benefit — smaller data set
    • eg: tree pruning
    • eg: keep only peak+trough (see the sketch at the end of this section)
  • benefit — fewer edge cases
  • benefit — easier to understand
  • benefit — easier to implement
  • benefit — easier to debug and investigate.
    • eg: RBTree has this advantage over priorityQ
  • — examples:
  • eg: yield-generator
  • eg: replace RBTree with priorityQ or sorted vector
  • eg: bidirectional BFS in https://leetcode.com/problems/word-ladder/solution/

Note memoization is more than ECO.

If you can convert a recursive solution to an iterative one, then it is often more than ECO.
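
As a small illustration of the “keep only peak+trough” example in the list above (my own sketch; the helper name is hypothetical): compressing a price series to its local extrema shrinks the data set for later scans without changing their big-O complexity.

def peaks_and_troughs(prices):
    # keep only local extrema: drop the middle point of any monotonic run of three
    kept = []
    for p in prices:
        if len(kept) >= 2 and (kept[-2] <= kept[-1] <= p or kept[-2] >= kept[-1] >= p):
            kept.pop()
        kept.append(p)
    return kept

print(peaks_and_troughs([1, 2, 3, 2, 1, 5, 5, 6]))   # [1, 3, 1, 6]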

generate splits into palindromes

https://leetcode.com/problems/palindrome-partitioning/

Q: Given a string s, partition s such that every substring of the partition is a palindrome. Return all possible palindrome partitioning of s.

====analysis

I found this question quite hard and had no confidence, .. overwhelmed, like many DP problems. But then I came up with two DP solutions .. https://github.com/tiger40490/repo1/blob/py1/py/algo_str/genPalindromSplits.py

I don’t bother to run the leetcode tests as they tend to deflate my burning joy, absorbency… precious stuff.

insight — the word “partition” is horribly confusing. It can mean two very important entities in the problem domain — either a single palindrome substring, or the complete family of substrings that together form the original word. It’s crucial to avoid this word.

–idea 1: recursive top-down DP

memoization is not easy since I used a generator.
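
Here is a hedged sketch of idea 1 (a minimal version of my own, not the repo code): peel off every palindromic prefix, then recursively split the remainder, yielding each complete family of palindromes.

def palindrome_splits(s):
    if not s:
        yield []                            # one way to split the empty string
        return
    for i in range(1, len(s) + 1):
        prefix = s[:i]
        if prefix == prefix[::-1]:          # prefix is a palindrome
            for rest in palindrome_splits(s[i:]):
                yield [prefix] + rest

print(list(palindrome_splits("aab")))       # [['a', 'a', 'b'], ['aa', 'b']]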

–idea 2: iterative bottom-up DP

##robust^traumatized reactions #kids,BGC..

See also ##[19] PIP hazard=worse than kids,BMI,BGC..

“Bad things happen to all of us. What counts is our reaction.” — first heard from a PWM contractor from India.

I don’t want a big tabular analysis… Perhaps just focus on how fast I regained strength. Resilience and robustness are big words in my vocabulary. Self-esteem is important too, but optimism is more important.

  • “C” is my self-rating of my reaction to PIP —- traumatized, like a bird startled by the mere twang of a bow (惊弓之鸟). Too long-lasting, too personal, too ruminative. It /cast too long a shadow/ over my long-term career planning and job choices.
    • 🙂 However, I am always courageous and resilient enough to take up the challenge right after the PIP, and focus on work.
  • ——- all other reactions are more calm, resilient or even robust ——-
  • “B     ” my reaction to high-flier classmates —- I continue to fight the irrational, illogical reaction. The harm is much lighter than the PIP reaction.
  • “BBB ” My reaction to U.S. and Singapore immigration issues —- a little drawn out. I worried for quite a long time, proportional to the level of complexity. But I was calm and focused on the problem.
  • “BBB ” My reaction to the trespass —- severe for the first few days but I managed to shrug it off after lots of research online.
  • “BB   ” My reaction to kids’ academic difficulties and weight problems —- a bit pessimistic, but much lighter in comparison to PIP. I managed to detach myself emotionally and grow my resilience.
  • “AAA ” My reaction to investment woes —- shows a sign of strength. Resilience. I shrugged it off most of the time and focused on work.
  • “A     ” My reaction to rejections by women —- not robust, not positive. I kinda acknowledged that I have a high standard and I wasn’t so attractive on the ‘market’. I think that was fairly realistic.
  • “BBB+” My reaction to the underwhelming quant prospects and my poor ROI —- realistic and negative. I didn’t complain for long. I accepted it and put it aside.
  • “AA   ” My reaction to my c# ROI —- calm. I remain confident about my c#.
  • “AAA ” my reaction to the perceived gap behind coding test pros —- I renew my effort without over-thinking.
  • “AAA ” my reaction to contract terminations at Citi —- positive. I wasted no time. No self-pity.
  • “AAA+” My reactions to repeated interview rejections —- always robust and positive. I get right back on the horse after I fall.