hash table time complexity: O(N)^O(logN)^O(1)

Q: When interviewer ask about time complexity, shall I point out my hash table is O(log N) not O(1)?
A: if time permits, I would briefly mention O(N) and O(logN) worst case
A: if no time, then just say typically O(1). I don’t think we would lose many points.

https://stackoverflow.com/questions/9214353/hash-table-runtime-complexity-insert-search-and-delete has a fairly authoritative answer. Basically, it is indeed O(N) worst case in general but

  • can improve to O(logN) in many situations where java8 technique applies
  • in other cases it is O(N) in theory, but extremely rare in practice. The only known case is a deliberate DoS attack.

void list.reverse() ^ reversed()→iterator #sorted(),items()

These three constructs do three different things, without overlap

  • mylist[::-1] returns a reversed clone
    • myStr[::-1] works too
  • mylist.reverse() is a void in-place mutation, same as mylist.sort()
  • reversed(mylist) returns an iterator, not printable but can be used as argument to some other functions
    • I feel this can be convenient in some contexts.
    • reversed(myStr) works too but you can’t print it directly


  • Unlike items(),dict.iteritems()return iterator

Different case:

  • sorted(any_iterable) returns a new vector (i.e. “list”). Returning an iterator would be technically stupid.

##Design competency based@localSys nlg

Have I blogged about this topic elsewhere? Perhaps some elements of this topic. Today I’m specifically addressing — Your competitive strength in tech designs based on localSys.

My self-assessment — I think I’m better than average with my design skill, assuming roughly equal level of localSys know-how. I’m more detailed, so I tend to take care of many localSys constraints.

Q: do other people see my strength in this space?
%%A: some colleagues do.  Some managers do, such as Yang, Sundip, Ravi, Anand, Saurabh (bbg module), Venkat@RTS (on config generator)

I basically chose not to care much about their appraisal. The risk is — The more I care about quality of my design and other people’s appraisal of it, the more contention, more emotional attachment

Therefore, My attitude — I put in (creative) effort but if they don’t like my design I won’t insist.

Nowadays I avoid design confrontation as I risk losing my reputation. I got my fingers burnt before, including Macq, GS .. when I cared too much about my designs. I think Rahul, Zhurong .. are unlike me. They put their heart into it … and cared about the designs.

— Scale — I usually refer to module-level, usually an enhancement or small feature. Bigger projects seldom come to me as a contractor or new-comer but here are some examples

  • 95G: race condition control via stored proc
  • 95G: wait/notify
  • Barclays everything
  • RTS
  • SwapBust
  • SIM trailer replacement for AISE
  • mtg comm

— job hopping factor  — I feel I have benefited from frequent job changes. I feel I had more success stories to tell. However, due to limited localSys, I wasn’t confident/competent to create bigger designs.

— Strength/weakness view
I think my competitive weakness is brown field GTD — figure-out-speed, localSys learning speed, speed code reading …

In green field, or design projects, I’m not so weak, but not always strong either.

I think some guys can design against ambiguity more effectively.

multicast: multi-player games

  • data loss is tolerable
  • many-to-many messaging — with multiple senders and multiple receivers

https://www.cisco.com/c/en/us/about/press/internet-protocol-journal/back-issues/table-contents-19/reliable-multicast.html says

Collaborative applications such as data conferences (whiteboarding) and network-based games are many-to-many applications with ….

… modest scaling requirements of less than 100 participants.

This kind of application requires low latency of less than 400 millisec.

Transmission does not always need strict reliability; for example, refresh of background information for a network game could wait for the next refresh.

story: %%investigation skill #etscfg

Hi Paul, In my etscfg I used this simple ifelse:

1        %IFELSE{defined(SOME_CHECK)}%
3        %ELSE%
5        %IFELSE%

To my surprise, Line 4 comes out preceded by a new-line when the if-condition fails. This impacts a lot of config artifacts that shouldn’t be impacted.

On the other hand, when the if-condition evaluates to true, I get exactly Line 2 and I don’t see any new-line added anywhere. This is correct output.

— Investigation —
On my Line 3, the implicit new-line character right after %ELSE% is incorrectly included in $else_block and printed out before Line 4, causing the surprise result. Attached are two test programs showing

my ($if_block, $else_block) = split(/(?:[^\w\n]*)\%ELSE\%/, $block); ### should be
my ($if_block, $else_block) = split(/(?:[^\w\n]*)\%ELSE\%\n?/, $block); ### to include trailing newline in delimiter of split()

In my fix, I also added a “guard” to deal with any trailing spaces sandwiched between %ELSE% and new-line character. I know sometimes I can leave a trailing space sandwiched there.

Altogether, my fix consists of two changes

• 1 new line of perl
• 1 modified line of perl


constexpr phrasebook

  • compile-time — value known at compile time. More strict than regular q[ const ].
  • compiler enforced — if a constexpr doesn’t have a compile-time known-value, then compiler will complain
  • subset — constexpr objects are a subset of const objects
  • — usages:
  • integral NDTTP values such as std::array length parameter. See [[effModern]]

linear probing+alternatives #phrasebook

collision — Each hash table has a hash collision resolution strategy. Common choices include separate-chaining (SC), linear-probing (LP), double-hashing (DH)

  • cache — In terms of CPU d-cache efficiency, linear probing beats quadratic-probing which beats separate-chaining. CPU cache efficiency is more relevant for very hot data content.
  • clustering — Like street-parking, this can hurt linear probing. Quadratic-probing and double-hashing can cope better.
  • failure — failing to locate an open slot (when available) is a problem with quadratic probing. DH, LP and SC are guaranteed safe.
  • open-addressing — is alternative to chaining. It includes LP, DH etc.

job hopper: more common@WestCoast

I first noticed that west coast employers didn’t care about my job-hopper profile. Then I saw more evidence

## localSys note-taking: choices

Other choices include jira

paper Windows10 StickyNotes MSOL Notes #can hide 🙂 Tasks mail folders recoll #1 default
Encourages Periodic refresh AAA unsuitable B # everything  is temporary C # “browse” F #needs discipline F #needs discipline
browse A B #click to see details A B #copy-edit subject B #click to see details
searchable F unsuitable unsuitable A D #too many C
Organization of 30+ topics Unsuitable F #intrusive until hidden D C # categories B AAA # hierarchical
Radiator AAA B # Kyle use it long-term F F D F
Screen shot F F F A A B
ez2edit B A A Mostly readonly A
locate by title +!search within small population only if in front A iFF tiny population B #no tree but can put into one category D #must search B #filename not always fully used

prepare Facebook onsite: reasonable pace

Reminders —

  • peek11 and “QQ” tags
  • oq tags in WP
  • trie in py

Q: difficult (medium or hard rating) leetcode problems?
A: some “easy” problems are very challenging and enriching.
A: I feel I should not take on too many problems that feel off-putting. They tend to slow me down, and deflate my motivation.

However, if a problem is a classic and offers valuable insights, then I should probably try it out, then read the solution.

There are too many problems on Leetcode and elsewhere. I have done 100+ but can’t do 200 more until my onsite interview. I think I need to pace myself and accept that I won’t get familiar problems.

Q: should I read solutions on some of the medium/hard leetcode problems after trying for a while without breakthrough?
A: Now I think i should , but I am reluctant to.
Q: focus on coding or pure algo?
A: mostly pure algo… aiming to get the gist of each problem. Implement if fun and simple, but I often get drawn into the implementation and waste my time.
Q: should I consider taking some half-day leaves when I find myself in the mood for coding drill?
A: yes, after discussing with manager.
Q: I used to spend 1 full day on one problem. Of course the best solutions only use 100 lines and will take me 1 hour to understand. Should I avoid that?
A: I think I should, given my limited time.
Goal — improve on my Indeed onsite performance.

motivation4coding drill #intrinsic

I had a short but in-depth discussion with my 17-year old intern about motivation, after he said he solved 10 "medium" leetcode problems on 4 July holiday.

Mostly I was talking, like 80%. He responded 20% of the time.

I started by saying I have been fascinated by the reasons some individuals have such high "motivation to practice", ever since I first noticed my son’s rather low motivation to practice. I tried various means of motivating my son, via "extrinsic motivation" and failed each time. I then said I myself is a good case study as I have periods of high motivation and period of high resistance.

relative importance of intelligence vs effort — IQ reduces the motivation, self-discipline and time required to complete a given amount of practice. IQ also increases the rewards.

Intrinsic motivation — is possibly missing in my son. Any amount of extrinsic motivation is unlikely to make any difference.

Age — is a fundamental factor if you ask me. Students and younger coders intuitively know they have a long career ahead of them, so coding drill can help their career long-term. In contrast, I intuitively know I am unlikely to rise up even if I improve my coding interview performance. My coding drill is helpful to the sustenance of my current career. Yes, I’m in maintenance mode, not growth mode.

generate Fib(N) in O(1)time and space: 2 techniques

Like all Constant-recursive sequences, the Fib sequence has a closed-form formula Fib(N) that returns an integer but in floating point form. By rounding, we can generate Fib(N) in constant time.

https://www.math.hmc.edu/funfacts/ffiles/10002.4-5.shtml has the formula we can easily program.

[ Phin – ( -Phi)-n ] /√5  # Notice the three minus signs

where the constant Phi := (1+√5) / 2.

Similarly, if we are given 89 as a Fib number, we can use logarithm to compute the rank of this Fib number.

https://github.com/tiger40490/repo1/blob/cpp1/cpp/lang_33template/FibDeepak.cpp shows two O(1) solutions

spare time as SlackResource #blogg`,parent`,localSys

In 2017 I wrote to Gerald Robinson that spare time depletion would become one of the key challenges/pains of my forthcoming/upcoming adjustments.

For many (probably most) of my peers, if they don’t feel the drive to keep learning in their spare time, the resource depletion doesn’t feel so harmful, hazardous, stressful.. Beside this “learning”, I also have a need for quiet time for reflection.

— commute — Bayonne commute is tolerable only when I have plenty of spare time as “slack resource”.

— Parenting — (esp. my son) became a huge commitment and strain on my slack resources. JackZhang pointed out the importance of “sufficient time for the kids once you move to U.S.”.

— Additional paid annual leave is supposed to be a slack resource but I think it doesn’t reduce the pressure to perform… but I could (as I did) spend these hours (weekends + evenings) in office to learn localSys, provided I’m engaged, in the mood.

Need to think harder about how to turn these large quantities of high-quality spare resource into stress-relief. These resources feel like untapped oil reserve, in an age of impending resource depletion.

— VP

Some VP’s like Youwei feel over-stretched/overwhelmed. I would say a non-lead role or a contract role would leave me with precious slack resources, in terms of mental energy and time.

— cash reserve is the other major slack resource…

##[19]low-churn,easy domains4career{50

I am always looking for low-churn, accumulating skillset that I can rely on after age 50, without strenuous effort. Such effort is possibly harmful for the brain.

My chosen domains are not white hot domains attracting the young bright kids. I always hope to pick up these skills and try to build them into my “portfolio” of skill assets. If 3 out of 5 decline, I still have the other assets to provide meaningful employment till 70. I have more listed in my spreadsheets “techBets” + “marketableDomains”

  • 😦 #1 example Quant dev — turns out to be too competitive. Low churn but I need strenuous effort even now.. see quant≠sure good for aging dev (also ruthless march@technology)
  • 🙂 mkt data — feels better
  • 🙂 bond math — feels even better
  • VaR math?
  • FIX?
  • algo trading — I didn’t choose it because too competitive
  • — technical skills
  • 🙂 python — worked out well
  • 😦 c# — abandoned
  • 😦 swing — abandoned
  • 😦 MOM — fell out of fashion


real effort but below-bar@@ only Macq

Macq is probably the only job where I focused on localSys GTD but still fell below the bar.

The PIP cast a long shadow and left a deep scar. Am still recovering. This scar is deeper than Stirt …

Remember the abused children, who grew up traumatized? I was not traumatized as a kid. My traumatic experience is still /devastating/ but I can handle it. I have the maturity to handle it.

Adults are often traumatized by failed marriage, in-law conflicts,..

if not4$$, y I sacrifice so much2reenter U.S.#again

Q: As of 2016 to 2019, I didn’t need high salary so badly, so what’s the real reasons why I sacrifice so much to re-enter U.S.?

A#1: foundation — rebuild confidence about career/financial foundation for next 20->25-30Y, since my passive-income/asset/burn-rate profile was (still is) far from comfortable
* age discrimination
* green card
* lower calibre requirement on a typical job .. “semi-retirement job”

A#2: self-esteem rebuild — after multiple blows (三人成虎 , three-strikes)   .. stigma
A#3: far bigger job market, providing much better sense of career safety

linkedHashSet as dynamic sorted collection: O(1)lookup/remove O(logN)insert

linkedHashSet as dynamic sorted collection with O(1) remove/lookup, but insert is O(logN)

Note the java class doesn’t support pinpoint insertion

— My idea:

one-time build a red-black tree then a linkedHashSet therefrom. Te tree is the real container. The linkedHashSet points into the tree nodes.

From now on, additional insert is O(logN) via the tree.

lookup/remove is still O(1) via the hashtable

duplicate-prevention by set^map: java and c++

The contrast of set vs map is important as most of us need to use both construccts.

( The cross-reference of java vs c++ (or c#) is only a curiosity. )

— java map.put() Replaces (and returns previos mapped value)
— java set.add() Rejects (and returns false. True would mean added-witout-conlfict )

Not sure how important this is, but if you ask “which incumbent object blocked my insert?” or “which incumbent is equivalent to me?” then I guess the java Set collection is not the best chocie.

I guess we can use a map where each key maps to itself as the value.  Since both sides are pointers, footprint isn’t doubled.

— c++ set.insert() also rejects as java does. Obviously Multiset won’t.
— c++ map.insert() is consistent with set. The word “insert” implies size increase.
— c++ map operator[] Replaces, similar to java, but more complicated under the surface.

t_siddesh^slowTrack ^t_brank^ T_effectiveOC^ t_gzPeerComp^ gzMgr

  • t_siddesh is more specific about .. a few named individuals like TYY.
  • t_slowTrack is fairly specific
  • t_brank is not on named individuals, less about peer comparison, but focused on the concept of “brank”.
  • T_effectiveOC is about .. the motto “increase effectiveness through other people”, not necessarily brank-related
  • t_gzPeerComp is mostly about “high-flying” peers, very similar to t_siddesh. Perhaps i should remove this tag
  • Most (if not all) blogposts with these two tags are logically in the gzMgr category.

QQ=fake; zbs_learning=treacherous

I used to feel my QQ knowledge was not zbs but now I think many interviewers (like myself) ask zbs questions. zbs is a vague concept. QQ has a well-established range of topics. Experts sizing up each other … is mostly QQ

Biggest danger with zbs learning — when I spend my precious spare time on zbs not aimed at QQ, I inevitably regret. Without QQ, the learning lacks reinforcement, positive feedback loop… and fades from memory.
I think in SG context the dearth of interviews make zbs learning treacherous

Therefore, zbs learning must aim at xref and thick->thin.

One guy with theoretical zbs (either strong or poor QQ) may be very mediocre joining a new team. It depends on the key skill needed in the team. Is it localSys, design …? I tend to feel what’s really needed is localSys.

nth largest element in unsorted array #QuickSelect

Q: Find the kth largest element in an unsorted array. Note that it is the kth largest element in the sorted order, not the kth distinct element.

I think this is mostly a bigO algoQQ problem.

std::nth_element is linear on average .. https://stackoverflow.com/questions/11068429/nth-element-implementations-complexities talks about QuickSelect algo

— idea 6: priority Q (Fib-heap) of size k
if any item is higher than the min, then pop min O(logK) and insert in O(1)
— idea 6: priority Q
Linear time to build it
— idea 5: 95th percentile problem from NYSE
— idea 4: multiple scans
— idea 3: segments
— Sol2: O(N). use the O(N) algo in the blog on “given int array, find median ] O(N)”. Then discard one of the two segments. Then repeat.
Note: Each time the target element must exist in one of the 2 segments.

O(N) + O(N/2) + O(N/4) … -> O(N)

— Sol2a: Use the O(N) qsort partition algo to anchor a (random) pivot element to create two segments. Our target must exist in one of the two, so discard the other by adjusting the le/ri boundaries.

This idea is same as the most voted solution in leetcode discussion.
O(N) on average — we get O(N)+O(N/2) + O(N/4) + … < O(2N)

Note average complexity is acceptable in hashtable!

versioned dict #indeed

Suppose a resume consists of key/value pairs of strings. We need to support two operations —

  1. Each write(key, value) operation creates or updates a key with a new value, and returns an incremented integer vid i.e. version ID. A version ID represents a point in history, i.e. a snapshot of the entire resume
  2. Each read(vid) returns a key/value dictionary as a complete resume snapshot
  3. * In a realistic system, we also need a deleteVersion(vid) but no need to optimize for it

Single threaded system. Version 0 is an empty resume.

Q: design an efficient solution.
Q: after creating N versions via N write() operations, what’s the time and space complexity of next write()? What’s the time and space complexity of the next read()?


Without loss of generality, Let’s say there are K fields in the resume version N. Note K <= N. I should have differentiated K vs N, which is useful for my clarity of thinking.

Holy grail — O(1) read and write. Impossible?

—Design 5 in hind sight, optimized for read(): Simple solution would snapshot entire vector of k/v pairs [1] at each write() i.e. each version. Space complexity is bad (I now believe time complexity is more important.).  Worst case is K==N as version1 creates key1, version 5 creates key5 etc.

[1] not dict because returning a hashmap and forcing client to iterate over the hashmap can lead to sub-optimal iteration efficiency. However, the requirement may say “return a hashmap”

A single read() is presumably O(K) just to read the K fields. I was convinced this was the theoretically limit for read() complexity, because I was thinking of network serialization — either my system or client system need to iterate over all fields of the resume. Now in hind sight I feel interviewer was thinking in java mode — to return entire dict by reference is O(1) rather than O(K) … I didn’t get his hint when he said O(K) read was not really theoretical limit.

Each write() clones the dict for a new vid and saves the new dict in a vector of pointers (!) . Therefore write() is O(K).

—My design 1, Based on the same idea, I used a vector for each key. However, my read() had to create a dict to be returned, on the fly in O(K). I didn’t think of return-by-reference/pointer.

—My design 2 is a minor optimization to remove the cloning of unchanged values. I thought I was on to something efficient but in java (and python?), all strings are copied by reference so my optimization is meaningless in java.

—My design 3 is lazy write. So write() only appends to one of the K vectors. The other K-1 vectors are updated only when needed by a later read() or write(). Amortized cost?

This O(1) write() and O(?) read() complexity can be emulated and surpassed by …

—My design 4 used K RBTrees (small trees) to optimize for frequent write() at O(1). Each write appends on one tree. STL insert with hint is O(1). No such feature in java or python. Read() would construct a new hashmap and return it. Efficiency is discussed shortly.

Design 4b would use a separate vector of Record{vid, value} to replace the small tree. Append is O(1).

Read(vid) requires binary search in each [3] of K trees:

Aha — After N updates, total node count across all K trees is N, so even a naive search would not exceed O(N).

If needed, I can also maintain at O(1) cost a “replay vector of originals” i.e. original {key, value} from each write(). We can use this vector for replay, discussed shortly.

[3] Q1: Can we avoid scanning all K trees when vid is small or K is much smaller than N ?

A: Like rebus, let’s maintain a separate vector (or RBTree) “birthday” holding records {vid, pointer to a single small tree}. Birthday expands (O(1)) by one element whenever a new small-tree is born i.e. new key is created. My read(vid=55) would binary-search in this birthday data structure using vid to locate the last-born small-tree before version 55, and then iterate backward to visit each small tree born before version 55. This optimized read() uses binary-search throughout to eliminate unnecessary linear search.

Q1b: Worst case read():  K == N i.e. every write() creates a new key. So read() is still O(N)?

A: read() can become O(A) where A:=difference between queried vid and any smaller vid previously cached.
A: If we find out in practice K is close to N, we can use additional data structure to memoize previous read(vid) results i.e. the new hashmaps. I can use a sparse vector of records {pointer to read(vid) result}, indexed by vid. Each read() would save, at O(1) cost [4], the result in this memoization vector. For the next read(vid=77) I would use binary search in this memoization vector to find the closest lower neighbor (eg 55), then replay from vid=77 towards  55, using XR’s algorithm.

[4] see my blogpost on memoization keyed by auto-increment id

Note RBTree supports efficient deletion from both birthday tree + small tree.


Now i feel a big string representing the entire resume is still under 100KB. I assumed were were talking about millions of fields of gigabytes each.

For a long time I was confused and assumed that each write() could update multiple key/value pairs.

cod`drill:YOUR satisfactions@@ wipe_out [def2] #Rahul

Rahul used the word “satisfaction”. (Sometimes I call it “intrinsic motivation” or “joy”.) The satisfaction factor is crucial to absorbency. It enables Rahul to put in so many hours.

Q: What are my satisfactions i.e. intrinsic motivation?
A: discover my own solutions that are reasonably efficient even if not optimal. Almost never elegant and simple.

I don’t mind reading a classic solution in a Real (possibly online) publication but I hate to read solutions in forums as XR does. Those forum posts leads to wipe-out

  • completely wipe out that precious satisfaction.
  • positive feedback loop broken
  • self-esteem destroyed
  • self-mastery destroyed
  • sense of progress wiped out
  • sense of self-improvement wiped out
  • absorbency wiped out
  • I feel /diminished/ and worth-less.
  • i feel hopeless giving up

It’s possible that the forum posters also learned the solutions from publications. But I basically assume they are just so brainy.

a standard representation of binTree

I think on Leetcode there’s now a standard representation of simple binTree. https://leetcode.com/problems/convert-sorted-array-to-binary-search-tree/ shows one example.

see also (de)serialize binary tree #funptr ECT

Let’s not spend too much time here. This representation is simple but not very visual. Can’t deal with recombinant trees. It is useful only for Binary trees not general graphs. Graphs need adjacency matrix or edgeSet.

— My own idea is probably similar:

Can we make do without id and design the serialization sequence to save/reconstruct the tree structure?
I can output layer by layer.  If 2nd layer has no missing node, then 3rd layer consists of exactly 4 nodes, using a sentinel value for any NUL.  If 2nd node is NUL like [A/NUL/C/D], then the next layer would have “2 child nodes for A, 2 child nodes for C, 2 child nodes for D”, where the 2 child nodes for A could be NUL/NUL

What if all integer values are valid values? Trivial problem. I could add a one-bit flag to each serialized payload.

The id/edgeList based solution is general purpose and efficient. The technique above are 雕虫小技. Venkat of OC said something similar about regex state machine. In the same vein, disjoint set is a general solution though many problems have simplified union-find solutions.

camp_out]ff: %%coping strategy/strength

Exec summary: If I find myself engaged, I would want to spend extra quiet hours including weekends working in office and make progress on localSys. I would be able to apply myself and match my brain power against the accumulated brainpower baked into the local code base.

By “Camp-out” I mean all forms of extra time spent working in office. Camp-out is a concrete form of engagement. Usually the engaging period is a honeymoon but hopefully it can become a more sustained focus on localSys. If it does happen (I won’t push myself too hard…) I would hit better GTD, productivity, and earn some respect from some colleagues (not necessarily the boss).

More importantly, I would feel proud of myself and free of regret or guilt. See warning below.

Capture the mood. Eg: You get an engaging project that give you insight into portable GTD, tools, or localSys.

  • Xp: Qz — I did capture the mood a few (like 3+) times. I felt really good even though the camp-out didn’t “save me” in the end. Like U.S. presidential election, the “end” result depends on many many things so it’s unwise to use it as a big paint brush and wipe out all the effort.
  • Xp: Macq — I made big efforts to camp out to work on c++ build system and python. No regrets. The camp-out didn’t really build my foundation in localsys as the expectation was too high. No regrets!
  • xp: OC — I did a few times, mostly to learn c#. No regrets.

— Warning: camp-out may not earn me respect, GTD, enough localSys mileage. Ashish feels I would survive.

I feel I may still get PIP, but after such an all-out effort, I would be decisive to leave. Consider Qz and Macq.

— I see this as my competitive strength and an effective coping strategy.

I used to feel I’m kind of inadequate if I have to camp out and sacrifice family time just to keep my head above water. Wrong! Most of my peers simply don’t have the absorbency, and productive solitude to do this. Many can’t even imagine it.

I said my son may have more or less math abilities, but his effort (chiefly absorbency) is probably lower than his classmates. My daughter’s abilities is unknown, but her absorbency is better than her brother. Similarly, my absorbency is higher than my peers as demonstrated by camp-out.

— Initially, May need 6M-9M to get over the initial hump. I can even rent a temp home for 6M if I feel engaged. See kids only on weekends.

— As contractor I feel the motivation but zero pressure to camp out; As employee I feel pressure to camp out. Clearly I feel better as contractor.

— Q: In my experience, part of the camp-out hours are often spent on blogging (flight control) and morale-boosting workout. OK?

— Q: However, if code is too opaque, I won’t be able to figure things out 😦 What can I do?

— Q: What if I receive a PIP.. will I be able to persuade myself to camp out?
A: Yes. I did in GS, Qz and Macq. No honeymoon, but I have been able to overcome the huge negative energy, convert it partially to motivation and get myself to camp out and focus on GTD.

I like this imagery — I felt like a boxer on my back-foot, not hopeful but not pessimistic either.

— Q: If in the mood for localSys but need to sleep? Shall i go home?
A: don’t go home to sleep. Sleep in office. Go home to eat then come back to office.


in-place merge: 2 sorted arrays

Q: Given two sorted integer arrays nums1 and nums2, merge nums2 into nums1 as one sorted array.

  • The number of elements initialized in nums1 and nums2 are m and n respectively.
  • You may assume that nums1 has enough space (size that is greater or equal to m + n) to hold additional elements from nums2.

I would add a requirement — O(1) additional space. so you can’t create another array. This can be realistic if allocation is strictly controlled to prevent fragmentation in embedded environment.


Rather contrived, so I won’t spend too much time

–Idea: use num1’s right portion as the “new” array.

Suppose the allocated array of num1 has capacity k >= m + n. I will call it array KK. Note The right portion of KK is currently unused, so I can wipe it clean with some dummy value.

( If no dummy value is possible, then I probably can still solve the problem but with less clarity. )

Now backscan both arrays and put the highest value in KK[m+n-1] , filling KK leftward. The spare capacity to the right of this position will remain unused forever.

Implementation note — We need a back-scanner pointer into num1 as “cur” + another pointer to the right, “lastPicked”… meaning the item at this position has been copied to KK.

(We may not need lastPicked pointer, but it is less ambiguous more clear, easier to reason with.  You may say it’s a device for analysis and communication, not necessarily for coding.)

We also need such a pointer into num2.

staircase:1or2 each step

Q: You are climbing a stair case. It takes n steps to reach to the top. Each time you can either climb 1 or 2 steps. In how many distinct ways can you climb to the top?

I think this problem is not “easy”. similar to Fib:

f(3) = f(1) + f(2) where f(1) = how many unique paths after taking a double-step; f(2) how many unique paths after taking a single step.

##strongER trec taking up large codebases

I have grown from a sysAdmin to a dev
I have grown from web and scripting pro into a java pro then c++ pro !
Next, I hope to grow my competence with large codebase.

I feel large codebase is the #1 diffentiator separating the wheat from the chaff — “casual” vs hardcore coders.

With a large codebase, I tend to focus on the parts I don’t understand, regardless that’s 20% or 80% of the code I need to read. I can learn to live with that ambiguity. I guess Rahul was good at that.

In a few cases, within 3M I was able to “take up” a sizable brown field codebase and become somewhat productive. As I told Kyle, I don’t need bottom-up in-depth knowledge to be competent.

In terms of count of successes taking up sizable brown-field codebase, I am a seasoned contractor, so I would score more points than a typical VP or an average old timer in a given system.

  • eg: Guardian — don’t belittle the challenge and complexity
  • eg: mvea c++ codebase — My changes were localized, but the ets ecosystem codebase was huge
  • eg: mvea pspc apportionment/allocation logic + price matching logic
  • eg: mtg comm (small green field), integrated into a huge brown field codebase
  • eg: StirtRisk personalization — I made it work and manager was impressed
  • eg: StirtRisk bug fixes on GUI
  • eg: Macq build system — many GTD challenges
  • eg: Moodles at NIE
  • eg: Panwest website contract job at end of 2016
  • ~~~ the obvious success stories
  • eg: AICE — huge stored proc + perl
  • eg: Quest app owner
  • eg: error memos
  • eg: RTS

##conquests since GS #j,c++..

Until I left GS, I didn’t know how it feels to “conquer” a sizable, lucrative tech skill. Such a tech skill represents a specific job market with supply and demand.

  • perl? not sizable not lucrative 😦 Until 2011, Perl was my only core competency 😦 How lucky am I now !
  • coreJava QQ (as defined on WallSt) .. was my first conquest. After this conquest I haven been trying to replicate my success story, while many peers stayed in one firm in order to move up.
  • SQL? Many interview topics not explored, but now SQL is no longer a sizable job market:(
  • MOM? Not sizable
  • sockets? not so sizable, not yet conquered
  • bond math … was a small conquest
  • c++ QQ .. was my 2nd conquest, as experienced in 2017 onward
  • CIV .. was my 3rd conquest. Growing bigger, though I only rate myself B among west coast candidates.

git | diff | make file6 appear before file2

The pain — a big commit touches on too many files, 3 of them major changes, while 5 files have cosmetic changes. In git-diff or git-difftool, I want to focus on the 3 major files.

Some developers would want to split the big commit into a “major” and a “cosmetic” commit, but one of the snapshots would be half-cooked and unusable. In such a case, the commit comment would warn “This commit is part of a bigger change. Please do not build this commit alone”.

As an alternative we can make the 3 major files show up first (or last) in diff. You can put the content below in src/diffOrder.txt and use it as instructed. All unspecified files would show up in natural order.

# git diff -O src/diffOrder.txt
# check EACH line below to ensure unix line ending, even if entire file is unix-style.
# wildcards are needed below.


#include order among my own headers

Each header has an “amount” of dependencies, each being another included header.

Some header files are very short and have only “‘system” #includes. They are known to be “light” headers with light dependencies.

Majority of my headers have more than two non-system #includes. Consider class declaration headers. The more custom #includes it has, the more dependencies it has.

My #include order is from heavy to light — Last to include are C headers. Above them are the c++ system headers. Among my custom header files, I include the simpler, lighter ones later. This way, if a heavy header is missing some #include (a coding error), it would (more likely) get caught by compiler.

A common practice is to put lots of “frequently used headers” in a util.h, and include it on top of every file. I don’t like it.

too many DB-writes: sharding insufficient #Indeed

Context: each company receives many many reviews. In a similar scenario, we can say that too many user comments flood in during the soccer world cup.

Aha — the updates don’t need to show up on browser in strict order

Aha — commenting user only need to see her own comment on top, and need not see other users’ latest comments. The comments below her own could be browser-cached content. Ajax…

Interviewer: OK you said horizontal sharding by company id can address highly concurrent data store updates. Now what if one company, say, Amazon, by itself gets 20% of the updates so sharding can’t help this one company.

me: I guess the update requests would block and possibly time out. We can use a task queue.

This is similar to WordPress import/export requests.

  • Each task takes a few seconds, so we don’t want user to wait.
  • If high server load, the wait could be longer.
  • More important — this task need not be immediate. It can be scheduled.
  • We should probably optimize for server efficiency rather than latency or throughput

So similarly, all the review submissions on Amazon can be queued and scheduled.

In FIX protocol, an order sender can receive 150=A meaning PendingNew. I call it “queued for exchange”.  The OMS Server sends the acknowledgement in two steps.

  1. Optional. Execution Report message on order placement in OMS Server’s Order Book. ExecType (150) = A (Pending New)
  1. Execution Report message on receiving acknowledgement from exchange. ExecType (150) = 0 (New). I guess this indicates placement in exchange order book

Note this optimization is completely different from HFT latency engineering.

MI loses type info about subclasses

[[Alexandrescu]] pointed out a fundamental weakness in MI for library design — base classes “do not have enough type information to carry out their tasks”, and “MI loses type information (about subclasses) which abounds in templates”

This statement can only be understood based on TMP. I like and will repeat this statement in QQ interviews. If interviewer is knowledgeable enough about TMP to quiz me further, I would say

“TMP can be combined with inheritance. I don’t remember the various TMP techniques that make use of the type information of subtypes”.

TMP is a compile-time technique so type information is more available.

max salary: simple game-plan

The strategy — “Whether I ask for a big base or modest base, there’s a chance I may have problem with manager expectation. So let’s just go for the max salary and forget about learning/tsn

    • algo trading? tend to pay a premium, but I wonder how they assess my trec.
    • java/c++ combo role? will Not pay lower
    • Some quant skill? tend to pay a premium
    • If a HFT shop makes a real offer at S$150k base I will decline — no real upside for me. Similarly, if a quant dev job pays $170k base I will decline — the promised accu (across jobs) is a big mirage. Accu can happen within a single job, but so is the technical accu on a single job.

Max-salary game plan must not ignore :

  • correlation between salary and expectation — as observed in some past jobs but not in every lucrative role. My Barclays and 95G roles were great.
  • the stigma, damagedGoods and high expectations in Stirt and Macq…. Ashish’s view — just earn the money for 6 months and leave if not happy.
  • commute
  • reputation risk at the major banks.

Am i still a survivor? I would say YES in OC and GS, and yes in Macq based on the internal transfer offer.

Mithun suggested — Are we traumatized/scarred and fixated on the stigma? I said the same to Deepak CM.

DFT||BFT: “seenNodes”

Q: in graph DFT and BFT, do we need a “seen” hashtable to remember visited nodes?

A: I probably needed it in some of my coding problem
A: I think so. The textbook DFT/BFT algos use white/grey/black for this purpose. It’s basically a

hashtable { node addr -> color } esp. if nodes are readonly.

This color technique is not needed for a simple binary tree, but it can become useful in tough CIV such as cyclic graphs.

pre-order: Polish notation #any binTree

  • Pre-order walk can create a expression tree in Polish notation
  • Post-order walk can create a expression tree in reverse-Polish notation

https://en.wikipedia.org/wiki/Tree_traversal#Uses has a concise example.

This usage is fairly popular in my past coding interviews.

In fact, the wikipedia article goes on to say that pre/post order walk can create a “representation  of a binary tree” … see my blogposts for details such as BST: post-order as serialization.