story: %%investigation skill #etscfg

Hi Paul, in my etscfg I used this simple ifelse:

1        %IFELSE{defined(SOME_CHECK)}%
2        
3        %ELSE%
4        
5        %IFELSE%

To my surprise, Line 4 comes out preceded by an extra newline when the if-condition fails. This impacts a lot of config artifacts that shouldn’t be impacted.

On the other hand, when the if-condition evaluates to true, I get exactly Line 2, with no newline added anywhere. This is the correct output.

— Investigation —
On my Line 3, the implicit newline character right after %ELSE% is incorrectly included in $else_block and printed out before Line 4, causing the surprising result. Attached are two test programs showing the before and after:

my ($if_block, $else_block) = split(/(?:[^\w\n]*)\%ELSE\%/, $block);    ### before
my ($if_block, $else_block) = split(/(?:[^\w\n]*)\%ELSE\%\n?/, $block); ### after: include the trailing newline in the split() delimiter

In my fix, I also added a “guard” to deal with any trailing spaces sandwiched between %ELSE% and the newline character, since I sometimes leave a trailing space there.

Altogether, my fix consists of two changes:

• 1 new line of perl
• 1 modified line of perl


constexpr phrasebook

  • compile-time — value known at compile time. Stricter than a regular q[ const ].
  • compiler-enforced — if a constexpr object doesn’t have a compile-time-known value, the compiler will complain
  • subset — constexpr objects are a subset of const objects
  • usages — integral NDTTP values, such as the std::array length parameter. See [[effModern]]
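A minimal sketch (my own illustration, not from [[effModern]]) of the points above:

#include <array>

constexpr int sq(int x) { return x * x; } // callable at compile time

int main() {
    constexpr int n = sq(4);  // OK: value known at compile time
    std::array<int, n> arr{}; // n used as the integral NDTTP (array length parameter)
    // int r = 5; constexpr int bad = sq(r); // compiler would complain: r is not compile-time known
    return (int)arr.size();
}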

linear probing+alternatives #phrasebook

collision — each hash table has a hash-collision resolution strategy. Common choices include separate chaining (SC), linear probing (LP), and double hashing (DH)

  • cache — in terms of CPU d-cache efficiency, linear probing beats quadratic probing, which beats separate chaining. Cache efficiency matters most for very hot data.
  • clustering — like street parking, clustering can hurt linear probing. Quadratic probing and double hashing cope better.
  • failure — failing to locate an open slot (when one is available) is a known problem with quadratic probing. LP and SC are guaranteed to find one; DH is safe too, provided the step size is coprime with the table size.
  • open-addressing — the alternative to chaining; it includes LP, DH etc.
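A toy linear-probing sketch (my own illustration: fixed capacity, int keys, no deletion). Note the probe sequence visits consecutive slots, hence the d-cache friendliness:

#include <functional>
#include <optional>
#include <vector>

struct LPTable {
    std::vector<std::optional<int>> slots;
    LPTable() : slots(16) {} // 16 buckets, initially all empty

    bool insert(int key) {
        for (size_t i = 0; i < slots.size(); ++i) { // probe at most every slot
            size_t idx = (std::hash<int>{}(key) + i) % slots.size();
            if (!slots[idx] || *slots[idx] == key) { slots[idx] = key; return true; }
        }
        return false; // table full; LP finds an open slot whenever one exists
    }

    bool contains(int key) const {
        for (size_t i = 0; i < slots.size(); ++i) {
            size_t idx = (std::hash<int>{}(key) + i) % slots.size();
            if (!slots[idx]) return false; // an empty slot terminates the probe chain
            if (*slots[idx] == key) return true;
        }
        return false;
    }
};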

## localSys note-taking: choices

| criteria | paper | Windows10 StickyNotes | MSOL Notes | MSOL Tasks | MSOL mail folders | recoll (#1 default) |
| --- | --- | --- | --- | --- | --- | --- |
| encourages periodic refresh | A | unsuitable | B # everything is temporary | C # “browse” | F # needs discipline | F # needs discipline |
| browse | A | B # click to see details | A | C # poor subjects | B # click to see details | |
| searchable | F | unsuitable | unsuitable | A | D # too many | C |
| organization of 30+ topics | unsuitable | F # intrusive until hidden | D | C # categories | B | A # hierarchical |
| radiator | A | B # Kyle used it long-term | F | F | D | F |
| screen shot | F | F | F | A | A | B |
| ez2edit | B | A | A | mostly readonly | | |

real effort but below-bar@@ only Macq

Macq is probably the only job where I focused on localSys GTD but still fell below the bar.

The PIP cast a long shadow and left a deep scar. I am still recovering. This scar is deeper than the Stirt one …

Remember the abused children who grew up traumatized? I was not traumatized as a kid. My traumatic experience is still /devastating/, but I can handle it. I have the maturity to handle it.

Adults are often traumatized by failed marriages, in-law conflicts, …

if not4$$, y I sacrifice so much2reenter U.S.#again

Q: As of 2016 to 2019, I didn’t need a high salary so badly, so what were the real reasons I sacrificed so much to re-enter the U.S.?

A#1: foundation — rebuild confidence about my career/financial foundation for the next 20 (or even 25-30) years, since my passive-income/asset/burn-rate profile was (and still is) far from comfortable
* age discrimination
* green card
* lower calibre requirement on a typical job .. “semi-retirement job”

A#2: self-esteem rebuild — after multiple blows (三人成虎 “three men make a tiger”, three-strikes) .. stigma
A#3: far bigger job market, providing much better sense of career safety

nth largest element in unsorted array #QuickSelect

Q: Find the kth largest element in an unsorted array. Note that it is the kth largest element in the sorted order, not the kth distinct element.

I think this is mostly a bigO algoQQ problem.

std::nth_element is linear on average .. https://stackoverflow.com/questions/11068429/nth-element-implementations-complexities talks about QuickSelect algo

— idea 6: priority Q (Fib-heap) of size k
if a new item is higher than the current min, pop the min at O(logK) and insert the new item at O(1)
— idea 7: priority Q over all N items
linear time to build it (heapify), then pop k times
— idea 5: 95th percentile problem from NYSE
— idea 4: multiple scans
— idea 3: segments
— Sol2: O(N). Use the O(N) algo in the blog on “given int array, find median ] O(N)”. Then discard one of the two segments and repeat.
Note: each time, the target element must exist in one of the two segments.

O(N) + O(N/2) + O(N/4) … -> O(N)

— Sol2a: use the O(N) qsort partition algo to anchor a (random) pivot element, creating two segments. Our target must exist in one of the two, so discard the other by adjusting the le/ri boundaries. (A quickselect sketch appears at the end of this post.)

This idea is the same as the most-voted solution in the leetcode discussion.
O(N) on average — we get N + N/2 + N/4 + … < 2N, i.e. O(N)

Note: average-case complexity is acceptable here, just as it is for hash tables!
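Here is a quickselect sketch of Sol2a (my own illustration; in production I would simply call std::nth_element):

#include <random>
#include <utility>
#include <vector>

// kth largest (k=1 means the max), average O(N)
int kthLargest(std::vector<int> v, int k) {
    int le = 0, ri = (int)v.size() - 1, target = k - 1; // rank in descending order
    std::mt19937 gen(std::random_device{}());
    while (true) {
        std::uniform_int_distribution<int> pick(le, ri);
        std::swap(v[pick(gen)], v[ri]); // move a random pivot to the end
        int pivot = v[ri], store = le;
        for (int i = le; i < ri; ++i)
            if (v[i] > pivot) std::swap(v[i], v[store++]); // descending partition
        std::swap(v[store], v[ri]); // pivot lands at its final rank, "store"
        if (store == target) return v[store];
        if (store < target) le = store + 1; // discard the left segment
        else ri = store - 1;                // discard the right segment
    }
}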

versioned dict #indeed

Suppose a resume consists of key/value pairs of strings. We need to support two operations —

  1. Each write(key, value) operation creates or updates a key with a new value and returns an incremented integer vid, i.e. version ID. A version ID represents a point in history, i.e. a snapshot of the entire resume.
  2. Each read(vid) returns a key/value dictionary as a complete resume snapshot.
  3. (In a realistic system we would also need a deleteVersion(vid), but there’s no need to optimize for it.)

Single threaded system. Version 0 is an empty resume.

Q: design an efficient solution.
Q: after creating N versions via N write() operations, what’s the time and space complexity of next write()? What’s the time and space complexity of the next read()?

——-

Without loss of generality, let’s say there are K fields in resume version N. Note K <= N. I should have differentiated K and N; the distinction is useful for clarity of thinking.

—Design 5, in hindsight, optimized for read(): the simple solution would snapshot the entire dict at each write(), i.e. at each version. Space complexity is bad (though I now believe time complexity is more important). Worst case — K == N, as when version 1 creates key1, version 5 creates key5, etc.

A single read() is presumably O(K), just to read the K fields. I was convinced this was the theoretical limit for read() complexity, because I was thinking of network serialization: either my system or the client system needs to iterate over all fields of the resume. In hindsight, I feel the interviewer was thinking in java mode, where returning the entire dict by reference is O(1) rather than O(K). I didn’t get his hint when he said the O(K) read was not really the theoretical limit.

Each write() clones the dict for a new vid and saves the new dict in a vector of pointers(!). Therefore write() is O(K).

—My design 1: based on the same idea, I used a vector for each key. However, my read() had to create the returned dict on the fly in O(K). I didn’t think of return-by-reference/pointer.

—My design 2 is a minor optimization to remove the cloning of unchanged values. I thought I was onto something efficient, but in java (and python?) all strings are passed around by reference anyway, so my optimization is meaningless in java.

—My design 3 is lazy write: write() only appends to one of the K vectors; the other K-1 vectors are updated only when needed by a later read() or write(). Amortized cost?

This O(1) write() and O(?) read() complexity can be emulated and surpassed by …

—My design 4 used K RBTrees (small trees) to optimize for frequent write() at O(1). Each write() appends to one tree; STL insert-with-hint is amortized O(1). No such feature in java or python. (See the C++ sketch further below.)

read(vid) requires a binary search in each [3] of the K trees. After N updates, the total node count across all K trees is N, so even a naive search would not exceed O(N).

If needed, I can also maintain at O(1) cost a “replay vector of originals” i.e. original {key, value} from each write(). We can use this vector for replay.

[3] Q1: Can we avoid scanning all K trees when vid is small or K is much smaller than N ?

A: Like rebus, let’s maintain a separate vector (or RBTree) “birthday”, holding records {vid, pointer to a single small tree}. Birthday expands by one element (at O(1) cost) whenever a new small tree is born, i.e. a new key is created. My read(vid=55) would binary-search this birthday structure by vid to locate the last small tree born before version 55, then iterate backward to visit every small tree born before version 55. This optimized read() uses binary search throughout, eliminating unnecessary linear scans.

Q1b: Worst case read():  K == N i.e. every write() creates a new key. So read() is still O(N)?

A: If we find in practice that K is close to N, we can use an additional data structure to save previous read(vid) results: a vector of records {vid, pointer to read() result}, keyed by vid. Each read() would save its result in this memoization vector at O(1) cost [4]. For the next read(vid=77), I would binary-search this memoization vector for the closest lower neighbor (eg 55), then replay the writes from vid=55 up to 77, using XR’s algorithm.

[4] see my blogpost on memoization keyed by auto-increment id

Note an RBTree supports efficient deletion, in both the birthday tree and the small trees.
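Here is a C++ sketch of design 4 under my assumptions (all names illustrative). std::map plays the role of each RBTree; the birthday structure maps creation-vid to key:

#include <iterator>
#include <map>
#include <string>
#include <unordered_map>

class VersionedDict {
    int lastVid = 0; // version 0 == empty resume
    std::unordered_map<std::string, std::map<int, std::string>> smallTrees; // key -> {vid -> value}
    std::map<int, std::string> birthday; // creation vid -> key; grows only when a key is born

public:
    int write(const std::string& key, const std::string& value) {
        ++lastVid;
        auto& tree = smallTrees[key];
        if (tree.empty()) // a new small tree is born
            birthday.emplace_hint(birthday.end(), lastVid, key); // amortized O(1)
        tree.emplace_hint(tree.end(), lastVid, value); // insert-with-hint: amortized O(1) append
        return lastVid;
    }

    std::map<std::string, std::string> read(int vid) const {
        std::map<std::string, std::string> snapshot;
        // visit only the small trees born at or before vid
        for (auto it = birthday.begin(); it != birthday.end() && it->first <= vid; ++it) {
            const auto& tree = smallTrees.at(it->second);
            auto after = tree.upper_bound(vid); // first update strictly after vid
            if (after != tree.begin()) // so prev(after) is the last update at or before vid
                snapshot[it->second] = std::prev(after)->second;
        }
        return snapshot;
    }
};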

—-

Now I feel a big string representing the entire resume is still under 100KB. I had assumed we were talking about millions of fields of gigabytes each.

For a long time I was confused and assumed that each write() could update multiple key/value pairs.

cod`drill:YOUR satisfactions@@ wipe_out defined #Rahul

Rahul used the word “satisfaction”. (Sometimes I call it “intrinsic motivation” or “joy”.) The satisfaction factor is crucial to absorbency. It enables Rahul to put in so many hours.

Q: What are my satisfactions i.e. intrinsic motivation?
A: discovering my own solutions that are reasonably efficient, even if not optimal. They are almost never elegant or simple.

I don’t mind reading a classic solution in a real (possibly online) publication, but I hate reading solutions in forums, as XR does. Those forum posts lead to a wipe-out:

  • completely wipe out that precious satisfaction.
  • positive feedback loop broken
  • self-esteem destroyed
  • self-mastery destroyed
  • sense of progress wiped out
  • sense of self-improvement wiped out
  • absorbency wiped out
  • I feel /diminished/ and worth-less.
  • I feel hopeless and give up

It’s possible that the forum posters also learned the solutions from publications, but I basically assume they are just that brainy.

a standard representation of binTree

I think on Leetcode there’s now a standard representation of simple binTree. https://leetcode.com/problems/convert-sorted-array-to-binary-search-tree/ shows one example.

see also (de)serialize binary tree #funptr ECT

Let’s not spend too much time here. This representation is simple but not very visual. It can’t handle recombinant trees, and it works only for binary trees, not general graphs. Graphs need an adjacency matrix or edgeSet.

— My own idea is probably similar:

Can we make do without ids and design the serialization sequence to save/reconstruct the tree structure?
I can output layer by layer. If the 2nd layer has no missing node, then the 3rd layer consists of exactly 4 nodes, using a sentinel value for any NUL. If the 2nd slot of a layer is NUL, as in [A/NUL/C/D], then the next layer would have “2 child nodes for A, 2 child nodes for C, 2 child nodes for D”, where the 2 child nodes for A could be NUL/NUL.

What if all integer values are valid payload values, leaving no sentinel? That is a trivial extension: I could add a one-bit is-present flag to each serialized payload.
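Here is a sketch of the layer-by-layer idea (my own illustration; deserialization is symmetric and omitted). “#” is the sentinel for a NUL child, and children of a NUL slot are never emitted, matching the rule above:

#include <queue>
#include <sstream>
#include <string>

struct Node { int val; Node *left = nullptr, *right = nullptr; };

std::string serialize(Node* root) {
    std::ostringstream out;
    std::queue<Node*> q;
    q.push(root);
    while (!q.empty()) {
        Node* n = q.front(); q.pop();
        if (!n) { out << "# "; continue; } // sentinel: no node in this slot
        out << n->val << ' ';
        q.push(n->left);  // even NUL children are queued once,
        q.push(n->right); // so each layer's slots are fully described
    }
    return out.str();
}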

The id/edgeList-based solution is general-purpose and efficient. The techniques above are 雕虫小技 (petty tricks). Venkat of OC said something similar about regex state machines. In the same vein, disjoint set is a general solution, though many problems have simplified union-find solutions.

camp_out]ff: %%coping strategy/strength

Exec summary: if I find myself engaged, I want to spend extra quiet hours, including weekends, working in the office, making progress on localSys. I would be able to apply myself and match my brainpower against the accumulated brainpower baked into the local code base.

By “camp-out” I mean all forms of extra time spent working in the office. Camp-out is a concrete form of engagement. Usually the engaging period is a honeymoon, but hopefully it can grow into a more sustained focus on localSys. If that happens (I won’t push myself too hard…), I would achieve better GTD and productivity, and earn some respect from some colleagues (not necessarily the boss).

More importantly, I would feel proud of myself and free of regret or guilt. See warning below.

Capture the mood. Eg: you get an engaging project that gives you insight into portable GTD, tools, or localSys.

  • Xp: Qz — I did capture the mood a few (3+) times. I felt really good even though the camp-out didn’t “save me” in the end. Like a U.S. presidential election, the “end” result depends on many, many things, so it’s unwise to use it as a big paint brush and wipe out all the effort.
  • Xp: Macq — I made big efforts to camp out to work on the c++ build system and python. The camp-out didn’t really build my foundation in localSys, as the expectation was too high. No regrets!
  • xp: OC — I did it a few times, mostly to learn c#. No regrets.

— Warning: camp-out may not earn me respect, GTD, or enough localSys mileage. Ashish feels I would survive.

I feel I may still get a PIP, but after such an all-out effort, I would be decisive about leaving. Consider Qz and Macq.

— I see this as my competitive strength and an effective coping strategy.

I used to feel I was somehow inadequate if I had to camp out and sacrifice family time just to keep my head above water. Wrong! Most of my peers simply don’t have the absorbency and productive solitude to do this. Many can’t even imagine it.

I said my son may have more or less math ability, but his effort (chiefly absorbency) is probably lower than his classmates’. My daughter’s abilities are unknown, but her absorbency is better than her brother’s. Similarly, my absorbency is higher than my peers’, as demonstrated by camp-out.

— Initially, I may need 6M-9M to get over the initial hump. I can even rent a temp home for 6M if I feel engaged, and see the kids only on weekends.

— As a contractor I feel the motivation but zero pressure to camp out; as an employee I feel pressure to camp out. Clearly I feel better as a contractor.

— Q: In my experience, part of the camp-out hours are often spent on blogging (flight control) and morale-boosting workout. OK?

— Q: However, if code is too opaque, I won’t be able to figure things out 😦 What can I do?

— Q: What if I receive a PIP.. will I be able to persuade myself to camp out?
A: Yes. I did in GS, Qz and Macq. No honeymoon, but I was able to overcome the huge negative energy, convert it partially into motivation, and get myself to camp out and focus on GTD.

I like this imagery — I felt like a boxer on my back-foot, not hopeful but not pessimistic either.

— Q: What if I’m in the mood for localSys but need to sleep? Shall I go home?
A: Don’t go home to sleep. Sleep in the office. Go home to eat, then come back to the office.


in-place merge2sortedArrays

Q: Given two sorted integer arrays nums1 and nums2, merge nums2 into nums1 as one sorted array.

  • The number of elements initialized in nums1 and nums2 are m and n respectively.
  • You may assume that nums1 has enough space (size that is greater or equal to m + n) to hold additional elements from nums2.

I would add a requirement: O(1) additional space, so you can’t create another array. This can be realistic if allocation is strictly controlled to prevent fragmentation in an embedded environment.

====analysis:

Rather contrived, so I won’t spend too much time.

—Idea: use num1’s right portion as the “new” array.

Suppose the allocated array of num1 has capacity k >= m + n. I will call it array KK. Note the right portion of KK is currently unused, so I can wipe it clean with some dummy value.

( If no dummy value is possible, then I probably can still solve the problem but with less clarity. )

Now back-scan both arrays and put the highest value in KK[m+n-1], filling KK leftward. Any spare capacity to the right of this position will remain unused forever.

Implementation note — we need a back-scanner pointer into num1 as “cur”, plus another pointer to its right, “lastPicked”, meaning the item at that position has already been copied to KK.

(We may not need the lastPicked pointer, but it is less ambiguous and easier to reason with. You may say it’s a device for analysis and communication, not necessarily for coding.)

We also need such a pointer into num2.
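A back-scan sketch (my own illustration). It turns out that scanning from the right makes the dummy value and the lastPicked pointer unnecessary, since the write position never overtakes the unread elements:

#include <vector>

// merge nums2 into nums1 in place: O(1) extra space
void merge(std::vector<int>& nums1, int m, const std::vector<int>& nums2, int n) {
    int i = m - 1, j = n - 1, w = m + n - 1; // w = next write position, filling leftward
    while (j >= 0) { // once nums2 is exhausted, the rest of nums1 is already in place
        if (i >= 0 && nums1[i] > nums2[j])
            nums1[w--] = nums1[i--];
        else
            nums1[w--] = nums2[j--];
    }
}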

staircase:1or2 each step

Q: You are climbing a staircase. It takes n steps to reach the top. Each time you can climb either 1 or 2 steps. In how many distinct ways can you climb to the top?

I think this problem is not “easy”. It is similar to Fib:

f(3) = f(1) + f(2), where f(1) = the number of unique paths after taking a double-step, and f(2) = the number of unique paths after taking a single step.
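A minimal iterative sketch of this Fib-style recurrence:

// f(n) = f(n-1) + f(n-2), with f(0) = 1 (the empty path) and f(1) = 1
long long climbStairs(int n) {
    long long a = 1, b = 1; // f(0), f(1)
    for (int i = 2; i <= n; ++i) {
        long long next = a + b;
        a = b;
        b = next;
    }
    return b;
}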

MI loses type info about subclasses

[[Alexandrescu]] pointed out a fundamental weakness in MI for library design — base classes “do not have enough type information to carry out their tasks”, and “MI loses type information (about subclasses) which abounds in templates”

This statement can only be understood in terms of TMP. I like this statement and will repeat it in QQ interviews. If the interviewer is knowledgeable enough about TMP to quiz me further, I would say:

“TMP can be combined with inheritance. I don’t remember the various TMP techniques that make use of the type information of subtypes”.

TMP is a compile-time technique so type information is more available.
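One such technique (my own example, not necessarily what [[Alexandrescu]] had in mind) is CRTP, where the base class template receives the subclass type at compile time:

#include <iostream>

template <typename Derived>
struct Base {
    // the base has full type information about its subclass, so no virtual dispatch is needed
    void greet() { static_cast<Derived*>(this)->greetImpl(); }
};

struct English : Base<English> {
    void greetImpl() { std::cout << "hello\n"; }
};

int main() { English{}.greet(); }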

max salary: simple game-plan

The strategy — “Whether I ask for a big base or a modest base, there’s a chance I may have problems with manager expectations. So let’s just go for the max salary and forget about learning/tsn.”

    • algo trading? tends to pay a premium, but I wonder how they would assess my trec.
    • java/c++ combo role? will not pay lower
    • some quant skill? tends to pay a premium
    • If an HFT shop makes a real offer at S$150k base, I will decline — no real upside for me. Similarly, if a quant dev job pays $170k base, I will decline — the promised accu (across jobs) is a big mirage. Accu can happen within a single job, but the same is true of technical accu within a single job.

Max-salary game plan must not ignore:

  • correlation between salary and expectation — as observed in some past jobs but not in every lucrative role. My Barclays and 95G roles were great.
  • the stigma, damagedGoods and high expectations in Stirt and Macq…. Ashish’s view — just earn the money for 6 months and leave if not happy.
  • commute
  • reputation risk at the major banks.

Am I still a survivor? I would say YES in OC and GS, and yes in Macq, based on the internal transfer offer.

Mithun suggested — Are we traumatized/scarred and fixated on the stigma? I said the same to Deepak CM.

DFT|BFT: “seenNodes”

Q: in graph DFT and BFT, do we need a “seen” hashtable to remember visited nodes?

A: I probably needed it in some of my coding problems.
A: I think so. The textbook DFT/BFT algos use white/grey/black colors for this purpose. It’s basically a hashtable { node addr -> color }, esp. if nodes are readonly.

This color technique is not needed for a simple binary tree, but it becomes essential in tough CIV such as cyclic graphs.
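A BFT sketch with the “seen” hashtable (my own illustration); without it, a cyclic graph would make the loop run forever:

#include <queue>
#include <unordered_set>
#include <vector>

struct GNode { int val; std::vector<GNode*> neighbors; };

void bft(GNode* start) {
    std::unordered_set<const GNode*> seen; // node addr -> visited ("colored")
    std::queue<GNode*> q;
    q.push(start);
    seen.insert(start);
    while (!q.empty()) {
        GNode* n = q.front(); q.pop();
        // ... visit n here ...
        for (GNode* nb : n->neighbors)
            if (seen.insert(nb).second) // insert() reports false if already seen
                q.push(nb);
    }
}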