multimap implementation

Given a multimap of {name -> Acct}, here’s a practical question:

Q: how do you save two different Acct objects having the same name?

I would use a linked list to hold all the different Acct objects for a given name. The tree node would hold the linked list.

std::multimap CAN hold different values under the same key, but it comes with other constraints, and its internal design may not match my simple linked-list idea.
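For illustration, a minimal std::multimap sketch (the Acct struct and function name are hypothetical):

```cpp
#include <cassert>
#include <iterator>
#include <map>
#include <string>

// Hypothetical payload type for illustration
struct Acct { int id; };

// equal_range() returns the sub-range of entries sharing one key,
// so a multimap happily stores two different Acct objects per name.
inline int countUnderName(const std::multimap<std::string, Acct>& m,
                          const std::string& name) {
    auto range = m.equal_range(name);
    return static_cast<int>(std::distance(range.first, range.second));
}
```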

 

convert any-size int to my host endianness

https://linux.die.net/man/3/be64toh is what I was looking for.

There are many unofficial solutions on StackOverflow etc. They unconditionally swap the bytes — but what if the host endianness is the same as the input? I don’t like these flaky solutions.

The standard ntoh functions are like betoh, because “n” (network byte order) means “be” i.e. big-endian.
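A minimal round-trip sketch using the glibc-specific &lt;endian.h&gt; functions — htobe64/be64toh are no-ops on a big-endian host and byte-swaps on a little-endian one, so there is no flaky unconditional swapping:

```cpp
#include <cassert>
#include <cstdint>
#include <endian.h>   // glibc-specific header; other platforms differ

// Convert a host value to big-endian "wire" format and back.
inline uint64_t roundTrip(uint64_t host) {
    uint64_t wire = htobe64(host);   // host -> big-endian
    return be64toh(wire);            // big-endian -> host
}
```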

## stable base; fluid superstructure in G5 tech skills

My perspective in this post is mostly the tech interview — accumulation and retention of knowledge, and prevention of loss due to churn. As I age, I’m more selective about which new technologies to invest in.

In contrast to the interview perspective, the GTD perspective is dominated by localSys! So churn doesn’t matter.

Many of these technologies are past their peak, though none is losing relevance. However, I tend to perceive these tech skills as robust, resilient and time-honored. Among them I see a partial pattern — many of them exhibit a stable base; some show a fluid superstructure of skills.

  1. essential algorithms/data_structures and bigO — stable, possibly growing base + fluid superstructure
  2. java skillset — has a stable base i.e. coreJava + fluid ecosystem including jGC, jxee
  3. C/C++ — has a stable base skillset including TMP, STL… The superstructure is fluid mainly due to c++0x
  4. SQL and socket — each has a stable base
  5. pthread, unix internals ..– ditto
  6. unix (no QQ topics, but many GTD skills) — stable superstructure skillset including instrumentation, utils and scripting. Base? tiny.
  7. http stack — stable but tiny base including session/cookies + fluid superstructure

sorted_Array|_List ⇒ balanced_BST

Q1 (easy): Given an array where elements are sorted in ascending order, convert it to a height balanced BST, where the depth of the two subtrees of every node never differ by more than 1.

Q2 (medium): How about a sorted slist?

==== analysis

I feel this “easy” problem should be medium. Perhaps there are elegant solutions.

I need not care about the payloads in the array. I can assume each payload equals the subscript.

There are many balanced BSTs; we just need to find one.

— idea 1: construct a sequence of positions to probe the array. Each probed value would be added to the tree. The tree would grow level by level, starting from the root.

The sequence is similar to a BFT. This way of tree building ensures

  • only the lowest branch nodes can be incomplete.
  • at all branch levels, node count is 2^L

I think it’s challenging to ensure we don’t miss any position.

Observation — a segment of size 7 or shorter is easy to convert into a balanced subtree, to be attached to the final BST.

When a subarray (between two probed positions) is recognized as such a segment we can pass it to a simple routine that returns the “root” of the subtree, and attach it to the final BST. This design is visual and a clean solution to the “end-of-iteration” challenge.

The alternative solution would iterate until the segment sizes become 0 or 1. I think the coding interviewer probably prefers this solution as it is shorter but I prefer mine.
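The shorter recursive solution (iterate until segment size 0) can be sketched as:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Node {
    int val;
    Node *left = nullptr, *right = nullptr;
    explicit Node(int v) : val(v) {}
};

// Pick the middle element of [lo, hi) as subtree root, then recurse on
// the two halves; the halves differ in size by at most 1, so subtree
// heights differ by at most 1 -- height-balanced by construction.
Node* buildBalanced(const std::vector<int>& a, int lo, int hi) {
    if (lo >= hi) return nullptr;
    int mid = lo + (hi - lo) / 2;
    Node* root = new Node(a[mid]);
    root->left  = buildBalanced(a, lo, mid);
    root->right = buildBalanced(a, mid + 1, hi);
    return root;
}

int height(const Node* n) {
    return n ? 1 + std::max(height(n->left), height(n->right)) : 0;
}
```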

— idea 2

Note a complete binary tree can be efficiently represented as an array indexed from one (not zero).

— Idea 3 (elegant) dummy payload — For the slist without using extra storage, I can construct a balanced tree with dummy payload, and fill in the payload via in-order walk

All tree levels are full except the leaf level — guarantees height-balanced. We can leave the right-most leaf nodes missing — a so-called “complete” binary tree. This way the tree-construction is simple — build level by level, left to right.

— idea 4 STL insert() — For the slist, we can achieve O(N) by inserting each element in constant time using STL insert() with hint

— For the slist without using an array, I can get the size SZ. (then divide it by 2 until it becomes 7,6,5 or 4.) Find the highest 3 bits (of SZ) which represent an int t, where 4 <= t <= 7. Suppose that value is 6.

We are lucky if every leaf-subtree can be size 6. Then we just construct eight (or 4, or 16 …) such leaf-subtrees

If SZ doesn’t give us such a nice scenario, then some leaf-subtrees would be size 5.  Then my solution is to construct eight (or 4, or 16 …) leaf-trees of type AAAAA (size 6) then BBB (size 5).

Lucky or unlucky, the remaining nodes must number 2^K-1 like 3,7,15,31 etc.  We then construct the next level.

–For the slist without using an array, I can propose an O(N logN) recursive solution:

f(slist, sz){
  locate the middle node of slist; construct the root node from its value
  cut the left segment off as a separate slist1 and compute its length sz1
  set f(slist1, sz1) as the left child of the root
  repeat for the right segment to build the right child
  return the root
}

##longevity rating: java is outlier ] language_war

Among languages, java is way ahead of the pack. We need to treat java as exceptional, outlier. With that, c++ looks rather solid and decent. Shall we merge this table into my pastTechBet.xlsx sheet? Partially merged, but no need to update the sheet.

| longevity rating #bias | tech skill | mkt-share prominence | era | domains |
|---|---|---|---|---|
| 80% | java | robust | 2000’s | |
| 40% | py | rising | 2000’s | |
| 50% | c/c++ | fell | 1980’s | HFT, gaming, telco, embedded, AI/ML |
| 20% | c#/dotnet | fell | 2010’s | |
| 30% | php | ? | 2000’s | |
| 10% | perl | FELL | 2000’s | automation |
| 40% | javascript | rising | 2000’s | |
| 30% | RDBMS/sql | fell | 1990’s | |
| 70% | socket | robust | 1990’s probably | |
| 90% | TCP/IP | dominant | 1970’s | |
| 20% | MOM | robust | | |
| 90% | http stack | dominant | 2000’s | |
| 90% | unix “tradition” | dominant | beyond memory | |

efficient memoization: keyed by auto-increment id

Market Value of this knowledge — I feel this level of intricate knowledge is useful mostly in bigO coding interviews. If you get the bigO wrong there, you can lose major points. Versioned dict #indeed is one typical coding interview question, where the result of read(id) can be memoized to speed up subsequent reads.

Q: what’s the optimal time complexity of maintaining the memoization data structure? User requests can hit us with read(9), read(88), read(15), read(0), read(15) again…

Notation — N:= number of id values.

  • Option 0 (unusable): hash table — nearest-neighbor search impossible
  • Option 1 (sub-optimal): RBTree —  Lookup is O(logN). In general, insertion costs O(logN).

( Special case — If the read() id never decreases, then STL RBtree insert-with-hint is O(1). )

  • Option 2: vector — indexed by id [1]. Lookup by subscript is O(1) (no binary search needed once we null-fill). Insertion is always amortized O(1). Here’s why.

Eg: after read(9), we get read(88). We would need to null-fill the vector until position 88. This null-fill design looks sub-optimal.

But how is the id 88 created after 87 ? Probably in some write() operation. So write() can incrementally expand this vector 🙂

Actually, this incremental-insert design is not faster. I would rather leave write() alone as the null-fill design is a mass-insert and minimizes re-allocation.

[1] Is it common or lucky to have id values as auto-increment integers? I think it’s quite common in practice and in algo interviews. Auto-increment Integer id is light-weight and fast to generate.
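A sketch of Option 2 with the null-fill design (class and names are mine; -1 is the “null” sentinel, assuming real results are non-negative):

```cpp
#include <cassert>
#include <vector>

class MemoCache {
    std::vector<int> slots;   // indexed by auto-increment id; -1 = not cached
public:
    // O(1) lookup by subscript; null-fills up to id on first access past the end
    bool tryGet(int id, int& out) {
        if (id >= static_cast<int>(slots.size())) slots.resize(id + 1, -1);
        if (slots[id] == -1) return false;
        out = slots[id];
        return true;
    }
    void put(int id, int value) {
        if (id >= static_cast<int>(slots.size())) slots.resize(id + 1, -1);
        slots[id] = value;
    }
};
```

The resize() call is the mass null-fill; vector growth doubles capacity, which is what makes the insertions amortized O(1).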

##xp: enough localSys⇒GTD #but respect≠guaranteed

Did localSys effort save me? See my rather surgical tabulation analysis of past PIP + survivals

  1. GS — No, not a lot of respect, but I was able to hold my end of the log. With insufficient localSys I would get even less respect.
  2. Quest — No. The PIP was due to many factors. I feel my Quest GTD was adequate. With insufficient localSys, I would get even less respect.
  3. 🙂 RTS — yes
  4. 95G + volFitter — limited localSys .. nice green field projects for me.
  5. Macq — No. However, With insufficient localSys, I would have been kicked out within Y1

— Macq? I now feel my localSys at Macq was deeper than at Quest or RTS

  • I put in lots of effort and was able to diagnose most build errors on MSVS and Linux.
  • I needed lots of localSys knowledge to modernize codebase for MSVS-2015 and c++14.
  • I needed localSys + innovative on-the-fly code gen + other skills to add the logging decorator into pymodels. This hack alone is worth a bonus. It is more effective and more valuable than all my hacks in OC.

However, I feel the expectation was too high, so in the end I got the PIP and felt like damaged goods.

I need to challenge that negative impression of the entire episode.

in-place-remove all dupes from sorted array #100%

https://leetcode.com/problems/remove-duplicates-from-sorted-array/ fully tested.

Simplification theme — minimize state maintenance during the scan. Eliminate every loop variable that can be derived.

variable: lastGoodPos — up to this position, all unique
variable: cur — the front pointer position under examination

My algo — during the scan, if a[cur] == a[cur-1] then just advance cur;
else (a[cur] is bigger) save this bigger item in a[lastGoodPos+1] and advance lastGoodPos.
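The algo above, as a sketch (leetcode’s required signature returns the new length):

```cpp
#include <cassert>
#include <vector>

// Two-pointer scan: a[0..lastGoodPos] stays unique; cur probes ahead.
// Returns the new length; unique elements end up at the front.
int removeDupes(std::vector<int>& a) {
    if (a.empty()) return 0;
    int lastGoodPos = 0;                  // up to this position, all unique
    for (int cur = 1; cur < static_cast<int>(a.size()); ++cur)
        if (a[cur] != a[cur - 1])         // a bigger item: keep it
            a[++lastGoodPos] = a[cur];
    return lastGoodPos + 1;
}
```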

Roman-to-integer converter

https://leetcode.com/problems/roman-to-integer/

Roman numerals are usually written largest to smallest from left to right. However, the numeral for four is not IIII. Instead, the number four is written as IV. Because the one is before the five we subtract it making four. The same principle applies to the number nine, which is written as IX. There are six instances where subtraction is used:

  • I can be placed before V (5) and X (10) to make 4 and 9.
  • X can be placed before L (50) and C (100) to make 40 and 90.
  • C can be placed before D (500) and M (1000) to make 400 and 900.

Input: “MCMXCIV” Output: 1994 Explanation: M = 1000, CM = 900, XC = 90 and IV = 4.

==== analysis

Too complicated for a speed coding test?

Reusable technique — One back scan might be enough. Normally the rank of letters encountered would increase. A decrease (only one position) means subtraction. See code in https://leetcode.com/problems/roman-to-integer/discuss/6547/Clean-O(n)-c%2B%2B-solution

Reusable technique — hardcode the 6 “subtraction” cases to simplify the if/else logic.
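The back-scan technique, sketched: one pass from the right; a drop in rank relative to the char just scanned means subtraction.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

int romanToInt(const std::string& s) {
    std::unordered_map<char, int> val{{'I',1},{'V',5},{'X',10},{'L',50},
                                      {'C',100},{'D',500},{'M',1000}};
    int total = 0, prev = 0;              // prev = value of the char to the right
    for (int i = static_cast<int>(s.size()) - 1; i >= 0; --i) {
        int v = val[s[i]];
        total += (v < prev) ? -v : v;     // a rank drop means subtraction
        prev = v;
    }
    return total;
}
```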



			

CIV: we wish all corner cases+worst data sets r given

In a few problems, the really important “Aha” is getting the full list of corner cases. This list defines the boundary of the problem, and would point out the direction, often immediately.

Of course a coding test won’t give you that time. Part of the skill tested is “clarifying requirements”. This is the real meaning.

Sugg: probe the interviewer

Sugg: Gayle pointed out some techniques to create some test data to reveal the boundary

FAQ@leetcode user guide

–“favorite”

For each question, these controls are found beside the votes.

I use the “Favorite” list to indicate “fuxi”

— personal notes on each problem

On the far right edge there’s a hidden button.

Under MyAcct, there’s also a readonly “NoteBook” i.e. all the problems having some notes.

Perhaps use the c++ code editor?

 

language war: 4 criteria

  1. criteria-E: efficiency, performance including platform-specific optimization.
    • Ling of MS felt threading support is crucial
    • many interactive or batch applications don’t care much about latency
  2. criteria-P1: proven, widely used in “my” community (me = tech lead), with mature ecosystem.
  3. criteria-F: killer features, including OO
  4. criteria : RAD, simplicity, ease-of-use

It is instructive to study the “dethrone” stories.

  • Case: On server-side, java dethroned c++ due to P1, RAD, E
    • In contrast, windows GUI seldom use java. They use 1)c++ 2)c# due to P1, F and E
  • Case: Python dethroned perl due to RAD
  • Case: shell scripting is very old but survived, due to F
  • Case: php survived due to F (designed for web server side), RAD

20Y long-term trend — demand for high-level language skillsets is rising (unsteadily) relative to java/c++. The reasons are numerous and include RAD, F, P1.

Q: which factor is the biggest threat to java?
A: F and RAD, not E! I guess the RAD productivity gap between java and python isn’t so big. For large modularized projects, java probably has a productivity advantage.

Q: will c++ (or even java) be relegated to assembly’s status? A: No, but I wonder why not.
 

leetcode/hackerrank/geeksforgeeks..Q-selection priorities

–problem selection criteria

  • favor problems of my weakness, though I have covered all categories
  • favor problems with more likes, or oldest problems on leetcode
  • favor “hard” problems, to protect self-esteem
  • For no-clue “ez/med” problems, just read the discussion, to minimize tcost. Similar to high school cram
  • For tractable “ez/med” problems, …?

http://www.geeksforgeeks.org

  • ^ organized by language and data structure etc
  • ^ detailed explanation in each question
  • ▼not so many per-employer questions, but many questions are rumored as “asked in XXX”

https://www.hackerrank.com/dashboard

  • ^ provides practice for hackerrank-style interviews
  • ^ organized by language and technical domain
  • ^ you can test your code, in many languages including C++, java, python
  • ^ no solution

–careerCup

  • ^ organized by employer
  • ^ no solution

–bloomberg codecon (https://codecon.bloomberg.com/) is fine

  • ^ no solution

— hacker earth has many coding challenges, and many practice problems.   https://www.hackerearth.com/problems/

tsn: what if I fail due2capabilities #Okao

Yet another revisit. See also post on dare2fail.

My intern David Okao asked “What if the west coast workplaces are too demanding? Can you cope?” I replied

  • As an adventurer, I don’t mind the risk… robust, resilient confidence
  • As an adventurer, I see myself as adaptable, a survivor
  • I may have my secret weapons
  • I may find my strengths such as domain knowledge, data analysis, trouble-shooting, efficient design, math(yes)

I then went over a similar discussion about MLP with Ashish, when I said —

  • If similar to Macq, I put up a good fight but still fail due to personal “capabilities”, I ought to feel positive about the whole experience.
  • I’m good at the job-hunting game so no real worries.

recombinant binTree^2D_grid

I prefer “recombinant binary tree”. As a special 2D-grid, it has BOTH [2] child links pointing down [1]

[1] or “eastward” in a treeDump() output.

[2] Note if a node has four child links then it’s not a binTree.

This constraint is important. Important for visualization. Easy to remember:

  • Every edge is slanted at 45 degrees, either southeast or southwest
  • Consequently, we can visualize that all nodes 33 steps below root line up horizontally on the 33rd Level 🙂

Now I feel that any 2D-grid with this feature should be drawn (and used) as a recombinant binary tree. This includes half the 2D-grid problems and path-through-matrix problems 🙂

 

##%% predictionS @prob(PIP)=unreliable

  • [o] RTS — I estimated 20->30-40% risk but actually zero
  • [o] OC — I felt I was doing nothing for months (20% risk) but they offered me a transfer
  • [o] citi — I wasn’t productive (30% risk) but they extended my contract for 6M
  • [u] Macq — I felt I was decent (2% risk), but expectation was way too high
  • [u] Stirt – I felt I was moving up (2% risk), but headcount pressure came as surprise
  • [u] barclays — I did superb job (0% risk) but head count pressure came as surprise
  • [u] 95G — I did superb job (0% risk) but head count pressure came as surprise
  • [o=overestimate of prob(PIP or layoff)]
  • [u=underestimate]

clone connected graph given any one node #!!need4stdSol

There are a few ways to phrase the same question.

  1. For a connected graph of colored nodes (say, only a few colors or black/white), you are given one node only. Please clone entire graph.
  2. Given one node in a connected graph, visit every node and recreate entire graph in another computer.
  3. Given one node in a directed graph, serialize it, transmit it to another computer and deserialize.

===== analysis

I feel this is one of the most practical algo problems. I feel my solution is fairly efficient for binary tree too.

Not too hard conceptually, if your mental picture is clear. I would need to assign node IDs (based on addresses [1]) to differentiate the black nodes. These IDs are serialized. So are the edge lists [1].

— For serialization, Basic idea is a BFT (or DFT) to visit every node. Each time a node address is encountered again (probably among the edge list of some node), we would simply skip it.

Within each node, the this->edges field is “translated” and serialized as a list or treeSet of node_IDs. The serialized file looks like

  • ID=1, edges=3,4,5,6
  • ID=2, edges=3,5,7
  • ID=3, edges=5,7,9
  • ID=4, edges=6,8

I think serialization is O(V+E). I think same O() for deserialization.

— For deserialization, we can simply construct all node objects in an array, using ID as index.  The edge list can stay as is. Optionally, we translate each edge from an ID into an address in the array.

[1] these are basic algoQQ pointers
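The BFT serialization might be sketched like this (GNode is my assumed node shape; the output maps each auto-increment ID to its translated edge list):

```cpp
#include <cassert>
#include <queue>
#include <unordered_map>
#include <vector>

struct GNode { std::vector<GNode*> edges; };   // assumed node shape

// BFT from the given node: assign auto-increment IDs keyed by address,
// and emit each node's edge list translated into IDs. O(V+E).
std::unordered_map<int, std::vector<int>> serialize(GNode* start) {
    std::unordered_map<GNode*, int> id;   // address -> auto-increment ID
    std::queue<GNode*> q;
    id[start] = 0;
    q.push(start);
    std::unordered_map<int, std::vector<int>> out;
    while (!q.empty()) {
        GNode* n = q.front(); q.pop();
        std::vector<int>& edgeIds = out[id[n]];  // entry exists even if edgeless
        for (GNode* nb : n->edges) {
            if (!id.count(nb)) {                 // first encounter: new ID, enqueue
                int newId = static_cast<int>(id.size());
                id[nb] = newId;
                q.push(nb);
            }
            edgeIds.push_back(id[nb]);
        }
    }
    return out;
}
```

Deserialization (not shown) would allocate an array of nodes sized by the ID count, then translate each edge ID back into an array address.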

target-graphNode is usually given by node address

It’s easy to get misled when a question says a target node to search for is given by payload value or by key.

I always assume payload can be black/white, and key can be repeating! Therefore, the most generic way to identify a node is by address.

However, in the rare case where the graph nodes are movable in the given problem, then we must use node keys. What if the graph has no key? Then it’s not a search graph!

vi on multiple files

— 2019 xp: I find it useful to quickly exit vi, examine another file, then use bash to recycle previous (long) vi-launch command.

I rely heavily on the fact that vi remembers the last cursor (and search pattern) per file.

Whenever I run a vi-launch command, I quickly decide on file1 file2 file3 names, picking and shuffling among the hottest files.

—-

[3/4] means vi receives 3 keystrokes; we hit 4 keys including shift or ctrl …

–“split” solution by Deepak M

vi file1 # load 1st file

  • :sp file2 # to show 2nd file upstairs
  • :vsp file3 # to show 3rd file side by side
  • You end up with  — file2 and file3 side by side upstairs, and file1 downstairs!
  • [2/3] ctrl-ww # To move cursor to the “next” file, until it cycles back

–the q( :e ) solution

vi file1 # load 1st file

  • :e file2 # to put 2nd file on foreground
  • [1/3] ctrl-^ — to switch to “the other file”
  • This solution is non-ideal for moving data between files, since you must save active file before switching and you can’t see both files

–editing 3 or more files

  1. vi file1 file2 file3
  2. q(:n) to switch to next, q(:N) for previous…
  3. q(:args) shows all files with current file highlighted
  • –Suppose now you are at file2.
  • q(:e file4) works. q(^) will toggle between file2 and file4
  • However, q(:n :N  :args) only work on the original list, not new files from q(:e)

q(:n :N ^) always shows the current filename in status bar:)

invalid/unbalanced brackets: kernel #62%

Q: given a string of N left+right brackets, you can find out how many invalid chars to remove to make it balanced. Say it’s R. There are up to N-choose-R ways to make it balanced. Print all unique balanced strings in compact form.

====analysis

subproblem: minimum how many invalid chars to remove

Useful preprocessing technique — on the left end, any closers must be removed. Same on the right end. No choice 🙂 We would end up with a valid char on each end. This is nice but optional in my Idea3 below.

— my idea 3 to count minimum cuts

Aha — There will surely be some “kernels” i.e. an opener-then-closer in a row. In the first scan I will remove them, then repeatedly remove newly formed kernels. This is always optimal if we aim to minimize the cuts.

  • [] ] [ [] ] [ [] ] becomes ]
  • []][[] becomes ][
  • [] [ [] [] [] [] becomes [
  • [[[][]][[]] becomes [
  • [[][]]][[]]][[] becomes ]][

What remain are the positions of bad chars. I need to remember these positions.

Case: closers only. Let’s look at one position like #55. We can cut #55 or another closer at an earlier position.

Case: openers only. Similar to the above.

Case: closers-openers. The original string is partitioned into exactly two sections, each similar to the above cases.
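The minimum-cut subproblem also has a standard one-pass counter answer, useful to cross-check the kernel idea (a sketch, assuming the brackets are ‘[’ and ‘]’):

```cpp
#include <cassert>
#include <string>

// Minimum number of chars to remove to balance a bracket string.
// unmatchedClosers counts ']' with no earlier '['; openDepth counts
// '[' still waiting for a closer. Both kinds must be removed.
int minRemovals(const std::string& s) {
    int openDepth = 0, unmatchedClosers = 0;
    for (char c : s) {
        if (c == '[') ++openDepth;
        else if (openDepth > 0) --openDepth;   // ']' matched by an earlier '['
        else ++unmatchedClosers;               // ']' with nothing to match
    }
    return openDepth + unmatchedClosers;
}
```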

XOR: key points4 (coding)IV

  • usage: swap two int variables. I think the “minus” trick is easier to understand
  • usage: doubly-linked list … MLP-sg connectivity java IV#2
  • usage: ‘1’ XOR an unknown bit among neighboring bits => TOGGLE the bit
  • usage: If you apply an all-1 toggle (bit vector), you get the “bitwise NOT” also known as “one’s complement”
  • Like AND, OR, this is bitwise meaning each position is computed independently — nice simplicity

If you are looking for a one-phrase intro to the 2-input XOR, consider

) TOGGLE ie toggle the selected bits. If you apply a toggle twice, you get the original.
) DIFFERENCE ie difference gate

See https://hackernoon.com/xor-the-magical-bit-wise-operator-24d3012ed821

— how about a bunch of bits to XOR together?

Wikipedia points out —  A chain of XORs — a XOR b XOR c XOR d (and so on) — is true whenever an odd number of the inputs are true.
Venn 0110 1001.svg on Wikipedia is the Venn diagram for a xor b xor c, where red means True. Throw a dart: landing inside the ‘a’ circle means a=True; outside it, a=False. Your dart lands on red (True) only when it is encircled an odd number of times; the region outside all three circles (encircled zero times, an even count) is not red.

Insight — iFF you toggle a False an odd number of times, it becomes True. Therefore, if among the N input bits the count of True values (toggles) is odd, then the result is True.

Does operand order matter? No — XOR is commutative and associative.

https://leetcode.com/problems/single-number/ has an O(N) time, O(1) space solution using this hack. It is applicable to a collection of floats, dates, or any fixed-size serializable objects, by XOR-ing their bit patterns.
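The single-number hack in two lines — pairs cancel because a ^ a == 0, so the lone element survives the fold:

```cpp
#include <cassert>
#include <vector>

// Every element appears twice except one; XOR all elements together
// and the duplicates cancel. O(N) time, O(1) space.
int singleNumber(const std::vector<int>& nums) {
    int acc = 0;
    for (int x : nums) acc ^= x;
    return acc;
}
```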

 

palindrome problems: patterns

Palindromes are … kinda rare but popular in coding interviews because they are easy to understand and unambiguous.

Most of the Palindrome problems are tough, and often require AuxDS.

You can move two pointers inward, or
You can move two pointers outward from the center if you are confident about the center ( or lower+upper bounds for the center’s location)
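The inward two-pointer scan, as a trivial sketch:

```cpp
#include <cassert>
#include <string>

// Two pointers moving inward from both ends; any mismatch disproves.
bool isPalindrome(const std::string& s) {
    int lo = 0, hi = static_cast<int>(s.size()) - 1;
    while (lo < hi)
        if (s[lo++] != s[hi--]) return false;
    return true;
}
```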

streaming data median #70%

Q (Leetcode 295): Design a data structure that supports the following two operations:

  • void addNum(int num) – Add an integer from the data stream to the data structure.
  • double findMedian() – Return the median of all elements so far.

Follow up:

Q2 If all integer numbers from the stream are between 0 and 100, how would you optimize it?

Q3: If 99% of all integer numbers from the stream are between 0 and 100, how would you optimize it?

===== analysis
For Q2, I use a fixed array to keep the frequency. Based on this array, I maintain a segment tree.
For Q3, I will add two additional “buckets” i.e. array elements for ‘below0’ and ‘above100’.
Below is for Q1
— keeping a size-constrained minHeap holding the “upper house”. Similarly, a maxHeap to hold the “lower house”.

If N is even, then both heaps have same size. If N is odd, then lower half (maxHeap) gets 1 more item and its top is the median

If a new number is below the current median, it goes into the lower house (max heap), whose top may relocate to the other heap if necessary.

pop() is O(log N/2), so addNum() is O(logN) whereas findMedian() is O(1). Can RBTree achieve the same?
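A minimal two-heap sketch for Q1 (it ignores the empty-stream case):

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// lower = max-heap (lower house), upper = min-heap (upper house).
// Invariant: lower.size() == upper.size() or exactly one more.
class MedianFinder {
    std::priority_queue<int> lower;
    std::priority_queue<int, std::vector<int>, std::greater<int>> upper;
public:
    void addNum(int num) {                       // O(logN)
        if (lower.empty() || num <= lower.top()) lower.push(num);
        else upper.push(num);
        // rebalance: lower keeps equal size or exactly one extra
        if (lower.size() > upper.size() + 1) { upper.push(lower.top()); lower.pop(); }
        else if (upper.size() > lower.size()) { lower.push(upper.top()); upper.pop(); }
    }
    double findMedian() const {                  // O(1)
        if (lower.size() > upper.size()) return lower.top();
        return (lower.top() + upper.top()) / 2.0;
    }
};
```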

(de)serialize slist: each node points to a random node#Rahul

Q: a slist has one more link field in each node. It’s the address of a random node in the same slist. How do you serialize this linked list?

https://leetcode.com/problems/copy-list-with-random-pointer/ is a similar question where input is head node.

====analysis
I think the generic solution works but there is hopefully a more intuitive one

— solution 1:
first traversal to build a hashtable {node address -> auto-increment id}

2nd scan visits each node to save the random field along with the host node id

If goal is deep-clone, then 2nd scan can build clone list without the random link. Then another pass to create the links.
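Solution 1’s two scans might look like this (RNode is my assumed node shape; -1 encodes a null random link):

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

struct RNode { RNode* next; RNode* random; };   // assumed node shape

// Scan 1: build {node address -> auto-increment id}.
// Scan 2: emit each node's random link as an id (-1 for null).
std::vector<int> serializeRandomLinks(RNode* head) {
    std::unordered_map<RNode*, int> id;
    for (RNode* p = head; p; p = p->next) {
        int n = static_cast<int>(id.size());
        id[p] = n;
    }
    std::vector<int> out;   // out[i] = id of node i's random target
    for (RNode* p = head; p; p = p->next)
        out.push_back(p->random ? id[p->random] : -1);
    return out;
}
```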

localSys,big codebase..taxing@the aging memory

A.Brooks talked about innovative brain power. I’m talking about memory capacity.

Now I see that localSys on a big system is taxing on the aging memory. I guess GregM (RTS) might be a cautionary tale. GregM relied on his theoretical knowledge to pass interviews, but not fast enough with local codebase

Is green field better than brown-field codebase? I guess so, based on personal experience. Green-field projects are rarely given to a new joiner but contractors are often hired on green field budget — like Citi, 95G, volFitter and RTS 🙂

Now consider this separate question:

Q: Are the c++11 QQ topics a form of churn on the interview arena?
A: Yes but c++11 QQ is lighter (on the aging memory) than localSys

Q: how about the Coding IV?
A: still lighter stress than localSys.

make unstable sorts stable

For any unstable sort, we can make it stable with this simple technique.

Aha — I have to assume two items of equal key have visible differences such as address. If they are indistinguishable then sort stability is moot.

I will first build a hashtable {address -> origPos}

After the unstable sort, all the items with same key=77 will cluster together but not in original order. Within this segment of the output, I will run another mini-sort based on the origPos values.

This doesn’t affect time complexity. In terms of space complexity, it’s O(N).
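A closely related variant tucks origPos into the comparator up front, so no post-sort mini-sorts are needed — a sketch (the Item struct is hypothetical):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Item { int key; char tag; };   // tag makes equal-key items distinguishable

// Sort an index vector instead of the items; ties on key fall back to
// original position, so even an unstable std::sort yields a stable order.
void stableSortByKey(std::vector<Item>& items) {
    std::vector<int> idx(items.size());
    for (int i = 0; i < static_cast<int>(idx.size()); ++i) idx[i] = i;
    std::sort(idx.begin(), idx.end(), [&](int a, int b) {
        if (items[a].key != items[b].key) return items[a].key < items[b].key;
        return a < b;                     // tie-break on origPos
    });
    std::vector<Item> sorted;
    sorted.reserve(items.size());
    for (int i : idx) sorted.push_back(items[i]);
    items = sorted;
}
```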

merge two pre-sorted halves: array|slist

Q: Given an integer array of which both first half and second half (unequal sizes) are sorted. Task is to merge two sorted halves of array into single sorted array. 

====analysis

Much easier if we have a slist rather than array.

— solution 1 O(N) time but O(N) space:

First replicate the array to a slist. Then maintain 2 pointers. Pick the smallest and relocate it to end of the merged segment (on the left).

— solution 2 in-place without extra space

quick sort will need extra stack space O(log N)
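For the array case, std::inplace_merge does exactly this merge of two consecutive sorted ranges — O(N) when a temporary buffer is available, falling back to O(N logN) comparisons otherwise:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// a[0..mid) and a[mid..n) are each pre-sorted; merge them in place.
void mergeHalves(std::vector<int>& a, int mid) {
    std::inplace_merge(a.begin(), a.begin() + mid, a.end());
}
```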

shared vars across files: prefer static field

When I have state to maintain and share across compilation units, there are basically three main types of variables I can create. (Non-static Local variables are simple and don’t maintain state.)

  1. nonstatic field of a singleton class — Note any user of this variable needs a reference to the single object 😦
  2. file-scope var in a *.cpp — relatively simple usage, but don’t put it in a shared header, as explained in global^file-scope variables]c++
  3. public static fields — most versatile design. Any user can access it after they #include the class header file.

— non-contenders

  • local static variables — (niche usage) You can create a local static var in myfunc(). To share the variable across compilation units, myfunc() can return a reference to this object, so from anywhere you can use the return value of myfunc(). This is a simple form of singleton.
  • global variables — monster. Probably involves “extern”. See my other blogposts.

The advantages of static fields often apply to static methods too.

In fact, java leaves you with nothing but this choice, because this choice is versatile. Java has no “local static”, no file-scope, no global variables.
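The local-static option (shared via a function’s return value) in miniature:

```cpp
#include <cassert>

// myfunc() returning a reference to its local static -- every
// compilation unit that calls it shares the same object.
int& sharedCounter() {
    static int counter = 0;   // constructed once, on first call
    return counter;
}
```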

API^ABI

See also posts on jvm portability

  • — Breaking API change is a decision by a library supplier/maintainer.
  • clients are required to make source code change just to compile against the library.
  • clients are free to use previous compiler
  • — Breaking ABI change is a decision by a compiler supplier.
  • clients are free to keep source code unchanged.
  • clients are often required to recompile all source code including library source code

API is source-level compatibility; ABI is binary-level compatibility. I feel

  1. jar file’s compile-once-run-anywhere is kind of ABI portability
  2. python, perl… offer API portability at source level

c++ABI #best eg@zbs

Mostly based on https://www.oracle.com/technetwork/articles/servers-storage-dev/stablecplusplusabi-333927.html

Imagine a client uses libraries from Vendor AA and Vendor BB, among others. All vendors support the same c++ compiler brand, but new compiler versions keep coming out. In this context, unstable ABI means

  • Recompile-all – client needs libAA and libBB (+application) all compiled using the same compiler version, otherwise the binary files don’t agree on some key details.
  • Linker error – LibAA compiled by version 21 and LibBB compiled under version 21.3 may fail to link
  • Runtime error – if they link, they may not run correctly, if ABI has changed.

Vendor’s solutions:

  1. binary releases — for libAA. Vendor AA needs to keep many old binary versions of libAA, even if compiler version 1.2 was retired long ago. Some clients may need that libAA version. Many c++ libraries are distributed this way on vendor websites.
  2. Source distribution – Vendor AA may choose to distribute libAA in source form. Maintenance issues exist in other forms.

In a better world,

  • ABI compatible — between compiler version 5.1 and 5.2. Kevin of Macq told me this does happen to some extent.
  • interchangeable parts — the main application (or libBB) can be upgraded to newer compiler version, without upgrading everything else. Main application could be upgraded to use version 5.2 and still link against legacy libAA compiled by older compiler.

(overloaded function) name mangling algorithm — is the best-known part of the c++ABI. Two incompatible compiler versions would use different algorithms, so LibAA and LibBB will not link correctly. However, I don’t know how c++filt demangles those names across compilers.

No ABI between Windows and Linux binaries, but how about gcc vs llvm binaries on Intel linux machine? Possible according to https://en.wikipedia.org/wiki/Application_binary_interface#Complete_ABIs

Java ABI?

 

find longest full path #Rahul

Q: A single string encodes a directory tree. Find the longest full path to a leaf node. For example, “a/222/DDDD” is the longest path in “a\n\t33\n\t\tBB\n\t\tCC\n\t\t\tII\n\t222\n\t\tDDDD”

a
  33
    BB
    CC
      II
  222
    DDDD

==== analysis

— solution 1: two-scan. O(Number of nodes)

Aha — looks like a pre-order output, so we can build the tree easily.

Each node takes a whole line; indent count indicates d2root. In 2nd scan, we walk the tree in either BFT or pre-order DFT. Each node will use a field to remember pathLengthTillHere.

— solution 2: one-pass. Same O(). Read the input string as a pre-order output.

Aha — No need to build any tree. Just use a stack to keep the current node’s ancestor nodes. (Rahul said recursion might be easier).

Same pathLengthTillHere.
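A sketch of the one-pass idea: instead of an explicit stack of ancestor nodes, keep a vector of path lengths per depth (tab count = depth). Since extending a path only lengthens it, the max over all nodes equals the max over leaves:

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// pathLen[d] = length of the full path down to the current node's
// ancestor chain above depth d; pathLen[0] = 0 for the (virtual) root.
int longestFullPath(const std::string& input) {
    std::vector<size_t> pathLen{0};
    size_t best = 0;
    std::istringstream ss(input);
    std::string line;
    while (std::getline(ss, line)) {
        size_t depth = line.find_first_not_of('\t');   // tab count = depth
        if (depth == std::string::npos) continue;      // skip blank lines
        size_t len = pathLen[depth] + (line.size() - depth)  // node name length
                   + (depth ? 1 : 0);                        // +1 for the '/'
        if (depth + 1 >= pathLen.size()) pathLen.resize(depth + 2);
        pathLen[depth + 1] = len;   // this node becomes the ancestor at its depth
        best = std::max(best, len);
    }
    return static_cast<int>(best);
}
```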

Preorder+Inorder list → binTree #50%

https://leetcode.com/problems/construct-binary-tree-from-preorder-and-inorder-traversal/ says — Q: Given preorder and inorder traversal of a tree, construct the binary tree. You may assume that duplicates do not exist in the tree.

For example, given

preorder = [3,9,20,15,7]
inorder = [9,3,15,20,7]

Return the following binary tree…

====analysis

The int values in the int arrays are meaningless and useless. I can improve things by assigning incremental ranks [1] to each node in the inorder list. Then use those ranks to rewrite the preorder list. After that, walk the preorder list:

  • first node is root
  • any subsequent node can be placed precisely down the tree.

I solved a similar problem of converting preorder list into BST — https://bintanvictor.wordpress.com/wp-admin/post.php?post=19713&action=edit
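The standard recursive construction (not the rank-rewriting idea above): the next preorder element is the subtree root, and its inorder position splits the remaining elements into left and right subtrees.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

struct TNode {
    int val;
    TNode *left = nullptr, *right = nullptr;
    explicit TNode(int v) : val(v) {}
};

// Build the subtree covering inorder range [inLo, inHi); preIdx advances
// through the preorder list as roots are consumed.
TNode* buildSub(const std::vector<int>& pre, int& preIdx, int inLo, int inHi,
                const std::unordered_map<int, int>& inPos) {
    if (inLo >= inHi) return nullptr;
    TNode* root = new TNode(pre[preIdx++]);
    int mid = inPos.at(root->val);       // valid because values are unique
    root->left  = buildSub(pre, preIdx, inLo, mid, inPos);
    root->right = buildSub(pre, preIdx, mid + 1, inHi, inPos);
    return root;
}

TNode* buildTree(const std::vector<int>& pre, const std::vector<int>& in) {
    std::unordered_map<int, int> inPos;  // value -> inorder position
    for (int i = 0; i < static_cast<int>(in.size()); ++i) inPos[in[i]] = i;
    int preIdx = 0;
    return buildSub(pre, preIdx, 0, static_cast<int>(in.size()), inPos);
}
```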

heap usage: C ilt C++/Java #Part 2

I now recall that when I programmed in C, my code never used malloc() directly.

The library functions probably used malloc to some extent, but malloc was an advanced feature. Alexandrescu confirmed my experience and said that C programmers usually make rather few malloc() calls, each time requesting a large chunk. Instead of malloc, I used mostly local variables and static variables. In contrast, C++ uses the heap much more:

  • STL containers are 99% heap-based
  • virtual functions require pointer, and the target objects are usually on heap, as Alexandrescu said on P78
  • pimpl idiom i.e. private implementation requires heap object, as Alexandrescu said on P78
  • the c++ reference is used mostly for pass-by-reference. Pass-by-reference usually works with heap objects.

In contrast to C, C++ tends to use many small chunks of heap memory.

Across languages, heap usage is slow because

  • In general, OO programming uses more pointers, more indirection and more heap objects
  • heap allocation is much slower than stack allocation, as Stroustrup explained to me
  • using a heap object always requires a runtime indirection. The heap object has no name, only an address !
  • In Garbage-Collected languages, there’s one more indirection.

## delights]tsn: engaged,leverage

Sugg — I think we can try and develop a powerful habit to focus on the delights in our memory. Such a perception would help us focus on the delights in the current “struggle”.

The delight should NOT be related to anything “strategic” (mirage). My expectation was invariably way too high in terms of strategic value, leverage and ROTI, and therefore underwhelming. These expectations were crushed by job market demand fluctuations.

In terms of engagement … my expectation was slightly high.

Against this backdrop, there have been little delights, but stigma/respect (not salary!) was too huge as a factor and has overshadowed those little delights. For a perspective, please look at the spreadsheet “predict next 3-5Y job satisfaction”

I chose a python job at Macq .. engaged for a few months, to my delight.

I chose the c++ job at RTS .. engaged for 6 months, to my delight.

  • Reason: interviews — traction in interviews and also in GTD and zbs
  • Reason: interviews — socket QQ and linux system QQ .. was in-demand
  • Reason: QQ discussions with colleagues .. visible progress

Both job decisions produced good/superior leverage over 3-5Y.

I chose a c# job … engaged for a year and then disengaged, shorter than expected. Leverage was good.

I chose a Quant dev job … engaged for a year and never disengaged. Leverage was underwhelming.

I chose a c++ large scale eq OMS job … engaged for a few months. Leverage is unknown

c#^ java+cpp journeys

My c# journey was exciting for 6M before OC and in the first year at OC. In contrast, my c++/coreJava (less so jxee) journeys have generated superior ROTI (elusive) because 1. the interview topics are stable 2. market waves steered me to stick to (not abandon) these career directions. Leverage? c#’s is lower but not bad. See separate blogpost. In terms of my expectations —

  • java – exceeding my expectations in churn. Found 2nd life in web2.0.
  • c# – missed my expectations. Displaced in web2.0. Google CIV uses 5 languages, without c#
  • c++ – matching my expectation. slow decline. Efficiency advantage is eclipsed by java and some new languages
  • py – exceeding my expectation
  • javascript – exceeding expectation

For all languages, there is no salary hike, no strategic value so at that level all underwhelming  

technology xyz is dead

I often hear people say “technology XX is dead” .. an exaggerated, attention-grabbing gimmick, common in social media posts.

I like the “dead”, old-fashioned, time-honored, stable technologies that are robust (not only resilient) against churn.

The alternative technologies are use-less, worth-less, hope-less, less proven, low-value, more likely to die young or even dead-on-arrival

ruthless march@technology

There’s a concept of “best practices across industry”, as I experienced in Macq. Using new technology, things can be done faster, at a large scale, and more automated, even though I may feel it doesn’t make such a difference.

CTO’s don’t want to be seen as laggards. Same motivation at MS-Iceman, Quoine …

  • PWM-billing, PWM-comm. I remember Mark wanted “strategic improvement” not incremental improvement. He needed it for his promotion 政绩 (political achievement)
  • RTS infrastructure was considered (by Jack He and outsiders) outdated and lagging behind competitors

You can call it “ruthless march of technology” — a ruthless progress. At a fundamental level, this “progress” can wipe out the promised benefit of “slow-changing, stable domain knowledge”

  1. quant skillset
  2. SQL skillset — affected by noSQL
  3. c++ skillset — perhaps affected by c++0x
  4. FIX skillset — perhaps affected by faster proprietary exchange APIs?
  5. …

However, the skills above are still relatively robust. Other skillsets (not today’s focus) have proved arguably more robust against this march — sockets, pthread, STL, coreJava, bond math,.. I listed them in my spreadsheet pastTechBet.xlsx.

## IV skills: disaster-rescue 逃生 (escape)

I have relied on this parachute over and over again. Therefore, I have real motivation to strengthen this parachute and keep it in good condition.

  • OC — parachute saved me from transfer to BAU
  • 2007 after first entry to U.S., parachute saved me from sitting on bench for months as happened to … Florence
  • Vz — after my contract was cut unexpectedly, parachute saved me from sitting on bench. This proves to be the first of many rescues including 95G, Barclays,
  • [B] Stirt — after the layoff, parachute helped me avoid sitting on bench for months.
  • [B] Macq — after the PIP, parachute saved me from another slow job search on SG job market
  • [B] deMunk — parachute gave me confidence that I had the technical capabilities to escape — My “parachute” was my marketable skillset
  • [B = example of a boxer cornered and beaten]

Other people’s examples:

  • [B] Youwei — his java IV skills rescued him from the MS layoff
  • Venkat of OC — it rescued him from a terrible boss
  • Shanyou and Deepak — need better parachutes
  • Jack Zhang
  • Davis Wei

STL map[k]=.. Always uses assignment@runtime

The focus is on the value2, of type Trade.

— someMap[key1] = value2; //uses default-ctor if needed, then unconditionally invokes Trade::operator=() for assignment! See P344 [[Josuttis]]

Without a default-ctor or operator=, the Trade class probably won’t compile with this code.

— someMap.emplace(key1, value2); //uses some ctor, probably the copy-ctor, or a converting ctor if value2’s type is not Trade

specialize class- but !!function- templates

A fundamental TMP technique is class template specialization.

A class template’s flexibility can only be achieved via specialization, but a function template’s flexibility can be achieved via overloading! Overloading is much simpler than specialization.

In [[c++coding standard]], Alexandrescu/Sutter said “Don’t specialize func templates”, for multiple reasons.

After remembering this sound bite, it’s probably important to remember one of the reasons, for IV (halo) and zbs.

[19] localSys proficiency]1st 4M #Macq

Arguably my #1 weakness is the initial learning curve on localSys. For that reason, my figure-things-out speed would be slower than the other guys’.

Due to my superior interview performance, I “sometimes” show an underwhelming learning curve by comparison, depending on the other new hires. In some teams like RTS, Stirt, … some new hires were slower, less motivated, more distracted.

Camp-out — is my advantage, provided I have an engagement honeymoon.

self-independent learning — Ashish mentioned it many times. Not sure how realistic. If I can achieve it, I would have a big-gun, as described in big_gun + laser_gun on localSys

respect — Ashish pointed out independent discovery would win me respect from team members. I can recall many experiences to support it.

burning pleasure

virtuous circle — the respect would spur me on.

annual leaves — It’s OK to take some leaves, but if I can spend more time at work in first 3-6 months I have a higher chance of creating the virtuous circle.

Q: At Macq, was the initial 6M ramp-up effective for localSys, GTD?
A: It did help prevent the Stirt mistake. Key is capture of absorbency and burning pleasure, in the engagement honeymoon. In 2015 I didn’t have a lot to capture 😦
A: Consider set-up-to-fail . I would say even with a quick ramp-up, I would still fall below the bar.
A: now i feel 6M is hard to maintain. Realistic honeymoon is 4M, but sometimes it lasts longer.

intern coding interview prep course – codepath

Hi XR,

Today, another intern (4th year) showed me a 12-week prep course he is enrolled in. www.codepath.org is an online course for students keen to join tech firms like Goog.

Each week there are two sessions with coding question assignments. My intern told me most of the assignments are Leetcode "medium" questions, sometimes "hard" problems. He estimated about 15 problems per week. 12 weeks mean 180 problems.

If a student misses three sessions, he is automatically kicked out. Likewise if he fails to submit assignments for a few sessions.

Those are the facts. Now my observations —

I feel these students can become reasonably competent at solving coding problems quickly. Over 12 weeks, they would solve an estimated 180 coding problems. In contrast, I aim to solve an average of 1 problem a month.

I tend to feel defeated and hopeless by such a stark contrast. I feel we need to set realistic targets, like 1 problem a month. Still better than zero.

Further, I believe many of these students will not keep up the practice after this "crash course" and will forget most of it.

Q: after 12 weeks can a participant easily pass typical west coast coding interviews?

A: I doubt it. Not easily. Improved chance for sure. Note the course has been successful mostly against internship interviews.

A: If it takes only 12 weeks to pass a FB or Goog interview, then millions of programmers would have done it. This reminds me of book titles like [[learn c# in a day]] and [[Sam's teach yourself java/javascript/python/android… in 24 hours]]

A: My team’s intern is a 2nd year college student. He is also practicing on Leetcode, about 10 problems a day on the weekend, so in theory, a sophomore can practice a few hundred Leetcode problems and pass FB employee-level interview? I think this is rare.

I have coded or solved-on-paper 100 Leetcode (or similar) problems, usually without running the leetcode tests. However, I know I’m still quite far below the bar at FB interview.

[19]cod`drill: XR discussion #low-end WC jobs

sustainable learning — Am not aiming for efficient learning. Instead, am aiming for sustainable learning. So I prefer problems without known solutions.

Life-long — Coding drill could become a lifelong hobby just like yoga. My tech-learning used to be 10% on coding drill but now 50% on coding drill. Ditto with yoga.

I can see my “west-coast coding IV” performance improving. It is still below the bar at Indeed, but I made big progress at the unfamiliar task dependency problem.

The harsh reality is, we are competing with fresh grads or even interns. Better accept the reality rather than burying our heads in the sand.

There are a large number of easier jobs on the west coast, so perhaps apply remotely and attend the remote coding rounds for practice.

The west-coast CIV is a more /level playing field/ than QQ IV. More objective. Am not negative about it.

multi-file weekend coding: c++is worst lang

In my weekend assignments, the tcost of Best Practices is higher for c++ than for other languages.

In any language, function names must match across the files. Ditto for type names and variable names.

Java additionally requires consistency between supertype^subtype, across files. I often rely on Eclipse for that.

C++ adds one more major headache — header files. My C++ coding uses vi or notepad++.

  • more #includes are needed when I extract a header file
  • in header files, I can’t write “using namespace ..”, so I must manually add the namespace prefixes on many names
  • function declaration^definition must match across the header file vs implementation files

In a weekend coding assignment, I need to avoid multiple-file coding.

Just keep everything in one file. Tidy up in the end and adhere to Best Practices.

 

 

compute any sum@subMatrix by O(1) hashmap lookup

suppose origin cell is at northwest [1,1] not [0,0]

Pre-processing — We can incrementally compute each rooted-submatrix sum SumToOrigin[r,c] in linear time. Save them in a s2o hashmap (a shadow matrix is actually better).

Now Consider the submatrix identified by [2,2] and [3,4] enclosing 6 cells. This submatrix has sum =

s2o[3,4]+s2o[1,1]-s2o[1,4]-s2o[3,1]

12 cells + 1 cell  – 4 cells – 3 cells

So any submatrix sum can be computed in O(1). To go through all NNMM of them requires O(NNMM) where N and M are the board dimensions
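A sketch of the pre-processing plus O(1) lookup, using a padded shadow matrix rather than a hashmap (1-based corners, as in the text; the function names are mine):

```python
def make_sum_query(matrix):
    """Precompute s2o[r][c] = sum of the submatrix from origin [1,1] to [r,c]
    (1-based), then answer any submatrix sum in O(1)."""
    R, C = len(matrix), len(matrix[0])
    s2o = [[0] * (C + 1) for _ in range(R + 1)]   # row 0 / col 0 are zero padding
    for r in range(1, R + 1):
        for c in range(1, C + 1):
            s2o[r][c] = (matrix[r-1][c-1] + s2o[r-1][c]
                         + s2o[r][c-1] - s2o[r-1][c-1])

    def query(r1, c1, r2, c2):
        # 1-based inclusive corners [r1,c1]..[r2,c2]; matches the formula above
        return (s2o[r2][c2] - s2o[r1-1][c2]
                - s2o[r2][c1-1] + s2o[r1-1][c1-1])
    return query
```

The query for corners [2,2] and [3,4] reduces to exactly s2o[3,4] + s2o[1,1] − s2o[1,4] − s2o[3,1], as derived above.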

merge 2 binTrees by node position

Q (leetcode 617): https://leetcode.com/problems/merge-two-binary-trees/submissions/

==== Analysis

https://github.com/tiger40490/repo1/blob/py1/py/algo_tree/merge2Tree.py is a short, simple solution fully tested on leetcode.com but hard to run offline. Elegant in terms of implementation.

Insight — The challenge is implementation. The input and return types of the DFT function are tricky but serve as useful implementation techniques.

Labelled as easy, but pretty hard for me.

— Idea 1: BFT. When a node has any null child, put null into queue. Not so simple to pair up the two iterations

— Solution 2: DFT on both trees. Always move down in lock steps. When I notice a node in Tree A is missing a child link that Tree B has, then I need to suspend the Tree A DFT?

My DFT function would have two parameters, nodeInA and nodeInB. One of them could be null, but the null handling is messy.

Aha — I set the missing input parameters to dummy objects, to avoid the excessive null checks. In this problem, this technique is not absolutely necessary, but it is very useful in general.
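For reference, a minimal recursive sketch — the null-returning variant rather than the dummy-object variant; when one side is missing, it simply reuses the other side’s subtree:

```python
class Node:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def merge(a, b):
    # leetcode 617: overlapping nodes are summed
    if a is None:
        return b      # reuse tree B's subtree as-is
    if b is None:
        return a      # reuse tree A's subtree as-is
    return Node(a.val + b.val,
                merge(a.left, b.left),
                merge(a.right, b.right))
```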

 

cod`drill: student^mathematician

A student’s approach to statistics is different from a mathematician’s. Likewise, I feel the intern way of coding drill is not good for me

  • no long-term commitment. I believe most of them would stop after getting a job.
  • not in-depth
  • not aimed at deep learning and retention
  • more like cram

However, an unbiased interviewer/observer could find the student stronger than the mathematician!

old-timer paid below grads; unable2quit #Davis

My friend Davis revealed to me that many non-VPs earn below the $115k salary offered to a fresh Master’s grad.

  • Davis said employers won’t give you more than a 3% annual increment, so quite often pay can’t reach $115k.
  • Anthony also said the big hike happens only when you change job [1].
  • Jack Zhang said over 10 years the total increment could add up to below 20k.

Q3: what’s the market rate for your skill as an old timer?
A: probably higher than the grads.

Q3b: so why don’t the old-timers get a better job elsewhere?
A: they don’t feel they can.
A (Davis): these old timers have value-add only because of their localSys knowledge. In a sense, some new-age employers have a prejudice against people of loyalty. They probably associate loyalty with stagnation and obsolescence.

Therefore, one-long-job resume can be bad when you change career. I always felt my job-hopper resume is a liability, but some west coast shops don’t care.

I feel these old-timers are afraid of failure [1] at the new job, perhaps after a long, stressful adjustment. I think people do notice that adjustment to a new environment can be very tough and often unsuccessful.

Contractors keep adjusting.. stronger, more adaptable, more confident than those loyal old-timers.

Q6: why is employer willing to pay grads so much more?
A: Employers don’t want to but that’s the market rate set by supply-demand. Ibanks want to bring in fresh talents and must pay the market rate.

[1] A.Gambino discussion .

tsn: esteem hazards^boosts

With the exception of c++, socket (mkt data), py (devops), c#, MSVS … it looks like the majority of ventures out of my tech sweet spot (java/SQL/scripting…) present esteem-hazards (like health hazards) and a dismal prospect in terms of self-esteem boost.

high-risk, low-return ventures, in terms of self-esteem?

Looking deeper, the esteem-hazard is really nothing but one mgr’s assessment. For my recent painful jobs, I continue to dismiss/ignore the possible boost to self-esteem —

  • I conquered some of my biggest technology fears — MSVS; c++ crash management; c++ large code navigation.. Other people may not agree, but my own experience proves that these challenges are harder than high-level dev (like web/scripting..). My fears about these c++ challenges were more deep-rooted .
  • OC — I built real mileage in c#. I even proved myself stronger than some c# veterans when I created a simple web server to dump log files for everyday troubleshooting.
  • Macq — I invented elegant solutions for splunk even though boss wasn’t impressed
  • .. many more
  • see more pointers in https://bintanvictor.wordpress.com/wp-admin/post.php?post=27139&action=edit

c++TMP^other QQ topics #java

Alexandrescu’s TMP techniques (not “designs”) are very tricky (not “complex”). They require absorbency, but do they improve latency? Do they get you better jobs with lower stress?

I need to make time-allocation decisions among QQ topics, including TMP

In terms of latency, well, java can now rival c++. The technical reasons are neither obvious nor intuitive, but not my focus today. Just an observed fact which discredits conventional wisdom and our assumptions.

— zbs, based on continued relevance :

TMP is needed when reaching next level in c++ zbs.

TMP is more time-honored than many c++0x features.

Many new c++0x features were added for TMP. I feel TMP is the main innovation front across c++ language and standard development. C++ lost many battles in the language war, but no other language offers anything close to TMP features.

— As QQ

Will C++TMP (and rvr) QQ turn out similar to java bytecode engineering, reflection, generics? (Even in such a scenario, TMP still offers better roti than Qz.) Actually TMP is quizzed more than those. The c++ guru interviewers often adore TMP.. cult following.

EJB is an add-on package .. different category, not an advanced core language feature.

When TMP is not quizzed you may still get opportunities to showcase your halo. Many interviewers ask open-ended questions.

TMP techniques would remain a halo for years to come. Classic QQ topic.

— GTD: templates are never needed in greenfield projects. Occasionally relevant in understanding existing code base such as etsflow, STL, boost..

Q: are there rare projects using TMP and offer me an opportunity to outshine others, gain GTD advantage ..?
A: I guess it’s one in 10 or 20. Here are some factors:

Within a given project codebase, TMP is a powerful tool for DRY improvement and re-usability, but such reusability is over-rated in most projects.

DRY (don’t repeat yourself) is practiced more widely, but I feel TMP techniques replace 100 lines of code duplication with 20 lines of unreadable code.

 

identical binTree

Q: Given two binary trees, write a function to check if they are the same or not. Two binary trees are considered the same if they are structurally identical and the nodes have the same key.

====analysis:

Look at hints? No need !

— solution 1: I can use the serialization solution and compare the two serialized strings in real time.

— solution 2: BFT but ensure each branch level has strictly 2^n items including nulls

Intuition — If both sides match up on every level, then identical

This solution works if each tree node has a fixed number of child nodes.

— solution 2b: a null node at level N will reduce Level N+1 nodes by two.

— solution 3: recursion

isSame(node1, node2):
  if node1 is null and node2 is null: return true
  if node1 is null or node2 is null: return false
  if node1.key != node2.key: return false
  return isSame(node1.left, node2.left) and isSame(node1.right, node2.right)

The null handling and the key comparison are the only implementation issues.

— Idea 9: BFT but each queue item is {payload, childrenCount}

This solution doesn’t work for binTree as it can’t detect “null leftChild” vs “null rightChild”.

This solution works if each node can have arbitrary number of child nodes, but I don’t know how common this is.

frequent breaks@work → productivity drop

As I get older I believe my brain needs more frequent mini-breaks, but I think sometimes my breaks become distractions. The more treacherous “breaks” tend to be

  • blogging
  • personal investment — Eliminate !
  • generic tech learning, unrelated to current project

— sense of urgency — When taking these breaks, a sense of urgency is what I need, but I favor a relaxed sense of urgency.

— focus — frequent mini-breaks should not affect my focus at work. I might need to avoid chitchats.

I think it’s possible to maintain or enhance my focus by taking breaks.

— stay late — Frequent breaks often require longer hours. I should prefer coming early, but in reality I often stay late.

— I also want to avoid long sit-down sessions. The breaks give relief to my eyes, neck, back, shoulder etc.

%%strength ] c++knowhow #Mithun

See also

Mithun asked me “So you are now completely pro in c++?” I replied

  1. On-the-job technical challenges are not very different from java
  2. On interviews the QQ topics are different. That’s the real challenge for a java guy moving into c++.

Now I feel a 3rd element is zbs, beyond GTD and interviews. I have written many blogposts about “expert”. I also have many blogposts in the category “c++real”.

Some may say c++ is overshadowed by java, and c++ QQ is overshadowed by coding IV. Well, we need sharper perception and judgment, and recognize the many facets of the competitive landscape. I won’t elaborate here, but c++ has withstood many waves and is more robust than many other technologies.

GS CIV:highest point]%%visible python proficiency

One of the few real pieces of evidence of progress. Average 1-2 times each year.

I should really recognize (not dismiss) the fact that, the 3 questions together represent the toughest python speed coding test I have ever taken.

GS is selective and python is relatively unfamiliar to me. Before 2019, I would not have imagined I could pass this tough coding interview.

  • extreme time pressure, comparable to Indeed and FB
  • unfamiliar territory — linear probing, defaultdict etc
  • completed without error
  • many issues encountered and resolved on the spot
  • bigO analysis performed. No performance bug
  • added additional tests. My implementation is basically flawless.
  • high energy, high concentration
  • highest GTD proficiency ever

us`lowest N natural nums,count BST#70%#Rahul

Q: how do you make use of the previous results for N=2 to tackle N=3?
f(z) denotes the count of unique BSTs consisting of the first z natural numbers
f(0)=f(1)=1

For N=21, we can have, for example, 1 node in the left subtree and 19 in the right subtree (the root takes the remaining key).

  • for odd z, express it as z:=2y+1 where y > 0
    f(2y+1)=[ f(a=0)*f(2y) + f(a=1)f(2y-1) + f(2)f(2y-2)… +f(y-1)f(y+1) ]*2 + f(y)f(y)

Note a should increment from 0 up to y-1.

  • for even z, express it as z:=2y where y > 0
    f(2y)=[ f(0)*f(2y-1) + f(1)f(2y-2) + f(2)f(2y-3)… +f(y-1)f(y) ]*2

Let’s try this formula on f(2) (y=1) … = 2. Good.
Let’s try it on f(3) (y=1) … = 5. Good.

–N=3:
if ‘1’ is root, then there are f(2) BSTs
if ‘3’ is root, then there are f(2) BSTs
if ‘2’ is root, then there are f(1) * f(1) BSTs
–N=9:
if ‘1’ is root, there are f(8) BSTs
if ‘9’ is root, there are f(8) BSTs
if ‘4’ is root, there are f(3) left-subtrees and f(5) right-subtrees, giving f(3)*f(5) BSTs
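The worked examples above generalize to the Catalan recurrence — pick each key as the root, then multiply the left and right subtree counts. A minimal DP sketch:

```python
def count_bst(n):
    # f[z] = count of unique BSTs on z distinct keys; f(0) = f(1) = 1
    f = [1] * (n + 1)
    for z in range(2, n + 1):
        # root takes one key; k keys go to the left subtree, z-1-k to the right
        f[z] = sum(f[k] * f[z - 1 - k] for k in range(z))
    return f[n]
```

The odd/even split in the formulas above merely halves the work by exploiting the symmetry f(k)f(z−1−k) = f(z−1−k)f(k).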

This is not a coding challenge but a math challenge.

Q2: output all BSTs. We need a way to represent (serialize) a BST?

avoid hot domains with bright 30-something

The strongest competitors tend to be the 30-somethings, in terms of

  • code reading,
  • localSys,
  • correlating logs, config, src code, DB, input data…
  • memory capacity
  • [a=not really my weakness]
  • [a] xx new technology — they are strong but I’m not bad either.
  • [a] bigger picture as Josh described — some 30-something are very good at it

C/C++ is a domain a few young guys are good at but shunned by the majority of them, therefore a good domain for older people like me.

In the hot domains, I feel clearly less in-demand as I age. Remember Arthur Brooks’s 2019 article? Perhaps I should shift towards domains that the brightest young guys avoid — A subset of 150k FTE@light load  

CIV: I still prefer RBtree #Rahul

  • goal #1 in CIV — speedy completion of an optimal solution in terms of O().
  • j4 #1 — RBTree is more intuitive, more natural to my problem solving than priorityQ or sorted vector. Given my #1 goal, I would go for my favorite tools.
  • j4 — if the interviewer gives a new requirement, my RBtree may become useful (51% chance) or become unusable (49% chance)
  • Drawback #1 of RBtree — not supported in python
  • Drawback  — array sorting can achieve O(N) using radix sort or counting sort, esp. in the contrived contexts of Leetcode problems.

Q: what if interviewer feels RBtree is overkill and over-complicated?
A: I would say overall bigO is not affected.
A: I would say RBTree supports future features such as delete and dynamic data set.  Realistic reasons are powerful arguments.

Q: what if interviewer gives a follow-up request to simplify the design?
A: if I already have an optimal solution, then yes I don’t mind replacing my RBTree

max rectangle formed us`any 4points#Google#Rahul 50%

Q (Google onsite): given N Points defined by x,y coordinates, find the largest possible rectangle formed with four corner points drawn from the N points.

====analysis

worst input? many points line up along the north and west of a rectangle?

From N points, we get (roughly) NN line segments; the actual count is up to N(N-1)/2.

Any 3 points forming a right-angle corner is represented as a Corner object with 5 fields (or more)

  • 3 Points: clockwise start point, corner point, clockwise end point
  • 2 line segments

A FloatingLineSegment object has 2 fields {length, slope}

Any 2 line segments of equal length and slope forming a parallelogram is represented as a Para object having fields (or more):

  • 4 Points
  • 4 line segments
  • 2 FLSs
  • a single point representing the central intersection

Each line segment has one FLS and one slope. So we scan the NN line segments to populate some hashmaps.

A hashmap of {FLS -> set of lineSegments}

A hashmap of {slope -> set of FLS }. From any one slope value, we can look up the FLSs of that slope + the FLSs of the opposite slope.

Scan the NN line segments. For each line segment L1, compute the opposite slope. Use slope hashmap to get the FLSs. Use these FLSs to get the line segments perpendicular to L1. Discard any of those line segments not joining L1. This is how to populate .. A hashmap of {lineSegment -> set of Corners}

Scan the NN line segments. For each line segment L1, compute FLS. Use the FLS hashmap to get the other line segments. This is how to populate … A hashmap of {lineSegment -> set of Paras}

Are there more Corners or more Paras? I hope at least one of them is O(NN)

(From NN line segments, we could get a large number of Corners objects. I think the count of Corners is too big if all points line up along some big rectangle. I feel we should start from one corner and try all possible rectangles rooted there, then remove this corner from the search space.)

Solution 1: At any point C1, there are N-1 relevant line segments. Using a hashmap {slope ->?}, it takes up to O(NN) time to identify all Corners at C1. If N-1 is 8, then at most 4×4 such Corners, so worst case O(NN) such corners. For each Corner, check if the opposite corner point exists. This check is O(1). This entire process at C1 takes O(NN).

Overall O(NNN) since there are N points like C1. Average case is much better.

I feel O(NN logN) is possible.

— O(NNN) by Rahul — pick any 3 points, check if they form a corner. If yes, compute the opposite corner and check its existence.
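Rahul’s O(NNN) idea can be sketched as follows — the right angle is tested via dot product, the 4th point looked up in a set. The function name and the use of area (rather than perimeter) as the “largest” criterion are my own assumptions:

```python
def max_rectangle_area(points):
    """O(N^3): for each candidate corner point plus two neighbors,
    test perpendicularity, then check the opposite corner's existence."""
    pts = set(points)
    best = 0.0
    n = len(points)
    for i in range(n):                  # the corner point C1
        cx, cy = points[i]
        for j in range(n):
            if j == i:
                continue
            for k in range(j + 1, n):   # unordered pair of neighbors
                if k == i:
                    continue
                ax, ay = points[j]
                bx, by = points[k]
                v1 = (ax - cx, ay - cy)
                v2 = (bx - cx, by - cy)
                if v1[0] * v2[0] + v1[1] * v2[1] != 0:
                    continue            # not a right angle at the corner
                d = (cx + v1[0] + v2[0], cy + v1[1] + v2[1])  # opposite corner
                if d in pts:
                    area = ((v1[0]**2 + v1[1]**2) ** 0.5
                            * (v2[0]**2 + v2[1]**2) ** 0.5)
                    best = max(best, area)
    return best
```

Note it handles tilted rectangles too, since perpendicularity is tested with vectors rather than axis alignment.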

–Idea 3: for the N(N-1)/2 line segments, sort by slope, length, lowest x-coordinate, lowest y-coordinate…?
sorting takes O(NN logN)

–idea 4: group N(N-1)/2 line segments by slope and length.

Within each group, there are up to N points and O(NN) line segments.

— O(NN) solution … wrong!

Pick any 2 points as a pair-of-opposite-corners. Work out the locations of the other 2 corners and check if they exist. This check is O(1).

There are O(NN) such pairs, so overall O(NN). The flaw: a diagonal alone does not determine the rectangle — the other two corners can be any antipodal pair on the circle whose diameter is that diagonal, so there is nothing unique to “work out”.