single/multi-thread TCP servers contrasted

In all cases, listening socket LL and worker sockets W4 W5 look like

socket | local/server port | local/server IP | remote/client IP | remote/client port | file descriptor
LL     | 80                | ANY             | unbound          | unbound            | 3
W4     | 80                |                 |                  | high port A        | 4
W5     | 80                |                 |                  | high port B        | 5 (newly created)

Now the 3 designs. Design [1] is described in "accept()+select(): multiple persistent worker-sockets" below.

design                      | LL                                                           | W4                           | W5
single-thr primitive        | not used by the "engaged" thr                                | sending data                 |
single-thr multiplexing [1] | not used by the "engaged" thr, but monitored via select()    | sending data                 |
multi-threaded              | listening                                                    | established: sending or idle | sending

boost::optional #NaN #a G9 boost construct

The boost documentation has illustrations using optional<int>.

— #1 Usage: possibly-empty argument-holder
I encountered this in MS library:

void myFunc(optional<ReportType &> reportData_) {
    if (reportData_) cout << "non-empty";
}

In addition, declaration of this function features a default-arg:

void myFunc(optional<ReportType &> reportData_ = optional<ReportType &>());

Q: for an int param, how does this compare with a simple default-arg value of -1?
A: boost::optional wins if all integer values are valid values, so you can’t pick one of them to indicate “missing”
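The same idea can be sketched in Python, where None plays the role of boost::optional's empty state (my own illustration, not the MS library):

```python
from typing import Optional

def my_func(report_data: Optional[int] = None) -> str:
    # None is the "empty" state, so every int value -- including -1 --
    # remains a legal, "present" payload
    if report_data is not None:
        return 'non-empty'
    return 'empty'
```

Here -1 is an ordinary value, which is exactly the advantage over a -1 sentinel default.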

git | diff highlights trailing space

By default there's a highlight (in red) of trailing spaces. I encounter it all the time when running git-diff. Extremely distracting.

$ git config core.whitespace -trailing-space # the dash before "trailing" disables it

Beware: when you copy-paste into a command window, the dash may become something else 😦

If successful, this command adds a line into .git/config like

whitespace = -trailing-space

You can remove this line to restore the “factory default”.

c++^bigData^crypto #SG

Workload (Figure-out-faster-than, team caliber, manager expectation …) and similar factors are far more important, so let's not spend too much time comparing:

  • –c++
  • see blog .. my view on java
  • –jxee/bigData
  • 🙂 many “commodity” jobs but often paying rather well.
  • 😦 i feel as a “specialization” this is low quality — high churn, low depth, low complexity,
  • 🙂 but I might be able to leverage it for future positions. I think my full-time python, full-time SQL, web app dev experiences provided the same leverage, albeit no-depth
  • –crypto —
  • 🙂 many small shops capable of paying
  • 😦 low chance to develop deep knowledge

iterator invalidated by erase(): opaque

See also

I had multiple encounters with STL iterator invalidation, esp. after erase().

  • sometimes I get correct result
  • sometimes I get segfault
  • sometimes I get incorrect result

Opaque i.e. no obvious clues, so no keyword to google

Luckily my code was only 20 lines.

Luckily I could reproduce it once in a while.

This is another example of tough bugs to hunt down.
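Python has an analogous opaque bug (my analogy, not STL): erasing from a list while iterating over it silently skips elements, which mirrors the "sometimes incorrect result" symptom:

```python
nums = [2, 2, 3]
for x in nums:          # BUG: erasing from the container being iterated
    if x == 2:
        nums.remove(x)
# one 2 survives: after the first removal, the element that shifted into
# the iterator's position was silently skipped -- no error message
assert nums == [2, 3]

# safe alternative: build a new list instead of erasing in-place
nums2 = [x for x in [2, 2, 3] if x != 2]
assert nums2 == [3]
```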

MS stronghold domains

  • Advisory is the business that Morgan Stanley is perhaps best known for, and is traditionally strongest in. I think this includes big deals like M&A and IPO. MS is a top-2 player along with GS.
  • As of 2018, IB and PWM contribute similar amounts of revenue.
  • Trading business — as a sell-side player, MS is market leader in equities (IED).

tcp: detect wire unplugged

In general for either producer or consumer, the only way to detect peer-crash is by probing (eg: keepalive, 1-byte probe, RTO…).

  • Receivers generally don't probe and will remain oblivious.
  • Sender will always notice a missing Ack (timeout OR 3 duplicate Acks). After retrying, TCP module will give up and generate SIGPIPE.
                  | send/recv buffers full                                     | buffer full, then receiver-crash   | receiver-crash, then sender has data to send | receiver-crash amid active transmission
visible symptom   | 1-byte probe from sender triggers Ack containing AWS=0     | the same Ack suddenly stops coming | very first expected Ack doesn't come         | Acks were coming in, then suddenly stop
retrans by sender | yes                                                        | yes                                | yes                                          | yes
SIGPIPE           | no                                                         | probably                           | yes                                          | probably

Q20: if a TCP producer process dies After transmission, what would the consumer get?
AA: nothing. See above — Receiver is ready to receive data, and has no idea that sender has crashed.
AA: Same answer on

Q21: if a TCP producer process dies During transmission, what would the consumer get?
%A: ditto. Receiver has to assume sender stopped.

Q30: if a TCP consumer process dies during a quiet period After transmission, what would the producer get?
AA: P49 [[tcp/ip sockets in C]] Sender doesn’t know right away. At the next send(), sender will get -1 as return value. In addition, SIGPIPE will also be delivered, unless configured otherwise.

Q30b: Is SIGPIPE generated immediately or after some retries?
AA: See the notes above on Ack and re-transmission. Sender will notice a missing Ack and RTO will kick in.
%A: I feel TCP module will Not give up prematurely. Sometimes a wire is quickly restored during ftp, without error. If wire remains unplugged it would look exactly like a peer crash.

Q31: if a TCP consumer dies During transmission, what would the producer get?
%A: see Q30.

Q32: if a TCP consumer process dies some time after buffer full, what would the producer get?
%A: probably similar to above, since sender would send a 1-byte probe to trigger an Ack. Not getting the Ack tells the sender something. This probe is built-in and mandatory, but functionally similar to the (optional) TCP keepalive feature.
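The Q30 behavior is easy to observe on a Linux loopback (a sketch; CPython ignores SIGPIPE at startup, so send() raises BrokenPipeError/ConnectionResetError instead of killing the process; closing the worker socket stands in for a consumer crash):

```python
import socket
import time

srv = socket.socket()
srv.bind(('', 0))
srv.listen(1)
port = srv.getsockname()[1]

cli = socket.create_connection(('', port))
worker, _ = srv.accept()

worker.close()            # stand-in for "consumer process dies"
time.sleep(0.1)           # let the FIN/RST settle on loopback

err = None
for _ in range(10):
    try:
        # the first send() typically "succeeds" (data is buffered and the
        # dead peer answers with RST); a subsequent send() fails with
        # EPIPE or ECONNRESET
        cli.sendall(b'x' * 1024)
        time.sleep(0.05)
    except OSError as e:
        err = e
        break

assert isinstance(err, OSError)   # sender, not receiver, notices the crash
cli.close()
srv.close()
```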

I never studied these topics but they are quite common.

3-way sorter/classifier #FB

— Requirement — You are given a list of possibly repeating integers. Each belongs to exactly one category: Low, Medium or High. There is a function to return the category of any integer. Please write a constant-space function to output all category-L members, followed by all the category-M members, and finally the category-H members.

If the input list has 9 trillion integers, the output will have 9 trillion integers too.
Within category Low, you can print in any order.
Within category Medium, you can print in any order.
Within category High, you can print in any order.

Personally, I would suggest (to make things concrete) that L means single-digit; M means double-digit and H means 3-digit or higher.

I failed to clarify requirements — input is array or linked list?

=== solutions

— 1st design — edit the same list in-place. Extract-insert each L or M element to group all elements into three sections: L on the left, H on the right. The first design uses an array, so I maintain 2 markers (integers).

Invariant — the mm marker points to start of the M section. hh marker points to start of the H section, and a current marker points to start of unclassified section. L section is on the left of mm.

  • Every time I see a L element, I move it to beginning of array, and increment mm. (It’s more efficient to move it to left of mm.)
  • Every time I see a M element, I move it to the left of hh, and increment hh
  • Every time I see a H element, I do nothing.

— Second design avoids the inefficient array shift, using STL linked list.

Q: When to adjust mm and hh iterators?
A: Only when erase() hits mm or hh. This is important and easily missed. There’s no error message. Result may not show any difference. As a general rule

“Always check every iterator (except unneeded) when you do a list::erase()”

I plan to implement 1st design in python list (vector internally) and 2nd design in std::list.

–3rd design: swap-based. If we only have an array and not a linked list, then the swap-based design is efficient. My tested solution takes this approach.
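A Python sketch of the swap-based 3rd design (a Dutch-national-flag single pass, constant space). The category function uses my concrete suggestion above: L = single digit, M = double digit, H = 3+ digits:

```python
def categorize(x):
    # 0 = Low (single digit), 1 = Medium (double digit), 2 = High (3+ digits)
    return 0 if x < 10 else (1 if x < 100 else 2)

def sort3way(arr):
    # invariant: arr[:lo] are L, arr[lo:cur] are M, arr[hi+1:] are H,
    # and arr[cur:hi+1] is the unclassified section
    lo, cur, hi = 0, 0, len(arr) - 1
    while cur <= hi:
        c = categorize(arr[cur])
        if c == 0:
            arr[lo], arr[cur] = arr[cur], arr[lo]
            lo += 1
            cur += 1
        elif c == 2:
            arr[cur], arr[hi] = arr[hi], arr[cur]
            hi -= 1   # don't advance cur: the swapped-in element is unclassified
        else:
            cur += 1
    return arr

data = sort3way([500, 3, 42, 7, 123, 99, 1000, 8])
cats = [categorize(x) for x in data]
assert cats == sorted(cats)   # all L, then all M, then all H
```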

Q: same local IP@2 worker-sockets: delivery to who

Suppose two TCP server worker-sockets both have local address port 80. Both connections active.

When a packet arrives addressed to that shared local port, which socket will the kernel deliver it to? Not both.

Not sure about UDP, but a TCP connection is a virtual circuit or "private conversation", so socket W1 knows its client's source IP:port. If the incoming packet's source IP:port doesn't match that, then this packet doesn't belong to the conversation.
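A loopback demo of this 4-tuple demultiplexing (my own sketch): two clients connect to the same listening port, and the kernel delivers each byte stream only to the worker socket whose remote endpoint matches the packet's source:

```python
import socket

def recv_exactly(sock, n):
    # loop until n bytes received (a single recv may return fewer)
    buf = b''
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            break
        buf += chunk
    return buf

srv = socket.socket()
srv.bind(('', 0))
srv.listen(5)
port = srv.getsockname()[1]

c1 = socket.create_connection(('', port))
w1, _ = srv.accept()                  # worker socket for c1's conversation
c2 = socket.create_connection(('', port))
w2, _ = srv.accept()                  # worker socket for c2's conversation

c1.sendall(b'from-c1')
c2.sendall(b'from-c2')

msg1 = recv_exactly(w1, 7)            # only c1's bytes reach w1
msg2 = recv_exactly(w2, 7)            # only c2's bytes reach w2
matched = (w1.getpeername() == c1.getsockname())

for s in (c1, c2, w1, w2, srv):
    s.close()

assert msg1 == b'from-c1' and msg2 == b'from-c2' and matched
```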

accept()+select() : multiple persistent worker-sockets

I feel it is not that common. A related write-up is very relevant.

The naive design — a polling-thread (select/poll) to monitor new data on 2 worker-sockets + accept-thread to accept on the listening socket. The accept-thread must inform the polling thread after a worker-socket is born.

The proposed design —

  1. a single polling thread to watch two existing worker sockets W1/W2 + listening socket LL. select() or poll() would block.
  2. When LL is seen “ready”, select() returns, so the same thread will run accept() on LL and immediately get a 3rd worker-socket W3. No blocking:)
  3. process the data on the new W3 socket
  4. go back to select() on W1 W2 W3 LL
  • Note if any worker socket has data our polling thread must process it quickly. If any worker socket is hogging the polling thread, then we need another thread to offload the work.
  • Note all worker sockets, by definition, have identical local (i.e. server-side) port, since they all inherit the local port from LL.

[[tcp/ip socket programming in C]] shows a select() example with multiple server ports.
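The proposed design can be sketched in Python (names like poll_once are my own; one pass of the single polling thread covers steps 1 to 4):

```python
import select
import socket

def poll_once(listener, workers, timeout=2.0):
    # one pass of the polling thread: watch LL + all worker sockets
    ready, _, _ = select.select([listener] + workers, [], [], timeout)
    for s in ready:
        if s is listener:
            w, _ = s.accept()        # no blocking: LL was already "ready"
            workers.append(w)        # new-born worker joins the watch list
        else:
            data = s.recv(1024)
            if data:
                s.sendall(data.upper())   # "process the data" quickly
            else:
                workers.remove(s)    # peer closed: drop the worker
                s.close()

listener = socket.socket()
listener.bind(('', 0))
listener.listen(5)
workers = []

cli = socket.create_connection(('', listener.getsockname()[1]))
poll_once(listener, workers)      # LL ready: accept() creates worker W1
cli.sendall(b'hi')
poll_once(listener, workers)      # W1 ready: process its data
reply = cli.recv(1024)
assert reply == b'HI'

for s in workers + [listener, cli]:
    s.close()
```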


slist in python@@ #no std #AshS

A quick google search shows

* python doesn’t offer linked list in standard library

* python's workhorse list like [2,1,5] is an expandable array, i.e. a vector.

* {5, 1, 0} braces can initialize a set. I very seldom use a set since a dict is almost always good-enough.
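Since the standard library has no singly linked list, a bare-bones sketch takes only a few lines:

```python
class Node:
    __slots__ = ('val', 'next')
    def __init__(self, val, next=None):
        self.val = val
        self.next = next

class SList:
    """Minimal singly linked list: O(1) push_front, O(n) traversal."""
    def __init__(self):
        self.head = None

    def push_front(self, val):
        self.head = Node(val, self.head)

    def to_list(self):
        out, cur = [], self.head
        while cur:
            out.append(cur.val)
            cur = cur.next
        return out

sl = SList()
for v in (5, 1, 2):
    sl.push_front(v)
assert sl.to_list() == [2, 1, 5]
```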

white-board real estate usage

  1. We often insert lines later at …
    1. Leave 2 lines of space before  and after loop header.
    2. leave 2 lines of space after function header
    3. leave space before and after if-header
  2. Example data — should be drawn at bottom right, but make sure you can see them as you code
  3. global var list — I tend to keep them at bottom left on one line
  4. I like many utility functions
    1. short utility function — you can start it near the bottom, or above existing code, with limited “runway”
    2. lengthy yet easy utility — If you worry a utility function may be lengthy, you can start writing on paper and stick it near the board
  5. top right — is for a 2nd major function, perhaps extracted from the first major function
  6. top level driver function — is never complicated but can grow.  I can write it below an existing function
  7. center of the board — leave empty for eyeball test, other examples, graph etc

simultaneous send to 2 tcp clients #multicast emulation

Consider a multi-core machine hosting either a forking or multi-threaded tcp server. The accept() call would return twice, with file descriptors 5 and 6 for two new-born worker sockets. Both could have the same server-side address, and their local ports are definitely identical to the listening port, like 80.

There will be 2 dedicated threads (or processes) serving file descriptor 5 and 6, so they can both send the same data simultaneously. The two data streams will not be exactly in-sync because the two threads are uncoordinated.

My friend Alan confirmed this is possible. Advantages:

  • can handle slow and fast receivers at different data rates. Each tcp connection maintains its own state
  • guaranteed delivery

For multicast, a single thread would send just a single copy  to the network. I believe the routers in between would create copies for distribution.  Advantages:

  • Efficiency — Router is better at this task than a host
  • throughput — tcp flow control is a speed bump


new socket from accept() inherits port# from listening socket

[[tcp/ip sockets in C]] P96 has detailed diagrams, but this write-up is based on

Background — a socket is an object in memory, with dedicated buffers.

We will look at an sshd server listening on port 22. When accept() returns without error, the return value is a positive integer file descriptor pointing to a new-born socket object with its own buffers. I call it the “worker” socket. It inherits almost every property from the listening socket (but see tcp listening socket doesn’t use a lot of buffers for differences.)

  • local (i.e. server-side) port is 22
  • local address is a single address — whichever address the client specified as its destination. Note the sshd server could listen on ANY address, or listen on a single address.
  • … so if you look at the local address:port, the worker socket and the listening socket may look identical, but they are two objects!
  • remote port is some random high port client host assigned.
  • remote address is … obvious
  • … so if you compare the worker-socket vs listening socket, they have
    • identical local port
    • different local ip
    • different remote ip
    • different remote port
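The inheritance is easy to observe from Python (a loopback sketch; an ephemeral port stands in for 22):

```python
import socket

listener = socket.socket()
listener.bind(('', 0))    # kernel picks a free port, standing in for 22
listener.listen(1)
lport = listener.getsockname()[1]

cli = socket.create_connection(('', lport))
worker, peer_addr = listener.accept()

worker_local_port = worker.getsockname()[1]   # inherited from the listener
worker_remote = worker.getpeername()          # client's random high port
client_local = cli.getsockname()
distinct_fds = (worker.fileno() != listener.fileno())

for s in (cli, worker, listener):
    s.close()

assert worker_local_port == lport   # identical local port ...
assert worker_remote == client_local
assert distinct_fds                 # ... but two distinct socket objects
```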

tcp listening socket doesn’t use a lot of buffers, probably

The server-side “worker” socket relies on the buffers. So does the client-side socket, but the listening socket probably doesn’t need the buffer since it exchanges very little data with client

That’s one difference between listening socket vs worker socket

2nd difference is the server (i.e. local) address field. Worker socket has a single server address filled in. The listening socket often has “ANY” as its server address.

non-forking STM tcp server

This a simple design. In contrast, I proposed a multiplexing design for a non-forking single-threaded tcp server in accept()+select() : multiple persistent worker-sockets

ObjectSpace manual P364 shows a non-forking single-threaded tcp server. It exchanges some data with each incoming client, then immediately disconnects the client and goes back to accept().

Still, when accept() returns, it returns with a new “worker” socket that’s already connected to the client.

This new “worker” socket has the same port as the listening socket.

Since the worker socket object is a local variable in the while() loop, it goes out of scope and gets destructed right away.

y tcp server listens on ANY address

Background — a server host owns multiple addresses, as shown by ifconfig. There’s usually an address for each network “interface”. (Routing is specified per interface.) Suppose we have on IF1 on IF2

By default, an Apache server would listen on and This allows clients from different network segments to reach us.

If Apache server only listens to, but some clients can only connect to perhaps due to routing!

Bbg QnA IV

Q: how would you write your own program to see if local system is big-endian or little-endian.
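One classic technique (a sketch, not necessarily the answer they wanted): pack the integer 1 in native byte order and inspect the first byte. In C you would do the same with a union or a char* cast; here in Python:

```python
import struct
import sys

def native_endianness():
    # pack 32-bit int 1 in native order; a little-endian machine puts
    # the 0x01 byte first, a big-endian machine puts it last
    first_byte = struct.pack('=I', 1)[0]
    return 'little' if first_byte == 1 else 'big'

assert native_endianness() == sys.byteorder
```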

Q2: what design patterns do you use?
A: factory, template-method are used a lot in my code.
A: producer-consumer is an essential pattern in async designs
A: adapter — wrapper functions?
A: When needed, I use visitor as a big gun.

Q2b: you said template method is often used implicitly. Can you give some examples in STL?
A: that would be hard, since classic template method uses virtual functions but there are very few of them in STL. std::sort() fixes the procedure and lets the user specify the comparison, using template techniques instead of virtual functions.

Q: how would you design multicast to a number of registered users?


c++: get current command line as str

#include <fstream>
#include <iostream>
#include <string>
#include <unistd.h> // getpid()

std::string const & get_command_line() {
  static std::string ret;
  if (ret.empty()) {
    std::string path = "/proc/" + std::to_string((long long)getpid()) + "/cmdline";
    std::cerr << "initializing rtsd parsername from " << path << " ..\n";
    std::ifstream myfile(path);
    if (!myfile) return ret;
    getline(myfile, ret); // note: /proc/<pid>/cmdline separates args with NUL bytes
  }
  return ret;
}

spend 4H/D self-study@@ #XR

Hi XR,

I called you to highlight to you the realistic amount of hours we can spend on self-study.

Real example 1 —- I monitored it for months in my Master’s program — I would spend about 30 hours a week, including lectures, tutorial classes and homework(about 15-30 Hr/week, often taking up one full weekend-day). If I self-study without the program, I think it will drop to 3 hours a week. Why so low? Read on.

Eg 2 —- Some graduate students prepare for coding interviews by going through some question bank, just as you planned. Could be close to a hundred questions, during a 3M period — just my imagination, not based on any observation. They probably have 4 hours/weekday + 16 hours/weekend-day of free time. How many hours do they actually spend ? I would guess 10 – 20 hours a week. Why so low? Read on

Real example 3 —- In Singapore, I had kids, bills, grocery, household chores, home maintenance, family outing … so I had far less spare time than now, but still I probably had 1H/weekday + 8H/WED (weekend-day) for myself. In reality, I averaged 3 hours a week on self-study, but you know what a motivated person I am. Why so low? Read on

Real example 4 — in Feb 2018 I had some very promising, valuable interview opportunities (FB). Even on the very last weekend before the interview, my time utilization rate was very, very low. I locked myself up in office with no one else to distract me. I slept only 7 hours and ate in front of my computer. In fact, I told my friends "I spent the entire weekend in office without going out of the building." You would think I could spend 15 hours a day on the preparation. Well, more like 6. Why so low?

It’s not all about motivation. I had high motivation in Singapore. I had the highest motivation over the recent weekend. But the numbers are still much lower than “theoretical limits”. There’s huge difference between wall-clock time vs real time spent:

Example 5 —- My son would spend an hour pretending to study 20 Chinese characters but only learn 3 characters and soon forget all of them. So the real time he spent was … a few minutes?

It’s about intensity. I call it laser energy intensity.

Real example 6 — in a coding interview, my mind was working at top capacity. Over 15 minutes I solved a very tough problem. The rest of the interview (I was there for 3 hours) was far less intense, though still intense by the standard of technical interviews.

Example 4 continued — I timed myself for white-board coding. Over 20 minutes my mind was working under immense pressure. Over the next 2 hours I sat down at a computer and made the actual code work. It was still a lot of work, but my mind was at 50% peak capacity.

Use numbers to gauge the amount of workload and your time utilization rate. Say each problem (in homework or a coding exercise) takes on average 30 minutes in an exam/interview. Besides eating and using the toilet, you have 15 hours to spend, so you would want to do 30 problems. How many can you actually complete? Perhaps 6, up to 10. Why so low? Our mind can't operate at peak capacity for 15 hours!

In Example 4, after I spend 20 intense minutes, I need to wind down. After I spend 2 hours on lower-intensity work, I also need to wind down.

In Example 1, at the lecture/tutorial classes, the intensity was about 30% on average. That’s why I can do 2-3 hours. At a higher intensity the students’ mind would /disengage/ after an hour. All experienced teachers would slow down and give breaks when introducing tough technical topics.

In Example 1, When I spent 10 hours to complete my homework, my intensity was about 50% and I took many breaks. Actually the concentration time spent was more like 6-7 hours.

How does this mental “workload” compare to paid project work in a regular job, including work-from-home?

  • 😦 Project work is often repetitive or mundane. I can’t choose another topic.
  • 🙂 Paid work often has much bigger impact (otherwise why paid) and often a sense of achievement
  • 😦 Paid work stress is harmful, since the impact is high.
  • 🙂 Project work often has a discipline built-in, so we don’t wander off. This is similar to my Master’s program
  • 🙂 Project work requires us to keep going when the going gets tough. Also in my Master’s program

addicted to c++for()loop@@

I find myself thinking in c++ while writing python for loops. Inefficient.

for i in range(0,len(mystrLiTup),1): # explicit is best

What if you need to adjust the index “i” in the loop? A transitional construct (until we get rid of c++ thinking):

mystrLiTup = 'sentence. with. periods. end'
i = -1  # must initialize before the loop
while True:
  i += 1  # increment explicitly
  if i >= len(mystrLiTup): break
  print(mystrLiTup[i])
  if mystrLiTup[i] == '.':
    pass  # adjust i here if needed, e.g. i += 1 to skip the char after a period

move/forward used beyond argument-passing@@ rare

See also arg casting: #1 usage@move/forward

I believe move() and forward() are most often used when passing argument into a “worker” function (such as a ctor). I see no exception with forward() and only two questionable exceptions with move():

1) immediate steal:

Badstr && alias=move(passedIn);
do_something_with(alias.ptrField); //I don’t need the move() in this case
alias.ptrField = NULL;

2) move-assignment:

existingBadstr = move(passedIn); // could be written as:
existingBadstr.operator=(move(passedIn)); // NOT really an exception to the norm

Tested in


mystr[:-2] COPY_truncate last 2 chars #mystr[2:]

  • Python string (tuple too) is immutable, so mystr[:-2] returns a copy with last 2 chars truncated
  • Even for a mutable list, this slicing syntax returns a copy.
  • …. This seems to be the best syntax to truncate list, string and tuple.


— how about

mystr[2:] # As expected, this clones then chops First 2 chars

— If you want to truncate a list in-place, use the q(del) keyword (not a function)

Syntax is easy for single-element deletion. Tricky for slice deletion

         | list               | tuple     | str       | comment
del      | in-place truncate  | immutable | immutable |
var[:-2] | tested             | tested    | tested    | copy_truncate LAST 2
var[2:]  | tested             | tested    | tested    | copy_truncate FIRST 2
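The table rows are easy to verify:

```python
mystr = 'abcdef'
assert mystr[:-2] == 'abcd'          # copy, last 2 chars truncated
assert mystr[2:] == 'cdef'           # copy, first 2 chars chopped
assert mystr == 'abcdef'             # original string untouched (immutable)

mylist = [1, 2, 3, 4]
assert mylist[:-2] == [1, 2]         # slicing a list also returns a copy
assert mylist == [1, 2, 3, 4]

del mylist[-2:]                      # the del keyword truncates in-place
assert mylist == [1, 2]
```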


Q1: generate one solution

Q2: generate all solutions to the 4-queen problem
A: there are up to 4^4 = 256 candidate placements (one queen per row), so we could just iterate over all of them, but let's say we want to extend the Q1 solution instead, as my tested solution does. The only important function is placeOnRow(), which took me 17 minutes to write on the white-board, but I missed something very important — restoring Mat[r][c] to UNFILLED before continue
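A Python sketch of the backtracking generator for Q2 (my own illustration; place_on_row mirrors the placeOnRow() named above, including the crucial restoration step):

```python
def solve_4queens(n=4):
    solutions = []
    cols = []                      # cols[r] = column of the queen on row r

    def place_on_row(r):
        if r == n:
            solutions.append(tuple(cols))
            return
        for c in range(n):
            # column c must clash with no earlier queen (column or diagonal)
            ok = all(c != cc and abs(c - cc) != r - rr
                     for rr, cc in enumerate(cols))
            if ok:
                cols.append(c)     # tentative placement
                place_on_row(r + 1)
                cols.pop()         # crucial: restore before trying the next column

    place_on_row(0)
    return solutions

# the 4-queen problem famously has exactly 2 solutions
assert len(solve_4queens()) == 2
```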

4×4 sudoku: backtracking #FB

4-queen is one of the simplest problems in the backtracking category, but I will look at a mini sudoku. Sudoku was once asked in a Facebook 45-min white-board coding test. I have a tested implementation.

I name the four values as color1  color2  color3  color4. A placement is {a color, row index, column index}.

Basic idea — at each empty cell, we try all four possible colors. If color1 is allowed, then we call the same function recursively to fill the next empty cell. Then (crucially) we check the status. If OK, then game over. If NOK, then we try another color. If no more color to try, then (crucially) we reset current cell to unfilled and return NOK

Key to the algo — backtracking is implemented by the restoration and returning NOK

Key to the algo — each empty cell would start a new recursive stack frame. But what data is saved on that stack frame?

Key to the implementation — know what variables go into the loop, what go to the stack and what can be global (simplest). In this case the returned next cell indices CANNOT be globals because each stack frame must remember “next-cell-after-me”

Key to the implementation — simplify state. Each stack frame doesn’t need to remember the “breadcrumb” of all previous placements. Each layer of recursive call only need to remember the colors already tried for its own current cell.

Key to the implementation — factor out simple utilities if they are well-defined and easy to implement. See comment in code.

The important function requires no more than 10 lines! So an interviewer could say "this is something a candidate can write in 45 minutes, assuming all the simple utilities, namely failedSomething() and findNextCell2fill(), are already available to him".

Truth is, most of us can’t write these 10 lines in an hour on the white-board if we don’t have the correct idea. My initial white-board implementation took 18 minutes and missed one important part —

Mat[r][c] = UNFILLED_COLOR # restore

I then took about an hour to make it work.
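A Python sketch of the 10-line core plus the two utilities (my reconstruction of the idea described above, not the linked implementation):

```python
UNFILLED = 0

def find_next_cell_to_fill(mat):
    for r in range(4):
        for c in range(4):
            if mat[r][c] == UNFILLED:
                return r, c
    return None

def failed_something(mat, r, c):
    # True if the value at (r, c) repeats in its row, column, or 2x2 box
    v = mat[r][c]
    if [mat[r][j] for j in range(4)].count(v) > 1:
        return True
    if [mat[i][c] for i in range(4)].count(v) > 1:
        return True
    br, bc = 2 * (r // 2), 2 * (c // 2)
    box = [mat[i][j] for i in range(br, br + 2) for j in range(bc, bc + 2)]
    return box.count(v) > 1

def solve(mat):
    cell = find_next_cell_to_fill(mat)
    if cell is None:
        return True                    # no empty cell left: game over
    r, c = cell
    for color in (1, 2, 3, 4):         # try all four "colors"
        mat[r][c] = color
        if not failed_something(mat, r, c) and solve(mat):
            return True
    mat[r][c] = UNFILLED               # crucial restoration before returning NOK
    return False

puzzle = [[1, 0, 0, 0],
          [0, 0, 3, 0],
          [0, 0, 0, 2],
          [0, 0, 0, 0]]
assert solve(puzzle)
assert all(sorted(row) == [1, 2, 3, 4] for row in puzzle)
```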

StopB4: arg to range()/slice: simplistic rule

I believe range() (in Python 2) and slicing operators always generate a new list (or string). In Python 3, range() instead returns a lazy range object.

If you specify a stopB4 value of 5, then “5” will not be produced, because the “generator” stops right before this value.

In the simplest usage, START is 0 and STEP is 1 or -1.

…. If StopB4 is 5 then five integers are generated. If used in a for loop then we enter the loop body five times.

In a rare usage (avoid such confusion in coding tests!), STEP is neither 1 nor -1, or START is not zero, so StopB4 is used in something like "if generated_candidate >= StopB4 then exit before entry into loop body".

Code below proves the slicing operator behaves exactly the same:

word[:2]    # The first two characters, Not including position 2
word[2:]    # Everything except the first two characters
s[:i] + s[i:] equals s
length of word[1:3] is 3-1==2
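The stopB4 rule in a few asserts:

```python
# simplest usage: START=0, STEP=1 -- stopB4=5 yields five ints, 5 excluded
assert list(range(5)) == [0, 1, 2, 3, 4]
assert list(range(0, 5, 1)) == [0, 1, 2, 3, 4]

# STEP=-1: the generator stops right before -1
assert list(range(4, -1, -1)) == [4, 3, 2, 1, 0]

# rare usage: STEP=3 -- exit as soon as a candidate >= stopB4
assert list(range(0, 10, 3)) == [0, 3, 6, 9]

# slicing follows the same stopB4 rule
word = 'python'
assert word[:2] == 'py'              # position 2 not included
assert word[2:] == 'thon'
assert word[:2] + word[2:] == word
assert len(word[1:3]) == 3 - 1
```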

bbg: allocate half the players to NY^SF

Always good to be concrete, so without loss of generality, let’s say J=99 i.e. 198 players.

Q1: Given 2J players, we need to allocate exactly half of them to NY and the rest to San Francisco, to form two teams for an inter-city contest. We receive the NY fare and SF fare for each player, as 2J (i.e. 198) pairs of integers. How can we minimize the total fares?

Q2(bonus): what if 2J+1 players, and you can decide which city gets an extra player?

— Analysis: how many possible allocations? 2J-choose-J, a very large number (it grows like 4^J), so brute force is impractical.

If for any player the two fares both go up by $9800, it doesn’t change our algorithm at all. Therefore, we only care about the fare difference (N-S) for each player.

— solution: I will assume most players live near SF so SF fares are lower. I tabulate and sort the 198 fare differences “NY fare – SF fare” and suppose indeed at least 99 (half) are positive[1]. Therefore, my “base” is SF i.e. my base allocation is everyone-allocated-to-SF. Next I must pick 99 (half) of them and allocate to NY.

I will avoid the higher values in the table.

I simply pick the lower 99 (half) in the table, because these are the 99 “extra” costs I will incur. I want to minimize the total of these 99 values, whether they are mostly positive or mostly negative.

  • Note the highest number in the table indicates the most expensive player for NY relative to SF. If this number is negative, then SF is more expensive than NY for all players (so follow the rule in [1]), but notice he is then the least expensive player for SF relative to NY. Regardless of positive/negative sign, we want to keep this person in SF.
  • Note the lowest number in the table indicates the least expensive player for NY relative to SF. If this number is negative, then SF is more expensive than NY for this player — normal. Regardless, we want to allocate her to NY.

[1] if proven otherwise (by the data), then we could easily treat NY as base to keep the solution intuitive. Actually even if we don’t change the base, the algorithm still works, albeit unintuitively.

A2: Say J=99, so 199 players in total. We already picked 99 for NY. Do we pick one more for NY or keep the remaining 100 in SF?

Just look at the next lowest value in the table. If positive, then NY is more expensive for him, so keep him in SF.
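The fare-difference idea in Python, cross-checked against brute force on a small instance (my own sketch):

```python
from itertools import combinations

def min_total_fare(pairs):
    # pairs: 2J tuples of (ny_fare, sf_fare)
    # base allocation: everyone to SF; then move the J players with the
    # smallest (NY - SF) differences to NY -- the J cheapest "extras"
    j = len(pairs) // 2
    base = sum(sf for _ny, sf in pairs)
    diffs = sorted(ny - sf for ny, sf in pairs)
    return base + sum(diffs[:j])

def brute_force(pairs):
    # try every J-subset for NY (only feasible for tiny inputs)
    j = len(pairs) // 2
    best = None
    for ny_team in combinations(range(len(pairs)), j):
        total = sum(pairs[i][0] if i in ny_team else pairs[i][1]
                    for i in range(len(pairs)))
        best = total if best is None else min(best, total)
    return best

fares = [(10, 20), (30, 200), (50, 50), (35, 10), (120, 80), (60, 70)]
assert min_total_fare(fares) == brute_force(fares)
```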

punctuate ContinuousSentence #bbg

I believe the Leetcode word-break problem is identical.

Q (bbg): Given a dictionary of English words (containing no numbers no underscores) and a very long sentence, someone has removed all the spaces. Please write a program to restore it by adding a space after each word. If there’s no way to parse the sentence just return False, otherwise return True to indicate you found at least one way to parse the sentence. The sentence can have repeated words.

I was allowed to use either white-board or computer.

Brute force? I struggled for a long time with some vague idea. The interviewer asked me to implement any solution I had, however inefficient, but I was unable to.

Key observation — This is a typical QQ type of algo question. If you remember one key point, then you have an advantage.

Key observation — syntax and ECT is not really hard. Only simple string operations. Data structure is simple. Challenge is an elegant (precious) pure algo.

Key observation — the recursive backtracking solution is working and clever-looking but actually not the very best. Still, recursive thinking is valuable and gave me a first solution. I’m not ashamed of my recursive solution.

Key observation — This problem is considered not too hard because …. the solution is short! But most of us can't conceive such a solution! I won't trust the published difficulty level. My recursive backtracking solution is posted; interviewers basically said it works.

There's an inherent tree structure in this greedy backtracking algo. This tree has the power to meet the tougher requirement in Q2. As such, this algorithm is more powerful than the algo below.

— More efficient solution: non-recursive, no backtracking

Suppose the sentence has 99 chars ch[0], ch[1] … ch[98]. We maintain an "aboveWater[99]" bool array. For example, ch[33] is above water if there exists a 4-char bridge word ending at ch[33], starting right after an aboveWater character ch[29].

The bool array is assumed underwater initially. We scan it forward once to check and mark selected char as aboveWater.

Once the left section of the array is analyzed, we don’t need to revisit it by backtracking.

The simplest algo — forward-iterate every position. At each position j,
look at the last 2 chars j-1~j
look at the last 3 chars j-2~j
look at the last 4 chars j-3~j
… If any is a bridge word starting right after an aboveWater position, then mark j as aboveWater and move on.
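The aboveWater scan in Python (a sketch; True at index j means the prefix of length j parses):

```python
def can_punctuate(sentence, words):
    n = len(sentence)
    above_water = [False] * (n + 1)   # above_water[j]: prefix of length j parses
    above_water[0] = True             # empty prefix is trivially "above water"
    max_len = max(map(len, words)) if words else 0
    for j in range(1, n + 1):
        for k in range(1, min(j, max_len) + 1):
            # is there a k-char bridge word ending at position j-1,
            # starting right after an aboveWater position?
            if above_water[j - k] and sentence[j - k:j] in words:
                above_water[j] = True
                break
    return above_water[n]

assert can_punctuate('iamhappy', {'i', 'am', 'happy'})
assert not can_punctuate('iamhappyx', {'i', 'am', 'happy'})
```

Once the left section of the array is marked, it is never revisited, which is the "no backtracking" property.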

—— Q2(Leetcode): Given a non-empty string s and a dictionary wordDict containing a list of non-empty words, add spaces in s to construct a sentence where each word is a valid dictionary word. Return ALL such possible sentences. This is very similar to the bbg question.

My github solution should solve it, even if not optimal. Need to test. My original "Morse" solution may not be optimal, but it was a very, very good effort. My yoga effort is probably bigger than jogging or diet, but this effort doesn't make me above average among the experienced practitioners.

Still my Morse solution and yoga practice make me stronger and more formidable in the mind.

These efforts are bigger than my earlier efforts over the past 5 years.

binary-matrix island count #DeepakCM

Q: count the islands in a binary matrix. I have my own solution, but I don't want to spend the time passing all leetcode tests! Diminishing return. A related problem is conceptually identical, though using a different criterion for "connected": diagonal neighbors are also "connected".

Hi Deepak

I realize for "connected" problems, there's definitely a graph underneath, so graph traversal is required. I guess BFT or DFT will both find all the nodes connected to "me".

Given a 1000 x 1000 all-black matrix, I think DFT recursion could go up to 1,000,000 levels deep and overflow stack space.

A high-level (vague) idea is

  • Scan in type-writer fashion. Suppose there are 10×10 = 100 cells, either black or white. I paint each cell brown [1] once visited. I also use an integer counter to keep track of how many cells are already visited.
  • During the scan. If a cell is already visited, I will ignore it and move on
  • Any time I find a black cell, I will start a BFT (I dislike DFT) to find all connected black cells. Once I find all the black cells connected to "me", I resume the type-writer scan.
  • Once my counter hits 100, I have visited all cells and will exit.

[1] or you can paint it white, which is probably better.

I thought this problem was too hard and not worth study, but you convinced me that it may come up and I can at least give a vague solution  … better than no solution.

Compared to a binary tree, walking over a matrix is /operationally/ (not conceptually) harder because

  • need to check array bounds. Error-prone and time consuming in coding tests
  • The (invisible) links will not cover all cells. To avoid missing a cell, we still need a full matrix scan. The integration of tree walk + matrix scan is unintuitive, to put it mildly.
  • During the tree-walk, you don’t want to visit any cell too many times or get lost in an endless loop. A brute-force technique — shadow matrix to remember all visited cells.
  • … However, once you get over these nitty-gritty operational complexities, then tree-walk in matrix is not really harder. These algo questions can therefore frustrate and fail many candidates untrained on The technique.
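The type-writer scan + BFT idea above can be sketched in Python (using a shadow matrix for visited cells, as described):

```python
from collections import deque

def count_islands(grid):
    # 4-way connectivity; a "shadow matrix" remembers all visited cells
    rows, cols = len(grid), len(grid[0])
    visited = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):                 # type-writer scan
        for c in range(cols):
            if grid[r][c] == 1 and not visited[r][c]:
                count += 1
                q = deque([(r, c)])       # BFT from this black cell
                visited[r][c] = True
                while q:
                    i, j = q.popleft()
                    for ni, nj in ((i+1, j), (i-1, j), (i, j+1), (i, j-1)):
                        # array-bounds checks: the error-prone part
                        if (0 <= ni < rows and 0 <= nj < cols
                                and grid[ni][nj] == 1 and not visited[ni][nj]):
                            visited[ni][nj] = True
                            q.append((ni, nj))
    return count

grid = [[1, 1, 0, 0],
        [0, 1, 0, 1],
        [0, 0, 0, 1],
        [1, 0, 1, 0]]
assert count_islands(grid) == 4
```

BFT avoids the deep-recursion risk of DFT on a large all-black matrix.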

describe Latest project #weak@22Feb

See also my notes in C:\0\z0\cv,iview.0

Gayle has a good one-pager too for tech companies. There are similar resources.

Your current project is supposed to be fresh in your mind, so the interviewer has high expectations of detail. Better to avoid the current project.

Option: you could talk about a “past” project but bring in details from another project, such as your actual current project — but beware, you may be asked “what’s your current project?”

Avoid naive honesty, which often backfires and doesn’t serve any purpose. We are professionals, expected to communicate truthfully but also stay relevant. The major challenge, or a key component I created, may be irrelevant to the interviewer or impossible to describe over the phone.

Q: which component did you *design* ?

  • A: wait/notify framework to associate order requests sent out with responses received. See the SCB concurrency design question
  • A: orderbook + reset, refresh. Orderbook alone could be too familiar to some interviewers .. hard to score points
  • A: Guardian — monitoring, with microagent + server + GUI. Easy to describe with some complexity
  • A: retrans gap mgmt. More details needed
  • A: NYSE+Arca+American integrated feed parser
  • A: Preferences framework
  • A: vol model serialization

Q: major challenges?

%%A: wait/notify framework

Q: what does the system do, in layman’s term

[17] widely in-use, no longer quizzed #spring,SOAP.

I see a pattern — a new technology gets adopted and quizzed in-depth at interviews. After 5 years, it is still a favorite, perhaps the dominant solution, but 1) the know-how has become common knowledge and candidates are assumed to know it, and 2) usage is now standardized and simplified, so the “bar” is lower, and candidates without the knowledge can easily pick it up.

No more in-depth questions needed. Therefore, time previously invested here is wasted, since only superficial knowledge is required now.

  1. Eg: spring/hibernate
  2. Eg: java servlets and JSP — From 1999 to 2008 these topics were heavily quizzed. Still widely in use but often invisible.
  3. Eg: Apache web server — In 2000 I was asked a lot on Apache details. Apache is still very popular.
  4. Eg: php — still widely used, but I feel not asked a lot.
  5. Eg: xml parsing — I used to get in-depth questions about DOM or SAX etc. I believe xml is still widely used.
  6. Eg: web services, SOA, SOAP — Still very much in use
  7. Eg: HTTP protocol details like GET/POST, status codes
  8. Eg: Maven and Ant

Most of my examples are in the high-churn domains like Internet, mobile. I believe the same will happen to interview questions on big data, javascript, Android, iOS , blockchain, ..

The opposite list — some essential technologies underlying multiple technology waves were never heavily quizzed, but, after the waves subsided, remain rather relevant to many niche interviews.

  • SQL query — joins, subquery, case, ..
  • SQL and DB tuning
  • Unix automation — It can take years to become reasonably competent with all of bash, piping, subshells, trap, shell functions, string operators, file manipulation, and top 30 Unix commands
  • Unix system administration
  • Pthreads, a POSIX standard C library
  • http client programming
  • regular expression
  • Inter-Process-Communication
  • Java servlet session management
  • Java serialization
  • Java reflection

— You write —
There are still many projects using Spring. My current project is also using Spring, but the internal team has modified it to create an internal framework. When people discuss it in meetings, they say “Spring” to refer to this framework. But there are many pitfalls when I use it. To name a few:

  1. a) a restful service is supposedly easy to implement in spring — just add the related annotations — but it didn’t work, and after a few days of research I gave up and chose to use an internally created annotation.
  2. b) some configurations don’t work — parameters couldn’t be passed in, and I still don’t know the reason. The internal framework code is not accessible to developers on other teams, so I don’t think it’s worth spending more time to figure out.

For this project, the interviewer only mentioned that the project uses Spring, but didn’t ask any questions about Spring.

Over the last year I went through 5 interviews; 2 mentioned their projects use Spring, and only one client asked some Spring questions.

I recall 5 years ago, 8 out of 10 would ask spring and hibernate questions. Now a few clients still ask Spring questions, but none asked Hibernate questions.


## HFT c++knowhow is distinct like .. swing

I figured out long ago that java swing is a distinct skill, distinct from mainstream java.

Now I feel HFT c++ is also such a skill — distinct from mainstream c++. Interviewers tend to ask questions (almost) irrelevant to mainstream c++. Shanyou said the focus is on system-level, not application-level, knowledge. I feel superficial knowledge is often enough.

  1. [u] avoid “virtual” for latency
  2. CPU cache management
  3. translation lookaside buffer
  4. [P] kernel bypass
  5. [uP] socket tuning
  6. [P] memory and socket performance monitoring
  7. shared memory
  8. [u] placement new and custom allocator
  9. system calls
  10. [u] in-lining

Legend: [u] = used in my systems to some extent; [P] = no programming, pure system knowledge

–These topics are relevant to mainstream c++ but more relevant to HFT

  • setjmp
  • IPC mutex
  • reference counting

stack 2 windows above/below #win7


Scenario — On my vertical monitor I want two windows A and B stacked; on my other monitor I want one big window; total 3 windows.

Solution —

  • minimize all other windows
  • right click on taskbar -> showWindowsStacked

Note to show windows side by side there are many options like winKey and snap

AuxDS ] each node2simplify algo #shadow matrix

In some algo questions, I like to add a tiny “metadata” field to the VO.

eg: BFT showing level

Q: how likely is interviewer to accept it?

A: I feel west coast interviews tend to entertain crazy ideas since algo challenges are more free-flowing, exploratory, thought-provoking. I remember the Bbg FX interviewer (Karin?) said my “in-place” is OK if I append to the existing vector then truncate the original portion.

A: I feel if it’s clever then they may appreciate it.

Space/time trade-off. The metadata field

  • can be a pointer to parent or next node (like LinkedHashMap)
  • can be an iterator into another (static) data structure
  • can be the element’s own address if not already available
  • can be a sequence id as sorting key
    • insertion id is popular,
  • can be a bunch of flags packed into 8-bit integer


sentinel node trick: array/slist

  • eg: STL uses sentinel nodes. I believe container.end() returns that.
  • eg: c-string uses \0 as sentinel value.
  • eg: binary search needle can be too high and we have to test if the return value is a real element or the end-of-container, so it’s extremely useful to add another sentinel to guarantee a successful search

When solving timed coding questions on “clean” arrays, it can be useful to append a sentinel node. It can simplify some algorithms which take action after each segment.

P 90 [[Pearls]] introduced a loop coding idiom. Idiom is applicable whenever we loop over a container and have an early-exit “break”. Such a loop has exactly two [1] conditional tests per iteration, therefore can run faster if we combine the two into one conditional test. This is a small halo idiom for latency. But beyond latency, there are more interesting benefits such as cognitive complexity reduction.

For example, consider the business logic after reaching “normal” end of the container without hitting the early exit. Rarely can we “forget” this business logic and simply exit the function and rely on the implicit return. Instead, this normal-completion scenario requires careful handling and a cognitive burden. To remind myself, I often place a comment after end of the loop. (Python provides an Else clause for a for-loop.)

In some cases, the business logic after end of the loop is 2 pages away from the early-exit business logic, but they really should be in one module. I hate this situation so much that I often set a flag and use a “break”, so that the two routines are both written after the end of the loop.

In all these scenarios, it’s often simpler to append an artificial sentinel element at end of the container!  The sentinel is guaranteed to hit the early-exit, so the loop would never exhaust all elements and complete normally. Therefore we can combine the two into one conditional test:)

I usually replace “break” with “exit”, further reducing the control-flow complexity. Such micro simplifications can pay huge dividends in high-pressure timed tests.

Right before the exit/break we may still need to handle normal-completion scenario by checking if we are at the sentinel. Interesting scenario. Now we can combine the logic of normal-completion vs early exit. In many cases, they are (nearly) identical, so we can express the logic in very structured code.

Whether they are identical or not, handling the two scenarios (early vs normal completion) in a centralized module is usually simpler and more structured, reducing the cognitive complexity.

[1] what if three tests (two breaks)? The mental burden is even worse. The sentinel could reduce it

## notable linux system calls #QQ — the list below is roughly ordered by function id

  • fork()
  • waitpid()
  • open() close() read() write()
  • –socket
  • socket() connect() accept()
  • recvfrom() sendto()
  • shutdown() is for socket shutdown and is more granular than the generic close()
  • select()
  • epoll family
  • –memory
  • brk
  • mmap

Ack in FIX #TCP

Q: does TCP require an Ack for every segment? No — see cumulative acknowledgement in [[computerNetworking]]

FIX protocol standard doesn’t require Ack for every message.

However, many exchange specs require an Ack message for {order submission OR execution report}. At a TrexQuant design interview, I dealt with this issue —

  • –trader sends an order and expecting an Ack
  • exchange sends an Ack and expecting another Ack … endless cycle
  • –exchange sends an exec report and expecting an Ack
  • trader sends an Ack waiting for another Ack … endless cycle
  • –what if at any point trader crashes? Exchange would assume the exec report is not acknowledged?

My design minimizes duty of the exchange —

  • –trader sends an order and expecting an Ack
  • exchange sends an Ack and assumes it’s delivered.
  • If trader misses the Ack she would decide either to resend the order (with PossResend flag) or query the exchange
  • –exchange sends an exec report and expecting an Ack. Proactive resend (with PossResend) when appropriate.
  • trader sends an Ack and assumes it’s delivered.
  • If the exchange doesn’t get any Ack, it would force a heartbeat with TestRequest; if that also gets no response, the exchange assumes the trader is offline.

std::sort (!!quicksort) requires random-access

list::sort() works on list — a non-random-access container — using a stable sort (typically a merge sort, not quicksort).

There are many online resources about sorting linked lists without random access.

In contrast, java Collections.sort() does NOT require RandomAccess. See the post “RandomAccess #ArrayList sorting”.

Gayle: no rush2write code: point allocation

Gayle strongly advocates getting a brute-force solution as soon as possible and walking through the “idea” in detail before coding. However, I’d rather start coding up my idea much earlier, since I need more time coding.

I guess Gayle’s experience shows that some interviewers are less worried about runnable working code than about the thought process. For a tricky algo, if the idea is sound, then you basically pass…? See the post “with algo cracked, ECT = 雕虫小技 (trivial tricks)”.

However, I feel an elaborated/detailed “idea” (item 2 below) alone is not enough to pass. I frequently run out of time on the implementation — the algorithm may be simple and not worth 40%, but I was too slow completing a basic implementation. “Done is better than perfect”; a basic implementation is better than no implementation. You need to demonstrate you can implement a working solution.

Here’s my estimate of the point distribution in a typical whiteboard interview:

  1. 10% clear understanding of requirements + clarifying questions
  2. 35(50)% detailed idea (even unimplemented) with decent performance enhancement
  3. 35(20-40)% basically working implementation.
  4. 5% performance analysis — can be tricky
  5. 5% readability — modularized, clean
  6. 5% eyeball test — can be hard to get right if code is convoluted with many variables
  7. 5% minor bugs, corner cases

unordered_map^map performance: unpredictable

Small halo… the theories and explanations are subject to change.

Many people report that unordered_map can be slower than traditional std::map for small collections. I feel it’s implementation-dependent, and the result may NOT apply to java or c#. One page (by Stroustrup) says “For larger numbers of elements (e.g. thousands), lookup in an unordered_map can be much faster than for a std::map.” In the same paragraph Stroustrup also says unordered_map “lookup involves a single call of a hash function and one or more equality operations”. This debate is mostly about lookup speed. One benchmark shows that string key size hurts the hash table, whereas sample size hurts the RBTree.

Many interviewers asked me

  • Q: why for a small string collection, RBTree can outperform hash table?
  • A: String hashing takes time proportional to string size
  • A: Poor hashing => Hash collision => linear search is another potential problem. String hashing is unpredictable compared to integer hashing. No one can guarantee the next sample will be “uniform” according to our hash function.
  • A: rehash

Cache efficiency on the bucket array (array of pointers) is another reported weakness in hash tables. However, I guess a small bucket array can be completely held in cache and each linked list usually holds 1 node only so cache miss is on that 1 node. For a tree, logN nodes are accessed for a lookup and all of them could get cache-miss.

Practical suggestion — it’s easy to swap the two in a given codebase, so just swap and benchmark. Either can be faster. If your data set changes over time, you may need to re-benchmark.

malloc=long considered costly

Heap allocation has long been considered extremely slow compared to stack allocation and most other primitive operations.


c++QnA interviews(HFT): !! pickier than Facebook

I feel my HFT interviews are very picky on low-level kernel or compiler optimizations or network card engineering, but such QQ knowledge is not widely needed.

Don’t feel inferior to them.

  • Q: what if a HFT programmer goes to a Facebook interview?
  • Q: what if a coding contest champion from Facebook goes to an HFT interview?
  • Q: what if these guys go to a tough logic or probability quiz?


mgr|risk| smaller job pool

When not comfortable (under threat), or after a job loss … the prospect of finding a similar job is much worse than for a hands-on developer, because the pool of senior mgr jobs is much smaller.

Avichal basically said he would avoid hands-off manager roles.

As contractor, most of the time I feel very relaxed about moving in and out. The price to pay, of course, is lower salary.

forward^move on various reference types

Suppose there are 3 overloads of a sink() function, each with a lvr, rvr or nonref parameter.

(In reality these three sink() functions will not compile together due to ambiguity. So I need to remove one of the 3 just to compile.)

We pass string objects into sink() after going through various “routes”. In the table’s body, “pbref: passing on a lvr” means the compiler picks the lvr overload; pbclone means pass-by-clone (copy), pbRvr means the rvr overload is picked.

| thingy passed ↓ \ route → | as-is | move() | fwd without move() | fwd + move |
|---|---|---|---|---|
| output from move() (without any named variable) | pbclone or pbRvr ? | should be fine | pbref: passing on a lvr | pbRvr |
| natural temp (no named variable) | pbclone or pbRvr ? | should be fine | pbref: passing on a lvr | pbRvr |
| named rvr variable from move() ! [2] | pbref or pbclone [1] | pbRvr | pbref? | |
| lvr variable | pbref or pbclone | passing on a rvr | pbref? | should never occur |
| nonref variable | pbref or pbclone | passing on a rvr | pbref? | should never occur |

These results are from my own experiments. Take-aways:

— when a non-template function has a rvr param like f(MyType && s), we know the object behind s is a naturally-occurring temporary object (OR someone explicitly marked a non-temp as no-longer-in-use), but in order to call the move-ctor of MyType, we have to write MyType(move(s)). Otherwise, to my surprise, s is treated as a lvr and we end up calling the MyType copy-ctor. [1]

  • with a real rvr (not a universal ref), you either directly steal from it, or pass it to another Func2 via Func2(move(rvrVar)). Func2 is typically mv-ctor (including assignment)

[2] Surprise Insight: Any time you save the output of move() in a Named rvr variable like q(string && namedRvrVar), this variable itself has a Location and is a l-value expression, even though the string object is now marked as abandoned. Passing this variable would be passing a lvr!

  • If you call move(), you must not save move() output to a named variable! This rule is actually consistent with…
  • with a naturally-occurring temp like string(“a”)+”1″, you must not save it to named variable, or it will become non-temp object.

— std::forward requires a universal reference i.e. type deduction.

If I call factory<MyType>(move(mytype)), there’s no type deduction. This factory<MyType>(..) is a concretized function, equivalent to factory(MyType && s). It is fundamentally different from the function template and can Only be called with a true rvr. If you call it like factory<MyType>(nonref), the compiler will complain no-matching-function, because the nonref argument can only match a lvr parameter or a by-value parameter. My experiments show that every time any function receives a variable var1 as a universal ref or a true rvr, we always apply either move(var1) or forward(var1) before passing it to another function. I think as-is passing would unconditionally pass var1 as a lvr.

[19]C#job #AshS

See also the posts “enough c# mileage accumulated@@” and “exposure: semi-automatic(shallow)Accu #$valuable context”.

As I told XR, a C# job offers opportunity for steeper growth curve, and higher chance of learning something useful [1]. It could provide another boost to my c++/java zbs.

As I told XR, some c++ shops are on VC++, and I am rather weak on windows and MSVS.

[1] In contrast, a regular c++ (and esp. java) job might hit a plateau of slow growth and diminishing return.

Therefore, a c# job sounds more “strategic”. Here “strategic” is not about long-term career development, but about engagement. With something “strategic” I might be more engaged, but I might lose interest in X months too.

–Ashish’s team specifically

  • 🙂 Good chance of appreciation. Ashish isn’t as strong as Kevin.
  • 🙂 smaller codebase
  • 🙂 bitcoin trec (+shallow dnlg) will be a new addition to my /repertoire/ on my profile
  • 🙂 no risk of reputation damage at a big bank

components@kernel beside deviceDrivers #signals,interrupts

Q: exactly what constitute the linux kernel?

[[OpSys by William Stallings]] P99 has a nice diagram showing that most of the components in the linux kernel are device drivers and similar software:

  • NIC driver
  • char device driver — for keyboard
  • block device driver — for disk
  • physical memory interface
  • device interrupt handlers

What are the other components of kernel?

1. system calls — the highest level in the kernel
1. Signals — the Signal definition and handlers are at the same level as syscalls, but at a higher level than the device interrupt module.
2. TCP/IP module —-  is a level between syscalls .. and .. NIC driver
2. virtual memory —– is a level between syscalls .. and .. physical memory interface
2. filesystem module – is a level between syscalls .. and .. block device driver

Lastly, scheduler — is a highly visible component of kernel. Scheduler relies critically on the device interrupt module.

9 income sacrifices: often2avoid stagnation

Update — relevant for my hibernation decision.

I demonstrated a habit of giving up higher income, in order to keep learning something of strategic value ( or managing mgr expectation )

  • before it’s “too late” — i.e. before the opportunity disappears
  • The prospect of blood-letting (wasted cycles) is too painful.
  • I felt my peers were moving ahead of me, while I stood still.
| year | sacrifice | higher income forgone → lower income picked | ROI realized | intangible ROI | strategic? |
|---|---|---|---|---|---|
| 2002 | 50% | Zed → SCS presales | | none | brave |
| 2003 | 40% | SCS → self-employed | entre | none | brave |
| 2004 | 70% | Sperion → self-employed | entre | none | brave |
| 2007 | 10% est. | UBS/Headstrong | GS branding | | unsure |
| 2012 | 7% | Barc #Swing 150k → picked OC 100% c#, quant dev | learned enough to crack some c# IV | confidence | no:( |
| 2015 | 10% est. | sign-on increment → Macq, lower mgr expectation | year1 bonus | self-image boost #short-lived | no:( |
| 2017 | 10%+ | Pimco C++/FI offer 100+/H c2c → picked RTS 100% c++ | more c++ offers | | yes [1] |
| 2017 | 10% | java jobs → picked RTS c++ | more c++ offers ensued | | yes 🙂 |
| 2018 | 10% | java jobs → picked mvea c++ | c++ job market cracked | c++ skill is harder, rarer | conviction |
| 2018 | 15% | CVA → picked mvea low-latency high-volume eq trading | big OMS framework | job satisfaction | yes 🙂 [3] |
| 2013 | not a job [2] | UChicago formal math training | Master’s degree, branded uni | confidence in math; self-confidence | no:( |

[1] In 2017, I felt my c++ had not reached the critical mass despite my long self-study. A power drill was required to break the stone wall. The RTS full time job turned out to be that power drill.
[2] For the UChicago motivation, there’s a separate blog post.
[3] no regret taking this lower offer!

## CRTP usageS #template Method

This blog post describes two usages of CRTP

I briefly read an excellent blog on CRTP. I feel CRTP is advanced for almost all the contexts I can think of. (In contrast, I can see some “necessary” usages of SFINAE, such as the one in my little AddOrder.h.) The blog shows a simple [1] example of CRTP to replace virtual functions. How much efficiency improvement does it make? Questionable. I always say that if you want the lowest latency, then write selected modules in assembly language, and put them in hardware like FPGA.

[1] if there is anything simple in template meta-programming.

I have heard of several techniques to avoid virtual functions, but I believe the actual evidence (in terms of measured improvement in latency) is likely unconvincing or insignificant. Therefore, if CRTP is used to eliminate virtual function latency, then I am not sure how much value it adds.

There are other techniques to avoid “virtual”. I feel they are easier to understand than CRTP.

Sometimes CRTP is the only choice — if the virtual function needs its own template parameters, then compiler will complain that “function template can’t be virtual”. Rahul hit this complaint.

Q: for a true virtual method v1(), the derived class is not yet written when the base class is compiled. Later, Only at run time can the “system” pick the right implementation of v1(). How about CRTP?
A: base class is Not compiled ahead of the derived class. Each derived class includes a header defining the base class template.


Besides this “virtual-elimination” use case, CRTP has other applications (which I am still unfamiliar with), but if asked in interviews I will only mention this one use case. One of the common “other usages” is TemplateMethod with compile-time (not runtime) resolution — the 1st usage in the excellent blog. In the classic template-method pattern, the “procedure” is published and finalized in the base class. Individual steps of the procedure are virtual methods, resolved at runtime. In the CRTP version, superclass methods call subclass methods, safely and cleanly. A superclass using a subclass is taboo in most contexts, but both the traditional TemplateMethod and the CRTP-TemplateMethod are notable exceptions.

The article didn’t clearly highlight a key point about this usage — the base class NumericalFunctions is general-purpose, designed to be subclassed by anyone. I could write a Temperature class to subclass NumericalFunctions too. This way, the code in NumericalFunctions is available for reuse. The linked sample is working code. Key points to remember about it:

  • base class — is a template with a dummy type “Sub”
  • derived classes — have the form “class Sub1 : public Base<Sub1>”
  • the static-dispatch (non-virtual) function in Base always static_casts “this” to Sub*.


c++interview tougher than java on WallSt #XR

My friend XR is first to point this out.

Q1: for candidates with 5+ years of experience, is the passing rate really worse in c++ IV than java IV? Let’s limit ourselves to sell-side.
%%A: indeed slightly lower.

Q2: is c++ paying slightly higher? I don’t think so.

Both java and c++ (c# too, but not python) interviews go into crazy low-level details. Most tough questions in java/c#/python are low-level, and C/C++ is a lower-level language.

c++(!!Java)ecosystem questions are tough #Deepak

Perhaps the most important reason for my Q1 answer is selectivity. A java hiring team can be highly selective, but on average c++ hiring teams are more selective:

  • higher bar
  • more time allocated to selecting the right candidate
  • less interested in commodity skills and more demanding on in-depth knowledge
  • less likely to settle for second best talent

What does it mean for Shanyou and Deepak? They are unlucky to hit a rising bar in a shrinking domain… increasingly competitive.

[17]productivity@past jobs: questionable patterns

Be careful with the assessment — perhaps I was not so productive on my past US projects
Be careful with the assessment — perhaps I was not so un-productive on the q3SG projects

Consider deleting this blogpost completely.

Pattern – Barc, ML … were java, my turf
… but Citi was java as well. My java skill did help, but I didn’t make use of my strongest java skills.
… but at GS, Perl was my turf too. My Perl experience did help but the colleagues were too strong.

Pattern – Barc, ML … not many peers to compare with me
… but Chien was a direct peer. I think Rao also.
… but later stage GS, I was benchmarked

Pattern – OC, … familiar with the codebase
… but OC codebase was rather big.
… but Stirt codebase wasn’t big.

Pattern – GS, OC … spent a long time on the existing codebase. In contrast, creating code from scratch is easier. … but Stirt was code from scratch.

Gayle: use big,ugly sample2explore; tiny sample4test

–In the initial exploration (only 50% clear direction), the big ugly sample would clarify many assumptions.

Any mis-assumption corrected is a potential life-saver since the mis-assumption could lead us down the wrong track.

Such a sample will help us see the pattern in data and get to the gist of the problem.

–However, during the eyeball testing on white board, a large sample would take too much time. A tiny sample can flush out the bugs more quickly

max-sum subMatrix O(N^3)

Q: given a square matrix (height N) of signed ints, find a submatrix with max sum


Computing each submatrix sum by hashmap lookup gives an O(N^4) solution. There is a known O(N^3) solution for a square matrix. I think brute force is O(N^6). If you know the trick it is doable.

Clearly the max-profit algo is inapplicable. The Kadane algo may help.

Need auxDS for sure.

— simple, elegant DP idea based on hint:

For every left-right column pair, find the best submatrix bounded by those two columns. For example, given column pair 3/7, the best submatrix must extend exactly from column3 to column7. I think this constraint is extremely liberating, and leads to an elegant DP solution:

For pair 1/2, construct a new column array temp[] where temp[r] := the row-wise sum across columns 1..2 for row r. Once we have temp[], apply Kadane’s algo on temp[] to find the max submatrix sum. The entire process is O(N).

Q: Is each pair evaluated as an independent problem? I doubt it. I think after we have done pair 1/2, adding column3 into the mix is O(N), so the pairs 1/2, 1/3, 1/4 .. 1/N can be completed in O(NN). All pairs would take O(NNN).

So how is adding column3 into the mix O(N)? I think we don’t use the prev Kadane result, but we use the previous temp[] to gain efficiency! We can update existing temp[] in O(N), then apply Kadane’s algo on it in O(N)

What if I have a 99×88 matrix? Do we use a O(NNM) or O(NMM) algo? Since there are fewer columns, I will stick to the column-pair algo presented.  (If there are fewer rows i.e. flat matrix then I would use row pairs.)

## BRING reminder sheet : cod`IV

  1. edge case — my loop may get skipped completely
  2. edge case — empty output
  3. drive to improve after it works
  4. bring rectangular post-it + breakable tape


bucketSort: O(N+k) could be better or worse than O(N)

Based on the 53-vote top answer on stackoverflow:

It’s obvious (by definition) that O(2N) is equivalent to O(N).

O(N+k) is worse than O(2N) when k is much larger than N, like 4000 times N. For a concrete illustration, say we have N = 5 million strings to sort, using k = 20 billion buckets.

  • It takes constant time to bucket each string, so the first step takes O(N) i.e. grows proportional to N.
  • The 2nd step is dominated by k, since all 20 billion buckets have to be examined. Time complexity of this step is O(N+k), as explained in the stackoverflow answer.
  • Therefore, total complexity is O(N+k), i.e. cN + bk with two constant factors c and b. Even if b is small (the cost of looking at an empty bucket), as k grows, at some point the k term dominates the N term.

In practice, k is often chosen to be comparable to N, so I don’t think this is a practical issue.

pure virtual dtor

pure virtual dtor is a low-value, obscure topic both in 1) interviews and 2) zbs, so I won’t spend too much time on it. One article addresses the rationale and justification for a pure virtual dtor; another explains that this dtor must still be defined somewhere, or the compiler/linker would complain.

bbg: Problem-Solving skill

I think this skill is like IQ…

I feel this is broadly similar to west coast interviews, perhaps with subtle differences in how Mistakes are assessed.

One Bbg interviewer (young Chinese guy) shared with me what he meant by “problem solving skill” — given a well-defined programming problem

  • not superb performance
  • not syntax perfection
  • solution should be “reliable”. I guess he means ‘free of major bugs’
  • problem is often realistic — I have noticed this feature repeatedly
  • I said many candidates go on the wrong track and (usually) get stuck. He agreed.
  • I said many solutions are broken because candidate misses some key factor. He agreed.
    • perhaps a factor about requirement
  • I now think some candidates have no experience and no basic understanding of a given programming topic (say, DP, or regex parser, or condVar), so they will never get off the wrong track.

Rahul’s reasoning — on the job, most of the time a new hire receives inputs from other people and needs to think through a problem with many obstacles and unknowns. An interviewer can’t really pose a realistic work-site problem, so interviewers use algo questions as a proxy.

I think that besides technical capabilities, there are many personality and motivation factors that determine a candidate’s progress on a given problem:

  • Is the candidate confident (rather than nervous) to take the hint and work on it?
  • Is the candidate too ready to give up?
  • Is the candidate unable to use the hint, yet unwilling to seek clarification?

Rahul as an interviewer tries to make the candidate relax and perform at her best.

c++matrix using deque@deque #python easier

My own experiments show:

  • It’s better to default-populate with zeros. Afterwards, you can easily overwrite individual cells without a bounds check.
  • it’s easy to insert a new row anywhere; a vector would be inefficient.
  • To insert a new column, we need a simple loop.

For python, # zero-initialize a 5-row, 8-column matrix:
width, height = 8, 5
Matrix = [[0 for x in range(width)] for y in range(height)]
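A short sketch of the row/column edits described above, using the Python list-of-lists version (my own illustration):

```python
width, height = 4, 3
mat = [[0] * width for _ in range(height)]  # 3 rows x 4 columns of zeros

mat.insert(1, [9] * width)   # insert a whole new row anywhere -- one call

# no "column" object exists, so inserting a column is a simple loop over rows
for row in mat:
    row.insert(2, 7)
```

After these edits, mat has 4 rows of 5 items each; for example mat[1] is [9, 9, 7, 9, 9].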

In any programming language, the underlying data structure is a uniform pile-of-horizontal-arrays, therefore it’s crucial (and tricky) to understand indexing. It’s very similar to matrix indexing — Mat[0,1] refers to first row, 2nd element.

Warning: 2d array is hard to pass in c++, based on personal experience 😦 You often need to specify the size in the receiving function declaration. Even if this is feasible, it’s unwanted legwork.

[1] Warning — The concept of “column” is mathematical (matrix) and non-existent in our implementation, therefore misleading! I will avoid any mention of it in my source code. No object no data structure for the “column”!

[2] Warning — another confusion due to mathematics training. Better to avoid Cartesian coordinates. Point(4,1) — i.e. x=4, y=1 — is on the 2nd row, 5th item, therefore arr[1][4] — you need to swap the subscripts.

                          1st subscript          2nd subscript
max subscript             44                     77
height #rowCnt            45 #not an index value
width #rowSz [1]                                 78 #not an index value
example value             1 picks 2nd row        4 picks 5th item in the row
variable name             rowId, whichRow        subId [1]
Cartesian coordinate[2]   y=1 (Left index)       x=4

## emulators of secDB #GregR

— secDB derivatives/cousins/emulators:

  • Pimco and Macquarie licensed Beacon
  • CS hired two secDB veterans to build a similar system on java
  • MS has a RCE (risk calc engine) project, based on scala and java
  • UBS tried it too. I applied for this job in 2011 or 2012
  • Athena and Quartz
  • BlackRock Aladdin (1988), written in java, for risk management across portfolios. All other functionalities are secondary.

I feel you need such a system only if your books hold many derivative contracts that need constant “revaluation”. This is a core feature of derivative risk systems.

Q: beyond risk systems, why is Quartz also supporting trade booking and execution?

I think the secret is in the data store, which is central to those systems. SecDB-style systems feature a specially designed in-memory, replicated data store, which can be the basis of those other functions.

A related special data store holds live and reference market data.

##short-n-hard algo@simple data structures

90% of the hardest coding questions use simple data structures. Some of these algos are

  • very short but baffling
  • don’t use fancy data structures
  • require prior study; there is no expectation to invent them during an interview

–Linked list

Brent — check for loop in linked list

–binary tree

Morris — in-order walk in O(N) time and O(1) space
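A Python sketch of the Morris walk (my own illustration, assuming a minimal Node class): each node's in-order predecessor temporarily "threads" its right pointer back to the node, giving O(N) time with O(1) extra space:

```python
class Node:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def morris_inorder(root):
    out, cur = [], root
    while cur:
        if cur.left is None:
            out.append(cur.val)      # no left subtree: visit, go right
            cur = cur.right
        else:
            pred = cur.left          # find in-order predecessor
            while pred.right and pred.right is not cur:
                pred = pred.right
            if pred.right is None:
                pred.right = cur     # create thread back to cur
                cur = cur.left
            else:
                pred.right = None    # second arrival: remove thread, visit
                out.append(cur.val)
                cur = cur.right
    return out

# demo on the tree 4(2(1,3), 6(5,7))
root = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))
walk = morris_inorder(root)
```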


  • regex
  • KMP string search
  • longest palindrome substr
  • edit distance



ValueType default ctor needed in std::map[]

Gotcha — Suppose you define a custom Value type for your std::map and you call mymap[key] = Value(…). There’s an implicit call to the no-arg ctor of Value class!

If you don’t have a no-arg ctor, then avoid operator[](). Use insert(make_pair()) and find(key) methods.

Paradox — for both unordered_map and map, even a lookup HIT requires a no-arg ctor to be available. The reason is:

  • since your code uses map operator[], the compiler will “see” the template member function operator[]. This function explicitly calls the ctor within an if-block.
    • If you have any lookup miss, this ctor is called at run time
    • If you have no lookup miss, this ctor is never called at run time
    • compiler doesn’t know if lookup miss/hit may happen in the future, so both branches are compiled into a.out
  • If your source code doesn’t use operator[], then the template function operator[] is never instantiated, so you don’t need the default ctor.
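Python's collections.defaultdict is a loose analogy (not the same mechanism): the no-arg factory is invoked only on a [] lookup miss, while get() and membership tests never invoke it, much like map::find():

```python
from collections import defaultdict

hits = defaultdict(int)  # int is the no-arg "ctor", producing 0
hits['a'] += 1           # miss: int() is called, then the 0 is incremented
hits['a'] += 1           # hit: no factory call this time
peek = hits.get('b')     # like find(): no factory call, returns None
```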


is Set based on Map@@ #c++,python

–java: I believe set is often based on map…

–std::set is based on RBTree, same as std::map. Matt Austern said

“The Associative Container map is very similar, except that in a set the elements are their own keys while in a map the elements are pairs.”


CPython source code for set (according to Achim Domma) is mostly a cut-and-paste from the dict implementation.

unordered_set implementation note says

“unordered_set is typically implemented as a linked list that holds all the elements, plus a hash table storing pointers to the linked list nodes.”

FB cod`IV tips: #official

bucket sort; merge sort; graph

watch more videos

go through the Prep emails.

  • Practice.. Test your code on the board not code in your head.
  • You can confirm with the interviewer if you want to start coding earlier. I tend to run out of time solving the problem. Kelly said just practice more to speed up.
  • You can request to skip small talk n start technical part earlier.
  • Messy code is ok for some interviewers, as long as they can follow your thoughts. Poor recall of language details probably OK, but not fundamental mistakes.
  • Always .. Interviewer will under-specify the problem, waiting for you to ask clarifying questions. The more corner cases you identify the better.
  • Always .. for all coding interviews, the problem is solvable within 30-40 minutes.
  • Always .. Done is more important than perfect. You can improve an imperfect solution in terms of big-O.
  • Always .. walk through your solution with some unusual cases. Some interviewers don’t care about trivial corner cases, but some unusual inputs are valid and important, like an extremely lopsided tree.
  • Better be “deep” with one language so you can demonstrate your depth (esp. about big-O), but you could also choose different languages for different problems.
  • FB is very specific about big-O
  • some interviewers are very specific about details in the output. Empty input? What are the exact characters in the output? A client program will need the output to be exactly right.
  • FB is very specific about details in the input. Ask clarifying questions.
  • Never invent facts that don’t exist.
  • You should be constantly talking while solving the problem. Talk through more than you usually do.

IV Q: proud achievements #incomplete

See also ## lasting achievements as a techie

I need lots of technical details to substantiate my answers.

  • NYSE integrated feed with up to 400k x 8 msg/sec.
  • Barclays vol fitter solving half a million B/S equations per minute
    • Black Scholes
    • biggest eq drv house in the world as of 2012.
    • fast refitting
    • server farm

contractor^citi-FTE #JackZ

Jack Zhang described the old AT&T’s laid-back culture – 4 managers for one foot soldier; short workday + basketball game; contractors working harder than FTE ..

Based on my first-hand observation, Citi also has a very laid-back culture, bureaucratic inefficiency, “wheel spinning” as Srini put it.. so workload is noticeably lighter than other i-banks. If I were to become FTE I would probably choose Citi. Probably low bonus, but I hate bonus.

However, consulting is still a better option for me because

  • Motivation to keep myself up to date and in-demand. The citi-FTE mode can easily decay
  • In each team learn something new … either major new domain, or minor new skill

y I use lots of if..{continue;}

Inside a loop, many people prefer if/elif/else. To them, it looks neat and structured.

However, I prefer the messier if…continue; if…continue; if..continue. Justification?

I don’t have to look past a pageful of if/elif/elif/…/else to see what else happens to my current item. I can ignore the rest of the loop body.

Besides the current item, I can also safely let go of (rather than keep tracking) all the loop-local variables — they will all be wiped out and reset to new values at the next iteration.
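A minimal side-by-side sketch of the two styles (hypothetical filtering loop):

```python
items = [3, -1, 0, 8]

# if/elif/else: must read the whole chain to see what else happens to an item
squares_a = []
for x in items:
    if x < 0:
        pass                      # negatives: skip
    elif x == 0:
        pass                      # zeros: skip
    else:
        squares_a.append(x * x)

# early-continue: each guard dismisses the item; I stop tracking all
# loop-local state, which gets reset at the next iteration anyway
squares_b = []
for x in items:
    if x < 0:
        continue
    if x == 0:
        continue
    squares_b.append(x * x)
```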

FB2: look-n-say sequence #80%

This is one of the 3 sample FB coding questions, fairly popular

… has my self-tested solution. Among other things it showcases a reusable technique — a VO to simplify state maintenance.

Q: Implement a function that outputs the Look-and-Say sequence.

An iterative solution should do. Based on the last string, we can generate a new string: simply scan and split it into runs of identical digits; each run appends two characters (the count, then the digit) to the new string.

Corner cases? No
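A sketch of the scan-and-split iteration described above (my own version):

```python
def next_term(s):
    out, i = [], 0
    while i < len(s):
        j = i
        while j < len(s) and s[j] == s[i]:
            j += 1                      # j - i = length of this run
        out.append(str(j - i) + s[i])   # each run appends count, then digit
        i = j
    return ''.join(out)

def look_and_say(n):
    terms = ['1']
    for _ in range(n - 1):
        terms.append(next_term(terms[-1]))
    return terms
```

look_and_say(5) gives ['1', '11', '21', '1211', '111221'].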

FB3: edit distance==1 #untested

This trivial problem surprised me:
  1. surprise — python string/list techniques are often easier than c++, so I need to adapt to the language
  2. surprise — it took 40 min on the whiteboard, mostly because of c++-style thinking

A popular coding question for phone or whiteboard screening.

Q: Write a function that returns whether two words are exactly “one edit” away using the following signature:

bool OneEditApart(string s1, string s2);
An edit is:
  • Inserting one character anywhere in the word (including at the beginning and end)
  • Removing one character
  • Replacing one character


OneEditApart("cat", "dog") = false
OneEditApart("cat", "cats") = true
OneEditApart("cat", "cut") = true
OneEditApart("cat", "cast") = true
OneEditApart("cat", "at") = true
OneEditApart("cat", "act") = false

My code is tested. I first spent 40 minutes writing it on the whiteboard, but it was basically bug-free.
I was writing python in c++ mode and spent too much time on the len-off-by-1 case, until I decided to insert into the shorter string. Then I found a much simpler solution …
.. #1 lesson learned in python string matching:
After finding a mismatched char, comparing two slices is much simpler than the insertion technique or iterator adjustment. No performance hit. If you want to avoid string copies, you can even extract a function that takes string references.
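A sketch using that slice-comparison idea (my own version, matching the examples above):

```python
def OneEditApart(s1, s2):
    if abs(len(s1) - len(s2)) > 1:
        return False
    if len(s1) > len(s2):
        s1, s2 = s2, s1              # ensure s1 is the shorter (or equal)
    for i in range(len(s1)):
        if s1[i] != s2[i]:
            if len(s1) == len(s2):
                return s1[i+1:] == s2[i+1:]   # replacement of one char
            return s1[i:] == s2[i+1:]         # insertion into the shorter
    # identical prefix: exactly one edit iff one extra char was appended
    return len(s2) == len(s1) + 1
```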

FB: spiral number pattern

This is a fairly popular coding question.

Q: given a positive integer N, print all integers from 1 to N² in an N-by-N matrix, following a spiral pattern starting from the top-left cell (which holds N², as in the example below). Please implement:

int[][] spiral(int n); //to output

9 8 7
2 1 6
3 4 5

Facebook focuses on corner cases and complexity analysis. My implementation is iterative, using a deque<deque> matrix.
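A Python sketch matching the sample output (N² at the top-left, spiraling clockwise inward and counting down to 1):

```python
def spiral(n):
    mat = [[0] * n for _ in range(n)]
    top, bottom, left, right = 0, n - 1, 0, n - 1
    val = n * n                       # start from N*N at the top-left
    while top <= bottom and left <= right:
        for c in range(left, right + 1):          # top row, left to right
            mat[top][c] = val; val -= 1
        top += 1
        for r in range(top, bottom + 1):          # right column, downwards
            mat[r][right] = val; val -= 1
        right -= 1
        if top <= bottom:
            for c in range(right, left - 1, -1):  # bottom row, right to left
                mat[bottom][c] = val; val -= 1
            bottom -= 1
        if left <= right:
            for r in range(bottom, top - 1, -1):  # left column, upwards
                mat[r][left] = val; val -= 1
            left += 1
    return mat
```

spiral(3) returns [[9, 8, 7], [2, 1, 6], [3, 4, 5]], matching the example.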

## G5 algoIV constructs x-lang #dataStruct++

  1. elegant, legit simplifications
  2. Judicious use of global -vs- argument -vs- local variables
  3. iterator and pointer in node/VO class
  4. Construct tree, hashmap or Sorted array as auxiliary, or for memoisation
  5. Recursion
  6. sentinel node trick: array/slists

The tools below are more specialized and less widely relevant

  • Graph traversal, recursive or iterative..
  • Permutations etc, recursive or iterative ..
  • Std::lower_bound etc
  • sorting

The topics below are Not really for a coding test:

  • DP
  • concurrency

python queues #deque

In a coding test, I will use a vanilla list for both. A list is an efficient stack but an inefficient queue.

— Stack can use

  • list.append()
  • list.pop() with no arg.
  • Top of stack is list[-1]

— Queue: a primitive (inefficient) Queue class I wrote uses:

  • dequeue — list.pop()
    • A non-queue operation — pop(2) would remove and return the 3rd item, but this is inefficient on a list!
  • enqueue — list.insert(0, newItem) — similarly inefficient

A deque or a circular array (fixed capacity) is more efficient for a queue. A linked list is slower due to frequent allocations.

[[python cookbook]] P659 compared these and other alternatives. The fastest is collections.deque.
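A quick collections.deque sketch for comparison:

```python
from collections import deque

q = deque()
q.append('a')        # enqueue at the right end, O(1)
q.append('b')
front = q.popleft()  # dequeue from the left end, O(1); list.pop(0) is O(n)
```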

donut bonus often precedes job loss #XR

I won’t talk about my personal experiences (too painful) but in general, a donut bonus often means

  • You are free to go.
  • You are the weakest member of the team.
  • You are not qualified, not competent for the role.
  • You will be assigned some unimportant, shitty work.
  • You need to improve or get out of here.
  • You are classified as “damaged goods”. If you choose to remain, you may get increasingly harsh treatment, until you move on.
  • You may not have the option to stay here and remain an organizational deadweight.

However, if 1/3 of the team members get donut bonus, then it’s normal and not so bad.

25-horse puzzle #Google

Q: A single racetrack. P=25 mechanical horses each with a unique speed. You can race up to 5 horses each time, and get a printout of the ranking, but no timing.

What’s the minimum number of races to identify fastest #1 #2 #3 horses.

Youtube and my green quant-IV book both have the analysis.


Quick elimination — each race I can eliminate a minimum of 2 horses, but can we eliminate faster?

What if we only need the #1 horse and P is 9? Two races exactly.

Here’s my algo #1

  1. first pick 5 random horses,
  2. race them and eliminate slowest 2 (reduce population by 2). Name the top 3 as AA/BB/CC
  3. race CC with 4 untested random horses. Four scenarios:
    1. if 3 or more beat CC, then eliminate the two slowest including CC, and put the top 3 in a group with AA/BB. Go back to step 2.
    2. if 2 beat CC, then eliminate the three slowest including CC, and put the top 2 and an untested horse in a group with AA/BB. Go back to step 2.
    3. if only one horse beat CC, then eliminate the four slowest including CC, and put the #1 and 2 untested horses in a group with AA/BB. Go back to step 2.
    4. if CC is #1 (happy path), then eliminate the other four horses and go back to step 3.

Worst case — first race actually had the slowest 5, then we hit case C1 every time. So we only eliminate 2 horses each race

best case — first race had 5 fastest. total 6 races [1] but this is statistically unlikely. If you are not the most lucky, then we will need more than 6 races.

[1] first race eliminates 2… next 5 races each eliminates 4 only if CC is always fastest in every race.

–algo #2:

  1. [5 races] Split into 5 groups and run 5 unrelated races. Discard slowest 2 in each group. We end up with 15 horses.
  2. [1 race] Now race the 5 number-ones. Name them ABCDE. The D and E groups are completely eliminated (-6), the C group is left with C alone (-2), and B group’s #3 is eliminated (-1). We end up with 6 horses, which I will name A/B/C and the less-tested A2 A3 B2. The top 3 can only be one of three possibilities:
    1. ABC
    2. A A2 A3
    3. A B B2
  3. [1 race] so I will race 5 of the 6 horses, sparing A since it’s already the final champion.
  4. — total 7 races best or worst case.
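A small Python simulation of algo #2 (my own sketch; a "race" is just a sort of 5 horse ids by speed). It finds the top 3 in exactly 7 races regardless of the speed assignment:

```python
import random

def top3(speeds):
    # speeds: 25 distinct numbers, higher = faster
    count = [0]
    def race(ids):
        count[0] += 1
        return sorted(ids, key=lambda i: -speeds[i])   # printout: ranking only
    # step 1: five group races, keep the top 3 of each group
    groups = [race(range(g * 5, g * 5 + 5))[:3] for g in range(5)]
    # step 2: race the five number-ones
    A, B, C = race([g[0] for g in groups])[:3]
    gA = next(g for g in groups if g[0] == A)
    gB = next(g for g in groups if g[0] == B)
    # step 3: A is already #1; race the 5 remaining candidates for #2 and #3
    final = race([gA[1], gA[2], B, gB[1], C])
    return [A] + final[:2], count[0]

random.seed(42)
speeds = random.sample(range(10**6), 25)
top, n_races = top3(speeds)
```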


detect cycle in slist #Part 2

Note there’s an open question at the end

Don’t use the fast-slow iterators. Each of the 3 solutions below has advantages over the fast-slow-iterator solution.

–solution: list-reversal
Basically, iterate from aa to bb to cc … and reverse each arrow. Suppose dd points back to bb initially, forming a loop. At some point the snapshot is

  • null <- aa <- bb <- cc <- dd  while currentNode is dd and nextNode is bb.

If we go on, we would be back at aa, proving the existence of a loop.

Benefit of this solution — O(1) space complexity and O(N) time complexity

constraint — won’t work on read-only lists.

–solution: count nodes against memory usage — my invention
We can get a reliable estimate of the memory usage of the entire data structure, and divide it by the per-node footprint, so we know to a good approximation how many nodes exist. If we visit more than that count, there must be a loop. One way to estimate the memory usage: keep track of the maximum and minimum addresses among all visited nodes.

The low level languages usually provide addresses. The higher level languages usually provide memory usage. In realistic applications, there is always a way to get an estimate of memory usage.

–solution: Brent’s algorithm

If we keep count of total number of increments, we should see that every time we double “power”, that total count is a power of 2. If we have incremented 32 times then we know { mu + lambda } must exceed 32… Advantages over the simple-2-iterator:

  1. tortoise is much faster, so if mu (merge point, or loop starting point) is far out, then we will find it faster. With the standard solution, tortoise takes a long time to reach the merge point.
  2. can return loop size

I now believe I can predict in which “phase” we will detect the loop — If lambda is between 2^5 and 2^6, then we will detect the loop in Phase 6, and we can’t detect in Phase 5
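My understanding of Brent's scheme in Python (a sketch with the power-of-2 teleporting; variable names are mine):

```python
class Node:
    def __init__(self, val):
        self.val, self.next = val, None

def brent(head):
    """Return (loop_found, loop_size)."""
    if head is None:
        return False, 0
    power = 1          # current phase's search window
    steps = 1          # hare's steps since the tortoise last teleported
    tortoise, hare = head, head.next
    while hare is not None and hare is not tortoise:
        if steps == power:
            power *= 2          # phase change: double the window ...
            steps = 0
            tortoise = hare     # ... and teleport the tortoise to the hare
        hare = hare.next
        steps += 1
    if hare is None:
        return False, 0
    return True, steps          # steps now equals lambda, the loop size

# demo: 0 -> 1 -> 2 -> 3 -> 4 -> back to 2 (loop size 3)
nodes = [Node(i) for i in range(5)]
for a, b in zip(nodes, nodes[1:]):
    a.next = b
nodes[4].next = nodes[2]
found, loop_size = brent(nodes[0])
```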

I suspect power-of-3 is also working:

  • If the slower iterator stops moving completely, then the algorithm is broken because the slower iterator could remain outside the loop.
  • If the slower iterator tailgates the faster iterator and the following distance always stays within a static limit (like 99), then the algorithm is broken, because the loop size could exceed 99 and the two iterators would have no chance of rendezvous.
  • ==> If the following distance grows gradually, AND the follower is never stuck forever, I think eventually they will meet —
    • Suppose right after a tortoise jump, powOfN bumps up to 81 but actual loop length is shorter, like 80. Now within the next 81 iterations, the hare would move step by step while the tortoise remains. They would rendezvous for sure.
  • —- pros and cons for power-of-3 over power-of-2 —-
  • pro: if a big loop involves root, tortoise won’t have to jump too early too frequently ==> Fewer comparison needed.
  • con: if there’s a tiny loop far out, tortoise would end up jumping too late?

elaborate python project/challenge #PWM

Key is “technical challenge”. Without it, I may look like just another python coder. Some of the interviewers may not be easy to impress. I will only target the majority.

Different companies might have different needs for python skill. I can only guess

  1. Usage: data analysis, data science
  2. Usage: automation, glue language
  3. …. py as primary language to implement business logic in an enterprise application? Only in Macquarie, baml and jpm
    1. I could remake my PWM Commissions system in python

For 2), I used

  • devops
  • integration with git, maven, g++, code generation, test result parsing, …. for devops
  • automated testing
  • simple network servers
  • subprocess
  • integration with shell script
  • automatic upload
  • generate c++ source code to be compiled into a native python extension
  • import hack
  • logging decorators
  • python code attached to trade object. Code to be evaluated at reval time.
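For the logging-decorators item, a minimal sketch of what I mean (hypothetical names):

```python
import functools
import logging

logging.basicConfig(level=logging.INFO)

def logged(func):
    """Decorator logging entry and exit of the wrapped function."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logging.info("entering %s args=%r", func.__name__, args)
        result = func(*args, **kwargs)
        logging.info("exiting %s -> %r", func.__name__, result)
        return result
    return wrapper

@logged
def add(a, b):
    return a + b
```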

lazy singleton based@JVM dynamic classloading #Lea

In [[DougLea]] P86, this JVM expert points out that a simple “eager” singleton is eager in other languages, but lazy in Java due to on-demand class loading at runtime.

Specifically we mean a public[1] static final field. This initialization is thread-safe by default. Assuming the field is immediately accessed after class loading, this simple design is comparable to the familiar synchronized lazy singleton. What are the pros and cons?

Synchronized singleton requires more legwork (by developer) but
* It lets you pass computed ctor parameters at runtime. Note the singleton ctor is private, but you can call getInstance(userInput).
* As hinted earlier, if you load the class but do not immediately use the instance, then the simple design incurs the expensive initialization cost too early.

[[DougLea]] was written before java5. With java5, [[EffJava]] advocates enum.

[1] DougLea actually prefers private field with public getter, for encapsulation.
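For comparison, a Python analogy (my own sketch, not from DougLea): a module-level instance is similarly "lazy by import", and a cached no-arg factory plays the role of the synchronized getInstance:

```python
import functools

class ExpensiveThing:
    def __init__(self):
        self.ready = True    # stand-in for expensive initialization

@functools.lru_cache(maxsize=1)
def get_instance():
    # first call constructs; every later call returns the same cached object,
    # analogous to the synchronized lazy getInstance
    return ExpensiveThing()
```

A plain module-level `INSTANCE = ExpensiveThing()` would instead mimic the static-final field: constructed the first time the module is imported.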