## coding IV P/F

See also ##failed c++cod`IV: home^IDE^whiteboard:4 beat-fronts

  • paper — is a shorthand for paper, dumb editor, white-board etc
  • webex vs onsite are somewhat similar, usually pair-programming. Take-Home setting is vastly different.
  • pts (points) — awarded to myself, indicating the performance, value, difficulty ..
  • p/f are only based on interviewer’s subjective judgement
  • pp means passed convincingly, with flying colors
  • [1] I focused on working solution! See %% priorities in a take-home coding IV
  • [2] highlight: where practice can make a difference in a make-or-break test
lang location IDE? firm/person pts 1-5 Notes category type@practice needed [2]
py weekend hackerrank block.one #Ashish 5 2020 6q 6000 minutes. I took up two algo
java weekend IDE Blackrock #YH 4 2020 lexdfs algo
py/c home hackerrank Flex #Deepak 2 2020 3q algo
py home hackerrank p MLP-algo
#Ashish
3 2020 3q algo
py home hackerrank f Optiver
#Ashish
4 2019 8H: Morse;prod@2BigInt tough algo
c home hackerrank p FlowTrader #Ashish 3 2019 3q 2H algo
c home hackerrank p Ziliqa #Ashish 2 2019 see mail 20 Oct std algo speed-coding; memorize syntax
py webex IDE p GS-HK 4 2019 See highest point ]%%visible python proficiency
py onsite paper f FB 2019 full day
c/py onsite paper p FB 5 2019 first round algo Leetcode medium/hard
c weekend IDE HRT 5 2019 parser dStruct heavy
py onsite paper 😦 Indeed 3 2019 algo algo practice
c weekend IDE they gave up redMart 5 2019
c/py home hackerrank no news Altonomy 4 2019 4 problems #Deepak
c onsite paper p Quoine 1 2019 balanced brackets #5-min job algo
py webex IDE p Indeed 5 2019 see my blogposts algo speed-coding; memorize syntax
c home hackerrank p TradeWeb 1 2019 CSY: basic STL map manipulation algo STL + basic algo practice needed definitely
c phone phone p SCB-FM 2 2018 pure algo 1 easy + 2 hard ones. I over-performed algo array/slist problems
c weekend IDE 😦 quantlab 4 2018 Houston HFT weekend coding drill ROTI dStruct
c/thr home IDE p Pimco quant 3 2018 2H parallel programming thr study pthreads
c home hackerrank p FlexTrade 3 2018 slist cleanup#github std algo array+list problems
java home hackerrank 😦 Pimco #190m 2 2018 Q9: path through matrix. Ineffi [1] std algo too hard, but some 2D practice can help
java home hackerrank 😦 Pimco #190m 2018 Q8: buy 1 sell any algo array problems
c onsite paper p LiquidNet 2 2018 30 minutes algo array problems
c home codility 😦 Mako 4 2018 monkey. Ineffi [1] algo too hard
c home codility 😦 Mako 3 2018 quasiconstant [1] algo array practice
c onsite paper p SIG 2 2018 15-30min … RAII smart ptr std lang
c onsite paper p SIG 3 2018 push_back std lang
py onsite IDE p SIG 3 2018 movie query dStruct algo practice
c weekend hackerrank p Promethean 5 2018 2 tough graph problems algo too hard but some QQ can help
c/py home hackerrank pp Kraken 3 2018 3 (small?) problems algo algo
any onsite paper 😦 Nsdq 2018 oracle day-trading algo array problems
any onsite paper 😦 Nsdq 2018 check array 0-N in-situ algo array@int problems
java home IDE pp Nsda 4 2018 drone: medium-level implementation SDI hard to prepare
c onsite paper pp CVA 3 2018 O(N) in-situ filtering algo array problems
c onsite paper 😦 FB 2018 2 short problems std algo algo
c onsite 😦 Trex 2018 exchange messaging SDI no practice can help
py onsite IDE pp Trex 1 2018 3 short tricky problems. only 1 implemented algo algo practice can help
c onsite paper pp Quantum 1 2018 virt func not declared in base trivial lang
c/thr webex paper pp bbg 4 2018 single-thr pool QQ lang
c webex paper pp bbg 3 2018 shared ptr ctor/dtor QQ lang
py onsite paper pp bbg 5 2018 continuousSentence algo backtracking, string problems
c/thr onsite paper ? Wells 2 2017 concurrent queue std
c onsite paper p Wells 1 2017 remove_spaces. Ineffi trivial string problems
c onsite paper p bbg 2 teams 2 2017 array reshuffle lang
c onsite IDE p bbg 2 teams 1 2017 isBST() std tree problems
c onsite paper P! bbg 2 teams 1 2017 string string problems
c onsite paper 😦 bbg 2 teams 2017 filters on screen dStruct heavy dStruct
C home hackerrank 😦 Thesys 2017 3 problems out of 5 algo algo
C home IDE pp Thesys 5 2017 first N prime Fibonacci hard to prepare
c webex IDE pp bbg 2 2017 top N active stocks dStruct heavy dStruct
c onsite paper pp BAML 1 2017 various lang
c weekend IDE 😦 GS 3 2017 TickEngine dStruct heavy dStruct problems can help
C webex IDE pp bbg 2 2017 free slots dStruct heavy algo
C webex paper p bbg 1st round 2 2017 tree serialization std algo tree algo
Java onsite paper pp BGC 2 2017 multiple short programs lang
Java weekend IDE pp BGC 3 2017 connected or not hard to prepare
Java/thr weekend IDE p HSBC 4 2017 3 big questions barely passed hard to prepare
Java/thr home IDE pp pimco 4 2017 iterator again lang
Java onsite paper pp pimco-Zoltan 2 2017 a few short problems lang?
c webex IDE 😦 Citadel 2017 array shrinking array problems
py webex paper ? Broadway 2017 hashtable. Code (on github) works std algo implement std containers
cpp weekend IDE ? iRage 3 2015 order book again. No response dStruct heavy hard to prepare
py home codility p baml 2016 Ashish: applying to Qz algo
Java home codility 😦 baml 2015 Qz FX option team. too lazy algo
c home codility 😦 Jump 3rd 2015 algo
c weekend IDE pp jump 1st 3 2012 order book dStruct heavy hard to prepare
c weekend IDE 😦 DRW 3 2015 Tetris. code quality hard to prepare
c webex paper 😦 bbg -London 2015 unable to understand Q1 of 4 std algo algo
c# webex paper pp eikon 2 2013 short, easy questions QQ lang won’t practice
java home IDE 😦 MS 4 2011 FIX. Big take-home test hard to prepare
swing webex IDE pp Barx 2 2012 swing QQ lang won’t practice
C home IDE 😦 Mac 2012 2 short problems lang
java home IDE 😦 MS-comm 2012 too lazy hard to prepare
c onsite paper pp Cantor 1 2011 auto-ptr QQ lang won’t practice
java onsite paper 😦 Barc SOR 2011 recursion recursion
java onsite IDE pp RBC 1 2011 30-45 min, bond tick conversion lang
java onsite IDE 😦 UBS 2011 Suntec lang
java/thr onsite whiteboard 😦 UBS 2011 Jack Yang lockfree stack QQ
java/thr onsite whiteboard 😦 Barc 2011 wait/notify + lockfree QQ
java/thr onsite IDE 😦 Lab49 2010 executor lang
java/thr home IDE pp Gelber 3 2011 multithreaded hard to prepare
C home IDE 😦 Amazon 2011 didn’t take it seriously algo
C onsite whiteboard 😦 FB 2012 regex QQ std algo too hard, but some QQ can make a diff
any onsite whiteboard 😦 Goog 2007 algo

clone a std::vector by memcpy@@

In general, we can’t use memcpy to clone the internal array of a vector.

If we have a vector of simple value types like an int or float, memcpy is probably simpler.

If there’s any pointer member data, then we are in trouble. For example, if there’s any std::string in the internal array, memcpy will bypass the string copy operations. The internal pointer in the string will be copied incorrectly. When the original array content gets deleted, our cloned strings will be messed up.

There are many other element types that involve pointer:

  • the element may have a smart ptr data member. All smart ptr classes provide special copy control, not to be bypassed.
  • the element may be another container. Every container I know uses pointers. They all have special copy control.
  • the element may have some shared resource like a mutex or a connection. These resources are typically allocated on heap. When the original vector gets destroyed, the resource will need release or clean-up

Q: For vector of int, is memcpy faster than copying element by element?
A: Probably not. Both O(N). A for-loop to copy an array of integers may compile to the same binary.

growing popularity@scripting languages #threat2c++

See also sg19 %% attitude on java, relative2c++ and ..

By 2019, I have noticed growing popularity of high-level languages like GO, node.js and python … all dynamic, scripting languages, in contrast to compiled languages. However, java is still fairly relevant. I won’t say too much about the possible reasons — ease of learning, ease of finding and training new hires…

i feel this trend is a continuation + evolution of a longer trend — from c/c++ to java/c#.

One direct impact of this trend — demand for c++ skill is shrinking[3], again. The remaining c++ jobs tend to pay lower than these “easier” languages or require deep expertise (like HFT shops) .. very bad deals for the c++ job seekers.

[3] It’s important to remember that demand for c# and perl developers is also shrinking.

come-n-go /Flash-in-the-pan — In my younger days (early 2000’s), there were also a growing choice of scripting languages but only one became general purpose — Perl. I feel GO, node.js may lose popularity over the coming years. These languages are popular partly because they are new. I described elsewhere that among the scripting languages, only 1 or 2 will dominate. Perl was dethroned by python.

See y re-enter c++if jobs fewer — I still believe c++ is harder and “builds character“, unless you spend 90% of your c++ career using wrappers that other people wrote and don’t bother to look under the hood.

proof-of-work ] blockchain #key questions

Based on my discussions with some experts in the field..

Say Cody just paid Bob some amount in bitcoins. At this time, there could be a large number of pending (unaccepted) transactions like this one waiting to be accepted into the linked list (i.e. the block chain).

One or more miners without coordination, can pick up this transaction + up to 999 other transactions and attempt to add them to the block chain.

They have to overcome a challenge, a computationally intensive challenge, by brute force — They need to find a number (called nonce) such that the integer hash code from hashing function

H(localhost timestamp, //not always in sync with other hosts
[T1, T2, … T1000], // 1 or more pending transactions
the minder’s Id, //typically the IP
nonce,
previous block’s hash // <– forming a linked list or block-chain
) < barrier

The barrier is a binary number with 3 (or more) leading zeros. In other words, the challenge is mining for an nonce to give a low-enough hash code with enough leading zero bits. Once a miner finds a “solution” and creates a valid block, she broadcasts the block’s hash code (with enough leading zeros) to the network. If one node verifies it, then other nodes would soon verify it.  There’s no central authority to decide when to accept. A node accepts it and uses it as “previous block hash” in a new block, as described in TOWP.

The head block of a linked list is special — no previous block! (A Git repo has only one such “block”, but not in blockchain.) 2nd block includes the first block’s hash. 3rd block includes 2nd block’s hash, and indirectly includes first block’s hash. Therefore, for any given block KK, its entire ancestry lineage is hashed into KK.

  • TOWP — TheOriginalWhitePaper
  • POW — proofOfWork
  • lucky — Some miner could be lucky and find a “solution” in the first try, but the challenge is so tough that there’s only brute force.
  • LevelOfDifficulty — is the number of leading zeros, like 3. When hardware speeds improve, someone (perhaps the merchant) will increase the LevelOfDifficulty, as explained in TOWP
  • immutable — Just like Git, every block is immutable once accepted into the chain.
  • The most common hash function is SHA and SCRYPT.

Q: how fast is the verification vs POW
A: POW is supposed to take 10 min on average. Verification should take nanosec, according to the experts I spoke to. Beside the “attached” transactions, I don’t know what inputs there are to the verification, but it computes a hashcode and compares it to something.

Q: Fundamentally, how would verification fail when an owner double-spends a coin?

Q: what if two miners both pick Cody/Bob’s transaction and broadcasts their “solution” hash code? The first miner would win the race (and get the reward after some delay)

Q: How are two independent linked lists handled in one host?

Q: what’s the need for POW?

Q: what’s the motivation for the a miner to take up a transaction?
A: there’s a 25-coin reward, or possibly higher if the Cody or Bob increases the reward

Q: does each coin have an id?
AA: no, but there is ID for each transaction, each block and each account.

Q5: what if verification fails?
%%A: I feel it’s rare but possible. If happens, the node would ignore it.

Q5b: how would the miner know? Not sure.

 

movie query #SIG

Like the other coding question from the same employer, this is mostly about nested data structure.

Q1: Given sample csv file below (unsorted), implement an efficient query function get_movies(genre, Start_year, End_year). Optimize for query, not initial loading. Bonus points if you make it work within 75 minutes.

  #movieId,title,date,genres
  1,Toy Story 3,2010,Adventure|Animation|Children|Comedy|Fantasy
  2,Toy Story,1995,Adventure|Animation|Children|Comedy|Fantasy
  ...... 20,000 more lines

https://github.com/tiger40490/repo1/blob/py1/py/movieQuery_SIG.py is my working code, good enough for a small data set like movies, but inefficient for bigger data set like newspaper articles.

Q2: what if the number of genres could be thousands, like a tag cloud.

— Q1 analysis
My initial thought was on binary search, with O(logN) + O(E-S) which is often equivalent to O(N). Locate first and last item in a sorted list, then iterate over enclosed items.

After some brief discussion with interviewer, I came up with a design of O(1) + O(E-S), arguably the fastest, featuring a jagged 2D array:

  • Assumption: genres are shared and shows up repeatedly in the file, rather than random strings that may show up only once.
  • use contiguous array to hold all movies, array indexed by year.
  • Within each “year bucket”, store multiple movies in a contiguous array indexed by genreId.
    • Note there is no gap in the genreId /numbers/. I initially said hardcoded genre enum but interview (Ken) challenged me with frequently changing genres. I now feel we can generate genreId as we load the csv file. This is comparable to some perfect-hashing described in Q2 analysis below.
  • each “cell” holds a list-of-shared_ptr-to-movie
  • each movie object is on heap, and owned by shared_ptr’s

— Q2 analysis:
If tags are user-generated, random and uncoordinated, then the total population of tags could be unlimited. Our genreId would go up to millions. Solution? Perhaps use a separate hash table within each “year bucket” instead.

However, since the query is “by tag” it implies that the most important tags are shared.

At this point, I feel design should depend on the real, not imaginary, context, so it’s kind of pointless to speculate what type of tags we will get.

OMS skill – !! standardized !! portable

I now feel the glorified OMS know-how is vague and poorly-standardized.

  • My B2bTradingEngine at 95G does a lot of OMS.
  • Smart order router module often does some OMS.
  • FIX connectivity module often does some OMS.
  • VWAP and other execution algos are usually considered a category of OMS.
  • … Many of the above are very different skills and have almost nothing in common.

I am afraid that even after 3Y or 5Y in some OMS system, when I apply to another OMS job they may realize I’m not a veteran.

Q: In comparison, which domain is more standardized in terms of skillset?
A: raw mkt data. Note 2nd-hand market data processing is much less valuable
A: basic bond math. Note more advanced bond math is rarely needed
A: FIX connectivity

  • How about VaR?
  • quote distribution systems in Citi and OC? poorly standardized

specialty jobs outpay commodity finIT jobs@@ #Mithun

In 2007 I worked on the most common type of financial IT job — 1) web java with 2) a big database and stored-procedures  3) small amount of javascript (possibly with ajax). 4) In my case, there was also some scheduled batch job.
It was in Private-Wealth-Management but no financial domain knowledge required.
(I have since moved on to “fancier” systems, not only pricing or market data.) I started to think the most common dev job like the PWM job doesn’t pay as well, but what’s the reality that you see throughout your career?
Q: does the most common type of financial IT job pay lower because it only needs commodity skills?
[Mithun] Not always. It depends on how critical the demands are and how much $ impact it has.
Trading as such they have money to spend, sometimes demanding & definitely impacts $ 
there is a short-term trend —  depends on demand & supply. 
 
I have lately seen lot of money thrown at angular even thought it might not be challenging as concurrency, but then u need many angular developers for a few server side developers. But then this skill fast retires.
Perhaps it pays just as well, or 5% lower than the specialty dev jobs in trading systems?
Commodity java dev will get around 145k while concurrency guy will get 175k…. 
Things have changed in that “most common type of financial IT job”, which I can only define in vague terms.
* once there was lots of spring/hibernate usage
* once there were lots of xml-related server-side libraries
* I guess there are more RESTful interface?
* more javascript code
* fewer stored procedures
* more noSQL systems, less SQL
* I guess there’s now more Hadoop??
yes Hadoop , spark, but I see a lot of demand for python, and they are coming with rich libraries.

buggy RAII-powered smart_ptr #SIG

Q1: given a well-behaved class X, someone wrote a RAII wrapper below. What problems do you see?

Q1b: how would you fix them?

%%A1: memory leak of x1, due to the synthesized op=()
%%A1: double-delete on x2

Q1c: OK Good. Now what if X has subclass Y,  all with virtual dtor?
%%A1c: X and Y each create virtual clone() to return a pointer to the host type

Q2: what’s the outcome of double-delete?
AA: undefined behavior. You are lucky if it crashes.

struct A{
  A(X* x): myX_(x){}
  ~A(){delete myX_;}
private:
  X* myX_;
};
void demo(){
  A a1(new X()); //unnamed x1 object on heap
  A a2(new X()); //unnamed x2 object on heap
  a1 = a2;
}
/////////// above is original code; below is my Q1b answer -- Add op=()
A & operator=(A const & rhs){
  if (this != &rhs){
    auto tmp = this->myX_; //delete only after successful "new"
    this->myX_ = new X(*rhs.myX_); //clone the X object of rhs
    // if new or ctor throws, the memory is not really allocated 🙂
    delete tmp; //let's assume no exception here.
  }
  return *this;
}

design IV: telltale signs of disapproval

(Note a subtype of design interview is the SDI.) When interviewer asks, as described in https://www.susanjfowler.com/blog/2016/10/7/the-architecture-interview:

  • why you choose this approach (this design, this direction…)
  • what are the pros and cons of this approach
  • what alternatives there might be

It’s 80% a sign of disapproval.

If they are only curious, they would phrase it differently.

Remember interviewer has a rather fixed idea how this design should go. If you propose something unfamiliar to her, she can’t “lead” the interview with confidence. She risks losing control. Therefore, she has no choice but steer you back to her familiar territory.

Most of them won’t want to admit that your idea is plausible but different from her idea.

per-thread^per-DBConn writeBuffer

  1. Broadly speaking, if you write to a table in your DB connection but don’t commit, it will be invisible to other connections.
  2. Similarly, if a thread writes to shared object without memory fence, the updates is only in the per-thread buffer and invisible to other threads.
  3. File write is an even more common scenario. One process writes a single line to the file, but doesn’t flush. It will not be visible to other processes.
  4. CVS

frequency@(hardware)timer interrupts #KHz

In the 2003 edition of [[LinuxKernel]] P195, 1200 interrupts/second was the highest frequency, above 1KHz.

The CPU must process these timer interrupts. If it were easy to “nullify” these interrupts and eliminate the CPU overhead, then we would have millions of time interrupts per second but I doubt it. The overhead is probably unavoidable.

Scheduler is a kernel component. I think scheduler divides cpu time into slots. I guess the smallest quantum is limited by this frequency, among other factors.

P350 of The same book says that scheduler runs upon timer interrupts + keyboard interrupts.

Note many people get confused by software timer vs hardware timer. Hardware timer is my focus.

Many people get confused by software interrupt vs hardware interrupts. I have a separate blogpost (interrupts=hardware interrupts #by default), but basically most interrupts are hardware interrupts. Software interrupts are simulated.

 

##simplicity@design pushed2the limit #

Here I collection simple concepts proven rather versatile, resilient, adaptable. Note in these designs, the complexity can never disappear or reduce. Complexity shifts to somewhere else more manageable.

  • [c] stateless — http
  • [!s] microservices — complexity moves out of a big service into the architecture
  • [c] pure functions — without side effects
  • use the database concept in solving algo problems such as the skyline #Gelber
  • stateless static functions in java — my favorite
  • [c !s] garbage collection — as a concept. Complexity shifts from application into the GC codebase
  • REST
  • in c# and c++, all nested classes are static, unlike in java
  • [!s] python for-loop iteration over a dir, a file, a string … See my blog post
  • [c] immutable — objects in concurrent systems
  • [c] STM i.e. single-threaded mode, without shared mutable #Nsdq
  • [c] pipe — the pipe concept in unix is a classic
  • JSON
  • [c] hash table as basis of OO — py, javascript, perl..
  • [c] sproc (+trigger) — as a simple concept “data storage as guardian of its data, and a facade hiding the internal complexities”
  • [!s] dependency injection
  • [c !s] EDT — swing EDT and WPF
  • [c] RAII
  • [!s] smart pointers as a concept
  • singleton implemented as a static local object, #Scott Meyers
  • DougLea’s singleton — all-static class
  • [c=celebrated, classic, time-honored, ..]
  • [!s = not so simple in implementation]

 

##RDBMS=architect’s favorite

A specific advantage .. stored proc can _greatly_ simplify business logic as the business logic lives with the data …

Even without stored proc, a big join can replace tons of application code implementing non-trivial business logic. Hash table lookup can be implemented in SQL (join or sub-query) with better clarity and instrumentation because

* Much fewer implicit complexities in initialization, concurrency, input validation, null pointers, state validity, invariants, immutabilities, …
* OO and concurrency design patterns are often employed to manage these complexities, but SQL can sidestep these complexities.

Modularity is another advantage. The query logic can be complex and maintained as an independent module (Similarly, on the presentation layer, javascript offers modularity too). Whatever modules dependent on the query logic has an dependency interface that’s well-defined and easy to test, easy to investigate.

In fact testability might be the most high-profile feature of RDBMS.

I think one inflexibility is adding new column. There are probably some workarounds but noSQL is still more flexible.

Another drawback is concurrency though there are various solutions.

stateless (micro)services #%%1st take

in 2018, I have heard more and more sites that push the limits of stateless designs. I think this “stateless” trend is innovative and bold. Like any architecture, these architectures have inherent “problems” and limitations, so you need to keep a lookout and deal with them and adjust your solution.

Stateless means simplicity, sometimes “extreme simplicity” (Trexquant)

stateless means easy to stop, restart, backup or recover

Stateless means lightweight. Easy to “provision”, easy to relocate.

Stateless means easy scale-out? Elastic…

Stateless means easy cluster. Http is an example. If a cluster of identical instances are stateless then no “conversation” needs to be maintained.

TCP sliding-window: congestion^receiver

Based on P 248/265 [[computerNetworking]] , 2011 paper@high-perf messaging, and my blogpost on AWS^SWS

Sliding window is a protocol or a design framework. AdvertizedWindowSize is an integer field in the Ack message. There are also two variables “ReceiveWindow/CongestionWindow” in the TCP sender codebase (part of kernel).

Sliding window kills two birds with one stone — 1) congestion control and 2) overflow protection on the receiving end. Without these controls, sender can pump packets too fast causing

  • receiver socket buffer overflows
  • congestion on the sender-to-receiver path so some of the router buffers overflow

Basic idea is limit sending rate such that

  • unacknowledged bytes := lastByteSent – lastByteAcked < ReceiveWindow
  • unacknowledged bytes := lastByteSent – lastByteAcked < CongestionWindow

UDP doesn’t limit sending rate and is favored by many aggressive applications such as market data dissemination.

[18]fastest threadsafe queue,minimal synchronization #CSY

I got this question in a 2017 Wells white-board coding interview, and discussed with my friend Shanyou. We hoped to avoid locks and also avoid other synchronization devices such as atomic variables..

Q1: only a single producer thread and a single consumer thread and no other threads.

I put together a java implementation that can enqueue without synchronization, most of the time … See https://wp.me/p74oew-7mE

Q1b: Is it possible to avoid synchronization completely, i.e. single-threaded mode?
A: No. Consumer thread would have absolutely NO idea whatsoever how close it is to the producer end. No. We asneed a memory barrier at the very least.

Q2: what if there are multiple producer/consumer threads?

I believe we can use 2 separate locks for the two ends, rather than a global lock. This is more efficient but invites the tricky question “how to detect when the two ends meet“. I am not sure. I just hope the locks enforce a memory barrier.

Alternatively, we could use CAS on both ends, but see lockfree queue #popular IV

 

RTS: Don’t rely@QA2test your code@@

insightful article: managing tight deadlines is the background

My ICE manager said "Don’t rely on QA team to test your code. Do your thorough testing before passing to QA."

This is Impractical given the thousands of detailed items in a 50-page spec. Thorough developer testing would take more time and I didn’t do it. A colleague was able to give QA team a well-tested rpm and received only 30 bug reports (mine received 60 to 100), but did it win him any award? No.

  • I could explain my requirement is more complex. Manager may or may not accept.
  • Sometimes we could explain one of us is new
  • Which system has more crashes in QA/production? Each crash can take a long time to figure out. Thorough developer testing doesn’t help much here.
  • In the end, which system goes live faster? That’s the visible yardstick. 2nd visible yardstick is how quickly we start QA, so by cutting corners on developer testing, I was able to start QA earlier. A more thorough developer test won’t help with any of these visibles.

thread^process: lesser-known differences4IV

Popular IV question. Largely a QQ question.  Some may consider it zbs.

To the kernel, there are many similarities between the “thread” construct vs the “process” construct. In fact, a (non-kernel) thread is often referenced as a LightWeightProcess in many kernels such as Solaris and Linux.

  • context switching — is faster between threads than between processes because no TLB flush needed.
    • In linux, context switching between kernel-threads is even faster.
  • creation — some thread libraries can create threads without the kernel knowing. No such thing for a process.
  • socket — 2 threads in a process can access the same socket; two processes usually can’t access the same socket, unless …
    • .. unless parent-child. See post on fork()
  • memory — thread AA can access all heap objects of Thread BB, and even BB’s stack objects via pointers. Two processes can’t share these, except via shared memory.
  • a non-kernel thread can never exist without an owner process. In contrast, every process always has a parent process which could be long gone.

 

horror story: java app adopting protobuf #json

https://softwareengineering.stackexchange.com/questions/153903/move-from-json-to-protobuf-is-it-worth-it says

Somebody added protobuf (because it was “faster”). The only thing faster is RTT because of of smaller payload, and that can be fixed with gzipped JSON.

The part that is distasteful to me is the relative work to maintain protobuf (compared to JSON). I use java so we use Jackson object mapping for JSON. Adding to a response means adding a field to a POJO.

In contrast, for protobuf I have to modify the .proto file, then update the serialization and deserialization logic which moves the data in/out of the protocol buffers and into POJOs. It has happened more than once that a release has happened where someone added a field and forgot to put in either the serialization or deserialization code for the protocol buffers.

[19] deep-learning seminar takeaways

Historical observation by Ameet of DeterminedAI

  • Earliest application domain — image classification (part of Computer Vision), using GPU. beat human performance in 2015
  • Initially Apache Sparks and MLib didn’t support deep learning
  • DL only became a college comp science topic since 2010’s. AI is now one of the hottest majors.
  • ML was mostly an “academic” discipline… need to become an “engineering” discipline. 2 barriers
    • complex workflow
    • fragmented, not holistic infrastructure

DeepLearning training phase can take months

Hyperparameters — configurations to define a specific DeepLearning model

Top 3 classic DeepLearning domains — computer vision, (audible) speech recognition,  and textual NLP.

— NLP: mostly text parsing. Virtually every successful NLP application relies on text parsing.

— “social science applications” have been less spectacular

  • CRM
  • Recommendation system
  • Mobile advertising
  • Financial fraud detection
  • NLP – highly successful application of deep learning even though data is produced by humans !

SDI: URL shortening

Q: Design TinyURL or bitly (a URL shortening service)

Given a (typically) long URL, how would how would you design service that would generate a shorter and unique alias for it.

Discuss things like:

  • How to generate a unique ID for each URL?
  • How would you generate unique IDs at scale (thousands of URL shortening requests coming every second)?
  • How would your service handle redirects?
  • How would you support custom short URLs?
  • How to delete expired URLs etc?
  • How to track click stats?

https://www.interviewbit.com/problems/design-url-shortener/ is a long discussion.

high-volume DoS-guard #Nsdq SDI

Q: Design an API Rate Limiter (e.g. for Firebase or Github)

You are expected to develop a Rate Limiter services that can:

  • Limit the number of requests an entity can send to an API within a time window e.g., 15 requests per second.
  • The rate limiting should work for a distributed setup, as the APIs are accessible through a cluster of servers.

(A similar question was asked at Nsdq… )

Q2: how do your cluster of cache servers detect a given IP on the Internet is sending requests too frequently, causing Denial of Service? How do you protect yourself?

Q2b: After you blacklist a client IP, it goes quiet, then it sends a single request again. How you decide whether to ignore the request?

Q2c: what algorithm to decide if a client IP has legitimate need to send lots of requests vs another client IP engaging in Denial of Service attack?

Q2d: what if distributed DoS attack?

https://en.wikipedia.org/wiki/Denial-of-service_attack#Defense_techniques has practical solutions.

3ways to call ctor #placement new

  1. to construct an Acct at a pre-existing address, call placement-new, which implicitly uses Acct ctor
  2. to construct an Acct in stack or static memory (usually data segment), just call ctor explicitly
  3. to construct an Acct in heap, call new, which implicit uses ctor.

By the way, you can also allocate memory without calling ctor. The malloc() and q[ operator new ] can do that, as explained in [[moreEffC++]]

[18] 2nd in-depth job]c++ : critical-mass ] GTD

Pattern:

  • After Citi-muni, even though I had enough experience to pass similar job interviews, I didn’t feel confident in GTD, so I took a 2nd real time trading system job in Baml, and reached critical mass
    • I did learn more in the ensuing 3 months than I would have over another 3 months in Citi
  • RTS is similar. I could already pass real time c++ interviews, but I didn’t feel confident in GTD.

Q: How about c#? I actually feel confident about GTD in a future c# team, so I didn’t ‘need a 2nd c# job?

hardware-Interrupt^signal #unfinished

Let’s compare in terms of delays, socket, timer devices, software timers…

I feel interrupts^signals are somewhat like cells^molecules — at different levels!

— socket
signals are important tools for socket programmers at application and library level.  Kernel support for sockets probably use signals too.
Interrupt (handling) is in kernel (codebase) and are probably too low-level so application developers are a few abstraction levels higher.
SIGIO is mostly used for socket and other I/O devices.
— timers
timer interrupts actually refer to hardware-timers, a few abstraction levels below application developers.
software timers are a very different animal. Software timers integrate with SIGALRM.

most(new/old)specializations turn out non-strategic

Look at my past vindicative specializations vs 

The Singapore government make technology bets. Venture capitalist make bets. Many organizations also make technology bets.. Every technology bet can lose or win.

In this post, I’m not advocating trySomethingNew. I am advocating specialization, which often requires accumulation and sticking to something not so new, like FIX, threading, SQL, sockets, bond math, …

 

engaging[def1] #marketableDomainXp.xls

Let’s keep this in blog , not spreadsheets.

“Engaged/engaging” means level of mental engagement, including traction, level of interest and positive feedback loop. This depends on many things such as trySomethingNewStrategic. In retrospect engagement may not mean much, but it does affect my satisfaction then and there, which is important according to Theodore Rubin.

Engagement is a major determinant of medium-term job satisfaction. Because of it I walked away from CVA, Barx and other higher, easier java jobs.

This factor is notoriously n inherently elusive, unstable (strategic). Engagement is impermanent, then-and-there, like joy and sadness. I once found wafer fab spacesuit fashionable! Big data, wpf, quant, swing, FIX, kdb, … were once attractive to me. What’s engaging used to be stategic trySomethingNew, but look at other blog posts… It’s hard to identify an endeavor as long-term engaging.

When I was into option math, I didn’t want any low latency or connectivity projects.

I think soon I could lose interest in mkt data or latency.

However, even in a 5Y java/c#/python job, I believe i can find engaging challenges.

Q: On marketable-tech-xp.xlsx, which factors on row 2 help keep my mind engaged?
– Some form of complexity is always helpful. There are some on Row 2.
– Poor “market value” always decimates my “engagement”. There are some protective factors on Row 2.

The job_satisfaction_predictor spreadsheet compares past jobs in terms of engagement.

 

 

[12] VaR: stress^historical based

I think my OC colleague Rajesh hinted once — there are two common methodologies to compute "minimum loss in the worst 5% scenarios"

1. stress test
2. historical

In Historical, you record the actual (unrealized + realized) losses for the past x (like 252) trading days and plot a histogram, and directly pin point "minimum loss in the worst 5% of those trading days".

I think the way you record the losses (or gains) is like "for IBM, daily price relative is 95%, +10%, 102%, 92% …."

In stress test, I think you simulate a drop in vol and/or a drop in underlier etc

blockingMutex implementation ] kernel!!userland

Background — Linux kernel provides two types of locks — spinlock and blocking mutex, as in https://www.kernel.org/doc/htmldocs/kernel-locking/locks.html . Here I focus on the mutex. I think this is far more useful to userland applications.

https://lwn.net/Articles/575460/ has good pointers:

  • I believe context switch is expensive, since CPU cache has to be replaced. Therefore, optimistic spin is beneficial.

https://github.com/torvalds/linux/blob/master/kernel/locking/mutex.c shows

  • a blocking mutex is used in kernel, perhaps not directly used by userland apps
  • implemented using spin lock + some wait_lock
  • maintains a wait_list. Not visible to any userland app.

 

y FIX needs seqNo over TCP seqNo

My friend Alan Shi said … Suppose your FIX process crashed or lost power, reloaded (from disk) the last sequence received and reconnected (resetting tcp seq#). It would then receive a live seq # higher than expected. This could mean some executions reports were missed. If exchange notices a sequence gap, then it could mean some order cancellation was missed.  Both scenarios are serious and requires request for resend. CME documentation states:

… a given system, upon detecting a higher than expected message sequence number from its counterparty, requests a range of ordered messages resent from the counterparty.

https://bintanvictor.wordpress.com/wp-admin/post.php?post=38225&action=edit&classic-editor has more details.

Major difference from TCP sequence number — FIX specifies no Ack though some exchange do. See Ack in FIX^TCP

Q: how could FIX miss messages given TCP reliability? This is a good question, but so far never quizzed in any interview.
A: FIX can use non-TCP channels
A: I guess tcp session disconnect is one reason.

https://kb.b2bits.com/display/B2BITS/Sequence+number+handling has details:

  • two streams of seq numbers, each controlled by exch vs trader
  • how to react to unexpected high/low values received. Note “my” outgoing seq is controlled by me hence never “unexpected”
  • Sequence number reset policy — After a logout, sequence numbers is supposed to reset to 1, but if connection is terminated ‘non-gracefully’ sequence numbers will continue when the session is restored. In fact a lot of service providers (eg: Trax) never reset sequence numbers during the day. There are also some, who reset sequence numbers once per week, regardless of logout.

 

c++^java..how relevant ] 20Y@@

See [17] j^c++^c# churn/stability…

C++ has survived more than one wave of technology churn. It has lost market share time and time again, but hasn’t /bowed out/. I feel SQL, Unix and shell-scripting are similar survivors.

C++ is by far the most difficult languages to use and learn. (You can learn it in 6 months but likely very superficial.) Yet many companies still pick it instead of java, python, ruby — sign of strength.

C is low-level. C++ usage can be equally low-level, but c++ is more complicated than C.

string pool IV notes

String str = new String("Cat"); //[1] inefficiency

Based on https://www.quora.com/String-s-new-string-ABC-How-many-objects-are-created-after-the-above-statement .. Net effect of above statement — either 1 or 2 instantiations

> At class-loading time, JVM picks up all string literals in the class file and populates the string pool. IIF this “Cat” is the first occurrence then it would be instantiated in the string pool.

Net effect so far is 0 or 1 instantiation.

> At runtime, the new() forces JVM to instantiate a brand new object, in the regular heap, unrelated to the string pool.

Net effect now is 1 or 2 instantiations.

[1] Note this is (almost) never needed according to P20 [[effJava]]

given arbitrary value X,get nearest2nodes in std::map

Basically, find the left/right neighbor nodes.

! Don’t use upper_bound since lower_bound is enough.

  • If perfect match, then lower_bound return value is all you need. No need for 2 nodes:)
  • If no perfect match, then lower_bound() and prev(lower_bound)
  • if X too low, then begin() alone is all we can get
  • if X too high then prev(end()) alone is all we can get

See https://github.com/tiger40490/repo1/blob/cpp1/cpp1/miscIVQ/curveInterpolation_CVA.cpp

NavigableMap.java interface offers four methods .. see closestMatch in sorted-collection: j^python^c++

std::lower_bound() means touchUpperBound # 4 scenarios

  • std::upper_bound() should be named strictUpperBound()
  • std::lower_bound() should be named touchUpperBound() since it can return an element touching the target value

If no perfect hit, then both returns the same node — lowest node above the target

  1. if target is too high, both return end(), which is a fake element
  2. if target is too low, both return begin(), which is a real element
  3. if target is matching one of the nodes, then perfect hit
  4. if target is between 2 nodes, then … this is the most common.

https://github.com/tiger40490/repo1/blob/cpp1/cpp/66miscIVQ/curveInterpolation_CVA.cpp caters to all four scenarios.

isPreorderBST(randomArray) #G4G

This was asked in a 2018 hacker rank interview, not as important as an on-site coding question. However, I see this question as a classic.

— G4G solution based on sorted stack

https://www.geeksforgeeks.org/check-if-a-given-array-can-represent-preorder-traversal-of-binary-search-tree/ has a tested solution but it’s too cryptic. I added instrumentation to help myself understand it. See my github code.

Insight — at any time, the stack is sorted with the top being the smallest value . If the new item is highest item so far, then stack is wiped clean !

The algo is fairly simple —

  • initialize stack with arr[0] i.e. root
  • for every new item xx, start looping
    • if xx falls below stack top or stack is empty, then push xx, otherwise pop and loop
  • so when would we fail. It relies on the “root” variable which holds the top of the stack if non-empty, but frozen (the real tree root) once stack wiped clean !

— My solution 2

My analysis — After we have seen some number of nodes, there’s exactly one possible tree we could construct, so let’s construct it.

Note during the construction, each new node has only one place to go (key insight) and after that we can check if it breaks pre-order.

https://github.com/tiger40490/repo1/blob/py1/py/algo_tree/isPreOrderBST.py is my tested code, probably less elegant than the G4G solution, but I’m still proud of it.

I don’t think any solution (including the G4G) is really O(N). My solution is not inferior. It has a readability advantage. It’s longer but not necessarily slower.

 

## execution algo + other domains / skills

OMS — is probably a core module in execution system. Equity and FX systems tend to need OMS but many trading desks presumably need no OMS, very simple OMS or off-the-shelf OMS. A buy-side trading desk may simply use the OMS of the broker. The same could be said of “execution systems” like VWAP algos.

Therefore, I feel the importance of OMS/EMS is over-stated.

SOR — is more niche, less “generic” a skill, but as a module is more visible to the business.

FIX connectivity — is a more generic tech skill and shows resilience to churn.

mkt data — is more “separate” from the other skills.

java9 embrac`cloud

I feel the most strategic changes in java9 are about cloud, such as

  • MSA,
  • Docker container
  • memory footprint reduction

https://qconsf.com/sf2016/sf2016/presentation/java-se9-cloud.html is a 2016 article

When cloud computing becomes an even bigger driver than it is today, some legacy technologies will gain mind-share while others will lose mind-share. Java 9 team wants to be a winner.

## JVM interceptions #mem,exception..%%hypothesis

Java provides a virtual machine to resemble a physical machine. To the upper-level[1] applications running therein[2], JVM provides similar interfaces as a physical machine does. Through these interfaces, JVM intercepts runtime requests made by the hosted application, and provides powerful value-added services to the hosted application. Below are a subset of the interfaces I’m interested in.

Note when I say “intercept”, I mean the request is ultimately serviced by the host kernel of the physical machine. However, in many cases the JVM completes the requests in itself without calling into the host kernel.

  • memory allocation. Note de-allocation is not a legitimate request from a java app. The JVM is likely to handle allocations without calling into kernel. When allocation fails, the outcome is much better than in an unmanaged runtime.
  • memory access beyond an array or linked data structure. JVM probably forwards the request on, but the physical address is managed by the GC.
  • thread — management (including creation). Mostly native thread is used, so JVM forwards the requests to kernel, but I think JVM is very much in-the-loop and can provide instrumentation support
  • exception handling. In an un-managed environment, exception outcome can be inconsistent, unpredictable. Linux uses interrupts and signals (interrupts^signal ] kernel). JVM handles exceptions at a higher level, more effectively:)

[1] a layered view
[2] a “hosting” view. In this view, the JVM is a “container”. A docker container is cgroup that includes the JVM process

c++lib to build simple http endpoint #Mark

Note this is about c++ add-on packages (like boost), less popular in c++ than in jxee !

After overspending on boost (and hibernate, gemfire, struts), I won’t repeat the mistake.

My preference is not feature set or flexibility, but simplicity.  In my professional experience, I remember the java servlet and C# WCF implementations are robust, feature-rich and industry-strength. They are not simple. I like the simplicity in python and perl http endpoints.

  • The simpler, the faster.
  • The simpler, the fewer mistakes we tend to make.
  • The simpler, the more adaptable.
  • The simpler, the easier to integrate with other components
Now some c++ libraries to implement a simple http endpoint:
  1. https://cpp-netlib.org/0.9.1/hello_world_server.html  shows a simple http endpoint constructed with cpp-netlib
  2. https://code.google.com/archive/p/mongoose/ is a simple library to construct http endpoints
  3. https://www.boost.org/doc/libs/1_55_0/doc/html/boost_asio/example/cpp11/http/server/main.cpp is a demo using boost::asio, which is industry-strength, not simple

## My github profile

Some interviewers have requested my github profile. I create github content for learning and to share with my friends, not really for interviewers. I have decided not to spend too much time polishing my github content. The content would be rough round the edges. If there’s any doubt or bug report, I would be happy to discuss.

unbounded queue for 1-Producer-1-Consumer #Wells

———- Forwarded message ———
From: Bin TAN (Victor) <tiger40490@gmail.com>
Subject: demo of my design that first interviewer didn’t see

Hi Angelo,

During my interview with Jun, I spent the last 20 minutes locked in a rigorous debate — given an unbounded queue designed for single-threaded mode, can we provide a thread-safe facade so a producer thread and a consumer thread can operate on it safely, but without using locks in every operation.
Note In JDK and STL there are many collections designed for single-threaded mode, because such designs are simpler and significantly faster.
I argued valiantly that there exist designs where most of the time we can avoid both locking and CompareAndSwap. I failed to convince Jun. Jun believed firmly that unsynchronized access by two threads is always unsafe so we must use lock or CompareAndSwap every single time.
I just spent 20 minutes at home posting an implementation in https://github.com/tiger40490/repo1/tree/jProj/java/com/wells
I would also like to point out the java ConcurrentHashMap lets two threads (possibly a writer thread and a reader thread) access the shared data structure concurrently, usually without locking or CompareAndSwap , when the two threads happen to access two distinct segments. Note there can be a large number of segments, up to a few hundred at least, so the chance of two threads hitting distinct segments is very high (99.999% chance for a 333-segments map). Therefore, contrary to what Jun said, for reader/writer threads to access the same data structure,  we don’t need locking in every read and write access.
concurrencyLevel – the estimated number of concurrently updating threads. The implementation may use this value as a sizing hint.
I wrote this (like most of my concurrent designs) in Java since Java memory model is better documented and better understood.

check array@0-N +! auxDS #Nsdq#contrived

— Q1: Given a size-5 array of integers. For every element x: 0<= x <=4. Array would be considered Good if all elements are unique, i.e. all the numbers 0 to 4 show up exactly once each. Please write a program to check if the array is good. If Bad, then print every number that’s showing more than once. O(N) time and O(1) space please.

We will denote array size as sz, so N == sz-1.  N=4 and sz=5 in the opening example.

— comments:

I feel this question (even without the bonus) is too hard to complete on the spot, unless you have done it before.

I made a mistake that would be unforgivable in west coast and Bloomberg interviews — failed to clarify requirements and assumed input array is immutable.

— Q1b (bonus): Also print every missing number. (Requires iterating entire input.)

Example A {1,2,0,2,0} has repeating 0 and 2, and missing 3 and 4. Example B {3,1,4,2,0} is Good.

— comments:

If you feel the O() complexity requirements are daunting, then work out a less efficient solution first, and improve on it.

I feel in real projects, we always have enough memory to build a separate array, so the O(1) space requirement is contrived. A O(N) time and O(N) space solution is easier. Therefore, I rate this coding question as contrived and relatively non-standard. Yet the required techniques are “universal” in many high-end interviews.

https://github.com/tiger40490/repo1/blob/cpp1/cpp/algo_arr/checkArr0-N_Nsdq.cpp

has my 99% solution. The unfinished part is trivial.

 

 

max-profit #Nsdq short-sell

Q1: given a time series of price points within a past day, there are many ways to trade the day — one buy-sell, five buy-sells, or do-nothing … Short-sell allowed, but you must start and end the day holding no shares. For example, if you sell 9 times then you must buy 9 times, each time 1 lot (100 shares) exactly. Write a function to trade the day and produce the highest profit.  Note you can analyze the price points with perfect hindsight.

Interviewer asked for real code. Very little time given, so I believe the right solution is short, and much simpler than the question on “array of 0-N” (where interviewer asked for pure algorithm).

https://github.com/tiger40490/repo1/blob/cpp1/cpp/array/oracleDayTrader_Nsdq.cpp is my buggy”greedy” solution.

https://github.com/tiger40490/repo1/blob/py1/py/array/maxProfit_Nsdq.py is my new solution.

Q2: now you are allowed to buy UP-TO 100 shares each time. All other rules remain. When if ever is it optimal to buy/sell fewer than 100 shares (a.k.a. an odd lot)?
%%A: never

Q3: same as Q1 but now your holding must always be long 1 lot, short 1 lot or zero holding.

–(broken?) brute force solution I gave on the spot:

Start with just four price points 1,2,3,4. Name every pair A=1>2; B=1>3; C=1>4; D=2>3; E=2>4; F=3>4. Each pair represents a buy/sell round trip, with a profit (ignore it if unprofitable).

How many “plays” i.e. how many ways to trade the day? 2^6 plays.

Just evaluate each play and find the best. Beware that ABD is same as B x 2 and should be scaled down by 2. Similarly, AEBFC == C x 3

##[15] 20 low-level IV topics : c++imt Java

This is collected after i stopped active interviewing in Aug 2012.

Low-level domains are my strength, proven again in 2017-2018 interviews. Beside the obvious domains — threading,  data structures, c++/java/c#..

  • exception in c++/java
  • concurrent singleton
  • GC
  • big-O analysis in west coast interviews
  • linux/compiler/C++ interviews are even more low-level than java. Despite serious “weak-joints”[1], I generally excel at the lower-level
    • shared mem — pointer therein!
    • sockets .. like kernel bypass
    • unix signals
    • cpu cache – instruction and data
    • inline
    • rvr, the most important c++0x feature, is very low-level
    • big-4
    • pbclone^pbref^pbptr, slicing
    • undefined behavior – usually are low level
    • internals of deque, vector/hashtable reallocation…
    • smart pointer internals
    • reinterpet_cast vs move()
    • pass a functor to a thread
    • [1] See items of the same color
  • —-On the other hand, interviewers don’t bother to go low level on …
  • thread pool? somehow never low-level
  • Unix commands, python? never low level
  • (noSQL and) SQL? no
  • FIX? not so low-level
  • SOA, MOM
  • asynch designs
  • design patterns

CVA c++ IV 2 #oq

void func(ResourceMgr & rm){ 
  int connId = rm.getConn();
  double d=externalFunc(connId); 
  cout<<d<<endl;
  rm.reclaim(connId); //release the resource to the mgr
}
  • Q5: In the code above, external func can throw, so write a RAII wrapper to prevent resource leak.
  • Q5b: what if you aren’t allowed to use RAII? Ok you said catch-all. Is an empty catch block enough?
    AA: need to re-throw the original exception, to mimic the RAII behavior.
  • Q(paper coding): Iterate over two vectors in lock steps. Which is faster — iterator vs vector index?
  • Q (bond math): Is there any uncertainty in the present value calc on an pre-existing vanilla IRS?
  • q: what design patterns do you use?
  • q: write a singleton skeleton
  • Q: how do we make a class “final” in pre-c++11
    %%A: either make dtor private or make all constructors private
  • Q: is shared_ptr thread safe?
  • Q: any difference — const shared_ptr<T> vs shared_ptr<const T>?
  • %%Q: does it make sense to pass a shared_ptr by const reference? I feel it’s cleaner to pass by raw ptr
    AA!: pass by const-reference is faster and recommended

–Zheng

  • Q: why use weak_ptr instead of raw ptr. Valid question.
    A: See [[std c++lib]] P84.
    A: See last paragraph in https://bintanvictor.wordpress.com/2010/01/20/weak_ptr-can-access-inner-ptr-only-through-a-shared_ptr/
  • Q: you said atomic<int> operator++ probably wraps a CAS in a while loop internally, so is it spinlock?
    %%A: I think so.   http://www.cs.cornell.edu/courses/cs4410/2015su/lectures/lec06-spin.html is a detailed explanation
  • Q34: various synchronization techniques?
  • Q34b: what’s the c++ function to force a memory barrier?
    A: std::atomic_thread_fence(),
    but i feel if an application (not a library) uses this function then something is seriously wrong.
  • Q: can we implement a mutex using CAS?
    AA: blockingMutex implementation ] kernel
    AA: spinlock, not blockingMutex, can be implemented as in   https://www.andrew.cmu.edu/course/15-440-sp09/applications/ln/lecture4.html
  • ? Q (paper coding): write a piecewise linear interpolation Curve class with a ctor taking a vector<pair<double, double>>.  See https://github.com/tiger40490/repo1/blob/cpp1/cpp1/miscIVQ/curveInterpolation_CVA.cpp
  • Q: What are r-values and r-value references
  • Q: Explain your use of SFINAE. See https://github.com/tiger40490/repo1/blob/cpp1/cpp1/template/SFINAE_demo1.cpp
  • Q: what’s the c++ library function for cmpxch?
    A: atomic::compare_exchange_weak()
  • Q: is your barcap vol surface used for EOD risk only?
    %%A: no. A trader might suspect a lesser known product is currently mis-priced. She could confirm the live bid/ask are obvious outliers on the fitted surface.
    %%A: a dealer desk price new deals based on the fitted surface, whether or not the strike/expiry requires interpolation.

–Simon Ma:

  • Q (paper coding): implement Fib() recursively and iteratively
  • Q: what’s inline?
  • Q: semaphore vs mutex
  • Q: how did you compute greeks like gamma?
  • Q (bond math): given 6M spot IR = 5% and 12M = 10%, what’s the 6M rate 6M forward?
  • Q: Assign first half of a vector to another vector
    %%A: vector::assign() probably takes two iterators so I can pass v.begin(), v.begin()+5
  • Q (obscure): What data structure to represent a directed graph of N nodes? See NxN matrix for graph of N nodes
    %%A: create a Node class with a single ptr field…
  • Q: use parallel algorithm to compute sum of a vector<int>
    %%A: only the last stage global aggregation has a shared mutable that needs protection. The sub-aggregation can be single-threaded.
  • Q (probability problem): Both Alan and Bob agree to show up at Cafe9 sometime between 12 and 2pm. Whoever arrives first would wait for 15 minutes only. What’s the probability of them actually meeting
  • Q: what’s thread_local?
    %%A (correct): a new storage class that’s similar to static
  • Q (paper coding): A natural number is Good if it is a product of only 3, 5 and 7. The smallest Good numbers are 1,3,5,7,9,15,21,25,27,35… Write a print(int N) to print the Nth Good number. Hint: write a isGood(int k). See https://github.com/tiger40490/repo1/blob/cpp1/cpp1/miscIVQ/isGood357_CVA.cpp
  • Q (unclear): implement subtract(int a, int b) using only operator+ and comparison operators. I feel this question is unclear. Can we use bitwise? Can we multiply?
    %%A: just try all the integers (i) one by one a == b+i
  • Q (obscure): what operators can’t be overloaded?
    %%A: q(?:) correct
    AA: address-of CAN be overloaded!

–Mikhail the mgr

  • Q: inserting 1000,000 items into a list vs a vector without reserve()
    A: vector wins
  • Q: do you define your exception classes?
  • Q: in what context is a deque completely unusable whereas vector is perfectly fine?
    A: a C function taking an array (i.e. a ptr + a count). Vector::data() is designed for this. In Pre-c++11, we can pass &v[0] and v.size()
    A: if the code takes one element and uses pointer increment to access other elements. Deque would fail at segment boundaries.
  • Q89 (coding question): given a vector of T [passed in by const ref], write a template function to return [by const reference] the minimum element.
  • Q89b: how do you handle an empty vector?
  • ? Q89c (obscure): given q(vector const & vec), what can you say about q{auto it = vec.begin()} versus q{vector::const_iterator it=vec.begin()}

CAS cpu-instruction: 3inputs #curVal discovered]hw

A CVA interviewer asked me to explain the cmpxch cpu-instruction. I now believe it COMPARES two values (expected^current) and IIF matched, updates the memory location to a “newValue”.

Out of these 3 “inputs”, only the expected and newValue are inputs to the function. The 3rd item “current” is NOT an input parameter to the function, but discovered in the hardware.

See P1018 [[the c++StdLib]] by Josuttis

This knowledge is slightly below java API

[18] G4 IV(!! GTD)domains 2 provide 20Y job security

See also

Let’s ignore zbs or GTD or biz domains like mktData/risk here …

  • –roughly ranked by value-to-me
  • [c s] java? resilient in the face of c# and dynamic languages. At least 10Y relevance.
  • [c s] c++? resilient in the face of java. Time-honored like SQL
  • [c] abstract algorithm and data structures, comp science problem solving
  • [c n] tcp/udp optimization + other hardware/kernel/compiler optimizations
  • ……….No more [c]
  • py + shell scripting? no [c] rating since depth unappreciated
  • Linux and windows? at least 10Y growth, but no [c]
  • [s] SQL? resilient in the face of noSQL, but no [c]
  • bond math?
  • [n s] FIX? At least 10Y relevance
  • [c=high complexity in IV; shelf-life; depth appreciated …]
  • [n=niche, but resilient]
  • [s=survived serious challenges]

convert a recursive algo to iterative #inOrderWalk

Suppose you have just one function being called recursively. (2-function scenario is similar.) Say it has 5 parameters. Create a struct named FRAME (having 5 fields + possibly a field for lineNo/instructionPointer.)

Maintain a stack holding the Frame instances. Each time the recursive algorithm adds to the call stack, we add to our stack too.

Wiki page on inorder tree walk  has very concise recursive/iterative algos. https://github.com/tiger40490/repo1/blob/py1/py/tree/iterative_InOrderWalk.py is my own attempt that’s not so simple. Some lessons:

  • Differentiate between popping vs peeking the top.
  • For a given node, popping and printing generally happen at different times without any clear pattern.
    • the sequence of pop() is probably a pre-order tree walk
    • the sequence of print is an in-order tree walk

read-mostly data store: needs synchronization

Suppose a data store is infrequently updated, and only by a single writer thread, but is accessed frequently by many, many reader threads.

Q1: Can we cheat and skip some of the (expensive) locking?
A1: No. The writer could be in the middle of a multi-step update, so readers would see a half-updated data store

Q2: What if we somehow ensure the updates are never multi-step [1]?
A2: Unfortunately no, due to visibility failure. If a reader thread doesn’t synchronize, the crucial memory fencing would not occur and the reader may not notice a recent update, and use cached data instead.

[1] One way to do it — writer creates a new version of a record, and at last reseat a single pointer , as described in my AtomicReference blogpost https://wp.me/p74oew-s3

SIG phone round

See retrans questions on retrans biz logic if I were CTO

Q: how do you extract the seq number from packet header?
A: reinterpret_cast to a pktHeaderStruct ptr, then read the field

Q: is reinterpret_cast safe?

Q: why is RBTree sometimes faster than hashtable for a small collection? See unordered_map^map performance: unpredictable

Q4: speed of std::sort() on a vector vs list::sort
%%A: probably vector due to cache efficiency
A: also the stability of list::sort comes at a cost

Q4b: how are the two sorting implemented?
%%A: possibly two variations of quicksort

Q: std::sort() vs qsort() in C. Same algorithm, but which is faster?
%%A: std::sort, due to functor inlining
AA: Item 46 in [[effectiveSTL]]

Q: given a csv file with a million rows, you use c/java/c#/py to (convert to number) sum up 2nd column and produce a single output number, which language is fastest?
%%A: main cost is file read, but let’s suppose entire file is loaded into memory.
%%A: I feel python bytecode (even optimized) will be slightly slower. Java (due to JIT) and C are likely faster.
A: now in hindsight I feel integer parsing will cost much more than arithmetics.

move() ^ pointer-casts: mostly compile-time

See https://stackoverflow.com/questions/27309604/do-constant-and-reinterpret-cast-happen-at-compile-time/27309763

  • dynamic_cast incurs runtime cost.
  • static_cast, const_cast, reinterpret_cast are compile-time
  • std::move() is compile time. An unconditional cast, according to Scott Meyers.

That’s for pointers or references.

Nonref variable cast is uncommon, unfamiliar and pretty much unnecessary, except for numeric types.  static_cast<std::string>("Hello") ends up calling std::string constructor.

unexpected longevity@FOSS

Conclusion — my tech-bets and investment in many FOSS technologies proved to be correct. In contrast, only a few of my tech bets on commercial softwares are correct — MSVS, Oracle, Sybase, Excel+VBA,

I didn’t want to spend too much effort analyzing the forces around FOSS, but to my surprise, those forces keep growing and evolving.

  • Eg: weblogic was once dominant, but left behind by Tomcat and Jboss
  • Eg: Microsoft has to contend with Linux, Java, Apache
  • Eg: Oracle has to keep developing OpenSolaris, and MySQL
  • Eg: IBM, Oracle … have to support Linux
  • Eg: SUN, HP-UX all lost the battle against Linux. SUN has no choice but OpenSolaris
  • Most of them have to face the stiff challenge by a single FOSS — GNU/Linux

Because a FOSS needs no revenue no payroll to stay alive, there’s no survival risk or financial uncertainty in a FOSS project. Therefore, a FOSS often has better longevity.

Some of the most influential, dominant, enduring and low-churn softwares are FOSS and are unlikely to change:

  1. linux, BSD-unix
  2. java and GCC
  3. python, perl, and most scripting languages
  4. most development tools in *nix
  5. many javascript frameworks
  6. many browsers

Q: what forces power the FOSS and provide the energy, momentum?
A: alpha-geeks who want to create a impact and legacy?

Apparently, you need just one (or a few) alpha-geek to create a formidable competitor to a software vendor’s army of developers.

binary data ] FIX

It’s possible to “embed” arbitrary binary data in a FIX tag-value pair. However, the parser has to know where it ends. Therefore, the “value” consist of a length and a binary payload.

Q: Any example?

Q: can we send a image?

Q: can we send a document?

Q: Encryption? Need to read more

Empty base-class optimization

This optimization only zeros out the runtime [1] size of baseclass subobject [2]. All other sizes of empty types are the same — one byte.

Suppose Der subclasses B1 and B2.

Note on [1]:
If B1 is empty, then the compile-time operator sizeof() shows sizeof(B1) == 1 not zero, but at run time, the Der object size shows the B1 sub-object has zero size, due to EBCO.

Note on [2]:
If all of Der, B1 and B2 are empty, we know sizeof(Der) == 1, but how about rutime sizeof(an_Der_instance)? Also one, not zero, because this instance is not a baseclass subobject.

q[Node*] param to a function can be..

A Node address can represent an entire aray of Node objects

If the Node class has a nextNode field, then this param can represent an entire linked list, or head (or any node) of a linked list

If Node class has left and right child nodes, then this param can represent a tree, or a subtree

Q: Can this param represent an array of linked lists. A hashtable is such an array.
A: No. If each element in the array is a Node*, then the array would look like a Node * *

c++11 throwing dtor: a few technicalities for IV

  • STL containers and std::swap() lose all exception guarantees when a payload class dtor throws
  • In a normal context (without double exception), if a dtor throws, it’s same as new() throwing exception. Stack unwinds as expected, and all subsequent dtors run as expected.
  • in c++11, all dtors are implicitly noexcept. If such a dtor throws, it triggers std::terminate() by default

Q: what if your dtor encounters a critical error but not allowed to throw exception?
A: https://www.kolpackov.net/projects/c++/eh/dtor-1.xhtml mentions a pre_destroy() member function that can throw. Dtor would call this function but catch and ignore the exception. Client code can also call this same function but handle the exception intelligently.

inline perf can Backfire ! #Google#i-cache

As illustrated below, without inline, instruction cache system could hit ChangeOfFlow twice as it enters/exits your function aa(). If aa() is actually inlined and embedded in a hostFunc, then the instruction cache system can often load entire hostFunc, eliminating COF. This helps instruction cache, but excessive inlining can increase executable footprint (code bloat).

google c++ guide points out that

  • inline can either increase or decrease (for tiny functions) executable footprint. In general, smaller footprint improves running time due to instruction cache efficiency
  • virtual functions are inlined (i.e. defined in class body) primarily for convenience/maintainability, not performance

See also https://www.eetimes.com/document.asp?doc_id=1275470 and https://www.eetimes.com/document.asp?doc_id=1275474

As an example, consider the function call tree flow in Figure 1. Suppose function F2 is linked near function F1, but function F3 is not linked near F1. When function F1 calls F2, it is possible that F2 is already in the cache and that there will be no cache miss. (The likelihood that F2 is in the cache depends on the sizes of F1 and F2, as well as the location of the call inside F1.) In contrast, there will probably be a cache miss when F1 calls F3. Because F3 is located far away from F1 in memory, it is not likely to be in the cache when F1 calls it.

compare%%GTD to single-employer java veterans

Me vs a java guy having only a single long-term java project, who has more zbs (mostly nlg) and GTD power, including

  • performance tuning in thread pool, DB interface, serialization
  • hot-swap and remote debugging
  • JMX
  • tuning java+DB integration

When it comes to QQ and coding test scores, the difference is more visible than it is with GTD/zbs.

Conclusion — over 10 years, your portable GTD power grows too slow if you stick with one (or very few) system.

Am I advocating job hopping? Yes if you want to remain an individual contributor not aiming to move up.

 

[10]y mem footprint is key to latency #20sec JGC

see also post on large in-memory search

I suspect there’s a dilemma —

  • large heap allocation request -> free list search is harder and slower. Need to avoid ad-hoc/unplanned request for large chunks.
  • many small heap allocation requests -> free list mgr becomes hotspot
  • .. in reality, pre-allocating large arrays is probably a performance win.

I thought in low latency, time costs outweigh space costs, but no. Garbage collection is a major performance issue in low latency systems. I guess so is object creation. I guess that’s why memory efficiency affects latency. GC probably takes less cpu time if there’s less stuff to scan.

Distributed cache (memory visualization) isn’t much used in low latency systems, possibly because

  • * serialization
  • * network latency
  • * TCP? I feel the fastest HFT systems have very limited IO perhaps in the form of shared memory? FIX is based on TCP so I would assume very high overhead, but in reality, it may be a small overhead.

— In [[enterprise java perf]] published by SunMicrosystems and authored by IBM researchers, there was a brief warning against large heap. According to the authors, for a large heap, one GC cycle can take 20 sec.