## past vindicative specializations

see also — For the “domains nlg” (first 5 rows), marketable_domain_xp spreadsheet is more comprehensive but not necessarily more updated.

Quant and other Unsuccessful diversifications aren’t the focus here, but are listed below the table.

scales 0-5 mkt value given my entry barrier accu achieved %%expertise among peers@WS tsn: determination I was wage ROI val4IV
mktData #socket  4 growing not everyone has xp  2~3 2 #few worked 2  5 some 2
bondMath  2 robust math is not natural to most dev  2  3 1 #spare time++  3 some 2
orderBook, OMS,  3-4 robust medium 1 #cod`IV 1 3 #CVA sacrificed  3 none
forex #sg 2 robust medium 1 1 0 #spare time 1 none
c++  5 ok not higher than I thought  5 #critical mass  3 5 #huge sacrifice  2 minimal
coding drill  5 Growing high 4 #XR disagrees  3 2 #spare time  5 none 4
python  4~3 growing low  2  3 #unknown 1  4 none
bash scripting + unix #devops  2 robust medium  4  3 0  3 none 1
threading xLang 5 growing high  4 #critical mass  5 0 4 some 5
(abstract?) xLang collection internals 5 unexpected the deeper the harder  4 #critical mass  4 0 4 some 5
RDBMS 3 shrinking low  4 #critical mass  4 0 #spare time 1 #under-
[00-06] web dev 1 sustained high growth low  5 #critical mass  3 0 3 none

–algo practice for IV
^ good amount of accumulation
^ my confidence is boosted esp. in c++
^ I’m rising to the challenge as coding tests grow more popular

^ boosts my IV confidence
▼no critical mass yet
▼low traction
▼don’t know my expertise relative to peers

–option domain knowledge including BS:
▼ much lower demand than bond math

— C# and XAML
^ Microsoft is a major force
▼not aligned to my current direction

^ was my Achilles heel, now slowly gaining confidence

–C++ GTD and IV
^ I made real progress in 1) GTD and 2) IV but not during Mac days — wrong job nature
^ deepens my java understanding
▼ those high paying domains (HFT, quant) are too hard to break into

–MOM architecture, products…
▼falling out of favor
▼ not as widespread as I perceived. Probably used in finance and legacy systems

–javascript, php, mysql
▼not aligned to my direction

▼ market too fragmented

–EJB, Spring, Hibernate
▼falling out of favor

[18]comfort zone: java/SQL/perl

If you play safe and stay within the comfort zone of java/SQL/Perl, then don’t under-estimate the negative consequences such as

  • reactive rather than proactive
  • doldrums — see post on “y re-enter c++”
  • no deepening your understanding — a zbs
  • remain afraid and /uninitiated/ with the low-level details below JVM
  • uninitiated on latency tuning buzzwords — shallow QQ topics
  • speed coding test — unable to use python

jira Resolved^Closed

JIRA has two “fields” (one of them isn’t a field, but for this, it might as well be).

  1. The Status tells you where the issue is in the workflow. It does not tell you anything about the issue’s resolution, even though it might say “resolved”, “closed”, “done”, “fed to penguin”, it’s just an indicator of where it is.
  2. Resolution is either empty or populated (I consider it a boolean). If it is empty, JIRA sees the issue as “open” (and displays “unresolved” on screen), whatever the status is. JIRA considers the issue “done” if Resolution has any data value at all.

Resolved often means “the developer put in the fix and feels confident, but other people may not feel confident.”

Closed is a Status. It is the end status of the workflow and means “Reporter or QA confirmed that no more action needed. No need to look at it again, until reopened.”

In one workflow, when I Resolve a jira, status transitions to ReadyForQA.

reactive java #learning notes

After a 5-minute glance, I feel this is yet another (one of a group) jxee add-on package. Not sure about its shelf-life.

There are many jargon terms in, or related to, this concept. Presumably too many (and too intimidating) for a newcomer.

https://spring.io/blog/2016/06/07/notes-on-reactive-programming-part-i-the-reactive-landscape is a 2016 Spring article.

https://dzone.com/articles/rxjava-part-1-a-quick-introduction is a 2016 tutorial to RxJava library.

Data Specialist #typical job spec

Hi friends,

I am curious about data scientist jobs, given my formal training in financial math and my (limited) work experience in data analysis.

I feel this role is a typical type — a generic “analyst” position in a finance-related firm, with some job functions related to … data (!):

  • some elementary statistics
  • some machine-learning
  • cloud infrastructure
  • some hadoop cluster
  • noSQL data store
  • some data lake
  • relational database query (or design)
  • some data aggregation
  • map-reduce with Hadoop or Spark or Storm
  • some data mining
  • some slice-n-dice
  • data cleansing on a relatively high amount of raw data
  • high-level python and R programming
  • reporting tools ranging from enterprise reporting to smaller desktop reporting software
  • spreadsheet data analysis: most end users still consider the spreadsheet the primary user interface

I feel these are indeed elements of data science, but even if we identify a job with 90% of these elements, it may not be a true blue data scientist job. Embarrassingly, I don’t have clear criteria for a real data scientist role (there are precise definitions out there) but I feel “big-data”, “data-analytics” are so vague and so much hot air that many employers would jump on the bandwagon and portray themselves as data science shops.

I worry that after I work on such a job for 2 years, I may not gain a lot of insight or add a lot of value.

———- Forwarded message ———-
Date: 22 May 2017 at 20:40
Subject: Data Specialist – Full Time Position in NYC

Data Specialist– Financial Services – NYC – Full Time

My client is an established financial services consulting company in NYC looking for a Data Specialist. You will be hands on in analyzing and drawing insight from close to 500,000 data points, as well as instrumental in developing best practices to improve the functionality of the data platform and overall capabilities. If you are interested please send an updated copy of your resume and let me know the best time and day to reach you.

Position Overview

As the Data Specialist, you will be tasked with delivering benchmarking and analytic products and services, improving our data and analytical capabilities, analyzing data to identify value-add trends and increasing the efficiency of our platform, a custom-built, SQL-based platform used to store, analyze, and deliver benchmarking data to internal and external constituents.

  • 3-5 years’ experience, financial services and/or payments knowledge is a plus
  • High proficiency in SQL programming
  • High proficiency in Python programming
  • High proficiency in Excel and other Microsoft Office suite products
  • Proficiency with report writing tools – Report Builder experience is a plus


##types of rvr/rvalueObjects out there #SCB IV

An rvr variable is a door plate on a memory location, wherein the data content is regarded as disposable: either 1) a naturally occurring unnamed temporary object or 2) a named object earmarked (via move()) as no longer needed.

Examples of first case:

  • function returning a nonref — Item 25 of [[effModernC++]] and P532 [[c++primer]]. I think this is extremely common
    • function returning a pair<int, float>
    • function returning a vector<int>
    • function returning a string
    • function returning an int
  • string1+string2
  • 33+55

In the 2nd case the same object could also have a regular lvr door plate (or a pointer pointing to it). This lvr variable should NOT be used any more.

Q: That’s a rvr variable… how about the rvr object?
A: no such thing. An rvr is always a variable. There exists a memory location at the door plate, but that object is neither rvr nor lvr.
%%A: I explained to my 2018 SCB interviewer that rvr and lvr (and pointer variables) are thingies known to the compiler. Objects are runtime thingies, including 32-bit pointer objects. However, an unnamed temp object is (due to compiler) soon-to-be-destroyed, so it is accessed via an rvr.
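The two cases above can be checked with a tiny overload experiment. This is only a sketch: whichRef() and makeName() are hypothetical helpers invented for illustration, and the returned strings simply report which door plate the compiler picked.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical overloads: report which kind of reference the argument bound to.
inline std::string whichRef(const std::string&) { return "lvr"; }
inline std::string whichRef(std::string&&)      { return "rvr"; }

inline std::string makeName() { return "temp"; }  // function returning a nonref

// Case 1: naturally occurring unnamed temporaries bind to the rvr overload.
inline std::string case1_functionReturn() { return whichRef(makeName()); }
inline std::string case1_concat(const std::string& a, const std::string& b) {
    return whichRef(a + b);  // string1+string2 is an unnamed temporary
}

// Case 2: a named object earmarked via std::move() as no longer needed.
inline std::string case2_moved() {
    std::string named = "named";
    return whichRef(std::move(named));  // 'named' should not be used after this
}

// A named rvr variable is itself an lvalue expression when used by name.
inline std::string namedRvrIsLvalue() {
    std::string&& r = makeName();  // r is an rvr door plate on the temporary
    return whichRef(r);            // overload resolution picks the lvr version
}
```

The last function is the usual surprise: once the door plate has a name, the compiler treats uses of that name as lvalues, which is why std::move() or std::forward() is needed to pass it on as an rvr.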

TCP blocking send() timeout #details

See also recv()/send() with timeout #CSY

I see three modes

  1. non-blocking send() — immediate return if unable to send
  2. blocking send() without timeout — blocks forever, so the thread can’t do anything.
  3. blocking send() with timeout —

SO_SNDTIMEO: sets the timeout value specifying the amount of time that an output function blocks because flow control prevents data from being sent. If a send operation has blocked for this time, it shall return with a partial count or with errno set to [EAGAIN] or [EWOULDBLOCK] if no data is sent. The default for this option is zero, which indicates that a send operation shall not time out. This option stores a timeval structure. Note that not all implementations allow this option to be set.

In xtap library, timeout isn’t implemented at all. Default is non-blocking. If we configure to use 2), then we can hit a strange problem: one of three receivers gets stuck but keeps its connection open. The other receivers are starved even though their receive buffers are free.
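Mode 3 can be sketched with the standard setsockopt() call. The helper names setSendTimeout() and getSendTimeoutMillis() are my own, not part of xtap or any library, and as the quoted text warns, not every implementation allows this option to be set.

```cpp
#include <cassert>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

// Give a blocking socket a send timeout (mode 3). Returns true on success.
inline bool setSendTimeout(int fd, long millis) {
    timeval tv{};
    tv.tv_sec  = millis / 1000;
    tv.tv_usec = (millis % 1000) * 1000;
    return setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof tv) == 0;
}

// Read the timeout back. 0 means "never time out" (the default); -1 means error.
inline long getSendTimeoutMillis(int fd) {
    timeval tv{};
    socklen_t len = sizeof tv;
    if (getsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, &len) != 0) return -1;
    return tv.tv_sec * 1000L + tv.tv_usec / 1000;
}
```

After setSendTimeout(fd, 2000), a send() that stays blocked for two seconds should return either a partial count or -1 with errno set to EAGAIN or EWOULDBLOCK, so the caller has to handle both outcomes.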

mlock() : low-level syscall to prevent paging ] real-time apps

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux_for_real_time/7/html/reference_guide/using_mlock_to_avoid_page_io has sample code

See also https://eklitzke.org/mlock-and-mlockall

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_MRG/1.3/html/Realtime_Reference_Guide/sect-Realtime_Reference_Guide-Memory_allocation-Using_mlock_to_avoid_memory_faults.html says

If the application is entering a time sensitive region of code, an mlockall call prior to entering, followed by munlockall can reduce paging while in the critical section. Similarly, mlock can be used on a data region that is relatively static or that will grow slowly but needs to be accessed without page faulting.
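The "relatively static data region" case can be sketched as a small RAII wrapper around mlock() and munlock(). LockedBuffer is a hypothetical class name of mine, not taken from the Red Hat sample code, and the lock can fail (for example when RLIMIT_MEMLOCK is too low), so the sketch reports failure rather than assuming success.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>
#include <vector>

// Pins a buffer in RAM for the lifetime of the object.
class LockedBuffer {
public:
    explicit LockedBuffer(std::size_t bytes)
        : data_(bytes, 0),  // zero-fill touches every page before locking
          locked_(!data_.empty() && mlock(data_.data(), data_.size()) == 0) {}
    ~LockedBuffer() { if (locked_) munlock(data_.data(), data_.size()); }
    LockedBuffer(const LockedBuffer&) = delete;
    LockedBuffer& operator=(const LockedBuffer&) = delete;

    bool locked() const { return locked_; }  // false if mlock() was refused
    char* data() { return data_.data(); }
    std::size_t size() const { return data_.size(); }

private:
    std::vector<char> data_;  // declared first, so it is constructed before locked_
    bool locked_;
};
```

A caller would check locked() once at startup and decide whether to continue without the guarantee. For the time-sensitive-region case in the quote, mlockall() before and munlockall() after the region is the coarser alternative.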

de-multiplex packets bearing Same dest ip:port Different source

see de-multiplex by-destPort: UDP ok but insufficient for TCP

For UDP, the 2 packets are always delivered to the same destination socket. Source IP:port are ignored.

For TCP, if there are two matching worker sockets … then delivered to them. Perhaps two ssh sessions.

If there’s only a listening socket, then both packets delivered to the same socket, which has wild cards for remote ip:port.

UDP socket is identified by two-tuple; TCP socket is by four-tuple

Based on [[computer networking]] P192. see also de-multiplex by-destPortNumber UDP ok but !! enough for TCP

  • Note the term in subject is “socket” not “connection”. UDP is connection-less.

A TCP segment has four header fields for Source IP:port and destination IP:port.

A TCP socket has internal data structure for a four-tuple — Remote IP:port and local IP:port.

A regular TCP “Worker socket” has all four items populated, to represent a real “session/connection”, but a Listening socket could have wild cards in all but the local-port field.
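The two-tuple versus four-tuple lookup can be illustrated with a toy socket table. This is a sketch under simplifying assumptions: Endpoint, TcpDemux and UdpDemux are made-up names, socket ids are plain ints rather than real file descriptors, and the listening socket is keyed on an exact local IP although a real one could use a wild card there as well.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>
#include <utility>

struct Endpoint {
    std::string ip;
    std::uint16_t port;
    bool operator<(const Endpoint& o) const {
        return std::tie(ip, port) < std::tie(o.ip, o.port);
    }
};

class TcpDemux {
public:
    void addListener(const Endpoint& local, int id) { listeners_[local] = id; }
    void addWorker(const Endpoint& local, const Endpoint& remote, int id) {
        workers_[std::make_pair(local, remote)] = id;
    }
    // Exact four-tuple match first; otherwise fall back to the listening socket,
    // whose remote ip:port are wild cards. Returns -1 if nothing matches.
    int lookup(const Endpoint& dest, const Endpoint& src) const {
        auto w = workers_.find(std::make_pair(dest, src));
        if (w != workers_.end()) return w->second;
        auto l = listeners_.find(dest);
        return l != listeners_.end() ? l->second : -1;
    }
private:
    std::map<std::pair<Endpoint, Endpoint>, int> workers_;
    std::map<Endpoint, int> listeners_;
};

class UdpDemux {
public:
    void bind(const Endpoint& local, int id) { sockets_[local] = id; }
    // Destination ip:port alone picks the socket; the source is ignored.
    int lookup(const Endpoint& dest, const Endpoint& /*src*/) const {
        auto s = sockets_.find(dest);
        return s != sockets_.end() ? s->second : -1;
    }
private:
    std::map<Endpoint, int> sockets_;
};
```

With one listener and two workers all on local port 22, two segments with the same destination but different sources land on different worker sockets, while a segment from an unknown source lands on the listener.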

calling database access method while holding mutex

Update: XR referred me to — https://wiki.sei.cmu.edu/confluence/display/java/LCK09-J.+Do+not+perform+operations+that+can+block+while+holding+a+lock

Hi XR,

Q: If some vendor provides a database access function (perhaps dbget()) that may be slow and may acquire some lock internally, is it a good idea to call this function while holding an unrelated mutex?

We spoke about this. Now I think this is so bad it should be banned. This dbget() could take an unknown amount of time. If someone deleted a row from a table that dbget() needs, it could block forever. The mutex you hold is obviously shared with other threads, so those threads would be starved. Even if this scenario happens once in a million times, it is simply unacceptable.

Here, I’m assuming the dbget() internal lock is completely unrelated to the mutex. In other words, dbget() doesn’t know anything about your mutex and will not need it.

As a rule of thumb, we should never call reach-out functions while holding a mutex. Reach-out functions need to acquire some shared resources such as a file, a web site, a web service, a socket, a remote system, a messaging system, a shared queue, a shared-memory mapping… Most of these resources are protected and can take some amount of time to become available.
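One way to follow this rule of thumb is to keep the critical section short and make the reach-out call with no mutex held. The sketch below is illustrative only: dbget() here is a trivial stand-in for the vendor function, and QuoteCache is a hypothetical class.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Stand-in for the vendor's dbget(); the real one may be slow or block.
inline std::string dbget(const std::string& key) { return "row-for-" + key; }

class QuoteCache {
public:
    std::string get(const std::string& key) {
        {
            std::lock_guard<std::mutex> g(mtx_);  // short critical section
            auto it = cache_.find(key);
            if (it != cache_.end()) return it->second;
        }                                         // mutex released here
        std::string row = dbget(key);             // reach-out call, no mutex held
        std::lock_guard<std::mutex> g(mtx_);      // re-acquire only to publish
        // If another thread published first, emplace keeps the existing entry.
        return cache_.emplace(key, std::move(row)).first->second;
    }
    std::size_t size() const {
        std::lock_guard<std::mutex> g(mtx_);
        return cache_.size();
    }
private:
    mutable std::mutex mtx_;
    std::map<std::string, std::string> cache_;
};
```

The trade-off is that two threads may both call dbget() for the same key. That costs a duplicate lookup, but no thread ever waits on the mutex while the database call is in flight.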

(I remember there’s some guideline named open-call but I can’t find it.)

That’s my understanding. What do you think?

##[17] proliferation → consolidation.. beware of churn

This is an extension of my 2015 blog post https://bintanvictor.wordpress.com/2015/03/31/some-of-the-worst-technology-churns-letter-to-tanko/

Imagine you spent months of serious personal effort [1] learning to use, debug, and tune, say, MongoDB but after this project you only find projects that need just superficial Mongo knowledge. The developer’s time investment then has no recurring return. I think this is widespread in tech: a domain heats up, attracting too many players who create competing products with varying degrees of similarity. We wish these products were mostly similar so that our time investment as developers could help us more than once, rather than learn-n-forget like the bear in the Chinese saying who picks corn cobs and drops each one to grab the next. Often it takes decades to see some consolidation among the competitors, when most of them drop out of the race and one player emerges dominant, or a common standard [2] is adopted with limited vendor extensions.

Therefore I see two phases : Proliferation -> Consolidation. The churn in the early phase represents a hazardous pitfall.

If we invest too much learning effort there we get burned.

  • Javascript packages
  • JVM languages — javascript, Scala, Groovy, jython
    • I don’t even know which company uses them
  • ORM and database access “frameworks”–ADO.net, LINQ, EntityFramework, SpringJDBC,  iBatis,
  • Data Grid and NoSQL — Terracotta, Hazelcast, Gigaspace, Gemfire, Coherence, …
  • MOM — tibco, solace, 29west, Tervela, zeroc
  • machine learning?
  • web app languages
    • php, perl and the LAMP stack
    • Javascript MEAN stack
    • ASP and the Microsoft stack
    • Java stack

[1] You could have spent the time on personal investment, or something else if not required by the project.

[2] Some positive examples of standardization —

  1. RDBMS vendors
  2. Unix vendors
  3. c++ vendors — mostly GCC vs Microsoft VC++
  4. Java IDEs; c++/java/c# debuggers
  5. cvs, svn, git

A few development technologies “free” of proliferation pains —

  1. socket and system programming — complexities are low level and in C not c++
  2. core java
  3. SQL
  4. c/c++
  5. Unix know-how for testing, investigation, devops and process management
    1. shell scripting,
    2. regular expressions
    3. searching

##@55, Safer2b manager or hands-on dev@@ #addYHL

Hi Shanyou,

Based on your observations, when I reach 55, do you think it’s safer as a manager or a hands-on developer? “Safer” in the presence of

  1. competition from younger generation
  2. competition from same age group or older
  3. new, disruptive technologies
  4. technology obsolescence (what I call technology “churn”).
  5. outsourcing

Among these threats, my concern is primarily #1 but what about you?

##family safety enhancers: next sg job#respect[def]

see predict next 3-5Y job satisfaction #engaging

see what determined7past job satisfaction #nextSgJob

Conclusion: As of Apr 2018 the top 2 enhancers are

  1. respect — more than “thank-you-for-trying” appreciation
  2. MktDepth — market depth
  3. –others…
  4. comp? important to satisfaction but not really important to “safety”
  5. engagement? important to satisfaction then-n-there but not really important to safety

online solution2algoQ:I always try%%self1st #maxRectangle

https://bintanvictor.wordpress.com/2018/04/07/check-array-of-0-to-n-nasdaq/ is the Nasdaq question. It includes a link to my solution in github. I came up with my own solution. Then I tried your solution found online.

https://bintanvictor.wordpress.com/2018/04/10/check-array-can-be-preorder-bst-walk/ is another interview question. I tried my own and came up with a “better” solution.

If I don’t try my own solution, I would not learn a lot, and I would not have fun implementing my own idea (even if not optimal).

I also find it hard to absorb then remember clever solutions I found online, such as the maxRectangle.

Solving problems myself also grows my confidence when facing unfamiliar problems. I do read online solutions when I have no clue. confidence-build`: another reason y I don’t look at Leetcode answer

By the way, one advantage of my solution is — it can identify the missing numbers, all within O(N) time and O(1) space.

susFocus^engage^absorbency^visPgress^traction #Clarified

I hope to differentiate these related terms, and reduce proliferation of tags

  • traction — is kind of vague and high-level, so no “t_traction”
  • traction — often describes learning curve breakthrough. Defined in [11]traction(def)^learning curve gradient #diminishing ROTI
  • visPgress — is more concrete and well-defined than “traction”
  • (dis)engaged — describes a mental state, like “absorbed”, “enchanted”
  • (dis)engaged — can be felt only from inside
  • t_distract — specific distractions, often short-term like kids, e-banking…
  • sustainedFocus — is often required to overcome tough learning obstacles
  • sustainedFocus — can be sustained by self-discipline (absorbency) whereas engaged is often due to luck and sustained interest.
  • sustainedFocus — is more a specific form of being “engaged”
  • sustainedFocus — is a longer phrase compared to the honeymoon of engagement
  • absorbency — is most specific
  • absorbency — describes the difficulty of staying engaged despite dry, repetitive, focused learning
  • t_focus@GTD tag and gzGTD category are broadly similar

Grandpa is a model of

  • engaged
  • sustainedFocus — but without too much self-discipline needed
  • absorbency

In comparison to grandpa,

  • I want to have more sustained focus
  • I want longer engagement, because I tend to lose interest too soon
  • I know my engagement lasts only hours, so I frequently feel a bold justification to capture the moment, perhaps at a financial cost
  • my absorbency capacity is not as good as I wish
  • i feel a real need to capture commute time
  • I need a job that gives me more personal time, even during work hours.
  • … I think these factors have profound consequences, such as my career decisions

bitwise coding questions: uncommon

Don’t over-invest.

Q: How prevalent are these questions in coding interviews?

  • I feel these questions are usually contrived, and therefore low-quality and unpopular among hiring firms.
  • There are no classic comp-science constructs at bitwise level, and bitwise doesn’t play well with those constructs
  • the bitwise hacks must be language-neutral wrt python and javascript, so quality questions are scarce.
  • phone round? impossible

crossed orderbook detection@refresh #CSY

At least one interviewer asked me — in your orderbook replication (like our rebus), how do you detect a crossed orderbook? I have an idea for your comment.

Rebus currently has a rule to generate top-book-marker. Rebus puts this token on the best bid (+ best ask) whenever a new top-of-book bid emerges.

I think rebus can have an additional rule to check the new best bid against the reigning best ask. If the new best bid is ABOVE the best ask, then rebus can attach a “crossed-orderbook-warning” flag to the top-book-marker. This way, downstream gets alerted and has a chance to take corrective action such as removing the entire orderbook or blocking the new best bid until rebus sends a best-bid without this warning.

Such a warning could help protect the customer’s reputation and integrity of the data.
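The proposed rule is small enough to sketch. TopBookMarker and onNewBestBid() are hypothetical names (the real rebus token layout is not shown in these notes), and prices are assumed to be integer ticks to sidestep floating-point comparison.

```cpp
#include <cassert>

// Hypothetical shape of the top-book-marker token.
struct TopBookMarker {
    long bestBidTicks;
    long bestAskTicks;
    bool crossedWarning;
};

// Called whenever a new top-of-book bid emerges.
inline TopBookMarker onNewBestBid(long newBestBidTicks, long reigningBestAskTicks) {
    // Strictly ABOVE the ask means crossed; bid == ask is not flagged here.
    return TopBookMarker{newBestBidTicks, reigningBestAskTicks,
                         newBestBidTicks > reigningBestAskTicks};
}
```

Whether an equal bid and ask should also raise a flag is a separate design choice that this sketch leaves out.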

fragmentation: IP^TCP #retrans

I consider this a “halo” knowledge pearl because it is part of an essential everyday service. We can easily find an opportunity to inject it into an Interview. Interviews are unlikely to go this deep, but it’s good to over-prepare here.

This comparison ties together many loose ends like Ack, reassembly, retrans, seq resets..

This comparison clarifies what reliability IP offers + what reliability features it lacks:

  • it offers complete datagrams
  • it can lose an entire datagram — TCP (not UDP) will detect it and use retrans
  • it can deliver two datagrams out of order — TCP will detect and fix it

[1] IP fragmentation can cause excessive retransmissions when fragments encounter packet loss and reliable protocols such as TCP must retransmit ALL of the fragments in order to recover from the loss of a SINGLE fragment
[2] TCP seq# never looks like 1,2,3
[3] IP (de)fragmentation #MTU,offset

| | IP4 fragmentation | TCP fragmentation |
|---|---|---|
| minimum guarantees | all-or-nothing. Never partial packet | stream in-sequence without gap |
| reliability | unreliable | fully reliable |
| name of a “whole” piece | IP datagram #“packet/msg” are vague | a TCP session with thousands of sequence numbers |
| name for a “part” | IP fragment | TCP segment |
| sequencing | each fragment has an offset | each segment has a seq# |
| .. continuous? | yes | discontinuous ! [2] |
| .. seq # reset? | yes for each packet | loops back to 0 right before overflow |
| Ack | no Ack ! | positive Ack needed |
| gap detection | using offset | using seq# [2] |
| id for the “msg” | identification number | no such thing |
| end-of-msg | flag in last fragment | no such thing. Continuous stream |
| out-of-sequence? | likely | likely |
| ..reassembly | based on id/offset/flag [3] | based on seq# |
| retrans | not by IP4 [1][3] | commonplace |
| missing a piece? | entire IP datagram discarded[3] | triggers retrans |
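The all-or-nothing reassembly on the IP side can be sketched as follows. This is a simplification with made-up names: Fragment and reassemble() are hypothetical, all fragments are assumed to share one identification number, offsets are counted in bytes (real IPv4 counts in 8-byte units), and duplicate or overlapping fragments are not handled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Fragment {
    std::size_t offset;   // position of this payload within the datagram
    bool moreFragments;   // false only on the last fragment
    std::string payload;
};

// Returns true and fills 'datagram' only if every piece is present.
// A single missing fragment means the entire datagram is discarded.
inline bool reassemble(std::vector<Fragment> frags, std::string& datagram) {
    std::sort(frags.begin(), frags.end(),
              [](const Fragment& a, const Fragment& b) { return a.offset < b.offset; });
    std::string whole;
    bool sawLast = false;
    for (const Fragment& f : frags) {
        if (f.offset != whole.size()) return false;  // gap detected via offset
        whole += f.payload;
        sawLast = !f.moreFragments;
    }
    if (!sawLast) return false;  // the end-of-msg flag never arrived
    datagram = whole;
    return true;
}
```

Fragments can arrive in any order and still reassemble, but nothing in this layer asks for a missing piece again. That job falls to TCP, which retransmits the whole segment.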

retrans: FIX^TCP^xtap

The FIX part is very relevant to real world OMS.. Devil is in the details.

IP layer offers no retrans. UDP doesn’t support retrans.



| | TCP | FIX | xtap |
|---|---|---|---|
| seq# continuous | no | yes.. see seq]FIX | yes |
| ..reset | automatic loopback | managed by application | seldom #exchange decision |
| ..dup | possible | possible | normal under bestOfBoth |
| ..per session | per connection | per clientId | per day |
| ..resumption? | possible if wire gets reconnected quickly | yes upon re-login | unconditional. no choice |
| Ack | positive Ack needed | only needed for order submission etc | not needed |
| gap detection | sophisticated | every gap should be handled immediately since sequence is critical. Out-of-sequence is unacceptable. | gap mgr with timer |
| retrans | sophisticated | receiver(ECN) will issue resend request; original sender to react intelligently | gap mgr with timer |

Note original sender should be careful resending new orders.
See https://drivewealth-fix-api.readme.io/docs/resend-request-message
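The receiver-side gap detection in the FIX column can be sketched like this. GapDetector is a hypothetical class, not taken from any FIX engine. It only decides what to do with each inbound sequence number and leaves out timers, PossDupFlag handling and the actual Resend Request message.

```cpp
#include <cassert>
#include <cstdint>

// Tracks the next expected inbound sequence number for one session.
class GapDetector {
public:
    enum class Action { Process, ResendRequest, Duplicate };

    struct Result {
        Action action;
        std::uint64_t beginSeq;  // meaningful only for ResendRequest
        std::uint64_t endSeq;    // meaningful only for ResendRequest
    };

    explicit GapDetector(std::uint64_t firstExpected = 1) : expected_(firstExpected) {}

    Result onMessage(std::uint64_t seq) {
        if (seq == expected_) { ++expected_; return {Action::Process, 0, 0}; }
        if (seq < expected_)  { return {Action::Duplicate, 0, 0}; }
        // Gap: ask the sender for [expected_, seq-1]; do not advance until it is filled.
        return {Action::ResendRequest, expected_, seq - 1};
    }

    std::uint64_t expected() const { return expected_; }

private:
    std::uint64_t expected_;
};
```

In this sketch a too-low sequence number is simply reported as a duplicate. A real FIX session is stricter about a too-low number that is not marked as a possible duplicate, so treat that branch as a placeholder.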


linux hardware interrupt handler #phrasebook

I think some interrupts are generated by software, but here I focus on hardware interrupt handlers.

  • pseudo-function — Each handler is like a pseudo-function containing a series of instructions that will run on a cpu core
  • top priority — interrupt context is higher priority than kernel context or process context. You can say “interrupt” means “emergency”. Emergency vehicles don’t obey traffic rules.
    • However, an interrupt handler “function” can get interrupted by another [1]. The kernel somehow remembers the “stack”
  • not preemptible — except the [1] scenario, kernel can’t suspend a hardware interrupt handler in mid-stream and put in another series of instructions in the “driver’s seat”
  • no PID — since there’s no preemption, we don’t need a pid associated with this series of instructions.

detect cycle in binary tree

Q1: (Adapted from a real interview) Given a binary tree, where each node has zero or 1 or 2 child nodes but no “uplink” to parent, and given the root node, detect any cycle.

https://github.com/tiger40490/repo1/blob/cpp1/cpp/binTree/cycleInBinTree.cpp is my tested implementation of 1a.

Solution 1a: Three web sites all point at DFT with hashset. I guess the hashset is shrunk whenever we return from a recursion
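Solution 1a can be sketched as below. This is a minimal illustration of the idea, not the tested implementation linked above: the hashset holds only the nodes on the current root-to-node path and is shrunk when the recursion returns. Node, hasCycleFrom() and hasCycle() are names made up for the sketch.

```cpp
#include <cassert>
#include <unordered_set>

struct Node {
    Node* left = nullptr;
    Node* right = nullptr;
};

// Depth-first traversal with a hashset of the current ancestors.
inline bool hasCycleFrom(const Node* n, std::unordered_set<const Node*>& path) {
    if (n == nullptr) return false;
    if (!path.insert(n).second) return true;  // n is already an ancestor: cycle
    bool found = hasCycleFrom(n->left, path) || hasCycleFrom(n->right, path);
    path.erase(n);                            // shrink the set on the way back up
    return found;
}

inline bool hasCycle(const Node* root) {
    std::unordered_set<const Node*> path;
    return hasCycleFrom(root, path);
}
```

Because the set is shrunk on return, a node shared by two parents is not reported as a cycle. The price is that such shared subtrees get traversed more than once.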

Solution 1b: I will first write a classic BFT, where each node is processed by processNode(). In this case, my processNode(me) function will start another BFT to traverse my subtree. If I hit myself, then that’s a cycle. I think the additional space required is the size of the queue, which is up to O(N). The (non-recursive) call stack at any time is at most 2 (or 3?) levels.

bft solution 1c: Upon append, each node keeps a parent-node-set, represented by hash table. Too much memory needed

Q2: how about constant space i.e. don’t use O(N) additional space?

I think any binary tree traversal requires more than O(1) additional space, except Morris. But can Morris even work if there are cycles?


de-multiplex by-destPort: UDP ok but insufficient for TCP

When people ask me what is the purpose of the port number in networking, I used to say that it helps demultiplex. Now I know that’s true for UDP but TCP uses more than the destination port number.

Background — Two processes X and Y on a single-IP machine  need to maintain two private, independent ssh sessions. The incoming packets need to be directed to the correct process, based on the port numbers of X and Y… or is it?

If X is sshd with a listening socket on port 22, and Y is a forked child process from accept(), then Y’s “worker socket” also has local port 22. That’s why in our linux server, I see many ssh sockets where the local ip:port pairs are indistinguishable.

TCP demultiplex uses not only the local ip:port, but also remote (i.e. source) ip:port. Demultiplex also considers wild cards.

| | TCP | UDP |
|---|---|---|
| socket has local IP:port | | |
| socket has remote IP:port | | no such thing |
| 2 sockets with same local port 22 ??? | can live in two processes | not allowed |
| | can live in one process | not allowed |
| 2 msg with same dest ip:port but different source ports | addressed to 2 sockets; 2 ssh sessions | addressed to the same socket |

cod`IV^QnA: relative importance

Let’s focus on technical screening but put aside SDI questions, which are sometimes like QnA .

  • In terms of “weighting” assigned by employer — coding 50/50 QnA on average, though only on-site coding is authentic. If an interviewer uses on-site coding, then it would often have higher weighting than the QnA portion.
  • interviewers’ time — coding 30/70 QnA. Interviewer needs to spend more time conducting QnA interview than coding interview.
  • candidate’s time — roughly coding 45/55 QnA. On average coding takes a candidate less time but more effort.
  • candidate’s effort — coding 70/30 QnA on average. QnA requires effort in advance but little effort during the interview.
    • I feel coding tests often require substantial concentration/effort but many people are not willing. One hour of coding test effort == 4 hours of project effort
    • timed coding test is like exam… full concentration.
  • rejection percentage — coding 70/30 QnA on average. I feel rejection rate is higher in coding than in QnA, partly due to insufficient effort by many candidates. West Coast and HFT has higher rejection rate on coding tests.
  • In terms of long term benefits on personal financial security and family wellbeing — about the same value.

Conclusion — as a job candidate, we had better embrace coding interviews, for our own long-term security.

make_shared() cache efficiency, forward()

This low-level topic is apparently important to multiple interviewers. I guess there are similarly low-level topics like lockfree, wait/notify, hashmap, const correctness.. These topics are purely for theoretical QQ interviews. I don’t think app developers ever need to write forward() in their code.

https://stackoverflow.com/questions/18543717/c-perfect-forwarding/18543824 touches on a few low-level optimizations. Suppose you follow Herb Sutter’s advice and write a factory accepting Trade ctor arg and returning a shared_ptr<Trade>,

  • your factory’s parameter should be a universal reference. You should then std::forward() it to make_shared(). See the make_shared() source in gcc at https://gcc.gnu.org/onlinedocs/libstdc++/libstdc++-api-4.6/a01033_source.html
  • make_shared() makes a single allocation for a Trade and an adjacent control block, with cache efficiency — any read access on the Trade pointer will cache the control block too
  • I remember reading online that one allocation vs two is a huge performance win….
  • if the arg object is a temp object, then the rvr would be forwarded to the Trade ctor. Scott Meyers says the lvr would be cast to an rvr. The Trade ctor would need to move() it.
  • if the runtime object is carried by an lvr (arg object not a temp object), then the lvr would be forwarded as is to Trade ctor?

Q: What if I omit std::forward()?
AA: Trade ctor would always receive an lvr. See ScottMeyers P162 and my github code
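The forwarding factory and the effect of omitting std::forward() can be sketched as below. The Trade class is a hypothetical stand-in that just records whether its ctor arg arrived as an rvr or an lvr, and makeTrade() and makeTradeNoForward() are illustrative names.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical Trade that records whether its ctor arg was copied or moved.
struct Trade {
    std::string symbol;
    bool argWasMoved;
    explicit Trade(const std::string& s) : symbol(s), argWasMoved(false) {}
    explicit Trade(std::string&& s) : symbol(std::move(s)), argWasMoved(true) {}
};

// Factory whose parameter is a universal (forwarding) reference.
template <typename Arg>
std::shared_ptr<Trade> makeTrade(Arg&& arg) {
    return std::make_shared<Trade>(std::forward<Arg>(arg));
}

// Same factory without std::forward(): inside the function 'arg' has a name,
// so it is an lvalue and the copying ctor is always chosen.
template <typename Arg>
std::shared_ptr<Trade> makeTradeNoForward(Arg&& arg) {
    return std::make_shared<Trade>(arg);
}
```

Passing a named string to makeTrade() copies it and leaves the caller's string intact. Passing a temporary moves it. The no-forward version copies in both cases, which matches the answer above.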

https://github.com/tiger40490/repo1/blob/cpp1/cpp1/rvrDemo.cpp is my experiment.

— in summary, advantages of make_shared
* code size is smaller and more i-cache friendly
* one allocation fewer. I think one allocation is thousands of times slower than a simple calc


##[18] questionable tsn bets: past+future

  • boost beyond shared_ptr
  • functional programming
  • JMS?
  • weblogic
  • sendmail, makefile,
    • In contrast better bets at that time include apache, freeBSD, dns
  • option math?
  • quant dev
  • —— above are questionable bets
  • —— a subset of vindicated bets i.e. paying off above minimum expectation. See also ## past vindicative specializations
  • sockets!
  • bash + scripting
  • bond math
  • py
  • FIX
  • c++
    • c++ multi-file build, gdb, valgrind
    • c++11 — recently it started to pay off
    • pthreads — recently it started to pay off
    • template details — recently it started to pay off
  • —— next 10Y

How is debugger breakpoint implemented@@ brief notes #CSY

This is a once-only obscure interview question. I said up-front that CPU interrupts were needed. I still think so.

I believe CPU support is needed to debug assembly programs, where kernel may not exist.

For regular C program I still believe special CPU instructions are needed.

https://eli.thegreenplace.net/2011/01/27/how-debuggers-work-part-2-breakpoints seems to agree.

It says Interrupt #3 is designed for debugging. It also says SIGTRAP is used in linux, but windows supports no signals.


NxN matrix: graph@N nodes #IV

Simon Ma of CVA team showed me this simple technique.

https://github.com/tiger40490/repo1/blob/cpp1/cpp1/miscIVQ/tokenLinked_Friend.cpp is my first usage of it.

  • I only needed half of all matrix cells (excluding the diagonal cells) because relationships are bilateral.
  • Otherwise, if graph edges are directed, then we need all N(N-1) off-diagonal cells since A->B is not the same as B->A.

My discrete math textbook shows this is a simplified form of representation and can’t handle self-link or parallel edge. The vertex-edge matrix is more robust but space-inefficient.
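The half-matrix idea for bilateral relationships can be sketched as below. HalfMatrixGraph is a hypothetical class, not the code in the linked github file. It stores only the N(N-1)/2 cells below the diagonal, so like the matrix described above it cannot represent a self-link or a parallel edge.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Undirected graph on N nodes, stored as the lower triangle of an NxN matrix.
class HalfMatrixGraph {
public:
    explicit HalfMatrixGraph(std::size_t n) : n_(n), cells_(n * (n - 1) / 2, false) {}

    void link(std::size_t a, std::size_t b) { cells_[index(a, b)] = true; }
    bool linked(std::size_t a, std::size_t b) const {
        return a != b && cells_[index(a, b)];
    }
    std::size_t cellCount() const { return cells_.size(); }

private:
    // Map the unordered pair {a,b}, a != b, to a slot in the lower triangle.
    std::size_t index(std::size_t a, std::size_t b) const {
        assert(a != b && a < n_ && b < n_);  // self-links are not representable
        if (a < b) std::swap(a, b);          // ensure a > b
        return a * (a - 1) / 2 + b;          // row a holds a cells: b = 0..a-1
    }
    std::size_t n_;
    std::vector<bool> cells_;
};
```

For 4 nodes this allocates 6 cells instead of 16, and link(3,1) is visible from both linked(1,3) and linked(3,1).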

churn !! bad ] mktData #socket,FIX,.. unexpected!

I feel the technology churn is remarkably low.

New low-level latency techniques are coming up frequently, but these topics are actually “shallow” and low complexity to the app developer.

  • epoll replacing select()? yes churn, but much less tragic than the stories with swing, perl, struts
  • most of the interview topics are unchanging
  • concurrency? not always needed. If needed, then often fairly simple.

j4 stick2c++: Score big{losing@quant/c#

See also vindicative specializations , what if I transition to desk quant role but don’t rise up@@ and j4 c#: hind sight

I have already given up several “investments”. If I take a java job, I would again forgo so many years of investment in c++. Now that I have received more c++ offers, I feel /triumphant/vindicative/.

swing py c# quant 2010~13 quant af 2013 c/c++  (Zoom out …)
 $0 $0  $0 S$70k $ invested
 $0 $0 S$5k/Y  $0 $1k/Y cf
nonQuant job
up to USD20k/Y pretax opportunity cost
6M 2Y since barc 2Y  1Y 3Y 6Y since 1998 nominal effort
3M 4M 1Y  6M 2.5Y 4Y serious effort incl. STS
3M 2M 1M  2M 2Y 2Y spare time sacrificed(STS)
-2 -3 -6  -5 #more than py -15 -16 points invested
Barx passed some IVs OC, Bbg, Reuters 95G/OC Stirt/Mac/CVA ~18 offers job “offers”
Trex,bbg.. DRW; Nomura; Mako; Trex; Pimco analytics too many help interviews
helps my WPF xx value@algo IV deepens java nlg brain teasers; math cfd; contrarian insight into bigData/quantTrading; see y re-enter c++ other ROTI
2 more than
3 #built real
professional xp
more than invested 9 #50%+ more than invested points SCORED
 no loss? no loss -3 no loss -7 no loss net points lost

CVA c++IV 1

Q: when would you pass (in/out of functions) by ptr rather than non-const ref?
%%A: if the argument can be null, or void ptr
A: if I need to pass a double ptr
A: if I need to pass a raw array
%%A: A factory often returns by ptr or smart ptr, seldom by reference

Q: do you catch exception by value, ptr or reference?
A: by (const) reference. Catching by value risks slicing a derived exception object.

Q: iterating vector by int index vs iterator… which is faster?
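A: the textbook answer is iterator, since vec[i] needs an O(1) pointer-arithmetic operation to locate the element on each access. With optimization turned on, compilers usually generate equivalent code for both.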

Coding question https://github.com/tiger40490/repo1/blob/cpp1/cpp/array/inSituFilter_CVA.cpp

CVA==marketValue of option-to-default

Monte Carlo is the only way to estimate it…

Classic PresentValue discounts each cash flow , but ignores the possibility of non-payment.

CVA simulates more than 1000 “paths” into the future over 50 to 75 years. Each path probably has a series of (future) valuation dates. On each valuation date, there’s a prediction of the market. A prediction includes many market factors. I believe my FRM book lists 9 standard market factors in the “stress test” chapter.

Each path can be described as a predicted evolution of the entire “universe”.

On each path, a specific shock or stress can be applied.

I guess that on each valuation day, the net amount Alan owes me is predicted (and the net amount Bob owes me is predicted), known as my exposure to Alan. Multiply this exposure by the probability of Alan’s default and by the loss given default (one minus the recovery rate), and we get a kind of predicted loss. I think this is the basis of the CVA.
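A minimal sketch of that calculation, assuming the common discretised form: for each valuation date, take the average positive exposure across paths, multiply by the default probability for that period and a discount factor, then scale the total by the loss given default (1 minus recovery rate). The function name, the input layout and any numbers you feed it are hypothetical. I don't know how the real CVA system organises its inputs.

```python
def cva_estimate(paths, default_probs, discount_factors, recovery_rate):
    """Rough CVA estimate.

    paths[p][i]         : simulated net amount the counterparty owes me on path p, date i
    default_probs[i]    : probability the counterparty defaults in period i
    discount_factors[i] : discount factor for date i
    """
    n_paths = len(paths)
    total = 0.0
    for i, (pd_i, df_i) in enumerate(zip(default_probs, discount_factors)):
        # only positive exposure matters: if I owe them, their default costs me nothing
        expected_exposure = sum(max(path[i], 0.0) for path in paths) / n_paths
        total += df_i * expected_exposure * pd_i
    return (1.0 - recovery_rate) * total
```

With two paths and two dates, e.g. cva_estimate([[100.0, -50.0], [300.0, 150.0]], [0.01, 0.02], [1.0, 0.5], 0.4), the negative exposure on the first path is floored at zero before averaging.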

Most of the contracts are derivative contracts. Max Expiry is 75 years.

Even an exchange-traded asset’s valuation needs to be predicted on a given valuation date on a simulation path. That’s because the exchange-traded product could be collateral. A falling collateral value impacts the recovery amount, so this valuation affects the exposure indirectly.

SDI: elevator design #DeepakCM #70%

My friend Deepak gave me this Basic Requirements:

  • N-level building. You can assume 5 for now
  • Each level’s lift lobby has up/down buttons
  • Inside each lift there are N buttons for the N target floors
  • Any time, system can receive requests from any button

My basic design is not optimized for efficiency. The number of pending requests will stay below 20 since there are only that many buttons, so we iterate over all requests frequently.

My design effort is heavily focused on data structure — The more complex the requirements, the more I need to focus on clean, concise, sound data structure. They may not be necessary — a less optimal data structure can also work, but an optimal data structure helps us tremendously to cope with the complexity. I feel this problem is tractable once the data structures take shape.

Q: what if lift is in motion towards some target when a lift-lobby button is pressed and it happens to be serviceable?
A: Like the pencil solution to the space-pen challenge, my endless loop in main() may qualify as a simple solution to this daunting challenge. The system wakes up frequently to check for new requests. When sleeping, it ignores all inputs.

This simple design, if viable, avoids asynchronous or multi-threading complexities.  http://pubs.vmware.com/foundry1/pg/wwhelp/wwhimpl/common/html/wwhelp.htm?context=pg&file=Foundry_PG_concepts.3.30.html is a similar single-threaded design

For simplicity, I assume the new requests from all buttons show up in some buffer (or database), so we can poll it to see them. There’s no interrupt or callback.

Q: in your design, there are “targets” set by in-lift passengers vs. up/down requests (from lift lobbies) assigned by the system. How do you prioritize between the two types?
A: In general, targets are at higher priority than assignments, but if lift is already moving down towards Level 2 for an assigned request, it will move down all the way till that level.

I don’t want to spend too much time on any module since the correct emphasis/focus could be different in a real world design or design interview.

4buffers in 1 TCP connection #full-duplex

Many socket programmers may not realize there are four transmission buffers in a single TCP connection or TCP session. These buffers are set aside by the kernel when the socket is created.

Two buffers in socket AA (say in U.S.) + two buffers in socket BB (say, in Singapore)

Two receive buffers + two send buffers

At any time, there could be data in all four buffers. AA can initiate a send in the middle of receiving data because TCP is a full-duplex protocol.

Whenever one of the two sockets initiates a send, it has a duty to ensure it never overflows the receive buffer on the other end. This is the essence of flow-control. This flow-control relies on the 2 buffers involved (not the other two buffers). P 247 [[computer networking]] has details. Basically, the sender has an estimate of the remaining free space in the receive buffer so the sender never sends too many bytes. It keeps unsent data in its send buffer.

Is UDP full-duplex? My book doesn’t mention it. I think it is.

deadlock avoidance: release all locks fail-fast

In some deadlock avoidance algorithms, at runtime we need to ensure we immediately release every lock already acquired, without hesitation, so to speak. Here’s one design my friend Shanyou shared with me.

  • Tip: limit the number of “grabbing” functions, i.e. functions that explicitly grab any lock.

Suppose a grabbing function f1 acquires a lock and calls another function f2 (a red flag to concurrency designers). In such a situation, f2() will use -1 return value to indicate “must release lock”. If f2() calls a grabbing function f3(), and if f3() fails to grab, it propagates “-1” to f2 and f1.

  1. f3() could use trylock.
  2. f3() could check lock hierarchy and return -1.
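A minimal sketch of the -1 propagation, using the trylock option. The lock names and the MUST_RELEASE constant are hypothetical; the function names follow the f1/f2/f3 description above. This is my reading of Shanyou's design, not his code.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
MUST_RELEASE = -1  # return code meaning "caller must release its locks"

def f3():
    """Grabbing function: try lock_b without blocking."""
    if not lock_b.acquire(blocking=False):
        return MUST_RELEASE          # fail fast instead of waiting
    try:
        return 0                     # ... real work under both locks would go here ...
    finally:
        lock_b.release()

def f2():
    """Non-grabbing function: just propagates f3's verdict."""
    return f3()

def f1():
    """Grabbing function: holds lock_a while calling down the chain."""
    with lock_a:
        rc = f2()
    # lock_a has been released by now, whether rc is 0 or MUST_RELEASE
    return rc
```

The caller of f1() can retry later when it sees MUST_RELEASE. The key point is that f1 never sits on lock_a while waiting for lock_b.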

mgr|stress| project delay #^FTE/contractor

A contractor is the most care-free. Even as an employee, the pressure to deliver is lower than as the mgr.

As a junior VP (perhaps a system owner) you could still stay behind a shield (defend yourself): “I did my best given the limitations and constraints”. However, as mgr, you are more expected to own the task and solve those problems at a higher level of effectiveness, including negotiations with other departments.

“Results or reasons?” … is the manager’s performance review.

Recall Yang, Stirt-risk …

  • —- historical barometer levels created by project delivery pressure —-
  • GS – 10/10,  “if i quit GS I may have to quit this country; I must Not quit” Hazardous !
  • RTS – 7 dropping to 5
  • mvea – 3
  • Mac – 7. Kevin’s emphasis is design
  • Stirt – 8
  • OC – 5, largely due to fear of bonus stigma
  • 95G, Barc – 3, due to mgr pressurizing
  • Citi – 2

Q: which thread/PID drains NicBuffer→socketBuffer

Too many kernel concepts here. I will use a phrasebook format. I have also separated some independent tips into hardware interrupt handler #phrasebook

  1. Scenario 1 : A single CPU. I start my parser which creates the multicast receiver socket but no data coming. My process (PID 111) gets preempted on timer interrupt. CPU is running unrelated PID 222 when my data wash up on the NIC.
  2. Scenario 2: pid111 is running handleInput() while additional data comes in on the NIC.

Some key points about all scenarios:

  • context switching — There’s context switch to interrupt handler (i-handler). In all scenarios, the running process gets suspended to make way for the interrupt handler function. I-handler’s instruction address gets loaded into the cpu registers and this function starts “driving” the cpu. Traditionally, the handler function would use the suspended process’s existing stack.
    • After the i-handler completes, the suspended “current” process resumes by default. However, the handler may cause another pid333 to be scheduled right away [1 Chapter 4.1].
  • no pid: interrupt handler execution has no pid, though some authors say it runs on behalf of the suspended pid. I feel the suspended pid may be unrelated to the socket (as in Scenario 1, where it is pid 222), rather than the socket’s owner process pid111.
  • kernel scheduler — In Scenario 1, pid111 would not get to process the data until it gets in the “driver’s seat” again. However, the interrupt handler could trigger a rescheduling and push pid111 “to the top of the queue” so to speak. [1 Chapter 4.1]
  • top-half — drains the tiny NIC ring-buffer into main memory (presumably socket buffer) as fast as possible [2] as NIC buffer can only hold a few packets — [[linux kernel]] P 629.
  • bottom-half: (i.e. deferrable functions) includes lengthy tasks like copying packets. Deferrable functions run in interrupt context [1 Chapter 4.8], under nobody’s pid
  • sleeping: the socket owner pid 111 would be technically “sleeping” in the socket’s wait queue initially. After the data is copied into the socket receive buffer (which lives in kernel space), I think the kernel scheduler would locate pid111 in the socket’s wait queue and make pid111 the cpu-driver. This pid111 would then call read() on the socket to copy the data into its own user-space buffer.
    • wait queue — How the scheduler does it is non-trivial. See [1 Chapter]
  • burst — What if there’s a burst of multicast packets? The i-handler would hog or steal the driver’s seat and /drain/ the NIC ring-buffer as fast as possible, and populate the socket receive buffer. When the i-handler takes a break, our handleInput() would chip away at the socket buffer.
    • priority — is given to the NIC’s interrupt handler as NIC buffer is much smaller than socket buffer.
    • UDP could overrun the socket receive buffer; TCP uses transmission control to prevent it.

Q: What if the process scheduler is triggered to run (on timer interrupt) while i-handler is busy draining the NIC?
A: Well, all interrupt handlers can be interrupted, but I would doubt the process scheduler would suspend the NIC interrupt handler.

One friend said that while the i-handler runs on our single-CPU, the executing pid is 1, the kernel process. I doubt it.

[1] [[UnderstandingLinuxKernel, 3rd Edition]]

[2] https://notes.shichao.io/lkd/ch7/#top-halves-versus-bottom-halves

if thread fails b4 releasing mutex #CSY

My friend Shanyou asked:

Q: what if a thread somehow fails before releasing mutex?

I see only three scenarios:

  • If machine loses power, then releasing mutex or not makes no difference.
  • If process crashes but the mutex is in shared memory, then we are in trouble. The mutex will be seen as forever in-use. The other process can’t get this mutex. I feel this could be a practical problem, with practical solutions like reboot or process restart.
  • If process is still alive, I rely on stack unwinding.

Stack unwinding is set up by the compiler. The main situation where this compiler-generated stack unwinding can be incomplete is when an exception escapes a function declared noexcept. Then std::terminate() is called and unwinding is not guaranteed. (In such a case, the failure is your self-inflicted problem since you promised the compiler it would never throw.) I will assume we don’t have a noexcept function. Therefore, I assume stack unwinding is robust and all stack objects will be destructed.

If one of the stack objects is a std::unique_lock, then its destructor unlocks the mutex if it still owns the lock. That’s the highest reliability and reassurance I can hope for.

atomic{int} supports operator+=()

Background — on a machine with lock-free CPU…

My friend Shanyou asked me — with c++11 atomic data types, can we simply say

myAtomicInt ++; //without any mutex

A: Yes according to [[c++StdLib]]

Q: Is there some hidden CAS while-loop therein?
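A: It depends on the CPU. On x86 the increment typically compiles to a single lock-prefixed instruction (such as lock xadd) with no retry loop in software. On CPUs built around load-linked/store-conditional, the increment is a retry loop much like a CAS while-loop, because other threads could be updating the same shared mutable object concurrently on other CPU cores.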

Q: is there a CAS while-loop in a basic store/load operation?
A: I don’t think so

CHANNEL for multicast; TCP has Connection

In NYSE market data lingo, we say “multicast channel”.

  • analogy: TV channel — you can subscribe but can’t connect to it.
  • analogy: Twitter hashtag — you can follow it, but can’t connect to it.

“Multicast connectivity” is barely tolerable but not “connection”. A multicast end system joins or subscribes to a group. You can’t really “connect” to a group as there could be zero or a million different peer systems without a “ring leader” or a representative.

Even for unicast UDP, “connect” is the wrong word as UDP is connectionless.

Saying nonsense like “multicast connection” is an immediate giveaway that the speaker isn’t familiar with UDP or multicast.

updates2shared-mutable: seldom flushed2memory immediately #CSY

  • memory fence c++
  • memory barrier c++
  • c++ thread memory visibility


You can search for these keywords on Google. Hundreds of people would agree that without synchronization, a write to sharedMutableObject1 by thread A at Time 1 is not guaranteed to be visible to Thread B at Time 2.

In any aggressively multithreaded program, there are very few shared mutable objects. If there’s none by design, then all threads can operate in single-threaded mode as if there’s no other thread in existence.

In single-threaded mode (the default) compilers would Not generate machine code to always flush a write to main memory bypassing register/L1/L2/L3 caches. Such a memory barrier/fence is extremely expensive — Main memory is at least 100 times slower than register access.

I would hypothesize that by default, the most recent write (at Time 1) is saved to register only, not L1 cache, because at compile time, compiler author doesn’t know if at runtime this same thread may write to that same address! If you update this object very soon, then it’s wasteful to flush the intermediate, temporary values to L1 cache, since no other threads need to see it.

L1 cache is about 10 times slower than register.

Multi-threaded lock-free programming always assumes multiple threads access shared mutable objects.

Even a lock-free function without contention [1] requires memory barriers and is therefore slower than single-threaded mode. I would say in a low-contention context, the main performance gain of single-threaded over lock-free is data cache efficiency. Another performance gain is statement reordering.

[1] i.e. no retry needed, since other threads are unlikely to touch sharedMutableObject1 concurrently

[11]quant library in java/c# !! catching on


You once told me java can emulate the same quant lib functionality of C/C++. I asked quants in GS, MS, ML, Barcap and a few other banks. I don’t remember anyone saying their quant lib is in java. I now feel there’s no industry momentum behind such a migration. Further, I feel there’s no justification either. I’d go out on a limb and say there’s justification for sticking to C++.

A java implementation is less accessible from dotnet, python, and other scripting languages that could be making (slow) inroads into trading floors. In contrast, all major languages support a decent interface to integrate with a C library. C is the common denominator.

More importantly, the sponsors of the quant lib are business users (not only traders) and they know none of the languages but they know MSExcel. I’d say Excel integration is a must for every quant lib, otherwise traders may refuse to use it. C implementations easily integrate with MSExcel, via the Microsoft COM interface and other interfaces. C# also integrates well with Excel.

Some quant libs are used in visualization and GUI. Dotnet and WPF are a market leader in GUI.

I also feel C implementation tends to be faster, at least no slower, than java quant lib. In pre-trade real time apps, a quant lib needs to be fast. A Barcap veteran told me the most important justification for C++ in quant lib is speed/performance.

In 2018 I asked an Executive Director in MS CVA team why java is not used. He said performance is the main reason.

rvalue Object holding a resource : rather rare

I think naturally-occurring rvalue objects  rarely hold a resource.

  • literals — but these objects don’t hold any resources via a heap pointer
  • string1 + “.victor”
  • myInventoryLevel – 5000
  • myVector.push_back(Trade(12345)): there is actually a temp Trade object. Compiler will call the rvr overload of push_back(). https://github.com/tiger40490/repo1/blob/cpp1/cpp/rvr/rvrDemo_NoCtor.cpp is my investigation. My temp object actually holds a resource via a heap pointer. But this usage scenario is rare in my opinion.

However, if you have a regular nonref variable Connection myConn (“hello”), you can generate a rvr variable:

Connection && rvr2 = std::move(myConn);

By using std::move(), you promise the compiler not to rely on the value of myConn afterwards, though you may still assign to it or destroy it.



UDP recv()from 1 send()at most

P116 [[tcp/ip sockets in C]] made it very clear.

A call to recv on the receiver machine will return data from at most one send() on the sender machine.

It can be a partial message, but would be the first part. See https://stackoverflow.com/questions/13317532/receiving-a-part-of-packet-via-recvfrom-udp

I believe the entire payload of one send()/sendto() is packaged into an envelope. The kernel would never deliver two envelopes to one recv()/recvfrom() call. Therefore the receiver can only receive one envelope at a time. If the entire envelope is too large for the receiver’s buffer, then only the first part of the payload is delivered.
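A minimal loopback experiment to check the envelope behaviour, assuming Linux semantics (excess bytes of an oversized datagram are silently discarded; other platforms may raise an error instead). The function name and payloads are made up for illustration.

```python
import socket

def demo_udp_boundaries():
    """Send two datagrams, then read with a buffer smaller than the first one."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))      # port 0: let the kernel pick a free port
        receiver.settimeout(2.0)
        addr = receiver.getsockname()
        sender.sendto(b"first-message", addr)
        sender.sendto(b"second", addr)
        part = receiver.recv(5)       # buffer too small for the first envelope
        whole = receiver.recv(1024)   # next call starts at the next envelope
        return part, whole
    finally:
        sender.close()
        receiver.close()
```

The second recv() never returns the tail of the first message, and it never merges two sends into one read.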

childThr.get_id() after join()

Not sure about pthreads but here is c++11 std::thread::get_id() behavior:
“If the thread object is not joinable, the function returns a default-constructed object of member type thread::id.”
I believe after you join a childThr, that thread is no longer joinable, so get_id() will return a meaningless boilerplate value.

Solution: to use that id, you need to save it before joining
https://github.com/tiger40490/repo1/blob/cpp1/cpp/thr/takeTurn.cpp is my experiment

L1/L2/L3 latency stats@2012 processor #Martin

I said “1000 times” when a GS interviewer asked for an estimate of relative latency of main memory vs register. He said that’s about right.

Numbers below were taken from the CPU Cache Flushing Fallacy blog post by Martin Thompson, which indicates that for a particular 2012-era Intel processor, the following was observed:

  • register access = single cycle or 4 instructions per cycle
  • L1 latency = 3 cycles (3 x register)
  • L2 latency = 12 cycles (4 x L1, 48 x register)
  • L3 latency = up to 38 cycles (3 x L2, 12 x L1, 144 x register)
  • MM Latency= very roughly 200 cycles  (5 x L3, 15 x L2, 60 x L1, 720 x register) = average 65 ns on a 3 GHz CPU

Diagram is more simplified than the text, but there are many fine prints.

## optimize code for i-cache: few tips

I don’t see any ground-breaking suggestions. I think only very hot functions (confirmed by oprofile + cachegrind) require such micro-optimization.

I like the function^code based fragmentation framework on https://www.eetimes.com/document.asp?doc_id=1275472 (3 parts)

  • inline: footprint+perf can backfire. Can be classified as embedding
  • use table lookup to replace “if” ladder — minimize jumps
  • branching — refactor a lengthy-n-corner-case (not “hot”) code chunk out to a function, so 99% of the time the instruction cache (esp. the pre-fetch flavor) doesn’t load a big chunk of cold stuff.
    • this is the opposite of embedding !
  • Trim the executable footprint. Reduce code bloat due to inlining and templates?
  • loop unrolling to minimize jumps. I think this is practical and time-honored — at aggressive optimization levels some compilers actually perform loop unrolling! Programmers can do it manually.
  • Use array (anything contiguous) instead of linked list or maps to exploit d-cache + i-cache
  • https://software.intel.com/en-us/blogs/2014/11/17/split-huge-function-if-called-by-loop-for-best-utilizing-instruction-cache is a 2014 Intel paper — split huge function if it’s invoked in a loop.


session^msg^tag level concepts

Remember every session-level operation is implemented using messages.

  • Gap management? Session level
  • Seq reset? Session level, but seq reset is a msg
  • Retrans? per-Msg
  • Checksum? per-Msg
  • MsgType tag? there is one per-Msg
  • Header? per-Msg
  • order/trade life-cycle? session level
  • OMS state management? None of them …. more like application level than Session level

reinterpret_cast(zero-copy)^memcpy: raw mktData parsing

Raw market data input comes in as array of unsigned chars. I “reinterpret_cast” it to a pointer-to-TradeMsgStruct before looking up each field inside the struct.

Now I think this is the fastest solution. Zero-cost at runtime.

As an alternative, memcpy is also popular but it requires a bitwise copy. It often requires allocating a tmp variable.

## python metaprogramming features #IV++

  1. feature: module import
  2. feature: runtime code generation and introspection

These are two of the most powerful py features I have seen. I wrote in another blog that py reflection is richer than c# and much richer than java.

feature: decorator layering and with param
feature: metaclasses. IBM has an article

What’s in common among these features? We manipulate modules, classes and attributes as if they are ordinary data. In other words, we manipulate the program itself. That’s the hallmark of meta-programming IMHO. We can compare to other people’s concepts of “meta-programming” but I don’t have to abandon mine.
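A small sketch of three of these features side by side: a decorator taking a parameter, decorator layering, and a metaclass combined with runtime class generation. All names (repeat, shout, Registry, Trade) are made up for illustration.

```python
import functools

def repeat(times):
    """Decorator factory: the outer call receives the parameter."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return [func(*args, **kwargs) for _ in range(times)]
        return wrapper
    return decorator

def shout(func):
    """Plain decorator without parameter."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs).upper()
    return wrapper

@repeat(2)   # outer layer, applied last
@shout       # inner layer, applied first
def greet(name):
    return "hi " + name

class Registry(type):
    """Metaclass that records the name of every class it creates."""
    created = []

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        mcls.created.append(name)
        return cls

# Runtime class generation: class name, bases and body are ordinary data
Trade = Registry("Trade", (), {"qty": 100})
```

In each case a function or class is passed around and modified like any other object, which is what I mean by manipulating the program itself.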

linux tcp buffer^AWS tuning params

—receive buffer configuration
In general, there are two ways to control how large a TCP socket receive buffer can grow on Linux:

  1. You can set setsockopt(SO_RCVBUF) explicitly as the max receive buffer size on individual TCP/UDP sockets
  2. Or you can leave it to the operating system and allow it to auto-tune it dynamically, using the global tcp_rmem values as a hint.
  3. … both values are capped by

/proc/sys/net/core/rmem_max — is a global hard limit on all sockets (TCP/UDP). I see 256M in my system. Can you set it to 1GB? I’m not sure but it’s probably unaffected by the boolean flag below.

/proc/sys/net/ipv4/tcp_rmem — doesn’t override SO_RCVBUF. The max value on RTS system is again 256M. The receive buffer for each socket is adjusted by kernel dynamically, at runtime.

The linux “tcp” manpage explains the relationship.

Note large TCP receive buffer size is usually required for high latency, high bandwidth, high volume connections. Low latency systems should use smaller TCP buffers.

For high-volume multicast channel, you need large receive buffers to guard against data loss — UDP sender doesn’t obey flow control to prevent receiver overflow.
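A minimal sketch of option 1, requesting a receive buffer size on one socket and reading back what the kernel granted. The function name is hypothetical. On Linux the value read back is normally double the request (the kernel reserves bookkeeping space) and the request is capped by rmem_max, so don't expect the two numbers to match.

```python
import socket

def request_rcvbuf(requested_bytes):
    """Ask for a receive buffer size on a UDP socket and return the granted size."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_bytes)
        # the kernel decides the final size; read it back rather than assume
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    finally:
        sock.close()
```

Comparing request_rcvbuf(64 * 1024) against request_rcvbuf(1 << 30) is a quick way to see the rmem_max cap on a given box.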


/proc/sys/net/ipv4/tcp_window_scaling is a boolean configuration (turned on by default). 1GB is the new limit on AWS after turning on window scaling. If turned off, then the AWS value is constrained to a 16-bit integer field in the TCP header, so at most 65535.

I think this flag affects AWS and not receive buffer size.

  • if turned on, and if buffer is configured to grow beyond 64KB, then Ack can set AWS to above 65535.
  • if turned off, then we don’t (?) need a large buffer since AWS can only be 65535 or lower.
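The 1GB figure can be checked with simple arithmetic: the 16-bit window field is left-shifted by a scale factor negotiated at handshake, and the TCP window scale option allows a shift of at most 14. A tiny sketch, with a hypothetical function name:

```python
MAX_WINDOW_FIELD = 0xFFFF   # largest value of the 16-bit window field
MAX_SHIFT = 14              # largest shift count allowed by the window scale option

def advertised_window(field, shift=0):
    """Effective AWS in bytes for a given header field and scale shift."""
    if not 0 <= field <= MAX_WINDOW_FIELD:
        raise ValueError("window field must fit in 16 bits")
    if not 0 <= shift <= MAX_SHIFT:
        raise ValueError("shift count out of range")
    return field << shift
```

advertised_window(0xFFFF) is the ceiling without scaling, and advertised_window(0xFFFF, 14) is just under 2**30 bytes.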


One ISIN map to multiple symbols across exchanges

Previously, a single company could have many different ticker symbols as they varied between the dozens of individual stock markets.

Today, Daimler AG stock trades on twenty-two different stock exchanges, and is priced in five different currencies; it has the same ISIN on each (DE0007100000), though not the same ticker symbol.

–For trade identification

In this case, ISIN cannot specify a particular trade, and another identifier (typically the three- or four-letter exchange code such as the Market Identifier Code) will have to be specified in addition to the ISIN.

–For price identification

The general public would want the price for an ISIN, while traders may want the price for a ticker symbol on one (or a few) liquidity venues.

symbol ticker(esp. short ones)are recycled

Symbol ticker is typically 1 to 4 chars, though numbers are often used in Asia such as HKSE. We can call it

  • symbol
  • stock symbol
  • ticker symbol
  • symbol ticker

Symbols are sometimes reused. In the US the single-letter symbols are particularly sought after as vanity symbols. For example, since Mar 2008 Visa Inc. has used the symbol V that had previously been used by Vivendi, which had delisted.

translation lookaside buffer #stats

  • a.k.a Address Translation Cache. The TLB lets the processor very quickly convert virtual addresses to physical addresses.
  • TLB is a cache for the big, slow page table
  • A typical entry in TLB is a pair of {virtual -> physical addresses}. In contrast,
  • A typical entry in a L1 cache is mapping of {physical address -> payload}.
  • You can hit both caches!
  • Both caches sit between processor and main memory
  • each hardware system has one or more TLBs
  • TLB-miss can be handled in hardware or kernel
  • typical miss probability — 0.01% to 1%
  • typical miss latency (penalty) — 10 to 100 clock cycles to read the page table
  • typical hit latency: 0.5 to 1 clock cycle

If a TLB hit takes 1 clock cycle, a miss takes 30 clock cycles, and the miss rate is 1%, the effective memory cycle rate is an average of 1 × 0.99 + (1 + 30) × 0.01 = 1.30 i.e. 1.30 clock cycles per memory access.
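The same arithmetic as a small helper, so other hit/miss numbers can be tried. The function name is hypothetical; the formula is just the weighted average used in the sentence above.

```python
def effective_access_cycles(hit_cycles, miss_penalty_cycles, miss_rate):
    """Average cycles per memory access given TLB hit cost, miss penalty and miss rate."""
    hit_part = hit_cycles * (1 - miss_rate)
    # a miss pays the lookup cost plus the penalty of reading the page table
    miss_part = (hit_cycles + miss_penalty_cycles) * miss_rate
    return hit_part + miss_part
```

effective_access_cycles(1, 30, 0.01) reproduces the 1.30 figure. Trying the pessimistic end of the typical ranges (penalty 100, miss rate 1%) shows how quickly the average degrades.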

https://stackoverflow.com/questions/5440128/thread-context-switch-vs-process-context-switch says TLB favors thread rather than process context-switch. TLB gets flushed during process rather than thread context-switch.


dominant server-side language@WallSt: evolution

Don’t spend too much time.. Based on my limited observations,

  • As of 2007, the top dog was java.
  • The dominance is even stronger in 2018.
  • Q: how about in 10 years?
  • A: I feel java will remain #1

Look at the innovation leaders — West coast. For their (web) server side, they seem to have shifted slightly towards python, javascript, RoR

Q: Why do I consider buy-side, sell-side and other financial tech shops as a whole and why don’t I include google finance?
A: … because there’s mobility between sub-domains within, and entry barrier from outside.

Buy-side tend to use more c++; Banks usually favor java; Exchanges tend to use … both c++ and java. The latency advantage of c++ isn’t that significant to a major exchange like Nsdq.


##order-book common data structureS #array

Rebus has two levels of trees, according to Steve.

  1. For each feed there’s a separate symbol lookup tree — one would think a hashtable would be faster but given our symbol data size (a few thousands per feed) AVL tree is proven faster.
  2. A per-symbol bid-tree (and ask-tree) sorted by price
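A rough sketch of the two-level layout. Python has no built-in balanced tree, so I swap in a plain dict for the symbol lookup and a sorted list (via bisect) for each side of the book; this is not how Rebus does it. Class and field names are hypothetical, and prices are integer ticks to avoid float comparison issues.

```python
import bisect
from collections import defaultdict

class Book:
    """Per-symbol book: one sorted price list per side, plus quantity per price level."""

    def __init__(self):
        self.bid_prices = []   # ascending, so the best bid is the last element
        self.ask_prices = []   # ascending, so the best ask is the first element
        self.qty = {"B": defaultdict(int), "S": defaultdict(int)}

    def add(self, side, price, qty):
        prices = self.bid_prices if side == "B" else self.ask_prices
        if price not in self.qty[side]:
            bisect.insort(prices, price)   # keep the price levels sorted
        self.qty[side][price] += qty

    def best_bid(self):
        return self.bid_prices[-1] if self.bid_prices else None

    def best_ask(self):
        return self.ask_prices[0] if self.ask_prices else None

# level 1: symbol lookup; level 2: the per-symbol book
books = defaultdict(Book)
```

Usage: books["IBM"].add("B", 100, 5) creates the book on first touch, then books["IBM"].best_bid() reads the top of the bid side.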

sizeof(‘a’) in c^c++^java #CSY

  • ‘a’ is taken as an int object in C, so sizeof(‘a’) is sizeof(int), typically 4 !!
  • c++ takes it as a char object, so sizeof(‘a’) is 1.

See https://stackoverflow.com/questions/2172943/size-of-character-a-in-c-c

Java char is 2-byte to support 16-bit unicode! In Java, only the “byte” type is 1-byte, by definition.

c++ traditionally has no “byte” type (std::byte was added only in c++17), because “char” is the equivalent of the java “byte” type.


A common interview question. http://fixmeisters.blogspot.com/2006/04/possdupflag-or-possresend-this.html has details.

aspect | msg with PossDupFlag | msg with PossResend
handled by | FIX engine | application, to check many details. Engine can’t help.
seq # | old seq # | new seq #
triggered by | retrans request | not automatically
generated by | FIX engine | not automatically
nature | reactive action by engine | proactive decision by application

FIX.5 + FIXT.1 breaking changes

I think many systems are not yet using FIX.5 …

  1. FIXT.1 protocol [1] is a kinda subset of the FIX.4 protocol. It specifies (the traditional) FIX session maintenance over naked TCP
  2. FIX.5 protocol is a kinda subset of the FIX.4 protocol. It specifies only the application messages and not the session messages.

See https://www.fixtrading.org/standards/unsupported/fix-5-0/. Therefore,

  • FIX4 over naked TCP = FIX.5 + FIXT
  • FIX4 over non-TCP = FIX.5 over non-TCP

FIX.5 can use a transport messaging queue or web service, instead of FIXT

In FIX.5, Header now has 5 mandatory fields:

  • existing — BeginString(8), BodyLength(9), MsgType(35)
  • new — SenderCompId(49), TargetCompId(56)

Some applications also require MsgSeqNo(34) and SendingTime(52), but these are unrelated to FIX.5

Note BeginString actually looks like “8=FIXT.1.1”

[1] FIX Session layer will utilize a new version moniker of “FIXTx.y”

IV batting average#biased guesstimate; no obsession

Disclaimer… These numbers are seriously unscientific and heavily biased and subjective. They are also biased due to the sample — I probably excluded many cases.

Out of a hypothetical 720 positions applied,

stage | across java #no HFT | across c++ non-HFT | HFT
No. shortlisted based on CV | 1/3 like 240 | 1/3 – 1/4 like 180 | same as c++[1] like 180
No. passing initial screening | 2/3 like 160 | 2/3 i.e. 120 | 1/3 like 60
No. technical win [3] | 1/2 like 80 | 1/2 i.e. 60 | 20%[2] i.e. 12
No. offers | 1/2 | |

[1] HFT firms are very open and welcome everyone to apply

[2] I feel if I keep trying I will pass an HFT tech interview! but if I don’t fit in, it’s not easy to find another HFT job.

[3] proved my competence (at least to myself) i.e. can do the job, but they may not like some perceived “weaknesses”, technical or communications.

FIX over non-TCP

based on https://en.wikipedia.org/wiki/Financial_Information_eXchange …

Background — The traditional FIX transport is naked TCP. I think naked TCP is most efficient.


Some adaptations of FIX can use messaging queue or web services. I think these are built on top of TCP.

MOM would add latency.


FIX is traditionally used for trading. In contrast, (high volume) market data usually goes over multicast for efficiency. However, there are now adaptations (like FAST FIX) to send market data over multicast.

checksum[10] value is ascii

https://www.onixs.biz/fix-dictionary/4.2/tagNum_10.html asserts the value has 3 characters, always unencrypted.

In contrast, some non-FIX protocols use an int16 binary integer, which is more efficient.

However, very few protocols would use 10 bits to represent a 3-digit natural number. The only usage I can imagine is a bit-array like 24 bits used to combine 5 different fields, so bit 0 to 9 could be clientId…
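For reference, a minimal sketch of how the 3-character ascii value is produced: sum every byte of the message up to and including the SOH just before tag 10, take it modulo 256, and print it as three zero-padded digits. The function name is hypothetical.

```python
SOH = b"\x01"

def fix_checksum_field(message_without_tag10: bytes) -> bytes:
    """Build the trailing 10=nnn<SOH> field for a FIX message."""
    total = sum(message_without_tag10) % 256   # byte-wise sum, modulo 256
    # always three ascii digits, zero-padded on the left
    return b"10=" + f"{total:03d}".encode("ascii") + SOH
```

The zero padding is why the value is always exactly 3 characters, even when the sum modulo 256 is a single digit.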

SOH can be embedded in binary field value

Usually, a FIX tag/value “field” is terminated by the one-byte SOH character, but if the value is binary, then SOH can be embedded in it at the beginning or the END (resulting in two SOH in a row?)…

https://stackoverflow.com/questions/4907848/whats-the-most-efficient-way-to-parse-fix-protocol-messages-in-net confirms SOH can be embedded.

https://www.ibm.com/support/knowledgecenter/en/SSVSD8_8.3.0/com.ibm.websphere.dtx.packfix.doc/concepts/c_pack_fix_message_structur.htm says in a binary field, there must be a length
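A minimal parser sketch showing why the length field matters. It assumes the standard pairing where RawDataLength(95) precedes RawData(96); the function name and the lookup table are hypothetical, and a real parser would cover the other length/data tag pairs too.

```python
SOH = b"\x01"
# data tag -> the length tag that must precede it
LENGTH_TAG_FOR = {96: 95}   # RawData(96) is sized by RawDataLength(95)

def parse_fields(msg: bytes):
    """Split a FIX message into (tag, value) pairs, honouring length-prefixed data fields."""
    fields, lengths, pos = [], {}, 0
    while pos < len(msg):
        eq = msg.index(b"=", pos)
        tag = int(msg[pos:eq])
        start = eq + 1
        if tag in LENGTH_TAG_FOR:
            # binary value: trust the declared length, do not scan for SOH
            end = start + lengths[LENGTH_TAG_FOR[tag]]
        else:
            end = msg.index(SOH, start)
        value = msg[start:end]
        if tag in LENGTH_TAG_FOR.values():
            lengths[tag] = int(value)
        fields.append((tag, value))
        pos = end + 1   # skip the SOH that terminates this field
    return fields
```

A naive split on SOH would cut the RawData value in two whenever the payload happens to contain byte 0x01.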

learning nothing Strategic@a job@@ Normal !

Suppose after I stay on a job for 2 years, (beside lots of local system knowledge) I now only have some “familiar, un-fresh” tech topics to learn, like 2 of the following

  • Java — serialization, Eclipse, ..
  • Linux commands
  • Some domain jargons
  • Perl, or python
  • SQL
  • git
  • A bit of math # always turns me on!

… but none of the following

  1. algos
  2. low latency
  3. quant
  4. .. other hot domains but none guarantees higher income

Q: Would I lose interest and feel bored? Note 80% of my peers are in this situation. They are coping fine!
A: I think I will, but need to see the reality. Looking at my past “strategic” learning, I think many are similar to my TriTech direction — my naive preference for opamps design in my 1997 third year IA at TriTech — Nothing strategic after all.

Hoping for a job with something engaging and challenging is realistic and reasonable. Hoping for “strategic” is naive. For many years, I was driven by this TriTech motivation which inevitably made me feel I’m in the wrong job. (Only Barclays job felt “strategic” for 6 months.) Now looking back, ## past vindicative specializations shows a small number of vindicative examples. I feel that’s 30%, so most of my trySomethingNew or other specializations didn’t prove strategic.

However, Quartz is different. Learning something familiar but generic like java is still better than learning Quartz. Quartz is a killer.

JGC G1 Metaspace: phrasebook #intern

Incidentally, NIO buffer is also in native memory


CV-competition: Sg 10x tougher than U.S.

Sg is much harder, so … I better focus my CV effort on the Sg/HK/China market.

OK U.S. job market is not easy, but statistically, my CV had a reasonable hit rate (like 20% at least) because

  • contract employers don’t worry about my job hopper image
  • contract employers have quick decision making
  • some full time hiring managers are rather quick
  • age…
  • Finally, the number of jobs is so much more than Sg


G3 bitter setbacks]sg

  1. long service, moving-up attempts failed 3 times in SG. Self-esteem damaged each time due to damagedGood.
    • Next time I get into a perm job, I will not try so hard
  2. c++ visible progress was slow; c# visible progress was faster but abandoned
  3. .. minor setbacks
  4. MSFM poor ROI. See tabulation
  5. FSM and trading as a side project and new direction? Underwhelming experience… didn’t really bear fruit

Now I feel such failed attempts are part of life for everyone unless you don’t make any attempt. It’s irresponsible to avoid reviewing those unsuccessful attempts, but let’s not be too fixated on the negatives.

## any python ecosystem question@@

This topic is relevant to how I prepare for high-end python job interviews.

  1. core python QQ i.e. hard quiz questions not needed in projects? I don’t know any
    • decorators?
    • protocols like ContextManager?
  2. python ecosystem QQ? I assume there are some topics
    • pandas? never quizzed
    • python web app? never asked
    • python to launch new processes in system automation? never asked
    • python unit test tools? never asked
    • python integration with RDBMS? never asked
    • python integration with Excel, java ..? never quizzed
    • networking scripts in py? never asked. https://github.com/tiger40490/repo1/blob/py1/py/sock/tcpEchoServer.py is my useful utility

crossed orderbook mgmt: real story4IV

disappears after a while; Some days better some days worse

data quality visible to paying customers

–individual contribution?
not really, to be honest.

–what would you do differently?
Perhaps detect and set a warning flag when generating top-book token? See other post on “crossed”


  • query the exchange if query is supported – many exchanges do.
  • compare across ticker plants? Not really done in our case.
  • replay captured data for investigation
  • retrans as the main solution
  • periodic snapshot feed is supported by many exchanges, designed for late-starting subscribers. We could (though we don’t) use it to clean up our crossed orderbook
  • manual cleaning via the cleaner script, as a “2nd-last” resort
  • hot failover.. as last resort

–the cleaner script:

This “depth-cleaner” tool is essentially a script to delete crossed/locked (c/l) entries from our replicated order books. It is run by a user in response to an alert.

… The script then compares the Ask & Bid entries in the order book. If it finds a crossed or locked order, where the bid price is greater than (crossed) or equal to (locked) the ask price, it writes information about that order to a file. It then continues checking entries on both sides of the book until it finds a valid combination. Note that the entries are not checked for “staleness”. As soon as it finds non-crossed/locked Ask and Bid entries, regardless of their timestamps, it is done checking that symbol.

The script takes the entries from the crossed/locked file, and creates a CIM file containing a delete message for every order given. This CIM data is then sent to (the admin port of) order book engine to execute the deletes.
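A minimal Python sketch of the comparison step. The source does not say which side of a crossed/locked pair gets deleted, so this sketch assumes (hypothetically) that both entries at the same depth are reported, and that prices are integer ticks. The function name is made up.

```python
def find_crossed_locked(bids, asks):
    """bids: prices best-first (descending). asks: prices best-first (ascending).

    Walk both sides from the top of the book and report every pair where
    bid > ask (crossed) or bid == ask (locked). Stop at the first valid
    combination. Timestamps are ignored, i.e. no staleness check.
    """
    flagged = []
    depth = 0
    while depth < len(bids) and depth < len(asks) and bids[depth] >= asks[depth]:
        kind = "locked" if bids[depth] == asks[depth] else "crossed"
        flagged.append((kind, bids[depth], asks[depth]))
        depth += 1
    return flagged
```

The returned list plays the role of the crossed/locked file, from which the delete messages would be generated.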

Before the cleaner is invoked manually, we have a scheduled scanner for crossed books. This scanner checks every symbol once a minute. I think it uses a low-priority read-only thread.

mainstream^specialist jobs: positioning in Sg+U.S.

I feel this is all about positioning — accumulation, body-building contest…. The high-end jobs are always limited and contested.

  • I tend to see algoTrading and quantDev as high-end specialist roles.
  • I tend to see app owner or lead developer (a.k.a. architect) in mainstream domains not as “low-level specialist”, because I tend to believe “devil is in the details”. I feel better with the low-level specialist role
  • There are also many senior dev roles in mainstream (non-specialist) domains, such as market data, risk,…

In the U.S. there are more specialist type of roles, some even open to contractors.

In Singapore, I pursued HFT and quantDev specialist roles but as I told a Bbg interviewer, these attempts never worked out. The quant related roles offered no accumulation and poor market depth. Therefore, for my next Singapore job I will consider more mainstream.

FIX heartBtInt^TCP keep-alive^XDP heartbeat^xtap inactivity^MSFilter heartbeat

When either end of a FIX connection has not Sent any data for [HeartBtInt] seconds, it should send a Heartbeat message. This is the first layer of defense.

When either end of the connection has not Received any data for (HeartBtInt + “some reasonable transmission lag”) seconds, it will send a TestRequest probe message to verify the connection. This is the 2nd layer of defense. If there is still no message received after another (HeartBtInt + “some reasonable transmission lag”) seconds then the connection should be considered lost.

Heartbeats issued as a response to TestRequest must contain the TestReqID transmitted in the TestRequest. This is useful to verify that the Heartbeat is the result of the TestRequest and not the result of a HeartBtInt timeout. If a response doesn’t have the expected TestReqID, I think we shouldn’t ignore it. If everyone were to ignore it, then TestReqID would be a worthless feature added to FIX protocol.

FIX clients should reset the heartbeat interval timer after every transmitted message (not just heartbeats).
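The two layers of defense can be sketched as a small timer state machine in Python. This is an illustration under assumptions, not a FIX engine: the class name, the string "actions" and the caller-supplied clock are all hypothetical, and any inbound message is treated as clearing the probe (TestReqID matching is left out).

```python
class HeartbeatMonitor:
    """Tracks send/receive silence for one FIX session. Times are in seconds."""

    def __init__(self, heart_bt_int, lag, now):
        self.heart_bt_int = heart_bt_int
        self.lag = lag                    # "some reasonable transmission lag"
        self.last_sent = now
        self.last_received = now
        self.probe_sent_at = None         # set while a TestRequest is outstanding
        self.probe_count = 0

    def on_sent(self, now):
        self.last_sent = now              # every transmitted message resets the timer

    def on_received(self, now):
        self.last_received = now
        self.probe_sent_at = None

    def poll(self, now):
        """Return the action due at time `now`, or None."""
        limit = self.heart_bt_int + self.lag
        if self.probe_sent_at is not None:
            if now - self.probe_sent_at >= limit:
                return "DISCONNECT"       # probe went unanswered
        elif now - self.last_received >= limit:
            self.probe_sent_at = now      # 2nd layer: probe the peer
            self.probe_count += 1
            self.last_sent = now
            return "TEST_REQUEST:%d" % self.probe_count
        if now - self.last_sent >= self.heart_bt_int:
            self.last_sent = now          # 1st layer: we have been silent too long
            return "HEARTBEAT"
        return None
```

Passing the clock in as an argument keeps the logic testable without real sleeps.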

TCP keep-alive is an optional, classic feature.

NYSE XDP protocol uses heartbeat messages in both multicast and TCP request server. The TCP heartbeat requires client response, similar to FIX probe message.

Xtap also has an “inactivity timeout”. Every incoming exchange message (including heartbeats) is recognized as “activity”. 60 sec of Inactivity triggers an alert in the log, the log watchdog…

MS Filter framework supports a server-initiated heartbeat, to inform clients that “I’m alive”.  An optional feature — heartbeat msg can carry an expiration value a.k.a dynamic frequency value like “next heartbeat will arrive in X seconds”.

#112=testRequest id. There can be many test requests, each with an id.

c++condVar 2 usages #timedWait

poll()as timer]real time C : industrial-strength #RTS is somewhat similar.

http://www.stroustrup.com/C++11FAQ.html#std-condition singles out two distinct usages:

1) notification
2) timed wait — often forgotten

https://en.cppreference.com/w/cpp/thread/condition_variable/wait_for shows std::condition_variable::wait_for() takes a std::chrono::duration parameter, which can express nanosec precision. The actual wake-up accuracy still depends on the OS scheduler.

Note java Object.wait(long, int) also accepts a nanosec argument.

std::condition_variable::wait_until() can be useful too, featured in my proposal RTS pbflow msg+time files #wait_until
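Both usages can be shown with Python's threading.Condition, used here only as a stand-in for std::condition_variable. The Gate class and its method names are hypothetical.

```python
import threading

class Gate:
    """A flag guarded by a condition variable."""

    def __init__(self):
        self._cond = threading.Condition()
        self._ready = False

    def open(self):
        # Usage 1: notification
        with self._cond:
            self._ready = True
            self._cond.notify_all()

    def wait_open(self, timeout_sec):
        # Usage 2: timed wait. Returns False if the timeout expires first.
        with self._cond:
            return self._cond.wait_for(lambda: self._ready, timeout=timeout_sec)
```

If nobody ever calls open(), wait_open() simply acts as a timer, which is the often forgotten usage.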

m_activity IV story@investigation skill #RTS

symptom — array filled up beyond limit of 1024 .. Undefined Behavior, often crashing entire process, but no guarantees.

This “m_activity” array is a process-wide singleton, holding ALL active tcp/udp socket descriptors (each an int id). Every time we close a socket we are supposed to remove its id from the array, and shift all “upper” elements down.

Sometimes a connection can drop unexpectedly.

What’s this array for? We iterate this array frequently for timer events. We don’t but could use select() to monitor a bunch of sockets.

I found that when we reconnect after an unexpected disruption, we were not following a proper sequence
* check if connected
* if disconnected, then remove the socket id from the array
* connect
* upon success, append the new socket id to the array

Due to this bug, in an unstable period, usually at start of day, we could drop connection and reconnect many times, and fill up this array within the first hour (often within minutes).
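The proper sequence can be sketched in Python. This is not the production code: ActivityTable, reconnect() and the two callbacks are hypothetical stand-ins for the m_activity array and the real socket calls.

```python
class ActivityTable:
    """Stand-in for the m_activity array: a bounded list of live socket ids."""
    LIMIT = 1024

    def __init__(self):
        self.ids = []

    def add(self, sock_id):
        if len(self.ids) >= self.LIMIT:
            raise OverflowError("activity table full")   # fail loudly, no UB
        self.ids.append(sock_id)

    def remove(self, sock_id):
        if sock_id in self.ids:
            self.ids.remove(sock_id)    # also shifts the "upper" elements down

def reconnect(table, old_id, is_connected, connect):
    """is_connected(id) -> bool and connect() -> new id or None are supplied by the caller."""
    if is_connected(old_id):            # 1. check if connected
        return old_id
    table.remove(old_id)                # 2. drop the stale id first
    new_id = connect()                  # 3. connect
    if new_id is not None:
        table.add(new_id)               # 4. register only upon success
    return new_id
```

With step 2 in place, thousands of reconnects leave exactly one entry in the table.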

Hard to reproduce.

HFT mktData redistribution via MOM #real world

Several low-latency practitioners say MOM is unwelcome due to added latency:

  1. The HSBC hiring manager Brian R was the first to point out to me that MOM adds latency. Their goal is to get the raw (market) data from producer to consumer as quickly as possible, with minimum hops in between.
  2. 29West documentation echoes “Instead of implementing special messaging servers and daemons to receive and re-transmit messages, Ultra Messaging routes messages primarily with the network infrastructure at wire speed. Placing little or nothing in between the sender and receiver is an important and unique design principle of Ultra Messaging.” However, the UM system itself is an additional hop, right? Contrast a design where sender directly sends to receiver via multicast.
  3. Then I found that the RTS systems (not ultra-low-latency) have no middle-ware between feed parser and order book engine (named Rebus).

However, HFT doesn’t always avoid MOM.

  • P143 [[all about HFT]] published 2010 says an HFT such as Citadel often subscribes to both individual stock exchanges and CTS/CQS [1], and multicasts the market data for internal components of the HFT platform. This design has additional buffers inherently. The first layer receives raw external data via a socket buffer. The 2nd layer components would receive the multicast data via their socket buffers.
  • SCB eFX system is very competitive in latency, with Solarflare NIC + kernel bypass etc. They still use Solace MOM!
  • Lehman’s market data is re-distributed over tibco RV, in FIX format.
  • A major hedge fund was targeting sub-10 microsec latency, decided Solace was unacceptable and chose Aeron instead.

[1] one key justification to subscribe redundant feeds — CTS/CQS may deliver a tick message faster than direct feed!

-XX:CompileThreshold to control JIT priming

## how might jvm surpass c++]latency #MS #priming has 2 other tips from jvm trading engine veterans

https://www.theserverside.com/tip/Improving-Java-performance-by-minimizing-Virtual-Machine-JVM-latency is a 2014 short piece focused on q[ -XX:CompileThreshold ] — sets the number of method invocations before Hotspot will compile a method to native code.

The -server VM defaults to 10,000 and -client defaults to 1500.

Smaller numbers reduce priming time, but very low numbers would mean that the server starts considerably slower because of the time taken by the JIT to compile too many methods (which may not be used that often after all).

%%churn OK, accu bad] quantDev: Sg+U.S.

My past experiences are underwhelming. I thought that once I become experienced and proven in quant dev domain, things will be easier and I could move from one job to another. Wrong!

  • Barclays? helped me get into OC since OC interviewers are interested in how things are done in Barclays. Didn’t really help me go anywhere else
  • Stirt? helped me a bit with Mac interview
  • MSFM? didn’t help me get anywhere, partly because I didn’t try.

OC/Stirt/Mac gave me no insight, no breakthrough in my understanding, no thick->thin->thick

The number of quant dev positions is far smaller than in market data!

CRTP,ADL,thread_local to replace old-school

  • thread_local variable to replace member data .. q[static thread_local ] in %%production code
    • eliminates pollution
  • ADL is often chosen to replace member operators and methods .. ADL #namespace
    • reduces coupling
  • CRTP is often chosen to replace runtime binding (dynamic dispatch) of virtual function calls
    • Template only
    • shaves a few clock cycles in HFT

I think all three are advanced.

killing a stuck thread: cancellation points #CSY

CSY shared this interview question:

Q: once you know one of many threads is stuck in a production process, what can you do? Can you kill a single thread?
A: there will Not be a standard construct provided by OS or thread library because killing a thread is inherently unsafe.. Look at java Thread.stop()
A: yes if I have a builtin kill-hook in the binary

https://www.thoughtspot.com/codex/threadstacks-library-inspect-stacktraces-live-c-processes describes a readonly custom hook. It is conceivable to add a kill feature —

  • Each thread runs a main loop to check an exit-condition periodically.
  • This exit-condition would be similar to pthreads “cancellation points”

https://stackoverflow.com/questions/10961714/how-to-properly-stop-the-thread-in-java shows two common kill hooks — interrupt and Boolean flag
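A minimal Python sketch of the Boolean-flag style kill hook, assuming the thread's main loop is written cooperatively. The Worker class and request_stop() are hypothetical names.

```python
import threading

class Worker(threading.Thread):
    """A thread whose main loop checks an exit-condition periodically."""

    def __init__(self):
        super().__init__(daemon=True)
        self._stop_requested = threading.Event()
        self.iterations = 0

    def run(self):
        while not self._stop_requested.is_set():   # the "cancellation point"
            self.iterations += 1                   # one unit of real work goes here
            self._stop_requested.wait(0.01)        # sleep, but wake early on stop

    def request_stop(self):
        """The kill hook: asks this one thread to exit at its next check."""
        self._stop_requested.set()
```

The limitation is the one implied above: a thread that is truly stuck (deadlocked, or blocked in a call that never returns) never reaches the check, so the hook only helps a loop that is still turning.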

Note java Thread.stop() is deprecated precisely because it is inherently unsafe.


[20]hedge fund AUM ranking: confusing #BW,Ren,AQR

Note AUM fluctuates year by year, as hot money flows between hedge funds..

— Based on https://en.wikipedia.org/wiki/List_of_hedge_funds (https://www.investopedia.com/articles/personal-finance/011515/worlds-top-10-hedge-fund-firms.asp has a similar ranking.)

Many hedge fund managers also manage public funds and offer non-hedge fund strategies. To compare two hedge funds’ AUM, one can use

  • AUM in private funds — world top-10 are typically 30-70b in size. Bridgewater (an outlier, along with Renaissance) is #1 managing $132b
    • Man group ($60b+) is biggest in Europe in this respect
    • Citadel ($30b+) is biggest Chicago manager in this respect
  • AUM in public mutual funds following hedge fund strategies — AQR is #1 managing $172b including $60b in private funds.
  • AUM across all type of funds — Blackrock is #1 managing $6400b

https://hedgelists.com/top-100-largest-us-hedge-funds-2018/ uses different criteria and includes ibanks’ asset-management arms.

mvea phone IV c++

Q: any experience with FIX?
Q: what percentage of your time is on support vs development?
Q: are you involved in requirement gathering and analysis?
Q: in your project, where do you use python vs c++?
Q: describe the reliability/resilience features in your NYSE system
%%A: retransmission; hot standby; database dump/restore;
%%A: after a mid-day restart, parser would request replay or snapshot to get up to date with the live feed

Q: What’s “pure virtual” vs virtual?
Q: what’s polymorphism
Q: why do you use smart pointers?
Q: what c++11 features did you use?
Q: what’s the “auto” keyword?
Q: j4 stored proc? See https://bintanvictor.wordpress.com/2007/08/01/j4stored-procedures-a-sybase-perspective/
Q: std::vector vs std::list
%%A: list offers efficient splice, insert/delete in the middle and front. Never reallocates.

Up to 700,000 orders received. Half of them are latency sensitive. System validates each order and generates a “principal order” to be sent out to all the stock exchanges in the U.S. (and sometimes Americas), in FIX. These principal trades provide perfect hedging to MS.

Each of the 700,000 orders looks just like a regular cash equity order, except for a flag to indicate it’s an OTC swap.

5 c++ IV topics seldom asked on java

Blogging again.. Comments welcome.

1) big-6 components of any class, namely constructor, destructor, copy-constructor, move-constructor, assignment operator, move-assignment operator — Java/c# has only one namely constructor
** Destructor should never throw … interviewers often ask why
** All of the others can throw, so how to manage?
** non-virtual destructor

2) references (r-value or l-value) vs pointers (and double pointers) — no such low level constructs in java/c#
** pointer can point to heap or stack. In java all "pointers" are about heap

3) memory management including operator-new, malloc, double free, placement-new, memory leak prevention, dangling pointer

4) socket API — is a C not c++ api

5) template meta programming

Some minor topics:
) smart pointers — lots of tricky questions.

) virtual functions, vptr, pure virtual,
) linux system library functions like fork(), malloc(), free(), write()
) const-correct

) multiple inheritance, virtual inheritance

IP4 fragmentation+reassembly #MTU,offset

I consider this a “halo” knowledge pearl because it is part of an essential everyday service. We can easily find an opportunity to inject it into an IV.

A Trex interviewer said something questionable. I said fragmentation is done at IP layer and he said yes but not reassembly. He was wrong. See P329 [[Computer Networking]]

I was talking about IP layer breaking up, say, a 4KB packet (TCP or UDP packet) into three IP-fragments no bigger than 1500B [1]. The reassembly task is to put all 3 fragments back together in sequence (and detect missing fragments) and hand it over to TCP or UDP.

This reassembly is done in IP layer. IP4 uses an “offset” number in each fragment to identify the sequencing and to detect missing fragments. Every fragment except the last has the More-Fragments flag set, so the fragment with the highest offset is identifiable as the last fragment of a given /logical/ packet.

Therefore, IP4 detects and will never deliver partial packets to UDP/TCP (P328 [[computer networking]]), even though IP is considered an unreliable service. IP4 can detect missing/incomplete IP-datagram and will refuse to “release” it to upper layer (i.e. UDP/TCP).

  • If TCP doesn’t get one “segment” (carried in an IP-datagram), the sender will retransmit it after missing the Ack
  • UDP does no retransmission
  • IP4 does no retransmission
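A simplified Python model of fragmentation and reassembly, assuming a 1500B MTU minus a 20B IP header, i.e. 1480B of payload per fragment. Real IP4 stores the offset in 8-byte units and matches fragments by an identification field; here the offset is in plain bytes and a single datagram is assumed. Function names are hypothetical.

```python
def fragment(payload, max_chunk):
    """Split payload into (offset, more_fragments, chunk) tuples."""
    frags = []
    for off in range(0, len(payload), max_chunk):
        chunk = payload[off:off + max_chunk]
        more = off + len(chunk) < len(payload)   # cleared only on the last fragment
        frags.append((off, more, chunk))
    return frags

def reassemble(frags):
    """Return the full payload, or None if any fragment is missing."""
    expected = 0
    out = bytearray()
    saw_last = False
    for off, more, chunk in sorted(frags, key=lambda f: f[0]):
        if off != expected:
            return None              # gap in the offsets: a fragment is missing
        out += chunk
        expected += len(chunk)
        saw_last = not more
    return bytes(out) if saw_last else None   # never release a partial packet
```

A 4000-byte payload yields three fragments at offsets 0, 1480 and 2960, and dropping any one of them makes reassemble() return None.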

[1] MTU for some hardware is lower than 1500 Bytes …

rvr coding experiments=tricky4everyone

I think every candidate faces the same challenge, so each person’s understanding would be patchy in some area.

Each candidate tries to identify and internalize a small number of “fundamental” principles in this subject, and hope the fundamentals would help connect the dots in a logical, natural fashion, but I think this is hard. There are too many surprises, too many unnatural “phenomena”. Therefore, I can only hope to connect a few dots, while the other dots remain scattered and hard to remember.

The best way to clear up the confusion and doubts, and deepen our understanding is through coding experiments, but in this domain, I found it very tricky to write code to confirm my understanding of rvr or move().

Many of the relevant language rules are too implicit, so I can’t easily insert probing prints. Some of those language rules are related to compiler’s function-overload resolution.

Therefore, my standard experiment techniques are often inapplicable or ineffective.

Therefore, I now hold lower expectation of my eventual understanding of this domain. I consider further t-investment low-yielding in terms of ROTI. I should spend my time on other domains.

break out of N layers of loop

–Most flexible solution — use a wrapper function to hold the inner n layers. Use return to break out of N layers.

As a bonus, you can also return a value like “break-out with a value”

If there’s another layer outside those N layers and you are half way through that iteration and want to keep going, then in that loop call the function
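A minimal Python sketch of the wrapper-function technique, with two inner layers and one outer layer. The function names and the grid data are hypothetical.

```python
def first_match(grid, target):
    """Wrapper function holding the inner 2 layers of loop."""
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value == target:
                return (r, c)      # one return breaks out of both layers, with a value
    return None                    # both loops ran to completion

def scan_all(grids, target):
    """Outer layer: calls the wrapper, then keeps going with the next iteration."""
    return [first_match(grid, target) for grid in grids]
```

For example, scan_all([[[1, 2], [3, 4]], [[5]]], 3) visits both grids even though the search inside the first grid broke out early.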

–In some coding test, it’s more succinct to say

if some_break_condition: print_result(); sys.exit()

For the semicolon usage, See https://stackoverflow.com/questions/8236380/why-is-semicolon-allowed-in-this-python-snippet

socket^swing: separate(specialized skill)from core lang

  • I have always believed swing is a distinct skill from core java. A regular core Java or jxee guy needs a few years experience to become a swing veteran.
  • Now I feel socket programming is similarly a distinct skill from core C/c++

In both cases, since the core language knowledge won’t extend to this specialized domain, you need to invest personal time outside work hours .. look at CSY. That’s why we need to be selective which domain.

Socket domain has much better longevity (shelf-life)  than swing!

learn new tech for IV(!!GTD): learn-on-the-job is far from enough

Example — you programmed java for 6+ months, but you scored below 50% on those (basic) java knowledge questions I asked you in skype chat. You only know what to study when you attend interviews. Without interviews, you won’t encounter those topics in your projects.

Example — I used SQL for at least 3 years before I joined Goldman Sachs. Until then I used no outer join no self-join no HAVING clause, no CASE, no correlated sub-query, no index tweaking. These topics were lightly used in Goldman but needed in interviews. So without interviews, I wouldn’t know to pay attention to these topics.

Example — I programmed tcp sockets many times. The socket interview questions I got from 2010 to 2016 were fairly basic. When I came to ICE I looked a bit deeper into our socket codebase but didn’t learn anything in particular. Then my interviews started showing me the direction. Among other things, interviewers look for in-depth understanding of

  • Blocking/non-blocking
  • Fast/slow receivers
  • Buffer overflow
  • Reliability
  • Ack

How the hell can we figure out these are the high-value topics in TCP without interviews? I would say No Way even if I spend 2 years on this job.

[18] IV (!! GTD)body-build`logbook by skill

I’d like to keep the table columns simple.

muscle | delta | when | duration | intensity | (remark)
2D coding Q | 3->5 | 2018 | 2D | 3 |
perm/combo | 5->8 | 2017 | 4D | 3 |
string/array | 6->7 | | | 2 |
backtracking | 1->6 | 2018 | 2D | 3 |
tail recursion | 0->5 | 2017 | x hours | 1 |
big O analysis | 7->8 | 2018 | | 1 |
sorting algo nlg | 7->8 | 2018 | x hours | 1 |
advanced recursion | 3->5 | 2018 | x Days | 3 |
whiteboard best practice | 6->7 | 2018 | 2 days | 3 |
shared mem | 2->4 | 2018 | 2 H | 2 | coding experiment
c++ for coding Q | 3->8 | 2017 | 10-20 D | 2 | bbg
socket | 3->6 | 17-18 | x days | 2 | thanks to IV, not project
rvr/mv/forward | 2->5 | 16-18 | x days | 2 |
noexcept | 0->5 | 2018 | 1 H | 1 | Trex IV
other c++11 | 2->4 | | | 1 |
py basics 4 coding test | 3->5 | | | 2 |

##[18]realistic2-10Y career plann`guidelines

Background: not easy to have a solid plan that survives more than 3Y. Instead of a detailed plan, I will try to manage using a few guidelines.

  • –top 3 “guidelines” [1]
  • respect/appreciation/appraisal(esp. by manager) — PIP/stigma/trauma/damagedGood. Let’s accept: may not be easy to get
  • Singapore — much fewer choices. Better consider market-depth^elite domain
  • ——– secondary:
  • Expertise accu or sustained focus — holy grail
  • family time — how2get more family time #a top3 priority4Sg job
  • trySomethingNew — may/not be justifiable
    • stagnation — could be the norm
    • engaging — keep yourself engaged, challenged, learning, despite the stagnation
  • interviews — Let’s accept : extremely important to me but much harder in Singapore. Even in the U.S. I may need to cut down.
  • distractions — Let’s accept
  • FOLB Peer pressures — and slow-track… Let’s accept.
  • Entry-barrier — could be too high for me in the popular domains like algo trading
  • non-lead dev role — Let’s embrace. Don’t feel you must move out or move up. Hands-on coding is gr8 for me. Feel good about it
  • Shrinking Choices — many employers implicitly prefer younger programmers
  • Entry-barrier — could be too low for some young guys — the popular domains will have many young guys breaking in
  • Churn — Avoid
  • personal time — short commute, flexible time, low workload, freedom to blog]office… is proving to be so addictive that I have forgotten the other guidelines.

[1] I didn’t say “priorities”

long term planning can be demoralizing

My father often tells me I plan ahead too much…

Q: where will I be, what job will I have 5 years from now?

Such questions can be demoralizing and sometimes can dampen a precious spirit of optimism. I sometimes perform better by focusing on here and now.

I think the reality may be quite bland and uninspiring — same job, with declining income, not much “offensive” to mount …

tech screening passed(but still rejected): pat on your own back

Always remember that before you invested months of serious effort, you couldn’t pass the tech screening.

One of my fundamental principles — focus on tech screening and don’t worry about offer. Once I pass technical screening, it becomes a beauty contest or personality match. If interviewers don’t like a competent candidate for age, face, language, lackOfHumor, communication style (often related to culture, nationality and upbringing) …, then I will grin and say never mind. Sooner or later someone will like my personality.

They may say candidate is cocky, opinionated … There’s no right or wrong here. Another interviewer may not feel that way.

The average programmer doesn’t have an off-putting personality, definitely not in an interview, so for every 3 interviewers who don’t like him, there will be some interviewer out there who likes him.



  1. fads — vaguely I feel these are fads.
  2. salary — (Compare to financial IT) absolute profit created by data science is small but headcount is high ==> most practitioners are not well-paid. Only buy-side data science stands out
  3. volatile — I see data science too volatile and churning, like javascript, GUI and c#.
  4. shrink — I see traditional derivative-pricing domain shrinking.
  5. entry barrier — quant domain requires huge investment but may not reward me financially
  6. value — I am suspicious of the economic value they claim to create.

max palindrome substring

https://leetcode.com/problems/longest-palindromic-substring/ seems to be the same problem.

Deepak CM received this problem in a real IV.

https://github.com/tiger40490/repo1/blob/py1/py/str/longestPalindrome.py is one solution, not O(N) at all.

— linear time solutions are on wikipedia, but probably not so intuitive so I give up.

— my simple O(NN) solution 1. first {O(N)} identify each “core” which is defined as

  • “ABA”
  • at least 2 consecutive copies of the same char, like AA

Then {O(N)} for each core, scan both ways in lock steps until we see a mismatch. Then we know the length of this palindrome.
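A minimal Python sketch of this O(NN) approach. It is not the code in the repo linked above. As a simplification, every position is tried as a potential core (odd-length and even-length), instead of first filtering for real “ABA” or “AA” cores.

```python
def longest_palindrome(s):
    """Return the first longest palindromic substring of s."""
    best = s[:1]
    for center in range(len(s)):
        # (center, center) grows an odd-length core, (center, center + 1) an even one
        for lo, hi in ((center, center), (center, center + 1)):
            # scan both ways in lock steps until we see a mismatch
            while lo >= 0 and hi < len(s) and s[lo] == s[hi]:
                lo -= 1
                hi += 1
            # the palindrome is s[lo+1:hi], of length hi - lo - 1
            if hi - lo - 1 > len(best):
                best = s[lo + 1:hi]
    return best
```

There are O(N) centers and each scan is O(N), hence O(NN) overall with O(1) extra space.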

https://leetcode.com/problems/longest-palindromic-substring/solution/ shows a O(NN) DP solution

— my one-scan (original) idea 2, but now I feel unsure.

We stop at every character on our forward scan. When we encounter any seed, we need to keep growing it, as one of these seeds will surely grow into the longest palindrome. However, how do we simultaneously grow so many seeds? We won’t due to efficiency.

Instead, I grow the earliest (oldest) seed only. Any seed encountered afterwards will be shorter, that is until the oldest seed stops growing and gets archived. After the archiving, I start a “manhunt”: I look at the next oldest [1] seed and determine if it can grow to the current position. If it can’t then it is surely inferior to the just-archived oldest. If it can, then we end the manhunt right there and keep scanning forward.

Guaranteed to find it: every seed is queued. The only way for a seed to get archived is if it stops growing i.e. we have computed its full length

[1] FIFO container is needed

One of the worst test cases is a long black/white string containing only length-1 palindromes. My algo would archive many short seeds… and achieves O(N)

##functions(outside big6)using either rvr param or move()

Q: Any function(outside big6) featuring an rvr param?
%%A: Such functions are rare. I don’t know any.
AA: [[effModernC++]] has a few functions taking rvr param, but fairly contrived as I remember. See P170, 173
AA: P544 [[c++primer]] says class methods could use rvr param
* eg: push_back()

Q: any function (outside big6) using std::move?

  • [[effModernC++]] has a few functions such as P170, 174
  • P544 [[c++primer]] says rarely needed
  • [[josuttis]] p20



  • t_c++pattern — must be a named pattern
  • t_c++idiom — must be well-established small-scale
  • t_ECT and t_c++idiom are mutually exclusive
  • t_tecniq — higher than syntaxTips, less selective than t_c++idiom or t_c++pattern but Should NOT be dumping ground
  • t_implBestPractice is like gentmp

given int array,find triplets: x divides y;y divides z #Trex

Within a sequence of unique natural numbers, count the occurrence of so-called triplets of (x,y,z) where x divides y and y divides z.

I feel this is not a classic, but the technique is somewhat reusable.

https://github.com/tiger40490/repo1/blob/py1/py/array/tripletsInPreSorted_Trex.py is the improved solution after lots of hints.

— a useful technique:

Construct each {factor, multiple} pair and save it in a collection. Any new pair constructed will be checked against all existing pairs. The “check” is the trick. My 25 Apr 2018 commit shows the most efficient “check” I can think of.

list of pairs -> list of triplet -> list of quadruplet -> list of quintuplet … Build up pairs to construct triplets. Use triplets to construct quadruplets. Use quadruplets to construct quintuplets.

The pairs collection doesn’t need to be fully built before we start constructing triplets.
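A minimal O(NN) Python sketch of the build-up idea, not the code in the repo linked above. It assumes a triplet is any three distinct elements with x < y < z regardless of their position in the input, so the input is sorted first. For each element it keeps only a count of the pairs ending there, rather than the pairs themselves.

```python
def count_triplets(nums):
    """Count (x, y, z) with x dividing y and y dividing z, for unique naturals."""
    a = sorted(nums)
    pairs_ending = [0] * len(a)   # pairs_ending[j] = number of x in a with x | a[j]
    triplets = 0
    for j, z in enumerate(a):
        for i in range(j):
            if z % a[i] == 0:
                pairs_ending[j] += 1          # new pair (a[i], z)
                triplets += pairs_ending[i]   # every pair (x, a[i]) extends to (x, a[i], z)
    return triplets
```

The triplets are counted while the pairs are still being collected, and the same extension step would turn triplet counts into quadruplet counts.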

convert-to-0 using4transformations #Trex

Q: given a natural number, write a function to "transform" it to zero in a minimum number of transformations. Four transformations allowed:
A) add 1
B) subtract 1
C) divide by 2 if divisible
D) divide by 4 if divisible

My own idea:

1) attempt D for as long as possible. Repeat for C. Now we have an odd number K
2) if K can be written as 4N-1, then do A, otherwise B.
3) go back to step 1
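The three steps can be sketched in Python as below. This is not the code in the repo linked underneath. Optimality of the greedy rule is not proven here. I only checked it by hand against exhaustive search for small inputs.

```python
def steps_to_zero(n):
    """Count transformations used by the greedy rule to bring natural number n to 0."""
    steps = 0
    while n > 0:
        if n % 4 == 0:
            n //= 4        # D, attempted first
        elif n % 2 == 0:
            n //= 2        # C (the result is odd, since n was not divisible by 4)
        elif n % 4 == 3:
            n += 1         # A, because the odd number has the form 4N-1
        else:
            n -= 1         # B, for odd numbers of the form 4N+1 (including 1)
        steps += 1
    return steps
```

For example 7 goes 7, 8, 2, 1, 0 in four transformations.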

https://github.com/tiger40490/repo1/blob/py1/py/trivial/4conversion_ez.py is my code

find median@2sorted arrays #Trex untested

https://leetcode.com/problems/median-of-two-sorted-arrays/description/ is similar except X and Y can be unequal length. My solution solves the harder, generalized problem.

This “coding” question is really math problem. Once you work out the math techniques, the coding is simple.

Designate arr1 as the shorter array. compare med(arr1) vs med(arr2)

Suppose the former is lower. I can discard the lower half of arr1 (s items). Can I discard the highest s items in arr2? I think so because the upper half of arr2 cannot have that median element, so any subset of it can be discarded.

Repeat until arr1 is completely discarded or left with a single element .. might be the final median. Now the answer is close to the med of the remaining arr2.

–For the equal-length problem, My own idea on the spot — find the median of X and median of Y. If med(X) < med(Y) then discard the lower portion of X i.e. the “XB group”, and higher portion of Y (“YA group”). Then repeat.

  • Note len(XB) == len(YA) == min(len(X), len(Y))/2 := K. So every iteration would shrink the shorter array by half (i.e. K), and shrink the longer array by K. K would drop in value in next iteration.
  • loop exit — When the shorter of the two (say it’s X) shrinks to length 1, we are lucky — find the numbers around median(Y) and adjust the answer based on X[0].

Insight — Why can’t the final “winner”be somewhere in XB group? Because XA + YA already constitute half the population, and all of them are higher.

I always like concrete examples. So Suppose there are 512 items in the lower portion “XB group”, and the higher portion “XA” has 512 items. Suppose there are 128 items each in YB and YA groups. So in this iteration, we discard YA and the lowest 128 items in XB.

Definition of lower portion —
* all lower items up to but not including med(X) If len(X) is odd
* exactly the lower half of X if len(X) is even
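A Python sketch of the discard idea for unequal lengths. It is a conservative variant, not my original untested solution: each round discards k = (len-1)//2 items, slightly fewer than the “lower portion” defined above for even lengths, and it stops once the shorter array is down to 2 items. The function name and the final small-window merge are assumptions of this sketch.

```python
def median_two_sorted(a, b):
    """Median of the union of two sorted lists, by discarding equal counts from both ends."""
    x, y = (a, b) if len(a) <= len(b) else (b, a)      # x is the shorter array
    if not y:
        raise ValueError("both arrays are empty")
    x_lo, x_hi, y_lo, y_hi = 0, len(x), 0, len(y)      # live windows
    while x_hi - x_lo > 2:
        k = (x_hi - x_lo - 1) // 2                     # items below med(x)
        y_mid = y_lo + (y_hi - y_lo) // 2
        if x[x_lo + k] <= y[y_mid]:
            x_lo += k                                  # drop the lowest k of x
            y_hi -= k                                  # and the highest k of y
        else:
            x_hi -= k                                  # drop the highest k of x
            y_lo += k                                  # and the lowest k of y
    m = x_hi - x_lo
    total = m + (y_hi - y_lo)
    # With at most 2 items left in x, only the middle few items of y can matter.
    t = max(0, (total - 1) // 2 - m)
    rest = sorted(list(x[x_lo:x_hi]) + list(y[y_lo + t:y_hi - t]))
    return (rest[(len(rest) - 1) // 2] + rest[len(rest) // 2]) / 2
```

Each round removes the same number of items from below and from above the overall median, so the median of what remains is unchanged.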

down-cast a reference #idiom

ARM P69 says down-cast a reference is fairly common. I have never seen it.

Q: Why not use ptr?
%%A: I guess pointer can be null so the receiver function must face the risk of a null ptr.
%%A: 99% of references I have seen in my projects are function parameters, so references are extremely popular and proven in this use case. If you receive a ref-to-base, you can down cast it.

See post on new-and-dynamic_cast-exceptions
see also boost polymorphic_cast

favor std::begin(arrayOrContainer)

https://stackoverflow.com/questions/7593086/why-use-non-member-begin-and-end-functions-in-c11 explains some important details.

Q: So how do we choose between

  • this free global function
  • the container member function cont::begin() / end()?

%%A: Basically, I would always use std::begin() instead of cont.begin(), esp. in template code, where the argument may be a raw array that has no member begin().
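
A small sketch of why the free function helps in a template. sumOf() is a made-up helper. The same body accepts a raw array and a std::vector, which cont.begin() could not do.

```cpp
#include <iterator>   // std::begin, std::end
#include <numeric>    // std::accumulate
#include <vector>

// Works for raw arrays and for containers alike.
template <typename Range>
long sumOf(const Range& r) {
    using std::begin;   // also lets ADL find begin()/end() of user-defined types
    using std::end;
    return std::accumulate(begin(r), end(r), 0L);
}
```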

Ack in tcp # phrasebook

Ack — returned by receiver to original sender … on every segment. See P241 [[computerNetworking]]

byte-level — TCP seq number is at byte level and it jumps. Ack is such a seq number.

expected — Ack number is the next seq number expected

proactive Ack — Never. Receiver will never send an Ack if it has not received anything. I think this means the receiver can’t detect an unplugged wire.

zero AWS — gradually the AWS value in the Ack will drop to zero

1-byte probe — See [1]

Slow-receiver — TCP flow-control is only evident with a slow receiver

retrans — sender resends once the retransmission timeout (RTO) expires without an Ack. See tcp: detect wire unplugged
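
The “byte-level” and “expected” entries above boil down to simple arithmetic, sketched here. nextExpectedSeq() is a made-up name. The sketch assumes an in-order data segment and ignores SYN/FIN, which each consume one sequence number.

```cpp
#include <cstdint>

// Ack number the receiver returns after accepting an in-order data segment:
// the sequence number of the next byte it expects.
std::uint32_t nextExpectedSeq(std::uint32_t seq, std::uint32_t payloadBytes) {
    return seq + payloadBytes;   // unsigned arithmetic wraps modulo 2^32, like TCP
}
```

So a 500-byte segment starting at seq 1000 is acknowledged with Ack 1500, which is why the numbers “jump”.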

[1] http://www.mathcs.emory.edu/~cheung/Courses/455/Syllabus/7-transport/flow-control.html

[2] http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/overview.html

deleted^not-generated in c++11

[[effModernC++]] says that cp-ctor will not be generated in some cases, but in those cases is it necessary to put q(=delete) on cp-ctor?

A: I think q(=delete) has no effect on the binary. There’s simply no such ctor in the binary, with or without the q(=delete). The class has the cp-ctor deleted implicitly or explicitly.

A: q(=delete) is more explicit and improves readability. I think this is like adding q(virtual) to a derived class dtor when base dtor is already virtual

A: Without q(=delete), a new developer can add a cp-ctor without realizing copying was meant to be disabled.

See http://en.cppreference.com/w/cpp/language/default_constructor about no-arg ctor
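
A quick compile-time check of the first answer, as a sketch. ImplicitNoCopy and ExplicitNoCopy are hypothetical classes. Declaring a mv-ctor makes the implicit cp-ctor deleted, so both classes look the same to the type traits, with or without q(=delete).

```cpp
#include <type_traits>

struct ImplicitNoCopy {
    ImplicitNoCopy() = default;
    ImplicitNoCopy(ImplicitNoCopy&&) noexcept {}        // user-declared mv-ctor
    // cp-ctor is implicitly declared as deleted
};

struct ExplicitNoCopy {
    ExplicitNoCopy() = default;
    ExplicitNoCopy(const ExplicitNoCopy&) = delete;     // same effect, spelled out
    ExplicitNoCopy(ExplicitNoCopy&&) noexcept {}
};

static_assert(!std::is_copy_constructible<ImplicitNoCopy>::value, "implicitly deleted");
static_assert(!std::is_copy_constructible<ExplicitNoCopy>::value, "explicitly deleted");
```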

y employers prefer younger workers, a revisit

My neighbor Julius (of Indonesia) said

* younger employees mean lower cost
* more energy
* can map out a 10Y career plan for him

–Below are my own observations and reading.

The younger guys often have more spare time available. Granted, many choose to spend it outside work, but a small percentage (30%?) of the ambitious, dedicated or hard-working individuals would *regularly* and voluntarily spend some of that at work.

For managerial roles, I feel a 30-something can be very effective. The relatively short experience may not mean a lot.

For technical roles, the long experience of a 40-something is even less valuable. My own experience is most convincing. At 25 I was more formidable than many of my older colleagues. I was sharp, fast-learning, self-driven, knowledgeable, possibly more experienced than them in a given technology.

In my early 30’s, my capabilities were seen as more formidable and more outstanding, like 90 marks.

As I grow older, even if I don’t grow weaker, there are many younger guys at 90 marks. Even if we ask for the same salary, 90 marks is no longer so outstanding. Given two 90-mark candidates at 35 vs 45, I think most employers would prefer the younger one, since he is seen as “more promising” and a rising star.

%%FIX xp

Experience: special investment upfront-commission. Real time trading app. EMS. Supports
* cancel
* rebook
* redemption
* sellout

Experience: Eq volatile derivatives. Supports C=amend/correction, X=cancel, N=new

FIX over RV for market data dissemination and cache invalidation

Basic FIX client in MS coding interview.

Trex QnA IV #std::forward,noexcept

Q: how would a tcp consumer-client know the server process died rather than a quiet server?

Q: how would a tcp producer-server know a client process died? Signal?

Q: What’s perfect forwarding?

Q: std::forward() vs std::move()? See std::move=unconditional-cast #forward=conditional-cast
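
A minimal sketch contrasting the two casts. sink(), relay() and relayWithMove() are made-up names. sink() only reports which overload was picked and moves nothing.

```cpp
#include <string>
#include <utility>   // std::forward, std::move

std::string sink(const std::string&) { return "lvalue"; }
std::string sink(std::string&&)      { return "rvalue"; }

// Perfect forwarding: the argument keeps its original value category.
template <typename T>
std::string relay(T&& arg) { return sink(std::forward<T>(arg)); }

// Unconditional cast: everything reaches sink() as an rvalue.
template <typename T>
std::string relayWithMove(T&& arg) { return sink(std::move(arg)); }
```

With an lvalue argument, relay() picks the const-ref overload while relayWithMove() picks the rvalue overload, which is the surprise std::forward avoids.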

Q1: in c++03, a myVectorOfTrade.push_back( Trade(22, 0.7) ) uses how many ctor/dtor invocations?
A: regular ctor, copy-ctor, dtor of the temporary

Q1b: how about c++11?
A: regular ctor, mv-ctor, dtor of temporary. See P293…. We assume there’s a Trade mv-ctor and there’s some pointer field in Trade, otherwise the mv-ctor has nothing to steal and is probably same as copy-ctor

Q1c: what about something like emplace_back(22, 0.7)
A: in-place ctor using placement-new. P294 [[eff modern c++]] perfect forwarding eliminates temporaries

https://github.com/tiger40490/repo1/blob/cpp1/cpp1/rvrDemo.cpp is my illustration.
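
Separately from the rvrDemo.cpp above, here is a minimal counting sketch for Q1b and Q1c. The Trade class and the tally counters are hypothetical. It assumes c++11 and reserves capacity up front so that no reallocation muddies the counts.

```cpp
#include <vector>

struct Counts { int ctor = 0, copy = 0, move = 0, dtor = 0; };
static Counts tally;   // global tally, reset before each experiment

struct Trade {
    int qty; double px;
    Trade(int q, double p) : qty(q), px(p) { ++tally.ctor; }
    Trade(const Trade& o) : qty(o.qty), px(o.px) { ++tally.copy; }
    Trade(Trade&& o) noexcept : qty(o.qty), px(o.px) { ++tally.move; }
    ~Trade() { ++tally.dtor; }
};

Counts viaPushBack() {
    std::vector<Trade> v;
    v.reserve(4);                  // avoid reallocation
    tally = Counts();
    v.push_back(Trade(22, 0.7));   // regular ctor, mv-ctor, dtor of temporary
    return tally;                  // snapshot taken before v is destroyed
}

Counts viaEmplaceBack() {
    std::vector<Trade> v;
    v.reserve(4);
    tally = Counts();
    v.emplace_back(22, 0.7);       // constructed in place, no temporary
    return tally;
}
```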

Q: how would “noexcept” improve runtime performance?
AA: P 91 [[effModernC++]] has a short paragraph explaining “noexcept” functions are more compiler-optimizable
AA: P 25 [[c++stdLib]] says noexcept functions don’t require stack unwinding.
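
A related, library-level effect (not the compiler-level point the two books make above): when std::vector reallocates, it moves existing elements only if the mv-ctor is noexcept, otherwise it falls back to copying to keep the strong exception guarantee. A sketch with made-up classes and counters:

```cpp
#include <vector>

static int copies = 0, moves = 0;

struct NoexceptMove {
    NoexceptMove() = default;
    NoexceptMove(const NoexceptMove&) { ++copies; }
    NoexceptMove(NoexceptMove&&) noexcept { ++moves; }
};

struct ThrowingMove {
    ThrowingMove() = default;
    ThrowingMove(const ThrowingMove&) { ++copies; }
    ThrowingMove(ThrowingMove&&) { ++moves; }   // not declared noexcept
};

// Fill a vector to capacity, reset the counters, then add one more element
// so that the existing elements must be relocated to a bigger buffer.
template <typename T>
void forceReallocation() {
    std::vector<T> v;
    v.emplace_back();
    while (v.size() < v.capacity()) v.emplace_back();
    copies = moves = 0;
    v.emplace_back();   // triggers reallocation
}
```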

Q: please implement a simple Stack class backed by a vector. Implement push(), pop(), top().
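
%%A: one possible sketch, not necessarily what the interviewer wanted. Design choices that are my assumptions: pop() returns void (like std::stack), and pop()/top() on an empty stack throw std::out_of_range rather than invoking undefined behavior.

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

template <typename T>
class Stack {
    std::vector<T> data_;   // back of the vector is the top of the stack
public:
    void push(const T& v) { data_.push_back(v); }
    void push(T&& v)      { data_.push_back(std::move(v)); }

    void pop() {
        if (data_.empty()) throw std::out_of_range("pop() on empty Stack");
        data_.pop_back();
    }
    T& top() {
        if (data_.empty()) throw std::out_of_range("top() on empty Stack");
        return data_.back();
    }
    const T& top() const {
        if (data_.empty()) throw std::out_of_range("top() on empty Stack");
        return data_.back();
    }
    bool empty() const noexcept { return data_.empty(); }
    std::size_t size() const noexcept { return data_.size(); }
};
```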