undefined reference to cplus_demangle

This is a somewhat typical linker error.

Thanks to https://stackoverflow.com/questions/10269171/cannot-link-with-libiberty

I had to move -liberty as the last -l option.

Note all the -l are small “L”

 

 

q[cannot open shared object file] abc.so

strace -e trace=open myprogram can be used on a working instance to see where all the SO files are successfully located.

— Aug 2018 case: in QA host, I hit “error while loading shared libraries: libEazyToFind.so: … No such file or directory”

I can see this .so file so I used LD_LIBRARY_PATH to resolve it.

Then I get “error while loading shared libraries: libXXXXX.so: … No such file or directory”. I can’t locate this .so, but the same executable is runnable in a separate HostB. (All machines can access the same physical file using the same path.)

I zoomed into the HostB and used “ldd /path/to/executable”. Lo and behold, I can see why HostB is lucky. The .so files are located in places local in HostB … for reasons to be understood.

— May 2018 case:

The wording should be “cannot locate ….”

I fixed this error using $LD_LIBRARY_PATH

The *.so  file is actually specified as a -lthr_gcc34_64 option on the g++ command line, but the file libthr_gcc34_64.so was not found at startup.

I managed to manually locate this file in /a/b/c and added it :

LD_LIBRARY_PATH=$LD_LIBRATY_PATH:/a/b/c/

bash: split long command to multiple lines

I managed to split a huge g++ command line  to about 300 lines… much more readable.

The trick:

  • terminate each line with a space, a backslash and … NO trailing space
  • my perl one-liner must user four backslashes to insert that single backslash, then another \n
  • total there are 5 backslashes in a row.

Here’s the command

perl -pe "s/ -/ \\\\\n-/g" build1file.sh

function returning c-string #avoid

This is a very common scenario but … c-string is always accessed by ptr, but returning a ptr is … always error-prone! https://stackoverflow.com/questions/1496313/returning-c-string-from-a-function gave two good solutions:

  1. statically allocate the string
  2. caller to pass in a ptr to an empty buffer

I like simplicity and prefer

  • static local char-array
  • global char-array
  • char const * myName() { return “Dabao”; } //the simplest solution, if the string is immutable

ptr = inevitable when using c-str

It is impossible to use any string without using pointers in C, according to https://stackoverflow.com/questions/1496313/returning-c-string-from-a-function

That’s one reason to call C a lowLevel language.

In most c++ string classes, there’s still a c-string inside every “string object”. I am 99% sure the char-array now lives on heap.

In java and c#, not only the char-array, but the entire string object (including the house-keeping data) live on heap.

## clean design may look over-complicated

What is simple/clean really depends on (your knowledge of) best practices.

  • #1 eg: shared_ptr as member variable. Looks complicated but simplifies many things.
  • eg: nested container without pointer — looks scary[1] to java/python developers, but I think in c++ this is a proven design and simplifies many things.
  • eg: most of the 23 GOF design patterns introduce additional classes to address coupling and cohesion. These are classic and clean designs but add complexity. MVC pattern ditto.
  • try-with-resource looks complicated to me
  • RAII looks complicated to me

[1] inserting a entire sub-container  into the umbrella container requires copying the sub-container.

elegant use@priorityQ #indeed

Q: A given system receives a bunch of tasks at beginning of day.
task {
string id #assigned by this system as auto-incrementing string value
int execTime
list<task> children # all children becomes runnable as soon as this task completes. This is a classic edgeList
}
implement int getTotalTime(task) #including all descendant tasks

Q1: during system initialization we need to calculate how long each task (and its subtree) will execute. I solved this “warm-up” problem using a top-down tree-walk with memoization.

Q2: Now suppose this system expands to a farm of N identical cpu nodes. When a given cpu completes a task, it can only receive one specific runnable task, i.e. the one with smallest task ID. (This is an artificial constraint that hints at priority queue. I feel in a realistic context some other constraints may exist. Hopefully those constraints can be modeled and supported in the same priority queue.)

Overall, this is a realistic problem, not blatantly contrived.

Solution — I use two priorityQueues to hold two mutually exclusive subsets of tasks. Any task not in these PQs are completed (dead) or blocked (unborn). The mutual exclusion is a hallmark of clean designs.

PQ1 “the eligibles” holds all runnable tasks. This queue receives a group of tasks when group’s parent task completes — a (completion) event. This PQ is sorted by task id.

The event time is predictable as soon as this parent task starts running on a cpu.

PQ2: “the running” contains up to N tasks that are in the driver seats, sorted by expected end times.

My main() function is nothing but a while loop. In each iteration, pop the lowest item from the “running” queue, to simulate the earliest task completion event. At the completion, add all its children to the eligible queue. Then pop the leading eligible (possibly a new entry) and append it to the “running” queue.

The main loop is therefore an event loop like Win32 event loop. Therefore, it’s driven by the “running” queue, not the eligibles queue.

Elegant solution based on two priorityQueues. This data structure is not really intuitive so I think such intuition can and should be learned. This is my first enlightenment on the usage of priority queue. PQ1 yields the most eligible task, based on id or other criteria; PQ2 yields the earliest finisher according to each task’s duration and start time.

  • “Running” tasks is a natural usage of priorityQueue;
  • “Eligibles” is based on a stylized scheduling policy for illustration, but I can imagine real scheduling policies to use criteria other than FIFO. Note FIFO queue is less versatile than priorityQueue for scheduling.
  • Kernel scheduler uses priority queue in a similar way

Exit condition — while-loop ends when the running queue becomes empty. The eligibles queue must have emptied out earlier. The last event time is the total cost.

Minor Note on Initial data parsing: When Task 1 is received, it may reference zero or more child tasks by id, but the child Task-55 details are only available when Task-55 description is seen later on. So it’s best to store a list of child ID’s rather than pointers. The ID’s can translate to Task pointers via a lookup table, either a map or array

Minor Note: we don’t enqueue tasks unrelated to the initial task (specified by user input). We only care about the initial task and its subtree.

classic generator problems: non-trivial

classic generator problems (such as combination sum, combo with redraw, phonePad…) are not simple for most of us programmers. Even though these are classic and well-researched, most programmers can’t really master them .. cognitive barrier is formidable.

Our mind is not constructed to deal with such challenges. I have tried and failed many times to compare across various generator problems hoping to extract the gist like thick->thin 总结规律

Duplication in output —  is one of many tough challenges.

The recursion-in-loop magic is another “thingy” beyond our comprehension.

Many of them are DP problems —

  • A size-5 data set is easier to deal with than size-6
  • real challenge is to train our human brain to see through the “forest” and identify how size-5  RESULTS can help solve size-6

I believe these algos are building blocks of many coding problems, both real-world problems and quiz problems.

count paths between 2 binTree nodes #PimcoQ9 Ashish,Pinsky

The DP idea — compare matrix-path-counter and EditDistance

  • showcasing efficient queue in python.
  • showcasing using (x,y) coordinates as dictionary key
  • showcasing find max value in a dictionary

— Requirement: (https://leetcode.com/problems/unique-paths-ii/description/ is similar)
See Q9.pdf in the email to Ashish. Here are some key points:

Consider a maze mapped to a 2d-grid with an upper left corner at coordinates (row, column) = (0, 0). Any movement must be in increasing row or column direction. You must determine the number of distinct paths through the maze. You will always start at position (0, 0), the top left, and end up at (max(row), max(column)), the bottom right.
1 1 0 1
1 1 1 1
As an example, consider the above matrix where 1 indicates an open cell and 0 indicates blocked. You can only travel through open cells, so no path can go through the cell at (0, 2). There are two distinct paths to the goal. As a 2nd example, matrix below has 10 paths:
1 1 1 1
1 1 1 1
1 1 1 1

https://github.com/tiger40490/repo1/blob/py1/py/2d/classic_pathCountQ9.py is my solution.

==== Analysis ====
I see a binary tree, where each node is a cell. Cell (0,0) has down/left node (1,0) + right node (0,1). I feel this is similar to a more generic problem “count paths between 2 nodes in a binary tree”

— DFT:
😦 I implemented something like a DFT but it was too slow with some test cases
😦 can overflow stack
🙂 DFT can print each unique path. I think BFT can’t.
🙂 DFT is easier to implement than BFT

— DynamicProgramming BFT solution from origin, as I described to Bill Pinsky:
This solution is more versatile than the one from Ashish. It can handle any directed graph.

Mark each node as “computed” once we have computed a score denoting how many paths-from-root there are. Save the score in a shadow matrix.

Origin node has score 1. Blocked cell has score 0.

Start BFT from origin. The two immediate neighbors are set to 1 (or 0 if blocked). Every node can be computed by adding up above-score + left-score. (This algo is simpler than the EditDistance algo.)

Memoization Performance note — My implementation was extremely slow (albeit correct) until I added an “if already computed, then continue” early in the loop

This Layered DP algo is also described in max-path-sum problems

Q: why is score[1,1] accessed 4 times?
A: node[1,1] is added to the queue twice. Each dequeue would need one check.
A: Also score[1,2] need to access score[1,1] as a parent. Ditto score[2,1]

— Ashish Singh gave me a much simpler solution, as shown in my github code.

SOR/OMS/.. skill !! upstream

See %%algo trading dev xp !! HFT

SmartOrderRouting, OrderMgmt, FIX connectivity .. are core skills in algo trading dev, but are they really upstream, advanced skills? I doubt it.

  • in terms of interviews, I feel the domain nlg (jargon or architecture) required is a thin layer. The technical screening seems to focus on generic tech skills esp. coding test.
  • In terms of technical know-how (zbs), such a developer will NOT become stronger because of this experience. If she does acquire portable tech knowledge (possibly with an industry standard API rather than local wrappers), then she is probably experienced in that technical domain, but not other domains.

coding drill≈yoga + weight loss

I had a brief discussion with my friend Ashish.. Coding test improvement is similar to weight or flexibility improvement. I also tell my son about diet control. You probably can think of many other examples.

  • we don’t notice any improvement for a long time. Still, I believe the practice makes me better than before.
    • In the absence of visible progress, Optimism and Faith is important. A pessimist tends to give up before seeing ROTI
  • when we stop the exercise, we are likely to lose part of the effort and continue the downward spiral, esp. with flexibility. See coding drill: LASTING value4family wellbeing for10Y 
  • as we grow older, we feel we tend to lose the competition, even though there’s not enough evidence.
  • it can feel boring and repetitive
  • pair exercise is easier
    • Otherwise, promise a friend

algo trading developers .. salary premium@@

Hi Friends,

For years I believed algo trading (including HFT) roles pay some 10-30% higher than regular financial IT roles. Now I doubt it.

For the sake of argument, let’s say the market rate for a senior developer is 160k base + some bonus + some stocks.

  • Factor: elitism and selectivity — buy-side algo trading jobs are notoriously hard to get. I tried about 10 places (Hudson River, Mako, SquarePoint, Gelber, DRW, Jump, TrexQuant, Susquehanna, WorldQuant, Millennium, 3-arrows…) Even if salary is higher, the chance of getting it is extremely low, at least for me.
  • Factor: role expectation and workload — I don’t have experience working at buy-side algo trading shops, but i do have experience in other demanding teams. I didn’t do so well, so I know the risk of sky-high expectation is very real. So even if salary is higher, how long can we keep it?
  • Factor: where is the intelligence — virtually all the algo-trading engineer roles I have seen emphasize low-latency rather than high-frequency. High-frequency is not really a job description for an engineer — developers optimize the low-latency system, to enable high frequency trading. The developers’ skill and intelligence required is substantial, but it’s not the same intelligence portrayed in mass media — which is trading algorithm or “alpha”. That intelligence is required only for the quants, the strategists, the portfolio managers, the traders… They are paid higher than most engineers. I guess some would say the engineers play a supporting role.
  • Factor: architect level — the high salaries (say 30% above market rate) are often at the architect (or lead developer) level. At that level, a regular financial IT job also pays above market rate. If we compare salaries at architect level, then algo trading doesn’t pay such a big premium. Perhaps 10%? Actually I’m not an architect so this level is not really relevant. I’m more comfortable as a senior developer.

For a regular senior developer, I feel algo trading roles on sell-side pays no higher than average financial IT. I know it from multiple interviews (BNP, HSBC, Citi …)

For a regular senior developer on a buy-side algo-trading system, I guess it can pay 10-20% above market rate, up to 200k base (?), but I also know of many other financial IT jobs in the U.S. paying 180k to 200k base.

Update — SIG can pays 160k for mkt-data or SOR roles but too selective.

My tentative conclusion:
* algo-trading on sell-side doesn’t pay a premium
* algo-trading on buy-side is a high-end domain, but there are other high-end domains paying at a comparable level.

prefer std::deque when you need stack/queue

  • std::stack is unnecessarily troublesome for dumping and troubleshooting … see https://github.com/tiger40490/repo1/blob/cpp1/cpp/2d/maxRectangle.cpp
  • std::queue has a similar restriction.

Philosophically, std::stack and std::queue are container adapters designed to deny random access, often for safety. They are often based on an underlying “crippled deque”. As soon as you need to dump the container content, you want a full-fledged deque.

## any unexpected success@tsn@@

## past vindicative specializations is more comprehensive, but in this blogpost I want to start on a clean slate and answer the question —

Q: I have explored far and wide… for new domains, new challenges, so are there any unexpected successes of try-something-new?

skill discovered proven over %%strength 1-5 val 4 GTD val4IV
unix shell + unix admin 2000 Catcha 2Y 3 4 1
! perl 2000 Catcha 3Y 4 #many years ago 2 1
LAMP+js 2000 1Y 2 0 0
! python #popularity no surprise 1Y 2 #xp@Mac 2 3
! socket #and tools 2017 never 1 #lowLevel 3 if relevant 2
!! threading concept+java impl 2010 4Y 5 #theoretical 0 5
! x-lang collections 2010 5Y 4 #lowLevel+theoretical 1 5
! x-lang OO 2010 NA 4 #lowLevel 0 4
white board coding [1] 2011 2Y 2 @WallSt 0 3
c++instrumentation/build tools
! bond math 2010 Citi 1Y 2 1 if relevant 2
option math 2011 Barc 1Y 3 2 if relevant 1
fin jargon 2010 4Y #mvea 3 #know what’s relevant 2 2
finSys arch #abstract 2010 2Y 2 3 3

[1] Am more comfortable on whiteboard than other candidates.

Explanation marks means Surprise

10 μs Additional latency: collocated eq OMS

  • Many organizations “are using the words ultra low latency to describe latencies of under 1 millsec” [1]
  • 13 microsec in collocated eq OMS
  • 150 microsec “single-trip” latency in similar software outside collocation site, measured by Corvil, from A to B
    • Time A: FIX msg coming into our engine
    • Time B: FIX msg going out of our engine
    • 150 μs is median, not average
    • Corvial is (most likely) a TCP network sniffer with FIX parser so it can track a single order flow
  • 2 millis in a “regular” build
  • A major hedge fund had a very limited flow featuring 10 microsec tick-to-trade. Monolithic process. User sends in a symbol + a template id. Engine “instantiates” the template and sends out the FIX

[1] https://en.wikipedia.org/wiki/Low_latency_(capital_markets)

Treasury trading doesn’t need such low latency.

sum@arbitrarySubArray, mutable int #Rahul#segmentTree

Q: given an array of mutable integers, implement subArraySum(le, ri), and updateElement(idx, newVal)

This is data-structure heavy. You need correct data structure to support efficient update/query.

Assumption A: Without loss of generality, i will assume original array length is a power of two, such as 8

— Idea 1: carefully segment the array. Maintain array of Segment object {le, sum}

The segments can shrink/expand based on heuristics. For now, I will assume “Segment.le” is immutable.

Every update() will update the Segment.sum in exactly one segment per level.

At the leaf level, there are 8 segments of length one or two. (Given Assumption A, it would be two.)

Next level I will have 4 segments. Each segment at this level consists of exactly 2 leaf segments. Similar to Fenwick tree and segmented binary tree, update() and query() are both O(log N)

killer app@RBTree #priorityQ,sortedVector

A common usage context is query on some data set after pre-processing . In these contexts, BST competes with sorted vector and priority queue.

  • Killer app against vector: incremental insert or live updates
  • Killer app against vector: if there’s even occasional delete(), then sorted vector would suffer
  • Killer app against vector: update on one item can be implemented as delete/reinsert. Vector requires binary search -> mid-stream insert
  • minor advantage over sorted vector: vector sorting requires swapping, so value-semantics is problematic
  • Killer app against priority queue: range query, approximate query,
  • Killer app against priority queue: both max() and min()
  • Application: if each BST node needs additional data. Binary heap doesn’t expose those nodes(?)

It’s important to remember the advantages of vector

  1. cache efficiency
  2. runtime malloc cost

Java, c++ and c# all provide self-balancing BST in their standard libraries, from the very beginning. In my projects, I use these containers on an everyday basis. However, after talking to a few friends I now agree that most coding problems don’t need self-balancing BST because

  1. These problems have no incremental insertion/deletion, so we can simply sort an array of items at O(N logN) as pre-processing
  2. In some cases, priorityQ is a viable alternative
  3. Python doesn’t have this container at all and all coding problems must support python.

bitBucket: investigate history of merges

Scenario: throughout the month, your main branch takes in many pull-requests. You want to see the sequence of merge points in chronological order.

On BitBucket, the commit listing is chronological but doesn’t show the merge commits. I usually unhide the merge commits, so on my page the merge commits are interleaved with “regular” commits in chronological order like

  • merg3
  • merge2
  • commit2c
  • merge1
  • commit1b
  • commit2b
  • commit 2a
  • commit1a
  • commit3a

The regular commits line up according to their commit timestamps. the most recent merge could have its one and only commit created months ago !

What I want is “nothing but the merge points”. I think I have to ignore the “regular” commits and only look at the merge commits.

template template param

Andrei Alexandrescu briefly introduced this technique .. I think he said it is one of the most powerful TMP techniques.

One common feature of templ-templ param — the param P “template<typename>P” is usually used as “P<S>”, where S is a sister parameter to P !

http://www.informit.com/articles/article.aspx?p=376878 has a good tutorial.

https://stackoverflow.com/questions/213761/what-are-some-uses-of-template-template-parameters shows operator<<(). I have it in my github https://github.com/tiger40490/repo1/blob/cpp1/cpp/lang_33template/containerOutputOperator.cpp

The variadic version is broken because the template tries to accept any class template having 1 or more parameters in the definition. This includes std::basic_string, which already has an operator<<() that conflicts with the variadic version.

This also helps me understand that the 2-param version (using templ-templ) is really a template that accepts any class-template having two type-params by definition. Note the two type-params can use default values.

— See my github demo https://github.com/tiger40490/repo1/blob/cpp1/cpp/lang_33template/tmpl_tmpl_param.cpp

First and possibly biggest challenge is — understand the limitation of traditional solutions. I would say this is 90% of the struggle for a novice.

 

partition list into good^bad sections #c++LiquidNet

Q: template<typename I, typename P>
I myPartition(I bb, I const end, P isgood); //return first Bad node in the range after shuffling all Bad ones to the end. If no Bad node, then return the original end.

You can call it like firstBad = shuffle(li.begin(), li.end(), isEven). Itr and Pred are template params or well-defined typedefs.

Note we can’t assume the iterator provide sort order. We can only inc/dec and check equality.

https://github.com/tiger40490/repo1/blob/cpp1/cpp/linkedList/partition_LNet.cpp is my solution but it’s hard to reason because I move two rather than one pointer. Interviewer gave a simpler solution with similar time complexity.

Lessons:

  • in a two-iterator algo, if every move on each iterator has some logic, then it is probably over-complicated. In this case, we could move one iterator according to some logic, and simply move the other iterator to the same spot.
  • “bb should be at or before ee” — the invariant is critical but hard to maintain in the face of complex logic
  • There were many scenarios in my design. There are fewer scenarios in the interviewer’s design.

Risk analytics dev, hopefully !! another broken dream#Ashish

Update: many developers seem to suggest that any risk analytics experience would not help me move into “front office”, though I’m not so hung up about front office.

Many people also point out budget is typically lower in back office.

Hi Ashish,

Thanks for your input. I feel bad about repeated failed attempts to break into a few quantitative domains (listed below) so I need to consider some negative what-ifs. What if the risk system is similarly “underwhelming” like:

* volatility surface fitter in Barclays
* curve building and real time “ticking” risk sensitivity in Baml
* quant library integration at Macquarie

* equity and FX volatility product pricing in OCBC
* (I guess your Barclays mortgage system could be another quant domain?)

Armed with my formal education and analytical competence, I have tried valiantly to find a foothold in a quant dev domain. I was unable to dig in my heels.

I will not go into details on these “broken dreams”. Suffice to say that I had enough of these … that recently I decided to walk away from a higher-paying ($120/hr) contract because it looks like another system with some complicated financial math.

I don’t want to give up completely on my (large) investment in quant study. This is the source of one of the biggest pains in my career. As I said, I am also cautious about throwing good money after bad money…

Thanks for pointing out the “infrastructure” part — I can imagine that like your friends, most tech guys in risk systems are only responsible for the infrastructure. In some banks, there’s an army of developers supporting the risk systems, so the infrastructure is presumably quite complex and large scale. Presumably, if a developer can understand a sizable infrastructure or the complex “plumbing” in a risk system, he would be considered competent, valuable, and strong even though he doesn’t understand the math.

Across domains, math is not the only domain knowledge — I believe 1) system architecture and 2) jargon are important domain knowledge too, esp. in risk systems.

Among the millions of java/c++ developers in risk systems, I guess 2% to 10% need some quant knowledge? In such a case, the market depth would be rather poor, because risk quant developer roles would be few and far between, just like the pricing quant developer market.

LiquidNet IV c++

Q1: partition list into good^bad sections

Q3b: fix a sample source code that passes unique_ptr<int> by value to a foo() function:

unique_ptr<int> foo(unique_ptr<int> kk)

Q3b.1: give examples of move semantic

Q3b.2: write a template solution to check if the template argument T is a pointer type. I don’t have a good solution but I mentioned SFINAE. My github has a sample sfinae code to check for specific types of pointers. In contrast, std::is_pointer doesn’t handle ptr to member etc.

How about static_assert? Yes tested

See post on template type arg validation

Q3a: improve

string & find(list<Person> li, string name){
  for (auto i=li.begin(); i!=li.end(); i++)
    if (*i == name) return i->nickname;
  return "";
}

%%A: last return will crash due to return-by-ref. If my function’s return value can be empty, then I usually return by pointer
%%A: equality check is dubious. Better be explicit like Person(name) or i->convertToString()
%%A: li.end() can be cached
%%A: ++i
%%A: li and name should really be const ref

Q4: predict output
%%A: I feel #2/3 are bad code but compiler may not be able to

struct B{void foo(){cout<<"hello\n";}; b = new B(); b->foo(); //#1
delete b;
b->foo(); //#2 .. what?
b=NULL;
b->foo(); //#3 .. what?

LiquidNet IV questions # non-c++

Each orders is big, at least 40,000 shares… latency unimportant. Two clients can negotiate without revealing identities — Any chat is monitored. No auction supported.

Other dev teams outside the “core” MatchingEngine team:

  • execution-algo engine — Client can also send an algo order like a vwap order. The LN ME (matching engine) will forward it to an execution-algo engine (owned by another team), which slices it and send the pieces to “lit markets” like NYSE. Alternatively, it can send them back to the same LN ME.
  • There are WPF apps for clients to use.
  • There are post-trade analytics systems (Venkat’s interest?).
  • There are settlement systems

Q: how do you go about testing your c++ code? LN uses python to automate tests. C++ developers must write tests for their own features.

–QA interviewer: a junior BA proposed a system with system F (perhaps some hedge fund) integrating with system L. L would periodically (5-sec interval) use an API to query F to get a list of orderId’s. Then L would convert each orderId into a FIX message and send to F. Please improve.

I feel this question requires domain knowledge.

  • Sugg: batch mode — one big message containing 999 items is more efficient than 999 small messages. Not sure about FIX
  • Sugg: In the current “pull” model, 5-sec interval may be too frequent if F is quiet; 5-sec could be too infrequent if F has a burst of new orders, so it’s best to make F “push” to L.
  • Sugg: the push can make use of 2-way FIX messaging

–retrans question (Good): In your order-book engine (like Rebus), suppose you get a bunch of order delete/exec/modify messages, but the orderId is unrecognized and possibly pending retrans. Rebus doesn’t know about any pending retrans. What would rebus do about those messages?

A: I feel warehousing is the only choice. After some timeout, should clear the outdated/hopeless … memory footprint. Can use a FIFO cache.

MSOL q[Received]: space-sav`custom format

–based on http://www.tushar-mehta.com/publish_train/xl_vba_cases/1205%20Outlook%20custom%20columns.shtml

IIf(Date()=int([Received]),Format([Received],”hh:mm”),IIf(Date()-[Received]<7,format([Received],”ddd m/d”),format([Received],”d mmm yy”)))

Warning: need to clean up the double-quotes after copy-paste from this blogpost!

I name it “-myDT3” — 3 types of values

Q: can I share this custom column between Inbox/Deleted/Sent?
A: Unsuccessful try in 2010

Problem — unsortable
Solution — add the original Received field, but keep it razor thin

How to edit the formula: FieldChooser -> User-defined (usually empty) -> drag the new field in -> edit -> drag it out

nsdq=lightweight west coast CIV

  • like west-coast, No coding questions in threading like bbg
  • like west-coast, no memory mgmt questions
  • like west-coast, no question on containers as implemented in any language
  • arch design question — more in-depth than Indeed
  • algo + whiteboard
  • unlike west-coast, there were a bit of knowledge questions, but nothing in-depth like on Wall St.
  • Unlike west-coast, there was a weekend assignment

Now I feel some west coast interviews might be similar to nsdq interview.

staircase problem #CSY@Bbg

Q (bbg question posed to CSY): given a staircase of height N (eg 3), you can reach the top by three steps 1,1,1 or two steps 1,2 or 2,1, or a single step of 3. Generate all paths for a given N.

I think the solution(s) need more review and internalization.

This problem has many classic solutions

  • bottom-up
  • top-down recursion with memoization
    • yield-based generator function
  • BFT without recursion without duplicates

I feel this is simpler than AQR factorization and the CombinationSum problems.

Is this same as N-boy-split? No… split is harder. With the split, ABC can have a “step” of AC.

easy to test —  https://github.com/tiger40490/repo1/blob/cpp1/cpp/algo_comboPermu/staircase_CSY.cpp was my non-recursive BFT solution.

https://github.com/tiger40490/repo1/blob/py1/py/algo_combo_perm/staircase_CSY.py is my yield-based recursive solution. Only 5 lines in the implementation. I added lots of tests.

Q: why path count == 2n-1?
Answer from CSY: f(1)=1, f(2)=2, f(n) = f(n-1)+f(n-2)…+f(1) = 2n-1
A: For a staircase of n, you can take step of 1 and have f(n-1) paths, or you can take step of 2 and have f(n-2) paths…

—jargon file:

a STEP from LEVEL 0 to 2 has LENGTH=2, and skips level 1; a PATH consists of 1 or more STEPS; a FORMULA is a complete PATH, and always starts at level 0 and may hit intermediate levels #2, #5, #6 …

— key observations:
* Observation 1: Every formula starts with firstStepOf1, or firstStepOf2, (or firstStepOf3…) with not duplicate between these two groups !
* Observation 2: After taking firstStepOf1, the remaining steps are really a formula for staircase of N-1, a smaller problem. This invites a bottom-up DP solution.

—BFT solution. suppose N = 5. We model each path as a directory path from root. Each queue item is a path represented by a linked list or vector of capacity N.

I hope this solution avoids duplicates… Yes it does:)

  1. enqueue all “first levels” /1; /2; /3; /4; /5
  2. then dequeue and visit first path i.e. “1”. In this case, there are 4 levels remaining in the staircase, so enqueue /1/1;  /1/2;  /1/3;  /1/4
  3. then dequeue and visit 2nd path in the queue i.e. “/2”. enqueue /2/1; /2/2;  /2/3

Every time (after dequeue) we see a path has no remaining level, we have a formula.

—DP (bottom-up) iterative solution to be implemented

  1. first, build the formulas for a length-2 staircase. They are /1/1 + /2 i.e. two formulas saved in FormulaArrayForStaircase2 i.e. “fa2”
  2. then build the formulas for a length-3 staircase — fa3 = firstStepOf2 -> fa1 + firstStepOf1 -> fa2
  3. then build the formulas for a length-4 — fa4 = firstStepOf3 -> fa1 + firstStepOf2 -> fa2 + firstStepOf1 -> fa3

— recursive: CSY gave a recursive solution and interviewer asked “Can you find a non-recursive solution”?

I think the recursive solution repeats the same sub-problem (such as a 3-level staircase) over and over again, but we can make the recursive function more efficient with memoization — it should short-circuit when the input already has a solution saved in a global hashtable.

—- python yield/generator recursive solution

My github code shows simple generator function without memoization is about half the speed of bottom-up. With memoization, it’s 20% faster than bottom-up.

update each cell with d2nearest 0 #DeepakCM

(Deepak 2019) tough matrix problem: given a black/white but mostly white matrix, for each white cell, compute the least horizontal/vertical steps (shortest distance) to nearest white cell.

Given a Matrix with 1’s and very few 0’s, replace all the 1’s in the matrix with the adjacent distance to nearest 0. There can be more than one ‘0’ in the matrix
Ex : Input: Matrix contains more than one ‘0’ Matrix = {
1, 1, 1, 1, 1,
1, 1, 1, 1, 1,
0, 1, 0, 1, 1,
1, 1, 1, 1, 1,
1, 1, 0, 1, 1 };
Output = {
2, 3, 2, 3, 4,
1, 2, 1, 2, 3,
0, 1, 0, 1, 2,
1, 2, 1, 2, 3,
2, 1, 0, 1, 2 }

——————-
Theoretical limit: O(N)

I will denote W := # white cells. Optimal solution should/might be O(N). For N = 1 quintillion, the run-time should NOT grow even when there are more white locations.

If we know for sure the scores in four neighbors, then it’s O(1) to work out my score. Which cells have known scores? those next to the zeros.

–idea 4: same as frontier but using a queue to hold frontier cells.

–idea 3 (frontier):
Does this algo work with any graph?
What if the white cells are on the perimeter and the frontier shrinks?
How would two frontiers join?

initial-scan #0a to initialize all non-white locations to -1 (indicating “green”). Save the number of greens in a “greenCount”
initial-Scan #0b to collect all white locations. For each white location F with some green neighbor, saves F in a “frontier” collection, perhaps a linked list.

When “saving” also cache (using a 4-bit integer) the nongreen neighbors, to avoid repeated memory access.

Also create an empty “new frontier” collection.

The initial scans can be combined but at the cost of simplicity.

Invariants before any subsequent update-scan —

  • Every frontier location has some green neighbor.
  • new frontier collection is empty.
  • greenCount is up to date.

update-scan #1 update each adjacent green location to the frontier. Set score to 1, finalized and no longer green. iif a finalized location F has a green neighbor, then save F in the new frontier collection.

After the scan, assert the new and old frontier collections have no overlap. Now swap old and new frontier collections and clear the new collection.

update-scan #2 for each location in the frontier, update adjacent green locations to 2, finalized and no longer green. If such a finalized location F has a green neighbor, then save F in the new frontier collection.

Green count should now reduce. When it becomes 0 we are done.

big-O? At each scan, the new frontier is usually larger than the old frontier until an inflection point. If before each update-scan we keep a count of the frontier collection size, i think they won’t add up to exceed N. Therefore, total complexity is O(N) provided the fanout is a constant like 4. If fanout is unlimited, then possibly O(V+E) since we visit each node and each ege up to 3 times.

–idea 2 (shells):
scan #1 to save all white cell locations, and save all black cell locations in a shadow matrix (bool shadow matrix of the same size as original matrix) and a blacklist (hashtable indexed by cell location)
For each while, compute distance to center. At end of this scan, designte one whte cell as red i.e. closest to center.

scan #1b[O(N)] update all black cells with d2red. Now we have some baseline values, to be improved
scan #2 [O(W)] for each white, update all cells around it with distance 1. Remove the updated cells from the “blacklist”

Also set the bool in the shadow matrix

Scan #3 for each white, update the 2nd shell

details?

If a black cell is updated by 5 white cells in the same iteration, then all 5 whites would be equally distant, so the first of them would remove the black cell from blacklist.

So each black cell is only updated once .. O(N)?

–idea 1 (DP) incomplete:
Scan#1 from top and left, update each cell with a “TL score” i.e. shortest dist ignoring the Bottom-Right quadrant of cells i.e. cells right-and-lower of current.

consider a typical cell on 2nd row. what’s tl score? Can compute using upper and left neighbors? will it ignore a white on the right?

Scan#2 from bottom right, to compute a BR score for each cell

Scan#3 (can be part of Scan#2) combine the data

Rationale — for each white cell, the shortest path can be in either BR quadrant (Scan2) or (Scan1) the other 3 quadrants.

algo-IV^QQ expertise

Background — became expert via external QQ benchmark` !!localSys xx

Q2: how are algo quizzes different from the C++ language expertise tested in job inerviews?
%%A2: I feel this algo expertise is loosely defined — mostly defined by a few hundred “fashionable and common algo interview questions.”

Ironically, the algorithm researchers and inventors may not ace these west-coast algo interviews 😉 Also, the non-trivial algorithms in textbooks are invariably too complicated for a 45-min coding interview.

There’s no deep insight or innovation in this game. This type of “expertise” doesn’t solve tough real-world problems. They can only solve simple real-world problems.

Top performers in this game are possibly students, exam-taking machines. They won’t feel proud of themselves as compared to c++ language expert or algorithm inventor.

Q2b: how are speed coding different?
%%A: in addition to A2, speed coding requires a mastery of a tiny subset of the language features and mastery of many coding idioms based thereon.

python usage + risk system ] GIC #ChaoJen

Government Investment Corporation is a large buy-side. IT heads are keen about python as a first-class language, used for data analysis, in post-trade risk systems etc. Pandas is the most common module.

Python salary is not always lower than java/c++. My friend Chao Jen said python salary could be high if the team is demanding and deep-pocketed.

Is the GIC risk system mostly written in python? Chao Jen doesn’t know for sure, but he feels c++ may not be widely used in the GIC risk system. Might be a vendor product.

Hackerrank python tests can be hard to get 100% right, but Chao Jen said many get close to 100%.

FIX^MOM: exchange connectivity

The upshot — some subsystem use FIX not MOM; some subsystem use MOM not FIX. A site could send FIX over MOM (such as RV) but not common.

It’s important to realize FIX is not a type of MOM system. Even though FIX uses messages, I believe they are typically sent over persistent TCP sockets, without middle-ware or buffering/queuing. RTS is actually a good example.

Q: does FIX have a huge message buffer? I think it’s small, like TCP, but TCP has congestion control, so sender will wait for receiver.

Say we (sell-side or buy-side) have a direct connectivity to an exchange. Between exchange and our exterior gateway, it’s all sockets and possibly java NIO — no MOM. Data format could be FIX, or it could be the exchange’s proprietary format — here’s why.

The exchange often has an internal client-server protocol — the real protocol and a data flow “choke point” . All client-server request/response relies solely on this protocol. The exchange often builds a FIX translation over this protocol (for obvious reasons….) If we use their FIX protocol, our messages are internally translated from FIX to the exchange proprietary format. Therefore it’s obviously faster to directly pass messages in the exchange proprietary format. The exchange offers this access to selected hedge funds.

As an espeed developer told me, they ship a C-level dll (less often *.so [1]) library (not c++, not static library), to be used by the hedge fund’s exterior gateway. The hedge fund application uses the C functions therein to communicate with the exchange. This communication protocol is possibly proprietary. HotspotFX has a similar client-side library in the form of a jar. There’s no MOM here. The hedge fund gateway engine makes synchronous function calls into the C library.

[1] most trader workstations are on win32.

Note the dll or jar is not running in its own process, but rather loaded into the client process just like any dll or jar.

Between this gateway machine and the trading desk interior hosts, the typical communication is dominated by MOM such as RV, 29West or Solace. In some places, the gateway translates/normalizes messages into an in-house standard format.

##define good life: job,health,life-chances..

I will not list generic and vague items like cash asset or health.

  •  job
    • #1 factor — sense of security that I can always get a job. Ashish Singh is the first one to point it out
    • engaging, challenging job, learning something meaningful, not like java/SQL again.
    • some complexity
    • [$$] reasonable commute
  • [$$] not so much time spent on home maintenance
  • enough time to spend with kids
  • enough time for exercise
  • [$] yoga classes + other classes
  • [$] healthy and convenient diet
  • grand parents frequent visits? Very hard
  • kids not falling behind in school
  • passive income? LG2 but the more the better
  • appreciating property?
  • parks nearby, not necessarily upscale waterfront
  • [$$] clean street
  • [$$] car-freedom
  • freedom to blog everywhere .. /therapeutic/
  • ..
  • life chances
  • [$ = money can “buy”, to some extent]

22surprises{U.S.reentry #traction#stigma

There are many valuable observations below, but let’s not spend too much time polishing…

  1. self-esteem regained — in tech IV/GTD, after 5Y bleeding self-confidence #stigma
  2. coding tests — continues to spread. I improved progressively, gained traction — I even find it enjoyable.
    • dnlg — all 3 domain knowledge categories are losing weight in interviews. Note half my recent interviews are outside ibanks.
  3. my c++ competence (esp.sockets) finally gained traction, thanks to the interviews.
  4. rise of west coast salary level. U.S. tech job market didn’t lose steam. U.S. geek economy continues to grow
  5. Tristate housing — school-district housing is more expensive than I thought, but Edison/Bayonne can be quite affordable
  6. ! java remains robust and dominant in ibanks. c++ is robust too. There are still many c++ roles in U.S.
  7. [c] concentration window — proved to be extremely important to my learning, career planning and reflections. Parenting and household chores are real drags.
  8. [c] my peers didn’t “leave me in the slow track”. Most of them are still regular developers. I guess they can’t move up unless they were already in a lead role by age 35
  9. “strategic technology bet” — is thoroughly discredited, through repeated introspection

–Next 20

  1. [c] ibanks interviews — (including coding IV) continue to play to my advantage, after 5 years
  2. [c] Java QQ continues to feel natural to me… I didn’t lose most of my java QQ strength…
  3. [c] aging developers — I see good examples in Shubin, Paul, Shanyou, Alan, Daniel, Kam, Pinsky, John etc
  4. U.S. high-end contract rate — has grown from $90 to $110
  5. start-ups — There are many interesting start-ups both in U.S. and Singapore, able to pay.
  6. retire — I have decided to retire in Singapore not U.S. I see my Singapore citizenship as a huge strategic advantage over my Chinese/Indian peers.
  7. [c] wife was competent at her job and continues to keep the kids in the current condition without deterioration
  8. [c] kids — my daughter didn’t become alienated; my son didn’t get out of control.
  9. [c] I continue to take unpaid leaves to learn from interviews
  10. mkt data — enjoys growing demand and I gained traction more than I gained a new defensible territory.
  11. quant career and math aptitude — broken dream. Disillusioned. deep pain
  12. U.S. investment yield is typically 6%, higher than what I observe in Singapore.
  13. [c] ibanks didn’t reduce IT budget or go offshore as some predicted
  14. [c] HFT — is robust
  15. c++ demographics — mostly older
  16. apps and coding jobs … (in the global economy) are becoming more important, more wide-spread than I anticipated.
  17. [c = continuation, but unexpected]

[19] j^python arg passing #(im)mutable

Python is ALWAYS pbref.

— list/arrayList, dict/HM etc: same behavior between java and py —
pbref. Edits by the function hits the original object.

— string: probably identical behavior between java and py —
immutable in both java and py.

Passed by reference in both.

If function “modifies” the string content, it gets a new string object (like copy-on-write), unrelated to the original string object in the “upstairs” stackframe.

— ints, floats etc: functionally similar between java and py —
If function modifies the integer content, the effect is “similar”. In python a new int object is created (like string). In java this int is a mutable clone of the original, created on the stackframe.

Java int is primitive, without usable address and pbclone, never pbref.

Python numbers are like python strings – immutable objects passed by reference. Probably copy-on-write.

this->myVector as instance field

Sound byte — if you want your class to have a vector field, then a nonref field is the simplest design, unlike the java convention (below).

I have occasionally seen this->myVector as a nonstatic data member. I think this is normal and should not raise any eyebrows. [[effC++]] P62 has a simple example.

I also used std::map and other containers as fields in my classes, like PSPCDemux.

Java programmers would have a pointer field to a vector constructed on heap, but memory management is simpler with the nonref field. In terms of memory layout, PSPCDemux::myvector has some small footprint [1] embedded in the PSPCDemux object, and the actual container payload has to be allocated on heap, to support container expansion.

[1] Java is different as that “small footprint” shrinks to a single pointer.

These fields don’t need special handling in PSPCDemux ctors. By default an empty container would be allocated “onsite”. PSPCDemux dtor would automatically call the container dtor, which would free the heap memory.

If you adopt the java convention, then your dtor need to explicitly delete the heap pointer. This is tricky. What if the dtor throws exception before deleting? What if ctor throws exception after calling new?

versionTable^BST to support binary search #Indeed

My son’s exam grading table says
A: 85 and above
B: 62 and above
C: 50 and above

One solution to support efficient getGrade(score) is a deque of record {qualifyingScore, grade}. Binary search on the qualifyingScore field is O(logN)

Now a subtly different problem. Suppose you update your resume title periodically,

“developer” in version 1 or later
“SrDev” in version 5 or later
“Consultant” in version 9 or later
“Architect” in version 13 or later

This problem is “subtly” different because new versions always get higher version numbers.

I have always avoided the deque-table data structure, in favor of the more powerful RedBlack trees. Current conclusion — RBTree is still superior, but IFF we receive the table content (high volume) in sorted order, then deque-table is simpler at least conceptually. Since python and javascript don’t offer RBTree, many interviewers aren’t familiar with it.

  • prepend/append is the advantage of deque-table. However, RBTree insert-with-hint is amortized O(1), comparable to deque table.
  • mid-stream Insert in general is more efficient on RBTree because deque is O(N)
  • Delete is similarly more efficient in RBTree.
  • Lookup is all O(logN).

Delete is often simplified away in coding tests, but important in practice, to correct mistakes or prune overgrown data stores. Realistic Binary search systems seldom operate on “immutable” data store.

read_matrix(r,c)beats matrix[r][c]

Usually read_matrix(r,c) simply returns matrix[r][c] but

Such a wrapper function can

  • assertions
  • read-count + write-count on each element
  • write-once constraint enforced
  • saves typing — a(r,c) vs mat[r][c]
  • encapsulation — the matrix can be encapsulated as a local static object.
  • simplifies conditional logic when r==0 or r==c. Without this wrapper, I often need separate code to initialize those special cells
  • In python, defaultdict() can handle index-out-of-range automatically but read_matrix() also does a fine job

IDE IV=easier4some !!4me

I guess many (majority ?) candidates consider IDE coding IV easier than white-board. I think they rely on IDE as a coding aid. Many programmers are not used to “demo” coding using white board … stressful, unnatural?

The “coding aid” feature helps me too, but it helps my competitors more !

If on white-board (or paper), my relative rating is A-, then on IDE i would score B

  • my relative weakness on IDE — ECT speed
  • my relative strength on white-board — clarity of thinking/explanation

%%algo trading dev xp !! HFT

Let’s not try to look like a HFT pro, citing your low-speed algo trading xp…. You could look desperate and clueless.

My list experiences don’t guarantee success, because devil is in the details (Just look at all the details in xtap retransmission…) However, I should feel more confident than an outsider.

  • [95G/Citi] OMS — In Citi, I worked on automatic quote cancellation and republish. Most executions involve partial fill and subsequent quote (limit order) reduction, similar to OMS. Citi and Baml were probably 2 biggest muni houses in the world. Baml system also handles corporate bonds.
  • [Citi] real time even-driven (tick-driven, curve driving…) quote repricing
  • [Citi] generating and updating limit orders in response to market data changes. Need to brush up on the basic principles since I was asked to elaborate, but a developer probably doesn’t need thorough understanding.
  • [95G] OMS — low-speed, low volume (bond) order management using very few FIX messages with trading venues to manage order state. I actually wrote my own wait/notify framework. This is probably the most valuable algo-trading project I have worked on.
  • [OCBC] simple automated option quote pricer in response to client RFQ
  • [95G] FIX messaging — up to 6-leg per order. Another ECN uses NewOrderSingle and Cancellation messages.
  • [RTS] FIX feeds — with heartbeat, logon etc
  • [NYSE] proprietary protocol is arguably harder than FIX
  • [NYSE] high volume, fault-tolerant raw market data draining at 370,000 messages/sec per thread
  • [NYSE] order book replication — based on incremental messages. I also implemented smaller order books from scratch, twice in coding interviews. This is directly relevant to algo trading
  • [NYSE] connection recovery, under very high messaging rate. Devil in the details.
  • SOR — no direct experience, but Citi AutoReo system has non-trivial logic for various conduits.
  • [Barclays] converting raw market data into soft market data such as curves and surfaces, essential to algo trading in bonds, FX-forwards, equity options.
  • [Stirt] real time ticking risk, useful to some traders if reliable
  • home-made FIX server/client
  • [Citi/Stirt] real time trade blotter — i added some new features
  • [Citi] very limited experience supporting an ETF algo trading system
  • [UChicago] school project — pair trading

However I feel these experiences are seen by hiring managers as insufficient. What are the gaps you see between my experience and the essential skills?

? limited OMS experience
? latency optimization
? network optimization

tech specialization=illusion #cod`IV

My dad is a real specialist. There are new comers who publish books to challenge the established theories, but it always takes a new comer many years of accumulation to make a name.

In contrast, Tech specialization suffers from the combined hazards of 1) churn (+volatility), 2) low entry-barrier,

3) screening interviews are the real watershed just like 高考. The screening used in job interviews are often coding tests. Your specialization is marginally relevant at best. So even if you specialize in, say, cryptocurrency, FunctionalProg, Containers, SOA, .. the hardest interview challenge you face is probably a generic algorithm coding test! Your specialization and accumulation could help you do a better job, but not getting the job. I feel this is like an actress auditioning. You don’t need such strong experience acting; you just need to wow them at audition.

4) market depth — (similar to my dad’s situation) after one job in this specialization, can you easily find another?

python^perl #my take

(Collected from various forums + my own, but ranking is based on my experience and judgment)

  1. OO
    1. – Not sure about perl6, but not many app developers create perl classes. Many CPAN modules are OO though. Python users don’t create many classes either but more than in perl. I guess procedural  is simple and good enough.
    2. – not sure about perl6, I feel perl OO is an afterthought. Python was created with OO in mind.
    3. – even if you don’t create any class, out-of-the-box python relies (more deeply) on more OO features than CPAN modules. Object-orientation is more ingrained in Python’s ethos.
    4. OO is important for large-scale, multi-modular development project. Python is more battle tested in such a context, partly due to it’s OO design, but i think a more important reason is the python import machinery, which is probably more powerful and more widely used than perl’s
  2. – Python can cooperate better with other popular technologies like DotNet (IronPython) and Java (Jython). See also python project in boost.
  3. – perl’s text processing (and to a lesser extent unix integration) features are richer and more expressive. Using every key on the keyboard is like using all 20 fingers and toes. I immediately felt the difference when switching to java. In this aspect, pytyon is somewhere between those 2 extremes.
  4. – perl can be abused to create unreadable line-noise; python has a rather clean syntax
  5. – As part of unix integration, perl offers many practical one-liners competing effectively with grep/awk/sed. Perl one-liners integrate well with other unix commands
  6. – Perl was initially designed for unix, text and as an alternative for shell script programmers. It’s still very good for that purpose. For that purpose, I feel OO offers limited value-add.
  7. – for unix scripting, perl is more similar to unix shell scripting at least on the surface, but python isn’t hard to learn.
  8. – I feel perl was more entrenched and more established in certain domains such as unix system automation, production support, bio-informatics
  9. big data, data science and data analytics domains have already picked python as the default scripting language. Perl may be usable but not popular
  10. – some say Perl the base language is much larger(??) than Python’s base language so probably it takes longer(???) time to learn Perl than Python but in the end you have a more expressive language???
  11. – CPAN was once larger than python’s module collection. If CPAN’s collection is A and python’s module colleciton is B, then I personally feel (A-B) are mostly non-essential modules for specialized purposes.
  12. – Python has better support for Windows GUI apps than Perl.
  13. I feel the open source contributions are growing faster in python
  14. There are more books on python

initialize const field in ctor body #cast

Background: I have a const (int for eg) field “rating” to be initialized based on some computation in the ctor body but c++ compiler requires any const field (like rating) be initialized in initializer only.

solution: cast away the constness.. See https://stackoverflow.com/questions/3465302/initializing-c-const-fields-after-the-constructor

I also used it in my github code https://raw.githubusercontent.com/tiger40490/repo1/cppProj/cppProj/concretizeSheet/concretize.cpp

*const_cast<bool*>(&hasUpstream) = tmp_hasUpstream;

[19] problems+!published solutions: better4me

Nowadays I feel problems without published solutions, without extensive test cases are better for me, since I don’t worry about … wipe-out, like bloodshed.

-> better avoid Leetcode. Careercup and bbg codecon are fine. Hackerrank?

I used to worry about not knowing correct solutions -> wasting my time. Now I know from extensive experience that coming up with my homemade solutions is good enough and rewarding, even if incorrect/incomplete.

I don’t always need or want to know the correct solution, not every time.

Further, I often wish there’s no known solution, since they always wipe out my satisfaction and self-esteem.

 

jvm thread Maps-To(isn’t)native thread: %%speculation

I don’t (need to) know HOW a jvm thread maps to a native thread. I believe by construction, JVM attaches house-keeping info on the native thread handle (something based on a ptr). The housekeeping data probably provides essential management features, such as

* thread dump
* interactive debugger support
* jGC knows all the live objects on a thread’s call stack

need a safe invalid value for a c++float@@ NaN

— For c++ float, if you need a safe “invalid” value, there’s NaN, with standard support like std::isnana() etc

— For c++ int, you need to pick a actually-valid number like INT_MAX.

Q: How do you find a special value to indicate “variable has an invalid value”?
%%A: I think you need separate boolean flag.
A: boost::optional #NaN #a G9 boost construct is exactly designed for this

save lookup-key in payload object: data duplication

In c++ data structure designs, I often have a key/payload pair in a lookup map, and face the design choice — should I duplicate data by saving the key in the payload object?

Context (simple) — if every time we access any payload object, we always have the key on hand, then no need. It could be unnecessary complication to copy the key into the payload instance, since we may not have the key at payload construction time.

Context — if sometimes we manipulate the payload object without (easy access) to the key, then reconsider. For example, the key object may be a private field of some Acct class so we would need a handle on that Acct instance. I would incur the cost of data duplication by saving the key inside payload, upon construction.

If key object address is “permanent” .. like maintained in some global collection or allocated on heap, then I prefer reference/pointer. (In java, every object is a reference.) This way, key object content change won’t affect me.

(Special case) If the key object is an integral type and immutable, I would simply copy its content into the payload object.

(complex special case) if the key object is BASED-ON a simple data type (string or int) but needs to be a custom MyKey type, then we have the (KISS) choice of saving the string as a payload field, but half the times when we read this field from a payload, we may need to create a temp MyKey. I usually prefer KISS.

I treat this as a design issue more than merely a low-level implementation issue, because I have many concerns about the data duplication:

  1. memory footprint
  2. !! More important than (A) is artificial complexity and higher chance of bugs. I’m very lucky if the key object location is permanent as described above.
  3. code maintenance

map::emplace() calls std::pair ctor

The non-map emplace() usually forwards the args to the ctor of the element type

vector<Acct>::emplace(args) would call Acct(args)

In contrast, map emplace() calls std::pair ctor!

https://en.cppreference.com/w/cpp/container/map/emplace shows emplace() using 3 versions of std::pair<> class template constructors.

    std::map<std::string, std::string> m;
 
    // uses pair's move constructor
    m.emplace(std::make_pair(std::string("a"), std::string("a")));
 
    // uses pair's converting move constructor
    m.emplace(std::make_pair("b", "abcd"));
 
    // uses pair's template constructor
    m.emplace("d", "ddd");

The simplest is most “magical” and (i guess) may not work for more complex data types. I used this feature in https://github.com/tiger40490/repo1/blob/cpp1/cpp/algo_miscIVQ/concretize.cpp

std::reference_wrapper usages

std::reference_wrapper is a class template that wraps a reference in a copyable, assignable object. Usages:

  • std::reference_wrapper is primarily used to store references inside standard containers which cannot hold references. In my projects, no one ever stores references in containers. Pointless.
  • std::reference_wrapper is also used to pass objects by reference to the constructor of std::thread, or the helper functions std::make_pair() and std::make_tuple()

Helper functions std::ref and std::cref() are often used to generate std::reference_wrapper objects.

See https://en.cppreference.com/w/cpp/utility/functional/reference_wrapper

–details on the std::thread usage

https://github.com/tiger40490/repo1/blob/cpp1/cpp/thr/takeTurn.cpp shows that without std::ref(), the object passed to std::thread ctor is a clone, so the original object is not updated by the thread routine.

My https://github.com/tiger40490/repo1/blob/cpp1/cpp/thr/functorToThrCtor.cpp uses std::ref() too. Same code also shows another adapter wrapper for a function

https://en.cppreference.com/w/cpp/thread/thread/thread says std::ref() is likely needed.

https://thispointer.com/c11-multithreading-part-3-carefully-pass-arguments-to-threads/ has detailed sample code

Q: your #1 domain xp.. pricing^mktData@@

See also marketable_domain_xp spreadsheet… Don’t spend too much time on this question… illusion. Though I see myself a veteran in some domain but hiring managers may see a haphazard collection of marginally related experiences. I may say AA is my #1 domain, but I may still choose BB domain.

E-trading (AutoTrading) and OMS are too vague as domains. Means different things to different people.

In terms of absolute length, pricing would be my #1 domain

  1. Citi pre-trade bond pricing
  2. Barcap
  3. Stirt risk bucketing
  4. OC Quest
  5. MSFM + self-study

In terms of depth, I’m very comfortable with the (limited) math therein, which is an entry barrier to many.

  • 😦 However, this domain has relatively few jobs and poor accumulation
  • 😦 Mostly in ibanks
  • 🙂 has contract roles
  • 🙂 salary premium? occasionally
  • 🙂 entry barrier? reasonable

[19] new problems!=the best drill

I used to focus on how many new problems solved each week, but that’s very challenging and not necessarily most effective. In contrast, Reviewing existing code is easier i.e. requires lower absorbency, but can still take hours.  Worth the time!

There’s a real risk to overspend time on new problems.

  • We don’t always fully digest the solved problems. No one can remember so well without refresh. Therefore, am growing stronger than before and stronger than my peers who don’t invest this time to refresh.
  • a bigger risk is burnout, stress, rat race against friends who “solved X new problems in Y weeks”. Reality is, they don’t necessary perform better or grow stronger.

 

##eg@syntax/idiom learning via coding drill

Syntax (+ some best practice in syntax) is essential in 70% of my real-coding interviews.

There are many memorable exceptions like char-array/int-array, tree, linked list, regex ….. , which require ECT, BP and pure algo, but trivial syntax. Some of them are relatively “hard” on pure algo and my relative strength shines through among Wall St candidates.

Below are some examples of syntax learning via coding drill

  • pointer manipulation in linked graph
  • matrix manipulation
  • pitfall — iterator invalidation
  • pitfall — output iterator hitting end()
  • c++ search within string
  • eg: multi-threading problems —- I had at least 8 experiences in java and c++ (UBS-Yang, Barcap, Gelber, SCB, HSBC, Wells, Bbg thrPool, Pimco + some minor ones in BGC)
  • eg: template, move semantic and placement-new … asked in white-board coding, nothing to do with ECT!
  • eg: lower_bound() means touchUpperBound is all about syntax not algorithm. Revealed via coding interview!
  • See also categories pyECT_nlg, t_c++syntaxTip, t_c++tecniq, T3_stdStr_tagT2_cStr/arr/vector
  • see tags t_c++ECT_nlg
  • c++ syntax requires 3-6 times more learning than python

java’s amazing perf^simplicity #c++/c#

Paradoxically
* java’s syntax simplicity is on par with python, better than c# and much better than c++.
* java’s performance is on par with c# and c++, largely due to JVM and JIT

java has been popular on web servers, and crucially the newer mobile, cloud, big-data platforms, beating c++, c#, python

java’s adoption rate as a foundation platform or integration-target… is better than all other languages. Many products are built for java, on JVM or with java in mind. I’m deliberately vague here because I don’t want to spend too much time analyzing this vague, general observation.

Liunx(!!Unix)kernel thread can! run user program

–“Liunx kernel thread cannot run user programs”, as explained in [[UnderstandingLinuxKernel]] P3.

Removing ‘kernel‘… Linux userland threads (based on LWP, explained below) do run user programs.

Removing ‘thread‘… Linux kernel processes or interrupt routines could possibly run under a user process pid.

Removing ‘Linux‘ … Other Unix variants do use kernel threads to run both 1) user programs and 2) kernel routines. This is my understanding from reading about JVM…

  • unix  — multi-threaded user apps =use=> LWP =use=> kernel threads
  • linux — multi-threaded user apps =use=> LWP. LWP is the basic unit of concurrency in user apps. Linux LWP is unrelated to Unix LWP.

SQL expertise(+tuning)as competitive advantage: losing mkt val

Opening example — I have received very few selective/tough SQL interview questions. The last interview with non-trivial SQL is Lazada.

If you want to rely on SQL/RDBMS skill as a competitive advantage, you will be disappointed.

I think many teams still use SQL, but use it lightly. Simple queries, simple stored proc, small tables, no tuning required. Therefore, interview questions are dumbing down…

I believe I over-invested in SQL. The last job that required any non-trivial SQL was 95G…

strategic value of MOM]tech evolution  is about MOM, but similar things can be said about SQL and RDBMS.

This is a cautionary tail in favor of TrySomethingNew. If my friend Victoria stays within the familiar domain of SQL/Perl/PHP/Apache, then her skillset would slowly lose market value.

Don’t forget — TSN can have low ROTI. We have to accept that possibility