AlphaGo relied on c++ for efficiency

When Stroustrup mentioned AlphaGo, I immediately said that was really cool.

He said when he looked under the hood he realized it was built on c++ components.

I think AlphaGo has to analyze a large number of possibilities so efficiency (not latency) is important and had to use c++.

Stroustrup also singled out TensorFlow library.

reliable multicast for replicated-cache update

https://www.usenix.org/conference/usits-99/scalable-web-caching-frequently-updated-objects-using-reliable-multicast is a 1999 research paper. I hope by now multicast has grown more mature more proven. Not sure where this is used, perhaps within certain network boundaries such as a private network of data servers.

This paper examines reliable multicast for invalidation and delivery of popular, frequently updated objects to web cache proxies.

## defaultdict tricks #matrix

https://github.com/tiger40490/repo1/blob/py1/py/algo_tree/commonAncestor_Indeed.py demos a few time-saving tricks with defaultdict

  • mydict[key] # without printing or using the return value … would trigger the dictionary to construct a default value for this key IFF missing
  • from key/value pairs, build dict {key -> valueList}

https://github.com/tiger40490/repo1/blob/py1/py/algo_combo_perm/staircase_CSY.py shows

  • defaultdict(int) as a frequency counter.
    • mydict[key] += 1 #works even when key was previously missing in mydict
  • defaultdict(lambda: ‘_’).
    • mydict[(row,col)] can implement a matrix. negative (or other invalid) indices results in default value of string ‘_’. I believe dict key can be tuple but not list.
    • In contrast, the 2D array interprets negative indices as wrap-around !
  • defaultdict(lambda: [[],[]]) creates a list of 2 small lists for any “invalid” key. In contrast defaultdict(list) will create a monolithic list which hits runtime error when you access element of the nested list i.e. dict[someKey][0][anyIndex]

demanding mental power: localSys imt QQ

I told grandpa that I have noticed signs that my mental power is declining, mostly in localSys and local codebase absorbency. However, localSys has always been a weakness in me. The decline, if any, is very gradual.

QQ learning is still in good progress. Experience, accumulation, deepening.. thick->thin.

Note A.Brooks’ article talks about creativity, innovation, analytical … My issue is mostly memory capacity and speed. Recall the Indeed interview.

BST: finding next greater node is O(1) #Kyle

https://stackoverflow.com/questions/11779859/whats-the-time-complexity-of-iterating-through-a-stdset-stdmap shows that in a std::map, the time complexity of in-order iterating every node is O(N) from lowest to highest.

Therefore, each step is O(1) amortized. In other words, finding next higher node in such a BST is a constant-time operation.

https://en.cppreference.com/w/cpp/iterator/next confirms that std::next() is O(1) if the steps to advance is a single step. Even better, across all STL containers, iterator movement by N steps is O(N) except for random-access iterators, which are even faster, at O(1).

Kyle’s simple trie ] python

https://github.com/tiger40490/repo1/blob/py1/py/algo_tree/trie_Kyle.py is Usable for some rare coding IV. By Kyle Stewart.

https://leetcode.com/problems/implement-trie-prefix-tree/ has a slightly more complete requirement.

I think I can modify (not write from scratch) Kyle’s code to add the word search.

Looks similar to the 10-billion phone number problem

void list.reverse() ^ reversed()→iterator

These three constructs do three different things, without overlap

  • mylist[::-1] returns a reversed clone
    • myStr[::-1] works too
  • mylist.reverse() is a void in-place mutation, same as mylist.sort()
  • reversed(mylist) returns an iterator, not printable but can be used as argument to some other functions
    • I feel this can be convenient in some contexts.
    • reversed(myStr) works too but you can’t print it directly

Similarly,

  • Unlike items(),dict.iteritems()return iterator

Different case:

  • sorted(any_iterable) creates a new vector (i.e. “list”). Iterator would be technically stupid.

max salary: simple game-plan

The strategy — “Whether I ask for a big base or modest base, there’s a chance I may have problem with manager expectation. So let’s just go for the max salary and forget about learning/tsn

    • algo trading? tend to pay a premium, but I wonder how they assess my trec.
    • java/c++ combo role? will Not pay lower
    • Some quant skill? tend to pay a premium
    • If a HFT shop makes a real offer at S$150k base I will decline — no real upside for me. Similarly, if a quant dev job pays $170k base I will decline — the promised accu (across jobs) is a big mirage. Accu can happen within a single job, but so is the technical accu on a single job.

Max-salary game plan must not ignore :

  • correlation between salary and expectation — as observed in some past jobs but not in every lucrative role. My Barclays and 95G roles were great.
  • the stigma, damagedGoods and high expectations in Stirt and Macq…. Ashish’s view — just earn the money for 6 months and leave if not happy.
  • commute
  • reputation risk at the major banks.

Am i still a survivor? I would say YES in OC and GS, and yes in Macq based on the internal transfer offer.

Mithun suggested — Are we traumatized/scarred and fixated on the stigma? I said the same to Deepak CM.

pass generator output to next generator

I think this technique can be extremely time-saving in coding tests.

https://github.com/tiger40490/repo1/blob/py1/py/algo_combo_perm/1fromEachSet.py my code demos:

for myset in pool:     output = list(gen(output, myset))

The gen() function uses yield. For the first call to gen(), we exhaust all of its items and save into a list named “output”.

Then we pass this list into the second gen(), this time with a different myset

checked STL^checked java Collections

jdk checkedList, checkedMap etc are designed to check type errors — checking any newly added item has the correct type for the collection. See P 246 [[java generics]]

STL checked container checks very different coding errors. See http://www.informit.com/articles/article.aspx?p=373341, which is extracted from my book [[c++codingStd]]

techShop-plan: rare opp2try tech start-ups

The Sg2019 presents a rare opportunity to work for such a firm and gain some insight how they select candidates.

Salary might be slightly lower like 150k vs 180k@SCB, but i think it’s completely fine, until we (irrationally) compare to some peers.

The inherent risk of stigma (damagedGood, PIP..) are lower since I’m older and newer, bringing down the expectation. I would feel like Michael Jordan joining baseball.

missing q[ return ] in non-void function #Josh

I told Josh about this real Scenario: my non-void function has 3 control paths, one of them without a return statement.

g++ usually give a warning under “-Wall”. Flowing off the end of a function is equivalent to a return with no value — undefined behavior in non-void function.

If you print the function return value and the 3rd control path executes, you get undefined behavior.

https://stackoverflow.com/questions/3402178/omitting-return-statement-in-c

avoid lopsided BST

Self-balanced is not a theoretical nicety but an essential BST feature. Without it a BST could easily become lopsided and all operations will slow down.

If for any reason (I don’t know any) we can’t use a AVL or RBT, then we could randomize the input and insert them (without deletion) and get a fairly balanced BST.

competitiveness: Pre-teen sprinter + HFT selectivity

I was a decent sprinter at age 6 through 13, then I realized I was not good enough to compete at Beijing municipal level.

There are quite a number of HFT shops in Singapore. I think they have a higher bar than ibanks. At ibank interviews I felt again like that pre-teen sprinter, but HFT interview is generally too hard for me

prevent c++compiler reordering statements

http://www.cplusplus.com/reference/atomic/atomic_signal_fence/ says this low-level c++11 function is something like a

directive to the compiler inhibiting it from making optimizations that involve moving writing operations beyond a releasing fence or read operations before an acquire fence.

I doubt there’s any need for this in any application.

UDP: send/receive buffers are configurable #CSY

It’s wrong to say UDP uses a small receive buffer but doesn’t use send buffer.

Receive — https://access.redhat.com/documentation/en-US/JBoss_Enterprise_Web_Platform/5/html/Administration_And_Configuration_Guide/jgroups-perf-udpbuffer.html shows how to increase UDP receive buffer to 25MB

Send — https://stackoverflow.com/questions/2031109/understanding-set-getsockopt-so-sndbuf) shows you CAN configure send buffer on a UDP socket.

Send — https://www.ibm.com/support/knowledgecenter/en/SSB23S_1.1.0.15/gtpc2/cpp_sendto.html also confirms SO_SNDBUF send-buffer size applies to UDP sockets

In addition, Application is free to use its own custom buffer in the form of a vector for example.

%%approaches to classic generator problemS

* p2u in next_perm

* insert at each position of each array, in a growing collection to create new arrays for the collection. Optionally, all arrays are immutable.

Actually, only two approaches so far, though each of them can be adapted in many ways.

Note many (simple) classic generator algos are required in coding tests or (practical) algo problems

c++ lockfree data types: mostly bool/int/pointer

  • In generally, only atomic bool/int/pointer can be lock-free. Bigger types need special implementation techniques and I have read that lock-free is usually not possible.
  • Atomic flags are lock-free (this is the only type guaranteed to be lock-free on all library implementations)

const vector{string} won’t let you modify strings #ChengShi

Point #1: a const vector only gives out const references. In this case, reference to const string objects.

My friend ChengShi said he has never seen const vector<const string>, because const vector<string> is good.

Point #2: In fact, my experiment of q[vector<const string>] can’t compile. Why?

https://stackoverflow.com/questions/17313062/vector-of-const-objects-giving-compile-error explains that in vector<T>, T must be assignable!

git | list all(and only)commits from branching point to a descendant commit

Background — Suppose you made 3 commits on your feature branch name “parix”, but meanwhile, someone added 4 commits in master branch. Therefore there is now a divergence in the commit history graph.

Often times you need to visualize the divergence. You need to exactly what 3 commits are on your branch after the common ancestor.

git log master..pairx # listing the 3 additional commits in pairx branch, right after the common ancestor i.e. the branching point

git log pairx..master # to show those 4 commits.

OrderedDict == LinkedHashMap

collections.OrderedDict can be useful in some coding interviews. As Rahul pointed out, interviewers are likely interested in your skill implementing it.

  • similarly preference — http endpoint skill is less valued than socket programming skill
  • Counterexample — interviewers are seldom interested in how you implement a priority queue.

In all cases, some knowledge of actual implementation is a small halo, as Venkat shows

–impl:

doubly-linked circular list, as explained on https://stackoverflow.com/questions/33748340/how-does-pythons-ordereddict-remember-elements-inserted, and in the source code link

http://code.activestate.com/recipes/496761-a-more-clean-implementation-for-ordered-dictionary/ shows a non-standard implementation, but del is not O(1)

2 reasons: BM is poor model for bond price

Reason 1 — terminal value is known. It’s more than deterministic. It’s exactly $100 at maturity. Brownian Motion doesn’t match that.

Reason 2 — drift estimate is too hard too sensitive. A BM process has a drift value. U can be very careful very thorough to estimate it, but any minor change in the drift estimate would result in very large differences in the price evolution, if the bond’s lifespan is longer than 10Y.

 

[18] Integer(like String)objects always immutable: java+python #XR

Integer(like String)objects always immutable in java. My google search confirmed that.

Beside serialization, there is no practical reason to deep-copy them.

Python integer objects are also immutable. Every time we modify an int, the new value has a different id(). See also my blog post on python immutable types.

See also https://medium.com/@meghamohan/mutable-and-immutable-side-of-python-c2145cf72747

string comparison is costly #loop-unroll`

P120 [[ algos in a nutshell ]] claims that string comparison is consider expensive.

String comparison requires byte-by-byte comparison in a loop. Sometimes people need to write this loop by hand in assembly, and sometimes unroll the loop to fill the instruction pipeline.

I have no reason to doubt the authors. I believe this practice proved effective in practice.

factoryMethod: based on arg ^ host object

I used to believe a factory method always relies on arg to tell it what type of instance to manufacture.

Now I know you can call myFactoryInstance.makeTrade(); // no arg, but the host object provides the clue as to what type of instance to manufacture. The factory method is virtual.

See https://stackoverflow.com/questions/5739611/differences-between-abstract-factory-pattern-and-factory-method

python usage in FI quant lib #Pimco

In one of world’s biggest fixed income buy-side firms’ quant library, the codebase is 3/4 c++ and ¼ python including pandas, numpy, machine learning, grid computing modules. I think this is similar to Macquarie FICC quant lib.

C++ is much faster, but data structures are very limited including STL containers.

I think the funds hold mostly bonds and mortgages. How about futures, IRS? Perhaps for hedging?

Q: bond price change when yield goes to zero

Can bond yield become negative? Yes 2015-2017 many bonds traded at negative yield. https://www.ft.com/content/312f0a8c-0094-11e6-ac98-3c15a1aa2e62 shows a realistic example of a vanilla bond trading at $104. Yield is negative –You pay $104 now and will get $100 repayment so you are guaranteed to lose money.

Mathematically, when yield approaches negative 100 bps, price goes to infinity.

When yield approaches zero, bond price would go to the arithmetic sum of all coupons + repayment.

gdb symbol-loading too time-consuming

After I attach gdb, it immediately starts a prolonged symbol loading process. It’s better to skip the loading, and selectively load some symbols.

https://ascending.wordpress.com/2007/09/02/a-couple-of-gdb-tricks/ describes how to use ~/.gdbinit, but I had no permission to write to the ~/

gdb -iex ‘set auto-solib-add off’ …. # worked

–loading a particular *.so file

I got “No loaded shared libraries match the pattern” and fixed it by

shar file.name.so #instead of /full/path/to/file.name.so.some.version

ExecType[150] to complement 35=8

Tag 150 shows the execution report type, so it only (and always, according to Rahul) accompanies 35=8 not other 35=* messages.

With 35=8 alone, we won’t know what type report this is.

The first execution report we get on a given order should show 150=0 i.e. new, not a fill. I believe only the real exchange (not the intermediate routing nodes) would send this message.

I think sometimes we get 35=8;150=A i.e. pending-new, presumably before 150=0. Why? I don’t think we need to bother now.

template template param

Andrei Alexandrescu briefly introduced this technique .. I think he said it is one of the most powerful TMP techniques.

One common feature of templ-templ param — the param P “template<typename>P” is usually used as “P<S>”, where S is a sister parameter to P !

http://www.informit.com/articles/article.aspx?p=376878 has a good tutorial.

https://stackoverflow.com/questions/213761/what-are-some-uses-of-template-template-parameters shows operator<<(). I have it in my github code.

The variadic version is broken because the template tries to accept any class template having 1 or more parameters in the definition. This includes std::basic_string, which already has an operator<<() that conflicts with the variadic version.

This also helps me understand that the 2-param version (using templ-templ) is really a template that accepts any class-template having two type-params by definition. Note the two type-params can use default values.

 

need a safe invalid value for a c++float@@ NaN

— For c++ float, if you need a safe “invalid” value, there’s NaN, with standard support like std::isnana() etc

— For c++ int, you need to pick a actually-valid number like INT_MAX.

Q: How do you find a special value to indicate “variable has an invalid value”?
%%A: I think you need separate boolean flag.
A: boost::optional #NaN #a G9 boost construct is exactly designed for this