[20] SG tech talent pool=insufficient: expertise^GTD

Listening to LKY’s final interviews (2013 ?), I have to agree that Singapore — counting citizens+PRs — doesn’t have enough technical talent across many technical domains, including software dev. https://www.gov.sg/article/why-do-we-need-skilled-foreign-workers-in-singapore is a 2020 publication, citing software dev as a prime example.

A telltale sign of the heavy reliance on foreign talent — If an employer favors a foreigner, it faces penalty primarily (Russell warning) in the form of ban on EP application/renewal. This penalty spotlights the reliance on EPs at multinationals like MLP, Goog, FB, Macq.

The relatively high-end dev jobs might be 90% occupied by foreigners, not citizens like me. I can recall my experience in OC, Qz, Macq, INDEED.com interview… Why? One of the top 2 most obvious reasons is the highly selective interview. High-end tech jobs always feature highly selective tech interviews — I call it “Expertise screening”.

Expertise is unrelated to LocalSys knowledge. LocalSys is crucial in GTD competence.

As I explained to Ashish and Deepak CM, many GTD-competent [1] developers in SG (or elsewhere) are not theoretical enough, or lack the intellectual curiosity [1], to pass these interviews. In contrast, I do have some Expertise. I have proven it in so many interviews, more than most candidates.

(On a side note, even though I stand out among local candidates, the fact remains that I need a longer time to find a job in SG than Wall St. )

[1] As my friend Qihao explained, most rejected candidates (including Ashish) probably have the competence to do the job, but that’s not his hiring criteria. That criteria is too low.  Looks like SG has some GTD-competent developers but not enough with expertise or curiosity.

— Math exams in SG and China

Looking at my son’s PSLE math questions, I was somehow reminded that the real challenge in high-end tech IV is theoretical/analytical skills — “problem-solving” skill as high-end hiring teams say, but highly theoretical in nature. This kind of analytical skill including vision and pattern recognition is similar to my son’s P5 math questions.

In high-end tech IV, whiteboard algo and QQ are the two highly theoretical domains. ECT and BP are less theoretical.

What’s in common? All of these skills can be honed (磨练). Your percentile measures your abilities + effort (motivation, curiosity[1]). I’m relatively strong in both abilities and effort.

So I know the math questions are similar in style in SG and China. I have reason to believe East-European and former-Soviet countries are similar. I think other countries are also similar.

rvalue objects before/after c++11

Those rvalue objects i.e. unnamed temp objects have been around for years. So how is rvr needed to handle rvalue-objects?

  • C++11 added language features (move,forward,rvr..) only to support resource stealing where resource is almost always some heapy thingy.
  • Before c++11, rvalue objects don’t need a special notation and don’t need a special handle (i.e. rvr). They are treated just like a special type of object. You were able to steal resources, but error-prone and unsafe.

private-Virtual functions: java^c++

Q: in c++ and java, is private virtual function useful?
A: both languages allow this kind of code to compile. C++ experts uses it for some advanced purpose but in java, any private methods are effective non-virtual, so any subclass method is unrelated to the baseclass private method.

— C++ is more nuanced
The trigger of this blogpost is P68 [[c++ coding standard]] by two top experts Sutter and Alexandrescu, but I find this “coding standard” unconvincing.

Private virtual functions seem to be valuable in some philosophical sense but I don’t see any practical use.

— java
See also hiding^overriding in java

Q: beside hiding (static methods), overriding and overloading, does java support another mechanism where subclass can redefine a (non-static) method?

  • in GTD this is very low value.
  • in terms of zbs and expertise this actually reveals something fundamental, esp. between java and c++
  • .. also highlights some important but lesser-known details of java inheritance
  • in terms of IV, this can be a small halo whenever we talk about overriding^overloading

A: Code below is not overriding nor overloading but it does compile and run, so yes this is another mechanism. I will call it “xxxx” or “redefinition”. The xxxx method is unrelated to the baseclass private method so compiler has no confusion. (In contrast, With overriding and overloading, compiler or the runtime follows documented rules to pick an implementation. )

Note if super and subclasses both use “public”, we get overriding. Therefore, “xxxx” requires “private in superclass, public in subclass”

Code below is based on https://stackoverflow.com/questions/19461867/private-method-in-inheritance-in-java

public class A {
    private void say(int number){
        System.out.print("A:"+number);
    }
}
public class D extends A{
    // a public method xxxx/redefining a baseclass private method 
    public void say(int number){
        System.out.print("Over:"+number);
    }
}
public class Tester {
    public static void main(String[] args) {
        A a=new D();
        //a.say(12); // compilation error ... Private
        ((D)a).say(12); //works
    }
}

array/deque based order book

For HFT + mkt data + internal matching or market making .. this is a favorite interview topic, kind of similar to retransmission.

==== For a first design, look at … https://web.archive.org/web/20141222151051/https://dl.dropboxusercontent.com/u/3001534/engine.c has a brief design doc, referenced by https://quant.stackexchange.com/questions/3783/what-is-an-efficient-data-structure-to-model-order-book

  • TLB?
  • cache efficiency

— no insert/delete of array

Order cancel and full-fill both results in deletion of an array element .. shifting. Random inserts mid-stream also requires shifting in the array. To preempt shifts, the design adopted in engine.c is “one array element for every possible price point”.

  1. when an existing order gets wiped out, its quantity becomes zero. It’s also possible to use a flag, but zero quantity is probably more space efficient.
  2. when we get a limit order at a new price of 9213, we don’t insert but update the dummy price point object in-situ.

What if all the price points in use are only 0.01% of the price points allocated? To answer that question, we need some estimates of the likely price levels and the footprint of the array element. Luckily, price levels are not floating points but integers essentially — a key constraint in the requirement.

  • An array element can be very space efficient — a nullable pointer.
  • Alternatively, it can be an integer subscript into another array of “received price points”. Dummy price point would be represented by “0”, a special subscript. Subscript can be double-byte, a big saving cf 8-byte pointers.
  • Likely price levels could range from 30D min to 30D max plus some margin. Such a range would be up to 10,000 price levels.
    • But What if price plunges at SOD or mid-day? “Not sure how my company system was designed, but here’s my idea –” we would need to allocate then initialize a new array of price levels. Deque would help.
  • Unlikely price levels (for outlier orders) would be hosted in a separate data structure, to support a super low bid (or super high ask). These outlier orders can tolerate longer delays.
  • a deque would support efficient insert near Both ends.

Good design for a penny stock (few price levels), but …

Q: how about a “pricey stock”? Their price levels would be so numerous , almost like float 😦
A: Take the bid book for example. The top 10,000 price levels can use the penny-stock  dequeue design.

==== vector insert as a second design

A fairly complete reference implementation for Nasdaq’s ITCH protocol supports order lookup, order reduction (partial fill or full fill), order cancel and  new order. New order at a fresh price level uses vector::insert —
  • If this insertion happens near the end of the vector (top of book), then the elements shifted would be very few
  • if the market slowly declines, then the best bid would be wiped out, shrinking the vector at the back end. New price levels would be inserted at lower price points, but again the elements above it would be few.
  • If this insertion happens far from the back end, as in an outlier order, then lots of elements shifted, but this scenario is rare.

Note a single element’s footprint can be very small, as explained earlier.

==== switch to a RBTree if some abnormal price movement detected. This idea may sound crazy, but consider RBTree used inside java8 hashmap

bSearch in array^BSTree

A) bSearch in sorted array
B) bSearch in a BST

Both are logN but here are some key differences:

  • #1 diff (for static data): d-cache efficiency
    • .. if data volume is larger than L3 cache or even main memory, then you may hit page faults and need virtual memory, then the d-cache benefit is reduced
  • diff: dynamic .. i.e. insert/deletes. array is hopeless
    • .. what if inserts and deletes only happen at the two ends? Then a dequeu is still usable and possibley faster than BST
  • diff: BST may become unbalanced, derailing the logN performance. Self-balancing BST is easily available.