social class in U.S. and my chosen tech domain

I used to feel the US is a less class-conscious society than China or Singapore: anyone can make it in this “free”, meritocratic country. Then “insiders” told me about the old boys’ circles, and the alumni circles on Wall St.

I feel in any unequal, hierarchical society, there are invisible walls between social strata. I was lucky to be an immigrant in technology. If I step out of tech into management, I am likely to face class and racial bias/affinity and … I would no longer be “in-demand” as I am in tech. Look at the number of Chinese managers in GS. Many make VP but few rise further.

Therefore the tech role is a sweet spot for an immigrant techie like me. Besides tech, a few professions are perhaps less hierarchical – trading, medical, academic, research(?), teaching …

quant developer requirements

Many quant developers (in our department) program in c# (for excel plugin) or build infrastructure code modules around quant lib, but they don't touch c++ quant business logic classes. C++ quant lib (model) programming is reserved for the mathematicians, typically PhD's.

Many of these non-C++ quant developers have good product knowledge and can sometimes move into business side of trading.

I was told these quant developers don't need advanced math knowledge.

—-quant interviews

Mostly C++ questions. Most candidates are filtered out here.

2nd group – probability (as distinct from statistics)

Some finance intuitions (eg. each term in the BS formula)

Some brain teasers

-- some typical C++ questions (everything can be found from the Scott Meyers books)

exceptions during ctor/dtor

virtual ctor

Given a codebase, how do you detect memory leak

multiple inheritance (fairly common in practice)

threading

—–

[[heard on the street]] and [[A Practical Guide To Quantitative Finance Interviews]]

Another book by Shreve.

##some benefits@learning c++, even if no salary increase

  1. After learning c++, I am fairly confident I could, if I must, pick up c# in a few (4?) months and start passing interviews. C++ is inherently tougher than java and C#. Java and C# both have large libraries, but the core languages are significantly simpler/cleaner than c++.
  2. After learning C++, I have found python and perl easier to understand and master, since the standard interpreters of both are written in C. I now believe some people who claim they could pick up a new language in a few months. Those languages have their roots in C/C++.
    • The basic challenges of scope+namespace, object lifetime, heap/stack, pointers, memory allocation, object construction, pass-by-ref/value, arrays, function pointer, exceptions, nested struct+array+pointer… are faced by every language designer. Many of these challenges depend on the underlying runtime library, which is almost invariably written in C.
    • The common OO challenges of inheritance, virtual, static/non-static, HAS-A/IS-A, constructor, downcast, … are faced by every OO language designer. Many of them borrow from java, which borrows from C++ and smalltalk.
  3. threading – java remains the gold standard, but c++ concurrency support is more complex, harder to understand and offers some low-level insight
  4. memory management — c++ offers insight into JVM and CLR
  5. c++ gave me other insight into java, esp. GC, JVM, overriding, references, heap/stack, sizeof, …

[[c++recipes]] mv-semantic etc

I find this book rather practical. Many small programs are fully tested and demonstrated.

This 2015 book covers c++14.

–#1) RVR(rval ref) and move semantic:
This book offers just enough detail (over 5-10 pages) to show how move ctor reduces waste. Example class has a large non-ref field.

P49 shows move(), but P48 shows even without a move() call the compiler is able to *select* the move ctor, not the copy ctor, when passing an instance into a non-ref parameter. This applies when the argument is a temp object (an rvalue); a named object passed without move() is still copied. The copy ctor is present but skipped!

P49 shows an effective mv-ctor can be “=default;”
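To make the ctor selection concrete, here is a minimal sketch of my own, not the book's code. The names Big, BigDefaulted, consume and demo_ctor_selection are hypothetical, and the static counters exist only to make the selection observable.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical class (not from the book): one large non-ref field.
struct Big {
    std::vector<int> payload;
    static int copies;   // times the copy ctor ran
    static int moves;    // times the move ctor ran

    explicit Big(std::size_t n) : payload(n, 7) {}
    Big(const Big& other) : payload(other.payload) { ++copies; }
    Big(Big&& other) noexcept : payload(std::move(other.payload)) { ++moves; }
};
int Big::copies = 0;
int Big::moves = 0;

// Same field, but the compiler-generated move ctor does a memberwise move.
struct BigDefaulted {
    std::vector<int> payload;
    explicit BigDefaulted(std::size_t n) : payload(n, 7) {}
    BigDefaulted(const BigDefaulted&) = default;
    BigDefaulted(BigDefaulted&&) = default;
};

// Non-ref parameter: each call constructs a fresh Big from the argument.
std::size_t consume(Big b) { return b.payload.size(); }

void demo_ctor_selection() {
    Big named(1000);
    consume(named);               // lvalue argument: copy ctor is selected
    assert(Big::copies == 1 && Big::moves == 0);
    consume(std::move(named));    // rvalue argument: move ctor is selected
    assert(Big::copies == 1 && Big::moves == 1);
}
```

A true temp such as consume(Big(1000)) is also an rvalue, but the compiler may construct it directly in the parameter (guaranteed since C++17), in which case neither counter changes. That is why the sketch uses move() on a named object to make the move ctor call visible.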

–custom new/delete to trace memory operations
Sample code showing how the delete() can show where in source code the new() happened. This shows a common technique — allocating an additional custom memory header when allocating memory.

This is more practical than the [[effC++]] recipe.

There’s also a version for array-new. The class-specific-new doesn’t need the memory header.
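The memory-header technique can be sketched roughly as follows. This is my own simplified illustration rather than the book's listing: AllocHeader, traced_alloc, header_of, traced_free and the TRACED_ALLOC macro are all hypothetical names, and I use plain functions instead of replacing operator new/delete to keep the sketch small.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical header stored immediately before each user block.
// alignas keeps the user block aligned for any ordinary type.
struct alignas(alignof(std::max_align_t)) AllocHeader {
    const char* file;   // source file of the allocation
    int line;           // source line of the allocation
    std::size_t size;   // bytes requested by the caller
};

// Allocate size bytes plus a header; hand back the address just past the header.
void* traced_alloc(std::size_t size, const char* file, int line) {
    void* raw = std::malloc(sizeof(AllocHeader) + size);
    if (raw == nullptr) throw std::bad_alloc();
    AllocHeader* h = new (raw) AllocHeader{file, line, size};
    return h + 1;
}

// Step back from the user pointer to recover the header.
const AllocHeader* header_of(const void* user) {
    assert(user != nullptr);
    return static_cast<const AllocHeader*>(user) - 1;
}

// Release a block obtained from traced_alloc. A tracing delete would
// print header_of(user)->file and ->line here before freeing.
void traced_free(void* user) {
    if (user == nullptr) return;
    std::free(static_cast<AllocHeader*>(user) - 1);
}

// Captures the call site, much as a tracing operator new would.
#define TRACED_ALLOC(n) traced_alloc((n), __FILE__, __LINE__)
```

The cost is one extra header per allocation, which is presumably why the class-specific new can do without it: the class already knows what it allocates.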

–other
A simple example code of weak_ptr.

a custom small block allocator to reduce memory fragmentation

Using promise/future to transfer data between a worker thread and a boss thread
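For the weak_ptr item, a minimal sketch of the usual pattern (my own example, not the book's; still_alive and observe_demo are hypothetical names): the weak_ptr watches an object without owning it, and lock() tells us whether the object is still around.

```cpp
#include <cassert>
#include <memory>

// True if the watched object still exists.
bool still_alive(const std::weak_ptr<int>& w) {
    // lock() returns an empty shared_ptr once the object is gone
    return w.lock() != nullptr;
}

// Returns the value seen while the owner was alive.
int observe_demo() {
    std::weak_ptr<int> watcher;
    int seen = 0;
    {
        std::shared_ptr<int> owner = std::make_shared<int>(42);
        watcher = owner;                    // does not add a strong reference
        assert(owner.use_count() == 1);
        if (std::shared_ptr<int> sp = watcher.lock()) seen = *sp;
    }                                       // owner destroyed, int released
    assert(!still_alive(watcher));
    return seen;
}
```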

mv-semantic: keywords

I feel all the tutorials seem to miss some important details and are selling propaganda. Maybe [[c++ recipes]] is better?

[s = I believe std::string is a good illustration of this keyword]

  • [s] allocation – mv-semantic efficiently avoids a fresh memory allocation (typically on heap) and the copying that goes with it
  • [s] resource — is usually allocated on heap and accessed via a pointer field
  • [s] pointer field – every tutorial shows a class with a pointer field. Note a reference field is much less common.
  • [s] deep-copy – is traditional. Mv-semantics uses some special form of shallow-copy. Has to be carefully managed.
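  • [s] temp – the source (RHS) of a move is normally a temp object, or a named object we explicitly hand over via move(). I believe by using the move() function and the r-val reference (RVR) we promise to the compiler not to rely on the object's value afterwards. For standard library types the moved-from object is left in a valid but unspecified state, so touching it is not UndefBehv as such, but its value is meaningless until reassigned. See [[c++standard library]]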
  • promise – see above
  • containers – All standard STL container classes (including std::string) provide mv-semantics. Here, the entire container instance is the payload! Inserting a float into a container won’t need mv-semantics.
  • [s] expensive — allocation and copying assumed expensive. If not expensive, then the move is not worthwhile.
  • [s] robbed — the source object of the move is crippled, robbed, abandoned and should not be used afterwards. Its “resource” is already stolen, so the pointer field to that resource should be set to NULL.
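The keywords above (resource, pointer field, deep-copy, robbed) fit into one small class. This is a minimal sketch under my own assumptions: Buffer and demo_robbed_source are hypothetical names, and assignment is left out to keep it short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <utility>

// Hypothetical string-like class: a pointer field to a heap resource.
class Buffer {
public:
    explicit Buffer(const char* text)
        : len_(std::strlen(text)), data_(new char[len_ + 1]) {
        std::memcpy(data_, text, len_ + 1);
    }
    // Traditional deep copy: new allocation plus byte-by-byte copy.
    Buffer(const Buffer& o) : len_(o.len_), data_(new char[o.len_ + 1]) {
        std::memcpy(data_, o.data_, len_ + 1);
    }
    // Move: a carefully managed shallow copy. Steal the pointer, then
    // null out the source so its dtor does not free the stolen resource.
    Buffer(Buffer&& o) noexcept : len_(o.len_), data_(o.data_) {
        o.data_ = nullptr;
        o.len_ = 0;
    }
    Buffer& operator=(const Buffer&) = delete;   // omitted for brevity
    ~Buffer() { delete[] data_; }                // delete[] on nullptr is a no-op

    const char* data() const { return data_; }
    std::size_t size() const { return len_; }

private:
    std::size_t len_;   // declared before data_, so it is initialised first
    char* data_;
};

void demo_robbed_source() {
    Buffer a("hello");
    const char* resource = a.data();
    Buffer b(std::move(a));
    assert(b.data() == resource);   // no new allocation: same heap block
    assert(a.data() == nullptr);    // the source has been robbed
}
```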

——–
http://www.boost.org/doc/libs/1_59_0/doc/html/move/implementing_movable_classes.html says “Many aspects of move semantics can be emulated for compilers not supporting rvalue references and Boost.Move offers tools for that purpose.” I think this sheds light…

mv-semantic : use cases rather few

I think the use case for mv-constructs is tricky. In many simple contexts mv-constructs actually don’t work.

Justification for introducing mv-semantic is clearest in one scenario — a short-lived but complex stack object is passed by value into a function. The argument object is a temp copy — unnecessary.

Note the data type should be a complex type like containers (including string), not an int. In fact, as explained in the post on “keywords”, there’s usually a pointer field and allocation.

Other use cases are slightly more complex, and the justification is weaker.

Q: [[c++standard library]] P21 says ANY nontrivial class should provide a mv ctor AND a mv-assignment. Why? (We assume there’s pointer field and allocation involved if no mv-semantics.)
%%A: To avoid making temp copies when inserting into container. I think vector relocation also benefits from mv-ctor

[[c++forTheImpatient]] P640 shows that sorting a vector of strings can benefit from mv-semantic. Swapping 2 elements in the vector requires only a pointer swap rather than copying the strings.
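The vector relocation point can be checked with a small counter experiment. This is my own sketch (Tracked and copies_during_growth are hypothetical names). One detail the tutorials often skip: std::vector only moves elements during reallocation if the move ctor is noexcept (or the type is not copyable), otherwise it falls back to copying.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical element type that counts how it gets relocated.
struct Tracked {
    std::string text;
    static int copies;
    static int moves;

    explicit Tracked(std::string t) : text(std::move(t)) {}
    Tracked(const Tracked& o) : text(o.text) { ++copies; }
    // noexcept matters: without it the vector would copy on reallocation.
    Tracked(Tracked&& o) noexcept : text(std::move(o.text)) { ++moves; }
};
int Tracked::copies = 0;
int Tracked::moves = 0;

// Grow a vector past its capacity repeatedly; report how many copies happened.
int copies_during_growth(int n) {
    assert(n > 0);
    Tracked::copies = 0;
    Tracked::moves = 0;
    std::vector<Tracked> v;
    for (int i = 0; i < n; ++i)
        v.emplace_back("a string long enough to live on the heap");
    return Tracked::copies;
}
```

With the noexcept move ctor in place, every relocation shows up in Tracked::moves and none in Tracked::copies.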

returning RVR #Josuttis

My rule of thumb is to avoid an RVR return type, even though Josuttis did NOT forbid it by saying anything like “the return type should never be rvr”.

Instead of an rvr return type, I feel in most practical cases we can achieve the same result using a nonref return type. I think such a function call usually evaluates to a nameless temp object i.e. a naturally-occurring rvalue object.

[[Josuttis]] (i.e. the c++Standard library) P22 explains the rules about returning rval ref.

In particular, it’s a bad idea to return a newly-created stack object by rvr. This object is a nonstatic local and will be wiped out after the function returns.

(It’s equally bad to return this object by l-value reference.)
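A minimal sketch of the rule of thumb (my own example; make_names and demo_return_by_value are hypothetical names). The bad version is left commented out on purpose, because calling it would hand back a reference to a destroyed object.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Preferred: nonref return type. The call expression is a nameless temp,
// so the caller's object is move-constructed from it or the move is elided.
std::vector<std::string> make_names() {
    std::vector<std::string> local{"alice", "bob"};
    return local;   // no move() needed: a returned local is treated as an rvalue
}

// Bad idea, shown for contrast only:
// std::vector<std::string>&& make_names_bad() {
//     std::vector<std::string> local{"alice", "bob"};
//     return std::move(local);   // local is destroyed on return: dangling reference
// }

void demo_return_by_value() {
    std::vector<std::string> names = make_names();
    assert(names.size() == 2);
    assert(names[0] == "alice");
}
```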

 

lambda meets template

In cpp, java and c#, the worst part of lambda is the integration with (parametrized) templates.

In each case, we need to understand the base technology and how it integrates with templates, otherwise we will be lost. The base technologies are (see post on “lambda – replicable”)
– delegate (c#)
– anon nested class (java)
– functor (c++)

Syntax is bad but not the worst. Don’t get bogged down there.
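For the c++ case, the base technology (functor) and its meeting point with a template can be sketched as below. This is my own illustration with hypothetical names (count_matching, GreaterThan). The lambda is essentially shorthand for a compiler-generated class similar to the hand-written functor, and the template parameter Pred is deduced to that class type.

```cpp
#include <cassert>
#include <vector>

// A function template accepting any callable: functor, lambda or function pointer.
template <typename Pred>
int count_matching(const std::vector<int>& xs, Pred pred) {
    int n = 0;
    for (int x : xs)
        if (pred(x)) ++n;
    return n;
}

// Hand-written functor: the captured state is a field, the body is operator().
struct GreaterThan {
    int limit;
    bool operator()(int x) const { return x > limit; }
};

int count_with_functor(const std::vector<int>& xs, int limit) {
    return count_matching(xs, GreaterThan{limit});
}

int count_with_lambda(const std::vector<int>& xs, int limit) {
    // [limit] plays the role of the field; the braces play the role of operator()
    return count_matching(xs, [limit](int x) { return x > limit; });
}

void demo_lambda_vs_functor() {
    std::vector<int> xs{1, 5, 10, 20};
    assert(count_with_functor(xs, 5) == count_with_lambda(xs, 5));
}
```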

lambda is more industry-standard than delegate

Before java and c++ introduced lambda, I thought delegate was the foundation of lambdas.

Now I think lambda is an industry standard, implemented differently in c++ and java. See post on “lambda – replicable”. For python…

Bear in mind
A) the most fundamental, and pure definition of lambda — a function as rvalue, to be passed in as argument to other functions.

B1) the most common usage is sequence processing in c#, java and c++
* c# introduced lambda along with linq
* java introduced lambda along with streams

B2) 2nd common usage is event handler including GUI.

See post on “2 fundamental categories”
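Points A and B1 combined in a c++ sketch of my own (sum_of_even_squares is a hypothetical name): each lambda is a nameless function object created on the spot and passed straight into a standard algorithm that processes a sequence.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <numeric>
#include <vector>

// Filter then fold, with the logic supplied as lambdas passed by value.
int sum_of_even_squares(const std::vector<int>& xs) {
    std::vector<int> evens;
    std::copy_if(xs.begin(), xs.end(), std::back_inserter(evens),
                 [](int x) { return x % 2 == 0; });
    return std::accumulate(evens.begin(), evens.end(), 0,
                           [](int acc, int x) { return acc + x * x; });
}

void demo_sequence_processing() {
    assert(sum_of_even_squares({1, 2, 3, 4}) == 20);   // 2*2 + 4*4
}
```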

noSQL top 2 categories: HM^json doc store

Xml and json both support hierarchical data, but each is basically one data type: the document, and each document is the payload. This is the 2nd category of noSQL system. The #1 category is the key-value store i.e. hashmap (HM below), the most common category. The other categories (columnar, or graph) aren’t popular in the finance projects I know.

  • coherence/gemfire/gigaspace – HM
  • terracotta – HM
  • memcached – HM
  • oracle NoSQL – HM
  • Redis – HM
  • Table service (name?) in Windows Azure – HM
  • mongo – document store (json)
  • CouchDB – document store (json)
  • Google BigTable – columnar
  • HBase – columnar
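The difference between the top 2 categories can be sketched in a few lines. This is a toy in-memory model of my own, not how any of the products above is implemented. KeyValueStore, Document, DocumentStore and find_by_field are hypothetical names, and a flat map of strings stands in for a real (nested) json document.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Category #1 (HM): the value is an opaque blob; the store only knows the key.
using KeyValueStore = std::unordered_map<std::string, std::string>;

// Category #2: the value is a document whose fields the store can look into.
using Document = std::map<std::string, std::string>;   // flat stand-in for json
using DocumentStore = std::unordered_map<std::string, Document>;

// A query by field is possible only because the store understands the document.
std::vector<std::string> find_by_field(const DocumentStore& store,
                                       const std::string& field,
                                       const std::string& value) {
    std::vector<std::string> ids;
    for (const auto& entry : store) {
        auto it = entry.second.find(field);
        if (it != entry.second.end() && it->second == value)
            ids.push_back(entry.first);
    }
    return ids;
}

void demo_two_categories() {
    KeyValueStore kv;
    kv["trade:1"] = "ccy=USD;desk=fx";       // lookup by key only
    assert(kv.count("trade:1") == 1);

    DocumentStore docs;
    docs["trade:1"]["ccy"] = "USD";
    docs["trade:2"]["ccy"] = "JPY";
    assert(find_by_field(docs, "ccy", "JPY").size() == 1);
}
```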

big data feature: variability-in-biz-Value

RDBMS – every row is considered “high value”. In contrast, a lot of data items in a big data store are considered low-value.

The oracle nosql book refers to it as “variability of value”. The authors clearly think this is a major feature, a 4th “V” beside Volume, Velocity and Variety-of-data-format.

As a result, data loss is often tolerable in big data systems (but never acceptable in RDBMS). Exceptions, IMHO:
* columnar database such as kdb
* Quartz, SecDB

big data tech feature: scale out

Scalability is driven by one of the 4 V’s — Velocity, aka throughput.

Disambiguation: having many machines to store the data as readonly isn’t “scalability”. Any non-scalable solution could achieve that without effort.

Big data often requires higher throughput than RDBMS could support. The solution is horizontal rather than vertical scalability.

I guess gmail is one example. It requires massive horizontal scalability. I believe RDBMS also has similar features such as partitioning, but I'm not sure if that is economical. See posts on “inexpensive hardware”.

The Oracle nosql book suggests noSQL, compared to RDBMS, is more scalable, by 10 times or more.

RDBMS can also scale out — PWM used partitions.
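Horizontal scalability usually comes down to partitioning by key, so that writes spread across machines. Below is a toy sketch of my own, with in-process maps standing in for machines. ShardedStore and its methods are hypothetical names, and real systems typically use more careful schemes than hash-modulo so that adding a machine does not reshuffle every key.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical scale-out store: each shard stands in for one inexpensive machine.
class ShardedStore {
public:
    explicit ShardedStore(std::size_t shard_count) : shards_(shard_count) {
        assert(shard_count > 0);
    }

    // The key alone decides which shard owns the data.
    std::size_t shard_of(const std::string& key) const {
        return std::hash<std::string>{}(key) % shards_.size();
    }

    // A write touches exactly one shard, so write throughput can grow with shard count.
    void put(const std::string& key, const std::string& value) {
        shards_[shard_of(key)][key] = value;
    }

    // Returns nullptr when the key is absent.
    const std::string* get(const std::string& key) const {
        const auto& shard = shards_[shard_of(key)];
        auto it = shard.find(key);
        return it == shard.end() ? nullptr : &it->second;
    }

private:
    std::vector<std::unordered_map<std::string, std::string>> shards_;
};
```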

noSQL and ACID

See big data feature: variability-in-biz-Value, the 4th V of big data.

A noSQL product could support transactions as an RDBMS does, but that feature support is minimal in noSQL, according to the Oracle noSQL book.

Transactions reduce throughput, esp. write-throughput like create/update/delete. Read throughput is also affected because of locking, among other things.

In a big data site, not all data items are high value, so ACID transaction properties may be overkill and not worthwhile.

— The A/C/I/D

  • Atomicity – the most visible, best-known feature, but it often overshadows the other three
  • Consistency – mostly about invariants. If a transaction meets all the validations and constraints (and commits), and they are comprehensively defined, then the operation is very likely to be correct. However, if the invariant rules are simplistic and superficial, then consistency doesn’t mean much. The data may be incorrectly written.
  • Isolation – mostly about concurrent operations, which should not affect the final state of everything after the dust settles. Concurrent or serialized operations should leave the data store in the same state.
  • Durability – is about back-up and redo log. A power failure in the middle of a transaction should roll back that transaction and leave all earlier committed operations reflected in the restored data store.
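Atomicity and consistency can be illustrated with a toy in-memory example of my own (Accounts and transfer are hypothetical names): the two writes either both happen or neither does, and an invariant check decides which. Isolation and durability are not shown; a real data store would need locking or versioning and a log for those.

```cpp
#include <cassert>
#include <map>
#include <string>

using Accounts = std::map<std::string, long>;

// Move amount between two accounts as one all-or-nothing unit.
// Returns true on commit, false on rollback (accounts left untouched).
bool transfer(Accounts& accounts, const std::string& from,
              const std::string& to, long amount) {
    Accounts working = accounts;   // all changes go to a private copy first
    auto src = working.find(from);
    auto dst = working.find(to);
    if (src == working.end() || dst == working.end()) return false;
    src->second -= amount;
    dst->second += amount;
    // Consistency: the invariants here are a positive amount and no negative balance.
    if (amount <= 0 || src->second < 0) return false;   // rollback: copy is discarded
    accounts.swap(working);        // commit: both writes become visible together
    return true;
}

void demo_all_or_nothing() {
    Accounts book;
    book["x"] = 100;
    book["y"] = 0;
    assert(transfer(book, "x", "y", 60));    // commits
    assert(!transfer(book, "x", "y", 60));   // would overdraw x: rolled back
    assert(book["x"] == 40 && book["y"] == 60);
}
```

As the Consistency bullet says, the result is only as good as the invariants: this sketch would happily commit a transfer to the wrong account.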

 

noSQL feature #1 – unstructured

I feel this is the #1 feature. RDBMS data is very structured. Some call it rigid.
– Column types
– unique constraints
– non-null constraints
– foreign keys…
– …

In theory a noSQL data store could have the same structure but usually doesn't. I believe the noSQL software doesn’t have such a rich and complete feature set as an RDBMS.

I believe real noSQL sites usually deal with unstructured data. “Free form” is my word.

Rigidity means harder to change the “structure”. Longer time to market. Less nimble.

What about BLOB/CLOB? Supported in RDBMS but more like an afterthought. There are specialized data stores for them. Some noSQL software may qualify.

Personally, I feel RDBMS (like unix, http, TCP/IP…) has proven to be flexible, adaptable and resilient over the years. So I would often choose RDBMS when others prefer a noSQL solution.

WallSt friends’ comment@slow coder,deadlines

This is life-n-death:  if you are not adding enough value you are out…

With important exceptions (Stirt, Lab49..), Wall Street systems are stringent about timelines, less so about system failures, and even less about maintainability, total cost of ownership or testing. I feel very few (like 5%) Wall St systems are high precision, and I include the pricing, risk and trade execution systems. Numerical accuracy is important to the users though, because those numbers are about the only thing they can understand. Their job is about control over those numbers.

In Citi muni, Boris’s code was thrown out because it didn’t make it to production. Any production code is controlled not by the dev team but by many, many layers of control measures. So my production code in Citi will live.

If you are slow, Anthony Lin feels they may remove you and get a replacement to hit the deadline. If they feel it’s hard to find replacement and train him up, then they keep you – all about time lines.

Hou Li felt your effort does protect you – 8.30 – 6pm every day. If you are still slow, then the manager may agree the estimate is wrong. She felt deadlines and effort estimates are arbitrary. However, if you are obviously slower than peers, then the boss knows it.

equivalent FX(+option) trades, succinctly

The equivalence among FX trades can be confusing to some. I feel there are only 2 common scenarios:

1) Buying usdjpy is equivalent to selling jpyusd.
2) Buying usdjpy call is equivalent to Buying jpyusd put.

However, buying an fx option is never equivalent to selling an fx option. The seller wants (implied) vol to drop, whereas the buyer wants it to increase.
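Scenario 2 can be verified with expiry payoffs. A minimal sketch under my own assumptions: usdjpy is quoted as JPY per USD, the jpyusd spot and strike are the reciprocals 1/spot and 1/k, and the put notional is the call notional times k. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Payoff at expiry, in JPY, of a usdjpy call on notional_usd with strike k.
double usdjpy_call_payoff_jpy(double notional_usd, double k, double spot) {
    assert(k > 0 && spot > 0);
    return notional_usd * std::max(spot - k, 0.0);
}

// Payoff at expiry, in USD, of a jpyusd put on notional_jpy with strike 1/k
// (USD per JPY), when the jpyusd spot is 1/spot.
double jpyusd_put_payoff_usd(double notional_jpy, double k, double spot) {
    assert(k > 0 && spot > 0);
    return notional_jpy * std::max(1.0 / k - 1.0 / spot, 0.0);
}

// The same put, with its payoff converted to JPY at expiry spot.
double jpyusd_put_payoff_jpy(double notional_usd, double k, double spot) {
    double notional_jpy = notional_usd * k;   // same contract, sized in JPY
    return jpyusd_put_payoff_usd(notional_jpy, k, spot) * spot;
}
```

Algebraically, N·k·(1/k − 1/S)·S = N·(S − k) whenever S > k, and both payoffs are zero otherwise, so the two contracts pay the same amount in every state.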

left skew~left side outliers~mean PULLED left

Label – math intuitive

[[FRM]] book has the most intuitive explanation for me – negative (or left) skew means outliers in the left region.

Now, intuitively, moving outliers further out won’t affect median at all, but pulls mean (i.e. the balance point) to the left. Therefore, compared to a symmetrical distribution, mean is now on the LEFT of median. With bad outliers, mean is pulled far to the left.

Intuitively, remember mean point is the point to balance the probability “mass”.

In finance, if we look at the signed returns we tend to find many negative outliers (far more than positive outliers). Therefore the distribution of returns shows a left skew.
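The intuition is easy to check numerically. A minimal sketch of my own (mean_of, median_of and skewness_of are hypothetical helper names; skewness here is the population version, third central moment over sigma cubed, and it assumes the sample is not all one value):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Balance point of the sample.
double mean_of(const std::vector<double>& xs) {
    assert(!xs.empty());
    double sum = 0.0;
    for (double x : xs) sum += x;
    return sum / xs.size();
}

// Middle value; taken by value because we sort a private copy.
double median_of(std::vector<double> xs) {
    assert(!xs.empty());
    std::sort(xs.begin(), xs.end());
    std::size_t n = xs.size();
    return n % 2 ? xs[n / 2] : 0.5 * (xs[n / 2 - 1] + xs[n / 2]);
}

// Population skewness: negative means left skew.
double skewness_of(const std::vector<double>& xs) {
    double m = mean_of(xs), m2 = 0.0, m3 = 0.0;
    for (double x : xs) {
        double d = x - m;
        m2 += d * d;
        m3 += d * d * d;
    }
    m2 /= xs.size();
    m3 /= xs.size();
    return m3 / std::pow(m2, 1.5);
}
```

Take the symmetric sample {-2, -1, 0, 1, 2} and push the leftmost point out to -20. The median stays at 0, the mean drops to -3.6, and the skewness turns negative.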