I used to feel the US is a less class-conscious society than China or Singapore — anyone can make it in this “free”, meritocratic country. Then “insiders” told me about the old boys’ network and the alumni circles on Wall St.
I feel in any unequal, hierarchical society, there are invisible walls between social strata. I was lucky to be an immigrant in technology. If I step out of tech into management, I am likely to face class and racial bias/affinity, and I would no longer be “in-demand” as I am in tech. Look at the number of Chinese managers in GS — many make VP but few rise further.
Therefore the tech role is a sweet spot for an immigrant techie like me. Besides tech, a few professions are perhaps less hierarchical — trading, medical, academic, research(?), teaching…
Many quant developers (in our department) program in c# (for the excel plugin) or build infrastructure code modules around the quant lib, but they don’t touch the c++ quant business-logic classes. C++ quant lib (model) programming is typically reserved for the mathematicians.
Many of these non-C++ quant developers have good product knowledge and can sometimes move into the business side of trading.
I was told these quant developers don't need advanced math knowledge.
1st group – mostly C++ questions. Most candidates are filtered out here.
2nd group – probability (different from statistics)
Some finance intuitions (eg — the meaning of each item in the BS formula)
Some brain teasers
— some typical C++ questions (everything can be found in the Scott Meyers books)
exceptions during ctor/dtor
Given a codebase, how do you detect memory leak
multiple inheritance (fairly common in practice)
[[heard on the street]] and [[A Practical Guide To Quantitative Finance Interviews]].
Another book by Shreve.
* exposure to pricing decisions — the most important decisions
* closer to traders and their decision support
* closer to profit center
I find this book rather practical. Many small programs are fully tested and demonstrated.
This 2015 book covers c++14.
– #1) RVR (rvalue reference) and move semantics:
This book offers just enough detail (over 5–10 pages) to show how the move ctor reduces waste. The example class has a large non-ref field.
P49 shows move(), but P48 shows that even without a move() call the compiler is able to *select* the move ctor rather than the copy ctor when passing a temporary instance into a non-ref parameter. The copy ctor is present but skipped!
P49 shows an effective mv-ctor can be “=default; “
– custom new/delete to trace memory operations
Sample code shows how delete() can reveal where in the source code the new() happened. This demonstrates a common technique — allocating an additional custom memory header with each allocation.
This is more practical than the [[effC++]] recipe.
There’s also a version for array-new. The class-specific new doesn’t need the memory header.
A simple example code of weak_ptr.
a custom small block allocator to reduce memory fragmentation
Using promise/future to transfer data between a worker thread and a boss thread
I feel all the tutorials miss some important details and sell propaganda. Maybe [[c++ recipes]] is better?
[s = I believe std::string is a good illustration of this keyword]
- [s] allocation – mv-semantic efficiently avoids memory allocation on heap or on stack
- [s] resource — is usually allocated on heap and accessed via a pointer field
- [s] pointer field – every tutorial shows a class with a pointer field. Note a reference field is much less common.
- [s] deep-copy – is traditional. Mv-semantics uses a special form of shallow-copy, which has to be carefully managed.
- [s] temp – the RHS of a move must strictly be a temp object. I believe by using the move() function and the r-val reference (RVR) we promise the compiler not to access the temp object afterwards. If we do access it, bad things could happen — similar to UndefBehv? See [[c++standard library]]
- promise – see above
- containers – All standard STL container classes (including std::string) provide mv-semantics. Here, the entire container instance is the payload! Inserting a float into a container won’t need mv-semantics.
- [s] expensive — allocation and copying assumed expensive. If not expensive, then the move is not worthwhile.
- [s] robbed — the source object of the move is crippled, robbed, abandoned and should not be used afterwards. Its “resource” is already stolen, so the pointer field to that resource should be set to NULL.
http://www.boost.org/doc/libs/1_59_0/doc/html/move/implementing_movable_classes.html says “Many aspects of move semantics can be emulated for compilers not supporting rvalue references and Boost.Move offers tools for that purpose.” I think this sheds light…
I think the use cases for mv-constructs are tricky. In many simple contexts mv-constructs actually don’t work.
Justification for introducing mv-semantics is clearest in one scenario — a short-lived but complex stack object is passed by value into a function. Without mv-semantics, the argument object would be an unnecessary temp copy.
Note the data type should be a complex type like containers (including string), not an int. In fact, as explained in the post on “keywords”, there’s usually a pointer field and allocation.
Other use cases are slightly more complex, and the justification is weaker.
Q: [[c++standard library]] P21 says ANY nontrivial class should provide a mv-ctor AND a mv-assignment. Why? (We assume there’s a pointer field and allocation involved.)
%%A: To avoid making temp copies when inserting into a container. I think vector relocation also benefits from the mv-ctor.
[[c++forTheImpatient]] P640 shows that sorting a vector of strings can benefit from mv-semantics. Swapping 2 elements in the vector requires a pointer swap rather than copying the strings.
My rule of thumb is to avoid an RVR return type, even though Josuttis did NOT forbid it — he never says anything like “the return type should never be an RVR”.
Instead of an RVR return type, I feel in most practical cases we can achieve the same result using a non-ref return type. Such a function call usually evaluates to a nameless temp object, i.e. a naturally-occurring rvalue object.
[[Josuttis]] (i.e. the c++Standard library) P22 explains the rules about returning rval ref.
In particular, it’s a bad idea to return a newly-created stack object by rvr. This object is a nonstatic local and will be wiped out after the function returns.
(It’s equally bad to return this object by l-value reference.)
In cpp, java and c#, the worst part of lambda is its integration with (parametrized) templates.
In each case, we need to understand the base technology and how it integrates with templates; otherwise you will be lost. The base technologies are (see post on “lambda – replicable”)
– anon nested class
Syntax is bad but not the worst. Don’t get bogged down there.
Before java and c++ introduced lambdas, I thought the delegate was the foundation of lambdas.
Now I think lambda is an industry standard, implemented differently in c++ and java. See post on “lambda – replicable”. For python…
Bear in mind
A) the most fundamental and pure definition of lambda — a function as an rvalue, to be passed as an argument to other functions.
B1) the most common usage is sequence processing in c#, java and c++
* c# introduced lambda along with linq
* java introduced lambda along with streams
B2) 2nd common usage is event handler including GUI.
See post on “2 fundamental categories”
Xml and json both support hierarchical data, but each is basically one data type — the document is the payload. This is the 2nd category of noSQL system. The #1 category is the key-value store, i.e. hashmap — the most common category. The other categories (columnar, or graph) aren’t popular in finance projects I know of:
- coherence/gemfire/gigaspace – HM
- terracotta – HM
- memcached – HM
- oracle NoSQL – HM
- Redis – HM
- Table service (name?) in Windows Azure – HM
- mongo – document store (json)
- CouchDB – document store (json)
- Google BigTable – columnar
- HBase – columnar
RDBMS – every row is considered “high value”. In contrast, many data items in a big data store are considered low-value.
The oracle nosql book refers to this as “variability of value”. The authors clearly think this is a major feature — a 4th “V” beside Volume, Velocity and Variety-of-data-format.
As a result, data loss is often tolerable in big data (but never acceptable in RDBMS). Exceptions, IMHO:
* columnar DB
* Quartz, SecDB
See post on variability
Economics — data volume often necessitates inexpensive storage. Commodity hardware is a key feature of big data.
“Inexpensive” helps scale-out (aka horizontal scaling). Just add more nodes. In contrast, RDBMS requires scale-up to bigger machines. See other posts on scale-out.
Scalability is driven by one of the 4 V’s — Velocity, aka throughput.
Disambiguation: having many machines to store the data as readonly isn’t “scalability” — any non-scalable solution could achieve that without effort.
Big data often requires higher throughput than RDBMS could support. The solution is horizontal rather than vertical scalability.
I guess gmail is one example — it requires massive horizontal scalability. I believe RDBMS also has similar features such as partitioning, but I’m not sure if it is economical. See posts on “inexpensive hardware”.
The Oracle nosql book suggests that noSQL, compared to RDBMS, is more scalable — by 10 times or more.
RDBMS can also scale out — PWM used partitions.
See post on “variability”, the 4th V of big data.
A noSQL product could support transactions as an RDBMS does, but transaction support in noSQL is minimal, according to the Oracle noSQL book.
Transactions slow down throughput, esp. write-throughput.
In a big data site, not all data items are high value, so ACID properties may not be worthwhile.
I feel this is the #1 feature. RDBMS data is very structured. Some call it rigid.
– Column types
– unique constraints
– non-null constraints
– foreign keys…
In theory a noSQL data store could have the same structure, but usually it doesn’t. I believe noSQL software doesn’t have as rich and complete a feature set as an RDBMS.
I believe real noSQL sites usually deal with unstructured data. “Free form” is my word.
Rigidity means the “structure” is harder to change — longer time to market, less nimble.
What about BLOB/CLOB? Supported in RDBMS, but more like an afterthought. There are specialized data stores for them; some noSQL software may qualify.
Personally, I feel RDBMS (like unix, http, TCP/IP…) prove to be flexible, adaptable and resilient over the years. So I would often choose RDBMS when others prefer a noSQL solution.
This is life-and-death: if you are not adding enough value, you are out…
With important exceptions (Stirt, Lab49…), Wall Street systems are stringent about timelines, less about system failures, and even less about maintainability, total cost of ownership, or testing. I feel very few (like 5%) Wall St systems are high-precision, and I include the pricing, risk, and trade-execution systems. Numerical accuracy is important to the users though, because those numbers are about the only thing they can understand — their job is about control of those numbers.
In Citi muni, Boris’s code was thrown out because it didn’t make it to production. Production code is controlled not by the dev team but by many, many layers of control measures — so my production code in Citi will live.
If you are slow, Anthony Lin feels they may remove you and get a replacement to hit the deadline. If they feel it’s hard to find and train up a replacement, then they keep you — it’s all about timelines.
Hou Li felt your effort does protect you — 8.30am to 6pm every day. If you are still slow, then the manager may agree the estimate was wrong. She felt deadlines and effort estimates are arbitrary. However, if you are obviously slower than peers, the boss knows it.
The equivalence among FX trades can be confusing to some. I feel there are only 2 common scenarios:
1) Buying usdjpy is equivalent to selling jpyusd.
2) Buying usdjpy call is equivalent to Buying jpyusd put.
However, buying an fx option is never equivalent to selling an fx option. The seller wants (implied) vol to drop, whereas the buyer wants it to rise.
Label – math intuitive
The [[FRM]] book has the most intuitive explanation for me – negative (or left) skew means outliers in the left region.
Now, intuitively, moving outliers further out won’t affect the median at all, but it pulls the mean (i.e. the balance point) to the left. Therefore, compared to a symmetrical distribution, the mean is now on the LEFT of the median. With bad outliers, the mean is pulled far to the left.
Intuitively, remember mean point is the point to balance the probability “mass”.
In finance, if we look at signed returns we tend to find many negative outliers (far more than positive outliers). Therefore the distribution of returns shows a left skew.