heap usage: C ilt C++/Java #Part 2

I now recall that when I programmed in C, my code never used malloc() directly.

The library functions probably used malloc() under the hood, but malloc was an advanced feature. Alexandrescu confirmed my experience and said that C programmers usually make rather few malloc() calls, each time requesting a large chunk. Instead of malloc, I used mostly local variables and static variables. In contrast, C++ uses the heap much more:

  • STL containers are 99% heap-based
  • virtual functions require a pointer, and the target objects are usually on the heap, as Alexandrescu said on P78
  • the pimpl idiom i.e. private implementation requires a heap object, as Alexandrescu said on P78
  • the c++ reference is used mostly for pass-by-reference, and pass-by-reference usually works with heap objects

In contrast to C's few large chunks, C++ tends to allocate many small chunks of heap memory.

Across languages, heap usage is slow because

  • In general, OO programming uses more pointers, more indirection and more heap objects
  • heap allocation is much slower than stack allocation, as Stroustrup explained to me
  • using a heap object always requires a runtime indirection. The heap object has no name, only an address!
  • In Garbage-Collected languages, there’s one more indirection.

checked STL^checked java Collections

jdk checkedList, checkedMap etc are designed to check type errors — checking that any newly added item has the correct type for the collection. See P 246 [[java generics]]

STL checked containers check very different coding errors. See http://www.informit.com/articles/article.aspx?p=373341, which is extracted from my book [[c++codingStd]]

closestMatch in sorted-collection: j^python^c++

–java is cleanest. P236 (P183 for Set) [[java generics]] lists four methods belonging to the NavigableMap interface (a c++ equivalent is sketched after the list):

  • ceilingEntry(key) — closest entry higher or equal
  • higherEntry(key) — closest entry strictly higher than key
  • lowerEntry(key) — closest entry strictly lower than key
  • floorEntry(key) — closest entry lower or equal
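
For the c++ column — a minimal sketch (mine, not from any book) of the same four lookups using std::map, whose lower_bound/upper_bound do the heavy lifting:

#include <iostream>
#include <iterator>
#include <map>

int main() {
 std::map<int, const char*> m{{10, "a"}, {20, "b"}, {30, "c"}};
 int key = 20;
 auto ceiling = m.lower_bound(key); // first entry >= key, like ceilingEntry()
 auto higher = m.upper_bound(key);  // first entry > key, like higherEntry()
 auto floor = m.upper_bound(key);   // floorEntry(): last entry <= key
 floor = (floor == m.begin()) ? m.end() : std::prev(floor);
 auto lower = m.lower_bound(key);   // lowerEntry(): last entry < key
 lower = (lower == m.begin()) ? m.end() : std::prev(lower);
 if (ceiling != m.end()) std::cout << ceiling->first << '\n'; // 20
 if (higher != m.end()) std::cout << higher->first << '\n';   // 30
 if (floor != m.end()) std::cout << floor->first << '\n';     // 20
 if (lower != m.end()) std::cout << lower->first << '\n';     // 10
}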

mktData direct^Reuter feeds: realistic ibank set-up

NYSE – an ibank would use a direct feed to minimize latency.

For some newer exchanges, an ibank would get the Reuters feed first, then over a few years replace it with a direct feed.

A company often gets duplicate data feeds for the most important venues like NYSE and Nasdaq. The rationale is discussed in [[all about hft]]

For some illiquid FX pairs, they rely on calculated rates from Reuters. BBG also calculates a numerically very similar rate, but BBG is more expensive. BBG also prohibits real-time data redistribution within the ibank.

represent graph: matrix^edgeList #std::forward_list

Josh’s book [[data structures and other objects using c++]] has 5 pages devoted to three best-practice data structures representing a generic directed graph. (I think my [[discrete math]] also covers the same topic.)

Note all these data structures can serialize graphs.

  1. boolean adjacency matrix — adjMat, unable to represent parallel edges between nodes A and B (rare) or self-to-self edges
  2. edge set — unable to represent parallel edges, but can support complex graph algos that risk traversing the same edge repeatedly. The set can be a field of a graph node, or a hash table {node -> edgeSet}
  3. edge list — can be implemented using std::forward_list, marginally faster than a hashset… not discussed further in this blogpost

pros and cons:

  • iterating all edges connected to a node — edge set wins. adjMat very poor.
  • sparse graph — adj mat wastes memory, not scalable.
  • test linkage between 2 nodes — adj mat and edgeSet are O(1)
  • ease of implementation — adj mat wins

Fundamental classification:

  • In a so-called dense graph, number of edges ~ O(V²) i.e. between any pair of nodes there’s likely an edge
  • In a so-called sparse graph, number of edges ~ O(V).

[[algo in a nutshell]] written by some professors, compares matrix and edge list. For a dense graph, P140 describes an intriguing edge list practical implementation to save space and speed up isConnected(node1, node2). Key insight —

   “Number all individual nodes using auto-incrementing ID numbers.”

After this, we will see that a node’s edge list may look like 2,3,4,5, 8,9 … which can be summarized as 2-5, 8-9. I felt this was a trivial trick, but it is actually a standard implementation. There’s a tradeoff though — inserting an edge is now slower, but I feel the big-O could remain unchanged.
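
Here’s my own toy sketch of that range-compression trick (not the book’s code) — compressing a sorted neighbor list into ranges:

#include <iostream>
#include <utility>
#include <vector>

std::vector<std::pair<int,int>> compress(const std::vector<int>& sortedIds) {
 std::vector<std::pair<int,int>> ranges;
 for (int id : sortedIds) {
  if (!ranges.empty() && id == ranges.back().second + 1)
   ranges.back().second = id;  // extend the current range
  else
   ranges.push_back({id, id}); // start a new range
 }
 return ranges;
}

int main() {
 for (auto [lo, hi] : compress({2, 3, 4, 5, 8, 9}))
  std::cout << lo << '-' << hi << ' '; // prints 2-5 8-9
}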

 

STL containers +! pointer

  1. std::array
  2. std::string SSO i.e. small-string optimization

These containers

  1. do not require heap storage
    • trivial special case — if this container is a nonref field of a Shape object, then q(new Shape) would still allocate the container on the heap.
    • This is similar to “Are java primitives on heap/stack?”
  2. Therefore they contain no pointer.
  3. Therefore the move ctor runs in linear [1] time, compared to the constant-time move-ctor of other STL containers
    • std::array probably offers a move-ctor and move-assignment

[1] P205 [[effModernC++]] also addresses this.

Q: What happens when you move a std::array container with 77 Acct elements?
A: 77 individual Acct objects are moved to the new container, in linear time, invoking the move-ctor of Acct class
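
A minimal demo (mine) confirming the element-by-element move, assuming a hypothetical Acct class with a loud move-ctor:

#include <array>
#include <iostream>
#include <utility>

struct Acct {
 Acct() = default;
 Acct(Acct&&) noexcept { std::cout << "Acct move-ctor\n"; }
};

int main() {
 std::array<Acct, 3> a{};
 std::array<Acct, 3> b = std::move(a); // prints "Acct move-ctor" 3 times — linear time
 (void)b;
}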

time-honored textual data file formats

[[art of Unix programming]] has insights on it… Note Windows tends to use binary file formats, whereas the Unix tradition favors textual.

Standard csv format is not well designed. Its use of double-quotes is criticized.

The /etc/passwd format supports metachar embedding better than standard csv.

The “record-jar” format combines cookie-jar with RFC-822 format and is more human-friendly than xml, ini, and csv. It has field names. Its values can be unstructured text. It also supports “stanza” records.

xml can be checked for well-formedness even without a DTD or schema.

anon classes^lambda: java perf

Based on [[JavaPerf]] P381

  • An anon class requires an actual *.class file created by the javac compiler and loaded from serialized form (usually disk).
  • No such class file for a lambda.

More than half of all interview questions are about fancy theoretical knowledge, so this difference is valuable knowledge for interviews.

This difference has very limited performance impact as of java 8. This performance perspective is unimportant to interviews, IMHO.

 

windows registry: criticized

[[art of unix programming]] points out that the Windows registry is sometimes corrupted. (Remember the standard advice to always back up the registry before editing it?) When corrupted, the registry “frequently” breaks the entire system and requires a complete reinstall, as the book claims.

  • weakness — the registry is a centralized hierarchical config data store shared in RW mode by all (dozens of) applications.
  • weakness — single point of failure

In contrast, Unix config data is decentralized, but I won’t elaborate.

Centralized hierarchical configuration sounds plausible but isn’t impressive in practice.

subclass ctor/dtor using virtual func

See https://stackoverflow.com/questions/13440375/invoking-virtual-method-in-constructor-difference-between-java-and-c

Suppose both superclass and subclass define a virtual cleanup() method.

— c++

… you lose the virtual i.e. “dynamic dispatch” feature. The subclass instance is not present, so only the base class implementation of cleanup() could run.

–java: let’s focus on ctor

… the subclass implementation of cleanup() would run, even though the subclass instance is not initialized — dangerous! See P70 [[elements of java style]]
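
A minimal c++ sketch (mine) of the first case — during Base’s ctor the dynamic type is still Base, so the virtual call binds to the base implementation:

#include <iostream>

struct Base {
 Base() { cleanup(); } // binds to Base::cleanup — no dispatch to the subclass
 virtual ~Base() = default;
 virtual void cleanup() { std::cout << "Base::cleanup\n"; }
};

struct Derived : Base {
 void cleanup() override { std::cout << "Derived::cleanup\n"; }
};

int main() { Derived d; } // prints Base::cleanup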

kernThr ] linux^Unix

Here are a few things I can recall, from [[UnderstandingLinuxKernel]].

P 104 lists some minor Linux kernel threads, but here are the important ones:

  • process 0 (swapper) is a kernel thread. This process gets scheduled when there’s no other process to run
  • process 1 (init) is a kernel thread. This process launches and monitors all other processes except process 0. It’s also the adopted parent of all orphaned zombies
  • both of them run forever. Most other linux kernel threads run for a short while and exit.

Linux kernThr is a concept completely unrelated to the traditional Unix kernThr — more differences than similarities (zero?)

  • the Unix kernel threads aka “native threads” concept first became known to me via the jvm. The kernel scheduler can see native threads but not java green threads.
  • native threads often run in user mode, but linux kernel threads run only in kernel mode and are very limited in usage. I think they are mostly set up to access hardware

Linux kernel threads are less fundamental than kernel routines including sysCall handlers + interrupt handlers.

  1. every Linux user process enters kernel mode via kernel routines
  2. not every user process interacts with kernel threads

minimum hardware requirement to run linux #historical observation

  1. Bell Labs Unix, in its early days, was able to run on hardware considered “underpowered” even by the standards of that day — P33 [[art of unix programming]]. I believe competing kernels of that era were unable to run on such low-end machines.
  2. Linux (P77) has a similar characteristic. In the 1990’s the big commercial Unixes targeted enterprise-class hardware but Linux emphasized doing more with less. Today, Linux powers 99% of the world’s most powerful supercomputers but Linux also runs on low-end or obsolete hardware.

In both cases, I feel that an OS designed with a very low minimum hardware requirement turned out to be more efficient, more adaptable, more versatile and more powerful than its conventional competitors.

At what juncture would kernel scheduler run@@

The kernel scheduler has an algorithm and is therefore implemented as a sequence of instructions. You can think of it as a black-box function/routine.

I think it is Not really a long-running background process. In Linux, I believe it is an on-demand routine, but not run on behalf of any process.

Background — Many familiar on-demand kernel routines do run on behalf of an “owner process” —

  • accessing a file or socket
  • accessing some device such as NIC
  • accessing memory

However, other on-demand kernel routines (often interrupt handlers) do not have an “owner process”. Here are some routines —

  • reacting to timer interrupts
  • reacting to low-level emergency hardware interrupts like …. ?

So the scheduler is a classic example of the latter. I think the scheduler can get triggered by timer interrupts. See P 350 [[linux kernel]]

interrupt+signal ] kernel

There are precise answers in the kernel code, but here are my high-level, imprecise pointers, based on [[linux kernel]]:

  • Based on my “adapted Intel terminology”, interrupts from the cpu are known as cpu Exceptions, whereas interrupts from other devices are known as hardware interrupts
    • interrupting hardware includes hardware clocks, keyboard, NIC and other I/O devices.
    • cpu exceptions are generated by problematic/erroneous instructions
  • LG2 — the “software interrupts” concept is not so well-defined, and is used primarily for 1) notifying the debugger 2) implementing system calls

Both hardware interrupts and cpu exceptions can generate SIG* signals. Signals are much higher-level constructs than the hardware-level interrupts. Signal as a concept is 50% kernel 50% UserMode:

  • A signal is always addressed to a user process.
  • signal delivery is kernel’s job; signal handling is user process’s job

I feel interrupts^signals are somewhat like cells^molecules — at different levels!

how is mkt data used ] buy-side FI analytics@@

This is a BIG bond asset manager… They use a 2-factor HJM model, among others.

They use EOD market data for risk measure + risk sensitivity calculations. No real time.

Models were written by 40+ quants untrained in c++. The 16-strong IT team integrates the models.

I asked “Do you use liquid fixed income market data mostly to calibrate models and use the model to price illiquid instruments?”

A: both

  • To calibrate the model — every day, as explained in [[complete guide]] P436
  • To derive valuations directly on existing positions, if the instruments are comparable (between the ref-data instrument and the position instrument)

compiler loop-unrolling could hurt i-cache

[[art of unix programming]] P290/291 advocates (presumably based on experience) bending over backward to ensure the central data structure + time-critical loop NEVER fall out of i-cache/d-cache.

I think this is a useful sound bite to impress interviewers.

Developers need to know the size of L1 cache, L2 cache, L3 cache in our hardware. Then target the central data structure to one of them. Say L2 cache. We want to size our data structure to fit in there and never break the boundary.

“Small is beautiful” applies to both code and data.

Example 1: cpu vs i-cache — IMHO more convincing than Example 2
Example 2: cpu vs d-cache

In both examples, if cpu becomes fast enough then we ought to make it do more work in order to economize the caches.

— Example 1
Loop-unrolling enlarges text-section of the binary, so the additional “text” must compete for scarce instruction-cache.

[[art of unix programming]] claims that at least in some (not theoretical) cases, this compiler optimization may backfire i.e. hurt rather than improve performance.

The critical situation: suppose the loop runs in one of the extremely hot functions that need to remain in the instruction-cache; then we want to minimize code footprint.

Loading this hot function over and over from main memory can be worse than executing the original loop (without unrolling). This happens when cpu execution speed improves over the years but main memory access speed remains a bottleneck.
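
A toy illustration (mine, not from the book) of what unrolling does to the code footprint — the unrolled version has fewer branches, but more instructions competing for i-cache:

// rolled: small text footprint
void sum_rolled(const int* a, int n, long& s) {
 for (int i = 0; i < n; ++i) s += a[i];
}

// manually 4x unrolled: fewer loop tests, but a larger loop body
void sum_unrolled(const int* a, int n, long& s) {
 int i = 0;
 for (; i + 4 <= n; i += 4)
  s += a[i] + a[i+1] + a[i+2] + a[i+3];
 for (; i < n; ++i) s += a[i]; // leftover tail
}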

— Example 2
A second example, “claimed” in the same paragraph, revolves around a pre-computed look-up table for frequent access. This kind of optimization strategy can shave cpu cycles, but can increase the fierce competition for scarce L1 cache.

In other words, pre-computing a data table was designed to save cpu time, but could backfire if the increased data footprint leads to data-cache misses.

We are again talking about extremely hot data that needs to remain in L1 cache. If this pre-computed data is accessed frequently, it can hog the L1 cache.

In such a case, removing the pre-computation, and re-computing each time, could be faster.

I think this problem can become real –
• If cpu is becoming much faster in recent years, or the computation gets simplified, then the cpu saving would diminish.
• At the same time, if the pre-computed data size grows larger, then the “hogging” would worsen.

case-insensitive string search #KMP #Abhinav

[[c++cookbook]] P174 suggests preparing the needle by creating an all-upper-case copy, but leaving the (bigger) haystack unchanged.

I discussed with my friend Abhinav. With std::search, we should probably upper-case the haystack as well. The std::search implementation can run M*N char-comparisons.

Even with the efficient KMP algorithm, typical complexity is M+N char-comparisons. So my solution is no worse than the authors’ solution.
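
An alternative sketch (mine) that avoids up-casing either side: std::search accepts a custom binary predicate, so the comparison itself can be made case-insensitive:

#include <algorithm>
#include <cctype>
#include <iostream>
#include <string>

int main() {
 std::string haystack = "Send the Payload NOW";
 std::string needle = "payload";
 auto it = std::search(haystack.begin(), haystack.end(),
                       needle.begin(), needle.end(),
                       [](char a, char b) {
                        return std::toupper(static_cast<unsigned char>(a))
                            == std::toupper(static_cast<unsigned char>(b));
                       });
 if (it != haystack.end())
  std::cout << "found at offset " << it - haystack.begin() << '\n'; // 9
}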

Linux(!!Unix)kernel thread can! run user program

–“Linux kernel thread cannot run user programs”, as explained in [[UnderstandingLinuxKernel]] P3.

Removing ‘kernel‘… Linux userland threads (based on LWP, explained below) do run user programs.

Removing ‘thread‘… Linux kernel processes or interrupt routines could possibly run under a user process pid.

Removing ‘Linux‘ … Other Unix variants do use kernel threads to run both 1) user programs and 2) kernel routines. This is my understanding from reading about JVM…

  • unix  — multi-threaded user apps =use=> LWP =use=> kernel threads
  • linux — multi-threaded user apps =use=> LWP. LWP is the basic unit of concurrency in user apps. Linux LWP is unrelated to Unix LWP.

Q: which thread/PID drains NicBuffer→socketBuffer

Too many kernel concepts here. I will use a phrasebook format. I have also separated some independent tips into hardware interrupt handler #phrasebook

  1. Scenario 1: A single CPU. I start my parser, which creates the multicast receiver socket, but no data is coming. My process (PID 111) gets preempted on a timer interrupt. The CPU is running unrelated PID 222 when my data washes up on the NIC.
  2. Scenario 2: pid111 is running handleInput() while additional data comes in on the NIC.

Some key points about all scenarios:

  • context switching — There’s context switch to interrupt handler (i-handler). In all scenarios, the running process gets suspended to make way for the interrupt handler function. I-handler’s instruction address gets loaded into the cpu registers and this function starts “driving” the cpu. Traditionally, the handler function would use the suspended process’s existing stack.
    • After the i-handler completes, the suspended “current” process resumes by default. However, the handler may cause another pid333 to be scheduled right away [1 Chapter 4.1].
  • no pid — interrupt handler execution has no pid, though some authors say it runs on behalf of the suspended pid. I feel the suspended pid may be unrelated to the socket (as in Scenario 1), rather than the socket’s owner process pid111 (as in Scenario 2).
  • kernel scheduler — In Scenario 1, pid111 would not get to process the data until it gets in the “driver’s seat” again. However, the interrupt handler could trigger a rescheduling and push pid111 “to the top of the queue” so to speak. [1 Chapter 4.1]
  • top-half — drains the tiny NIC ring-buffer into main memory (presumably socket buffer) as fast as possible [2] as NIC buffer can only hold a few packets — [[linux kernel]] P 629.
  • bottom-half — (i.e. deferrable functions) includes lengthy tasks like copying packets. Deferrable functions run in interrupt context [1 Chapter 4.8], under nobody’s pid
  • sleeping — the socket owner pid 111 would be technically “sleeping” in the socket’s wait queue initially. After the data is copied into the (kernel-space) socket receive buffer, I think the kernel scheduler would locate pid111 in the socket’s wait queue and make pid111 the cpu-driver. This pid111 would call read() on the socket.
    • wait queue — How the scheduler does it is non-trivial. See [1 Chapter 3.2.4.1]
  • burst — What if there’s a burst of multicast packets? The i-handler would hog or steal the driver’s seat and /drain/ the NIC ring-buffer as fast as possible, and populate the socket receive buffer. When the i-handler takes a break, our handleInput() would chip away at the socket buffer.
    • priority — is given to the NIC’s interrupt handler as NIC buffer is much smaller than socket buffer.
    • UDP could overrun the socket receive buffer; TCP uses flow control to prevent it.

Q: What if the process scheduler is triggered to run (on timer interrupt) while i-handler is busy draining the NIC?
A: Well, all interrupt handlers can be interrupted, but I would doubt the process scheduler would suspend the NIC interrupt handler.

One friend said that while the i-handler runs on our single CPU, the executing pid is 1, the kernel process. I doubt it.

[1] [[UnderstandingLinuxKernel, 3rd Edition]]

[2] https://notes.shichao.io/lkd/ch7/#top-halves-versus-bottom-halves

down-cast a reference #idiom

ARM P69 says down-casting a reference is fairly common. I have never seen it.

Q: Why not use ptr?
%%A: I guess pointer can be null so the receiver function must face the risk of a null ptr.
%%A: 99% of references I have seen in my projects are function parameters, so references are extremely popular and proven in this use case. If you receive a ref-to-base, you can down-cast it.

See post on new-and-dynamic_cast-exceptions
see also boost polymorphic_cast
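
A small sketch (mine) of the idiom — unlike the pointer form, a failed reference down-cast can’t return null, so dynamic_cast throws std::bad_cast:

#include <iostream>
#include <typeinfo>

struct Base { virtual ~Base() = default; };
struct Derived : Base { void special() { std::cout << "special\n"; } };

void receiver(Base& b) { // receives ref-to-base
 try {
  Derived& d = dynamic_cast<Derived&>(b); // down-cast a reference
  d.special();
 } catch (const std::bad_cast&) {
  std::cout << "not a Derived\n";
 }
}

int main() {
 Derived d;
 Base b;
 receiver(d); // special
 receiver(b); // not a Derived
}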

sentinel node trick: array/slist

  • eg: STL uses sentinel nodes. I believe container.end() returns that.
  • eg: c-string uses \0 as sentinel value.
  • eg: the binary search needle can be too high, and we have to test whether the return value is a real element or the end-of-container, so it’s extremely useful to add another sentinel to guarantee a successful search

When solving timed coding questions on “clean” arrays, it can be useful to append a sentinel node. It can simplify some algorithms which take action after each segment.

P 90 [[Pearls]] introduced a loop coding idiom, applicable whenever we loop over a container and have an early-exit “break”. Such a loop has exactly two [1] conditional tests per iteration, and therefore can run faster if we combine the two into one conditional test. This is a small halo idiom for latency. But beyond latency, there are more interesting benefits such as cognitive complexity reduction.

For example, consider the business logic after reaching “normal” end of the container without hitting the early exit. Rarely can we “forget” this business logic and simply exit the function and rely on the implicit return. Instead, this normal-completion scenario requires careful handling and a cognitive burden. To remind myself, I often place a comment after end of the loop. (Python provides an Else clause for a for-loop.)

In some cases, the business logic after the end of the loop is 2 pages away from the early-exit business logic, but they really should be in one module. I hate this situation so much that I always set a flag and use a “break”, so that the two routines are both written after the end of the loop.

In all these scenarios, it’s often simpler to append an artificial sentinel element at the end of the container! The sentinel is guaranteed to hit the early-exit, so the loop would never exhaust all elements and complete normally. Therefore we can combine the two tests into one 🙂

I usually replace “break” with “exit”, further reducing the control-flow complexity. Such micro simplifications can pay huge dividends in high-pressure timed tests.

Right before the exit/break, we may still need to handle the normal-completion scenario by checking whether we are at the sentinel. Interesting scenario. Now we can combine the logic of normal-completion vs early-exit. In many cases, they are (nearly) identical, so we can express the logic in very structured code.

Whether they are identical or not, handling the two scenarios (early vs normal completion) in a centralized module is usually simpler and more structured, reducing the cognitive complexity.

[1] what if three tests (two breaks)? The mental burden is even worse. The sentinel could reduce it
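
My own minimal illustration of the [[Pearls]] idiom — appending the needle itself as the sentinel, so each iteration has ONE test instead of two:

#include <iostream>
#include <vector>

int findWithSentinel(std::vector<int>& v, int needle) {
 v.push_back(needle);        // sentinel guarantees the early exit fires
 size_t i = 0;
 while (v[i] != needle) ++i; // single test per iteration — no bounds check
 v.pop_back();               // restore the container
 return i < v.size() ? static_cast<int>(i) : -1; // hitting the sentinel means "not found"
}

int main() {
 std::vector<int> v{5, 3, 9};
 std::cout << findWithSentinel(v, 9) << '\n'; // 2
 std::cout << findWithSentinel(v, 7) << '\n'; // -1
}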

malloc=long considered costly

Heap allocation is extremely slow compared to stack allocation and other everyday operations.

 

lazy singleton based@JVM dynamic classloading #Lea

In [[DougLea]] P86, this JVM expert pointed out that a simple “eager” singleton is eager in other languages, but lazy in Java due to runtime on-demand class loading.

Specifically we mean a public[1] static final field. This initialization is thread-safe by default. Assuming the field is immediately accessed after class loading, this simple design is comparable to the familiar synchronized lazy singleton. What are the pros and cons?

Synchronized singleton requires more legwork (by the developer) but
* It lets you pass computed ctor parameters at runtime. Note the singleton ctor is private, but you can call getInstance(userInput).
* As hinted earlier, if you load the class but do not immediately use the instance, then the simple design incurs the expensive initialization cost too early.

[[DougLea]] was written before java5. With java5, [[EffJava]] advocates enum.

[1] DougLea actually prefers private field with public getter, for encapsulation.

ADL #namespace

  • “Put the function in the same namespace as the classes it operates on” is a one-liner summary.
  • If you want to write a function that needs only a class’s public interface — then that function doesn’t have to be a (static/non-static) member. The function can become a free function placed in the same namespace as the class. This increases encapsulation and data hiding, and reduces coupling, as only the public interface is needed. P79 [[c++codingStd]]
  • It’s also an idiom/pattern to impress interviewers. The published experts all seem to view ADL as a library feature mainly for overloaded operators. Read cppreference and [[c++codingStd]]. I’m not interested in that usage.
  • boost seems to use a lot of ADL
  • I feel namespace loves ADL more than any other c++ feature 🙂
  • ADL is an endorsed, prized compiler feature, but still draws criticism [1]. To eliminate the confusion, simply fully qualify the function call.

[1] https://stackoverflow.com/questions/8111677/what-is-argument-dependent-lookup-aka-adl-or-koenig-lookup has the best quickguide with simple examples.

https://softwareengineering.stackexchange.com/questions/274306/free-standing-functions-in-global-namespace is a short, readable discussion.

My annotations on the formal, long-winded definition — ADL governs look-up of unqualified function names in function-call expressions (including operator calls). These function names are looked up in the namespaces of their arguments in addition to the usual namespace-based lookup.
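
A bare-bones ADL sketch (mine): the unqualified call finds the free function because the argument’s namespace is searched too:

#include <iostream>

namespace mylib {
 struct Widget { int id; };
 void swap(Widget& a, Widget& b) { // free function in the class's namespace
  std::cout << "mylib::swap found via ADL\n";
  int tmp = a.id; a.id = b.id; b.id = tmp;
 }
}

int main() {
 mylib::Widget a{1}, b{2};
 swap(a, b); // unqualified — ADL looks into namespace mylib
 std::cout << a.id << ' ' << b.id << '\n'; // 2 1
}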

STL: few exceptions #no virtual

side note — virtual functions are also very rare in STL classes. I don’t know of any.


STL functions seldom throw exceptions. See P 248 [[c++standard library]]

  1. at() method of vector/string/map… throws, since it’s the “checked” version of the subscript operator
  2. reserve() method throws

Besides these two unusual functions, STL would only throw the standard low-level exceptions like memory allocation failures. All other error conditions are undefined-behavior, such as

  • top()/pop() on priority queue
  • std::string operator[index] with invalid index
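
A quick demo (mine) contrasting the two: at() is the “checked” subscript and throws std::out_of_range, while operator[] with the same invalid index is undefined behavior:

#include <iostream>
#include <stdexcept>
#include <vector>

int main() {
 std::vector<int> v{1, 2, 3};
 try {
  std::cout << v.at(99) << '\n'; // throws
 } catch (const std::out_of_range& e) {
  std::cout << "caught: " << e.what() << '\n';
 }
 // v[99] would compile, and silently invoke undefined behavior
}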

However, Payload data types going into STL container can create exceptions:

  • If a payload class ctor/assignment throws, then it would propagate.
  • Payload destructors should never throw. [[c++coding standard]] says it’s forbidden by STL standard.

class scope as a pseudo namespace #goog/effC++

https://google.github.io/styleguide/cppguide.html#Nonmember,_Static_Member,_and_Global_Functions

[[effC++]] P 120 has a concise chapter on namespace vs namespace-emulating class.

So when you see someName::someVar or someName::someClass or someName::someFunc, the “someName” may be a class.

When you see “someVar” without the prefix, don’t assume it’s in your local namespace. It could be pulled in from someName via a typedef or using declaration!
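
A tiny sketch (mine) of such a namespace-emulating class — all members static, instantiation disabled, used purely for the someName:: prefix:

#include <iostream>

class MathUtil { // emulates "namespace MathUtil"
public:
 static constexpr double pi = 3.141592653589793;
 static double square(double x) { return x * x; }
private:
 MathUtil() = delete; // nobody can instantiate
};

int main() {
 std::cout << MathUtil::square(MathUtil::pi) << '\n';
}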

std::string is COW-reference-counted now but should change]c++11

http://www.drdobbs.com/cpp/c-string-performance/184405453 (circa 2003) explains that COW speeds up string copy and string destruction.

https://stackoverflow.com/questions/12520192/is-stdstring-refcounted-in-gcc-4-x-c11

This proved a small halo.

However, gcc 5.1 introduced a breaking change. https://gcc.gnu.org/onlinedocs/libstdc++/manual/using_dual_abi.html says “These changes were necessary to conform to the 2011 C++ standard which forbids Copy-On-Write strings”.

Q: why forbidden?
A: thread-safety … See [[std c++lib]] P692

j4 factory: java^c++ #Wells

A Wells Fargo interviewer asked

Q6: motivation of factory pattern?
Q6b: why prevent others calling your ctor?

  • %%A: some objects are expensive to construct (DbConnection) and I need tight control.
  • %%A: similarly, after construction, I often have some initialization logic in my factory, but I may not be allowed to modify the ctor, or our design doesn’t favor doing such things in the ctor. I make my factory users’ life easier if they don’t call new() directly.
  • AA: more importantly, the post-construction function could be virtual! This is a sanctioned justification for c++ factory on the authoritative [[c++codingStd]] P88 — see the sketch after this list
  • %%A: I want to (save the new instance and) throw exception in a post-construction routine, but I don’t want to throw from ctor
  • %%A: I want to centralize the non-trivial business rule of selecting target types to return. Code reuse rather than duplication
  • %%A: a caching factory
  • %%A: there are too many input parameters to my ctor and I want to provide users a simplified façade
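
A minimal c++ sketch (mine) of the AA justification above — the factory can call a virtual post-construction hook, which a ctor cannot dispatch safely (class names are made up):

#include <iostream>
#include <memory>

class Connection {
public:
 virtual ~Connection() = default;
 virtual void postConstruct() { std::cout << "base init\n"; }

 template <class T>
 static std::unique_ptr<T> create() { // the factory
  auto p = std::make_unique<T>();
  p->postConstruct(); // virtual dispatch works here, after the ctor completes
  return p;
 }
};

class TlsConnection : public Connection {
public:
 void postConstruct() override { std::cout << "TLS handshake\n"; }
};

int main() {
 auto c = Connection::create<TlsConnection>(); // prints "TLS handshake"
}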

[[Focus]] for technical xx

See also my email to Raja, XR and ZengSheng on “the zone”, the “creative engine”…

As the book author, Dan has a narrower concept of focus — a student’s focus. I told my son he needs better concentration. As a student I was very good with concentration and engagement.

Compared to my engaged “traction” days, I now spend too much time emailing, blogging on non-tech, calling home, researching housing, shopping, food preparation … But that’s not the key. During the work or study hours, I am not focusing as I did in primary school.

Traction/engaged days are with growth, concentration, positive feedback loop, tech blogging….

I think Yang was the first manager who pointed out I needed better focus. Perhaps he meant I tried to study all the systems in PWM Comm all at once. I made real progress (with ErrorMemos, SQL, Java …) after I sent my wife/son back to Singapore.

I made real progress with quant stuff during the Barcap days. However, I also had to keep job hunting remotely.

Avichal, who sat beside me, said I had too many distractions… Disengaged.

Similar to dabao’s case, I guess the initial code complexity requires focus… Without sufficient focus, the complexity is just too much for my capacity. Better shut out everything else and focus on delivering something non-trivial. Remember PWM, Quartz, …

Grandpa had a lot to say about focus…

poll() as timer ] real time C : industrial-strength #ST-mode

See also http://www.sourcexr.com/articles/2013/11/12/timer-notifications-using-file-descriptors

RTS Xtap (and probably the earlier framework too) uses a C timer based on the Linux syscall epoll(). It has millisecond precision [1], good enough for our real-time feed parsers that consume all of NYSE, Nasdaq, OPRA etc. I won’t say how many clients we support, but some of the world’s top sites use our feeds.

There’s also a version using poll(), used when epoll is unavailable (linux supports epoll). For simplicity, I will say “poll” from now on.

[1] The millisecond resolution is due to the tick length in the Linux kernel, described on P 195 [[Linux kernel]]

I guess one advantage of using poll() to implement a timer is the integration with sockets. A parser is fundamentally event-driven; timers and sockets are the two primary event sources.

It looks like the poll() syscalls are capable of supporting both event sources together, which is exactly our situation.

I call it an “industrial-strength” solution because it has powered one of the most reliable, most widely used market data dissemination systems for decades — the unsung hero behind many financial data web sites. It handles 300 “exchange feeds”.

Concurrency? xtap runs in single-threaded mode. In the poll() solution, incoming packets AND timer events are inherently processed serially. No synchronization is required. When the receiver socket is busy, timer events are … delayed. Exactly what we want.

–The timer/event loop structure

while(1){ //pseudocode
  check_timers();          //roughly, this gets called every 1 ms
  epoll_wait(timeout=1ms); //block for socket events, for at most 1 ms
}
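
For concreteness, here is my own bare-bones (Linux-only) version of that loop with real epoll calls — not the RTS source:

#include <sys/epoll.h>
#include <stdio.h>

int main(void) {
 int epfd = epoll_create1(0);
 // ... epoll_ctl(epfd, EPOLL_CTL_ADD, sock_fd, &ev) for each receiver socket ...
 struct epoll_event events[64];
 for (;;) {
  int n = epoll_wait(epfd, events, 64, 1); // the 1 ms timeout drives the timer resolution
  // check_timers() would run here, at roughly 1 ms resolution
  for (int i = 0; i < n; ++i) {
   // handle_packet(events[i].data.fd); -- socket events processed serially, no locking
  }
 }
}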

 

simple implementation of memory allocator#no src

P9 [[c++game development primer]] has a short implementation without using heap. The memory pool comes from a large array of chars. The allocator keeps track of allocated chunks but doesn’t reuse reclaimed chunks.

It showcases the header associated with each allocated chunk. This feature is also part of a real heap allocator.

reinterpret_cast is used repeatedly.
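
My own toy reconstruction of the idea (not the book’s source) — a bump allocator carving chunks out of a static char array, each chunk preceded by a header (alignment of later chunks is glossed over for brevity):

#include <cstddef>
#include <cstdio>

struct Header { std::size_t size; }; // bookkeeping before each chunk

alignas(std::max_align_t) static char pool[4096]; // the "heap" is a static array
static std::size_t offset = 0;

void* myAlloc(std::size_t n) {
 if (offset + sizeof(Header) + n > sizeof(pool)) return nullptr;
 Header* h = reinterpret_cast<Header*>(pool + offset); // reinterpret_cast, as in the book
 h->size = n;
 offset += sizeof(Header) + n; // bump; reclaimed chunks are never reused
 return h + 1;                 // usable memory starts right after the header
}

int main() {
 int* p = static_cast<int*>(myAlloc(sizeof(int)));
 *p = 42;
 std::printf("%d, chunk size %zu\n", *p, reinterpret_cast<Header*>(p)[-1].size);
}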

RMI could send more than simple VO

[[java the good parts]], written by an RMI authority, is a thin book with a concise but in-depth chapter on RMI. One of the things pointed out was —

In a distributed system like RPC or RMI, sometimes the objects sent over the network are more than monolithic POJO “value objects” with trivial methods. For example, you could send a special FXDeal object having a specialized virtual reval() method. The interface would declare BaseDeal as the data type for it. Obviously, the receiving end should NOT “slice” this FXDeal object down to BaseDeal and lose the specialized reval().

Not sure how CORBA handles it, but RMI was designed to support it. Basically the serialization stream of the object includes the location to download the FXDeal class bytecode.

STL containers: pick correct SORT for each

Needed in coding IV

  • std::sort() is good for array, vector, deque…
  • std::list has no random-access iterators, so it must use the list::sort() method
  • a single string? std::sort() works on it directly, since std::string offers random-access iterators
  • unordered containers don’t need sorting

map (and set) remain sorted so don’t need a sort operation. They can specify a special sort comparator either as a template arg or a ctor arg. P316 and P334 [[c++ std lib]] compare the two choices. You can also pass no comparator arg but define operator< in the key class.
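
A quick reference sketch (mine) for the bullets above:

#include <algorithm>
#include <functional>
#include <iostream>
#include <list>
#include <map>
#include <string>
#include <vector>

int main() {
 std::vector<int> v{3, 1, 2};
 std::sort(v.begin(), v.end()); // random-access iterators: OK

 std::list<int> l{3, 1, 2};
 l.sort();                      // no random access — must use the method

 std::string s = "bca";
 std::sort(s.begin(), s.end()); // a string sorts directly too

 std::map<int, char, std::greater<int>> m{{1, 'a'}, {2, 'b'}}; // comparator as template arg
 for (auto& [k, ch] : m) std::cout << k << ch << ' ';          // 2b 1a
}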

2 simple yet concurrent singleton implementations ] c++

Both designs are thread-safe.

  1. The first design uses a static factory method + a static data member initialization.
  2. The second design uses a local static object, based on [[eff C++]] P 222.
template<class T>
class StaticFieldSingleton {
private:
 static StaticFieldSingleton<T> instance_; //declaration of static field
 StaticFieldSingleton() {
  cout << "StaticFieldSingleton() ctor\n";
 }
 StaticFieldSingleton(StaticFieldSingleton<T> const &);
 StaticFieldSingleton<T>& operator=(StaticFieldSingleton<T> const &);
public:
 static StaticFieldSingleton<T>& getInstance() {
  return instance_;
 }
}; 
//separate definition required on any static field
template<class T> StaticFieldSingleton<T> StaticFieldSingleton<T>
::instance_; //<---def of static field: required complexity

///////////// 2nd design uses a static local object
class SimpleSingleton {
 SimpleSingleton() {
  cout << "SimpleSingleton()\n";
 }
 SimpleSingleton(SimpleSingleton const &);
 SimpleSingleton& operator=(SimpleSingleton const &);
public:
 static SimpleSingleton& get_instance() {
  static SimpleSingleton instance;//<----- cleaner syntax
  return instance;
 }
};
int main() {
 StaticFieldSingleton<float>::getInstance();

 SimpleSingleton& ins1 = SimpleSingleton::get_instance();
 SimpleSingleton& ins2 = SimpleSingleton::get_instance();
 cout << &ins1 << endl;
 cout << &ins2 << endl;
}

 

risk-factor-based scenario analysis [[Return to RiskMetrics]]

Look at [[Return to RiskMetrics]]. Some risk management theorists have a (fairly sophisticated) framework about scenarios. I feel it’s worth studying.

Given a portfolio of diverse instruments, we first identify the individual risk factors, then describe specific scenarios. Each scenario is uniquely defined by a tuple of 3 numbers for the 3 factors (if 3 factors). Under each scenario, each instrument in the portfolio can be priced.

I think one of the simplest set-ups I have seen in practice is the Barcap 2D grid, with stock +/- percentage changes on one axis and implied vol figures on the other axis. This grid can create many scenarios for an equity derivative portfolio.

I feel it’s important to point out that two factors can have non-trivial interdependence and influence each other. (Independence would be nice. In a (small) sample, you may actually observe statistical independence, but in another sample of the same population you may not.) Between the risk factors, the correlation is monitored and measured.

[[algo in a nutshell]] bucket sort, insertion sort..

== bucket sort
This book asserts that bucket sort is the #1 top dog when data is uniformly distributed (not normally distributed!) — i.e. can be uniformly partitioned using a fast hashing function. The number of buckets created equals the input data size, just like in a standard hash table. The book shows benchmarking on large samples of random floating-point numbers. However, I disagree:
  1.  for special data types like strings and small ints, radix sort is probably faster than bucket sort
  2. Bucket sort is #1 for large data volumes, beating quick sort, heap sort etc. But for nearly-sorted data, Insertion-sort is faster.

Other notes

  • Linked list — used inside each bucket. Arrays are possible alternative, but 50% slower.
  • —-Relation to other sorts:
  • Can be seen as generalization of counting sort
  • A cousin of radix sort in the most-to-least significant digit (MSD) flavor i.e. top-down radix sort
–hash sort, a variation of bucket sort, customized for random strings.
Book shows benchmarking on random strings. Hash sort is #1 for large data volumes, beating quick sort, heap sort etc

I guess if string distribution is unconstrained/unknown (without guarantee of randomness), hash sort will not have any advantage

In both cases, the hash code must be strictly “ordered”, so bucket #1 must hold the lowest data items.
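
A bare-bones sketch (mine) for uniformly distributed doubles in [0,1) — one bucket per input element, a linked list inside each bucket, and an “ordered” hash so the lowest bucket holds the lowest items:

#include <iostream>
#include <list>
#include <vector>

void bucketSort(std::vector<double>& v) {
 const size_t n = v.size();
 std::vector<std::list<double>> buckets(n); // linked list inside each bucket
 for (double x : v)
  buckets[static_cast<size_t>(x * n)].push_back(x); // low x -> low bucket number
 size_t i = 0;
 for (auto& b : buckets) {
  b.sort(); // each bucket holds ~1 item on average
  for (double x : b) v[i++] = x;
 }
}

int main() {
 std::vector<double> v{0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12};
 bucketSort(v);
 for (double x : v) std::cout << x << ' ';
}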

== insertion sort
P100. for nearly-sorted data, insertion sort is faster than all other sorts (including bucket sort, radix sorts), often by an order of magnitude

https://stackoverflow.com/questions/736920/is-there-ever-a-good-reason-to-use-insertion-sort shows insertion sort is better than divide-n-conquer for 1) nearly-sorted or 2) small collection

https://www.cs.princeton.edu/~rs/AlgsDS07/18RadixSort.pdf shows that MSD radix sort also need to switch to insertion-sort when sample size is small.

== counting sort
P315. “fastest” sort for the right data set. I think a single English word is suitable. However, for nearly sorted data, insertion sort is still faster.


== Binary search tree vs hash table
P130 discusses when to use which

[[automate the boring stuff with python]]

This book teaches just enough python features for the purpose. All non-essentials are left out.

–sub chapter on selenium
the easiest way to use selenium and drive a browser, IMO.

–Chapter on Excel
text file spreadsheet converter
merge/unmerge
setting font in a cell

–Chapter on PDF:
combine select pages from many files

—-Chapter on CSV + json

—-Chapter on task scheduler

—-Chapter on keyboard/mouse control — powerful

[[c++recipes]] mv-semantic etc

I find this book rather practical. Many small programs are fully tested and demonstrated.

This 2015 book covers c++14.

–#1) RVR(rval ref) and move semantic:
This book offers just enough detail (over 5-10 pages) to show how the move ctor reduces waste. The example class has a large non-ref field.

P49 shows move(), but P48 shows that even without a move() call, the compiler is able to *select* the move ctor (not the copy-ctor) when passing an rvalue instance into a non-ref parameter. The copy ctor is present but skipped!

P49 shows an effective mv-ctor can be “=default; “

–custom new/delete to trace memory operations
Sample code showing how delete() can report where in the source code the new() happened. This demonstrates a common technique — allocating an additional custom memory header when allocating memory.

This is more practical than the [[effC++]] recipe.

There’s also a version for array-new. The class-specific-new doesn’t need the memory header.
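
Here is my own sketch of the technique (not the book’s code) — every allocation records its call site in a header, and operator delete reads it back:

#include <cstdio>
#include <cstdlib>
#include <new>

struct AllocHeader { const char* file; int line; };

void* operator new(std::size_t n, const char* file, int line) {
 auto* h = static_cast<AllocHeader*>(std::malloc(sizeof(AllocHeader) + n));
 if (!h) throw std::bad_alloc{};
 h->file = file; h->line = line;
 return h + 1; // caller sees the memory right after the header
}
void* operator new(std::size_t n) { return operator new(n, "unknown", 0); }

void operator delete(void* p) noexcept {
 if (!p) return;
 auto* h = static_cast<AllocHeader*>(p) - 1;
 std::printf("freeing block allocated at %s:%d\n", h->file, h->line);
 std::free(h);
}

#define NEW new (__FILE__, __LINE__)

int main() {
 int* p = NEW int(5);
 delete p; // prints the original allocation site
}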

–other
A simple example code of weak_ptr.

a custom small block allocator to reduce memory fragmentation

Using promise/future to transfer data between a worker thread and a boss thread

java8 default method break backward compatibility #HSBC

Among the java8 features, I think the default method is a more drastic and fundamental change than lambda or stream, in terms of language design.

In my HSBC interview a London interviewer (Wais) challenged me and said that he thought default methods are designed for backward compatibility. I now think he was wrong.

—- Based on P171 [[Mastering Lambdas]]
Note: the rare cases of incompatibility are an obscure (almost academic) concern. More important are the rules of method resolution when default methods are among the candidates. This topic is similar in spirit to popular interview questions around overriding vs overloading.

Backward compatibility (BWC) means that when an existing interface like Collection.java includes a brand new default method, the existing “customer” source code should work as before. Default methods have a few known violations of BWC.

  • simplest case: all (incl. default) methods in an interface must be public. No ifs or buts.  Suppose Java7 MyConcreteClass has private m2() and implements MyIntf. What if MyIntf is now updated with a default method m2()? Compilation error!
  • a more serious case: java overriding rule (similar to c++) is very strict so m(int) vs m(long) is always, automatically overloading not overriding.  Consider a method call myObj.m(33). Originally, this binds to the m(long) declared in the class. Suppose the new default method is m(int) … an Overload! At compile time, this is seen as a better match so selected by compiler (not JVM runtime)… Silent, unexpected change in business logic and a violation of BWC!

This refreshingly thin book gives 2 more examples. Its last example is a serious backward incompatibility issue but I am not convinced it is technically possible. Here’s my rationale —

Any legacy code relying on putIfAbsent() must have an implementation of putIfAbsent() somewhere in some legacy java7 class. Due to the “class-wins-over-interface” rule, a new default method putIfAbsent() will never be chosen when compiling the legacy code using the java8 tool chain.

[[21st century c]] – unusually practical update on C

a sub-chapter on string processing in the new world
a sub-chapter on robust Macros in the new world
a sub-chapter on function to report errors in the new world
a full chapter on pointer in the new world
a full chapter on C api to be consumed by other languages like python
a full chapter on struct syntax improvement to support returning multiple values + status code
a sub-chapter on pthreads
a sub-chapter on [[numerical recipes in C]] and the implementation – the GNU scientific library
a sub-chapter on SQLite
briefly on valgrind
function returning 2 values + status code
many innovative macro tricks
innovative and concise explanation of auto(i.e. stack) vs static vs malloc memory

Note a sub-chapter is very short, in a concise book. A remarkably practical update on C, somewhat similar to [[safe c++]]. Content isn’t theoretical, and not so relevant to interviews, but relevant to real projects and GTD

Rebonato – good author on fixed income models

recommended by Sian Hwee.

Ronnie said the Black model is popular (largely due to simplicity and historical reasons), and many option products are quoted in terms of vols implied from the Black model.
TA seems to agree that the advanced models (beyond the Black model) are still needed but indeed harder than the earlier lectures before the Black model.

[[java performance]] by Scott Oaks


Best of breed… see chapter details on:

  • [jvm] heap memory
  • [jvm] threading
  • [jvm] instrumentation
  • JPA
  • serialization
  • lambda, stream (java 8 interviews!)

The Introduction chapter outlines 3 broad aspects:

* JVM – like memory tuning
* java language – like threading, collections
* Java API — like xml parser, JDBC, serialization, Json

JVM tuning is done by “system engineers” who may not be developers.

maturity bucketing for VaR

[[complete guide]] P 457 pointed out VaR systems often need to aggregate cashflow amounts across different deals/positions, based on the “due date” or “maturity date”.

Example — On 12/31 if there are 33 payable amounts and 88 receivable amounts, then they get aggregated into the same bucket.

I think bucketing is more important in these cases:

  • a bond has maturity date and coupon dates
  • a swap has multiple reset dates
  • most fixed income products
  • derivative products — always have expiry dates

In StirtRisk, I think we also break down that 12/31 one day bucket by currency — 12/31 USD bucket, 12/31 JPY bucket, 12/31 AUD bucket etc.

Q: why is this so important to VaR and other market risk systems? (I do understand it hits “credit risk”.)
%%A: For floating rate products, the cashflow amount on a future date is subject to market movements
%%A: FX rate on a future date 12/31 is subject to market movements
%%A: contingent claim cashflow depends heavily on market prices.
%%A: if 12/31 falls within 10D, then 10D VaR would be impacted by the 12/31 market factors

IRS intuitively – an orange a day#tricky intuition

Selling an IRS is like signing a 2-year contract to supply oranges monthly (eg: to a nursing home) at a fixed price.

If the orange price subsequently rises, the nursing home is happy, since they locked in a low price; the orange supplier regrets, i.e. suffers a paper loss.

P241 [[complete guide]] — Orange County sold IRS (the oranges) when the floating rate (orange price) was low. Subsequently, in 1994 the Fed increased the target overnight FF rate, which sent shock waves through the yield curve. This directly led to higher swap rates (presumably “par swap rates”). Equivalently, the increased swap rate indicates a market expectation of higher fwd rates. We know each floating rate number on each upcoming reset date is evaluated as a FRA rate i.e. a fwd-starting loan rate.

The higher swap rate means Orange County had previously sold the floating stream (i.e. the oranges) too cheaply. They lost badly and went bankrupt.

It’s crucial to know the key parameters of the context, otherwise you hit paradoxes and incorrect intuitions such as:

Coming back to the fruit illustration: some beginners may feel that a rising fruit price is good for the supplier, but that’s wrong. Our supplier has already signed a 2Y fixed-price contract, so the rising price doesn’t help him.

[[all about HFT]]

The author is an options specialist (currently teaching derivatives at a university). Many mentions of HFT on options.
 
chapters (80p) on technology. The author believes the 3 legs are {strategy, math, tech}
chapter (50p) on strategy
**first part seems to be uninteresting and math-light, but might be important in practice
**chapter (12p) on arbitrage strategies
1 page on native API vs FIX.
a few pages on cpu offloading, including running Monte Carlo on GPGPU
compares c++ vs c#/java in an HFT context
compares buy vs build in an HFT context

[[practical api design]] java

Written by the founder and initial architect of NetBeans, this book has a java root. Freely available in pdf, this ebook is rather detailed, not full of high-level (vague) concepts like most design/architecture books including the classics. It seems to deal with real-world coding and design decisions. I would say these are not life-and-death decisions but still everyday decisions. (In contrast, those other books address decisions I don’t encounter or care about at all — they seem to belong to another level.) Therefore this book is closer to my life as a developer.

There’s a chapter against “extreme” (unhealthy) advice. Unconventional critique:
** an api must be beautiful
** an api must be correct
** an api must be simple
** an api must have good performance
** an api must be 100% compatible
** an api must be symmetrical

chapter on creating a threadsafe API, to be used by clueless and untrained programmers.

chapter on supervising the development of an API

Section on overcoming the fear of committing to a stable API

Section on Factory being better than constructor

Section on making everything final

[[safeC++]] discourages implicit conversion via OOC/cvctor

See other posts about OOC and cvctor. I am now convinced by [[safeC++]] that it’s better to avoid both. Instead, use an AsXXX() method if converting from YYY to XXX is needed. The reason is type safety. In an assignment (including function input/output), it is slightly hacky if the LHS is NOT a base type of the RHS. Implicit conversion is a subversion of the compiler’s type enforcement — given a function declared as f(XXX), it should ideally be illegal to pass in a YYY. However, the implicit converters break the clean rule through the back door.

As explained concisely on P8 [[safeC++]], the OOC is provided specifically to support implicit conversion. In comparison, the cvctor is more likely a careless mistake if written without “explicit”.

Favor explicit conversion rather than implicit conversion. Some manager in Millennium pointed out that c++ syntax has too many back doors and is too “implicit”. Reading a piece of code you don’t know what it does, unless you have lots of experience/knowledge about all the “back-doors”.
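
A tiny sketch (mine) of the book’s advice — mark the cvctor explicit and expose an AsXXX() method instead of an OOC (class names made up):

#include <iostream>
#include <string>

class Price {
public:
 explicit Price(double d) : val_(d) {} // no silent double -> Price conversion
 std::string AsString() const { return std::to_string(val_); } // instead of operator std::string()
private:
 double val_;
};

void submit(const Price& p) { std::cout << p.AsString() << '\n'; }

int main() {
 submit(Price{9.99}); // OK — the conversion is spelled out
 // submit(9.99);     // error: the explicit ctor blocks the back-door conversion
}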

use delete() or delete[] @@ #briefly

An address stored in a pointer (ptr-Object or ptr-Nickname — see http://bigblog.tanbin.com/2012/07/3-meanings-of-pointer-tip-on-delete-this.html) can mean either a “lone-wolf” or a “wolf pack”. 
Specifically, the logical (rather than physical) data at the address could be an array of objects or a single one.

The op-new and op-new[] operators both return an address. If you received the address without knowing which new operator was used, then it’s impossible to know how to delete it. As [[effC++]] points out, deleting the wrong way is UndefinedBehavior. 

The suggestion in [[safeC++]] is to avoid smart array (like shared_array) but use vector instead.

On a similar note, if your function receives a char*, you would have no idea whether it’s a single char or a string.
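
A quick demo (mine) — the address alone can’t tell you which delete to use:

#include <string>

int main() {
 std::string* lone = new std::string("wolf"); // lone wolf
 std::string* pack = new std::string[3];      // wolf pack
 delete lone;   // correct
 delete[] pack; // correct; plain "delete pack" would be undefined behavior
}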

[[safeC++]]assertion technique(versatile), illustrated with NPE

Update: how to add a custom error message to assert — https://stackoverflow.com/questions/3692954/add-custom-messages-in-assert

This is a thin book. Concise and practical (rather than academic) guidelines.

#1 guideline: enlist compiler to catch as many errors as possible.

However, some errors will pass compiler and only happen at run-time. Unlike on P11, I will treat programmer errors and other run-time errors alike – we need an improvement over the “standard” outcome which is UB (undefined behavior) and we may or may not see any error message anywhere.

#2 guideline: In [[safeC++]], such an improvement is offered in the form of assertions, in other words, run-time checks. The author gives them a more accurate name “diagnostics”.

2.1) now the outcome is guaranteed termination, rather than the undefined behavior.
2.2) now there’s always an error message + a stack trace. Now 2.2) sounds like a non-trivial improvement. Too good to be true? The author is a practicing programmer in a hedge fund, so I hope his ideas are real-world.

Simplest yet realistic example of #2 is NPE (i.e. null pointer deref). NPE is UB and could (always? probably not) crash. I doubt there’s even an error message. Now with a truly simple “wrapper” presented on P53-54, an NPE could be diagnosed __in_time__ and a fatal exception thrown, so the program is guaranteed to terminate, with an error message + stack trace.

Like a custom new/delete (to record allocations), here we replace the raw pointer with a wrapper. There we see a pattern where we replace builtin c++ constructs with our wrappers to avoid UB and get run time diagnostics —

$ this wrapper is actually a simple smart ptr
$ traditional smart ptr templates
$ custom new, delete
$ vector
$ Int class replacing int data type

The key concepts —
% assertion
% diagnostics
% run time

Q: Can every UB condition be diagnosed this way? Not sure, but the most common ones seem to be.
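
My own reconstruction of the P53-54 idea (not the book’s exact code) — a thin pointer wrapper that diagnoses null deref and fails fast; I abort() here where the book throws a fatal exception:

#include <cstdio>
#include <cstdlib>

template <class T>
class CheckedPtr {
public:
 explicit CheckedPtr(T* p = nullptr) : p_(p) {}
 T& operator*() const { check(); return *p_; }
 T* operator->() const { check(); return p_; }
private:
 void check() const {
  if (!p_) { // diagnosed in time, instead of UB
   std::fprintf(stderr, "FATAL: null pointer dereference\n");
   std::abort(); // guaranteed termination; a debugger shows the stack trace
  }
 }
 T* p_;
};

struct Quote { double bid; };

int main() {
 CheckedPtr<Quote> q; // null
 // q->bid = 1.0;     // would print the diagnostic and terminate, not silently corrupt
}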

[[inside windows debugging]]

P38 has a half-pager comparison of the 2 main debuggers

^ MSVS – dev environment, with source code available

^ windbg – production environment. For post-coding, without source code.

** script/SQL debugging – only MSVS

P38 points out the free MSVS-express lacks certain debugging features. WinDBG is completely free.

Q: does “windows debugger” mean windbg + minor tools?

–symbol files, symbol servers, *.pdb files

P53

P54 – “public symbols” and the microsoft online public symbol server.

[[safeC++]] – concise, pragmatic, unconventional wisdom

First off, this is a 120-page thin book, including about 30 pages of source code in the appendices. Light-weight, concise. Very rare.

I feel the author is bold to advocate avoidance of popular c++ features such as
– “Avoid using pointer arithmetic at all.”
– For class fields, avoid built-in types like int. Use Int type — no need to initialize.
– “Use the new operator only without bracket”. Prefer Vector to new[]
– “Whenever possible, avoid writing copy ctor and assignment operators for your class”

I feel these suggestions are similar to my NPE tactics in java. Unconventional wisdom, steeped in a realistic/pessimistic view of human fallibility, rather tedious, all about ….low-level details.

Amidst so many books on architecture and software design, I find this book so distinctive and it speaks directly to me — a low-level detailed programmer.

I feel this programmer has figured out the true cost/benefit of many c++ features, through real experience. Other veterans may object to his unconventional wisdom, but I feel there’s no point proving or convincing. A lot of best practices [1] are carefully put aside by veterans, often because they know the risks and justifications. These veterans would present robust justifications for their deviation — debatable but not groundless.

[1] like “avoid global variables and gotos”

–finance
Given the author’s role as a quant developer I believe all of the specific issues raised are relevant to financial applications. When you read about some uncommon issue (examples in [1]), you are right to question if it’s really important to embedded, or telecom, or mainframe domains, but it is certainly relevant to finance.

Incidentally, most of the observations, suggestions are tested on MSVS.

–assert, smartPtr…
I like the sample code. Boost smart ptr is too big to hack. The code here is pocket-sized, even bite-sized, and digestible and customizable. I have not seen any industrial strength smart pointer so simple.

–templates
The sample code provided qualify as library code, and therefore uses some simple template techniques. Good illustration of template techniques used in finance.

[1] e.g. the runtime cost of allocating the integer ref count on P49; or the date class.

Any negative?

[[debug it]] c++, java.. — tips

I find this book fairly small and practical. No abstract theories. Uses c++, java etc for illustrations.

Covers unix, windows, web app.

=== debugging memory allocators
–P170
memory leaks
uninitialized variable access
variable access after deallocation
–p199
Microsoft VC++ has a debugging mem allocator built in. Try http://msdn.microsoft.com/en-us/library/e5ewb1h3(v=vs.90).aspx

Electric Fence

===
–P201 DTrace – included in Mac OS X
–P202 WireShark, similar to tcpdump
–P203 firebug – client-side debugging
edit DOM
full javascript debugging

–P188 rewrites – pitfalls

–A chapter on reproducing bugs — quite practical

string,debugging+other tips:[[moving from c to c++]]

[[moving from c to c++]] is fairly practical. Not full of good-on-paper “best practice” advice.

P132 don’t (and why) put “using” in header files
P133 nested struct
P129 varargs suppressing arg checking
P162 a practical custom Stack class non-template
P167 just when we could hit “missing default ctor” error. It’s a bit complicated.

–P102 offers practical tips on c++ debugging

* macro DEBUG flag can be set in #define and also … on the compiler command line
* frequently people (me included) don’t want to recompile a large codebase just to add DEBUG flag. This book shows simple techniques to turn on/off run-time debug flags
* perl dumper receives a variable $abc and dumps the value of $abc and also ….. the VARIABLE NAME “abc”. C has a similar feature via the preprocessor stringize operator “#” (sketch below)
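
A quick sketch (mine) of that stringize trick:

#include <cstdio>

#define DUMP_INT(x) std::printf(#x " = %d\n", (x)) // #x becomes the literal "x"

int main() {
 int fillQty = 300;
 DUMP_INT(fillQty); // prints: fillQty = 300
}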

— chapter on the standard string class — practical, good for coding IV

* ways to initialize
* substring
* append
* insert

[[linux programmer’s toolbox]]

MALLOC_CHECK_ is a glibc env var
–debugger on optimized code

P558 — sometimes performance without compiler optimization is unacceptable.

To prevent the optimizer from removing your variables, mark them volatile.

An inline function may not appear in call stack. Consider “-fno-inline”

–P569 double-free may not show issues until the next free() or malloc()

–P470 – 472 sar command
can show per-process performance data

can monitor network devices

—P515 printf + macros for debugging

buffering behavior differs between terminal ^ log files

[[Hull]]estimat`default probability from bond prices#learning notes

If we were to explain it to people with a basic math background, the arithmetic on P524-525 could be expanded into a 5-pager. It’s a good example worth studying.

There are 2 parts to the math. Using bond prices, Part A computes the “expected” (probabilistic) loss from default to be $8.75 for a notional/face value of $100. Alternatively, assuming a constant hazard rate, Part B computes the same loss to be $288.48*Q. Equating the 2 parts gives Q = 3.03%.

Q3: How is the 7% market yield used? Where, in which part?

Q4: why assume defaults happen right before coupon dates?
%%A: a borrower would not declare “in 2 days I will fail to pay that coupon”, because it may receive help in the 11th hour.

–The continuous discounting in Table 23.3 is confusing

Q: Hull explained how the 3.5Y row in Table 23.3 is computed. But why discount to T=3.5Y and not to T=0Y? Here’s my long answer.

The “risk-free value” (Column 4) has a confusing meaning. Hull mentioned earlier a “similar risk-free bond” (a TBond). Right before the 3.5Y moment, we know this risk-free bond is scheduled to pay all cash flows at future times T=3.5Y, 4Y, 4.5Y, 5Y – that’s 4 coupons + principal. We use the risk-free rate of 5% to discount all 4+1 cash flows to T=3.5Y. We get $104.34 as the value of the TBond cash flows “discounted to T=3.5Y”.

Column 5 builds on it, giving the “loss due to default@3.5Y, discounted to T=3.5Y”. In Column 6, this value is further discounted from T=3.5Y to T=0Y.

Part B computes a PV relative to the TBond’s value. Actually Part A is also relative to the TBond’s value.

In the model of Part B, there are 5 coin flips occurring every mid-year at T=0.5Y 1.5Y 2.5Y 3.5Y 4.5Y, with Pr(default@0.5) = Pr(default@1.5) = … = Pr(default@4.5) = Q. (Concretely, imagine Pr(flip = Tail) is 25% at each flip.) The law of total probability states
100% = Pr(d05) + Pr(d15) + Pr(d25) + Pr(d35) + Pr(d45) + Pr(no default).
Factoring in the discounted loss amount at each flip, the expected loss is
Pr(d05)*$65.08 + Pr(d15)*$61.20 + Pr(d25)*$57.52 + Pr(d35)*$54.01 + Pr(d45)*$50.67 + Pr(no default)*$0 == $288.48*Q,
since each default probability equals Q and 65.08 + 61.20 + 57.52 + 54.01 + 50.67 = 288.48.
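
A quick arithmetic check of the two parts (my sketch; the dollar amounts are from Hull’s table):

```cpp
#include <cstdio>

int main() {
    // PV of loss if default happens at each of the 5 mid-year flips
    double lossAtFlip[] = {65.08, 61.20, 57.52, 54.01, 50.67};
    double partB = 0; // coefficient of Q in Part B
    for (double x : lossAtFlip) partB += x;           // 288.48
    double partA = 8.75; // expected loss computed from bond prices in Part A
    std::printf("Q = %.2f%%\n", 100 * partA / partB); // ~3.03%
}
```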

java/c++overriding: 8 requirements #CRT

Here’s another angle to look at runtime binding i.e. dynamic dispatch i.e. virtual function. Note [[effModernC++]] P80 has a list of requirements, but I wrote mine before reading it.

For runtime binding to work its magic (via vptr/vtbl), you must, must, must meet all of these conditions.

  • method must be –> non-static.
  • member must be –> non-field. The vtbl offers no runtime resolution of FIELD access. See [[java precisely]]. A frequent IV topic.
  • host object must be accessed via a –> ref/ptr, not a non-ref variable. P163 [[essential c++]]. P209 [[ARM]] explains that with a nonref, the runtime object type is known in advance at compile time, so runtime dispatch is unnecessary and would be inefficient.
  • method’s parameter types must be –> an exact copy-paste from parent to subclass. No subsumption allowed in Java [2]. C++ ARM P210 briefly explains why.
  • method must not be invoked during ctor/dtor (c++). In contrast, a Java/c# ctor can safely call virtual methods, while the base object is under construction and incomplete, and the subclass object is uninitialized!
  • method must be –> virtual, so as to hit the vtbl. In Java, all non-static non-final methods are virtual.
  • in c++ the call must NOT be scope-qualified like ptr2->B::virtF(), which subverts the vtbl. See P210 ARM
  • the 2 methods (to choose from) must be defined INSIDE 2 classes in a hierarchy. In contrast, a call to 2 overloaded methods accepting a B param vs a D param respectively will never be resolved at runtime – there’s no such thing as “argument-based runtime binding”. Even if the argument is a D instance, its declared type (B) is always used to statically resolve the method call. This is the **least-understood** of these restrictions. See http://bigblog.tanbin.com/2010/08/superclass-param-subclass-argument.html

If you miss any one condition, then without any compile-time or runtime warning the compiler will __silently__ forgo runtime binding and assume you want compile-time binding. The c++11 “override” keyword and java @Override help break the silence by generating compiler errors.
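
A minimal sketch of several conditions above – dispatch via a base ref, the c++11 “override” keyword breaking the silence, and a scope-qualified call subverting the vtbl:

```cpp
#include <iostream>

struct B {
    virtual void f() const { std::cout << "B::f\n"; }
    virtual ~B() {}
};
struct D : public B {
    void f() const override { std::cout << "D::f\n"; }
    // void f(int) const override; // would NOT compile: signature differs from B::f,
    //                             // so "override" turns silent hiding into an error
};

int main() {
    D d;
    B& ref = d;  // host object accessed via a ref -- a condition for runtime binding
    ref.f();     // prints D::f -- dynamic dispatch via vptr/vtbl
    ref.B::f();  // prints B::f -- scope-qualified call bypasses the vtbl
}
```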

However, the return types of the 2 functions can differ slightly (see the post on covariant return type, i.e. CRT). P208 ARM says as of 1990 it was an error for the two to differ in return type only, but [[c++common knowledge]] P100 gives a clear illustration of the clone() method i.e. virtual ctor. See also [[more eff C++]] P126. CRT was added in 1998.

[2] equals(SubclassOfObject) is overloading, not overriding. @Override disallowed — remember Kevin Hein’s quiz.

Here’s a somewhat unrelated subtle point. Suppose you have a B extended by C, and a B pointer/ref variable “o” seated at a C object. You won’t get runtime binding in these cases:

– if you have a non-static field f defined in both B/C, then o.f is compile-time binding, based on the declared type. P40 [[java precisely]]
– if you have a static method m() defined in both B/C, then o.m() is compile-time binding, based on the declared type. [1]
– if you have a nonref B variable receiving a C object, then slicing occurs – you can’t access any C part.

[1] That’s well-known in java. In C++, you can also “call a static member function using the this pointer of a non-static member function.”
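
A minimal C++ sketch of the field-hiding and slicing cases above:

```cpp
#include <iostream>

struct B { const char* f = "B's f"; };
struct C : public B { const char* f = "C's f"; }; // hides B::f -- fields are never overridden

int main() {
    C c;
    B& o = c;
    std::cout << o.f << '\n';       // prints "B's f": field access binds to the declared type
    B byValue = c;                  // slicing: the C part is cut off
    std::cout << byValue.f << '\n'; // "B's f" again; no C member is reachable
}
```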

removal→iterator invalidation:STL,fail fast,ConcurrentMap

This is a blog post tying up a few discussions on this subject. It’s instructive to compare the different iterators in different contexts in the face of a tricky removal operation.

http://tech.puredanger.com/2009/02/02/java-concurrency-bugs-concurrentmodificationexception/ points out that ConcurrentModEx can occur even in single-threaded myList.remove(..). Note this is not using myIterator.remove(void).

[[Java generics]] also says single-threaded program can hit CMEx. The official javadoc https://docs.oracle.com/javase/7/docs/api/java/util/ConcurrentModificationException.html agrees.

ConcurrentHashMap never throws this CMEx. See http://bigblog.tanbin.com/2011/09/concurrent-hash-map-iterator.html. Details? not available yet.

Many jdk5 concurrent collections have thread safe iterators. [[java generics]] covers them in some detail.

As seen in http://bigblog.tanbin.com/2011/09/removeinsert-while-iterating-stl.html, the node-based STL containers (including slist) can cope with removals, but contiguous containers can get iterators invalidated, since subsequent elements shift up. Java’s ArrayList improves on this by letting its own iterator perform a safe remove – I guess this is possible because that iterator can simply skip the dead slot – while any other iterator is invalidated via CMEx.
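
The standard STL removal-while-iterating idiom (my sketch): erase() invalidates the erased position but returns the next valid iterator, so we never touch a dead iterator.

```cpp
#include <map>
#include <iostream>

int main() {
    std::map<int, int> m{{1, 10}, {2, 20}, {3, 30}};
    for (auto it = m.begin(); it != m.end(); /* no ++ here */) {
        if (it->first == 2)
            it = m.erase(it); // returns an iterator to the next element
        else
            ++it;             // advance only when we did not erase
    }
    std::cout << m.size() << '\n'; // prints 2
}
```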

–brief history

  1. STL iterator invalidation results in undefined behavior. My test shows a silently erroneous result – your code continues to run but the result can be subtly wrong.
  2. In java, before fail-fast, the outcome was also undefined behavior.
  3. The fail-fast iterator is the java solution to the iterator invalidation issue. Fail-fast iterators all throw CMEx, quickly and cleanly. I think CMEx is triggered by structural changes – mostly removals and insertions.
  4. CHM came after fail-fast, and never throws CMEx

## common FX arbitrage scenarios

As in other financial markets (FI, options, forwards…), arbitrage is probably the #1 force constraining publicly quoted prices. In FX, there are a couple of major arbitrage scenarios, and each is extremely important to real-world pricing algorithms

* tri-arb (total 6 bid/ask numbers on 3 currency pairs)
* Int Rate Parity. See [[Complete guide]]
* arb between FX futures and spot markets. Bear Stearns used to do these arbitrage trades.

IRP is at the heart of forward point calculations.

IRP is the crucial link between the money market and FX forward market prices.

Tri-arb is at the heart of cross rate derivation – see the sketch below.
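
A hedged sketch of the tri-arb check on the 6 bid/ask numbers (the pairs and quotes below are hypothetical). Each round trip multiplies three conversion factors; a product above 1 means riskless profit before transaction costs:

```cpp
#include <cstdio>

int main() {
    // bid = dealer buys the base currency; ask = dealer sells it.
    // These quotes are mutually consistent, so both cycles stay below 1.
    double eurusdBid = 1.1000, eurusdAsk = 1.1002;
    double usdjpyBid = 150.00, usdjpyAsk = 150.03;
    double eurjpyBid = 164.98, eurjpyAsk = 165.05;

    // cycle 1: USD -> EUR (pay the ask) -> JPY (hit the bid) -> USD (pay the ask)
    double cycle1 = (1.0 / eurusdAsk) * eurjpyBid * (1.0 / usdjpyAsk);
    // cycle 2: USD -> JPY (hit the bid) -> EUR (pay the ask) -> USD (hit the bid)
    double cycle2 = usdjpyBid * (1.0 / eurjpyAsk) * eurusdBid;

    std::printf("cycle1 = %.6f, cycle2 = %.6f\n", cycle1, cycle2);
    if (cycle1 > 1.0 || cycle2 > 1.0) std::puts("tri-arb!");
}
```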

How about fx options quoted on AB, AC and BC? I believe there are arb-constraints on the quotes.

eSpeed/BrokerTec inter-dealer Treasury market workup

(In the offer case, “lift-up” is similar, but “hit-down” is easier to describe in English.)

The bids on the eSpeed order book are like a stack of cards. The top of the stack is one or more bids at the same bid price – for example, 2 bids at 99.8. Needless to say, the trader who came earlier is first_among_equals, but remember both bidders are on the same price level, like 2 physical cards spread out horizontally on top of the card stack [1].

When you hit them, you potentially “lower” the market – “hit down”. There are 2 scenarios/outcomes, inspired by P117 [[Complete guide to capital markets]] –

Scenario/Outcome 1 – you lower the market – during the workup, you and “secondary” hitters make up a total order size exceeding the total bid quantity at 99.8, consisting of the 2 original bids + secondary bids. Afterwards, all the “cards” at 99.8 are removed, exposing the cards at 99.7, which becomes the new top-of-stack.

Scenario/Outcome 2 – the best bid remains at the original level – similar to Scenario 1 but not “exceeding”. After the workup, the top bidder is seldom the original top bidder. The original 1st bidder has the first choice to “join” the workup [2]. If she doesn’t absorb all the sell orders, then she’s removed from the 99.8 bid level.

Looking at both scenarios, you see that “hit-down” can never move the market up.

[1] In semi-technical terms, it’s like a linked list anchored at top slot of the card “stack”.

[2] This is often an automated trading system

various meanings of "forward"

Note any time people say “forward”, it’s relative to “spot” i.e. current. Background – there’s always some _fluctuating_, measurable variable. Think of it as temperature: it has an observable current level, people try to estimate the level in x months, and people trade on that estimate.

forward price of an underlying security — is the simplest. Relevant in Black-Scholes.

“forward price” in Treasury — is probably a misnomer IMO. P132 [[complete guide]] clarifies the 2 meanings of “forward-price”

forward rate — see blog on FRA. It is like a CD rate for a 3 month term starting x months from today.

forward contract — see blog. It’s a contract to deliver some copper, or currency, or a treasury.

statement reorder for atomic variables #java,c++

see also [[effModernC++]]

–Java: Atomic is similar to volatile

See also http://www.cs.umd.edu/~pugh/java/memoryModel/jsr-133-faq.html#volatile

The Atomic package javadoc says:

The memory effects for accesses and updates of atomics generally follow the rules for volatiles, as stated in  The Java Language Specification, Third Edition (17.4 Memory Model):

* get() has the memory effects of reading a volatile variable.
* set() has the memory effects of writing (assigning) a volatile variable.
* lazySet() … is weaker
* compareAndSet() and all other read-and-update operations such as getAndIncrement have the memory effects of both reading and writing volatile variables.
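
A minimal C++ analogue (my sketch, not from the javadoc) of the same “memory effects”: a release store keeps earlier writes from sinking below it, and an acquire load keeps later reads from floating above it, so statement reorder cannot break the handshake.

```cpp
#include <atomic>
#include <thread>
#include <cassert>

int data = 0;
std::atomic<bool> ready{false};

void producer() {
    data = 42;                                    // ordinary write ...
    ready.store(true, std::memory_order_release); // ... cannot be reordered below this store
}
void consumer() {
    while (!ready.load(std::memory_order_acquire)) {} // spin until the flag is visible
    assert(data == 42); // guaranteed: the acquire load synchronizes-with the release store
}
int main() {
    std::thread t1(producer), t2(consumer);
    t1.join(); t2.join();
}
```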

FX fwd arbitrage – 4 b/a spreads to OVERCOME

Look at the parity between fwd/spot FX rates and the 2 interest rates (in the 2 currencies). The basic concept looks simple, but in the real market each rate is quoted as a bid and an ask, so 8 individual numbers are involved.

We pick 4 of them to evaluate ONE arbitrage strategy (fwd rate too high) and the other 4 to evaluate the opposite arbitrage opportunity (fwd rate too low).

Across asset classes, most pricing theories assume 0 transaction cost and 0 bid/ask spread. In this case, the bid/ask spread is often the showstopper for the arbitrageur. A similar challenge exists for many option arbitrageurs.

I think [[complete guide to capital markets]] has one page illustrating this arbitrage.

what 80% volatility means

( http://lvb.wiwi.hu-berlin.de/members/personalpages/wh/talks/DetHaeSlidesVarianceSwapForecast060531.pdf has the precise math definition of Realized Variance )

We need to differentiate between i-vol and h-vol …..

Q: what does 24% volatility mean for an option expiring in 3 months?
A: it means the stdev for an observable is –> 24% * √(3m/12m) = 24% * 1/2 = 12%.
See P277 [[Complete guide ]]

Now let’s define that observable. If today’s closing price is $100, and closing price in 3 months is X, then (X-100)/100 is the observable.

Therefore, the 3-month forward price is likely (two-thirds likelihood) to fall within ±12% of the current price, i.e. between $88 and $112. Here we ignore interest rates and dividends.

Now forget about options. Consider a stock like IBM.

Q: what does a vol of 80% mean for IBM?
A: see P76 [[Options Vol trading]]. An average day will see a 5% move (80%/√252 ≈ 5%). More precisely, 66.666% of the days will see moves under 5%; 33.333% of the days will see wider/wilder moves.

Q: a longer example – what does 25% volatility mean for IBM’s closing price tomorrow vs today?
A: it means the stdev of IBM’s daily percentage return is 25% * √(1/252) = 25% / 15.9 = 1.57% [1]. See the sketch below.

A longer answer: take the underlier’s daily closing price for today ($100) vs tomorrow (X). The daily percentage return (X-100)/100 could be 1%, -2%, 0.5%…, but for clarity I’d rather compute the PriceRelative (PR) to get 1.01, 0.98, 1.005…

Now if we simulate 1000 times and plot a histogram of daily returns defined as log(PR), we get a mean, most likely very close to 0, and the histogram is likely to resemble a normal curve. If today closes at $100, and if IBM does follow an annualized vol of 25%, then tomorrow’s close is likely (two-thirds likelihood) to fall within $98.43 to $101.57. Note the numbers are imprecise: the naive model assumes IBM is equally likely to lose 1.57% or gain 1.57%, and taken to extremes it would predict “equally likely to gain 101% or lose 101%”, letting the price go NEGATIVE. The BS model instead assumes log(PR) is normal, so a 1.57% gain (PR = 1.0157) is as likely as a decline to PR = 1/1.0157 = 0.98454.

IF (big IF) IBM continues to *follow* the 25% annualized vol, and if we observe its daily PR for 5 years, we will see that most of the time (two-thirds of the time) the daily PR falls within ±1.57%, using the naive model. See P34 [[Trading Options in Turbulent Markets ]]

[1] There are 252 trading days in a year. Our 25% vol is an annualized vol.
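
The scaling rule used throughout this post, as a sketch (252 trading days per year):

```cpp
#include <cmath>
#include <cstdio>

// stdev over a horizon = annualized vol * sqrt(horizon / 1 year)
double horizonStdev(double annualVolPct, double tradingDays) {
    return annualVolPct * std::sqrt(tradingDays / 252.0);
}

int main() {
    std::printf("80%% vol over 1 day  : %.2f%%\n", horizonStdev(80, 1));  // ~5.04%
    std::printf("25%% vol over 1 day  : %.2f%%\n", horizonStdev(25, 1));  // ~1.57%
    std::printf("24%% vol over 3 mths : %.2f%%\n", horizonStdev(24, 63)); // 12%
}
```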

delta of a stock option – on last days

For any call/put option
– ATM options (call or put) have delta close to 50% in magnitude, so an ATM option’s MV moves about 50c when the underlier moves $1. [3]
– Deep ITM options have delta close to 100% (“100 delta”), so the option feels like a regular stock [1];
– Deep OTM options have delta close to 0 (“0 delta”). [2]
– Therefore, a trader uses the magnitude of delta as the approximate likelihood of expiring ITM. See P28 [[Trading Option Greeks ]]

You can see part of the reason in the curve of [Px(call option) vs spotPx(underlier)] on P276 [[complete guide to cap mkt]]. The ITM call’s slope is almost parallel to the slope of a long stock; the OTM call’s slope is nearly flat.

Before reading further, remember a trader observes fluctuations in the underlier and in the volatility level. Forgive my long-winded reminder – within an hour on the expiration day, the market could exhibit a sudden increase in implied volatility. The new vol will impact delta, valuation, unrealized PnL… until you exit.

My friend Chuck Kang, who traded options before, described that just before expiration (like triple-witching), option *premiums* jump wildly in response to underlier price moves. I can understand why in the case of an ATM option on a fairly volatile stock.

However, P31 [[Trading Option Greeks]] suggests that on the last days deltas diverge from 50% towards 100% or 0 – the impending expiration pushes delta toward the extremes. See the sketch below.

[1] I feel timeVal doesn’t affect how _likely_ the option will expire ITM. A deep ITM is 99% sure to expire ITM, so a single lot of this option feels equivalent to 100 shares of underlier, either long or short. The MV consists of mostly intrinsicVal. Delta(intrinsicVal) is 100% for an ITM.

[2] I feel timeVal doesn’t affect how _likely_ the option will expire ITM. When underlier shifts up and down, the deep OTM option still shows no promise of expiring ITM. Delta(timeVal) is nearly zero. Delta(intrinsicVal) == 0 for an OTM.

[3] If an option is still ATM on its last days, then market doesn’t know whether it will expire ITM or OTM. Too close to call.
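
A hedged sketch of that divergence using the standard Black-Scholes call delta N(d1) (my code; the parameters are made up):

```cpp
#include <cmath>
#include <cstdio>

// Black-Scholes call delta = N(d1); N() is the standard normal CDF via erfc.
double callDelta(double S, double K, double r, double vol, double T) {
    double d1 = (std::log(S / K) + (r + 0.5 * vol * vol) * T) / (vol * std::sqrt(T));
    return 0.5 * std::erfc(-d1 / std::sqrt(2.0));
}

int main() {
    // As expiry T shrinks, ITM delta races to 1, OTM to 0, while ATM stays near 0.5
    for (double T : {0.25, 0.05, 0.01}) // years to expiry
        std::printf("T=%.2f  ITM(105): %.2f  ATM(100): %.2f  OTM(95): %.2f\n", T,
                    callDelta(105, 100, 0.0, 0.25, T),
                    callDelta(100, 100, 0.0, 0.25, T),
                    callDelta(95, 100, 0.0, 0.25, T));
}
```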

discount curve – cheatsheet

Based on P245 of [[complete guide to capital markets]]

The discount curve tells us how to discount to NPV $1 received x days from today, where x can be 1 to 365 * 30. If the curve value for Day 365 is 0.80, then the spot rate is 25% or 2500bps – 80 cents invested today becomes $1 in a year. Note the value on the curve is not bps or a spot rate, but a discount factor between 0 and 1.

Q: how do I get forward rate between any 2 dates?
A: P246. Simple formula.

The discount curve is “built” from, and must be consistent with,
+ ED CD rates (spot rates) of 1, 7, 14, …, 90 days. As loans, these terms always start today.
+ ED futures rates (forward rates). Each loan term lasts 3 months, starting on the 4 IMM dates of this year, next year, the year after….
(Note ED futures rates are determined on the market; ED CD rates are announced by the BBA.)

(Note I have ignored IR swaps, which are more liquid than ED futures beyond 10Y tenor.)

The discount curve needs to cover every single date. The first couple of months are covered by the latest announced ED CD rates, interpolating when necessary. Past the 90-day point, all subsequent dates (up to 30 years) are covered by ED futures rates observed on CME. Being forward rates, these aren’t directly usable the way the CD rates are, but the bootstrap step is still very simple – if the 3-month forward rate for 3/19/2008 – 6/19/2008 is 200bps (treated here as the rate for the whole 3-month period) and the discount factor for 3/19 is 0.9, then the 6/19 discount factor is 0.9 / 1.02, as sketched below.
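
The bootstrap step in code (my sketch; illustrative dates and rates from the paragraph above):

```cpp
#include <cstdio>

int main() {
    double df319 = 0.90;    // discount factor for 3/19, already on the curve
    double fwdRate = 0.02;  // 200bps forward rate for the 3/19 - 6/19 period
    double df619 = df319 / (1.0 + fwdRate); // each forward rate extends the curve one step
    std::printf("DF(6/19) = %.6f\n", df619); // 0.882353
}
```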

state maintenance in FIX gateway

Even the fastest, most dumb FIX gateways need to maintain order state, because the exchange can send (via FIX) partial fills and acks, and the customer can send (via other MOM) cancels and amends, in full duplex.

When an order is completed, it should be removed from memory to free it up. Before that, the engine needs to maintain its state. State maintenance is a major cost and burden on a low-latency system – see the sketch below.

A sell-side FIX engine is a daemon process waiting for Responses from the liquidity venue, and mkt/limit orders from buy-side clients.

[[Complete guide to capital markets]] has some brief coverage on OMS and state maintenance.
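
A hypothetical sketch (mine, not from the book) of the minimal per-order state such an engine tracks until the order completes:

```cpp
#include <string>

enum class OrderState { New, Acked, PartiallyFilled, Filled, Cancelled };

struct OrderRecord {
    std::string clOrdId; // client order id (FIX tag 11)
    double leavesQty;    // quantity still working at the venue
    double cumQty;       // filled so far, updated on every partial fill
    OrderState state;
};
// On each partial fill: cumQty += fillQty; leavesQty -= fillQty.
// Once leavesQty hits 0 (or a cancel is acked) the record can be evicted from memory.
```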

convert between i-volatility & option price (no real eg)

In bond markets, bids/offers are often quoted in bps of yield. Example – the seller and buyer each convert 200bps of yield into a dollar price and always get the same price. It’s like converting between meter and inch, or between Celsius and Fahrenheit.

The conversion between hazard rate and market quotes is less clear.

Similarly, in many option markets, prices are quoted in implied volatility. The seller and buyer each convert 20%/year [1] into a price, using Black-Scholes, and always get the same dollar price.

Price/vol conversion is similar to price/yield conversion because …… in each case price is a poor parameter for relative value analysis, and investors found a superior apparatus to compare 2 securities — yield for bonds and i-volatility for options. Wikipedia says “Often, the implied volatility of an option is a more useful measure of the option’s relative value than its price.”

Price/vol conversion is similar to price/yield conversion because ….. from (the mid-point of) quoted bid/ask prices, you get an implied vol or implied yield [3], over the remaining lifetime of the instrument. The vol and yield are both forecasts, verifiable at the end of that “lifetime” – i-vol vs historical vol; implied yield vs the realized inflation rate.

Price/vol conversion is similar to price/yield conversion because ….. everything else is held constant in the equation. Specifically,

** underlying price is held constant — tricky. Even though the underlying price changes every minute, investors each try to estimate the volatility of the underlying price over the _next_ month. If UBS assumes stability, UBS may use a 19%/year vol. As the underlying price moves, UBS’s option price will move accordingly, but always stay lower than a hypothetical Citi trader’s (who assumes instability).

When a buyer compares 2 competing quotes for the same option, she sees 2 i-vol estimates[2]: 19% by UBS and 22% by Citi. As the underlying price wiggles by the minute, both quotes move, but the Citi quote stays higher.

The next day, UBS uses a higher vol of 21.9%. As before, both quotes move along with the underlying price, but now the spread between the UBS and Citi quotes narrows.

At any underlier spot price, say underlying S = $388, one can convert the UBS quote to a lower implied vol and convert the Citi quote to a higher implied vol. During this conversion, underlying price is held constant.

** time to expiration is held constant — Tricky. This situation exists in price/yield conversion too.

[1] same dimension as yield? P276 [[complete guide]]
[3] Implied-Yield is not a common term but quite accurate.
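
Since the post title promises no real example, here is a hedged sketch of the two-way conversion (my code; the parameters are made up): Black-Scholes turns a quoted vol into a dollar price, and bisection inverts the price back into an implied vol.

```cpp
#include <cmath>
#include <cstdio>

double normCdf(double x) { return 0.5 * std::erfc(-x / std::sqrt(2.0)); }

double bsCall(double S, double K, double r, double vol, double T) {
    double d1 = (std::log(S / K) + (r + 0.5 * vol * vol) * T) / (vol * std::sqrt(T));
    double d2 = d1 - vol * std::sqrt(T);
    return S * normCdf(d1) - K * std::exp(-r * T) * normCdf(d2);
}

// Invert price -> vol by bisection; the BS price is monotone increasing in vol.
double impliedVol(double price, double S, double K, double r, double T) {
    double lo = 0.0001, hi = 4.0; // bracket the vol between 0.01% and 400%
    for (int i = 0; i < 100; ++i) {
        double mid = 0.5 * (lo + hi);
        if (bsCall(S, K, r, mid, T) < price) lo = mid; else hi = mid;
    }
    return 0.5 * (lo + hi);
}

int main() {
    double px = bsCall(100, 100, 0.02, 0.20, 0.5); // quote a 20% vol as a dollar price
    std::printf("price = %.4f, implied vol = %.4f\n",
                px, impliedVol(px, 100, 100, 0.02, 0.5)); // recovers 0.2000
}
```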

CIP illustrated

Jargon: “forward points” are bid/ask quotes, added on top of the spot quotes.
Jargon: “full forward rates” are the bid/ask quotes after the adjustments.
Sound bite: if my currency “drops”, then my interest rate should compensate for that.

Q1: Suppose the AUD spot interest rate is 200 bps [1] for a 1-year term, and the JPY spot rate is 150 bps. Suppose spot AUDJPY = 100. What can you say about today’s AUDJPY forward rate with a far date a year from now?
A1: this is Covered-Interest-Rate-Parity in action. The formula below gives F = 100*1.015/1.02 = 99.5098.

A more basic question is
Q2: will the AUDJPY rate rise or fall from spot to forward?

We need to be thoroughly familiar with Q2 – AUD must WEAKEN (to 99.5098) to offset the higher AUD return of 200bps!

[1] don’t care about Libor or treasury rate. Just consider it a measurable interest-rate

In practice, the 2 interest rates and the spot and forward fx rates always follow the equation: convert-then-deposit == deposit-then-convert. Otherwise there would be arbitrage. This is described in both my bank’s online learning and [[complete guide]].

——textbook eg—–
USD 1 mth IR = 418 bps/year
GBP 1 mth IR = 480 bps/year
Spot cable = 1.7249
Number of days in the month = 31
Days basis GBP = 365
Days basis USD = 360

Let’s calculate 1-mth GBPUSD forward rate.

418 * 31/360 = 35.99444 bps of interest in USD
480 * 31/365 = 40.76712 bps of interest in GBP

Option A: 1 GBP invested today becomes £1.004076712 in a month
Option B: 1 GBP converted to USD and invested today becomes $1.7249 * 1.003599444 = $1.731109. Formula is S*(1+IR)

To make these 2 investments equally appealing to an investor, the forward GBPUSD rate must be

1.731109/1.004076712 = 1-mth forward GBP/USD = 1.72408, which represents a depreciation of 8.2 pips.

Intuitively, GBP pays more interest, so forward GBP must depreciate. If GBP were to pay more interest and also appreciate, it would be too good to be true.

Textbook answer — Forward points = -0.00082

[ Formula is Spot-GBPUSD * (1 + IR-on-USD) == Forward-GBPUSD * (1 + IR-on-GBP)]
[ GBP convert to USD then deposit =========== GBP deposit then convert to USD ]
[ convert then deposit ==================== deposit then convert ]
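
An arithmetic check of the textbook example (my sketch; same inputs as above):

```cpp
#include <cstdio>

int main() {
    double spot = 1.7249;                         // spot GBPUSD (cable)
    double usdFactor = 1.0 + 0.0418 * 31 / 360.0; // USD 1-mth interest factor
    double gbpFactor = 1.0 + 0.0480 * 31 / 365.0; // GBP 1-mth interest factor
    // convert-then-deposit == deposit-then-convert
    double fwd = spot * usdFactor / gbpFactor;
    std::printf("1-mth forward GBPUSD = %.5f, forward points = %.5f\n",
                fwd, fwd - spot); // 1.72408, -0.00082
}
```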

—– a similar textbook example —–
AUD 3-mth 550 bps (high-yielder)
USD 3-mth 425 bps
Spot AUD/USD = 0.7326
Days = 92
Days basis AUD = 365
Days basis USD = 360
Forward AUD/USD = 0.7326 * (1 + 425 bps/360*92)/(1 + 550 bps/365*92) = 0.730431
Textbook answer — Forward points = -0.002169

OMS in FX vs equities

OMS is often the biggest and most central module in any trading platform that supports pending orders and FIX execution.

What are the differences between an eq OMS (see [[Complete Guide]]) and an FX OMS?

FX is fundamentally OTC and quote-driven, not order-driven like the listed markets.

Q: so in FX, there’s no exchange making the (irrevocable) match?