simple solution to a simple Markov chain


Suppose there are 10,000 particles. In the steady state, some number of them (say U) are in the Bull state, some (say E) in the Bear state, and R in the Recession state, adding up to the total population of 10,000. Let's find the steady-state values of U/E/R, i.e. how many particles end up in each state. Note that unlike most Markov-chain problems, there's no absorbing state here.

After one step, each particle decides its next state according to the transition probabilities. For example, 2.5% of those in the Bull state would change to Recession, while 90% of them would remain in Bull.

0.025 U + 0.05 E + 0.5 R = R ….. this is one of 3 equations. With 3 equations and 3 unknowns, U is found to be 6250, E = 3125 and R = 625.

But why the " … = R" part? Answer: the steady state. After one step, if 0.025 U + 0.05 E + 0.5 R came out even 0.000001% higher than R, this would not be a steady state — the R population would keep changing step after step instead of stabilizing.
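The full 3×3 transition matrix isn't spelled out above — only the Bull row and the two other Recession-bound probabilities. As a minimal sketch, assuming the well-known bull/bear/recession example matrix (which is consistent with the recession equation and the quoted answer), we can power-iterate until the population stabilizes:

```java
public class SteadyState {
    // rows: from Bull, Bear, Recession; columns: to Bull, Bear, Recession.
    // ASSUMED matrix -- only the Bull row and the Recession column are in the text.
    static final double[][] P = {
        {0.90, 0.075, 0.025},
        {0.15, 0.80,  0.05},
        {0.25, 0.25,  0.50}
    };

    /** Power-iterate the chain and return the steady-state particle counts. */
    static double[] steadyState(double population) {
        double[] pop = {population, 0, 0};          // start all particles in Bull
        for (int step = 0; step < 1000; step++) {   // far more than needed to converge
            double[] next = new double[3];
            for (int from = 0; from < 3; from++)
                for (int to = 0; to < 3; to++)
                    next[to] += pop[from] * P[from][to];
            pop = next;
        }
        return pop;
    }

    public static void main(String[] args) {
        double[] s = steadyState(10000);
        System.out.printf("U=%.0f E=%.0f R=%.0f%n", s[0], s[1], s[2]);
        // prints U=6250 E=3125 R=625
    }
}
```

Note the starting split doesn't matter — the chain forgets its initial state, which is exactly what "steady state" means here.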

java method taking a single argument but 2 alternative types

Requirement: method read() in an interface IReader needs to accept a single argument but of various types

read(Book b) to be implemented by one concrete class BookReader
read(Score s) to be implemented by another concrete class ScoreReader

Suppose we want to unify the 2 read() methods into one, so a "client" can get an IReader instance and simply pass in either a book or a score. C# probably has language support for this, but in Java …

Solution 1: use Object argument

Solution 2: declare

<T> void read(T content, Class<T> argumentClass);

and implement

public <T> void read(T content, Class<T> argumentClass) {
    if (content instanceof Book) { ….

Paradoxically, the IDE may warn you that argumentClass is an unused variable inside the method. However, the compiler uses it to enforce type safety at the call site — read(myBook, Integer.class) won't compile; Book.class is required.
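A compilable sketch of Solution 2 — Book, Score and UnifiedReader are hypothetical names for illustration:

```java
class Book  { String title; }
class Score { String composer; }

interface IReader {
    <T> void read(T content, Class<T> argumentClass);
}

class UnifiedReader implements IReader {
    int books, scores;   // counters just so the demo is checkable

    @Override
    public <T> void read(T content, Class<T> argumentClass) {
        // argumentClass looks unused, but it forces the CALLER to pass a
        // Class token matching the content's type -- compile-time safety.
        if (content instanceof Book)       books++;
        else if (content instanceof Score) scores++;
        else throw new IllegalArgumentException("unsupported type");
    }
}

public class ReaderDemo {
    public static void main(String[] args) {
        IReader r = new UnifiedReader();
        r.read(new Book(), Book.class);        // compiles
        r.read(new Score(), Score.class);      // compiles
        // r.read(new Book(), Integer.class);  // won't compile: T can't unify
    }
}
```

The commented-out line is the IDE paradox in action: argumentClass is never read inside the method, yet removing it would remove the compile-time check.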

vega of a vanilla IRS@@

Q: if Libor interest rate volatility moves by a tiny bit (e.g. from 22% to 22.1%), how much would a long IRS position's current PnL change? To keep things simple, consider an IRS with a single payment at the end of 2013.

Jargon — the long position is the fixed-rate payer's position, who stands to receive the floating payments.

%%A: I feel this is similar to a long position in Gold futures. You have already agreed to pay a fixed price of $9000 for the asset. The asset you have bought may have a market value higher or lower today, tomorrow and on delivery date. The futures exchange uses end-of-day mark-to-market to issue margin calls… Very familiar.

Now, how about vega? I feel a Gold future has no vega — I think it's delta-1. When the implied[1] vol of gold moves from 22% to 22.1%, the gold futures position doesn't undergo any definite amount of MV change. It can dip or surge or stay flat.

Similarly, I would assume an IRS long position has no vega.

[1] Note vega is measured against Implied vol, not realized vol.

weekend IV – Ab Initio

Which project has the largest data volume and complexity?

How do you usually implement xml transform — in java or using stylesheet?

How do you manage both latency and throughput?

What’s the level of latency you deal with — micros or millis?

(Real time fraud detection 2000 transactions/sec, with millis latency.)

How do you diagnose latency issues?

How do you measure latency?

If you use simple logging technique to record latency, then how do you minimize the latency added by logger itself?

In-memory databases — how do you manage the disk read/write?

time-series databases — how do you query them?

Complex data analysis — use RDBMS or implement in java?

Experience with

Experience with SybaseIQ?

Many real-world data processing systems receive xml messages, process them in a DB, then send back an xml reply. How would you deal with this? XSL transform is slower than parsing in java. I feel an xml transform needs to traverse the entire xml tree, so it needs a DOM tree, which is slower to build than SAX parsing.

Present a past project with parallel processing, large volume, possibly async messaging, good latency and throughput.

IV: local volatility by OCBC Bertrand

Q: tell me 1 or 2 low-latency challenges in your projects.
Q: what’s a variance swap?
Q: what’s your favorite messaging vendor product? (MQ is firm standard but too slow for this team.)
Q: Where do you get your implied vol numbers? From exchange or you do your own inversion? (I guess exchanges do publish quotes in vol.)
Q: how many tenors on your vol surface?
Q: how do you model term structure of vol?
Q: how do you perform interpolation when you query the vol surface?
Q: you mentioned various vol models (taylor, cubic etc) but how do you decide which model to use for each name?
Q: describe BS equation in simple language, and what are the inputs?
Q: FX vol is delta sticky. On X-axis you see 25 delta, 10 delta, atm vol etc. How about eq?
Q: any example of structured vol products?
Q: what’s dv01?
Q: which asset class are you most comfortable with?
Q: in your systems there are often multiple languages. How do you make them interoperable?
Q: what’s java generic?
Q: value type vs reference type in c#
%%Q: who will validate the prices and decide if it’s a reasonable or valid price?
A: traders, even though they aren't as fully trained as quants. Some managers also have pricing knowledge. Perhaps no real quants are involved.
Q1: what’s fwd vol vs local vol?
Q: you said variance is additive? Can you elaborate?
Q: have you heard of total variance?
Q3: what’s put-call parity?
%%A: C + K = P + F. Each of the 4 can be synthesized using the other 3

Q3b: can you show me one synthetic portfolio, say a synthetic long call?
Q: Can you explain PCP in terms of deltas?

Q: you mentioned code bloat due to c++ templates, so how does c++ deal with that?
Q: have you heard of c++ strong exception guarantees vs basic exception guarantees?

Q: how does java generic differ from c++ and c# — take a hashmap for example?
Q: c# hashmap of integer? Is it special?
Q: you mentioned java hashmap of integer involves autoboxing, so what happens during autoboxing?
Q: What smart pointers have you used?
Q: if i were to use a smart pointer with my legacy class, do I have to modify my class?

Q7: you mentioned java generic was designed for backward compatibility, but why not add a new type HashMap in addition to the old non-generic HashMap class?
%%A: old and new must be interoperable

Q7b: what do you mean by interoperable?

Q: tell me one project with low latency

— some answers revealed —
A1: local vol is designed to explain skew. During the diffusion, instantaneous vol is assumed to be deterministic, and a function of spot price at that “instant” and TTL.

I guess what he means is, after 88 discrete steps of diffusion, the underlier could be at any of 888888 levels (in a semi-continuous context). At each of those levels, the instantaneous vol for the next step is a function of 2 inputs — that level of underlier price and the TTL.

A3: easiest way is “call premium – put premium == fwd price” (cash amount to be added to LHS?)

lambda in c#

I bought the book [[c# in depth]] by Jon Skeet, with excellent coverage of c# lambda. Now I feel lambda was added to c# 3.0 primarily to support LINQ, though there are probably other usages.

Lambda in c# feels like nothing more than syntactic sugar over anonymous delegates (or “anonymous methods” as mentioned in MSDN).

If we understand c# delegates (built into c# from Day 1), then lambda is simple — perhaps as simple as typedef or #define in C, or references in c++.

However, to have a good grip over lambda, we need to go beyond LINQ and have a solid understanding of delegates.

The evolution unicast delegate -> anonymous method -> lambda is a concise summary of the inline-delegate Technique. Note I call it a "technique" because this is nothing but syntactic sugar — all the later alternatives translate to the earliest version. It's extremely important to know what goes on behind the scenes, and that means the earliest, base version.

From the evolution, you can tell that the c# creators intended lambda to be a Wrapper over unicast delegate. In other words, there’s always a unicast delegate instance behind every lambda.
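The post is about C#, but Java (the other main language on this blog) exhibits the same sugar relationship between an explicit anonymous implementation and a lambda. A small sketch — with the caveat that javac compiles lambdas via invokedynamic rather than literally into an anonymous class, so in Java the equivalence is behavioral, not a compile-time rewrite:

```java
import java.util.function.IntBinaryOperator;

public class LambdaSugar {
    public static void main(String[] args) {
        // the "earliest, base version": an explicit anonymous class
        IntBinaryOperator add1 = new IntBinaryOperator() {
            @Override public int applyAsInt(int a, int b) { return a + b; }
        };

        // the sugared version: a behaviorally equivalent lambda
        IntBinaryOperator add2 = (a, b) -> a + b;

        System.out.println(add1.applyAsInt(2, 3)); // 5
        System.out.println(add2.applyAsInt(2, 3)); // 5
    }
}
```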

For a bigger picture of delegates, see my post on 2 fundamental categories.

how to make volatility values ANNUALIZED

(Let’s assume a flat forward curve i.e. 0 drift, 0 dividends, 0 interest.) Suppose the implied vol for a 1-year option is 20%. If we record ln(PR) i.e. the log of daily price relatives until expiry, we expect 68% of the 250-odd daily readings to fall between -0.2/√252 and +0.2/√252, i.e. roughly ±1.26%. (The ±0.2 band applies to the full-year log return, not to a single day.) That’s because ln(PR) is supposed to follow a normal distribution whose variance scales linearly with the horizon.

Note we aren’t 68% sure about the expiration underlier price i.e. S(t=T), or S(T) for short. S(T) has a lognormal distribution[2], so no 68% rule. However, we do know something about S(T): the end-to-end ln(PR) is the sum of the daily ln(PR) values, and by the central limit theorem the overall ln(PR) has a normal distribution with variance = sum(variance of daily ln(PR)). We assume the individual items in the sum are independent and “identical”, so the variance of one daily ln(PR) is 0.04/252.

Also, since ln(overall PR) = ln[S(T)/S(0)] has a normal distribution, S(T) has a lognormal distribution. That’s the reason for [2].

To answer any option pricing question, we invariably need to convert quoted, annualized vol to what I call raw-sigma or stdev.

Rule #1 — we assume one-period variance will persist to 2 periods, 3 periods, 4 periods… (eg: a year consists of 12 one-month periods.)

Example 1: If one-year variance is 0.04, then a four-year raw-variance would be .04 * 48/12 = .16. The corresponding stdev i.e. raw-sigma would be 40%. This value is what goes into BS equation to price options with this maturity.

Example 2: If one-year variance is 0.04, then a three-month raw-variance would be .04 *3/12 = .01. The stdev i.e. raw-sigma would be sqrt(.01) = 10%. This value is what goes into BS equation to price options with this maturity. By Rule #1, we assume the same 3-month variance would persist in 2nd 3-month, the 3rd 3-month and 4th 3-month periods.
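Rule #1 and both examples fit in a few lines of code (the helper name rawSigma is my own):

```java
public class VolScaling {
    /**
     * Convert an annualized sigma into the raw sigma (stdev) for a horizon
     * measured in years. Rule #1: variance scales linearly with time,
     * so sigma scales with the square root of time.
     */
    static double rawSigma(double annualSigma, double years) {
        double annualVariance = annualSigma * annualSigma;
        return Math.sqrt(annualVariance * years);
    }

    public static void main(String[] args) {
        System.out.println(rawSigma(0.20, 4.0));  // ~0.40, Example 1 (4 years)
        System.out.println(rawSigma(0.20, 0.25)); // ~0.10, Example 2 (3 months)
    }
}
```

This raw sigma is the number that goes into the BS formula for the given maturity.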

a few CORE differences – c# over java

(Let’s be brief, incomplete…Page numbers refer to [[C# precisely]] )
— fundamental differences among the myriad differences
Diff — delegates, events, lambda – expanded C++ stateless functor
Diff — non-virtual method? counter-intuitive to both C++ and java guys,
Diff: yield
diff: enums. Fundamentally different.  A Java enum type is a “singleton” class.
Diff — generics
Diff — expression tree

–fundamental but less exploited
Diff — ref param, out param
** assignment to a method param are visible to caller method ! P71
Diff — hiding rule — less clean than java. Even more complicated than C++ hiding rules
** you can even hide a virtual method! P 75

Diff: struct? a “kernel” innovation but unimportant in “userland”. P54
Diff: simple (i.e. primitive) types are aliases of struct types. Replaces autoboxing. Fundamentally they are the same as java primitives
Diff: operator overloading like ==. Similar to equals() override in J
diff: access modifiers “internal”, default. Similar to “package access” in J
diff: Integration of primitive and wrapper classes into structs — primitive wrappers glorified
diff: simple types having methods.
–higher-level or local/peripheral features
diff: explicit interface members
class properties and indexers? accessors glorified
implicit user-defined type conversions? tricky and not widely used
IDisposable/using — “finally” glorified
rectangular/jagged arrays? Only one of them is widely used.
nested classes are all static (like Java's static nested classes)
static classes
static ctor

histogram/pdf -incompatible- binomial tree

This is not really restricted to option pricing. Actually, forget about option pricing! I’m thinking about the relationship between a pdf/histogram and a standard diamond-filled binomial tree. Another post mentions, but doesn’t explain, the 10,000 simulations. One way to simulate is via the binomial tree.

To get 2001 nodes (each corresponding to a range described in that post), we need 2000 steps to grow our tree. However, the standard CRR btree has its 2001 price points equally spaced logarithmically, not linearly. A small CRR btree might grow to these (log-spaced) price levels.


Therefore the 2001 ranges in that post will not map 1-to-1 to the 2001 price nodes in a btree. Multiple high price Ranges will map to the highest tree Node, and many low Nodes will fall into the lowest Range of $0~$1.

underlier terminal spot-price pdf –> option value

In any analysis of derivative valuation, we are interested in the possible valuationS of a security at a given time. Suppose an IBM $190 option expires 22 Dec 2014; we want to know something about the possible price level on that day. We use a random variable ST to denote S(t=T) i.e. the underlier price at time=T. ST might be 180, 200, 230 or whatever. (Actually IBM is quoted to 2 decimal places;-) However, as a continuous random variable, ST can be any value between 0 and 10x the current price, or higher.

To keep things simple, we first look at the likelihood of ST falling below 150, between 150~200, 200~250, and beyond 250. By intuition, the probabilities of hitting these 4 “buckets” or ranges must add up to 100%.

That’s too coarse. Let’s divide into $1 buckets from 0 to $2000. We end up with 2000+1 ranges (including a special “above $2000” bucket). Say our smart computer model estimates the chance of ST falling into $200-$201 is 5 bps i.e. 0.05%. So over the 200-201 range we draw a vertical bar of height 5 (in units of 1/10,000) and width 1. Suppose the 201-202 probability is 3 bps; we draw a bar of that height. Iterating over our 2001 ranges, we get 2001 bars, whose total area adds to 1.0 [1]. Your first histogram! When the range size becomes infinitesimal, the histogram becomes a pdf curve — the beautiful lognormal bell curve.

Other posts in this blog discuss how to derive the exact pdf (prob density function) of the random variable ST from the Basic assumption

\frac{dS}{S} = \mu \,dt+\sigma \,dW\,

Once we have a pdf of ST, the current value of a European call (before expiration) is tractable. Since the terminal value is a hockey-stick payoff function, we multiply the pdf by a piecewise-linear function and find the area under the curve. This is abstract, so let’s use a histogram to illustrate.

Suppose our smart computer simulates 10,000 trials. 5 outcomes should fall into 200-201, each with payoff ≈ $200.5 - $190 = $10.5. Similarly, 3 outcomes fall into 201-202, with payoff ≈ $11.5. Roughly half the outcomes probably fall below $190 — worthless. If we compute the average payoff, we might get something like $11.11. This depends on the sigma used in the 10,000 simulations and the time to expiry. We assume 0 dividend and 0 drift.

[1] In fact, the bar of 5 consists of 5 minibars, each 0.0001 wide and 1.0 long. There are exactly 10,000 minibars in the histogram representing 10,000 trials.
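The 10,000-trials story above can be run directly as a Monte Carlo loop: simulate lognormal terminal prices (0 drift, 0 rates as assumed in the text) and average the hockey-stick payoff. Only K=$190 comes from the text; S0, sigma, expiry and trial count below are my own illustrative assumptions.

```java
import java.util.Random;

public class HistogramPricer {
    /** Average hockey-stick payoff over simulated lognormal terminal prices. */
    static double callValue(double S0, double K, double sigma, double T,
                            int trials, long seed) {
        Random rng = new Random(seed);
        double sum = 0;
        for (int i = 0; i < trials; i++) {
            double z = rng.nextGaussian();
            // terminal price under GBM with zero drift (martingale measure, r=0)
            double ST = S0 * Math.exp(-0.5 * sigma * sigma * T
                                      + sigma * Math.sqrt(T) * z);
            sum += Math.max(ST - K, 0);     // hockey-stick payoff
        }
        return sum / trials;                // with r=0, average payoff = option value
    }

    public static void main(String[] args) {
        // assumed inputs: S0=200, sigma=20%, 1 year to expiry
        System.out.println(callValue(200, 190, 0.20, 1.0, 200_000, 42));
        // should land close to the Black-Scholes value of roughly 21
    }
}
```

Binning the simulated ST values into $1 buckets instead of summing payoffs directly would reproduce the histogram picture exactly; the average payoff is the same either way.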

avoid conflicts with colleagues at all costs

Remember the theory about team development — forming, storming, norming, performing. Sounds like storming is healthy and necessary. I don't buy it.

“We encourage open debate” — bullshit. Many team members don't like those who oppose them and make their life difficult. As a manager I might encourage open debate rather than let team members fight private battles. I feel most debate is really about personal convenience — who actually cares about the system, the user, or the team's long-term health?

healthy debate in a team — bullshit

Even if 2 fellow developers know each other well, their tolerance for each other can be surprisingly fragile beneath the surface. If one of them shows just a little bit of disrespect, the relationship could unravel.

I guess if you avoid conflicts you may become a pushover. Well, pick your battle.

4-layers of pointer wrapping

I was looking for a realistic scenario of multiple layers of pointer wrapping. Here’s a vector of iterators (Not a vector of vectors). Each iterator therein comes from a nested-vector.

If we get an iterator from the outer vector, we get a ptr to ptr to ptr to ptr to double.

vector<vector<smartPtr<double> >::iterator>::iterator my_itr;

1) The inner-most pointer is a raw pointer to a double (stack/heap/global) object — 4 bytes on a 32-bit platform.
2) The smart pointer is a bigger object (say 8 bytes) holding that raw pointer.
3) The inner iterator is effectively a raw pointer to the smart-pointer object, since a vector iterator is typically implemented as a raw pointer.
4) The outer iterator my_itr is another raw pointer, to the elements of the outer vector, where each element is an inner iterator. (Note each element is not a vector.)

How about adding an asterisk — a raw pointer to the smart pointer? As explained elsewhere on this blog, it is not my favorite construct.

drift + random walk — baby steps

Update – a discrete random walk assumes the step size (in log space) is normally distributed.

When we push a granular binomial tree to even higher granularity, the interval between 2 tree levels (the sampling interval) becomes infinitesimal and we can use the standard calculus notation “ dt ”. The Black-Scholes differential equation becomes

\frac{dS}{S} = \mu \,dt+\sigma \,dW\,
Note t is a Timespan, not a Datetime. I call it TTL, time-to-live or time-to-maturity.
Note the LHS denominator is the spot price, not “ t ”. The expression (dS/S) measures a stock’s percentage return as the _holding_period_ becomes infinitesimal. Basic calculus gives us the integral of the LHS
    integral(1/S * dS) = ln(S)
Ignoring the dW part, integrating the right-hand side gives some linear function of t [0]. Therefore under zero volatility, stock price is an exponential function of t. Therefore drift is exponential — continuous compounding.
[0] note the “variable-of-integration” is S on the left hand side but ” t ” on the right-hand-side. This was a bit confusing to me.
Integrating the dW part is harder. Actually, since the dW (unlike drift) is inherently random, I doubt we can simply get the integral and predict the S at any value of t. Instead, we hope to derive the pdf of S at any point t in the future. Let me repeat the implicit but fundamental assumption — the value of S at a given t is a random variable but has a pdf. This randomness comes from the dW, not the drift.
Once we have a pdf of S(t), expiration value of an European call is tractable. Since the terminal value is a hockey-stick payoff function, we multiply the pdf by a piecewise linear function, and find area under the curve. See other blog posts.
A note on the sigma in
\frac{dS}{S} = \mu \,dt+\sigma \,dW\,
BS assumes sigma to be constant. When sigma itself moves up and down following a random motion, we have a stochastic volatility model. A simplified non-constant sigma model is the local-volatility model, popular in investment banking.
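The zero-volatility claim above (drift alone compounds exponentially) is easy to verify numerically. A sketch with assumed mu and S0: step dS = S·mu·dt forward many times and compare against S0·e^(mu·T).

```java
public class DriftOnly {
    /** Euler-integrate dS = S * mu * dt with the dW term switched off. */
    static double eulerGrow(double S0, double mu, double T, int steps) {
        double dt = T / steps;
        double S = S0;
        for (int i = 0; i < steps; i++)
            S += S * mu * dt;            // pure drift, no randomness
        return S;
    }

    public static void main(String[] args) {
        double numeric  = eulerGrow(100, 0.05, 1.0, 100_000);
        double analytic = 100 * Math.exp(0.05);   // S0 * e^(mu*T)
        System.out.println(numeric + " vs " + analytic);
        // the two agree to several decimal places as dt shrinks
    }
}
```

As the step count grows, the discrete product (1 + mu·dt)^n converges to e^(mu·T) — continuous compounding, exactly as the integration argument says.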

&myVec[0] is like a pointer into the underlying array

A STL container typically defines a nested typedef named “reference“, so within this container class template, “reference” can be used as a data type just like “int” or the dummy type T. Typically “reference” is a typedef for “T&”.

myVec[0] is a “function[1] call” returning such a reference. Therefore, &(myVec[0]) is a real address, nothing more nothing less — I won’t call it a pointer variable or pointer object — NOT_A_pointer_AT_ALL. This expression “&myVec[0]” evaluates to an address.

Incidentally, the vector class template also has a typedef named “pointer“. I think it’s added in c++03.

[1]if you regard overloaded operator as a special funciton

4 Scopes for operator-overloads #new()

  1. non-static member operators are very common, such as smart ptr operator++(), operator< () in iterators
  2. static member operator new() is sometimes needed. ARM explains why static.
  3. friend operators are fairly common, such as operator<<()
  4. class specific free standing operator is recommended by Sutter/Andrei, to be placed in the same namespace as the target class. Need to understand more. Advanced technique.

fake random walker – compounded rate of return

I wrote this in May 2012, before I took my Stoch class on Geometric Brownian motion.

I feel stock price is a random walker but ln(PriceRelative) is not. Let’s denote this ln() value as P.

As a (one-dimension) random walker, a stock price S(at time t) can move from 1000 to 1001.2 to 1000.3 to 999 to 999.5… The steps are cumulative. The current value is the cumulative effect of all incremental steps.

Now look at P. It might take on +0.03, +0.5, -0.01, -0.6, -1, +2… Not cumulative, no drift. In fact each value in the series is independent of the previous values. I feel P is a random variable but not a Brownian walker.

If L is a Brownian walker (i.e. follows a Wiener process), then deltaL (the incremental change in L) is similar to our P.

Q: So which RV is Normal and which is LogNormal in this model?
%%A: I believe S(after i steps) denoted S_i has a log normal distribution but ln(S_i) is normal. More specifically —

Suppose matlab generates 3000 realizations (of the random process). In each path we pick the i’th step, so we get 3000 realizations of S_i. A histogram of the 3000 is a skewed, “punched” bell. However, if we compute ln(S_i) for the 3000 realizations, the histogram is bell-shaped.

Q: So which RV is in a geometric Brownian motion?
%%A: S.

Intuitively, if a RV is in a strictly Brownian motion (not Geometric), then its value at any time has a Normal (not LogNormal) distribution.
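A sketch of the matlab experiment described above, with my own assumed parameters: the skewness of the simulated terminal prices S_i is clearly positive (the "punched" bell of a lognormal), while the skewness of ln(S_i) hovers near zero (a symmetric normal bell).

```java
import java.util.Random;

public class LognormalDemo {
    /** Sample skewness: third central moment over stdev cubed. */
    static double skewness(double[] x) {
        int n = x.length;
        double mean = 0, m2 = 0, m3 = 0;
        for (double v : x) mean += v;
        mean /= n;
        for (double v : x) { double d = v - mean; m2 += d * d; m3 += d * d * d; }
        m2 /= n; m3 /= n;
        return m3 / Math.pow(m2, 1.5);
    }

    /** Simulate n terminal prices of a zero-drift GBM. Parameters assumed. */
    static double[] simulate(int n, double sigma, double T, double S0, long seed) {
        Random rng = new Random(seed);
        double[] s = new double[n];
        for (int i = 0; i < n; i++) {
            double z = rng.nextGaussian();
            s[i] = S0 * Math.exp(-0.5 * sigma * sigma * T + sigma * Math.sqrt(T) * z);
        }
        return s;
    }

    public static void main(String[] args) {
        double[] s = simulate(100_000, 0.3, 1.0, 100, 7);
        double[] lnS = new double[s.length];
        for (int i = 0; i < s.length; i++) lnS[i] = Math.log(s[i]);
        System.out.println("skew(S)   = " + skewness(s));    // clearly positive
        System.out.println("skew(lnS) = " + skewness(lnS));  // close to 0
    }
}
```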

probably no shallow copy by STL containers

As hinted in Item 3 of [[effSTL]] and [[c++primer]], element copy is performed during insert, sort, container clone… On each element, copy-ctor and op= are used.

Q: Excluding container of pointers, is there any “shallow copy”?
%%A: I don’t think so. Container doesn’t remember the original (incoming) object’s address. Once an element is in the container, container does know its address, and often returns its address. For example, vector has many member functions returning by reference.

Q: does the vector subscript accessor return an address rather than a copy?
AA: yes — it returns by reference (“pbref”). &vecA[0] gives the address of the 1st element. See the [[effSTL]] item on getting a C-style array from a given vector.

Q: does a dereferenced iterator return an address or a copy?
%%A: not sure. Remember iterators are smart pointers, so by definition they redefine operator*().

insert() – versatile technique to populate 1 container from another

Let’s start with the most popular container in STL…

For vector, this is the most versatile conversion technique —

template <class InputIterator>
void insert ( iterator position, InputIterator first, InputIterator last );

– you can copy another container/array/string in its entirety into a new vector — essentially convert any container to a vector, or make a copy of a vector — but you need to construct an empty vector first.
– you can copy part of another container/array/string
– you can copy the data into position 5, shifting the 6th and subsequent elements
(all copies are deep copies)

list::insert() and multiset::insert() are similar, and multimap::insert() has the same shape. map::insert() is similarly versatile —

void insert ( InputIterator first, InputIterator last );

– you can copy another map in its entirety

// However, for interviews the most convenient is the C++11 brace initializer, similar to python and perl:

map<int, char> m = {{1, 'a'}, {3, 'b'}, {5, 'c'}, {7, 'd'}};

##dark corners within java collections

* after you put a “key” object into a hashset/map, you can shoot yourself in the foot by modifying the object’s state — its hash code will change!
** the equals() relationship between 2 such objects can change too.

* for a sorted set/map, the compareTo() method can be inconsistent with equals(). These sorted containers use compareTo() exclusively and ignore equals(), but developers generally expect them to be consistent with equals(). The javadoc says it is strongly recommended (though not required) that natural orderings be consistent with equals(); sorted sets (and sorted maps) without explicit comparators behave “strangely” otherwise. In particular, such a container violates the general contract for Set (or Map), which is defined in terms of equals().
** STL sorted containers don’t use equals() at all.

* the equals/compareTo consistency requirement is weaker than the equals/hashcode contract.

* space efficiency of hashset (or treeset) vs hashmap (treemap). A HashSet instance is backed by a HashMap instance, so it is not so space-efficient. Same for TreeSet vs TreeMap.
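The first dark corner is easy to demonstrate. Key below is a made-up class whose hashCode depends on a mutable field:

```java
import java.util.HashSet;
import java.util.Set;

public class MutableKeyDemo {
    static class Key {
        int id;
        Key(int id) { this.id = id; }
        @Override public int hashCode() { return id; }
        @Override public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).id == id;
        }
    }

    public static void main(String[] args) {
        Set<Key> set = new HashSet<>();
        Key k = new Key(1);
        set.add(k);
        System.out.println(set.contains(k));  // true

        k.id = 2;                             // mutate AFTER insertion
        System.out.println(set.contains(k));  // false: lookup probes the wrong bucket
        System.out.println(set.size());       // still 1 -- the entry is stranded
    }
}
```

The entry is still physically in the set (size stays 1), but it sits in the bucket chosen by the old hash code, so contains(), remove() and even re-add behave strangely.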

extern !! q[extern C]

This post is about extern on global variables. The other (overloaded) usages of extern include
1) extern “C” — see other posts
2) extern “anyOtherLang” — illegal in most compilers; a proprietary feature

Basic purpose of extern on non-functions? To create a declaration without a definition. See c++Primer P396. Rules:

* use extern, not extern “C”
* on a non-function — basically a global variable
* without initializer

Suppose a.cpp defines a globalVar1 that is to be shared.

  1. My rule 1: if globalVar1 is shared only with one b.cpp file, then I could put the extern declaration in b.cpp. Nowadays I seldom choose this option. It’s better to follow one standard rule (Rule 2) rather than multiple rules.
  2. My rule 2: if globalVar1 is shared with potentially multiple cpp files, then it is better to put the extern in a.h

RAM insufficient@@ telltale signs #scanRate

Scan Rate is the most important indicator, and there’s a threshold to look for. It indicates the “number of pages per second scanned by the page-stealing daemon” — a unix concept without a direct windows equivalent.

Paging (due to insufficient Physical memory) is another important indicator, correlated with Scan Rate. There’s a threshold to look for.

duplicate a c-string on stack

            char tmp[strlen(src)+1]; // a C99 variable-length array (VLA): strlen(src) is evaluated at run time

In standard C++ (and pre-C99 C), a stack array’s capacity must be a compile-time constant, so the line above is rejected there — though gcc and g++ accept it as an extension.

If you want an array whose capacity is only known at run time, the portable choice is the heap — strdup() does exactly that, malloc’ing and copying the string in one call.

join/case as alternative to union

Assuming all animals, with or without names, are wanted, I'd write a unioned outer join — animals left outer join cats, union, animals left outer join dogs. This gives me a complete intermediate table for further processing.

I don't use union often, but I usually find no good alternative when I need one. However, a join/CASE combination can serve as an alternative to union, and I hope it's usable here. The intermediate table probably has the right number of rows (as in animals) but more columns than wanted; the CASE expression acts as a multiplexer to pick either the cats column or the dogs column.


Sent: Thursday, May 17, 2012 9:11 PM

To: Kit Cheng; Bin TAN; Song Ding; Xin Zhuang; Sister; Hai Yi

a table of animals (id, name);

a table of cats (id, color);

a table of dogs (id, color);

write a SQL query to list all the animals whose color is red.

PnL curve pushed towards hockey stick – 2 "pushers"

The vanilla European Call or Put’s valuation graph (against spot price) is a _smooth_ curve above the kinky hockey stick. I like to remind myself the curve depicts a “range of possibilities”. The “original” curve is a snapshot of a bunch of what-if possibilities such as “if the underlier becomes $x, my put’s theoretical value is $y”.

The curve drops towards the hockey stick as expiration approaches. This sentence is important but too complex and too abstract. Let’s be concrete and say we are on the 2nd last day of an IBM put, with IBM around $100 now.

To keep things simple, let’s say our option is ATM — Scenario 1. The range of possibilities remains the same, because IBM can still become $1 or $1000 or any value in between. But the theoretical value of our put across that range of scenarios is now different from the earlier days of the option: IBM = $50 -> put = $51 … IBM = $100 -> put = $11 … IBM = $200 -> put = $1.50. Note, in reality IBM will fluctuate around the spot rather than long-jumping, so we can erase all but the middle section of this new curve. Still a smooth curve — but compared to the original smooth curve, the new curve is physically closer to the hockey stick.

What if our 2nd put is deep OTM with K = $50 (Scenario 2)? Well the hockey stick is now a different hockey stick. (Actually it’s shifted to the left). But again as expiration approaches, the “smooth curve” moves down to the hockey stick.

This downward shift is known as option decay. There are rare exceptions, but vast majority of options lose value over time.

During the decay, the smooth curve presumably becomes more convex, as it morphs into the kinky hockey stick. This means gamma increases????

Now, there’s a 2nd way the smooth curve moves towards (or away from) the hockey stick. When the IBM implied vol surface experiences a uniform drop, the smooth curve drops in a similar fashion. More specifically, if the implied vol smile curve for our maturity drops, the smooth curve drops towards the hockey stick. This is still too vague and too abstract. Let’s be concrete and say the smile curve drops to 0 for our maturity. Our put becomes identical to a short position in IBM if the current price is below our strike, or worthless if above the strike. That means our PnL graph against IBM price is the hockey stick itself.

Therefore, drop in sigma_i has a similar effect as option decay. A surge in sigma_i will lift the smooth curve away from the hockey stick. Note all of the graphs we mentioned are plotted by BS — so-called theoretical values.

It’s all common sense after you internalize it.
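Both "pushers" can be seen numerically in Black-Scholes itself. A sketch assuming zero rates and dividends, using an Abramowitz-Stegun style normal-CDF approximation: the ATM put value sinks toward its (zero) intrinsic value as T shrinks — and since the value here depends only on sigma·sqrt(T), dropping sigma pushes the curve down in exactly the same way.

```java
public class DecayDemo {
    /** Standard normal CDF via the Zelen-Severo polynomial approximation. */
    static double phi(double x) {
        double t = 1.0 / (1.0 + 0.2316419 * Math.abs(x));
        double d = 0.3989423 * Math.exp(-x * x / 2);
        double p = d * t * (0.319381530 + t * (-0.356563782
                  + t * (1.781477937 + t * (-1.821255978 + t * 1.330274429))));
        return x >= 0 ? 1 - p : p;
    }

    /** Black-Scholes European put with r = 0, no dividends. */
    static double putValue(double S, double K, double sigma, double T) {
        double sqT = sigma * Math.sqrt(T);
        double d1 = (Math.log(S / K) + 0.5 * sigma * sigma * T) / sqT;
        double d2 = d1 - sqT;
        return K * phi(-d2) - S * phi(-d1);
    }

    public static void main(String[] args) {
        // ATM put, sigma 20%: the smooth curve drops toward intrinsic (0) as T shrinks
        System.out.println(putValue(100, 100, 0.20, 1.00)); // ~7.97
        System.out.println(putValue(100, 100, 0.20, 0.25)); // ~3.99
        System.out.println(putValue(100, 100, 0.20, 0.01)); // ~0.80
    }
}
```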

HBase basics

Doing a put always creates a new version of a cell, at a certain timestamp.

To overwrite an existing value, do a put at exactly the same row, column, and version as that of the cell you would *overshadow*.

HBase internally doesn’t distinguish insert vs update — both are puts.

–column family–
Columns in HBase are grouped into column families. Physically, all (and only) column-family-1 data are stored in one file dedicated to family-1; 2 column families are usually stored in 2 physical files.

Not mentioned explicitly, but hbase data is, like everywhere else, organized into logical tables consisting of rows and columns. I think this is the logical view end-programmers prefer.

A table often has a few columns belonging to the same column family.

A {row, column, version} tuple exactly specifies a cell in HBase.


When deleting an entire row, HBase will internally create a tombstone for each ColumnFamily (i.e., NOT each individual column).

–joins–
Does HBase support joins? Simple answer: it doesn’t, not in the way that RDBMSes support them (e.g., with equi-joins or outer-joins in SQL).

I believe MapReduce is the join.

HDFS is a distributed file system that is well suited for the storage of large files. HBase, on the other hand, is built on top of HDFS and provides fast record lookups (and updates) for large tables. Hbase internally puts your data in indexed “StoreFiles” that exist on HDFS.

store a billion phone numbers for fast lookup #Barc

The solution I managed to recall in 2015: tree. Each node has the count as payload, and up to 10 child nodes.

Q: efficiently store a billion phone numbers for fast lookup.

I assume realistic usage is read-mostly rather than read-only, but this solution is reasonably efficient in read-write scenarios too. My home-grown idea happens to resemble a trie.

Assumption — phone numbers across the world have at most 15 digits i.e. short strings. Our trie would be at most 15 levels deep. Fan-out at any parent node is at most 10.

( This 15-digit assumption is not as restrictive as it may appear. The distribution of any realistic “population” of interest to an information system is often /bounded/ by some important constraints. We seldom need to store data of arbitrary length or arbitrary precision. )

Compared to a naïve array data structure (converting each phone number to integer and treat it as subscript) which requires 10^15 array elements (about a million billion elements), our trie would create only a billion leaf nodes and a comparable number [1] of intermediate nodes.

[1] I’d estimate this number: on the 14th level, there can be only a few hundred million nodes; on the 13th, even fewer. Phone numbers share area codes, so on the 4th or 5th lowest level, we have clusters.

Search is O(1) given the Assumption above — just follow the pointers. I feel performance is more predictable than hash table and not susceptible to bad hashing or skewed distribution.

Insert is O(1) given the Assumption above.

How about a hash table? Most standard implementations use a fill factor below 100%, so we need a bucket array of 1 billion+ buckets — not obviously more memory-efficient than the trie. I’m also not sure about the quality of hash-code distribution over phone numbers; I feel it’s possible to get many collisions.
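A minimal C++ sketch of this digit-trie (my own illustration; no deletion, and nodes are deliberately never freed, to keep it short):

```cpp
#include <cstring>
#include <string>

// 10-way trie, at most 15 levels deep under the 15-digit assumption
struct TrieNode {
    TrieNode* kids[10];   // fan-out at most 10
    bool isNumber;        // true if a stored phone number ends at this node
    TrieNode() : isNumber(false) { std::memset(kids, 0, sizeof kids); }
};

struct PhoneTrie {
    TrieNode root;
    void insert(const std::string& num) {         // O(1): at most 15 pointer hops
        TrieNode* cur = &root;
        for (char c : num) {
            int d = c - '0';
            if (!cur->kids[d]) cur->kids[d] = new TrieNode;
            cur = cur->kids[d];
        }
        cur->isNumber = true;
    }
    bool contains(const std::string& num) const { // O(1) lookup, no hashing involved
        const TrieNode* cur = &root;
        for (char c : num) {
            cur = cur->kids[c - '0'];
            if (!cur) return false;
        }
        return cur->isNumber;
    }
};
```

Usage: `PhoneTrie t; t.insert("6565551234"); t.contains("6565551234");` — lookup cost is bounded by the 15-level depth, never by the population size.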

paging long JTable

import java.awt.*;
import java.awt.event.*;
import javax.swing.*;
import javax.swing.table.AbstractTableModel;

/** based on <> */
public class PagingTable extends JFrame {
    public static void main(String args[]) {
        final PagingTable frame = new PagingTable();
        frame.setSize(300, 200);
        frame.getContentPane().add(
                PagingModel.createPagingScrollPaneForTable(new JTable(new PagingModel())),
                BorderLayout.CENTER);
        frame.setVisible(true);
    }
}

class Record {
    static String headers[] = { "Record Number", "Batch Number", "Reserved" };
    static int counter;
    static long start_ = System.nanoTime();
    String[] data;

    public Record() {
        data = new String[] { "" + (counter++), "" + (System.nanoTime() - start_), "Reserved" };
    }
    public String getValueAt(int i) { return data[i]; }
    public static String getColumnName(int i) { return headers[i]; }
    public static int getColumnCount() { return headers.length; }
}

class PagingModel extends AbstractTableModel {
    protected int pageSize;
    protected int pageOffset;
    protected Record[] data;

    public PagingModel() { this(500, 100); }

    public PagingModel(int realRows, int size) {
        data = new Record[realRows];
        pageSize = size;
        for (int i = 0; i < data.length; i++)
            data[i] = new Record();
    }

    @Override public int getRowCount() { return pageSize; }
    @Override public int getColumnCount() { return Record.getColumnCount(); }

    // Only works on the visible part of the table
    @Override public Object getValueAt(int row, int col) {
        return data[row + (pageOffset * pageSize)].getValueAt(col);
    }

    @Override public String getColumnName(int col) { return Record.getColumnName(col); }

    // use this method to figure out which page you are on
    int getPageOffset() { return pageOffset; }

    int getTotalPages() { return (int) Math.ceil((double) data.length / pageSize); }

    void pageDown() {
        if (pageOffset < getTotalPages() - 1) {
            pageOffset++;
            fireTableDataChanged();
        }
    }

    void pageUp() {
        if (pageOffset > 0) {
            pageOffset--;
            fireTableDataChanged();
        }
    }

    // we'll provide our own version of a scroll pane that includes
    // the page up and page down buttons by default.
    public static JScrollPane createPagingScrollPaneForTable(JTable jt) {
        final PagingModel model = (PagingModel) jt.getModel();
        final JButton upButton = new JButton("^");
        upButton.setMargin(new Insets(0, 0, 0, 0));
        upButton.setEnabled(false); // starts off at 0, so can't go up
        final JButton downButton = new JButton("v");
        downButton.setMargin(new Insets(0, 0, 0, 0));

        upButton.addActionListener(new ActionListener() {
            public void actionPerformed(ActionEvent ae) {
                model.pageUp();
                // If we hit the top of the data, disable the up button
                if (model.getPageOffset() == 0) upButton.setEnabled(false);
                else downButton.setEnabled(true);
            }
        });
        downButton.addActionListener(new ActionListener() {
            public void actionPerformed(ActionEvent ae) {
                model.pageDown();
                // If we hit the bottom of the data, disable the down button
                if (model.getPageOffset() == (model.getTotalPages() - 1)) downButton.setEnabled(false);
                else upButton.setEnabled(true);
            }
        });

        return new JScrollPane(jt) {
            {
                // Turn on the scrollbars; otherwise we won't get our corners
                setVerticalScrollBarPolicy(ScrollPaneConstants.VERTICAL_SCROLLBAR_ALWAYS);
                setHorizontalScrollBarPolicy(ScrollPaneConstants.HORIZONTAL_SCROLLBAR_ALWAYS);
                setCorner(ScrollPaneConstants.UPPER_RIGHT_CORNER, upButton);
                setCorner(ScrollPaneConstants.LOWER_RIGHT_CORNER, downButton);
            }
        };
    }
}

PnL roll-up inside Coherence@@

Hi friends,

In my previous project, the cache had to support PnL aggregation by account or by product type (US aviation stock or high-yield bonds etc).  I didn’t find out how it was implemented. How is it done in a typical Coherence system?

In fact, the accounts exist in a hierarchy. For example, individual accounts belong to an office. Each office belongs to a region. Each region belongs to a legal entity…. Similarly, the products are organized in a hierarchy too. PnL is first computed at position level. How are these individual position PnL aggregated in Coherence?

Answer 1 — is the recommended solution by some practitioners. Basically, api-users write 2 classes — a filter and a processor. Filter controls the subset of data entries or you may say the “universe”. Processor does the aggregation. Internally, I was told an aggregation loop is unavoidable.

I feel coherence data entries are organized into maps along with auxiliary lookup tables to support predicate-assisted select queries like “where = UK”.

Answer 2 — another practitioner also mentions an EntryAggregator.
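The filter-plus-processor shape of Answer 1, sketched generically in C++ — hypothetical names, NOT the Coherence API:

```cpp
#include <functional>
#include <string>
#include <vector>

// hypothetical position record; in Coherence these would be cache entries
struct Position { std::string account, region, product; double pnl; };

// the filter picks the "universe"; the processor is the unavoidable aggregation loop
double aggregatePnl(const std::vector<Position>& cache,
                    const std::function<bool(const Position&)>& filter) {
    double total = 0.0;
    for (const Position& p : cache)
        if (filter(p)) total += p.pnl;   // roll-up
    return total;
}
```

Usage: `aggregatePnl(cache, [](const Position& p){ return p.region == "UK"; })` — the predicate plays the role of the “where region = UK” filter; swapping the predicate rolls PnL up by account, office or product instead.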

O(n) sort – count`sort

#1 requirement: keys [1] must be very small integers, small enough to be array indices.
# 1.1 I think this requirement is met whenever you notice an enum of “possible values”

[1] “values of keys” is perhaps more accurate.

This requirement alone reduces its applicability to a “niche”, but an extremely important niche. In fact, i think every time this requirement is met, we should consider count`sort. How about sorting the letters in a single english word? Remember the “anagram” question?

If the values are small integers 0,1,2…k, then you need a temporary array of size k+1.

count`sort is stable, and therefore usable in radix sort.
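A sketch for the single-English-word / anagram case (my illustration; note this histogram variant recovers keys only — the stable, prefix-sum variant is what radix sort needs):

```cpp
#include <string>
#include <vector>

// counting sort on the letters of one lowercase English word:
// keys 'a'..'z' are small enough to be array indices (k = 25)
std::string countingSort(const std::string& word) {
    std::vector<int> count(26, 0);                // the temporary array of size k+1
    for (char c : word) ++count[c - 'a'];         // one pass to build the histogram
    std::string sorted;
    for (int k = 0; k < 26; ++k)
        sorted.append(count[k], (char)('a' + k)); // emit each key count[k] times
    return sorted;
}
```

Two linear passes, no comparisons — hence O(n) where a comparison sort needs O(n log n). Handy for the classic anagram question: two words are anagrams iff their counting-sorted forms match.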

localSys knowledge about a Live system #3 sources#GS

When you work in financial IT, real insights into a live financial [1] IT system comes from

A) hands-on experience, often local system knowledge

B) user interaction — so you know which 80% of the codebase is NOT important.

B2) production problem solving, with user impact or business impact. As explained elsewhere, this steers our effort to the key sections of the codebase. 20/80 rule — 20% of the codebase holds 80% of the system functionality, business logic and production issues.

Without B, you as a developer can be familiar with many sub-systems yet lack the confidence to support users on your own, implement numerous changes, assess impacts, estimate effort, or identify root causes. In many organizations developers aren’t assigned such tasks, but GS was very different and very effective.

C) good domain knowledge — This is required only in quantitative systems. Otherwise, I feel A+B will give us (system developers) more than enough domain knowledge. When C is relevant, then A+B will make you productive without “insight”.

[1] including real-time trading systems, reporting systems, financial analysis systems, STP etc.

first class functions, my educated guess

First Class Function as a concept is over-complicated for someone looking for a simple, concrete, non-abstract explanation. In the primary usage, we pass a FCF into a function — a callable parameter. Most tutorials would apply the FCF on a list/sequence/stream.

Lambda is the most common example of a callable parameter. Arguably the most useful example of FCF.

Command pattern is more sophisticated than FCF. Command often has state to hold configuration and additional payload.

* Delegate in c# is a more complete example, a well-integrated component of the language. In c#, Lambda is implemented using delegates
* most scripting languages have good support for callable parameters.
* Before java 8, an interface and an anon class are the only implementations in java
* C offers function pointers, as a primitive support of callable params.
* C++ offers functors esp. in STL
I don’t want a language-lawyer debate over the exact definition of “first class function” or FCF. I’m trying to get at the essence of “first class function”. I think the basic distinctions are

1) a FCF can be passed in and out of other functions

To understand these features, we must understand regular, non-first-class functions. In C (and perhaps earlier languages), a function (say, play()) is compiled into a sequence of machine code, which gets loaded into the “text section” of a process’s memory. The text section is ordinary memory, so every chunk of code, including play(), gets an address. The address of our chunk (play()) is a so-called function pointer. In C, we can pass this pointer (to play()) into a routine, and inside the routine invoke play(). This is the most primitive first-class function.
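That C mechanism, in a compilable sketch (playCount is my illustrative stand-in so the invocation is observable):

```cpp
// play() compiles to a chunk of machine code in the text section;
// its address is a function pointer we can pass around
int playCount = 0;
void play() { ++playCount; }

// the routine receives the pointer and invokes play() through it --
// the most primitive first-class function
void routine(void (*callable)()) { callable(); }
```

Usage: `routine(play);` or, equivalently, `routine(&play);` — in C/C++ a function name decays to its address.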

2) a FCF can be assigned to a variable, and passed around
3) a FCF Object must have an address

STL supports function pointers but encourages the more advanced functors. Java doesn’t support function pointers or functors and favors interfaces. A method can receive an argument of type IPlayer, and inside the routine, invoke a method on it. This is fundamentally different from a first-class function. Java’s Method object is another thing to contrast with FCF.

Anyway, back to FCF. I feel FCF is more of a syntactical construct to ease and encourage certain style of coding. In comparison, the underlying implementation by compilers is less important. Most app developers do not need to care about the compiler implementation of FCF.

smartPtr – remember the 3 types involved

I feel it’s easy to confuse the 3 data types involved in even the simplest smart pointer. It’s less confusing when you instantiate the class template and slightly more confusing when you have to program in templates.

Say you use smartPtr<T>; there are really 3 data types involved here
1) T
2) pointer to T
3) smartPtr<T>

There are also at least 3 objects involved
1) a single pointee object, of type T
2) 1 or more 32-bit raw-pointer objects of type T*
3) multiple instances of smartPtr<T>

At clean-up time, we need to
1) deallocate the pointee object, assuming it’s on heap
2) not worry about the raw pointer objects. Someone will take care of them. However, we assume no one calls delete on any of them!
3) the smartPtr<T> instances are typically low-maintenance. They are often stack-allocated or are (non-ref) fields of bigger objects — non-ref meaning a field of type smartPtr<T>, not smartPtr<T>* or smartPtr<T>&. When such a smartPtr<T> instance is cleaned up, we don’t need to worry about anything.

Now we are ready to understand circular reference.
– A smartPtr instance myCar holding a raw pointer to a Car instance, which has a field which is a smartPtr instance;
– this smartPtr instance holds raw pointer to a Driver instance.
– the driver instance has a field that is myCar.

All 6 objects are unreachable and unusable, but look at the 2 reference counts.
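The circular reference can be reproduced with std::shared_ptr — a sketch (the Alive counters are mine, added only to observe the leak):

```cpp
#include <memory>

// global destruction counters, purely to observe the leak
int carsAlive = 0, driversAlive = 0;

struct Driver; // forward declaration

struct Car {
    std::shared_ptr<Driver> driver; // Car's field: a smart pointer to its Driver
    Car()  { ++carsAlive; }
    ~Car() { --carsAlive; }
};

struct Driver {
    std::shared_ptr<Car> car;       // Driver's field points back -- the cycle
    Driver()  { ++driversAlive; }
    ~Driver() { --driversAlive; }
};

void makeCycle() {
    auto myCar  = std::make_shared<Car>();
    auto driver = std::make_shared<Driver>();
    myCar->driver = driver;
    driver->car   = myCar;  // both ref counts are now 2
} // local handles die here: counts drop to 1 and 1, never 0 -- dtors never run
```

After makeCycle() returns, both objects are unreachable yet both ref counts sit at 1: the leak the post describes. (The standard cure is to make one leg a weak_ptr.)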

c++ Push all zeros in array to the end #easy

Basic programming exercise

#include <cstdlib>
#include <iostream>
#include <iterator>
#include <algorithm>
#include <assert.h>

using namespace std;

// Push all the zeros of a given array to the end of the array. In place only.
// Ex: 1,2,0,4,0,0,8 becomes 1,2,4,8,0,0,0 (relative order of non-zeros not preserved)
int a[] = {0,0,-1,2,0,4,0,0,8,0};

int main(int argc, char *argv[]) {
   size_t size = sizeof(a)/sizeof(a[0]);
   copy(a, a+size, ostream_iterator<int>(cout, " "));

   int left = 0;
   int right = size - 1;
   while (left < right) {
      if (0 == a[left]) {
         while (0 == a[right]) --right; // find the rightmost non-zero
         if (left >= right) break;
         swap(a[left], a[right]);
      }
      ++left;
   }
   cout << "\n new: \n";
   copy(a, a+size, ostream_iterator<int>(cout, " "));
}

mode finder algo #java boxing

“We have a set of integers, the size is 1 mill. The value of
those integers can go as large as 1 trillion. So the set might look
like [1,3, 3, 100, 23, 100, 100, 89,999999, 89, 100000000, 10000000,
10000000000000, 39, 3, 23, 1,1,1,1,1,1000, 10000000000000….], now we
need to find out the individual single integer which is mostly
repeated. Say if integer 3 repeated 1000 times, no other integers have
dups as many as it does, integer 3 will be the one we are looking for.”

I did a java solution using HashMap. Now I feel auto-boxing is a growing performance drag with increasing volume. Whenever you compute hashcode on an Integer, you must box it up to a heap object. Counting up also requires unboxing and then boxing. C++ would use simple integers.

A Java solution can avoid auto-boxing completely. Build a custom HMap class with an array of buckets, each containing a pointer to a custom LList of Pair objects. Each Pair instance consists of 2 primitive integers + pointer to the “next” Pair object.

How about a sparse array?
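The C++ counterpart avoids boxing entirely, since map keys and values are plain integers — a sketch with std::unordered_map (my illustration, not from the original interview):

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// mode finder with plain 64-bit integers -- no boxing anywhere
std::int64_t findMode(const std::vector<std::int64_t>& nums) {
    std::unordered_map<std::int64_t, int> freq;  // value -> occurrence count
    std::int64_t mode = 0;
    int best = 0;
    for (std::int64_t n : nums) {
        int c = ++freq[n];                       // operator[] default-inits the count to 0
        if (c > best) { best = c; mode = n; }
    }
    return mode;
}
```

One pass, O(n) expected time; tracking the running best avoids a second pass over the map.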

RV vs 29West

A personal blog I find fairly trustworthy explains:

As a reminder, distributed systems operate a series of broker instances, where each one is located in the same process, VM, or OS as the code participating in the message network. For example, it might be a library that's bound into your process space (like 29West), or it might be a daemon that is hosted on the same machine (like Rendezvous). These distributed components then communicate over a distributed, unreliable communications channel (such as Ethernet Broadcast or IP Multicast).

select2perform/brain-bench c++ Q&A

Q: vector1.erase(remove(vector1.rbegin(), vector1.rend(), someValue).base())

Q: To put 5 identical values into a vector,
std::fill_n() ?
std::fill() ?
Q: If class E virtually extends C and extends D, where C/D both virtually extend B, what’s the construction sequence?
AA: P1000 [[primer]]

Q: find intersection of 2 compatible vectors?
%%A: std::find_first_of()
Q: can you pass a float into a func(int) or func(char)?
AA: yes but if you cout in func(), it interprets and prints the object differently.
Q: can a non-static reference field be initialized upon declaration?
%%A: I don’t think so, reference field or non-ref. The java-style quick init is illegal for non-static fields.
AA: gcc allows java-style init only for a static AND const field.

Q: can you “extern” a prototype and then declare the same prototype again?
AA: legal

Q[u]:difference between “new Account” and “new Account()” with the parentheses?
A: subtle difference — without the parentheses the POD members are default-initialized (indeterminate); with () they are value-initialized (zeroed). See other posts on this blog.

Q[u]: declare a pointer to a method accepting an int and returning void. Need to memorize the exact syntax
AA: void (Dog::*ptr2method) (int);

Q: my exception spec mentions std::runtime_error. Can I throw a range_error, an std::exception, or an std::invalid_argument?
%%A: range_error IS-A runtime_error, so that’s fine. The other two are not derived from runtime_error and would violate the spec.
Q: reverse print a string using copy to cout
AA: copy(str.rbegin(), str.rend(), ostream_iterator<char>(cout, " "));
* The " " 2nd argument is not necessary, but I’d suggest we memorize just a single form of the ostream_iterator ctor, the 2-arg form.
* The type argument is char, not ptr-to-char

Q: using N1::f; f(); using N2::f; f();
AA: GCC complains that the 2nd call to f() is ambiguous.

Q[u]: multiple inheritance dreaded diamond – what if one of the 2 “virtual” keywords is omitted? How many times will you call the base constructor?

Q[u]: anything wrong with
Base & var = static_cast<Base&>(*(new Derived));
AA: legal in gcc.

[u = unimportant/trivial/unlikely to be issue in practice. Often compiler will inform you clearly.]

mutex^condVar^countingSemaphore – 3 synchronization devices

I believe mutex is a primitive construct, hardware-wise, therefore elemental and low-level. It’s NOT built using condition variables or counting semaphores. In contrast,

– a mutex is usually needed _inside_ a counting Semaphore. See OS/2 and java. Also see the solaris pthreads sem_wait() and sem_post() [1] as described on P93 [[Bil Lewis]]
– a mutex is usually needed as a guard _around_ a Condition var. See java, pthreads [[Bil Lewis]] and boost thread.
** In fact, both work similarly. See the similarity between the wait-loops in [[Bil Lewis]].

Complicating the big 3 interdependency: when one of several “grabbing” threads is given the contended mutex, the system may internally send signals, but that happens at a lower level than a condition var. Condition variables and counting semaphores probably depend on the same low-level signaling. This low-level signaling construct is not an API construct and not one of our big 3.

a counting semaphore is less fundamental than mutexes and condition variables.
* java has built-in support for M and C but not S
* OS/2 doesn’t support counting semaphores. Note that for some time, pthreads, win32 and OS/2 were the 3 dominant thread implementations.

[1] pthreads sem_wait() means “grab a permit”, sem_post() means “return my permit to the pool”.

— Jargon clean-up —
There are 2 distinct meanings of “semaphore”+ some vague meanings.
1) Counting semaphore is the first and best known semaphore API.
2) Other types of semaphores were added, notably System V IPC semaphores.
x) In some discussions, “semaphores” vaguely mean “various synchronization variables” and includes mutex, condVar and countingSemaphores.

The counting semaphore concept is fundamental and elemental, but implementation-wise, it’s always constructed using mutex and condVar.

Some people chose to write their own counting semaphore using mutex, condVar and a counter
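Such a home-grown counting semaphore, sketched in C++11 terms (tryAcquire is my addition, for testability):

```cpp
#include <mutex>
#include <condition_variable>

// counting semaphore built -- as always -- from a mutex, a condVar and a counter
class CountingSemaphore {
    std::mutex mtx_;                 // the mutex needed *inside* the semaphore
    std::condition_variable cv_;
    int permits_;
public:
    explicit CountingSemaphore(int initial) : permits_(initial) {}

    void acquire() {                 // cf. sem_wait(): grab a permit
        std::unique_lock<std::mutex> lk(mtx_);
        cv_.wait(lk, [this] { return permits_ > 0; }); // the wait-loop, guarded by the mutex
        --permits_;
    }

    bool tryAcquire() {              // non-blocking variant (illustrative addition)
        std::lock_guard<std::mutex> lk(mtx_);
        if (permits_ == 0) return false;
        --permits_;
        return true;
    }

    void release() {                 // cf. sem_post(): return the permit to the pool
        { std::lock_guard<std::mutex> lk(mtx_); ++permits_; }
        cv_.notify_one();            // the low-level signaling, hidden behind the API
    }
};
```

The predicate-overload of wait() embodies the classic guarded wait-loop, which protects against spurious wakeups.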

Q: why is “lock” not in our big 3.
A: “Lock” often means mutex, but is sometimes a wrapper on a mutex, as in boost.

— sysV semaphore and shared memory
Linux supports three types of IPC mechanisms — message queues, sysV IPC semaphores and shared memory. As the Camel book points out, those are the same 3 sysV IPC constructs.

Once the memory is shared, there are no checks on how the processes are using it. They must rely on other mechanisms, for example System V semaphores, to synchronize access to shared memory.

const – common subversions

Whenever I see “const”, I always remind myself of at least 2 “backdoor” ways to break it

– mutable keyword

– const_cast

There are more techniques of subversion. In C we often see pointer-to-const, but in C++ ref-to-const is more common, and often known simply as a constant-reference. Suppose we have a const ref to a pointee object myOldCar. myOldCar may have another reference to it. That (non-const) reference can modify myOldCar's internal content behind our back!
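The first two backdoors in a few lines (Car and its fields are hypothetical names of mine):

```cpp
struct Car {
    mutable int odometer = 0; // backdoor 1: writable even through a const view
    int fuel = 50;
};

// receives a ref-to-const, yet can modify the pointee both ways
void subvert(const Car& c) {
    c.odometer = 99;              // legal, thanks to "mutable"
    const_cast<Car&>(c).fuel = 0; // backdoor 2 (undefined behavior only if the
                                  // underlying object was itself defined const)
}
```

Usage: pass any non-const Car through `subvert()` and both fields change, despite the const-looking signature.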

Offsite — even if myOldCar is a const object, it may have a pointer field transistorRadio. Since the actual radio object is “offsite”, i.e. outside the real estate of myOldCar, its content can change. myOldCar is effectively not immutable.

Incidentally, other software constructs have backdoors and subversions too

* Whenever I see singleton, I always remind myself of those subversive techniques to make multiple instances of the class.

* Whenever I see a private ctor, I always remind myself of those subversive techniques to instantiate this class without this ctor.

remain relevant(survival)to your team^global economy#localSys

When we feel job insecurity, we look around and see who’s more secure, more relevant, more current, and more in-demand. We may find some colleagues more relevant to the team or to the company. Maybe they are local system experts. Maybe they have deep expertise in the base vendor product (Tibco, Murex, LotusNotes, WebLogic…). We may want to _specialize_ like them.


Job security derives from skill relevance — but relevance to whom? The industry, not the local team, which could disappear if the entire team becomes obsolete and irrelevant. I have seen people (including myself) specializing in DNS, VB.NET, Swing, Perl, EJB, not to mention home-grown systems such as SecDB. These guys are (extremely) relevant, but for a limited time only.

Have a long-term perspective — If you want job security through skill relevance, then invest in more permanent skills such as Math, algorithms, data structures, threading, SQL,…

Have a global perspective — Job security is a serious, almost life-and-death issue. You probably should look beyond your local industry. What skills are truly relevant on the global arena? Some skill (say Murex) might be more relevant locally but less globally.

Avoid niche domains unless it helps boost your mainstream skills.

fair-dice/fair-coin #already internalized

Game: keep tossing a dice until you quit. Last number is what the host pays you.

I believe most players would keep tossing until they get a 6, therefore,
– Average(earning) = $6. However,
– Average(value across all tosses) = 3.5

So why doesn’t the fair-dice characteristic (3.5) apply to the first average? To investigate the averages, we use machines — video-cam “recorders” that record one toss at a time.

– One recorder, the all-toss recorder, records all tosses blindly. It will probably show Average 3.5.
– One recorder, the earning recorder, doesn’t record “every” toss blindly. The 1s and 2s aren’t recorded. Instead, the player controls when to record. So she turns on the recorder AFTER her last toss of a game. Recorder will probably show Average 6.

Conclusion — You can rely on the unbiased-dice statistical properties only if recorder is unbiased.

Q: is the recorder started before or after a toss?
If ALWAYS “before” ==> then unbiased.
If sometimes “after” ==> then biased. For example, if someone records just his 4s then the recorder would include nothing but 4s — Completely biased.
Here’s a similar paradox — in a country where every family wants a boy, each couple keeps having children until they get a boy, at which point they stop. What is the proportion of boys to girls?
A: 50/50. Average ratio = 1.0, thanks to the unbiased all-toss recorder — every birth is recorded, regardless of outcome.

What if we only count the last child — What’s the proportion of boys to girls? Not 1.0 any more. Reason: the recorder starts AFTER the toss — biased.
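A quick simulation of the two recorders (my sketch; I use gen() % 6 + 1 rather than uniform_int_distribution so the numbers are reproducible across standard libraries):

```cpp
#include <random>

// the all-toss recorder vs the earning recorder, simulated
double allTossAvg = 0, lastTossAvg = 0;

void simulate(unsigned seed, int games) {
    std::mt19937 gen(seed);
    long long allSum = 0, allCount = 0, lastSum = 0;
    for (int g = 0; g < games; ++g) {
        int toss;
        do {
            toss = (int)(gen() % 6) + 1;
            allSum += toss; ++allCount;   // all-toss recorder: blind, unbiased
        } while (toss != 6);              // the player quits only on a 6
        lastSum += toss;                  // earning recorder: started AFTER the last toss
    }
    allTossAvg  = (double)allSum / allCount;
    lastTossAvg = (double)lastSum / games;
}
```

Running it shows the biased recorder averaging exactly 6 (the recorded toss is always a 6) while the unbiased recorder hovers around 3.5.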

##hidden cost,stolen CPU cycles: latency engineering

Update — “kernel thread” means something special in linux…


Latency /engineering/ and optimization is all about the implicit operations, hidden costs and “stolen” cpu cycles. Incur the minimum CPU costs for a task.

  • eg (practical!): function A calling B, calling C is one more stack frame than A-calling-C, unless B is actually inlined.
  • eg: boxing/unboxing — extra work for cpu. Also creates garbage for the GC.
  • eg: one thread switching between multiple sockets — more cpu workload than one thread dedicated to one (exchange) socket
  • eg: un-contended lock acquisition — more work than no-lock, partly due to memory fence
  • eg: garbage collection – competes for CPU at a bad time. Usually, if there’s no OOM, then the GC thread is very low priority and won’t slow down a critical task.
  • eg: page swap as part of virtual memory systems — competes for CPU
  • eg: vtbl lookup — adds a few clock cycles per virtual call. To be avoided inside the most critical apps in an exchange. Therefore C++ developers favor templates over virtual functions
  • eg: RTTI — latency sensitive apps generally disable RTTI early on — during compilation
  • eg: null terminator at end of c-strings — adds network traffic by x% unnecessarily
  • eg: inline – trims cpu cycles.
  • eg: one kernel thread mapping to multiple user threads — fastest system should create no more user threads than the maximum cpu (or kernel) threads, so the thread scheduler doesn’t need to run at all. I feel this is possible only in a dedicated machine, but such a machine must then communicate with peripheral machines, involving serialization and network latency.

For a dedicated kernel thread to service a busy stream of tasks, we need to consider what happens if the tasks come in bursts and the thread becomes temporarily idle. One idea is to suspend the thread in wait(), but I believe a kernel thread can’t be suspended that way. In the kernel, a common approach is to simply keep the thread spinning on a spinlock. The assumption is that one cpu is exclusively dedicated to this thread, so that cpu can’t do anything else even if we suspend the thread.

Live hardware threads ^ kernel threads – Barrel processor

Exec summary — in one cpu cycle a Barrel-processor machine (like T2000) can have
– 8 executing threads
– 32 Live hardware threads
– hundreds of kernel threads
– thousands of userland threads
I worked for a while on the Sun T2000 server, which supports 4 cpu threads per core.

– Such a cpu-core does not allow execution of multiple instructions in one cycle. So “Simultaneous” is misleading.
– Each thread running at roughly 1/4 the original CPU speed

A single-tasking processor spends a lot of time idle, not doing anything useful whenever a cache miss or pipeline stall occurs

In the T2000 (8-core, 32 hardware threads), if you have 100 processes, then you have at least 100 kernel threads (software threads). Optimally, 32 of them would be Running as “live hardware-threads”, each assigned to a hardware thread of a cpu core. In one cycle, 8 of them are executing, while the other 24 are in pipelines inside the cpu cores.

java iterators – weakly consistent vs fail-fast

Roughly speaking, java iterators are either weakly consistent (WC) or fail-fast (FF). The “3rd” category is snapshot iterator for copy-on-write. See [[java generics]]

WC iterators came in with the 1.5 concurrent collections.
FF iterators date from 1.4 and earlier.

WC don’t throw ConcurrentModificationException. This particular Exception is the “FAIL” part of “FAIL-fast”.

I feel STL iterators provide no thread-safety and fall into none of these 3 categories. Since there was no standard threading support in the (pre-C++11) c++ standard library, such features must be implemented by yourself on top of a particular thread library.

How about c#?


Gel (testing algo)

1.       Write a blocking queue using wait / notify
2.       Traversal of a non-binary tree
3.       What if a leaf points back to the root (infinite loop) — how to prevent it?

Solamon (testing threading)
1.       Write a thread safe class, getValue, incr (int val++) (sync on both, use volatile, or use AtomicInteger);
2.       If AtomicInteger is used, do you know how its incrementAndGet() works? A CAS loop — read oldVal, compute newVal = oldVal + 1, then compare-and-swap; retry if another thread changed the value in between.
3.       Besides the performance penalty of locking, what’s the big problem locking has? Deadlock
4.       How to generate a deadlock?
5.       The performance issue with locking? Something like val++ takes two machine instructions, but synchronizing around it can take hundreds; on the other hand, database ops take millions of instructions, so the extra 100 is negligible
6.       Lock free system?
7.       Sorting types. Bubble sort, selection sort, merge sort and quick sort. Performance of each, in big O notation. Merge sort is O(n log n) — log n levels of merging, with O(n) work per level.
Joe (testing architecture)

1.       Write a customized list class that works as a general list but keeps version-history info. Signatures like
get(int index, int verNum);
set(int index, String val); // returns the new version number

2.       2*3-5 — parse the expression; the final string form is 23*5-. Input is a bin-tree like

         -
        / \
       *   5
      / \
     2   3

Do the post-order traversal; the output will be 23*5-

3.       Break a long List down into a batch of lists (List<List>), where each inner list has about 10 elements.

throwing dtor: %%justified use cases

Status — still I don’t have a well-justified use case.

I feel throwing dtor is frowned upon but not “illegal”. In practice it’s seldom done by design. When used, it’s not worth the analysis. In interviews, however, this receives disproportionate attention.

However, in practice, there are reasons to break the rule. Suppose I have

int i1, i2; // temp variables
try {
  i1 = myobj->eat();
  i2 = myobj->drink();
  delete myobj; // dtor may also throw business_exception
} catch (business_exception & be) {
  // handle exception using the temp variables i1, i2 etc
}

Since eat(), drink() etc and dtor all throw the same business_exception, this code is clean and maintainable. If we need to throw more than one exception type from those 3 functions, we can easily add the code. The same exception handler is used as a catch-all.

It would be messy to pass the temp variables i1, i2 etc into myobj dtor and replicate the same exception-handling logic therein.

So in this case, I’d make myobj dtor throw business_exception.

Now, what if, as described in [[moreEffC++]], myobj’s dtor is invoked as part of stack unwinding due to another exception [1]? Well, in this case, I know that’s a fatal scenario and I do want the system to crash anyway, like an assertion error, so the terminate() behavior is acceptable.

In other words, myobj’s class is written such that its dtor should throw exception only under normal object destruction and should never be part of an exceptional stack unwinding. In such a case, no one should misuse this class in an exception-unsafe context. If they ignore the restrictions on this class, they could get this dtor invoked as part of an exceptional stack unwinding, and the consequence is something they must deal with.

[1] in c++11, the system will trigger std::terminate() whether or not this dtor throw is part of unwinding, because dtors are noexcept by default.

[12]call`same method@unrelated types: template outshines java

Suppose pure abstract class Animal has a die() method, and so does Project, Product, Plant and Planet, but they don’t share a base class. How would you write a reusable function that invokes this method on a generic input object, whose type could be any of them?

Java can’t do this. In C++ you create

template<typename T> void f1(T& input){ input.die(); }

If you pass an int into f1(), you get a compile-time (not link-time) error, since int has no die(). Is this SFINAE? I doubt it — with a single candidate template there is no overload set to fall back on, so the substitution failure is a hard error.
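A compilable sketch with two stand-in types (the dead flag is my illustration, so the effect is observable):

```cpp
// two unrelated types -- no common base class -- each with its own die()
struct Animal  { bool dead = false; void die() { dead = true; } };
struct Project { bool dead = false; void die() { dead = true; } };

template <typename T>
void f1(T& input) { input.die(); }  // compiles for any T that has die()
// f1 on an int would be a compile-time error: int has no member die()
```

This is compile-time duck typing: each instantiation of f1 is checked against the concrete T, with no vtable or common interface involved.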

STL algorithms routinely take an iterator argument and then call, say, operator>() on the iterator. Now, this operator is undefined for many iterator types. I think only the RandomAccessIterator category supports it.

Q: So how does STL make sure you don’t pass an object of ForwardIterator category into such a function?
A: use the template parameter type name (dummy type name) as a hint. Instead of the customary “T”, they put a suggestive dummy type name like “BidirectionalIterator” or “InputIterator”. If you ignore the hint, you get a compile-time error.

Now we understand that STL iterators have no inheritance hierarchy, but “stratification” among them.

low latency: C++preferred over java

(Fastest is FPGA, but let’s put that aside.)

Low latency developers generally prefer C and avoid OO (runtime overhead), but they do use C++ templates to achieve power and flexibility.

In terms of data structures, I think they use STL too. Array is essential. In contrast, graph data structures incur additional allocation due to the graph Node objects — no such overhead in arrays.

Java’s issues —
* autoboxing — market data consists mostly of primitive values, and boxing turns each into a heap object
* every object instance carries something like 8+ bytes of header overhead
* indeterminate garbage collection
* virtual function overhead
* even purely local objects are often put on the heap, for delayed clean-up
* the JVM can reach throughput comparable to C++, but only after a slow warm-up

Functional lang – common list operations – python, STL, LINQ

Here is a fundamental Functional Programming technique — A) apply a pure function (possibly a closure [1]) on successive elements of a sequence, B) then perform accumulation, filter/grep, reduce etc across the entire sequence. Here I describe it as a 2-stepper, but you do not have to see it that way.

Python offers various elements of this strategy — list comprehension, apply(), filter(), map(), reduce(),  … but are they used with pure functions?

FP favors immutable objects. Basically,

      tuples + list operations in (B) = a major component of FP

Sorry about the loose language… Mine is very different from the academic/textbook presentation of FP, but in terms of techniques, mine is not too far.

STL offers many similar operations to (B), esp. with the *_copy and *_if algorithms:
– find_if
– count_if
– remove_copy
– replace_copy

[1] I feel a closure qualifies as a pure function even though the output depends on some data in the “environment”.

I’ll stick my neck out and say that STL iterator concept is a defining feature(?) of a sequence.
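A few of these (B)-style operations in compilable C++ (my illustration; the lambdas play the role of the (A) pure function):

```cpp
#include <algorithm>
#include <iterator>
#include <numeric>
#include <vector>

std::vector<int> seq{1, 2, 3, 4, 5, 6};

// (A) per-element pure function + (B) reduce: map-reduce in one accumulate call
int sumOfSquares() {
    return std::accumulate(seq.begin(), seq.end(), 0,
                           [](int acc, int x) { return acc + x * x; });
}

// (B) the filter/grep step
int countEven() {
    return (int)std::count_if(seq.begin(), seq.end(),
                              [](int x) { return x % 2 == 0; });
}

// the non-destructive *_copy flavor: source sequence is left untouched
std::vector<int> dropThrees() {
    std::vector<int> out;
    std::remove_copy(seq.begin(), seq.end(), std::back_inserter(out), 3);
    return out;
}
```

The *_copy variants fit the FP preference for immutability: they build a new sequence instead of mutating the input.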

return a matrix – imitate pthread_create()

As explained in other blog posts, a language-neutral dll (dynamic link library) should not receive or return an instance of C++ matrix type, since another language may not know its memory layout. So how does a function return a matrix (or 2 matrices?)

I feel we can take a page from pthread_create(), where the parameter representing the new thread is a pre-allocated but empty buffer. The pthread_create() function-call populates that buffer.

Well, some people don't like pre-allocation. They want to return a pointer. Where is the pointee physically?
– stack? pointee will be bulldozed before use
– global area like a global variable? next call to the same function will overwrite it, and concurrent calls to the function will overwrite each other
– heap? who will free the memory? Has to be caller. Caller would need to copy the data somewhere if it is not in a position to “free it after use”

In the pre-allocate scenario, the caller would set up an empty matrix in the format of the library, pass its address to the function, and later read the populated matrix. Matrix can be on heap, stack or global.

template specialization ^ virtual function

To achieve polymorphism, template specialization is an alternative to virtual functions. The way a “behavior” is chosen for a given type is by compile time template instantiation.

In contrast, virtual functions (including essentially all java non-static methods) use run time binding, incurring additional run time cost and a per-instance memory cost, namely the vptr.

In efficiency-sensitive and latency-sensitive apps, the choice is obvious.

pass a matrix into a (language-neutral) math DLL

The argument passing convention might be different between languages. For our math DLL to be callable from fortran/C (the 2 basic languages for numerical computing?), I feel the lowest common denominator is a pointer to a buffer.


Caller would populate the buffer with the numbers of the matrix. Each number should be in a language-neutral binary format, following a specific bit layout, in case different languages represent a double differently — e.g. IEEE 754 double precision, with 1 sign bit, 11 exponent bits and 52 explicitly stored fraction bits (a 53-bit significand).


Then caller invokes the library routine passing in the address. Inside the routine, we read the buffer to reconstruct the matrix into our own proprietary (C++) matrix instance and proceed. After this point, we are in c++ land and need not worry about other languages.


(purely my personal design.)

fxo premium as percentage of notional: mismatch

There’s an illustration in Tony’s lecture notes and also in the homework — OTM 3 month 1.10 EUR call USD put, with Spot EUR/USD = 1.07

  • Notionals are USD 1.10 and EUR 1.00 — related by the strike (1.10), not by the spot conversion rate
  • Assume the premium = USD 0.0200, paid today in USD per EUR
  • We could also say premium = 0.02/1.07 = EUR 0.0187
  • Then Pnumccy = 0.0200, or a trader might say “200 USD pips”
  • per USD notional i.e. “USD %” would be 0.0200/1.10 = 1.82%, Pnumccy% = 0.0182
  • per EUR notional i.e. “EUR %” would be 0.0187/1.0 = 1.87%, Pbaseccy% = 0.0187
  • Pnumccy%/spot = 0.0182/1.07 = 0.0170, the premium expressed in EUR per USD notional, i.e. “170 EUR pips”
  • “P” stands for premium

Observation — last line 1.87% means the premium is 1.87% of the EUR notional; the 1.82% means the same premium is 1.82% of the USD notional.

The paradox — the same premium is 1.87% of the EUR notional but only 1.82% of the USD notional !

(In general, paradoxes provide “aha” moments and great learning opportunity.)

Analysis (thin->thick->thin):

Key — the USD notional and EUR notional can have (vastly) different values before maturity.

Let’s focus on OTM (more common). For intuition, consider deep OTM. In such a case the EUR notional is a trivial amount while the USD notional is a king’s ransom.

Naturally, the premium as percentage of the ransom is tiny.

fill factor = ratio of inserts/buckets

(I favor “Fill Factor” over “Load Factor” …)

Contrary to some popular belief, a hash table usually has More buckets than inserted elements, so a reasonable load factor stays below 100%. In that case the average chain length is below 1.0, a typical bucket holds 0 or 1 element, and collisions are uncommon.

(A degenerate hash table — ) A load factor of 150% means something like “100 buckets, but 150 inserts, so average 1.5 items per chain” in terms of separate chaining.

Load factor is also known as __fill_factor__, which indicates what percentage of the bucket array is __filled__. Note we care about the percentage of the array, not the percentage of the entries. The array represents the capacity of the storage “hardware”.


[13]c^c++ thread libraries

Update — [[understanding and using c pointers]] c2013 says that the C11 standard specifies threading, but it’s not widely supported as of 2013. Among the C threading libraries in actual use, pthreads is the most widely supported.

pthreads — not much change since the beginning.

Boost, roguewave and other C++ thread libraries are often (sugar coated) class libraries over C-level thread libraries. On a given platform, if there is pthreads and another thread library at C level, then the class library usually provides a consistent OO wrapper to hide their differences. I feel there will never be significant capabilities added to the c++ library and not added to the underlying C library. Here’s my reasoning —

Like any system call and standard system library, thread libraries are designed to support multiple languages and diverse applications. Many languages and many essential applications are still written in C, not C++. Multi-threaded or not, C is the true common denominator, not C++ (all unix thread libraries I know are C-based). It’s inconceivable for a thread library vendor with new “ideas” to add them at the C++ layer rather than the underlying C layer.

One threading library’s documentation puts it this way — “Many of today’s operating systems now provide support for multithreading and synchronization. This support takes the form of a low-level procedural API accessed as C language functions.”

y bond traders need drv, briefly

An IT veteran said — either to reduce or increase exposure. Exposure to many variables, not only interest rate.

As market maker, the “Reduce” is essential.

For both buy-side and sell-side, the “Increase” can be attractive. I feel derivatives provides leverage — up to hundreds of times higher exposure. Lida (MSFM) discussed “exposure…

STL is C-developer friendly

P235 [[STL Tutorial and reference guide]] points out that STL uses no polymorphism and almost no inheritance. Apparently, STL is OO-lite. As stated elsewhere on this blog, STL achieves its flexibility, extensibility, power and coherence via templates.

I feel C developers would find STL easy to use.

They would find STL easy to Understand if they learn templates.

abstract overriding concrete method

Real example P 95 [[ head first design patterns ]]

Q: abstract method overrid` concrete method@@
A: Yes. The Java Language Specification says “An instance method that is not abstract can be overridden by an abstract method” and gives examples. I think this can be used to remove “toxic” services and prevent a “consumer” from calling them by mistake.