dynamic dice game (Zhou Xinfeng book)

P126 [[Zhou Xinfeng]] presents —
Game rule: you toss a fair dice repeatedly until you choose to stop or you lose everything due to a 6. If you get 1/2/3/4/5, then you earn an incremental \$1/\$2/\$3/\$4/\$5. This game has an admission price. How much is a fair price? In other words, how many dollars is the expected take-home earning by end of the game?

Let’s denote the amount of money you take home as H. Your net profit/loss would be H minus admission price. If 555 reasonable/intelligent people play this game, then there would be 555 H values. What’s the average? That would be the answer.

It’s easy to see that if your cumulative earning (denoted h) is \$14 or less, then you should keep tossing: rolling again has expected value (5h+15)/6, which exceeds h whenever h < 15.

Exp(H|h=14) is based on 6 equiprobable outcomes. Let’s denote Exp(H|h=14) as E14.
E14 = 1/6 \$0 + 1/6(h+1) + 1/6(h+2) + 1/6(h+3) + 1/6(h+4) + 1/6(h+5) where h=14, so E14 = \$85/6 ≈ \$14.167

E15 = 1/6 \$0 + 1/6(h+1) + 1/6(h+2) + 1/6(h+3) + 1/6(h+4) + 1/6(h+5) where h=15, so E15 = \$90/6 = \$15. So when we have accumulated \$15, stopping and rolling again have equal expected value, and we can do either.

It’s trivial to prove that E16=\$16, E17=\$17 etc because we should definitely leave the game — we have too much at stake.

How about E13? It’s based on 6 equiprobable outcomes.
E13 = 1/6 \$0 +1/6(E14) + 1/6(E15) + 1/6(E16) + 1/6(E17) + 1/6(E18) = \$13.36111
E12 = 1/6 \$0 + 1/6(E13) +1/6(E14) + 1/6(E15) + 1/6(E16) + 1/6(E17) = \$12.58796296

E1 =  1/6 \$0 + 1/6(E2) +1/6(E3) + 1/6(E4) + 1/6(E5) + 1/6(E6)

Finally, at the start of the game, the expected end-of-game earning is based on 6 equiprobable outcomes —
E0 = 1/6 \$0 + 1/6(E1) + 1/6(E2) + 1/6(E3) + 1/6(E4) + 1/6(E5) = \$6.153737928

Essential BS-M, my 2nd take

People ask me to give a short explanation of Black-Scholes Model (not BS-equ or BS-formula)…

I feel random variable problems always boil down to the (inherent) distribution, ideally in the form of a probability density function.

Back to basics. Look at the heights of all the kids in a pre-school — there’s a distribution. The simplest way to describe this kind of distribution is a histogram: [0.8-1m], [1-1.2m], [1.2-1.4m] … A probability distribution is a precise description of how the individual heights are “distributed” in a population.

Now consider another distribution — toss 10 fair dice at once and add up the points to a “score”. Keep tossing to get a bunch of scores and examine the distribution of scores. If we know the inherent, natural distribution of the scores, we have the best possible predictor of all future outcomes. If we get one score per day, we can then estimate how soon we are likely to hit a score above 25. We can also estimate how “surely” the cumulative score would have exceeded 44 by the 30th toss.

For most random variables in real life, the inherent distribution is not a simple math function like in our little examples. Instead, practitioners work out a way to *characterize* the distribution. This is the standard route to solving random variable problems, because characterizing the underlying distribution (of the random variable) unlocks a whole lot of insights.

Above are random variables in a static context. Stock price is an evolving variable. There’s a Process. In the following paragraphs, I have mixed the random process and the random variable at the end of the process. The process has a σ and the variable (actually its value at a future time) also has a σ.
———–
In option pricing, the original underlying Random Process Variable (RPV) is the stock price. Not easy to characterize. Instead, the pioneers picked an alternative RPV i.e. R defined as ln(Sn+1 / Sn) and managed to characterize R’s behavior. Specifically, they characterized R’s random walk using a differential equation parametrized by a σinst i.e. the instantaneous volatility [1]. This is the key parameter of the random walk or the Geometric Brownian motion.

The binomial tree is a popular numerical implementation of BS. It models a stock price [2] as a random walker taking up/down steps every interval (say every second). To characterize the step size Sn+1 – Sn, we would like the distribution of step sizes, but that’s too hard. As an alternative, we assume R follows a standard Wiener process, so the value of R at any future time is normally distributed. But what is this distribution about?

Remember R is an observable random variable recorded at a fixed sampling frequency. Let’s denote the R values at each sampling point (i.e. each step of the random walk) as R1, R2, R3, R4 …. We treat each of them as an independent random variable. If we record a large series of R values, we see a distribution, but this is the wrong route — we don’t want to treat the time series values R1, R2 … as observations of the same random variable. Instead, imagine a computer picking an R value at each step of the random walk (like once a second). The distribution of each random pick is programmed into the computer. Each pick has a distinct Normal distribution with a distinct σinst_1, σinst_2, σinst_3 …. [4]

In summary, we must analyze the underlying distribution (of S or R) to predict where S might be in the future[3].
[4] A major simplifying assumption of BS is a time-invariant σinst which characterizes the distribution of R at each step of the random walk. Evidence suggests the diffusion parameter σinst does vary, depending primarily on time and the current stock price. The characterization of σinst as a function of time and S is a cottage industry in its own right, and is the subject of skew modelling, vol surface, term structure of vol etc.

[1] All other parameters of the equation pale in significance — risk-free interest i.e. the drift etc.
[2] While S is a random walker, R is not really a random walker. See other posts.
[3] Like in the dice case, we can’t predict the value of S but we can predict the “distribution” of S after N sampling periods.

speak freely, westerner, humor…

Sometimes you feel tired of paralanguage-monitoring and self-shrinking – it can feel tiring [4] (for the untrained) to be always on your toes and to avoid sticking out. Quiet people are presumably more comfortable with that, but I'm not a quiet person (though I'm rather introspective or “looki”).

Sometimes you just want to, for a moment, be yourself, express (not in a loud, in-your-face way, but in a passive way) your individuality and leave the judgement to “them”. Many of my family members and colleagues/bosses seem to have that tendency, though each of them decides when to show it and when to Control it.

Humour is a decisive part of it. Some would say “No humour, don't try it.” I'm not humorous even though I find many words I speak somewhat amusing.

99% of the time I decide to “let my hair down” and speak “freely”, it has been a (conscious or semi-conscious) gamble, since I have no control over the situation [1]. I suspect a lot of the time the negative reaction in the audience can't be completely offset by the positive. Let's face it: if there's any trace of negative reaction after you say something, it tends to last a long time, even if the positive is much greater. This is especially true if the negative is taken personally, like a joke about age or weight. Showbiz people like to take on those sensitive topics… because they can, but it's foolhardy to “try it at home”. You also see people communicating rather directly in movies and in publications, but that's an exaggerated/distorted version of reality. In reality that kind of talk is rare, shocking. It's like playing with fire.

In my (somewhat biased) perception, I tend to see westerners as less restrained, more individualistic, speaking-for-own-self, carefree, less careful, less rule-bound… This long list of perceptions would eventually lead to the American ideal of individual “freedom”, but that word alone would be oversimplification.

Experience — In my first 5 years of working, I was often the youngest team member. I didn't have to “image-manage” myself as a future leader. I rarely had junior staff looking up to me.

[1] major exception is the last few days on any job.

[4] I suspect most of us can get used to it, just like children getting used to self-restraint once in school.

PCP – 3 basic perspectives

#) 2 portfolio asset values – at expiration  or pre-expiration. This simple view IGNORES premium paid
See http://bigblog.tanbin.com/2011/06/pcp-synthetic-positions-before.html

#) 2 traders' accounts each starting with \$100k cash. This angle takes premium into account.

#) current mid-quote prices of the put/call/underlier/futures should reflect PCP, assuming good liquidity. In reality?

skew bump on a smile curve

Background — in volatility smile (rather than term structure) analysis, we often use a number (say, -2.1212) to measure the skewness of a given smile curve. Skew is one of several parameters in a calibrated formula that determines the exact shape of a given smile curve. Along with anchor volatility, skew is among the most important parameters in a parameterization scheme.

We often want to bump the skew value (say by 0.0001) and see how the smile changes.

A bump in skew would make the smile curve Steeper at the ATM point on the curve. You need to look at both the put (low strike) and call (high strike) sides of the smile curve. If the bump causes put side to move even higher and call side to move even lower, then skewness is further increased. In many cases, skew value of the entire curve is (approximately?) equal to the slope (first derivative) at the ATM point.

If you mistakenly look at only one side of the smile curve, say the put side, you might notice the curve flattening out when skew is bumped. Entire left half may become less steep, while the right half was rather flat to start with. So you may feel both halves become less steep when skew is bumped. That’s misleading!

Note skew is usually negative for equities, so a “bump” here means a bump in the magnitude, i.e. making skew more negative.

theoretical numbers ^ vol surface

After you work in the volatility field for a while, you may figure out when (and when not) to use the word “theoretical”. There’s probably no standard definition of it. I guess it basically means “according to BS”. It can also mean risk-neutral. All the greeks and many of the pricing formulas are theoretical.

The opposite of theoretical is typically “observed on the market”, or adjusted for skew or tail.

Now, the volatility smile, the volatility term structure and the vol surface are a departure from BS. These are empirical models, fitted against observed market quotes. Ignoring outliers among raw data, the fitted vol surface must agree with observed market prices — empirical.

a web service defines a schema + wsdl

For a given web service, I think there's typically just one “operation” (or “web method”) defined in the wsdl, though the WSDL format itself allows several.

The input/output parameters of the operation are usually “umbrella” objects with a predefined “structure”, originating from the basic C struct. As an alternative to a predefined structure, a Dictionary/Map of {string key -> anyType value} is extremely flexible but somehow not popular. In either case, a schema is specified.

In a nutshell, a web service is specified and defined by

– a wsdl

– a bunch of data schemas
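For illustration only, a skeletal WSDL 1.1 fragment along these lines might look as follows; every name (QuoteRequest, getQuote, the namespaces) is made up:

```xml
<!-- Hypothetical sketch: one operation whose input is a schema-defined
     "umbrella" type.  All names are invented for illustration. -->
<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:xsd="http://www.w3.org/2001/XMLSchema"
             xmlns:tns="urn:example:quote"
             targetNamespace="urn:example:quote">
  <types>
    <xsd:schema targetNamespace="urn:example:quote">
      <xsd:complexType name="QuoteRequest">  <!-- the umbrella "struct" -->
        <xsd:sequence>
          <xsd:element name="symbol" type="xsd:string"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:schema>
  </types>
  <message name="getQuoteIn"><part name="body" type="tns:QuoteRequest"/></message>
  <portType name="QuotePort">
    <operation name="getQuote">              <!-- the single operation -->
      <input message="tns:getQuoteIn"/>
    </operation>
  </portType>
</definitions>
```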

a small example of negative projection

I learnt long ago that if you do something unusual and you want to gauge people’s reaction, you need to assume everyone around you is lazy, selfish, insecure, unforgiving, fearful, over-protective of personal image… Here’s one example.

Suppose boss gives you a task, and you suggest another team member as a better candidate for the task. This can easily invite criticism (from everyone) like “avoiding work”, “pushback”, or “acting like a boss”.

CIRP vs PCP

I feel CIRP (covered interest rate parity) is a stronger arbitrage relation than (European) PCP. The 4 prices involved in CIRP are more liquid than the 3 prices in a PCP.

Note American options don't follow PCP exactly (only inequality bounds hold).

Bayes’ theorem illustration in CFA textbook

The prior estimates are our best-effort, well-researched findings. In the absence of the news about expansion or no-expansion, those estimates are the numbers we stand by. If the news gets recalled, our estimates would return to the prior estimates.

The unconditional prob of expansion is time-honored, trusted and probably based on historical observations. We always believe the uncond prob is 41% regardless of the news about expansion. It’s rare to discover “new info” that threatens to discredit this 41% assessment.

##common lvalue expressions in c++

Contrary to some claims, I feel an expression can be “usable as either L-value or R-value”.

Further, I feel most (or all?) L-value expressions are usable as R-values.

It’s instructive to look at all the expressions that can be L-value expressions

– pointer variable, but not a mere address. See http://bigblog.tanbin.com/2012/03/3-meanings-of-pointer-tip-on-delete.html
– pointer variable unwrapped, like *(myPtr)
– ref variable
– nonref variable
– subscript expression
– function returning a reference
– function returning a pointer? No: the returned pointer value is an R-value (though dereferencing it yields an L-value)

Warning! I call this the little dark corner of L-values. Any time a function (or overloaded operator) modifies an object and returns a non-const reference to it, you invite trouble, because you allow the call expression as an L-value. The modified object can now appear on the LHS of an assignment!

2-headed coin – Tom’s special coin

http://blog.moertel.com/articles/2010/12/07/on-the-evidence-of-a-single-coin-toss is a problem very similar to the regular 2-headed coin problem. If after careful analysis we decide to use an initial estimate of 50/50, then the first Head would sway our estimate of P(two-headed) to 66.67%.

Many follow-up comments point out that our trust of Tom is a deciding factor, and I agree. After seeing 100 heads in a row, we are likely to believe Tom more. Now, that is a very tricky statement.

We need to carefully separate 2 kinds of adjustments on our beliefs.
C) corrections on the initial estimate
U) updates based on new evidence. These won’t threaten to discredit the initial estimate.

I would say getting 100 heads is a U not a C.

An example of C would be other people’s (we trust) endorsements that Tom is trustworthy. In this case as compared to the “pool distribution” case, the initial estimate is more subject to correction. In the pool scenario, initial prior is largely based on estimate of pool distribution. If and when a correction occurs, we must recompute all updated versions.

The way we update our estimate relies on the initial estimate of 50/50. Seeing 100 heads and updating 100 versions of our estimate is valid precisely because the validity of the initial estimate. The latest estimate of Prob(Fair) incorporates the initial estimate of 50/50 + all subsequent updates.

If you really trust Tom more, then what if it's revealed the 100 heads were an illusion staged by a magician nearby (remember we are in a pub)? Nothing to do with Tom's coin. The entire “new information” is recalled. Would you still trust Tom more? If not, then there's no reason to “Correct” the initial estimate. There's no Corrective evidence on the scene.

(See other posts for related questions) Q2: A suspicious coin is known as either fair (50/50) or 2-headed. 10 tosses, all heads. What’s the probability of unfair? You could, if you like, imagine that you randomly pick one from a pool of either fair or unfair coins, but you don’t know how many percent of them are 2-headed.
A: We will follow “a property in common with many priors, namely, that the posterior from one problem becomes the prior for another problem; pre-existing evidence which has already been taken into account is part of the prior and as more evidence accumulates the prior is determined largely by the evidence rather than any original assumption.” See wikipedia.
F denotes “picked a Fair coin”
U denotes “picked an Unfair”. P(U) == 100% – P(F)
Prior_1: P(F) is assumed to be 50%, my subjective initial estimate, based on zero information.
Posterior_1: P(F|H) is the updated estimate (confidence level) after seeing 1st Head.
P(F|H) = P(H|F)·P(F) / [ P(H|F)·P(F) + P(H|U)·P(U) ] = 0.5·P(F) / [ 0.5·P(F) + 1·P(U) ]
Assuming P(F) is 50%, then P(F|H) comes to 0.25/0.75 = 1/3.
Now this 1/3 is our posterior_1 and also prior_2, to be updated after 2nd head.
P2(F|H) = 0.5·P(F) / [ 0.5·P(F) + 1·P(U) ]
Here assuming P(F) = 1/3 and P(U) = 2/3, we get posterior_2 = (1/6) / (5/6) = 1/5 or 20% as the updated estimate after the 2nd head.

Q: As shown, posterior_1 (33.33%) is used as P(F) in deriving the updated P(F) (20%), so which value is valid? 33.33% or 20%?
A: both. 33% is valid after seeing first H, and 20% is valid after seeing 2nd H. If 2nd H is declared void, then our best estimate rolls back to 33%. If after 5 heads, first head is declared void, then we should rollback all updates and return to initial prior_1 of 50/50.

Now this posterior_2 is used as prior_3, to be updated after 3rd head.
P3(F|H) = 0.5·(1/5) / [ 0.5·(1/5) + 1·(4/5) ]
Posterior_3 comes to 1/9 or 11.11%

Let’s stop after first 2 heads and try another solution to posterior_2.
A denotes “1st toss is Head”
B denotes “2nd toss is Head”
P(F|A)=1/3, based on the same initial estimate.
P(F|AB) = P(AB|F)·P(F) / [ P(AB|F)·P(F) + P(AB|U)·P(U) ] = 0.25·P(F) / [ 0.25·P(F) + 1·P(U) ]

Q (The key question): what value of P(F) to use? 50% or 1/3?

A: P(F) is the initial estimate, without considering any “news”, so P(F) == 50%, i.e. our initial estimate. This 50/50 is the basis of all the subsequent updates on the guess. The value of 50/50 is somewhat subjective, but once we decide on that value, we can’t change this estimate halfway through the successive “updates”.

Each successive update fundamentally and ultimately relies on the initial numerical level-of-belief. We stick to our initial “prior” forever, not discarding it after 100 updates. Our updated estimates are valid/reasonable exactly due to the validity of our initial estimate.

If we notice a digit was accidentally omitted in the initial estimate calculation, then all versions of the updated estimate become invalid. We correct the initial calc and recalc all updated estimates. Here’s an example of a correction in prior_0 — the pool of coins is 99% fair coins, so the initial 50/50 seriously underestimates P(F) and overestimates P(Unfair coin picked).
If the initial estimate is way off from reality (such as the 99% case), will successive updates improve it? Not in this case (not sure about others). The incorrect 50/50 estimate is Upheld by each successive update.
If any news casts doubt over the latest estimate (rather than the validity of the initial estimate), we update the latest estimate to derive another posterior, but the posterior estimate is no more valid than the prior. Both have the same level of validity, because the posterior is derived using the prior.

We need to know X) when to Correct the initial prior (and recalc all posteriors) vs Y) when to apply a news item to generate a new posterior. If we receive new information in tweets, very few tweets are Corrective (X). Most of the news items are updaters (Y). Such a news item is NOT a “correction of mistake” or a newly discovered fact that threatens to discredit the initial estimate — it’s more of a what-if scenario: “if the first 10 tosses are all heads”, then how would you update your estimate from the initial 50/50?

“What if the 11th toss is a Tail?” We aren’t justified in discrediting the initial estimate. We update our 10th estimate to a posterior of P(F) = 100% (a Tail rules out the 2-headed coin), but this is as valid as the initial 50/50 estimate. 50/50 remains a valid estimate in the no-info context. When we open the pool and see 2 coins only, one fair and one unfair, our 50/50 is the best prior, and the 11th toss doesn’t threaten its validity. When we know nothing about the pool, 50/50 is a reasonable prior.

P(F|AB) comes to 1/5, since there’s no justification to “correct” initial 50/50 estimate.

2-headed coin – HeardOnTheStreet & sunrise problem

See [[Heard On the Street]] Question 4.18. 10 heads in a row is very unlikely on a fair coin, so you may feel Prob(unfair) exceeds 50%.

However, it can be fairly unlikely to pick the unfair coin.

In fact, they are comparably unlikely. It turns out the unlikelihood of “10 heads on a fair coin” is numerically close to the unlikelihood of “picking an unfair coin”, so P(Fair coin picked | 10 heads) ~= 50/50.

This brings me to the sunrise problem — after seeing 365 * 30 (roughly 10,000) consecutive days of sunrise, or a long run of consecutive tosses showing Head, you would think P(H) is virtually 100%, i.e. you believe you have a 2-headed coin. But what if you discover there’s only 1 such coin in a pool of 999,888,999,888,999,777,999 coins? Do you think you picked that one, or do you think you had a run of lucky tosses? It takes more luck to pick that special coin (roughly a 1-in-2^70 event) than to get 20 heads on a fair coin (a 1-in-2^20 event). So after 20 heads you are still more likely to have a fair coin than the unfair coin, and your next toss would be close to 50/50. (After 10,000 heads, though, the evidence would overwhelm even that tiny prior.)

mother of 2 kids, at least 1 boy

A classic puzzle showing most people have unreliable intuition about Cond Prob.

Question A: Suppose there’s a club for mothers of exactly 2 kids — no more no less. You meet Alice and you know she has at least one boy. What’s Prob(both boys)?
Question K: You meet Kate (at clubhouse) along with her son. What’s P(she has 2 boys)?
Question K2: You also see the other kid in the stroller, but you are not sure if it’s a Boy or Girl. What’s P(BB)? This is essentially the same question as on P166 of [[Cows in the maze]]

Solution A: 4 equi-events BB/BG/GB/GG of 25% each. GG is ruled out, so she is equally likely to be BB/BG/GB. Answer=33%

Solution K: 8 equi-events BB1/BB2/BG1/GB2/BG2/GB1/GG1/GG2. The latter 4 cases are ruled out, so what you saw was equally likely to be BB1/BB2/BG1/GB2. Answer=50%

Question C: Each mother wears a wrist lace if she has a boy and 2 if 2 boys (Left for 1st born, Right for 2nd born). Each mother comes with a transparent (hardly visible) hairband if she has either 1 or 2 boys. There are definitely more wrist laces than hairbands in the clubhouse. If you notice a mother with a hairband, you know she has either 1 or 2 wrist laces. If you see a wrist lace, you know this mother must have a hairband.

C-A: What’s P(BB) if you see a mother with a hairband?
C-K: What’s P(BB) if you see a mother with a wrist lace on the left hand?

Solution C-A: Out of 2000 mothers, 1500 have a hairband; 500 of those have 2 boys. P(BB) = 500/1500 = 33%
Solution C-K: 500 mothers have 2 wrist laces; 500 have only a left wrist lace; 500 have only a right wrist lace. A left-hand lace means she is from one of the first two groups, so P(BB) = 500/1000 = 50%

Seeing a wrist lace is not the same as seeing a hairband; the 2 statements are NOT equivalent. Wrist laces (2000) outnumber hairbands (1500), yet a wrist lace sighting guarantees a hairband, and a positive wrist-lace test is the rarer, stronger signal: within the clubhouse, a hairband test passes for 3 of the 4 groups of mothers, but a wrist-lace test on a randomly chosen hand passes, on average, for only 2 of the 4.

Applied to original questions…
* Alice wears a hairband but perhaps one of her wrists is naked. If she brings one child each time to the clubhouse, we may not always see a boy.
* Kate wears at least one wrist lace (so we know she has a hairband too).

\$ if we randomly “test” Alice for wrist lace on a random hand, she may fail
\$ if we randomly “test” Alice for hairband, sure pass.
–> the 2 tests are NOT equivalent.

\$\$ if we randomly “test” Kate for wrist lace on a random hand, she may fail
\$\$ if we randomly “test” Kate for hairband, sure pass.
–> the 2 tests are NOT equivalent for Kate either

A wrist-lace-test pass implies a hairband-test pass, but it carries additional information. The 2 tests aren’t equivalent.

—– How is Scenario K2 different from A?
–How many mothers are like K2? We need to divide the club into 8 equal groups.
* perhaps Kate is from the BB group and you saw the first kid or the 2nd kid
* perhaps Kate is from the BG group and you saw the first kid – BG1
* perhaps Kate is from the GB group (500 mothers) and you saw the 2nd kid – GB2. Now if you randomly pick one hand from each GB mother, then 250 of them would show the left hand (GB1) and 250 would show the right hand (GB2). Dividing them into 2 groups, we know Kate could be from the GB2 group.
=> Kate could be from the BB1, BB2, BG1, GB2 groups. In other words, all 4 of these groups are “like Kate”. They (1000 mothers) all wear a wrist lace, but not everyone wearing a wrist lace is like-Kate — the BG2 (250) and GB1 (250) mothers are not.

–How many mothers are like Alice? 75% consisting of BB BG GB
——-
^ Spotting a hairband, the wearer (Alice) is equally likely from the 3 groups — BB(33%) BG(33%) GB(33%)
^ Spotting a wrist lace, the wearer (Kate) is more likely from the BB group (50%) than BG(25%) or GB(25%) group.

If I hope to meet a BB mother, then spotting a wrist lace is a more valuable “signal” than a hairband. Reason? Out of the 2000 mothers, there are 2000 wrist laces, half of them from-BB. There are 1500 hairbands, and only a third of them are from-BB.

Further suppose each twin-BB mother gets 100 free wrist laces (because the wrist lace manufacturer is advertising?), and all the BB mothers claim to have twin BBs. As a result, wrist laces explode in number, and virtually every wrist lace you see is from-BB.
——-
There are many simple ways of reasoning behind the 33% and the 50%, but they don’t address the apparent similarity and the subtle difference between A and K. When would a line of reasoning become inapplicable? It’s good to get to the bottom of the A-vs-K difference, subtle but fundamental. A practitioner needs to spot the difference (like an eagle).

sub-millis OMS arch – last tips from Anthony

I feel ideally you want to confine the entire OMS to one single process, minimizing IPC latency [1]. In practice, however, even for one symbol the OMS is often split into multiple processes or “instances”.

So what’s the IPC of choice? It turns out that in sub-millis trading, MOM messaging is the IPC of choice. I mentioned synchronous call and shared memory, but my veteran friend pointed out messaging performs better in practice.

The main platform-component is one big JVM instance with an internal order lookup cache for order state maintenance.

Multi-queue – if there are 50,001 symbols, there will be 50,001 queues. Once a queue is assigned a given thread T351, it is permanently bound to T351. This prevents multiple threads from handling events on the same symbol concurrently. Obviously we don’t want 50,001 threads, so some kind of multiplexing is in place.

[1] Note data parallelism (into multiple processes) is free of IPC and perfectly fine.

## c++0x features I understand as significant

Here is the subset of C++11 features I understand to be significant. Some significant C++11 features, such as lambdas, I don’t yet appreciate.

#1 move semantics
– std::move and forward

#2 Lib: smart pointers
– make_shared() etc