never exercise American Call (no-div), again

Rule 1: For a non-dividend-paying stock, early exercise of an American call is never optimal.
Rule 1b: Therefore its price equals that of a European call. In other words, the early exercise feature is worthless.

To simplify (not over-simplify) the explanation, it’s useful to assume zero interest rate.

The key insight is that short-selling the stock is always better than exercising. Suppose the strike is $100 and the current price is as high as $150.
* Exercise means “sell at $150 immediately after buying the underlier at $100”.
* Short means “sell at $150 but delay the buying till expiry”.

Why *delay* the buy? Because we hold a right not an obligation to buy.
– If terminal price is $201 or anything above strike, then the final buy is at $100, same as the Exercise route.
– If terminal price is $89 or anything below strike, then the final buy is BETTER than the Exercise route.
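The two routes can be compared numerically. A minimal sketch, using the $150/$100/$201/$89 numbers from the bullets above:

```python
# Hypothetical numbers from the text: strike $100, current price $150.
K = 100.0
S_now = 150.0

# Exercise route: buy at the strike, sell at the current price -- locks in $50.
exercise_pnl = S_now - K

# Short route: sell at $150 now; at expiry buy back at min(S_T, K),
# because the still-alive call caps our buy-in price at the strike.
for S_T in (201.0, 100.0, 89.0):
    short_pnl = S_now - min(S_T, K)
    print(S_T, short_pnl, short_pnl >= exercise_pnl)
```

In every scenario the short route does at least as well as exercise, and strictly better when the terminal price finishes below strike.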

You can also think in terms of a super-replicating portfolio, but I find it less intuitive.

So in real markets when the stock is very high and you are tempted to exercise, don’t sit there and risk losing the opportunity:
1) Short sell if you are allowed.
2) Exercise if you can’t short sell.

When an interest rate is present, the argument is only slightly different: invest the short-sale proceeds in a bond.

BS-E is PDE, not SDE

I believe the BS-equation (a famous PDE) is not a stochastic differential equation, simply because there’s no dW term in it.

An SDE is really an equality between two integrals, on the left and right. At least one integral must be a stochastic integral.

Some (not all) of the derivations of the BS-E use stochastic integrals.


There are many ways to derive the BS-E(quation). See [[Crack]]. Roger Lee covered at least two routes.

There are many ways to derive the BS-F(ormula). See P116 [[Crack]]

There are many ways to interpret the BS-F. Roger Lee and [[Crack]] covered them extensively.

Q: BS-F is a solution to the BS-E, but is BS-F based on BS-E?
A: I would say yes, though some BS-F derivations don’t use any PDE (BS-E is PDE) at all.

BS-E is simpler than BS-F IMO. The math operations in the BS-F are non-trivial and not so intuitive.

BS-F only covers European calls and puts.
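For concreteness, here is a minimal stdlib-only Python sketch of the BS-F for a European call (the helper names are mine, not from [[Crack]]):

```python
from math import log, sqrt, exp, erf

def norm_cdf(x):
    """Standard normal CDF via the error function (stdlib only)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(S, K, T, r, sigma):
    """BS-F for a European call: constant rate r, constant vol sigma."""
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)

# Hypothetical inputs: S = K = 100, 1 year, r = 5%, vol = 20%
print(round(bs_call(100.0, 100.0, 1.0, 0.05, 0.2), 4))  # about 10.4506
```

Note how the “non-trivial” operations (the d1/d2 terms, the two N() calls) all show up even in this tiny sketch, which is the point made above about BS-F being less intuitive than BS-E.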

BS-E covers American and more complex options. See P74 [[Crack]]

BS-E has slightly fewer assumptions:
– Stock is assumed GBM
– no assumption about boundary condition. Can be American or exotic options.
– constant vol?

jump diffusion pricing models — brief discussion

I asked a relatively young quant I respect.

She said most sell side models do not have jump feature. The most advanced models tend to be stochastic vol. A simpler model is the local vol model.

I said the Poisson jump model is well-regarded – but she said it’s not that mature.

I said the Poisson jump model is needed since a stock price often exhibits jumps – but her answer gave me the impression that a model without this “indispensable” feature can be good enough in practice.

When you actually put the jump model into practice, it may not work better than a no-jump model. This is reality vs theory.

clarifying questions to reduce confusion in BS discussions

–At every mention of “pricing method”, ask

Q: Analytical (Ana) or Numerical (Num)?

Q: for European or American+exotic options?

Obviously Analytical methods only work for European style

Q: GBM assumption?

I think most numerical methods do. Every single method has severe assumptions, so GBM is just one of them.

–At every mention of “Option”, ask

Q: European style or American+Exotic style?

–At every mention of Black-Scholes, ask

Q: BS-E(quation) or BS-F(ormula) or BS-M(odel)?

Note numerical methods rely on BS-M or BS-E, not BS-F

–At every mention of Expectation, ask

Q: P-measure or Q-measure?

The other measures, like the T-fwd measure, are too advanced, so no need to worry for now.

local vol^stoch vol, briefly

Black-Scholes vol – constant, not a function of anything.
** simplest

stoch vol – there’s a dB element in dσ. See

** most elaborate
** this BM process is correlated to the BM process of the asset price. Correlation ranging from -1 to 0 to 1.

local vol – sigma_t as a deterministic function of S_t and t, without any “stochastic” dB element.
** middle ground. A simplification of stoch vol

Jensen’s inequality – option pricing

See also

This may also explain why a BM cubed isn’t a local martingale.

Q: How practical is JI?
A: practical for interviews.
A: JI is intuitive like ITM/OTM.
A: JI just says one thing is higher than another, without saying by how much, so it’s actually simpler and more useful than the precise math formulae. Wilmott calls JI “very simple mathematics”

JI is consistent with the pricing math of a vanilla call (or put). Define f(S) := (S-K)+. This hockey stick is a convex function. Now, under the standard RN measure,

   E[ f(S_T) ] should exceed f (E[ S_T ])

LHS is the call price today (with r = 0). With r = 0, E[ S_T ] = S_0, so RHS simplifies to f (S_0) := (S_0 – K)+ which is the intrinsic value today.
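The inequality can be checked by brute force. A Monte-Carlo sketch under the driftless (r = 0) GBM assumption, with hypothetical inputs:

```python
import math, random

random.seed(42)
S0, K, sigma, T = 100.0, 100.0, 0.3, 1.0   # hypothetical inputs, r = 0

terminals, payoffs = [], []
for _ in range(200_000):
    z = random.gauss(0.0, 1.0)
    # driftless GBM terminal price (risk-neutral with r = 0)
    S_T = S0 * math.exp(-0.5 * sigma ** 2 * T + sigma * math.sqrt(T) * z)
    terminals.append(S_T)
    payoffs.append(max(S_T - K, 0.0))

lhs = sum(payoffs) / len(payoffs)                    # E[ f(S_T) ]: the call price
rhs = max(sum(terminals) / len(terminals) - K, 0.0)  # f( E[S_T] ): intrinsic value
print(lhs, rhs, lhs > rhs)
```

Here LHS comes out around the BS value (about 11.9) while RHS is near zero, so the hockey-stick convexity clearly pushes the option value above intrinsic.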

How about a binary call? Unfortunately, its payoff is neither convex nor concave!

(Figure: a graphical demonstration of Jensen’s Inequality. The expectations shown are with respect to an arbitrary discrete distribution over the x_i.)

hockey stick – asymptote

(See also post on fwd price ^ PnL/MTM of a fwd position.)

Assume K = 100. As we get very very close to maturity, the “now-if” graph descends very very close to the linear hockey stick, i.e. the “range of (terminal) possibilities” graph.

10 years before maturity, the “range of (terminal) possibilities” graph is still the same hockey stick turning at 100, but the now-if graph is quite a bit higher than the hockey stick. The real asymptote at this time is the (off-market) fwd contract’s now-if graph. This is a straight line crossing X-axis at K * exp(-rT). See

In other words, at time 0, call value >= S – K*exp(-rT)

As maturity nears, not only the now-if smooth curve but also the asymptote both descend to the kinked “terminal” hockey stick.

Towards expiration, how option greek graphs morph

(A veteran would look at other ways the curves respond to other changes, but I feel the most useful thing for a beginner to internalize is how the curves respond to … imminent expiration.)

Each curve is a range-of-possibility curve, since the x-axis is the (possible range of) current underlier prices.

— the forward contract’s price
As expiration approaches, …
the curve moves closer to the (terminal) payout graph — that straight line crossing at K.

— the soft hockey-stick i.e. “option price vs current underlier”

As expiration approaches, …

the curve descends closer to the kinked hockey stick payout diagram

Also the asymptote is the forward contract’s price curve, as described above.

— the delta curve
As expiration approaches, …

the climb (for the call) becomes more abrupt.

See diagram in

— the gamma curve
As expiration approaches, …

the “bell” curve is squeezed towards the center (ATM) so the peak rises, but the 2 tails drop

— the vega curve
As expiration approaches, …

the “bell” curve descends, in a parallel shift

N(d2), GBM, binary call valuation – intuitive

It’s possible to get an intuitive feel for the binary call valuation formula.
For a vanilla European call, C = … – K exp(-Rdisc T)*N(d2)
N(d2) = Risk-Neutral Pr(S_T > K). Therefore,
N(d2) = RN-expected payoff of a binary call
N(d2) exp(-Rdisc T) — If we discount that RN-expected payoff to Present Value, we get the current price of the binary call. Note all prices are measure-independent.
Based on GBM assumption, we can *easily* prove Pr(S_T > K) = N(d2) .
First, notice Pr(S_T > K) = Pr (log S_T > log K).
Now, given S follows a GBM, the random variable (normal at time T)
   log S_T ~ N ( mean = log S_0 + T(Rgrow – σ²/2) ,   std = σ√T ).
Let’s standardize it to get
   Z := (log S_T – mean)/std    ~  N(0,1)
Pr(S_T > K) = Pr (Z > (log K – mean)/std ) = Pr (Z < (mean – log K)/std) = N( (mean – log K)/std )  = N(d2)
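The claim Pr(S_T > K) = N(d2) can be sanity-checked by simulation. A sketch with hypothetical inputs, sampling log S_T directly from its normal distribution:

```python
import math, random

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Hypothetical inputs
S0, K, Rgrow, sigma, T = 100.0, 110.0, 0.03, 0.25, 2.0

mean = math.log(S0) + (Rgrow - 0.5 * sigma ** 2) * T   # mean of log S_T
std = sigma * math.sqrt(T)                             # std of log S_T
d2 = (mean - math.log(K)) / std                        # (mean - log K)/std

random.seed(1)
n, hits = 300_000, 0
for _ in range(n):
    if random.gauss(mean, std) > math.log(K):          # sample log S_T
        hits += 1
print(hits / n, norm_cdf(d2))   # the two numbers should be very close
```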

option pricing – 5 essential rules and their assumptions

PCP — arb + extremely tight bid/ask spread + European vanilla option only. GBM Not assumed. Any numeraire fine.

Same drift as the numeraire — tradeable + arb + numeraire must be bond or a fixed-interest bank account.

no-drift — tradeable + arb + using the numeraire

Ito — BM or GBM in the dW term. tradable not assumed. Arb allowed.

BS — tradable + arb + GBM + constant vol

quasi constant parameters in BS

dS/S = a dt + b dW [1]

[[Hull]] says this is the most widely used model of stock price behavior. I guess this is the basic GBM dynamic. Many “treasures” hidden in this simple equation. Here are some of them.

I now realize a and b (usually denoted σ) are “quasi-constant parameters”. The initial model basically assumes constant [2] a and b. In a small adaptation, a and b are modeled as time-varying parameters. In a sense, ‘a’ can be seen as a Process too, as it changes over time unpredictably. However, few researchers regard a as a Process. I feel a is a long-term/steady-state drift. In contrast, many treat b as a Process — the so-called stochastic vol.

Nevertheless in equation [1], a and b are assumed to be fairly slow-changing, more stable than S. These 2 parameters are still, strictly speaking, random and unpredictable. On a trading desk, the value of b is typically calibrated at least once a day (OCBC), and up to 3 times an hour (Lehman). How about on a volatile day? Do we calibrate b more frequently? I doubt it. Instead, implied vol would be high, and market makers may widen the bid/ask spread even further.

As an analogy, the number of bubbles in a large boiling kettle is random and fast-changing (changing by the second). It is affected by temperature and pressure. These parameters change too, but much slower than the “main variable”. For a short period, we can safely assume these parameters constant.

Q: where is √t?
A: I feel equation [1] doesn’t have it. In this differential equation about the instantaneous change in S, dt is assumed infinitesimal. However, for a given “distant future” from now, t is given and not infinitesimal. Then the lognormal distribution has a dispersion proportional to √t.

[2] The adjective “constant” is defined along time axis. Remember we are talking about Processes where the Future is unknown and uncertain.
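The √t dispersion can be seen in a quick simulation of the driftless GBM’s terminal log-return (the vol value is hypothetical):

```python
import math, random, statistics

random.seed(0)
b = 0.2   # the quasi-constant vol parameter in equation [1] (hypothetical value)

def terminal_log_returns(t, n=100_000):
    # ln(S_t/S_0) of a driftless GBM ~ N(-b^2 t / 2, b^2 t)
    return [random.gauss(-0.5 * b ** 2 * t, b * math.sqrt(t)) for _ in range(n)]

sd_1y = statistics.stdev(terminal_log_returns(1.0))
sd_4y = statistics.stdev(terminal_log_returns(4.0))
print(sd_1y, sd_4y, sd_4y / sd_1y)   # ratio should approach sqrt(4) = 2
```

Quadrupling the horizon only doubles the dispersion — the √t scaling at work.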

N(d1) >> N(d2) | high σ, r==0, S==K

N(d1) = Pr(ST > S0) , share-measure
N(d2) = Pr(ST > S0) , RN-measure

For simplicity, T = 1Y,  S= K = $1.

First, forget the formulas. Imagine a GBM stock price with high volatility and no drift. What’s the prob [terminal price exceeding initial price]? Very low. Basically, over the intervening period till maturity, most of the diffusing particles move left towards 0, so the number of particles that land beyond the initial big-bang level is very small. The “distribution” curve is squashed to the left. [1]

However, this “diffusion” and distribution curve would change dramatically when we change from RN measure to share-measure. When we change to another measure, the “probability mass” in the Distribution would shift. Here, N(d1) and N(d2) are the prob of the same event, but under different measures. The numeric values can be very different, like 84% vs 16%.

Under the share measure, the GBM has a strong drift (cf zero drift under RN, given r = 0) —

dS = σ² S dt + σ S dW

Therefore when σ√T is high, most of the diffusing particles move right and will land beyond the initial value, which leads to Pr(ST > S0) close to 100%.

— Now the formula view —
With those nice r, S, K, T values,

d1 =  σ√T /2
d2 = –σ√T /2

Remember for a standard normal distribution, if d1 and d2 are 1 and –1 (which happens when σ√T = 2, e.g. σ = 2 and T = 1), then N(d1) would be 84% and N(d2) would be 16%.
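With those nice values plugged in, the two probabilities can be computed directly:

```python
from math import erf, sqrt

def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

sigma, T = 2.0, 1.0            # high vol, with r = 0 and S = K
d1 = sigma * sqrt(T) / 2       # = 1
d2 = -sigma * sqrt(T) / 2      # = -1
print(round(norm_cdf(d1), 4), round(norm_cdf(d2), 4))  # about 0.8413 and 0.1587
```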

[1] See posts

Pr(S_T > K | S_0 > K and r==0), intuitively

The original question — “Assuming S_0 > K and r = 0, denote C := time-0 value of a binary call. What happens to C as ttl -> 0 or ttl -> infinity. Is it below or above 0.5?”

C = Pr(S_T > K), since the discounting to PV is non-issue. So let’s check out this probability. Key is the GBM and the LN bell curve.

We know the bell curve gets more squashed [1] to 0 as ttl -> infinity. However, E S_T == S_0 at all times, i.e. average distance to 0 among the diffusing particles is always equal to S_0. See

[1] together with the median. Eventually, the median will be pushed below K. Concrete illustration — S_0 = $10 and K = $4. As TTL -> inf, the median of the LN bell curve will gradually drop until it is below K. When that happens, Pr(S_T > K) falls below 0.5, and Pr(S_T > K) -> 0 as ttl -> infinity.

ttl -> 0. The particles have no time to diffuse. LN bell curve is narrow and tall, so median and mean are very close and merge into one point when ttl -> 0. That means median = mean = S_0.

By definition of the median, Pr(S_T > median) := 0.5 so Pr(S_T > S_0) = 0.5 but K is below S_0, so Pr(S_T > K) is high. When the LN bell curve is a thin tower, Pr(S_T > K) -> 100%

ITM binary call as TTL -> 0 or infinity

I was asked these questions in an exam (Assuming r = 0, and S_0 > K).

Given standard GBM dynamics of the stock, binary call price today is N(d2) i.e. risk-neutral probability of ITM.

As ttl -> 0, i.e. approaching expiry, the stock has little chance of falling below K. The binary call is virtually guaranteed ITM. So binary call price -> $1.

As ttl -> inf, my calc shows d2 -> -inf, so N(d2) -> 0. For an intuitive explanation, see
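Both limits can be verified numerically with the N(d2) formula (r = 0; the $10/$4 numbers echo the previous post’s illustration, the vol is hypothetical):

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def binary_call(S0, K, sigma, T):
    """Time-0 value of a cash-or-nothing call = N(d2), assuming r = 0."""
    d2 = (math.log(S0 / K) - 0.5 * sigma ** 2 * T) / (sigma * math.sqrt(T))
    return norm_cdf(d2)

S0, K, sigma = 10.0, 4.0, 0.3   # ITM: S_0 > K
for T in (0.01, 1.0, 100.0, 10_000.0):
    print(T, binary_call(S0, K, sigma, T))
```

The printed prices start near $1 for tiny ttl and decay towards $0 as ttl grows, matching the two limits above.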

Pr(S_T > S_0 | r==0) == 50/50@@

This is an ATM binary call with K = S_0 .
Wrong thought – given zero drift, the underlier price process is stochastic but “unbiased” or “balanced”, so underlier is equally likely to finish above K or below K. This wrong thought assumes a BM not GBM.
Actually, the dynamics is given by
    dS = σ S dW
This is a classic GBM without drift, so S_T has a lognormal distribution – like a distorted bell shape. So we can’t conclude that probability[1] is 50%. Instead,
Pr (S_T > S_0 | r=0) = N(d2) = N( –σ√T/2 )    … which is <= 50%
This probability becomes 50% only if sigma is 0 meaning the price doesn’t move at all, like a zero-interest bank account. I guess the GBM degenerates to a BM (or a frozen, timeless constant?).
[1] under risk-neutral measure

BS-F -> Black model for IR-sensitive underliers

Generalized BS-Formula assumes a constant interest rate Rgrow :
Call (at valuation time 0) := C0 = Z0 * ( F0 N(d1) – K N(d2) ), where
Z0 := P(0,T) := observed time-0 price of a T-maturity zero coupon bond. There’s no uncertainty in this price as it’s already revealed and observed. We don’t need to assume constant interest rate.
F0 := S0 exp(Rgrow * T), which also appears inside d1 and d2
Q: How does this model tally with the Black model?
A: Simply redefine
F0 := S0 / Z0 , which is the time-0 “fwd price” of the asset S
Now the pricing formula becomes the Black formula for interest-rate-sensitive options. Luckily this applies even if S is the price of 5Y junk bond (or muni bond), and we know 5Y interest rate is stochastic and changes every day.

N(d1) N(d2) – my summary

n1 := N(d1)
n2 := N(d2)
Note n1 and n2 are both about probability distributions, so they always assume some probability measure. By default, we operate under (not the physical measure but) the risk-neutral measure with the money-market account as the “if-needed, standby” numeraire. 
– n2 is the implied probability of stock finishing above strike, implied from various live prices. RN measure.
– n1 is the same probability but implied under the share-measure. Therefore,
F_0 * N(d1), i.e. the forward price times n1, would be the weighted average payoff (i.e. expected payoff) of the asset-or-nothing call, under the RN measure. Therefore,
S_0 * N(d1) would be the PV of that payoff, i.e. current price of the asset-or-nothing call. Note as soon as we talk about price, it is automatically measure-independent.
Remember n2 is between $0 and $1 so it reminds us of … the binary call. I think this is the weighted average payoff of the binary call. RN measure. Therefore,
N(d2) exp(-Rdisc T) — If we discount that weighted average payoff to Present Value, we get the current price of the binary call. Note all prices are measure-independent.
N(d1) is also the delta of the vanilla call, measure-independent. Given the call’s delta, using PCP, we work out the put’s delta (always negative) = N(d1) – 1 = –N(–d1)

The pdf N’(d1) appears in the gamma and vega formulas, measure-independent, i.e.
gamma = N’(d1) * some function of (S, t)
vega = N’(d1) * some other function of (S, t)

Notice we only put K (never S) in front of N(d2)

dynamic delta hedge – continual rebalancing

To dynamically hedge a long call position, we need to hold dC/dS amount of shares. That number is equal to the delta of the call. In practice, given an option position, people adjust its delta hedge on a daily basis, but in theory, rebalancing is a continual, non-stop process.
The call delta BS-formula makes it clear that “t” alone will change the delta even if everything else held constant. Specifically, even if S is not changing and interest rate is zero, the option delta won’t stay constant as maturity approaches, so we need to adjust the number of shares we hold.
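A quick check that delta drifts even with S frozen and r = 0. An ATM sketch with a hypothetical vol:

```python
from math import log, sqrt, erf

def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def call_delta(S, K, T, sigma, r=0.0):
    """BS delta of a European call = N(d1)."""
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    return norm_cdf(d1)

S = K = 100.0
sigma = 0.2                     # hypothetical vol; r = 0 and S frozen
for T in (1.0, 0.25, 0.01):     # only time-to-maturity shrinks
    print(T, round(call_delta(S, K, T, sigma), 4))
```

With nothing moving except the calendar, the delta slides from about 0.54 towards 0.50, so the share position must be adjusted.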

replication – 1-step binomial model, evolving to real option

Now I feel the binomial tree model is a classic analytical tool for option pricing….
The 1-step binomial scenario is simple but non-trivial. Can be mind-bending. Usually we are given 5 numbers for Sd, S0, Su, Cd, Cu, and the problem is phrased like “Use some number of stock and bond to replicate the contract C” to get the same “payout outcomes” [1].

First, ignore the Cd, Cu values. The Sd, S0, Su 3 numbers alone imply RN probabilities of the up-move and down-move.

Next, using the RNP values we can replicate ANY contract including the given C contract.

The number of shares in the replication is actually the delta-hedge of the call.

[1] “Payout outcomes” mean the contract pays Cd dollars in the down-state and Cu dollars in the up-state.
—— That’s the first knowledge pearl to internalize…————
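The pearl can be spelled out with hypothetical numbers (my own, not from any textbook), assuming r = 0 to keep the bond leg trivial:

```python
# Hypothetical 5 numbers: the three stock states and the contract's two payoffs.
Sd, S0, Su = 90.0, 100.0, 120.0
Cd, Cu = 0.0, 15.0     # e.g. a call struck at 105; r = 0

# Replication: delta shares + b dollars of bond must match both outcomes:
#   delta*Su + b = Cu   and   delta*Sd + b = Cd
delta = (Cu - Cd) / (Su - Sd)   # the delta hedge
b = Cd - delta * Sd             # bond position (negative = borrowing)
C0 = delta * S0 + b             # cost of the replicating portfolio

# Alternative route: RN probability implied by Sd, S0, Su alone (r = 0)
p = (S0 - Sd) / (Su - Sd)
C0_rn = p * Cu + (1 - p) * Cd   # RN expected payoff, no discounting at r = 0
print(delta, b, C0, C0_rn)      # the two routes give the same price
```

Note the number of shares, delta = (Cu – Cd)/(Su – Sd), is exactly the delta-hedge ratio mentioned above.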

* 1-step binomial call option 
– this option contract can be replicated with stocks + bonds. Rebalancing not necessary.
– RNP/MG is an alternative to replication
* 1-step 3-state call option
– can’t replicate with stocks + ….
– RNP non-unique
(That’s assuming the 3 outcomes don’t accidentally line up.)
* 2-step 3-state call option, i.e. allowing rebalancing
– can replicate with stocks + bonds but needs rebalancing (self-financed, of course)
– RNP/MG is an alternative to replication

* fine-grained call options — infinite steps, many states
can replicate (terminal) payout with stocks + bonds, but needs dynamic delta-hedge (self-financed of course)
– * required number of stocks = delta

greeks on the move – intuitively

When learning option valuations and greeks, people often develop quick reflexes about what-if’s. Even a non-technical person can develop some of these intuitions. Because these are quick and often intuitive, this knowledge is often more practical and useful than the math details.

Some of these observations are practically important while others are obscure.

Q3: How would all indicators of an ATM instrument move when underlier rises/falls?
QQ: What if the instrument has very low/high volatility?
QQ: What if the instrument is far/close to expiry?

Q5: How would all indicators of a deep OTM (deep ITM is rare) instrument move when underlier moves towards/from strike?
QQ: What if the instrument has very low/high volatility?
QQ: What if the instrument is far/close to expiry?

Q7: How would all indicators of a deep-OTM/ATM instrument move when sigma_imp rises/falls?
QQ: What if the instrument has very low/high volatility?
QQ: What if the instrument is far/close to expiry?

Q9: How would all indicators of a deep-OTM/ATM instrument move when approaching maturity?
QQ: What if the instrument has very low/high volatility?

“Indicators” include all greeks and option valuation. The “instrument” can be a European/American call/put/straddle.

Essential BS-M, my 2nd take

People ask me to give a short explanation of Black-Scholes Model (not BS-equ or BS-formula)…

I feel random variable problems always boil down to the (inherent) distribution, ideally in the form of a probability density function.

Back to basics. Look at the height of all the kids in a pre-school — There’s a distribution. Simplest way to describe this kind of distribution is a histogram [.8 -1m], [1-1.2m], [1.2-1.4m] … A probability distribution is a precise description of how the individual heights are “distributed” in a population.

Now consider another distribution — toss 10 fair dice at once and add up the points to a “score”. Keep tossing to get a bunch of scores and examine the distribution of scores. If we know the inherent, natural distribution of the scores, we have the best possible predictor of all future outcomes. If we get one score per day, we can then estimate how soon we are likely to hit a score above 25. We can also estimate by the 30th toss, how “surely” the cumulative score would have exceeded 44.

For most random variables in real life, the inherent distribution is not a simple math function like our little examples. Instead, practitioners work out a way to *characterize* the distribution. This is the standard route to solve random variable problems, because characterizing the underlying distribution (of the random variable) unlocks a whole lot of insights.

Above are random variables in a static context. Stock price is an evolving variable. There’s a Process. In the following paragraphs, I have mixed the random process and the random variable at the end of the process. The process has a σ and the variable (actually its value at a future time) also has a σ.
In option pricing, the original underlying Random Process Variable (RPV) is the stock price. Not easy to characterize. Instead, the pioneers picked an alternative RPV i.e. R defined as ln(Sn+1 / Sn) and managed to characterize R’s behavior. Specifically, they characterized R’s random walk using a differential equation parametrized by a σinst i.e. the instantaneous volatility [1]. This is the key parameter of the random walk or the Geometric Brownian motion.

Binomial-tree is a popular implementation of BS. B-tree models a stock price [2] as a random walker taking up/down steps every interval (say every second). To characterize the step size Sn+1 – Sn, we would want the distribution of step sizes, but that’s too hard. As an alternative, we assume R follows a standard Wiener process, so the value of R at any future time is normally distributed. But what is this distribution about?

Remember R is an observable random variable recorded at a fixed sampling frequency. Let’s denote R values at each sampling point (i.e. each step of the random walk) as  R1, R2 ,R3, R4 …. We treat each of them as independent random variables. If we record a large series of R values, we see a distribution, but this is the wrong route. We don’t want to treat time series values R1, R2 … as observations of the same random variable. Instead, imagine a computer picking an R value at each step of the random walk (like once a second). The distribution of each random pick is programmed into computer. Each pick has a distinct Normal distribution with a distinct σinst_1, σinst_2, σinst_3 …. [4]

In summary, we must analyze the underlying distribution (of S or R) to predict where S might be in the future[3].
[4] A major simplifying assumption of BS is a time-invariant  σinst which characterizes the distributions of  R at each step of the random walk. Evidence suggests the diffusion parameter σinst does vary and primarily depends on time and current stock price. The characterization of σinst as a function of time and S is a cottage industry in its own right and is the subject of skew modelling, vol surface, term structure of vol etc.

[1] All other parameters of the equation pale in significance — risk-free interest i.e. the drift etc.
[2] While S is a random walker, R is not really a random walker. See other posts.
[3] Like in the dice case, we can’t predict the value of S but we can predict the “distribution” of S after N sampling periods.

theoretical numbers ^ vol surface

After you work in the volatility field for a while, you may figure out when (and when not) to use the word “theoretical”. There’s probably no standard definition of it. I guess it basically means “according-to-BS”. It can also mean risk-neutral. All the greeks and many of the pricing formulas are theoretical.

The opposite of theoretical is typically “observed on the market”, or adjusted for skew or tail.

Now, the volatility smile, the volatility term structure and the vol surface are a departure from BS. These are empirical models, fitted against observed market quotes. Ignoring outliers among raw data, the fitted vol surface must agree with observed market prices — empirical.

most important params on a parametric vol surface #term struct

Each smile curve on a vol surface is typically described by a few parameters. The most important are
1) atmVol aka anchorVol, and
2) skew

All other curve-parameters are less important. All the curve-parameters are “calibrated” or “tuned” using market quotes.

Skew is a number and basically describes (for a given maturity) the asymmetry of the vol smile. There’s one skew number for each fitted maturity. These numbers are typically negative.

That’s the parametrization along the strike axis. How about along maturity axis? What parameters describe the term structure of vol?

I don’t know for sure, but often a parametric vol surface has a term structure parametrization for each curve-parameter. For example, there’s a term-structure for anchorVol. There’s another term structure for skew. Well, in some of the most sophisticated vol surface models, there’s no such TS parametrization. I guess in practice users didn’t find it useful.

long gamma == long realized vol@@

A rule suggested by says “Long Gamma -> Profit when realized volatility is greater than the implied volatility of the purchased option”.

That’s a bit complicated for me, but here is what I know — If you buy either a put or a call ATM and then delta hedge, you would make (unrealized) profit when realized volatility turns out to be high. If instead sigma_r is low, I guess you still make a small profit. You get 0 PnL (ignoring premiums) if sigma_r is 0.

sigma_r means realized vol; sigma_i means implied vol. says “Typically in positive Gamma trades we seek realized volatility, as the more the underlying moves the better your chances for raking in profits.”

I believe this assumes delta hedge. The scenario is that during the holding period underlier moves but your delta hedge (by a stock position) is fairly effective so whatever your option gain or loss due to its delta (say around 50%) is offset by the gain or loss of your stock position. However, the gamma contribution to your PnL would be positive.

Another observation (“rule”) is
* vega measures your exposure and sensitivity to Implied vol.
* gamma measures your exposure and sensitivity to Realized vol.

Someone said, if sigma_i goes up by 1 basis point, your PnL is equal to vega.

##conventional wisdoms in option valuation

There are many empirical rules in option math, but I feel they differ in universality and reliability.

Rule) vega of an ATM op =~ premium / implied vol
Rule) put-call equivalence in FX options. See separate blog post
Rule) PCP – “complicated” in American style, according to CFA textbook
Rule) delta(call) + delta(put) =~ 100% — See separate blog post
Rule) delta is usually between 0 and 1? Someone told me it can exceed 1 before ex-div
Rule) option valuation always decays with time
Rule) ATM delta is “very close” to 50%, regardless of expiration
Rule) delta converges with increasing vol. See separate blog post

Rule) For a strike away from predicted forward price, the OTM option has better liquidity than the ITM option. Therefore the OTM is  more useful/important to volatility estimate at that strike.

Rule) for equities, OTM put quotes show higher i-vol than OTM calls. Incidentally, at the same low strike, the OTM put is more liquid than ITM call. Reason is, most people trade OTM options only. However, ITM options are still actively traded — if an ITM option is offered at a low enough price, someone will buy; if an OTM option is bidding high enough, someone will write the option.

ln(S/K) should be compared to what yardstick@@

(update — I feel depth of OTM/ITM is defined in terms of ln(S/K) “battling” σ√ t )

Q: if you see a spot/strike ratio of 10, how deep OTM is this put? What yardstick should I use to benchmark this ratio? Yes there is indeed a yardstick.

In bond valuation, the yardstick is yield, which takes into account coupon rate, PV-discounting of each coupon, credit quality and even OAS. In volatility trading, the yardstick has to take into account sigma and time-to-maturity. In my simplified BS, there’s a constant battle between 2 entities (more obvious if you assume risk-free rate r=0)

     ln(S/K) relative to σ√ t         …………….. (1)

Fundamentally, in BS model ln(S/K) at any time into the diffusion has a normal distribution whose
stdev = σ√ t, i.e. the quoted annualized vol scaled up for t (2.5 years in our example)

Note the diffusion starts at the last realized stock price.

Q: Why is σ a variable while t and r are not?
σ is the implied vol.
σ is the anticipated vol over the remaining life of the option. If I anticipate 20%, I can put it in and get a valuation. Tomorrow, if I change my opinion and anticipate a much larger vol over the remaining life (say, 2 years) of the option, I can change this input and get a much larger valuation.

The risk free rate r has a small effect on popular, liquid options, and doesn’t fluctuate much

As to the t, it is actually ingrained in my 2 entities in (1), since my sigma is scaled up for t.
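The yardstick in (1) is a one-liner. The 2.5-year maturity echoes the example above; the other numbers are hypothetical:

```python
import math

def moneyness_in_stdevs(S, K, sigma, t):
    """ln(S/K) measured against the yardstick sigma * sqrt(t)."""
    return math.log(S / K) / (sigma * math.sqrt(t))

# Hypothetical numbers: spot/strike ratio of 10, 20% vol, 2.5 years to maturity
print(moneyness_in_stdevs(10.0, 1.0, 0.20, 2.5))   # roughly 7 stdevs deep
```

A ratio of 10 sounds dramatic, but the yardstick turns it into “about 7 standard deviations”, which is the meaningful depth-of-OTM measure.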

var-swap PnL: %%worked example

A variance swap lets you bet on “realized” variance. The exchange automatically calculates realized variance for each day, so if you bet that the total realized variance over the next 3 days will average to exceed 0.64 [1], you can buy this contract. If it turns out to be 0.7812, you earn the difference of 0.1412 times the notional, which would mean $141,200 on a million-dollar notional.

[1] which means 80% vol (annualized), or roughly 5% daily realized vol (un-annualized)

Standard var swap PnL is defined as

    (σ_r² – K) × N   ……… (1)
  N denotes the notional amount, like $1,000,000
  K denotes the strike, which is always quoted as an annualized variance

σ_r is the annualized realized vol over the n days, actually over the n–1 price relatives

  σ_r² is the annualized realized variance, calculated as
    252/(n–1) × [ ln²(S2/S1) + ln²(S3/S2) + … + ln²(Sn/Sn–1) ]
  S2 denotes the Day 2 closing price.
  ln²(S2/S1) is known as the daily realized variance, un-annualized

ln(S2/S1) is known as the daily realized vol un-annualized, or DRVol

In other words, take the n-1 values of ln(PriceRelative) and find the stdev assuming 0 mean, then annualize.

A more intuitive interpretation — take the average of the n-1 daily realized variances, then multiply by 252.

Now, trading desks often work with DRVol rather than the S2 expressions above, so there’s an equivalent PnL formula that reveals the contribution of “today’s” DRVol to a given var swap position, and also tracks the cumulative contribution of each day’s DRVol. Formula (1) becomes PnL ==

252N/(n–1) × [ (ln²(S2/S1) – K/252) + (ln²(S3/S2) – K/252) + … + (ln²(Sn/Sn–1) – K/252) ], or
N/(n–1) × [ (252·ln²(S2/S1) – K) + (252·ln²(S3/S2) – K) + … + (252·ln²(Sn/Sn–1) – K) ]
  N/(n–1) represents the notional amount allocated to each day.
  252·ln²(S2/S1) represents the annualized daily realized variance on Day 2

√252 × ln(S2/S1) represents the annualized DRVol, but is omitted from the formula to avoid clutter.

In other words, for each day, take the “spread” of (annualized) DRVar over the strike (K), multiply it by the daily notional, and you get a daily PnL “contribution”. Add up the daily contributions to get the total PnL. Here’s an example with daily notional = $4,166,666 and K = 0.09 i.e. 30% vol.

(table omitted — columns: ln PR, √252 · ln PR, spread over K, daily PnL contribution)

You can then add up the daily contributions; they sum to the same total PnL as Formula (1).
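The two equivalent PnL formulas above can be sketched in a few lines of Python. This is a minimal illustration with hypothetical closing prices, not production code:

```python
import math

def var_swap_pnl(closes, K, notional, days_per_year=252):
    """Var swap PnL via formula (1): (sigma_r^2 - K) * N, zero-mean convention."""
    logs = [math.log(b / a) for a, b in zip(closes, closes[1:])]
    n_1 = len(logs)                       # n-1 price relatives
    realized_var = days_per_year / n_1 * sum(x * x for x in logs)
    return (realized_var - K) * notional

def var_swap_pnl_daily(closes, K, notional, days_per_year=252):
    """Same PnL as a sum of daily contributions: spread of annualized DRVar over K."""
    logs = [math.log(b / a) for a, b in zip(closes, closes[1:])]
    daily_notional = notional / len(logs)
    return sum((days_per_year * x * x - K) * daily_notional for x in logs)

closes = [100.0, 104.5, 103.0, 105.1]     # hypothetical Day1..Day4 closes
pnl1 = var_swap_pnl(closes, K=0.09, notional=1_000_000)
pnl2 = var_swap_pnl_daily(closes, K=0.09, notional=1_000_000)
assert abs(pnl1 - pnl2) < 1e-4            # the two formulas agree
```

The daily form merely re-groups the terms of formula (1), so the agreement holds by construction.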

how to make volatility values ANNUALIZED

(Let's assume a flat forward curve i.e. 0 drift, 0 dividends, 0 interest.) Suppose an implied vol for a 1-year option is 20%. If we record ln(PR) i.e. log of daily price relatives until expiry, we expect 68% of the 200+ daily readings to fall between −0.2/√252 and +0.2/√252, i.e. roughly ±1.26%. That's because ln(PR) is supposed to follow a normal distribution, with the annualized 20% scaled down to a daily stdev.

Note we aren't 68% sure about the expiration underlier price i.e. S(t=T) or S(T) for short. This S(T) has a lognormal distribution [2], so no 68% rule. However, we do know something about S(T): the end-to-end ln(PR) is the sum of the ln(daily PR) terms, and by the central limit theorem the overall ln(PR) has a normal distribution with variance = sum(variance of ln(daily PR)). We always assume the individual items in the sum() are independent and identically distributed, so the variance of each ln(daily PR) is 0.04/252.

Also, since ln(overall PR) = ln[S(T)/S(0)] has a normal distribution, S(T) has a lognormal distribution. That's the reason for [2].

To answer any option pricing question, we invariably need to convert quoted, annualized vol to what I call raw-sigma or stdev.

Rule #1 — we assume one-period variance will persist to 2 periods, 3 periods, 4 periods… (eg: a year consists of 12 one-month periods.)

Example 1: If one-year variance is 0.04, then a four-year raw-variance would be .04 * 48/12 = .16. The corresponding stdev i.e. raw-sigma would be 40%. This value is what goes into BS equation to price options with this maturity.

Example 2: If one-year variance is 0.04, then a three-month raw-variance would be .04 *3/12 = .01. The stdev i.e. raw-sigma would be sqrt(.01) = 10%. This value is what goes into BS equation to price options with this maturity. By Rule #1, we assume the same 3-month variance would persist in 2nd 3-month, the 3rd 3-month and 4th 3-month periods.
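Rule #1 is just linear scaling of variance with the holding period. A tiny sketch reproducing both examples:

```python
import math

def raw_sigma(annual_var, years):
    """Scale one-year variance to an arbitrary holding period (Rule #1),
    then take the square root to get the raw sigma / stdev."""
    return math.sqrt(annual_var * years)

assert abs(raw_sigma(0.04, 4.0) - 0.40) < 1e-12   # Example 1: 4-year raw sigma
assert abs(raw_sigma(0.04, 0.25) - 0.10) < 1e-12  # Example 2: 3-month raw sigma
```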

underlier terminal spot-price pdf –> option value

In any analysis of derivative valuation, we are interested in the possible valuationS of a security at a given time. Suppose an IBM $190 option expires 22 Dec 2014; we want to know something about the possible price level on that day. We use a random variable ST to denote S(t=T), i.e. the underlier price at time=T. ST might be 180, 200, or 230 or whatever. (Actually IBM is quoted to 2 decimal places;-) However, as a continuous random variable, ST can be any value between 0 and 10x current price, or higher.

To keep things simple, we first look at the likelihood of ST falling below 150, between 150~200, 200~250, and beyond 250. By intuition, the probabilities of hitting these 4 “buckets” or ranges must add up to 100%.

That's too coarse. Let's divide into $1 buckets from 0 to $2000. We end up with 2000+1 ranges (including a special "above $2000" bucket). Say our smart computer model estimates the chance of ST falling into $200-$201 is 5 bps or 0.05%. So we draw over the 200-201 range a vertical bar of height=5, in units of 0.0001 probability per dollar. Suppose the 201-202 probability is 3 bps; we draw a bar of that height. Iterating over our 2001 ranges, we get 2001 bars. Total area of the bars adds to 1.0 [1]. Your first histogram! When the range size becomes infinitesimal, the histogram becomes a pdf curve — the beautiful lognormal bell curve.

Other posts in this blog discuss how to derive the exact pdf (prob density function) of the random variable ST from the basic assumption

\frac{dS}{S} = \mu \,dt+\sigma \,dW\,

Once we have a pdf of ST, the current value of a European call (before expiration) is tractable. Since the terminal value is a hockey-stick payoff function, we multiply the pdf by a piecewise-linear function and find the area under the curve. This is abstract. Let's use a histogram to illustrate.

Suppose our smart computer simulates 10,000 trials. 5 outcomes should fall into 200-201. Payoff = $200.5-$190 = $10.5. Similarly, 3 outcomes fall into 201-202, with payoff =$11.5. Roughly half the outcomes probably fall below $190 — worthless. If we compute the average payoff, we might get something like $11.11. This depends on the sigma used in the 10,000 simulations and time to expiry. We assume 0 dividend and 0 drift.

[1] In fact, the bar of 5 consists of 5 minibars, each 0.0001 wide and 1.0 long. There are exactly 10,000 minibars in the histogram representing 10,000 trials.
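The simulation described above can be sketched as a toy Monte Carlo, with zero drift and zero rates as assumed in the text; the $190 strike and 20% sigma are illustrative:

```python
import math, random

def mc_euro_call(S0, K, sigma, T, n_trials=10_000, seed=42):
    """Average the hockey-stick payoff over simulated terminal prices.
    Zero drift / zero rates, as in the text; ST is lognormal."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        z = rng.gauss(0.0, 1.0)
        ST = S0 * math.exp(-0.5 * sigma * sigma * T + sigma * math.sqrt(T) * z)
        total += max(ST - K, 0.0)
    return total / n_trials

# ATM-ish example: the average payoff lands near the BS value ~15
est = mc_euro_call(S0=190.0, K=190.0, sigma=0.2, T=1.0)
assert 12.0 < est < 19.0
```

Each trial is one "minibar" in the histogram picture; averaging the payoff over trials is the discrete version of integrating payoff × pdf.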

binomial tree: why identical diamonds

The standard CRR btree is always drawn with all straight lines, equally spaced vertically and equally spaced horizontally. Therefore you always see nothing but a strict pattern of identical diamonds. Let’s zoom into this “geometry”.

First, let’s set the stage for the discussion. In this conceptual “world”, a price (say IBM) can only be observed/sampled at periodic discrete moments, either once a second, or once a day, though the interval should be small relative to time to maturity. Price may change mid-interval, but we can’t observe that. Further, during each interval, the price either moves up or down. It can remain unchanged only in a trinomial tree — not popular in industry.

Why diamonds? Because the tree is interlocking/recombinant. See

Why equally spaced horizontally? Because the intervals are fixed and constant — at each clock tick, the variable must either rise or fall, never stay flat like a trinomial.

Why equally spaced vertically? Because the y-axis is log(price), an interesting feature of the CRR btree. If after n-1 intervals you plot the n price values in a bar chart, they don't fit a straight line — but try plotting log(price).

Why are all the diamonds identical? Because the nodes are equally spaced both vertically and horizontally.

I feel the regularity is a great simplification and helps us focus on the real issue — the probability of an upswing at each node — the transition probability function, which is individually determined at each node position.

option valuations – a few more intuitions

It’s quite useful to develop a feel for how much option valuation moves when underlier spot doubles or halves. Also, what if implied vol doubles or halves? What if TTL (time to expiration) halves?

For OTM / ITM / any option, annualized i-vol multiplied by √TTL is the raw vol that matters. For example, if you double the vol and halve the TTL twice (i.e. quarter it), valuation remains unchanged, since 2σ·√(T/4) = σ√T.

If you compare a call vs a put with identical strike/expiry (E or A style), the ITM instrument and the OTM instrument have identical time value. Their valuations differ by exactly the intrinsic value of the ITM instrument. (See  — Consistent with the European option's PCP, but to my surprise, American style also shows this exact relationship. I guess it's because the put valuation is computed from a synthetic put.)

For ATM options, theoretical option valuation is proportional to vol and to √TTL, i.e. the square root of time-to-live. and other calculators show that
– when you change the vol number, valuation changes linearly
– when you double TTL while holding vol constant, valuation grows by a factor of √2, i.e. square-root growth, not linear.

For OTM options? non-linear

For ITM options, it’s approximately the OTM valuation + intrinsic value.
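The σ√TTL intuition can be checked against a plain BS pricer (a sketch, zero rates assumed): ATM valuation depends on vol and TTL only through σ√T, so doubling vol while quartering TTL leaves it unchanged, and doubling TTL scales it by roughly √2.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, K, sigma, T, r=0.0):
    st = sigma * math.sqrt(T)
    d1 = (math.log(S / K) + r * T + 0.5 * st * st) / st
    return S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d1 - st)

# ATM value depends only on sigma * sqrt(T): double vol, quarter the TTL
v1 = bs_call(100, 100, 0.20, 1.00)
v2 = bs_call(100, 100, 0.40, 0.25)
assert abs(v1 - v2) < 1e-12
# Doubling TTL scales the ATM value by ~sqrt(2), i.e. square-root growth
assert abs(bs_call(100, 100, 0.2, 2.0) / v1 - math.sqrt(2)) < 0.01
```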

hist-vol calc using price relatives – right vs wrong ways #Piroz

(See also post on var swap PnL)

You asked me how historical volatility is computed from daily closing prices. There’s an over-simplification in my answer.

Suppose we have daily closing prices

Day 1) p1 = $30.00
Day 2) p2 = $34.50
Day 3) p3 =…
Day 4) p4
Day 5) p5

First compute p2/p1 as PR2 (price relative over day 2), p3/p2 as PR3… p5/p4 as PR5. I was CORRECT here.

Then we SHOULD compute the Natural log of PR2, PR3, PR4 and PR5. These natural logs are known as “continuously-compounded-rate-of-return” or un-annualized “daily-realized-vol”, and are denoted r2, r3, r4, r5. I missed this step.
These r2, r3, r4, r5 look like low percentages like 8.2%, 3.1%, 4%, 15%…
Realized volatility is defined as standard deviation of these percentages, usually assuming zero mean. I was right at this step.
Since the r values look like percentages, so does their stdev.

There’s also a capital “R” denoting (PriceRelative-100%). Since PR2 = 115%, so R2 = 15%. R is widely known as “Return”, return on investment, or investment return etc.

R2 = 15% is always slightly above r2 = ln(1.15) = 13.9762%. For small price movements, R is a very close approximation of r.
My oversimplification was to misuse R in place of r.

I have since verified my calc with an ex-quant and developers at a volatility trading desk.
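The corrected procedure (PR, then natural log, then zero-mean stdev, then annualize) in a short sketch. The Day 3–5 closes are made-up, since the text leaves them blank:

```python
import math

def hist_vol(closes, days_per_year=252):
    """Annualized historical vol from daily closes: ln(price relative),
    stdev assuming zero mean, then annualize by sqrt(252)."""
    r = [math.log(b / a) for a, b in zip(closes, closes[1:])]
    raw = math.sqrt(sum(x * x for x in r) / len(r))   # zero-mean stdev
    return raw * math.sqrt(days_per_year)

closes = [30.00, 34.50, 33.10, 33.90, 34.20]   # Day 1-2 from the text; rest hypothetical
vol = hist_vol(closes)

r2 = math.log(34.50 / 30.00)       # lowercase r: log of price relative
R2 = 34.50 / 30.00 - 1.0           # capital R: simple return, PR - 100%
assert abs(r2 - 0.139762) < 1e-6   # ln(115%) ~= 13.9762%
assert R2 > r2                     # R is always slightly above r
```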

if given binary + asset-or-nothing call prices…

There’s something intuitive (and therefore important) in this scenario –

Suppose we know the current prices of
– binary call (B) on an asset like IBM stock
– asset-or-nothing call (denoted A) on that asset

Q: What can we say about the vanilla call current price?

First, remember binary call must be worth between $0 and $1 before maturity, since the
payout is either $0 or $1. In contrast, asset-or-nothing call has current value comparable to current asset price.

It’s useful to use concrete numbers, like K=$15, exp(-rT) = 0.8 …

Consider a portfolio { A – K*B } i.e.
++ long 1 unit of asset-or-nothing call
— short K (i.e. 15) units of the binary call. Note even though there's a discount factor of 0.8 from maturity to today, we still go short K units, not K×0.8 units.

If both expire worthless … easy. If both are exercised (i.e. ST > K), we end up with 1 unit of the asset, and we owe someone $15 — exactly the vanilla call payoff.

In a way, the (current prices of) A and B reveal the crucial N(d1) and N(d2) values as of today.

This replicating portfolio needs no rebalancing.
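Under BS the current prices are A = S·N(d1) and B = exp(-rT)·N(d2), so the portfolio A − K·B prices the vanilla call exactly. A quick sketch with illustrative numbers:

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def d1_d2(S, K, r, sigma, T):
    st = sigma * math.sqrt(T)
    d1 = (math.log(S / K) + (r + 0.5 * sigma * sigma) * T) / st
    return d1, d1 - st

S, K, r, sigma, T = 20.0, 15.0, 0.05, 0.3, 1.0   # illustrative parameters
d1, d2 = d1_d2(S, K, r, sigma, T)
A = S * norm_cdf(d1)                       # asset-or-nothing call
B = math.exp(-r * T) * norm_cdf(d2)        # binary (cash-or-nothing) call, $1 payout
vanilla = S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)
assert abs((A - K * B) - vanilla) < 1e-9   # static replication: C = A - K*B
assert 0.0 < B < 1.0                       # binary call worth between $0 and $1
```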

recombinant binomial tree, price relative and lognormal

I find the recombinant/interlock concept a nice simplification. I was told that in practice all pricing btrees are interlocked — much easier, without loss of generality. Wikipedia says

“The CRR method ensures that the tree is *recombinant*, i.e. if the underlying asset moves up and then down (u,d), the price will be the same as if it had moved down and then up (d,u) — here the two paths merge or recombine.”

Often (u,d) will 100% cancel each other. At each tiny step, either

newPrice = oldPrice * u (typically 1.0101010101010101) or
newPrice = oldPrice * d (typically 0.99)
u * d == 1

Note the standard definition of u and d [[CFA]] as __PriceRelative__ of day2closing/day1closing. (Useful in h-vol…)

For a simple example, u = 2, so our underlying price either doubles or halves at each step. Same result if it doubles-then-halves, or halves-then-doubles. Consistent with the lognormal model…

Warning: many texts use illustrations of u=1.01000000000, d=0.99, which violates lognormal assumption but still recombinant…

From step to step (or tree to tree), the multiply factor u can differ. In other words, the factor could be 2.0 or 1.01 or 1.002… usually just above 1 if the step is small.

More precisely, u is computed using the underlying volatility, σ, and the time duration of a step. In a complete model, the pdf of random variable “u” can be derived.

Q: in industry, are most binomial trees recombinant?
%%A: I think so.
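A minimal sketch of the recombinant property, using the text's u = 1/d with d = 0.99:

```python
# A recombinant (CRR-style) tree with u*d == 1: after n steps there are only
# n+1 distinct price levels, and log(price) sits on an evenly spaced lattice.
d = 0.99
u = 1 / d                                  # ~1.0101010101..., as in the text
S0 = 100.0
assert abs(u * d - 1.0) < 1e-12

up_down = S0 * u * d                       # (u,d) path
down_up = S0 * d * u                       # (d,u) path
assert abs(up_down - down_up) < 1e-9       # the two paths recombine

n = 10
levels = sorted({S0 * u**k * d**(n - k) for k in range(n + 1)})
assert len(levels) == n + 1                # n+1 nodes, not 2**n leaves
```

The n+1 vs 2^n node count is exactly why practitioners insist on recombinant trees.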

upper bounds on vanilla call/put prices

Background — needed in many simple quant quizzes or appetizers. There are lots of intuitions involved.

— The easy part — lower bounds —

call – the larger of $0 and the fwd contract's price i.e. S_now – K*exp(-rT)
put – the larger of $0 and the short fwd i.e. K*exp(-rT) – S_now

— Now the upper bounds —

put ~ K*exp(-rT). Consider a super-replicating portfolio of {K zero-coupon bonds}.
call ~ S_now i.e. the current stock price itself. Consider a super-replicating portfolio of {1 share}. At expiration the stock's value dominates the call's payoff, ditto before expiration.
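These bounds can be sanity-checked against a BS price (parameters illustrative):

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call_put(S, K, r, sigma, T):
    st = sigma * math.sqrt(T)
    d1 = (math.log(S / K) + (r + 0.5 * sigma * sigma) * T) / st
    d2 = d1 - st
    call = S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)
    put = K * math.exp(-r * T) * norm_cdf(-d2) - S * norm_cdf(-d1)
    return call, put

S, K, r, sigma, T = 100.0, 95.0, 0.03, 0.25, 1.0
call, put = bs_call_put(S, K, r, sigma, T)
disc_K = K * math.exp(-r * T)
assert max(0.0, S - disc_K) <= call <= S        # fwd price <= call <= spot
assert max(0.0, disc_K - S) <= put <= disc_K    # short fwd <= put <= PV of strike
```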

N(d2) interpreted as a (Risk Neutral) probability

Under RN measure, underlier price process is a GBM

  dS = r S dt + σ S dW
denoting L := log S, Ito's lemma gives
  dL = ( r – ½σ² ) dt + σ dW         …… BM, not GBM. No "L" on the RHS !
Therefore, underlier terminal price has a lognormal distribution

  log ST ~ N( log S0 + T(r – ½σ²),  Tσ² )   i.e. N(mean, std²)
Now, Pr(ST > K) simply translates to Pr(log ST > log K)  …. normal distro math! It's now straightforward to standardize the condition:
  Pr( (log ST – mean)/std > (log K – mean)/std )
= Pr( z > [log K – log S0 – T(r – ½σ²)] / (σ√T) ) ….. which by definition is
= 1 – N( [log K – log S0 – T(r – ½σ²)] / (σ√T) )
Now be careful. 1 – N(a) = N(–a), so that Pr() becomes
N( [log S0 – log K + T(r – ½σ²)] / (σ√T) ) = N(d2)
For d1, recognize that under the share measure,
  dS = ( r + σ² ) S dt + σ S dW
  dL = ( r + σ² – ½σ² ) dt + σ dW   … simplifying to
  dL = ( r + ½σ² ) dt + σ dW        … differs from the earlier (RN measure) formula only in the "+" sign

  log ST ~ N( log S0 + T(r + ½σ²),  Tσ² ) …… notice the "+" sign in front of ½σ²
Therefore under the share measure, Pr(ST > K) = N(d1)
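A Monte Carlo sanity check of Pr(ST > K) = N(d2) under the risk-neutral measure (illustrative parameters):

```python
import math, random

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

S0, K, r, sigma, T = 100.0, 110.0, 0.05, 0.2, 1.0
d2 = (math.log(S0 / K) + (r - 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))

# simulate log ST ~ N(log S0 + T(r - 0.5 sigma^2), T sigma^2), count ST > K
rng = random.Random(0)
n = 200_000
hits = sum(
    S0 * math.exp((r - 0.5 * sigma**2) * T + sigma * math.sqrt(T) * rng.gauss(0, 1)) > K
    for _ in range(n)
)
assert abs(hits / n - norm_cdf(d2)) < 0.01   # MC frequency matches N(d2)
```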

BS-M vs naive model on stock price changes

Q: Say IBM closes at $100 today (S0). What’s the “distribution” of tomorrow’s close S1?

First, note S1 is a random variable and has a distribution (or histogram if you simulate 1000 times).

The naive model says it’s equally likely to go up 20% or down 20%. BS says it’s equally likely to go up 25% or down 20%, because

   log(1+25%) and log(1-20%) have identical magnitudes with opposite signs.

BS basically says log(price relative) is normally distributed.

Obvious flaw of the naive model — it assumes IBM is equally likely to go up 200% or down 200%, i.e. to a negative price!

Many people say BS assumes “underlier return is normal” but all in-depth articles say “log of *** is normally distributed”. P402 of CFA textbook on stats says “continuously compounded return” is defined as log(price relative).
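A two-line check of the up-25% / down-20% symmetry:

```python
import math

# BS symmetry: up 25% and down 20% are equally likely moves, because
# their log price-relatives are equal and opposite...
assert abs(math.log(1.25) + math.log(0.80)) < 1e-12
# ...equivalently, the price relatives 1.25 and 0.80 are reciprocals
assert abs(1.25 * 0.80 - 1.0) < 1e-12
```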

constant-vol assumption ^ varying daily realized vols

As stated in , people expect daily realized vol (DRVol) to fluctuate over the next 3 months (or any given period): Monday 9%, Tue 17%, Wed 8.9%… However, we also know the BS diffusion equation assumes a constant vol. Any contradiction?

Well, BS is not so naive as to assume every single day’s ln(PriceRelative) == the same value throughout the life of an option. That would be a deterministic asset price model. Such a model would absolutely predict the exact IBM closing price tomorrow. It’s clearly unrealistic — no one can predict the exact IBM closing price tomorrow.

No, BS is all about diffusion/randomness, so the exact price on any (future) day is random (even though the price now is realized and known). That means BS can't predict the exact value of ln(PriceRelative), which is the to-be-realized vol. Even in the constant-vol model, this ln() could be 10% tomorrow and 20% the next day (annualized vols). Such day-to-day variation is perfectly legitimate in a constant-vol model.

Q: So What is the constant-vol assumption by BS?
A: the sigma for a given stock, once calibrated using historical data, is assumed to permanently characterize the diffusion or the random walk or the geometric Brownian motion. So even though BS can’t predict the exact value of ln(PR) today vs yesterday closing, BS does predict the Distribution of that ln() value. It treats the ln() as a random variable following a precise Normal distribution. Consequently, today closing price is another random variable following a precise LOGnormal distribution.

To a layman, this is revolutionary thinking. A layman like me tries to predict today’s closing price. Knowing how hard it is, we try to draw a narrow band of the closing. BS is smarter in treating that unknown closing just like a temperature, and predicting its probability distribution instead of its exact value.

My simplified form of BS-F

At any time before expiry, the fair premium of a European call on a non-dividend-paying stock is C(S,t) below, where
    S = spot price at valuation date such as today
    t = time to expiry, at valuation date. This value is measured in years. If now our option is 2 years 6 months from maturity then t=2.5 at valuation date
    K and r are all constants
    sigma is also assumed constant — see below
    N() is the cumulative normal distribution function. I believe N(0) = 50%, N(g) + N(-g) = 1.0 and N() is monotonic increasing
Now, in the BS formula, sigma is treated as a constant — the well-known and unrealistic constant-volatility assumption. If I scale sigma up for t=2.5 years (our example) and denote the scaled value "sigma_t" or σ_t = σ√t, then
d_1=\frac{\ln\frac{S}{K}+rt+  \frac{\sigma_t^{2}}{2}  } {\sigma_t}
d_2=\frac{\ln(\frac{S}{K})+rt-\frac{\sigma_t^{2}}{2}}{\sigma_t} = d_{1}-\sigma_t
C(S,t)=N(d_1)~S-N(d_2)~K e^{-rt}\, ……………(same as before)

Let’s try to understand parts of this monster

Q2: why the (…) (…)
A: that comes from the simple fact that at expiration (not now), the terminal valuation (I didn’t say “PnL”) is in the form of “stock price at expiry – strike”

Q2b: what’s the implication on delta?
%%A: Well, strictly both N() terms depend on the spot price, but their derivative contributions happen to cancel, so delta = N(d1) — as if the part after the "-" stayed unchanged when we simulate a tiny change in S.

Q: for a deep ITM call, how is this simplified?
A: ln(S/K)/sigma_t dominates in both d1 and d2, so d1 and d2 are approximately equal and N ~= 1.0. So
C ~= S-K*exp(-rt). In other words the European call valuation is mostly its intrinsic value.

Q: for a deep OTM call with small rt i.e. small drift?
A: d1 and d2 are approximately equal, both extremely negative, dominated by ln(S/K), so both N() terms are minuscule and C is close to zero. (Setting d1 exactly equal to d2 would make C slightly negative, so in the far tail the small difference between d1 and d2 still matters.)

Q: what’s the exp(-rt)?
A: simple. Discounting the strike price to present value. I believe for t below 2 years (listed options) this factor is close to 1.0 and has a minor effect. However if you ignore it a profitable deal can become unprofitable.

Estimating delta, vega, gamma and theta all require differentiating through the normal distribution.
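The formula above, with sigma_t = σ√t, in a short sketch; the deep-ITM check mirrors the Q&A (parameters illustrative):

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, K, r, t, sigma):
    st = sigma * math.sqrt(t)                  # the scaled-up "sigma_t"
    d1 = (math.log(S / K) + r * t + 0.5 * st * st) / st
    d2 = d1 - st
    return norm_cdf(d1) * S - norm_cdf(d2) * K * math.exp(-r * t)

# deep ITM: N(d1) ~= N(d2) ~= 1, so valuation is mostly S - K*exp(-rt)
S, K, r, t, sigma = 300.0, 100.0, 0.02, 0.5, 0.2
c = bs_call(S, K, r, t, sigma)
assert abs(c - (S - K * math.exp(-r * t))) < 0.01
```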


American vs European call option valuation

Hey Hai Tao,

My CFA textbook had a conclusion I don't believe.

Say there is a Microsoft stock call option expiring 7/1/2011, X = $20, in-the-money, American style.
Say there is an identical Microsoft call but European style.

Assumption: the underlying stock makes no dividend or other cash payment before expiration.

Under this assumption, textbook says both call options are worth the same.

Earlier the same author said that an American option is worth at least as much as a European-style option, but under this Assumption he claims they have equal valuation.

As a Layman, my intuition tells me the American option is more valuable. Suppose my analysis tells me Microsoft might drop to $19, then the American option lets me pay $2000 today for 100 shares and sell today at $2467, earning a profit of $467. The European option may expire out of the money and worthless.

Do you agree?

forward volatility — from basic principle

See also — variance is additive. More specifically, for n independent experiments with outcomes having var1, var2, … var_n, the sum of the outcomes has variance var1+var2+…+var_n. Incidentally, the sum of the outcomes has mean mean1+mean2+…+mean_n. A random walker makes one "experiment" at each step, and log(PR) is cumulative.
One basic assumption is: "Given that the underlying random variables for non-overlapping time intervals are independent, the variance is additive."

Q: What's that random variable? I believe it's the "r" i.e. log(PR) described earlier. This variable "r" is additive in itself. Intuitively, if over 4 days gold price moves by ratios of 120% -> 101% -> 97% -> 99%, then we can ADD r2, r3, r4, r5 to get the log of the overall Price-Relative of Day 5 over Day 1 closing.

Here r1-2 represents log($Day2ClosingPrice / $Day1ClosingPrice) = log(PR over Day 2), which is another label for the earlier r2.

It turns out this sum is the variance of log(PR1-5), i.e. the variance of r1-5 over the 5 consecutive observations. Since the random variables r1-2, r2-3, r3-4, r4-5 follow the same pdf, each variance is numerically identical.
==> variance over 96 hours and 5 observations (Price1 to Price5) is exactly 4 times the daily variance
==> If we assume 256 trading days in a year, then annual variance is 256 times daily variance
==> annualized vol is 16 times daily vol. If annualized vol is 80%, then log(PRdaily) has stdev = 5%…..

The sum in [1] is variance over Day 1-5. Forward variance over 2-5 can be derived from the 4 individual variance numbers, or from….

However, it’s unfair to compare 2-5 fwd variance against “spot” variance of 1-5 when the holding periods are unequal. Therefore, all the variance numbers must be annualized for a fair comparison.

Basic statistics rule — if Y = X_a + X_b, i.e. 2 independent normal variables, then the variance of Y is sum of the 2 variances.
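Additivity in a tiny sketch, using the 5%-daily / 80%-annualized numbers and the 256-trading-day convention from this section:

```python
import math

# Variance is additive over non-overlapping intervals, so the forward
# variance over Day 2..5 is the 1..5 variance minus the Day 1..2 variance.
daily_var = 0.0025            # 5% daily vol, un-annualized, squared
var_1_5 = 4 * daily_var       # 4 price relatives between the 5 observations
var_1_2 = 1 * daily_var
fwd_var_2_5 = var_1_5 - var_1_2
assert abs(fwd_var_2_5 - 3 * daily_var) < 1e-15

# annualize (256 trading days) before comparing unequal holding periods
ann_vol = math.sqrt(256 * daily_var)
assert abs(ann_vol - 0.80) < 1e-12     # 16 x daily vol: 16 * 5% = 80%
```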

timespan^datetime ] BS and other pricing formulas

In many BS formulas, "t" looks like a number pinning down a particular point in time, like "3 months before maturity", but a point in time can't be represented by a bare number. In most cases the "t" variable actually represents a timespan. I borrowed a networking term — Time-To-Live or TTL.

C# has these 2 concepts well separated: DateTime is a point in time, whereas TimeSpan is the distance between 2 DateTimes.

When a pricing formula mentions "a function of time", it's really a function of distance in time, i.e. a function of timespan (to the maturity datetime).

The “t” is usually a floating point number measured in years — clearly a timespan.

what 80% volatility means

( has the precise math definition of Realized Variance )

We need to differentiate between i-vol vs h-vol …..

Q: what does 24% volatility mean for an option expiring in 3 months?
A: it means the stdev for an observable is —-> 24% × √(3m/12m) = 24% × 1/2 = 12%.
See P277 [[Complete guide ]]

Now let’s define that observable. If today’s closing price is $100, and closing price in 3 months is X, then (X-100)/100 is the observable.

Therefore, 3-month forward price is likely (two-thirds likelihood) to fall between ±12% of current price, or between $88 and $112. Here we ignore interest rate and dividend.

Now forget about options. Consider a stock like IBM.

Q: what does a vol of 80% mean for IBM?
A: see P76 [[Options Vol trading]]. An average day will see a 5% move (80%/√252 ≈ 5%). More precisely, 66.666% of the days will see moves under 5%; 33.333% of the days will see wider/wilder moves.

Q: a longer example — what does 25% volatility mean for IBM’s closing prices tomorrow vs today?
A: it means the stdev for IBM daily percentage return is 25% * √ 1 / 252days = 25% / 15.9 = 1.57% [1]

A longer answer: Take the underlier’s daily closing price for today ($100) vs tomorrow (X). The daily percentage return (X-100)/100  could be 1%, -2%, 0.5% .., but for clarity I’d rather compute PriceRelative to get 1.01, 0.98, 1.005…

Now if we simulate 1000 times and plot a histogram of daily returns defined as log(PR), then we get a mean — most likely very close to 0 — and the histogram is likely to resemble a normal curve. If today closes at $100, and if IBM does follow an annualized vol of 25%, then tomorrow's close is likely (two-thirds likelihood) to fall within $98.43 to $101.57. Note the numbers are imprecise: assuming IBM is equally likely to lose 1.57% or gain 1.57% is the naive model, which pushed further would predict "equally likely to gain 101% or lose 101%", going negative. The BS model instead assumes log(PR) is normal, so a 1.57% gain is as likely as a decline to 98.454% (= 1/1.0157).

IF (big IF) IBM continues to *follow* the 25% annualized vol, and if we observe its daily PR for 5 years, we will see that most of the time (two-thirds of the time) daily PR falls somewhere between ±1.57%, using the naive model. See P34 [[Trading Options in Turbulent Markets]]

[1] There are 252 trading days in a year. Our 25% vol is an annualized vol.
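The annualized-to-daily conversion in a short sketch, reproducing the 1.57% and 5% numbers:

```python
import math

def period_stdev(annual_vol, days, days_per_year=252):
    """Un-annualized stdev of log(PR) over `days` trading days."""
    return annual_vol * math.sqrt(days / days_per_year)

# 25% annualized vol -> ~1.57% one-day stdev
assert abs(period_stdev(0.25, 1) - 0.0157) < 1e-4
# 80% annualized vol -> ~5% average daily move
assert abs(period_stdev(0.80, 1) - 0.05) < 2e-3
# the two-thirds band for tomorrow's close, starting from $100
lo, hi = 100 * math.exp(-0.0157), 100 * math.exp(0.0157)
assert 98.4 < lo < 98.5 and 101.5 < hi < 101.6
```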

interpolating on vol surface between tenors #worked eg

Q: how do you query the vol surface at a fitted strike but between 2 fitted maturities — Jun and next Dec, assuming today is Jan 1.

First take sigma_J² and sigma_D². Say we get something like 20%² and 30%². Remember these are annualized sigmas. Suppose those maturities are 6 months and 24 months out. Raw variance values would be

Variance_J = (20%^2)* 6m/12m = .02
Variance_D = (30%^2)* 24m/12m = .18

Our assumption is that raw (total) variance is linear in TTL, so let's line up our raw variance values

6 months to expiry –> .02
15 months to expiry -> x
24 months to expiry -> .18

==> x = .10 (not annualized)

Annualized variance_x == x /(15/12) = .08
Annualized sigma_x = 28.28%

This estimate is better than a naïve linear interpolation like

6 -> 20%
15 -> ?????? — 25%
24 -> 30%
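The worked example as a small function: interpolate raw variance linearly in TTL, then re-annualize:

```python
import math

def interp_vol(t1, vol1, t2, vol2, t):
    """Interpolate total (raw) variance linearly in time-to-expiry,
    then convert back to an annualized vol."""
    v1 = vol1**2 * t1                 # raw variance at each fitted tenor
    v2 = vol2**2 * t2
    v = v1 + (v2 - v1) * (t - t1) / (t2 - t1)
    return math.sqrt(v / t)

# worked example from the text: 20% @ 6m, 30% @ 24m, query at 15m
sigma_x = interp_vol(0.5, 0.20, 2.0, 0.30, 1.25)
assert abs(sigma_x - 0.2828) < 1e-3   # ~28.28%, not the naive 25%
```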

gamma measures non-linearity of curve – valuation-vs-spot

(Similar indications —
  risk-reversal quote measures skewness of the smile curve
  strangle quote measures convexity of the smile curve)

Very roughly, gamma is an indication of the non-linearity of an instrument's valuation wrt underlier price. All linear instruments have 0 gamma. Every security has a well-defined delta (possibly zero), but only non-linear instruments have non-zero gamma.

Sound bites —

Short put/call have negative gamma.
Long put/call have positive gamma, i.e. at higher [1] spot, call delta gets more positive and put delta gets less negative.
* Further, at the same strike/TTL, a call and a put share identical gamma. The reason is simple — their deltas always differ by exactly 100% (in absolute terms they add up to 100%), so differentiating once more wrt spot gives the same gamma.

[1] let’s avoid “growing” or “increasing” as these words imply variation over time .
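Gamma as curvature can be estimated numerically: a central finite difference on a BS pricer (a sketch; the put is built via put-call parity, parameters illustrative) confirming the sound bites:

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, K, r, sigma, T):
    st = sigma * math.sqrt(T)
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / st
    return S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d1 - st)

def bs_put(S, K, r, sigma, T):          # via put-call parity
    return bs_call(S, K, r, sigma, T) - S + K * math.exp(-r * T)

def gamma_fd(price_fn, S, h=0.01, **kw):
    """Second derivative of valuation wrt spot, by central finite difference."""
    return (price_fn(S + h, **kw) - 2 * price_fn(S, **kw) + price_fn(S - h, **kw)) / h**2

kw = dict(K=100.0, r=0.03, sigma=0.25, T=0.5)
gc = gamma_fd(bs_call, 100.0, **kw)
gp = gamma_fd(bs_put, 100.0, **kw)
assert gc > 0 and gp > 0                # long call/put: positive gamma
assert abs(gc - gp) < 1e-6              # same strike/TTL: identical gamma
```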

BS-Model – my first take, in simple language

— Here are some pointers to myself —

Price relative
log(PR) (known as r) is normally distributed.
random walk — Brownian motion
BS started with a (diffusion) differential equation describing how instantaneous stock movement depends on exponential drift + geometric Brownian motion
** parametrized with sigma and r, assumed constant.

assumption : constant vol assumption — biggest shortcoming
** no skew no term structure
**** seriously underestimates vol at low strikes – tail risk
** due to this shortcoming, BS is good for price quoting/inversion only, not valuations
assumption : zero-dividend assumption — later addressed by Merton

Applies to European style only
Applies to stocks only, not FX with 2 interest rates

Requires integration of the normal pdf, so the valuation formula is based on the N() function i.e. the cumulative distribution function