risk-neutral means..illustrated by CIP

Background — all of my valuation procedures are subjective, like valuing a property, an oil field, a commodity …

Risk-Neutral has always been confusing, vague, abstract to me. CIP ^ UIP, based on Mark Hendricks’ notes, has an illustration:

  • RN means .. regardless of individuals’ risk profiles … therefore objective
  • RN means .. partially [1] backed by arbitrage arguments, but often theoretical
    • [1] partially can mean 30% or 80%
    • If it’s well supported by arbitrage argument, then replication becomes theoretical foundation of RN pricing



2 reasons: BM is poor model for bond price

Reason 1: terminal value is known. It’s more than deterministic. It’s exactly $100 at maturity. Brownian Motion doesn’t match that.

Reason 2: the drift estimate is too hard and too sensitive. A BM process has a drift value. You can be very careful and very thorough in estimating it, but any minor change in the drift estimate would result in very large differences in the price evolution, if the bond’s lifespan is longer than 10Y.


Q: bond price change when yield goes to zero

Can a bond yield become negative? Yes. In 2015-2017 many bonds traded at negative yields. https://www.ft.com/content/312f0a8c-0094-11e6-ac98-3c15a1aa2e62 shows a realistic example of a vanilla bond trading at $104. Its yield is negative: you pay $104 now and will get a $100 repayment, so you are guaranteed to lose money if you hold to maturity.

Mathematically, when yield approaches negative 100% (with annual compounding), price goes to infinity.

When yield approaches zero, bond price would go to the arithmetic sum of all coupons + repayment.
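A minimal sketch of both limits, assuming a hypothetical 10Y bond with a $4 annual coupon, $100 face value and a flat, annually compounded yield (the function name and the numbers are mine, for illustration only):

```python
def bond_price(annual_coupon, face, years, yield_):
    """Price of an annual-pay bullet bond under a flat, annually compounded yield."""
    price = 0.0
    for t in range(1, years + 1):
        cash_flow = annual_coupon + (face if t == years else 0.0)
        price += cash_flow / (1.0 + yield_) ** t
    return price

# At zero yield the price is the plain sum of all cash flows (10 * 4 + 100).
# As the yield heads towards -100%, each discount factor 1/(1+y)^t blows up.
for y in (0.05, 0.0, -0.01, -0.50, -0.99):
    print(f"yield {y:+.2%}  price {bond_price(4.0, 100.0, 10, y):,.2f}")
```

At a yield of -100 bps the price is only modestly above the zero-yield value; the explosion happens near -100%.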

never exercise American Call (no-div), again

Rule 1: For a given no-dividend stock, early exercise of an American call is never optimal.
Rule 1b: therefore, its price is the same as that of the European call with the same strike and expiry. In other words, the early exercise feature is worthless.

To simplify (not over-simplify) the explanation, it’s useful to assume zero interest rate.

The key insight is that short-selling the stock is always at least as good as exercising. Suppose the strike is $100 but the current price is super high at $150.
* Exercise means “sell at $150 immediately after buying underlier at $100”.
* Short means “sell at $150 but delay the buying till expiry”

Why *delay* the buy? Because we hold a right not an obligation to buy.
– If terminal price is $201 or anything above strike, then the final buy is at $100, same as the Exercise route.
– If terminal price is $89 or anything below strike, then the final buy is BETTER than the Exercise route.
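The two routes can be compared numerically. This is only a sketch under the zero-interest-rate assumption above; the function names are hypothetical.

```python
def exercise_now(spot, strike):
    # Exercise route: buy at the strike and sell at the current price right away.
    return spot - strike

def short_and_keep_call(spot, strike, terminal_price):
    # Short route: sell at the current price now, then buy back at expiry at
    # whichever is cheaper, the strike (via the call) or the market price.
    buy_back_cost = min(strike, terminal_price)
    return spot - buy_back_cost

spot, strike = 150.0, 100.0
for terminal in (201.0, 100.0, 89.0):
    print(terminal, exercise_now(spot, strike), short_and_keep_call(spot, strike, terminal))
```

For every terminal price the Short route earns at least the $50 of the Exercise route, and strictly more when the stock ends below the strike.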

You can also think in terms of a super-replicating portfolio, but I find it less intuitive.

So in real markets when the stock is very high and you are tempted to exercise, don’t sit there and risk losing the opportunity.
1) Short sell if you are allowed
2) Exercise if you can’t short sell

When interest rate is present, the argument is only slightly different. Invest the short sell proceeds in a bond.

marginal probability density: clarified #with equations

(Equations were created in Outlook then sent to WordPress by HTML email. )

My starting point is https://bintanvictor.wordpress.com/2016/06/29/probability-density-clarified-intuitively/. Look at the cross section at X=7.02. This is a 2D area, so volume (i.e. probability mass) is zero, not close to zero. Hard to work with. In order to work with a proper probability mass, I prefer a very thin but 3D “sheet”, by cutting again at X=7.02001 i.e. 7.02 + deltaX. The prob mass in this sheet divided by deltaX is a number. I think it’s (approximately) the marginal density value at X=7.02.

The standard formula for marginal density function is on https://www.statlect.com/glossary/marginal-probability-density-function:

  $f_X(x) = \int_{-\infty}^{\infty} f(x,y)\,dy$   [1]

How is this formula reconciled with our “sheet”? I prefer to start from our sheet, since I don’t like to deal with zero probability mass. Sheet mass divided by the thickness i.e. deltaX:

  $\frac{1}{\Delta x}\int_{x}^{x+\Delta x}\int_{-\infty}^{\infty} f(u,y)\,dy\,du$

Since f(x,y) is assumed not to change with x within the thin sheet, this expression simplifies to

  $\frac{1}{\Delta x}\cdot\Delta x\int_{-\infty}^{\infty} f(x,y)\,dy = \int_{-\infty}^{\infty} f(x,y)\,dy$

Now it is the same as formula [1]. The advantage of my “sheet” way is the numerator always being a sensible probability mass. The integral in the standard formula [1] doesn’t look like a probability mass to me, since the sheet has zero width.

The simplest and most visual bivariate illustration of marginal density: throwing a dart on a map of Singapore drawn on an x:y grid. Joint density is a constant c (you can easily work out its value as 1 / area of the island). You could immediately tell that marginal density at X=7.02 is proportional to the island’s width at X=7.02. Formula [1] would tell us that marginal density is $f_X(7.02) = \int c\,dy = c \times \text{width}(7.02)$.
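A numerical check of the sheet idea. I don’t have the outline of Singapore, so this sketch assumes a hypothetical island shaped like the unit disk; the function names, the cut point x = 0.3 and the sheet thickness are all illustrative.

```python
import math

def island_width(x):
    # Hypothetical "island": the unit disk, so the width at x is the chord length.
    return 2.0 * math.sqrt(max(0.0, 1.0 - x * x))

JOINT_DENSITY = 1.0 / math.pi  # constant joint density = 1 / area of the unit disk

def sheet_mass(x, delta_x, steps=1000):
    # Probability mass of the thin sheet between x and x + delta_x (midpoint rule).
    h = delta_x / steps
    return sum(JOINT_DENSITY * island_width(x + (i + 0.5) * h) * h for i in range(steps))

x, delta_x = 0.3, 1e-5
from_sheet = sheet_mass(x, delta_x) / delta_x       # sheet mass divided by thickness
from_formula = JOINT_DENSITY * island_width(x)      # constant density times width
print(from_sheet, from_formula)
```

The two numbers agree up to an error that shrinks with the sheet thickness, which is the reconciliation between the sheet and the standard formula.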

CAPM beta – phrasebook


  • regression: beta is named in the context of a regression against the market factor
  • cov/var: beta is defined mathematically as this ratio, cov(asset, market) / var(market)
  • excess return: in the regression, both the explanatory variable and the dependent variable are excess returns.
  • portfolio: (due to regression) a portfolio beta can be computed as the weighted average of the constituents’ betas

above 1: means the regression slope is steeper than the “market”
equals 1: is the market itself or any “normal” security
below 1: means the regression slope is more gentle than the “market”
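A small sketch of the phrasebook on simulated data. The return series, the true slopes 1.5 and 0.6 and the portfolio weights are all made up for illustration; the series are treated as excess returns.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def covariance(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def beta(asset_excess, market_excess):
    # cov/var definition, which is also the OLS slope of asset on market.
    return covariance(asset_excess, market_excess) / covariance(market_excess, market_excess)

random.seed(0)
market = [random.gauss(0.0005, 0.01) for _ in range(2000)]
stock_a = [1.5 * m + random.gauss(0.0, 0.005) for m in market]  # steeper than the market
stock_b = [0.6 * m + random.gauss(0.0, 0.005) for m in market]  # gentler than the market

w_a, w_b = 0.4, 0.6
portfolio = [w_a * a + w_b * b for a, b in zip(stock_a, stock_b)]

beta_a, beta_b = beta(stock_a, market), beta(stock_b, market)
beta_p = beta(portfolio, market)
print(beta_a, beta_b, beta_p, w_a * beta_a + w_b * beta_b)
```

The portfolio beta matches the weighted average because covariance is linear in the asset return.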

BS-E is PDE !! SDE

I believe the BS-equation (a famous PDE) is not a stochastic differential equation, simply because there’s no dW term in it.

An SDE is really about two integrals on the left and right. At least one integral must be a stochastic integral.

Some (not all) of the derivations of BS-E use stochastic integrals.
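For a side-by-side look, here are the standard textbook forms, assuming constant mu, sigma and r and writing V(S,t) for the option value (notation mine):

```latex
% GBM dynamics of the stock: an SDE, because of the dW_t term
dS_t = \mu S_t \, dt + \sigma S_t \, dW_t

% The same SDE as an integral equation; the last integral is a stochastic integral
S_t = S_0 + \int_0^t \mu S_u \, du + \int_0^t \sigma S_u \, dW_u

% Black-Scholes equation for V(S,t): a PDE, with no dW term anywhere
\frac{\partial V}{\partial t}
  + \tfrac{1}{2}\sigma^2 S^2 \frac{\partial^2 V}{\partial S^2}
  + r S \frac{\partial V}{\partial S} - r V = 0
```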

removing outlier in Monte Carlo

In a Monte Carlo simulation, I feel we should never remove any outlier.

The special event we are trying to capture could be an extreme event, such as a deep OTM option getting exercised. Perhaps one in 9 billion realizations is an interesting data point.

Removing any outlier would alter the probability distribution, so our Monte Carlo estimate is no longer an unbiased estimate.

However, if there’s a data point with an operational or technical error, it needs to be removed. Not removing it would also mess up the probability distribution.
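A rough sketch of the bias, assuming a hypothetical deep OTM call (spot 100, strike 160, 20% vol, 1Y, zero rates) under GBM. The 0.1% trimming rule is just an example of "outlier removal", not anyone's recommended practice.

```python
import math
import random

random.seed(42)
S0, K, sigma, T = 100.0, 160.0, 0.2, 1.0   # deep OTM call, zero interest rate (assumed)
N = 200_000

payoffs = []
for _ in range(N):
    z = random.gauss(0.0, 1.0)
    s_T = S0 * math.exp(-0.5 * sigma**2 * T + sigma * math.sqrt(T) * z)
    payoffs.append(max(s_T - K, 0.0))

full_estimate = sum(payoffs) / N

# "Clean" the sample by dropping the largest 0.1% of payoffs as outliers.
kept = sorted(payoffs)[: int(N * 0.999)]
trimmed_estimate = sum(kept) / len(kept)

print(full_estimate, trimmed_estimate)
```

Almost all of the option value sits in the few realizations that the trimming throws away, so the trimmed estimate is biased low.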


There are many ways to derive the BS-E(quation). See [[Crack]]. Roger Lee covered at least two routes.

There are many ways to derive the BS-F(ormula). See P116 [[Crack]]

There are many ways to interpret the BS-F. Roger Lee and [[Crack]] covered them extensively.

Q: BS-F is a solution to the BS-E, but is BS-F based on BS-E?
A: I would say yes, though some BS-F derivations don’t use any PDE (BS-E is PDE) at all.

BS-E is simpler than BS-F IMO. The math operations in the BS-F are non-trivial and not so intuitive.

BS-F only covers European calls and puts.

BS-E covers American and more complex options. See P74 [[Crack]]

BS-E has slightly fewer assumptions:
– Stock is assumed GBM
– no assumption about boundary condition. Can be American or exotic options.
– constant vol?

jump diffusion pricing models — brief discussion

I asked a relatively young quant I respect.

She said most sell side models do not have jump feature. The most advanced models tend to be stochastic vol. A simpler model is the local vol model.

I said the Poisson jump model is well-regarded – but she said it’s not that mature.

I said the Poisson jump model is needed since a stock price often exhibits jumps – but her answer gave me the impression that a model without this “indispensable” feature can be good enough in practice.

When you actually put the jump model into practice, it may not work better than a no-jump model. This is reality vs theory.

long-term daily stock returns ~ N(m, sigma)

Label: intuitiveFinance

A basic assumption in BS and most models can be loosely stated as “Daily stock returns are normally distributed, approximately.” I used to question this assumption. I used to feel that if the 90% confidence interval of an absolute price change in IBM is $10 then it will be that way 20 years from now. Now I think differently.

When IBM price was $1, daily return was typically a few percent, i.e. a few cents of rise or fall.

When IBM price was $100, daily return was still a few percent, i.e. a few dollars of rise or fall.

So the return tends to stay within a narrow range like (-2%, 2%), regardless of the magnitude of price.

More precisely, the BS assumption is about log return i.e. log(price relative). This makes sense. If %return is normal, then what is a -150% return?
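A tiny sketch of why the log return avoids the problem: a log return of -150% is a perfectly legal price drop, whereas a simple return can never go below -100%. The function name is mine.

```python
import math

def simple_return_from_log(log_return):
    # Price relative = exp(log return); simple return = price relative - 1.
    return math.exp(log_return) - 1.0

for log_r in (0.02, -0.02, -1.5):
    print(f"log return {log_r:+.2f} -> simple return {simple_return_from_log(log_r):+.2%}")
```

For small moves like 2% the two kinds of return are nearly identical, which is why the loose statement about daily returns is harmless.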

Hull: estimate default probability from bond prices

label: credit

The arithmetic on P524-525 could be expanded into a 5-pager if we were to explain to people with high-school math background…


There are 2 parts to the math. Part A computes the “expected” (probabilistic) loss from default to be $8.75 for a notional/face value of $100. Part B computes the same (via another route) to be $288.48Q. Equating the 2 parts gives Q =3.03%.


Q3: How is the 7% yield used? Where in which part?


Q4: why assume defaults happen right before coupon date?

%%A: borrower would not declare “in 2 days I will fail to pay the coupon” because it may receive help in the 11th hour.


–The continuous discounting in Table 23.3 is confusing

Q: Hull explained how the 3.5Y row in Table 23.3 is computed. Why discount to T=3.5Y rather than to T=0Y?


The “risk-free value” (Column 4) has a confusing meaning. Hull mentioned earlier a “similar risk-free bond” (a TBond). At 3.5Y mark, we know this risk-free bond is scheduled to pay all cash flows at future times T=3.5Y, 4Y, 4.5Y, 5Y. We use risk-free rate 5% to discount all cash flows to T=3.5Y. We get $104.34 as the “value of the TBond cash flows discounted to T=3.5Y”


Column 5 builds on it giving the “loss due to a 3.5Y default, but discounted to T=3.5Y”. This value is further discounted from 3.5Y to T=0Y – Column 6.

Part B computes a PV relative to the TBond’s value. Actually Part A is also relative to the TBond’s value.


In the model of Part B, there are 5 coin flips occurring at T = 0.5Y, 1.5Y, 2.5Y, 3.5Y, 4.5Y, with Pr(default_0.5) = Pr(default_1.5) = … = Pr(default_4.5) = Q. Concretely, imagine that Pr(flip = Tail) is 25%. Now the Law of total probability states

100% = Pr(05) + Pr(15) + Pr(25) + Pr(35) + Pr(45) + Pr(no default). If we factor in the amount of loss at each flip we get

Pr(05) * $65.08 + Pr(15) * $61.20 + Pr(25) * $57.52 + Pr(35) * $54.01 + Pr(45) * $50.67 + Pr(no default, no loss) * $0 == $288.48Q
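The arithmetic of Part A and Part B can be reproduced in a few lines. This is a sketch, not Hull's own layout. The inputs not stated above are my recollection of the example and should be treated as assumptions: a 5Y bond with a 6% coupon paid semiannually, recovery of $40 per $100 face, and both the 7% yield and the 5% risk-free rate continuously compounded.

```python
import math

# Assumed inputs, recalled from Hull's example (not stated in the notes above).
FACE, COUPON = 100.0, 3.0            # 6% annual coupon paid semiannually
RECOVERY = 40.0                      # recovered amount per $100 face on default
RISK_FREE, BOND_YIELD = 0.05, 0.07   # continuously compounded
PAY_TIMES = [0.5 * k for k in range(1, 11)]   # 0.5Y, 1Y, ..., 5Y
DEFAULT_TIMES = [0.5, 1.5, 2.5, 3.5, 4.5]

def cash_flow(t):
    return COUPON + (FACE if t == PAY_TIMES[-1] else 0.0)

def pv(rate, as_of=0.0):
    # Value at time `as_of` of all cash flows paid on or after `as_of`.
    return sum(cash_flow(t) * math.exp(-rate * (t - as_of))
               for t in PAY_TIMES if t >= as_of)

# Part A: expected loss = risk-free bond value minus corporate bond value.
# This is where the 7% yield enters.
expected_loss = pv(RISK_FREE) - pv(BOND_YIELD)

# Part B: PV of the loss at each possible default time, per unit of Q.
loss_per_q = 0.0
for t in DEFAULT_TIMES:
    risk_free_value = pv(RISK_FREE, as_of=t)             # Column 4, valued at T=t
    loss_at_t = risk_free_value - RECOVERY               # Column 5
    loss_per_q += loss_at_t * math.exp(-RISK_FREE * t)   # Column 6, discounted to T=0

q = expected_loss / loss_per_q
print(round(expected_loss, 2), round(loss_per_q, 2), round(q, 4))
```

Under these assumed inputs the script lands on the $8.75, $288.48Q and Q of about 3.03% quoted above, and the 3.5Y risk-free value comes out at $104.34.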

clarifying questions to reduce confusions in BS discussions

–At every mention of “pricing method”, ask

Q: Analytical (Ana) or Numerical (Num)?

Q: for European or American+exotic options?

Analytical methods mostly work for European style only (plus a few special cases).

Q: GBM assumption?

I think most numerical methods do assume GBM. Every single method has severe assumptions, so GBM is just one of them.

–At every mention of “Option”, ask

Q: European style or American+Exotic style?

–At every mention of Black-Scholes, ask

Q: BS-E(quation) or BS-F(ormula) or BS-M(odel)?

Note numerical methods rely on BS-M or BS-E, not BS-F

–At every mention of Expectation, ask

Q: P-measure or Q-measure?

The other measures, like the T-fwd measure, are too advanced, so no need to worry for now.

Applying Ito’s formula on math problems — learning notes

Ito’s formula in a nutshell: given the dynamics of a process X, we can derive the dynamics[1] of a function[2] f() of X.

[1] The original “dynamics” is usually in a stoch-integral form like

  dX = m(X,t) dt + s(X,t) dB

In some problems, X is given in exact form, not integral form. For an important special case, X could be the BM process itself: X_t = B_t.


[2] the “function” or the dependent random variable “f” is often presented in exact form, to let us find partials. However, in general, f() may not have a simple math form. Example: in my rice-cooker, the pressure is some unspecified but “tangible” function of the temperature. Ito’s formula is usable if this function is twice differentiable.

The new dynamics we find is usually in stoch-integral form, but the right-hand-side usually involves X, dX, f or df.

Ideally, RHS should involve none of them and only dB, dt and constants. GBM is such an ideal case.
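For reference, the formula written out in the m / s notation above, followed by one worked case. The choice of f = ln x applied to a GBM is my own example of an "ideal" right-hand side with only dt, dB and constants; mu and sigma are assumed constant.

```latex
% Given dynamics of X
dX_t = m(X_t,t)\,dt + s(X_t,t)\,dB_t

% Ito's formula for f(t, X_t), with f twice differentiable in x
df = \left( \frac{\partial f}{\partial t} + m \frac{\partial f}{\partial x}
      + \tfrac{1}{2} s^2 \frac{\partial^2 f}{\partial x^2} \right) dt
     + s \frac{\partial f}{\partial x}\, dB_t

% Worked case: X is a GBM, dX_t = \mu X_t\,dt + \sigma X_t\,dB_t, and f(x) = \ln x,
% so f_x = 1/x, f_{xx} = -1/x^2, m = \mu x, s = \sigma x
d(\ln X_t) = \left( \mu - \tfrac{1}{2}\sigma^2 \right) dt + \sigma\, dB_t
```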

change of .. numeraire^measure

Advice: When possible, I would work with CoN rather than CoM. I believe once we identify another numeraire (say asset B) is useful, we just know there exists an equivalent measure associated with B (say measure J), so we could proceed. How to derive that measure I don’t remember. Maybe there’s a relatively simple formula, but very abstract.

In one case, we only have CoM, no CoN — when changing from physical measure to risk neutral measure. There is no obvious, intuitive numeraire associated with the physical measure!

CoN is more intuitive than CoM. Numeraire has a more tangible meaning than “measure”.

I think even my grandma could understand 2 different numeraires and how to switch between them.  Feels like simple math.

CoM has rigorous math behind it. CoM is not just for finance. I guess CoM is the foundation and basis of CoN.

I feel we don’t have to have a detailed, in-depth grasp of CoM to use it in CoN.

physical measure is impractical

Update: Now I think physical probability is neither observable nor quantifiable, and utterly unusable in the math including the numerical methods. In contrast, RN probabilities can be derived from observed prices.

Therefore, now I feel physical measure is completely irrelevant to option math.

RN measure is the “first” practical measure for derivative pricing. Most theories/models are formulated in RN measure. T-Forward measure and stock numeraire are convenient when using these models…

Physical measure is an impractical measure for pricing. Physical measure is personal feeling, not related to any market prices. Physical measure is mentioned only for teaching purpose. There’s no “market data” on physical measure.

Market prices reflect RN (not physical) probabilities.

Consider a cash-or-nothing bet that pays $100 iff team A wins a playoff. The bet is selling for $30, so the RN Pr(win) = 30%. I am an insider and I rig the game so my physical Pr() = 80%. Meimei (my daughter) may feel it’s 50-50. These personal opinions are irrelevant for pricing any derivative.

Instead, we use the option price $30 to back out the RN probabilities. Namely, calibrate the pricing curves using liquid options, then use the RN probabilities to price less liquid derivatives.
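The "price in, probability out" direction can be sketched with the bet above. The function names and the second, less liquid bet (paying $250 on the same win) are hypothetical, and discounting is switched off by default to match the $30 to 30% arithmetic.

```python
def implied_rn_probability(price, payoff, discount_factor=1.0):
    # price = discount_factor * RN probability * payoff, solved for the probability
    return price / (discount_factor * payoff)

def price_digital(rn_probability, payoff, discount_factor=1.0):
    return discount_factor * rn_probability * payoff

# Step 1: calibrate to the liquid bet ($30 for a $100 payout).
rn_win = implied_rn_probability(price=30.0, payoff=100.0)

# Step 2: price a less liquid bet on the same event using the RN probability.
other_bet = price_digital(rn_win, payoff=250.0)
print(rn_win, other_bet)
```

Neither my 80% nor Meimei's 50% appears anywhere in the calculation.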

Professor Yuri is the first to point out (during my oral exam!) that option prices are the input, not the output to such pricing systems.

drift ^ growth rate – are imprecise

The drift rate “j” is defined for BM not GBM
                dAt = j dt + dW term
Now, for GBM,
                dXt = r Xt  dt + dW term
So the drift rate by definition is r Xt. Therefore, it’s confusing to say “same drift as the riskfree rate”. Safer to say “same growth rate” or “same expected return”.

so-called tradable asset – disillusioned

The touted feature of a “tradable” doesn’t impress me. Now I feel this feature is useful IMHO only for option pricing theory. All traded assets are supposed to follow a GBM (under RN measure) with the same growth rate as the MMA, but I’m unsure about most of the “traded assets” such as —
– IR futures contracts
– weather contracts
– range notes
– a deep OTM option contract? I can’t imagine any “growth” in this asset
– a premium bond close to maturity? Price must drop to par, right? How can it grow?
– a swap I traded at the wrong time, so its value is decreasing to deeply negative territories? How can this asset grow?

My MSFM classmates confirmed that any dividend-paying stock is disqualified as “traded asset”. There must be no cash coming in or out of the security! It’s such a contrived, artificial and theoretical concept! Other non-qualifiers:

eg: spot rate
eg: price of a dividend-paying stock – violates the self-financing criterion.
eg: interest rates
eg: swap rate
eg: futures contract’s price?
eg: coupon-paying bond’s price

Black’s model isn’t interest rate model #briefly

My professors emphasized repeatedly
* the first generation of IR models is the one-factor models, not the Black model.
* Black model initially covered commodity futures
* However, IR traders adopted Black’s __formula__ to price the 3 most common IR options
** bond options (bond price @ expiry is LN)
** caps (Libor rate @ expiry is LN)
** swaptions (swap rate @ expiry is LN)
** However, it’s illogical to assume the bond price, Libor rate, and swap rate on the contract expiry date (three N@FT) ALL follow LogNormal distributions.

* Black model is unable to model the term structure. I think it doesn’t eliminate arbitrage. I would say that a proper IR model (like HJM) must describe the evolution of the entire yield curve with N points on the curve. N can be 20 or infinite…

mean reversion in Hull-White model

The (well-known) mean reversion is in drift, i.e. the inst drift, under physical measure.

(I think historical data shows mean reversion of IR, which is somehow related to the “mean reversion of drift”….)

When changing to RN measure, the drift is discarded, so not relevant to pricing.
However, on a “family snapshot”, the implied vol of fwd Libor rate is lower the further out accrual startDate goes. This is observed on the market [1], and this vol won’t be affected when changing measure. Hull-White model does model this feature:
$\sigma_t \, e^{-a(T-t)}$
[1] I think this means the observed ED future price vol is lower for a 10Y expiry than a 1M expiry.

HJM, again

HJM’s theory started with a formulation containing 2 “free” processes — the drift (alpha) and vol (sigma) of inst fwd rate

$df_T = \alpha(t)\,dt + \sigma(t)\,dW$, where alpha(t) and sigma(t) are both functions of time and could be stochastic.
Note the vol is defined differently from the Black-Scholes vol.
Note this is under physical measure (not Q measure).
Note the fwd rate is instantaneous, not the simply compounded.
We then try to replicate one zero bond (shorter maturity) using another (longer maturity), and found that the drift process alpha(t) is constrained and restricted by the vol process sigma(t), under P measure. In other words, the 2 processes are not “up to you”. The absence of arbitrage enforces certain restrictions on the drift – see Jeff’s lecture notes.
Under Q measure, the new drift process [1] is completely determined by the vol process. This is a major feature of HJM framework. Hull-White focuses on this vol process and models it as an exponential function of time-to-maturity:
$\sigma_t \, e^{-a(T-t)}$
That “T” above is confusing. It is a constant in the “df” stochastic integral formula and refers to the forward start date of the (overnight, or even shorter) underlying forward loan, with accrual period 0.
[1] completely unrelated to the physical drift alpha(t)
Why bother to change to Q measure? I feel we cannot do any option pricing under P measure.  P measure is subjective. Each investor could have her own P measure.
Pricing under Q is theoretically sound but mathematically clumsy due to stochastic interest rate, so we change numeraire again to the T-maturity zero bond.
Before HJM, (I believe) the earlier TS models can’t support replication between bonds of 2 maturities — bond prices are inconsistent and arbitrage-able

vol, unlike stdev, always implies a (stoch) Process

Volatility, in the context of pure math (not necessarily finance), refers to the coefficient of dW term. Therefore,
* it implies a measure,
* it implies a process, a stoch process

Therefore, if a vol number is 5%, it is, conceptually and physically, different from a stdev of 0.05.

* Stdev measures the dispersion of a static population, or a snapshot as I like to say. Again, think of the histogram.
* variance parameter (vol^2) of BM shows diffusion speed.
* if we quadruple the variance param (doubling the vol) value, then the terminal snapshot’s stdev will double.
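A quick simulation of the last bullet, assuming a driftless BM starting at 0, a 1Y horizon and 50 time steps. The helper name, path count and seed are arbitrary choices of mine.

```python
import math
import random
import statistics

def terminal_values(vol, horizon=1.0, steps=50, paths=5000, seed=1):
    # Driftless BM: X_T is the sum of vol * sqrt(dt) * Z increments, starting from 0.
    rng = random.Random(seed)
    dt = horizon / steps
    out = []
    for _ in range(paths):
        x = 0.0
        for _ in range(steps):
            x += vol * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        out.append(x)
    return out

# The terminal snapshot is a static population, so stdev applies to it.
stdev_1x = statistics.stdev(terminal_values(vol=0.05))
stdev_2x = statistics.stdev(terminal_values(vol=0.10))  # variance parameter x4

print(stdev_1x, stdev_2x, stdev_2x / stdev_1x)
```

The vol is a property of the process (it enters every increment), while the stdev is measured on the terminal histogram. With a 1Y horizon the two numbers happen to coincide at about 0.05.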

At any time, there’s an instantaneous vol value, like 5%. This could last a brief interval before it increases or decreases. Vol value changes, as specified in most financial models, but it changes slowly, i.e. quasi-constant… (see other blog posts)

There is also a Black-Scholes vol. See other posts.

Gaussian HJM, briefly

… is a subset of HJM models.

An HJM model is Gaussian HJM if vol term is deterministic. Note “vol” term means the coefficient of the dW term. Every Brownian motion must always refer to an implicit measure. In this case, the RN measure.

How about the drift term i.e. the “dt” coefficient? It too has to be deterministic to give us a Gaussian HJM.

Well, under the RN measure, the drift process is determined completely by the vol process. Both evolve with time, but are considered slow-moving [1] relative to the extremely fast-moving Brownian Motion of “dW”. Extremely, because there’s no time-derivative of a BM.

[1] I would say “quasi constant”

Language is not yet precise so not ready to publish on recrec…

Radon-Nikodym derivative #Lida video

Lida pointed out CoM (change of measure) means that given a pdf bell curve, we change its mean while preserving its “shape”! I guess the shape is the LN shape?

I guess CoM doesn’t always preserve the shape.

Lida explained how to change one Expectation integral into another… Radon Nikodym.

The concept of operating under a measure (call it f) is fundamental and frequently mentioned but abstract…

Aha – Integrating the expectation against pdf f() is the same as getting the expectation under measure-f. This is one simple, if not rigorous, interpretation of operating under a given measure. I believe there’s no BM or GBM, or any stochastic process at this stage. She was describing how to transform one static pdf curve to another by changing measure. I think Girsanov is different. It’s about a (stochastic) process, not a static distribution.
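A static-distribution sketch of "changing one expectation integral into another". I assume the simplest case I know: P is the standard normal and Q is a normal with the same shape but mean 0.5, so dQ/dP is just the ratio of the two bell-curve densities. The value 0.5, the sample size and the function name are illustrative.

```python
import math
import random

random.seed(7)
MU = 0.5          # mean under the new measure Q (hypothetical value)
N = 200_000

def radon_nikodym(x, mu=MU):
    # dQ/dP for P = N(0,1) and Q = N(mu,1): ratio of the two densities at x.
    return math.exp(mu * x - 0.5 * mu * mu)

draws_p = [random.gauss(0.0, 1.0) for _ in range(N)]   # sampled under P only

mean_under_p = sum(draws_p) / N
mean_under_q = sum(x * radon_nikodym(x) for x in draws_p) / N   # E_Q[X] = E_P[X dQ/dP]
total_weight = sum(radon_nikodym(x) for x in draws_p) / N       # E_P[dQ/dP] = 1

print(mean_under_p, mean_under_q, total_weight)
```

No process is involved here, only one pdf re-weighted into another, which matches the static picture described above.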

discounted asset price is MG but "discount" means…@@

The Fundamental Theorem

A financial market with time horizon T and price processes of the risky asset and riskless bond (I would say a money-market-account) given by S0, …, ST and B0, …, BT, respectively, is arbitrage-free under the real world probability P if and only if there exists an equivalent probability measure Q (i.e. risk neutral measure) such that
The discounted price process, X0 := S0/B0, …, XT := ST/BT is a martingale under Q.

#1 Key concept – divide the current stock price by the current MMA value. This is the essence of “discounting”, different from the usual “discount future cashflow to present value”.
#2  key concept – the alternative interpretation is “using MMA as currency, then any asset price S(t) is a martingale”
I like the discrete time-series notation, from time_0, time_1, time_2… to time_T.
I like the simplified (not simplistic:) 2-asset world.
This theorem is generalized with stochastic interest rate on the riskless bond:)
There’s an implicit filtration. The S(T) or B(T) are prices in the future i.e. yet to be revealed [1]. The expectation of future prices is taken against the filtration.
[1] though in the case of T-forward measure, B(T) = 1.0 known in advance.
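The 2-asset, discrete-time setting is easy to try out. Below is a one-period sketch, assuming a hypothetical stock that either rises 20% or falls 10% while the MMA earns 5%. All the numbers and the function name are made up.

```python
def risk_neutral_up_probability(up, down, rate):
    # Solve q*up + (1-q)*down = 1 + rate, i.e. E_Q[S1/B1] = S0/B0.
    return (1.0 + rate - down) / (up - down)

S0, B0 = 100.0, 1.0
up, down, rate = 1.2, 0.9, 0.05      # hypothetical one-period market
q = risk_neutral_up_probability(up, down, rate)

B1 = B0 * (1.0 + rate)
expected_discounted = (q * S0 * up + (1.0 - q) * S0 * down) / B1

print(q, expected_discounted, S0 / B0)
```

A valid q between 0 and 1 exists exactly when down < 1 + rate < up, which is also the no-arbitrage condition in this toy market.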
–[[Hull]] P 636 has a concise one-pager (I would skip the math part) that explains the numeraire can be just “a tradable”, not only the MMA. A few key points:

* both S and B must be tradables, not something like “fwd rate” or “volatility”
* the measure is the measure related to the numeraire asset
* what market forces ensure this ratio is a MG? Arbitragers!

HJM, briefly

* HJM uses (inst) fwd rate, which is continuously compounded. Some alternative term structure models use the “short rate” i.e. the extreme version of spot overnight rate. Yet other models [1] use the conventional “fwd rate” (i.e. compounded 3M loan rate, X months forward.)

[1] the Libor Mkt Model

* HJM is mostly under RN measure. The physical measure is used a bit in the initial SDE…

* Under RN measure, the fwd rate follows a BM (not a GBM) with instantaneous drift rate and instantaneous variance both time-dependent but slow-moving. Since it’s not GBM, the N@T is Normal, not LN
** However, to use the market-standard Black’s formula, the discrete fwd rate has to be LN

* HJM is the 2nd generation term-structure model and one of the earliest arbitrage-free models. In contrast, the Black formula is not even an interest rate model.

[[Hull]] is primarily theoretical

[[Hull]] is first a theoretical / academic introductory book. He really likes theoretical stuff and makes a living on the theories.

As a sensible academic, he recognizes the (theory-practice) “gaps” and brings them to students’ attention. But I presume many students have no spare bandwidth for them. Exams and grades are mostly on the theories.

importance@GBM beyond BS-M #briefly

To apply BS-formula on interest rate derivatives, the underlyer process must become GBM, often by changing measure. I guess the dividend-paying stock also needs some treatment before the BS-formula can apply…

But GBM is not just for BS-Model:

GBM is used in Girsanov!

I guess most asset prices show an exponential growth rate, so a GBM with time-varying mu and sigma (percentage drift and percentage volatility) is IMO general and flexible, if not realistic. However, I don’t feel interest rate or FX spot rates are asset prices at all. Futures prices converge. Bond prices converge…

1st theorem of equivalent MG pricing, precisely: %%best effort

Despite my best effort, I think this write-up will have
* mistakes
* unclear, ambiguous points
, but first step is to write it down. This is first phase of thin->thick->thin.

Version 1: under RN measure [1], all traded asset [2] prices follow a GBM [4] with the same growth rate [5] as the riskfree money market account. The variance parameter of the GBM is unique to each asset.
Version 2: under RN measure [1], all traded asset [2] prices discounted to PV [3] by the riskfree money market account are martingales. In fact they are 0-drift GBM with individual volatilities.
Version 3: under RN measure [1], all traded asset [2] prices show an expected [3] return equal to the riskfree rate.
[2] many things are not really traded asset prices. See post on “so-called tradable”
[3] why we need to discount to present, and why “expected” return? because we are “predicting” the level of random walker /towards/ a target time later than the last revelation.  The value before the revelation is “realized”, “frozen” and has no uncertainty, no volatility, no diffusion, and no bell-shaped distribution.
[4] no BM here. All models are GBM.
[5] see post on drift ^ growth rate
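The three versions in symbols, as a sketch only. I assume a constant riskfree rate r, a money market account B_t = e^{rt}, and a constant volatility sigma_S for the asset; the notation is mine.

```latex
% Version 1: GBM under Q with growth rate r
dS_t = r S_t \, dt + \sigma_S S_t \, dW_t^{Q}

% Version 2: the discounted price is a zero-drift GBM, hence a martingale
d\!\left(\frac{S_t}{B_t}\right) = \sigma_S \frac{S_t}{B_t} \, dW_t^{Q},
\qquad
\frac{S_t}{B_t} = E^{Q}\!\left[ \frac{S_T}{B_T} \,\Big|\, \mathcal{F}_t \right]

% Version 3: expected return equals the riskfree rate
E^{Q}\!\left[ S_T \mid \mathcal{F}_t \right] = S_t \, e^{r (T-t)}
```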

don’t use cash instrument in replication strategies

update — use “bank account” …

Beginners like me often intuitively use cash positions when replicating some derivative position such as a fwd, option or swap.

I think that’s permissible in trivial examples, but in the literature, such a cash position is replaced by a bond position or a MMA. I think the reason is, invariably the derivative position has a maturity, so when we lend or borrow cash or deploy our savings for this replication strategy, there’s a fixed period with interest. It’s more like holding a bond than a cash position.

Mark Hendricks’ YC rules of thumb, clean n simplified

Mark Hendricks’ lecture on Fixed Income introduced a nice, simplified method of looking at many important mathematical rules of thumb on the yield curve.
We used only zero coupon government bonds without any call feature.
We used log yield, log fwd rate, log spot rate, log return, log price (usually negative) etc. This reduces compounding to addition! There’s no “1+r” factor either.
As much as possible we use one-period (1Y) loans. All the interest rates quoted are based on some hypothetical (but realistic) loan, and the loan period is for one-period, though it can forward-start 3 periods from time of observation. One of the common exceptions — 5Y point on the yield curve is the yield on a bond with 5Y time to live, so this loan period is 5Y, not one-period.
If the shortest unit of measurement is a month, then that’s the one-period, otherwise, 1Y is the one-period. All the rates and yields are annualized.
Perhaps the best illustration is the rule on fwd curve vs YC, on P4.12.
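To show how log quantities turn compounding into addition, here is a sketch with three hypothetical zero-coupon prices. The rule it checks (the n-period log yield is the average of the one-period log forward rates) is a standard relation; I am not claiming it is exactly what P4.12 states.

```python
import math

zero_prices = {1: 0.97, 2: 0.93, 3: 0.88}   # hypothetical prices per $1 face, maturities in years

log_price = {n: math.log(p) for n, p in zero_prices.items()}   # all negative
log_price[0] = 0.0                                             # $1 today is worth $1
log_yield = {n: -log_price[n] / n for n in zero_prices}        # annualized log yield

# One-period log forward rate for a loan starting at n-1 and ending at n.
log_fwd = {n: log_price[n - 1] - log_price[n] for n in zero_prices}

avg_fwd_3 = sum(log_fwd[n] for n in (1, 2, 3)) / 3
print(log_yield[3], avg_fwd_3)
```

There is no "1+r" factor anywhere; every step is a subtraction or an average.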

dummy’s PCP intro: replicating portf@expiry→pre-expiry #4IV

To use PCP in interview problem solving, we need to remember this important rule.

If you don’t want to analyze terminal values, and instead decide to analyze pre-expiry valuations, you may have difficulty.

The right way to derive and internalize PCP is to start with terminal payoff analysis. Identify the replicating portfolio pair, and apply the basic principle that

“If 2 portfolios have equal values at expiry, then any time before expiry, they must have equal value, otherwise arbitrage“.
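A sketch of the terminal-payoff step for the usual replicating pair (call plus a bond paying the strike, versus put plus one share), followed by the pre-expiry relation it implies. I assume no dividends; the function names and sample numbers are hypothetical.

```python
def call_payoff(s_T, strike):
    return max(s_T - strike, 0.0)

def put_payoff(s_T, strike):
    return max(strike - s_T, 0.0)

strike = 100.0
for s_T in (0.0, 50.0, 100.0, 150.0):
    portfolio_1 = call_payoff(s_T, strike) + strike   # long call + bond paying K at expiry
    portfolio_2 = put_payoff(s_T, strike) + s_T       # long put + one share
    print(s_T, portfolio_1, portfolio_2)

def call_from_put(put_price, spot, strike, discount_factor):
    # Pre-expiry PCP (no dividends): C + K*DF = P + S
    return put_price + spot - strike * discount_factor

print(call_from_put(put_price=5.0, spot=100.0, strike=100.0, discount_factor=0.95))
```

The two portfolios match for every terminal price, so by the principle quoted above they must also match before expiry.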

Even though this is the simplest intro to a simple option pricing theory, it is not so straightforward!

Z_0 == discount factor

Background – in mathematical finance, DF is among the most basic yet practical concepts. Forward contracts (including equity fwd used in option pricing, FX fwd, FRA…) all rely directly on DF. DF is part of most arbitrage discussions including interview questions.

When we talk about a Discount Factor value there are always a few things implicit in the context

* a valuation date, which precedes

* a cash flow date,

* a currency

* a financial system (banking, riskfree bond…) providing liquidity, which provides

* a single, consistent DF value, rather than multiple competing values.

* [1] There's no uncertainty in this DF value, as there is about most financial contracts

– almost always the DF value is below 1.0

– it's common to chain up 2 DF periods

An easily observable security price that matches a DF value is the market price of a riskless zero-coupon bond. Usually written as Z_0. Now we can explain [1] above. Once I buy the bond at this price today (valuation date), the payout is guaranteed, not subject to some market movement.

In a math context, any DF value can be represented by a Z_0 or Z(0,T) value. This is the time-0 price of some physical security. Therefore, the physical security “Z” is a concrete representation of the abstract _concept_ of discount factor.
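A short sketch of chaining two DF periods using zero-coupon bond prices. The two Z values and the $500 cash flow are made-up numbers and the function name is mine.

```python
def forward_discount_factor(z_0_t1, z_0_t2):
    # DF for the period t1 -> t2 implied by two zero-coupon bond prices observed today.
    return z_0_t2 / z_0_t1

z_0_1y, z_0_2y = 0.97, 0.93        # hypothetical Z(0,1Y) and Z(0,2Y) per $1 face
df_1y_2y = forward_discount_factor(z_0_1y, z_0_2y)

# Discount a $500 cash flow due in 2Y in two hops: 2Y -> 1Y, then 1Y -> today.
present_value = 500.0 * df_1y_2y * z_0_1y
print(df_1y_2y, present_value)
```

The chained result equals the one-hop value 500 * Z(0,2Y), as it must if there is a single consistent DF.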

math power tools transplanted -> finance


* martingale originates in gambling…
* Brownian motion originates in biology.
* Heat equation, Monte Carlo, … all have roots in physical science.

These models worked well in the original domains, because the simplifications and assumptions are approximately valid even though clearly imperfect. Assumptions are needed to simplify things and make them /tractable/ to mathematical analysis.

In contrast, financial mathematicians had to make blatantly invalid assumptions. You can find fatal flaws from any angle. Brian Boonstra told me all good quants appreciate the limitations of the theories. A small “sample”:

– The root of the randomness is psychology, or human behavior, not a natural phenomenon. The outcome is influenced fundamentally by human psychology.
– The data shows skew and kurtosis (fat tail).
– There’s often no way to repeat an experiment
– There’s often just a single sample — past market data. Even if you sample it once a day, or once a second, you still operate on the same sample.

local vol^stoch vol, briefly

Black-Scholes vol – constant, not a function of anything.
** simplest

stoch vol – there’s a dB element in dσ. See http://en.wikipedia.org/wiki/Stochastic_volatility

** most elaborate
** this BM process is correlated to the BM process of the asset price. Correlation ranging from -1 to 0 to 1.

local vol – sigma_t as a deterministic function of S_t and t, without any “stochastic” dB element.
** middle ground. A simplification of stoch vol
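The three flavours side by side, as a sketch. The drift mu and the generic coefficient functions a() and b() in the stochastic-vol case are placeholders of mine, not taken from any particular model.

```latex
% Black-Scholes: constant vol
dS_t = \mu S_t\,dt + \sigma S_t\,dB_t

% Local vol: vol is a deterministic function of S_t and t
dS_t = \mu S_t\,dt + \sigma(S_t, t)\, S_t\,dB_t

% Stochastic vol: vol has its own dB element, correlated with the asset's BM
dS_t = \mu S_t\,dt + \sigma_t S_t\,dB^{(1)}_t, \qquad
d\sigma_t = a(\sigma_t, t)\,dt + b(\sigma_t, t)\,dB^{(2)}_t, \qquad
dB^{(1)}_t\,dB^{(2)}_t = \rho\,dt, \quad -1 \le \rho \le 1
```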

using dB_t vs B_t in a formula

label – stoch

(dW is a common synonym of dB)


Whenever I see dB, I used to see it as a differential of “B”. But it’s undefined – no such thing as a differential for a Brownian motion!


Actually any dB equation is an integral equation involving a “stochastic integral” which has a non-trivial definition. Greg Lawler spent a lot of time explaining the meaning of a stochastic integral.


I seldom see B itself, rather than dB, used in a formula describing a random process. The one big exception is the exact formula describing a GBM process.


S_t = S_0 exp[ (m - ½ s^2) t + s B_t ]

Greg confirmed that the regular BM can also be described in B rather than dB:

given dX_t = m dt + s dB_t

X_t = X_0 + m t + s B_t

This signal-noise formula (see other post) basically says random walk position X at time t is a non-random, predictable position with a chance element superimposed. The chance element is a random variable ~ N(mean 0, variance s^2 t).


This is actually the most precise description of the random process X(t).


We can also see this as a generalized Brownian motion, expressed using a Standard Brownian motion. Specifically, it adds drift rate and variance parameter to the SBM.


martingale – learning notes#non-trivial

A martingale is a process, always associated with a filtration, or the unfolding of a story. Almost always [1] the unfolding has a time element.
[1] except trivial cases like “revealing one poker card at a time” … don’t spend too much time on that.
In the Ito formula context, (local) martingale basically means zero dt coefficient. Easy to explain. Ito’s calculus always predicts the next increment using 1) revealed values of some random process and 2) the next random increment in a standard BM:
      dX = m(X, Y, …, t) dt    +   σ1(X, Y, …, t) dB1      +   σ2(X, Y, …, t) dB2 + …
Now, E[dX] = 0 for a (local) martingale. The dB terms contribute nothing to this expectation, so the dt coefficient m must be zero.
counter-example – P xxx [[Zhou Xinfeng]] has a simple, counter-intuitive illustration: B_t^3 is NOT a martingale even though by symmetry E[B_t^3] = 0. A martingale requires
     E[B_T^3 | last revealed B_t value] = B_t^3 for any later time T, which doesn’t hold.

BM ^ GBM with or +! drift – quick summary

B1) BM with dB and drift = 0? the Standard BM, simple but not useless. A cornerstone building block of more complex stoch processes.

G1) GBM with dB and drift = 0? The simplest, purest GBM. The tilting function used in Girsanov theorem. See my blog “GBM + zero drift”

G2) GBM with drift without dB? a deterministic exponential growth path. Example – bank account. Used in BS and pricing.

B2) BM with drift without dB? a linear, non-random walker, not a real BM, useless in stoch.

B3) Now, how do we deal with a BM with both drift and dB? use G1 construct as “tilting function” to effect a change of measure. In a nutshell,

B3 –(G1)–> B1

G3) GBM with drift + dB? The most common stock price model.

from BM’s to GBM’s drift rate – eroded by .5 sigma^2

Let’s start with a regular BM with a known drift *rate* denoted “m”, and known variance parameter value, denoted “s”:
dX = m dt + s dBt
In other words,
Xt – X0 = m*t + s*Bt
Here, “… + s*Bt” has a non-trivial meaning. It is not the same as adding two numbers or adding two variables, but rather a signal-noise formula… It describes a Process, with a non-random, deterministic part, and a random part whose variance at time t is equal to (s^2 t)
Next, we construct or encounter a random process G(t) related to, but not derived from, this BM:
dG/G = m dt + s dBt    …….. [2]
It turns out this process can be exactly described as
  G = G0 exp[ (m - ½ s^2) t + s Bt ]     ………. [3]
Again, the simple-looking “… + s Bt” expression has a non-trivial meaning. It describes a Process, whose log value has a deterministic component, and a random component whose variance is (s^2 t).
Note in the formula above (m - ½ s^2) isn’t the drift of GBM process G(t), because left hand side is “dG / G” rather than dG itself.
In contrast, (m - ½ s^2) is a drift rate in the “log” process L(t) := log G(t). This log process is a BM.

                dL = (m - ½ s^2) dt + s dBt    …… [4]

If we compare [2] vs [4], we see the drift rate eroded by (½ s^2).
(You may feel dL =?= dG/G but that’s “before” Ito. Since G(t) is an Ito process, to get dL we must apply Ito’s and we end up with [4].)
I wish there’s only one form to remember, but unfortunately, [2] and [4] are both used extensively.
In summary
* Starting from a BM with drift = (u) dt
** the exponential process Y(t) derived from the BM has drift (not drift rate)
= [u + ½ s^2 ] Y(t) dt
* Starting from a GBM (Not something derived from BM) process with drift (not drift rate) = m* G(t) dt
** the log process L(t), derived from the GBM process, is a BM with drift
= (m – ½ s^2) dt, not “…L(t) dt”
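
A quick numerical check of the erosion, as a rough sketch only. The parameter values (m = 0.10, s = 0.30, t = 1) and the function name simulate_gbm_terminal are my own assumptions for illustration. The sample mean of log(G_t/G_0) should land near (m - ½ s^2) t, while the sample mean of G_t itself should land near G_0 exp(m t).

```python
import math
import random

def simulate_gbm_terminal(g0, m, s, t, n_paths, seed=0):
    """Sample G_t = g0 * exp((m - s^2/2) t + s B_t), with B_t ~ N(0, t)."""
    rng = random.Random(seed)
    sqrt_t = math.sqrt(t)
    samples = []
    for _ in range(n_paths):
        b_t = rng.gauss(0.0, sqrt_t)          # one draw of the Brownian position at time t
        samples.append(g0 * math.exp((m - 0.5 * s * s) * t + s * b_t))
    return samples

# Illustrative parameters (assumptions, not from any real calibration)
g0, m, s, t = 1.0, 0.10, 0.30, 1.0
samples = simulate_gbm_terminal(g0, m, s, t, n_paths=100_000)

mean_log = sum(math.log(g / g0) for g in samples) / len(samples)
mean_level = sum(samples) / len(samples)

print("mean of log(G_t/G_0):", mean_log, "vs (m - s^2/2) t =", (m - 0.5 * s * s) * t)
print("mean of G_t         :", mean_level, "vs G_0 exp(m t)  =", g0 * math.exp(m * t))
```

So the log process drifts at the eroded rate, yet the expected level still grows at the full rate m.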

BM: y Bt^3 isn’t a martingale

Greg gave a hint: . Basically, for positive X, the average is higher, because the curve is convex.
Consider the next brief interval (a long interval is also fine, but a brief dt is the standard approach). dX will be Normal and symmetric. Let +/- 0.001 stand for dX: for each positive outcome of dX like 0.001, there’s an equally likely -0.001 outcome. We can just pick any pair and work out the contribution to E[(X+dX)^3].
For a martingale, E[dY] = 0 i.e. E[Y+dY] = E[Y]. In our case, Y:=X^3 , so E[(X+dX)^3] needs to equal E[X^3] ….
Note that Bt^3 is symmetric so mean = 0. It’s 50/50 to be positive or negative, which does NOT make it a martingale. I think the key to the paradox is the filtration, i.e. the “last revealed value”.
Bt^3 is symmetric only when predicting at time 0. Indeed, E[Bt^3 | F_0] = 0 for any target time t. How about given X(t=2.187) = 4?
E[(4 + dX)^3] works out to be 4^3 + 3*4*E[dX^2] != 4^3
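
The paired-outcome argument can be checked with plain arithmetic. A minimal sketch, where the helper name paired_average_of_cube and the step sizes are my own choices: for each pair +h / -h, the average of the two cubes exceeds x^3 by exactly 3*x*h^2 when x is positive, and falls short of it when x is negative.

```python
def paired_average_of_cube(x, h):
    """Average of (x+h)^3 and (x-h)^3, i.e. E[(x+dX)^3] when dX is +h or -h with equal odds."""
    return 0.5 * ((x + h) ** 3 + (x - h) ** 3)

x = 4.0  # the last revealed value in the example above
for h in (0.001, 0.01, 0.1):
    avg = paired_average_of_cube(x, h)
    # The excess over x^3 equals 3*x*h^2; the odd powers of h cancel within each pair.
    print(h, avg, avg - x ** 3, 3 * x * h * h)
```

Only when the revealed value is 0 does the excess vanish, which is why the symmetry argument works at time 0 and nowhere else.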

stoch integral – bets on each step of a random walk

label – intuitive
Gotcha — In ordinary integration, if we integrate from 0 to 1, then dx is always a positive “step”. If the integrand is positive in the “strip”, then the area is positive. Stoch integral is different. Even if integrand is always positive the strip “area” can be negative because the dW is a coin flip.

Total area is a RV with Expectation = 0.

In Greg Lawler’s first mention (P 11) of stoch integral, he models the integrand’s value (over a brief interval deltaT) as a “bet” on a coin flip, or a bet on a random walk. I find this a rather intuitive, memorable, and simplified description of stoch integral.
Note the coin flip can be positive or negative and beyond our control. We can bet positive or negative. The bet can be any value. For now, don’t worry about the magnitude of the random step. Just assume each step is +1/-1 like a coin flip
If the random walk has no drift (fair coin), then any way you bet on it, you are 50/50 i.e. no way to beat a martingale. Therefore, the integral is typically (expected to be) 0. Let’s denote the integral as C. What about E[C^2] ? Surely positive. We need the variance rule…
Q: Does a stoch integral always have expectation equal to last revealed value of the integral?
A: Yes. It is always a local martingale. If it’s bounded, then it’s also a martingale.
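
The "bets on a coin flip" picture can be enumerated exactly for a short walk. A sketch under my own assumptions: 10 fair +1/-1 flips, and a made-up strategy bet() that is always positive and looks only at flips already revealed. It checks that E[C] = 0, that some paths still give a negative "area", and that E[C^2] equals the sum of E[bet^2] over the steps (the discrete version of the variance rule).

```python
from itertools import product

def bet(history):
    """Hypothetical strategy: the stake depends only on flips already revealed."""
    return 1 + sum(1 for flip in history if flip == +1)   # always positive

def integral_along_path(path):
    """Discrete stochastic integral: sum of bet_k * flip_k along one path."""
    return sum(bet(path[:k]) * path[k] for k in range(len(path)))

n = 10
paths = list(product((+1, -1), repeat=n))     # 2^n equally likely paths of a fair coin
values = [integral_along_path(p) for p in paths]

expected_c = sum(values) / len(paths)                       # E[C]
expected_c2 = sum(v * v for v in values) / len(paths)       # E[C^2]
sum_expected_bet2 = sum(
    sum(bet(p[:k]) ** 2 for p in paths) / len(paths) for k in range(n)
)

print("E[C] =", expected_c)
print("E[C^2] =", expected_c2, "sum of E[bet^2] =", sum_expected_bet2)
print("most negative outcome:", min(values))
```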

change of measure, learning notes

See also — I have a long MSWord doc in my c:\0\b2b

Key — Start at discrete. Haksun’s example on coin flip…. Measure-P assigns 50/50 to head/tail. Measure-Q assigns 60/40 weights. Therefore we can see dQ/dP is a function (denoted M ), which maps each outcome (head/tail) to some amount of probability mass. Total probability mass adds up to 100%.

Requirement – the 2 measures are equivalent i.e. the pdf curves have exactly the same support. So M() is well-defined [1]. In the continuous case, suppose the original measure P is defined over a certain interval, then so is the function M(). However function M isn’t a distribution function like P, because it may not “add to 100%”. (I guess we just need the Q support to be a subset…)

Notation warning — V represents a particular event (like HT*TH), M(V) is a particular number, not a random variable IMO. Usually expectation is computed from probability. Here, however, probability is “defined” with expectation. I think when we view probability as a function, the input V is not a random variable like “X:=how many heads in 10 flips”, but a particular outcome like “X==2”.
Now we can look at the #1 important equation EQ1:

Q(V) := E_P[ M(V) 1_V ] , where E_P() denotes expectation under the original Measure-P.

This equation defines a new probability distro “Q” using a P-expectation.

Now we can look at the #2 equation EQ2, mentioned in both Lawler’s notes and Haksun:

E_Q[X] = E_P[ X M(X) ]

Notation warning — X is a random variable (such as how many heads in 10 flips), not a particular event like HT*TH. In this context, M(X) is a derived RV.

Key – here function M is used to “tilt” the original measure P. This tilting is supposed to be intuitive but not for me. The input to M() can be an event or (P141) a number! On P142 of Greg’s notes, M itself is a random variable.

Next look at a continuous distro.

Key – to develop intuition, use binomial approximation, the basis of computer simulation.

Key – in continuous setting, the “outcomes” are entire paths. Think of 1000 paths simulated. Each path gets a probability mass under P and under Q. Some paths get higher prob under Q than under P; the other paths get lower prob under Q.

The magic – with a certain tilting function, a BM with a constant drift rate C will “transform” to a symmetric BM. That magic tilting function happens to be …

I wonder what this magic tilting function looks like as a graph. Greg said it’s the exponential shape as given at end of P146, assuming m is a positive constant like 1.

In simple cases like this, the driftless BM would acquire a Positive drift via a Positive tilt. Intuitively, it’s just _weighted_average_:

* For the coin, physical measure says 50/50, but the new measure *assigns* more weight to head, so weighted average would be tilted up, positively towards heads.
* For the SBM, physical measure says 50/50, but the new measure *assigns* more weight to positive paths, so the new expectation is no longer 0 but positive.

[1] This function M is also described as a RV with E_P[M] = 1. For some outcomes Q assigns higher probability mass than P, and lower for other outcomes. Average out to be equal.
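
The discrete coin example can be worked through by brute force. A sketch with my own naming, assuming 10 independent flips, P = 50/50 and Q = 60/40 as in the text. Here M is evaluated on each whole path as the product of per-flip ratios Q/P, and X is the number of heads. It checks E_P[M] = 1 and E_Q[X] = E_P[X M].

```python
from itertools import product

P = {"H": 0.5, "T": 0.5}                       # original measure
Q = {"H": 0.6, "T": 0.4}                       # tilted measure
M = {side: Q[side] / P[side] for side in P}    # per-flip ratio dQ/dP: H -> 1.2, T -> 0.8

def prob(path, measure):
    """Probability of a whole path of independent flips under the given measure."""
    p = 1.0
    for side in path:
        p *= measure[side]
    return p

def tilt(path):
    """M evaluated on a whole path: product of the per-flip ratios."""
    m = 1.0
    for side in path:
        m *= M[side]
    return m

def heads(path):
    """The random variable X: number of heads on the path."""
    return path.count("H")

n = 10
paths = list(product("HT", repeat=n))

ep_m = sum(prob(p, P) * tilt(p) for p in paths)                     # E_P[M]
ep_x = sum(prob(p, P) * heads(p) for p in paths)                    # E_P[X]
eq_x_direct = sum(prob(p, Q) * heads(p) for p in paths)             # E_Q[X]
eq_x_tilted = sum(prob(p, P) * heads(p) * tilt(p) for p in paths)   # E_P[X M]

print(ep_m, ep_x, eq_x_direct, eq_x_tilted)
```

The tilt moves the expected head count from 5 to 6 without touching the set of possible paths, which is the weighted-average intuition above.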

My take on Ito’s, using d(X*Y) as example

Let J be the random process defined by Jt := Xt Yt. At any time, the product of X and Y is J’s value. (It’s often instructive to put aside the “process” and regard J as a derived random VARIABLE.) Ito’s formula says
    dJ := d(Xt Yt) = Xt dY + Yt dX + dX dY
Note this is actually a stoch integral equation. If there’s no dW term hidden in dX, then this reduces to an ordinary integral equation. Is this also a “differential equation”? No. There’s no differential here.

Note that X and Y are random processes with some diffusion i.e. dW elements.

I used to see it as an equation relating multiple unknowns – dJ, dX, dY, X, Y. Wrong! Instead, it describes how the Next increment in the process J is Determined and precisely predicted.
Ito’s formula is a predictive formula, but it’s 100% reliable and accurate. Based on info revealed so far, this formula specifies exactly the mean and variance of the next increment dJ. Since dJ is Gaussian, the distribution of this rand var is fully described. We can work out the precise probability of dJ falling into any range.

Therefore, Ito’s formula is the most precise prediction of the next increment. No prediction can be more precise. By construction, all of Xt, Yt, Jt … are already revealed, and are potential inputs to the predictive formula. If X (and Y) is a well-defined stoch process, then dX (and dY) is predicted in terms of Xt , Yt , dB and dt, such as dX = Xt^2 dt + 3 Yt dB

The formula above actually means “Over the next interval dt, the increment in X has a deterministic component (= current revealed value of X squared times dt), and a BM component ~ N(0, variance = 9 Yt^2 dt)”

Given 1) the dynamics of stoch process(es), 2) how a new process is composed therefrom, Ito’s formula lets us work out the deterministic + random components of __next_increment__.

We have a similarly precise prediction of dY, the next increment in Y. As such, we already know
Xt, Yt — the Realized values
dt – the interval’s length
dX, dY – predicted increments
Therefore dJ can be predicted.
For me, the #1 take-away is in the dX formula, which predicts the next increment using Realized values.

BM – B(3)^B(5) independent@@

Jargon: B3 or B(3) means the random position at time 3. I think this is a N@T.

Q: Are the 2 random variables B3 and B5 independent?
A: no. Intuitively, when B3 is very high, like 3 sigma above the mean (perhaps a value of 892), then B5 is likely to Remain high, because for the next 2 seconds, the walker follows a centered, symmetric random walk, centered at the realized value of 892.

We know the increment from time 3 to 5 is ind of all previous values. Let d be that increment.

B5 = B3 + d, the sum of two ind Normal RV. It’s another normal RV, but dependent on the two!
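
A rough simulation of the dependence, using only the construction B5 = B3 + d described above. The sample size and function names are my own choices. For a standard BM the covariance of B3 and B5 is 3, so the sample covariance should land near 3 rather than near 0.

```python
import math
import random

def sample_b3_b5(n, seed=0):
    """Draw n pairs (B3, B5) of a standard BM via B5 = B3 + independent increment."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        b3 = rng.gauss(0.0, math.sqrt(3.0))    # B(3) ~ N(0, 3)
        d = rng.gauss(0.0, math.sqrt(2.0))     # increment over (3, 5], independent of B(3)
        pairs.append((b3, b3 + d))
    return pairs

def covariance(pairs):
    """Plain sample covariance of the paired values."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in pairs) / n

pairs = sample_b3_b5(100_000)
cov_35 = covariance(pairs)
print("sample Cov(B3, B5):", cov_35, "(theory: 3)")
```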

BM hitting 3 before hitting -5

A common Brownian Motion quiz ([[Zhou Xinfeng]]): Given a simple BM, what's the probability that it hits 3 before it hits -5?

This is actually identical to the BM with upper and lower boundaries. The BM walker stops when it hits either boundary. We know it eventually stops. At that stopping time, the walker is either at 3 or -5 but which is more likely?

Ultimately, we rely on the optional stopping theorem – At the stopping time, the martingale's value is a random variable and its expectation is equal to the initial value.
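
The optional stopping argument gives 0 = 3p + (-5)(1 - p), so p = 5/8. Below is a rough simulation check. As an assumption, it uses a fair +1/-1 random walk as a discrete stand-in for the BM; the function names and the number of walks are also my own.

```python
import random

def hits_upper_first(upper, lower, rng):
    """Run a fair +1/-1 walk from 0 until it reaches `upper` or `lower`."""
    position = 0
    while lower < position < upper:
        position += 1 if rng.random() < 0.5 else -1
    return position == upper

def estimate_hit_probability(upper=3, lower=-5, n_walks=20_000, seed=0):
    """Fraction of simulated walks that stop at the upper boundary."""
    rng = random.Random(seed)
    wins = sum(hits_upper_first(upper, lower, rng) for _ in range(n_walks))
    return wins / n_walks

# Optional stopping: E[stopped value] = starting value = 0  =>  p = 5 / (3 + 5)
p_ost = 5 / (3 + 5)
p_sim = estimate_hit_probability()
print("OST answer:", p_ost, "simulated:", p_sim)
```

The nearer boundary is hit more often, in inverse proportion to the distances.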

optional stopping theorem, my take

label – stoch

Background — There's no way to beat a fair game. Your winning always has an expected value of 0, because winning is a martingale, i.e. expected future value for a future time is equal to the last revealed value.

Now, what if there's a stopping time i.e, a strategy to win and end the game? Is the winning at that time still a martingale? If it's not, then we found a way to beat a fair game.

For a Simple Random Walk (coin flip) with upper/lower bounds, answer is intuitively yes, it's a martingale.

For a simple random walk with only an upper stopping bound (say $1), answer is — At the stopping time, the winning is the target level of $1, so the expected winning is also $1, which is Not the starting value of $0, so not a martingale! Not limited to the martingale betting strategy. So have we found a way to beat the martingale? Well, no.

“There's no way to beat a martingale in __Finite__ time”

You can beat the martingale but it may take forever. Even worse (a stronger statement), the expected time to beat the martingale and walk away with $1 is infinity.

The OST has various conditions and assumptions. The Martingale Betting Strategy violates all of them.

square integrable martingale

https://www.math.nyu.edu/faculty/varadhan/stochastic.fall08/3.pdf has a more detailed definition than Lawler's.

If a discrete martingale M(n) is a SIM, then

E[ M(99)^2 ] is finite, and so is E[ M(99999)^2 ].

Each (unconditional) expectation is, by definition, a fixed number and not random.

Consider another number “lim_(n-> inf) E[ M(n)^2 ]”. For a given martingale, this “magic attribute” is a fixed number and not random. A given square-integrable martingale may have a magic attribute greater than any number there is, i.e. it goes to infinity. But this magic attribute isn't relevant to us when we talk about square-integrable martingales. We don't care about the limit. We only care about “any number n”.

It's relevant to contrast that with quadratic variation. This is a limit quantity, and not random.

For a given process, Quadratic variation is a fixed value for a fixed timespan. For processA, Quadratic variation at time=28 could be 0.56; at time=30 it could be 0.6.

In this case, we divide the timespan into many, many (infinite) small intervals. No such fine-division in the discussion on square-integrable-martingales

process based@BM +! a stoch variance#Ronnie

One of the  Stochastic problems (HW3Q5.2) is revealing (Midterm2015Q6.4 also). We are given
  dX = m(X,t) dt + s(X,t) dBt
where m() and s() can be very complicated  functions. Now look at this unusual process definition, without Xt : 
Applying Ito’s, we notice this function, denoted f(), is a function of t, not a function of Xt, so df/dx = 0. We get
  dY = Yt Xt^3 dt

So, There’s no dB term so the process Y has a drift only but no variance. However, the drift rate depends on X, which does have a dB component! How do you square the circle? Here are the keys:
Note we are talking about the variance of the Increment over a time interval delta_t
Key — there’s a filtration up to time t. At time t, the value of X and Y are already revealed and not random any more.
Key — variance of the increment is always proportional to delta_t, and the linear factor is the quasi-constant “variance parameter”. Just like instantaneous volatility, this variance parameter is assumed to be slow-changing. 
(Ditto for the drift rate..)
In this case, the variance parameter is 0. The increment over the next interval has only a drift element, without a random element.

Therefore, the revealed, realized values of X and Y determine the drift rate over the Next interval of delta_t

Riemann ^ stoch integral, learning notes

In a Riemann integral, each strip has an area-under-the-curve being either positive or negative, depending on the integrand’s sign in the strip. If the strip is “under water” then area is negative.

In stochastic integral [1], each piece is “increment   *   integrand”, where both increment and integrand values can be positive/negative. In contrast, the Riemann increment is always positive.

With Riemann, if we know integrand is entirely positive over the integration range, then the sum must be positive. This basic rule doesn’t apply to stochastic integral. In fact, we can’t draw a progression of adjacent strips as illustration of stochastic integration.

Even if the integrand is always positive, the stoch integral is often 0. For an (important) example, in a fair game or a drift-less random walk, the dB part is 50-50 positive/negative.

[1] think of the “Simple Process” defined on P82 by Greg Lawler.

On P80, Greg pointed out
* if integrand is random but the dx is “ordinary” then this is an ordinary integral
* if the dx is a coin flip, then whether integrand is random or not, this is a stoch integral

So the defining feature of a stoch integral is a random increment

simplest SDE (!! PDE) given by Greg Lawler

P91 of Greg Lawler’s lecture notes states that the most basic, simple SDE
  dXt = At dBt     (1)
can be intuitively interpreted this way – Xt is like a process that at time t evolves like a BM with zero drift and variance At^2.

In order to make sense of it, let’s back track a bit. A regular BM with 0 drift and variance_parameter = 33 is a random walker. At any time like 64 days after the start (assuming days to be the unit of time), the walker still has 0 drift and variance_param=33. The position of this walker is a random variable ~ N(0, 64*33). However, If we look at the next interval from time 64 to 64.01, the BM’s increment is a different random variable ~ N(0, 0.01*33).
This is a process with constant variance parameter. In contrast, our Xt process has a … time-varying variance parameter! This random walker at time 64 is also a BM walker, with 0 drift, but variance_param = At^2. If we look at the interval from time 64 to 64.01, (due to slow-changing At), the BM’s increment is a random variable ~ N(0, 0.01 At^2).
Actually, the LHS “dXt” represents that signed increment. As such, it is a random variable ~ N(0, dt At^2).

Formula (1) is another signal-noise formula, but without a signal. It precisely describes the distribution of the next increment. This is as precise as possible.

Note BS-E is a PDE not a SDE, because BS-E has no dB or dW term.

filtration +! periodic observations

In the stochastic probability (not “statistics”) literature, at least in the beginner level literature, I often see mathematicians avoid the notion of a time-varying process. I think they want a more generalized and more rigorous terminology, so they prefer filtration.

I feel most of the time, filtration takes place through time.

Here’s one artificial filtration without a time element — cast a bunch of dice at once (like my story cube) but reveal one at a time.

Stoch Lesson 38 parameters of BM

Lawler defined BM with 2 params – drift and variance v, but the meaning of variance is tricky.

Note a BM is about a TVRV and notice the difference between a N@T vs TVRV. A N@T could be modeled by a Gaussian variable with a variance. The variance v of a BM is about the variance of increment. Specifically, the increment over deltaT is a regular Gaussian RV with a variance = deltaT*v

Fn-measurable, adapted-to-Fn — in my own language

(Very basic jargon…)

In the discrete context, Fn represents F1, F2, F3 … and denotes a sequence or accumulation of information.

If something Mn is Fn-measurable, it means as we get the n-th packet of information, this Mn is no longer random. It’s now measurable, but possibly unknown. I would venture to say Mn is already realized by this time. The poker card is already drawn.

If a process Mt is adapted to Ft, then Mt is Ft-measurable…

rolling fwd measure#Yuri

(label: fixedIncome, finMath)


In my exam Prof Yuri asked about T-fwd measure and the choice of T.

I said T should match the date of cashflow. If a deal has multiple cashflow dates, then we would need a rolling fwd measure. See [[Hull]]

However, for a standard swaption, I said we should use the expiry date of the option. The swap rate revealed on that date would be the underlier and assumed to follow a LogNormal distro under the chosen T-fwd measure.

copula – 2 contexts

http://www.stat.ubc.ca/lib/FCKuserfiles/file/huacopula.pdf is the best so far. But I feel all the texts seem to skip some essential clarification. We often have some knowledge about the marginal distributions of 2 rvars. We often have calibrated models for each. But how do we model the dependency? If we have either a copula or a joint CDF, then we can derive the other. I think there are 2 distinct contexts: A) known CDF -> copula, or B) propose copula -> CDF


–Context A: known joint CDF

I feel this is not a practical context but an academic context, but students need to build this theoretical foundation.


Given 2 marginal distro F1 and F2 and the joint distro (let’s call it F(u1,u2) ) between them, we can directly produce the true copula. Denoted CF(u1, u2) on P72, True copula := the copula to reproduce the joint  CDF. This true copula C contains all information on the dependence structure between U1 and U2.


http://www.stat.ncsu.edu/people/bloomfield/courses/st810j/slides/copula.pdf P9 points out that if the joint CDF is known (lucky!) then we can easily find the “true” copula that’s specific to that input distro.


In contrast to Context B, the true copula for a given joint distro is constructed using the input distros.


— Context A2:

Assume the joint distribution between 2 random variables X1 and X2 is, hmm ….. stable, then there exists a definite, concrete albeit formless CDF function H(x1, x2). If the marginal CDFs are continuous, then the true copula is unique by Sklar’s theorem.




–Context B: unknown joint CDF — “model the copula i.e. dependency, and thereby the CDF between 2 observable rvars”

This is the more common situation in practice. Given 2 marginal distro F1 and F2 without the joint distro and without the dependency structure, we can propose several candidate copula distributions. Each candidate copula would produce a joint CDF. I think often we have some calibrated parametric formula for the marginal distros, but we don’t know the joint distro, so we “guess” the dependency using these candidate copulas.


* A Clayton copula (a type of Archimedean copula) is one of those proposed copulas. The generic Clayton copula can apply to a lot of “input distros”

* the independence copula

* the        comonotonicity copula

* the countermonotonicity copula

* Gaussian copula


In contrast to Context A, these “generic” copulas are defined without reference to the input distros. All of these copulas are agnostic of the input random variables or input distributions. They apply to a lot of different input distros. I don’t think they match the “true” copula though. Each proposed copula describes a unique dependency structure.
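
Context B can be sketched in a few lines. This is only an illustration under loud assumptions: the two marginals (an exponential and a standard normal) and the Clayton parameter theta = 2 are picked arbitrarily by me, not calibrated to anything. Each candidate copula is defined with no reference to the marginals, and plugging the marginal CDF values into it produces a different candidate joint CDF.

```python
import math

def independence(u, v):
    return u * v

def comonotonic(u, v):
    return min(u, v)

def countermonotonic(u, v):
    return max(u + v - 1.0, 0.0)

def clayton(u, v, theta=2.0):
    """Clayton copula for theta > 0 and 0 < u, v <= 1."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)

def exponential_cdf(x, rate):
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0

def normal_cdf(x, mu=0.0, sigma=1.0):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def joint_cdf(copula, x1, x2):
    """Context B: H(x1, x2) = C(F1(x1), F2(x2))."""
    u1 = exponential_cdf(x1, rate=1.0)   # F1: assumed marginal, illustration only
    u2 = normal_cdf(x2)                  # F2: assumed marginal, illustration only
    return copula(u1, u2)

for name, c in [("countermonotonic", countermonotonic), ("independence", independence),
                ("clayton", clayton), ("comonotonic", comonotonic)]:
    print(name, joint_cdf(c, 1.0, 0.5))
```

The same pair of marginals gives four different joint probabilities, bracketed by the countermonotonic and comonotonic cases.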


Perhaps this is similar — we have calibrated models of the SPX smile curve at short tenor and long tenor. What’s the term structure of vol? We propose various models of the term structure, and we examine their quality. We improve on the proposed models but we can never say “Look this is the true term structure”. I would say there may not exist a stable term structure.

A copula is a joint distro, a CDF of 2 (or more) random variables. Not a density function. As such, C(u1, u2) := Pr(U1<u1, U2<u2). It looks (and is) a function, often parameterized.


time-series sample — Normal distribution@@

Q: What kind of (time-series) periodic observations can we safely assume a normal distribution?
A: if each periodic observation is under the same, never-changing context

Example: suppose every day I pick a kid at random from my son’s school class and record the kid’s height. Since the inherent distribution of the class is normal, my periodic sample is kind of normal. However, kids grow fast, so there’s an uptrend in the time series. Context is changing. I won’t expect a real normal distribution in the time series data set.

In finance, majority of the important time-series data are price-related including vol and return. Prices change over time, sometimes on an uptrend, sometimes on a downtrend. Example: if I ask 100 analysts to forecast the upcoming IBM dividend, I could perhaps assume a Normal distribution, but not the time-series.

In conclusion, in a finance context my answer to the opening question is “seldom”.

I would even say that finance is not a natural science but a behavioral science, so the data seldom has an inherent Normal distribution. How about central limit theorem? It requires iid, usually not valid.

Jensen’s inequality – option pricing


This may also explain why a BM cubed isn’t a local martingale.

Q: How practical is JI?
A: practical for interviews.
A: JI is intuitive like ITM/OTM.
A: JI just says one thing is higher than another, without saying by how much, so it’s actually simpler and more useful than the precise math formulae. Wilmott calls JI “very simple mathematics”

JI is consistent with pricing math of vanilla call (or put). Define f(S) := (S-K)+. This hockey-stick is a kind of convex function. Now Under standard RN measure,

   E[ f(S_T) ] should exceed f (E[ S_T ])

Assuming zero interest rate (so no discounting, and E[ S_T ] = S_0 under the RN measure), LHS is the call price today. RHS simplifies to f(S_0) = (S_0 – K)+, which is the intrinsic value today.
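
A tiny numeric instance of the inequality. The three-point distribution of S_T and the strike are made up by me, and zero interest rate is assumed so that E[S_T] plays the role of S_0.

```python
def call_payoff(s, k):
    """Hockey-stick payoff f(S) = (S - K)+."""
    return max(s - k, 0.0)

# Hypothetical risk-neutral distribution of S_T as (value, probability) pairs
outcomes = [(80.0, 0.25), (100.0, 0.50), (120.0, 0.25)]
strike = 100.0

expected_s = sum(s * p for s, p in outcomes)                             # E[S_T]
expected_payoff = sum(call_payoff(s, strike) * p for s, p in outcomes)   # E[f(S_T)]
payoff_of_expected = call_payoff(expected_s, strike)                     # f(E[S_T])

print("E[f(S_T)] =", expected_payoff, ">= f(E[S_T]) =", payoff_of_expected)
```

Here the at-the-money call has zero intrinsic value but a positive price, purely from the convexity of the payoff.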

How about a binary call? Unfortunately, Not convex or concave !

[Figure: a graphical demonstration of Jensen’s Inequality. The expectations shown are with respect to an arbitrary discrete distribution over the x_i.]

return rate vs log return – numerically close but LN vs N

Given IBM price is known now, the price at a future time is a N@T random var. The “return rate” over the same period is another N@T random var. BS and many models assume —

* Price ~ logNormal
* return ~ Normal i.e. a random var following a Normal distro

The “return” is actually the log return. In contrast,

* return rate ~ a LogNormal random variable shifted down by 1.0
* price relative := (return rate +1) ~ LogNormal

N@T means Noisegen Output at a future Time, a useful concept illustrated in other posts

Q (Paradox): As pointed out on P29 [[basic black scholes]], for small returns, return rate and log return are numerically very close, so why only log return (not return rate) can be assumed Normal?

A: “for small returns”… But for large (esp. neg) returns, the 2 return calculations are not close at all. One is like -inf, the other is like -100%
A: log return can range from -inf to +inf. In contrast, return rate can only range from -100% to +inf => can’t have a Normal distro as a N@T.
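
The two answers can be seen numerically. A minimal sketch with return rates picked by me: the two measures nearly coincide for small moves and diverge badly for large negative moves.

```python
import math

def log_return(return_rate):
    """Log return corresponding to a simple return rate (must be > -100%)."""
    return math.log(1.0 + return_rate)

# Small moves first, then increasingly large losses
for r in (0.001, 0.01, -0.01, 0.10, -0.50, -0.90, -0.99):
    print(f"return rate {r:+.3f}   log return {log_return(r):+.4f}")
```

As the return rate approaches its floor of -100%, the log return keeps falling without bound, which is what lets it take a Normal distribution.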

Basic assumption so far — daily returns are iid. Well, if we look at historical daily returns and compare adjacent values, they are uncorrelated but not independent. One simple set-up is, construct 2 series – odd days and even days. Uncorrelated, but not independent. The observed volatility of returns is very much related from day to day.

another (2nd) numeraire paradox

(This scenario is actually a 2-period world, well-covered in [[math of financial modeling and inv mgmt]]. However, this is NOT the simplest problem using a bank account or bond as numeraire. )
Consider a one-period market with exactly 2 possible time-T outcomes w1 and w2. Among the tradable assets is G. At termination,  
                G_T(w1) = $6
                G_T(w2) = $12.
Under G-measure, we are given RN PrG (w1) = PrG (w2) = 50%. It seems at time-0 (right now) G_0 must be 9, but it turns out to be 7!
Key – this RNPG is inferred from (and must be consistent with) the current market price of another asset [1]. In fact I believe any asset’s current price must be consistent with this G-measure RNPG. I guess the discounted expected payout equals the time-0 price.
Now can there be a 0% interest bank account B? In other words, is it possible to have B_T = B_0 = $1? Well, this actually implies a PrG (w1) = 5/7 (Verified!), not 50%. So this bank account’s current price is inconsistent with whatever asset used in [1] above. Arbitrage? I guess so.
I think it’s useful to work out (from the [1] asset’s current price) the bond current price  Z_0 = $0.875. This implies a predictable drift rate. I would say all assets (G, X, Z etc) have the same drift rate as the bond numeraire.
Next, it’s useful to work out that under Z-measure the RN Prz (w1) = 66.66% and Prz (w2) = 33.33%, very different from the RNPG values.
Q: under Z-measure, what’s G’s drift?
A: $7 -> $8
1) The most common numeraires (bank accounts and discount bonds) have just one “outcome”. (In a more advanced context, bank account outcome is uncertain, due to stoch interest rates.) This stylized example is different and more tricky. Given such a numeraire with multiple outcomes, it’s useful to infer the bond numeraire.
2) When I must work with such a numeraire, I usually have
* If I also have X_0 then I can back out Risk Neutral PrG(w1) and PrG(w2)
* alternatively, I can use X as numeraire and back out PrX(w1) and PrX(w2)
* If on the other hand we are given some of the PG numbers, then we can compute X_0 i.e. price the asset X.
[1] Here’s one such asset X_0 = 70 and X(w1) = 60 and X(w2) = 120.
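
The numbers in this example can be reproduced with exact fractions. A sketch using only the inputs stated above (G_T, X_T, X_0 and the 50/50 G-measure probabilities); the variable and function names are mine. It applies the rule that any price divided by the numeraire is a martingale under that numeraire's measure.

```python
from fractions import Fraction

G_T = {"w1": Fraction(6), "w2": Fraction(12)}
X_T = {"w1": Fraction(60), "w2": Fraction(120)}
X_0 = Fraction(70)
pr_G = {"w1": Fraction(1, 2), "w2": Fraction(1, 2)}

def ratio_expectation(payoff, numeraire, pr):
    """E[payoff_T / numeraire_T] under the given probabilities."""
    return sum(pr[w] * payoff[w] / numeraire[w] for w in pr)

# X_0 / G_0 = E_G[X_T / G_T]  =>  G_0
G_0 = X_0 / ratio_expectation(X_T, G_T, pr_G)

# Discount bond paying $1 in both outcomes: Z_0 / G_0 = E_G[1 / G_T]
Z_T = {"w1": Fraction(1), "w2": Fraction(1)}
Z_0 = G_0 * ratio_expectation(Z_T, G_T, pr_G)

# Z-measure: G_0 / Z_0 = q * G_T(w1) + (1 - q) * G_T(w2)
G_fwd = G_0 / Z_0
q_w1 = (G_T["w2"] - G_fwd) / (G_T["w2"] - G_T["w1"])

# A zero-interest bank account (B_T = B_0 = 1) would need 1/G_0 = p/6 + (1 - p)/12
p_w1_zero_rate = (1 / G_0 - 1 / G_T["w2"]) / (1 / G_T["w1"] - 1 / G_T["w2"])

print("G_0 =", G_0, " Z_0 =", Z_0, " G_0/Z_0 =", G_fwd)
print("Pr_Z(w1) =", q_w1, " Pr_G(w1) implied by zero-rate bank account =", p_w1_zero_rate)
```

This gives G_0 = 7, Z_0 = 7/8, G drifting from 7 to 8 under the Z-measure, Pr_Z(w1) = 2/3, and 5/7 for the zero-rate case, matching the figures quoted above.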

fwd px ^ px@off-market eq-fwd

fwd price ^ price of an existing eq-fwd position. Simple rule to remember —
QQ) not $0 — fwd price is well above $0. Usually close to the current price of the asset.
EE) nearly $0 — current “MTM value” (i.e. PnL) of an existing fwd contract is usually close to +-$0. In fact, at creation the contract has $0 value. This well-known statement assumes both parties negotiated the price based on arb pricing.

Q: With IBM fwd/futures contracts, is there something 2D like the IBM vol surface?

2 contexts, confusing to me (but not to everyone else since no one points them out) —

EE) After a fwd is sold, the contract has a delivery price “K” and also a fluctuating PnL/mark-to-market valuation “f” [1]. Like a stock position (how about a IRS?) the PnL can be positive/negative. At end of day 31/10/2015, the trading venue won’t report on the MTM prices of an “existing” contract (too many), but the 2 counter-parties would, for daily PnL report and VaR.

If I’m a large dealer, I may be long/short a lot of IBM forward contracts with various strikes and tenors — yes a 2D matrix…

[1] notation from P 109 [[hull]], also denoted F_t.

QQ) When a dealer quotes a price on an IBM forward contract for a given maturity, there’s a single price – the proposed delivery price. Trading venues publish these live quotes. Immediately after the proposed price is executed, the MTM value = $0, always

The “single” price quoted is in stark contrast to option market, where a dealer quotes on a 2D matrix of IBM options. Therefore the 2D matrix is more intrinsic (and well-documented) in option pricing than in fwd contract pricing.

In most contexts in my blog, “fwd price” refers to the QQ case. However, in PCP the fwd contract is the EE type, i.e. an existing fwd contract.

In the QQ context, the mid-quote is the fwd price.

Mathematically the QQ case fwd price is a function of spot price, interest rate and tenor. There’s a simple formula.

There’s also a simple formula defining the MTM valuation in EE context. Its formula is related to the QQ fwd quote formula.

Both pricing formulas derived from arbitrage/replication analysis.

EE is about existing fwd contracts. QQ is about current live quotes.

At valuation time (typically today), we can observe on the live market a “fwd price” (QQ), and we can also compute the MTM value of an existing contract (EE). Both prices evolve with time, and both follow underlier’s price S_t. Therefore, both prices are bivariate functions of (t,S). In fact, we can write down both functions —

QQ: F_t = S_t / Z_t ….. (“Logistics”) where Z_t is the discount factor i.e. the T-maturity discount bond’s price observed@ t
EE: p@f = S_t – K*Z_t

( Here I use p@f to mean price of a fwd contract. In literature, people use F to denote either of them!)
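
A minimal sketch of the two formulas, to make the QQ/EE distinction concrete. The function names and all numbers (spot, rate, tenor) are hypothetical and chosen only for illustration; Z is taken as exp(-r*T) under an assumed flat rate.

```python
import math

def forward_price(spot, discount_factor):
    """QQ: fair delivery price quoted today, F_t = S_t / Z_t."""
    return spot / discount_factor

def forward_mtm(spot, delivery_price, discount_factor):
    """EE: value today of an existing long forward struck at K, p@f = S_t - K*Z_t."""
    return spot - delivery_price * discount_factor

# Hypothetical inputs: spot $150, 2% flat rate, 1Y to delivery
spot, rate, tenor = 150.0, 0.02, 1.0
z = math.exp(-rate * tenor)                    # discount bond price Z_t
k = forward_price(spot, z)                     # delivery price agreed at inception
value_at_inception = forward_mtm(spot, k, z)   # zero by construction
value_after_rally = forward_mtm(160.0, k, z)   # same K, spot jumps, tenor held fixed

print(f"fwd price K = {k:.4f}")
print(f"MTM at inception = {value_at_inception:.4f}")
print(f"MTM after spot moves to 160 = {value_after_rally:.4f}")
```

The fwd price stays close to spot (QQ), while the contract value starts at $0 and then moves roughly one-for-one with spot (EE).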

To get an intuitive feel for the formulas, we must become very familiar with fwd contract, since fwd price is defined based on it.

Fwd price is a number, like 102% of current underlier price. There exists only one fair fwd price. Even under other numeraires or other probability measures, we will never derive a different number.

In a quiz, Z0 or S0 may not be given to you, but in reality, these are the current, observed market prices. Even with these values unknown, F_t = S_t / Z_t formula still holds.

Black’s model – uses fwd price as underlier, or as a proxy of the real underlier (futures price)

Vanilla call’s hockey stick diagram has a fwd contract’s payoff curve as an asymptote. But this “fwd contract’s payoff curve” is not the same thing as current p@f, which is a single number.

tradable/non-tradable underlier in a drv contract

I guess in many, many entry-level quant questions, we are often given a task to find the Risk Neutral [2] dynamics of some variable X. Simple examples include Xa(t)=S(t)^2 or Xb=600/S, Xc=sqrt(S), Xd=exp(S), Xe=S*logS … where S is the IBM price following a GBM. In many simple cases the variable X is also GBM under the Risk Neutral measure. We use Ito’s rule…

Then we are asked to price a contract that guarantees to pay X(T) at maturity.

At this point, it’s easy to forget the X itself is not tradeable i.e. the X process is not the price process of a tradeable asset. When interest rate goes from 200 to 201, the mid-quote (of any security) doesn’t go from $200 to $201, even though the implied vol or implied yield could go from 200 to 201. Another Eg – suppose I were to maintain tight bid/ask quotes around current value of 600/S_IBM. If IBM is trading at $30 then I quote $20. If IBM trades at $40 then I quote $15. This market-maker would induce arbitrage (intuitive to practitioners but not to the uninitiated). A contract paying 600/S_T on maturity has a fair price today X_0 that’s very, very different from 600/S_0 [1].

Given X(t) process isn’t a tradeable (not a price process), X doesn’t have drift equal to risk-free rate “r” 😦

However, don’t lose heart — noting this Contract is a tradable, the contract’s price process C(t) is tradeable and C(t) has (exponential) drift = r 🙂

Q: Basic question – Given X(t) isn’t a price process, does it make sense to apply Ito’s on X = 600/S ?
A: Yes because 1) Ito lets us (fully) characterize the dynamics of the X(t) process, albeit NOT a price process. In turn, 2) the SDE (+ terminal condition) reveals the distro of X(T). From the distro, we could find the 3) expectation of X(T) and the 4) pre-expiry price. Note every step requires a probability measure, since dW, BM, distro, expectation are all probabilistic concepts.

[1] Try to develop intuition — By Jensen’s inequality, it should be above 600/S_0 (at least when the interest rate is zero), provided S process has non-zero volatility.
[2] (i.e. using money market account probability measure)
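
Here is a rough sketch of the 600/S example, assuming S follows GBM with constant r and σ under the risk-neutral measure. Applying Ito to 1/S gives drift (σ^2 - r), so the contract price would be (600/S_0) * exp((σ^2 - 2r)T). The helper rn_price and all inputs are my own illustrative choices. It checks the closed form by brute-force integration over the normal driver.

```python
import math

def rn_price(payoff, s0, r, sigma, t, n=20001, zmax=10.0):
    """Discounted risk-neutral expectation of payoff(S_T), by midpoint-rule
    integration over the standard normal driver z (assumes GBM)."""
    dz = 2 * zmax / n
    total = 0.0
    for i in range(n):
        z = -zmax + (i + 0.5) * dz
        s_t = s0 * math.exp((r - 0.5 * sigma**2) * t + sigma * math.sqrt(t) * z)
        density = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
        total += payoff(s_t) * density * dz
    return math.exp(-r * t) * total

# Hypothetical inputs; zero rate keeps the Jensen comparison clean
s0, r, sigma, t = 30.0, 0.0, 0.3, 1.0
naive_quote = 600 / s0                                        # the tempting quote
closed_form = (600 / s0) * math.exp((sigma**2 - 2 * r) * t)   # from Ito on 1/S
numeric = rn_price(lambda s: 600 / s, s0, r, sigma, t)

print(f"naive 600/S_0 = {naive_quote:.4f}")
print(f"closed form   = {closed_form:.4f}")
print(f"numeric check = {numeric:.4f}")
```

With zero rate the fair price sits above 600/S_0, and the gap widens with σ^2 * T.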

hockey stick – asymptote

(See also post on fwd price ^ PnL/MTM of a fwd position.)

Assume K = 100. As we get very very close to maturity, the “now-if” graph descends very very close to the linear hockey stick, i.e. the “range of (terminal) possibilities” graph.

10 years before maturity, the “range of (terminal) possibilities” graph is still the same hockey stick turning at 100, but the now-if graph is quite a bit higher than the hockey stick. The real asymptote at this time is the (off-market) fwd contract’s now-if graph. This is a straight line crossing X-axis at K * exp(-rT). See http://bigblog.tanbin.com/2013/11/fwd-contract-price-key-points.html

In other words, at time 0, call value >= S – K*exp(-rT)

As maturity nears, both the now-if smooth curve and the asymptote descend to the kinked “terminal” hockey stick.
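
To see the asymptote numerically, here is a small sketch comparing a Black-Scholes call value with the off-market fwd contract value S - K*exp(-rT). Black-Scholes is only my assumption for drawing the "now-if" curve; the bound itself is model-free. The rate and vol are made-up inputs.

```python
import math
from statistics import NormalDist

N = NormalDist().cdf

def bs_call(s, k, r, sigma, t):
    """Black-Scholes value of a European call on a no-dividend stock."""
    d1 = (math.log(s / k) + (r + 0.5 * sigma**2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return s * N(d1) - k * math.exp(-r * t) * N(d2)

def fwd_value(s, k, r, t):
    """Now-if value of a fwd contract with delivery price K."""
    return s - k * math.exp(-r * t)

K, R, SIGMA = 100.0, 0.03, 0.2          # hypothetical inputs
for t in (10.0, 1.0, 0.01):
    for s in (60.0, 100.0, 200.0):
        gap = bs_call(s, K, R, SIGMA, t) - fwd_value(s, K, R, t)
        print(f"T={t:5.2f} S={s:6.1f} call-minus-fwd={gap:9.4f}")
```

The gap is never negative, and it shrinks for deep-ITM spots, which is what "asymptote" means here.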

backfill bias n survivorship bias, briefly

based on http://oyc.yale.edu/sites/default/files/midterm_exam1_solutions.pdf

A hedge fund index has a daily NAV value based on the weighted average NAV of constituent funds. If today we discover some data error in the 1999 NAV, we the index provider are allowed to correct that historical data. Immediately, many performance stats would be affected and need updating. Such data error is rare (I just made it up for illustration.) This procedure happens only in special scenarios like the 2 scenarios below.

Survivorship bias: When a fund is dropped from an index, past values of the index are adjusted to remove that fund's past data.

Backfill bias: For example, if a new fund has been in business for two years at the time it is added to the index, past index values are adjusted for those two years. Suppose the index return over the last 2 years was 33%, based on weighted average of 200 funds. Now this new fund is likely more successful than average. Suppose its 2Y return is 220%. Even though this new fund has a small weight in the index, including it would undoubtedly boost the 2Y index return – a welcome “adjustment”.

While backfilling is obviously a questionable practice, it is also quite understandable. When an index provider first launches an index, they have an understandable desire to go back and construct the index for the preceding few years. If you look at time series of hedge fund index performance data, you will often note that indexes have very strong performance in the first few years, and this may be due to backfilling.

Towards expiration, how option greek graphs morph

(A veteran would look at other ways the curves respond to other changes, but I feel the most useful thing for a beginner to internalize is how the curves respond to … imminent expiration.)

Each curve is a range-of-possibility curve since the x-axis is the (possible range of) current underlier prices.

— the forward contract’s price
As expiration approaches, …
the curve moves closer to the (terminal) payout graph — that straight line crossing at K.

— the soft hockey-stick i.e. “option price vs current underlier”

As expiration approaches, …

the curve descends closer to the kinked hockey stick payout diagram

Also the asymptote is the forward contract’s price curve, as described above.

— the delta curve
As expiration approaches, …

the climb (for the call) becomes more abrupt.

See diagram in http://www.saurabh.com/Site/Writings_files/qf301_greeks_small.pdf

— the gamma curve
As expiration approaches, …

the “bell” curve is squeezed towards the center (ATM) so the peak rises, but the 2 tails drop

— the vega curve
As expiration approaches, …

the “bell” curve descends (vega is proportional to √T) and also narrows around ATM, so it’s not exactly a parallel shift
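
A quick way to check these morphing patterns is to tabulate the standard Black-Scholes greeks at a long and a short time to expiry. This is only a sketch under Black-Scholes assumptions (no dividend, constant vol); the strike, vol and spot grid are hypothetical.

```python
import math
from statistics import NormalDist

ND = NormalDist()

def call_greeks(s, k, r, sigma, t):
    """Black-Scholes delta, gamma, vega of a European call (no dividend)."""
    vol_sqrt_t = sigma * math.sqrt(t)
    d1 = (math.log(s / k) + (r + 0.5 * sigma**2) * t) / vol_sqrt_t
    delta = ND.cdf(d1)
    gamma = ND.pdf(d1) / (s * vol_sqrt_t)
    vega = s * ND.pdf(d1) * math.sqrt(t)
    return delta, gamma, vega

K, R, SIGMA = 100.0, 0.0, 0.2            # hypothetical inputs
for t in (1.0, 0.05):                     # one year vs. a few weeks to expiry
    for s in (80.0, 100.0, 120.0):
        delta, gamma, vega = call_greeks(s, K, R, SIGMA, t)
        print(f"T={t:4.2f} S={s:5.0f} delta={delta:.4f} gamma={gamma:.5f} vega={vega:7.3f}")
```

Near expiry the delta rows move towards 0 or 1, the ATM gamma rises while the wing gammas drop, and vega falls everywhere.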

Modified duration^Macaulay duration, briefly again

The nice pivot diagram on http://en.wikipedia.org/wiki/Bond_duration is for Macaulay duration — dollar-weighted average maturity. Zero bond has duration equal to its maturity. (I think many textbooks use this diagram because it’s a good approximation to MD.)

The all-important sensitivity to yield is …. MD i.e. modified duration. Dv01 is related to MD (not Macaulay) — http://bigblog.tanbin.com/2012/05/bond-duration-absolute-1-relative-x.html

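MD is the useful measure. It turns out that MD differs from Macaulay duration by a small factor: MD = Macaulay duration / (1 + y/k), where y is the yield and k is the number of compounding periods per year.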
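
A small sketch to compare the two durations on a plain bullet bond. The function name, the semi-annual convention and the bond terms are all my own assumptions for illustration.

```python
def bond_measures(face, coupon_rate, ytm, years, freq=2):
    """Price, Macaulay duration and modified duration of a plain bullet bond.

    ytm is a nominal annual yield compounded `freq` times a year."""
    periods = years * freq
    y = ytm / freq
    coupon = face * coupon_rate / freq
    price = 0.0
    time_weighted = 0.0
    for i in range(1, periods + 1):
        cash = coupon + (face if i == periods else 0.0)
        pv = cash / (1 + y) ** i
        price += pv
        time_weighted += (i / freq) * pv     # weight each PV by its time in years
    macaulay = time_weighted / price
    modified = macaulay / (1 + y)            # the "small factor"
    return price, macaulay, modified

price, mac, mod = bond_measures(100.0, 0.05, 0.05, 10)   # hypothetical 10Y 5% par bond
dv01 = mod * price * 0.0001                              # dollar change per 1 bp
zero_price, zero_mac, zero_mod = bond_measures(100.0, 0.0, 0.05, 10)

print(f"coupon bond: price={price:.4f} Macaulay={mac:.4f} MD={mod:.4f} DV01={dv01:.4f}")
print(f"zero bond:   price={zero_price:.4f} Macaulay={zero_mac:.4f} MD={zero_mod:.4f}")
```

The zero bond's Macaulay duration comes out equal to its maturity, while its MD is slightly lower.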

variance or stdev is additive – an illustration

Imagine annual (log) return is controlled by a noisegen, whose mean is a constant value M and variance is another constant value sigma^2

Since we hit the noisegen once a year, over 5 years we get 5 random “numbers”, all with the same M and sigma. Each number is the realized annual (log) return. The cumulative end-to-end return is the sum of the 5 independent random variables. This sum is a random variable with a variance, which is additive. Assumption is iid i.e. repeated noisegen hits.
In a different scenario, suppose we hit the noisegen once only and multiply the same output number by 5 in a “projected 5Y return”. Now std is additive.

In both cases, the mean of the 5Y end-to-end return is 5*M
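
The two scenarios can be simulated in a few lines. This is a rough sketch with a made-up noisegen (normal, M = 5%, sigma = 20%) and a fixed seed. The sample statistics only approximate the theoretical values.

```python
import random

random.seed(0)                       # fixed seed so the run is repeatable
M, SIGMA, YEARS, TRIALS = 0.05, 0.20, 5, 100_000

def mean_and_variance(xs):
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / len(xs)

# Scenario 1: hit the noisegen once a year and add up 5 independent outputs
iid_sum = [sum(random.gauss(M, SIGMA) for _ in range(YEARS)) for _ in range(TRIALS)]
# Scenario 2: hit the noisegen once and multiply that single output by 5
scaled = [YEARS * random.gauss(M, SIGMA) for _ in range(TRIALS)]

mean_iid, var_iid = mean_and_variance(iid_sum)
mean_scaled, var_scaled = mean_and_variance(scaled)

print(f"iid sum: mean={mean_iid:.4f} variance={var_iid:.4f} (theory {YEARS * SIGMA**2:.4f})")
print(f"scaled:  mean={mean_scaled:.4f} stdev={var_scaled ** 0.5:.4f} (theory {YEARS * SIGMA:.4f})")
```

In scenario 1 the variance is about 5 * sigma^2. In scenario 2 the stdev is about 5 * sigma, i.e. the variance is 25 * sigma^2.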

risk-neutral measure, a beginner’s personal view

Risk neutral measure permeates derivative pricing but is not clearly understood. I believe RN measure is very useful to mathematicians. Maybe that’s why they build a complete foundation with lots of big assumptions.

Like other branches of applied math, there are drastic simplifying assumptions….

I think the two foundational building blocks are 1) arbitrage and 2) replication. In many textbook contexts, the prices of underliers vs derivatives are related and restrained by arbitrage. From these prices we can back out or imply RN probability values, but these are simplistic illustrations rather than serious definitions of RN measure.

On top of these and other concepts, we have Martingale and numeraire concepts.

Like Game programming for kids and for professionals, there are 2 vastly different levels of sophistication:
A) simplified — RN probabilities implied from live prices of underliers and derivatives
B) sophisticated — RN infrastructure and machinery, based on measure theory

Black-Litterman — my brief notes

http://www.blacklitterman.org/what.html is a brief description.

First, forget about optimizer, capm, MeanVariance, or views. Let's first get a grip on Bayesian inference. Given an unfair coin, we are trying to estimate the Pr(tail) by tossing it over and over. There's uncertainty (a pr distribution) about the mean.

Updating estimates — Now we can formulate the problem as how to update our estimate of expected return when we get some special insight (like insider news). Such an insight is called a subjective “view”, in contrast to the public information about the securities. The updated estimates must be numbers, not some vague preference.

Optimizer — Once we get the updated estimates, they go into a regular portfolio allocation optimizer.

Problems of MV — concentration. A small change in the input (returns, covariance etc.) leads to drastic re-allocations.

The investor is uncertain in their estimates (prior and views), and expresses them as distributions of the unknown mean about the estimated mean. As a result, the posterior estimate is also a distribution.
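
To get a grip on the Bayesian part only, here is a toy sketch of the unfair-coin example with a Beta prior. This is not the Black-Litterman formula itself. The flat prior and the toss counts are assumptions I made up. It only shows that the estimate of Pr(tail) is a distribution that tightens as evidence arrives.

```python
def beta_update(a, b, tails, heads):
    """Posterior Beta(a, b) parameters for Pr(tail) after observing the tosses."""
    return a + tails, b + heads

def beta_mean_var(a, b):
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var

prior = (1.0, 1.0)                                   # flat prior, an assumption
post_small = beta_update(*prior, tails=7, heads=3)
post_large = beta_update(*prior, tails=700, heads=300)

for label, params in (("prior", prior), ("10 tosses", post_small), ("1000 tosses", post_large)):
    mean, var = beta_mean_var(*params)
    print(f"{label:12s} mean={mean:.4f} stdev={var ** 0.5:.4f}")
```

The posterior mean is a number (not a vague preference), and its stdev measures how uncertain we still are about that mean.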

selling an existing IR swap@@

I guess technically we can’t sell an IRS as it’s not a product like an orange (or a house, or an option) with an owner. An IRS is a long-term bilateral agreement. Analog? I can’t “sell” my insurance policy to someone else.

A liquid swap market lets us offset our Libor exposure —

Suppose I’m a Payer in Deal 1 with Citi, to receive Libor and pay fixed 4.5%. Five hours (or 5 days or 5 months) later, I could become a Receiver in a JPM deal (Deal 2) to pay Libor and receive fixed 4.6%. Therefore I get rid of my Libor exposure, as long as the reset dates are identical between Deal 1 and Deal 2. But strictly speaking I haven’t Sold an existing swap. Both are long-term commitments that could in theory be unwound (painful) but never “sold” IMO.

By market convention, the counterparty paying the fixed rate is called the “payer” (while receiving the floating rate), and the counterparty receiving the fixed rate is called the “receiver” (while paying the floating rate).

N(d2), GBM, binary call valuation – intuitive

It’s possible to get an intuitive feel for the binary call valuation formula.
For a vanilla European call, C = … – K exp(-Rdisc T)*N(d2)
N(d2) = Risk-Neutral Pr(S_T > K). Therefore,
N(d2) = RN-expected payoff of a binary call
N(d2) exp(-Rdisc T) — If we discount that RN-expected payoff to Present Value, we get the current price of the binary call. Note all prices are measure-independent.
Based on GBM assumption, we can *easily* prove Pr(S_T > K) = N(d2) .
First, notice Pr(S_T > K) = Pr (log S_T > log K).
Now, given S_t follows a GBM, the random variable (N@T)
   log S_T ~ N ( mean = log S + T(Rgrow – σ^2/2)  ,   std = σ √T ).
Let’s standardize it to get
   Z := (log S_T  – mean)/std    ~  N(0,1)
Pr = Pr (Z > (log K  – mean)/std ) = Pr (Z < (mean – log K)/std) = N( (mean – log K)/std)  = N(d2)
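
The same steps as a short numerical check, assuming GBM with log S_T ~ N(log S + (Rgrow - σ^2/2)T, std = σ√T). The function names and inputs are hypothetical. The two probabilities should agree up to rounding.

```python
import math
from statistics import NormalDist

def prob_finish_above(s0, k, r_grow, sigma, t):
    """Pr(S_T > K) read directly off the normal distribution of log S_T."""
    mean = math.log(s0) + (r_grow - 0.5 * sigma**2) * t
    std = sigma * math.sqrt(t)
    return 1.0 - NormalDist(mean, std).cdf(math.log(k))

def n_d2(s0, k, r_grow, sigma, t):
    """N(d2) as it appears in the Black-Scholes call formula."""
    d2 = (math.log(s0 / k) + (r_grow - 0.5 * sigma**2) * t) / (sigma * math.sqrt(t))
    return NormalDist().cdf(d2)

S0, K, RATE, SIGMA, T = 100.0, 110.0, 0.02, 0.25, 1.5   # hypothetical inputs
p_direct = prob_finish_above(S0, K, RATE, SIGMA, T)
p_formula = n_d2(S0, K, RATE, SIGMA, T)
binary_call_price = math.exp(-RATE * T) * p_formula      # discounted RN probability

print(f"Pr(S_T > K) direct = {p_direct:.6f}")
print(f"N(d2)              = {p_formula:.6f}")
print(f"binary call price  = {binary_call_price:.6f}")
```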

PCP with dividend – intuitively

See also posts on PCP.
See also post on replicating fwd contract.

I feel PCP is the most intuitive, fundamental and useful “rule of thumb” in option pricing. Dividend makes things a tiny bit less straightforward.

C, P := call and put prices today
F := forward contract price today, on the same strike. Note this is NOT the fwd price of the stock.

We assume bid/ask spread is 0.

    C = P + F

The above formula isn’t affected by dividend — see the very first question of our final exam. It depends only on replication and arbitrage. Replication is based on portfolio of traded securities. (Temperature – non-tradable.) But a dividend-paying stock is technically non-tradable!

* One strategy – replicate with European call, European put and fwd contract. All tradable.

* One strategy – replicate with European call, European put, bond and dividend-paying stock, but no fwd contract. Using reinvestment and adjusting the initial number of shares, replication can still work. No need to worry about the notion that the stock is “non-tradable”.

Hockey stick, i.e. range-of-possibility graphs of expiration scenarios? Not very simple.

What if I must express F in terms of S and K*exp(-rT)? (where S := stock price any time before maturity.)

  F = S – D – K*exp(-rT) … where D := present value of the dividend stream.
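
A sketch of the second replication strategy, with one known cash dividend. Buy 1 share, borrow D (repaid by the dividend) and borrow K*exp(-rT) (repaid with K at maturity). All inputs are made up. The point is only that the terminal payoff matches long call + short put for every S_T, so the time-0 cost must be F.

```python
import math

S0, K, R, T = 100.0, 95.0, 0.03, 1.0      # hypothetical market inputs
DIV, T_DIV = 2.0, 0.5                      # one known cash dividend before T

pv_div = DIV * math.exp(-R * T_DIV)                        # D
fwd_contract_value = S0 - pv_div - K * math.exp(-R * T)    # F = S - D - K*exp(-rT)

def stock_and_loans_payoff(s_t):
    """Buy 1 share, borrow D and K*exp(-rT) today; the dividend repays the first loan."""
    dividend_loan_left = pv_div * math.exp(R * T_DIV) - DIV   # zero up to rounding
    return s_t - K - dividend_loan_left * math.exp(R * (T - T_DIV))

def call_minus_put_payoff(s_t):
    return max(s_t - K, 0.0) - max(K - s_t, 0.0)

print(f"cost today of the replication, i.e. F = {fwd_contract_value:.4f}")
for s_t in (0.0, 50.0, 95.0, 120.0, 300.0):
    print(f"S_T={s_t:6.1f} stock+loans={stock_and_loans_payoff(s_t):8.2f} "
          f"call-put={call_minus_put_payoff(s_t):8.2f}")
```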

when there’s (implicit) measure+when there is none

Needs a Measure – r or mu. Whenever we see “drift”, it means expected growth or the Mean of some distribution (of a N@T). There’s a probability measure in the context. This could be a physical measure or a T-fwd measure or a stock-numeraire or the “risk-neutral-measure” i.e. MoneyMarketAcct as the numeraire

Needs a Measure – dW. Brownian motion is always a probabilistic notion, under some measure

Needs a Measure – E[…] is expectation of something. There’s a measure in the context.

Needs a Measure – Pr[..]

Needs a Measure – martingale

Regardless of measure – time-zero fair price of any contract. The same price should result from derivation under any measure.

Regardless of measure – arbitrage is arbitrage under any measure

option pricing – 5 essential rules n their assumptions

PCP — arb + extremely tight bid/ask spread + European vanilla option only. GBM Not assumed. Any numeraire fine.

Same drift as the numeraire — tradeable + arb + numeraire must be bond or a fixed-interest bank account.

no-drift — tradeable + arb + using the numeraire

Ito — BM or GBM in the dW term. tradable not assumed. Arb allowed.

BS — tradable + arb + GBM + constant vol

quasi constant parameters in BS

dS/S = a dt + b dW [1]

[[Hull]] says this is the most widely used model of stock price behavior. I guess this is the basic GBM dynamic. Many “treasures” hidden in this simple equation. Here are some of them.

I now realize a and b (usually denoted σ) are “quasi-constant parameters”. The initial model basically assumes constant [2] a and b. In a small adaptation, a and b are modeled as time-varying parameters. In a sense, ‘a’ can be seen as a Process too, as it changes over time unpredictably. However, few researchers regard a as a Process. I feel a is a long-term/steady-state drift. In contrast, many treat b as a Process — the so-called stochastic vol.

Nevertheless in equation [1], a and b are assumed to be fairly slow-changing, more stable than S. These 2 parameters are still, strictly speaking, random and unpredictable. On a trading desk, the value of b is typically calibrated at least once a day (OCBC), and up to 3 times an hour (Lehman). How about on a volatile day? Do we calibrate b more frequently? I doubt it. Instead, implied vol would be high, and market maker may jack up the bid/ask spread even wider.

As an analogy, the number of bubbles in a large boiling kettle is random and fast-changing (changing by the second). It is affected by temperature and pressure. These parameters change too, but much slower than the “main variable”. For a short period, we can safely assume these parameters constant.

Q: where is √ t
A: I feel equation [1] doesn’t have it. In this differential equation about the instantaneous change in S, dt is assumed infinitesimal. However, for a given “distant future” from now, t is given and not infinitesimal. Then the lognormal distribution has a dispersion proportional to √ t

[2] The adjective “constant” is defined along time axis. Remember we are talking about Processes where the Future is unknown and uncertain.

change measure but using cash numeraire #drift

Background — “Drift” sounds simple and innocent, but no no no.
* it requires a probability measure
* it requires a numeraire
* it implies there’s one or (usually) more random processes with some characteristics.

It’s important to estimate the drift. Seems essential to derivative pricing.
BA = a bank account paying a constant interest rate, compounded continuously. No uncertainty no pr-distro about any future value on any future date. $1 today (time-0) becomes exp(rT) at time T with pr=1, under any probability measure.

MMA = money market account. More realistic than the BA. Today (time-0), we only know tomorrow’s value, not further.
Z = the zero coupon bond. Today (time-0) we already know the future value at time-T is $1 with Pr=1 under any probability measure. Of course, we also know the value today as this bond is traded. Any other asset has such deterministic future value? BA yes but it’s unrealistic.
S = IBM stock
Now look at some tradable asset X. It could be a stock S or an option C or a futures contract … We must, must, must assume X is tradable without arbitrage.
—- Under BA measure and cash as numeraire.
   X0/B0 = E (X_T/B_T) = E (X_T)/B_T   =>
   E (X_T)/X0 = B_T/B0
Interpretation – X_T is random and non-deterministic, but its expected value (BA measure) follows the _same_ drift as BA itself.
—- Under BA measure and using BA as numeraire or “currency”,
   X0/B0 = E (X_T/B_T)
Interpretation – evaluated with BA as currency, the value of X will stay constant with 0 drift.
—- Under T-measure and cash numeraire
   X0/Z0 = E (X_T/Z_T) = E (X_T)/$1   =>
   E (X_T)/X0 = 1/Z0
Interpretation — X_T is random and non-deterministic, but its expected value (Z measure) follows the _same_ drift as Z itself.
—- Under T-measure and using Z as numeraire or “currency”,
   X0/Z0 = E (X_T/Z_T)
Interpretation – evaluated with the bond as currency, the value of X will stay constant with 0 drift.
—- Under IBM-measure and cash numeraire
   X0/S0 = E (X_T/S_T)
Interpretation – can I say X follows the same drift as IBM? No. The equation below doesn’t hold because S_T can’t come out of E()!
     !wrong —>       E (X_T)/X0 = S_T/S0    ….. wrong!
—- Under IBM-measure and IBM numeraire… same equation as above.
Interpretation – evaluated with IBM stock as currency, the value of X will stay constant with 0 drift.

Now what if X is non-tradable i.e. not the price process of a tradable asset? Consider random variable X = 1/S. X won’t have the drift properties above. However, a contract paying X_T is tradeable! So this contract’s price does follow the drift properties above. See http://bigblog.tanbin.com/2013/12/tradeablenon-tradeable-underlier-in.html

numeraire paradox

Consider a one-period market with exactly 2 possible time-T outcomes w1 and w2.

Among the tradable assets is G. At termination, G_T(w1) = $6 or G_T(w2) = $12. Under G-measure, we are given Pr(w1) = Pr(w2) = 50%. It seems at time-0 (right now) G_0 should be $9, but it turns out to be $7! Key – this Pr is inferred from (and must be consistent with) the current market price of another asset [1]. Without another asset, we can’t work out the G-distro. In fact I believe every asset’s current price must be consistent with this G-measure Pr … or arbitrage!

Since every asset’s current price should be consistent with the G-Pr, I feel the most useful asset is the bond. Bond current price works out to Z_0 = $0.875. This implies a predictable drift rate.

I would say under bond numeraire, all assets (G, X, Z etc) have the same drift rate as the bond numeraire. For example, under the Z-numeraire, G has the same drift as Z.

Q: under Z-measure, what’s G’s drift?
A: $7 -> $8

It’s also useful to work out under Z-measure the Pr(w1) = 66.66% and Pr(w2) = 33.33%. This is using the G_0, G_T numbers.

Now can there be a 0-interest bank account B? In other words, could B_T = B_0 = $1? No, since such prices imply a G-measure Pr(w1) like 5/7 (Verified!) So this bank account’s current price is inconsistent with whatever asset used in [1] above.
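
The numbers above can be reproduced with exact fractions. This sketch uses only the martingale rule X_0/G_0 = E_G[X_T/G_T] in the 2-outcome world. The variable and function names are mine.

```python
from fractions import Fraction as Fr

G0 = Fr(7)
G_T = {"w1": Fr(6), "w2": Fr(12)}
pr_G = {"w1": Fr(1, 2), "w2": Fr(1, 2)}      # the given G-measure probabilities

def price_under_G(payoff):
    """Time-0 price from the G-numeraire rule: X_0 = G_0 * E_G[X_T / G_T]."""
    return G0 * sum(pr_G[w] * payoff[w] / G_T[w] for w in G_T)

# Discount bond paying $1 in both outcomes
Z0 = price_under_G({"w1": Fr(1), "w2": Fr(1)})

# Z-measure: G_0/Z_0 = p*G_T(w1) + (1-p)*G_T(w2), solve for p = Pr_Z(w1)
fwd_G = G0 / Z0
p_w1_Z = (G_T["w2"] - fwd_G) / (G_T["w2"] - G_T["w1"])

# Zero-interest bank account B_0 = B_T = 1 would need 1/G_0 = q/6 + (1-q)/12
q_w1 = (Fr(1) / G0 - 1 / G_T["w2"]) / (1 / G_T["w1"] - 1 / G_T["w2"])

print("Z_0 =", Z0)                       # bond price today
print("G drifts from", G0, "to", fwd_G, "under Z-measure")
print("Pr_Z(w1) =", p_w1_Z)
print("G-measure Pr(w1) implied by a zero-interest account =", q_w1)
```

The last number is not 1/2, so a zero-interest account would contradict the given G-measure probabilities.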

The most common numeraires (bank accounts and discount bonds) have just one “outcome”. (In a more advanced context, bank account outcome is uncertain, due to stoch interest rates.) This stylized example is different. Given a numeraire with multiple outcomes, it’s useful to infer the bond numeraire. It’s generally easier to work with one-outcome numeraires. I feel it’s even better if we know the exact terminal price and the current price of this numeraire — I guess only the discount bond meets this requirement.

I like this stylized 1-period, 2-outcome world.
Q1: Given Z_T, Z_0, G_0, G_T [2], can i work out the G-Pr (i.e. distro under G-numeraire)? can i swap the roles and work out the Z-Pr ?
A: I think we can work out both distros and they aren’t identical !

Q2: Given G_0 and the G_T possible values[2] without Z prices, can we work out the G-Pr (i.e. distro under G-numeraire)?
A: no we don’t have a numeraire. In a high vs a low interest-rate world, the Pr implied by G_T would be different

[2] these are like pre-set enum values. We only know these values in this unrealistic world.

present value of 33 shares, using share as numeraire

We all know that the present value of $1 to be received in 3Y is (almost always) below $1, basically equal to exp(-r*3) where r:= continuous compound risk-free interest rate. This is like an informal, working definition of PV.

Q: What about a contract where the (no-dividend) IBM stock is used as currency or “numeraire”? Suppose contract pays 33 shares in 3Y… what’s the PV?

%%A: I feel the PV of that cash flow is 33*S_0 i.e current IBM stock price.
I feel this “numeraire” has nothing to do with probability measure. We don’t worry about the uncertainty (or probability distribution) of future dollar price of some security. The currency is the IBM stock, so the future value of 1 share is exactly 1, without /uncertainty/randomness/ i.e. it’s /deterministic/.
Similarly, given a zero bond will mature (i.e. cash flow of $1) in 3Y, PV of that cash flow is Z_0 i.e. the current market value of that bond.

N(d1) >> N(d2) | high σ, r==0, S==K

N(d1) = Pr(ST > S0) , share-measure
N(d2) = Pr(ST > S0) , RN-measure

For simplicity, T = 1Y,  S= K = $1.

First, forget the formulas. Imagine a GBM stock price with high volatility without drift. What’s the prob [terminal price exceeding initial price]? Very low. Basically, over the intervening period till maturity, most of the diffusing particles move left towards 0, so the number of particles that lands beyond the initial big-bang level is very small. The “distribution” curve is squashed to the left. [1]

However, this “diffusion” and distribution curve would change dramatically when we change from RN measure to share-measure. When we change to another measure, the “probability mass” in the Distribution would shift. Here, N(d1) and N(d2) are the prob of the same event, but under different measures. The numeric values can be very different, like 84% vs 16%.

Under share measure, the GBM has a strong drift (cf zero drift under RN) —

dS = σ^2 S dt + σ S dW

Therefore when σ√T is high, most of the diffusing particles move right and will land beyond the initial value, which leads to Pr(ST > S0) close to 100%

— Now the formula view —
With those nice r, S, K, T values,

d1 =  σ√T /2
d2 = –σ√T /2

Remember for a standard normal distribution, if d1 and d2 are 1 and -1 (if σ=2), then N(d1) would be 84% and N(d2) would be 16%.
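
A quick numerical check of the formula view, using the standard Black-Scholes d1/d2 with r = 0, S = K = $1 and T = 1Y. The list of σ values is arbitrary.

```python
import math
from statistics import NormalDist

N = NormalDist().cdf

def d1_d2(s, k, r, sigma, t):
    vol_sqrt_t = sigma * math.sqrt(t)
    d1 = (math.log(s / k) + (r + 0.5 * sigma**2) * t) / vol_sqrt_t
    return d1, d1 - vol_sqrt_t

for sigma in (0.2, 1.0, 2.0):
    d1, d2 = d1_d2(1.0, 1.0, 0.0, sigma, 1.0)
    print(f"sigma={sigma:3.1f} d1={d1:+.2f} d2={d2:+.2f} "
          f"N(d1)={N(d1):.4f} N(d2)={N(d2):.4f}")
```

As σ grows, N(d1) and N(d2) move apart symmetrically around 50%, and they always add up to 100% in this ATM, zero-rate setup.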

[1] See posts


Pr(S_T > K | S_0 > K and r==0), intuitively

The original question — “Assuming S_0 > K and r = 0, denote C := time-0 value of a binary call. What happens to C as ttl -> 0 or ttl -> infinity. Is it below or above 0.5?”

C = Pr(S_T > K), since the discounting to PV is non-issue. So let’s check out this probability. Key is the GBM and the LN bell curve.

We know the bell curve gets more squashed [1] to 0 as ttl -> infinity. However, E S_T == S_0 at all times, i.e. average distance to 0 among the diffusing particles is always equal to S_0. See http://bigblog.tanbin.com/2013/12/gbm-with-zero-drift.html

[1] together with the median. Eventually, the median will be pushed below K. Concrete illustration — S_0 = $10 and K = $4. As TTL -> inf, the median of the LN bell curve will gradually drop until it is below K. When that happens, Pr (S_T > K) < 0.5, and Pr (S_T > K) -> 0 as ttl -> infinity.

ttl -> 0. The particles have no time to diffuse. LN bell curve is narrow and tall, so median and mean are very close and merge into one point when ttl -> 0. That means median = mean = S_0.

By definition of the median, Pr(S_T > median) := 0.5 so Pr(S_T > S_0) = 0.5 but K is below S_0, so Pr(S_T > K) is high. When the LN bell curve is a thin tower, Pr(S_T > K) -> 100%
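
The concrete illustration (S_0 = $10, K = $4, r = 0) can be tabulated. This sketch assumes driftless GBM, so the median is S_0 * exp(-σ^2 * ttl / 2) and the binary call value is N(d2). The σ of 30% is a made-up value.

```python
import math
from statistics import NormalDist

N = NormalDist().cdf

def binary_call_zero_rate(s0, k, sigma, ttl):
    """N(d2) with r = 0, i.e. risk-neutral Pr(S_T > K) under driftless GBM."""
    d2 = (math.log(s0 / k) - 0.5 * sigma**2 * ttl) / (sigma * math.sqrt(ttl))
    return N(d2)

def median_zero_rate(s0, sigma, ttl):
    """Median of the lognormal S_T when the drift is zero."""
    return s0 * math.exp(-0.5 * sigma**2 * ttl)

S0, K, SIGMA = 10.0, 4.0, 0.3                    # sigma is a made-up value
T_CROSS = 2 * math.log(S0 / K) / SIGMA**2        # ttl at which the median hits K
for ttl in (0.01, 1.0, T_CROSS, 100.0, 1000.0):
    print(f"ttl={ttl:8.2f} median={median_zero_rate(S0, SIGMA, ttl):8.4f} "
          f"Pr(S_T>K)={binary_call_zero_rate(S0, K, SIGMA, ttl):.6f}")
```

The probability is exactly 0.5 when the median equals K, close to 1 for tiny ttl, and close to 0 for very long ttl.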

math tools used in option pricing vs risk mgmt – my take

In general, I feel statistics, as applied math, is a more widely used branch of math than probability. Both are used in finance. I feel their usage is different in the field of option pricing vs risk mgmt. Both efforts attempt to estimate the future movements of underlier prices. Both rely on complicated probability and statistics theories. Both try to estimate the “histogram” of a portfolio’s market value on a future date.

In option pricing, the future movement of the underlier is precisely modeled as a GBM (geometric Brownian motion). IMHO Stochastic is probability, not stats, and is used in option math. When I google “stochastic”, “volatility” always shows up. “Rocket science” in finance is usually about implied volatility — more probability less statistics.

In VaR, future is extrapolation from history. Risk manager doesn’t trust theoretical calculations but relies more [1] on historical data. “Statistical risk management” clearly shows the use of statistics in risk management.

In contrast, historical data is used much less in option pricing. Calibration uses current day’s market data.

[1] the “distribution” of past daily returns is used as a distribution of plant growth rate. There’s no reason to believe plant will grow any faster/slower in the future.

See other posts on probability vs stats. Risk management uses more stats than Option pricing.

Incidentally, if a portfolio includes options, then VaR would need both theoretical probability and statistics.

stoch Process^random Variable: !! same thing

I feel a “random walk” and “random variable” are sometimes treated as interchangeable concepts. Watch out. Fundamentally different!

If a variable follows a stoch process (i.e. a type of random walk) then its Future [2] value at any Future time has a Probability  distribution. If this PD is normal, then mean and stdev will depend on (characteristics of) that process, but also depend on the  distance in time from the last Observation/revelation.

Let’s look at those characteristics — In many simple models, the drift/volatility of the Process are assumed unvarying[3]. I’m not familiar with the more complicated, real-world models, but suffice to say volatility of the Process is actually time-varying. It can even follow a stoch Process of its own.

Let’s look at the last Observation — an important point in the Process. Any uncertainty or randomness before that moment is  irrelevant. The last Observation (with a value and its timestamp) is basically the diffusion-start or the random-walk-start. Recall Polya’s urn.

[2] Future is uncertain – probability. Statistics on the other hand is about past.
[3] and can be estimated using historical observations

Random walk isn’t always symmetrical — Suppose the random walk has an upward trend, then PD at a given future time won’t be a nice  bell centered around the last observation. Now let’s compare 2 important random walks — Brownian Motion (BM) vs GBM.
F) BM – If the process is BM i.e. Wiener Process,
** then the variable at a future time has a Normal distribution, whose stdev is proportional to sqrt(t)
** Important scenario for theoretical study, but how useful is this model in practice? Not sure.
G) GBM – If the process is GBM,
** then the variable at a future time has a Lognormal distribution
** this model is extremely important in practice.

ITM binary call as TTL -> 0 or infinity

I was asked these questions in an exam (Assuming r = 0, and S_0 > K).

Given standard GBM dynamics of the stock, binary call price today is N(d2) i.e. risk-neutral probability of ITM.

As ttl -> 0, i.e. approaching expiry, the stock has little chance of falling below K. The binary call is virtually guaranteed ITM. So binary call price -> $1.

As ttl -> inf, my calc shows d2 -> -inf, so N(d2) -> 0. For an intuitive explanation, see http://bigblog.tanbin.com/2013/12/gbm-with-zero-drift.html

Pr(S_T > S_0 | r==0) == 50/50@@

This is an ATM binary call with K = S_0 .
Wrong thought – given zero drift, the underlier price process is stochastic but “unbiased” or “balanced”, so underlier is equally likely to finish above K or below K. This wrong thought assumes a BM not GBM.
Actually, the dynamics is given by
    dS = σ S dW
This is a classic GBM without drift, so S_T has a lognormal distribution – like a distorted bell shape. So we can’t conclude that probability[1] is 50%. Instead,
Pr (S_T > S_0 | r=0) = N(d2) = N (- σ√T/2 )    … which is <= 50%
This probability becomes 50% only if sigma is 0 meaning the price doesn’t move at all, like a zero-interest bank account. I guess the GBM degenerates to a BM (or a frozen, timeless constant?).
[1] under risk-neutral measure

today’s price == today’s expectation of tomorrow’s price

“today’s [3] price[1] equals today’s[3] expectation[2] of tomorrow’s price” — is a well-known catch phrase. Here are some learning notes I jotted down.

[1] we are talking about tradeable assets only. Counter examples – Interest rate and Dividend-paying stock are not tradeable by definition, and won’t follow this rule.

[2] expectation is always under some probability distribution (or probability “measure”). Here the probability distro is inferred from all market prices observable Today. The prices on various derivatives across different maturities enable us to infer such a probability distribution. Incidentally, the prices have to be real, not some poor bid/ask spread that no one would accept.

[3] we use Today’s prices of other securities to back out an estimated fair price (of the target security) that’s fair as of Today. Fair meaning consistent with other prices Today. This estimate is valid to the extent those “reference prices” are valid. As soon as reference prices change, our estimate must re-adjust.

GBM formulas – when to subtract 0.5σ^2 from u

Background – I often get confused when (not) to subtract. Here’s a brief summary.
The standard GBM dynamic is
                dS = mu S dt + σ S dW …. where mu and σ are time-invariant.
The standard solution is to find the dynamics of logS, denoted L,
                dL = (mu – 0.5σ^2 ) dt + σ dW …  BM, not GBM. No L on the RHS.
                L (time=T)     ~  N (mean = L(time=0) + (mu – 0.5σ^2 )T, std = …. )
So it seems our mu can’t get rid of the –0.5σ^2 thingy … until we take expectation of S(time=T)
                E S(time=T) = S(time=0) exp(mu*T)     … no σ^2 term
When we write down the Black Scholes PDE we use the drift of S itself (r under RN), without the –0.5σ^2 thingy.
The BS formula likewise grows S at r without the –0.5σ^2 thingy; the ±0.5σ^2 shows up only inside d1 and d2, which describe the distribution of log S.
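
To see the two drifts side by side, here is a small simulation sketch. The parameter values, the seed and the helper name simulate_terminal are all hypothetical, chosen only for illustration.

```python
import math
import random


def simulate_terminal(s0, mu, sigma, T, n_paths, seed=0):
    """Draw S(T) from the exact GBM solution S0 * exp((mu - 0.5 sigma^2) T + sigma W_T)."""
    rng = random.Random(seed)
    drift = (mu - 0.5 * sigma ** 2) * T
    vol = sigma * math.sqrt(T)
    return [s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0)) for _ in range(n_paths)]


s0, mu, sigma, T = 100.0, 0.08, 0.3, 1.0   # hypothetical parameters
paths = simulate_terminal(s0, mu, sigma, T, 200_000)

mean_S = sum(paths) / len(paths)
mean_logS = sum(math.log(s) for s in paths) / len(paths)

# E[S_T] grows at mu, with no sigma^2 term
print(mean_S, s0 * math.exp(mu * T))
# E[log S_T] grows at mu - 0.5 sigma^2
print(mean_logS, math.log(s0) + (mu - 0.5 * sigma ** 2) * T)
```

So the subtraction applies when we talk about the log price (or the median of S), and not when we talk about the expectation of S.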

a contract paying log S_T or S^2

Background – It’s easy to learn BS without knowing how to price simpler contracts. As shown in the 5 (or more) examples below, there are only a few simple techniques. We really need to step back and see the big picture.
Here’s a very common pricing problem. Suppose IBM stock price follows GBM with u and σ. Under RN, the drift becomes r, the bank account’s constant IR (or probably the riskfree rate), therefore,
    dS = r S dt + σ S dW

Given a (not an option) contract that on termination pays log S_T, how much is the contract worth today? Note the payoff can be negative.
Here’s the standard solution —
1) change to RN measure, but avoid working with the discounted price process (too confusing). 
2) write the RN dynamics as a BM or GBM. Other dynamics I don’t know how to handle.
Denote L := log S_t and apply Ito’s
dL = A dt + σ dW … where A is a time-invariant constant. So log of any GBM is a BM.
I think A = r – 0.5σ^2 but the exact formula is irrelevant here.
3) so at time T, L ~ N(mean = L_0 + A*T, std = …)
4) so RN-expectation of time-T value of L is L_0 + A*T
5) discount the expectation to PV
Note L isn’t the price process of a tradable, so below is wrong.
E (L_T / B_T) = L_0 / B_0   … CANNOT apply martingale formula
— What if payout = S_T^2 ? By HW4 Q3a the variable J_t := S_t^2 is a GBM with some drift rate B and some volatility.  Note this random process J_t is simply derived from the random process S_t. As such, J_t is NOT a price of any tradable asset [1].
Expectation of J’s terminal value = J_0 exp(B*T)
I guess B = 2r + σ^2 but irrelevant here.

[1] if J_t were a price process, then the discounted value of it would be a martingale i.e. 0 drift rate. Our J_t isn’t a martingale after discounting. It has a drift rate, but this drift rate isn’t equal to the riskfree rate. Only a tradable price process has such a drift rate. To clear the confusion, there are 3 common cases
1) if J_t is a price process (GBM or otherwise), then under RN measure, drift rate in it must be r J_t. See P5.16 by Roger Lee
2) if J_t is a discounted price process, then under RN measure, drift rate is 0, i.e. a martingale.
3) if not a price process, then under RN measure, drift rate can be anything.

— What if payout = max[0, (S_T^2) – K]?  This requires the CBS formula.
— What if payout = max[0, (log S_T) – K]? Once you know the RN distribution of log S_T is normal, this is tractable.
— What if payout = max[0, S_T – K] but the stock pays continuous dividend rate q? Now the stock price process is not a tradeable.
No, we don’t change the underlier to the tradeable bundle. We derive the RN dynamics of the non-tradeable price S as
dS = (r-q) S dt + σ S dW … then apply CBS formula.
So far all the “variables” are non-tradeable, so we can’t apply the MG formula
— What if payout = S_T – X_T where both are no-dividend stock prices? Now this contract can be statically replicated (long one share of S, short one share of X). Therefore we take an even simpler approach. Price today is exactly S_0 – X_0
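
The two simplest cases above (payout log S_T and payout S_T^2) can be sketched in a few lines. This is only an illustration under the stated GBM and constant-r assumptions; the function names are mine, and A and B are the constants guessed in the text.

```python
import math


def price_log_contract(s0, r, sigma, T):
    """PV of a contract paying log(S_T): discount the RN mean L_0 + A*T."""
    A = r - 0.5 * sigma ** 2
    return math.exp(-r * T) * (math.log(s0) + A * T)


def price_square_contract(s0, r, sigma, T):
    """PV of a contract paying S_T^2: discount the RN mean J_0 * exp(B*T)."""
    B = 2 * r + sigma ** 2
    return math.exp(-r * T) * s0 ** 2 * math.exp(B * T)


# Hypothetical inputs
print(price_log_contract(100.0, 0.05, 0.2, 1.0))
print(price_square_contract(100.0, 0.05, 0.2, 1.0))
```

Note the square contract is worth S_0^2 exp((r + σ^2) T) today, which is more than S_0^2 even at zero interest rate. That is consistent with J_t not being a tradable price.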

BS-F -> Black model for IR-sensitive underliers

Generalized BS-Formula assumes a constant interest rate R_grow :
Call (at valuation time 0) := C_0 = Z_0 * ( F_0 N(d1) – K N(d2) ), where
Z_0 := P(0,T) := observed time-0 price of a T-maturity zero coupon bond. There’s no uncertainty in this price as it’s already revealed and observed. We don’t need to assume constant interest rate.
F_0 := S_0 exp(R_grow * T), which also appears inside d1 and d2
Q: How does this model tally with the Black model?
A: Simply redefine
F_0 := S_0 / Z_0 , which is the time-0 “fwd price” of the asset S
Now the pricing formula becomes the Black formula for interest-rate-sensitive options. Luckily this applies even if S is the price of 5Y junk bond (or muni bond), and we know 5Y interest rate is stochastic and changes every day.
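
A minimal sketch of the formula with F_0 := S_0 / Z_0. The function name black_call and the sample inputs are my own; the volatility here is the volatility of the fwd price, which I assume to be constant.

```python
from math import exp, log, sqrt
from statistics import NormalDist

N = NormalDist().cdf


def black_call(F0, K, Z0, sigma, T):
    """Black formula: C_0 = Z_0 * (F_0 N(d1) - K N(d2))."""
    d1 = (log(F0 / K) + 0.5 * sigma ** 2 * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return Z0 * (F0 * N(d1) - K * N(d2))


S0, Z0 = 100.0, 0.95          # hypothetical spot and observed zero-bond price
F0 = S0 / Z0                  # fwd price, no constant-rate assumption needed
print(black_call(F0, 100.0, Z0, 0.2, 1.0))
```

If the rate happens to be a constant 5%, plugging in Z_0 = exp(-0.05) and F_0 = 100 exp(0.05) recovers the familiar BS call value of about 10.45.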

drv contract to swap 2 assets

nice brain teaser. No math required. Just common sense.

Suppose asset S and X have GBM dynamics (with mu_s, mu_x, sigma_s, sigma_x etc )
Suppose asset S' and X' have GBM dynamics too (possibly with different parameters).

There’s a contract C paying (S_T – X_T) on termination. Could be a call option or any other contract.
There’s a contract C' paying (S'_T – X'_T) on termination.

Q: Given X vs X' have the same price today (and ditto S vs S'), what can we say about C vs C' price today?

To be concrete, say X_0 = X'_0 = $0.5 and S_0 = S'_0 = $3.

A replication portfolio for contract C is long 1 unit of S and short 1 unit of X. This portfolio has current price S_0 – X_0 = $2.5. The similar replication portfolio for C' has current price $2.5, so C and C' have the same price today. Tomorrow, the replication portfolios may have different prices. On expiration, they may be different in value, though each portfolio is a perfect replication.

So the GBM dynamic is irrelevant !

N(d1) N(d2) – my summary

n1 := N(d1)
n2 := N(d2)
Note n1 and n2 are both about probability distributions, so they always assume some probability measure. By default, we operate under (not the physical measure but) the risk-neutral measure with the money-market account as the “if-needed, standby” numeraire. 
– n2 is the implied probability of stock finishing above strike, implied from various live prices. RN measure.
– n1 is the same probability but implied under the share-measure. Therefore,
S_T * N(d1) would be the weighted average payoff (i.e. expected payoff) of the asset-or-nothing call, under share-measure.
S_t * N(d1) would be the PV of the payoff, i.e. current price of asset-or-nothing call. Note as soon as we talk about price, it is automatically measure-independent.
Remember n2 is between $0 and $1 so it reminds us of … the binary call. I think this is the weighted average payoff of the binary call. RN measure. Therefore,
N(d2) exp(-R_disc T) is the current price of the binary call, i.e. that weighted average payoff discounted to Present Value. Note all prices are measure-independent.
N(d1) is also delta of the Vanilla call, measure-independent. Given call’s delta, using PCP, we work out put’s delta (always negative) = N(d1) - 1 = -N(-d1)

The pdf N’(d1) appears in the gamma and vega formulas, measure-independent, i.e.
gamma = N’  (d1) * some function of (S, t)

vega = N’ (d1) * some other function of (S, t)

Notice we only put K (never S) in front of N(d2)
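
The summary above can be tied together in one small sketch (no dividends, constant r). The function name bs_pieces and the dictionary keys are hypothetical labels of mine.

```python
from math import exp, log, sqrt
from statistics import NormalDist

N = NormalDist().cdf


def bs_pieces(S, K, r, sigma, T):
    """Break the vanilla call into its asset-or-nothing and binary building blocks."""
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    asset_or_nothing = S * N(d1)            # price of asset-or-nothing call
    binary = exp(-r * T) * N(d2)            # price of binary call paying $1
    return {
        "d1": d1,
        "d2": d2,
        "asset_or_nothing": asset_or_nothing,
        "binary": binary,
        "call": asset_or_nothing - K * binary,   # K (never S) multiplies the N(d2) piece
        "call_delta": N(d1),
        "put_delta": N(d1) - 1.0,                # via put-call parity
    }


print(bs_pieces(100.0, 100.0, 0.05, 0.2, 1.0))
```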

GBM + zero drift

I see zero-drift GBM in multiple problems
– margrabe option
– stock price under zero interest rate
For simplicity, let’s assume X_0 = $1. Given

        dX =σX dW     …GBM with zero drift-rate

Now denoting L:= log X, we get

                dL = – ½ σ^2 dt + σ dW    … BM not GBM. No L on the RHS.
Now L as a process is a BM with a linear growth (rather than exponential growth).
LogX_t ~ N ( logX_0  – ½ σ^2 t  ,   σ^2 t )
E LogX_t = logX_0  – ½ σ^2 t  ….. [1]
=> E Log( X_t / X_0)  = – ½ σ^2 t  …. so expected log return is negative?
E X_t = X_0 …. X_t is a log-normal squashed bell where x-axis extends from (0 to +inf) [3].

Look at the lower curve below.
Mean = 1.65 … a pivot here shall balance the “distributed weights”
Median = 1.0 …half the area-under-curve is on either side of Median i.e. Pr(X_t < median) = 50%

Therefore, even though E X_t = X_0 [2], as t goes to infinity, paradoxically Pr(X_t<X_0) goes to 100% and most of the area-under-curve would be squashed towards 0, i.e. X_t likely to undershoot X_0.

The diffusion view — as t increases, more and more of the particles move towards 0, although their average distance from 0 (i.e. E X_t) is always X_0. Note 2 curves below are NOT progressive.

The random walker view — as t increases, the walker is increasingly drawn towards 0, though the average distance from 0 is always X_0. In fact, we can think of all the particles as concentrated at the X_0 level at the “big bang” of diffusion start.

Even if t is not large, Pr(X_t < X_0) > 50%, as shown in the taller curve below.

[1] the horizontal center of the bell shape becomes more and more negative as t increases.
[2] this holds for any future time t. Eg: 1D from now, the GBM diffusion would have a distribution, which is depicted in the PDF graphs.
[3] note like all lognormals, X_t can never go negative 
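
The mean-vs-median paradox can be seen in a small simulation. This is only a sketch; sigma = 1, the seed and the helper name sample_driftless_gbm are arbitrary choices of mine.

```python
import math
import random
from statistics import NormalDist, median


def sample_driftless_gbm(x0, sigma, t, n, seed=1):
    """Draw X_t from dX = sigma * X * dW using the exact lognormal solution."""
    rng = random.Random(seed)
    shift = -0.5 * sigma ** 2 * t
    scale = sigma * math.sqrt(t)
    return [x0 * math.exp(shift + scale * rng.gauss(0.0, 1.0)) for _ in range(n)]


x0, sigma = 1.0, 1.0
for t in (0.25, 1.0, 4.0):
    xs = sample_driftless_gbm(x0, sigma, t, 100_000)
    frac_below = sum(x < x0 for x in xs) / len(xs)
    # columns: t, sample mean (stays near X_0), sample median (sinks), Pr(X_t < X_0), closed form
    print(t, sum(xs) / len(xs), median(xs), frac_below,
          NormalDist().cdf(0.5 * sigma * math.sqrt(t)))
```

The sample mean hovers around X_0 at every horizon, while the median drops like X_0 exp(-½ σ^2 t) and the fraction of paths below X_0 creeps towards 100%.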

[Figure: Comparison mean median mode.svg]

arbitrage involving a convex/concave contract

(I doubt this knowledge has any value outside the exams.) Suppose a derivative contract is written on S(T), the terminal price of a stock. Assume a bank account with 0 interest rate either for deposit or loan. At time 0, the contract can be overpriced or under-priced, each creating a real arbitrage.

Basic realities (not assumptions) – stock price at any time is non-negative.

— If the contract is concave, like L = log S, then a stock (+ bank account) can super-replicate the contract. (Can't subreplicate). The stock's range-of-possibilities graph is a straight tangent line touching the concave curve from above at a S(T) value equal to S(0) which is typically $1 or $10. The super-replication portfolio should have time-0 price higher than the contract, otherwise arbitrage by selling the contract.

How about C:=(100 – S^2) and S(0) = $10 and C(0) = 0? Let's try {-20S, -C, +$200} so V(t=0) = $0 and V(t=T) = S^2 – 20 S + 100. At Termination,

If S=10, V = 0 ←global minimum

If S=0, V= 100

If S=11, V= 1

How about C:=sqrt(S)? S(0) = $1 and C(0) = $1? Let's try {S, +$1, -2C}. V(t=0) = 0. V(t=T) = S + 1 – 2 sqrt(S). At termination,

If S=0, V = 1

If S=1, V= 0 ←global minimum

If S=4, V= 1

If S=9, V= 4

— If the contract is convex, like exp(S), 2^S, S^2 or 1/S, then a stock position (+ bank account) can sub-replicate the contract. (Can't super-replicate). The replication range-of-possibilities graph is a straight tangent line touching the convex from below. This sub-rep should have a time-0 price below the contract, otherwise arbitrage by buying the contract and selling the replication.
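
The two concave examples above can be checked numerically. A minimal sketch, assuming zero interest rate as stated; the function names and the price grid are mine.

```python
import math


def v_square_example(s):
    """Terminal value of {-20 S, -C, +$200} where C = 100 - S^2."""
    return -20 * s - (100 - s ** 2) + 200


def v_sqrt_example(s):
    """Terminal value of {+S, +$1, -2C} where C = sqrt(S)."""
    return s + 1 - 2 * math.sqrt(s)


# Both portfolios cost $0 at time 0 and never lose money at termination.
grid = [i / 10 for i in range(0, 401)]      # terminal prices 0.0 .. 40.0
print(min(v_square_example(s) for s in grid))
print(min(v_sqrt_example(s) for s in grid))
```

Both terminal values are perfect squares, (S – 10)^2 and (sqrt(S) – 1)^2, which is why the global minimum is 0 at the tangent point.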

dynamic delta hedge – continual rebalancing

To dynamically hedge a long call position, we need to hold dC/dS amount of shares. That number is equal to the delta of the call. In practice, given an option position, people adjust its delta hedge on a daily basis, but in theory, rebalancing is a continual, non-stop process.
The call delta BS-formula makes it clear that “t” alone will change the delta even if everything else held constant. Specifically, even if S is not changing and interest rate is zero, the option delta won’t stay constant as maturity approaches, so we need to adjust the number of shares we hold.
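
To illustrate delta drifting with time alone, here is a sketch holding S fixed at a hypothetical $105 against a $100 strike, with zero interest rate. The function name call_delta and all numbers are my own.

```python
from math import log, sqrt
from statistics import NormalDist

N = NormalDist().cdf


def call_delta(S, K, r, sigma, tau):
    """BS call delta N(d1), where tau is the time left to maturity."""
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * sqrt(tau))
    return N(d1)


# S and sigma never change, yet the required share count moves as maturity nears.
for tau in (1.0, 0.5, 0.1, 0.01):
    print(tau, round(call_delta(105.0, 100.0, 0.0, 0.2, tau), 4))
```

For this in-the-money call the delta climbs towards 1.0 as tau shrinks, so the hedger keeps buying shares even on a day when the stock doesn't move.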

martingale – phrasebook

Like the volatility concept, mg is a fundamental concept but not a simple concept. It's like an elephant for the 5 blind men. It has many aspects.
process? a martingale is a process. At any time the process has a value.

MG property? A security could have many features and one of them could be the mg property meaning the security's fair value is a process and it meets the mg definition and is a mg process.

0-expectation? Expn(M_tomorrow – M_now) = 0

no-drift? A variable or a price (that qualifies as a process) with no drift is a mg.

replication – 1-step binomial model, evolving to real option

Now I feel the binomial tree model is a classic analytical tool for option pricing….
The 1-step binomial scenario is simple but non-trivial. Can be mind-bending. Usually we are given 5 numbers for Sd, S0, Su, Cd, Cu, and the problem is phrased like “Use some number of stock and bond to replicate the contract C” to get the same “payout outcomes” [1].

First, ignore the Cd, Cu values. The Sd, S0, Su 3 numbers alone imply RN probabilities of the up-move and down-move.

Next, using the RNP values we can replicate ANY contract including the given C contract.

The number of shares in the replication is actually the delta-hedge of the call.

[1] “Payout outcomes” mean the contract pays Cd dollars in the down-state and Cu dollars in the up-state.
—— That’s the first knowledge pearl to internalize…————
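
Here is the 1-step replication as a sketch, assuming zero interest rate so that $1 of bond stays $1. The five input numbers and the function name are hypothetical.

```python
def replicate_one_step(s_down, s_0, s_up, c_down, c_up):
    """Replicate payouts (c_down, c_up) with shares + cash; also price via RN probability."""
    shares = (c_up - c_down) / (s_up - s_down)   # this is the delta hedge ratio
    cash = c_down - shares * s_down              # bond/cash holding, zero interest assumed
    replication_price = shares * s_0 + cash
    p_up = (s_0 - s_down) / (s_up - s_down)      # implied by the 3 stock numbers alone
    rn_price = p_up * c_up + (1 - p_up) * c_down
    return shares, cash, replication_price, rn_price


# Call struck at 100 on a stock that goes from 100 to either 120 or 90
print(replicate_one_step(90.0, 100.0, 120.0, 0.0, 20.0))
```

The replication price and the RN-expectation price agree, which is the sense in which RNP/MG is an alternative to replication.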

* 1-step binomial call option 
– this option contract can be replicated with stocks + bonds. Rebalancing not necessary.
– RNP/MG is an alternative to replication
* 1-step 3-state call option
– can’t replicate with stocks + ….
– RNP non-unique
(That’s assuming the 3 outcomes don’t accidentally line up.)
* 2-step 3-state call option, i.e. allowing rebalancing
– can replicate with stocks + bonds but needs rebalancing (self-financed, of course)
– RNP/MG is an alternative to replication

* fine-grained call options — infinite steps, many states
– can replicate (terminal) payout with stocks + bonds, but needs dynamic delta-hedge (self-financed of course)
– required number of stocks = delta

differential ^ integral in Ito’s formula

See posts on Ito being the most precise possible prediction.

Given dynamics of S is    dS = mu dt + sigma dW  , and given a (process following) a function f() of S,  then, Ito’s rule says

    df = df/dS * dS + 1/2 d(df/dS)/dS * (dS)^2

There are really 2 different meanings to d____

– The df/dS term is ordinary differentiation wrt S, treating S as just an ordinary variable in ordinary calculus.
– The dt term, if present, isn’t a differential. All the d__ appearing outside a division (like d_/d__) actually indicates an implicit integral.
** Specifically, The dS term (another integral term) contains a dW component. So this is even more “unusual” and “different” from the ordinary calculus view point.

signal-noise ^ predictive formula – GBM

The future price of a bond is predictable. We use a prediction formula like bond_price(t) = ….

The future price of a stock, assumed GBM, can be described by a signal-noise formula

S(t) =

This is not a prediction formula. Instead, this expression says the level of S at time t is predicted to be a non-random value plus a random variable (i.e. a N@T)

In other words, S at time t is a noise superimposed on a signal. I would call it a signal-noise formula or SN formula.

How about the expectation of this random variable S? The expectation formula is a prediction formula.

LogR, LRR, RF – usages

LogR = log return

RF = return factor

RR = Level return rate

VaR assuming norm distribution? probably RF or RR
VaR assuming Lognormal distribution of return factor? use LogR

linear factor model like capm and Fama-French and principal component? RF or RR

Mean-variance? RR

principal component? RR?

MV optimization with risk-free rate – sum(weight vector) == 1 @@

In a MV optimization context without a risk free asset, the weight vector must sum to 1. Any fund left over has nowhere to go, not even a bank account.

In a MV world with a risk-free rate (our world), the weight vector doesn’t need to sum to 1. Any difference is an allocation to the risk-free asset.

Then we work out a tangency portfolio, whose weight vector is scaled (?) to sum to 1. I feel this is just for convenience. If the tangency weight vector sums to 25, we still can construct all MV portfolios on the MV frontier by varying the “allocation to tangency” from 0 to 0.04 and beyond.

0 means all invested in risk-free asset.

0.04 means all invested in tangency.

0.05 means short risk-free  to get 25% more cash and invest all 125% into tangency portfolio.
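
The arithmetic of the unscaled tangency vector, as a tiny sketch. The function name split_capital is hypothetical; 25 is the weight sum used in the example above.

```python
def split_capital(allocation, tangency_weight_sum=25.0):
    """Return (fraction of capital in risky assets, fraction in the risk-free asset)."""
    risky = allocation * tangency_weight_sum
    return risky, 1.0 - risky


for a in (0.0, 0.04, 0.05):
    print(a, split_capital(a))
```

A negative risk-free fraction means borrowing at the risk-free rate.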

eq-fwd contract pricing – internalize

Even if not actively traded, the equity forward contract is fundamental to arbitrage pricing, risk-neutral pricing, and derivative pricing. We need to get very familiar with the math, which is not complicated but many people aren’t proficient.

At every turn on my option pricing learning journey, we encounter our friend the fwd contract. Its many simple properties are not always intuitive. (See P 110 [[Hull]])

* a fwd contract (like a call contract) has a contractual strike and a contractual maturity date. Upon maturity, the contract’s value is frozen and stops “floating”. The PnL gets realized and the 2 counter-parties settle.
* a fwd contract’s terminal value is stipulated (S_T – K), positive or negative. This is a function of S_T, i.e. terminal value of underlier. There’s even a “range of possibilities” graph, in the same spirit of the call/put’s hockey sticks.
* (like a call contract) an existing fwd contract’s pre-maturity MTM value reacts to 1) passage of time and 2) current underlier price. This is another curve but the horizontal axis is current underlier price not terminal underlier price. I call it a “now-if” graph, not a  “range of possibilities” graph. The curve depicts

    pre-maturity contract price denoted F(S_t, t) = S_t                    – K exp(-r (T-t)  ) ……… [1]
    pre-maturity contract price denoted F(S_t, t) = S_t exp(-q(T-t)) – K exp(-r(T-t)) .. [1b] continuous div

This formula [1b] is not some theorem but a direct result of the simplest replication. Major Assumption — a constant IR r.

Removing the assumption, we get a more general formula
              F(S_t, t) = S_t exp(-q(T-t)) – K Z_t
where Z_t is today’s price of a $1 notional zero-bond with maturity T.

Now I feel replication is at the heart of everything fwd. You could try but won’t get comfortable with the many essential results [2] unless you internalize the replication.

[2] PCP, fwd price, Black model, BS formula …

Notice [1] is a function of 2 independent variables (cf call).  When (T – now) becomes 0, this formula degenerates to (S_T – K). In other words, as we approach maturity, the now-if graph morphs into the “range of possibilities” graph.

The now-if graph is a straight line at 45-degrees, crossing the x-axis at    K*exp(-r  (T-t)  )

Since F_t is a multivariate function of t and S_t , this thing has delta and theta.

delta = 1.0, just like the stock itself
theta = – r K exp(-r  (T-t)  ) …… negative!

To internalize [1b], assume exp(-q(T-t)) = 0.98 and recall that a “bundle” of 0.98 shares now (at time t) continuously generates dividend converting to additional shares, so the 0.98 shares grow exponentially to 1.0 share at T. So the bundle’s value grows from 0.98 S_t to S_T , while the bond holding grows from K*Z_t to K. Long the bundle and short the bond replicates the fwd contract.
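
A sketch of formula [1b] together with a finite-difference check of the delta and theta quoted above. Constant r and q are assumed, and the function names and sample numbers are my own.

```python
from math import exp


def fwd_value(s, k, r, q, tau):
    """MTM value of an existing fwd contract, tau = T - t is the time to maturity."""
    return s * exp(-q * tau) - k * exp(-r * tau)


def numeric_delta(s, k, r, q, tau, h=1e-4):
    """Central difference in the current underlier price."""
    return (fwd_value(s + h, k, r, q, tau) - fwd_value(s - h, k, r, q, tau)) / (2 * h)


def numeric_theta(s, k, r, q, tau, h=1e-5):
    """Derivative wrt calendar time t; moving t forward shrinks tau."""
    return (fwd_value(s, k, r, q, tau - h) - fwd_value(s, k, r, q, tau + h)) / (2 * h)


s, k, r, tau = 102.0, 100.0, 0.05, 1.0     # hypothetical numbers, no dividend
print(fwd_value(s, k, r, 0.0, tau))
print(numeric_delta(s, k, r, 0.0, tau))    # close to 1.0
print(numeric_theta(s, k, r, 0.0, tau), -r * k * exp(-r * tau))
```

With a positive dividend rate q the delta becomes exp(-q(T-t)), i.e. the 0.98 bundle size in the example above.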

 —————Ft / St is usually (above or below) close to 0 when K is close to S.  For example if K = $100 and stock is trading $102, then the fwd contract would be cheap with a positive (or negative) value.
** most fwd contracts are constructed with very low initial value.
* note the exp() is applied on the K. When is it applied on the S? [1]
* compare 2 fwd contracts of different strikes?
* fwd contract’s value has delta = 1

[1] A few cases. ATMF options are struck at the fwd price.

risk-neutral probability – basics

Simplest defining example of RNP : say a coin flip pays $1mil if H and 0 if T and the consensus market price is $400k. The RN prob inferred from the market prices is Pr(H) = 40%.

Another defining example of RNP — Suppose IBM price tomorrow can only be either $200 or $198, and current spot is $198.5, then we can back out the RN Pr(up). This prob distro is different from the “physical” distro.

We don’t know the physical prob. We assume the market price is a fair price, so we use the implied RN Prob as a fair estimate of the physical prob.

What if we know (via the coin manufacturer) the physical prob is 50/50? Well, the real people composing the market are risk averse so they are only willing to pay, in general, 400k. I guess the RNP is still 40%. In financial markets I don’t think anyone knows the physical prob. The most reliable way to estimate the physical prob is through the RNP.

Another defining example of RNP: (Roger’s P 2.28) Stock value at T1 is 115, and at Termination can rise to 150 or drop to 100. Using just these 3 numbers and 1 interval, we can derive the RNP(up | S=115 at T1). To keep things simple, we will assume the market has a consensus on the probabilities of up/down.

Next, wrap your mind around this unusual condition — that the terminal value ($150 and $100) are fixed and at Termination the stock cannot take on any value in between. This is like a coin or dice. The only unknown is the probability, not the possible values.

We can therefore infer RN P(up) = 30% as if all traders in the market all agreed on this 30%.

Note the current price of 115 is the result of the market adjusting to any new info. We can say the current price already reflects the RN P(up)

In the original example, at T1 the stock can also reach $75. On this branch of the tree, the Termination value is either 100 or 50. The RN P(up | S = 75 at T1) = 50%, different from the 30%.

This is another important feature of this model – the RNP depends not only on the stage we are at, but also on the information revealed so far. You can imagine the noisegen is adaptive.
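
All the defining examples above use the same back-out arithmetic. A minimal sketch, assuming zero interest rate between the two dates (which is what the 30% figure implies); the function name rn_prob_up is mine.

```python
def rn_prob_up(price_now, value_up, value_down):
    """Solve price_now = p * value_up + (1 - p) * value_down for p (zero interest assumed)."""
    return (price_now - value_down) / (value_up - value_down)


print(rn_prob_up(400_000, 1_000_000, 0))   # coin flip -> 0.4
print(rn_prob_up(198.5, 200, 198))         # IBM tomorrow -> 0.25
print(rn_prob_up(115, 150, 100))           # branch S=115 at T1 -> 0.3
print(rn_prob_up(75, 100, 50))             # branch S=75 at T1 -> 0.5
```

The last two lines show the RNP differing between branches of the same tree.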

iid assumption in cumulative return

Time diversification? First look at asset diversification. Split $200k into 2 uncorrelated investments so when one is down, the other might be up. Time-div assumes we could add up the log returns of period1 and period2. Since the 2 values are two N@Ts and very likely non-perfectly-correlated (i.e. corr < 1.0), one of them might cushion the other.


Background — the end-to-end (log) return over 30 years is (by construction) sum of 30 annual returns —


r_0to1 is a N@T from noisegen1 with mu and sigma

r_1to2 is a N@T from noisegen2.

r_29to30 is a N@T


So the sum r_0to30 (denoted r) is also a random var with a distribution. Even without assuming normality of noisegen1, if the 30 random variables are IID, then E(r) = 30mu and stdev(r) = sigma * sqrt(30), and the sum would approximately follow a normal distribution (central limit theorem).


This is a very important and widely used result, at the heart of a lot of quizzes, a lot of financial data. However, the underlying IID assumption is controversial.


* The indep assumption is not too wrong. Stock return today is not highly correlated with yesterday's. Still AR(1) models include preceding period's return …. Harmless.

* The ident assumption is more problematic. We can't go back in time to run the noisegen1 again, but real data show that the ident assumption doesn't hold.


Here's my suggestion to estimate noisegen1's sigma. Look at log return. r_day1to252 = r_day1to2 + r_day2to3 + … + r_day251to252. Assuming the 252 daily return values are a sample of a single noisegenD, we can estimate noisegenD's mean and stdev, then derive the stdev of r_day1to252. This stdev is the stdev of noisegen1.
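
My suggestion above, sketched with simulated daily data since I have no real series here. The noisegenD parameters (daily mu 0.04%, daily sigma 1%) and the seed are made up, and the sqrt scaling relies on the IID assumption.

```python
import math
import random
import statistics

rng = random.Random(42)
daily_mu, daily_sigma, n_days = 0.0004, 0.01, 252    # hypothetical noisegenD

# One year of daily log returns drawn from a single noisegenD
daily_returns = [rng.gauss(daily_mu, daily_sigma) for _ in range(n_days)]

est_mu_d = statistics.mean(daily_returns)
est_sigma_d = statistics.stdev(daily_returns)

# Under IID, the annual log return is the sum of the daily ones
annual_mu = n_days * est_mu_d
annual_sigma = est_sigma_d * math.sqrt(n_days)

print(annual_mu, annual_sigma)
```

If the daily returns are not identically distributed (e.g. volatility clusters), the sqrt(252) scaling is only a rough guide.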


hetero-skeda-sticity, another example

[[Prem Mann]] has an example of explaining food expenditure using household income. The homo-skeda-sticity assumption on P592 is something like

   “the dispersion among expenditure of low-income households is, say, 21.1. The dispersion among high-income households is that same value. Here we mean the dispersion among the residual values i.e. the unexplained portion of expenditure.”

P598 further clarifies that the POPULATION “Spread of errors” at a given income level is a different quantity than that of the SAMPLE.

Note this is an assumption about the population not a sample. Suppose 5 income levels. A small sample having just 2 households per level (10 households in entire sample) will be too small, and is very likely to show inconsistent dispersions at low-income vs high-income.

Needless to say Dispersion is measured by stdev.

This book has some nice diagrams about the dispersions at 2 income levels.

drift under a given measure (but +! dividing by its numeraire)

See post on using cash numeraire.

I think we can assume for each numeraire, there’s just one [1] probability measure. That measure defines the probability distribution of any price process.  We can use that measure to evaluate expectations, to talk about Normal/Lognormal or dW, and to evaluate “exponential” drift (the “m” below), assuming

                dX = m X dt
Under the standard risk-neutral measure, the exponential drift is the same ( =r ) for all TRADEABLE assets, even though physical drift rates are not uniform. Specifically, the bank account itself (paying exponential short rate r) has a drift = r. So does the discount bond. So does a stock. So does a fwd contract. So does a vanilla call or binary call. So does an asset-or-nothing call.
At this point, we don’t need to worry about martingale or numeraire, though all the important results come from numeraire/MG reasoning.
I feel it’s important to remember drift is a __prediction__ about the future. It’s inherently based on some assumed probability distribution i.e. a probability measure. That probability distribution is derived from many live prices about T-expiry contracts.
Therefore, under another predictive probability distribution/measure, the predicted drift would differ.
The stock-measure is trickier. Take IBM. There exists an IBM measure. Under this measure, i.e. operating under this new (predictive) probability distribution, we can derive the (predicted) exponential drift rate of any asset’s price movement. Specifically, we can work out the predicted drift of the IBM price process. That drift is r + sigma^2, where

r:= exponential drift rate of the bank account i.e. money-market account. Consider it a physical drift but actually this is non-random and the same drift speed under any measure
Sigma:= the volatility of IBM. Same value under any measure.
[1] there might exist multiple, but I don’t bother.

[[Hull]]estimat`default probability from bond prices#learning notes

If we were to explain to people with basic math background, the arithmetic on P524-525 could be expanded into a 5-pager. It's a good example worth study.

There are 2 parts to the math. Using bond prices, Part A computes the “expected” (probabilistic) loss from default to be $8.75 for a notional/face value of $100. Alternatively assuming a constant hazard rate, Part B computes the same to be $288.48*Q. Equating the 2 parts gives Q =3.03%.

Q3: How is the 7% market yield used? Where in which part?

Q4: why assume defaults happen right before coupon date?

%%A: borrower would not declare “in 2 days I will fail to pay that coupon” because it may receive help in the 11th hour.

–The continuous discounting in Table 23.3 is confusing

Q: Hull explained how the 3.5Y row in Table 23.3 is computed. But why discount to T=3.5Y rather than to T=0Y? Here's my long answer.


The “risk-free value” (Column 4) has a confusing meaning. Hull mentioned earlier a “similar risk-free bond” (a TBond). Right before the 3.5Y moment, we know this risk-free bond is scheduled to pay all cash flows at future times T=3.5Y, 4Y, 4.5Y, 5Y. That's 4 coupons + principal. We use risk-free rate 5% to discount all 4+1 cash flows to T=3.5Y. We get $104.34 as the value of the TBond cash flows “discounted to T=3.5Y”.

Column 5 builds on it giving the “loss due to default@3.5Y, discounted to T=3.5Y”. In Column 6, this value is further discounted from 3.5Y to T=0Y.

Part B computes a PV relative to the TBond's value. Actually Part A is also relative to the TBond's value.

In the model of Part B, there are 5 coin flips occurring every mid-year at T=0.5Y 1.5Y 2.5Y 3.5Y 4.5Y with Pr(default_0.5) = Pr(default_1.5) = … = Pr(default_4.5) = Q. Concretely, imagine that Pr(flip = Tail) is 25%. Now Law of total prob states

100% = Pr(d05) + Pr(d15) + Pr(d25) + Pr(d35) + Pr(d45) + Pr(no d).

If we factor in the amount of loss at each flip we get

Pr(d05) * $65.08 + Pr(d15) * $61.20 + Pr(d25) * $57.52 + Pr(d35) * $54.01 + Pr(d45) * $50.67 + Pr(no d, no loss) * $0 == $288.48*Q
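
The final arithmetic can be reproduced in a few lines. The five loss amounts and the $8.75 come straight from the notes above; the variable names are mine.

```python
# PV (at T=0) of the loss if default happens at 0.5Y, 1.5Y, 2.5Y, 3.5Y, 4.5Y
losses = [65.08, 61.20, 57.52, 54.01, 50.67]

# Part B: each default time has the same probability Q, so expected loss = Q * sum(losses)
part_b_coeff = sum(losses)

# Part A: expected loss implied by bond prices, per $100 face value
expected_loss = 8.75

Q = expected_loss / part_b_coeff
print(round(part_b_coeff, 2), round(Q, 4))
```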

OLS ^ AutoRegressive models

Given some observed data Y, you first pick some explanatory variables X_1, X_2 etc

If you pick a linear model to explain the observed Y, then OLS gives the best linear unbiased estimator (BLUE), i.e. the most efficient among linear unbiased estimators, provided the usual Gauss-Markov assumptions hold. Run on a computer, it will give you all the parameters of your linear model – the b_0, b_1, b_2 etc.

If you feel the relationship isn’t linear, you still can use OLS. As an alternative to a plain linear model, you could use AR(1) models, which explain Y using its own preceding value (possibly alongside the X1 X2 etc). You use AR models when you believe there’s strong serial correlation or autocorrelation.

I believe AR models use additional parameters beside the b1, b2 etc. The computation is more efficient than OLS.

Stoch Lesson 59 meaning of q[=] in a simple SDE

See Lesson 55 about details on deltaW and dW
See Lesson 19 about N@T
See Lesson 33 for a backgrounder on the canonical Wiener variable W

The Hull definition of the canonical Wiener process (Lesson 33) —

deltaW = epsilon * sqrt(deltaT) // in discrete time
dW      = epsilon * sqrt(dT) // in continuous time

The “=” has a different meaning than in algebra.

Discrete time is simpler to understand. Recall deltaW is a stepsize of a random variable. The “=” doesn’t mean a step size value of 0.012 is equal to the product of an epsilon value and sqrt(deltaT).

The “=” means equivalent-to.

Here epsilon represents … (hold your breath)… a noisegen, in fact the canonical Gaussian noisegen.

I’d say both deltaW and epsilon are N@T. These are not regular variables.

Stoch Lesson 33 canonical Wiener variable ^ Gaussian variable

See Lesson 05 for the backgrounder on Level, Stepsize, time-varying random variable…
See Lesson 15 about TVRV
See Lesson 19 about N@T

In many formulas in this blog (and probably in the literature), W denotes not just some Wiener variable, but THE canonical TVRV random variable following a Wiener process a.k.a BM. Before we proceed it’s good (perhaps necessary) to pick a concrete unit of time. Say 1 sec. Now I am ready to pin down THE canonical Wiener variable W in discrete-time —

   Over any time interval h seconds, the positive or negative increment in W’s Level is generated from a Gaussian noisegen, with mean 0 and variance equal to h. This makes W THE canonical Wiener variable. [1]

Special case – If the interval is from last observation, when Level is 0, to 55 sec later, then dW = W(t=55) – 0 = W(t=55), and therefore W@55sec, as a N@T, also has a Gaussian distribution with variance = 55.

[1] I think this is the discrete version of Standard Brownian Motion or SBM, defined by Lawler on P42 with 2+1 defining properties — 1) iid random increments 2) no-jump, which __implies__ 3) Gaussian random increments

Now let’s look at the standard normal distro or canonical Gaussian distro or Gaussian noisegen — If something epsilon follows a canonical Gaussian distribution, it’s often a N@T, which is not a time-varying random variable. Also the variance and stdev are both 1.0.

I believe the canonical Wiener variable can be expressed in terms of the canonical Gaussian variable —

  deltaW = epsilon * sqrt(deltaT)  //  in discrete time
  dW = epsilon * sqrt(dT)            //  in continuous time

Let’s be concrete and suppose deltaT is 0.3 yoctosecond (more brief than any price movement). In English, this says “over a brief 0.3 yoctosecond, step_size is generated from a Gaussian noisegen with variance equal to 0.3 * 10^-24”. If we simulate this step 9999 times, we would get 9999 deltaW (step_size) realization values. These realizations would follow a bell-shaped histogram.

Given dW can be expressed this way, many authors including Hull use it all the time.

Both the canonical Wiener variable and the canonical Gaussian distribution have their symbols — W vs epsilon(ϵ), or sometimes Z. They show up frequently in formulas. Don’t confuse them.

The Wiener var is always a TVRV; the Gaussian var is often a N@T.

Stoch Lesson J101 – W(t) isn’t a traditional function-of-time

See lesson 05 for a backgrounder on Level, steps
See Lesson 33 for a backgrounder on the canonical Wiener variable W

Let’s look at the notation W(t). This suggests the Level of W is a function of t. Suppose i = 55, I’d prefer the notation W_55 or Level_55, i.e. the level AFTER step_55. This level depends on i (i.e. 55), depends on t (i.e. 55 intervals after last-observation), and also depends on the 55 queries on the noisegen. Along one particular path W may be seen as a traditional function of t, but it’s misleading to think of W as a function of t. Across all paths, at time t_55, W is W_55 and includes all the 9999 realized values after step_55 and all the “unrealized” values.

In other words, W at time t_55 refers to the “distribution” of all these possible values. W at time t_55 is a cross section of the 9999+ paths. The symbol W(t) means the “Distribution of W’s likely values at a future time t seconds after last observation“. Since W isn’t a traditional function of t, dW/dt is a freak. As illustrated elsewhere on this blog, the canonical Wiener variable W is not differentiable.

Stoch Lesson 55 deltaW and dW

See Lesson 05 about stepsize_i, and h…
See Lesson 33 for a backgrounder on the canonical Wiener variable W

Note [[Hull]] uses “z” instead of w.

Now let’s explain the notation deltaW in the well-known formula

S_i+1 – S_i == deltaS = driftRate * deltaT + sigma * deltaW

Here, deltaW is basically stepsize_i, generated by the noisegen at the i’th step. That’s the discrete-time version. How about the dW in the continuous time SDE? Well, dW is the stepsize_i as deltaT -> 0. This dW is from a noisegen whose variance is exactly equal to deltaT. Note deltaT is the thing that we drive to 0.

In my humble opinion, the #1 key feature of a Wiener process is that the Gaussian noisegen’s variance is exactly equal to deltaT.

Another name for deltaT is h. Definition is h == T/n.

Note, as Lawler said, dW/dt is meaningless for a BM, because a BM is nowhere differentiable.

Stoch Lesson J88 when to add scaling factor sqrt(t)

See Lesson 05 for a backgrounder on h.
See Lesson 15 for a backgrounder on paths and realizations.

In the formulas, one fine point easy to miss is whether to include or remove sqrt(t) in front of dW. As repeated many times, notation is extremely important here. Before addressing the question, we must spend a few paragraphs on notations.
It’s instructive to use examples at this juncture. Suppose we adopt (h=) 16-sec intervals, and generate 9999 realizations of the canonical Wiener process. The 9999 “realized” stepsize values form a histogram. It should be bell-shaped with mean 0 and variance 16.0, stdev 4.0. If we next adopt (h=) 0.09-sec intervals, and generate 8888 realizations of the same process, then the resulting 8888 stepsize values should show variance 0.09, stdev 0.3.
That’s the canonical Wiener variable. So dW is defined as the stepsize as h -> 0. So dW has a Gaussian distribution with variance -> 0. Therefore dW is not customized and has well-known standard properties, including the sqrt(t) feature.
The simplest, purest, canonical Wiener variable already shows the sqrt(t) feature. Therefore, we should never put sqrt() in front of dW.
In fact, sqrt(t) scaling factor is only used with epsilon (or Z), a random variable representing the standard normal noisegen, with a fixed variance = 1.0
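
To make the h=16 and h=0.09 examples concrete, here is a minimal sketch in Python (standard library only). The function name sample_stepsizes and the seed are my own choices for illustration. It draws epsilon from the standard normal noisegen and applies the sqrt(h) scaling, so the resulting stepsizes should show variance close to h.

```python
import math
import random
import statistics

def sample_stepsizes(h, count, seed=0):
    """Draw `count` realizations of deltaW = epsilon * sqrt(h), with epsilon ~ norm(0, 1)."""
    rng = random.Random(seed)  # seeded so that a rerun gives the same realizations
    return [rng.gauss(0.0, 1.0) * math.sqrt(h) for _ in range(count)]

for h, count in [(16.0, 9999), (0.09, 8888)]:
    steps = sample_stepsizes(h, count)
    var = statistics.pvariance(steps, mu=0.0)  # variance around the known mean 0
    print(f"h={h}: sample variance={var:.4f}, sample stdev={math.sqrt(var):.4f}")
```

The sqrt(h) factor appears only next to epsilon. Once the product is called deltaW, no further scaling is applied.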

Stoch Lesson 22 any thingy dependent on a TVRV is likely a TVRV

See Lesson 05 about the discrete-time S_i+1 concept.
See Lesson 15 about TVRV.

I feel in general any variable dependent on a random variable is also a random variable, such as the S in

S_i+1 – S_i = deltaS = a * deltaT + b * deltaW

The dependency is signified by the ordinary-looking “+” operator. To me this addition operator means “superimpose”. The deltaS or stepsize is a combination of deterministic shift superimposed on a non-deterministic noise. That makes S itself a time-varying random variable which can follow a trillion possible paths from last-observation to Expiry.

The addition doesn’t mean the stepsize_i+1 will be known once both components i.e. (a * deltaT) and (b * deltaW) are known. In fact, deltaW can take a trillion possible values, so the stepsize in S is not exactly predictable i.e. non-deterministic. This stepsize is random. Therefore S itself is a TVRV.
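
A small sketch of this "superimpose" idea, assuming illustrative values a=2, b=3, S_0=100, T=1 and n=50 steps (none of these numbers come from the lessons). Each realization ends at a different Level, yet across many realizations the mean is close to S_0 + a*T and the stdev close to b*sqrt(T).

```python
import math
import random
import statistics

def simulate_level(s0, a, b, T, n, rng):
    """Walk n steps of deltaS = a*deltaT + b*deltaW and return the final Level."""
    h = T / n
    s = s0
    for _ in range(n):
        delta_w = rng.gauss(0.0, 1.0) * math.sqrt(h)  # noise with variance h
        s += a * h + b * delta_w                      # deterministic shift + random noise
    return s

rng = random.Random(42)
finals = [simulate_level(100.0, 2.0, 3.0, 1.0, 50, rng) for _ in range(5000)]
print("two realizations:", finals[0], finals[1])
print("mean across realizations :", statistics.fmean(finals))
print("stdev across realizations:", statistics.pstdev(finals))
```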

Stoch Lesson 19 N@T is !! a TVRV

See Lesson 05 about norm().
See Lesson 15 about TVRV.

If we say some measurable value x ~ norm(m,v), then this x shows a normal distribution with mean m and variance v. I feel it’s safe to say x is from a particular noisegen, which is /characterized/ by the pair m and v.

Now, this x is NOT always a TVRV. Instead, when we say something follows some distribution, we are looking at the crystal ball:

– This x could be the future value of a TVRV at a specific target date, or
– This x could be the _increment_ in a TVRV over a future interval.

In both cases above, x is a Noisegen Output @ a Future Time — N@T. It’s rather useful to pin down whether some item (in a big formula) is a N@T or a TVRV. Not always obvious. Need a bit of clear thinking.

Stoch lesson 05 – Level^Stepsize ..

A3: stepsize_i. The Level_i value may or may not be normally distributed if we plot the 9999 realizations, but that depends on some factors. For example, if the noisegen is identical and independent on every query then we can rely on Central Limit Theorem. For stock prices that’s not the case.
In this series of lessons, I will create a set of “local jargon” used in later blog posts. First, Imagine the “Level” [Note 1] of a time-varying random variable W is a random walker taking up or down steps at regular intervals. At step_i, the stepsize_i [Note 2] is generated from a (Gaussian or otherwise) noisegen such as a computer. Level_i is the sum of all previous steps, positive or negative, i.e.

    Level_i = stepsize_1 + stepsize_2 + ….stepsize_i

It’s important to differentiate Level_i vs stepsize_i. Q3: which one of them has a normal distribution? Answer is hidden somewhere.

Notation is important here. It’s extremely useful to develop ascii-friendly symbols, with optional font sizing. These notations will be used in subsequent “lessons”. Here are a few more notations and jargon —

Let’s divide the total timespan T — from last-observation to Expiry — into n equal intervals. Denote a particular step as Step “i”, so first step has i=1. Let’s denote interval length as h=T/n = t_i+1 – t_i

I will use norm(a,b) to denote a Gaussian noisegen with mean=a and variance=b, so stdev=sqrt(b).

[1] The word “value” is too vague compared to Level.
[2] a.k.a. increment_i but less precise.

Stoch Lesson 15 – paths and time-varying random variables

See Lesson 05 for a backgrounder on the n steps of random walk…

TVRV is my own jargon, related to a stoch process. Before I’m confident to use the process jargon, I’ll use my own jargon.

Every time-varying-random-variable has a “Level” [1] at a given time, and therefore the variable has paths. The concept of path and the concept of time-varying-random-variable are intertwined.

For the random walker to go through the n steps one round, we query the noisegen n times. That’s a single realization of the random Process. If the walk is on a conveyer belt, then we see a “path”. One realization maps to one path. 9999 realizations would show 9999 paths and produce a good histogram.

Not everything we see in the formulas is a TVRV. The “h” isn’t; deltaAnything isn’t; drift rate isn’t … W is, though deltaW (ΔW)  isn’t. S is, though deltaS (ΔS) isn’t.

A note on randomness assumed in a stoch process — the future is usually assumed uncertain, but I won’t conclude that anything and everything in the future is random. The maturity value of a 12M time deposit is known, since default risk is assumed zero.

[1] actually not a single Level but multiple possible Levels. At a given time on each possible path, there’s a single Level.

GBM random walk – again

Mostly this write-up will cover the discrete-time process. In continuous, it’s no longer a walk [1]. Binomial tree, Monte Carlo and matlab are discrete.

Let’s divide the total timespan T — from last-observed to Expiry — into n equal intervals. At each step, look at ln(S_new/S_old), denoted r. (Notation is important in this field. It’s extremely useful to develop ascii-friendly symbols…) It’s good to denote the current step as Step “i”, so first step has i=1 i.e. r_1=ln(S_1/S_0). Let’s denote interval length as h=T/n.

To keep things simple let’s ignore the up/down and talk about the step size only. Here’s the key point —

Each step size such as our r_i is ~norm(0, h). r_i is non-deterministic, as if controlled by a computer. If we generate 1000 “realizations” of this one-step stoch process, we get 1000 r_i values. We would see a bell-shaped histogram.

What’s the “h” in the norm()? Well, this bell has a stdev, whose value depends on h. Given this is a Wiener process, sigma = sqrt(h). In other words, at each step the change is an independent random sample from a normal bell “generator” whose stdev = sqrt(step interval)

[1] more like a victim of incessant disturbance/jolt/bombardment. The magnitude of each movement would be smaller if the observation interval shortens so the path is continuous (– an invariant result independent of which realization we pick). However, the same path isn’t smooth or differentiable. On the surface, if we take one particular “realization” with interval=1microsec, we see many knee joints, but still a section (a sub-interval) may appear smooth. However, that’s the end-to-end aggregate movement over that interval. Zooming into one such smooth-looking section of the path, now with a new interval=1nanosec, we are likely to see knees, virtually guaranteed given the Wiener definition. If not in every interval then in most intervals. If not in this realization then in other realizations. Note a knee joint is not always zigzag . If 2 consecutive intervals see identical increments then the path is smooth, otherwise the 2-interval section may look like a reversal or a broken stick.

Brownian random walk -> sqrt(t)

A longer title would be “from random walk model to a stdev proportional to sqrt(t)”

Ignore the lognormal;
Ignore the rate of return;
Ignore stock prices. Just imagine a Wiener process. I find it more intuitive to consider the discrete time random walk. Assuming no drift, at each step the size and direction of the step is from a computer that generates a random number from a normal distribution like MSExcel normsinv(rand()). I’d like to explain/derive the important observation that t units of time into the Future, the UNKNOWN value of x has a Probability distribution that’s normal with mean 0 and stdev √t.

Now, time is customarily measured in years, but here we change the unit of time to picosecond, and assume that for such a short period, the future value of x has a ProbDist “b * ϵ(0,1)”, whose variance is b*b. I think we can also use the notation n(0,b*b).

Next, for 2 consecutive periods into the Future, x takes 2 random steps, so the sum (x_0to1 + x_1to2) also has a normal distribution with variance 2b*b. For 3 steps, variance is 3b*b…. All because the steps are independent — Markov property.

Now if we measure t in picosecond, then t means t picosecond, so the Future value after t random steps has a normal distribution with variance t b*b. So stdev is b*√t

For example, 12 days into the future vs 3 days into the future, the PD of the unknown value would have 2 normal distributions. stdev_12 = 2 * stdev_3.
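
The 12-day vs 3-day claim can be checked with a quick simulation. This is only a sketch, assuming one step per day and b=1. The helper names and seeds are mine.

```python
import random
import statistics

def level_after(steps, b, rng):
    """Sum `steps` independent draws of b * epsilon, with epsilon ~ norm(0, 1)."""
    return sum(b * rng.gauss(0.0, 1.0) for _ in range(steps))

def sample_stdev(steps, seed, b=1.0, paths=20000):
    """Stdev of the future value across many realizations of a `steps`-step walk."""
    rng = random.Random(seed)
    return statistics.pstdev([level_after(steps, b, rng) for _ in range(paths)])

stdev_3 = sample_stdev(3, seed=1)
stdev_12 = sample_stdev(12, seed=2)
print("stdev_12 / stdev_3 =", stdev_12 / stdev_3)  # should come out near sqrt(12/3) = 2
```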

Wiener process, better understood in discrete time

[[Hull]] presents a generalized Wiener process

dx = a dt + b dz

I now feel this equation/expression/definition is easier understood in discrete time. Specifically, x is a random variable, so its Future value is unknown so we want to predict it with a pdf (super-granular histogram). Since x changes over time, we must clarify our goal: what's the probability distribution of x at a time t a short while later? I feel this question is best answered in discrete time. So we throw out dt and dz. (As a Lazy guy U don't even need delta_t and delta_z).

Let's make some safe simplifying assumptions : a = 0; b = 1 and last observation is x = 0. These assumptions reduce x to a Wiener variable (i.e. x follows a Wiener process). At a (near) future t units[1] away, we predict x future value with a normal distribution whose stdev=sqrt(t).

[1] time is measured in years by custom

Now, What if I want to estimate the rate of change (“slope” of the chart) i.e. dx/dt? I don't think we can, because this is stoch calculus, not ordinary calculus. I am not sure if we can differentiate or integrate both sides.

greeks on the move – intuitively

When learning option valuations and greeks, people often develop quick reflexes about what-if’s. Even a non-technical person can develop some of these intuitions. Because these are quick and often intuitive, this knowledge is often more practical and useful than the math details.

Some of these observations are practically important while others are obscure.

Q3: How would all indicators of an ATM instrument move when underlier rises/falls?
QQ: What if the instrument has very low/high volatility?
QQ: What if the instrument is far/close to expiry?

Q5: How would all indicators of a deep OTM (deep ITM is rare) instrument move when underlier moves towards/from strike?
QQ: What if the instrument has very low/high volatility?
QQ: What if the instrument is far/close to expiry?

Q7: How would all indicators of a deep-OTM/ATM instrument move when sigma_imp rises/falls?
QQ: What if the instrument has very low/high volatility?
QQ: What if the instrument is far/close to expiry?

Q9: How would all indicators of a deep-OTM/ATM instrument move when approaching maturity?
QQ: What if the instrument has very low/high volatility?

“Indicators” include all greeks and option valuation. The “instrument” can be a European/American call/put/straddle.

Essential BS-M, my 2nd take

People ask me to give a short explanation of Black-Scholes Model (not BS-equ or BS-formula)…

I feel random variable problems always boil down to the (inherent) distribution, ideally in the form of a probability density function.

Back to basics. Look at the height of all the kids in a pre-school — There’s a distribution. Simplest way to describe this kind of distribution is a histogram [.8 -1m], [1-1.2m], [1.2-1.4m] … A probability distribution is a precise description of how the individual heights are “distributed” in a population.

Now consider another distribution — Toss 10 fair dice at once and add up the points to a “score”. Keep tossing to get a bunch of scores and examine the distribution of scores. If we know the inherent, natural distribution of the scores, we have the best possible predictor of all future outcomes. If we get one score per day, We can then estimate how soon we are likely to hit a score above 25. We can also estimate by the 30th toss, how “surely” cumulative-score would have exceeded 44.
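
For the 10-dice score, the inherent distribution can be worked out exactly. Below is a minimal sketch that builds it by repeated convolution. The function name is mine, and the "expected wait" line assumes one independent toss per day, so that the waiting time is geometric.

```python
from fractions import Fraction

def score_distribution(num_dice, sides=6):
    """Exact distribution of the sum of `num_dice` fair dice, by repeated convolution."""
    dist = {0: Fraction(1)}
    for _ in range(num_dice):
        new = {}
        for total, p in dist.items():
            for face in range(1, sides + 1):
                new[total + face] = new.get(total + face, Fraction(0)) + p / sides
        dist = new
    return dist

dist = score_distribution(10)
p_above_25 = sum(p for score, p in dist.items() if score > 25)
print("P(score > 25) on one toss:", float(p_above_25))
print("expected number of tosses until first score above 25:", float(1 / p_above_25))
```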

For most random variables in real life, the inherent distribution is not a simple math function like our little examples. Instead, practitioners work out a way to *characterize* the distribution. This is the standard route to solve random variable problems because characterizing the underlying distribution (of the random variable) unlocks a whole lot of insights.

Above are random variables in a static context. Stock price is an evolving variable. There’s a Process. In the following paragraphs, I have mixed the random process and the random variable at the end of the process. The process has a σ and the variable (actually its value at a future time) also has a σ.
In option pricing, the original underlying Random Process Variable (RPV) is the stock price. Not easy to characterize. Instead, the pioneers picked an alternative RPV i.e. R defined as ln(Sn+1 / Sn) and managed to characterize R’s behavior. Specifically, they characterized R’s random walk using a differential equation parametrized by a σinst i.e. the instantaneous volatility [1]. This is the key parameter of the random walk or the Geometric Brownian motion.

Binomial-tree is a popular implementation of BS. B-tree models a stock price [2] as a random walker taking up/down steps every interval (say every second). To characterize the step size Sn+1 – Sn, we wanted to get the distribution of step sizes but too hard. As an alternative, we assume R follows a standard Wiener process so the value of R at any future time is normally distributed. But what is this distribution about?

Remember R is an observable random variable recorded at a fixed sampling frequency. Let’s denote R values at each sampling point (i.e. each step of the random walk) as  R1, R2 ,R3, R4 …. We treat each of them as independent random variables. If we record a large series of R values, we see a distribution, but this is the wrong route. We don’t want to treat time series values R1, R2 … as observations of the same random variable. Instead, imagine a computer picking an R value at each step of the random walk (like once a second). The distribution of each random pick is programmed into computer. Each pick has a distinct Normal distribution with a distinct σinst_1, σinst_2, σinst_3 …. [4]

In summary, we must analyze the underlying distribution (of S or R) to predict where S might be in the future[3].
[4] A major simplifying assumption of BS is a time-invariant  σinst which characterizes the distributions of  R at each step of the random walk. Evidence suggests the diffusion parameter σinst does vary and primarily depends on time and current stock price. The characterization of σinst as a function of time and S is a cottage industry in its own right and is the subject of skew modelling, vol surface, term structure of vol etc.

[1] All other parameters of the equation pale in significance — risk-free interest i.e. the drift etc.
[2] While S is a random walker, R is not really a random walker. See other posts.
[3] Like in the dice case, we can’t predict the value of S but we can predict the “distribution” of S after N sampling periods.

theoretical numbers ^ vol surface

After you work in volatility field for a while, you may figure out when (and when not) to use the word “theoretical”. There’s probably no standard definition of it. I guess it basically means “according-to-BS”. It can also mean risk-neutral. All the greeks and many of the pricing formulas are theoretical.

The opposite of theoretical is typically “observed on the market”, or adjusted for skew or tail.

Now, the volatility smile, the volatility term structure and the vol surface are a departure from BS. These are empirical models, fitted against observed market quotes. Ignoring outliers among raw data, the fitted vol surface must agree with observed market prices — empirical.

most important params on a parametric vol surface #term struct

Each smile curve on a vol surface is typically described by a few parameters. The most important are
1) atmVol aka anchorVol, and
2) skew

All other curve-parameters are less important. All the curve-parameters are “calibrated” or “tuned” using market quotes.

Skew is a number and basically describes (for a given maturity) the asymmetry of the vol smile. There’s one skew number for each fitted maturity. These numbers are typically negative.

That’s the parametrization along the strike axis. How about along maturity axis? What parameters describe the term structure of vol?

I don’t know for sure, but often a parametric vol surface has a term structure parametrization for each curve-parameter. For example, there’s a term-structure for anchorVol. There’s another term structure for skew. Well, in some of the most sophisticated vol surface models, there’s no such TS parametrization. I guess in practice users didn’t find it useful.

quantitative feel of bond duration – mapping absolute 1% -> relative x%

In the simplest illustration of modified duration, if a bond has modified duration == 5 years, then a 100bps yield change translates to 5% dollar price (valuation) change.

Note that 100 bps is an Absolute 1% change in yield, whereas the 5% is a Relative 5% change in valuation. If original valuation == $90 [1], then 100 bps =>> $4.5 change.

After we clear this little confusion, we can look at dv01. Simply set the absolute yield change to 1 bp. The valuation change would be a Relative 0.05% i.e. $0.045. The pattern is

Duration == 5 years => dv01 == 0.05% Relative change
Duration == 6 years => dv01 == 0.06% Relative change
Duration == 7 years => dv01 == 0.07% Relative change

Note 0.05% Relative change means 0.05% times Original price, not Par price.  Original price can be very different from par price, esp. for zero bonds.
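
The absolute-to-relative mapping above fits in a few lines. A minimal sketch, first-order only (no convexity), with a function name of my own choosing:

```python
def price_change(price, modified_duration, yield_change_bps):
    """Duration-only estimate of the dollar price change for a given yield move.

    Relative change = modified_duration * absolute yield change. The sign is
    dropped, as in the text (price falls when yield rises).
    """
    relative_change = modified_duration * yield_change_bps / 10_000
    return price * relative_change  # relative to the Original price, not par

print(price_change(90.0, 5.0, 100))  # 100 bps move on a bond valued at $90
print(price_change(90.0, 5.0, 1))    # 1 bp move, i.e. dv01
```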

[1] 90 in bond price quote means 90% of par value. For simplicity we would assume par is $100, though smallest unit is $1000 in practice.

(See P10 of YieldBook publication on Duration.)

long gamma == long realized vol@@

A rule suggested by http://www.surlytrader.com/trading-gamma/ says “Long Gamma -> Profit when realized volatility is greater than the implied volatility of the purchased option”.

That’s a bit complicated for me, but here is what I know — If you buy either a put or a call ATM and then delta hedge, you would make (unrealized) profit when realized volatility turns out to be high. If instead sigma_r is low, I guess you still make a small profit. You get 0 PnL (ignoring premiums) if sigma_r is 0.

sigma_r means realized vol; sigma_i means implied vol.

http://tylerstrading.blogspot.com/2009/05/gamma-facts.html says “Typically in positive Gamma trades we seek realized volatility, as the more the underlying moves the better your chances for raking in profits.”

I believe this assumes delta hedge. The scenario is that during the holding period underlier moves but your delta hedge (by a stock position) is fairly effective so whatever your option gain or loss due to its delta (say around 50%) is offset by the gain or loss of your stock position. However, the gamma contribution to your PnL would be positive.

Another observation (“rule”) is
* vega measures your exposure and sensitivity to Implied vol.
* gamma measures your exposure and sensitivity to Realized vol.

Someone said, if sigma_i goes up by 1 vol point (i.e. 1 percentage point, the usual quoting convention for vega), your PnL is roughly equal to vega.

##conventional wisdoms in option valuation

There are many empirical rules in option math, but I feel they differ in universality and reliability.

Rule) vega of an ATM op =~ premium / implied vol
Rule) put-call equivalence in FX options. See separate blog post http://bigblog.tanbin.com/2011/11/equivalent-option-positions.html
Rule) PCP – “complicated” in American style, according to CFA textbook
Rule) delta(call) + delta(put) =~ 100% — See separate blog post http://bigblog.tanbin.com/2010/06/delta-of-call-vs-put.html
Rule) delta is usually between 0 and 1? Someone told me it can exceed 1 before ex-div
Rule) option valuation always decays with time
Rule) ATM delta is “very close” to 50%, regardless of expiration
Rule) delta converges with increasing vol. See separate blog post http://bigblog.tanbin.com/2011/11/option-rule-delta-converges-to-5050.html

Rule) For a strike away from predicted forward price, the OTM option has better liquidity than the ITM option. Therefore the OTM is  more useful/important to volatility estimate at that strike.

Rule) for equities, OTM put quotes show higher i-vol than OTM calls. Incidentally, at the same low strike, the OTM put is more liquid than ITM call. Reason is, most people trade OTM options only. However, ITM options are still actively traded — if an ITM option is offered at a low enough price, someone will buy; if an OTM option is bidding high enough, someone will write the option.
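
Two of the rules above (ATM vega =~ premium / implied vol, and ATM delta close to 50%) can be sanity-checked against the textbook Black-Scholes formulas. This is only a sketch under assumed inputs that are mine: European call, no dividend, r=0, S=K=100, sigma=20%, T=1 year. Here vega is per 1.00 change in sigma, so divide by 100 for the per-vol-point number.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def bs_call(S, K, sigma, T, r=0.0):
    """Black-Scholes price, delta and vega of a European call on a no-dividend stock."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    price = S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)
    delta = norm_cdf(d1)
    vega = S * norm_pdf(d1) * math.sqrt(T)  # per 1.00 change in sigma
    return price, delta, vega

sigma = 0.20
price, delta, vega = bs_call(S=100.0, K=100.0, sigma=sigma, T=1.0)
print("premium / implied vol:", price / sigma)
print("vega                 :", vega)
print("ATM call delta       :", delta)
```

With these inputs the two numbers agree to within about 1%. The sketch says nothing about how well the rule holds away from ATM.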

ln(S/K) should be compared to what yardstick@@

(update — I feel depth of OTM/ITM is defined in terms of ln(S/K) “battling” σ√ t )

Q: if you see a spot/strike ratio of 10, how deep OTM is this put? What yardstick should I use to benchmark this ratio? Yes there is indeed a yardstick.

In bond valuation, the yardstick is yield, which takes into account coupon rate, PV-discounting of each coupon, credit quality and even OAS. In volatility trading, the yardstick has to take into account sigma and time-to-maturity. In my simplified BS (http://bigblog.tanbin.com/2011/06/my-simplified-form-of-bs.html), there’s constant battle between 2 entities (more obvious if you assume risk-free rate r=0)

     ln(S/K) relative to σ√ t         …………….. (1)

Fundamentally, in BS model ln(S/K) at any time into the diffusion has a normal distribution whose
stdev = σ√ t, i.e. the quoted annualized vol scaled up for t (2.5 years in our example)

Note the diffusion starts at the last realized stock price.
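
To put a number on the yardstick, here is a minimal sketch that expresses ln(S/K) in units of σ√t. The spot/strike ratio of 10 and t=2.5 years come from the text. The 20% vol is my own assumption, and drift is ignored (r=0).

```python
import math

def moneyness_in_stdevs(S, K, sigma, t):
    """ln(S/K) measured in units of sigma*sqrt(t), ignoring drift (r = 0)."""
    return math.log(S / K) / (sigma * math.sqrt(t))

depth = moneyness_in_stdevs(S=10.0, K=1.0, sigma=0.20, t=2.5)
print("ln(S/K) is", depth, "stdevs away from zero")
```

Under these assumptions the ratio of 10 sits more than 7 stdevs from the strike, so the put is extremely deep OTM.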

Q: Why is σ a variable and t or r are not?
σ is the implied vol.
σ is the anticipated vol over the remaining life of the option. If I anticipate 20%, I can put it in and get a valuation. Tomorrow, if I change my opinion and anticipate a much larger vol over the remaining life (say, 2 years) of the option, I can change this input and get a much larger valuation.

The risk free rate r has a small effect on popular, liquid options, and doesn’t fluctuate much

As to the t, it is actually ingrained in my 2 entities in (1), since my sigma is scaled up for t.

var-swap PnL: %%worked example

 A variance swap lets you bet on “realized” variance. The exchange automatically calculates realized variance for each day, so if you bet the total realized variance over the next 3 days will average to exceed 0.64 [1], then you can buy this contract. If it turns out to be 0.7812, you earn the difference of 0.1412 notional which would mean $141,200 on a million dollar notional.

[1] which means 80% vol (annualized), or roughly 5% daily realized vol (un-annualized)

Standard var swap PnL is defined as

    (sigma_r^2 - K) * N  ….. …(1)
  N denotes notional amount like $1,000,000
  K denotes strike, which is always in terms of annualized variance

sigma_r is annualized realized Vol over the n days, actually over n-1 price relatives

  sigma_r^2 is annualized realized Variance, and calculated as
    252/(n-1) * [ ln^2(S2/S1) + ln^2(S3/S2) + … + ln^2(Sn/Sn-1) ]
  S2 denotes the Day 2 closing price.
  ln^2(S2/S1), i.e. the square of ln(S2/S1), is known as daily realized Variance un-annualized

ln(S2/S1) is known as daily realized Vol un-annualized, or DRVol

In other words, take the n-1 values of ln(PriceRelative) and find the stdev assuming 0 mean, then annualize.

A more intuitive interpretation — take the average of the n-1 daily realized variances, then multiply by 252.

Now, traders often work with DRVol rather than the S2 stuff above, so there’s an equivalent PnL formula to reveal the contribution of “today’s” DRVol to a given var swap position, and also track the cumulative contribution of each day’s DRVol. Formula (1) becomes PnL ==

252N/(n-1) * [ (ln^2(S2/S1) - K/252) + (ln^2(S3/S2) - K/252) + .. + (ln^2(Sn/Sn-1) - K/252) ], or
N/(n-1) * [ (252 ln^2(S2/S1) - K) + (252 ln^2(S3/S2) - K) + .. + (252 ln^2(Sn/Sn-1) - K) ]
  N/(n-1) represents the notional amount allocated to each day.
  252 ln^2(S2/S1) represents the annualized daily realized Variance on Day 2

√252 ln(S2/S1) represents the annualized DRVol, but is omitted from the formula due to clutter

In other words, for each day get the “spread” of (annualized) DRVar over strike (K), multiply it by the daily notional, and you get a daily PnL “contribution”. Add up the daily to get the total PnL. Here’s an example with daily notional = $4166666 and K = 0.09 i.e. 30% vol

ln PR | sqrt(252) ln PR | spread over K | daily PnL contribution

You can then add up the daily contributions, which would give the same total PnL as Formula (1).
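
Here is a minimal sketch of both PnL formulas, to confirm that the daily contributions add up to the total. The closing prices are hypothetical (made up for illustration), and so is the function name. K = 0.09 is taken from the example and N = $1,000,000 from the definition of N.

```python
import math

def var_swap_pnl(closes, K, N, days_per_year=252):
    """Return (total PnL by formula (1), list of daily PnL contributions)."""
    log_prs = [math.log(b / a) for a, b in zip(closes, closes[1:])]  # the n-1 values of ln(PR)
    m = len(log_prs)
    realized_var = days_per_year / m * sum(x * x for x in log_prs)   # zero-mean, annualized
    total = (realized_var - K) * N
    daily_notional = N / m
    # spread of annualized daily realized variance over strike, times daily notional
    daily = [daily_notional * (days_per_year * x * x - K) for x in log_prs]
    return total, daily

closes = [100.0, 102.0, 99.0, 101.5]  # hypothetical closing prices, n = 4 days
total, daily = var_swap_pnl(closes, K=0.09, N=1_000_000)
print("total PnL by formula (1):", total)
print("daily contributions     :", daily)
print("sum of contributions    :", sum(daily))
```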

how to make volatility values ANNUALIZED

(Let’s assume a flat forward curve i.e. 0 drift, 0 dividends, 0 interests.) Suppose an implied vol for a 1-year option is 20%. If we record ln(PR) i.e. log of daily price relatives until expiry, we expect 68% of the 200+ daily readings to fall between -0.0126 and 0.0126, i.e. within one daily stdev of 0.2/sqrt(252). The band of -0.2 to 0.2 applies to the end-to-end ln(PR) over the full year. That’s because ln(PR) is supposed to follow a normal distribution.

Note we aren’t 68% sure about the expiration underlier price i.e. S(t=T) or S(T) for short. This S(T) has a lognormal distribution[2], so no 68% rule. However, we do know something about the S(t=T) because the end-to-end ln(PR) is the sum of ln(daily PR), and due to central limit theorem, the overall ln(PR) has a normal distribution with a variance = sum(variance of ln(daily PR)). We always assume the individual items in the sum() are independent and “identical”, variance of ln(daily PR) is therefore 0.04/252days.

Also, since ln(overall PR) = ln[S(T)/S(0)] has normal distribution, S(T) has a lognormal distribution. That’s the reason for [2].

To answer any option pricing question, we invariably need to convert quoted, annualized vol to what I call raw-sigma or stdev.

Rule #1 — we assume one-period variance will persist to 2 periods, 3 periods, 4 periods… (eg: a year consists of 12 one-month periods.)

Example 1: If one-year variance is 0.04, then a four-year raw-variance would be .04 * 48/12 = .16. The corresponding stdev i.e. raw-sigma would be 40%. This value is what goes into BS equation to price options with this maturity.

Example 2: If one-year variance is 0.04, then a three-month raw-variance would be .04 *3/12 = .01. The stdev i.e. raw-sigma would be sqrt(.01) = 10%. This value is what goes into BS equation to price options with this maturity. By Rule #1, we assume the same 3-month variance would persist in 2nd 3-month, the 3rd 3-month and 4th 3-month periods.
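
Rule #1 and the two examples reduce to one small function. A minimal sketch, with a function name of my own choosing, assuming variance grows linearly with the number of months:

```python
import math

def raw_sigma(annual_vol, months):
    """Scale a quoted annualized vol to the stdev over `months`, per Rule #1."""
    raw_variance = annual_vol ** 2 * months / 12  # variance scales linearly in time
    return math.sqrt(raw_variance)

print(raw_sigma(0.20, 48))  # Example 1: four years
print(raw_sigma(0.20, 3))   # Example 2: three months
```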