marginal probability density: clarified #with equations

(Equations were created in Outlook then sent to WordPress by HTML email. )

My starting point is Look at the cross section at X=7.02. This is a 2D area, so volume (i.e. probability mass) is zero, not close to zero. Hard to work with. In order to work with a proper probability mass, I prefer a very thin but 3D “sheet” , by cutting again at X=7.02001 i.e 7.02 + deltaX. The prob mass in this sheet divided by deltaX is a number. I think it’s the marginal density value at X=7.02.

The standard formula for marginal density function is on

How is this formula reconciled with our “sheet”? I prefer to start from our sheet, since I don’t like to deal with zero probability mass. Sheet mass divided by the thickness i.e. deltaX:

Since f(x,y) is assumed not to change with x, this expression simplifies to

Now it is same as formula [1]. The advantage of my “sheet” way is the numerator always being a sensible probability mass. The integral in the standard formula [1] doesn’t look like a probably mass to me, since the sheet has zero width.

The simplest and most visual bivariate illustration of marginal density — throwing a dart on a map of Singapore drawn on a x:y grid. Joint density is a constant (you can easily work out its value). You could immediate tell that marginal density at X=7.02 is proportional to the island’s width at X=7.02. Formula [1] would tell us that marginal density is

2 reasons: BM is poor model for bond price

Reason 1 — terminal value is known. It’s more than deterministic. It’s exactly $100 at maturity. Brownian Motion doesn’t match that.

Reason 2 — drift estimate is too hard too sensitive. A BM process has a drift value. U can be very careful very thorough to estimate it, but any minor change in the drift estimate would result in very large differences in the price evolution, if the bond’s lifespan is longer than 10Y.


Applying Ito’s formula on math problems — learning notes

Ito’s formula in a nutshell — Given dynamics of a process X, we can derive the dynamics[1] of a function[2] f() of x .

[1] The original “dynamics” is usually in a stoch-integral form like

  dX = m(X,t) dt + s(X,t) dB

In some problems, X is given in exact form not integral form. For an important special case, X could be the BM process itself:


[2] the “function” or the dependent random variable “f” is often presented in exact form, to let us find partials. However, in general, f() may not have a simple math form. Example: in my rice-cooker, the pressure is some unspecified but “tangible” function of the temperature. Ito’s formula is usable if this function is twice differentiable.

The new dynamics we find is usually in stoch-integral form, but the right-hand-side usually involves X, dX, f or df.

Ideally, RHS should involve none of them and only dB, dt and constants. GBM is such an ideal case.

change of .. numeraire^measure

Advice: When possible, I would work with CoN rather than CoM. I believe once we identify another numeraire (say asset B) is useful, we just know there exists an equivalent measure associated with B (say measure J), so we could proceed. How to derive that measure I don’t remember. Maybe there’s a relatively simple formula, but very abstract.

In one case, we only have CoM, no CoN — when changing from physical measure to risk neutral measure. There is no obvious, intuitive numeraire associated with the physical measure!

CoN is more intuitive than CoM. Numeraire has a more tangible meaning than “measure”.

I think even my grandma could understand 2 different numeraires and how to switch between them.  Feels like simple math.

CoM has rigorous math behind it. CoM is not just for finance. I guess CoM is the foundation and basis of CoN.

I feel we don’t have to have a detailed, in-depth grasp of CoM to use it in CoN.

physical measure is impractical

Update: Now I think physical probability is not observable nor quantifiable and utterly unusable in the math including the numerical methods.  In contrast, RN probabilities can be derived from observed prices.

Therefore, now I feel physical measure is completely irrelevant to option math.

RN measure is the “first” practical measure for derivative pricing. Most theories/models are formulated in RN measure. T-Forward measure and stock numeraire are convenient when using these models…

Physical measure is an impractical measure for pricing. Physical measure is personal feeling, not related to any market prices. Physical measure is mentioned only for teaching purpose. There’s no “market data” on physical measure.

Market prices reflect RN (not physical) probabilities.

Consider cash-or-nothing bet that pays $100 iff team A wins a playoff. The bet is selling for $30, so the RN Pr(win) = 30%. I am an insider and I rig the game so physical Pr() = 80% and Meimei (my daughter) may feel it’s 50-50 but these personal opinions are irrelevant for pricing any derivative.

Instead, we use the option price $30 to back out the RN probabilities. Namely, Calibrate the pricing curves using liquid options, then use the RN probabilities to price less liquid derivatives.

Professor Yuri is the first to point out (during my oral exam!) that option prices are the input, not the output to such pricing systems.

drift ^ growth rate – are imprecise

The drift rate “j” is defined for BM not GBM
                dAt = j dt + dW term
Now, for GBM,
                dXt = r Xt  dt + dW term
So the drift rate by definition is r Xt, Therefore, it’s confusing to say “same drift as the riskfree rate”. Safer to say “same growth rate” or “same expected return”

vol, unlike stdev, always implies a (stoch) Process

Volatility, in the context of pure math (not necessarily finance), refers to the coefficient of dW term. Therefore,
* it implies a measure,
* it implies a process, a stoch process

Therefore, if a vol number is 5%, it is, conceptually and physically, different from a stdev of 0.05.

* Stdev measures the dispersion of a static population, or a snapshot as I like to say. Again, think of the histogram.
* variance parameter (vol^2) of BM shows diffusion speed.
* if we quadruple the variance param (doubling the vol) value, then the terminal snapshot’s stdev will double.

At any time, there’s an instantaneous vol value, like 5%. This could last a brief interval before it increase or decreases. Vol value changes, as specified in most financial models, but it changes slowly — quasi-constant… (see other blog posts)

There is also a Black-Scholes vol. See other posts.

Radon-Nikodym derivative #Lida video

Lida pointed out CoM (change of measure) means that given a pdf bell curve, we change its mean while preserving its “shape”! I guess the shape is the LN shape?

I guess CoM doesn’t always preserve the shape.

Lida explained how to change one Expectation integral into another… Radon Nikodym.

The concept of operating under a measure (call it f) is fundamental and frequently mentioned but abstract…

Aha – Integrating the expectation against pdf f() is same as getting the expectation under measure-f. This is one simple, if not rigorous, interpretation of operating under a given measure. I believe there’s no BM or GBM, or any stochastic process at this stage — she was describing how to transform one static pdf curve to another by changing measure. I think Girsanov is different. It’s about a (stochastic) process, not a static distribution.

discounted asset price is MG but "discount" means…@@

The Fundamental Theorem

A financial market with time horizon T and price processes of the risky asset and riskless bond (I would say a money-market-account) given by S1, …, ST and B0, …, BT, respectively, is arbitrage-free under the real world probability P if and only if there exists an equivalent probability measure Q (i.e. risk neutral measure) such that
The discounted price process, X0 := S0/B0, …, XT := ST/BT is a martingale under Q.

#1 Key concept – divide the current stock price by the current MMA value. This is the essence of “discounting“, different from the usual “discount future cashflow to present value
#2  key concept – the alternative interpretation is “using MMA as currency, then any asset price S(t) is a martingale”
I like the discrete time-series notation, from time_0, time_1, time_2… to time_T.
I like the simplified (not simplistic:) 2-asset world.
This theorem is generalized with stochastic interest rate on the riskless bond:)
There’s an implicit filtration. The S(T) or B(T) are prices in the future i.e. yet to be revealed [1]. The expectation of future prices is taken against the filtration.
[1] though in the case of T-forward measure, B(T) = 1.0 known in advance.
–[[Hull]] P 636 has a concise one-pager (I would skip the math part) that explains the numeraire can be just “a tradable”, not only the MMA. A few key points:

) both S and B must be 2 tradables, not something like “fwd rate” or “volatility”
) the measure is the measure related to the numeraire asset
) what market forces ensure this ratio is a MG? Arbitragers!

importance@GBM beyond BS-M #briefly

To apply BS-formula on interest rate derivatives, the underlyer process must become GBM, often by changing measure. I guess the dividend-paying stock also needs some treatment before the BS-formula can apply…

But GBM is not just for BS-Model:

GBM is used in Girsanov!

I guess most asset prices show an exponential growth rate, so a GBM with time-varying mu and sigma (percentage drift and percentage volatility) is IMO general and flexible, if not realistic. However, I don’t feel interest rate or FX spot rates are asset prices at all. Futures prices converge. Bond prices converge…

1st theorem of equivalent MG pricing, precisely: %%best effort

Despite my best effort, I think this write-up will have
* mistakes
* unclear, ambiguous points
, but first step is to write it down. This is first phase of thin->thick->thin.

Version 1: under RN measure [1], all traded asset [2] prices follows a GBM [4] with growth rate [5] similar to the riskfree money market account. The variance parameter of the GBM is unique to each asset.
Version 2: under RN measure [1], all traded asset [2] prices discounted to PV [3] by the riskfree money market account are martingales. In fact they are 0-drift GBM with individual volatilities.
Version 3: under RN measure [1], all traded asset [2] prices show an expected [3] return equal to the riskfree rate.
[2] many things are not really traded asset prices. See post on “so-called tradable”
[3] why we need to discount to present, and why “expected” return? because we are “predicting” the level of random walker /towards/ a target time later than the last revelation.  The value before the revelation is “realized”, “frozen” and has no uncertainty, no volatility, no diffusion, and no bell-shaped distribution.
[4] no BM here. All models are GBM.
[5] see post on drift ^ grow rate

using dB_t vs B_t in a formula

label – stoch

(dW is a common synonym of dB)


Whenever I see dB, I used to see it as a differential of “B”. But it’s undefined – no such thing as a differential for a Brownian motion!


Actually any dB equation is an integral equation involving a “stochastic integral” which has a non-trivial definition. Greg Lawler spent a lot of time explaining the meaning of a stochastic integral.


I seldom see B itself, rather than dB, used in a formula describing a random process. The one big exception is the exact formula describing a GBM process.


<!–[if gte msEquation 12]>St=S0exptm-s22+Bt s<![endif]–>

Greg confirmed that the regular BM can also be described in B rather than dB:

given <!–[if gte msEquation 12]>dXt=m dt+s dBt<![endif]–>

<!–[if gte msEquation 12]>Xt=X0+m t+s Bt<![endif]–>

This signal-noise formula (see other post) basically says random walk position X at time t is a non-random, predictable position with an chance element superimposed. The chance element is a random variable ~ N(mean 0, variance s2t).


This is actually the most precise description of the random process X(t).


We can also see this as a generalized Brownian motion, expressed using a Standard Brownian motion. Specifically, it adds drift rate and variance parameter to the SBM.


martingale – learning notes#non-trivial

A martingale is a process, always associated with a filtration, or the unfolding of a story.  Almost always [1]) the unfolding has a time element.
[1] except trivial cases like “revealing one poker card at a time” … don’t spend too much time on that.
In the Ito formula context, (local) martingale basically means zero dt coefficient. Easy to explain. Ito’s calculus always predicts the next increment using 1) revealed values of some random process and 2) the next random increment in a standard BM:
      dX = m(X, Y, …, t) dt    +   1(X, Y…, t)dB1      +   2(X, Y…, t)dB2 +…
Now, E[dX] = 0 for a (local) martingale, but we know the dB terms contribute nothing.
counter-example – P xxx [[Zhou Xinfeng]] has a simple, counter-intuitive illustration: B3 is NOT a martingale even though by symmetry E[B^3] = 0. (Local) Martingale requires
     E[B^3 | last revealed B_t value] = 0 , which doesn’t hold.

BM ^ GBM with or +! drift – quick summary

B1) BM with dB and drift = 0? the Standard BM, simple but not useless. A cornerstone building block of more complex stoch processes.

G1) GBM with dB and drift = 0? The simplest, purest GBM. The tilting function used in Girsanov theorem. See my blog “GBM + zero drift”

G2) GBM with drift without dB? a deterministic exponential growth path. Example – bank account. Used in BS and pricing.

B2) BM with drift without dB? a linear, non-random walker, not a real BM, useless in stoch.

B3) Now, how do we deal with a BM with both drift and dB? use G1 construct as “tilting function” to effect a change of measure. In a nutshell,

B3 –(G1)–> B1

G3) GBM with drift + dB? The most common stock price model.

from BM’s to GBM’s drift rate – eroded by .5 sigma^2

Let’s start with a regular BM with a known drift *rate* denoted “m”, and known variance parameter value, denoted “s”:
dX = m dt + s dBt
In other words,
Xt – X0 = m*t + s*Bt
Here, “… + t” has a non-trivial meaning. It is not same as adding two numbers or adding two variables, but rather a signal-noise formula… It describes a Process, with a non-random, deterministic part, and a random part whose variance at time t is equal to (s2 t)
Next, we construct or encounter a random process G(t) related but not derived from this BM:
dG/G = m dt + s dBt    …….. [2]
It turns out this process can be exactly described as
  G = G0exp[ (m- ½ s2)t  + s Bt ]     ………. [3]
Again, the simple-looking “… + s Bt” expression has a non-trivial meaning. It describes a Process, whose log value has a deterministic component, and a random component whose variance is (s2 t).
Note in the formula above (m- ½ s2) isn’t  the drift of GBM process G(t), because left hand side is “dG / G” rather than dG itself.
In contrast, (m- ½ s2) is a drift rate in the “log” process L(t) := log G(t). This log process is a BM.

                dL = (m – ½ s2) dt + s dBt    …… [4]

If we compare [2] vs [4], we see the drift rate eroded by (½ s2).
(You may feel dL =?= dG/G but that’s “before” Ito. Since G(t) is an Ito process, to get dL we must apply Ito’s and we end up with [4].)
I wish there’s only one form to remember, but unfortunately, [2] and [4] are both used extensively.
In summary
* Starting from a BM with drift = (u) dt
** the exponential process Y(t) derived from the BM has drift (not drift rate)
= [u + ½ s2 ] Y(t) dt
* Starting from a GBM (Not something derived from BM) process with drift (not drift rate) = m* G(t) dt
** the log process L(t), derived from the GBM process, is a BM with drift
= (m – ½ s2) dt, not “…L(t) dt”

BM: y Bt^3 isn’t a martingale

Greg gave a hint: . Basically, for positive X, the average is higher, because the curve is convex.
Consider the next brief interval (long interval also fine, but a brief dt is the standard approach). dX will be Normal and symmetric. the +/- 0.001 stands for dX. For each positive outcome of dx like 0.001, there’s an equally likely -0.001 outcome. We can just pick any pair and work out the contribution to E[(X+dX)3].
For a martingale, E[dY] = 0 i.e. E[Y+dY] = E[Y]. In our case, Y:=X3 , so E[(X+dX)3] need to equal E[X3] ….
Note that Bt3 is symmetric so mean = 0. It’s 50/50 to be positive or negative, which does NOT make it a martingale. I think the paradox is filtration or “last revealed value”.
Bt3 is symmetric only when predicting at time 0. Indeed, E[Bt3 | F_0] = 0 for any target time t. How about given X(t=2.187) = 4?
E[(4 + dX)^3] works out to be 4^3 + 3*4*E[dX^2] != 4^3

stoch integral – bets on each step of a random walk

label – intuitive
Gotcha — In ordinary integration, if we integrate from 0 to 1, then dx is always a positive “step”. If the integrand is positive in the “strip”, then the area is positive. Stoch integral is different. Even if integrand is always positive the strip “area” can be negative because the dW is a coin flip.

Total area is a RV with Expectation = 0.

In Greg Lawler’s first mention (P 11) of stoch integral, he models the integrand’s value (over a brief interval deltaT) as a “bet” on a coin flip, or a bet on a random walk. I find this a rather intuitive, memorable, and simplified description of stoch integral.
Note the coin flip can be positive or negative and beyond our control. We can bet positive or negative. The bet can be any value. For now, don’t worry about the magnitude of the random step. Just assume each step is +1/-1 like a coin flip
If the random walk has no drift (fair coin), then any way you bet on it, you are 50/50 i.e. no way to beat a martingale. Therefore, the integral is typically (expected to be) 0. Let’s denote the integral as C. What about E[C2] ? Surely positive.  We need the variance rule…
Q: Does a stoch integral always have expectation equal to last revealed value of the integral?
A: Yes. It is always a local martingale. If it’s bounded, then it’s also a martingale.

change of measure, learning notes

See also — I have a long MSWord doc in my c:\0\b2b

Key — Start at discrete. Haksun’s example on coin flip…. Measure-P assigns 50/50 to head/tail. Measure-Q assigns 60/40 weights. Therefore we can see dQ/dP is a function (denoted M ), which maps each outcome (head/tail) to some amount of probability mass. Total probability mass adds up to 100%.

Requirement — the 2 measures are equivalent i.e. the pdf curve have exactly the same support. So M() is well-defined [1]. In the continuous case, suppose the original measure P is defined over a certain interval, then so is the function M(). However function M isn’t a distribution function like P, because it may not “add to 100%”. (I guess we just need the Q support to be a subset…)

Notation warning — V represents a particular event (like HT*TH), M(V) is a particular number, not a random variable IMO. Usually expectation is computed from probability. Here, however, probability is “defined” with expectation. I think when we view probability as a function, the input V is not a random variable like “X:=how many heads in 10 flips”, but a particular outcome like “X==2”.
Now we can look at the #1 important equation EQ1:

Q(V) := Ep [M(V) 1V] , where Ep () denotes expectation under the original Measure-P.

This equation defines a new probability distro “Q” using a P-expectation.

Now we can look at the #2 equation EQ2, mentioned in both Lawler’s notes and Haksun:

EQ[X] = Ep [ X M(X) ]

Notation warning — X is a random variable (such as how many heads in 10 flips), not a particular event like HT*TH. In this context, M(X) is a derived RV.

Key – here function M is used to “tilt” the original measure P. This tilting is supposed to be intuitive but not for me. The input to M() can be an event or (P141) a number! On P142 of Greg’s notes, M itself is a random variable.

Next look at a continuous distro.

Key – to develop intuition, use binomial approximation, the basis of computer simulation.

Key – in continuous setting, the “outcomes” are entire paths. Think of 1000 paths simulated. Each path gets a probability mass under P and under Q. Some paths get higher prob under Q than under P; the other paths get lower prob under Q.

The magic – with a certain tilting function, a BM with a constant drift rate C will “transform” to a symmetric BM. That magic tilting function happens to be …

I wonder what this magic tilting function looks like as a graph. Greg said it’s the exponential shape as given at end of P146, assuming m is a positive constant like 1.

In simple cases like this, the driftless BM would acquire a Positive drift via a Positive tilt. Intuitively, it’s just _weighted_average_:

* For the coin, physical measure says 50/50, but the new measure *assigns* more weight to head, so weighted average would be tilted up, positively towards heads.
* For the SBM, physical measure says 50/50, but the new measure *assigns” more weight to positive paths, so the new expectation is no longer 0 but positive.

[1] This function M is also described as a RV with Ep [M] = 1. For some outcomes Q assigns higher probability mass than P, and lower for other outcomes. Average out to be equal.

My take on Ito’s, using d(X*Y) as example

Let J be the random process defined by Jt := Xt Yt. At any time, the product of X and Y is J’s value. (It’s often instructive to put aside the “process” and regard J as a derived random VARIABLE.) Ito’s formula says
    dJ := d(Xt Yt) = Xt dY + Yt dX + dX dY
Note this is actually a stoch integral equation. If there’s no dW term hidden in dX, then this reduces to an ordinary integral equation. Is this also a “differential equation”? No. There’s no differential here.

Note that X and Y are random processes with some diffusion i.e. dW elements.

I used to see it as an equation relating multiple unknowns – dJ, dX, dY, X, Y. Wrong! Instead, it describes how the Next increment in the process J is Determined and precisely predicted
Ito’s formula is a predictive formula, but it’s 100% reliable and accurate. Based on info revealed so far, this formula specifies exactly the mean and variance of the next increment dJ. Since dJ is Guassian, the distribution of this rand var is fully described. We can work out the precise probability of dJ falling into any range.

Therefore, Ito’s formula is the most precise prediction of the next increment. No prediction can be more precise. By construction, all of Xt, Yt, Jt … are already revealed, and are potential inputs to the predictive formula. If X (and Y) is a well-defined stoch process, then dX (and dY) is predicted in terms of Xt , Yt , dB and dt, such as dX = Xt2 dt + 3Yt dB

The formula above actually means “Over the next interval dt, the increment in X has a deterministic component (= current revealed value of X squared times dt), and a BM component ~ N(0, variance = 9 Yt2 dt)”

Given 1) the dynamics of stoch process(es), 2) how a new process is composed therefrom, Ito’s formula lets us work out the deterministic + random components of __next_increment__.

We have a similarly precise prediction of dY, the next increment in Y. As such, we already know
Xt, Yt — the Realized values
dt – the interval’s length
dX, dY – predicted increments
Therefore dJ can be predicted.
For me, the #1 take-away is in the dX formula, which predicts the next increment using Realized values.

BM – B(3)^B(5) independent@@

Jargon: B3 or B(3) means the random position at time 3. I think this is a N@T.

Q: Are the 2 random variables B3 and B5 independent?
A: no. Intuitively, when B3 is very high, like 3 sigma above the mean (perhaps a value of 892), then B5 is likely to Remain high, because for the next 2 seconds, the walker follows a centered, symmetric random walk, centered at the realized value of 892.

We know the increment from time 3 to 5 is ind of all previous values. Let d be that increment.

B5 = B3 + d, the sum of two ind Normal RV. It’s another normal RV, but dependent on the two!

BM hitting 3 before hitting -5

A common Brownian Motion quiz ([[Zhou Xinfeng]]): Given a simple BM, what's the probability that it hits 3 before it hits -5?

This is actually identical to the BM with upper and lower boundaries. The BM walker stops when it hits either boundary. We know it eventually stops. At that stopping time, the walker is either at 3 or -5 but which is more likely?

Ultimately, we rely on the optional stopping theorem – At the stopping time, the martingale's value is a random variable and its expectation is equal to the initial value.

optional stopping theorem, my take

label – stoch

Background — There's no way to beat a fair game. Your winning always has an expected value of 0, because winning is a martingale, i.e. expected future value for a future time is equal to the last revealed value.

Now, what if there's a stopping time i.e, a strategy to win and end the game? Is the winning at that time still a martingale? If it's not, then we found a way to beat a fair game.

For a Simple Random Walk (coin flip) with upper/lower bounds, answer is intuitively yes, it's a martingale.

For a simple random walk with only an upper stopping bound (say $1), answer is — At the stopping time, the winning is the target level of $1, so the expected winning is also $1, which is Not the starting value of $0, so not a martingale! Not limited to the martingale betting strategy. So have we found a way to beat the martingale? Well, no.

“There's no way to beat a martingale in __Finite__ time”

You can beat the martingale but it may take forever. Even worse (a stronger statement), the expected time to beat the martingale and walk away with $1 is infinity.

The OST has various conditions and assumptions. The Martingale Betting Strategy violates all of them.

square integrable martingale has a more detailed definition than Lawler's.

If a discrete martingale M(n) is a SIM, then

E[ M(99)^2 ] is finite, and so is E[ M(99999)^2 ].

Each (unconditional) expectation is, by definition, a fixed number and not random.

Consider another number “lim_(n-> inf) E[ M(n)^2 ]”. For a given martingale, this “magic attribute” is a fixed number and not random. A given square-integrable martingale may have an magic attribute greater than any number there is, i.e. it goes to infinity. But this magic attribute isn't relevant to us when we talk about square-integrable martingales. We don't care about the limit. We only care about “any number n”.

It's relevant to contrast that with quadratic variation. This is a limit quantity, and not random.

For a given process, Quadratic variation is a fixed value for a fixed timespan. For processA, Quadratic variation at time=28 could be 0.56; at time=30 it could be 0.6.

In this case, we divide the timespan into many, many (infinite) small intervals. No such fine-division in the discussion on square-integrable-martingales

process based@BM +! a stoch variance#Ronnie

One of the  Stochastic problems (HW3Q5.2) is revealing (Midterm2015Q6.4 also). We are given
  dX = m(X,t) dt + s(X,t) dBt
where m() and s() can be very complicated  functions. Now look at this unusual process definition, without Xt : 
Appling Ito’s, we notice this function, denoted f(), is a function of t, not a function of Xt, so df/dx = 0. We get
  dY = Yt Xt3 dt

So, There’s no dB term so the process Y has a drift only but no variance. However, the drift rate depends on X, which does have a dB component! How do you square the circle? Here are the keys:
Note we are talking about the variance of the Increment over a time interval delta_t
Key — there’s a filtration up to time t. At time t, the value of X and Y are already revealed and not random any more.
Key — variance of the increment is always proportional to delta_t, and the linear factor is the quasi-constant “variance parameter”. Just like instantaneous volatility, this variance parameter is assumed to be slow-changing. 
(Ditto for the drift rate..)
In this case, the variance parameter is 0. The increment over the next interval has only a drift element, without a random element.

Therefore, the revealed, realized values of X and Y determine the drift rate over the Next interval of delta_t

Riemann ^ stoch integral, learning notes

In a Riemann integral, each strip has an area-under-the-curve being either positive or negative, depending on the integrand’s sign in the strip. If the strip is “under water” then area is negative.

In stochastic integral [1], each piece is “increment   *   integrand”, where both increment and integrand values can be positive/negative. In contrast, the Riemann increment is always positive.

With Riemann, if we know integrand is entirely positive over the integration range, then the sum must be positive. This basic rule doesn’t apply to stochastic integral. In fact, we can’t draw a progression of adjacent strips as illustration of stochastic integration.

Even if the integrand is always positive, the stoch integral is often 0. For an (important) example, in a fair game or a drift-less random walk, the dB part is 50-50 positive/negative.

[1] think of the “Simple Process” defined on P82 by Greg Lawler.

On P80, Greg pointed out
* if integrand is random but the dx is “ordinary” then this is an ordinary integral
* if the dx is a coin flip, then whether integrand is random or not, this is a stoch integral

So the defining feature of a stoch integral is a random increment

simplest SDE (!! PDE) given by Greg Lawler

P91 of Greg Lawler’s lecture notes states that the most basic, simple SDE
  dXt = At dBt     (1)
can be intuitively interpreted this way — Xt is like a process that at time t evolves like a BM with zero drift and variance At2. 

In order to make sense of it, let’s back track a bit. A regular BM with 0 drift and variance_parameter = 33 is a random walker. At any time like 64 days after the start (assuming days to be the unit of time), the walker still has 0 drift and variance_param=33. The position of this walker is a random variable ~ N(0, 64*33). However, If we look at the next interval from time 64 to 64.01, the BM’s increment is a different random variable ~ N(0, 0.01*33).
This is a process with constant variance parameter. In contrast, our Xt process has a … time-varying variance parameter! This random walker at time 64 is also a BM walker, with 0 drift, but variance_param= At2. If we look at the interval from time 64 to 64.01, (due to slow-changing At), the BM’s increment is a random variable ~ N(0, 0.01At2).
Actually, the LHS “dXt” represents that signed increment. As such, it is a random variable ~ N(0, dt At2).

Formula (1) is another signal-noise formula, but without a signal. It precisely describes the distribution of the next increment. This is as precise as possible.

Note BS-E is a PDE not a SDE, because BS-E has no dB or dW term.

filtration +! periodic observations

In the stochastic probability (not “statistics”) literature, at least in the beginner level literature, I often see mathematicians elude the notion of a time-varying process. I think they want a more generalized and more rigorous terminology, so they prefer filtration.

I feel most of the time, filtration takes place through time.

Here’s one artificial filtration without a time element — cast a bunch of dice at once (like my story cube) but reveal one at a time.

Stoch Lesson 38 parameters of BM

Lawler defined BM with 2 params – drift and variance v, but the meaning of variance is tricky.

Note a BM is about a TVRV and notice the difference between a N@T vs TVRV. A N@T could be modeled by a Gaussian variable with a variance. The variance v of a BM is about the variance of increment. Specifically, the increment over deltaT is a regular Gaussian RV with a variance = deltaT*v

Fn-measurable, adapted-to-Fn — in my own language

(Very basic jargon…)

In the discrete context, Fn represents F1, F2, F3 … and denotes a sequence or accumulation of information.

If something Mn is Fn-measurable, it means as we get the n-th packet of information, this Mn is no longer random. It’s now measurable, but possibly unknown. I would venture to say Mn is already realized by this time. The poker card is already drawn.

If a process Mt is adapted to Ft, then Mt is Ft-measurable…

return rate vs log return – numerically close but LN vs N

Given IBM price is known now, the price at a future time is a N@T random var. The “return rate” over the same period is another N@T random var. BS and many models assume —

* Price ~ logNormal
* return ~ Normal i.e. a random var following a Normal distro

The “return” is actually the log return. In contrast,

* return rate ~ a LogNormal random variable shifted down by 1.0
* price relative := (return rate +1) ~ LogNormal

N@T means Noisegen Output at a future Time, a useful concept illustrated in other posts

Q (Paradox): As pointed out on P29 [[basic black scholes]], for small returns, return rate and log return are numerically very close, so why only log return (not return rate) can be assumed Normal?

A: “for small returns”… But for large (esp. neg) returns, the 2 return calculations are not close at all. One is like -inf, the other is like -100%
A: log return can range from -inf to +inf. In contrast, return rate can only range from -100% to +inf => can’t have a Normal distro as a N@T.

Basic assumption so far — daily returns are iid. Well, if we look at historical daily returns and compare adjacent values, they are uncorrelated but not independent. One simple set-up is, construct 2 series – odd days and even days. Uncorrelated, but not independent. The observed volatility of returns is very much related from day to day.

another (2nd) numeraire paradox

(This scenario is actually a 2-period world, well-covered in [[math of financial modeling and inv mgmt]]. However, this is NOT the simplest problem using a bank account or bond as numeraire. )
Consider a one-period market with exactly 2 possible time-T outcomes w1 and w2. Among the tradable assets is G. At termination,  
                G_T(w1) = $6
                G_T(w2) = $12.
Under G-measure, we are given RN PrG (w1) = PrG (w2) = 50%. It seems at time-0 (right now) G_0 must be 9, but it turns out to be 7!
Key – this RNPG is inferred from (and must be consistent with) the current market price of another asset [1]. In fact I believe any asset’s current price must be consistent with this G-measure RNPG. I guess the discounted expected payout equals the time-0 price.
Now can there be a 0% interest bank account B? In other words, is it possible to have B_T = B_0 = $1? Well, this actually implies a PrG (w1) = 5/7 (Verified!), not 50%. So this bank account’s current price is inconsistent with whatever asset used in [1] above. Arbitrage? I guess so.
I think it’s useful to work out (from the [1] asset’s current price) the bond current price  Z_0 = $0.875. This implies a predicable drift rate. I would say all assets (G, X, Z etc) have the same drift rate as the bond numeraire.
Next, it’s useful to work out that under Z-measure the RN Prz (w1) = 66.66% and Prz (w2) = 33.33%, very different RNPG values.
Q: under Z-measure, what’s G’s drift?
A: $7 -> $8
1) The most common numeraires (bank accounts and discount bonds) have just one “outcome”. (In a more advanced context, bank account outcome is uncertain, due to stoch interest rates.) This stylized example is different and more tricky. Given such a numeraire with multiple outcomes, it’s useful to infer the bond numeraire.
2) When I must work with such a numeraire, I usually have
* If I also have X_0 then I can back out Risk Neutral PrG(w1) and PrG(w2)
* alternatively, I can use X as numeraire and back out PrX(w1) and PrX(w1)
* If on the other hand we are given some of the PG numbers, then we can compute X_0 i.e. price the asset X.
[1] Here’s one such asset X_0 = 70 and X(w1) = 60 and X(w2) = 120.

tradable/non-tradable underlier in a drv contract

I guess in many, many entry-level quant questions, we are often given a task to find the Risk Neutral [2] dynamics of some variable X. Simple examples include Xa(t)=S(t)^2 or Xb=600/S, Xc=sqrt(S), Xd=exp(S), Xe=S*logS … where S is the IBM price following a GBM. In many simple cases the variable X is also GBM under the Risk Neutral measure. We use Ito’s rule…

Then we are asked to price a contract that guarantees to pay X(T) at maturity.

At this point, it’s easy to forget the X itself is not tradeable i.e. the X process is not the price process of a tradeable asset. When interest rate goes from 200 to 201, the mid-quote (of any security) doesn’t go from $200 to $201, even though the implied vol or implied yield could go from 200 to 201.Another Eg – suppose I were to maintain tight bid/ask quotes around current value of 600/ S_IBM. If IBM is trading at $30 then I quote $20. If IBM trades at $40 then I quote $15. This market-maker would induce arbitrage (– intuitive to the practitioners but not the uninitiated). A contract paying 600/S_T on maturity has a fair price today X_0 that’s very, very different from 600/S_0  [1].

Given X(t) process isn’t a tradeable (not a price process), X doesn’t have drift equal to risk-free rate “r” 😦

However, don’t lose heart — noting this Contract is a tradable , the contract’s price process C(t) is tradeable and C(t) has (exponential) drift = r 🙂

Q: Basic question – Given X(t) isn’t a price process, does it make sense to apply Ito’s on X = 600/S ?
A: Yes because 1) Ito lets us (fully) characterize the dynamics of the X(t) process, albeit NOT a price process. In turn, 2) the SDE (+ terminal condition) reveals the distro of X(T). From the distro, we could find the 3) expectation of X(T) and the 4) pre-expiry price. Note every step requires a probability measure, since dW, BM, distro, expectation are all probabilistic concepts.

[1] Try to develop intuition — By Jensen’s inequality, it should be above 600/S_0, provided S process has non-zero volatility.
[2] (i.e. using money market account probability measure)

risk-neutral measure, a beginner’s personal view

Risk neutral measure permeates derivative pricing but is not clearly understood. I believe RN measure is very useful to mathematicians. Maybe that’s why they build a complete foundation with lots of big assumptions.

Like other branches of applied math, there are drastic simplifying assumptions….

I think the two foundation building block are 1) arbitrage and 2) replication. In many textbook contexts, the prices of underliers vs derivatives are related and restrained by arbitrage. From these prices we can back out or imply RN probability values, but these are simplistic illustrations rather than serious definitions of RN measure.

On top of these and other concepts, we have Martingale and numeraire concepts.

Like Game programming for kids and for professionals, there are 2 vastly different levels of sophistication:
A) simplified — RN probabilities implied from live prices of underliers and derivatives
B) sophisticated — RN infrastructure and machinery, based on measure theory

when there’s (implicit) measure+when there is none

Needs a Measure – r or mu. Whenever we see “drift”, it means expected growth or the Mean of some distribution (of a N@T). There’s a probability measure in the context. This could be a physical measure or a T-fwd measure or a stock-numeraire or the “risk-neutral-measure” i.e. MoneyMarketAcct as the numeraire

Needs a Measure – dW. Brownian motion is always a probabilistic notion, under some measure

Needs a Measure – E[…] is expectation of something. There’s a measure in the context.

Needs a Measure – Pr[..]

Needs a Measure – martingale

Regardless of measure – time-zero fair price of any contract. The same price should result from derivation under any measure.

Regardless of measure – arbitrage is arbitrage under any measure

change measure but using cash numeraire #drift

Background — “Drift” sounds simple and innocent, but no no no.
* it requires a probability measure
* it requires a numeraire
* it implies there’s one or (usually) more random process with some characteristics.

It’s important to estimate the drift. Seems essential to derivative pricing.
BA = a bank account paying a constant interest rate, compounded daily. No uncertainty no pr-distro about any future value on any future date. $1 today (time-0) becomes exp(rT) at time T with pr=1 , under any probability measure.

MMA = money market account. More realistic than the BA. Today (time-0), we only know tomorrow’s value, not further.
Z = the zero coupon bond. Today (time-0) we already know the future value at time-T is $1 with Pr=1 under any probability measure. Of course, we also know the value today as this bond is traded. Any other asset has such deterministic future value? BA yes but it’s unrealistic.
S = IBM stock
Now look at some tradable asset X. It could be a stock S or an option C or a futures contract … We must must, must assume X is tradable without arbitrage.
—- Under BA measure and cash as numeraire.
   X0/B0 = E (X_T/B_T) = E (X_T)/B_T   =>
   E (X_T)/X0 = B_T/B0
Interpretation – X_T is random and non-deterministic, but its expected value (BA measure) follows the _same_ drift as BA itself.
—- Under BA measure and using BA as numeraire or “currency”,
   X0/B0 = E (X_T/B_T)
Interpretation – evaluated with BA as currency, the value of X will stay constant with 0 drift.
—- Under T-measure and cash numeraire
   X0/Z0 = E (X_T/Z_T) = E (X_T)/$1   =>
   E (X_T)/X0 = 1/Z0
Interpretation — X_T is random and non-deterministic, but its expected value (Z measure) follows the _same_ drift as Z itself.
—- Under T-measure and using Z as numeraire or “currency”,
   X0/Z0 = E (X_T/Z_T)
Interpretation – evaluated with the bond as currency, the value of X will stay constant with 0 drift.
—- Under IBM-measure and cash numeraire
   X0/S0 = E (X_T/S_T)
Interpretation – can I say X follows the same drift as IBM? No. The equation below doesn’t hold because S_T can’t come out of E()!
     !wrong —>       E (X_T)/X0 = S_T/S0    ….. wrong!
—- Under IBM-measure and IBM numeraire… same equation as above.
Interpretation – evaluated with IBM stock as currency, the value of X will stay constant with 0 drift.

Now what if X is non-tradable i.e. not the price process of a tradable asset? Consider random variable X = 1/S. X won’t have the drift properties above. However, a contract paying X_T is tradeable! So this contract’s price does follow the drift properties above. See

numeraire paradox

Consider a one-period market with exactly 2 possible time-T outcomes w1 and w2.

Among the tradable assets is G. At termination, G_T(w1) = $6 or G_T(w2) = $12. Under G-measure, we are given Pr(w1) = Pr(w2) = 50%. It seems at time-0 (right now) G_0 should be $9, but it turns out to be $7! Key – this Pr is inferred from (and must be consistent with) the current market price of another asset [1]. Without another asset, we can’t work out the G-distro. In fact I believe every asset’s current price must be consistent with this G-measure Pr … or arbitrage!

Since every asset’s current price should be consistent with the G-Pr, I feel the most useful asset is the bond. Bond current price works out to Z_0 = $0.875. This implies a predicable drift rate.

I would say under bond numeraire, all assets (G, X, Z etc) have the same drift rate as the bond numeraire. For example, under the Z-numeraire, G has the same drift as Z.

Q: under Z-measure, what’s G’s drift?
A: $7 -> $8

It’s also useful to work out under Z-measure the Pr(w1) = 66.66% and Pr(w2) = 33.33%. This is using the G_0, G_T numbers.

Now can there be a 0-interest bank account B? In other words, could B_T = B_0 = $1? No, since such prices imply a G-measure Pr(w1) like 5/7 (Verified!) So this bank account’s current price is inconsistent with whatever asset used in [1] above.

The most common numeraires (bank accounts and discount bonds) have just one “outcome”. (In a more advanced context, bank account outcome is uncertain, due to stoch interest rates.) This stylized example is different. Given a numeraire with multiple outcomes, it’s useful to infer the bond numeraire. It’s generally easier to work with one-outcome numeraires. I feel it’s even better if we know the exact terimnal price and the current price of this numeraire — I guess only the discount bond meet this requirement.

I like this stylized 1-period, 2-outcome world.
Q1: Given Z_T, Z_0, G_0, G_T [2], can i work out the G-Pr (i.e. distro under G-numeraire)? can i swap the roles and work out the Z-Pr ?
A: I think we can work out both distros and they aren’t identical !

Q2: Given G_0 and the G_T possible values[2] without Z prices, can we work out the G-Pr (i.e. distro under G-numeraire)?
A: no we don’t have a numeraire. In a high vs a low interest-rate world, the Pr implied by G_T would be different

[2] these are like pre-set enum values. We only know these values in this unrealistic world.

present value of 22 shares, using share as numeraire

We all know that the present value of $1 to be received in 3Y is (almost always) below $1, basically equal to exp(-r*3) where r:= continuous compound risk-free interest rate. This is like an informal, working definition of PV.

Q: What about a contract where the (no-dividend) IBM stock is used as currency or “numeraire”? Suppose contract pays 33 shares in 3Y… what’s the PV?

%%A: I feel the PV of that cash flow is 33*S_0 i.e current IBM stock price.
I feel this “numeraire” has nothing to do with probability measure. We don’t worry about the uncertainty (or probability distribution) of future dollar price of some security. The currency is the IBM stock, so the future value of 1 share is exactly 1, without /uncertainty/randomness/ i.e. it’s /deterministic/.
Similarly, given a zero bond will mature (i.e. cash flow of $1) in 3Y, PV of that cash flow is Z_0 i.e. the current market value of that bond.

stoch Process^random Variable: !! same thing

I feel a “random walk” and “random variable” are sometimes treated as interchangeable concepts. Watch out. Fundamentally different!

If a variable follows a stoch process (i.e. a type of random walk) then its Future [2] value at any Future time has a Probability  distribution. If this PD is normal, then mean and stdev will depend on (characteristics of) that process, but also depend on the  distance in time from the last Observation/revelation.

Let’s look at those characteristics — In many simple models, the drift/volatility of the Process are assumed unvarying[3]. I’m not familiar with the more complicated, real-world models, but suffice to say volatility of the Process is actually time-varying. It can even follow a stoch Process of its own.

Let’s look at the last Observation — an important point in the Process. Any uncertainty or randomness before that moment is  irrelevant. The last Observation (with a value and its timestamp) is basically the diffusion-start or the random-walk-start. Recall Polya’s urn.

[2] Future is uncertain – probability. Statistics on the other hand is about past.
[3] and can be estimated using historical observations

Random walk isn’t always symmetrical — Suppose the random walk has an upward trend, then PD at a given future time won’t be a nice  bell centered around the last observation. Now let’s compare 2 important random walks — Brownian Motion (BM) vs GBM.
F) BM – If the process is BM i.e. Wiener Process,
** then the variable at a future time has a Normal distribution, whose stdev is proportional to sqrt(t)
** Important scenario for theoretical study, but how useful is this model in practice? Not sure.
G) GBM – If the process is GBM,
** then the variable at a future time has a Lognormal distribution
** this model is extremely important in practice.

today’s price == today’s expectation of tomorrow’s price

“today’s [3] price[1] equals today’s[3] expectation[2] of tomorrow’s price” — is a well-known catch phrase. Here are some learning notes I jotted down.

[1] we are talking about tradeable assets only. Counter examples – Interest rate and Dividend-paying stock are not tradeable by definition, and won’t follow this rule.

[2] expectation is always under some probability distribution (or probability “measure”). Here the probability distro is inferred from all market prices observable Today. The prices on various derivatives across different maturities enable us to infer such a probability distribution. Incidentally, the prices have to be real, not some poor bid/ask spread that no one would accept.

[3] we use Today’s prices of other securities to back out an estimated fair price (of the target security) that’s fair as of Today. Fair meaning consistent with other prices Today. This estimate is valid to the extent those “reference prices” are valid. As soon as reference prices change, our estimate must re-adjust.

GBM formulas – when to subtract 0.5σ^2 from u

Background – I often get confused when (not) to subtract. Here’s a brief summary.
The standard GBM dynamic is
                dS = mu S dt + σ S dW …. where mu and σ are time-invariant.
The standard solution is to find the dynamics of logS, denoted L,
                dL = (mu – 0.5σ2 ) dt + σ dW …  BM, not GBM. No L on the RHS.
                L (time=T)     ~  N (mean = (mu – 0.5σ2 )T, std = …. )
So it seems our mu can’t get rid of –0.5σ2 thingy … Until we take expectation of S(time=T)
                E S(time=T) = S(time=0) exp(mu*T)     … no σ2 term
When we write down the Black Scholes PDE we use mu without the –0.5σ2 thingy.
BS formula uses mu without the –0.5σ2 thingy.

a contract paying log S_T or S^2

Background – It’s easy to learn BS without knowing how to price simpler contracts. As show in the 5 (or more) examples below, there are only a few simple techniques. We really need to step back and see the big picture.
Here’s a very common pricing problem. Suppose IBM stock price follows GBM with u and σ. Under RN, the drift becomes r, the bank account’s constant IR (or probably the riskfree rate), therefore, 

Given a (not an option) contract that on termination pays log ST, how much is the contract worth today? Note the payoff can be negative.
Here’s the standard solution —
1) change to RN measure, but avoid working with the discounted price process (too confusing). 
2) write the RN dynamics as a BM or GBM. Other dynamics I don’t know how to handle.
Denote L:= log St and apply Ito’s
dL = A dt + σ dW … where A is a time-invariant constant. So log of any GBM is a BM.
I think A = r – 0.5σ2 but the exact formula is irrelevant here.
3) so at time T, L ~ N(mean = L0 + A*T, std = …)
4) so RN-expectation of time-T value of L is L0 + A*T
5) discount the expectation to PV
Note L isn’t the price process of a tradable, so below is wrong.
E (LT/ BT) = L0/ B0   … CANNOT apply martingale formula
— What if payout = ST2 ? By HW4 Q3a the variable Jt:=St2 is a GBM with some drift rate B and some volatility.  Note this random process Jt is simply derived from the random process St. As such, Jt is NOT a price of any tradable asset [1].
Expectation of J’s terminal value = J0 exp(B*T)
I guess B = 2r + σ2 but irrelevant here.

[1] if Jt were a price process, then the discounted value of it would be martingale i.e 0 drift rate. Our Jt isn’t martingale. It has a drift rate, but this drift rate isn’t equal to the risfree rate. Only a tradable price process has such a drift rate. To clear the confusion, there are common 3 cases
1) if Jt is a price process (GBM or otherwise), then under RN measure, drift rate in it must be r Jt. See P5.16 by Roger Lee
2) if Jt is a discounted price process, then under RN measure, drift rate is 0 — martingale.
3) if not a price process, then under RN measure, drift rate can be anything.

— What if payout = max[0, (ST2) – K]?  This requires the CBS formula.
— What if payout = max[0, (logST) – K]? Once you know the RN distribution of logST is normal, this is tractable.
— what if payout = max[0, ST – K] but the stock pays continuous dividend rate q? Now the stock price process is not a tradeable.
No We don’t change the underlier to the tradeable bundle. We derive the RN dynamics of the non-tradeable price S as
dS = (r-q) S dt + σ S dW … then apply CBS formula.
So far all the “variables” are non-tradeable, so we can’t apply the MG formula
— What if payout = STXT where both are no-dividend stock prices. Now this contract can be statically replicated. Therefore we take an even simpler approach. Price today is exactly S0X0

GBM + zero drift

I see zero-drift GBM in multiple problems
– margrabe option
– stock price under zero interest rate
For simplicity, let’s assume X_0 = $1. Given

        dX =σX dW     …GBM with zero drift-rate

Now denoting L:= log X, we get

                dL = – ½ σ2 dt + σ dW    … BM not GBM. No L on the RHS.
Now L as a process is a BM with a linear growth (rather than exponential growth).
LogX_t ~ N ( logX_0  – ½ σ2t  ,   σ2t )
E LogX_t = logX_0  – ½ σ2t  ….. [1]
=> E Log( X_t / X_0)  = – ½ σ2t  …. so expected log return is negative?
E X_t = X_0 …. X_t is a log-normal squashed bell where x-axis extends from (0 to +inf) [3].

Look at the lower curve below.
Mean = 1.65 … a pivot here shall balance the “distributed weights”
Median = 1.0 …half the area-under-curve is on either side of Median i.e. Pr(X_t < median) = 50%

Therefore, even though E X_t = X_0 [2], as t goes to infinity, paradoxically Pr(X_t<X_0) goes to 100% and most of the area-under-curve would be squashed towards 0, i.e. X_t likely to undershoot X_0.

The diffusion view — as t increases, more and more of the particles move towards 0, although their average distance from 0 (i.e. E X_t) is always X_0. Note 2 curves below are NOT progressive.

The random walker view — as t increases, the walker is increasingly drawn towards 0, though the average distance from 0 is always X_0. In fact, we can think of all the particles as concentrated at the X_0 level at the “big bang” of diffusion start.

Even if t is not large, Pr(X_t 50%, as shown in the taller curve below.

[1] horizontal center of of the bell shape become more and more negative as t increases.
[2] this holds for any future time t. Eg: 1D from now, the GBM diffusion would have a distribution, which is depicted in the PDF graphs.
[3] note like all lognormals, X_t can never go negative 

File:Comparison mean median mode.svg

martingale – phrasebook

Like the volatility concept, mg is a fundamental concept but not a simple concept. It's like an elephant for the 5 blind men. It has many aspects.
process? a martingale is a process. At any time the process has a value.

MG property? A security could have many features and one of them could be the mg property meaning the security's fair value is a process and it meets the mg definition and is a mg process.

0-expectation? Expn(M_tomorrow – M_now) = 0

no-drift? A variable or a price (that qualifies as a process) with no drift is a mg.

differential ^ integral in Ito’s formula

See posts on Ito being the most precise possible prediction.

Given dynamics of S is    dS = mu dt + sigma dW  , and given a (process following) a function f() of S,  then, Ito’s rule says

    df = df/dS * dS + 1/2 d(df/dS)/dS * (dS)^2

There are really 2 different meanings to d____

– The df/dS term is ordinary differentiation wrt to S, treating S as just an ordinary variable in ordinary calculus.
– The dt term, if present, isn’t a differential. All the d__ appearing outside a division (like d_/d__) actually indicates an implicit integral.
** Specifically, The dS term (another integral term) contains a dW component. So this is even more “unusual” and “different” from the ordinary calculus view point.

signal-noise ^ predictive formula – GBM

The future price of a bond is predictable. We use a predication formula like bond_price(t) = ….

The future price of a stock, assumed GBM, can be described by a signal-noise formula

S(t) =

This is not a prediction formula. Instead, this expression says the level of S at time t is predicted to be a non-random value plus a random variable (i.e. a N@T)

In other words, S at time t is a noise superimposed on a signal. I would call it a signal-noise formula or SN formula.

How about the expectation of this random variable S? The expectation formula is a prediction formula.

drift under a given measure (but +! dividing by its numeraire)

See post on using cash numeraire.

I think we can assume for each numeraire, there’s just one [1] probability measure. That measure defines the probability distribution of any price process.  We can use that measure to evaluate expectations, to talk about Normal/Lognormal or dW, and to evaluate “exponential” drift (the “m” below), assuming

                dX = m X dt
Under the standard risk-neutral measure, the exponential drift is the same ( =r ) for all TRADEABLE assets, even though physical drift rates are not uniform. Specifically, the bank account itself (paying exponential short rate r) has a drift = r. So does the discount bond. So does a stock. So does a fwd contract. So does a vanilla call or binary call. So does an asset-or-nothing call.
At this point, we don’t need to worry about martingale or numeraire, though all the important results come from numeraire/MG reasoning.
I feel it’s important to remember drift is a __prediction__ about the future. It’s inherently based on some assumed probability distribution i.e. a probability measure. That probability distribution is derived from many live prices about T-expiry contracts.
Therefore, under another predicative probability distribution/measure, the predicted drift would differ.
The stock-measure is trickier. Take IBM. There exists an IBM measure. Under this measure, i.e. operating under this new (predictive) probability distribution, we can derive the (predicted) exponential drift rate of any asset’s price movement. Specifically, we can work out the predicted drift of the IBM price process. That drift is r + sigma^2, where

r:= exponential drift rate of the bank account i.e. money-market account. Consider it a physical drift but actulaly this is non-random and the same drift speed under any measure
Sigma:= the volatility of IBM. Same value under any measure.
[1] there might exists multiple, but I don’t bother.

Stoch Lesson 59 meaning of q[=] in a simple SDE

See Lesson 55 about details on deltaW and dW
See Lesson 19 about N@T
See Lesson 33 for a backgrounder on the canonical Wiener variable W

The Hull definition of the canonical Wiener process (Lesson 33) —

deltaW = epsilon * sqrt(deltaT) // in discrete time
dW      = epsilon * sqrt(dT) // in continuous time

The “=” has a different meaning than in algebra.

Discrete time is simpler to understand. Recall deltaW is a stepsize of a random variable. The “=” doesn’t mean a step size value of 0.012 is equal to the product of an epsilon value and sqrt(deltaT).

The “=” means equivalent-to.

Here epsilon represents … (hold your breath)… a noisegen, in fact the canonical Gaussian noisegen.

I’d say both deltaW and epsilon are N@T. These are not regular variables.

Stoch Lesson 33 canonical Wiener variable ^ Gaussian variable

See Lesson 05 for the backgrounder on Level, Stepsize, time-varying random variable…
See Lesson 15 about TVRV
See Lesson 19 about N@T

In many formulas in this blog (and probably in the literature), W denotes not just some Wiener variable, but THE canonical TVRV random variable following a Wiener process a.k.a BM. Before we proceed it’s good (perhaps necessary) to pick a concrete unit of time. Say 1 sec. Now I am ready to pin down THE canonical Wiener variable W in discrete-time —

   Over any time interval h seconds, the positive or negative increment in W’s Level is generated from a Gaussian noisegen, with mean 0 and variance equal to h. This makes W THE canonical Wiener variable. [1]

Special case – If the interval is from last observation, when Level is 0, to 55 sec later, then dW = W(t=55) – 0 = W(t=55), and therefore W@55sec, as a N@T, also has a Gaussian distribution with variance = 55.

[1] I think this is the discrete version of Standard Brownian Motion or SBM, defined by Lawler on P42 with 2+1 defining properties — 1) iid random increments 2) no-jump, which __implies__ 3) Gaussian random increments

Now let’s look at the standard normal distro or canonical Gaussian distro or Gaussian noisegen — If something epsilon follows a canonical Gaussian distribution, it’s often a N@T, which is not a time-varying random variable. Also the variance and stdev are both 1.0.

I believe the canonical Wiener variable can be expressed in terms of the canonical Gaussian variable —

  deltaW = epsilon * sqrt(deltaT)  //  in discrete time
  dW = epsilon * sqrt(dT)            //  in continuous time

Let’s be concrete and suppose deltaT is 0.3 yoctosecond (more brief than any price movement). In English, this says “over a brief 0.3 yoctosecond, step_size is generated from a Gaussian noisegen with variance equal to 0.3 * 10^-24”. If we simulate this step 9999 times, we would get 9999 deltaW (stesp_size) realization values. These realizations would follow a bell-shaped histogram.

Given dW can be expressed this way, many authors including Hull uses it all the time.

Both the canonical Wiener variable and the canonical Gaussian distribution have their symbols — W vs epsilon(ϵ), or sometimes Z. They show up frequently in formulas. Don’t confuse them.

The Wiener var is always a TVRV; the Gaussian var is often a N@T.

Stoch Lesson J101 – W(t) isn’t a traditional function-of-time

See lesson 05 for a backgrounder on Level, steps
See Lesson 33 for a backgrounder on the canonical Wiener variable W

Let’s look at the notation W(t). This suggests the Level of W is a function of t. Suppose i = 55, I’d prefer the notation W_55 or Level_55, i.e. the level AFTER step_55. This level depends on i (i.e. 55), depends on t (i.e. 55 intervals after last-observation), and also depends on the 55 queries on the noisegen. Along one particular path W may be seen as a traditional function of t, but it’s misleading to think of W as a function t. Across all paths, at time t_55, W is W_55 and includes all the 9999 realized values after step_55 and all the “unrealized” values.

In other words, W at time t_55 refers to the “distribution” of all these possible values. W at time t_55 is a cross section of the 9999+ paths. The symbol W(t) means the “Distribution of W’s likely values at a future time t seconds after last observation“. Since W isn’t a traditional function of t, dW/dt is a freak. As illustrated elsewhere on this blog, the canonical Wiener variable W is not differentiable.

Stoch Lesson 55 deltaW and dW

See Lesson 05 about stepsize_i, and h…
See Lesson 33 for a backgrounder on the canonical Wiener variable W

Note [[Hull]] uses “z” instead of w.

Now let’s explain the notation deltaW in the well-known formula

S_i+1 – S_i == deltaS = driftRate * deltaT + sigma * deltaW

Here, deltaW is basically stepsize_i, generated by the noisegen at the i’th step. That’s the discrete-time version. How about the dW in the continuous time SDE? Well, dW is the stepsize_i as deltaT -> 0. This dW is from a noisegen whose variance is exactly equal to deltaT. Note deltaT is the thing that we drive to 0.

In my humble opinion, the #1 key feature of a Wiener process is that the Gaussian noisegen’s variance is exactly equal to deltaT.

Another name for deltaT is h. Definition is h == T/n.

Note, as Lawler said, dW/dt is meaningless for a BM, because a BM is nowhere differentiable.

Stoch Lesson J88 when to add scaling factor sqrt(t)

See Lesson 05 for a backgrounder on h.
See Lesson 15 for a backgrounder on paths and realizations.

In the formulas, one fine point easy to missed out is whether to include or remove sqrt(t) in front of dW. As repeated many times, notation is extremely important here. Before addressing the question, we must spend a few paragraphs on notations.
It’s instructive to use examples at this juncture. Suppose we adopt (h=) 16-sec intervals, and generate 9999 realizations of the canonical Wiener process. The 9999 “realized” stepsize values form a histogram. It should be bell-shaped with mean 0 and variance 16.0, stdev 4.0. If we next adopt (h=) 0.09-sec intervals, and generate 8888 realizations of the same process, then the resulting 8888 stepsize values should show variance 0.09, stdev 0.3.
That’s the canonical Wiener variable. So dW is defined as the stepsize as h -> 0. So dW has a Gaussian distribution with variance -> 0. Therefore dW is not customized and has well-known standard properties, including the sqrt(t) feature.
The simplest, purest, canonical Wiener variable already shows the sqrt(t) feature. Therefore, we should never put sqrt() in front of dW.
In fact, sqrt(t) scaling factor is only used with epsilon (or Z), a random variable representing the standard normal noisegen, with a fixed variance = 1.0

Stoch Lesson 22 any thingy dependent on a TVRV is likely a TVRV

See Lesson 05 about the discrete-time S_i+1 concept.
See Lesson 15 about TVRV.

I feel in general any variable dependent on a random variable is also a random variable, such as the S in

S_i+1 – S_i = deltaS = a * deltaT + b * deltaW

The dependency is signified by the ordinary-looking “+” operator. To me this addition operator means “superimpose”. The deltaS or stepsize is a combination of deterministic shift superimposed on a non-deterministic noise. That makes S itself a time-varying random variable which can follow a trillion possible paths from last-observation to Expiry.

The addition doesn’t mean the stepsize_i+1 will be known once both components i.e. (a * deltaT) and (b * deltaW) are known. In fact, deltaW can take a trillion possible values, so the stepsize in S is not exactly predictable i.e. non-deterministic. This stepsize is random. Therefore S itself is a TVRV.

Stoch Lesson 19 N@T is !! a TVRV

See Lesson 05 about norm().
See Lesson 15 about TVRV.

If we say some measurable value x ~ norm(m,v), then this x shows a normal distribution with mean m and variance v. I feel it’s safe to say x is from a particular noisegen, which is /characterized/ by the pair m and v.

Now, this x is NOT always a TVRV. Instead, when we say something follows some distribution, we are looking at the crystal ball:

– This x could be the future value of a TVRV at a specific target date, or
– This x could be the _increment_ in a TVRV over a future interval.

In both cases above, x is a Noisegen Output @ a Future Time — N@T. It’s rather useful to pin down whether some item (in a big formula) is a N@T or a TVRV. Not always obvious. Need a bit of clear thinking.

Stoch lesson 05 – Level^Stepsize ..

A3: stepsize_i. The Level_i value may or may not be normally distributed if we plot the 9999 realizations, but that depends on some factors. For example, if the noisegen is identical and independent on every query then we can rely on Central Limit Theorem. For stock prices that’s not the case.
In this series of lessons, I will create a set of “local jargon” used in later blog posts. First, Imagine the “Level” [Note 1] of a time-varying random variable W is a random walker taking up or down steps at regular intervals. At step_i, the stepsize_i [Note 2] is generated from a (Gaussian or otherwise) noisegen such as a computer. Level_i is the sum of all previous steps, positive or negative, i.e.

    Level_i = stepsize_1 + stepsize_2 + ….stepsize_i

It’s important to differentiate Level_i vs stepsize_i. Q3: which one of them has a normal distribution? Answer is hidden somewhere.

Notation is important here. It’s extremely useful to develop ascii-friendly symbols, with optional font sizing. These notations will be used in subsequent “lessons”. Here are a few more notations and jargon —

Let’s divide the total timespan T — from last-observation to Expiry — into n equal intervals. Denote a particular step as Step “i”, so first step has i=1. Let’s denote interval length as h=T/n = t_i+1 – t_i

I will use norm(a,b) to denote a Gaussian noisegen with mean=a and variance=b, so stdev=sqrt(b).

[1] The word “value” is too vague compared to Level.
[2] a.k.a. increment_i but less precise.

Stoch Lesson 15 – paths and time-varying random variables

See Lesson 05 for a backgrounder on the n steps of random walk…

RVRV is my own jargon, related to a stoch process. Before I’m confident to use the process jargon, I’ll use my own jargon.

Every time-varying-random-variable has a “Level” [1] at a given time, and therefore the variable has paths. The concept of path and the concept of time-varying-random-variable are intertwined.

For the random walker to go though the n steps one round, we query the noisegen n times. That’s a single realization of the random Process. If the walk is on a conveyer belt, then we see a “path”. One realization maps to one path. 9999 realizations would show 9999 paths and produce a good histogram.

Not everything we see in the formulas is a TVRV. The “h” isn’t; deltaAnything isn’t; drift rate isn’t … W is, though deltaW (ΔW)  isn’t. S is, though deltaS (ΔS) isn’t.

A note on randomness assumed in a stoch process — the future is usually assumed uncertain, but I won’t conclude that anything and everything in the future is random. The maturity value of a 12M time deposit is known, since default risk is assumed zero.

[1] actually not a single Level but multiple possible Levels. At a given time on each possible path, there’s a single Level.

GBM random walk – again

Mostly this write-up will cover the discrete-time process. In continuous, it’s no longer a walk [1]. Binomial tree, Monte Carlo and matlab are discrete.

Let’s divide the total timespan T — from last-observed to Expiry — into n equal intervals. At each step, look at ln(S_new/S_old), denoted r. (Notation is important in this field. It’s extremely useful to develop ascii-friendly symbols…) It’s good to denote the current step as Step “i”, so first step has i=1 i.e. r_1=ln(S_1/S_0). Let’s denote interval length as h=T/n.

To keep things simple let’s ignore the up/down and talk about the step size only. Here’s the key point —

Each step size such as our r_i is ~norm(0, h). r_i is non-deterministic, as if controlled by a computer. If we generate 1000 “realizations” of this one-step stoch process, we get 1000 r_i values. We would see a bell-shaped histogram.

What’s the “h” in the norm()? Well, this bell has a stdev, whose value depends on h. Given this is a Wiener process, sigma = sqrt(h). In other words, at each step the change is an independent random sample from a normal bell “generator” whose stdev = sqrt(step interval)

[1] more like a victim of incessant disturbance/jolt/bombardment. The magnitude of each movement would be smaller if the observation interval shortens so the path is continuous (– an invariant result independent of which realization we pick). However, the same path isn’t smooth or differentiable. On the surface, if we take one particular “realization” with interval=1microsec, we see many knee joints, but still a section (a sub-interval) may appear smooth. However, that’s the end-to-end aggregate movement over that interval. Zooming into one such smooth-looking section of the path, now with a new interval=1nanosec, we are likely to see knees, virtually guaranteed given the Wiener definition. If not in every interval then in most intervals. If not in this realization then in other realizations. Note a knee joint is not always zigzag . If 2 consecutive intervals see identical increments then the path is smooth, otherwise the 2-interval section may look like a reversal or a broken stick.

Brownian random walk -> sqrt(t)

A longer title would be “from random walk model to a stdev proportional to sqrt(t)”

Ignore the lognormal;
Ignore the rate of return;
Ignore stock prices. Just imagine a Weiner process. I find it more intuitive to consider the discrete time random walk. Assuming no drift, at each step the size and direction of the step is from a computer that generates a random number from a normal distribution like MSExcel normsinv(rand()), I’d like to explain/derive the important observation that t units of time into the Future, the UNKNOWN value of x has a Probability distribution that’s normal with mean 0 and stdev √t.

Now, time is customarily measured in years, but here we change the unit of time to picosecond, and assume that for such a short period, the future value of x has a ProbDist “b * ϵ(0,1)”, whose variance is b*b. I think we can also use the notatinon n(0,b*b).

Next, for 2 consecutive periods into the Future, x takes 2 random steps, so the sum (x_0to1 + x_1to2) also has a normal distribution with variance 2b*b. For 3 steps, variance is 3b*b…. All because the steps are independent — Markov property.

Now if we measure t in picosecond, then t means t picosecond, so the Future value after t random steps has a normal distribution with variance t b*b. So stdev is b*√t

For example, 12 days into the future vs 3 days into the future, the PD of the unknown value would have 2 normal distributions. stdev_12 = 2 * stdev_3.

Weiner process, better understood in discrete time

[[Hull]] presents a generalized Wiener process

dx = a dt + b dz

I now feel this equation/expression/definition is easier understood in discrete time. Specifically, x is a random variable, so its

Future value is unknown so we want to predict it with a pdf (super-granular histogram). Since x changes over time, we must clarify

our goal — what's the probability distribution of x a a time t a short while later? I feel this question is best answered in

discrete time. So we throw out dt and dz. (As a Lazy guy U don't even need delta_t and delta_z).

Let's make some safe simplifying assumptions : a = 0; b = 1 and last observation is x = 0. These assumptions reduce x to a Weiner

variable (i.e. x follows a Weiner process). At at a (near) future t units[1] away, we predict x future value with a normal

distribution whose stdev=sqrt(t).

[1] time is measured in years by custom

Now, What if I want to estimate the rate of change (“slope” of the chart) i.e. dx/dt? I don't think we can, because this is stoch

calculus, not ordinary calculus. I am not sure if we can differentiate or integrate both sides.

histogram/pdf -incompatible- binomial tree

This is not really restricted to option pricing. Actually, forget about option pricing! I’m thinking about the relationship of a pdf/histogram and a standard diamond-filled binomial tree. mentions but doesn’t explain the10,000 simulations. One way to simulate is via the binomial tree.

To get 2001 nodes (each corresponding to a range described in that blog), we need 2000 steps to grow our tree. However, the standard CRR btree has the 2001 price points equally spaced logarithmically. A small CRR btree might grow to these price levels.


Therefore the 2001 ranges in will not map 1-to-1 to 2001 price nodes in a btree. Multiple high price Ranges will map to the highest tree Node, and many low Nodes will fall into the lowest Range of $0~$1.

drift + random walk — baby steps

Update – a discrete random walk assumes the step size (in log space) is normally distributed.

When we enhance a granular binomial tree to be even higher granularity, the interval between 2 tree levels (sampling interval) becomes infinitesimal and we can use the standard calculus notation of ” dt “. Black-Scholes differential equation becomes

\frac{dS}{S} = \mu \,dt+\sigma \,dW\,
Note t is a Timespan, not a Datetime. I call it TTL, time-to-live or time-to-maturity.
Note the LHS denominator is the spot price, not “t”. The expression (dS/S) measures an stock’s percentage return as _holding_period_ becomes infinitesimal. Basic calculus[1] gives us the integral of the LHS
    integral(1/S * dS) = ln(S)
Ignoring the dW part, integrating the right-hand-side gives some linear function of t [0]. Therefore under zero volatility, stock price is an exponential function of t [2]. Therefore Drift is exponential — continuous compounding.
[0] note the “variable-of-integration” is S on the left hand side but ” t ” on the right-hand-side. This was a bit confusing to me.
Integrating the dW part is harder. Actually, since the dW (unlike drift) is inherently random, I doubt we can simply get the integral and predict the S at any value of t. Instead, we hope to derive the pdf of S at any point t in the future. Let me repeat the implicit but fundamental assumption — the value of S at a given t is a random variable but has a pdf. This randomness comes from the dW, not the drift.
Once we have a pdf of S(t), expiration value of an European call is tractable. Since the terminal value is a hockey-stick payoff function, we multiply the pdf by a piecewise linear function, and find area under the curve. See other blog posts.
A note on the sigma in
\frac{dS}{S} = \mu \,dt+\sigma \,dW\,
BS assumes sigma to be constant. When sigma itself moves up and down following a random motion, we have a stochastic volatility model. A simplified non-constant sigma model is the local-volatility model, popular in investment banking.

fake random walker – compounded rate of return

I wrote this in May 2012, before I took my Stoch class on Geometric Brownian motion.

I feel stock price is a random walker but ln(PriceRelative) is not. Let’s denote this ln() value as P.

As a (one-dimension) random walker, a stock price S(at time t) can move from 1000 to 1001.2 to 1000.3 to 999 to 999.5… The steps are cumulative. The current value is the cumulative effect of all incremental steps.

Now look at P. It might take on +0.03, +0.5, -0.01 -0.6, -1, +2… Not cumulative no drift. In fact each value in the series is independent of the previous values. I feel P is a random variable but not Brownian.

If L is a Brownian walker (i.e. follows a Wiener process), then deltaL (incremental change in S) is similar to our P.

Q: So which RV is Normal and which is LogNormal in this model?
%%A: I believe S(after i steps) denoted S_i has a log normal distribution but ln(S_i) is normal. More specifically —

Suppose matlab generates 3000 realizations (of the random process). In each path, we pick the i’th step so we get 3000 realizations of S_i. A histogram of the 3000 is a punched bell. However, if we compute ln(S_i) for the 3000 realizations, the histogram is bell-shaped.

Q: So which RV is in a geometric Brownian motion?
%%A: S.

Intuitively, if a RV is in a strictly Brownian motion (not Geometric), then its value at any time has a Normal (not LogNormal) distribution.

Stoch volatility ^ local-volatility

SV doesn’t refer to the random walk of a stock price (or an forex rate). SV refers to the random walk of instant volatility value [2]. This instant volatility (IV) can take on a value of 10%pa now, and 11.5%pa an hour later [1]. If a stock price were to fluctuate constantly by the micro second, then we would be able to record these movements during each second and compute realized/historical IV values for each interval.

[1] Note all volatility values are annualized, just as we compare different rice brands by per-kg price.
[2] realized vol or implied vol? Irrelevant. In the BS theory, volatility is a concept related to Brownian motion. Both r-vol and i-vol are indications of that theoretical volatility. I feel in this /context/, there’s no differentiation of implied vs realized vol.

I feel many people agree that it’s a sound assumption to assume IV follows a random walk, but there are very different random walks. For example, the stock price itself also follows a random walk, but that random walk is carefully modeled by the drift + the Brownian motion. That’s one type of random walk. The IV random walk is different and I call it a special random walk (SRW), for want of a better word.

Basically, SV models assume
1) the stock price follows a random walk characterized by an IV variable, along with a drift
2) this variable doesn’t assume a constant value as BS suggested, but follows a SRW. This SRW is described by a state variable, which depends on current stock price and has a mean-reverting tendency.

I find the mean-reversion assumption quite convincing (yes I do). In reality, if we measure the realized IBM volatility over each trading day and write down those realized-vol values on a table top calendar, we will see it surges and drops but always stays within a range instead of growing steadily. The stock price may grow steadily (drift) but the realized vol doesn’t.

SABR and local vol were said to be 2 models describing stochastic volatility, but veterans told me LV isn’t stochastic at all. I believe LV doesn’t include a dB term in sigma_t i.e the Instantaneous volatility.

LV — when IV is described merely as a function of underlier price St and of time t, we have a local volatility model. The local volatility model is a useful and simple SV model, according to some.

Another veteran in Singapore told me that local vol (like SV) is designed to explain skew. During the diffusion, IV is assumed to be deterministic, and a function of 2 inputs only — spot price at that “instant” i.e. St and t. I guess what he means is, after 888888 discrete steps of diffusion, the underlier could be at any of 888888 levels (in a semi-continuous binomial tree). At each of those levels, the IV for the next step is a function of 2 inputs — that level of underlier price and the TTL.

diffusion start and variance

This is one the many nitty-gritty pitfalls.
Black Scholes assumes a lognormal distribution of stock price as of any given future date, including the expiration date  T —
log ST ~ N( mean = … , variance = σ2 (T – t) )
This says that the log of that yet-unrealized stock price has a normal distribution. Now, as the valuation time “t” moves from ½ T to 0.99 T (approaching expiry), why would variance shrink? I thought if the “target” date is more distant from today, then variance is wider.
Well, I would say t is the so-called diffusion-start date. The price history up until time-t is known and realized. There’s no uncertainty in St. Therefore, (T – t) represents the diffusion window remaining. The longer this window, the larger the spread of diffusing “particles”.
By the way, the “mean” above is      logS0 + [(mu – σ2/2)(T-t)], where mu and σ are parameters of the original GBM dynamics. 

underlying price is equally likely to +25% or -20%

See also P402 [[CFA textbook on stats]] says Black-Scholes “model is based on a normal distribution of underlying asset returns which is the same thing as saying that the underlying asset prices themselves are log-normally distributed.”. Actually, many non-BS models also assume the same, but my focus today is the 2nd part of the sentence.

At expiration, the asset has exactly one price as reported on WSJ. However, if we simulate 1000 experiments, we get 1000 (non-unique) expiration prices. If we plot them in a __histogram__, we get a kind of bell curve. But in Black-Schole’s (and other people’s) simulations, the curve will resemble a log-normal bell. Reason? …..

Well, they tweak their simulator according to their model. They assume underlying price is a random walker taking many small steps, whose probability of reaching 125% equals probability of dropping to 80% at each step. (But remember the walks are tiny steps, so 80% is huge;) Now the reason behind the paradoxical numbers —

  log(new_px/old_px) is normally distributed, so log(1.25)=0.97 and log (0.8)= – 0.97 are equally likely.

Now if we do 1000 experiments and compute the log(price_relative), we get another histogram – a normal (NOT log-normal) curve. Note Price-relative is the ratio of new_Price / old_Price over a holding period.

Here’s Another experiment to illustrate log-normal. Imagine a volatile stock (say SUN) price is now $64. How about after a year ? Black-Scholes basically says it’s

   equally likely to double or half.
Double to $128 or half to $32. log2(new_Price / old_Price) would be 1 or -1 with equal likelihood. Intuitively,

   log (new_Price / old_Price) is normally distributed.

Now consider prices after Year1, Year2, Year3… log2(S2/currentPx) = log2(S2/S1  *  S1/currentPx) = log2(S2/S1) + log2(S1/currentPx). In English this says base-2 log of overall price-relative is sum of the log of annual price-relatives. Among the 3 possible outcomes below, the $256 likelihood equals the $16 likelihood, and is 50% the $64 likelihood.
double-double -> $256
double-half -> $64 unchanged
half-double -> $64 unchanged
half-half -> $16

This stock can also appreciate/drop to other values beside $256,$64,$16, but IF the $256 likelihood is 1.71%, then so is the $16 likelihood, and the $64 likelihood would be 3.42%. We assume no other price “path” will end up at $64 — an unsound assumption but ok for now.

Since log(S2/S1) is normally distributed, so is the sum-of-log. Therefore log(S2/currentPx) is normally distributed.

     log(price-relative) is normal.
     log(cumulative price-relative) is normal for any number of intervals. For example,

Price_After_2years/current_Price is equally likely to double or half.
Price_After_2years/current_Price is equally likely to grow to 125% or drop to 80%.

More realistic numbers — when we shrink the interval to 1 day, the expected price relative looks more like

      “equally likely to hit 101.0101% or drop to 99%”