People ask me to give a short explanation of Black-Scholes Model (not BS-equ or BS-formula)…

I feel random variable problems always boil down to the (inherent) distribution, ideally in the form of a probability density function.

Back to basics. Look at the height of all the kids in a pre-school — There’s a distribution. Simplest way to describe this kind of distribution is a histogram [.8 -1m], [1-1.2m], [1.2-1.4m] … A probability distribution is a precise description of how the individual heights are “distributed” in a population.

Now consider another distribution — Toss 10 fair dice at once and add up the points to a “score”. Keep tossing to get a bunch of scores and examine the distribution of scores. If we know the inherent, natural distribution of the scores, we have the best possible predictor of all future outcomes. If we get one score per day, We can then estimate how soon we are likely to hit a score above 25. We can also estimate by the 30th toss, how “surely” cumulative-score would have exceeded 44.

For most random variables in real life, the inherent distribution is not a simple math function like our little examples. Instead, practioners work out a way to *characterize* the distribution. This is the standard route to solve random variable problems because characterizing the underlying distribution (of the random variable) unlocks a whole lot of insights.

Above are random variables in a static context. Stock price is an evolving variable. There’s a Process. In the following paragraphs, I have mixed the random process and the random variable at the end of the process. The process has a σ and the variable (actually its value at a future time) also has a σ.

———–

In option pricing, the original underlying Random Process Variable (RPV) is the stock price. Not easy to characterize. Instead, the pioneers picked an alternative RPV i.e. R defined as ln(Sn+1 / Sn) and managed to characterize R’s behavior. Specifically, they characterized R’s random walk using a differential equation parametrized by a σinst i.e. the instantaneous volatility [1]. This is the key parameter of the random walk or the Geometric Brownian motion.

Binomial-tree is a popular implementation of BS. B-tree models a stock price [2] as a random walker taking up/down steps every interval (say every second). To characterize the step size __Sn+1 – Sn__, we wanted to get the distribution of step sizes but too hard. As an alternative, we assume R follows a standard Wiener process so the value of R at any future time is normally distributed. But what is this distribution about?

Remember R is an observable random variable recorded at a fixed sampling frequency. Let’s denote R values at each sampling point (i.e. each step of the random walk) as R1, R2 ,R3, R4 …. We treat each of them as independent random variables. ~~If we record a large series of R values, we see a distribution, ~~but this is the wrong route. We don’t want to treat time series values R1, R2 … as observations of the ~~same ~~random variable. Instead, imagine a computer picking an R value at each step of the random walk (like once a second). The distribution of each random pick is programmed into computer. Each pick has a distinct Normal distribution with a distinct σinst_1, σinst_2, σinst_3 …. [4]

In summary, we must analyze the underlying distribution (of S or R) to predict where S might be in the future[3].

[4] A major simplifying assumption of BS is a time-invariant σinst which characterizes the distributions of R at each step of the random walk. Evidence suggests the diffusion parameter σinst does vary and primarily depends on time and current stock price. The characterization of σinst as a function of time and S is a cottage industry in its own right and is the subject of skew modelling, vol surface, term structure of vol etc.

[1] All other parameters of the equation pale in significance — risk-free interest i.e. the drift etc.

[2] While S is a random walker, R is not really a random walker. See other posts.

[3] Like in the dice case, we can’t predict the value of S but we can predict the “distribution” of S after N sampling periods.