probability measure is an underlying assumption

Any stochastic differential equation has to be specified under a specific probability measure. When we change the measure, the SDE must change as well.

Any BM must be described under a specific probability measure, since the normal distribution assumes a probability measure…

Any expectation must be specified under a specific probability measure.


Countably Infinite collection

In probability, stochastic calculus etc. we often talk about a distribution with a countable but infinite number of possible outcomes.


countable but finite – most intuitive and simple

uncountable – if the outcome is a real number..

countable but infinite – the set of all integers, or all even numbers…

sample mean ^ cond ^ unconditional expectation

Greg Lawler’s notes point out that cond expectation (CE) is a random variable, and we frequently take UE of CE, or variance of a CE. The tower property (aka iterated expectations) covers the expectation of CE…

Simple example: 2 dice rolled. Guess the sum with one revealed.

The CE depends on the one revealed. The revealed value is a Random Variable, so it follows that the CE is a “dependent RV” or “derived RV“. In contrast, the UE (unconditional exp) is determined by the underlying _distribution_, the source of randomness modeled by a noisegen. This noisegen is unknown and uncharacterized, but has time-invariant, “deterministic” properties, i.e. each run is the same noisegen, unmodified. Example – the dice are all the same. Therefore the UE value is deterministic, with zero randomness. The variance of UE is 0.
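The two-dice example can be enumerated exactly. The sketch below is only an illustration, assuming two fair six-sided dice; the names cond_exp_sum, ce and ue are made up for this note. It shows the CE taking a different value for each revealed die, the UE being one fixed number, and the tower property linking the two.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)  # assumption: fair six-sided dice

def cond_exp_sum(revealed):
    """E[sum | first die = revealed]: average over the hidden second die."""
    return Fraction(sum(revealed + second for second in faces), 6)

# The CE is a function of the revealed value, hence itself a random variable.
ce = {d: cond_exp_sum(d) for d in faces}

# The UE is a single fixed number, determined by the distribution alone.
ue = Fraction(sum(a + b for a, b in product(faces, faces)), 36)

# Tower property: averaging the CE over the revealed die recovers the UE.
mean_of_ce = sum(ce.values()) / 6

# The CE has a genuine variance because it is random; the UE has none.
var_of_ce = sum((v - mean_of_ce) ** 2 for v in ce.values()) / 6

print(ce[1], ce[6])
print(ue, mean_of_ce, var_of_ce)
```

The variance of the CE here equals the variance of a single die, since the CE is just the revealed value plus a constant.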

Now we can take another look at … sample mean — a statistical rather than probabilistic concept. Since the sample is a random sample, the sample mean is a RV(!) just as the CE is.

Variance of sample mean > 0 i.e. if we take another sample the mean may change. This is just another way of saying the sample mean is a random variable.
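To make "the sample mean is a RV" concrete, here is a small sketch that lists every possible sample and its sample mean. It assumes the sample is n rolls of a fair die, with n = 2 picked arbitrarily; neither assumption comes from the notes above.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)  # assumption: fair six-sided die
n = 2                # hypothetical sample size

# Every equally likely sample of n rolls, mapped to its sample mean.
sample_means = [Fraction(sum(s), n) for s in product(faces, repeat=n)]

count = len(sample_means)
mean_of_sample_mean = sum(sample_means) / count
var_of_sample_mean = sum(
    (m - mean_of_sample_mean) ** 2 for m in sample_means
) / count

print(mean_of_sample_mean)  # matches the unconditional expectation of one roll
print(var_of_sample_mean)   # strictly positive: a different sample gives a different mean
```

The variance comes out as the single-roll variance divided by n, which is why larger samples give steadier sample means.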

My take on Ito’s, using d(X*Y) as example

Let J be the random process defined by Jt := Xt Yt. At any time, the product of X and Y is J’s value. (It’s often instructive to put aside the “process” and regard J as a derived random VARIABLE.) Ito’s formula says
    dJ := d(Xt Yt) = Xt dY + Yt dX + dX dY
Note this is actually a stoch integral equation. If there’s no dW term hidden in dX, then this reduces to an ordinary integral equation. Is this also a “differential equation”? No. There’s no differential here.

Note that X and Y are random processes with some diffusion i.e. dW elements.
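One way to see why the dX dY term belongs in the formula: for finite increments the product rule is plain algebra, and the cross term never drops out. The sketch below checks this on a simulated path. The dynamics of X and Y are hypothetical, chosen only so that both have a dW component; they are not taken from any source.

```python
import random

random.seed(0)
dt = 0.001
x, y = 1.0, 2.0  # hypothetical starting values

max_gap = 0.0
for _ in range(1000):
    # Hypothetical dynamics, for illustration only.
    dw1 = random.gauss(0.0, dt ** 0.5)
    dw2 = random.gauss(0.0, dt ** 0.5)
    dx = 0.1 * x * dt + 0.3 * x * dw1
    dy = -0.2 * y * dt + 0.5 * dw2

    # Increment of J = X*Y computed directly ...
    dj_direct = (x + dx) * (y + dy) - x * y
    # ... and via the product rule with the cross term.
    dj_formula = x * dy + y * dx + dx * dy

    max_gap = max(max_gap, abs(dj_direct - dj_formula))
    x, y = x + dx, y + dy

print(max_gap)  # only floating-point rounding remains
```

When both processes carry a dW term, dX dY is of order dt rather than smaller, which is why it survives in the limit.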

I used to see it as an equation relating multiple unknowns – dJ, dX, dY, X, Y. Wrong! Instead, it describes how the Next increment in the process J is Determined and precisely predicted.
Ito’s formula is a predictive formula, but it’s 100% reliable and accurate. Based on info revealed so far, this formula specifies exactly the mean and variance of the next increment dJ. Since dJ is Gaussian (conditional on the info revealed so far), the distribution of this rand var is fully described. We can work out the precise probability of dJ falling into any range.

Therefore, Ito’s formula is the most precise prediction of the next increment. No prediction can be more precise. By construction, all of Xt, Yt, Jt … are already revealed, and are potential inputs to the predictive formula. If X (and Y) is a well-defined stoch process, then dX (and dY) is predicted in terms of Xt, Yt, dB and dt, such as dX = Xt^2 dt + 3 Yt dB

The formula above actually means “Over the next interval dt, the increment in X has a deterministic component (= current revealed value of X squared times dt), and a BM component ~ N(0, variance = 9 Yt^2 dt)”
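The reading above, a deterministic part plus a normal part, can be written as a tiny predictor. This is a sketch only: the realized values x_t and y_t, the step dt and the function names are hypothetical, and the increment is treated as exactly normal over a small finite dt.

```python
import math
import random

def next_increment_params(x_t, y_t, dt):
    """Mean and variance of dX for dX = X_t^2 dt + 3 Y_t dB."""
    mean = x_t ** 2 * dt          # deterministic component
    variance = 9 * y_t ** 2 * dt  # variance of the BM component 3 Y_t dB
    return mean, variance

def sample_increment(x_t, y_t, dt, rng):
    """Draw one realization of the next increment."""
    mean, variance = next_increment_params(x_t, y_t, dt)
    return rng.gauss(mean, math.sqrt(variance))

rng = random.Random(42)
x_t, y_t, dt = 1.5, 0.8, 0.01  # hypothetical realized values and step size
mean, variance = next_increment_params(x_t, y_t, dt)

draws = [sample_increment(x_t, y_t, dt, rng) for _ in range(100_000)]
sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / len(draws)

print(mean, variance)           # predicted from realized values only
print(sample_mean, sample_var)  # empirical values from the draws
```

Everything the predictor needs is already revealed at time t; only the draw itself is random.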

Given 1) the dynamics of stoch process(es), 2) how a new process is composed therefrom, Ito’s formula lets us work out the deterministic + random components of __next_increment__.

We have a similarly precise prediction of dY, the next increment in Y. As such, we already know
Xt, Yt — the Realized values
dt – the interval’s length
dX, dY – predicted increments
Therefore dJ can be predicted.
For me, the #1 take-away is in the dX formula, which predicts the next increment using Realized values.

scale up a random variable.. what about density@@

The pdf curve can be very intuitive and useful in understanding this concept.

1st example — given U, the standard uniform RV between 0 and 1, the PDF is a square box with area under curve = 1. Now what about the derived random variable U’ := 2U? Its PDF must have area under the curve = 1 but over the wider range of [0,2]. Therefore, the curve height must scale DOWN.

2nd example — given Z, the standard normal bell curve, what about the bell curve of 2.2Z? It’s a scaled-down and widened bell curve.

In conclusion, when we scale up a random variable by 2.2 to get a “derived” random variable, the density curve must scale Down by 2.2 in height while stretching by 2.2 in width (so not a simple multiply). How about the expectation? Must scale Up by 2.2.
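Both examples follow the same rule: for a > 0, the density of aX at x is the density of X at x/a, divided by a. Here is a minimal sketch of that rule; scaled_pdf and area are hypothetical helper names, and the area is only a midpoint-rule approximation.

```python
import math

def uniform_pdf(x):
    """Density of the standard uniform U on [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def std_normal_pdf(x):
    """Density of the standard normal Z."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def scaled_pdf(pdf, a):
    """Density of a*X for a > 0: stretch the argument by a, shrink the height by a."""
    return lambda x: pdf(x / a) / a

def area(pdf, lo, hi, steps=200_000):
    """Midpoint-rule approximation of the area under pdf on [lo, hi]."""
    width = (hi - lo) / steps
    return sum(pdf(lo + (i + 0.5) * width) for i in range(steps)) * width

pdf_2u = scaled_pdf(uniform_pdf, 2.0)
pdf_22z = scaled_pdf(std_normal_pdf, 2.2)

print(pdf_2u(1.0))                               # box height halves
print(pdf_22z(0.0), std_normal_pdf(0.0) / 2.2)   # peak shrinks by 2.2
print(area(pdf_2u, 0.0, 2.0), area(pdf_22z, -30.0, 30.0))  # both areas stay near 1
```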

BM – B(3)^B(5) independent@@

Jargon: B3 or B(3) means the random position at time 3. I think this is a N@T.

Q: Are the 2 random variables B3 and B5 independent?
A: no. Intuitively, when B3 is very high, like 3 sigma above the mean (perhaps a value of 892), then B5 is likely to Remain high, because for the next 2 seconds, the walker follows a centered, symmetric random walk, centered at the realized value of 892.

We know the increment from time 3 to 5 is ind of all previous values. Let d be that increment.

B5 = B3 + d, the sum of two independent normal RVs. It’s another normal RV, but it depends on both B3 and d!
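The dependence can be quantified. Since d is independent of B3, Cov(B3, B5) = Var(B3) = 3, so the correlation is sqrt(3/5). The sketch below checks this by simulation, assuming a standard BM started at 0; the seed and sample size are arbitrary.

```python
import math
import random

rng = random.Random(7)
n = 100_000

# B3 ~ N(0, 3); the increment d over [3, 5] ~ N(0, 2), independent of B3.
b3 = [rng.gauss(0.0, math.sqrt(3.0)) for _ in range(n)]
d = [rng.gauss(0.0, math.sqrt(2.0)) for _ in range(n)]
b5 = [a + b for a, b in zip(b3, d)]

def cov(u, v):
    """Sample covariance (dividing by n) of two equal-length lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

cov_35 = cov(b3, b5)
corr_35 = cov_35 / math.sqrt(cov(b3, b3) * cov(b5, b5))

print(cov_35, corr_35, math.sqrt(3.0 / 5.0))
```

A nonzero covariance rules out independence, in line with the intuition above.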

BM hitting 3 before hitting -5

A common Brownian Motion quiz ([[Zhou Xinfeng]]): Given a simple BM, what's the probability that it hits 3 before it hits -5?

This is actually identical to the BM with upper and lower boundaries. The BM walker stops when it hits either boundary. We know it eventually stops. At that stopping time, the walker is either at 3 or -5 but which is more likely?

Ultimately, we rely on the optional stopping theorem – At the stopping time, the martingale's value is a random variable and its expectation is equal to the initial value.
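Applying the theorem: let p be the probability of stopping at 3. The expected stopped value must equal the starting value 0, so 3p + (-5)(1 - p) = 0, giving p = 5/8. The sketch below checks this by simulation. As a stand-in for the BM it uses a fair +/-1 random walk, which is also a martingale and hits the integer barriers exactly; that substitution, the seed and the trial count are assumptions of this sketch.

```python
import random

def hits_upper_first(upper, lower, rng):
    """Run a fair +/-1 walk from 0 until it reaches upper or lower."""
    position = 0
    while lower < position < upper:
        position += 1 if rng.random() < 0.5 else -1
    return position == upper

rng = random.Random(2024)
trials = 20_000
wins = sum(hits_upper_first(3, -5, rng) for _ in range(trials))
estimate = wins / trials

# Optional stopping: 3p - 5(1 - p) = 0  =>  p = 5 / (3 + 5)
p_ost = 5 / (3 + 5)

print(estimate, p_ost)
```

The nearer barrier is the more likely exit, in proportion to the distance to the other barrier.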

optional stopping theorem, my take

label – stoch

Background — There's no way to beat a fair game. Your winning always has an expected value of 0, because winning is a martingale, i.e. expected future value for a future time is equal to the last revealed value.

Now, what if there's a stopping time i.e, a strategy to win and end the game? Is the winning at that time still a martingale? If it's not, then we found a way to beat a fair game.

For a Simple Random Walk (coin flip) with upper/lower bounds, answer is intuitively yes, it's a martingale.

For a simple random walk with only an upper stopping bound (say $1), answer is — At the stopping time, the winning is the target level of $1, so the expected winning is also $1, which is Not the starting value of $0, so not a martingale! Not limited to the martingale betting strategy. So have we found a way to beat the martingale? Well, no.

“There's no way to beat a martingale in __Finite__ time”

You can beat the martingale but it may take forever. Even worse (a stronger statement), the expected time to beat the martingale and walk away with $1 is infinity.
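The "infinite expected time" claim can be probed numerically. The sketch below assumes a fair +/-1 walk from 0 that stops at +1, and computes exactly (up to floating point) the chance of still playing after n steps, plus the expected playing time when the game is forcibly cut off at n steps. The function name and the cut-offs 100 and 400 are made up for illustration.

```python
import math

def survival_and_truncated_mean(n_steps):
    """For a fair +/-1 walk from 0 stopped at +1 (stopping time T), return
    (P(T > n_steps), E[min(T, n_steps)])."""
    alive = {0: 1.0}  # position -> probability, for walks that have not hit +1 yet
    truncated_mean = 0.0
    for _ in range(n_steps):
        # E[min(T, n)] is the sum of P(T > k) for k = 0 .. n-1.
        truncated_mean += sum(alive.values())
        nxt = {}
        for pos, prob in alive.items():
            for step in (1, -1):
                new_pos = pos + step
                if new_pos == 1:
                    continue  # this mass reaches the target and stops
                nxt[new_pos] = nxt.get(new_pos, 0.0) + 0.5 * prob
        alive = nxt
    return sum(alive.values()), truncated_mean

surv_100, mean_100 = survival_and_truncated_mean(100)
surv_400, mean_400 = survival_and_truncated_mean(400)

# Cross-check via the reflection principle: for even n, P(T > n) = C(n, n/2) / 2^n.
exact_400 = math.comb(400, 200) / 2 ** 400

print(surv_100, surv_400, exact_400)
print(mean_100, mean_400)
```

The survival probability shrinks toward 0, so the walker does reach $1 eventually, but the truncated mean keeps growing as the cut-off is raised (roughly like the square root of the cut-off) instead of levelling off.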

The OST has various conditions and assumptions. The Martingale Betting Strategy violates all of them.

square integrable martingale has a more detailed definition than Lawler's.

If a discrete martingale M(n) is a SIM, then

E[ M(99)^2 ] is finite, and so is E[ M(99999)^2 ].

Each (unconditional) expectation is, by definition, a fixed number and not random.

Consider another number “lim_(n-> inf) E[ M(n)^2 ]”. For a given martingale, this “magic attribute” is a fixed number and not random. A given square-integrable martingale may have a magic attribute greater than any number there is, i.e. it goes to infinity. But this magic attribute isn't relevant to us when we talk about square-integrable martingales. We don't care about the limit. We only care about “any number n”.
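The fair coin-flip walk illustrates the point: E[ M(n)^2 ] is finite for every n, yet it grows without bound. The sketch below computes it exactly from the binomial distribution, assuming M is a fair +/-1 walk started at 0; second_moment is a hypothetical helper name.

```python
from fractions import Fraction
from math import comb

def second_moment(n):
    """Exact E[M(n)^2] for a fair +/-1 random walk M started at 0."""
    # With k up-moves out of n, M(n) = 2k - n, with probability C(n, k) / 2^n.
    return sum(
        Fraction(comb(n, k), 2 ** n) * (2 * k - n) ** 2 for k in range(n + 1)
    )

for n in (1, 10, 99):
    print(n, second_moment(n))
```

Each value is a fixed, finite number equal to n, so the walk is square integrable under the "any number n" definition even though the limit is infinite.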

It's relevant to contrast that with quadratic variation. This is a limit quantity and, for Brownian motion and other processes with deterministic volatility, not random.

For a given process, Quadratic variation is a fixed value for a fixed timespan. For processA, Quadratic variation at time=28 could be 0.56; at time=30 it could be 0.6.

In this case, we divide the timespan into many, many (in the limit, infinitely many) small intervals. No such fine division arises in the discussion of square-integrable martingales.
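The fine-division idea can be sketched numerically. The snippet assumes the process is a standard Brownian motion, whose quadratic variation over [0, t] is t; the timespan 2.0, the seed and the interval counts are arbitrary choices.

```python
import math
import random

def quadratic_variation(t, n_intervals, rng):
    """Sum of squared BM increments over [0, t] split into n_intervals pieces."""
    dt = t / n_intervals
    return sum(rng.gauss(0.0, math.sqrt(dt)) ** 2 for _ in range(n_intervals))

rng = random.Random(11)
qv_coarse = quadratic_variation(2.0, 100, rng)     # still visibly noisy
qv_fine = quadratic_variation(2.0, 100_000, rng)   # settles close to t = 2

print(qv_coarse, qv_fine)
```

With a coarse division the sum still fluctuates from run to run; the randomness washes out only as the division gets finer.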