probability measure is an underlying assumption

Any stochastic differential equation has to be specified under a specific probability measure. When we change measure, the SDE must change too.

Any BM must be described under a specific probability measure, since the normal distribution assumes a probability measure…

Any expectation must be specified under a specific probability measure.

Countably Infinite collection

In probability, stoch etc we often talk about a distribution with a countable but infinite number of possible outcomes.


countable but finite – most intuitive and simple

uncountable – if the outcome is a real number..

countable but infinite – the set of all integers, or all even numbers…

sample mean ^ cond ^ unconditional expectation

Greg Lawler’s notes point out that cond expectation (CE) is a random variable, and we frequently take UE of CE, or variance of a CE. The tower property (aka iterated expectations) covers the expectation of CE…

Simple example: 2 dice rolled. Guess the sum with one revealed.

The CE depends on the one revealed. The revealed value is a Random Variable, so it follows that the CE is a “dependent RV” or “derived RV“. In contrast, the UE (unconditional exp) is determined by the underlying _distribution_, the source of randomness modeled by a noisegen. This noisegen is unknown and uncharacterized, but has time-invariant, “deterministic” properties, i.e. each run is the same noisegen, unmodified. Example – the dice are all the same. Therefore the UE value is deterministic, with zero randomness. The variance of UE is 0.

Now we can take another look at … sample mean — a statistical rather than probabilistic concept. Since the sample is a random sample, the sample mean is a RV(!) just as the CE is.

Variance of sample mean > 0 i.e. if we take another sample the mean may change. This is just another way of saying the sample mean is a random variable.
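The two-dice example can be sketched in a few lines, as an illustration rather than anything from Lawler's notes. The names `cond_exp` and `sample_mean`, the sample size of 100 and the seed are all assumptions made for this sketch. It shows the UE as a fixed number, the CE as a value that changes with the revealed die, and the sample mean as a quantity that changes from sample to sample.

```python
import random
from fractions import Fraction
from statistics import mean, pvariance

FACES = range(1, 7)

# Unconditional expectation of the sum of two fair dice: a fixed number.
ue = Fraction(sum(a + b for a in FACES for b in FACES), 36)

def cond_exp(revealed):
    """E[sum | first die = revealed]: a function of the revealed value."""
    return Fraction(sum(revealed + b for b in FACES), 6)

# The CE takes a different value for each revealed face, so it is a random variable.
ce_values = [cond_exp(a) for a in FACES]

# Tower property: averaging the CE over the revealed die recovers the UE.
ue_of_ce = sum(ce_values) / 6
var_of_ce = sum((v - ue) ** 2 for v in ce_values) / 6

# Sample mean: each fresh sample of n rolls gives a different mean.
rng = random.Random(0)

def sample_mean(n=100):
    return mean(rng.randint(1, 6) + rng.randint(1, 6) for _ in range(n))

sample_means = [sample_mean() for _ in range(200)]
print(ue, ue_of_ce, var_of_ce, pvariance(sample_means))
```

The variance of the CE and the variance across sample means are both positive, whereas the UE has no spread at all.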

My take on Ito’s, using d(X*Y) as example

Let J be the random process defined by Jt := Xt Yt. At any time, the product of X and Y is J’s value. (It’s often instructive to put aside the “process” and regard J as a derived random VARIABLE.) Ito’s formula says
    dJ := d(Xt Yt) = Xt dY + Yt dX + dX dY
Note this is actually a stoch integral equation. If there’s no dW term hidden in dX, then this reduces to an ordinary integral equation. Is this also a “differential equation”? No. There’s no differential here.

Note that X and Y are random processes with some diffusion i.e. dW elements.

I used to see it as an equation relating multiple unknowns – dJ, dX, dY, X, Y. Wrong! Instead, it describes how the Next increment in the process J is Determined and precisely predicted.
Ito’s formula is a predictive formula, but it’s 100% reliable and accurate. Based on info revealed so far, this formula specifies exactly the mean and variance of the next increment dJ. Since dJ is Gaussian, the distribution of this rand var is fully described. We can work out the precise probability of dJ falling into any range.

Therefore, Ito’s formula is the most precise prediction of the next increment. No prediction can be more precise. By construction, all of Xt, Yt, Jt … are already revealed, and are potential inputs to the predictive formula. If X (and Y) is a well-defined stoch process, then dX (and dY) is predicted in terms of Xt, Yt, dB and dt, such as dX = Xt^2 dt + 3 Yt dB

The formula above actually means “Over the next interval dt, the increment in X has a deterministic component (= current revealed value of X squared times dt), and a BM component ~ N(0, variance = 9 Yt^2 dt)”

Given 1) the dynamics of stoch process(es), 2) how a new process is composed therefrom, Ito’s formula lets us work out the deterministic + random components of __next_increment__.

We have a similarly precise prediction of dY, the next increment in Y. As such, we already know
Xt, Yt — the Realized values
dt – the interval’s length
dX, dY – predicted increments
Therefore dJ can be predicted.
For me, the #1 take-away is in the dX formula, which predicts the next increment using Realized values.
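Here is a rough numerical sketch of the "prediction" reading above. It assumes revealed values Xt = 1.5 and Yt = 0.8 and a step dt = 0.01, all chosen arbitrarily. The dX dynamics are the ones quoted in the text; the dY dynamics are hypothetical, made up only so that the product rule has something to act on. Many draws of the next increment are taken from the same revealed state.

```python
import math
import random
from statistics import mean, pvariance

rng = random.Random(1)
x_t, y_t, dt = 1.5, 0.8, 0.01   # revealed values and step length (illustrative)

def next_increments():
    """One draw of (dX, dY) over the next interval dt, driven by the same dB."""
    dB = rng.gauss(0.0, math.sqrt(dt))
    dX = x_t ** 2 * dt + 3 * y_t * dB      # dynamics quoted in the text
    dY = 0.5 * y_t * dt + 2.0 * dB         # hypothetical dynamics, for illustration only
    return dX, dY

draws = [next_increments() for _ in range(20000)]
dX_mean = mean(dX for dX, _ in draws)
dX_var = pvariance([dX for dX, _ in draws])

# Product rule over one discrete step: exact algebra, no approximation.
dX, dY = draws[0]
dJ_formula = x_t * dY + y_t * dX + dX * dY
dJ_direct = (x_t + dX) * (y_t + dY) - x_t * y_t
```

The sample mean and variance of dX should sit close to Xt^2 dt and 9 Yt^2 dt, and the two ways of computing dJ agree up to floating-point rounding.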

scale up a random variable.. what about density@@

The pdf curve can be very intuitive and useful in understanding this concept.

1st example — given U, the standard uniform RV between 0 and 1, the PDF is a square box with area under curve = 1. Now what about the derived random variable U’ := 2U? Its PDF must have area under the curve = 1 but over the wider range of [0,2]. Therefore, the curve height must scale DOWN.

2nd example – given Z, the standard normal bell curve, what about the bell curve of 2.2Z? It’s a scaled-down and widened bell curve.

In conclusion, when we scale up a random variable by 2.2 to get a “derived” random variable, the density curve must scale Down by 2.2 (but not a simple multiply). How about the expectation? Must scale Up by 2.2.
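The "not a simple multiply" remark can be made concrete. For a > 0, the density of aX at x is the density of X at x/a, divided by a: the curve is stretched sideways and squashed in height together. A small sketch, with helper names (`scaled_pdf`, `area`) invented for illustration:

```python
import math

def scaled_pdf(pdf, a):
    """Density of a*X for a > 0, given the density of X."""
    return lambda x: pdf(x / a) / a

def uniform_pdf(x):
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def area(pdf, lo, hi, n=100_000):
    """Midpoint-rule area under the curve."""
    h = (hi - lo) / n
    return sum(pdf(lo + (k + 0.5) * h) for k in range(n)) * h

pdf_2u = scaled_pdf(uniform_pdf, 2.0)        # density of 2U
pdf_22z = scaled_pdf(std_normal_pdf, 2.2)    # density of 2.2Z
```

Both scaled curves still enclose an area of 1, and the peak of the 2.2Z curve is the standard normal peak divided by 2.2.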

BM – B(3)^B(5) independent@@

Jargon: B3 or B(3) means the random position at time 3. I think this is a N@T.

Q: Are the 2 random variables B3 and B5 independent?
A: no. Intuitively, when B3 is very high, like 3 sigma above the mean (perhaps a value of 892), then B5 is likely to Remain high, because for the next 2 seconds, the walker follows a centered, symmetric random walk, centered at the realized value of 892.

We know the increment from time 3 to 5 is ind of all previous values. Let d be that increment.

B5 = B3 + d, the sum of two ind Normal RV. It’s another normal RV, but dependent on the two!
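A quick simulation sketch of this dependence, assuming a standard BM so that B3 ~ N(0, 3) and the increment d ~ N(0, 2). The sample size and seed are arbitrary. Theory says Cov(B3, B5) = Var(B3) = 3, so the correlation is sqrt(3/5), while B3 and d are uncorrelated.

```python
import math
import random

rng = random.Random(7)
n = 50_000

def cov(xs, ys):
    """Sample covariance (divide-by-n version)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

b3 = [rng.gauss(0.0, math.sqrt(3)) for _ in range(n)]   # position at time 3
d = [rng.gauss(0.0, math.sqrt(2)) for _ in range(n)]    # independent increment from 3 to 5
b5 = [x + inc for x, inc in zip(b3, d)]                 # position at time 5

cov_b3_d = cov(b3, d)
cov_b3_b5 = cov(b3, b5)
corr_b3_b5 = cov_b3_b5 / math.sqrt(cov(b3, b3) * cov(b5, b5))
```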

BM hitting 3 before hitting -5

A common Brownian Motion quiz ([[Zhou Xinfeng]]): Given a simple BM, what's the probability that it hits 3 before it hits -5?

This is actually identical to the BM with upper and lower boundaries. The BM walker stops when it hits either boundary. We know it eventually stops. At that stopping time, the walker is either at 3 or -5 but which is more likely?

Ultimately, we rely on the optional stopping theorem – At the stopping time, the martingale's value is a random variable and its expectation is equal to the initial value.
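Spelling out the last step: if p is the probability of stopping at 3, the expected stopped value is 3p + (-5)(1 - p), and optional stopping sets this equal to the starting value 0, so p = 5/8. As a sanity check, the sketch below uses a simple coin-flip random walk on the integers as a stand-in for the BM. That substitution is an assumption of this sketch; the same martingale argument gives 5/8 for the walk as well.

```python
import random

def hits_upper_first(rng, upper=3, lower=-5):
    """Run one simple random walk from 0 until it reaches a boundary."""
    pos = 0
    while lower < pos < upper:
        pos += rng.choice((-1, 1))
    return pos == upper

rng = random.Random(42)
trials = 20_000
estimate = sum(hits_upper_first(rng) for _ in range(trials)) / trials

# Optional stopping: 3*p + (-5)*(1 - p) = 0, so p = 5/8.
p_theory = 5 / 8
```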

optional stopping theorem, my take

label – stoch

Background — There's no way to beat a fair game. Your winning always has an expected value of 0, because winning is a martingale, i.e. expected future value for a future time is equal to the last revealed value.

Now, what if there's a stopping time i.e, a strategy to win and end the game? Is the winning at that time still a martingale? If it's not, then we found a way to beat a fair game.

For a Simple Random Walk (coin flip) with upper/lower bounds, answer is intuitively yes, it's a martingale.

For a simple random walk with only an upper stopping bound (say $1), answer is — At the stopping time, the winning is the target level of $1, so the expected winning is also $1, which is Not the starting value of $0, so not a martingale! Not limited to the martingale betting strategy. So have we found a way to beat the martingale? Well, no.

“There's no way to beat a martingale in __Finite__ time”

You can beat the martingale but it may take forever. Even worse (a stronger statement), the expected time to beat the martingale and walk away with $1 is infinity.

The OST has various conditions and assumptions. The Martingale Betting Strategy violates all of them.

square integrable martingale has a more detailed definition than Lawler's.

If a discrete martingale M(n) is a SIM, then

E[ M(99)^2 ] is finite, and so is E[ M(99999)^2 ].

Each (unconditional) expectation is, by definition, a fixed number and not random.

Consider another number “lim_(n-> inf) E[ M(n)^2 ]”. For a given martingale, this “magic attribute” is a fixed number and not random. A given square-integrable martingale may have a magic attribute greater than any number there is, i.e. it goes to infinity. But this magic attribute isn't relevant to us when we talk about square-integrable martingales. We don't care about the limit. We only care about “any number n”.

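It's relevant to contrast that with quadratic variation. This is a limit quantity, and for standard Brownian motion it is not random (over a timespan of length t it equals t); for a general process it can depend on the path.

For a process whose quadratic variation is deterministic, it is a fixed value for a fixed timespan. For processA, Quadratic variation at time=28 could be 0.56; at time=30 it could be 0.6.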

In this case, we divide the timespan into many, many (infinite) small intervals. No such fine-division in the discussion on square-integrable-martingales

[13]integer immutability,refcount #python (deep)lesson#1

See gc.get_referrers() and sys.getrefcount() …
Do you know that list objects are mutable while integer objects are immutable? In Java, C# and C, by contrast, an int variable can be overwritten in place.
—-Now look at integer objects, which are Immutable
i=2 # rebinding
i=1 # rebinding back to original object!

j=i # ref count = 2 on the shared object


—-Now look at lists. Do lists hold items by value or reference? More specifically,
– Add() by pbref? yes according to many posts in this blog and also [[python essential ref]]
– Read() by pbref? Yes. pbclone would make myList[0].edit() meaningless.
– qq(mylist[2] = ….) is by reference? Probably yes, otherwise counter-intuitive.

How do you verify? How about id(mylist[0]), as in the lines below.

i = 1.9 # assumed starting value, matching the “1.9” object mentioned below
arr = []
id( i )
arr.append(i) # pbref. Ref count = 2, since i and arr[0] share one object
id( arr[0] ) # same id as id( i )
i=2 # Rebinding! not “content edit”. i now points to a new object, so ref count = 1 on the original object! Counter-intuitive to me.
arr[0] = 2.1 # Rebinding, not “content edit”. arr[0] now points to a new object, so ref count = 0 on the original object “1.9”
id( arr[0] ) # a different id now

Q: is it possible to change the value of arr[0]?
A: arr[0] currently points to an int object. We can rebind this pointer to a “9.89” object. However, if you want to change pointee object content at arr[0], then it depends on the pointee object type. For integer objects, content is immutable, so answer is NOWAY.
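A small self-contained sketch of the rebinding-versus-editing distinction, using only identity checks. The variable names are made up for illustration, and the float is built at run time so that literal caching does not muddy the picture.

```python
i = float("1.9")      # built at run time, so no literal caching is involved
arr = []
arr.append(i)         # the list stores a reference to the same object
same_before = arr[0] is i

i = 2                 # rebinds the name i; the float object is untouched
still_float = arr[0]  # the list slot still refers to the 1.9 object

arr[0] = 2.1          # rebinds the slot; again no object content is edited

inner = [1]           # a mutable pointee, for contrast
holder = [inner]
holder[0].append(2)   # edits content in place, visible through both references
```

With an immutable pointee the only option is to rebind the slot; with a mutable pointee such as a list, the content itself can be edited through arr[0].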

process based@BM +! a stoch variance#Ronnie

One of the  Stochastic problems (HW3Q5.2) is revealing (Midterm2015Q6.4 also). We are given
  dX = m(X,t) dt + s(X,t) dBt
where m() and s() can be very complicated  functions. Now look at this unusual process definition, without Xt : 
Applying Ito’s, we notice this function, denoted f(), is a function of t, not a function of Xt, so df/dx = 0. We get
  dY = Yt Xt^3 dt

So there’s no dB term: the process Y has a drift only but no variance. However, the drift rate depends on X, which does have a dB component! How do you square the circle? Here are the keys:
Note we are talking about the variance of the Increment over a time interval delta_t
Key — there’s a filtration up to time t. At time t, the value of X and Y are already revealed and not random any more.
Key — variance of the increment is always proportional to delta_t, and the linear factor is the quasi-constant “variance parameter”. Just like instantaneous volatility, this variance parameter is assumed to be slow-changing. 
(Ditto for the drift rate..)
In this case, the variance parameter is 0. The increment over the next interval has only a drift element, without a random element.

Therefore, the revealed, realized values of X and Y determine the drift rate over the Next interval of delta_t

Riemann ^ stoch integral, learning notes

In a Riemann integral, each strip has an area-under-the-curve being either positive or negative, depending on the integrand’s sign in the strip. If the strip is “under water” then area is negative.

In stochastic integral [1], each piece is “increment   *   integrand”, where both increment and integrand values can be positive/negative. In contrast, the Riemann increment is always positive.

With Riemann, if we know integrand is entirely positive over the integration range, then the sum must be positive. This basic rule doesn’t apply to stochastic integral. In fact, we can’t draw a progression of adjacent strips as illustration of stochastic integration.

Even if the integrand is always positive, the stoch integral often has an expected value of 0. For an (important) example, in a fair game or a drift-less random walk, the dB part is 50-50 positive/negative.

[1] think of the “Simple Process” defined on P82 by Greg Lawler.

On P80, Greg pointed out
* if integrand is random but the dx is “ordinary” then this is an ordinary integral
* if the dx is a coin flip, then whether integrand is random or not, this is a stoch integral

So the defining feature of a stoch integral is a random increment
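A discrete sketch of the contrast, in the spirit of a simple process. The integrand 1 + k/steps, the step count and the seed are arbitrary choices for illustration. The same strictly positive integrand is summed once against positive increments dt and once against coin-flip increments.

```python
import random

rng = random.Random(3)
steps = 100

def integrand(k):
    return 1.0 + k / steps   # strictly positive

# Riemann-style sum: every increment dt is positive, so the sum is positive.
dt = 1.0 / steps
riemann_sum = sum(integrand(k) * dt for k in range(steps))

def stoch_sum():
    """Sum of integrand * coin-flip increment along one path."""
    return sum(integrand(k) * rng.choice((-1.0, 1.0)) for k in range(steps))

paths = [stoch_sum() for _ in range(5000)]
avg = sum(paths) / len(paths)
```

Individual paths land on both sides of zero and their average is close to zero, even though the integrand never changes sign.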

maturity bucketing for VaR

[[complete guide]] P 457 pointed out VaR systems often need to aggregate cashflow amounts across different deals/positions, based on the “due date” or “maturity date”.

Example — On 12/31 if there are 33 payable amounts and 88 receivable amounts, then they get aggregated into the same bucket.

I think bucketing is more important in these cases:

  • a bond has maturity date and coupon dates
  • a swap has multiple reset dates
  • most fixed income products
  • derivative products – always have expiry dates

In StirtRisk, I think we also break down that 12/31 one day bucket by currency — 12/31 USD bucket, 12/31 JPY bucket, 12/31 AUD bucket etc.

Q: why is this so important to VaR and other market risk systems? (I do understand it hits “credit risk”.)
%%A: For floating rate products, the cashflow amount on a future date is subject to market movements
%%A: FX rate on a future date 12/31 is subject to market movements
%%A: contingent claim cashflow depends heavily on market prices.
%%A: if 12/31 falls within 10D, then 10D VaR would be impacted by the 12/31 market factors

sigma is always about sqrt(variance)

sigma in BM refers to the sqrt of variance parameter, the thingy before the dB

dXt = drift term + sigma dBt


sigma in a GBM refers to the thingy before the dB

dYt / Yt = drift term + sigma dBt


In all cases, sigma has the same dimension as the walker variable, such as meter, whereas variance has dimension X^2 like meter^2.

bivariate normal — E[X | Y]

The idea is to decompose X into two parts, a multiple of Y + something independent of Y, like

X’ := the multiple of Y. Specifically, the constant multiplier c is given by rho * sigma_x/sigma_y
X” := X-X’ , the part of X that’s ind of Y.

E[X” | Y] works out to be rather Counter-intuitive. Let’s denote it as Ans.

On one hand, X” is ind of Y, so E[X” | Y] = E[X”] := E[X] – c*E[Y] = 0 – 0. Note the uncond expectations.

On the other hand, E[X” | Y] = E[X – X’|Y] = E[X|Y] – E[cY|Y] = E[X|Y] – cY, since cY is known once Y is given. Setting this equal to the 0 found above gives E[X|Y] = cY.

Actually E[X|Y] = E[X’|Y] = cY
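A simulation sketch of the decomposition, assuming zero means and the illustrative parameters rho = 0.6, sigma_x = 2, sigma_y = 1.5 (none of these come from the original note). It checks that X” = X – cY is uncorrelated with Y, and that the average of X over a thin slice of Y values is close to c times the average Y in that slice.

```python
import math
import random

rng = random.Random(11)
rho, sigma_x, sigma_y = 0.6, 2.0, 1.5      # illustrative parameters
c = rho * sigma_x / sigma_y

pairs = []
for _ in range(100_000):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    y = sigma_y * z1
    x = sigma_x * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
    pairs.append((x, y))

# X'' := X - cY should be uncorrelated with Y (zero means, so E[X''Y] is the covariance).
cov_resid_y = sum((x - c * y) * y for x, y in pairs) / len(pairs)

# Empirical E[X | Y in a thin slice] against c * (average Y in that slice).
bin_pairs = [(x, y) for x, y in pairs if 1.0 <= y <= 1.2]
mean_x_in_bin = sum(x for x, _ in bin_pairs) / len(bin_pairs)
c_times_mean_y = c * sum(y for _, y in bin_pairs) / len(bin_pairs)
```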

2 multivariate normal variables can be indep

Lawler’s examples show that given iid standard normals Z1, Z2 …, two “composed” random vars “X” and “Y” can be made independent of each other by adjusting their composition multipliers a b c d:

X:= a Z1 + b Z2
Y:= c Z1 + d Z2

(Simplest example — X:= Z1 + Z2 and Y:= Z1 – Z2. See Lawler’s notes P39.
Note X = Y + 2*Z2 so they look related, but they are actually independent!)

This independence is counter intuitive. I’m still looking out for an intuitive interpretation.

Note X is never independent of Z1.

For 2 joint normal RVs (and only joint normals), 0 correlation implies independence…. Therefore, we only need to show E[XY] = E[X]E[Y]. In our simple example, RHS = 0*0 and

LHS: E[XY] := E[ (Z1+Z2)(Z1-Z2) ] = E[ Z1 Z1 ] – E[ Z2 Z2 ] = 0, since the 2 terms have identical expectations.
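A numerical sketch of the simplest example, with arbitrary sample size and seed. Besides the covariance of X and Y, it also looks at the covariance of their squares. A value near zero there is consistent with independence but is not a proof; the proof is the joint-normal argument above.

```python
import random

rng = random.Random(5)
n = 50_000
xs, ys = [], []
for _ in range(n):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    xs.append(z1 + z2)   # X := Z1 + Z2
    ys.append(z1 - z2)   # Y := Z1 - Z2

def cov(us, vs):
    """Sample covariance (divide-by-n version)."""
    mu, mv = sum(us) / len(us), sum(vs) / len(vs)
    return sum((u - mu) * (v - mv) for u, v in zip(us, vs)) / len(us)

cov_xy = cov(xs, ys)
cov_sq = cov([x * x for x in xs], [y * y for y in ys])
```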

A classic counter-example. There’s a textbook on counter-examples in calculus, in which the authors argued for the importance of counter examples.