Fwd: comparing exp( (a+b)/2 ) vs 0.5 exp(a) + 0.5 exp(b)

How about Jensen’s inequality?

Hi Richard (Qu Miao),

brain teaser — compare
A = exp( (a+b)/2 )    vs
B = 0.5 exp(a) + 0.5 exp(b)
Here’s my solution. See if it is correct.
Denote f = 2B/A. So f = exp(.5a  – .5b)  +  exp(.5b  – .5a) ….. symmetry
Denote x = .5a – .5b , so f(x) = exp(x) + exp(-x).
Now f'(x) = exp(x) - exp(-x) is zero only at x=0, and the f(x) curve goes to infinity on both sides. So f(x) has a minimum value of 2 occurring at x=0.
That means 2B/A has a minimum value of 2. In other words, B >= A, with equality only when a = b.
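
A quick numeric sanity check of the argument above, as a minimal sketch. The function names and the test pairs (a, b) are made up for illustration; this checks a few points and is not a proof.

```python
import math

def midpoint_vs_average(a, b):
    """Return (A, B) with A = exp((a+b)/2) and B = 0.5*exp(a) + 0.5*exp(b)."""
    A = math.exp((a + b) / 2)
    B = 0.5 * math.exp(a) + 0.5 * math.exp(b)
    return A, B

def ratio_f(a, b):
    """f = 2B/A rewritten as exp(x) + exp(-x), where x = .5a - .5b."""
    x = 0.5 * a - 0.5 * b
    return math.exp(x) + math.exp(-x)

# Arbitrary illustrative pairs, including the a == b case where B equals A.
for a, b in [(0.0, 0.0), (1.0, -1.0), (3.0, 0.5), (-2.0, 4.0)]:
    A, B = midpoint_vs_average(a, b)
    print(a, b, round(A, 6), round(B, 6), "f =", round(ratio_f(a, b), 6), "B >= A:", B >= A)
```

This is also what Jensen's inequality says for the convex function exp: the function of the average is no bigger than the average of the function.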

negative beta, sharpe, treynor

corr=1 means perfect positive corr, but doesn't tell us whether a 1 unit increase in X comes with a 0.001 or 1000 unit increase in Y.

When we compare returns of a fund or stock vs a stock index, we are interested in the relative size of change or “magnifying effect”. Beta helps here.

A “normal” beta close to 1.0 means when mkt grows[1] 5%, then ibm also grows about 5%. Note this growth is fast-changing. All prices are volatile. As shown in other posts on beta, many other CAPM variables are not volatile, but could be slow-changing.

[1] assuming a low risk-free rate, so excess return and “return” are practically the same.

A large beta like 1.5 means a more volatile, “magnifier” stock, such as a tech stock. A 5% drop in the index is likely to see a 7.5% drop in this asset.

Beta between 0 and 1 means a "stable" stock that moves in-sync with the market but at a smaller magnitude.

Negative beta means the asset tends to move against the market, as a short position does, or something else.
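
To see the difference between correlation and the “magnifying effect”, here is a minimal sketch. It assumes the usual textbook definition beta = cov(asset, market) / var(market), which these notes don't spell out, and the return series are invented.

```python
def mean(xs):
    return sum(xs) / len(xs)

def covariance(xs, ys):
    """Population covariance of two equal-length series."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def correlation(xs, ys):
    return covariance(xs, ys) / (covariance(xs, xs) ** 0.5 * covariance(ys, ys) ** 0.5)

def beta(asset, market):
    """Assumed definition: cov(asset, market) / var(market)."""
    return covariance(asset, market) / covariance(market, market)

# Made-up market excess returns; each asset is an exact multiple of the market.
market = [0.05, -0.05, 0.02, -0.01, 0.03]
magnifier = [1.5 * r for r in market]    # beta 1.5
stable = [0.5 * r for r in market]       # beta 0.5
short = [-1.0 * r for r in market]       # short position, beta -1

for name, asset in [("magnifier", magnifier), ("stable", stable), ("short", short)]:
    print(name, "corr =", round(correlation(asset, market), 6),
          "beta =", round(beta(asset, market), 6))
```

The magnifier and the stable asset both have correlation 1 with the market, yet their betas differ by a factor of 3.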

A negative Sharpe ratio indicates your fund underperforms the risk-free asset (like a gov bond in your fund's currency). The denominator (std of the fund return), be it large or small, isn't responsible for this negativity.

Treynor Ratio is negative if

case1: beta is positive and the fund underperforms the risk-free rate.

case2: beta is negative and the fund outperforms the risk-free rate. This means that the fund manager has performed well, managing to reduce risk while getting a return better than the risk-free rate.
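
The sign cases can be checked with a small sketch. It assumes the standard definitions (excess return over stdev for Sharpe, excess return over beta for Treynor), and all the numbers are invented.

```python
def sharpe_ratio(fund_return, risk_free, fund_stdev):
    """Excess return per unit of total risk (stdev is always positive)."""
    return (fund_return - risk_free) / fund_stdev

def treynor_ratio(fund_return, risk_free, beta):
    """Excess return per unit of market risk (beta can be negative)."""
    return (fund_return - risk_free) / beta

risk_free = 0.03

# Sharpe: the sign comes from the numerator only, whatever the stdev.
print(sharpe_ratio(0.02, risk_free, 0.10))   # negative
print(sharpe_ratio(0.02, risk_free, 0.40))   # still negative, smaller in size

# Treynor case1: positive beta, fund underperforms the risk-free rate.
print(treynor_ratio(0.02, risk_free, 1.2))   # negative
# Treynor case2: negative beta, fund outperforms the risk-free rate.
print(treynor_ratio(0.05, risk_free, -0.5))  # negative
```

So a negative Treynor ratio needs a look at the sign of beta before it can be read as bad news.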

risk-neutral probability – basics

Simplest defining example of RNP : say a coin flip pays $1mil if H and 0 if T and the consensus market price is $400k. The RN prob inferred from the market prices is Pr(H) = 40%.

Another defining example of RNP — Suppose IBM price tomorrow can only be either $200 or $198, and current spot is $198.5, then we can back out the RN Pr(up). This prob distro is different from the “physical” distro.

We don’t know the physical prob. We assume the market price is a fair price, so we use the implied RN Prob as a fair estimate of the physical prob.

What if we know (via the coin manufacturer) the physical prob is 50/50? Well, the real people composing the market are risk averse so they are only willing to pay, in general, 400k. I guess the RNP is still 40%. In financial markets I don’t think anyone knows the physical prob. The most reliable way to estimate the physical prob is through the RNP.

Another defining example of RNP: (Roger’s P 2.28) Stock value at T1 is 115, and at Termination can rise to 150 or drop to 100. Using just these 3 numbers and 1 interval, we can derive the RNP(up | S=115 at T1). To keep things simple, we will assume the market has a consensus on the probabilities of up/down.

Next, wrap your mind around this unusual condition: the terminal values ($150 and $100) are fixed, and at Termination the stock cannot take on any value in between. This is like a coin or a die. The only unknown is the probability, not the possible values.

We can therefore infer RN P(up) = 30%, as if all traders in the market agreed on this 30%.

Note the current price of 115 is the result of the market adjusting to any new info. We can say the current price already reflects the RN P(up).

In the original example, at T1 the stock can also reach $75. On this branch of the tree, the Termination value is either 100 or 50. The RN P(up | S = 75 at T1) = 50%, different from the 30%.

This is another important feature of this model – the RNP depends not only on the stage we are at, but also on the information revealed so far. You can imagine the noisegen is adaptive.
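
All the examples above back out the RN probability the same way, which can be sketched as below. The helper name and the `growth_factor` parameter are my own. Setting it to 1.0 is the assumption of zero interest over the interval, which is what the numbers in these examples imply.

```python
def rn_prob_up(price_now, value_up, value_down, growth_factor=1.0):
    """Solve price_now * growth_factor = p * value_up + (1 - p) * value_down for p.

    growth_factor = 1.0 assumes zero interest over the interval.
    """
    return (price_now * growth_factor - value_down) / (value_up - value_down)

# Coin flip paying $1mil or 0, market price $400k.
print(rn_prob_up(400_000, 1_000_000, 0))

# IBM at 198.5 today, either 200 or 198 tomorrow.
print(rn_prob_up(198.5, 200, 198))

# Two-branch tree: the RN probability depends on the node reached at T1.
tree = {115: (150, 100), 75: (100, 50)}   # price at T1 -> (up value, down value)
for price_at_t1, (up, down) in tree.items():
    print(price_at_t1, rn_prob_up(price_at_t1, up, down))
```

The two nodes of the tree give 30% and 50%, matching the point that the RNP depends on the information revealed so far.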

professional option traders

Professionals sell calls and puts. (Retail investors buy them.) These are “High probability” trades, i.e. high chance of profit. Given this zero-sum game, it follows that the option-buyers do low-probability trades. This isn't a risk-neutral world. Retail is risk-averse.

There are real risks that the option could get exercised, so the option sellers always need some protection.

compressed content in a http response

Now I feel an http response may be a zip containing multiple files. The response “body” will be a compressed byte array. (To avoid confusion, I will call this a “zip” rather than a “file”.) When you parse these bytes, you may see multiple zip entries.

If you assume the entire zip is a single compressed stream and try to decompress/inflate it, it might fail, or the output may be empty.

The http response also contains useful response headers. One of the headers would be content-type. The gzip and zip types seem to require different parsers.
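
Here is a minimal sketch of picking the parser from the content type, using only the Python standard library. `unpack_body` is a hypothetical helper of mine, and the “response body” is built in memory rather than fetched. Real servers may send other content types, or add parameters to the header value, which this sketch does not handle.

```python
import gzip
import io
import zipfile
import zlib

def unpack_body(body, content_type):
    """Return {entry_name: bytes}. A zip can hold many entries; gzip holds one stream."""
    if content_type == "application/zip":
        with zipfile.ZipFile(io.BytesIO(body)) as archive:
            return {name: archive.read(name) for name in archive.namelist()}
    if content_type == "application/gzip":
        return {"": gzip.decompress(body)}
    return {"": body}   # not compressed, as far as this sketch can tell

# Build an in-memory zip with two entries, standing in for a response body.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("a.txt", "first file")
    archive.writestr("b.txt", "second file")
zip_body = buffer.getvalue()

print(unpack_body(zip_body, "application/zip"))

# Treating the whole zip as one raw compressed stream does not work.
try:
    zlib.decompress(zip_body)
except zlib.error as exc:
    print("single-stream inflate failed:", exc)
```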

iid assumption in cumulative return

Time diversification? First look at asset diversification. Split $200k into 2 uncorrelated investments so when one is down, the other might be up. Time-div assumes we could add up the log returns of period1 and period2. Since the 2 values are two N@Ts and very likely non-perfectly-correlated (i.e. corr < 1.0), one of them might cushion the other.


Background — the end-to-end (log) return over 30 years is (by construction) sum of 30 annual returns —


r_0to1 is a N@T from noisegen1 with mu and sigma

r_1to2 is a N@T from noisegen2.

r_29to30 is a N@T


So the sum r_0to30 (denoted r) is also a random var with a distribution. Without assuming normality of noisegen1, if the 30 random variables are IID, then by the central limit theorem the sum would follow an approximately normal distribution, with E(r) = 30mu and stdev(r) = sigma * sqrt(30).
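
A small simulation can illustrate the 30mu and sigma * sqrt(30) result without assuming a normal noisegen. This is a sketch under my own assumptions: the annual log return is drawn from a uniform distribution with made-up bounds, and the seed and number of paths are arbitrary.

```python
import math
import random
import statistics

LOW, HIGH = -0.20, 0.30                 # hypothetical uniform noisegen, clearly non-normal
MU = (LOW + HIGH) / 2                   # mean of one annual log return
SIGMA = (HIGH - LOW) / math.sqrt(12)    # stdev of one annual log return

def thirty_year_log_return(rng, years=30):
    """Sum of IID annual log returns, i.e. the end-to-end log return."""
    return sum(rng.uniform(LOW, HIGH) for _ in range(years))

rng = random.Random(42)
sums = [thirty_year_log_return(rng) for _ in range(20000)]

sample_mean = statistics.mean(sums)
sample_stdev = statistics.stdev(sums)
print("mean :", round(sample_mean, 4), "vs 30*mu          =", round(30 * MU, 4))
print("stdev:", round(sample_stdev, 4), "vs sigma*sqrt(30) =", round(SIGMA * math.sqrt(30), 4))
```

The simulated values should land close to the formulas. The whole exercise rests on the IID assumption built into the simulation.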


This is a very important and widely used result, at the heart of a lot of quizzes, a lot of financial data. However, the underlying IID assumption is controversial.


* The indep assumption is not too wrong. Stock return today is not highly correlated with yesterday's. Still AR(1) models include preceding period's return …. Harmless.

* The ident assumption is more problematic. We can't go back in time to run the noisegen1 again, but there is evidence that real data do not support the ident assumption.


Here's my suggestion to estimate noisegen1's sigma. Look at log return. r_day1to252 = r_day1to2 + r_day2to3 + … + r_day251to252. Assuming the daily return values are a sample from a single noisegenD, we can estimate noisegenD's mean and stdev, then derive the stdev of r_day1to252 by scaling the daily stdev by the square root of the number of daily returns. This stdev is the stdev of noisegen1.
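
The suggestion above can be sketched as follows. `scale_to_period` is a hypothetical helper, the four daily returns are made up to keep the arithmetic easy to follow (a real year would have roughly 250 of them), and the square-root scaling is only valid under the IID assumption.

```python
import math
import statistics

def scale_to_period(daily_log_returns):
    """Estimate mean and stdev of the whole-period log return from daily ones,
    assuming the daily values are IID draws from a single noisegenD."""
    n = len(daily_log_returns)
    daily_mean = statistics.mean(daily_log_returns)
    daily_stdev = statistics.stdev(daily_log_returns)
    return n * daily_mean, math.sqrt(n) * daily_stdev

daily = [0.010, -0.010, 0.020, -0.020]   # made-up daily log returns
period_mean, period_stdev = scale_to_period(daily)
print("period mean :", period_mean)
print("period stdev:", round(period_stdev, 6))
```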


hetero-skeda-sticity, another example

[[Prem Mann]] has an example of explaining food expenditure using household income. The homo-skeda-sticity assumption on P592 is something like

   “the dispersion among expenditure of low-income households is, say, 21.1. The dispersion among high-income households is that same value. Here we mean the dispersion among the residual values i.e. the unexplained portion of expenditure.”

P598 further clarifies that the POPULATION “Spread of errors” at a given income level is a different quantity than that of the SAMPLE.

Note this is an assumption about the population not a sample. Suppose 5 income levels. A small sample having just 2 households per level (10 households in entire sample) will be too small, and is very likely to show inconsistent dispersions at low-income vs high-income.
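
To illustrate the small-sample point, here is a sketch that draws residuals from a population which is homoskedastic by construction. The income levels, the normal shape of the errors and the seed are my assumptions; only the 21.1 comes from the example above.

```python
import random
import statistics

POPULATION_SIGMA = 21.1                  # same dispersion at every income level
INCOME_LEVELS = [20, 40, 60, 80, 100]    # hypothetical income levels

def residual_stdev_by_level(households_per_level, seed):
    """Sample residuals at each income level and return their sample stdev."""
    rng = random.Random(seed)
    result = {}
    for level in INCOME_LEVELS:
        residuals = [rng.gauss(0.0, POPULATION_SIGMA)
                     for _ in range(households_per_level)]
        result[level] = statistics.stdev(residuals)
    return result

small = residual_stdev_by_level(2, seed=1)       # 10 households in total
large = residual_stdev_by_level(20000, seed=1)   # close to the population

print("2 per level    :", {k: round(v, 1) for k, v in small.items()})
print("20000 per level:", {k: round(v, 1) for k, v in large.items()})
```

With only 2 households per level the sample dispersions typically differ a lot from level to level, even though the population dispersion is identical everywhere.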

Needless to say Dispersion is measured by stdev.

This book has some nice diagrams about the dispersions at 2 income levels.

Fwd: strength of correlation

“uncorrelated”, “strongly correlated” … I hear these terms as basic concepts. Good to get some basic feel

1) One of the first “sound bites” is the covariance vs correlation definitions. I like http://en.wikipedia.org/wiki/Covariance_and_correlation. Between 2 series of data (X and Y), covariance can be a very small or large num (like 580,189,272billion), which can’t possibly reveal the strength of correlation between X and Y. In contrast, the correlation number (say, 0.8) is dimensionless, and has a value between -1 and 1. This is intuitive.
* linearly correlated means close to +1 or -1
* uncorrelated means 0. X/Y are independent => corr = 0. (http://en.wikipedia.org/wiki/Correlation_and_dependence shows that perfectly dependent pairs like Y=X^2 could have 0 correlation.)

** independence is sufficient but unnecessary condition for 0 correlation. 
** 0 correlation is necessary but insufficient condition for independence. 

2) r-squared is a standard measure of the goodness of a linear regression model. In a univariate regression of y on x, r-square is corr(x,y)^2. High r-square like 0.99 indicates a large part of Y variation is explained by X.
3) Below I feel these are 2 similar definitions of the corr coeff. Formally, this number is the Linear correlation between two variables X and Y.

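A) For the entire population,

        rho(X,Y) = cov(X,Y) / ( stdev(X) * stdev(Y) )

B) For a sample taken from the population,

        r = sum[ (x_i - xbar)(y_i - ybar) ] / sqrt( sum[ (x_i - xbar)^2 ] * sum[ (y_i - ybar)^2 ] )

, which is identical to the r definition on P612 [[Prem Mann]]:

        r = SSxy / sqrt( SSxx * SSyy )         , where SS stands for sum-of-squares

B2) An equally useful formula of SS is
        SSxy = sum(x*y) - sum(x) * sum(y) / n

SSxx or SSyy or SSxy are all similar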
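
The points in this post can be checked numerically with a short sketch. It assumes the textbook shortcut SSxy = sum(x*y) - sum(x)*sum(y)/n and r = SSxy / sqrt(SSxx * SSyy). The data and function names are invented.

```python
import math

def ss(xs, ys):
    """Sum of squares/cross-products: ss(xs, ys) is SSxy, ss(xs, xs) is SSxx."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n

def corr(xs, ys):
    return ss(xs, ys) / math.sqrt(ss(xs, xs) * ss(ys, ys))

def r_squared(xs, ys):
    """R^2 of the least-squares line y = a + b*x, computed from the residuals."""
    n = len(xs)
    b = ss(xs, ys) / ss(xs, xs)
    a = sum(ys) / n - b * sum(xs) / n
    ss_residual = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_residual / ss(ys, ys)

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

# Changing units blows up the covariance (SSxy / n) but leaves corr unchanged.
big_xs = [1000 * x for x in xs]
big_ys = [1000 * y for y in ys]
print("cov :", ss(xs, ys) / len(xs), "vs", ss(big_xs, big_ys) / len(xs))
print("corr:", corr(xs, ys), "vs", corr(big_xs, big_ys))

# In a univariate regression, r-square equals corr squared.
print("corr^2:", corr(xs, ys) ** 2, "R^2:", r_squared(xs, ys))

# Perfectly dependent but uncorrelated: Y = X^2 on a symmetric X.
sym = [-2, -1, 0, 1, 2]
squares = [x * x for x in sym]
print("corr(X, X^2):", corr(sym, squares))
```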

drift under a given measure (but +! dividing by its numeraire)

See post on using cash numeraire.

I think we can assume for each numeraire, there’s just one [1] probability measure. That measure defines the probability distribution of any price process.  We can use that measure to evaluate expectations, to talk about Normal/Lognormal or dW, and to evaluate “exponential” drift (the “m” below), assuming

                dX = m X dt
Under the standard risk-neutral measure, the exponential drift is the same ( =r ) for all TRADEABLE assets, even though physical drift rates are not uniform. Specifically, the bank account itself (paying exponential short rate r) has a drift = r. So does the discount bond. So does a stock. So does a fwd contract. So does a vanilla call or binary call. So does an asset-or-nothing call.
At this point, we don’t need to worry about martingale or numeraire, though all the important results come from numeraire/MG reasoning.
I feel it’s important to remember drift is a __prediction__ about the future. It’s inherently based on some assumed probability distribution i.e. a probability measure. That probability distribution is derived from many live prices about T-expiry contracts.
Therefore, under another predictive probability distribution/measure, the predicted drift would differ.
The stock-measure is trickier. Take IBM. There exists an IBM measure. Under this measure, i.e. operating under this new (predictive) probability distribution, we can derive the (predicted) exponential drift rate of any asset’s price movement. Specifically, we can work out the predicted drift of the IBM price process. That drift is r + sigma^2, where

r:= exponential drift rate of the bank account i.e. money-market account. Consider it a physical drift, but actually this is non-random and the same drift speed under any measure
Sigma:= the volatility of IBM. Same value under any measure.
[1] there might exist multiple, but I don’t bother.
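
The r vs r + sigma^2 drift can be checked numerically, as a sketch under several assumptions of mine: IBM follows a geometric Brownian motion with constant r and sigma and no dividends, and the change from the risk-neutral measure to the stock measure is done by weighting each outcome with S_T / (S_0 * exp(rT)). The parameter values are made up, and the expectation is a plain trapezoid integration over the standard normal.

```python
import math

def expect_under_q(payoff, s0, r, sigma, t, half_width=10.0, steps=20000):
    """E[payoff(S_T)] under the risk-neutral measure, for GBM, by trapezoid rule."""
    h = 2 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        z = -half_width + i * h
        s_t = s0 * math.exp((r - 0.5 * sigma ** 2) * t + sigma * math.sqrt(t) * z)
        density = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * payoff(s_t) * density * h
    return total

s0, r, sigma, t = 100.0, 0.03, 0.2, 1.0   # made-up parameters

# Expected stock price under the risk-neutral measure.
e_q = expect_under_q(lambda s: s, s0, r, sigma, t)

# Expected stock price under the stock measure: reweight by S_T / (S_0 e^{rT}).
e_s = expect_under_q(lambda s: s * s / (s0 * math.exp(r * t)), s0, r, sigma, t)

drift_q = math.log(e_q / s0) / t
drift_s = math.log(e_s / s0) / t
print("drift under risk-neutral measure:", round(drift_q, 6), "vs r =", r)
print("drift under stock measure       :", round(drift_s, 6), "vs r + sigma^2 =", r + sigma ** 2)
```

Here “drift” is the exponential growth rate of the expected price, in the sense of the “m” in dX = m X dt.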

Pr(random pick from [0,1] is rational)==0

14 Sep 2013, 02:52

Hi Prof Fefferman,

I understand the measure of a set can be loosely described as the length (in a 1D space) of the interval. Given the set of all rational numbers between 0 and 1, its length is … 0, as you revealed very early on. I felt you were laying out and building up towards (a rather sophisticated definition of) probability. Here’s my guess –

Between 0 and 1 “someone” picks a number X. It is either a rational or irrational number.  The chance of X being rational is 0, because the measure of the set of rational numbers (call it R1) is 0, and the measure of the irrational set (R2) is 1. Therefore Pr (picking an irrational X | X is in [0,1]) = 100%

How many members are in R1? Infinite, but R2 is infinitely larger. If only 1 electron in the solar system has a special spin, then the Pr (picking an electron with that special spin out of all solar system electrons) would be close to 0. With R1 and R2, the odds are even lower, R2 size is infinitely larger than R1, so the Pr (picking a rational) = 0.

However, we humans only see all the millions and trillions of rational numbers between 0 and 1. We don’t see too many irrational numbers. Therefore I said “someone”, perhaps a Martian with some way to see the irrational numbers. This Martian would see few rational numbers sandwiched between far more irrational numbers, so few that they are barely visible. Given the irrationals dominate the rationals in such overwhelming proportion, the chance of picking a rational is 0.
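
One standard way to see why the rationals have length 0, sketched in code: list the rationals in [0,1] one by one and cover the k-th one with a tiny interval of length eps / 2^(k+1). However many we cover, the total length stays below eps, and eps can be as small as we like. The enumeration order and function names are my own choices.

```python
from fractions import Fraction
from itertools import islice
from math import gcd

def rationals_in_unit_interval():
    """Yield every rational in [0, 1] exactly once, by increasing denominator."""
    yield Fraction(0)
    yield Fraction(1)
    q = 2
    while True:
        for p in range(1, q):
            if gcd(p, q) == 1:
                yield Fraction(p, q)
        q += 1

def cover_length(count, eps):
    """Total length of the intervals covering the first `count` rationals,
    where rational number k (from 0) gets an interval of length eps / 2**(k+1)."""
    total = Fraction(0)
    for k, _rational in enumerate(islice(rationals_in_unit_interval(), count)):
        total += eps / 2 ** (k + 1)
    return total

first_few = list(islice(rationals_in_unit_interval(), 7))
print([str(x) for x in first_few])

eps = Fraction(1, 1000)
for count in (10, 100, 500):
    print(count, "rationals covered, total length below eps:", cover_length(count, eps) < eps)
```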

[[Hull]] estimating default probability from bond prices #learning notes

If we were to explain to people with basic math background, the arithmetic on P524-525 could be expanded into a 5-pager. It's a good example worth study.

There are 2 parts to the math. Using bond prices, Part A computes the “expected” (probabilistic) loss from default to be $8.75 for a notional/face value of $100. Alternatively, assuming a constant hazard rate, Part B computes the same to be $288.48*Q. Equating the 2 parts gives Q = 3.03%.

Q3: How is the 7% market yield used? Where in which part?

Q4: why assume defaults happen right before coupon date?

%%A: borrower would not declare “in 2 days I will fail to pay that coupon” because it may receive help in the 11th hour.

–The continuous discounting in Table 23.3 is confusing

Q: Hull explained how the 3.5Y row in Table 23.3 is computed. But why discount to T=3.5Y and not to T=0Y? Here's my long answer.


The “risk-free value” (Column 4) has a confusing meaning. Hull mentioned earlier a “similar risk-free bond” (a TBond). Right before the 3.5Y moment, we know this risk-free bond is scheduled to pay all cash flows at future times T=3.5Y, 4Y, 4.5Y, 5Y. That's 4 coupons + principal. We use risk-free rate 5% to discount all 4+1 cash flows to T=3.5Y. We get $104.34 as the value of the TBond cash flows “discounted to T=3.5Y”.

Column 5 builds on it, giving the “loss due to default@3.5Y, discounted to T=3.5Y”. In Column 6, this value is further discounted from 3.5Y to T=0Y.

Part B computes a PV relative to the TBond's value. Actually Part A is also relative to the TBond's value.

In the model of Part B, there are 5 coin flips occurring every mid-year at T=0.5Y 1.5Y 2.5Y 3.5Y 4.5Y with Pr(default_0.5) = Pr(default_1.5) = … = Pr(default_4.5) = Q. Concretely, imagine that Pr(flip = Tail) is 25%. Now Law of total prob states

100% = Pr(d05) + Pr(d15) + Pr(d25) + Pr(d35) + Pr(d45) + Pr(no d).

If we factor in the amount of loss at each flip we get

Pr(d05) * $65.08 + Pr(d15) * $61.20 + Pr(d25) * $57.52 + Pr(d35) * $54.01 + Pr(d45) * $50.67 + Pr(no d, no loss) * $0 == $288.48*Q
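
The arithmetic of both parts can be reproduced with a short sketch. Two inputs are not stated in these notes and are my recollection of Hull's example, so treat them as assumptions: a 6% annual coupon paid semiannually on $100 face, and a 40% recovery on face value. The 5% risk-free rate and 7% yield are taken as continuously compounded.

```python
import math

COUPON = 3.0          # assumption: 6% annual coupon paid semiannually on $100 face
FACE = 100.0
RECOVERY = 40.0       # assumption: 40% recovery on face value
RISK_FREE = 0.05      # continuously compounded
BOND_YIELD = 0.07     # continuously compounded market yield
PAY_TIMES = [0.5 * k for k in range(1, 11)]   # 0.5Y, 1Y, ..., 5Y

def pv(rate, as_of=0.0):
    """Value at time `as_of` of all cash flows paid at or after `as_of`."""
    total = 0.0
    for t in PAY_TIMES:
        if t >= as_of:
            cash = COUPON + (FACE if t == PAY_TIMES[-1] else 0.0)
            total += cash * math.exp(-rate * (t - as_of))
    return total

# Part A: risk-free price minus market price. This is where the 7% yield is used.
part_a = pv(RISK_FREE) - pv(BOND_YIELD)

# Part B: PV of the loss at each possible default time, per unit of Q.
loss_pv_per_q = []
for t in [0.5, 1.5, 2.5, 3.5, 4.5]:
    risk_free_value = pv(RISK_FREE, as_of=t)     # discounted to the default time
    loss = risk_free_value - RECOVERY            # loss as seen at the default time
    loss_pv_per_q.append(loss * math.exp(-RISK_FREE * t))   # discounted to T=0
part_b_per_q = sum(loss_pv_per_q)

q = part_a / part_b_per_q
print("Part A expected loss:", round(part_a, 2))
print("Value at 3.5Y       :", round(pv(RISK_FREE, as_of=3.5), 2))
print("Part B per unit of Q:", [round(x, 2) for x in loss_pv_per_q], round(part_b_per_q, 2))
print("Q                   :", round(q, 4))
```

The two-step discounting (first to the default time, then to T=0) is visible in the loop, which is the point of the long answer above.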