# EnumSet^regular enum

[category javaOrphan]
A java enum type usually represents .. (hold your breath) .. a radio-button-group. A variable of this type will bind to exactly one of the declared enum constants.

eg: Continent — there are only 7 declared constants. A Continent variable binds to Africa or Antarctica but not both.
eg: SolarPlanet — there are only 8 declared constants
eg: ChemicalElement — there are only 118 declared constants
eg: ChinaProvince — there are only 23 declared constants

In contrast, an enum type has a very different meaning when used within an EnumSet (I will name that meaning soon). Each enum constant is an independent boolean flag, and you can mix and match these flags.

Eg: Given enum BaseColor { Red,Yellow,Blue} we can have only 2^3 = 8 distinct combinations. R+Y gives orange color. R+Y+B gives white color.

Therefore, the BaseColor enum represents the 3 dimensions of color composition.

EnumSet was created to replace bit vector. If your bit vector has a meaning (rare!) then the underlying enum type would have a meaning. Here’s an example from [[effJava]]

Eg: enum Style {Bold, Underline, Italic, Blink, StrikeThrough, Superscript, Subscript… } This enum represents the 7 dimensions of text styling.

[[effJava]] reveals the telltale sign — if the enum type has up to 64 declared constants (only three in BaseColor.java), then the entire EnumSet is actually represented as a single 64-bit long. This confirms that our three enum constants act as three boolean flags.
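The same flags-as-bits idea can be sketched in Python (a hypothetical BaseColor mirroring the Java example) with enum.IntFlag, where the whole flag set is literally one integer:

```python
from enum import IntFlag

class BaseColor(IntFlag):
    RED = 1      # bit 0
    YELLOW = 2   # bit 1
    BLUE = 4     # bit 2

orange = BaseColor.RED | BaseColor.YELLOW                  # mix and match flags
white = BaseColor.RED | BaseColor.YELLOW | BaseColor.BLUE  # all 3 flags on

# the whole flag set is a single small integer -- like Java's single 64-bit long
print(int(orange))              # 3
print(BaseColor.RED in orange)  # True
print(int(white))               # 7
```

With 3 flag constants there are exactly 2**3 = 8 representable combinations, matching the BaseColor count above.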

# VaR can overstate/understate diversification benefits

| | understate the curse of concentration | overpraise a diversified portfolio |
|---|---|---|
| mathematically | definitely possible | probably not |
| correlated crisis | yes, possible — VaR treats the tail as a black box | yes — the portfolio becomes highly correlated, not really diversified; a chain reaction is possible (though a chain reaction is still better than all-eggs-in-1-basket); diversification breaks down |

Well-proven in academia — VaR is, mathematically, not a coherent risk measure, as it violates sub-additivity. Best illustration — two uncorrelated credit bonds can each have $0 VaR, but as a combined portfolio the VaR is non-zero. The portfolio is actually well diversified, yet VaR would show higher risk in the diversified portfolio — illogical, because the individual VaR values are simplistic. This is a flaw of the mathematical construction of VaR. Even in a correlated crisis, the same could happen — based on the probability distribution, an individual bond’s 5% VaR is zero but the portfolio VaR is non-zero. A $0 VaR value is completely misleading. It can leave a big risk (a real possibility) completely unreported.

[[Complete guide]] P434 says the contrary — VaR will always (“frequently”, IMHO) say the risk of a large portfolio is smaller than the sum of the risks of its components, so VaR overstates the benefit of diversification. This is mathematically imprecise, but it does bring my attention to the meltdown scenario — two individual VaR amounts could be some x% of the $X original investment, and y% of $Y etc, but if all my investments get hit in a GFC and I am leveraged, then I could lose 100% of my total investment. VaR would not capture this scenario, as it assumes the components are lightly correlated based on history. In this case, the mathematician would cry “unfair” — the (idealized) math model assumes the correlation numbers to be reliable and unchanging. The GFC is a “regime change”, and can’t be modeled in VaR, so VaR is the wrong methodology.
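The two-bond illustration can be made concrete with a small simulation (the 4% default probability and 100-unit loss are my own invented numbers, not from any book): each bond’s 95% VaR is 0, yet the independent combination has positive 95% VaR — sub-additivity violated.

```python
import random

def var_95(losses):
    """95% VaR: the loss at the 95th percentile of the loss distribution."""
    s = sorted(losses)
    return s[int(0.95 * len(s))]

random.seed(42)
N = 100_000
# each bond loses 100 on default (4% chance), else 0; defaults independent
a = [100 if random.random() < 0.04 else 0 for _ in range(N)]
b = [100 if random.random() < 0.04 else 0 for _ in range(N)]
port = [x + y for x, y in zip(a, b)]

print(var_95(a), var_95(b))   # 0 0   -- each bond alone looks riskless at 95%
print(var_95(port))           # 100   -- P(any default) = 1 - 0.96**2 = 7.84% > 5%
```

Each bond survives with 96% probability, so its 95th-percentile loss is 0; the portfolio suffers at least one default 7.84% of the time, pushing its 95th-percentile loss to 100.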

# de-multiplex packets bearing Same dest ip:port Different source

For UDP, the 2 packets are always delivered to the same destination socket. The source IP:port is ignored.

For TCP, if there are two matching worker sockets — perhaps two ssh sessions — then the packets are delivered to them respectively.

If there’s only a listening socket, then both packets delivered to the same socket, which has wild cards for remote ip:port.
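A minimal loopback sketch of the UDP case (the port is chosen by the OS and the payloads are invented): two sender sockets with different source ports, one receiving socket.

```python
import socket

# one receiving socket, bound to a single local ip:port
recv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
recv.bind(("127.0.0.1", 0))      # OS picks a free port
dest = recv.getsockname()

# two senders -> two different source ports, same destination ip:port
s1 = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s2 = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s1.sendto(b"from sender 1", dest)
s2.sendto(b"from sender 2", dest)

# both datagrams arrive at the SAME socket; recvfrom() reports the source
# ip:port, but the source plays no role in UDP demultiplexing
msgs = {recv.recvfrom(1024)[0] for _ in range(2)}
print(msgs == {b"from sender 1", b"from sender 2"})   # True
for s in (s1, s2, recv):
    s.close()
```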

# StopB4: arg to range()/slice: simplistic rule

I believe range() and the slicing operator always generate a new list (or string) — true in Python 2; note that Python 3’s range() instead returns a lazy range object, though the stopB4 rule below is unchanged.

If you specify a stopB4 value of 5, then “5” will not be produced, because the “generator” stops right before this value.

In the simplest usage, START is 0 and STEP is 1 or -1.

In this simplest usage, if StopB4 is 5 then five integers (0 through 4) are generated. If used in a for loop, we enter the loop body five times.

In a rare usage (avoid such confusion in a coding test!), STEP is neither 1 nor -1, or START is not zero, so StopB4 is used in something like “if generated_candidate >= StopB4 then exit before entry into loop body”.

Code below proves the slicing operator follows exactly the same rule. See https://github.com/tiger40490/repo1/blob/py1/py/slice%5Erange.py

word[:2]    # The first two characters, Not including position 2
word[2:]    # Everything except the first two characters
s[:i] + s[i:] equals s
length of word[1:3] is 3-1==2
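The stopB4 rule can be sketched directly (same spirit as the linked slice^range.py, though these exact snippets are mine):

```python
# range(): stopB4=5 -> five integers; 5 itself is never produced
assert list(range(5)) == [0, 1, 2, 3, 4]

# slicing follows the same stopB4 rule
word = "python"
assert word[:2] == "py"            # positions 0 and 1 -- NOT position 2
assert word[2:] == "thon"          # everything except the first two characters
assert word[:2] + word[2:] == word # s[:i] + s[i:] equals s
assert len(word[1:3]) == 3 - 1     # == 2

# rarer usage: non-zero START, STEP != 1/-1; 5 still acts as the exit bound
assert list(range(1, 5, 2)) == [1, 3]
print("all stopB4 checks passed")
```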

# (intuitive)derivation of the combination formula

Q1: how many ways to pick 3 boys out of 7 to form a choir?

Suppose we don’t know the 7_choose_3 formula, but my sister said the answer is 18. Let’s verify it.

How many ways to line up the 7 boys? 7!

Now suppose the 3 boys are already picked, and we put them in the front 3 positions of the line.

Q2: Under this constraint, how many ways to line up the 7 boys?
A2: In the front segment, there are 3! ways to line up the 3 boys; in the back segment, there are 4! ways to line up the remaining 4 boys. So answer is 3! x (7-3)! = 144

Since there are supposedly 18 ways to pick, 18 * 144 must equal 7! = 5040. But 18 * 144 = 2592, so 18 is a wrong answer. The true count must be 5040 / 144 = 35, i.e. 7! / (3! * 4!) — the combination formula.
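Brute force confirms the reasoning — the true pick-count N must satisfy N × 3! × 4! = 7!:

```python
from itertools import combinations
from math import factorial

# enumerate every way to pick 3 boys out of 7
n_pick = len(list(combinations(range(7), 3)))
print(n_pick)   # 35, not 18

# each pick allows 3! front-segment line-ups and 4! back-segment line-ups
assert n_pick * factorial(3) * factorial(4) == factorial(7)   # 35 * 144 == 5040
assert 18 * 144 != factorial(7)                               # the claimed 18 fails
```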

# never exercise American Call (no-div), again

Rule 1: For a given no-dividend stock, early exercise of American call is never optimal.
Rule 1b: therefore, the price equals that of a European call. In other words, the early-exercise feature is worthless.

To simplify (not over-simplify) the explanation, it’s useful to assume zero interest rate.

The key insight is that short-selling the stock is always better than exercise. Suppose the strike is $100 but the current price is super high at $150.
* Exercise means “sell at $150 immediately after buying the underlier at $100”.
* Short means “sell at $150 but delay the buying till expiry”. Why *delay* the buy? Because we hold a right, not an obligation, to buy.
– If the terminal price is $201 or anything above the strike, then the final buy is at $100, same as the Exercise route.
– If the terminal price is $89 or anything below the strike, then the final buy is BETTER than the Exercise route.

You can also think in terms of a super-replicating portfolio, but I find it less intuitive.

So in real markets, when the stock is very high and you are tempted to exercise, don’t sit there and risk losing the opportunity.
1) Short sell if you are allowed.
2) Exercise if you can’t short sell.

When interest rate is present, the argument is only slightly different. Invest the short sell proceeds in a bond.
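A numeric sketch of the two routes (strike 100, spot 150, zero rate; the terminal prices 201 and 89 are from the example above):

```python
STRIKE, SPOT = 100, 150

def exercise_now():
    # sell at 150 immediately after buying the underlier at 100
    return SPOT - STRIKE

def short_sell(terminal):
    # sell at 150 now; at expiry, use the option only if it helps
    buy_price = min(terminal, STRIKE)   # a right, not an obligation, to buy at 100
    return SPOT - buy_price

print(exercise_now())    # 50
print(short_sell(201))   # 50 -- same as exercise when terminal > strike
print(short_sell(89))    # 61 -- BETTER than exercise when terminal < strike
```

The short route never does worse than immediate exercise and sometimes does better, which is the content of Rule 1.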

# probability density #intuitively

Prob density function is best introduced in 1-dimension. In a 2-dimensional (or higher) context like throwing a dart on a 2D surface, we have “superstructures” like marginal probability and conditional probability … but they are hard to understand fully without an intuitive feel for the density. Density is the foundation of everything.

Here’s my best explanation of pdf:  to be useful, a bivariate density function has to be integrated via a double-integral, and produce a probability *mass*. In a small region where the density is assumed approximately constant, the product of the density and delta-x times delta-y (the 2 “dimensions”) would give a small amount of probability mass. (I will skip the illustrations…)

Note there are 3 factors in this product. If delta-x is zero, i.e. the random variable is held constant at a value like 3.3, then the product becomes zero i.e. zero probability mass.

My 2nd explanation of pdf — always a differential. In the 1D context, it’s dM/dx. dM represents a small amount of probability mass. In the 2D context, density is d(dM/dx)/dy. As the tiny rectangle “dx by dy” shrinks, the mass over it would vanish, but not the differential.

In the context of marginal and conditional probability, which requires “fixing” X = 7.02, it’s always useful to think of a small region around 7.02. Otherwise, the paradox with the zero-width is that the integral would evaluate to 0. This is an uncomfortable situation for many students.
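The density × delta-x × delta-y idea can be checked numerically — a toy uniform bivariate density of 0.01 over [0,10] × [0,10] (my own example, not from the text above):

```python
def density(x, y):
    # toy uniform bivariate density over [0,10] x [0,10]
    return 0.01 if 0 <= x <= 10 and 0 <= y <= 10 else 0.0

dx = dy = 0.01
n = 100
# probability MASS over the small region [3,4] x [3,4]: a double Riemann sum
mass = sum(density(3 + i * dx, 3 + j * dy) * dx * dy
           for i in range(n) for j in range(n))
print(round(mass, 4))   # 0.01 -- density * delta-x * delta-y, summed up

# "fixing" x exactly at 7.02 means delta-x = 0: zero mass despite nonzero density
print(density(7.02, 3.3) * 0 * dy)   # 0.0
```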

# beta ^ rho i.e. correlation coeff #clarified

Update: I don’t have an intuitive feel for the definition of rho. In contrast, beta is intuitive, as the slope of the OLS fit.

Defining formulas are similar for beta and rho:

rho  = cov(A,B) / (sigma_A * sigma_B)
beta = cov(A,B) / (sigma_B * sigma_B) = cov(A,B) / variance_B, when regressing A on B

Suppose a high tech stock TT has high beta like 2.1 but low correlation with SPX (representing market return). If we regress TT monthly returns vs the SPX monthly returns, we see a cloud — poor fit i.e. low correlation coefficient. However, the slope of the fitted line through the cloud is steep i.e. high beta !

Another stock ( perhaps a boring utility stock ) has low beta i.e. almost horizontal (gentle slope) but well-fitted line, as it moves with SPX synchronously i.e. high correlation !

http://stats.stackexchange.com/questions/32464/how-does-the-correlation-coefficient-differ-from-regression-slope explains beta vs correlation. Both rho and beta measure the strength of relationship.

Rho is bounded between -1 and +1, so from the value you can get a feel. But rho doesn’t indicate how much (the magnitude) the dependent variable moves in response to a one-unit change in the independent variable.

Beta of 2 means a one-unit change in the SPX would “cause” 2 units of change in the stock. However, the rho value could be high (close to 1) or low (close to 0).
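A simulation sketch of the high-beta / low-rho case (all parameter values invented): compute both measures from their defining formulas on noisy returns with a steep true slope.

```python
import math
import random

random.seed(7)
n = 20_000
mkt = [random.gauss(0, 0.04) for _ in range(n)]        # "SPX" monthly returns
stk = [2.1 * m + random.gauss(0, 0.30) for m in mkt]   # steep slope + heavy noise

mean_m = sum(mkt) / n
mean_s = sum(stk) / n
cov = sum((m - mean_m) * (s - mean_s) for m, s in zip(mkt, stk)) / n
var_m = sum((m - mean_m) ** 2 for m in mkt) / n
var_s = sum((s - mean_s) ** 2 for s in stk) / n

beta = cov / var_m                     # slope of the OLS fit
rho = cov / math.sqrt(var_m * var_s)   # bounded in [-1, +1]
# expect beta near 2.1 (steep line) yet rho well below 1 (a poor-fit cloud)
print(f"beta={beta:.2f}  rho={rho:.2f}")
```

This is the TT-vs-SPX picture: the fitted line is steep (high beta) but the points form a cloud (low rho).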

# left skew~left side outliers~mean PULLED left

Label – math intuitive

[[FRM]] book has the most intuitive explanation for me – negative (or left) skew means outliers in the left region.

Now, intuitively, moving outliers further out won’t affect median at all, but pulls mean (i.e. the balance point) to the left. Therefore, compared to a symmetrical distribution, mean is now on the LEFT of median. With bad outliers, mean is pulled far to the left.

Intuitively, remember mean point is the point to balance the probability “mass”.

In finance, if we look at the signed returns we tend to find many negative outliers (far more than positive outliers). Therefore the distribution of returns shows a left skew.
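The pull-left effect can be sketched with made-up numbers: pushing two left-side outliers further out leaves the median alone but drags the mean left.

```python
from statistics import mean, median

symmetric = [-2, -1, -1, 0, 0, 0, 1, 1, 2]
print(mean(symmetric), median(symmetric))   # 0 0 -- balanced distribution

# two left outliers pushed far out: median untouched, mean PULLED left
left_skewed = [-50, -20, -1, 0, 0, 0, 1, 1, 2]
print(median(left_skewed))   # 0, same as before
print(mean(left_skewed))     # about -7.4, far to the LEFT of the median
```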

# BUY a (low) interest rate == Borrow at a lock-in rate

Q: What does “buying at 2% interest rate” mean?

It’s good to get an intuitive and memorable short explanation.

Rule — Buying a 2% interest rate means borrowing at 2%.

Rule — there’s always a repayment period.

Rule — the 2% is a fixed rate, not a floating rate. In a way, whenever you buy, you buy at a fixed price. You could buy the “floating stream” …. but let’s not digress.

Real, personal example — I “bought” my first mortgage at 1.18% for the first year, locking in a low rate before it went up.

# fwd price@beach^desert – intuitive

Not that easy to develop an intuitive understanding…

Q2.46 [[Heard on the street]]. Suppose the 2 properties both sell for $1m today. What about delivery in x months? Suppose the beach generates an expected (almost-guaranteed) steady income (rental or dividend) of $200k over this period. Suppose there’s negligible inflation over this (possibly short) period.

Paradox: you may feel that after x months the beach would have a spot price around $1m or higher, since everyone knows it can generate income.
%%A: there’s no assurance about it. It could be much lower. I feel this is counter-intuitive. There might be war, or bad weather, or a big change in supply/demand over x months. Our calculation here is based solely on the spot price now and the dividend rate, not on any speculation over price movements.

I guess the fair “indifferent” price both sides would accept is $800k, i.e. in x months, this amount would change hands.
– If the seller asks $900k forward, then the buyer would prefer spot delivery at $1m, since after paying $1m, she could receive $200k in dividends over x months, effectively paying $800k.
– If the buyer bids $750k forward, then the seller would prefer spot delivery.

What would increase fwd price?
* borrowing interest Cost. For a bond, this is not the interest earned on the bond
* storage Cost

What would decrease fwd price?
* interest accrual Income
* dividend Income
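The beach/desert numbers, as a sketch (zero interest rate and zero storage cost assumed; forward_price is a hypothetical helper, not a standard API):

```python
def forward_price(spot, income=0, carry_cost=0):
    """Fair forward = spot - income the buyer forgoes + cost of carry."""
    return spot - income + carry_cost

beach = forward_price(1_000_000, income=200_000)   # steady rental decreases fwd
desert = forward_price(1_000_000)                  # no income, no cost
print(beach)    # 800000 -- the "indifferent" forward price
print(desert)   # 1000000
```

A borrowing or storage cost would enter through carry_cost and push the forward up, matching the two lists above.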

# dummy’s PCP intro: replicating portf@expiry→pre-expiry #4IV

To use PCP in interview problem solving, we need to remember this important rule.

If you don’t want to analyze terminal values, and instead decide to analyze pre-expiry valuations, you may have difficulty.

The right way to derive and internalize PCP is to start with terminal payoff analysis. Identify the replicating portfolio pair, and apply the basic principle that

“If 2 portfolios have equal values at expiry, then at any time before expiry they must have equal values; otherwise there is arbitrage.”

Even though this is the simplest intro to a simple option pricing theory, it is not so straightforward!
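The terminal-payoff analysis behind PCP can be sketched directly — with zero interest rate, “call + K in cash” and “put + stock” match at every terminal price:

```python
K = 100  # strike; zero interest rate assumed for simplicity

def portfolio_a(s_t):
    # long call + K in cash
    return max(s_t - K, 0) + K

def portfolio_b(s_t):
    # long put + the stock
    return max(K - s_t, 0) + s_t

# identical terminal value at every terminal price -> equal value pre-expiry,
# i.e. C + K = P + S (PCP with zero rate)
for s_t in range(0, 301, 10):
    assert portfolio_a(s_t) == portfolio_b(s_t)
print("replicating portfolios match at expiry")
```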

# paradox – FX homework#thanks to Brett Zhang

label – math intuitive

Q7) An investor is long a USD put / JPY call struck at 110.00 with a notional of USD 100 million. The current spot rate is 95.00. The investor decides to sell the option to a dealer, a US-based bank, one day before maturity. What is the FX delta hedge the dealer must put on against this option?

Analysis: The dealer has the USD-put JPY-call. Suppose the dealer holds USD 100M. Let’s see if a 1-pip change will give the (desired) $0 effect.

| | at 95.00 | at 95.01, after the 1-pip change | pnl |
|---|---|---|---|
| value (in yen) of the option — same as value of a cash position | (110 − 95) × 100M = ¥1,500M | (110 − 95.01) × 100M = ¥1,499M | loss of ¥1M |
| value (in yen) of the USD cash | 95 × 100M = ¥9,500M | 95.01 × 100M = ¥9,501M | gain of ¥1M |
| value of the portfolio | | | 0 |

Therefore Answer a) seems to work well.

Next, look at it another way. The dealer has the USD-put JPY-call struck at JPYUSD = 0.0090909. Suppose the dealer is short 11,000M yen (same as long USD 115.789M). Let’s see if a 1-pip change will give the (desired) $0 effect.

| | at 95.00 i.e. JPYUSD = 0.010526 | at 95.01 i.e. JPYUSD = 0.0105252, after the 1-pip change | pnl |
|---|---|---|---|
| value (in USD) of the option — same as value of a cash position | (0.010526 − 0.009090) × 11,000M = $15.78947M (or ¥1,500M, same as the table above) | (0.0105252 − 0.009090) × 11,000M = $15.77729M (or ¥1,498.842M) | loss of $0.012187M |
| value (in USD) of the short 11,000M JPY position | −0.010526 × 11,000M = −$115.789M | −0.0105252 × 11,000M = −$115.777M | gain of $0.012187M (or ¥1.1578M) |
| value of the portfolio | | | 0 |

Therefore Answer b) seems to work well.

My explanation of the paradox – the deep ITM option on the last day acts like a cash position, but the position size differs depending on your perspective. To make things obvious, suppose the strike is set at 700 (rather than 110).

1) The USD-based dealer sees a (gigantic) ¥70,000M cash position;

# Pr(S_T > K | S_0 > K and r==0), intuitively

The original question — “Assuming S_0 > K and r = 0, denote C := time-0 value of a binary call. What happens to C as ttl -> 0 or ttl -> infinity. Is it below or above 0.5?”

C = Pr(S_T > K), since the discounting to PV is non-issue. So let’s check out this probability. Key is the GBM and the LN bell curve.

We know the bell curve gets more squashed [1] to 0 as ttl -> infinity. However, E S_T == S_0 at all times, i.e. average distance to 0 among the diffusing particles is always equal to S_0. See http://bigblog.tanbin.com/2013/12/gbm-with-zero-drift.html

[1] together with the median. Eventually, the median will be pushed below K. Concrete illustration — S_0 = $10 and K = $4. As TTL -> inf, the median of the LN bell curve will gradually drop until it is below K. When that happens, Pr(S_T > K) < 50%, and Pr(S_T > K) -> 0 as ttl -> infinity.

——–
ttl -> 0. The particles have no time to diffuse. LN bell curve is narrow and tall, so median and mean are very close and merge into one point when ttl -> 0. That means median = mean = S_0.

By definition of the median, Pr(S_T > median) := 0.5 so Pr(S_T > S_0) = 0.5 but K is below S_0, so Pr(S_T > K) is high. When the LN bell curve is a thin tower, Pr(S_T > K) -> 100%
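Both limits can be checked in closed form — under zero drift, Pr(S_T > K) = N(d2) with d2 = (ln(S0/K) − σ²T/2)/(σ√T); a sketch using the post’s S_0 = $10, K = $4 (σ = 0.3 is my own choice):

```python
from math import erf, log, sqrt

def norm_cdf(x):
    return 0.5 * (1 + erf(x / sqrt(2)))

def pr_finish_above(s0, k, sigma, ttl):
    # GBM with zero drift: Pr(S_T > K) = N(d2)
    d2 = (log(s0 / k) - 0.5 * sigma**2 * ttl) / (sigma * sqrt(ttl))
    return norm_cdf(d2)

s0, k, sigma = 10.0, 4.0, 0.3
print(pr_finish_above(s0, k, sigma, 0.01))    # near 1: thin tall bell, K far below S0
print(pr_finish_above(s0, k, sigma, 100))     # already below 0.5
print(pr_finish_above(s0, k, sigma, 10_000))  # near 0: median pushed far below K
```

The probability rises towards 100% as ttl -> 0 and sinks towards 0 as ttl -> infinity, matching both limits discussed above.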

# stoch Process^random Variable: !! same thing

I feel a “random walk” and “random variable” are sometimes treated as interchangeable concepts. Watch out. Fundamentally different!

If a variable follows a stoch process (i.e. a type of random walk) then its Future [2] value at any Future time has a Probability distribution. If this PD is normal, then mean and stdev will depend on (characteristics of) that process, but also depend on the distance in time from the last Observation/revelation.

Let’s look at those characteristics — In many simple models, the drift/volatility of the Process are assumed unvarying [3]. I’m not familiar with the more complicated, real-world models, but suffice to say the volatility of the Process is actually time-varying. It can even follow a stoch Process of its own.

Let’s look at the last Observation — an important point in the Process. Any uncertainty or randomness before that moment is irrelevant. The last Observation (with a value and its timestamp) is basically the diffusion-start or the random-walk-start. Recall Polya’s urn.

[2] Future is uncertain – probability. Statistics on the other hand is about past.
[3] and can be estimated using historical observations

Random walk isn’t always symmetrical — Suppose the random walk has an upward trend, then the PD at a given future time won’t be a nice bell centered around the last observation. Now let’s compare 2 important random walks — Brownian Motion (BM) vs GBM.
F) BM – If the process is BM i.e. Wiener Process,
** then the variable at a future time has a Normal distribution, whose stdev is proportional to sqrt(t)
** Important scenario for theoretical study, but how useful is this model in practice? Not sure.
G) GBM – If the process is GBM,
** then the variable at a future time has a Lognormal distribution
** this model is extremely important in practice.
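Point G can be made concrete with the closed-form lognormal facts — for GBM with zero drift the mean of S_t stays at S_0 while the median decays (gbm_zero_drift_stats is a hypothetical helper):

```python
from math import exp

def gbm_zero_drift_stats(s0, sigma, t):
    # log S_t ~ N(log s0 - sigma**2 * t / 2, sigma**2 * t)
    mean = s0                              # E S_t = S_0 for all t
    median = s0 * exp(-0.5 * sigma**2 * t) # decays towards 0
    return mean, median

s0, sigma = 1.0, 0.3
for t in (1, 10, 100):
    mean, median = gbm_zero_drift_stats(s0, sigma, t)
    print(t, mean, round(median, 4))
# the mean stays at 1.0 forever, yet the median sinks towards 0,
# so Pr(S_t < S_0) climbs towards 100%
```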

# GBM + zero drift

I see zero-drift GBM in multiple problems
– margrabe option
– stock price under zero interest rate
For simplicity, let’s assume X_0 = $1.

Given dX = σX dW … GBM with zero drift-rate.

Now denoting L := log X, we get dL = −½σ² dt + σ dW … a BM, not GBM — no L on the RHS. So L as a process is a BM with a linear (negative) drift, rather than exponential growth.

LogX_t ~ N( logX_0 − ½σ²t , σ²t )

E LogX_t = logX_0 − ½σ²t ….. [1]
=> E Log(X_t / X_0) = −½σ²t …. so expected log return is negative?

E X_t = X_0 …. X_t has a log-normal “squashed bell” distribution whose x-axis extends from 0 to +inf [3]. Look at the lower curve below.
Mean = 1.65 … a pivot here would balance the “distributed weights”.
Median = 1.0 … half the area-under-curve is on either side of the median, i.e. Pr(X_t < median) = 50%.

Therefore, even though E X_t = X_0 [2], as t goes to infinity, paradoxically Pr(X_t < X_0) goes to 100% and most of the area-under-curve gets squashed towards 0, i.e. X_t is likely to undershoot X_0.

The diffusion view — as t increases, more and more of the particles move towards 0, although their average distance from 0 (i.e. E X_t) is always X_0. Note the 2 curves below are NOT progressive.

The random walker view — as t increases, the walker is increasingly drawn towards 0, though the average distance from 0 is always X_0. In fact, we can think of all the particles as concentrated at the X_0 level at the “big bang” of diffusion start.

Even if t is not large, Pr(X_t < X_0) > 50%, as shown in the taller curve below.

[1] the horizontal center of the bell shape becomes more and more negative as t increases.
[2] this holds for any future time t. Eg: 1D from now, the GBM diffusion would have a distribution, which is depicted in the PDF graphs.
[3] note, like all lognormals, X_t can never go negative.

# differential ^ integral in Ito’s formula

See posts on Ito being the most precise possible prediction.
Given the dynamics of S as dS = mu dt + sigma dW, and given a (process following a) function f() of S, Ito’s rule says

df = (df/dS) dS + ½ (d²f/dS²) (dS)²

There are really 2 different meanings to d____ :
– The df/dS term is ordinary differentiation wrt S, treating S as just an ordinary variable in ordinary calculus.
– The dt term, if present, isn’t a differential. All the d__ appearing outside a division (like d_/d__) actually indicate an implicit integral.
** Specifically, the dS term (another integral term) contains a dW component. So this is even more “unusual” and “different” from the ordinary-calculus viewpoint.

# signal-noise ^ predictive formula – GBM

The future price of a bond is predictable. We use a prediction formula like bond_price(t) = ….

The future price of a stock, assumed GBM, can be described by a signal-noise formula S(t) = … This is not a prediction formula. Instead, this expression says the level of S at time t is predicted to be a non-random value plus a random variable (i.e. a N@T). In other words, S at time t is a noise superimposed on a signal. I would call it a signal-noise formula or SN formula.

How about the expectation of this random variable S? The expectation formula is a prediction formula.

# Pr(random pick from [0,1] is rational)==0

14 Sep 2013, 02:52

Hi Prof Fefferman,

I understand the measure of a set can be loosely described as the length (in a 1D space) of the interval. Given the set of all rational numbers between 0 and 1, its length is … 0, as you revealed very early on. I felt you were laying out and building up towards (a rather sophisticated definition of) probability.

Here’s my guess – Between 0 and 1, “someone” picks a number X. It is either a rational or irrational number. The chance of X being rational is 0, because the measure of the set of rational numbers (call it R1) is 0, and the measure of the irrational set (R2) is 1.
Therefore Pr(picking an irrational X | X is in [0,1]) = 100%.

How many members are in R1? Infinite, but R2 is infinitely larger. If only 1 electron in the solar system has a special spin, then the Pr(picking an electron with that special spin out of all solar-system electrons) would be close to 0. With R1 and R2, the odds are even lower — R2’s size is infinitely larger than R1’s, so Pr(picking a rational) = 0.

However, we humans only see all the millions and trillions of rational numbers between 0 and 1. We don’t see too many irrational numbers. Therefore I said “someone”, perhaps a Martian with some way to see the irrational numbers. This Martian would see few rational numbers sandwiched between far more irrational numbers, so few that they are barely visible. Given the irrationals dominate the rationals in such overwhelming proportion, the chance of picking a rational is 0.

# 0 probability ^ 0 density, 1st look

Given a simple uniform distribution over [0,10], we get a paradox that Pr(X = 3) = 0. http://mathinsight.org/probability_density_function_idea explains it, but here’s the way I see it.

Say I have a correctly programmed computer (a “noisegen”). Its output is a floating-point number, with as much precision as you want, say 99999 decimal points, perhaps using 1TB of memory to represent a single output number. Given this much precision, the chance of getting exactly 3.0 is virtually zero. In the limit, when we forget the computer and use our limitless brain instead, the precision can be infinite, and the chance of getting an exact 3.0 approaches zero.

http://mathinsight.org/probability_density_function_idea explains that when the delta_x region is infinitesimal and becomes dx, f(3.0) dx == 0 even though f(3.0) != 0. Our f(x) is the rate-of-growth of the cumulative distribution function F(x). f(3.0)dx = 0 has some meaning, but it doesn’t mean there’s a zero chance of getting a 3.0.
In fact, due to the continuous nature of this random variable, there’s zero chance of getting 5, or getting 0.6, or getting pi, but the pdf values at these points aren’t 0.

What’s the real meaning when we see the prob density func f() at the 3.0 point is f(3.0) = 0.1? Very loosely, it gives the likelihood of receiving a value around 3.0. For our uniform distribution, f(3.0) = f(2.170) = f(sqrt(2)) = 0.1, a constant.

The right way to use the pdf is Pr(X in [3,4] region) = integral over [3,4] of f(x)dx. We should never ask the pdf “what’s the probability of hitting this value”, but rather “what’s the prob of hitting this interval”. The nonsensical Pr(X = 3) is interpreted as “integral over [3,3] of f(x)dx”. Given upper bound = lower bound, this definite integral evaluates to zero.

As a footnote — however powerful, our computer is still unable to generate most irrational numbers. Most of them have no “representation” (unlike pi/5 or e/3 or sqrt(2)), so I don’t even know how to specify their position on the [0,1] interval. I feel the form-less irrational numbers far outnumber rational numbers. They are like the invisible things between 2 rational numbers. Sure, between any 2 rationals you can find another rational, but within the new “gap” there will be countless form-less irrationals… Pr(a picked number in [0,1] is rational) = 0.

# c#multicast event field=newsletterTitle=M:1 relationship

This is a revisit to the “BB” in the post on two “unrelated” categories of delegate —

BB) event field Backed by a multicast delegate instance

If a class defines 2 events (let’s assume non-static), we can think of them as 2 newsletter titles both owned by each class Instance. Say the Xbox newsletter has 33 subscribers, and the WeightWatcher newsletter has 11 subscribers. If we have 10 class instances, then there are 440 subscriptions.

In general, each newsletter title (i.e. event field) has
* exactly 1 owner/broadcaster i.e.
the class instance
* 0 or more subscribers, each a 2-pointer wrapper, as described in other posts on delegates.

You can say each newsletter title (i.e. each event field) defines a M:1 relationship.

Forgive me for repeating the obvious — don’t confuse an event Field vs an event Firing. The Xbox newsletter can have 12 issues (12 event firings) a year, but it’s all under one newsletter title.

# greeks on the move – intuitively

When learning option valuations and greeks, people often develop quick reflexes about what-ifs. Even a non-technical person can develop some of these intuitions. Because these are quick and often intuitive, this knowledge is often more practical and useful than the math details. Some of these observations are practically important while others are obscure.

Q3: How would all indicators of an ATM instrument move when the underlier rises/falls?
QQ: What if the instrument has very low/high volatility?
QQ: What if the instrument is far from/close to expiry?

Q5: How would all indicators of a deep OTM (deep ITM is rare) instrument move when the underlier moves towards/away from the strike?
QQ: What if the instrument has very low/high volatility?
QQ: What if the instrument is far from/close to expiry?

Q7: How would all indicators of a deep-OTM/ATM instrument move when sigma_imp rises/falls?
QQ: What if the instrument has very low/high volatility?
QQ: What if the instrument is far from/close to expiry?

Q9: How would all indicators of a deep-OTM/ATM instrument move when approaching maturity?
QQ: What if the instrument has very low/high volatility?

“Indicators” include all greeks and the option valuation. The “instrument” can be a European/American call/put/straddle.

# matrix multiplying – simple, memorable rules

Admittedly, matrix multiplication is a cleanly defined concept. However, it’s rather non-intuitive and non-visual to many people. There are quite a few “rules of thumb” about it, but many of them are hard to internalize due to their abstract nature.
They are not intuitive enough to “take root” in our mind. I find it effective to focus on a few simple, intuitive rules and try to internalize just 1 at a time.

Rule — a 2×9 * 9×1 multiplication is possible because the two “inside dimensions” match (out of the 4 numbers).

Rule — in many multiplication scenarios, you can divide-and-conquer the computation process BY-COLUMN — a vague slogan to some students. It means “work out the output matrix column by column”. It turns out that you can simply split a 5-column RHS matrix into exactly 5 columnar matrices. Columnar 2 (in the RHS matrix) is solely responsible for Column 2 in the output matrix. All other RHS columns don’t matter. Also, RHS Column 2 doesn’t affect any other output columns.

You may be tempted to try “by-row”. I don’t know if it is valid, but it’s not widely used.

By-column is useful when you represent 5 linear equations in 5 unknowns. In this case, the RHS matrix comprises just one column.

Rule — using dimension 3 as an example, (3×3 square matrix) * (one-column matrix) = (another one-column matrix). Very common pattern.

# option valuations – a few more intuitions

It’s quite useful to develop a feel for how much an option valuation moves when the underlier spot doubles or halves. Also, what if implied vol doubles or halves? What if TTL (time to expiration) halves?

For OTM / ITM / any option, annualized i-vol multiplied by the square root of TTL is the “real” vol. For example, if you double the vol and halve the TTL twice, valuation remains unchanged.

If you compare a call vs a put with identical strike/expiry (E or A style), the ITM instrument and the OTM instrument have identical time value. Their valuations differ by exactly the intrinsic value of the ITM instrument. (See http://www.cboe.com/LearnCenter/OptionCalculator.aspx.) — Consistent with the European option’s PCP, but to my surprise, American style also shows this exact relationship.
I guess it’s because the put valuation is computed from a synthetic put (http://25yearsofprogramming.com/blog/20070412.htm).

For ATM options, theoretical option valuation is proportional to the vol and to the square root of TTL, i.e. time-to-live. http://www.cboe.com/LearnCenter/OptionCalculator.aspx and other calculators show that
– when you change the vol number, valuation changes linearly
– when you double TTL while holding vol constant, valuation grows by a factor of sqrt(2), i.e. square-root growth.

For OTM options? Non-linear.
For ITM options, it’s approximately the OTM valuation + intrinsic value.

# intuitive – quick reflex with option-WRITING, again

See also http://bigblog.tanbin.com/2011/04/get-intuitive-with-put-option.html.

Imprecisely —
+ writing a call, I guarantee to “give” IBM when my counter-party “calls away” the asset.
+ writing a put, I guarantee to “take in” the dump, unloaded by the counter-party. A put holder has the right to “unload” the asset (IBM share) at a fixed price — a high price perhaps [1].

An in-out intuitive reflex —
– If I write a call, I must give OUT assets when the option holder calls IN the asset;
– If I write a put, I must take IN when the option holder “throws OUT” the junk.

[1] in reality, put buyers usually buy puts at low strikes (OTM), therefore cheaper insurance.

# underlying price is equally likely to +25% or -20%

See also P402 [[CFA textbook on stats]].

http://www.hoadley.net/options/bs.htm says the Black-Scholes “model is based on a normal distribution of underlying asset returns which is the same thing as saying that the underlying asset prices themselves are log-normally distributed.” Actually, many non-BS models also assume the same, but my focus today is the 2nd part of the sentence.

At expiration, the asset has exactly one price as reported on WSJ. However, if we simulate 1000 experiments, we get 1000 (non-unique) expiration prices. If we plot them in a __histogram__, we get a kind of bell curve.
But in Black-Scholes’ (and other people’s) simulations, the curve will resemble a log-normal bell. Reason? ….. Well, they tweak their simulator according to their model. They assume the underlying price is a random walker taking many small steps, whose probability of reaching 125% equals the probability of dropping to 80% at each step. (But remember the walks are tiny steps, so 80% is huge;)

Now the reason behind the paradoxical numbers — log(new_px/old_px) is normally distributed, so log(1.25) ≈ 0.097 and log(0.8) ≈ −0.097 are equally likely.

Now if we do 1000 experiments and compute the log(price_relative), we get another histogram — a normal (NOT log-normal) curve. Note price-relative is the ratio of new_Price / old_Price over a holding period.

Here’s another experiment to illustrate log-normal. Imagine a volatile stock’s (say SUN) price is now $64. How about after a year? Black-Scholes basically says it’s

equally likely to double or half.

Double to $128 or half to $32. log2(new_Price / old_Price) would be 1 or -1 with equal likelihood. Intuitively,

log (new_Price / old_Price) is normally distributed.

Now consider prices after Year1, Year2, Year3… log2(S2/currentPx) = log2(S2/S1 * S1/currentPx) = log2(S2/S1) + log2(S1/currentPx). In English this says the base-2 log of the overall price-relative is the sum of the logs of the annual price-relatives. Among the 3 possible outcomes below, the $256 likelihood equals the $16 likelihood, and is half the $64 likelihood.

double-double -> $256
double-half -> $64 unchanged
half-double -> $64 unchanged
half-half -> $16

This stock can also appreciate/drop to other values besides $256, $64, $16, but IF the $256 likelihood is 1.71%, then so is the $16 likelihood, and the $64 likelihood would be 3.42%. We assume no other price “path” will end up at $64 — an unsound assumption but ok for now.
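The 4 paths can be enumerated in a quick sketch (a toy 2-period tree with assumed 50/50 probabilities per year, not the 1.71% figures):

```python
from itertools import product

start = 64.0
likelihood = {}
# each year the price either doubles or halves, with probability 1/2 each
for path in product([2.0, 0.5], repeat=2):
    px = start
    for factor in path:
        px *= factor
    likelihood[px] = likelihood.get(px, 0.0) + 0.5 * 0.5  # each path has prob 1/4

# $256 and $16 are equally likely; $64 is twice as likely (2 paths lead there)
print(likelihood)  # {256.0: 0.25, 64.0: 0.5, 16.0: 0.25}
```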

Since each annual log(price-relative) such as log(S2/S1) is normally distributed (and independent of the others), so is the sum-of-log. Therefore log(S2/currentPx) is normally distributed.

log(price-relative) is normal.
log(cumulative price-relative) is normal for any number of intervals. For example,

Price_After_2years/current_Price is equally likely to double or half.
Price_After_2years/current_Price is equally likely to grow to 125% or drop to 80%.

More realistic numbers — when we shrink the interval to 1 day, the expected price relative looks more like

“equally likely to hit 101.0101% or drop to 99%”
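A quick Monte-Carlo sketch of the tiny-step random walk (step size, sample count and seed are my own choices): the log price-relatives come out symmetric around 0 (the normal bell), while the raw terminal prices are right-skewed (the lognormal bell).

```python
import math
import random

random.seed(42)
start_px = 64.0
n_steps = 252                    # one "daily" step per trading day (assumed)
up_step = math.log(1 / 0.99)     # each day: equally likely to hit 101.0101% or drop to 99%

def terminal_price():
    log_px = math.log(start_px)
    for _ in range(n_steps):
        log_px += random.choice((1, -1)) * up_step
    return math.exp(log_px)

prices = sorted(terminal_price() for _ in range(10_000))
log_rels = [math.log(p / start_px) for p in prices]

mean_log = sum(log_rels) / len(log_rels)
mean_px = sum(prices) / len(prices)
median_px = prices[len(prices) // 2]

# log price-relatives are symmetric around ~0 (normal histogram) ...
print(abs(mean_log) < 0.02)   # True
# ... while raw prices are right-skewed: mean exceeds median (lognormal histogram)
print(mean_px > median_px)    # True
```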

# how bell+hockey stick move when vol drops or expiry nears

Q: how do the bell and the hockey stick move when vol drops or expiration approaches?

Not complicated, but we need to develop quick intuitions about these graphs. I feel these graphs are the keys to the mathematics.

Anyway, here are my answers —

The lognormal bell [1] tightens as TTL [2] drops, assuming
– constant annualized vol, therefore
– falling stdev (i.e. vol scaled down for the shrinking TTL).

“Lognormal bell tightening” indicates lower stdev. stdev is basically the “scaled-down” vol.

Also, when TTL drops, the PnL graph drops towards the hockey stick. The hockey stick means 0 vol OR 0 TTL.

[1] the bell shape is skewed because lognormal isn’t symmetrical.
[2] TimeToLive, aka time to maturity or time to expiration.
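A minimal Black-Scholes sketch (toy numbers of my own; rate assumed 0 for simplicity) confirms both intuitions: ATM value scales like vol·√TTL, and at 0 TTL the value curve is exactly the hockey stick.

```python
import math

def norm_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def bs_call(spot, strike, vol, ttl):
    # rate assumed 0; at 0 vol or 0 TTL only intrinsic value remains
    if ttl <= 0 or vol <= 0:
        return max(spot - strike, 0.0)          # the hockey stick
    d1 = (math.log(spot / strike) + vol**2 / 2 * ttl) / (vol * math.sqrt(ttl))
    d2 = d1 - vol * math.sqrt(ttl)
    return spot * norm_cdf(d1) - strike * norm_cdf(d2)

atm = lambda ttl: bs_call(64, 64, 0.30, ttl)
# doubling TTL grows ATM value by ~sqrt(2), not 2x
print(round(atm(2.0) / atm(1.0), 2))            # 1.41
# doubling vol (roughly) doubles ATM value: near-linear in vol
print(round(bs_call(64, 64, 0.60, 1.0) / atm(1.0), 2))
# at 0 TTL the ATM option is worth exactly its intrinsic value
print(atm(0.0))                                 # 0.0
```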

# option rule – delta converges to 50/50 with increasing vol

Better to develop this reflex — across all maturities, every ITM/OTM option’s delta converges towards 50% when perceived and implied volatility intensifies. Option premium rises too.

50 delta means ATM.

50 delta also means no-prediction about my option finishing ITM or OTM. When vol spikes, it becomes harder for “gamblers” to assess any given strike — will it finish ITM or OTM?

Let’s use a put for illustration. When the underlier becomes very volatile,
– a previously deep OTM put (hopeless) suddenly looks like useful insurance protection, e.g. an ultra-low-strike put.
– a previously deep ITM put (sure-win) suddenly looks “unsafe” — it may finish worthless.

Rule) At expiry, underlier volatility doesn’t bother us and is treated as 0
Rule) At expiry, option delta == either 0 or ±1, never anything else. Fully diverged.
Rule) In general, 0 implied volatility means all options’ deltas == either 0 or ±100%
Rule) Similarly, low implied volatility means all options’ deltas are close to the 2 extremes.
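These rules can be sanity-checked with Black-Scholes call deltas. A sketch, assuming rate 0 and made-up spot/strike numbers:

```python
import math

def call_delta(spot, strike, vol, ttl):
    # Black-Scholes call delta = N(d1); rate assumed 0
    d1 = (math.log(spot / strike) + vol**2 / 2 * ttl) / (vol * math.sqrt(ttl))
    return 0.5 * (1 + math.erf(d1 / math.sqrt(2)))

vols = (0.05, 0.15, 0.30, 0.60)
# OTM call (strike 120 vs spot 100): delta starts pinned near 0, climbs toward 50
print([round(call_delta(100, 120, v, 0.25), 2) for v in vols])
# ITM call (strike 80): delta starts pinned near 100, falls toward 50
print([round(call_delta(100, 80, v, 0.25), 2) for v in vols])
```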

# low delta always means OTM, intuitively

(Low delta means low absolute magnitude. The sign of delta is a separate feature.)

The more OTM, the less sensitive to underlier moves — low delta.

The more ITM, the more stock-like — high delta. This holds for both calls and puts.

For a call holder, the most stock-like is a delta of 100%
For a put holder, the most stock-like is a delta of -100% i.e. a short stock

For a put Writer, the most stock-like is a delta of +100% i.e. long stock. This put is so deep ITM it will certainly be exercised (unload/put to the Writer), so the put writer effectively owns the underlier.

On the FX vol smile curve, people quote prices at low-strike and high-strike points, both using low deltas like 25-delta or 10-delta. (The 50-delta point is ATM).

– On the low-strike side, they use an OTM Put. Eg a put on USD/JPY struck@55. Such a put is clearly OTM since as of today the option holder will not “unload” her USD (the silver) at a dirt-cheap price of 55 yen.

– On the high-strike side, they use an OTM Call. Eg a call on USD/JPY @140. Such a call is clearly OTM, since as of today the option holder will not buy (“call in”) USD (the silver) at a sky-high price of 140 yen.
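To make the low-delta⇔OTM reflex concrete, a sketch of Black-Scholes put deltas across strikes (rate assumed 0, made-up USD/JPY-style levels): low strikes are OTM with delta near 0, high strikes are ITM with delta near -1, ATM sits near -50.

```python
import math

def put_delta(spot, strike, vol, ttl):
    # Black-Scholes put delta = N(d1) - 1, from -1 (deep ITM) to 0 (deep OTM); rate assumed 0
    d1 = (math.log(spot / strike) + vol**2 / 2 * ttl) / (vol * math.sqrt(ttl))
    return 0.5 * (1 + math.erf(d1 / math.sqrt(2))) - 1

spot, vol, ttl = 90.0, 0.10, 0.5    # hypothetical USD/JPY levels
for strike in (55, 80, 90, 100, 140):
    tag = "OTM" if strike < spot else ("ATM" if strike == spot else "ITM")
    print(f"strike {strike:>3} ({tag}): put delta {put_delta(spot, strike, vol, ttl):.2f}")
```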

# repos intuitively: resembles a pawn shop

Borrower (“seller”) needs quick cash, so she deposits her grandma’s necklace + IBM shares + 30Y T bonds .. with the lender, i.e. the buyer of the necklace. Unlike pawn shops, the 2 sides agree in advance to return the necklace “tomorrow”.

Main benefit to borrowers — repo rate is cheaper than borrowing from a bank.

haircut – the (money) lender often demands a haircut. Instead of lending $100m cash for a$100m collateral, he only hands out \$99m.
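The haircut arithmetic in a quick sketch (all numbers made up; ACT/360 is the usual money-market day count, assumed here):

```python
collateral_value = 100_000_000   # market value of the pledged bonds
haircut_pct = 1                  # lender demands a 1% haircut
repo_rate = 0.05                 # 5% annualized repo rate (assumed)
days = 1                         # overnight repo

loan = collateral_value * (100 - haircut_pct) // 100   # cash actually handed out
interest = loan * repo_rate * days / 360               # ACT/360 accrual
repurchase_price = loan + interest                     # borrower repays this "tomorrow"

print(loan)                # 99000000
print(round(interest, 2))  # 13750.0
```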

requester – is usually the borrower. She needs money so she must compromise and accept the lender’s demand.

trader – is usually the borrower. Often a buy-side, who buys the security and needs money to pay for it. (The repo seller could be considered a trader too.)

Repo maturity is typically 1 day to 3M. Strictly a money-market instrument.

Common collateral — government securities are the mainstay for most repos, along with agency securities, mortgage-backed securities, and other money market instruments.

For every repo, someone has a “reverse-repo” position. In every repo deal, there’s a borrower and a lender; there’s a repo position on one side and a reverse-repo position on the other side of the fence.

Is repo part of the credit business or the rates business? Depends on the underlier. Part of the repo business is credit. Compare an ECN – it can trade both Treasuries and credit bonds.

UChicago Jeff’s assignment question is the most detailed numerical repo illustration I know of. Another good intro is http://thismatter.com/money/bonds/types/money-market-instruments/repos.htm