hazard rate – online resources

is decent —

Average failure rate is the number of units that fail during an interval, divided by the number of units alive at the beginning of the interval. In the limit of ever smaller time intervals, the average failure rate becomes the rate of failure in the next instant for those units surviving to time t, known as the instantaneous failure rate.
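A minimal sketch of the definition above, with the extra (assumed here) convention of also dividing by the interval's length to express the result per year; all numbers are hypothetical:

```python
def avg_failure_rate(failures, alive_at_start, interval_years):
    # failures per surviving unit, per year, over the interval
    return failures / alive_at_start / interval_years

# hypothetical: 1000 units alive at the start, 20 fail during a 0.5-year interval
r = avg_failure_rate(20, 1000, 0.5)
print(r)  # 0.04 failures per unit-year
```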

is more mathematical.

http://www.omdec.com/articles/reliability/TimeToFailure.html has short list of jargon


ccy swap^ IRS^ outright FX fwd

Ccy swap vs IRS

– Similar – exchange interest payments

– diff — Ccy swap requires a final exchange of principal. The FX rate for that exchange is set on the deal date

Ccy swap vs outright fx fwd?

– diff — outright involves no interest payments

– similar — the far-date principal exchange has an FX rate. Rate is set on deal date

– diff — for the ccy swap, the far-leg rate is the spot rate as of the deal date; in the outright deal, it is the forward rate as of the deal date
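A sketch of where the outright's far-date rate might come from, assuming simple-interest covered interest parity (the function name and all numbers are hypothetical; real FX forwards use market-convention day counts and curves):

```python
def fx_forward(spot, r_domestic, r_foreign, t_years):
    # forward rate (domestic per foreign) under simple-interest
    # covered interest parity -- an illustrative assumption only
    return spot * (1 + r_domestic * t_years) / (1 + r_foreign * t_years)

# hypothetical deal: spot 1.30, domestic rate 3%, foreign rate 1%, 5-year far date
fwd = fx_forward(1.30, 0.03, 0.01, 5.0)
print(round(fwd, 4))  # the far-leg rate an outright forward would use
```

By contrast, the ccy swap's final principal exchange would simply reuse the 1.30 spot fixed on the deal date.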

Ccy swap vs FX swap? Less comparable. Quite a confusing comparison.

hazard rate – my first lesson

Imagine credit default is caused only by a natural disaster (say hurricane or tsunami). For a brief duration ΔT (measured in years), we assume the chance of disaster hitting is λ*ΔT, with a constant[1] λ.

Pr(no hit during ANY 5-year period)
= Pr(surviving 5 years)
= Pr(no default for the next 5 years from now)
= Pr(T > 5) = exp(-5λ), denoted V(5) on P522 [[Hull]]

where T := # of years from now to the next hit.

This is an exponential distribution. This λ is called the hazard rate, to be estimated from market data. Therefore it has a term structure, just like the term structure of vol.
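A quick sanity check of Pr(T > 5) = exp(-5λ) by simulation, using a hypothetical λ of 3%:

```python
import math
import random

random.seed(42)
lam = 0.03       # hypothetical constant hazard rate, per year
n = 200_000

# T ~ exponential(lam); count the simulated paths that survive 5 years
survived = sum(1 for _ in range(n) if random.expovariate(lam) > 5.0)
mc = survived / n
closed_form = math.exp(-5 * lam)   # V(5) in Hull's notation

print(round(mc, 3), round(closed_form, 3))  # the two should agree closely
```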

More generally, λ could be assumed a function of t, i.e. a time-varying but slow-moving variable, just like the instantaneous vol. In a noisegen, λ and vol function as configurable parameters.

In http://www.financial-risk-manager.com/risks/credit/edf.html, λ is denoted “h”, which is assumed constant over each 12-month interval.
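Under that piecewise-constant assumption, multi-year survival just multiplies the per-interval survival factors; the h values below are hypothetical:

```python
import math

# hypothetical hazard rates "h", one per 12-month interval (a term structure)
h = [0.010, 0.015, 0.022]

# surviving 3 years = surviving each year in turn:
# exp(-h1) * exp(-h2) * exp(-h3) = exp(-(h1 + h2 + h3))
surv = math.exp(-sum(h))
print(round(surv, 4))  # 0.9541
```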

“Hazard rate” is the standard terminology, and also known as “default intensity” or “failure rate”.

I feel hazard rate is perhaps the #1 or among top 3 applications of
conditional probability,
conditional distribution,
conditional expectation

So the big effort of studying conditional probability largely pays off in understanding credit risk.

conditional independence, learning notes

Remember — Discrete is always easier to understand …

Q: does conditional independence imply unconditional independence?

Simple example – among guys, weight is unrelated to income (conditional independence); but remove the condition and, across all genders, weight has a bearing on income.

However, in this math problem the terminology can be confusing – Given W = w, the 2 random variables X and Y are conditionally iid exp(w) distributed. W itself follows a G(alpha, beta) distribution. Are X and Y unconditionally independent?

I feel the key is the symbol “w”, which is neither a variable nor a number, but rather a configurable parameter. In the noisegens, this w is a constant, like 39.8. However, as operator of the noisegen, we could set this parameter and potentially modify the distribution(s).

Actually, no. Even though X and Y are independent given W = w for every value of w, they are not unconditionally independent here, because W itself is random and shared. A large observed X hints that the realized w was small (the conditional mean is 1/w), which in turn makes a large Y more likely; once W is integrated out, X and Y are positively correlated. Conditional independence only carries over to unconditional independence in the degenerate case where W is a known constant.
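A quick simulation of the Gamma-mixed exponential setup (parameters hypothetical) shows the unconditional covariance of X and Y is clearly positive, i.e. they are not unconditionally independent:

```python
import random

random.seed(1)
alpha, scale = 6.0, 1.0   # hypothetical parameters of the mixing variable W
n = 200_000

xs, ys = [], []
for _ in range(n):
    w = random.gammavariate(alpha, scale)  # shared random rate parameter
    xs.append(random.expovariate(w))       # X | W=w ~ exp(w)
    ys.append(random.expovariate(w))       # Y | W=w ~ exp(w), indep. of X given w

mx, my = sum(xs) / n, sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
print(cov > 0)   # True: X and Y are unconditionally dependent
```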

intuitive – stdev(A+B) when independent ^ 100% correlated

(see also post on linear combo of random variables…)

Develop quick intuitions — Quiz: consider A + B under independence assumption and then under 100% correlation assumption. When is variance additive, and when is stdev additive?

(First, recognize A+B is not a regular variable like “A=3, B=2, so A+B=5”. No, A and B are random variables, from 2 noisegens. A+B is a derived random variable that’s controlled from the same 2 noisegens.)

If you can’t remember which is which, remember independence means good diversification [intuitive]: lower dispersion, lower spread around the expected return, a thinner bell, lower variance and stdev.

Conversely, remember strong correlation means poor diversification [intuitive] , magnified variance/stdev.

–Case: 100% correlated. Then A+B is exactly a multiple of A [intuitive], like 2*A or 2.4*A. If you think of a normal (bell) or uniform (rectangle) distribution, you realize 2.4*A is magnified horizontally by a factor of 2.4, so the width of the distribution increases by a factor of 2.4, and so does the stdev. In conclusion, stdev is additive.

–Case: independent
“variance is additive” is applicable in the multi-period iid context.

simple rule — variance of independent[1] A + B is the sum of the variances.

[1] 0 correlation is sufficient

–Case: generalized — http://www.stat.ucla.edu/~hqxu/stat105/pdf/ch01.pdf P27 Eq5-36 is a good generalized formula.

V(A+B) = V(A) + V(B) + 2 Cov(A,B)  …. easiest form

Cov(A,B) := ρ stdev(A) stdev(B), so the cross term is 2ρ stdev(A) stdev(B)

V( 7A ) = 7*7 V(A) = 49 V(A)
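A minimal simulation sketch of both cases, taking stdev(A)=1 and stdev(B)=2 (all parameters hypothetical):

```python
import math
import random

random.seed(7)
n = 200_000

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

a = [random.gauss(0, 1) for _ in range(n)]   # stdev(A) = 1
b = [random.gauss(0, 2) for _ in range(n)]   # stdev(B) = 2, independent of A

# independent case: variances add (1 + 4 = 5); stdevs do not (sqrt(5) != 3)
v_indep = var([x + y for x, y in zip(a, b)])

# 100% correlated case: take B = 2*A, so A+B = 3*A and stdevs add (1 + 2 = 3)
sd_corr = math.sqrt(var([3 * x for x in a]))

print(round(v_indep, 1), round(sd_corr, 1))
```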

top 5 expertise I could grow #teach?

The most sought-after Expertise I could develop.

#1 personal investment – FX/option, HY and unit trust investment
#2 tech interviews by top employers, including brain teasers
#3 Wall St techie work culture
#4 financial domain knowledge, appealing to pure techies

However, some of these are hard to turn into a teaching career. So which domain can I teach for a living, perhaps with a PhD?
  1. programming
  2. data science, combining finance with…
  3. comp science
  4. fin math

notation tips for probability puzzles

* There are many alternative notations for “probability of A and B”. I prefer p(A . B) — good for hand writing and computers

* There are many alternative notations for “probability of not-A”. I prefer p(A’ ) — good for computers. How about p(!A)? Alien to many mathematicians.

* Favor shortest abbreviations for event names. For example, probability of “getting two 6’s in 2 consecutive dice tosses” should NOT be written as p(66), but as p(K) by denoting the event as K.
* Avoid numbers in event short names — things like 2 Pr(3) are very ambiguous. If feasible, avoid number subscripts too.
* Favor single letters including greek letters and Chinese characters. If feasible, avoid any subscript.

Venn diagram – good at showing mutual exclusion between 2 events.
tree diagram – good at showing cond prob between 2 events
tree diagram – good at showing independence between 2 events

Note — mutual exclusion does NOT imply independence. In fact, mutually exclusive events (each with nonzero probability) are strongly dependent: if one occurs, the other cannot.

Independence is intuitive most of the time but can be non-intuitive when you are deep into a tough puzzle.

Independence can be counter-intuitive and is not captured in tree diagrams. Let h denote “salary above 100,000”; let f denote “female”. The 2 events happen to be independent in one firm but not in another. In general, we have to assume they are not independent.

Tossing a coin in the morning vs afternoon: the toss should be independent of the timing, but actual observations may not prove it.

prob integral transform (percentile), intuitively

I find the PIT concept unintuitive …Here’s some learning notes Based on http://www.quora.com/What-is-an-intuitive-explanation-of-the-Probability-Integral-Transform-aka-Universality-of-the-Uniform answer by William Chen.

Let’s say that we just took a midterm (or brainbench or IKM) and the test scores are distributed according to some weird distribution. Collect all the percentile numbers (each between 1 and 100). Based on PIT, these numbers are invariably uniformly distributed. In other words, the 100 “bins” would each have exactly the same count! The 83rd percentile students would tell you

“82% of the students scored below us, and 17% of the students scored above us, and we are exactly 1% of the batch”

Treat the bins as histogram bars… equal bars … uniform pdf.

The CDF is like the percentile function, which accepts a score and returns the percentile, a real number between 0 and 1.00.
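A minimal sketch of PIT: draw scores from a “weird” distribution (exponential here, a hypothetical choice), push each score through its own CDF, and check the results look Uniform(0,1):

```python
import math
import random

random.seed(3)
n = 100_000
rate = 0.8   # hypothetical "weird" score distribution: exponential(rate)

scores = [random.expovariate(rate) for _ in range(n)]

# percentile transform: push each score through its own CDF F(x) = 1 - exp(-rate*x)
u = [1 - math.exp(-rate * x) for x in scores]

mean_u = sum(u) / n
var_u = sum((v - mean_u) ** 2 for v in u) / n
print(round(mean_u, 2), round(var_u, 3))   # Uniform(0,1) has mean 0.5, var 1/12
```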

quantile (+ quartile + percentile), briefly

http://en.wikipedia.org/wiki/Quantile_function is decent.

For a concrete example of quaNtile, I like the quaRtile concept. Wikipedia shows there are 3 quartile values q1, q2 and q3. On the pdf graph (usually bell-shaped, since both ends must show tails), these 3 quartile values are like 3 knives cutting the probability mass “area-under-curve” into 4 equal slices consisting of 2 tails and 2 bodies.

The quantile function is the inverse of the CDF. Standard notation —

F(x) is the CDF, strictly increasing from 0 to 1.
F⁻¹() is the inverse function, whose domain is (0,1).
F⁻¹(0.25) = q1, assuming a one-to-one mapping.

http://www.quora.com/What-is-an-intuitive-explanation-of-the-Probability-Integral-Transform-aka-Universality-of-the-Uniform explains in plain English that the percentile function is a simplified, discrete version of our quantile function (or perhaps of its inverse). The CDF is like a robot: you tell him your score, and he gives you a percentage, like “94% of test takers scored below you”.

Conversely, the quantile function is another robot: you tell her a percentage like 25%, and she gives you the score, as in “25% of the test takers scored below 362 marks”.

Obvious assumption — one to one mapping, or equivalently, strongly increasing CDF.
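A concrete sketch with the exponential(1) distribution, whose CDF inverts in closed form (the choice of distribution is just for illustration):

```python
import math

# concrete example: exponential(1), whose CDF is F(x) = 1 - exp(-x)
def cdf(x):
    return 1 - math.exp(-x)

# quantile function = inverse CDF: F^-1(p) = -ln(1 - p)
def quantile(p):
    return -math.log(1 - p)

q1, q2, q3 = quantile(0.25), quantile(0.5), quantile(0.75)
print(round(q1, 4), round(q2, 4), round(q3, 4))  # 0.2877 0.6931 1.3863

# one-to-one mapping: F undoes F^-1
assert abs(cdf(quantile(0.25)) - 0.25) < 1e-12
```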