Fwd: constant study (over the last 5 years) keeps the brain young

Learning a tech is not 100% all about getting the job done or getting

better jobs, even though that's about 99% of it, admittedly.

A few colleagues (a tiny minority) at my various jobs seem to enjoy

the learning process, even proprietary technologies with low market

value. Learning tech can be a joy. In such a context, or such a geek,

don't mention “saving time”.

As mentioned in the title + first email, I feel our brain is like

learning machine, and like a muscle. (I guess chess is also a kind of

brain exercise…) Any such exercise is never a waste of time.

Some people make a decision to learn something (Spanish? Wu-bi Chinese

input? musical instrument) even when they are as busy as we are. I

guess Learning a technology is sometimes like that.

However, I (grudgingly) agree it's not so fun learning, say, java

struct, for years and then see it falling out of favor. I carefully

pick the low-churn technologies, like you would pick watermelons at

the supermarket:) See my earlier email


python – some innovative features

I’m relatively familiar with perl, java, c++, c# and php, though some of them I didn’t use for a long time.

IMO, these python features are kind of unique, though other unknown languages may offer them.

* decorators
* list comp
* more hooks to hook into object creation. More flexible and richer controls
* methods and fields are both class attributes. I think fundamentally they are treated similarly.

intuitive – E[X*X] always exceeds E[X]*E[X], 1st look

This applies to any rvar.

We know E[X*X] – E[X]E[X] is simply the variance of X, which is always positive. This is non-intuitive to me though. (How about a discrete uniform?)

Suppose we modify the population (or the noisegen) while holding the mean constant. Visually, the pdf or histogram flats out a bit. (Remember area under pdf must always always = 1.0). E[X*X] would increase, but E[X]E[X] stays unchanged….

Now suppose we have a population. Without loss of generality, suppose E[X] = 1.2. We shrink the pdf/histogram to a single point at 1.2. This shrunk population obviously have E[X*X] = E[X]E[X]. Now we use the previous procedure to “flatten out” the pdf to the original. Now clearly E[X*X] increases beyond 1.44 while E[X]E[X] stays at 1.44…

const hazard rate, with graph

label – intuitive, mathStat


Q1: Intuitively, how is const hazard rate different from constant density i.e. uniform distro?


It’s good to first get a clear (hopefully intuitive) grasp of constant hazard rate before we talk about the general hazard rate. I feel a common usage of hazard rate is in the distribution of lifespan i.e. time-to-failure (TTF).


Eg: run 999999 experiments (what experiment? LG unimportant) and plot histogram of the lifespan of ….. Intuitively, you won’t see a bunch of bars of equal height – no uniform distro!

Eg: 10% of the remaining (poisonous!) mercury evaporates each year, so we can plot histogram of lifespan of mercury molecules…

Eg: Hurricane hit on houses (or bond issuers). 10% of the remaining shanties get destroyed each year…

Eg: 10% of the remaining bonds in a pool of bonds default each year. Histogram of lifespan ~= pdf graph…



If 10% of the survivors fail each year exactly, there’s not much randomness here:) but let’s say we have only one shanty named S3, and each year there’s a 10% chance of hazard (like Hurricane). The TTF would be a random variable, complete with its own pdf, which (for constant hazard rate) is the exponential distribution. As to the continuous case, imagine that each second there’s a 0.0000003% chance of hazard i.e. 10% per year spread out to the seconds…


I feel there are 2 views in terms of noisgen. You can say the same noisegen runs once a year, or you can say for that one shanty (or bond) we own, at time of observation, the noisegen runs once only and generates a single output representing S3’s TTF, 0 < TTF < +inf.


How does the eλt term come about? Take mercury for example, starting with 1 kilogram of mercury, how much is left after t years? Taking t = 3, it’s (1-10%)^3. In other words, cumulative probability of failure = 1- (1-10%)^3. Now divide each year into n intervals. Pr(TTF < t) = 1- (1- 10%/n) ^ n*t. As n goes to infinity, Pr(TTF < t years) = 1- e– 0.1t  i.e. the exponential distribution.


(1 – 0.1/n)n approaches e– 0.1     as n goes to infinity.

This is strikingly similar to 10%/year continuous compounding


(1 + 0.1/n)n approaches e+ 0.1     as n goes to infinity.


A1: Take the shanty case. Each year, the same number of shanties collapse — uniform density, but as the survivor population shrinks, the chance of failure becomes very high.


copula – 2 contexts

http://www.stat.ubc.ca/lib/FCKuserfiles/file/huacopula.pdf is  the best so far. But I feel all the texts seem to skip some essential clarification. We often have some knowledge about the marginal distributions of 2 rvars. We often have calibrated models for each. But how do we model the dependency? If we have either a copula or a joint CDF, then we can derive the other. I there are 2 distinct contexts — A) known CDF -> copula, or B) propose copula -> CDF


–Context A: known joint CDF

I feel this is not a practical context but an academic context, but students need to build this theoretical foundation.


Given 2 marginal distro F1 and F2 and the joint distro (let’s call it F(u1,u2) ) between them, we can directly produce the true copula. Denoted CF(u1, u2) on P72, True copula := the copula to reproduce the joint  CDF. This true copula C contains all information on the dependence structure between U1 and U2.


http://www.stat.ncsu.edu/people/bloomfield/courses/st810j/slides/copula.pdf P9 points that if the joint CDF is known (lucky!) then we can easily find the “true” copula that’s specific to that input distro.


In contrast to Context B, the true copula for a given joint distro is constructed using the input distros.


— Context A2:

Assume the joint distribution between 2 random variables X1 and X2 is, hmm ….. stable, then there exists a definite, concrete albeit formless CDF function H(x1, x2). If the marginal CDFs are continuous, then the true copula is unique by Sklar’s theorem.




–Context B: unknown joint CDF — “model the copula i.e. dependency, and thereby the CDF between 2 observable rvars”

This is the more common situation in practice. Given 2 marginal distro F1 and F2 without the joint distro and without the dependency structure, we can propose several candidate copula distributions. Each candidate copula would produce a joint CDF. I think often we have some calibrated parametric formula for the marginal distros, but we don’t know the joint distro, so we “guess” the dependency using these candidate copulas.


* A Clayton copula (a type of Archimedean copula) is one of those proposed copulas. The generic Clayton copula can apply to a lot of “input distros”

* the independence copula

* the        comonotonicity copula

* the countermonotonicity copula

* Gaussian copula


In contrast to Context A, these “generic” copulas are defined without reference to the input distros. All of these copulas are agnostic of the input random variables or input distributions. They apply to a lot of different input distros. I don’t think they match the “true” copula though. Each proposed copula describes a unique dependency structure.


Perhaps this is similar — we have calibrated models of the SPX smile curve at short tenor and long tenor. What’s the term structure of vol? We propose various models of the term structure, and we examine their quality. We improve on the proposed models but we can never say “Look this is the true term structure”. I would say there may not exist a stable term structure.

A copula is a joint distro, a CDF of 2 (or more) random variables. Not a density function. As such, C(u1, u2) := Pr(U1<u1, U2<u2). It looks (and is) a function, often parameterized.