long-term daily stock returns ~ N(m, sigma)

Label: intuitiveFinance

A basic assumption in BS and most models can be loosely stated as "daily stock returns are approximately normally distributed." I used to question this assumption. I used to feel that if the 90% confidence interval of an absolute daily price change in IBM is $10 today, then it will still be $10 twenty years from now. Now I think differently.

When the IBM price was $1, the daily return was typically a few percent, i.e. a few cents of rise or fall.

When the IBM price was $100, the daily return was still a few percent, i.e. a few dollars of rise or fall.

So the return tends to stay within a narrow range like (-2%, +2%), regardless of the magnitude of the price.

More precisely, the BS assumption is about the log return, i.e. log(price relative). This makes sense: if the %return were normal, the model would assign positive probability to returns below -100%, yet what would a -150% return mean? A stock price cannot go negative.
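A sketch of why the log-return formulation avoids the -150% paradox (this is the standard lognormal model, not specific to this post):

```latex
r_t \;=\; \ln\!\frac{P_t}{P_{t-1}} \;\sim\; N(\mu, \sigma^2)
\qquad\Longrightarrow\qquad
\frac{P_t}{P_{t-1}} \;=\; e^{r_t} \;>\; 0
```

The simple %return, e^{r_t} - 1, therefore lives in (-100%, +∞): the log return can be any real number, but the implied price can never go negative.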

set up dependency between two c++ projects

I will mostly talk about MSVS but will mention linux build too.

The common methods to set up the dependency, all configured inside project BB:

1) specify project dependency in MSVS

2) define include path to cover the header from AA

3) define library path to cover the object file from AA

Suppose project BB uses "things" in project AA; say a file in BB #includes a header in AA. I think none of the three settings above is strictly necessary. This matters when you debug the dependency: you may fail to find any trace of it in any of those places, because the file in BB can simply #include the full or relative path to the header from AA.
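A minimal sketch of that last point — the paths and file names below are hypothetical, purely to illustrate that a plain #include by relative path leaves no trace in any project setting:

```cpp
// Inside a source file of project BB, e.g. bb_main.cpp (hypothetical names).
// No MSVS project dependency, include path, or library path is involved:
#include "../AA/include/aa_utils.h"   // header pulled straight from project AA
```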

http://blogs.msdn.com/b/vcblog/archive/2010/02/16/project-settings-changes-with-vs2010.aspx describes project reference vs project dependency….

….. much more to say.

c++ compiler must know data type sizes

http://stackoverflow.com/questions/6264249/how-does-the-compilation-linking-process-work points out

So what the compiler outputs is rough machine code that is not yet fully built, but is laid out so we know the size of everything, in other words so we can start to calculate where all of the absolute addresses will be located. The compiler also outputs a list of symbols which are name/address pairs. The symbols relate a memory offset in the machine code in the module with a name. The offset being the absolute distance to the memory location of the symbol in the module.

Stroustrup told me about a key performance advantage of c++ over modern languages — local variables. If we want to use more local variables and fewer heap objects, then I can understand why the compiler needs to know, at compile time, the size of every data type.
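A tiny sketch of the point (the names are mine, not Stroustrup's): sizeof is a compile-time constant, which is exactly what lets the compiler lay out a stack frame full of local variables with no runtime bookkeeping.

```cpp
#include <cassert>
#include <cstddef>

// A small aggregate. The compiler must know its exact size at compile
// time before it can reserve stack space for a local of this type.
struct Point {
    double x;
    double y;
};

// sizeof(...) is a constant expression -- usable even in static_assert.
constexpr std::size_t pointSize() { return sizeof(Point); }

static_assert(pointSize() >= 2 * sizeof(double),
              "a Point cannot be smaller than its members");
```

Because pointSize() is known when the function is compiled, a function with a `Point p;` local needs no heap allocation at all; the frame layout is fixed at compile time.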

new-expression allocates AND invokes ctor…most of the time

Neither the allocation step nor the ctor step is universal and guaranteed to happen.

1) ctor

If allocation fails then no “raw memory” is there to initialize.

new int; // This won’t invoke any ctor since “int” is not a class/struct and doesn’t have a ctor.

2) allocation

It’s possible to bypass the allocation, if you use placement-new

c++ threading support, historically

[[c++concurrencyInAction]] has a concise historical review…

* POSIX threads is a platform-specific C api — specific to the Unix family — just like the Windows threading API is specific to Windows.
I feel this is an OS-level API. Every OS-level API tends to be in C (not c++), so various languages can all use it. OS-level APIs tend to "present" the hardware as-is. The hardware has no concept of object orientation or exceptions; C is the natural language for the hardware.
* Threading is a low-level, OS-level feature (kernel support? not sure about the jargon). Naturally, threading is implemented in C and exposed first as a C api. Many c++ threading developers simply use the C api without any motivation to switch to c++.
* It's no surprise that the early c++ concurrency libraries were wrappers over the existing C api. Examples include MFC and Boost.
* Note the low-level C api is usually platform-dependent. The c++ wrapper api has an opportunity to smooth out the platform differences.
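A sketch of the platform-neutral wrapper in action (function name is mine): std::thread, standardized in C++11, smooths out exactly the pthreads-vs-Windows split described above.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Increment a shared counter from two threads. On linux this compiles
// down to pthread_create/pthread_join; on Windows, to the Win32 thread
// API -- the same source works on both, unlike raw POSIX or Win32 calls.
inline int runTwoThreads() {
    std::atomic<int> counter{0};
    auto work = [&counter] {
        for (int i = 0; i < 1000; ++i) ++counter;  // atomic, so no data race
    };
    std::thread t1(work), t2(work);
    t1.join();
    t2.join();
    return counter.load();
}
```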

[15] a G3 WallSt survival KPI:how fast U figure things out, relative2coworkers

As stated in other posts, it’s all about local system knowledge + tools

Survival (and bonus, stress..) is always benchmarked against team peers — not external teams like another department, and not the manager himself. If the entire team is low-calibre, then it's a problem for the manager, but rarely your own problem (though it might threaten your team's survival…)

LG2: system performance. Most guys in the team probably aren’t so good at this kinda thing, unless the  team has a specific mandate to monitor and fix performance.
LG2: test coverage. May affect your reputation, but once I figure things out I can often relax and test
LG2: code smell. May affect your reputation but is clearly a LGlp compared to getting things to work.
LG2: code quality
LG2: readability by the team lead. OK team lead may complain, but if it works then it’s relatively easy to improve it.
LG2: extensibility. Yang is big on this. Many books are big on this, but it’s an art not a science. Many team members probably aren’t good at it.
LG2: system stability such as occasional hangs. Often a non-show-stopper.
** eg: my VS2015 build tends to hang and I had to research how to fix it — that one was a show-stopper.

All of the above are secondary to … “figuring things out” i.e. how to get something to work, at a minimum usability.

Design? See also posts on arch_job. Can be a real problem, because if your design gets shot down, you waste time on rework.



IV^GTD – grow as techie@WS

I want to grow stronger/broader/versatile as a survivor, not necessarily grow my value-add. Actually I don’t have to grow.
— IV skills – Compared to GTD skills, these skills give me more confidence, more protection, more stress-relief. It works in big job markets like Wall St.

Lots of theory, which is my sweet spot.
Selective on what IV topics to learn!
coding IV + algo — favored by SiV
— (portable) GTD skills
Lots of tools….
Selective on what GTD topics to learn!

Needed for a lead developer, but such a role is stressful. I guess some good lead developers are also good at IV, but I’m not sure and I assume I’ll not be good at both.

Warning – a lot of projects don’t provide real GTD enrichment. Eg: Quartz, tibrv wrappers, Gemfire wrappers, FIX wrappers
Macquarie environment lets me learn lots of GTD skills.
OC gave me IV (c#) and GTD enrichment.
Stirt – none!

A java environment would give me some additional GTD enrichment but less IV enrichment

In SG, I continued my previous strategy, learning IV skills + GTD skills. Not effective so far. I feel my c# IV skills improved a lot but still not there. C++ IV skills didn’t improve much partly due to distractions.


[[algo in a nutshell]] bucket sort, insertion sort..

== bucket sort
This book asserts that bucket sort is the #1 top dog when data is uniformly distributed (not normally distributed!) — i.e. when it can be uniformly partitioned using a fast hashing function. The number of buckets created equals the input data size, just like in a standard hash table. The book shows benchmarking on large samples of random floating-point numbers. However, I disagree on two counts:
  1.  for special data types like strings and small ints, radix sort is probably faster than bucket sort
  2. Bucket sort is #1 for large data volumes, beating quick sort, heap sort etc. But for nearly-sorted data, Insertion-sort is faster.

Other notes

  • Linked list — used inside each bucket. Arrays are a possible alternative, but 50% slower.
  • —-Relation to other sorts:
  • Can be seen as generalization of counting sort
  • A cousin of radix sort in the most-to-least significant digit (MSD) flavor i.e. top-down radix sort
–hash sort, a variation of bucket sort, customized for random strings.
Book shows benchmarking on random strings. Hash sort is #1 for large data volumes, beating quick sort, heap sort etc

I guess if the string distribution is unconstrained/unknown (without a guarantee of randomness), hash sort will not have any advantage.

In both cases, the hash code must be strictly "ordered", so bucket #1 must hold the lowest data items.

== insertion sort
P100. For nearly-sorted data, insertion sort is faster than all other sorts (including bucket sort and radix sorts), often by an order of magnitude.

https://stackoverflow.com/questions/736920/is-there-ever-a-good-reason-to-use-insertion-sort shows insertion sort is better than divide-n-conquer for 1) nearly-sorted or 2) small collection

https://www.cs.princeton.edu/~rs/AlgsDS07/18RadixSort.pdf shows that MSD radix sort also needs to switch to insertion-sort when the sample size is small.
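A sketch of insertion sort, showing why it shines on nearly-sorted input: the inner while loop exits almost immediately when each element is already close to its final position, so the total work approaches a single linear scan.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Classic insertion sort. On nearly-sorted data the inner loop body
// rarely executes, giving close to O(n) total work.
inline void insertionSort(std::vector<int>& a) {
    for (std::size_t i = 1; i < a.size(); ++i) {
        int key = a[i];
        std::size_t j = i;
        while (j > 0 && a[j - 1] > key) {  // shift larger items right
            a[j] = a[j - 1];
            --j;
        }
        a[j] = key;  // drop the key into its slot
    }
}
```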

== counting sort
P315. The "fastest" sort for the right data set. I think the letters of a single English word are a suitable data set, since the keys come from a small fixed range. However, for nearly-sorted data, insertion sort is still faster.
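A sketch of counting sort on the letters of a single English word — my reading of the "right data set": keys drawn from a tiny known range, here assumed to be lowercase 'a'-'z' (function name is mine).

```cpp
#include <array>
#include <cassert>
#include <string>

// Counting sort over lowercase letters: O(n + 26), zero comparisons.
// Assumes every character of 'word' is in 'a'..'z'.
inline std::string countingSortWord(const std::string& word) {
    std::array<int, 26> count{};             // zero-initialized histogram
    for (char c : word) ++count[c - 'a'];    // tally each letter
    std::string out;
    out.reserve(word.size());
    for (int k = 0; k < 26; ++k)             // emit letters in key order
        out.append(count[k], static_cast<char>('a' + k));
    return out;
}
```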

== Binary search tree vs hash table
P130 discusses when to use which

top geeks – 5 categories

Let’s try to be critical, incisive… It’s also good but not critical to widen the perspective.
–5) instrumentation for GTD. This is the #1 most valuable skill for GTD, not necessarily to move up to leadership.
–7) Some (not “top top”) geeks can quickly pick up enough details of a new “tool” and use it to get things done (often not perfectly). I had this advantage in my youth. I might have it still.
eg: git, IDE, tibrv,
eg: Venkat, Avichal
–8) Some (not top top) geeks have rare knowledge of advanced/obscure topics like concurrency, memory mgmt, templates, elegant SQL joins, arcane language rules…, but the difference here is — the knowledge is needed only for interviews.
eg: IV questions at many HFT
eg: most threading questions
eg: Venkat
eg: [[effC++]]

–9) Some successful geeks have good local system knowledge. See post on Helicopter view.
Prime eg: Yang.

–1) Some top geeks (+ open source heroes) can quickly create a complete non-trivial software, often single-handedly.
* probably the hardest achievement. These are the doers not just thinkers and dreamers.
* productivity is often 100X a regular developer
–2) [G] Some top geeks are experts on google-style algorithm challenges.
* This domain takes at least a few months (possibly years) of practice. Without this mileage, you won’t be there.
* similar to mathematical talent — somewhat innate
–3) [G] Some top geeks can optimize performance in a demanding system. A specialist knowledge.
eg: the Russian Sybase guru in PWM CT
eg: facebook memory allocator?
I feel this skill is overrated. Not much business value and impact in most cases.
–4) Some top geeks are experts at system (including compiler) internals i.e. under the hood + debugging, but not obscure low level details
* can uncover the “why” no one else can. Widely valuable.
** the very deep “why” is sometimes not that critical. In many real projects we don’t need to uncover the why if we can circumvent (find a workaround), including a rewrite
** in rare cases, we have no choice but find the “why”, but if the team doesn’t have expertise, they still would find some inferior solution or just live with the problem.
[G=needed only at largest websites like Google, but many other employers pose these questions too.]

Hull: estimate default probability from bond prices

label: credit

The arithmetic on P524-525 could be expanded into a 5-pager if we were to explain it to people with a high-school math background…


There are 2 parts to the math. Part A computes the "expected" (probabilistic) loss from default to be $8.75 for a notional/face value of $100. Part B computes the same quantity via another route, obtaining $288.48Q. Equating the 2 parts gives Q = 3.03%.


Q3: How is the 7% yield used? Where in which part?


Q4: why assume defaults happen right before coupon date?

%%A: borrower would not declare “in 2 days I will fail to pay the coupon” because it may receive help in the 11th hour.


–The continuous discounting in Table 23.3 is confusing

Q: Hull explained how the 3.5Y row in Table 23.3 is computed. Why discount to the T=3.5Y mark and not to T=0Y?


The “risk-free value” (Column 4) has a confusing meaning. Hull mentioned earlier a “similar risk-free bond” (a TBond). At 3.5Y mark, we know this risk-free bond is scheduled to pay all cash flows at future times T=3.5Y, 4Y, 4.5Y, 5Y. We use risk-free rate 5% to discount all cash flows to T=3.5Y. We get $104.34 as the “value of the TBond cash flows discounted to T=3.5Y”


Column 5 builds on it giving the “loss due to a 3.5Y default, but discounted to T=3.5Y”. This value is further discounted from 3.5Y to T=0Y – Column 6.

Part B computes a PV relative to the TBond’s value. Actually Part A is also relative to the TBond’s value.


In the model of Part B, there are 5 coin flips occurring at T = 0.5Y, 1.5Y, 2.5Y, 3.5Y, 4.5Y with Pr(default_0.5) = Pr(default_1.5) = … = Pr(default_4.5) = Q. Concretely, imagine that Pr(flip = Tail) is some constant like 25%. Now the Law of Total Probability states


100% = Pr(05) + Pr(15) + Pr(25) + Pr(35) + Pr(45) + Pr(no default). If we factor in the amount of loss at each flip we get


Pr(05) * $65.08 + Pr(15) * $61.20 + Pr(25) * $57.52 + Pr(35) * $54.01 + Pr(45) * $50.67 + Pr(no default) * $0 == $288.48 Q
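Combining the two parts into one line (loss amounts from Hull's Table 23.3, each flip having probability Q):

```latex
\underbrace{(65.08 + 61.20 + 57.52 + 54.01 + 50.67)}_{=\,288.48}\;Q \;=\; 8.75
\quad\Longrightarrow\quad
Q \;=\; \frac{8.75}{288.48} \;\approx\; 3.03\%
```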

threading support in compiler^library

Not sure which category POSIX falls into. An OS-level library?

Java (followed by c#) absorbed many concurrency features into the compiler, by adding language keywords. Is this related to the VM? I think so. Is the library approach still necessary in the presence of a VM? Yes. http://stackoverflow.com/questions/30816307/does-java-jvm-use-pthread confirms that the jvm uses pthreads on linux and macOS.

Python, perl and most other languages take the common add-on "library" approach. The threading library is usually a wrapper over the C threading library.

clarifying questions to reduce confusions in BS discussions

–At every mention of “pricing method”, ask

Q: Analytical (Ana) or Numerical (Num)?

Q: for European or American+exotic options?

Obviously, analytical methods only work for European-style options.

Q: GBM assumption?

I think most numerical methods do. Every single method has severe assumptions, so GBM is just one of them.

–At every mention of “Option”, ask

Q: European style or American+Exotic style?

–At every mention of Black-Scholes, ask

Q: BS-E(quation) or BS-F(ormula) or BS-M(odel)?

Note numerical methods rely on BS-M or BS-E, not BS-F

–At every mention of Expectation, ask

Q: P-measure or Q-measure?

The other measures, like the T-fwd measure, are too advanced, so no need to worry for now.

XiaoAn@age discrimination:$5k offer !! bad4an aging programmer

I could keep this current job in Singapore for a few years. At age 44, or 45… I might be lucky again to get another finance IT job but what about 50?

The odds will grow against me. I’m on an increasingly tilted playing field. At 60 I’ll have very very low chance.

XiaoAn points out that at such an age, even finding a $5k job is going to be tough. I believe indeed some percentage of the hiring managers don’t like hiring someone older. XiaoAn admitted he’s one of them.

We could feel certain that age discrimination will bite, but the reality is … in Singapore there are very few job candidates at that age, so we just don't know how the employers would react.

Also bear in mind at age 55 I am unlikely to perform as before on interviews.

app arch – civil engineer or a salesman, sg^U.S.

I feel in the Singapore context, "architect" is more likely to refer to a hands-off role. In the US, I have never met an architect who doesn't write code hands-on at least half his time.

A real app architect in finance (different from professional software product vendors) really needs hands-on capabilities. Her output is not just on paper, but in code. If her design (not the document but the “key ideas” implemented) doesn’t work, she must roll up her sleeves and make it work. Most of those ideas are low level, like some serialization or some threading construct, and require just one good developer. In that sense it’s not unlike a library module.

In this context, architect is the name of the technical lead of ANY dev team. In a smaller team, this person is often known as the lead developer.

Any dev team needs a technical leader. Half the time, the team manager or the project manager is technical enough to lead a support team but not a dev team, so an architect is needed. Often, the architect is the only leader.

The pre-sales architect is very different. Castle in the sand. Imaginary buildings.

Update: I feel in the US I could become a lead developer once I become familiar with a codebase, but I'm not so sure about any role that requires a lot of persuasion. I feel if my technical grasp of everything is higher than the rest of the team, then it's possible. It's all relative.

C++ one source file -> one object file

Not sure if you can compile multiple source files into a single *.obj file…


Compilation refers to the processing of source code files (.c, .cc, or .cpp) and the creation of an ‘object’ file. This step doesn’t create anything the user can actually run!

Instead, the compiler merely produces the machine language instructions that correspond to the source code file that was compiled. For instance, if you compile (but don't link) three separate files, you will have three object files created as output, each with the same base name as its source file and the extension .o or .obj (the extension will depend on your compiler). Each of these files contains a translation of your source code file into a machine language file — but you can't run them yet! You need to turn them into executables your operating system can use. That's where the linker comes in.

0 probability ^ 0 density, 3rd look

See also the earlier post on 0 probability vs 0 density.

[[GregLawler]] P42 points out that for any continuous RV such as Z ~ N(0,1), Pr(Z = 1) = 0, i.e. zero point-probability mass. However, the probability over an interval, Pr(|Z| < 1), is not zero — it's around 68%. This is counterintuitive if we come from a background of discrete, rather than continuous, RVs.

For a continuous RV, probability density is the more useful device than probability of an event. My imprecise definition is

prob_density at the point x=1 := Pr(X falls in a narrow strip of width dx around 1) / dx

Intuitively and graphically, the strip’s area gives the probability mass.

The sum of probabilities means integration, because we always add up the strips.
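Stated as a limit — this is just the standard definition that the imprecise version above paraphrases, with the integral recovering probability mass over a range:

```latex
f(x) \;=\; \lim_{\Delta x \to 0^+} \frac{\Pr(x < X \le x + \Delta x)}{\Delta x},
\qquad
\Pr(a < X \le b) \;=\; \int_a^b f(x)\,dx
```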

Q: So what’s the meanings of zero density _vs_ zero probability? This is tricky and important.

In discrete RV, zero probability always means “impossible outcome” but in continuous RV, zero probability could mean either
A) zero density i.e. impossible outcome, or
B) positive density but strip width = 0

Eg: if I randomly select a tree in a park, Pr(height > 9999 meters) = 0 … Case A. For Case B, Pr(height = exactly 5m) = 0.

continuous RV, 0 density at a point (Case A) => impossible outcome
discrete RV, 0 probability at a point => impossible outcome
continuous RV, 0 probability at a point => always true by definition (strip width is 0), so not meaningful
continuous RV, 0 probability over a range (due to Case A) => impossible outcome

0 probability ^ 0 density, 2nd look #cut rope

See the other post on 0 probability vs 0 density.

Eg: Suppose I ask you to cut a 5-meter-long string by throwing a knife. What's the distribution of the longer piece's length? There is a density function f(x) — bell-shaped, since most people will aim at the center.

f(x) = 0 for x > 5 i.e. zero density.

For the same reason, Pr(X > 5) = 0 i.e. no uncertainty, 100% guaranteed.

Here's my idea of the probability density at x=4.98: if a computer simulates 100 trillion trials, will there be some hits within the neighborhood around x=4.98? Yes — a very small but positive density. In contrast, the chance of hitting x=5.1 is zero no matter how many times we try.

By the way, due to the definition of density function, f(4.98) > 0 but Pr(X=4.98) = 0, because the range around 4.98 has zero width.
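A hedged Monte-Carlo sketch of the idea (the bell-shaped aim, its spread, and the function name are my assumptions, not from the post): over many simulated throws, hits do land in a narrow strip around x = 4.98, but never beyond x = 5, because the longer piece cannot exceed the string's length.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <utility>

// Simulate n cuts of a 5m string. The cut point is normal around the
// center (people aim at the middle); we observe the LONGER piece, which
// always lies in [2.5, 5]. Returns (hits near 4.98, hits beyond 5).
inline std::pair<long, long> simulateCuts(long n, unsigned seed = 42) {
    std::mt19937 gen(seed);
    std::normal_distribution<double> aim(2.5, 1.0);  // assumed aim and spread
    long near498 = 0, beyond5 = 0;
    for (long i = 0; i < n; ++i) {
        double cut = std::clamp(aim(gen), 0.0, 5.0);  // knife must hit the string
        double longer = std::max(cut, 5.0 - cut);
        if (longer > 4.97 && longer < 4.99) ++near498;  // narrow strip around 4.98
        if (longer > 5.0) ++beyond5;                    // the impossible region
    }
    return {near498, beyond5};
}
```

With a million trials, the strip around 4.98 collects a small but nonzero share of hits, while the region beyond 5 collects exactly zero — positive density versus zero density.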

Applying Ito’s formula on math problems — learning notes

Ito's formula in a nutshell — given the dynamics of a process X, we can derive the dynamics[1] of a function[2] f() of X.

[1] The original “dynamics” is usually in a stoch-integral form like

  dX = m(X,t) dt + s(X,t) dB

In some problems, X is given in exact form, not integral form. For an important special case, X could be the BM process itself, i.e. dX = dB (so m = 0 and s = 1).


[2] the “function” or the dependent random variable “f” is often presented in exact form, to let us find partials. However, in general, f() may not have a simple math form. Example: in my rice-cooker, the pressure is some unspecified but “tangible” function of the temperature. Ito’s formula is usable if this function is twice differentiable.

The new dynamics we find is usually in stoch-integral form, but the right-hand-side usually involves X, dX, f or df.

Ideally, RHS should involve none of them and only dB, dt and constants. GBM is such an ideal case.
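In the notation above, with f twice differentiable and no explicit t-dependence, Ito's formula reads as follows; note the RHS still involves X through m, s, f′ and f″, and the GBM example is the ideal case where, after substitution, only constants, dt and dB remain:

```latex
df(X_t) \;=\; f'(X_t)\,dX_t + \tfrac{1}{2} f''(X_t)\,(dX_t)^2
\;=\; \Bigl( m\,f'(X_t) + \tfrac{1}{2}\,s^2\,f''(X_t) \Bigr) dt \;+\; s\,f'(X_t)\,dB_t
```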

no IV muscle growth]sg..really@@

Common reason: no IV to give feedback on any “muscle growth”

Common reason: Most if not ALL of the growth areas have since become non-strategic, due to the limited job market

  • c# —  Actually my c# had tremendous and fast growth, no slower than my 2010-2012 peak period, but there was no IV to verify it
  • py — was growing fast in Mac, but no full time python jobs
  • quant — I went through a hell of growth in quant-dev, but gave up
  • c++ tool knowledge — was growing in Mac but not a QQ topic at all.
  • c++ optimization for HFT — I read quite a bit but can’t pass the interviews 😦 so I gave up