testing threaded designs – enterprise apps( !! lib)

Bottom line – (Unconventional wisdom) Be bold to create new threaded designs. It doesn’t have to be rock solid like in the standard library.

Experts unanimously agree that non-trivial MT designs are hard to verify or test, often exceedingly hard. There are often too many possibilities. Maybe a million tests pass but next test reveals a bug. Therefore peer review is the way to go.  I feel that’s the “library” view, the view from the language creators. Different from enterprise apps.

In enterprise apps, if a MT design passes load test and UAT, then good enough. No budget to test further. If only 1 in a million cases fail then that case has something special — perhaps a special combination or purely timing coincidence. Strictly speaking those are still logical errors and design defects. A sound design ought to handle such cases gracefully. Most designs aren’t so sound. If such a failure happens so rarely, just about everyone involved would agree it’s basically impossible to catch it during testing. Such a bug, if ever uncovered, would be too hard to catch and catching it should not be in any job scope here. Missing it is tolerable and only human.

A goal keeper can’t be expected to catch 99 penalties in a row.

In the enterprise reality, such a bug is probably never uncovered (unless a log happens to provide the evidence). It often takes too much effort to investigate such a rare issue. Not worth the effort.


immutability of a c# delegate instance

Mutability is all about object state. We are familiar with object state of ordinary objects, but delegate objects are different – subtle.

A delegate’s object state is its inv list. More specifically, the state is the ordered list of 2-pointer thingies.

So immutability means that the collection and ordering are permanent. Each 2-pointer thingy is naturally immutable given that object pointer [1] and the function pointer are just addresses rather than “pointer variables”. In other words, each 2-pointer thingy holds 2 addresses and there’s no way to “edit” any address.

Immutability doesn’t cover the Target object. Target object is fully mutable. This is what confuses folks like me. If an immutable Student object HAS-A professor field, then the professor pointee object is presented as immutable. Here’s the key — Every time a Student method returns the professor, the professor is cloned, so the original professor instance is never exposed/leaked. No such cloning underlies the immutable delegate instance.

[1] This is actually the Target property of the dlg instance. For a static method, Target is null.
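
A rough Python analogy (not C#, and only a sketch): a tuple of bound methods behaves a bit like an inv list. The tuple itself cannot be edited, and each bound method carries a fixed pair (`__self__` playing the role of Target, `__func__` the function), yet the Target object stays fully mutable. The `Account` class and its `deposit` method are made up for illustration.

```python
class Account:
    """Hypothetical Target object, invented for this illustration."""

    def __init__(self):
        self.balance = 0

    def deposit(self, amount):
        self.balance += amount
        return self.balance


target = Account()

# The "inv list": an immutable, ordered collection of (object, function) pairs.
inv_list = (target.deposit,)

# Trying to edit the collection fails, because tuples are immutable.
try:
    inv_list[0] = print
    tuple_was_edited = True
except TypeError:
    tuple_was_edited = False

# The Target object is not protected by that immutability.
target.balance = 100
result = inv_list[0](5)  # invokes target.deposit(5) on the mutated Target

print(tuple_was_edited, inv_list[0].__self__ is target, result)
```

No cloning happens anywhere here, which mirrors the point above: the collection is frozen, the pointee is not.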

fear: c# iwt java/python

The more my GTD “survival” depends on tools or environment, the more fearful and the less confident I feel.

Extreme eg:  When programming Perl, php (and python) I don’t rely on any complicated tool. The libraries I use tend to be simple and small.

Extreme eg: Excel VBA is riddled with infrastructure dependency. Any idiosyncrasy or nuance of Excel can produce inexplicable, subtle behavior.

— Java —

Java is a powerful, mid-level language with a remarkably robust sandbox. The ugly, dirty world outside is largely kept away from us programmers, so we operate in a clean room, sandbox, or a constant-temperature green house.

However, when I had to debug into Spring-JMS I hit the same infrastructure-dependency.

JNI — The sandbox leaks when you play with JNI. Infrastructure dependency on a massive scale. You now need a lot of “platform-knowledge” about the “platform” outside. It’s a lot dirtier than inside the sandbox.

Eclipse Debugger – is less well-behaved and less well understood than other aspects of java development. When the source code is out of sync with the executing binary, things could keep going but would fail in strange ways, much like undefined behavior.

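How about stack overflow? Pure java code gets a StackOverflowError, but a stack overflow inside native code typically brings no exception. Just a crash.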

— c# —

The c# language is more feature-rich than java, but still consistent and structured. No complaints. However, when you use c# for real projects you hit infrastructure dependency. I mentioned in numerous blogs how much platform knowledge is needed.

Unlike the famed write-once-run-anywhere portability of java, many really, really important parts of C# development are tied into the “platform” outside, meaning developers need that platform knowledge. Just a few examples:


* threading implemented on top of win threads

* many critical tasks need MSVS

* Excel integration

* windows service integration

SGD exchange rate management by MAS

HK government uses a “pegged FX rate”. The drawback is “imported inflation”. HKD is kept artificially low, so HK living cost is higher.

SG government chose a different strategy – “managed float”. Whenever SGD FX rate exceeds the policy band, MAS would buy/sell SGD in the open market, which costs taxpayers’ money.

Due to the impossible trinity (http://en.wikipedia.org/wiki/Impossible_trinity),
– SG government loses the freedom to set independent interest rate. SGD interest rate is forced to follow USD interest rate.
– China has capital control, Fixed exchange rate and Independent interest rate policy.

convert sequence@obj→strings→concat #listCompr

P60 [[cookbook]] shows a neat trick

>>> data = ['aa', 50, 91.1]
>>> ', '.join(str(d) for d in data)
'aa, 50, 91.1'

Technique: generator expression,
Technique: str() conversion ctor. Without conversion, joining strings with non-strings raises a TypeError.
Technique: calling join() method on the delimiter string
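
A minimal sketch of the three techniques together, reusing the same `data` list as the cookbook snippet. The second `join()` call deliberately skips `str()` to show the failure. The variable names `joined` and `error_type` are mine.

```python
data = ["aa", 50, 91.1]

# Generator expression + str() conversion + join() called on the delimiter.
joined = ", ".join(str(d) for d in data)

# Without the conversion, join() refuses the int and the float.
try:
    ", ".join(data)
    error_type = None
except TypeError as exc:
    error_type = type(exc).__name__

print(joined)
print(error_type)
```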

The author points out that string concat can be very inefficient if you blindly use “+” operator. Similarly, java/dotnet offers stringBuilders, and c++ offers stringstream

python LGB rule, %%take

http://my.safaribooksonline.com/book/programming/python/1565924649/functions/ch04-28758 clarifies many key points discussed below.

(The simple-looking “global” keyword can be tricky if we try to understand its usage too early.)

The key point highlighted inadequately (except the above book) is the distinction between assignment and reference.

* AA – Assignment – When a variable appears on the LHS, it’s (re)created in some namespace [1], without name lookup. By default, that namespace is the local namespace.
* RR – Reference – any other context where you “use” a variable, it’s assumed to be a pre-existing variable. Query and method invocation are all references. Requires name lookup, potentially causing an error.

Let me test you on this distinction —

Q: what happens if I Assign to a non-existent variable?
A: I believe it will get created.

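Q: what happens if I Reference a non-existent variable?
A: I believe it’s a NameError, in any context. (Inside a function, reading a local before it is assigned raises UnboundLocalError, which is a subclass of NameError.)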

The LGB search sequence is easily understood in the RR context. Failed search -> error.

In the AA context, there’s no need to look up, as the variable is simply created.

shadowing — local var often shadows a global var. This makes sense in the RR case, and I think also in the AA case.

[1] A namespace is probably implemented as an “idict”, a registry (presumably) mapping variable names to object addresses.

Now we are ready to look at the effect of keyword “global”.

RR – the LGB search sequence shrinks to “G” (followed by B, the built-ins, if the name isn’t found in G).
AA – the variable is simply (re)created in the G rather than L namespace.

http://my.safaribooksonline.com/book/programming/python/1565924649/functions/ch04-28758 points out that “Global names needs to be declared only if they are assigned in a function.” If not assigned, then no shadowing concern.
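
A small sketch of the AA vs RR distinction, run at module level. All names (`setting`, the function names, `results`) are made up for the illustration. The `read_then_assign` case shows why the distinction matters: an assignment anywhere in a function makes the name local for the whole function, so an earlier read fails.

```python
setting = "global value"


def reference_only():
    # RR: no assignment in this function, so lookup goes L -> G -> B.
    return setting


def assign_local():
    # AA: creates a new local name that shadows the global.
    setting = "local value"
    return setting


def assign_with_global():
    # AA under "global": (re)binds the name in the G namespace instead.
    global setting
    setting = "rebound by function"


def read_then_assign():
    # The assignment below makes "setting" local, so this read fails.
    try:
        value = setting
    except UnboundLocalError as exc:
        value = type(exc).__name__
    setting = "too late"
    return value


def reference_missing():
    # RR on a name that exists nowhere: the failed search is an error.
    try:
        return no_such_name
    except NameError as exc:
        return type(exc).__name__


results = {
    "RR": reference_only(),
    "AA local": assign_local(),
    "global after AA local": setting,
    "read before assign": read_then_assign(),
    "missing name": reference_missing(),
}
assign_with_global()
results["global after AA global"] = setting

for label, value in results.items():
    print(label, "->", value)
```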

risk premium — clarified

risk premium (rp) is defined as the Expected (excess) return. A RP value is an “expected next-period excess return” (ENPER) number calculated from current data, using specific factors. A RP model specifies those factors and related parameters.

Many people call these factors “risk factors”. The idea is, any “factor” that generates excess return must entail a risk. If any investor earns that excess return, then she must be (knowingly/unknowingly) assuming that risk. The Fama/French value factor and size factor are best examples.

Given a time series of historical returns, some people simply take the average as the Expected. But I now feel the context must include an evaluation date i.e. date of observation. Any data known prior to that moment can be used to estimate an Expected return over the following period (like 12M). Different people use different models to derive that forward estimate i.e. a prediction. The various estimates create a supply/demand curve for the security. When all the estimates hit a market place, price discovery takes place.

Some simple models (like CAPM) assume a time-invariant, steady-state/equilibrium expected return. CAPM basically assumes that each year, there’s a big noisegen that influences the return of each security. This single noisegen generates the return of the broad “market”, and every security is “correlated” with it, measured by its beta. Each individual security’s return also has uncertainty in it, so a beta of 0.6 doesn’t imply the stock return will be exactly 60% of the market return. Given a historical time series on any security, CAPM simply takes the average return as the unconditional, time-invariant estimate of the steady-state/equilibrium long-term return.

How do we benchmark 2 steady-state factor models? See other blog posts.

Many models (including the dividend-yield model) produce dynamic estimates, using a recent history of some market data to estimate the next-period return. So how do I use this dynamic estimate to guide my investment decisions? See other posts.

Before I invest, my estimate of that return needs to be quite a bit higher than the riskfree return, and this excess return i.e. the “Risk premium” needs to be high enough to compensate for the risk I perceive in this security. Before investing, every investor must feel the extra return is adequate to cover the risk she sees in the security. The only security without “risk premium” is the riskfree bond.

oprofile – phrasebook

root – privilege required to start/stop the daemon, but the query tools don’t need root

dtrace – comparable. I think these two are the most powerful profilers on solaris/linux.

statistical – results can be partially wrong. Example – call graph.

Per-process – profiling is possible. I think default is system-wide.

CPU – counters (hardware counters). Therefore, low impact on running apps, lower than “attachment” profilers.

userland – or kernel : both can be profiled

recompile – not required. Other profilers require recompiling.

kernel support – must be compiled in.

oprofiled – the daemon. Note there’s no executable named exactly “oprofile”.

[[Optimizing Linux performance]] has detailed usage examples of oprofile. [[linux programmer’s toolbox]] has decent coverage too.

HIV testing – cond probability illustrated

A common cond probability puzzle: Suppose there’s a test for HIV (or another virus). If you carry the virus, there’s a 99% chance the test will correctly identify it, with 1% chance of false negative (FN). If you aren’t a carrier, there’s a 95% chance the test will come up clear, with a 5% chance of false positive (FP). To my horror my result comes back positive. Many would immediately assume there’s a 99% chance I’m infected. The intuition is, like in many probability puzzles, incorrect.

In short Pr(IsCarrier|Positive result) depends on the prevalence of HIV.

Suppose out of 100million people, the prevalence of HIV is X (a number between 0 and 1). This X is related to what I call the “pool distribution”, a fixed, fundamental property of the population, to be estimated.

P(TP) = P(True Positive) = .99X
P(FN) = .01X
P(TN) = P(True Negative) = .95(1-X)
P(FP) = .05(1-X)

The 4 probabilities above add up to 100%. A positive result is either a TP or FP. I feel a key question is “Which is more likely — TP or FP”. This is classic conditional probability.

Denote C==IsCarrier. What’s p(C|P)? The “flip” formula says

p(C|P) p(P) = p(C.P) = p(P|C) p(C)
p(P) is simply p(FP) + p(TP)
p(C) is simply X
p(P|C) is simply 99%
Actually, p(C.P) is simply p(TP)

The notations are non-intuitive. I feel a more intuitive perspective is “Does TruePositive dominate FalsePositive or vice versa?” As explained in [[HowToBuildABrain]], if X is very low, then FalsePositive dominates TruePositive, so most of the positive results are false positives.
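
To see TP vs FP dominance in numbers, here is a minimal sketch of the flip formula using the 99% and 5% figures above. The prevalence values (0.1%, 5%, 50%) are my own illustrative assumptions, not real statistics, and the function name is made up.

```python
def prob_carrier_given_positive(prevalence, p_pos_given_carrier=0.99,
                                p_pos_given_clean=0.05):
    """p(C|P) = p(TP) / (p(TP) + p(FP)), with prevalence playing the role of X."""
    p_tp = p_pos_given_carrier * prevalence        # .99X
    p_fp = p_pos_given_clean * (1 - prevalence)    # .05(1-X)
    return p_tp / (p_tp + p_fp)


# Hypothetical prevalence levels, chosen only to show the effect of X.
for x in (0.001, 0.05, 0.5):
    print(f"X = {x:>5}: p(C|P) = {prob_carrier_given_positive(x):.3f}")
```

With X at 0.1%, FalsePositive dominates and p(C|P) comes out just under 2%. Only when X is large does the answer approach the 99% that intuition suggests.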

map reduce filter zip …

Importance – this handful of functions is the core of the FP features in Python, according to http://www.ibm.com/developerworks/linux/library/l-prog/index.html?ca=drs-

I feel it’s good to tackle the family members one by one, as a group. I believe each has some counterpart in either perl, C++, java and most likely in c#, so we can just list one counterpart to help us grasp each.

Bear in mind how common/uncommon each function is. Don’t overspend yourself on the wrong thing.

Don’t care about list vs tuple for now! Don’t care about advanced usages. Just grasp one typical usage of each.

http://www.lleess.com/2013/07/python-built-in-function-map-reduce-zip.html is concise.

–filter() = grep in perl

filter(function, sequence) -> sequence

>>> filter(lambda d: d != 'a', 'abcd')

–map() = map in perl

>>> map(lambda a: a+1, [1,2,3,4] )
[2, 3, 4, 5]

–reduce = Aggregate() in c#
reduce(function, sequence [, initial]) -> value

>>> reduce(lambda x, y: x+y, range(0,10), 10)

–zip is less common

–apply() is deprecated
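
The snippets above are Python 2 style. As a sketch of the same calls under Python 3: map/filter/zip return lazy iterators there, so I wrap them in list() or join(), and reduce has moved into functools. The zip example is my own addition.

```python
from functools import reduce  # reduce is no longer a builtin in Python 3

# filter() on a string yields characters, so join them back together.
filtered = "".join(filter(lambda d: d != "a", "abcd"))

# map() is lazy in Python 3; list() forces the evaluation.
mapped = list(map(lambda a: a + 1, [1, 2, 3, 4]))

# reduce(function, sequence, initial): 10 + 0 + 1 + ... + 9
total = reduce(lambda x, y: x + y, range(0, 10), 10)

# zip() pairs up items from several sequences.
zipped = list(zip("abc", [1, 2, 3]))

print(filtered)  # bcd
print(mapped)    # [2, 3, 4, 5]
print(total)     # 55
print(zipped)    # [('a', 1), ('b', 2), ('c', 3)]
```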

which socket/port is hijacking bandwidth

I guess some HFT machine might be dedicated to one (or few) process, but in general, multiple applications often share one host. A low latency system may actually prefer this, due to the shared memory messaging advantage. In such a set-up, it’s extremely useful to pinpoint exactly which process, which socket, which network port is responsible for high bandwidth usage.

Solaris 10? Using Dtrace? tough? See [[solaris performance and tools]]

Linux? doable

# use iptraf to see how much traffic flowing through a given network interface.
# given a specific network interface, use iptraf to see the traffic break down by individual ports. If you don’t believe it, [[optimizing linux perf ]] P202 has an iptraf screenshot showing the per-port volumes
# given a specific port, use netstat or lsof to see the process PID using that port.
# given a PID, use strace and /proc/[pid]/fd to drill down to the socket (among many) responsible for the traffic. Socket is seldom shared (see other posts) between processes. I believe strace/ltrace can also reveal which user functions make those socket system calls.

sharedMem in low latency systems

Hi Anthony,

Is shared-memory a popular messaging solution in low-latency trading?

I know some high-volume data processing engines (like Ab Initio) favor shared memory as the fastest IPC solution.

However, I feel in low latency trading, messaging (like tibrv, 29west, Solace) is more popular. For a trading engine, shared memory IPC can be the basis of messaging between processes on the same machine, but not across different machines.

Does your system use shared memory?

If interested, you can check out




variance or stdev is additive – an illustration

Imagine annual (log) return is controlled by a noisegen, whose mean is a constant value M and variance is another constant value sigma^2

Since we hit the noisegen once a year, over 5 years we get 5 random “numbers”, all with the same M and sigma. Each number is the realized annual (log) return. The cumulative end-to-end return is the sum of the 5 independent random variables. This sum is a random variable whose variance is additive: 5*sigma^2, so its std is only sqrt(5)*sigma. Assumption is iid i.e. repeated noisegen hits.
In a different scenario, suppose we hit the noisegen once only and multiply the same output number by 5 in a “projected 5Y return”. Now std is additive: the std is 5*sigma and the variance is 25*sigma^2.

In both cases, the mean of the 5Y end-to-end return is 5*M
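
A quick simulation sketch of the two scenarios, assuming a normal noisegen. The values M = 5% and sigma = 20% and the trial count are arbitrary choices of mine. Expect the sample figures to land near 5*sigma^2 = 0.2 for the variance in the first case and near 5*sigma = 1.0 for the std in the second.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

M, SIGMA = 0.05, 0.20        # assumed annual mean and std of the noisegen
YEARS, TRIALS = 5, 20000

# Scenario 1: hit the noisegen 5 times (iid) and add up the outputs.
sum_of_five = [sum(random.gauss(M, SIGMA) for _ in range(YEARS))
               for _ in range(TRIALS)]

# Scenario 2: hit the noisegen once and multiply the same output by 5.
five_times_one = [YEARS * random.gauss(M, SIGMA) for _ in range(TRIALS)]

var_iid = statistics.pvariance(sum_of_five)     # close to 5 * sigma^2
std_scaled = statistics.pstdev(five_times_one)  # close to 5 * sigma
mean_iid = statistics.fmean(sum_of_five)        # close to 5 * M
mean_scaled = statistics.fmean(five_times_one)  # close to 5 * M

print(f"iid sum:    mean {mean_iid:.3f}, variance {var_iid:.3f}")
print(f"scaled one: mean {mean_scaled:.3f}, std {std_scaled:.3f}")
```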

cache-miss in(CPU hogging)hot-function

P245 [[Optimizing Linux Perf]] (2005) points out this “intriguing” scenario. Here are the investigation steps to identify it —

First, we use _oprofile_ to identify the function(s) taking up the most application time. In other words, the process is spending a large portion (I would imagine 5%+) of cpu cycles in this function. However, the cycles could be spent in one lengthy entry or a million quick re-entries. Either way, this would be a hotspot. Then we use oprofile/valgrind(cachegrind)/kcachegrind on the same process, and check if the hot function generates high cache misses.

The punch line – the high cache misses could be the cause of the observed process hogging. I assume the author has experienced the same but I’m not sure how rare or common this scenario is.

Some optimization guy in a HFT shop told me main memory is now treated as IO, so cache miss is treated seriously. http://programmers.stackexchange.com/questions/142864/writing-low-latency-java mentions that “Cache misses are your biggest cost to performance. Use algorithms that are cache friendly.”

By the way, instruction cache miss is worse than data cache miss. My friend Shanyou also said the same.

share huge static memory chunk btw2processes

Based on P244 [[linux sys programming ]]

I am not quite sure about the use cases[1], but let’s say this huge file needs to be loaded into memory, by 2 processes. If readonly, then memory savings is possible by sharing the memory pages between the 2 processes. Basically, the 2 virtual address spaces map to the same physical pages.

Now suppose one of them — Process-A — needs to write to that memory. Copy-on-write takes place so Process-B isn’t affected. The write is “intercepted” by the kernel which transparently creates a “copy”, before committing the write to the new page.

fork() system call is another user of Copy-on-write technique.

[1] Use cases?

* perhaps a large library, where the binary code must be loaded into memory
* memory-mapped file perhaps
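
For the memory-mapped file use case, a minimal sketch using Python’s mmap module. `ACCESS_COPY` asks for a private copy-on-write mapping, so a write changes this process’s view of the pages but never reaches the underlying file (or any other process mapping it). The temp file and its contents are made up for the demo.

```python
import mmap
import os
import tempfile

# Create a small stand-in for the "huge file".
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello world")
    path = tmp.name

with open(path, "r+b") as f:
    # Private copy-on-write mapping of the whole file.
    private_view = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY)
    private_view[0:5] = b"HELLO"        # triggers the copy, in memory only
    in_memory = bytes(private_view[:])
    private_view.close()

with open(path, "rb") as f:
    on_disk = f.read()                  # the file itself is untouched

os.remove(path)

print(in_memory)  # b'HELLO world'
print(on_disk)    # b'hello world'
```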

What code is machine/arch specific@@ syscalls, std lib …

A: (obviously) binary object code — including the unix/windows kernels. It runs on the bare hardware — must be machine specific.

A: Assembly language source code – must conform to the hardware instruction set. You can't use i386 assembly code on a PowerPC. Note each line of assembly source code generally translates to one machine instruction.

A: C compiler — implements the ABI [1]. The object code produced by the compiler is binary machine code, so the compiler itself must be architecture-specific.

A: Syscalls – are specific to the Architecture. The linux syscalls for i386 architecture include about 300 functions. 90% of them are universally available on all architectures but the rest are architecture-specific.

A: C standard library (like glibc) – provides wrappers over syscalls. Since a small number of syscalls are architecture-specific, the std lib is necessarily architecture-specific. However, the “contamination” stops here – I believe anything linked to the std lib is portable at source level. Therefore the std lib provides a “standard” API. Java portability is even better – where the same bytecode compiled on one architecture is usable on any other, assuming 100% pure java without native code.

[1] API vs ABI – explained in [[linux system programming]]

## c++ instrumentation tools

(Each star means one endorsement)

oprofile **
gprof *
callgrind (part of valgrind)
sar *
strace *
(Compared to strace, I feel there are more occasions when ltrace is useful.)

*Pin threads to CPUs. This prevents threads from moving between cores and invalidating caches etc. (sched_setaffinity)

See more http://virtualizationandstorage.wordpress.com/2013/11/19/algo-high-frequency-trading-design-and-optmization/

## 10 unix signal scenarios

A signal can originate from outside the process or from within.

The precise meaning of signal generation requires a clear understanding of signal handlers. See P125 [[art of debugging]] and P280 [[linux sys programming]]

— External —
# (SIGKILL/SIGTERM) q(kill) commands
# (SIGINT) ctrl-C
# (SIGHUP) we can send this signal to Apache, to trigger a configuration reload.

— internal, i.e. some kernel code module “sends” this signal to the process committing the “crime” —
# (SIGFPE) divide by zero; arithmetic overflow,
# (SIGSEGV) memory access violation
# (SIGABRT) assertion failure can cause this signal to be generated
# (SIGTRAP) target process hitting a breakpoint. A debugger intercepts this signal; without a debugger attached, the default action terminates the process.

java cod`IV #threading #Gelber

My solution was attached on 27 Jan 2011
Hi Bin,
Here is the problem I was speaking about:
1. Develop an application in Java that will satisfy the following requirements:

– Read in any number of text files.
– Calculate 5 most commonly used letters
– Calculate 3 most commonly used characters
– Calculate 3 most commonly used numbers
– Calculate 10 most commonly used words
– Concurrently parse all files, however, limit your application to 2 parser threads. If there are more files than available parser threads, you will need to queue files for parsing.

2. Write an application in Java that will pull down the HTML content of any number of specified websites – single file per URL, no depth. Strip out all metadata and generate statistics by showing 2 most commonly used letters, numbers, characters and words.


Fwd: IDC interview (12/09/13) Desktop

Subject: IDC interview (12/09/13) Desktop

I got the offer for this but turned it down today – HSBC seems a better one.

a lot of questions about Swing, since the project is a desktop data consolidation application using Swing and JIDE.

OO concepts:

Open-closed principle

Encapsulation and inheritance, your understanding?

aggregation or composition, comments?


enumerate the Collection APIs

Which Set implementation or interface retains the natural order of the elements?

[TB] Natural order is like the order in an English dictionary, right? TreeSet/ConcurrentSkipListSet, SortedSet (interface)


whiteboard Pascal triangle

Quick sort

binary search

How to refactor the following code snippet (they took it from Effective Java):

import java.util.Calendar;
import java.util.Date;
import java.util.TimeZone;

public class Person {
    private final Date birthDate;

    // Other fields, methods, and constructor omitted

    public boolean isBabyBoomer() {
        // A new Calendar, TimeZone and two Dates are created on every call
        Calendar gmtCal = Calendar.getInstance(TimeZone.getTimeZone("GMT"));
        gmtCal.set(1946, Calendar.JANUARY, 1, 0, 0, 0);
        Date boomStart = gmtCal.getTime();
        gmtCal.set(1965, Calendar.JANUARY, 1, 0, 0, 0);
        Date boomEnd = gmtCal.getTime();
        return birthDate.compareTo(boomStart) >= 0 && birthDate.compareTo(boomEnd) < 0;
    }
}



[TB] I feel isBabyBoomer should become a static utility method, like

isBabyBoomer(Person) or isBabyBoomer(Date)

The boomStart and boomEnd should be static constants. No need to create them as temp objects in each invocation.

Alternatively, we only need to look at a given year of birth “YOB”. Is baby boomer IFF 1946 <= YOB <= 1964 i.e. YOB between 1946 and 1964.

Am I right?

Fwd: HSBC interview (12/11/13, Jersey City), a Dodd Frank Recon Report project

Looks like most of java questions are on threading and spring, and nothing else?

See my answers below.


if wait/notify is called outside the sync block, what exception is thrown?

[TB] Doing this won’t work. It’s a programmer error, not system error like out of memory. I don’t remember this exception because I have never made this mistake. 

So this is an obscure question.
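
For the record, java throws IllegalMonitorStateException when wait()/notify() is called by a thread that doesn’t own the object’s monitor. Python has a close analogue, sketched below: a `threading.Condition` raises RuntimeError if you notify (or wait) without holding its lock. This is an analogy in Python, not the java API.

```python
import threading

cond = threading.Condition()

# Calling notify() without holding the lock is a programmer error.
try:
    cond.notify()
    outcome = "no error"
except RuntimeError as exc:
    outcome = f"RuntimeError: {exc}"

# The correct usage: acquire the lock first (the "sync block").
with cond:
    cond.notify()          # no waiters, so this is a harmless no-op
    notified_inside_lock = True

print(outcome)
print(notified_inside_lock)
```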

enumerate Executors’s static methods to create a thread pool

[TB] I remember ExecutorS.java has a bunch of static factory methods. They let us specify the queue data structure used, the size of the queue, (min/max) size of the pool. There are also methods to create a single-thread pool. I think there are also methods to create timer-enabled thread pool.

How do you ensure a thread is processed prior to other active threads? What do you do in 1.4 and in 1.5 or later?

[TB] Thread priority is a platform-dependent feature. Usually java threads are kernel threads, so the kernel scheduler decides whether to honour the priority I set on my java threads. Sometimes the kernel scheduler ignores the priority I set. If that happens, I call yield() in many places in the methods executed by other threads. However, the “important” thread may still get pre-empted (context-switched out). In such a case, I would need to use lock or condition variables to block all other threads. Another technique is a global Boolean flag, set by the important thread, and read-only by all other threads. If it’s 0, then all other threads will go back to a sleep loop.

Volatile variable? What’s your comment?


How many ways of instantiating a Spring container?

How many ways of injecting a bean in Spring XML config 

What are the drawbacks of using constructor injection? With a circular reference, what exception will be thrown?

Spring annotation. If a bean must be provided with a constructor that has another bean injected, which attribute of the annotation should be used to enforce it?

What are the scopes in a bean tag in Spring XML?

If the scope is prototype, what will “getBean()” return?


How to check the long running java process, is it done, deadlock, hanging there or still running.

[TB] Good question. If it is done or deadlocked then cpu usage would be 0 for a long time (like a few minutes). Hang is always for specific reasons like deadlock or blocked on IO.

If the java is a network server then it could be waiting for requests, so no-activity is OK. We had better know the actual type of program it is.

[TB] Ideally there should be log entry from the process. For a network server log could say “waiting for requests”. In reality, the programmer may have forgotten to do the right things.

kill -3 to get the thread dump


empty while(true) loop hogging CPU

Someone (barclays?) gave me a basic tuning quiz —

Q: what cpu utilization will you see if a program executes an empty while(1==1){} ?

A: On a 16-core machine, 3 instances of this program each take up 4% of the aggregate CPU according to windows taskmgr. I think 4% means hogging one entire core.

A: on my RedHat linux, the same program has 99.8% CPU usage meaning one entire core.

A: On a dual-core, if I start 2 instances, each takes up about 50% i.e. one entire core, keeping all other processes off both cores. With 3 instances, system becomes visibly slow. Taskmgr itself becomes a bit hard to use and reports about 30%-40% by each instance.

I think this would count as cpu intensive job.

R-square – drilling down a bit

http://web.maths.unsw.edu.au/~adelle/Garvan/Assays/GoodnessOfFit.html shows that R^2 can be negative if the model omits a constant intercept that needs to be included. Because R-square is defined as the proportion of variance explained by the fit, if the fit is actually worse than just fitting a horizontal line then R-square is negative.

An R-square value of 0.8234 means that the fit explains 82.34% of the total variation in the original data points about their __average__
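
A tiny worked sketch of a negative R-square, using three made-up data points that sit on the line y = 9 + x. Forcing the fit through the origin (no intercept) does worse than a horizontal line at the average, so R^2 goes negative. Restoring the intercept gives a perfect fit.

```python
def r_squared(ys, predictions):
    """1 - SS_residual / SS_total, where SS_total is variation about the average."""
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_residual = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    return 1 - ss_residual / ss_total


# Made-up data lying exactly on y = 9 + x.
xs = [1, 2, 3]
ys = [10, 11, 12]

# Least-squares slope for the no-intercept model y = b*x.
slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
r2_no_intercept = r_squared(ys, [slope * x for x in xs])

# The model with the intercept included fits these points exactly.
r2_with_intercept = r_squared(ys, [9 + x for x in xs])

print(f"no intercept:   R^2 = {r2_no_intercept:.2f}")
print(f"with intercept: R^2 = {r2_with_intercept:.2f}")
```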

execution risk in a VWAP execution algo

Background: VWAP strategy is known to have minimal market impact but bad “execution risk”.

Suppose you are given a large (500,000 shares) Sell order. Suppose your goal is minimal market impact i.e. avoid pushing the price down. What execution strategy? I don't know about other strategies, but VWAP strategies generally participate according to market volume, so given a decent implementation the market impact is often … reduced.

I think the idea of the Exec risk is the _uncertainty_ of the final block price. If an implementation offers a very tight control and results in well-controlled final block price, then exec risk is small. http://www.cis.upenn.edu/~mkearns/finread/impshort.pdf explains with an example —

suppose MSFT trades 30 million shares on an average day. If a trader has three million MSFT shares to trade, a VWAP algorithm may be appropriate. However, if the trader gets 30,000 shares of MSFT to trade, then the savings of market impact (by spreading the trade over the whole day) is not significant compared against the opportunity cost the trader could save by trading the stock within the next few minutes. Quick execution means the uncertainty (or std) in “final block price” is much reduced. With a small order you would achieve something close to the arrival price.

a few ideas on how to manage the manager #OCBC

This is about a generic manager, not a particular person☺. I will use “she” or “he” interchangeably.

• Result or reason? Managers want results, not reasons. When she asks me for a reason, what she really wants is how she can help solve the problem and make progress towards the result.

• I don't disclose my actual implementation details. If managers ask, I try to avoid the details. I feel managers don't want to be bothered with implementation details, esp. when the codebase grows big and my module is outside the most critical 10% core modules.

• I seldom deviate from manager's technical direction. If I must deviate, I try to keep a low profile and get things to work quickly. If I miss a deadline and attract his question, then I try to give a reason without disclosing my deviation.

• When things get unbearable, I tell myself I am not imprisoned for life here.

• When I receive a put-down remark, I remind myself I'm competent and I generally get things done, and I am respected by my colleagues.

• When asked about progress, I used to give a guarded response like “working but there are some questions about ….”.

DOS | Temporarily change working directory for a single command

pushd myjava & java -Djava.rmi.server.codebase= -classpath %CP% com.ocbc.quest.murexgateway.MurexServer 19673  & popd


This DOS command line does 3 things

1) temporarily chdir into “myjava” folder and

2) run a (long) java command line, and then

3) restore the previous directory.


Actually, the java process blocks the script forever. If you use ctrl-C to terminate the blocking java process, you still get back into the previous directory ☺





risk-neutral measure, a beginner’s personal view

Risk neutral measure permeates derivative pricing but is not clearly understood. I believe RN measure is very useful to mathematicians. Maybe that’s why they build a complete foundation with lots of big assumptions.

Like other branches of applied math, there are drastic simplifying assumptions….

I think the two foundation building blocks are 1) arbitrage and 2) replication. In many textbook contexts, the prices of underliers vs derivatives are related and restrained by arbitrage. From these prices we can back out or imply RN probability values, but these are simplistic illustrations rather than serious definitions of RN measure.

On top of these and other concepts, we have Martingale and numeraire concepts.

Like Game programming for kids and for professionals, there are 2 vastly different levels of sophistication:
A) simplified — RN probabilities implied from live prices of underliers and derivatives
B) sophisticated — RN infrastructure and machinery, based on measure theory

"preempt a thread" == move it off driver’s seat

First some jargon. The “eligible club” (as described in other posts) includes all the runnable threads that are eligible to take the driver’s seat. In contrast, a “waiting” thread is BLOCKED and suspended.

Usually there are more eligible threads than processors (even though some CPU’s like the Sparc T1 can have 32 simultaneous threads on 8 cores.) In other words, there are more drivers than cars.

The technical jargon “preempt” means “move an executing thread off the driver’s seat”. See P127[[headfirstC]]

[[linux sys programming]] P165 has more details.

vwap execution chasing the wrong signal – my guess

A vwap algo starts with a “model profile”, which tells us what percentage of the total daily volume each hour (or minute) of the trading day typically sees.

Then the algo tries to execute according to the model profile, executing 10% in the first hour. The actual market profile may show a spike in the second hour. Suppose 2nd hour usually gets below half of first hour according to the model profile, but we see it's going to more than double the first hour, because the past 5 minutes show a very high spike in volume.
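
Executing “according to the model profile” can be sketched as below. The hourly profile numbers are hypothetical (I only kept the 10% first hour and a second hour below half of that), the 500,000-share order size is borrowed from my other vwap post, and the function name is made up. Real algos are far more elaborate.

```python
def vwap_schedule(total_qty, profile):
    """Split total_qty across time buckets in proportion to the model profile.

    Works on cumulative targets so rounding never loses or adds shares.
    """
    total_weight = sum(profile)
    schedule = []
    allocated = 0
    cumulative_weight = 0
    for weight in profile:
        cumulative_weight += weight
        target = round(total_qty * cumulative_weight / total_weight)
        schedule.append(target - allocated)
        allocated = target
    return schedule


# Hypothetical model profile: percent of daily volume in each trading hour.
model_profile = [10, 4, 8, 10, 12, 16, 40]

child_orders = vwap_schedule(500_000, model_profile)
print(child_orders)
# [50000, 20000, 40000, 50000, 60000, 80000, 200000]
```

A static schedule like this is exactly what gets out of step with the market when the actual volume profile deviates from the model.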

Question is, should we increase our trade rate? I guess there's reason to do so. When the volume spikes, we should trade bigger chunks so as to let the spike “mask and absorb” our market impact. If we don't capture this spike, then the 2nd hour might end up being 80% of daily volume while we have put in only 4% of our quantity, so our remaining quantity would cause market impact.
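
The trade-rate question can be made concrete with a rough sketch. Everything here is an assumption for illustration: the six-bucket model profile, the market volumes, and the rule itself (scale the planned slice by observed/expected market volume, capped at 3x). It is not how any particular vwap engine works.

```python
def static_schedule(total_qty, model_profile):
    """Slice the parent order by the model profile (fractions summing to 1)."""
    return [total_qty * frac for frac in model_profile]


def adjusted_slice(planned_qty, expected_mkt_vol, observed_mkt_vol,
                   remaining_qty, cap=3.0):
    """Scale the planned slice by observed/expected market volume.

    The ratio is capped so one spike cannot consume the whole order,
    and the result never exceeds what is left to trade.
    """
    ratio = min(observed_mkt_vol / expected_mkt_vol, cap)
    return min(planned_qty * ratio, remaining_qty)


# Hypothetical model profile: hour 1 gets 10%, hour 2 gets 4%, and so on
MODEL_PROFILE = [0.10, 0.04, 0.06, 0.10, 0.20, 0.50]
TOTAL_QTY = 100_000

plan = static_schedule(TOTAL_QTY, MODEL_PROFILE)

# Hour 2: the model expects 40,000 shares market-wide, but the spike
# suggests 200,000 will trade (both numbers are made up)
hour2_qty = adjusted_slice(plan[1], 40_000, 200_000,
                           remaining_qty=TOTAL_QTY - plan[0])

print(plan[1], hour2_qty)
```

The cap is one crude guard against chasing the wrong signal: even a 5x volume spike only triples our slice.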

However, it's also possible to chase the wrong signal. The spike might cause a large rise or (more likely in realistic panic-prone markets) drop in price, which could reverse soon. Suppose we are selling a big quantity and the spike causes a big drop. Our active participation would further deepen the drop. We might do better to patiently wait for a reversal.

algorithm efficiency at Google/HFT: random thoughts

I was told Google engineers discuss algorithm efficiency every day, including at lunch time.

I feel latency due to software algorithm might be a small component of overall latency. However, the bigger portions of that latency may be unavoidable – network latency, disk write(?), serialization for transmission(?), … So the only part we could tune might be the software algorithm.

Further, it’s also possible that all the competitors are already using the same tricks to minimize network latency. In that case the competitive advantage is in the software algorithm.

I feel algorithm efficiency could be more permanent and fundamental than threading. If I compare 2 algorithms A1 and A2 and find A2 to be twice as fast as A1, then no matter what threading or hardware solutions I use, A2 still beats A1.

black litterman – my brief notes

http://www.blacklitterman.org/what.html is a brief description.

First, forget about optimizer, capm, MeanVariance, or views. Let's first get a grip on Bayesian inference. Given an unfair coin, we are trying to estimate Pr(tail) by tossing it over and over. There's uncertainty (a probability distribution) about the mean.
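
The coin example can be sketched in a few lines. One assumption is mine, not from the linked page: the belief about Pr(tail) is modelled as a Beta distribution, chosen only because it makes the update after each batch of tosses a simple count. The toss counts are made up.

```python
def update_beta(alpha, beta, tails, heads):
    """Posterior Beta parameters after observing the given tosses."""
    return alpha + tails, beta + heads


def beta_mean(alpha, beta):
    """Point estimate of Pr(tail)."""
    return alpha / (alpha + beta)


def beta_variance(alpha, beta):
    """Uncertainty about Pr(tail); shrinks as evidence accumulates."""
    total = alpha + beta
    return alpha * beta / (total ** 2 * (total + 1))


prior = (1.0, 1.0)                       # uniform prior: no idea about the coin
posterior = update_beta(*prior, tails=7, heads=3)

print(beta_mean(*prior), beta_variance(*prior))
print(beta_mean(*posterior), beta_variance(*posterior))
```

The estimate moves from 0.5 towards the observed frequency and the variance drops, which is the "distribution about the mean" idea in miniature.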

Updating estimates — Now we can formulate the problem as how to update our estimate of expected return when we get some special insight (like insider news). Such an insight is called a subjective “view”, in contrast to the public information about the securities. The updated estimates must be numbers, not some vague preference.

Optimizer — Once we get the updated estimates, they go into a regular portfolio allocation optimizer.

Problems of MV: concentration. Small changes in the inputs (returns, covariance etc.) lead to drastic re-allocations.

The investor is uncertain in their estimates (prior and views), and expresses them as distributions of the unknown mean about the estimated mean. As a result, the posterior estimate is also a distribution.

c++iv: Jump#2

Mostly obscure QQ type of questions. I feel I may have to give up on some of the very low level (perf optimization) topics. I feel java and c# interviews are not so low level.

Q: Stack overflow – who can detect it and print an error msg? JVM can do it but what if there’s no VM?

Q: What data type would you use for the tasks in a thread pool??
(I find this question too advanced. c++11 offers Futures…)
%%A: look at pthread-create. a func ptr taking a void ptr

Q: After malloc(), how do you cast the pointer to MyClass* ? Do you call the ctor? How?
(This is asked again by Alex of DRW)
A: placement-new?

  • Inter-thread communications in thread pool – how does it work?
  • Thread pool — Your resume mentioned your home-made thread pool? How?
  • Boost::any, boost::bind, boost::function
  • CPU cache – how do you use it to improve performance? Any specific techniques?
  • Stack size – who controls it? at Compile time or run time?
  • Shared ptr – how is it implemented?
  • Scoped lock – what is it, why use it?
  • Your bash shell customizations as a cpp developer?
  • $LD_LIBRARY_PATH — what is it?

2 JGC algos for latency^throughput

https://databricks.com/blog/2015/05/28/tuning-java-garbage-collection-for-spark-applications.html is a 2015 blog post by Intel engineers, hosted on the Databricks blog.

Before the G1 algo, Java applications typically use one of two garbage collection strategies: Concurrent Mark Sweep (CMS) garbage collection and ParallelOld GC (similar to parallelGC, the java8 default).

The former aims at lower latency, while the latter is targeted for higher throughput. Both strategies have performance bottlenecks: CMS GC does not do compaction, while Parallel GC performs only whole-heap compaction, which results in considerable pause times.

  1. For applications with real-time response, “generally” (by default) we recommend CMS GC;
  2. for off-line or batch programs, we use Parallel GC. In my experience, this 2nd scenario has less stringent requirements so no need to bother.

https://blog.codecentric.de/en/2013/01/useful-jvm-flags-part-6-throughput-collector/ (2013) has an intro on Throughput vs. pause times

exit code lost – Horror story on Microsoft dev environment

This is such a poorly documented feature!

If you run your exe and then echo %ERRORLEVEL%, you get the correct exit code, 99. If you invoke it with WScript.Shell Exec and check the ExitCode property right away, you always see 0, because Exec returns before the process has finished and ExitCode reads 0 until then.


http://www.visualbasicscript.com/Getting-exit-code-with-Exec-method-m62896.aspx shows the solution.

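' Status 0 means the process is still running; wait until it finishes
Do While objExec.Status = 0
    WScript.Sleep 100
Loop
' Only now does ExitCode hold the real exit code
WScript.Echo objExec.ExitCode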

http web service in async mode@@ #YH

I think in my dotnet client/server system, we use WCF duplex extensively, meaning the server can push updates to the clients. Each client must subscribe though. I think this is exactly event-driven as you said.

I remember that if I put a break point in the client’s updateReceived() callback method, it gets hit automatically, without the client polling the server.

WCF duplex is a standard and mature feature in microsoft WCF.

The server endpoint uses https, and I believe it’s a web service.

Is it possible that the server push is implemented actually by client poll under the hood? I don’t think so. There’s a polling duplex ..

See http://msdn.microsoft.com/en-us/library/cc645027(v=vs.95).aspx

c++Chicago/Sing (Jump) IV Aug 2012

Q: UDP vs TCP diff?
%%A: multicast needs UDP.
%%A: UDP is faster – no connection setup/teardown, no retransmission of lost packets, no ACK, no sequence number; smaller envelope (header)

Q: How would you add reliability to multicast?
%%A: sequence number
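
A minimal sketch of the sequence-number idea, seen from the receiver. The class name and the policy (silently ignore late or duplicate packets) are hypothetical. A real reliable-multicast layer would also send a retransmission request for each missing number, which is left out here.

```python
class GapDetector:
    """Tracks the next expected sequence number and reports missing ones."""

    def __init__(self, first_seq=1):
        self.expected = first_seq

    def on_message(self, seq):
        """Return the sequence numbers skipped just before this message."""
        if seq < self.expected:
            return []                      # duplicate or late packet: ignore
        missing = list(range(self.expected, seq))
        self.expected = seq + 1
        return missing


detector = GapDetector()
# Packets 3 and 4 are lost at first; 4 then shows up late
gaps = [detector.on_message(seq) for seq in [1, 2, 5, 4, 6]]

print(gaps)
```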

Q: How would you use tibco for trade messages vs pricing messages?
%%A: the trade msg must be delivered reliably to back office?
%%A: one of them is real time?

Q5: In your systems, how serious was data loss in non-CM multicast?
%%A: Usually not a big problem. During peak volatile periods, messaging rates could surge 500%. Data loss would worsen.

Q5b: how would you address the high data loss?
%%A: test with a target message rate. Beyond the target rate, we don’t feel confident.
A: tune the tibco reliability parameter —  http://javarevisited.blogspot.sg/2011/04/dataloss-advisory-in-tibco-rendezvous.html

Q7: how is order state managed in your OMS engine?
%%A: if an order is half-processed and pending the 3nd reply from ECN, the single thread would block.

Q7b: even if multiple orders (for the same security) are waiting in the queue?
%%A: yes. To allow multiple orders to enter the “stream” would be dangerous.

Now I think the single thread should pick up and process all new orders and keep all pending orders in cache. Any incoming exchange messages would join the same task queue (or a separate task queue) – the same single thread.
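
The revised design can be sketched as one loop draining one queue. The event kinds ("new", "ack", "fill"), the state names and the function name are all hypothetical. The point is only that the single thread never blocks on one order: it records the pending state in a cache and moves on to the next event.

```python
from collections import deque


def run_engine(events):
    """Process (kind, order_id) events on one thread.

    Returns the log of state changes and the orders still pending.
    """
    queue = deque(events)     # new orders and exchange replies share one queue
    pending = {}              # order_id -> current state (the "cache")
    log = []
    while queue:
        kind, order_id = queue.popleft()
        if kind == "new":
            pending[order_id] = "sent"
            log.append((order_id, "sent"))
        elif kind == "ack" and order_id in pending:
            pending[order_id] = "acked"
            log.append((order_id, "acked"))
        elif kind == "fill" and order_id in pending:
            del pending[order_id]          # order is done; drop it from the cache
            log.append((order_id, "filled"))
    return log, pending


# Order B enters the stream while order A is still waiting for replies
events = [("new", "A"), ("new", "B"), ("ack", "A"), ("fill", "A"), ("ack", "B")]
log, still_pending = run_engine(events)

print(log)
print(still_pending)
```

Because only one thread touches the cache, no locking is needed even though several orders are in flight.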

3 main infrastructure teams
* exchange connectivity – order submission
* exchange connectivity – price feed i.e. market data. I think this is incoming-only, probably higher volume. Probably similar to Zhen Hai’s role.
* risk infrastructure – no VaR mathematics.