testing threaded designs – enterprise apps (not libraries)

Bottom line – (Unconventional wisdom) Be bold in creating new threaded designs. They don’t have to be as rock solid as those in the standard library.

Experts unanimously agree that non-trivial MT (multi-threaded) designs are hard to verify or test, often exceedingly hard. There are often too many possibilities. A million tests may pass and the next test still reveals a bug. Therefore peer review is the way to go. I feel that’s the “library” view, the view from the language creators. It’s different from the enterprise app view.

In enterprise apps, if an MT design passes load testing and UAT, that’s good enough. There’s no budget to test further. If only 1 in a million cases fails, then that case has something special: perhaps a special combination or a pure timing coincidence. Strictly speaking, those are still logic errors and design defects, and a sound design ought to handle such cases gracefully. Most designs aren’t that sound. If a failure happens so rarely, just about everyone involved would agree it’s practically impossible to catch during testing. Catching such a bug should not be in anyone’s job scope here. Missing it is tolerable and only human.

A goalkeeper can’t be expected to save 99 penalties in a row.

In the enterprise reality, such a bug is probably never uncovered (unless a log happens to provide the evidence). It often takes too much effort to investigate such a rare issue. Not worth the effort.


immutability of a c# delegate instance

Mutability is all about object state. We are familiar with object state of ordinary objects, but delegate objects are different – subtle.

A delegate’s object state is its invocation list. More specifically, the state is the ordered list of 2-pointer thingies, each holding an object pointer and a function pointer.

So immutability means that the collection and ordering are permanent. Each 2-pointer thingy is naturally immutable given that object pointer [1] and the function pointer are just addresses rather than “pointer variables”. In other words, each 2-pointer thingy holds 2 addresses and there’s no way to “edit” any address.

Immutability doesn’t cover the Target object, which is fully mutable. This is what confuses folks like me. If an immutable Student object HAS-A professor field, then the professor pointee object is also presented as immutable. Here’s the key: every time a Student method returns the professor, the professor is cloned, so the original professor instance is never exposed/leaked. No such cloning underlies the immutable delegate instance.

[1] This is actually the Target property of the delegate instance. For a static method, Target is null.
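
A rough Python analogy may make the point concrete. This is only a sketch, not C# semantics: a tuple of (target, function) pairs stands in for the invocation list, and the names Counter and bump are hypothetical. The tuple can’t be edited, yet the target it points to stays fully mutable.

```python
# Analogy only: a tuple of (target, function) pairs stands in for a
# delegate's invocation list. Counter and bump are hypothetical names.
class Counter:
    def __init__(self):
        self.hits = 0


def bump(target):
    target.hits += 1


counter = Counter()
inv_list = ((counter, bump), (counter, bump))  # fixed collection and ordering

# The list itself cannot be edited in place ...
try:
    inv_list[0] = (Counter(), bump)
    list_was_edited = True
except TypeError:
    list_was_edited = False

# ... but invoking it still mutates the (fully mutable) target object.
for target, func in inv_list:
    func(target)

# "Adding a handler" builds a new tuple; the original one is untouched.
longer = inv_list + ((counter, bump),)
```

Building a new tuple in the last step loosely mirrors how combining delegates in C# yields a new delegate instance rather than modifying an existing one.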

fear: c# iwt java/python

The more my GTD “survival” depends on tools or environment, the more fearful and the less confident I feel.

Extreme eg:  When programming Perl, php (and python) I don’t rely on any complicated tool. The libraries I use tend to be simple and small.

Extreme eg: Excel VBA is riddled with infrastructure dependency. Any idiosyncrasy or nuance of Excel can cause inexplicable, subtle problems.

— Java —

Java is a powerful, mid-level language with a remarkably robust sandbox. The ugly, dirty world outside is largely kept away from us programmers, so we operate in a clean room, sandbox, or a constant-temperature green house.

However, when I had to debug into Spring-JMS I hit the same infrastructure-dependency.

JNI — The sandbox leaks when you play with JNI. Infrastructure dependency on a massive scale. You now need a lot of “platform-knowledge” about the “platform” outside. It’s a lot dirtier than inside the sandbox.

Eclipse Debugger – is less well-behaved and less well understood than other aspects of java development. When the source code is out of sync with the executing binary, things could keep going but would fail in strange ways, much like undefined behavior.

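How about stack overflow? Pure java code gets a StackOverflowError, but a stack overflow inside native code typically gives no exception. Just a crash.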

— c# —

The c# language is more feature-rich than java, but still consistent and structured. No complaints. However, when you use c# for real projects you hit infrastructure dependency. I mentioned in numerous blog posts how much platform knowledge is needed.

Unlike the famed write-once-run-anywhere portability of java, many really, really important parts of C# development are tied into the “platform” outside, meaning developers need that platform knowledge. Just a few examples:


* threading implemented on top of Windows threads

* many critical tasks need MSVS

* Excel integration

* windows service integration

SGD exchange rate management by MAS

HK government uses a “pegged FX rate”. The drawback is “imported inflation”. HKD is kept artificially low, so HK living cost is higher.

SG government chose a different strategy – “managed float”. Whenever the SGD FX rate moves outside the policy band, MAS buys or sells SGD in the open market, which costs taxpayers’ money.

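Due to the impossible trinity (http://en.wikipedia.org/wiki/Impossible_trinity), a country can’t have free capital flow, a fixed exchange rate and an independent interest rate policy all at once.
– SG government allows free capital flow and manages the exchange rate, so it loses the freedom to set an independent interest rate. SGD interest rate is forced to follow USD interest rate.
– China has capital control, so it can keep both a fixed exchange rate and an independent interest rate policy.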

python – convert sequence of obj to strings then concat

P60 [[cookbook]] shows a neat trick

>>> data = ['aa', 50, 91.1]
>>> ', '.join(str(d) for d in data)
'aa, 50, 91.1'

Technique: generator expression
Technique: str() conversion ctor. Without conversion, joining a string with a non-string raises TypeError.
Technique: calling the join() method on the delimiter string
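
A small sketch of the three techniques together, reusing the same data as above. The variable names error_name and joined_map are just for illustration; map(str, data) is shown as an equivalent spelling of the generator expression.

```python
data = ['aa', 50, 91.1]

# join() accepts only strings, so the raw list is rejected.
try:
    ', '.join(data)
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__

# Converting each item first works; map(str, data) is an equivalent spelling.
joined = ', '.join(str(d) for d in data)
joined_map = ', '.join(map(str, data))
print(error_name, joined)
```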

The author points out that string concat can be very inefficient if you blindly use the “+” operator. Similarly, java/dotnet offer StringBuilder, and c++ offers stringstream.
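
For comparison, here is a minimal sketch of three ways to build the same string in python. The sample data is made up. Using io.StringIO as the rough counterpart of a StringBuilder is my own mapping, not something the cookbook claims.

```python
import io

pieces = [str(n) for n in range(5)]

# Repeated "+" may copy the growing string on every step.
slow = ''
for p in pieces:
    slow += p

# join() builds the result in one pass over the pieces.
fast = ''.join(pieces)

# io.StringIO is roughly the counterpart of a StringBuilder/stringstream.
buf = io.StringIO()
for p in pieces:
    buf.write(p)
built = buf.getvalue()
```

All three produce the same result; the difference only matters when the number of pieces is large.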

python LGB rule, %%take

http://my.safaribooksonline.com/book/programming/python/1565924649/functions/ch04-28758 clarifies many key points discussed below.

(The simple-looking “global” keyword can be tricky if we try to understand its usage too early.)

The key point, highlighted inadequately elsewhere (the above book is an exception), is the distinction between assignment and reference.

* AA – Assignment – When a variable appears on the LHS, it’s (re)created in some namespace [1], without name lookup. By default, that namespace is the local namespace.
* RR – Reference – any other context where you “use” a variable, it’s assumed to be a pre-existing variable. Query and method invocation are all references. Requires name lookup, potentially causing an error.

Let me test you on this distinction —

Q: what happens if I Assign to a non-existent variable?
A: I believe it will get created.

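Q: what happens if I Reference a non-existent variable?
A: I believe it’s a NameError, in any context. (Inside a function that assigns to the same name further down, it shows up as UnboundLocalError, a subclass of NameError.)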

The LGB search sequence is easily understood in the RR context. Failed search -> error.

In the AA context, there’s no need to look up, as the variable is simply created.

shadowing – a local var often shadows a global var of the same name. This makes sense in the RR case, and I think also in the AA case.

[1] A namespace is probably implemented as a dict, a registry (presumably) mapping variable names to objects.
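
The assignment/reference distinction can be sketched as below. All function and variable names are hypothetical, chosen only for this illustration. The UnboundLocalError case is my addition and is not taken from the book.

```python
x = 'global x'


def assign_only():
    x = 'local x'        # AA: creates a local x without any lookup
    return x


def reference_only():
    return x             # RR: lookup goes L -> G -> B and finds the global


def reference_missing():
    return no_such_name  # RR: not found in L, G or B


def reference_before_assign():
    y = x                # RR on a name that is local here but not yet bound
    x = 'local x'        # this assignment makes x local to the whole function
    return y


def show_namespace():
    a = 1
    return locals()      # the local namespace viewed as a dict


def error_name(func):
    """Return the class name of the NameError raised by func(), or None."""
    try:
        func()
    except NameError as exc:   # UnboundLocalError is a subclass of NameError
        return type(exc).__name__
    return None


shadowed = assign_only()           # the global x is left untouched
looked_up = reference_only()
missing = error_name(reference_missing)
unbound = error_name(reference_before_assign)
namespace = show_namespace()
```

Note that assign_only() shadows the global x without changing it, which is the AA flavour of shadowing.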

Now we are ready to look at the effect of keyword “global”.

RR – the LGB search sequence shrinks to “G” followed by “B”; the L namespace is skipped.
AA – the variable is simply (re)created in the G rather than L namespace.

http://my.safaribooksonline.com/book/programming/python/1565924649/functions/ch04-28758 points out that “Global names needs to be declared only if they are assigned in a function.” If not assigned, then no shadowing concern.
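
A minimal sketch of the “global” keyword in both contexts, meant to be run at module level. The names counter and brand_new are hypothetical.

```python
counter = 0


def assign_without_global():
    counter = 100          # AA: creates a local; the global counter is untouched
    return counter


def assign_with_global():
    global counter
    counter = counter + 1  # both the RR and the AA now use the G namespace
    return counter


def read_only():
    return counter         # RR only: no "global" declaration needed


def create_new_global():
    global brand_new       # hypothetical name that does not exist yet
    brand_new = 'created inside a function'


local_result = assign_without_global()
after_local = counter      # still 0, the local assignment did not leak out
global_result = assign_with_global()
read_result = read_only()
create_new_global()        # brand_new now exists in the G namespace
```

read_only() needs no declaration because it never assigns to counter, which matches the quote above.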

risk premium — clarified

Risk premium (RP) is defined as the Expected (excess) return. An RP value is an “expected next-period excess return” (ENPER) number calculated from current data, using specific factors. An RP model specifies those factors and related parameters.

Many people call these factors “risk factors”. The idea is, any “factor” that generates excess return must entail a risk. If any investor earns that excess return, then she must be (knowingly/unknowingly) assuming that risk. The Fama/French value factor and size factor are the best-known examples.

Given a time series of historical returns, some people simply take the average as the Expected. But I now feel the context must include an evaluation date i.e. date of observation. Any data known prior to that moment can be used to estimate an Expected return over the following period (like 12M). Different people use different models to derive that forward estimate i.e. a prediction. The various estimates create a supply/demand curve for the security. When all the estimates hit a marketplace, price discovery takes place.

Some simple models (like CAPM) assume a time-invariant, steady-state/equilibrium expected return. CAPM basically assumes that each year, there’s a big noisegen that influences the return of each security. This single noisegen generates the return of the broad “market”, and every security is “correlated” with it, as measured by its beta. Each individual security’s return also has its own uncertainty, so a beta of 0.6 doesn’t imply the stock return will be exactly 60% of the market return. Given a historical time series on any security, CAPM simply takes the average return as the unconditional, time-invariant estimate of the steady-state/equilibrium long-term return.
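
A minimal sketch of the beta idea, using the textbook CAPM relation E[R_i] - R_f = beta * (E[R_m] - R_f), which is not spelled out above. The return numbers and the constant riskfree rate are made up purely for illustration, not real market data.

```python
# Hypothetical yearly returns; not real market data.
market = [0.10, -0.05, 0.20, 0.03]
stock = [0.08, -0.01, 0.13, 0.04]
riskfree = 0.02  # assumed constant risk-free rate


def mean(xs):
    return sum(xs) / len(xs)


def beta(asset, mkt):
    """covariance(asset, market) / variance(market), from historical data."""
    ma, mm = mean(asset), mean(mkt)
    cov = sum((a - ma) * (m - mm) for a, m in zip(asset, mkt))
    var = sum((m - mm) ** 2 for m in mkt)
    return cov / var


b = beta(stock, market)

# Historical averages used as the unconditional, steady-state estimates.
market_premium = mean(market) - riskfree
historical_premium = mean(stock) - riskfree

# Textbook CAPM: E[R_i] - R_f = beta * (E[R_m] - R_f)
capm_premium = b * market_premium

# Year by year, the stock return is not simply beta times the market return.
residuals = [s - b * m for s, m in zip(stock, market)]
```

The non-constant residuals are the security’s own uncertainty: even with a fixed beta, the stock return in a given year is not a fixed fraction of the market return.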

How do we benchmark 2 steady-state factor models? See other blog posts.

Many models (including the dividend-yield model) produce dynamic estimates, using a recent history of some market data to estimate the next-period return. So how do I use this dynamic estimate to guide my investment decisions? See other posts.

Before I invest, my estimate of that return needs to be quite a bit higher than the riskfree return, and this excess return i.e. the “Risk premium” needs to be high enough to compensate for the risk I perceive in this security. Before investing, every investor must feel the extra return is adequate to cover the risk she sees in the security. The only security without “risk premium” is the riskfree bond.

oprofile – phrasebook

root – privilege required to start/stop the daemon, but the query tools don’t need root

dtrace – comparable. I think these two are the most powerful profilers on solaris/linux.

statistical – results can be partially wrong. Example – call graph.

Per-process – profiling is possible. I think default is system-wide.

CPU – counters (hardware counters). Therefore, low impact on running apps, lower than “attachment” profilers.

userland – or kernel : both can be profiled

recompile – not required. Some other profilers (e.g. gprof) require recompiling.

kernel support – must be compiled in.

oprofiled – the daemon. Note there’s no executable named exactly “oprofile”.

[[Optimizing Linux performance]] has detailed usage examples of oprofile. [[linux programmer’s toolbox]] has decent coverage too.