# Month: March 2015

# Fwd: difficult phone screen with Credit Suisse

# Fwd: java threading questions {xr

# Fwd: vague question { xr – production troubleshoot

# Fwd: MS IV 20141211

# Fwd: Bloomberg interview questions 2/13/2015 {XR

String getRegularExpression(String str1, String str2);

Base on input 2 strings, return a regex String that can cover 2 input strings.

e.g “Bloomberg”, “BloomingDale” should output “Bloom.*g.*”

I proposed a solution:

step 1: find out common segments of the 2 strings

step 2: insert “.*” in between the common segments to form a string

He asked: how to find the common segments?

also he let me think of below examples, but I really couldn’t find any

clue from his example.

“Bloomberg”, “BloomingDale”

“StarBucksCoffee”, “RedCoffee”

In the end, he said my proposed solution is not technically optimized.

Obviously, he has better solution, but he didn’t give me the hint.

So I still have no idea at all.

Can you think of a solution?

# beware of creating reference to dereferenced pointer

In one sample code (p190 [[c++in24hour]]), we see that once you delete a pointer, the reference to the object becomes a dangling __reference__. More common when multi-threaded.

I guess sometimes we do create a reference to a dereferenced pointer, but we have standard guards to avoid dangling references. What are these guards? Not sure.

Reference param is best practice in c++. What if caller passes an unwrapped pointer into a function’s ref param? How do you ensure the underlying heap/stack thingy doesn’t get deallocated?

– Single-threaded — if the pointee is accessible only on this thread, it can’t possibly “vaporize” beneath my feet. The caller won’t (technically impossible) delete the ptr before the func returns.

– multi-threaded — all hells break lose. We need guards.

There are many similar multi-threading “delete” issues to be solved with ownership control. Only one function on one thread should delete.

# memory leak in GUI event handlers – swing^c#

Here’s a common java swing memory leak as described on stack overflow — Essentially, because you register an object as a listener and forget to unregister, the notifier ends up holding onto the reference of the object, and leaks a bit of memory.

Same problem with c# events. [[effective c#]] refers to it as “runtime coupling despite compile-time loose coupling”. Recall an event

is a pseudo field in a class, and the fields’ type is a delegate, which is a list of 2-pointer constructs. First pointer usually points to a listener object. Imagine a bunch of large listener objects are added to the delegate’s inv list (registered for event), and one of them is no longer needed. You want its memory freed, but the event source object still holds a reference via the delegate instance. See [[effective c#]] item 23. One tip is to unregister in the listener’s Dispose()

I feel the dotnet event/delegate language construct has no exact java counterpart at the language level, but at runtime, swing event

paradigm == c# event paradigm. Fundamentally the same paradigm. Part of the paradigm is that the event source object holds a

collection of subscriber Objects. If the GUI stays up for 22 hours then the subscriber Objects are held “hostage” for that long, and

cannot be garbage collected.

# haunted by negative workplace relations #YH

Hi YH,

I too get haunted by criticism, failed relationships (but not those sexual types;) with past coworkers, or my poor judgement and decisions in past projects.

*I have always enjoyed good relationships with all my team members under the same manager. Everyone is friendly, nice or at least OK with me. Similarly, all my users liked me. Some colleagues and users even treat me as the best guy in my team.*“

**rather**powerful protection when I feel “haunted”. This view may be naive but it’s an example of a healthy self-image IMO.

*Those managers I had problem with is an unpopular guy. Each has lost more than 1 team member due to the harsh mistreatment he gives. Each is hated by some team members and not respected by many. Basically, each is a difficult boss, but I actually managed to survive and adapt to them for longer than many fellow sufferers.*“

# value-creation sectors – turn tech2highest profit

Notes on Raymond Teo discussion: west coast shops … creating big value through high user base, high data, …..innovation is high risk — “risk premium”. In contrast, Singapore IT projects have a fixed “value” i.e. the project budget. ($1m is quite big.) Directly impacts IT salary.

Those “upstream” sectors often needs a smaller number of elite techies.The “upstream” don’t really exist in Singapore. Therefore, only a small elite of techies in Singapore would end up in the financial IT domain. The vast majority of techies live in a totally different world and earn S$60k (up to $100k) a year as non-managers.

As I stated elsewhere, an mediocre techie easily earn a Wall St salary on Wall St, but the same guy would earn $60k in Singapore.

—–

Database/OperatingSys vendors enjoy highest margins (in silicon Valley) but no all Oracle clients get the same ROI buying oracle. Let’s compare the major client-industries that are using technologies. In which sectors does technology create the most profit? Note consumers are the largest user population of technology, but consumers don’t make profit and are not an “industry”, and (crucially) won’t pay us a salary. Our focus here is the major commercial sectors/industries using technology to make profit.

Telecom equipment makers make heavy use of technology and have good margin. Like the DB/OS creators, a small number of elite engineers create industry standards…. Anyway, here are a few sectors.

#1a) HFT or algo trading, either sell side or buy side

1a) other trading engines

2) risk engine, but is the calc really useful, relevant and accurate to be reliable?

2) settlement system can be just as critical and valuable to an i-bank

1b) search engine?

1b) facebook, linkedin, Amazon …

1c) DB/OS/language, as described above, is not a client industry, but DB/OS profit margin is higher than all other domains.

?) Atlassian

4) Telecom equipment maker, as described above

5) commercial banking, corporate/retail banking, insurance. Lower margin than trading, but higher than non-financial service industries. ecommerce like amazon, ebay…? I feel thousands of online services offer useful services but have low margin. B2B better than B2C.

#eor

telecom operators? I know a lot of operators (US or S’pore) don’t pay very high, partly because technology is at equipment makers. health care? logistics? no no no

# using dB_t vs B_t in a formula

label – stoch

(dW is a common synonym of dB)

Whenever I see dB, I used to see it as a differential of “B”. But it’s undefined – no such thing as a differential for a Brownian motion!

Actually any dB equation is an integral equation involving a “stochastic integral” which has a non-trivial definition. Greg Lawler spent a lot of time explaining the meaning of a stochastic integral.

I seldom see B itself, rather than dB, used in a formula describing a random process. The one big exception is the exact formula describing a GBM process.

<!–[if gte msEquation 12]>*S**t**=**S**0*exp*t**m-**s**2**2**+**B**t** s*<![endif]–>

Greg confirmed that the regular BM can also be described in B rather than dB:

given <!–[if gte msEquation 12]>*d**X**t**=m dt+s d**B**t*<![endif]–>

<!–[if gte msEquation 12]>*X**t**=**X**0**+m t+s **B**t*<![endif]–>

This signal-noise formula (see other post) basically says random walk position X at time t is a non-random, predictable position with an chance element superimposed. The chance element is a random variable ~ N(mean 0, variance s^{2}t).

This is actually the most precise description of the random process X(t).

We can also see this as a generalized Brownian motion, expressed using a Standard Brownian motion. Specifically, it adds drift rate and variance parameter to the SBM.

# martingale – learning notes#non-trivial

_{1}(X, Y…, t)dB

_{1}+ S

_{2}(X, Y…, t)dB

_{2 }+…

^{3}is NOT a martingale even though by symmetry E[B^3] = 0. (Local) Martingale requires

# BM ^ GBM with or +! drift – quick summary

B1) BM with dB and drift = 0? the Standard BM, simple but not useless. A cornerstone building block of more complex stoch processes.

G1) GBM with dB and drift = 0? The simplest, purest GBM. The tilting function used in Girsanov theorem. See my blog “GBM + zero drift”

G2) GBM with drift without dB? a deterministic exponential growth path. Example – bank account. Used in BS and pricing.

B2) BM with drift without dB? a linear, non-random walker, not a real BM, useless in stoch.

B3) Now, how do we deal with a BM with both drift and dB? use G1 construct as “tilting function” to effect a change of measure. In a nutshell,

B3 –(G1)–> B1

G3) GBM with drift + dB? The most common stock price model.

# from BM’s to GBM’s drift rate – eroded by .5 sigma^2

_{t}

_{t}– X

_{0 }= m*t + s*B

_{t}

_{t}” has a non-trivial meaning. It is not same as adding two numbers or adding two variables, but rather a signal-noise formula… It describes a Process, with a non-random, deterministic part, and a random part whose variance at time t is equal to (s

^{2}t)

_{t }…….. [2]

_{0}exp[ (m- ½ s

^{2})t + s B

_{t}] ………. [3]

_{t}” expression has a non-trivial meaning. It describes a Process, whose log value has a deterministic component, and a random component whose variance is (s

^{2}t).

^{2}) isn’t the drift of GBM process G(t), because left hand side is “dG / G” rather than dG itself.

^{2}) is a drift rate in the “log” process L(t) := log G(t). This log process is a BM.

dL = (m – ½ s^{2}) dt + s dB_{t} …… [4]

^{2}).

^{2 }] Y(t) dt

^{2}) dt, not “…L(t) dt”

# BM: y Bt^3 isn’t a martingale

^{3}].

^{3}, so E[(X+dX)

^{3}] need to equal E[X

^{3}] ….

_{t}

^{3}is symmetric so mean = 0. It’s 50/50 to be positive or negative, which does NOT make it a martingale. I think the paradox is filtration or “last revealed value”.

_{t}

^{3}is symmetric only when predicting at time 0. Indeed, E[B

_{t}

^{3}| F_0] = 0 for any target time t. How about given X(t=2.187) = 4?

# stoch integral – bets on each step of a random walk

Total area is a RV with Expectation = 0.

—

__integrand’s value__(over a brief interval deltaT) as a “bet” on a coin flip, or a bet on a random walk. I find this a rather intuitive, memorable, and simplified description of stoch integral.

^{2}] ? Surely positive. We need the variance rule…

# BM with drift is more visual than a GBM +! drift

Now I think BM with drift is easier to deal with than GBM — with or without drift.

For a BM with drift, the position at a future time T has a familiar normal distribution. The next increment (+ve or –ve), too, has a normal distribution.

The GBM has a asymmetric bell shape. Harder to visualize.

# matlab | index of first +ve value

See the stoch HW8 on Poisson process, where our array has timestamps of each consecutive jump, and we need to find how many jumps by time=2 minutes.

# product of 2 stoch processes driven by same BM

_{t}:= dB

_{t}

_{}

_{t}:= 5dB

_{t}

_{0}= Y

_{0}= 0

_{t}

_{ }:=X

_{t}Y

_{t}so dZ = X dY + Y dX + dX dY = (10B

_{t}) dB

_{t}+ 5 dt

# order routing inside an exchange #Nsdq

A major stock exchange’s job spec mentions that the exchange actually runs an internal SOR.

I thought only trading houses maintain SOR. Now I know that when an order (probably a market order) hits a stock exchange NDQ, NDQ may need to route it somewhere else. Not sure if it’s internal or external destination.

My friend Alan said all 13 U.S. exchange receive a live NBBO feed. Based on that, ABC can “forward” the order to the “best” venue.

# change of measure, learning notes

See also — I have a long MSWord doc in my c:\0\b2b

Key — Start at discrete. Haksun’s example on coin flip…. Measure-P assigns 50/50 to head/tail. Measure-Q assigns 60/40 weights. Therefore we can see dQ/dP is a function (denoted M ), which maps each outcome (head/tail) to some amount of probability mass. Total probability mass adds up to 100%.

Requirement — the 2 measures are equivalent i.e. the pdf curve have exactly the same support. So M() is well-defined [1]. In the continuous case, suppose the original measure P is defined over a certain interval, then so is the function M(). However function M isn’t a distribution function like P, because it may not “add to 100%”. (I guess we just need the Q support to be a subset…)

Notation warning — V represents a particular event (like HT*TH), M(V) is a particular number, not a random variable IMO. Usually expectation is computed from probability. Here, however, probability is “defined” with expectation. I think when we view probability as a function, the input V is not a random variable like “X:=how many heads in 10 flips”, but a particular outcome like “X==2”.

Now we can look at the #1 important equation EQ1:

Q(V) := Ep [M(V) 1V] , where Ep () denotes expectation under the original Measure-P.

This equation defines a new probability distro “Q” using a P-expectation.

Now we can look at the #2 equation EQ2, mentioned in both Lawler’s notes and Haksun:

EQ[X] = Ep [ X M(X) ]

Notation warning — X is a random variable (such as how many heads in 10 flips), not a particular event like HT*TH. In this context, M(X) is a derived RV.

Key – here function M is used to “tilt” the original measure P. This tilting is supposed to be intuitive but not for me. The input to M() can be an event or (P141) a number! On P142 of Greg’s notes, M itself is a random variable.

Next look at a continuous distro.

Key – to develop intuition, use binomial approximation, the basis of computer simulation.

Key – in continuous setting, the “outcomes” are entire paths. Think of 1000 paths simulated. Each path gets a probability mass under P and under Q. Some paths get higher prob under Q than under P; the other paths get lower prob under Q.

The magic – with a certain tilting function, a BM with a constant drift rate C will “transform” to a symmetric BM. That magic tilting function happens to be …

I wonder what this magic tilting function looks like as a graph. Greg said it’s the exponential shape as given at end of P146, assuming m is a positive constant like 1.

In simple cases like this, the driftless BM would acquire a Positive drift via a Positive tilt. Intuitively, it’s just _weighted_average_:

* For the coin, physical measure says 50/50, but the new measure *assigns* more weight to head, so weighted average would be tilted up, positively towards heads.

* For the SBM, physical measure says 50/50, but the new measure *assigns” more weight to positive paths, so the new expectation is no longer 0 but positive.

[1] This function M is also described as a RV with Ep [M] = 1. For some outcomes Q assigns higher probability mass than P, and lower for other outcomes. Average out to be equal.

# log normal RV raised to any power is also log normal

^{n}is also a lognormal variable.

# [17]U.S.hunt priorities #400w

To optimize for income, I would leverage on my 1) analytics 2) threading 3) SQL (+ possibly algo) expertise. Importance of income:

- health insurance — will need for kids for sure
- first few gigs might be low rate, given the dry season
- home purchase in 5Y — confirmed requirement. If I want short commute + Chinese community, then price will be high
- might go without income longer, like a few months
- initial set-up cost — 5k-10k, but at this level adjustment on wife’s part will be tough. Even a 20k relief fund is drop in a bucket. Need 100k.
- tail risk — I might come back to SGP sooner (like end of 2017) and take a low-pay job
- tail risk — if GC process goes astray, I may go back to SG without a GC, so the X years I spent here I need to earn enough.

May not have much of a choice, given the slow c2c market right now.

My 2017 priorities:

- avoid churn? so avoid web, javascript, in-memory DB,
~~spring/hibernate~~and most of the new technologies I hear about the west coast - muscle-building contexx? so avoid java/sql? Better to deepen my c++ as a 2nd front. Willing to take lower salary? LG2 can postpone this to 2018/2019. See post on re-enter c++
- my c++ learning is reaching a critical mass
- —- unsorted lower priorities —-
- income to keep me feeling secure and self-respecting. I got this from the offers!
- flexibility to work from Singapore. Can spend more time with wife, parents and kids.
- location for upcoming home-purchase
- unlock new markets — west coast, c++, data science ..
- commute?
- chance to impress manager
- leisure time to exercise, improve c++, call home, learn driving etc

My ideal (yet realistic) 1st project:

- slightly below (like 80%) my capacity. Chance to impress manager due to my specialty knowledge. “These managers could make things happen”.
- has spare capacity to check out the potential homes
- salary? LG2 like $65/hr
- c++ or java
- possibly west coast but the high rate is usually for FTE.

Alternatively, a temp contract to try c++ again, but only after I clear some high-end java interviews to build my confidence in earning capacity.

- low salary like 120k
- hands-on c++ (not C) on a large codebase to build mileage
- NY or west coast or anywhere else