Useful tips – series summation quiz

Facts to internalize, so that you can recognize them when they show up in the middle of a puzzle.

Fact A:


Fact A2:

$\frac{1}{5} + \frac{1}{5^2} + \frac{1}{5^3} + \cdots = \frac{1}{4}$

Fact C: (I call this a geometric-arithmetic series)

$\frac{1}{5} + \frac{2}{5^2} + \frac{3}{5^3} + \cdots = \frac{5}{4 \cdot 4} = \frac{5}{16}$

Fact C2: (similar)

$\frac{1}{5} + \frac{4}{5^2} + \frac{9}{5^3} + \frac{16}{5^4} + \cdots$ is also tractable.
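A quick numerical check may help internalize Facts A2, C and C2. The sketch below (function names are my own) sums the first 60 terms of n^k / 5^n for k = 0, 1, 2 and compares them against the standard closed forms r/(1-r), r/(1-r)^2 and r(1+r)/(1-r)^3 with r = 1/5. Those forms give 1/4, 5/16 and 15/32.

```cpp
#include <cassert>
#include <cmath>

// Partial sum of n^k / 5^n for n = 1..N (k = 0, 1, 2 gives Facts A2, C, C2).
double partial_sum(int k, int N) {
    assert(k >= 0 && N >= 1);
    double sum = 0.0;
    double pow5 = 1.0;
    for (int n = 1; n <= N; ++n) {
        pow5 *= 5.0;  // pow5 == 5^n
        sum += std::pow(n, k) / pow5;
    }
    return sum;
}

// Closed forms for sum_{n>=1} n^k r^n, valid for |r| < 1.
double closed_form(int k, double r) {
    assert(k >= 0 && k <= 2);
    if (k == 0) return r / (1 - r);                          // geometric series
    if (k == 1) return r / ((1 - r) * (1 - r));              // geometric-arithmetic series
    return r * (1 + r) / ((1 - r) * (1 - r) * (1 - r));      // squared weights
}
```

Usage: partial_sum(2, 60) should agree with closed_form(2, 0.2) = 15/32 up to floating-point rounding.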

OLS ^ AutoRegressive models

Given some observed data Y, you first pick some explanatory variables X_1, X_2 etc.

If you pick a linear model to explain the observed Y, then under the Gauss-Markov assumptions OLS is the best linear unbiased estimator (BLUE), and it is easy to compute. It gives you all the parameters of your linear model: b_0, b_1, b_2 etc.

If you feel the relationship isn’t linear in X, you can still use OLS, e.g. after transforming the X variables, since OLS only needs the model to be linear in the parameters. As an alternative to a plain linear model, you could use an AR(1) model, which explains Y using its own previous value Y_{t-1}, optionally alongside X1, X2 etc. You use AR models when you believe there’s strong serial correlation (autocorrelation).

I believe AR models use additional parameters besides b1, b2 etc., such as the autoregressive coefficient. When the errors are serially correlated, modelling that correlation explicitly gives statistically more efficient estimates than plain OLS.
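To make the AR(1) idea concrete, here is a minimal sketch that assumes the simplest model y_t = c + phi * y_{t-1} + e_t with no X variables. It estimates c and phi by ordinary least squares on the lagged pairs (y_{t-1}, y_t). The struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct AR1Fit {
    double c;    // intercept
    double phi;  // autoregressive coefficient on y_{t-1}
};

// Fits y_t = c + phi * y_{t-1} + e_t by OLS, i.e. by regressing y_t on y_{t-1}.
AR1Fit fit_ar1(const std::vector<double>& y) {
    assert(y.size() >= 3);
    const double n = static_cast<double>(y.size() - 1);  // number of (y_{t-1}, y_t) pairs
    double mx = 0.0, my = 0.0;
    for (std::size_t t = 1; t < y.size(); ++t) {
        mx += y[t - 1];
        my += y[t];
    }
    mx /= n;
    my /= n;
    double sxy = 0.0, sxx = 0.0;
    for (std::size_t t = 1; t < y.size(); ++t) {
        sxy += (y[t - 1] - mx) * (y[t] - my);
        sxx += (y[t - 1] - mx) * (y[t - 1] - mx);
    }
    const double phi = sxy / sxx;
    return {my - phi * mx, phi};
}
```

On a noiseless series generated by y_t = 2 + 0.5 * y_{t-1}, the fit recovers c = 2 and phi = 0.5. With real data the noise e_t makes both estimates uncertain.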

asset^liability on a bank’s bal sheet

When a bank issues a financial statement, the meanings of AS (asset) and LI (liability) tend to confuse me.

Suppose JPMC bank has client IBM…

Liability – Deposits (incl. CD) at the bank
Liability – overnight borrowing. This interest rate could surge like in 2008.
Liability – Commercial paper issued by the bank
Liability – Bonds issued by the bank
Asset – securities owned by the bank (treasury department?), including stocks, govt bonds and corp bonds etc. Securities can lose value, just like bad loans!
Asset – Loans to corporations like IBM — on the balance sheet treated like a govt bond!
Asset – Loans/mtg to retail — on the balance sheet treated like a govt bond!
Asset – spare cash

AS = LI + shareholders’ equity

If the bank issues 600M shares in an IPO (say at $1 each), the $600M collected is considered shareholders’ equity capital, or simply “capital” or “equity”.

Chronologically, the balance sheet starts with the initial shareholders’ equity. Then deposits come in and sit there as spare cash (an asset), matched by the deposit liability. Similarly, the bank can issue bonds.
Then the bank could use the spare cash to buy securities — without change on the LI side.
The bank can also use the spare cash to give loans — without change on the LI side.
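The chronology above fits in a toy model. This hypothetical sketch records amounts in $M as whole numbers and uses book values only, with no mark-to-market. It shows that each transaction keeps AS = LI + equity. Buying securities or making loans only swaps one asset for another.

```cpp
#include <cassert>

// Hypothetical toy balance sheet in $M, book values only.
struct BalanceSheet {
    // AS side
    long long cash = 0, securities = 0, loans = 0;
    // LI side
    long long deposits = 0, bonds = 0;
    // shareholders' equity
    long long equity = 0;

    long long assets() const { return cash + securities + loans; }
    long long liabilities() const { return deposits + bonds; }
    bool balanced() const { return assets() == liabilities() + equity; }

    // Funding transactions add cash on the AS side plus a matching LI or equity entry.
    void issue_shares(long long amt) { cash += amt; equity += amt; }
    void take_deposit(long long amt) { cash += amt; deposits += amt; }
    void issue_bonds(long long amt) { cash += amt; bonds += amt; }

    // Spending spare cash swaps one asset for another; the LI side is unchanged.
    void buy_securities(long long amt) { assert(amt <= cash); cash -= amt; securities += amt; }
    void make_loan(long long amt) { assert(amt <= cash); cash -= amt; loans += amt; }
};
```

Usage: issue 600 of shares, take 1000 of deposits, issue 200 of bonds, then buy 500 of securities and lend 900. Assets end at 1800, which equals LI (1200) + equity (600). Equity stays at 600 whatever the share price does.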

Each type of transaction above affects the balance sheet only in a “realized” sense, i.e. at book values.

Big warning – all the AS numbers and LI numbers and equity values are book values.
* Latest share price doesn’t enter the equation. Those 600M shares will always be recorded as worth $600M on the balance sheet.
* market value (m2m) of the loans lent out doesn’t matter
* market value (m2m) of the securities owned by the bank doesn’t matter.

Fair value accounting tries to change that. Mark-to-market is a big effort at GS and many other investment banks.

finance calc ^ accounting calc

I'm trying to understand the relation between the finance perspective and the accounting perspective.

Data in finance often fall into 2 categories: accounting data and financial data. For example, I believe live market data (“financial data”) is largely irrelevant to accountants when they update the accounting records.

Accounting calc is strictly by the rule, a bit mechanical; finance calc is flexible and subject to interpretation.

When we hear of accounting, we think of the external (and internal) accounting firms, and of financial reporting (yes, that's the accountant's job). I feel financial reporting has legal implications. Investors and regulators demand accurate calculations, so accounting rules are like laws. Breaking these rules is like breaking the law: falsifying records and cheating the tax authorities, regulators, retail investors and institutional investors.

In my mind, “Finance” as a profession and discipline is about … valuation of corporate and other securities, ultimately for transactions. For example, an investment is a transaction: buying a security.

Finance is at a somewhat higher level than accounting?

Stoch Lesson 59 meaning of q[=] in a simple SDE

See Lesson 55 about details on deltaW and dW
See Lesson 19 about N@T
See Lesson 33 for a backgrounder on the canonical Wiener variable W

The Hull definition of the canonical Wiener process (Lesson 33) —

deltaW = epsilon * sqrt(deltaT) // in discrete time
dW      = epsilon * sqrt(dT) // in continuous time

The “=” has a different meaning than in algebra.

Discrete time is simpler to understand. Recall deltaW is a stepsize of a random variable. The “=” doesn’t mean a step size value of 0.012 is equal to the product of an epsilon value and sqrt(deltaT).

The “=” means equivalent-to.

Here epsilon represents … (hold your breath)… a noisegen, in fact the canonical Gaussian noisegen.

I’d say both deltaW and epsilon are N@T. These are not regular variables.

Stoch Lesson 33 canonical Wiener variable ^ Gaussian variable

See Lesson 05 for the backgrounder on Level, Stepsize, time-varying random variable…
See Lesson 15 about TVRV
See Lesson 19 about N@T

In many formulas in this blog (and probably in the literature), W denotes not just some Wiener variable, but THE canonical TVRV random variable following a Wiener process a.k.a BM. Before we proceed it’s good (perhaps necessary) to pick a concrete unit of time. Say 1 sec. Now I am ready to pin down THE canonical Wiener variable W in discrete-time —

   Over any time interval h seconds, the positive or negative increment in W’s Level is generated from a Gaussian noisegen, with mean 0 and variance equal to h. This makes W THE canonical Wiener variable. [1]

Special case: if the interval runs from the last observation, when the Level is 0, to 55 sec later, then deltaW = W(t=55) - 0 = W(t=55), and therefore W@55sec, as a N@T, also has a Gaussian distribution with variance = 55.
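A quick Monte Carlo sketch can check this special case. It builds W@55sec from 55 one-second Gaussian increments, each with mean 0 and variance 1 as defined above, then takes the sample variance across many paths. The function names, path count and seed are arbitrary illustrative choices.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Level of W after `seconds` one-second steps, starting from Level 0.
// Each 1-sec increment comes from a Gaussian noisegen with mean 0 and variance 1.
double wiener_level(int seconds, std::mt19937& rng) {
    std::normal_distribution<double> step(0.0, 1.0);
    double w = 0.0;
    for (int s = 0; s < seconds; ++s) w += step(rng);
    return w;
}

// Sample variance (n-1 denominator) of W(seconds) across independent paths.
double sample_variance_of_level(int seconds, int paths, unsigned seed) {
    assert(paths > 1);
    std::mt19937 rng(seed);
    std::vector<double> levels(paths);
    double mean = 0.0;
    for (double& w : levels) {
        w = wiener_level(seconds, rng);
        mean += w;
    }
    mean /= paths;
    double ss = 0.0;
    for (double w : levels) ss += (w - mean) * (w - mean);
    return ss / (paths - 1);
}
```

With 20000 paths the result should land close to 55. The exact number depends on the seed and the standard library's normal generator.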

[1] I think this is the discrete version of Standard Brownian Motion or SBM, defined by Lawler on P42 with 2+1 defining properties — 1) iid random increments 2) no-jump, which __implies__ 3) Gaussian random increments

Now let’s look at the standard normal distro, or canonical Gaussian distro, or Gaussian noisegen. If some variable epsilon follows a canonical Gaussian distribution, it’s often a N@T, which is not a time-varying random variable. Its mean is 0, and its variance and stdev are both 1.0.

I believe the canonical Wiener variable can be expressed in terms of the canonical Gaussian variable —

  deltaW = epsilon * sqrt(deltaT)  //  in discrete time
  dW = epsilon * sqrt(dT)            //  in continuous time

Let’s be concrete and suppose deltaT is 0.3 yoctosecond (briefer than any price movement). In English, this says “over a brief 0.3 yoctosecond, step_size is generated from a Gaussian noisegen with variance equal to 0.3 * 10^-24”. If we simulate this step 9999 times, we get 9999 deltaW (step_size) realizations. These realizations follow a bell-shaped histogram.
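The 9999-realization experiment can be simulated directly. This minimal sketch assumes std::normal_distribution as the canonical Gaussian noisegen epsilon. The sample variance of the deltaW realizations should come out close to deltaT, here 0.3 * 10^-24.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

// n realizations of deltaW = epsilon * sqrt(deltaT), with epsilon ~ N(0, 1).
std::vector<double> simulate_delta_w(int n, double delta_t, unsigned seed) {
    assert(n > 1 && delta_t > 0.0);
    std::mt19937 rng(seed);
    std::normal_distribution<double> epsilon(0.0, 1.0);  // canonical Gaussian noisegen
    const double scale = std::sqrt(delta_t);
    std::vector<double> out(n);
    for (double& dw : out) dw = epsilon(rng) * scale;
    return out;
}

// Sample variance with an (n-1) denominator.
double sample_variance(const std::vector<double>& xs) {
    double mean = 0.0;
    for (double x : xs) mean += x;
    mean /= xs.size();
    double ss = 0.0;
    for (double x : xs) ss += (x - mean) * (x - mean);
    return ss / (xs.size() - 1);
}
```

Dividing sample_variance(simulate_delta_w(9999, 0.3e-24, seed)) by 0.3e-24 should give a ratio near 1. Binning the realizations would show the bell shape.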

Given dW can be expressed this way, many authors including Hull use it all the time.

Both the canonical Wiener variable and the canonical Gaussian variable have their own symbols: W vs epsilon (ϵ), or sometimes Z. They show up frequently in formulas. Don’t confuse them.

The Wiener var is always a TVRV; the Gaussian var is often a N@T.

quant trend – more stats, less theory or complex model

I see increasing evidence (though anecdotal) that real world quant roles now require less math theory, and more programming.

A Barcap friend told me many (mostly buy-side) pricing systems basically take market quotes as the most reliable price, rather than computing a theoretical price from a complicated model. Why hire so many math wizards to build a model when its output is … questionable in the face of live market prices? You can tune your model (based on market data!) to output a price, but I feel it has to be consistent with market prices. When the two become inconsistent, either your model is imperfect or it has “discovered” a mispriced security or some rare trading opportunity like a stat arb. Well, I feel the chance of a real “discovery” is higher the simpler your model is. Further, how do you know this “opportunity” (if real) can’t be discovered by a simple analysis without a model? If you are after such opportunities, then the faster you process market data, the earlier you catch the opportunity. That means simpler math. A complicated model is also harder to program correctly, i.e. more bugs.

A Danske quant shared how important programming is to a quant. Ultimately, quants are hired to help traders make decisions. Every trader loves usable soft market data they can play with. Whatever great idea you have, you’ve got to put it into a computer; otherwise no one will use it.

My young quant friends in Barcap and MS shared that in the first few years as a quant, programming is probably the most important skill.

For a buy-side quant (usually in eq and FX), stats know-how is probably more relevant than stoch or volatility. I think a high frequency shop won’t trade a lot of options, since liquidity is much lower. I guess many buy-sides do trade options, largely for hedging or to earn the premium.

On the other hand, there’s still a demand for deep theoretical knowledge. I feel the math jargon is still an entry requirement for any quant role. Otherwise you don’t know what people are talking about. These jargon terms require a hell of a lot of background knowledge, probably taking a few years to acquire. Even the basic BS model can easily throw a curve ball. I bet you can’t catch it unless you’ve done a few months of homework.

Now a much bigger curve ball: interest rate derivatives are the most math-intensive. (Danske is actually an IRD sell-side.) I was told credit derivatives add further complexity, but I’d guess the bulk of the math complexity is in IR stoch vol. There are even more complicated products (MBS?) out there, but the total profit in such a market must be big enough to justify hiring quants. The structured derivatives market probably qualifies as such a market.

Structured derivatives (aka exotics) are more math-intensive than vanilla derivatives. The vendor (sell-side) must price them carefully to protect himself. Overpriced, no client wants the product and it’s a waste of the vendor’s effort [1]. Underpriced, the vendor himself is under-protected. Therefore pricing requires deep math: risk, modelling, embedded optionality, back testing… This is a prime example of high-touch (relationship-based) trading. Unfortunately, I feel technology (and other market factors) is driving things in the other direction: low-touch (i.e. automated), high-volume flow trading. Just one simple example: in recent years I see more FX option products offered electronically. A high-touch product going low-touch?

[1] I was a client for some of these simple structured products. I found the price too high. I could see the vendor spend time selling it and finally withdraw it. Such unattractive, hard-to-sell products are common in the structured product marketplace: lower revenue, higher cost, lower margin for the vendor, and lower salaries.

distribution – a bridge from probability to statistics@@

I feel distribution is about a noisegen. There's always some natural source of randomness —

– people making choices

– coin flip

– height of people

All of these could be simulated and then characterized by some infinitely sophisticated computer “noisegen”. We can sample each noisegen 1000 times and plot a histogram. If we sample an infinite number of times, we get a pdf curve like

* uniform distribution

* binomial distribution

* normal distribution
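Here is a minimal sketch of the sample-then-histogram idea. It assumes a uniform noisegen on [0,10) and 10 equal-width bins, both arbitrary choices. With 1000 samples each bin should hold roughly 100. With more samples the bar heights settle toward the flat uniform pdf.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Counts samples falling into `bins` equal-width bins over [lo, hi).
std::vector<int> histogram(const std::vector<double>& xs, double lo, double hi, int bins) {
    assert(bins > 0 && hi > lo);
    std::vector<int> counts(bins, 0);
    const double width = (hi - lo) / bins;
    for (double x : xs) {
        if (x < lo || x >= hi) continue;         // ignore out-of-range samples
        int b = static_cast<int>((x - lo) / width);
        if (b >= bins) b = bins - 1;             // guard against rounding at the top edge
        ++counts[b];
    }
    return counts;
}

// Samples a uniform noisegen on [0, 10) n times.
std::vector<double> sample_uniform(int n, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> noisegen(0.0, 10.0);
    std::vector<double> xs(n);
    for (double& x : xs) x = noisegen(rng);
    return xs;
}
```

Swapping in std::normal_distribution or std::binomial_distribution for the noisegen gives the other bell or bar shapes listed above.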

The natural distributions may not follow any mathematically well-known distribution. If you analyze some astronomical occurrence, perhaps there's no math formula to describe it. In fact, even some familiar thick-tailed distributions may not have a closed-form pdf.

Nevertheless, the probability distribution is arguably the most “needed” foundation of statistics. Note a prob dist is about the Next noisegen output. (I don't prefer “future”: when Galileo dropped his 2 cannonballs, no one knew for sure which one would land first, even though it was in the past.) Every noisegen is presumed consistent, though its internal parameters may change over time.

I feel probability study is about theoretical models of the distribution; statistics is about picking/adjusting these models to fit observed data. Here's a good analogy: in device physics and electronic circuits, everyone uses fundamental circuit models. Real devices always show deviation from the models, but the deviations are small and well understood.

In probability theories, the noisegen is perfect, consistent, stable and “predictable”. In statistics we don't know how many noisegens are at play, which well-known noisegen is the closest, or how the noisegen evolves over time.

I feel probability theories build theoretical noisegen models largely to help the statisticians.

"Independence" in probability ^ statistics

I feel probability and statistics have different interpretations of Ind, which affects our intuition —

– independence in probability is theoretical. The determination of ind is based on idealized models and rather few fundamental axioms. You prove independence like something in geometry. Black or white.

– independence in statistics is like shades of grey, to be measured. Whenever there’s human behavior or biological/evolution diversification, the independence between a person’s blood type, birthday, income, #kids, education, lifespan .. are never theoretically provable. Until proven otherwise, we must assume these are all dependent. More commonly, we say these “random variables” (if measurable) are likely correlated to some extent.

* ind in probability problems is pure math. Lots of brain teasers and interview questions.
* ind in stats is often related to human behavior. Rare to see obvious and absolute independence

For Independence In Probability,
1) definition is something like Pr (1<X<5 | 2<Y<3) = Pr (1<X<5) so the Y values are irrelevant.

2) an equivalent definition of independence is the “product definition” — something like P(1<X<5 AND 2<Y<3) = product of the 2 prob. We require this to be true for any 2 "ranges" of X and of Y. I find this definition better-looking but less intuitive.

You could view these definitions as  a proposition if you already have a vague notion of independence. This is a proposition about the entire population not a sample. If you collect some samples, you may actually see deviation from the proposition!?

Actually, my intuition of independence often feels unsure. I now feel those precise definitions above are more clear, concise, provable, and mathematically usable. In some cases they challenge our intuition of independence.

An example in statistics: if SPX has risen for 3 days in a row, does it have to do with the EUR/JPY movement?

E(X*Y) = E(X)E(Y) if X and Y are independent. Is this also an alternative definition of independence? No: the product rule for expectations only implies zero correlation, which is weaker than independence.
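A standard counterexample, sketched below: X is uniform on {-1, 0, 1} and Y = X^2. Then E(XY) = E(X)E(Y) = 0, yet Y is completely determined by X. The product definition also fails for events: P(X=0 AND Y=0) = 1/3, but P(X=0)P(Y=0) = 1/9. The helper names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <functional>

// Toy distribution: X uniform on {-1, 0, 1}, and Y = X * X.
const std::array<int, 3> kXs = {-1, 0, 1};

// E[g(X, Y)], each X value having probability 1/3.
double expect(const std::function<double(int, int)>& g) {
    double total = 0.0;
    for (int x : kXs) total += g(x, x * x) / 3.0;
    return total;
}

bool nearly_equal(double a, double b) { return std::fabs(a - b) < 1e-12; }
```

Probabilities are just expectations of indicator functions, so the same expect() covers both the E(X*Y) check and the event check.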

I feel most simple examples of independence are the probability kind — “obviously independent” by common sense. It’s not easy to establish using statistics that some X and Y are independent. You can’t really collect data to deduce independence, since the calculated correlation will  likely be nonzero.

Simple example?

mean reversion – vol^pair^underlier price

I feel implied vol shows more mean reversion than other “assets” (pretending eq vol is an asset class). In fact Wall Street’s biggest eq-vol house has a specific definition for HISTORICAL vol mean-reversion — “daily HISTORICAL vol exceeding weekly HISTORICAL vol over the same sampling period“. In other words “vol of daily Returns exceeding vol of weekly Returns, over the same sampling period”. I think in the previous sentence “vol” means stdev.

This pattern is seen frequently. To trade this pattern, buy var swap, long daily vol and short weekly vol… (but is it h-vol or i-vol??) I am not sure if retail investors could do this though.
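Here is a minimal sketch of the daily-vs-weekly comparison on historical prices. It assumes log returns, sample stdev, and annualization by sqrt(252) for daily returns and sqrt(52) for 5-day weekly returns. Conventions vary, and this says nothing about implied vol. A series that keeps snapping back to the same level every 5 days shows positive daily vol but zero weekly vol.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sample stdev (n-1 denominator) of log returns taken every `step` observations.
double log_return_stdev(const std::vector<double>& prices, std::size_t step) {
    assert(step >= 1 && prices.size() > 2 * step);  // guarantees at least 2 returns
    std::vector<double> r;
    for (std::size_t i = step; i < prices.size(); i += step)
        r.push_back(std::log(prices[i] / prices[i - step]));
    double mean = 0.0;
    for (double x : r) mean += x;
    mean /= r.size();
    double ss = 0.0;
    for (double x : r) ss += (x - mean) * (x - mean);
    return std::sqrt(ss / (r.size() - 1));
}

// Annualized, assuming 252 trading days and 52 five-day weeks per year.
double annualized_daily_vol(const std::vector<double>& p) { return log_return_stdev(p, 1) * std::sqrt(252.0); }
double annualized_weekly_vol(const std::vector<double>& p) { return log_return_stdev(p, 5) * std::sqrt(52.0); }
```

Under the house definition quoted above, annualized_daily_vol(p) > annualized_weekly_vol(p) over the same sampling period would be read as mean reversion.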

In contrast, stocks, stock indices, commodities and FX can trend up (no long-term mean reversion). Fundamental reason? Inflation? Economic growth?

The (simplistic) argument that “a price can’t keep falling” is unconvincing. Both IBM stock and a 2-yr IBM option can rise and fall. However, IBM could show a strong trend over 12 months during which it mostly climbs, so a trader betting big on mean reversion may lose massively. The option can have a run, but probably not a very long one. I feel volatility can’t have long-term trends.

A practitioner (Dan) said mean reversion is the basis of pair trading. I guess MR is fairly consistent in the price difference between relative value pairs.

Interest rate? I feel for years IR can trend up, or stay low. I guess the mean reversion strategies won’t apply?

I feel mean reversion works best under free market conditions. The more “manipulated”, the more concentration-of-influence, the less mean reversion at least over the short term. Over long term? No comments.

concrete illustration – variance of OLS estimators

Now I feel b is a sample estimate of the population beta (a parameter in our explanatory linear “model” of Y), but we need to know how close that b is to beta. If our b turns out to be 8.85, then beta could be 9, or 90. That’s why we work out, and reassure ourselves, that (under certain assumptions) b has a normal distribution around beta, with variance var(b|X).

I just made up a concrete but fake illustration, that I will share with my friends. See if I got the big picture right.

Say we have a single sample of 1000 data points about some fake index: Y = SPX1 prices over 1000 days; X = 3M Libor 11am rates on the same days. We throw the 2000 numbers into any OLS and get b1 = -8.85 (also some b0 value). Without checking heteroscedasticity and serial correlation, we may see var(b1) = 0.09 (sigma = 0.3), so we are 95% confident that the population beta1 is within 2 sigmas of -8.85, i.e. between -8.25 and -9.45. Our -8.85 seems usable: when the rate climbs 1 basis point, SPX1 is likely to drop 8.85 points or thereabouts.

However, after accounting for heteroscedasticity (but not serial corr), var(b1) balloons to 9.012 (sigma about 3.0), so now we are 95% confident that the true population beta1 is within 2 sigmas of -8.85, i.e. between about -2.85 and -14.85. Our OLS estimate (-8.85) for the beta1 parameter is now statistically less useful. When the rate climbs 1 basis point, SPX1 is likely to drop… 3, 5, 10, 13 points. We are much less sure about the population beta1.

After also accounting for serial corr, var(b1) worsens further to 25.103 (sigma about 5.0), so now we are 95% confident that the true beta1 is between about +1.17 and -18.87. When the rate climbs 1 basis point, SPX1 may drop a bit, drop a lot, or even rise, so our -8.85 estimate of beta1 is almost useless. It does help in one way: it predicts that SPX1 is UNlikely to rise 100 points due to a 1 basis point rate change, but we “know” this without OLS.
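The interval arithmetic in this illustration is just b ± 2*sqrt(var(b)). A tiny helper (hypothetical name) reproduces the three cases. With var(b1) = 9.012 and 25.103, the 2-sigma half-widths are about 6.00 and 10.02.

```cpp
#include <cassert>
#include <cmath>
#include <utility>

// Rough 95% interval used in the illustration: b +/- 2 * sqrt(var(b)).
std::pair<double, double> two_sigma_interval(double b, double var_b) {
    assert(var_b >= 0.0);
    const double half_width = 2.0 * std::sqrt(var_b);
    return {b - half_width, b + half_width};
}
```

For example, two_sigma_interval(-8.85, 0.09) gives (-9.45, -8.25).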

Then we realize using this X to explain this Y isn’t enough. SPX1 reacts to other factors more than to the libor rate. So we throw in 10 other explanatory variables and get their values over those 1000 days. Then we hit multicollinearity, since those 11 variables are highly correlated. The (X’X)^-1 becomes very large.

0 probability ^ 0 density, 1st look

Given a simple uniform distribution over [0,10], we get a paradox: Pr(X = 3) = 0. There is a standard explanation, but here’s the way I see it.

Say I have a correctly programmed computer (a “noisegen”). Its output is a floating point number with as much precision as you want, say 99999 decimal places, perhaps using 1TB of memory to represent a single output number. Given this much precision, the chance of getting exactly 3.0 is virtually zero. In the limit, when we forget the computer and use our limitless brain instead, the precision can be infinite, and the chance of getting an exact 3.0 approaches zero. The standard explanation is that when the delta_x region is infinitesimal and becomes dx, f(3.0) dx = 0 even though f(3.0) != 0.

Our f(x) is the rate of growth (the derivative) of the cumulative distribution function F(x). f(3.0)dx = 0 reflects the zero chance of getting exactly 3.0, but it doesn’t mean the density at 3.0 is zero. In fact, due to the continuous nature of this random variable, there’s zero chance of getting exactly 5, or 0.6, or pi, but the pdf values at these points aren’t 0.

What’s the real meaning when we see that the prob density func f() at the point 3.0 is f(3.0) = 0.1? Very loosely, it gives the likelihood of receiving a value around 3.0. For our uniform distribution, f(3.0) = f(2.170) = f(sqrt(2)) = 0.1, a constant.

The right way to use the pdf is Pr(X in [3,4]) = integral over [3,4] of f(x)dx = 0.1. We should never ask the pdf “what’s the probability of hitting this value”, but rather “what’s the prob of hitting this interval”.

The nonsensical Pr(X = 3) is interpreted as “integral over [3,3] of f(x)dx”. Given upper bound = lower bound, this definite integral evaluates to zero.
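Both views can be checked side by side. The analytic side uses the uniform CDF F(x) = x/10, so an interval probability is F(b) - F(a). The empirical side uses a std::uniform_real_distribution noisegen. In 100000 samples, about 10% land in [3,4] and essentially none equal 3.0 exactly. Names and sample sizes are illustrative assumptions.

```cpp
#include <cassert>
#include <random>

// Uniform on [0, 10]: pdf f(x) = 0.1 on the interval, CDF F(x) = x / 10.
double uniform_cdf(double x) {
    if (x <= 0.0) return 0.0;
    if (x >= 10.0) return 1.0;
    return x / 10.0;
}

// Pr(a <= X <= b) = integral of f over [a, b] = F(b) - F(a).
double prob_interval(double a, double b) {
    assert(a <= b);
    return uniform_cdf(b) - uniform_cdf(a);
}

struct Hits {
    int in_3_4;     // samples landing in [3, 4]
    int exactly_3;  // samples equal to exactly 3.0
};

// Samples a uniform noisegen n times and counts both kinds of hits.
Hits sample_hits(int n, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> noisegen(0.0, 10.0);
    Hits h{0, 0};
    for (int i = 0; i < n; ++i) {
        const double x = noisegen(rng);
        if (x >= 3.0 && x <= 4.0) ++h.in_3_4;
        if (x == 3.0) ++h.exactly_3;
    }
    return h;
}
```

prob_interval(3.0, 3.0) is exactly 0, matching the integral over [3,3].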

As a footnote, however powerful, our computer is still unable to generate most irrational numbers. Most of them have no finite “representation” (unlike pi/5, e/3 or sqrt(2)), so I don’t even know how to specify their position on the [0,1] interval. I feel these form-less irrational numbers far outnumber the rational numbers; indeed the rationals are countable and the irrationals are not. They are like the invisible things between 2 rational numbers. Sure, between any 2 rationals you can find another rational, but within the new “gap” there will be countless form-less irrationals… Pr(a number picked uniformly from [0,1] is rational) = 0.

memory leaks in one WPF/Silverlight app

The app has multiple screens (windows). A user can open and close them throughout a login session.

The acid test: when we close a data-heavy window, actual memory usage (not the virtual) should drop. In our case, memory stayed high. In some cases it reached 1G+ and the entire application crashed. We fixed the issue; usage is now capped at 300M.

The leaks and the fixes

* Event registration was found to be the #1 category. In window OnClose, a lot of clean-up was needed: release references, unregister events.

* Dispose() method was added to many classes.
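The event-registration leak has a direct C++ analogy. This is not WPF code. In the hypothetical sketch below, a long-lived event source holds handlers, and a handler that captures a shared_ptr to its window keeps the window (and its data) alive until the handler is unregistered. Unregistering in the close handler is the fix.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical long-lived event source, e.g. an app-wide service that outlives windows.
class EventSource {
public:
    int subscribe(std::function<void()> handler) {
        handlers_[next_id_] = std::move(handler);
        return next_id_++;
    }
    void unsubscribe(int id) { handlers_.erase(id); }
    void raise() {
        for (auto& entry : handlers_) entry.second();
    }
private:
    std::map<int, std::function<void()>> handlers_;
    int next_id_ = 0;
};

// Hypothetical data-heavy window. Its handler captures a shared_ptr to the window,
// so as long as the subscription exists, the event source keeps the window alive.
class Window : public std::enable_shared_from_this<Window> {
public:
    void open(EventSource& src) {
        auto self = shared_from_this();
        subscription_ = src.subscribe([self] { ++self->refreshes_; });
    }
    // The fix: unregister in the close handler so the source drops its reference.
    void on_close(EventSource& src) { src.unsubscribe(subscription_); }
    int refreshes() const { return refreshes_; }
private:
    std::vector<double> big_data_ = std::vector<double>(100000);  // stands in for heavy data
    int refreshes_ = 0;
    int subscription_ = -1;
};
```

A window that goes out of scope without on_close() stays alive and keeps receiving events. A window that unregisters first is freed as soon as its last owner lets go.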

Fwd: what tech skills are in demand OUTside WallSt@@


How is it going? I believe what you wrote. JQuery is purely browser-side but I'm short (i.e. biased against) browser technology. I have chosen to move away from it, so I am seeing all the reasons to support my short position …

For trading systems
* most biz logic is on some kind of server side daemon, i.e. no web server
* the dominant user interface is not browser but a desktop GUI in c#, swing or something older
* there is often some reporting tool. I think it's often but not always browser based.
* In addition, there are some web interfaces that expose in ReadWrite mode the data residing on the server-side.

For risk and backend systems
* I feel the UI tends to be browser-based, and the server side would include a web server.
* the business logic is usually on the server side. Sometimes it's a batch job. Sometimes message driven in real time.

For other financial/banking apps, I'm not sure, but web technology is used more for sure.

How about the GS private wealth systems? Which category? I would put it under the “other financial” category. It has a lot of database-centric software modules, and there's also a lot of online service modules.

Outside finance, I feel web technology is prevalent, mainstream, dominant, often the default choice for a given requirement. Financial technology is a niche. I said the same thing in my 2010 blog. If we want to stay close to the mainstream and be employable across the tech world, then we should invest in web skills.

---- you wrote ----
However, even within the financial sector, there seems to be a growing need for front end technologies. JQuery seems to be very hot.

2 common quality metrics on an OLS estimator b

1) t-score: measures how far our b value (8.81) is from zero, in units of its estimated stdev (standard error). The t-score or t-stat is basically

  b / stdev(b|X)

The t-score is more commonly used than the z-score because the population sigma (of the residuals) is unknown and has to be estimated.

2) R-squared — measures how much of the Y variation is explained by the model (or by the explanatory variable X)


The F-test is also used, but less commonly.
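Here is a minimal sketch of both metrics for the one-regressor case. It assumes classical errors (homoskedastic, no serial correlation) and estimates the residual variance sigma^2 as SSR/(n-2). That estimation step is why the t distribution applies rather than z. Names are hypothetical. On the toy data x = 1..5, y = 2,4,5,4,5 it gives b1 = 0.6, se(b1) = sqrt(0.08) ≈ 0.283, t ≈ 2.12 and R-squared = 0.6.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct OlsFit {
    double b0, b1;      // intercept and slope
    double se_b1;       // estimated stdev of b1, i.e. sqrt(var(b1|X))
    double t_b1;        // t-score: b1 / se_b1
    double r_squared;   // share of Y variation explained by X
};

// Simple (one-regressor) OLS with classical standard errors.
OlsFit simple_ols(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size() && x.size() > 2);
    const double n = static_cast<double>(x.size());
    double mx = 0.0, my = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) { mx += x[i]; my += y[i]; }
    mx /= n;
    my /= n;
    double sxx = 0.0, sxy = 0.0, syy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sxx += (x[i] - mx) * (x[i] - mx);
        sxy += (x[i] - mx) * (y[i] - my);
        syy += (y[i] - my) * (y[i] - my);
    }
    const double b1 = sxy / sxx;
    const double b0 = my - b1 * mx;
    double ssr = 0.0;  // sum of squared residuals
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double e = y[i] - (b0 + b1 * x[i]);
        ssr += e * e;
    }
    const double s2 = ssr / (n - 2);  // estimate of the residual variance sigma^2
    const double se_b1 = std::sqrt(s2 / sxx);
    return {b0, b1, se_b1, b1 / se_b1, 1.0 - ssr / syy};
}
```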

some random comparisons – wpf ^ swing

I'm no expert. Just some observations of a newbie

* jtable vs grid

* dependency property, attached property – fundamental in wpf. No counterpart in swing.

* events (field-like delegate fields) are used throughout wpf, not only for event-handlers on a visual component (like button's mouse-click)

* the containment hierarchy is central to wpf command routing, event routing, property inheritance, and data binding. In swing, ……

* property change listener is another area to compare

* declarative programming via xaml? Many basic functionalities of xaml can be achieved in c# code, which looks like swing code.

* routed event? How does swing handle it?
* commanding? No such feature in swing.
* in both technologies, a visual object comprises many constituents.

OLS var(b) – some notes

The Y data set (eg SPX) could be noisy — high stdev. If you pick the correct X data set (eg temperature, assuming univariate) to explain it, then the 1000 sample residual numbers e1 e2 e3 … e1000 would show less noise. This is intuitive — most of the variations in Y are explained by X.

The sigma^2 on page 53 refers to the noise in the 1000 sample residual values, but I am not sure if this sigma^2 is part of a realistic OLS regression.

The last quarter of the regression review is all about measuring the quality of the OLS estimator. OLS is like an engine or black box. Throw the Y and X data points in, and you get a single b value of 8.81. (If you have another explanatory variable X2, then you get b2.) This 8.81 is an estimate of the population parameter denoted beta. The real beta could be 9.1 or -2 or whatever. To assess our confidence we compute a value for var(b) from the Y/X data points. This var(b) is a quality metric on the OLS estimate.

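var(b|X) = (X'X)^-1 X' SIGMA X (X'X)^-1, so
var(b) depends on (X'X)^-1
var(b) depends on X' SIGMA X
var(b) depends on sigma^2? Only through SIGMA: in the classical case SIGMA = sigma^2 I, and the formula collapses to sigma^2 (X'X)^-1.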
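To see how SIGMA enters var(b), here is a sketch for the slope of a one-regressor model. It compares two estimates. The classical one is sigma^2 (X'X)^-1, which assumes SIGMA = sigma^2 I. White's heteroskedasticity-robust HC0 estimate is (X'X)^-1 X' SIGMA X (X'X)^-1 with SIGMA replaced by diag(e_i^2). For the slope these reduce to s^2/Sxx and sum((x_i - xbar)^2 e_i^2)/Sxx^2. The sketch does not handle serial correlation; Newey-West would be the usual next step. On real data the robust number can come out higher or lower than the classical one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct SlopeVariances {
    double classical;  // s^2 / Sxx, assumes SIGMA = sigma^2 I
    double white_hc0;  // sum((x_i - xbar)^2 * e_i^2) / Sxx^2
};

SlopeVariances slope_variances(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size() && x.size() > 2);
    const double n = static_cast<double>(x.size());
    double mx = 0.0, my = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) { mx += x[i]; my += y[i]; }
    mx /= n;
    my /= n;
    double sxx = 0.0, sxy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sxx += (x[i] - mx) * (x[i] - mx);
        sxy += (x[i] - mx) * (y[i] - my);
    }
    const double b1 = sxy / sxx;
    const double b0 = my - b1 * mx;
    double ssr = 0.0, meat = 0.0;  // meat: the X' SIGMA X part, for the slope
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double e = y[i] - (b0 + b1 * x[i]);
        ssr += e * e;
        meat += (x[i] - mx) * (x[i] - mx) * e * e;
    }
    return {ssr / (n - 2) / sxx, meat / (sxx * sxx)};
}
```

On the toy data x = 1..5, y = 2,4,5,4,5, the classical estimate is 0.08 and HC0 is 0.0344. This only checks the arithmetic; 5 points say nothing real about heteroscedasticity.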

In financial data, hetero and serial corr invariably mess up SIGMA (a 1000x1000 matrix). If we successfully account for these 2 issues, our var(b) becomes more accurate and, typically, much higher. If the variance is too high, like 9 relative to the b value of 8.81, then our computed b value is a poor estimate of beta. The stdev is 3, so the true beta could fall within 8.81 +- 3*2 with 95% confidence.

(X'X)^-1 can become very large due to collinearity.

I asked Mark: suppose I cleverly pick the best explanatory variable X, but the sample residuals (e1 e2 e3 … e1000) still show large noise (white noise, without hetero or serial corr), so my var(b) is still too high. Then I have done a good job; we just need more data. However, in reality, financial data always suffer from hetero and serial corr.

AttachedProperty – clarifying questions

Attached Property is a tricky construct. When confused, a beginner (like me) could ask a few key questions

Q: Which class (not “object”) defines the AProp?
Q: Which object’s (not “class”) hashtable holds the property value?

The holder object is the visual/screen object “decorated” by the property value. Just like Opacity, “Dock.Left” describes some aspect of a visual. The “provider” is a class not an object. The AProp provider class is like a category or namespace, and tells us what category this property falls into. Is it a docking property or a text formatting property?

Any AProp (or any DProp) data can be seen as a key-value pair. In the Dock example, an OKButton’s hashtable holds the value of “Left” under the key DockPanel.DockProperty. Note the property is “specified” or “defined” not by Button class but by the DockPanel class.

After the OKButton saves the “Left” value, the natural question is who reads it? Answer — The container. In this case, the user of this property happens to be an instance of the provider class — DockPanel. In some cases, the user is unrelated to the AProp provider. You can even attach a nonsensical property to a nonsensical object, so the property value is never used.
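A toy C++ model of the key-value picture above may help. It is not WPF's API, just a hypothetical sketch. The provider class (DockPanel) owns the key. The holder object (the button) stores the value in its own table. The container later reads the value back using the same key.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a property identifier; its address serves as the key.
struct PropertyKey {
    const char* name;
};

// Hypothetical stand-in for a visual object: it keeps property values in its own table.
class Visual {
public:
    void set_value(const PropertyKey& key, const std::string& value) { values_[&key] = value; }
    std::string get_value(const PropertyKey& key) const {
        auto it = values_.find(&key);
        return it == values_.end() ? std::string() : it->second;  // empty means "not set"
    }
private:
    std::map<const PropertyKey*, std::string> values_;
};

// The provider class defines the key, even though the value lives on another object.
class DockPanel {
public:
    static const PropertyKey DockProperty;
    static void set_dock(Visual& v, const std::string& side) { v.set_value(DockProperty, side); }
    static std::string get_dock(const Visual& v) { return v.get_value(DockProperty); }
};
const PropertyKey DockPanel::DockProperty{"Dock"};
```

Nothing stops code from setting the Dock key on a Visual that no DockPanel ever reads, which mirrors the "nonsensical property" case above.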

2013 Citadel IV – c#, C++

Q: System.Array.CopyTo() vs Clone()
%%A: Clone() is declared in a supertype ICloneable, whereas CopyTo is only in Array class – confirmed
%%A: Clone() is controversial – confirmed.
%%A: Clone() returns a new object whereas CopyTo requires a target array of the correct size to pre-exist – correct

AA: Array class has an instance method Clone(), but I feel the really important point is the Clone() controversy.

Q4: What does the “…” mean in C++ catch(…)?
%%A: If the troublemaker has no (not empty) exception spec, then “…” means catch-all. Any type of exception can be thrown and will be caught here
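%%A: if the troublemaker has an exception spec of type B, then only B and its subtypes can escape it, so in effect “…” only ever sees those. If another type of exception is thrown, unexpected() is triggered and the catch(…) is never reached. Tested 🙂 (Dynamic exception specs were deprecated in C++11 and removed in C++17.)
A: I think my answer was correct.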
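Here is a minimal sketch of the catch-all case, where the throwing function has no exception spec. Dynamic exception specs such as throw(B) are gone from C++17, so the second scenario is not shown. The function names are hypothetical.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical troublemaker with no exception spec: it may throw anything.
void troublemaker(int kind) {
    if (kind == 0) throw 42;                   // an int
    if (kind == 1) throw std::string("oops");  // a std::string
    throw std::runtime_error("boom");          // a std::exception subtype
}

// catch(...) is a catch-all: it matches an exception of any type.
bool caught_by_ellipsis(int kind) {
    assert(kind >= 0);
    try {
        troublemaker(kind);
    } catch (...) {
        return true;
    }
    return false;  // not reached here, since troublemaker always throws
}
```

Inside catch(...) the exception object has no name. std::current_exception() can capture it if it needs to be rethrown later.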

Q4b: What’s the c# counter part?
%%A: catch(Exception) or an empty catch{/**/} — confirmed

Q3: delegate — what is it and what’s the usage?

Q3b: what’s the c++ equivalent (or closest)?
Q: What design patterns are behind the c# delegate?
A: for multicast …. observer
A: for unicast … command? The delegate instance usually has a host object
Q: how do you manage memory leaks?
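For Q3b, the closest standard C++ analogue I know of is std::function. Below is a hypothetical sketch of a multicast delegate as a list of std::function targets invoked in order, which is the observer flavour. A unicast delegate bound to a host object is just a single std::function wrapping a lambda that captures that object, which is the command flavour.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical multicast delegate: invocation calls every target in subscription order.
template <typename... Args>
class MulticastDelegate {
public:
    MulticastDelegate& operator+=(std::function<void(Args...)> target) {
        targets_.push_back(std::move(target));
        return *this;
    }
    void operator()(Args... args) const {
        for (const auto& target : targets_) target(args...);
    }
    std::size_t size() const { return targets_.size(); }
private:
    std::vector<std::function<void(Args...)>> targets_;
};

// Hypothetical host object for the unicast (command-like) case.
struct Account {
    int balance = 0;
    void deposit(int amount) { balance += amount; }
};
```

Usage: after `on_tick += handler1; on_tick += handler2;`, calling `on_tick(2)` runs both handlers in order. Unlike C#'s `-=`, this sketch offers no way to remove a target, since std::function objects can't be compared for equality.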

dispatcher ^ dispatcherObject in WPF — some simple tips

Probably a low-level detail we seldom need to know. Don't spend too much time here.

See P928 [[Pro WPF in c#]]

I feel DispatcherObject means “UI object under a UI dispatcher”. Every visual (including Button) derives from the type System.Windows.Threading.DispatcherObject. See

DispatcherObject as a base class also offers the popular methods
– this.CheckAccess() and
– this.VerifyAccess()

Every DispatcherObject (including myButton) has a this.Dispatcher property, whose type is  System.Windows.Threading.Dispatcher

4 basic "consumers" of an existing template #incl. subclass

Given an existing class template, [[c++timesaving Techniques]] Chapter 31 details 4 simple/basic “techniques” to use it as a consumer[1].

1) concretize and use it as a regular class. Most common and basic usage. This is how we use STL containers.
2) concretize and use it as a field of your own class. Simple composition.
3) concretize and derive from the concretized class
4) derive from the template unconcretized. You get another template. Classic extension. P678 and P684 [[absoluteC++]] explain the syntax
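The 4 techniques can be seen side by side in one sketch. Stack is a hypothetical library template standing in for any class template, and the consumer classes are made up too. Technique 4 has to qualify the base call (Stack<T>::push) because the base class depends on T. Technique 3 can call size() and pop() unqualified because Stack<int> is a concrete base.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical library template used for illustration only.
template <typename T>
class Stack {
public:
    void push(const T& x) { items_.push_back(x); }
    T pop() {
        assert(!items_.empty());
        T top = items_.back();
        items_.pop_back();
        return top;
    }
    std::size_t size() const { return items_.size(); }
private:
    std::vector<T> items_;
};

// 1) Concretize and use it as a regular class.
inline int use_directly() {
    Stack<int> s;
    s.push(1);
    s.push(2);
    return s.pop();  // last in, first out
}

// 2) Concretize and use it as a field of your own class (composition).
class UndoLog {
public:
    void record(int v) { history_.push(v); }
    std::size_t depth() const { return history_.size(); }
private:
    Stack<int> history_;
};

// 3) Concretize, then derive from the concretized class.
class IntStack : public Stack<int> {
public:
    int sum_and_clear() {
        int total = 0;
        while (size() > 0) total += pop();
        return total;
    }
};

// 4) Derive from the unconcretized template: the result is another template.
template <typename T>
class CountingStack : public Stack<T> {
public:
    void push(const T& x) {
        ++pushes_;
        Stack<T>::push(x);
    }
    std::size_t pushes() const { return pushes_; }
private:
    std::size_t pushes_ = 0;
};
```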

Other techniques not mentioned in this (beginner/intermediate) book
7) use it unconcretized as a field of your own template. Common usage of STL.
template <typename T> class C7 { std::vector<T> v; };

8) template specialization

9) use it as a template type argument in another template. “Nested template”. P684 [[absoluteC++]]
template <typename T> class C9 : public std::multiset<std::vector<T> > {};

[1] Recall the distinction between a library codebase and a consumer codebase. A regular function or class we write can be a consumer or part of some library. A class template is always in a library.

STL containers and smart ptr – pbclone on the outside

Store only values and smart pointers in STL containers, so says [[c++ coding standards]].

Essentially, store “value types”, not reference types, borrowing c# terminology. Containers store and pass element objects (including 32-bit ptr objects) by value, as a rule. References can’t be stored as elements at all (std::vector<int&> won’t compile), and storing raw pointers is a bit dangerous.

Now let’s look at smart ptr. Except inside the smart ptr class itself, do we ever pass smart ptr instances by reference/ptr, or do we always pass them by value? I feel by-value is absolutely essential; otherwise the special logic in the copier is bypassed.
I feel for STL containers and smart ptr, internal operations may pass element objects by reference or by ptr. The public API often takes arguments by const reference (e.g. push_back(const T&)), but the container always stores its own copy, so from the outside elements are effectively pbclone, in the spirit of the C tradition of pass-by-value or pass-by-ptr. C++ style pass-by-reference is popular mostly in standard constructs such as the copier and assignment operator, or for passing large objects without pointers.
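Here is a minimal sketch of the by-value behaviour. std::vector::push_back takes a const reference, yet the vector stores its own copy. For a shared_ptr that copy goes through the copier, which bumps use_count. The function names are illustrative.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// push_back(const T&) takes a reference, but the container stores its own copy.
bool stored_element_is_a_copy() {
    std::vector<std::string> v;
    std::string s = "original";
    v.push_back(s);  // copy-constructs the element inside the vector
    s = "changed";   // modifying the source does not touch the stored copy
    return v[0] == "original";
}

// Storing a shared_ptr copies it, so the shared_ptr copier bumps the reference count.
long use_count_after_push_back() {
    auto p = std::make_shared<int>(7);
    std::vector<std::shared_ptr<int>> v;
    v.push_back(p);        // shared_ptr copy constructor runs here
    return p.use_count();  // one owner in p, one in the vector
}
```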

size of a CLR (c# etc) heap object

(The JVM is probably no different; see the post on Java.) [[.NET performance]] (circa 2012) shows the layout of a typical CLR heap object containing some simple custom fields.

–32-bit machine (4-byte alignment) —
For a no-field object, logically 8 bytes are needed (sync block + vptr). However, 12 bytes are used in reality. The book gives no explanation.

Due to memory alignment, an object with a single small field takes 12 bytes. Small means a byte, bool, pointer etc.

–64-bit machine (8-byte alignment), basically everything doubled
For a no-field object, logically 16 bytes are needed. In reality? Not sure.

For an object with a single small field, it’s 24 bytes.
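The alignment arithmetic can be worked through as a C++ sketch. The model is an assumption that just restates the layout above: a pointer-sized sync block, a pointer-sized vptr, then the fields, rounded up to pointer alignment.

It reproduces the single-small-field numbers (12 and 24 bytes). It does not explain the 12-byte no-field case, which the book also leaves unexplained.

```cpp
#include <cstddef>

// Round n up to the next multiple of `alignment`.
std::size_t align_up(std::size_t n, std::size_t alignment) {
    return (n + alignment - 1) / alignment * alignment;
}

// Assumed model, not the CLR's actual sizing rule:
// sync block + vptr (each pointer-sized), then the fields, then alignment padding.
std::size_t modeled_object_size(std::size_t ptr_size, std::size_t field_bytes) {
    std::size_t header = 2 * ptr_size;  // sync block + vptr
    return align_up(header + field_bytes, ptr_size);
}
```

For example, on 32-bit a 1-byte field gives 4 + 4 + 1 = 9, rounded up to 12. On 64-bit it gives 8 + 8 + 1 = 17, rounded up to 24.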

log4net line number (%line) requirements

If you don’t satisfy all the conditions, then you get “0” as the dummy line number.

1) deploy the pdb files of all of my own DLLs and EXEs. If you deploy only some of them, you may see line numbers for only some of the code.
2) pdb file and the DLL/EXE should come from the same build. Version mismatch will trigger no exception. Just “0”.

Obviously you need something like this to see any logging at all —


struct in C is like c# value-type

Before C++, java or c#, C offered the struct. This is a true-blue value type. When you assign one struct variable to another (or pass it to a function), the entire struct instance with all its fields is cloned bitwise.

If one of the fields happens to be a pointer such as a char* string, then only the address inside the pointer field is copied, not the characters it points to.

Beside pbclone, you can also work with a pointer to struct — a bit advanced.

In C++, the struct is backward compatible with C — pbclone by default.

C++ also added lots of features to the struct construct. It's essentially identical to the class, except members (and base classes) are public by default.

Therefore, c++ class/struct instances follow value semantics (pbclone) by default.

In java, there's only class, no struct. Any class instance is pbref — simple and clean. You never get bitwise copy with java class instances.

In c#, the class behaves just like java classes. The struct behaves like C struct.
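To see the copy behaviour concretely, here is a C++ sketch written in the C-compatible subset, so it behaves as C would. Person and its fields are hypothetical.

```cpp
// A plain C-style struct: no constructors, no methods.
struct Person {
    int age;
    char name[16];  // array member: stored inside the struct, so it is copied
    char* nick;     // pointer member: only the address is copied
};

// Returns true when the copy behaves as described above: value fields are
// independent, but the pointed-to buffer is shared.
bool struct_copy_demo(void) {
    char buffer[] = "bob";
    struct Person a = {30, "Alice", buffer};
    struct Person b = a;  // whole struct cloned field by field (bitwise for a plain struct)
    b.age = 31;           // a.age stays 30
    b.name[0] = 'X';      // a.name stays "Alice"
    b.nick[0] = 'R';      // writes through the shared pointer into buffer
    return a.age == 30 && a.name[0] == 'A' && a.nick[0] == 'R' && a.nick == b.nick;
}
```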

basics of 2-D array in C

Arrays of strings are the least confusing example. Matrix-of-double is also widely used.

A[33][22] is a 2D matrix with 33 rows and 22 columns. For element A[3][2], 3 is the first subscript, i.e. the row number. The row number can go up to 32 (33 – 1).



– The 2D array layout is contiguous: a[0][0] a[0][1] … a[1][0] a[1][1]. I think of it as 33 simple arrays connected end-to-end, each 22 cells long.

– An array of pointers, by contrast, is just 33 pointers. Each pointer can point to a separately allocated row (or string) anywhere in memory.

char namesA[count][size]; //size = limit on long names

char *namesB[count]; //this many strings; this many pointers
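The layout claims can be checked with a C++ sketch. kCount and kSize are hypothetical values standing in for count and size above.

Row r of A starts exactly r * 22 cells after A[0][0]. namesA stores all the characters inline, while namesB is nothing but pointers.

```cpp
#include <cstddef>

const std::size_t kRows = 33;
const std::size_t kCols = 22;
double A[kRows][kCols];  // one contiguous block of 33 * 22 doubles

// Byte distance from A[0][0] to A[r][c]. With the contiguous row-major
// layout this is (r * kCols + c) * sizeof(double).
std::ptrdiff_t byte_offset(std::size_t r, std::size_t c) {
    const unsigned char* base = reinterpret_cast<const unsigned char*>(&A[0][0]);
    const unsigned char* cell = reinterpret_cast<const unsigned char*>(&A[r][c]);
    return cell - base;
}

// Hypothetical sizes for the two name tables.
const std::size_t kCount = 5;   // number of names
const std::size_t kSize = 16;   // max name length, including the '\0'

char namesA[kCount][kSize];     // 5 rows of 16 chars, stored inline
const char* namesB[kCount];     // just 5 pointers; the strings live elsewhere
```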

not-b4-not-af : position-check in sorted data structure

[[more eff c++]] raised many points about the difference between an equality check and an equivalence check (I call the latter a “ranking-check”). My short summary is

– not-before-not-after (based on comparison)
– regular equality check as in java

Ranking-check — checking among existing data items in the container to determine where (if any) to put in an “incoming guest”. If the check shows incoming guest would hit the same position as an existing item, then a set/map (without prefix “multi-“) would reject.

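Note that standard STL set and map all use ranking-check, and in all mainstream implementations they are red-black-tree-based. Before C++11 there was no hash container in the standard library. C++11 added the unordered_* containers, which use hashing plus equality instead.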

The other points in the book tend to overwhelm a beginner, who is better off with a firm grip on just one concept — the real difference between the 2 checks.
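Here is a C++ sketch of the one concept that matters, using a hypothetical case-insensitive comparator. "apple" and "APPLE" are not equal, but neither ranks before the other. A std::set using this comparator therefore treats them as the same position and rejects the second one.

```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <string>

// Hypothetical case-insensitive "less than", used as the set's ranking rule.
struct CaseInsensitiveLess {
    bool operator()(const std::string& a, const std::string& b) const {
        return std::lexicographical_compare(
            a.begin(), a.end(), b.begin(), b.end(),
            [](unsigned char x, unsigned char y) {
                return std::tolower(x) < std::tolower(y);
            });
    }
};

// Equivalence = not-before-not-after under the ranking rule.
bool equivalent(const std::string& a, const std::string& b) {
    CaseInsensitiveLess less;
    return !less(a, b) && !less(b, a);
}
```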

Stoch Lesson J101 – W(t) isn’t a traditional function-of-time

See Lesson 05 for a backgrounder on Level, steps
See Lesson 33 for a backgrounder on the canonical Wiener variable W

Let’s look at the notation W(t). It suggests the Level of W is a function of t. Suppose i = 55. I’d prefer the notation W_55 or Level_55, i.e. the level AFTER step_55. This level depends on i (i.e. 55), depends on t (i.e. 55 intervals after last-observation), and also depends on the 55 queries on the noisegen. Along one particular path, W may be seen as a traditional function of t, but it’s misleading to think of W as a function of t in general. Across all paths, at time t_55, W is W_55. It includes all the 9999 realized values after step_55 as well as all the “unrealized” values.

In other words, W at time t_55 refers to the “distribution” of all these possible values: a cross section of the 9999+ paths. The symbol W(t) means the “distribution of W’s likely values at a future time t seconds after last observation”. Since W isn’t a traditional function of t, dW/dt is a freak. As illustrated elsewhere on this blog, the canonical Wiener variable W is not differentiable.
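The cross-section idea can be shown with a C++ sketch under these assumptions:

* h = 1 per interval.
* A fixed seed.
* std::normal_distribution stands in for the noisegen.

Each path alone yields a single realized W_55. Only the whole vector across 9999 paths is the distribution that W(t_55) refers to. Its sample variance should come out near 55h.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Simulate `paths` independent paths of the canonical Wiener process. Each
// path takes `steps` steps; each stepsize comes from a Gaussian noisegen with
// mean 0 and variance h (stdev sqrt(h)). Returns the level of every path after
// the last step, i.e. the cross section W_steps.
std::vector<double> cross_section(std::size_t paths, std::size_t steps,
                                  double h, unsigned seed) {
    std::mt19937 gen(seed);
    std::normal_distribution<double> noisegen(0.0, std::sqrt(h));
    std::vector<double> levels(paths, 0.0);  // every path starts at W = 0
    for (std::size_t p = 0; p < paths; ++p)
        for (std::size_t i = 0; i < steps; ++i)
            levels[p] += noisegen(gen);
    return levels;
}

double sample_mean(const std::vector<double>& xs) {
    double sum = 0.0;
    for (double x : xs) sum += x;
    return sum / static_cast<double>(xs.size());
}

double sample_variance(const std::vector<double>& xs) {
    const double m = sample_mean(xs);
    double ss = 0.0;
    for (double x : xs) ss += (x - m) * (x - m);
    return ss / static_cast<double>(xs.size() - 1);
}
```

For example, `cross_section(9999, 55, 1.0, 42)` returns 9999 realized values of W_55. Their sample mean should be near 0 and their sample variance near 55.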

Stoch Lesson 55 deltaW and dW

See Lesson 05 about stepsize_i, and h…
See Lesson 33 for a backgrounder on the canonical Wiener variable W

Note [[Hull]] uses “z” instead of W.

Now let’s explain the notation deltaW in the well-known formula

S_(i+1) – S_i = deltaS = driftRate * deltaT + sigma * deltaW

Here, deltaW is basically stepsize_i, generated by the noisegen at the i’th step. That’s the discrete-time version. How about the dW in the continuous time SDE? Well, dW is the stepsize_i as deltaT -> 0. This dW is from a noisegen whose variance is exactly equal to deltaT. Note deltaT is the thing that we drive to 0.

In my humble opinion, the #1 key feature of a Wiener process is that the Gaussian noisegen’s variance is exactly equal to deltaT.

Another name for deltaT is h. Definition is h == T/n.

Note, as Lawler said, dW/dt is meaningless for a BM, because a BM is nowhere differentiable.
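Here is a C++ sketch of the discrete-time formula. Each deltaW is drawn from a Gaussian noisegen whose variance is exactly deltaT = h = T/n, i.e. whose stdev is sqrt(h). The function name, its parameters and the use of std::normal_distribution are illustrative assumptions.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Discrete-time version of deltaS = driftRate * deltaT + sigma * deltaW.
// deltaW is the stepsize of W at step i, from a noisegen with variance h.
// Returns the levels S_0, S_1, ..., S_n along one path.
std::vector<double> simulate_path(double s0, double driftRate, double sigma,
                                  double T, std::size_t n, unsigned seed) {
    const double h = T / static_cast<double>(n);   // deltaT
    std::mt19937 gen(seed);
    std::normal_distribution<double> noisegen(0.0, std::sqrt(h));
    std::vector<double> S(n + 1);
    S[0] = s0;
    for (std::size_t i = 0; i < n; ++i) {
        const double deltaW = noisegen(gen);       // stepsize_i of W
        S[i + 1] = S[i] + driftRate * h + sigma * deltaW;
    }
    return S;
}
```

With sigma = 0 the path is purely deterministic. With driftRate = 0 and sigma = 1, the increments are just the deltaW values, and their mean square should be close to h.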

Stoch Lesson J88 when to add scaling factor sqrt(t)

See Lesson 05 for a backgrounder on h.
See Lesson 15 for a backgrounder on paths and realizations.

In the formulas, one fine point that's easy to miss is whether to include or remove sqrt(t) in front of dW. As repeated many times, notation is extremely important here. Before addressing the question, we must spend a few paragraphs on notation.
It’s instructive to use examples at this juncture. Suppose we adopt (h=) 16-sec intervals, and generate 9999 realizations of the canonical Wiener process. The 9999 “realized” stepsize values form a histogram. It should be bell-shaped with mean 0 and variance 16.0, stdev 4.0. If we next adopt (h=) 0.09-sec intervals, and generate 8888 realizations of the same process, then the resulting 8888 stepsize values should show variance 0.09, stdev 0.3.
That’s the canonical Wiener variable. dW is defined as the stepsize as h -> 0, so dW has a Gaussian distribution with variance h, which -> 0. Therefore dW is not customized and has well-known standard properties, including the sqrt(t) feature.
The simplest, purest, canonical Wiener variable already shows the sqrt(t) feature. Therefore, we should never put sqrt(t) in front of dW.
In fact, the sqrt(t) scaling factor is only used with epsilon (or Z), a random variable representing the standard normal noisegen with a fixed variance = 1.0, as in deltaW = epsilon * sqrt(deltaT).
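Here is a C++ sketch of where the scaling factor belongs, reusing the numbers above: 9999 realizations with h = 16 and 8888 realizations with h = 0.09. The sqrt(h) here plays the role the text calls sqrt(t). It multiplies the standard normal Z, and the product already is the Wiener stepsize, so nothing further is applied to it. The fixed seeds and std::normal_distribution are assumptions.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Draw `count` realized stepsizes of the canonical Wiener process over
// intervals of length h, each built as sqrt(h) * Z, where Z comes from a
// standard normal noisegen (variance 1.0). The scaling multiplies Z only.
std::vector<double> wiener_stepsizes(std::size_t count, double h, unsigned seed) {
    std::mt19937 gen(seed);
    std::normal_distribution<double> Z(0.0, 1.0);
    std::vector<double> steps(count);
    for (double& s : steps) s = std::sqrt(h) * Z(gen);
    return steps;
}

// Sample standard deviation (n - 1 denominator).
double sample_stdev(const std::vector<double>& xs) {
    double mean = 0.0;
    for (double x : xs) mean += x;
    mean /= static_cast<double>(xs.size());
    double ss = 0.0;
    for (double x : xs) ss += (x - mean) * (x - mean);
    return std::sqrt(ss / static_cast<double>(xs.size() - 1));
}
```

The sample stdev should land near 4.0 for h = 16 and near 0.3 for h = 0.09, matching the histograms described above.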

method == func ptr with an implicit this-ptr

A Microsoft/TwoSigma developer revealed to me that basic OO features can be implemented in C with func ptr. A non-static method is basically a C function whose first parameter is a ptr to the host object. The func's address is then saved as a func ptr field of the struct. See also P342 [[headfirstC]]

I also read that in the early days of C++, every C++ source file was converted to C source code and then compiled as C (this is what Stroustrup's original Cfront compiler did). That means all C++ syntax features supported then were like sugar coating over C syntax, and some combination of C language features (mostly pointers) could emulate all C++ features known then. That was C++ in its early days; it's no longer possible.
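Here is a C++ sketch of the idea, written in the C-compatible subset (a struct plus a function pointer, no classes). Counter, counter_increment and make_counter are hypothetical names.

```cpp
// C-style "object": a data field plus a function-pointer field acting as a method.
struct Counter {
    int count;
    void (*increment)(struct Counter* self, int by);  // the "method" slot
};

// The method body: an ordinary function whose first parameter is the host
// object, playing the role of C++'s implicit this-ptr.
void counter_increment(struct Counter* self, int by) {
    self->count += by;
}

// "Constructor": fills in the data and stores the function's address.
struct Counter make_counter(void) {
    struct Counter c;
    c.count = 0;
    c.increment = counter_increment;
    return c;
}
```

A call like `c.increment(&c, 5)` is the hand-written equivalent of a C++ call `c.increment(5)`: the object's address is passed explicitly.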

c# attribute on method arg – usage

I feel attributes on method params are a rarely used feature. I feel they are meant for reflection or documentation. presents a simple usage of this feature, whereby runtime reflection extracts the additional information contained in the attribute value. You can then act on the information.

It also shows that an override method can “add” such attributes.

Argument validation is a practical usage. For example, you can mark an argument as NotNull.

Here’s another usage, but I don’t remember if I tested it successfully.

y minimize code behind xaml

I feel this is a valid question. Many wpf designs make this a high priority. It takes extra effort. Why make the “sacrifice”?

Code behind is a melting pot paradigm. In contrast, code in VM, M and command classes is far easier to read, test and modularize. It is often much shorter, and these classes often have very few members. To a certain extent, I would say the more classes you use to organize the code, the better. Imagine all of this logic put into one class, the code behind :(

See also

Codebehind is considered part of Xaml and in the View layer. The code logic in CB is married to the view and not reusable.

It’s considered best practice to
– move stuff from code behind to xaml or
– move stuff from code behind to VM or other c# classes

The reverse is usually easier: quick and dirty, sometimes clever manipulations to achieve an effect. These manipulations done in CB tend to be messy and not modularized or compartmentalized.

Stoch Lesson 22 any thingy dependent on a TVRV is likely a TVRV

See Lesson 05 about the discrete-time S_i+1 concept.
See Lesson 15 about TVRV.

I feel in general any variable dependent on a random variable is also a random variable, such as the S in

S_(i+1) – S_i = deltaS = a * deltaT + b * deltaW

The dependency is signified by the ordinary-looking “+” operator. To me this addition operator means “superimpose”. The deltaS or stepsize is a combination of deterministic shift superimposed on a non-deterministic noise. That makes S itself a time-varying random variable which can follow a trillion possible paths from last-observation to Expiry.

The addition doesn’t mean the stepsize_i+1 will be known once both components i.e. (a * deltaT) and (b * deltaW) are known. In fact, deltaW can take a trillion possible values, so the stepsize in S is not exactly predictable i.e. non-deterministic. This stepsize is random. Therefore S itself is a TVRV.
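In distribution terms, the "superimpose" reading can be written out. This assumes a and b are treated as known constants over one step, as the formula implies, and deltaW is Gaussian with variance exactly deltaT, as in Lesson 55. The deterministic part only shifts the mean; all the spread comes from the noise. So as long as b is nonzero, the stepsize is random.

```latex
\Delta S = a\,\Delta T + b\,\Delta W, \qquad \Delta W \sim N(0,\ \Delta T)
\quad\Longrightarrow\quad
\Delta S \sim N\!\left(a\,\Delta T,\ b^{2}\,\Delta T\right)
```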

tricks – wpf/swing IV

As my friend Guillaume pointed out, a lot of GUI know-how falls into the “tricks” category. Just open any cookbook on WPF/swing. (Let's not elaborate on why GUI technologies have this commonality.)

Q: How would you deal with “how-would-you” interview questions ☺

A: prioritize according to rarity. Avoid those tricks less widely needed

A: prioritize among the visual components by rarity. jtable is by far the most used jcomponent in swing. I wasted a lot of time over text-related components.

A: real time application and related tricks are priority

A: memory management is priority but there are no feature “tricks” here

A: prioritize according to layer. Some knowledge is fundamental and used in a lot of tricks. Perhaps browse some books


*dependency property, attached property

*event handler

send slow task to worker thr then back to EDT, briefly

How do you thread a long-running operation (a “tortoise”) that needs to update the GUI? Putting it on the EDT will block all screen updates and user interactions until the tortoise finishes.

P930 [[Pro WPF in c#]] confirmed my belief that in wpf (as in swing),

1) the task should first be dispatched (from EDT or another thread) to a worker thread, which is designed for tortoises, and

2) any resulting update to GUI is then sent back, from the worker thread, to the EDT, which is the only thread permitted to update the screen.
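To make the two-step hand-off concrete, here is a C++ sketch using std::thread. EventLoop is a hypothetical stand-in for the EDT or WPF Dispatcher, not a real GUI API: one thread drains a task queue, and other threads may only post() into it.

The worker computes the slow result and then posts the GUI update back, instead of touching the "GUI state" itself.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>

// Hypothetical stand-in for the EDT's message queue.
class EventLoop {
public:
    void post(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(std::move(task));
        }
        cv_.notify_one();
    }
    // Runs queued tasks on the calling thread; an empty task means "quit".
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return !q_.empty(); });
                task = std::move(q_.front());
                q_.pop();
            }
            if (!task) return;
            task();
        }
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> q_;
};

// The calling thread plays the EDT. `label` is the "GUI state" and is only
// ever touched on that thread.
std::string run_demo() {
    EventLoop edt;
    const std::thread::id edtId = std::this_thread::get_id();
    std::string label = "working...";
    bool updatedOnEdt = false;
    std::thread worker;
    edt.post([&] {
        // 1) dispatch the tortoise from the EDT to a worker thread
        worker = std::thread([&] {
            long long sum = 0;
            for (long long i = 1; i <= 1000000; ++i) sum += i;  // slow work
            // 2) send the GUI update back to the EDT
            edt.post([&, sum] {
                label = "sum=" + std::to_string(sum);
                updatedOnEdt = (std::this_thread::get_id() == edtId);
                edt.post(nullptr);  // ask the loop to quit
            });
        });
    });
    edt.run();
    worker.join();
    return updatedOnEdt ? label : "updated on wrong thread";
}
```

The EDT stays free to process other posted tasks while the worker runs. The only cross-thread traffic is through post().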

ContextBoundObject vs SynchronizationContext, first look

As illustrated in [[threading in c#]], CBO is about “serialize all instance methods”. Now I guess the CBO construct relies on SC construct.

SC is about marshalling calls to the GUI thread. Each thread has zero or one SC instance. The GUI thread always has one, created automatically.

I feel the sync context construct is similar to the dispatcher construct. Both are effectively “handles” on the UI message pump. Since other threads can’t directly pass “tasks” to the UI thread, they must use a handle like these. Assuming sc1 covers the GUI thread, sc1.Post(some_delegate) is like Dispatcher.BeginInvoke(some_delegate).

Similar to the Thread.CurrentThread static property, SC.Current is a static property. Thread1 calling SynchronizationContext.Current would get object1, while Thread2 calling SynchronizationContext.Current would get object2. advocates WPF Dispatcher instead of SC

xaml resource dictionaries – a few pointers

What's the confusion with RD elements “” and WR elements “”?

1) All these elements define an INSTANCE of the ResourceDictionary class. If you see 5 chunks of it in a xaml, then in the VM there will be 5 instances(?).

2) I believe most visual controls have a non-static property Resources, of type ResourceDictionary. Therefore each layer on the containment hierarchy should offer a Resources field, often empty. You can populate it like


3) This “Resources” field often encloses an RD element, instantiating an RD instance. Often the element simply defines the key/value pairs without an RD layer. Confusing.

4) The resource dict instance at any container level is available to be shared by the children. The highest level is probably the Application level, but I usually use the Window level. That meets my basic needs of sharing a resource.

[[wpf succinctly]] points out

You can create a resource dictionary at the Application, Window, and UserControl levels. A custom RD allows you to import your own C# objects to use (and share) in XAML data binding.

A common practice is to put your view model instance in a window-level RD. See [[wpf succinctly]]. This resource must be defined before it's used as a data context.

ObjectDataProvider, learning notes

I feel ObjectDataProvider is an unsung hero….

One of the usages is binding to a non-static method of some stateful util class. See also [[wpf recipes in c#]]. First you need to put an instance of your class into a resource dict. An ObjectDataProvider element with an “ObjectType” attribute would instantiate it.
It's useful to put an instance of your stateful util class into the window's resource dict. You can later use it in many ways (won't elaborate). Without ObjectDataProvider, the instance is constructed by the default ctor — serious limitation.