python: routine^complex tasks

XR,

Further to our discussion: I used perl for many years, and 95% of my perl tasks were routine. With py, I would say the majority of my tasks are routine, i.e. solutions are easy to find online.

  • Routine tasks include automated testing, shell-script replacement, small-scale code generation, text-file processing, and querying XML, various data stores, or services via HTTP POST/GET.
  • For “complex tasks”, at least some part is tricky and not easily solved by googling. Routine reflection / concurrency / c++ integration / daemon processes are widely documented, with sample code, but these techniques can be pushed to the limit.
    • Even if we use each technique exactly as documented, combining them in unusual ways means a Google search will not be enough.
    • Beware — decorators, meta-programming, polymorphism, on-the-fly code generation, serialization, remote procedure calls… all rely on reflection.
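Since everything in that bullet ultimately rests on reflection, here is a minimal sketch of a decorator that only works because of reflection: inspect.signature and functools.wraps both read metadata off the function object at runtime. The price function and its last_call attribute are made up for illustration.

```python
import functools
import inspect

# Minimal sketch: a decorator built on reflection.
def log_args(func):
    sig = inspect.signature(func)          # reflection: read the parameter list
    @functools.wraps(func)                 # reflection: copy __name__, __doc__ etc.
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)  # map actual args to parameter names
        wrapper.last_call = dict(bound.arguments)
        return func(*args, **kwargs)
    return wrapper

@log_args
def price(spot, vol=0.2):   # hypothetical example function
    return spot * (1 + vol)

price(100, vol=0.3)
```

Without reflection, the wrapper could not know that its first positional argument is named "spot", nor keep the wrapped function's name and docstring.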

When you say py is not as easy as xxx and takes several years to learn, I think you were referring to complex tasks.

I can see a few reasons why managers choose py over java for certain tasks. I hear there are a few JVM-based scripting languages (scala, groovy, clojure, jython…), but I guess python beats them on several fronts, including more packages (i.e. wheels), more mature/complete/proven solutions, familiarity, reliability, and a wider user base.

One common argument for preferring any scripting language over any compiled language is faster development. True for routine tasks; for complex tasks, “your mileage may vary”. As I said, if the software requirement is inherently complex, then the implementation in any language will be complex. When the task is complex, I actually prefer more verbose source code — possibly more transparent and less “opaque”.

Quartz is one example of a big opaque system for a complex task. If you want, I can describe some of the complex tasks (in py) I have come across, though I don’t have the level of insight that some colleagues have.

When you said the python debugger was less useful to you than the java debugger, that’s a sign of java’s transparency. My “favorite” opaque parts of py are module import and reflection.
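Part of why module import feels opaque is that it is ordinary runtime machinery, itself driven by reflection. A minimal stdlib-only sketch:

```python
import importlib
import sys

# Sketch: "import" is ordinary, inspectable runtime machinery.
mod = importlib.import_module("json")   # same effect as "import json"
assert mod is sys.modules["json"]       # the module cache is just a dict

# reflection on the freshly imported module
names = [n for n in dir(mod) if n in ("loads", "dumps")]
print(names)  # ['dumps', 'loads']
```

Because any code can mutate sys.modules or call importlib hooks at runtime, what "import json" actually binds is not always obvious from reading the source.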

If python (or any language) has excellent performance/stability + good online resources [1] + a library of components comparable to mature languages like Java/c++, then I feel sooner or later it will catch on. I feel python doesn’t quite have the performance.

[1] documentation is nice-to-have but not sufficient. Many programmers don’t have time to read documentation in depth.

probability density is always prob mass per unit space

It’s worthwhile to get an intuitive feel for the choice of words in this jargon.

* With discrete probabilities, there’s the concept of a “probability mass function”.
* With a continuous probability space, the corresponding concept is the “density function”.

Density is defined as mass per unit space.

For a 1D probability space, the unit of space is length. Example – the width of a nose is an RV with a continuous distro. If the mean is 2.51cm, the probability density is probably highest around this width…
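A quick numeric check of the 1D picture, using the Gaussian density formula. The mean 2.51cm comes from the example above; sigma = 0.3cm is an invented assumption.

```python
import math

# Gaussian density = probability mass per unit length.
# mean 2.51cm from the example; sigma 0.3cm is an invented assumption.
def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

mu, sigma = 2.51, 0.3
peak = normal_pdf(mu, mu, sigma)          # density is highest at the mean
assert peak > normal_pdf(2.4, mu, sigma)  # lower density away from the mean
print(round(peak, 4))
```

Note the peak value exceeds 1.0 here; a density is mass per unit length, not a probability, so only its integral must equal 1.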

For a 2D probability space, the unit of space is an area. Example – the width of a nose and the temperature inside are two RVs, forming a bivariate distro. You can plot the density function as a dome; the total volume under it is 1.0 by definition. The density at (x=2.51cm, y=36.01 C) is the height of the dome at that point.

The concentration of “mass” at this location might be, say, twice the concentration at another location like (x=2.4cm, y=36 C).

risk-factor-based scenario analysis#Return to RiskMetrics

Look at [[return to risk riskMetrics]]. Some risk management theorists have a (fairly sophisticated) framework about scenarios. I feel it’s worth studying.

Given a portfolio of diverse instruments, we first identify the individual risk factors, then describe specific scenarios. With 3 factors, each scenario is uniquely defined by a tuple of 3 numbers, one per factor. Under each scenario, every instrument in the portfolio can be priced.

I think one of the simplest set-ups I have seen in practice is the Barcap 2D grid, with stock +/- percentage changes on one axis and implied-vol figures on the other. This grid can create many scenarios for an equity-derivative portfolio.
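The 2D grid above can be sketched in a few lines. The pricer below is a toy stand-in (not any bank's actual model) and the shift sizes are invented; the point is the shape of the scenario set:

```python
# Toy scenario grid in the spirit of the 2D grid described above.
# toy_price is a stand-in pricer; shift sizes are invented.
def toy_price(spot, vol):
    return max(spot - 100.0, 0.0) + 40.0 * vol   # intrinsic value + a vega-like term

spot0, vol0 = 100.0, 0.20
spot_shifts = [-0.10, -0.05, 0.0, 0.05, 0.10]    # +/- percentage moves
vol_shifts = [-0.05, 0.0, 0.05]                  # absolute implied-vol moves

grid = {(ds, dv): toy_price(spot0 * (1 + ds), vol0 + dv)
        for ds in spot_shifts for dv in vol_shifts}

base = grid[(0.0, 0.0)]
worst = min(grid.values()) - base                # worst scenario P&L vs base
print(len(grid), round(base, 2), round(worst, 2))  # 15 8.0 -2.0
```

Each key in the grid is the tuple that uniquely defines a scenario; repricing the whole portfolio under each key gives the scenario P&L surface.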

I feel it’s important to point out that two factors can have non-trivial interdependence and influence each other. (Independence would be nice. In one (small) sample you may actually observe statistical independence, but in another sample from the same population you may not.) Between the risk factors, the correlation is monitored and measured.
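Measuring that correlation is straightforward given a sample of joint observations. The series below are simulated, with a 0.6 correlation built in purely for illustration:

```python
import math
import random

# Simulated joint observations of two "risk factors"; the 0.6 correlation
# is built in purely for illustration.
random.seed(42)
x = [random.gauss(0, 1) for _ in range(5000)]
y = [0.6 * xi + 0.8 * random.gauss(0, 1) for xi in x]

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / n
    sa = math.sqrt(sum((ai - ma) ** 2 for ai in a) / n)
    sb = math.sqrt(sum((bi - mb) ** 2 for bi in b) / n)
    return cov / (sa * sb)

r = pearson(x, y)      # should land near the built-in 0.6
print(round(r, 2))
```

This also illustrates the sampling point above: a different seed (a different sample from the same population) would give a slightly different measured correlation.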

##boost modules used in finance and asked in IV

#1) shared_ptr (+ intrusive_ptr) — I feel more than half (70%?) of the “finance” boost usage is here. I feel every architect who chooses boost will use shared_ptr.
#2) boost thread
#3) serialization
#4) boost::any

Gregory said bbg used shared_ptr, hash tables and function binders from boost

Overall, I feel these topics are mostly QQ types, frequently asked in IV (esp. those in bold) but not central to the language. I would put only smart pointers in my Tier 1 or Tier 2.

—— other modules used in my systems
* noncopyable — for singletons
** private derivation
** prevents the synthesized copy ctor and op=, since the base class is noncopyable.

* polymorphic_cast
* numeric_cast — mostly int types, also float types

* operators ?
* bind?
* tuple
* regex
* scoped_ptr — as a non-copyable [1] stack variable
[1] different from auto_ptr

## c++ real job offers I got

  1. eSpeed
  2. Pimco in 2010
  3. (Sg) Citi eq-der
  4. BNP
  5. Pimco accrual accounting 2017
  6. (sg) Art of Click
  7. ICE

Q1: Assuming past performance is an indicator of future success, how did I fare in past successful/unsuccessful c++ IVs?

Outside the very selective (mostly HFT) c++ jobs, I had a pass rate around 30% on the technical front. A subset of these became real offers.

Q1a: c++ coding IV? Among the real offers I received, only eSpeed had a coding interview.

I feel the very selective c++ jobs will be difficult for me, but in the majority of c++ coding IVs I would have a chance. These questions usually cover recursion, arrays/strings/primitives + perhaps some map/set.

Q1b: C++ QnA IV?
My c++ knowledge will be as good as before, after 1) reviewing my blog and 2) studying the past questions again.

If I consider my learning in Macquarie, I would say my c++ know-how is potentially better than before.

Q: What statistics would make me feel confident about my c++ competence? 10 real offers in history?
A: I feel 3 more offers would be enough.

central limit theorem – clarified

background – I was taught CLT multiple times but am still unsure about important details…

discrete — the original RV can have any distro, but many illustrations pick a discrete RV, like a Poisson or binomial RV. I think for some students a continuous RV can be less confusing.

average — the average of N iid realizations/observations of this RV is the estimate [1]. I will avoid the word “mean” as it’s paradoxically ambiguous. This average is the average of N numbers, where N is 5 or 50 or whatever.

large group — N needs to be sufficiently large, esp. if the original RV’s distro is highly asymmetrical. This bit is abstract, but lies at the heart of CLT. For the original distro, you may want to avoid the extremely asymmetrical ones and start with something like a uniform or pyramid distro. We will realize that regardless of the original distro, as N increases our “estimate” becomes (approximately) a Gaussian RV.

[1] the estimate is a sample mean, and an estimate of the population mean. Also, the estimate is itself an RV.
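The claim above is easy to see by simulation. Starting from a uniform distro as suggested, the averages of N draws cluster around the population mean of 0.5 and tighten as N grows:

```python
import random
import statistics

# Averages of N iid uniform draws: centered at 0.5, tightening as N grows.
random.seed(1)

def sample_average(n):
    return sum(random.random() for _ in range(n)) / n

averages_small = [sample_average(4) for _ in range(2000)]
averages_big = [sample_average(64) for _ in range(2000)]

sd_small = statistics.stdev(averages_small)
sd_big = statistics.stdev(averages_big)
assert sd_big < sd_small   # larger N gives a tighter, more Gaussian bell
print(round(statistics.mean(averages_big), 2))
```

A histogram of averages_big would already look like a bell curve, even though each underlying draw is flat-uniform, not bell-shaped.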

finite population (for the original distro) — a common confusion. In the “better” illustrations, the population is unlimited, like NY’s temperature. In a confusing context, the total population is finite and perhaps small, such as dice rolls or the birth hours of all my classmates. I think in such a context, the population mean is actually some definite but yet-unknown number, to be estimated using a subset of the population.

log-normal — an extremely useful extension of CLT says “the product of N iid positive random variables is another RV, with a LN distro”. Just look at the log of the product: denote it L. Then L is the sum of N iid random variables (the logs of the originals), so by CLT L is approximately Gaussian, and the product is log-normal.
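A simulation sketch of this extension: take the product of N iid positive RVs (uniform on [0.5, 1.5], an invented choice) and check that the log of the product behaves like a Gaussian:

```python
import math
import random
import statistics

# Product of N iid positive RVs: its log is a sum of N iid logs, which CLT
# makes approximately Gaussian, so the product is approximately log-normal.
# The uniform(0.5, 1.5) choice is invented for illustration.
random.seed(7)
N = 50

def one_product():
    return math.prod(random.uniform(0.5, 1.5) for _ in range(N))

logs = [math.log(one_product()) for _ in range(3000)]
m, s = statistics.mean(logs), statistics.stdev(logs)
inside_1sd = sum(1 for v in logs if abs(v - m) <= s) / len(logs)
print(round(inside_1sd, 2))   # near 0.68 for a Gaussian
```

Roughly 68% of the log-products fall within one standard deviation of their mean, the signature of a Gaussian, which is exactly the log-normal claim for the product itself.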