[16]python: routine^complex tasks #XR

XR,

Further to our discussion, I used perl for many years. 95% of my perl tasks are routine tasks. With py, I would say “majority” of my tasks are routine tasks i.e. solutions are easy to find on-line.

  • routine tasks include automated testing, shell-script replacement, text file processing, query XML, query various data stores, query via http post/get, small-scale code generation, simple tcp client/server.
  • For “Complex tasks” , at least some part of it is tricky and not easily solved by Googling. Routine reflection / concurrency / c++Integration / importation … are documented widely, with sample code, but these techniques can be pushed to the limit.
    •  Even if we just use these techniques as documented, but we combine them in unusual ways, then Google search will not be enough.
    • Beware — decorators , meta-programming, polymorphism, on-the-fly code-generation, serialization, remote procedure call … all rely on reflection.

When you say py is not as easy as xxx and takes several years to learn, I think you referred to complex tasks.

It’s quite impressive to see that some powerful and useful functionalities can be easily implemented by following online tutorials. By definition these are routine tasks. One example that jump out at me is reflection in any language. Python reflection can be powerful like a power drill to break a stone wall. Without such a power drill the technical challenge can look daunting. I guess metaclass is one such power drill. Decorator is a power drill I used in %%logging decorator with optional args

I can see a few reasons why managers choose py over java for certain tasks. I heard there are a few jvm-based scripting languages (scala, groovy, clojure, jython …) but I guess python beats them on several fronts including more packages (i.e. wheels) and more mature, more complete and proven solutions, familiarity, reliability + wider user base.

One common argument to prefer any scripting language over any compiled language is faster development. True for routine tasks. For complex tasks, “your mileage may vary”. As I said, if the software system requirement is inherently complex, then implementation in any language will be complex. When the task is complex, I actually prefer more verbose source code — possibly more transparent and less “opaque”.

Quartz is one example of a big opaque system for a complex task. If you want, I can describe some of the complex tasks (in py) I have come across though I don’t have the level of insight that some colleagues have.

When you said the python debugger was less useful to you than java debugger, it’s a sign of java’s transparency. My “favorite” opaque parts of py are module import and reflection.

If any language has excellent performance/stability + good on-line resources [1] + reasonable library of components comparable to the mature languages like Java/c++, then I feel sooner or later it will catch on. I feel python doesn’t have exactly the performance. In contrast, I think php and javascript can achieve very high performance in their respective usage domains.

[1] documentation is nice-to-have but not sufficient. Many programmers don’t have time to read documentation in-depth.

[16]standard practice around q[delete]

See also post about MSDN overview on mem mgmt…

  • “delete” risk: too early -> stray pointer
  • “delete” risk: too “late” -> leak. You will see steadily growing memory usage
  • “delete” risk: too many times -> double free
  • … these risks are resolved in java and dotnet.

For all simple scenarios, I feel the default and standard idiom to manage the delete is RAII. This means the delete is inside a dtor, which in turn means there’s a wrapper class, perhaps some smart ptr class.

It also means we create stack instances of
– the wrapper class
– a container holding the wrapper objects
– an umbrella holding the wrapper object

Should every departure/deviation be documented?

I feel it’s not a best idea to pass a pointer into some cleanup function, and inside the function, delete the passed-in pointer. What if the pointee is a pointee already deleted, or a pointee still /in active service/, or a non-heap pointee, or a pointee embedded in a container or umbrella… See P52 [[safe c++]]

probability density = always prob mass per unit space

It’s worthwhile to get an intuitive feel for the choice of words in this jargon.

* With discrete probabilities, there’s the concept of “probably mass function”
* With continuous probability space, the corresponding concept is “density function”.

Density is defined as mass per unit space.

For a 1D probability space, the unit space is length. Example – width of a nose is a RV with a continuous distro. Mean = 2.51cm, so the probability density at this width is probably highest…

For a 2D probability space, the unit space is an area. Example – width of a nose and temperature inside are two RV, forming a bivariate distro. You can plot the density function as a dome. Total volume = 1.0 by definition. Density at (x=2.51cm, y=36.01 C) is the height of the dome at that point.

The concentration of “mass” at this location is twice the concentration at another location like (x=2.4cm, y=36 C).

git | manage feature branch #2delete

A few ways to create new feature branch

If you want the branch to show up in Stash, then it is probably simpler to create it on Stash website (choose the correct parent branch …). After that

git fetch # no arg after "fetch"... will show the new branch downloaded to local repository as "remote branch"
git branch -a # list all (local and remote) branches
git checkout feature/new_br_from_stash # creates a matching local branch

Using command line without Stash

git branch feature/my_br # creates a brand-newbranch named feature/my_br, in the local repo only.
git branch # lists all available branches in local repo
git checkout feature/my_br # selects my_br as the default branch.
git branch -m feature/my_BR # renames the default branch to feature/my_BR, in the local repo only.
git push origin feature/my_BR # creates and copies branch to Stash, assuming branch doesn't exist there.

(Note Git branches are designed to be disposable. So in theory you could create a branch for a small change and then merge and throw it away. No best practice no recommendatation yet.)

All feature branches, including temporary throw-away branches, can be deleted from Stash.

risk-factor-based scenario analysis [[Return to RiskMetrics]]

Look at [[return to risk riskMetrics]]. Some risk management theorists have a (fairly sophisticated) framework about scenarios. I feel it’s worth studying.

Given a portfolio of diverse instruments, we first identify the individual risk factors, then describe specific scenarios. Each scenario is uniquely defined by a tuple of 3 numbers for the 3 factors (if 3 factors). Under each scenario, each instrument in the portfolio can be priced.

I think one of the simplest set-up I have seen in practice is the Barcap 2D grid with stock +/- percentage changes on one axis and implied vol figures on the other axis. This grid can create many scenario for an equity derivative portfolio.

I feel it’s important to point out two factors can have non-trivial interdependence and influence each other. (Independence would be nice. In a (small) sample, you may actually observe statistical independence but in another sample of the same population you may not.) Between the risk factors, the correlation is monitored and measured.

##boost modules USED ]finance n asked]IV

#1) shared_ptr (+ intrusive_ptr) — I feel more than half  (70%?) of the “finance” boost usage is here. I feel every architect who chooses boost will use shared_ptr
#2) boost thread
#3) hash tables

Morgan uses Function, Tuple, Any, TypeTraits, mpl

Gregory (RTS) said bbg used shared_ptr, hash tables and function binders from boost

Overall, I feel these topics are mostly QQ types, frequently asked in IV (esp. those in bold) but not central to the language. I would put only smart pointers in my Tier 1 or Tier 2.

—— other modules used in my systems
* tuple
* regex
* noncopyable — for singletons
** private derivation
** prevents synthesized {op= and copier}, since base class is noncopyable.

* polymorphic_cast
* numeric_cast — mostly int types, also float types
* bind?
* scoped_ptr — as non-copyable [1] stackVar,
[1] different from auto_ptr

central limit theorem – clarified

background – I was taught CLT multiple times but still unsure about important details..

discrete — The original RV can have any distro, but many
illustrations pick a discrete RV, like a Poisson RV or binomial RV. I think to some students a continuous RV can be less confusing.

average — of N iid realizations/observations of this RV is the estimate [1]. I will avoid the word “mean” as it’s paradoxically ambiguous. Now this average is the average of N numbers, like 5 or 50 or whatever.

large group — N needs to be sufficiently large, esp. if the original RV’s distro is highly asymmetrical. This bit is abstract, but lies at the heart of CLT. For the original distro, you may want to avoid some extremely asymmetrical ones but start with something like a uniform distro or a pyramid distro. We will realize that regardless of the original distro, as N increases our “estimate” becomes a Gaussian RV.

[1] estimate is a sample mean, and an estimate of the population mean. Also, the estimate is a RV too.

finite population (for the original distro) — is a common confusion. In the “better” illustrations, the population is unlimited, like NY’s temperature. In a confusing context, the total population is finite and perhaps small, such as dice or the birth hour of all my
classmates. I think in such a context, the population mean is actually some definite but yet-unknown number to be estimated using a subset of the population.

log-normal — an extremely useful extension of CLT says “the product of N iid random variables is another RV, with a LN distro”. Just look at the log of the product. This RV, denoted L, is the sum of iid random variables, so L is Gaussian.

perl q[..] range operator

there are 2 largely unrelated operators:

  • list range-operator,
  • scalar range-operator

#1 most common usage (list context) iterates over a range of integers or strings, similar to python xrange(). [1] has clear examples.

#2 (possibly more powerful) usage is during text file line-by-line processing. The operator selects a continuous chunk of lines.  Un-intuitively, this is considered scalar context. [1] has useful example, but need to understand the basics first.

#2b simplest use cases has two hardcoded numbers.

[1] http://perldoc.perl.org/perlop.html#Range-Operators Overall, this article has too many technicalities to obscure the key, useful features.