big O, in%%lang

O, o, Omega and Theta all try to answer the question “for this algorithm, how many more iterations are needed when N triples?”, where N measures the input size. (I say “triples” rather than “doubles” to avoid potential ambiguity about the number 2.)

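Example: For the quicksort algorithm, the answer is O(N * log N) on average, meaning iterations grow in proportion to N * log N. When the input triples, iterations increase by a factor of 3 * log(3N) / log(N), i.e. somewhat more than 3.
Example: For the binary search algorithm, the answer is O(log N), meaning iterations grow in proportion to log N. When the input triples, only about log2(3) ≈ 1.6 more iterations are needed, however large N is.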

O(f(N)) means “no more than about f(N) iterations, up to a constant factor, for large N”.
O is an upper bound, not necessarily a tight bound. Theta is the tight bound.
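
To check the “N triples” question on real numbers, here is a minimal sketch that counts loop iterations of a textbook binary search on an unsuccessful search, and computes the growth factor of N * log N. The function names (`binary_search_iterations`, `worst_case_iterations`, `n_log_n_growth_factor`) are my own, chosen for illustration.

```python
import math


def binary_search_iterations(sorted_items, target):
    """Return (index, iterations); index is -1 when target is absent."""
    lo, hi = 0, len(sorted_items) - 1
    iterations = 0
    while lo <= hi:
        iterations += 1
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid, iterations
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, iterations


def worst_case_iterations(n):
    """Search 0..n-1 for n, which is absent and larger than every item."""
    _, iterations = binary_search_iterations(range(n), n)
    return iterations


def n_log_n_growth_factor(n, multiplier=3):
    """Ratio of (kN * log(kN)) to (N * log N) for k = multiplier."""
    bigger = multiplier * n
    return (bigger * math.log2(bigger)) / (n * math.log2(n))


if __name__ == "__main__":
    for n in (1_000, 1_000_000):
        before = worst_case_iterations(n)
        after = worst_case_iterations(3 * n)
        # Columns: N, iterations at N, iterations at 3N, difference, N*logN factor
        print(n, before, after, after - before, round(n_log_n_growth_factor(n), 2))
```

Tripling N adds a couple of iterations to binary search rather than multiplying them, and the N * log N factor stays a little above 3 and creeps towards 3 as N grows.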


enforcing Perl coding standards

Large Perl teams often wish to enforce Perl “coding standards” to control the bewildering style variations permitted in Perl.

Perl::Critic checks code against coding-style “policies”, most of them based on the guidelines in [[ Perl Best Practices ]] (the book lists 256), and outputs warnings. Management could adopt various levels of enforcement:
* run the check as part of an automated build/test/release/deploy process
* track violation statistics for each developer and each team
* scan the codebase periodically
* require every developer to run the check and send the output to a coworker for peer review

These practices are similar to taint checking, “use strict” and -w.

Teams often wish to customize or disable some “policies”. P122 [[ Mastering Perl ]]

The #1 question is “How precisely do Perl::Critic and PPI detect violations?”, that is, without false positives and without false negatives. How intelligent and reliable are they?

competition from the young and foreigners

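You may feel your career has been more turbulent than those of your peers, people in the same trade and former classmates, but one day, when they face a challenge, they may not perform as well as you, because they have never weathered any storms.

Of course, some people who have never been through all that may still cope with ease, and may even judge the situation more accurately than you.

It is also possible that they are simply lucky and never get laid off. Personally, I don't want to count on luck. I can't allow myself to rely on an employer to provide an iron rice bowl.

Taking the high-level view (高屋建瓴), on the whole, when facing challenges from outsiders, “meeting” them has some advantages over “avoiding” them.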

%%value-add as a 5-year batch veteran

Why do employers ask for 5 years' experience in batch development? Here are the most important value-adds of a real veteran, based on my first-hand observation.

(See also %% posts on batch wishlist.)

1) Robustness and resilience. My experience shows that serious batch jobs can fail for a large number of reasons, such as unexpected input or network delays.

2) Flexibility for change. I think batch apps are seen as quick-and-dirty, and flexible. People ask for more changes because they assume the *cost* of change is lower for batch apps than for non-batch apps. Such expectations call for deep experience in batch design.

2A) Extensibility, which is slightly different from “flexibility”. Example: adding parallelism or retry. If the app is not well designed, you often need to throw out the old, tested codebase and restart from scratch.

3) Modularization for a development team. Minimize stepping on each other’s toes.

4) Readability and ease of learning. Batch jobs are often seen as temporary, so documentation and design are lower priorities in batch than in non-batch. Yet many batch applications actually need hand-over and maintenance by a new guy. I think a good system design can ease documentation, learning and knowledge transfer.

* fine-grained control. Consider the monitoring features of JMX and WebLogic
* testability
* performance optimization experience
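
One way to read the extensibility point (2A): if each batch step is a plain callable, retry can be bolted on later without touching the tested step code. Below is a minimal sketch under that assumption. `run_with_retry`, its parameters and the choice of `OSError` as the “retriable” family (it covers `TimeoutError` and `ConnectionError` in Python) are all mine, for illustration only. Unexpected input such as a `ValueError` is deliberately not retried.

```python
import time


def run_with_retry(step, max_attempts=3, base_delay_seconds=1.0,
                   retriable=(OSError,)):
    """Call step() until it succeeds or max_attempts is reached.

    Only exceptions listed in `retriable` trigger another attempt;
    anything else (e.g. bad input) propagates immediately.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except retriable:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Exponential backoff: 1x, 2x, 4x ... the base delay.
            time.sleep(base_delay_seconds * 2 ** (attempt - 1))


if __name__ == "__main__":
    state = {"calls": 0}

    def load_feed():
        # Hypothetical step that fails twice with a simulated network delay.
        state["calls"] += 1
        if state["calls"] < 3:
            raise TimeoutError("simulated network delay")
        return "loaded"

    print(run_with_retry(load_feed, base_delay_seconds=0))
```

The step itself stays unaware of the retry policy, so the same wrapper could later be swapped for a parallel runner without rewriting the step.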


Let’s take the base SQL vocabulary as a starting point:
without joins
without subqueries
without grouping
without aggregates (agg)
without union

Q: which addition is “troublesome” for users?

$ Join is natural to SQL. Even outer join is natural.
$ Union is not as natural but simple to understand.
$ Subquery is an unnatural addition to SQL. Ugly.
$ Correlated subquery is complex.
$ Group-by imposes restrictions on other parts of a select-statement, such as “select expressions must be …”
$ Agg imposes restrictions, such as “other select expressions must be …”
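
To make the comparison concrete, here is a small sketch that asks one question (“who earns more than their department's average?”) two ways: once with a correlated subquery, once with a join against a grouped, aggregated derived table. The `dept` / `emp` tables and their rows are invented for illustration. It runs on SQLite through Python's standard `sqlite3` module, and other dialects may differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE emp  (emp_id INTEGER PRIMARY KEY, name TEXT,
                   salary INTEGER, dept_id INTEGER);
INSERT INTO dept VALUES (1, 'rates'), (2, 'equities');
INSERT INTO emp  VALUES (1, 'ann', 100, 1), (2, 'bob', 80, 1),
                        (3, 'cat', 90, 2);
""")

# Correlated subquery: the inner query refers to the outer row (e.dept_id),
# so conceptually it is re-evaluated for each employee.
correlated = conn.execute("""
    SELECT e.name
    FROM emp e
    WHERE e.salary > (SELECT AVG(salary) FROM emp WHERE dept_id = e.dept_id)
    ORDER BY e.name
""").fetchall()

# Same question with join + group-by + aggregate: compute each department's
# average once, then join it back to the employees.
joined = conn.execute("""
    SELECT e.name
    FROM emp e
    JOIN (SELECT dept_id, AVG(salary) AS avg_salary
          FROM emp
          GROUP BY dept_id) a ON a.dept_id = e.dept_id
    WHERE e.salary > a.avg_salary
    ORDER BY e.name
""").fetchall()

# Plain group-by with aggregates: every select expression is either the
# grouping column or an aggregate.
per_dept = conn.execute("""
    SELECT dept_id, COUNT(*), MAX(salary)
    FROM emp
    GROUP BY dept_id
    ORDER BY dept_id
""").fetchall()

if __name__ == "__main__":
    print(correlated, joined, per_dept)
```

Both formulations return the same rows here. Which one reads as more “natural” is exactly the judgment call in the list above.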

app design in a fast-paced financial firm#few tips

#1 design goal? Flexibility (for change). Decouple. Minimize the source code changes your colleagues must make.

characteristic: small number of elite developers in-house (on Wall Street)
-> learn to defend your design
-> -> learn design patterns
-> automate, since there isn’t enough manpower

characteristic: too many projects to finish but too few developers and too little time
-> fast turnaround

characteristic: reputation is more important here than at other firms
-> unit testing
-> automated testing

characteristic: perhaps quite a large data volume, quite data-intensive
-> perhaps “seed” your design around data and data models

characteristic: widespread use of stored procedures, but many Java designs aren’t designed to work well with stored procedures. Consider Hibernate.
-> learn coping strategies

characteristic: “approved technologies”
characteristic: developers move around
-> maintenance left to other guys
-> documentation is ideally “less necessary” if your design is easy to understand
-> learn documentation tools like Javadoc

forkey ^ join ^ cartesian-product

referential-integrity ^ forkey ^ any_type_of_join ^ cartesian-product — at the heart of the relational paradigm.

* most if not all joins (including self-join, outer join) are Cartesian in nature, and, at least conceptually, produce an intermediate Cartesian table (icart) initially. (No need to explain “initially”)
* forkeys exist primarily (if not always) as join-columns
* relational model relies on forkeys at its heart
* normalization usually (if not always) creates forkeys
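
The points above can be sketched in a few lines: a forkey column used as the join column, an inner join that returns the same rows as a filtered Cartesian product, and referential integrity rejecting an orphan row. The `customer` / `orders` tables and their data are invented for illustration. The sketch uses SQLite through Python's standard `sqlite3` module, where forkey enforcement has to be switched on per connection with `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    cust_id  INTEGER REFERENCES customer(cust_id),  -- the forkey
    amount   INTEGER
);
INSERT INTO customer VALUES (1, 'ann'), (2, 'bob');
INSERT INTO orders   VALUES (10, 1, 50), (11, 1, 70), (12, 2, 20);
""")

# The raw Cartesian product: every customer paired with every order.
cartesian_count = conn.execute(
    "SELECT COUNT(*) FROM customer CROSS JOIN orders"
).fetchone()[0]

# Inner join on the forkey column.
inner_join = conn.execute("""
    SELECT c.name, o.order_id
    FROM customer c JOIN orders o ON o.cust_id = c.cust_id
    ORDER BY o.order_id
""").fetchall()

# The same rows, written as "Cartesian product, then filter".
filtered_product = conn.execute("""
    SELECT c.name, o.order_id
    FROM customer c CROSS JOIN orders o
    WHERE o.cust_id = c.cust_id
    ORDER BY o.order_id
""").fetchall()

# Referential integrity: an order pointing at a missing customer is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (13, 99, 5)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

if __name__ == "__main__":
    print(cartesian_count, inner_join, filtered_product, orphan_rejected)
```

A real query planner normally avoids materializing the full product. The equivalence shown here is about the result, not the execution strategy.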