X years needed2become effective java GTD guy

Update — It would be good to have some measurable yardsticks for GTD competence. Right now the g_GTD_zbs tag includes at least 50 blogs.

XR, (letter sent in July 2010)

Here the goal is not getting the job but keeping the job. You said there’s a difference between a knowledgeable java student and an experienced java developer.

I said I only did java for 3 years.. well maybe 4. I think much of that time I was doing repetitive work, not learning. About half the time (1-2 years) I was learning, by experimenting, reading, discussing, debugging… (I actually learned a lot of java from our long phone conversations.)

I feel if a bright young guy is given a good training project, then in 6 – 24 months he might be able to reach my level of java proficiency.

I also said a lot of young coders could become faster than me with a specific java task after a few months of java learning. Well … an apprentice carpenter could become faster than his master at sawing along a line, but can’t replace the master completely. I feel the hundreds of decisions made each week by an experienced java developer are often based on more experience.

C# is probably same thing – 6 to 24 months to become effective. A very experienced c# friend told me “3 months”. I spent about 6 serious months and another 12 repetitive months on c#…

Venkat, one of the fastest-learning developers I have worked with, said (in Lau Pa Sa) “To get really competent with a new language (like java, c#, python) honestly you need at least one year.” Venkat had the strongest C++ skill I know. I witnessed how he picked up c#. He later struggled a bit with java.

The 64 million dollar question is, what are the really effective learning strategies. I don’t have good answers. Guo Qiao was an advocate of programmer online learning. Some people even suggest taking part in open source projects, to learn from practicing master programmers… I’d love to hear your input.

Thanks.


private gz posts – which blog

Background — I have a few dozen “private and confidential” blog posts about work experiences and plans.

Choice: recrec blog, private view
Choice: 1330152open blog
(Discarded choice: private blog in blogger)

–advantages of recrec:

  • well-designed categories. I often rely on them for my own review
  • the posts in this blog are of higher quality and contain valuable insight
  • I can easily switch on/off the privacy protection, without moving the post

–advantages of the 1330152 blog:

  • the recrec blog was not designed for self review purpose, though it is used that way nowadays

–Conclusion: use quality as the criterion. Keep low-quality posts in 1330152.

JVM = a bytecode interpreter + JIT compiler

I used to think the JVM is a layer on top of the hardware and executes platform-independent bytecode against the hardware. The hardware components include

  • filesystems
  • network ports
  • CPU and memory
  • kernel threads
  • user input devices + screen

Consider assembly code. I guess assembly code deals directly with the same hardware components, with possible exception of threads.

(Not sure where the operating system kernel comes into play. See https://bintanvictor.wordpress.com/2011/09/08/what-is-kernel-space-vs-userland/)

Now I think JVM includes a JIT compiler that converts bytecode into native machine code (the binary form of assembly). See https://bintanvictor.wordpress.com/2016/02/09/javac-jit-2-compilers/

 

##high complexity high mkt-value specializations

Opening example 1: Quartz — High complexity. Zero market value as the deep insight gained is decidedly local and won’t make you a stronger developer on another team.

Opening example 2: IDE, Maven, git, Unix + shell scripting — modest complexity; Makes me stronger developer in real projects, but no premium on the labor market.

My best experiences on Wall St — tech skills with high market value + complexity high enough that few developers could master it. On a project involving these I get better lifestyle, lower stress… Examples:

  • threading
  • java collections
  • SQL complex queries + stored proc. Declining demand in high-end jobs?
  • SQL tuning
  • MOM-based, high volume system implementation — reasonable complexity and market value, but not mainstream. Mostly used in trading only 😦
  • pricing math — high market value but too specialized 😦
  • trading algorithms, price distribution, … Specialized 😦

Let’s look at a few other tech skills:

  • c++ build automation — modest complexity; low value
  • c++ low latency — high value;  barrier too high for me 😦
  • java reflection, serialization — high complexity high practical value, but market value is questionable 😦
  • .NET — some part can be high complexity, but demand is a bit lower than 2011 😦
  • Java tuning — high complexity; not high value practically
  • python — modest complexity, growing market value
  • PHP — lower complexity and lower market value than py, IMHO

in-depth^cursory study@advanced program`topics

I have seen both, and tried both.

AA) A minority of developers spend the time studying and understanding the important details of threading, generics, templates … before writing code.

BB) Majority of developers don’t have the time, vision or patience, and only learn the minimum to get the (sloppy) job done. They copy-paste sample code and rely on testing, but the test cases are limited to realistic UAT test cases and will not cover edge cases that have yet to occur. For concurrency designs, such tests are not good enough.

AA is often seen as unnecessary except:
* Interviewers often go in-depth
* In a product vendor rather than an in-house application team
* If requirement is unusual (eg: B2BTradingEngine), then BB won’t even pass basic developer self-tests. Sample code may not be available — see my post http://wp.me/p74oew-1oI. So developers must bite the bullet and take AA approach. This is my sweet spot. See
https://bintanvictor.wordpress.com/2016/10/31/strength-gtd-with-non-trivial-researchinnovation/ https://bintanvictor.wordpress.com/2016/03/30/theoretical-complexity-strength/
https://bintanvictor.wordpress.com/2016/03/03/tech-specializations-no-such-luxury-in-singapore/

On Wall St projects, I learned to abandon AA and adopt BB.

On Wall St interviews, I still adopt AA.

On West Coast, I think AA is more needed.

## 5common list operations defined concisely

See also https://bintanvictor.wordpress.com/2012/05/06/functional-lang-basic-list-operations-python-stl-linq/

Among the languages (STL, Linq, java stream, javascript, perl), I feel python has a simple and clean set of list operations, but in this post I will use generic names for generic operations.

I believe all these operations treat original list as immutable, and produce a new “copy” as return value. STL has a few algorithms in this category.

Every operation is able to take a lambda as input. Alternatives to lambda — function references, functors, function pointers…. Collectively these are known as FirstClassFunctions. See https://bintanvictor.wordpress.com/2012/01/04/functional-programming-personal-observations-fmd/

Operation Filter — like grep, selects a subset of the original. Each item in the selected subset is unchanged.
** Operation FindAll — is a synonym of Filter.
** Operation Find — returns first item of the subset.

The above Filter-family operations are the simplest operations.

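Operation Reduce — (as in MapReduce) is an _aggregation/accumulation_ operation, starting with an optional initial value. It aggregates a list into a single value.
** Operation Fold — is similar. Naming varies by language: in Scala and Kotlin, for example, fold requires an explicit initial value while reduce starts from the first item.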

Operation Map (as in MapReduce) — creates a different (unlike Filter) object from each list item. Output list is same length as input list, unlike Filter. I also call this “decorator”
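The operations above can be sketched in python as follows. This is only an illustration: the list `prices` and the threshold 4 are made-up values, and the comprehension/`next()` idioms are one of several ways to express Filter and Find.

```python
from functools import reduce

prices = [3, 8, 5, 12]  # hypothetical input list, never modified below

# Filter / FindAll: select a subset, each selected item unchanged
above_4 = [p for p in prices if p > 4]

# Find: first item of the subset, or None when nothing matches
first_above_4 = next((p for p in prices if p > 4), None)

# Map: build a new object from each item, output length == input length
doubled = list(map(lambda p: p * 2, prices))

# Reduce: aggregate the list into one value, here with initial value 0
total = reduce(lambda acc, p: acc + p, prices, 0)

print(above_4, first_above_4, doubled, total)
```

Each call returns a new object and leaves `prices` untouched, which matches the "original list is immutable" observation.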

—–also-ran operations I wouldn’t rank as most common Functional programming operations:

Operation ForEach

python bisect #cod`IV

The bisect module is frequently needed in coding tests esp. codility. In this write-up, I will omit all other function parameters.

* bisect.bisect_right(x)  # less useful … returns an index i such that
all(val <= x for val in a[lo:i]) for the left side and all(val > x for val in a[i:hi]) for the right side.
* bisect.bisect_left(x) # returns an index i such that
all(val < x for val in a[lo:i]) for the left side and all(val >= x for val in a[i:hi]) for the right side.

In other words,

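  • bisect_left(needle) returns the index of the first element equal to or above needle (or the list length if there is none).
  • bisect_right(needle) returns the index of the first element strictly above needle (or the list length if there is none).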

A few scenarios:

  1. If No perfect hit, then same value returned by both functions.
    • Common scenario: if needle is higher than all, then “i” would both be the last index + 1.
    • Common scenario: if the needle is lower than all, then “i” would both be 0
    • in all cases, You can always insert Before this position
  2. If you get a perfect hit on a list values, bisect_left would return that “perfect” index, so bisect_left() is more useful than bisect_right(). I feel this is similar to std::lower_bound
    • This is confusing, but bisect_right() would return a value such that a[i-1] == x, so the returned “i” value is higher. Therefore, bisect_right() would never return the “perfect” index.
  3. If you have a lower-bound input value (like minimum sqf) that might hit, then use bisect_left(). If it returns i, then all list elements qualify from i to end of list
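  4. If you have an upper-bound input value that might hit, bisect_right() is the cleaner choice: if it returns i, then all list values in a[0:i] qualify (they are <= the bound). With bisect_left() returning i, a[0:i] holds only values strictly below the bound, so a perfect hit at a[i] has to be checked separately.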
  5. Note the slicing syntax in python a[lo] to a[i-1] == a[lo:i] where the lower bound “lo” is inclusive but upper bound “i” is exclusive.

import bisect
needle = 2
float_list = [0, 1, 2, 3, 4]
left = bisect.bisect_left(float_list, needle)
print('left (should be lower) =', left)  # 2

right = bisect.bisect_right(float_list, needle)
print('right (should be higher) =', right)  # 3
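The lower-bound and upper-bound scenarios can be checked with a small sketch. The list `sqf` and the helper names `at_least` / `at_most` are made up for illustration; for an inclusive upper bound the sketch uses bisect_right(), because its return value can be used as a slice end directly.

```python
import bisect

sqf = [500, 700, 700, 900, 1200]  # hypothetical sorted list with a duplicate


def at_least(a, lower):
    """All elements >= lower: slice from bisect_left to the end."""
    return a[bisect.bisect_left(a, lower):]


def at_most(a, upper):
    """All elements <= upper: slice from the start to bisect_right."""
    return a[:bisect.bisect_right(a, upper)]


print(at_least(sqf, 700))  # [700, 700, 900, 1200]
print(at_most(sqf, 700))   # [500, 700, 700]

# No perfect hit: both functions return the same insertion point
print(bisect.bisect_left(sqf, 800), bisect.bisect_right(sqf, 800))  # 3 3
```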

Basic objects/services hosted in a java server

For a basic web server, the resources (on disk) and objects (in memory) hosted in the server are mostly static files.

For a php/perl/python powered web server, the objects hosted would be the scripts to print html. There are almost always some resources beneath those scripts.

Simpler example — for an ftp server, the resources managed are the files.

Another simple example — time server. The resource beneath the server is the host OS.

For a database server, the resources managed by the server are the tables. The server performs heavy-duty CRUD operations on the tables. The most trivial operation — a simple select — is comparable to apache serving a static page.

For a CORBA or RMI server, there are actual “remote” objects and corresponding “skeleton” objects hosted in the server’s memory.

How about a regular java server? Resources — disk files, and databases and other servers on the network. More important are the objects hosted in the java server. They all live in the JVM.
* domain entity objects are well-defined, such as
** (Hibernate) entity objects from data sources,
** message objects, and
** objects created from user input
** more generally, objects from external data brought into java via some interface are usually domain entity objects.
* temporary objects — can lead to memory leak if not reclaimed systematically.
* infrastructure objects, such as spring beans and MOM system objects. I think 3rd party java packages often introduce many infrastructure objects.

relative funding advantage paradox by Jeff

(Adapted from Jeff’s lecture notes. [[hull]] P156 example is similar.)
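Primary Market Financing available to borrowers AA and BB:

|                 | AA         | BB            |                        |
|-----------------|------------|---------------|------------------------|
| fixed rate      | 7%         | 7.5%          | <= AA’s real advantage |
| floating rate   | Libor + 1% | Libor + 1.24% |                        |
| needs to borrow | floating   | fixed         |                        |
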
Note BB has lower credit rating and therefore higher fixed/floating interest costs. AA’s real, bigger advantage is “fixed”, BUT prefers floating. This mismatch is the key and presents a golden opportunity.
Paradoxically, regardless of L being 5% or 10% or whatever, AA and BB can both save cost by entering an IRS.
To make things concrete, suppose each needs to borrow $100K for 12M. AA prefers to break it into 4 x 3M loans. We forecast L in the near future around 6 ~ 6.5%.
— The strategy –
BB to pay 6.15% to, and receive Libor from, AA. So in this IRS, AA is floating payer.
Meanwhile, AA to borrow from Market fixed 7% (i.e. $7k interest) <= AA's advantage
Meanwhile, BB to borrow from market L + 1.24% (i.e. Libor interest + $1.24K) <= BB's advantage
——-
To see the rationale, it’s more natural to add up the net INflow —
AA: -L+6.15  -7 = -L-0.85. This is a saving of 15bps
BB:   L -6.15  -L-1.24 = -7.39. This is a saving of 11bps
Net net, AA pays floating (L+0.85%) and BB pays fixed (7.39%) as desired.
Notice in both markets AA enjoys preferential treatment, but the 2 “gaps” are different by 26 bps i.e. 50 (fixed) ~ 24 (floating). AA and BB Combined savings = 26 bps is exactly the difference between the gaps. This 26 bps combined saving is now shared between AA and BB.
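The arithmetic above can be re-checked with a short python sketch. The function name `swap_savings` and its parameter names are made up for illustration; all rates are in percent, and Libor cancels out of every saving, which is why the result holds "regardless of L".

```python
def swap_savings(aa_fixed, bb_fixed, aa_spread, bb_spread, swap_rate):
    """AA borrows fixed, BB borrows floating; BB pays swap_rate to AA
    and receives Libor from AA. Spreads are over Libor, in percent."""
    aa_net_spread = aa_fixed - swap_rate   # AA ends up paying Libor + this
    bb_net_fixed = swap_rate + bb_spread   # BB ends up paying this fixed rate
    aa_saving_bps = (aa_spread - aa_net_spread) * 100
    bb_saving_bps = (bb_fixed - bb_net_fixed) * 100
    return aa_net_spread, bb_net_fixed, aa_saving_bps, bb_saving_bps


aa_net, bb_net, aa_bps, bb_bps = swap_savings(7.0, 7.5, 1.0, 1.24, 6.15)
print(f"AA pays Libor + {aa_net:.2f}%, saving {aa_bps:.0f} bps")
print(f"BB pays fixed {bb_net:.2f}%, saving {bb_bps:.0f} bps")

# Combined saving equals the difference between the two gaps
gap_diff_bps = ((7.5 - 7.0) - (1.24 - 1.0)) * 100
print(f"combined {aa_bps + bb_bps:.0f} bps vs gap difference {gap_diff_bps:.0f} bps")
```

Changing `swap_rate` only moves savings between AA and BB; the combined 26 bps stays the same.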
———————————————————————————————-
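Fake [1] Modified example

|                 | AA         | BB            |                        |
|-----------------|------------|---------------|------------------------|
| fixed rate      | 7%         | 7.5%          |                        |
| floating rate   | Libor + 1% | Libor + 1.74% | <= AA’s real advantage |
| needs to borrow | fixed      | floating      |                        |
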
— The strategy –
AA to pay 5.85% to and receive Libor from BB.
Meanwhile, BB  to borrow fixed 7.5% 
Meanwhile, AA to borrow L + 1% <= AA's advantage
Net inflow:
AA:  L -5.85 -L-1 = -6.85, saving 15 bps
BB: -L+5.85-7.5 = -L-1.65, saving 9 bps

[1] [[Hull]] P156 points out that the credit spread (AA – BB) reflects more in the fixed rate than the floating rate, so usually, AA’s advantage is in fixed. Therefore this modified example is fake.

———————————————————————————————-
The pattern? Re-frame the funding challenge — “2 companies must have different funding needs and gang up to borrow $100K fixed and $100k floating total, but only one half of it using AA’s preferential rates. The other half must use BB’s inferior rates.”
In the 2nd example, since AA’s advantage lies _more_ in floating market, AA’s floating rate is utilized. BB’s smaller disadvantage in fixed is accepted.
It matters less who prefers fixed since it’s “internal” between AA and BB like 2 sisters. In this case, since AA prefers something (fixed) other than its real advantage (float), AA swaps them “in the family”. If AA were to prefer floating i.e. matching her real advantage, then no swap needed.
Q: Why does AA need BB?
A: only if AA needs something other than its real advantage. Without BB, AA must borrow at its lower advantage (in “fixed” rate market), wasting its real advantage in floating market.

python ‘global myVar’ needed where@@

Suppose you have a global variable var1 and you need to “use” it in a function f1()

Note Global basically means module-level. There’s nothing more global than that in python.

Rule 1a: to read any global variable in any function you don’t need “global”. I think the LGB rule applies.

Rule 1b: to call a mutator method on a global object, you don’t need “global”. Such an object can be a dict or a list or your custom object. Strings and integers have no mutators!

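Rule 2: to rebind the variable to another object in memory (i.e. pointer reseat) inside a function, you need the “global” declaration. Without it, the assignment silently creates a new local variable, and reading the name earlier in the same function raises UnboundLocalError at run time (not a compilation error). This situation is rare in my projects.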
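A minimal sketch of the three rules. The module-level names `counter` and `registry` and the function names are made up for illustration.

```python
counter = 0      # module-level ("global") integer
registry = []    # module-level mutable object


def read_only():
    return counter            # Rule 1a: reading needs no "global"


def mutate():
    registry.append("x")      # Rule 1b: calling a mutator needs no "global"


def rebind():
    global counter            # Rule 2: rebinding needs the declaration
    counter += 1


def rebind_without_global():
    try:
        counter += 1          # makes "counter" local to this function
    except UnboundLocalError as err:
        return str(err)       # raised at run time, when the line executes


mutate()
rebind()
print(read_only(), registry)
print(rebind_without_global())
```

The last call shows that omitting the declaration is not caught when the module is loaded, only when the offending line runs.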

%%tech leadership: a few pathways

Many friends feel I could consider how to move up. As an experienced (I didn’t say “strong”) hands-on developer, I saw a chance to move to lead developer, architect or project manager roles when moving to Asia. It didn’t happen, for several reasons.

Now I feel by default I will remain a senior developer at VP or AVP level, and be surpassed by younger competitors. Already I feel many 30-somethings are stronger than I am at this age, technically.

With aging and family commitment, this default pathway is weighing more heavily.

Some people (like Ken Li) believe we will be fine. My parents also feel I worry too much. However, the world is not so kind and caring. Commenting on business (not individual) competition, Andy Grove said only the paranoid survive. The default pathway might get narrower and we might find it harder to find jobs. (In Singapore I already see the telltale signs.) I feel I need to think harder how to move up. I think there are some visible pathways. See also https://bintanvictor.wordpress.com/2015/12/27/app-architect-civil-engineer-or-a-salesman-sgus/

If I have good enough know-how about certain frameworks and base products (such as spring, or MOM, or DB with embedded business logic) I could lead a team in building a few types of solution:
* database-centric web apps
* batch processing of data, either in java or scripts.

There are many other types I’m less familiar with but feel confident I can pick up and lead.
* mobile front-end + server side
* javascript front-end + server side
* nosql to replace the RDBMS
* MOM-based

##most used(GTD+)Generic know-how4Wall St developers

#1 java — the ecosystem
#2 c/c++ — not yet my chosen direction
#3 dotnet including c#, WCF, remoting, windows debugging …
#4 py — purely automation in most places(easy instrumentation); advanced py systems in a few places

# unix power user tools – including scripting, instrumentation… More widespread but depth is seldom required compared to SQL and python. More well-defined as a tech skill than windows dev.

# SQL/stored proc — losing pole position in low latency and big data environments, still dominant in traditional apps

# Excel add-in and automation

# javascript

## java 7 features – my picks

(Java 8 and java 5 are major changes compared to java 7)

#1) [advanced] fork and join
#2) [advanced] File change notifications
#3) new filesystem io API
#4) try-with-resources:
try (FileOutputStream fos = new FileOutputStream("movies.txt"); DataOutputStream dos = new DataOutputStream(fos)) {
    dos.writeUTF("Java 7 Block Buster");
} catch (IOException e) {
    // log the exception
}

# multi-catch: catch (ExceptionOne | ExceptionTwo | ExceptionThree e) {
# diamond operator, e.g. List<String> my = new ArrayList<>();

how likely are two threads sharing a CHM were to clash@@

I would also like to point out the java ConcurrentHashMap lets two threads (possibly a writer thread and a reader thread) access the shared data structure concurrently, probably without locking or CAS, when the two threads happen to access two distinct segments.

Note there can be a large number of segments, up to a few hundred at least, so the chance of two threads hitting distinct segments is very high (about 99.7%, i.e. 332/333, for a 333-segment map, assuming keys spread evenly across segments). Therefore, contrary to what Jun said, for reader/writer threads to access the same data structure, we don’t always need locking in every read and write access.

concurrencyLevel – the estimated number of concurrently updating threads. The implementation may use this value as a sizing hint.
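A back-of-envelope check of the clash probability, written in python. The function name `prob_all_distinct` is made up, and the calculation assumes each thread lands on a segment uniformly and independently, which real key distributions only approximate.

```python
def prob_all_distinct(segments, threads=2):
    """Probability that every thread lands on a different segment,
    assuming uniform and independent segment selection."""
    p = 1.0
    for k in range(threads):
        p *= (segments - k) / segments
    return p


print(f"{prob_all_distinct(333):.4f}")     # 0.9970 for two threads
print(f"{prob_all_distinct(16):.4f}")      # 0.9375 with 16 segments
print(f"{prob_all_distinct(333, 8):.4f}")  # more threads, more clashes
```

With two threads the clash probability is simply 1/segments, so it takes a very large segment count to approach 99.999%.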

nice builder pattern]%%apps#quant

For a given class, The more complex its construction, the more you need builders. Builders is probably the solution when you have the most complicated object construction logic. If builder can’t handle it then nothing else can.

Example 1: My quant object ForwardCurve construction takes a lot of inputs including yield curve, repo curve, dividend collection, dividend blending … Each of these inputs is a non-trivial quant object, so it’s sometimes cumbersome to pass all of them into a forwardCurve constructor. As an alternative, we use a builder class to _gather_ the inputs and then call myFwdBuilder.constructCurve().

Optional inputs with default values — (supported in c++ but) not supported in java constructors. Therefore, with 3 optional inputs you may need 8 constructors! With a nested builder class, we need just one, private, simple constructor.

Builder can even use a recursive algorithm.

Prepare, prepare, prepare, build — the last build() is quick and exception-safe. The complexity and error handling is all extracted into the builder class.

With this builder, we have the flexibility to call myFwdBuilder.getCurve() any number of times.

We can also query the builder to get attributes that we can’t put into the final product i.e. the constructed ForwardCurve object. For example, the raw dividend input data should not be part of the fwd object. They are digested, then absorbed into the fwd curve in “digested” form.

Note one builder instance holds one curve + all the building materials for that one curve.

With a builder, all the fields can be public final.
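The ideas above can be sketched in python (the real system is java). Everything here is a toy: the fields `spot`, `repo_rate` and `total_dividend`, and the methods `with_spot`, `with_repo_rate`, `add_dividend`, `raw_dividends` and `build`, are hypothetical names, and summing the dividends stands in for the real "digesting" logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after construction
class ForwardCurve:
    spot: float
    repo_rate: float
    total_dividend: float  # dividends in "digested" form only


class ForwardCurveBuilder:
    def __init__(self):
        self._spot = None         # required input
        self._repo_rate = 0.0     # optional input with a default value
        self._raw_dividends = []  # building material, kept out of the product

    def with_spot(self, spot):
        self._spot = spot
        return self

    def with_repo_rate(self, rate):
        self._repo_rate = rate
        return self

    def add_dividend(self, amount):
        if amount < 0:            # error handling lives in the builder
            raise ValueError("dividend must be non-negative")
        self._raw_dividends.append(amount)
        return self

    def raw_dividends(self):
        """Query the builder for material that is not part of the product."""
        return tuple(self._raw_dividends)

    def build(self):
        if self._spot is None:
            raise ValueError("spot is required")
        return ForwardCurve(self._spot, self._repo_rate, sum(self._raw_dividends))


builder = ForwardCurveBuilder().with_spot(100.0).add_dividend(1.5).add_dividend(2.0)
curve = builder.build()
print(curve)
```

The frozen dataclass plays the role of "public final" fields, and build() can be called any number of times on the same builder.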

Example 2: The Libor yield curve construction takes at least 3 series of rates (ED deposits, ED futures, swaps). So we use builders to manage the complex construction.

Example 3: The volatility object construction is even more complex.

java local var( !! fields)need explicit initialization

http://stackoverflow.com/questions/268814/uninitialized-variables-and-members-in-java briefly mentions the reason.

Instance variables (i.e. fields) of object type are initialized to null by default. Local variables of object type are not initialized by default, and it’s a compile-time error to read a local variable before it is definitely assigned.

For primitives, the story is similar: fields default to 0 (or false for boolean), while locals must be assigned before use.

West Coast^Wall St techies #per HenryWu

Henry Wu felt

* west coast techies have clearly higher calibre.

* Wall St is more busy. I guess he means higher workload, faster pace, more quick-n-dirty and lower quality.

* Very few older guys on the west coast. There’s probably implicit age discrimination that’s hard to prove.

* Perm roles pay higher on the West coast. However, his sample may be different from David Leak’s

send()recv() ^ write()read() @sockets

Q: A socket is also a file descriptor, so why bother with send()recv() when you can use write()read()?

A: See https://stackoverflow.com/questions/9475442/unix-domain-socket-vs-named-pipes

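send()recv() are recommended, more widely used and better documented. They also accept a flags argument (e.g. MSG_PEEK, MSG_DONTWAIT) that write()read() lack; with flags set to 0, send() behaves like write() and recv() like read().

[[linux kernel]] P623 actually uses read()write() for udp, instead of sendto()recvfrom(), but only after a call to connect() to set the remote address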
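A small python sketch of the equivalence, assuming a Unix-like host where a socket's file descriptor can be passed to os.write() and os.read(). It uses a local socketpair() rather than a real network connection, and the payload strings are made up.

```python
import os
import socket

a, b = socket.socketpair()  # two connected sockets in one process
try:
    # write() on the descriptor, recv() on the peer
    os.write(a.fileno(), b"via write")
    first = b.recv(1024)

    # send() on the socket, read() on the peer's descriptor
    a.send(b"via send")
    second = os.read(b.fileno(), 1024)

    # Flags are what read() cannot express: MSG_PEEK leaves data queued
    a.send(b"peek me")
    peeked = b.recv(1024, socket.MSG_PEEK)
    again = b.recv(1024)
finally:
    a.close()
    b.close()

print(first, second, peeked, again)
```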

STL containers: pick correct SORT for each

Needed in coding IV

  • std::sort() is good for array, vector, deque…
  • std::list has no random access iterator so it must use list::sort() method
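  • a single string? std::sort(s.begin(), s.end()) works directly on a std::string, since it provides random-access iterators. No need to construct a vector<char>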
  • unordered containers don’t need sorting

map (and set) remain sorted so don’t need a sort operation. They can specify a special sort comparator either as a template arg or a ctor arg. P316 and P334 [[c++ std lib]] compare the two choices. You can also pass no comparator arg but define operator< in the key class

quicksort learning notes #no src

Quick sort is not the most efficient in many situations. It’s not the implementation of sort() in many popular languages. Yet, it’s the most valuable sort to learn, primarily because of interview. I think quicksort is a good test of coding abilities. The challenges:

#1 high-level goal of first pass — hard to remember!
#2 the gist of the first pass
#3 implementation in any language — many pitfalls

http://baike.baidu.com/link?url=rl5hQIbAqdmt53Pmxp7FXhrksHQxjOIb8Knd7dI4xQ0lRSjLNdSbcVj_Dcav-V17dKBe1ggrOWXsDinNPINd22RnFZ2cQ5MDkWdGM8XZwc1TTqtMNfc8kM-0LuT6iCrDR-RigbcKPpYQwK_gDWWOQ_ has real code.

High-level keywords:

  • pivot Object — the rightmost object in a section can be designated the pivot object. Some people use the 1st object or a random object within the section. To keep things simple, we will assume the values are unique
  • final resting place — goal is to put the pivot object into its final resting place within the section, using scan-and-swap
  • swaps — are the mechanism of movements
  • scan — you must scan in some direction
  • partition — after the first pass, the pivot object is in its final resting place, with all smaller objects on its left and all larger objects on its right.

First we need to understand the high-level goal. Then the next challenge is the partition algorithm. The only way to remember the implementation details is writing the code.

On P 146 [[introduction to algorithms]], the procedure partition(A,p,r) does the following on the Section A[p,r] inclusive.

  • it moves the rightmost (pivot) object from r to the grave/anchor position within the section. In this version the pivot stays at r during the scan and reaches its grave in one final swap.
  • it keeps the rightmost object value as the benchmark value throughout the procedure.
  • it returns the new index of this object. The index is defined in the entire array A[].
  • it shuffles 0 or more elements within the section
  • it doesn’t try to sort any subsection

Upon receiving an unsorted section, the procedure simply puts the rightmost thingy into the grave position within the section.

Corollary: first scene in the quicksort movie actually completes the job of putting the rightmost object into its final resting place as an anchor within the entire array. After that, we focus on sorting the “left-section” and the “right-section” (in separate threads) without worrying about the first anchor object. Within the left-section, first scene completes the job of putting the rightmost object into its grave, a final resting place within the entire array.

Coding note — the recursion is not using a single function name like A calling A itself. Instead, qsort() calls partition() then qsort(). Most of the work is in partition().

Coding note — partition() function isn’t recursive in itself.
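A minimal python sketch of the structure described above, using the rightmost-pivot partition scheme (often called Lomuto partition). The names qsort() and partition() follow the notes; the default arguments and the sample array are my own choices, and the two sub-sections are sorted sequentially rather than in separate threads.

```python
def partition(a, p, r):
    """Put the pivot a[r] into its final resting place within a[p..r]
    (inclusive) and return its index in the whole array."""
    pivot = a[r]               # benchmark value for the whole pass
    i = p - 1                  # right edge of the "smaller or equal" zone
    for j in range(p, r):      # scan left to right, excluding the pivot
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]  # final swap moves the pivot to its grave
    return i + 1


def qsort(a, p=0, r=None):
    """Sort a[p..r] in place; qsort calls partition, then qsort twice."""
    if r is None:
        r = len(a) - 1
    if p < r:
        q = partition(a, p, r)
        qsort(a, p, q - 1)     # left section, pivot excluded
        qsort(a, q + 1, r)     # right section, pivot excluded


data = [2, 8, 7, 1, 3, 5, 6, 4]
first_pass = list(data)
print(partition(first_pass, 0, len(first_pass) - 1), first_pass)
qsort(data)
print(data)
```

After the first pass the pivot 4 sits at index 3 and is never touched again, which is the "final resting place" idea.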