##satisfying occupation { I have (partial) financial freedom

(relevant to “post-65”…)

As I told German and Yihai, my passive income may soon reach half of my burn rate — some level of financial freedom. So how do I want to spend the rest of my career, on which more satisfying endeavors?

  • data science and machine learning? I feel market depth is poor, and I’m not confident I can outshine the competition
  • more in-depth c++, like ICE projects
  • Wenqiang’s innovative xml tool? I thought it had “value” and commercial potential, but I was wrong
  • dotnet? I used to feel this is up and coming, with strong potential
  • swing? I used to feel it needed help, but no — I feel it’s losing ground hopelessly.
  • See also after 65..contribute to stackOverFlow+OSS testing

Elements of a satisfying job after I achieve partial financial freedom:

  1. respect from team; decent benchmarking within the team.
    1. free of the heaviest stressors, like performance-improvement pain
  2. personal strength, so I can excel and earn the respect. Eg: Barc, 95Greene, Zed, Catcha, Chartered
  3. sustainable social value and social impact. Very hard. May have to Let Go
  4. commute
  5. freedom to blog and experiment with coding in the office
  6. –LG2
  7. workload? I don’t mind if I really like the work, but there will always be some drudgery workload in any job.
  8. salary?
  9. market depth?

MOM^shared memory ring buffer^UDP : mkt data transmission

I feel in most environments the MOM design is the most robust, because it relies on reliable middleware. However, latency-sensitive trading systems won’t tolerate the additional latency and see it as unnecessary.

Gregory told me about his home-grown simple ring buffer in shared memory. He used a circular byte array; message boundaries are embedded in the payload. When the producer finishes writing to the buffer, it puts down a marker to indicate end-of-data. Greg said the consumer is slower, so he makes it a (periodic) polling reader. When the consumer encounters the marker, it stops reading. I told Gregory we need some synchronization; Greg said it’s trivial. Here’s my idea —

Design 1 — every time the producer or the consumer starts it would acquire a lock. Coarse-grained locking

But while the consumer is chipping away at the head of the queue, the producer can simultaneously write to the tail, so here’s

Design 2 — the latest message being written is “invisible” to the consumer. The producer keeps the marker unchanged while adding data to the tail of the queue. When it has nothing more to write, it publishes by updating the marker.

The marker can be a lock-protected integer representing the index of the last byte written.

For simplicity, this design ignores buffer capacity limits and the very-slow-consumer case (producer wrap-around overwriting unread data).
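Design 2 can be sketched as a single-producer/single-consumer ring buffer where an atomic commit index plays the role of the marker. This is my own illustrative sketch — the class name SpscRing, the capacity, and the '\0' boundary convention are my inventions, not Greg’s actual code:

```cpp
// Sketch of Design 2: the producer stages bytes, then "publishes" them by
// moving one atomic marker; in-progress bytes stay invisible to the consumer.
#include <atomic>
#include <cassert>
#include <cstddef>
#include <string>

class SpscRing {
    static const size_t CAP = 1024;        // illustrative capacity
    char buf_[CAP];
    std::atomic<size_t> commit_{0};        // the "marker": end of committed data (exclusive)
    std::atomic<size_t> readPos_{0};       // consumer's progress, visible to producer
    size_t writePos_ = 0;                  // producer-private staging position
public:
    // Producer: stage the message plus a '\0' boundary, then publish once.
    bool write(const std::string& msg) {
        size_t need = msg.size() + 1;      // +1 for the '\0' message boundary
        // slow-consumer guard: unread bytes + new bytes must fit
        if (writePos_ + need - readPos_.load(std::memory_order_acquire) > CAP)
            return false;
        for (char c : msg) buf_[writePos_++ % CAP] = c;
        buf_[writePos_++ % CAP] = '\0';    // boundary embedded in the payload
        commit_.store(writePos_, std::memory_order_release);  // publish
        return true;
    }
    // Consumer: poll; read only up to the marker, stop at the boundary.
    bool poll(std::string& out) {
        size_t rp    = readPos_.load(std::memory_order_relaxed);
        size_t limit = commit_.load(std::memory_order_acquire);
        if (rp == limit) return false;     // nothing committed yet
        out.clear();
        bool complete = false;
        while (rp < limit) {
            char c = buf_[rp++ % CAP];
            if (c == '\0') { complete = true; break; }
            out += c;
        }
        readPos_.store(rp, std::memory_order_release);
        return complete;
    }
};
```

The release-store on commit_ paired with the acquire-load in poll() replaces the coarse lock of Design 1: bytes staged past the marker are never seen by the polling consumer.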

| | MOM | UDP multicast or TCP | shared_mem |
|---|---|---|---|
| how many processes | 3-tier | 2-tier | 2-tier |
| 1-to-many distribution | easy | easiest | doable |
| intermediate storage | yes | tiny — the socket buffer, typically 64MB | yes |
| producer data burst | supported | message loss is common in such a situation | supported |
| async? | yes | no | yes |
| additional latency | yes | zero | minimal |

As CTO, I’d favor transparent langs, wary of outside libraries

If I were to propose a system rewrite, or start a new system from scratch without constraints like legacy code, then Transparency (+instrumentation) is my #1 priority.

  • c++ is the most opaque. Just look at complex declarations, or linker rules, the ODR…
  • I feel more confident debugging java. The JVM is remarkably well-behaved (better than the CLR), consistent, and well discussed on-line
  • key example — the SOAP stub/skeleton hurts transparency, and so do AOP proxies. These semi-transparent proxies are not like regular code you can edit and play with in a sandbox
  • windows is murkier than linux
  • There are many open-source libraries for java, c++, py etc, but many of them reduce transparency. I think someone like Piroz might say Spring is a transparent library
  • SQL is transparent except performance tuning
  • Based on my 1990’s experience, I feel javascript is transparent but I could be wrong.
  • I feel py, perl are still more transparent than most compiled languages. They too can become less transparent, when the codebase grows. (In contrast, even small c++ systems can be opaque.)

This letter is about which language, but allow me a brief digression. For data stores and messaging formats (both require serialization), I prefer the slightly verbose but hugely transparent solutions, like FIX, CTF and json (xml is too verbose), over protobuf. Remember 99% of sites use only strings, numbers and datetimes, and very rarely involve audio/visual data.

[13]arrayName^ptrVar – spreadsheet

See other posts why an array name is a permanent name plate on a permanently allocated room in memory. Once you understand that, you know
– can’t move this name plate to another room
– can’t put a 2nd name plate on the same room
– can’t put this array name on the LHS of assignment

I feel overall, in most everyday contexts, you can treat the array name in this example as if it were a variable of type int*. However, the differences lurk in the dark like snakes, and once bitten, you realize there are many. The two constructs are fundamentally different.


| | ptr to heap array | ptr-var | array-name (array not on heap) |
|---|---|---|---|
| declaration | array-new | int* p; //allocates a 32-bit pointer | int arr[3]; //allocates 3 ints |
| declared as a struct/class field | same as ptr-var | 4 bytes onsite | entire array embedded onsite |
| declared as func param var | no such thing | common | converted to a ptr-var by compiler |
| initialize from a literal | probably not supported | char* p="abc"; // the literal string stays in RO memory | char arr[]="xyz"; // copies the literal from RO memory to stack/heap where the array is allocated; the copy is editable |
| object passed as func arg | same as ptr-var | common | converted to &arr[0] |
| dereference | same as ptr-var | common | gives the first element’s value. P119 [[primer]] |
| address-of | same as ptr-var | double ptr | &arr == arr numerically. See the headfirstC post |
| rebind/reseat | same as ptr-var | common | syntax error |
| add alias to the pointee | same as ptr-var | common | unsupported |
| as LHS (not as func param) | same as ptr-var | common (reseat) | syntax error |

accumulation: contractor vs FTE #XR


You said that we contractors don’t accumulate (积累) as FTEs do.

I do agree that after the initial 2Y of tough learning, some FTEs reap the monetary rewards, whereas consultants are often obliged, by contract, to leave the team. (There are long-term contracts, but they don’t always work out as promised.)

Here’s my experience at GS over 2.5Y. My later months had much lower “bandwidth” tension, i.e. they required less learning and figuring-things-out. Less stress, less negative feedback, less worry about my own competence, more confidence, more in-control because I was more familiar with the local system. If my compensation had risen to 150k, I would say the money amounts to “reaping the reward”. In reality, the monetary accumulation was an empty promise.

As a developer stays longer, the accumulation in terms of her value-add to the team is natural and likely [1]. Managers like to point out that after an FTE stays in the team for 2Y, her competence, her design, her solutions, her suggestions — her value-add — grow every year. If her initial value-add to the company can be quantified as $100k, every year it might grow by 30%. Alas, that doesn’t always translate into compensation.

That’s accumulation in personal income. How about accumulation in tech skill? Staying in one system usually means less exposure to other, or newer, technologies. Some developers prefer to be shielded from newer technologies. I embrace them. I feel my technical accumulation is higher when I keep moving from company to company.

[1] There are exceptions. About 5% of the old timers are, in my view, organizational /dead-weights/. Their value-add doesn’t grow and is routinely surpassed within a year by bright new joiners. Often the company can’t let them go for political, legal or ethical reasons.

You said IV questions change so much over time (ignoring the superficial changes) that the IV skills we acquire today are useless in 5Y, and we would have to learn new IV skills all over again. This is not intuitive to me. Please give one typical example if you can without a lot of explaining (I understand your time constraints). I guess you mean technology churn? If I prepare for a noSQL or big-data interview, then I will probably face technology churn.

On the other hand, in my experience, many interview topics remain evergreen, including some hard topics — algorithms (classic algos and creative algos), classic data structures, concurrency, java OO, pass-by reference/value, SQL, unix commands, TCP/UDP sockets, garbage collection, asynchronous/synchronous concepts, pub/sub, producer/consumer, thread pool concepts… In the same vein, most coding tests are similar to those of 10 years ago when I first received them. So the study of these topics does accumulate to some extent.

edit1file]big python^c++ prod system #XR

Q1: suppose you work in a big, complex system with 1000 source files, all in python, and you know a change to a single file will only affect one module, not a core module. You have tested it and run a 60-minute automated unit-test suite. You didn’t run the prolonged integration test that’s part of the department-level full release. Would you, and the approving managers, have the confidence to release this single python file?
A: yes

Q2: change “python” to c++ (or java or c#). You already followed the routine to build your change into a dynamic library, tested it thoroughly and ran unit test suite but not full integration test. Do you feel safe to release this library?
A: no.

Assumption: the automated tests were reasonably well written. I never worked in a team with measured test coverage; I would guess 50% is too high and often impractical. Even with high measured test coverage, the risk of bugs is roughly the same. I don’t believe higher unit-test coverage is a vaccine — diminishing returns, low marginal benefit.

Why the difference between Q1 and Q2?

One reason — the source file is compiled into a library (or a jar), along with many other source files. This library is now a big component of the system, rather than one of 1000 python files. The managers will see a library change in c++ (or java) vs a single-file change in python.

Q3: what if the change is to a single shell script, used for start/stop the system?
A: yes. Manager can see the impact is small and isolated. The unit of release is clearly a single file, not a library.

Q4: what if the change is to a stored proc? You have tested it and run the full unit-test suite but not a full integration test. Will you release this single stored proc?
A: yes. One reason is transparency of the change. Managers can understand this is an isolated change, rather than a library change as in the c++ case.

How do managers (and anyone except yourself) actually visualize the amount of code change?

  • With python, it’s a single file so they can use “diff”.
  • With stored proc, it’s a single proc. In the source control, they can diff this single proc
  • with c++ or java, the unit of release is a library. What if in this new build, besides your change, there’s some other change included by accident? You can’t diff a binary 😦

So I feel transparency is the first reason. Transparency of the change gives everyone (not just yourself) confidence about the size/scope of this change.

Second reason is isolation. I feel a compiled language (esp. c++) is more “fragile” and the binary modules more “coupled” and inter-dependent. When you change one source file and release it in a new library build, it could lead to subtle, intermittent concurrency issues or memory leaks in another module, outside your library. Even if you, as the author, see evidence that this won’t happen, other people have seen innocent one-line changes give rise to bugs, so they have reason to worry.

  • All 1000 files (in compiled form) run in one process in a c++ or java system.
  • A stored proc change could affect DB performance, but it’s easy to verify. A stored proc won’t introduce subtle problems in an unrelated module.
  • A top-level python script runs in its own process. A python module runs in the host process of the top-level script, but a typical top-level script will include just a few custom modules, not 1000 modules. Much better isolation at run time.

There might be python systems where the main script actually runs in a process with hundreds of custom modules (not counting the standard library modules). I have not seen it.

effi^instrumentation ] new project

I always prioritize instrumentation over effi/productivity/GTD.

A peer could be faster than me in the beginning but if she lacks instrumentation skill with the local code base there will be more and more tasks that she can’t solve without luck.

In reality, many tasks can be done with superficial “insight”, without instrumentation, with old-timer’s help, or with lucky search in the log.

What if the developer had not added that logging? You are dependent on that developer.

I could be slow in the beginning, but once I build up (over x months) real instrumentation insight, I will be more powerful than my peers, including some old timers. I think the Stirt-tech London team guru (John) was such a guy.

In reality, even though I prioritize instrumentation it’s rare to make visible progress building instrumentation insight.

##transparent^semi-transparent^opaque languages

With a transparent language, I am very likely (high correlation) to have higher GTD/productivity/KPI.

Stress — correlates with opacity. Would I take a job in php, SQL, javascript, perl?

Confidence to take on a gig — The more transparent, the higher my confidence — like py, javascript

Bootstrap — with a transparent language, I’m confident to download an open source project and hack it (Moodle …). With an opaque language like C++, I can download, make and run it fine, but to make changes I often face the opacity challenge. Other developers are often more competent at this juncture.

Learning — The opaque parts of a language require longer, tougher learning, sometimes for low market value or low market depth.

Competitiveness — I usually score higher percentiles in IV, and lower percentiles in GTD. The “percentile spread” is wider and worse with opaque languages. Therefore, I more often feel like an impostor faking competence (滥竽充数).

In this context, transparency is defined as the extent to which you can use __instrumentation__ (like debugger or print) to understand what’s going on.

  • The larger systems tend to use the advanced language features, which are less transparent.
  • The more low-level, the less transparent.

–Most of the items below are “languages” capable of expressing some complexity:

  • [T] SQL, perl, php, javascript, ajax, bash
  • [T] stored proc unless complex ones, which are uncommon
  • [T] java threading is transparent to me, though not to other developers
  • [S] java reflection-based big systems
  • [T] regular c++, c# and java apps
  • [O]… but consider java dynamic proxy, which is used in virtually every non-trivial package
  • [T] most python scripts
  • [S] … but look at python import and other hacks. Import is required in large python systems.
  • [O] quartz
  • [S] Spring’s underlying code base. I initially thought it was transparent; perhaps it is transparent to Piroz
  • [O] Swing visual components
  • [O] C# WCF, remoting
  • [T] VBA in Excel
  • —-below are not “languages” even in the generalized sense
  • [S] git .. semi-transparency became a stressor because it’s a KPI!
  • [O] java GC … but NO STRESS because so far this is not a KPI
  • [O] MOM low level interface
  • [S] memory leak detectors in java, c#, c++
  • [O] protobuf. I think the java binding uses proxies
  • [T] XML + simple SAX/DOM
  • [T =transparent; S =semi-transparent; O =opaque]


zero sum game #my take

“Zero sum game” is a vague term. One of my financial math professors said every market is a zero sum game. After the class I brought up to him that over the long term, the stock (as well as gov bond) market grows in value [1], so the aggregate “sum” is positive. If AA sells her 50 shares to BB, who later sells them back to AA, they can both become richer. With a gov bond, if you buy it at par, collect some coupons, and sell it at par, then everyone makes money. My professor agreed, but he said his context was the very short term.

Options (if expired) and futures look more like ZSG to me, over any horizon.

If an option is exercised then I’m not sure, since the underlying asset, bought (unwillingly), could appreciate the next day, so the happy seller and the unwilling buyer could both grow richer. Looks like a non-zero-sum game.

The best example of a ZSG is a football bet among friends, with a bookie; the best example of an NZSG is the property market. Of course we must do the “sum” in a stable currency and ignore inflation.

[1] including dividends but excluding IPO and delisting.

threading, boost – IV questions

Read-guard, write-guard ?

Q: how to test if current thread already holds the lock?
AA: The boost::this_thread namespace-scope function get_id() returns an instance of the boost::thread::id class representing the currently executing thread.
AA: pthread_self() is a C function (a free function, not a method) that returns the thread id of the current thread.
AA: see the pthreads recursive mutex note below.
A: using this thread::id object a lock can “remember” which thread is holding it.
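A sketch of such an owner-remembering lock, using the std equivalents (std::this_thread::get_id() mirrors boost::this_thread::get_id()). OwnerAwareLock and isHoldingLock are hypothetical names of my own, not boost APIs:

```cpp
// A lock that "remembers" which thread holds it, so the current thread
// can ask "do I already hold this lock?".
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

class OwnerAwareLock {
    std::mutex mx_;
    std::atomic<std::thread::id> owner_{std::thread::id()};  // id() = "no thread"
public:
    void lock() {
        mx_.lock();
        owner_.store(std::this_thread::get_id());  // remember the owner
    }
    void unlock() {
        owner_.store(std::thread::id());           // reset to "no thread"
        mx_.unlock();
    }
    bool isHoldingLock() const {                   // is the *current* thread the owner?
        return owner_.load() == std::this_thread::get_id();
    }
};
```

With boost, boost::thread::id plays the same role; with pthreads, you would store the result of pthread_self() and compare with pthread_equal().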

Q: Is scoped_lock reentrant by default? Can you make it reentrant?
AA: With a boost recursive mutex, a single thread may lock the same mutex several times and must unlock the mutex the **same-number-of-times**.

AA: pthreads supports it. http://www.opengroup.org/onlinepubs/009695399/functions/pthread_mutex_lock.html says — If the mutex type is PTHREAD_MUTEX_RECURSIVE and the mutex is currently owned by the calling thread, the mutex lock count shall be incremented by one and the pthread_mutex_trylock() function shall immediately return success.
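The same recursive behavior exists in the standard library; a minimal sketch with std::recursive_mutex, which behaves like boost::recursive_mutex and PTHREAD_MUTEX_RECURSIVE (the owner may re-lock, and must unlock the same number of times — lock_guard handles the paired unlocks here):

```cpp
// Recursive locking: inner() re-locks a mutex the calling thread already holds.
#include <cassert>
#include <mutex>

static std::recursive_mutex rmx;
static int depth = 0;

void inner() {
    std::lock_guard<std::recursive_mutex> g(rmx);  // re-lock by same thread: OK
    ++depth;
}

void outer() {
    std::lock_guard<std::recursive_mutex> g(rmx);
    ++depth;
    inner();   // would deadlock (or be UB) with a plain non-recursive std::mutex
}
```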

Q: Boost supports try_lock()?
AA: yes

Q: Boost supports lockInterruptibly()?
A: probably not, but there’s timed_lock

Q: Boost supports reader/writer locks?
AA: read/write lock in the form of boost::shared_mutex

Q: difference between mutex and condition var?

Q: can you write code to implement “recursive” locking with reader/writer threads?
A: key technique is a bool isHoldingLock(targetlock)

Q: what types of smart pointers did you use and when. What are the differences? I’d like to focus on the fundamentals not the superstructures.

Q: what other boost lib?
%%A: I briefly used boost bind. I believe STL binders are imperfect, and lambdas are superior

Q2: if I want a custom class as a map key?
%%A: must overload the less-than operator

Q2c: what if I can’t change the source code?
%%A: write a comparator functor (historically derived from std::binary_function, now deprecated) and pass it to the map as the optional third type parameter. The functor only overloads “operator()”, but that’s enough — the comparator doesn’t have to be named “less”.
AA: Item 42 [[eff STL]]
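A sketch of the Q2c answer: a plain comparator functor passed as the map’s third type parameter. Account and ByIdDesc are made-up names; a bare struct with operator() suffices (std::binary_function is deprecated/removed in modern C++):

```cpp
// Using std::map with a key class we "can't change" to add operator<.
#include <map>
#include <string>

struct Account {                 // pretend this is third-party, unmodifiable
    std::string id;
    int balance;
};

struct ByIdDesc {                // arbitrary name; only operator() matters
    bool operator()(const Account& a, const Account& b) const {
        return a.id > b.id;      // order by id, descending, to show it's custom
    }
};

using AcctMap = std::map<Account, int, ByIdDesc>;  // comparator as 3rd type param
```

The comparator also defines key equivalence: two Accounts with the same id are the same key, regardless of balance.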

%%Q2g: is operator less-than a friend func or a method?
AA: it can be a non-member non-friend function (if the needed members are public), a friend function, or a member function.