fastest container: array of POD #or pre-sized vector

Relevant to low-latency market data.

  • raw array is the most memory-efficient; vector is very close, but you need to avoid reallocation
  • std::array is less popular but should offer similar performance to vector
  • all other containers are slower and have a bigger footprint
  • For high performance, avoid containers of pointers. Cache affinity loves contiguous memory: after accessing the 1st element, accessing the 2nd element is likely a cache hit (see the sketch below)
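
A minimal sketch of the pre-sizing idea (my own toy illustration; the Quote struct and the sizes are made up, not from any real feed handler):

#include <cstddef>
#include <vector>

struct Quote {                          // POD-style record; fields are hypothetical
    long long ts;
    double bid;
    double ask;
};

int main() {
    const std::size_t kMax = 1000000;   // assumed upper bound on ticks per interval
    std::vector<Quote> quotes;
    quotes.reserve(kMax);               // pre-size once so push_back never reallocates
    // Quote raw[1024];                 // raw-array alternative: fixed capacity, zero overhead

    quotes.push_back({1, 99.5, 99.6});  // elements stay contiguous, so a sequential
    quotes.push_back({2, 99.5, 99.7});  // scan is mostly cache hits
    double mid = (quotes[1].bid + quotes[1].ask) / 2;
    return mid > 0 ? 0 : 1;
}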

peers’priority^%%priority: top5 #beyond gz

In a nutshell, some of my peers’ priorities I have decidedly given up on, and some of my priorities are fairly unique to me.

Actually I only spoke to a small number of (like 10) peers, mostly Chinese and Indian. There are big differences among their views. However, here’s a grossly oversimplified sample of “their” priorities.

  • theirs — top school district
  • theirs — move up (long-term career growth) in a good firm?
    • build long-term relationships with big bosses
  • theirs — green card
  • theirs — early retirement
  • ———-
  • mine — diversify (instead of deepen or stack-up) and avoid stagnation
  • mine — stay technical and marketable, till 70
  • mine — multiple properties and passive income
  • mine — shorter commute
  • mine — smaller home price tag

 

after 70, non-profit OK; voluntary work no

After my prime years, when I can only work half the time, I may be able to work towards some meaningful cause, but not do completely voluntary work. If there’s no income, I will have low motivation to continue.

With a salary, I feel more commitment, more responsibility.

In our later years, my wife and I will also have non-trivial financial needs. I don’t want to depend on my kids or on welfare to support us. I may have to continue my drive for more income.

IKM, boost, c++11, templates, sockets] QQ IV #Ashish

Hi Ashish,

How is your new job? What technologies? What technical challenges?

(I just turned down a windows c++ dev role, partly because I’m more interested in linux. I feel there are fewer online resources to help vc++ developers.)

I now have a little theory on the relative importance of some c++ tech skills in a job candidate. I feel all of the skills below are considered of “secondary importance” by most of the (10 – 20) interviewers I have met.

c++11 —— is not yet widely used. Many financial jobs I applied to have old codebases they don’t want to upgrade. Most of the c++11 features we use as developers are optional convenience features. Some features are fundamental (decltype, constexpr …) yet are treated as simple conveniences.

I feel move semantics and r-value references are fairly deep but these are really advanced features for library writers, not app developers.
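
To illustrate what I mean, here’s a minimal sketch (my own toy example, not from any codebase): the library writer supplies the move constructor; the app developer merely calls std::move.

#include <string>
#include <utility>
#include <vector>

class Buffer {                          // toy class a library writer might provide
    std::string data_;
public:
    explicit Buffer(std::string s) : data_(std::move(s)) {}
    Buffer(Buffer&& other) noexcept     // move ctor: steal the guts instead of deep-copying
        : data_(std::move(other.data_)) {}
    Buffer& operator=(Buffer&& other) noexcept = default;
};

int main() {
    std::vector<Buffer> v;
    Buffer b(std::string(1000, 'x'));
    v.push_back(std::move(b));          // all the app developer writes is std::move(...)
    return 0;
}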

Boost —— is not really widely used. 70% of the financial companies I tried don’t use anything beyond shared_ptr. Most of the boost features are considered optional and high-level, rather than fundamental. If I tell them I only used shared_ptr and no other boost library, they will usually judge me on other fronts, without deducting points.

Q: Are there job candidates who are strong with some boost feature (beside shared_ptr) but weak on STL and core c++?
A: I have not seen any.

Q: Are there programmers strong on core c++ but unfamiliar with boost?
A: I have seen many.

sockets —— are relevant to some low-latency, networking teams. Call them infrastructure, engineering, or back-end teams. I just happened to apply to too many of this kind. To them, socket knowledge is essential, but to “mainstream” c++ teams, sockets are non-essential.

(Actually the socket api is not a c/c++ language feature but a system library. Unlike STL, the socket library has narrower usage.)
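
For what it’s worth, here’s a bare-bones sketch of the BSD socket boilerplate such teams expect you to know cold (UDP receive; the port number is made up and error handling is omitted):

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

int main() {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);       // UDP, typical for multicast mkt data
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(12345);                  // hypothetical port
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr));
    char buf[1500];                                // one MTU-sized datagram
    recvfrom(fd, buf, sizeof(buf), 0, nullptr, nullptr);
    close(fd);
    return 0;
}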

templates —— another advanced feature, primarily for library developers. Really deep and complex. App developers don’t really need this level of wizardry. A few experienced c++ guys told me their teams each have a team member using fancy template meta-programming techniques that no one else could understand or maintain. None of my interviewers went very deep on this topic.
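
To give a flavour of the wizardry, here is a tame, classic example of my own (nothing like the unmaintainable code those colleagues described):

#include <iostream>

// classic template meta-programming: the compiler computes the factorial
template <unsigned N>
struct Factorial {
    static const unsigned long value = N * Factorial<N - 1>::value;
};

template <>
struct Factorial<0> {                  // specialization: base case stops the recursion
    static const unsigned long value = 1;
};

int main() {
    std::cout << Factorial<10>::value << '\n';   // computed at compile time: 3628800
    return 0;
}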

IKM —— is far less important than we thought. I know many people who score very high but fail the tech interview badly. On the other hand, I believe some candidates with mediocre scores impress the interviewers when they come on-site.

STL vector-of-containers: better use pointers

Typical example: if you heavily use a vector of maps, it’s more convenient to use a vector of pointers to maps, i.e. the java way.

If you drop the “pointers to”, then when you retrieve a map from the vector into a local variable, you often get a copy, unless you take the address (or a reference):

map<int, int> * mapPtr = &v1[i];

By the way, here’s how to construct a std::map with an initializer list and push it into the vector:

vec.push_back(map<int, int>{{32,1}} );
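
Putting it together, here’s a minimal sketch contrasting the two styles (variable names v1 and v2 are my own; the unique_ptr version stands in for “the java way”):

#include <map>
#include <memory>
#include <vector>
using namespace std;

int main() {
    // style 1: vector of maps. v1[0] returns a reference, but assigning it
    // to a plain local variable silently makes a deep copy.
    vector<map<int, int>> v1;
    v1.push_back(map<int, int>{{32, 1}});
    map<int, int>* mapPtr = &v1[0];     // no copy
    map<int, int>& mapRef = v1[0];      // no copy
    map<int, int>  mapCopy = v1[0];     // deep copy

    // style 2: the "java way" -- vector of (smart) pointers to maps
    vector<unique_ptr<map<int, int>>> v2;
    v2.push_back(unique_ptr<map<int, int>>(new map<int, int>{{32, 1}}));
    (*v2[0])[64] = 2;                   // always operating on the original map

    return 0;
}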

mkt data tech skills: !portable !shared

Raw mkt data tech skill is better than soft mkt data skill even though it’s further away from “the money”:

  • standard — Exchange mkt data format won’t change a lot. Feels like an industry standard
  • the future — most OTC products are moving to electronic trading and will have market data to process
  • more necessary than many modules in a trading system. However ….. I guess only a few systems need to deal with raw market data. Most downstream systems only deal with the soft market data.

Q1: If you compare 5 typical market data gateway dev [1] jobs, can you identify a few key tech skills that are shared by at least half the jobs but are not widely used “generic” skills like math, hash tables, or polymorphism?

Q2: if there is at least one, how important is it to a given job? One of the important required skills, or a make-or-break survival skill?

My view — I feel there is not a shared core skill set. I venture to say there’s not a single answer to Q1.

In contrast, look at quant developers. They all need skills in c++/excel, BlackScholes, bond math, swaps, …

In contrast, also look at dedicated database developers. They all need non-trivial SQL and schema design. Many need stored procs. Tuning is needed if tables are large.

Now look at a market data gateway for OPRA. Two firms’ job requirements will share some common tech skills like throughput (TPS) optimization and fast storage.

If latency and TPS requirements aren’t stringent, then I feel the portable skill set is an empty set.

[1] There are also many positions whose primary duty is market data but not raw market data, not large volume, not latency sensitive. The skill set is even more different. Some don’t need development skill on market data — they only configure some components.

accumulation: contractor vs FTE #XR

XR,

You said that we contractors don’t accumulate (积累) as FTEs do.

I do agree that after an initial 2Y of tough learning, some FTEs could reap the monetary rewards, whereas consultants are often obliged, by contract, to leave the team. (Although there are long-term contracts, they don’t always work out as promised.)

Here’s my experience at GS for 2.5Y. My later months had much lower “bandwidth” tension, i.e. they required less learning and figuring-things-out. Less stress, less negative feedback, less worry about my own competence, more confidence, more in-control because I was more familiar with the local system. If my compensation had grown to 150k I would say that money amounted to “reaping the reward”. In reality, the monetary accumulation was an empty promise.

As a developer stays longer, the accumulation in terms of his value-add to the team is natural and likely [1]. Managers like to point out that after an FTE stays in the team for 2Y, her competence, her designs, her solutions, her suggestions, in short her value-add per year, grow higher every year. If her initial value-add to the company can be quantified as $100k, it grows by 30% every year. Alas, that doesn’t always translate to compensation.

That’s accumulation in personal income. How about accumulation in tech skill? Staying in one system usually means less exposure to other, or newer, technologies. Some developers prefer to be shielded from newer technologies. I embrace them. I feel my technical accumulation is higher when I keep moving from company to company.

[1] There are exceptions. About 5% of the old timers are, in my view, organizational /dead-weights/. Their value-add doesn’t grow and is routinely surpassed within a year by bright new joiners. Often the company can’t let them go for political, legal or ethical reasons.

You said IV questions change over time so much (ignoring the superficial changes) that the IV skills we acquire today will be useless in 5Y and we will have to learn new IV skills all over again. This is not intuitive to me. Please give one typical example if you can without a lot of explaining (I understand your time constraints). I guess you mean technology churn? If I prepare for a noSQL or big data interview, then I will probably face technology churn.

On the other hand, in my experience, many interview topics remain evergreen, including some hard topics: algorithms (classic and creative algos), classic data structures, concurrency, java OO, pass-by-reference/value, SQL, unix commands, TCP/UDP sockets, garbage collection, asynchronous/synchronous concepts, pub/sub, producer/consumer, thread pool concepts… In the same vein, most coding tests are similar to those of 10 years ago when I first received them. So the study of these topics does accumulate to some extent.

edit1file]big python^c++ prod system #XR

Q1: suppose you work in a big, complex system with 1000 source files, all in python, and you know a change to a single file will only affect one module, not a core module. You have tested it and run a 60-minute automated unit test suite. You didn’t run the prolonged integration test that’s part of the department-level full release. Would you and the approving managers have the confidence to release this single python file?
A: yes

Q2: change “python” to c++ (or java or c#). You already followed the routine to build your change into a dynamic library, tested it thoroughly and ran the unit test suite but not the full integration test. Do you feel safe releasing this library?
A: no.

Assumption: the automated tests were reasonably well written. I have never worked in a team that measured test coverage. I would guess 50% coverage is too high a bar and often impractical. Even with high measured test coverage, the risk of bugs is roughly the same. I have never believed that higher unit test coverage is a vaccination. Diminishing returns; low marginal benefit.

Why the difference between Q1 and Q2?

One reason — the source file is compiled into a library (or a jar), along with many other source files. This library is now a big component of the system, rather than one of 1000 python files. The managers will see a library change in c++ (or java) vs a single-file change in python.

Q3: what if the change is to a single shell script, used to start/stop the system?
A: yes. Managers can see the impact is small and isolated. The unit of release is clearly a single file, not a library.

Q4: what if the change is to a stored proc? You have tested it and run the full unit test suite but not a full integration test. Will you release this single stored proc?
A: yes. One reason is transparency of the change. Managers can understand this is an isolated change, rather than a library change as in the c++ case.

How do managers (and anyone except yourself) actually visualize the amount of code change?

  • With python, it’s a single file so they can use “diff”.
  • With a stored proc, it’s a single proc. In source control, they can diff this single proc.
  • With c++ or java, the unit of release is a library. What if, in this new build, besides your change there’s some other change included by accident? You can’t diff a binary 😦

So I feel transparency is the first reason. Transparency of the change gives everyone (not just yourself) confidence about the size/scope of this change.

The second reason is isolation. I feel a compiled language (esp. c++) is more “fragile” and the binary modules more “coupled” and inter-dependent. When you change one source file and release it in a new library build, it could lead to subtle, intermittent concurrency issues or memory leaks in another module, outside your library. Even if you as the author see evidence that this won’t happen, other people have seen innocent one-line changes give rise to bugs, so they have reason to worry.

  • All 1000 files (in compiled form) run in one process for a c++ or java system.
  • A stored proc change could affect DB performance, but it’s easy to verify. A stored proc won’t introduce subtle problems in an unrelated module.
  • A top-level python script runs in its own process. A python module runs in the host process of the top-level script, but a typical top-level script will include just a few custom modules, not 1000 modules. Much better isolation at run time.

There might be python systems where the main script actually runs in a process with hundreds of custom modules (not counting the standard library modules). I have not seen it.

effi^instrumentation ] new project

I always prioritize instrumentation over effi/productivity/GTD.

A peer could be faster than me in the beginning, but if she lacks instrumentation skill with the local code base, there will be more and more tasks that she can’t solve without luck.

In reality, many tasks can be done with superficial “insight”, without instrumentation, with an old-timer’s help, or with a lucky search in the logs.

What if the developer had not added that logging? You are then dependent on that developer.

I could be slow in the beginning, but once I build up (over x months) real instrumentation insight, I will be more powerful than my peers, including some old timers. I think the Stirt-tech London team guru (John) was such a guy.

In reality, even though I prioritize instrumentation, it’s rare to make visible progress in building instrumentation insight.