denigrate%%intellectual strength #ChengShi

I have a real self-esteem problem as I tend to belittle my theoretical and low-level technical strength. CHENG, Shi was the first to point out “you are simply stronger than others” (你就是比别人强).

  • eg: my grasp of middle-school physics was #1 strongest across my entire school (a top Beijing middle school) but I often told myself that math was more valuable and more important
  • eg: my core-java and c++ knowledge (QQ++) is stronger than that of most candidates (largely due to absorbency++) but I often say that project GTD is more relevant. Actually, to a technical expert, knowledge is more important than GTD.
  • eg: I gave my dad an illustration — medical professor vs GP. The Professor has more knowledge but GP is more productive at treating “common” cases. Who is a more trusted expert?
  • How about pure algo? I’m rated “A-” stronger than most, but pure algo has far lower practical value than low-level or theoretical knowledge. Well, this skill is highly sought-after by many world-leading employers.
    • Q: Do you dismiss pure algo expertise as worthless?
  • How about quant expertise? Most of the math has limited and questionable practical value, though the quants are smart individuals.

Nowadays I routinely trivialize my academic strength/trec relative to my sister’s professional success. To be fair, I should say my success was more admirable if measured against an objective standard.

Q: do you feel any IQ-measured intelligence is overvalued?

Q: do you feel anything intellectual (including everything theoretical) is overvalued?

Q: do you feel entire engineering education is too theoretical and overvalued? This system has evolved for a century in all successful nations.

The merit-based immigration process focuses on expertise. Teaching positions require expertise. When laymen know you are a professional they expect you to have expertise. What kind of knowledge? Not GTD but a published body of jargon and “bookish” knowledge based on verifiable facts.

lambda^anon class instance ] java

A java lambda expression is used very much like an instance of an anonymous class. However, http://tutorials.jenkov.com/java/lambda-expressions.html#lambda-expressions-vs-anonymous-interface-implementations pointed out one interesting difference:

The anonymous instance in the example has a field named. A lambda expression cannot have such fields. A lambda expression is thus said to be stateless.

bone health for dev-till-70 #CSY

Hi Shanyou,

I have a career plan to work as a developer till my 70’s. When I told you, you pointed out bone health, to my surprise.

You said that some older adults suffer a serious bone injury and become immobile. As a result, other body parts suffer, including weight, heart, lung, and many other organs. I now believe loss of mobility is a serious health risk.

These health risks directly affect my plan to work as a developer till my 70’s.

Lastly, loss of mobility also affects our quality of life. My mom told me about this risk 20 years ago. She has since become less vocal about this risk.

Fragile bones become more common when we grow older. In their 70’s, both my parents suffered fractures and went through surgeries.

See ## strengthen our bones, reduce bone injuries #CSY for suggestions.

##With time2kill..Come2 jobjob blog

  • for coding drill : go over
    • [o] t_algoClassicProb
    • [o] t_commonCodingQ22
    • t_algoQQ11
    • open questions
  • go over and possibly de-list
    1. [o] zoo category — need to clear them sooner or later
    2. [o] t_oq tags
    3. t_nonSticky tags
    4. [o] t_fuxi tags
    5. Draft blogposts
    6. [o] *tmp categories? Sometimes low-value
    7. remove obsolete tags and categories
  • Hygiene scan for blogposts with too many categories/tags to speed up future searches? Low value?
  • [o=good for open house]

impostor’s syndrome: IV^on-the-job

I feel like an impostor more on the job than in interviews. I have strong QQ (+ some zbs) knowledge during interviews. I feel it more and more often in my c++ in addition to java interviews.

Impostor’s syndrome is all about benchmarking. In job interviews, I sometimes come up stronger than the interviewers, esp. with QQ topics, so I sometimes feel the interviewer is the impostor !

In my technical discussions with colleagues, I also feel like an expert. So I probably make them feel like impostors.

So far, all of the above “expert exchanges” are devoid of any localSys. When the context is a localSys, I have basically zero advantage. I often feel the impostor’s syndrome because I probably oversold myself during the interview and set up sky-high expectations.

y C++will live on #in infrastructure

I feel c++ will continue to dominate the “infrastructure” domains while application developer jobs will continue to shift towards modern languages.

Stroustrup was confident that the lines of source code out there basically ensure that c++ compilers will still be needed 20 years out. I asked him “What competitors do you see in 20 years”. He estimated there are billions of lines of c++ source code.

I said C would surely survive and he dismissed it. Apparently, many of the hot new domains rely on c++. My examples below all fall under the “infrastructure” category.

  • mobile OS
  • new languages’ runtimes such as the dotnet CLR, JVM
  • blockchain mining — compute intensive
  • TensorFlow
  • AlphaGo
  • google cloud
  • Jupyter for data science
  • Most deep learning base libraries are written in c++, probably for efficiency

##[19]Y WallStContract=%%best Arena #Grandpa

Competition arenas … we all CHOOSE the arena to compete in. It’s a choice, either implicit choice or explicit choice. I would say better to be conscious about this choice.

Not much new content in this blogpost. I feel very convinced to stick with WallSt contract market. Here is a ranking of the reasons why I consider it a rational decision, though our decisions are shaped by our deeply personal experiences and inherently irrational.

Beware of attachment !

  1. low stress, low expectation — my #1 reason as of 2019
  2. low-caliber competitors, mostly due to the “offputting” below
  3. age friendly
  4. I get to practice interviews and keep a precious burning-pleasure for a month each year on average.
    1. In contrast, if I were an ibank VP I would have multiple obstacles on that front.
  5. — other reasons to prefer Wall St contract
  6. higher probability of greenfield projects
  7. leverage on domain knowledge
  8. I can easily explain my job hopping profile
  9. ?? a number of firms to hop around

Now the downside, off-putting factors. ## Y most young dev shun contracts — Many bright or young competitors are put off by these factors, reducing the competition.

PIP@Macq: tough judge@%%design

If I were the judge, then Kevin’s solution may get rejected or rated mediocre.

I think the judgement can be unreasonably tough when the judge herself is a practitioner — consider Yang and Sundip Jangi.

On the other hand,

  • Yang liked my OO design in EOS
  • Sundip liked my personalization design

The outcome (PIP etc) doesn’t mean my work (i.e. output) is sub-standard. The outcome has many reasons and causes.

I need to be fair and impartial to myself. [[learned optimism]] uses the three P’s. One of them is Personal.

tsn: what if I fail due2capabilities #Okao

Yet another revisit. See also post on dare2fail.

My intern David Okao asked “What if the west coast workplaces are too demanding? Can you cope?” I replied

  • As an adventurer, I don’t mind the risk… robust, resilient confidence
  • As an adventurer, I see myself as adaptable, a survivor
  • I may have my secret weapons
  • I may find my strengths such as domain knowledge, data analysis, trouble-shooting, efficient design, math(yes)

I then went over a similar discussion about MLP with Ashish, when I said —

  • If similar to Macq, I put up a good fight but still fail due to personal “capabilities”, I ought to feel positive about the whole experience.
  • I’m good at the job-hunting game so no real worries.

Q: biggest c++dev experiences #$5k DBS

In 2015 I toyed with the outlandish idea of applying for a $5k/M DBS job just to get a hardcore big-project c++ dev experience. Luckily I didn’t get into that job, as such a learning experience would be underwhelming, not enriching.

The criteria for “big” depend on the specific skills (zbs/GTD or IV) you want to demonstrate or learn.

  • If you mean navigation, tracing, build, enhancement, large team BestPractices,,, in a large codebase, then MS is largest, followed by Macq MTS and Macq quant.
  • If you mean substantial source code size that demonstrates data structures, threading,,, then my NYSE integrated feed parser is largest
    followed by my weekend coding interview projects. Mvea is next biggest.

It’s therefore worthwhile to review those weekend projects, mostly hosted in cppProj.

— How about “expert” status? I would say none of these large projects are relevant.

In interviews and online discussions, the acid test for “expert” is invariably some low-level, theoretical detail.

## IV skills: disaster-rescue (escape)

I have relied on this parachute over and over again. Therefore, I have real motivation to strengthen this parachute and keep it in good condition.

  • OC — parachute saved me from transfer to BAU
  • 2007 after first entry to U.S., parachute saved me from sitting on bench for months as happened to … Florence
  • Vz — after my contract was cut unexpectedly, parachute saved me from sitting on bench. This proves to be the first of many rescues including 95G, Barclays,
  • [B] Stirt — after the layoff, parachute helped me avoid sitting on bench for months.
  • [B] Macq — after the PIP, parachute saved me from another slow job search on SG job market
  • [B] deMunk — parachute gave me confidence that I had the technical capabilities to escape — My “parachute” was my marketable skillset
  • [B=example of boxer cornered n beaten]

Other people’s examples:

  • [B] Youwei — his java IV skills rescued him from the MS layoff
  • Venkat of OC — it rescued him from a terrible boss
  • Shanyou and Deepak — need better parachutes
  • Jack Zhang
  • Davis Wei

job hopper: more common@WestCoast

I first noticed that west coast employers didn’t care about my job-hopper profile. Then I saw more evidence

## localSys note-taking: choices

Other choices include jira

MSOL
paper Windows10 StickyNotes MSOL Notes #can hide 🙂 Tasks mail folders recoll #1 default
Encourages Periodic refresh AAA unsuitable B # everything  is temporary C # “browse” F #needs discipline F #needs discipline
browse A B #click to see details A B #copy-edit subject B #click to see details
searchable F unsuitable unsuitable A D #too many C
Organization of 30+ topics Unsuitable F #intrusive until hidden D C # categories B AAA # hierarchical
Radiator AAA B # Kyle use it long-term F F D F
Screen shot F F F A A B
ez2edit B A A Mostly readonly A
locate by title +!search within small population only if in front A iFF tiny population B #no tree but can put into one category D #must search B #filename not always fully used

real effort but below-bar@@ only Macq

Macq is probably the only job where I focused on localSys GTD but still fell below the bar.

The PIP cast a long shadow and left a deep scar. Am still recovering. This scar is deeper than Stirt …

Remember the abused children, who grew up traumatized? I was not traumatized as a kid. My traumatic experience is still /devastating/ but I can handle it. I have the maturity to handle it.

Adults are often traumatized by failed marriage, in-law conflicts,..

##conquests since GS #j,c++..

Until I left GS, I didn’t know how it feels to “conquer” a sizable, lucrative tech skill. Such a tech skill represents a specific job market with supply and demand.

  • perl? not sizable not lucrative 😦 Until 2011, Perl was my only core competency 😦 How lucky am I now !
  • coreJava QQ (as defined on WallSt) .. was my first conquest. After this conquest I have been trying to replicate my success story, while many peers stayed in one firm in order to move up.
  • SQL? Many interview topics not explored, but now SQL is no longer a sizable job market:(
  • MOM? Not sizable
  • sockets? not so sizable, not yet conquered
  • bond math … was a small conquest
  • c++ QQ .. was my 2nd conquest, as experienced in 2017 onward
  • CIV .. was my 3rd conquest. Growing bigger, though I only rate myself B among west coast candidates.

burn-out: intensity^overtime #CNA

For 99.5% of us, high intensity is unsustainable almost by definition. I believe intensity doesn’t equal GTD output, which is usually measured by some numbers, or subjectively by observers. As a student, I had a strong foundation (“abilities”), so a moderate intensity was able to create high output. Intensity factors include (not limited to)

  • non-stop mental focus
  • truly high-pace workplace
  • creative engine or analytical engine
  • deadlines
  • multi-tasking and context switching
  • quick recall of huge amount of details
  • GS style time booking

https://www.channelnewsasia.com/news/commentary/working-relentless-pace-wont-help-career-prospects-10596240?cid=h3_referral_inarticlelinks_24082018_cna is a good online article

During summer, the firm works just four days, a total of 32 hours. The company’s summer workload must fit reduced hours, Mr Fried insists. This employer doesn’t offer “flexible” hours in exchange for intensity.

Employers and policymakers focus a lot on the excessive hours, but compared with overtime, work intensity predicts much greater reductions in well-being and career-related outcomes.

python *args **kwargs: cheatsheet

“asterisk args” — I feel these features are optional in most cases. I think they can create additional maintenance work. So perhaps no need to use these features in my own code.

However, some codebases use these features so we had better understand the syntax rules.

— Inside the called function astFunc(),

Most common way to access the args is a for-loop.

It’s also common to forward these asterisk arguments:

def astFunc(*args, **kwargs):
    anotherFunc(*args, **kwargs)

I also tested reading the q[ *args ] via list(args) or args[:]

— how to use these features when calling a function:

  • astFunc(**myDict) # astFunc(**kwa)
  • simpleFunc(**myDict) # simpleFunc(arg1, arg2) can also accept **myDict
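
The syntax rules above can be exercised in one small sketch. The names astFunc, anotherFunc, simpleFunc and myDict come from the notes above; the function bodies and the sample values are hypothetical, written only to show the for-loop access, the forwarding and the ** unpacking at the call site.

```python
def anotherFunc(*args, **kwargs):
    """Hypothetical callee: simply report what it received."""
    return args, kwargs


def astFunc(*args, **kwargs):
    """Loop over the asterisk arguments, then forward them unchanged."""
    seen = []
    for a in args:                 # args is a tuple of positional arguments
        seen.append(a)
    for k, v in kwargs.items():    # kwargs is a dict of keyword arguments
        seen.append((k, v))
    return seen, anotherFunc(*args, **kwargs)


def simpleFunc(arg1, arg2):
    """Plain named parameters can also be filled from **myDict."""
    return arg1 + arg2


myDict = {"arg1": 10, "arg2": 20}          # keys must match parameter names
seen, forwarded = astFunc(1, 2, **myDict)  # mix positional and ** unpacking
total = simpleFunc(**myDict)               # same as simpleFunc(arg1=10, arg2=20)
```

Calling simpleFunc(**myDict) only works when every key in the dict matches a parameter name; an unexpected key raises TypeError.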

See my github

brank ≈ luxury car #white elephant

What’s in common between a luxury car and a brank like “Director”?

  • enviable, glamorous, glorifying
  • high maintenance, high visibility
  • you need some minimum personal capacity (“performance“) to maintain it over a long time
  • can wear out quickly, esp. when you age
  • Not a passive income generator,
  • perhaps not a reliable cash-cow asset,

My contractor job is like a mid-range reliable Japanese car.

Q:are java primitive+reference on heap or stack #escape

An old question but my answers are not really old 🙂

In Java, a so-called “referent” is a non-primitive thingy with a unique address on heap, accessed via heap pointers.

In java, a referent is always an Object, and an Object is always on heap therefore always a referent.

(Java language defines “reference types” in terms of primitive types, so we need a clear understanding of primitive types first.)

In java, a primitive thingy is either part of a (heapy) Object or a local thingy on stack

(In C++ lingo, object can be a new int(88)…)

A reference is, at run-time, really a heap pointer. Assuming 32-bit machine, the pointer itself occupies 4 bytes and must be allocated somewhere. If the reference is local to a method, like a parameter or a local variable, then the 4 bytes are on stack like a 32-bit primitive local variable. If it’s part of an object then it’s on heap just like a 32-bit primitive field.

— advanced topic: escape

Escape analysis is enabled by default. EA can avoid constructing an Object on heap, by using the individual fields as local variables.

— advanced topic: arrays

Arrays are special and rarely quizzed. My unverified hypothesis:

  • at run-time an array of 3 ints is allocated like an Object with 3 int-fields
  • at run-time an array of 3 Dogs is allocated like an Object with 3 Dog fields. This resembles std::vector<shared_ptr<Dog>>
  • Q: how about std::vector<Dog>?
  • %%A: I don’t think java supports it.
  • The array itself is an Object

q[static thread_local ] in %%production code

static thread_local std::vector<std::vector<std::string>> allFills; // I have this in my CRAB codebase, running in production.

Justification — In this scenario, data is populated on every SOAP request, so keeping them as non-static data members is doable but considered pollutive.

How about static field? I used to think it’s thread-safe …

When thread_local is applied to a variable of block scope, the storage-class-specifier static is implied if it does not appear explicitly. In my code I make it explicit.

live-free-of restrictions #LS

I can live with the restrictions of diet, burn rate, workout routine, perhaps strict work hours … but not LiuShuo style self-restraint in office context. Why?

  • I feel the suffering is not worthwhile.
  • I don’t feel self-confident I can “improve” myself.

I want to live free of this last category of restrictions. For this, I would let go of any leadership career.

job hopper^stagnant mediocre#CNA

https://www.channelnewsasia.com/news/commentary/career-mobility-new-normal-career-stability-job-hopping-11469760 is a Singapore perspective:

Long gone is the notion of the career ladder, where the ideal CV looks like a narrow, vertical progression. Today’s gold-standard CV looks like a career matrix, with horizontal and vertical moves signifying depth and breadth of experience, skills and exposure to different cultures.

Employers have gone from being cynical about hiring job-hoppers to becoming accustomed to seeing diverse CVs from top talent who are in frequent demand.

##%%c++(n java)memoryMgmt trec : IV

Q: Your resume says “memory mgmt” for c++, so what did you do personally, not as a team?
A: I will talk about the features in my system. Most of them are not written by me, honestly

  • per-class: restricting allocation to heap or stack
  • per-class: customize operator new for my class
  • per-class: disable operator new for my class and provide allocateFromRingBuffer() API
  • per-class: move support
  • static thread_locals
  • ring buffer to eliminate runtime allocation
  • custom smart pointers
  • memory leak detector tools like valgrind+massif .. breakdown heap/non-heap footprint@c++app #massif
  • RAII
  • pre-allocate DTOs@SOD #HFT #RTS
  • customize new_handler() — I didn’t write this

Q: java

  • GC tuning
  • memory sizing
  • JDK tools like jhat
  • memory profiler tools
  • string intern
  • investigating memory leaks, all related to GC

prod write access to DB^app server@@

Q: Is production write access more dangerous in DB or app server?
A: I would say app server, since a bad software update can wipe out production data in unnoticeable ways. It could be a small subset of the data and unnoticeable for a few days.

It’s not possible to log all database writes. Such logging would slow down the live system and take up too much disk space. It’s basically seen as unnecessary.

However, tape backup is “protected” from unauthorized writes. It is usually not writable by the app server. There’s a separate process and separate permission to create/delete backup tapes.

%%geek profile cf 200x era, thanks2tsn

Until my early 30’s I was determined to stick to perl, php, javascript, mysql, http [2] … the lighter, more modern technologies and avoided [1] the traditional enterprise technologies like java/c++/c#/SQL/MOM/Corba . As a result, my rating in the “body-building contest” was rather low.

Like assembly programming, I thought the “hard” (hardware-friendly) languages were giving way to easier, “productivity” languages in the Internet era. Who would care about a few microsec? Wrong…. The harder languages still dominate high-end jobs.

Analogy?

* An electronics engineering graduate stuck in a small, unsuccessful wafer fab
* An uneducated pretty girl unable to speak well, dress well.

Today (2017) my resume features java/c++/py + algo trading, quant, latency … and I have some accumulated insight on core c++/c#, SQL, sockets, connectivity, ..

[1] See also fear@large codebase
[2] To my surprise, some of these lighter technologies became enterprise —

  1. linux
  2. python
  3. javascript GUI
  4. http intranet apps

##lockfree queue implementations c++J

I think a linked queue with 2 pointers (head / tail) could be relatively easy to implement by myself.

anon classes^lambda: java perf #class file loading

Based on [[JavaPerm]] P381

  • An anon class requires an actual *.class file created by javac compiler and loaded from serialized form (usually disk).
  • No such class file for a lambda.

This difference has very limited performance impact, as of java 8. However, more than half the interview questions are about fancy theoretical knowledge, so this knowledge is valuable in interviews.

 

top-of-book imbalance: predictive power

At any time, for a given order book there’s an observable ticking ratio I name as “top-of-book-imbalance” or TOBI := b/(b+a) where

b is total volume at best bid level and
a is total volume at best ask level

For a given stock, whenever TOBI was high, statistically we had seen more market-buy orders in the preceding time period; and conversely when TOBI was low, we had seen more market-sell orders.

Therefore, TOBI has proven predictive power.
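
As a small illustration of the TOBI definition above, here is a minimal sketch. The function name tobi and the sample volumes are made up for illustration, and the guard against an empty top of book is my own assumption rather than part of the definition.

```python
def tobi(best_bid_volume, best_ask_volume):
    """Top-of-book imbalance: b / (b + a), a ratio between 0 and 1."""
    total = best_bid_volume + best_ask_volume
    if total == 0:
        # Assumption: treat an empty top of book as an error, not as 0.5
        raise ValueError("no volume at best bid or best ask")
    return best_bid_volume / total


bid_heavy = tobi(300, 100)   # more volume resting on the bid side
ask_heavy = tobi(100, 300)   # more volume resting on the ask side
```

A value near 1 means the best bid level is much thicker than the best ask level, and a value near 0 means the opposite.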

engagement+spare time utilization: %%strength

Very few peers are so conscious of burn^rot. Higher utilization of spare time is a key strength during my US peak + my dotcom peak + also my high school. We could analyze what’s common and what’s different between these peaks…

Outside those peaks, I also used this strength to complete my UChicago program, but the tangible benefit is smaller.

(This is different from efficiency on the job. Many efficient colleagues spend less time in office but get more done.  My style involves sacrificing personal spare time and family time.)

Looking forward, I guess this strength could be strategic for research-related domains, including any job involving some elements of research and accumulation.

A repeated manager praise for me is “broad-based”, related to this strength.

many IV failures before 1st success #c++/HFT/CIV

  • i had so many failures at c++ interviews before I started passing. Now I look like a c++ rock star to some
  • I had so many failures at HFT interviews before I started passing at DRW, SIG, Tower
  • I had so many failures at remote speed coding before I started passing.

With west-coast type of companies including nsdq, I can see my rise in ranking. If I try 10 more times the chance of further progress is more than 70%.

I /may never/ become a rock star in these west coast CIVs but I see my potential as a dark horse in 1) white-board CIV, 2) pure-algo

“May never” means “may not happen” … Scott Meyers

[19] 2 reasons Y I held on to c++ NOT c#

In both cases, I faced steep /uphill/ in terms of GTD-traction, engagement, sustained focus, smaller-than-expected job market [1] .. so why did I hold on to c++ but abandon c#?

[1] actually c# was easier than c++ in GTD-traction, entry barrier, opacity

Reason: GUI — 95% of the c# jobs I saw were GUI but GUI is not something I decided to take on. The server-side c# job market has remained extremely small.

Reason: in 2015 after Qz, I made the conscious decision to refocus on c++. I then gained some traction in GTD and IV, enough to get into RTS. By then, it was very natural for me to hold on to c++.

— minor reasons

Reason: west coast coding tests — python and c/c++ are popular

throughput^latency #wiki

High bandwidth often means high-latency:( .. see also linux tcp buffer^AWS tuning params

  • RTS is throughput driven, not latency-driven.
  • Twitter/FB fanout is probably throughput-driven, not latency-driven
  • I feel MOM is often throughput-driven and introduces latency.
  • I feel HFT OMS like in Mvea is latency-driven. There are probably millions of small orders, many of them cancelled.

https://en.wikipedia.org/wiki/Network_performance#Examples_of_latency_or_throughput_dominated_systems shows

  • satellite is high-latency, regardless of throughput
  • offline data transfer (by trucks) has poor latency, excellent throughput

semaphore: often !! ] thread library

lock and condVar are essential components of any thread library. Counting Semaphore is not.

  • In (C library) POSIX the semaphore functions do not start with pthread_, unlike the lock and condVar functions.
  • In (C library) SysV, the semaphore API is not part of any thread library whatsoever
  • In java, locks and condVars are integrated with every object, but Semaphore is a separate class
  • Windows is different

Important Note — both dotnet and java are a few abstraction levels higher than the thread or semaphore “libraries” provided on top of an operating system. These libraries [1]  are sometimes NOT part of kernel, even if the kernel provides basic thread support.

[1] ObjectSpace and RogueWave both provide thread libraries, not built into any operating system whatsoever.

[18]t-investment: c++now surpassing java

My learning journey has been more uphill in c++. Up to 2018, I probably have invested more effort in c++ than any language including java+swing.

I analyzed c++ QQ more than java QQ topics, because java is significantly easier, more natural for me.

I read and bought more c++ books than java+swing books.

If I include my 2Y in Chartered and 2Y in Macq, then my total c++ professional experience is comparable to java.

Q: why until recently I felt my GTD mileage was less than in java+swing?

  • A #1: c++ infrastructure is a /far cry/ from the clean-room java environment. More complicated compilation and more runtime problems.
  • A: I worked on mostly smaller systems… less familiar with the jargons and architecture patterns
  • A: not close to the heart of bigger c++ systems

Q: why until recently I didn’t feel as confident in c++ as java+swing?

  • A #1: interview experiences. About 30% of my c++ interviews were HFT. I always forgot I had technical wins at SIG and WorldQuant
  • A #2: GTD mileage, described above.

prefer ::at()over operator[]read`containers#UB

::at() throws exception … consistently 🙂

  • For (ordered or unordered) maps, I would prefer ::at() for reading, since operator[] silently inserts for lookup miss.
  • For vector, I would always favor vector::at() since operator[] has undefined behavior when index is beyond the end.
    1. worst outcome is getting trash without warning. I remember getting trash from an invalid STL iterator.
    2. better is consistent seg fault
    3. best is exception, since I can catch it

 

Machine Learning #notes

Machine Learning — can be thought of as a method of data analysis, but a method that can automate analytical model building. As such, this method can find hidden insights unknown to the data scientist. I think the AlphaGo Zero is an example .. https://en.wikipedia.org/wiki/AlphaGo_Zero

Training artificial intelligence without datasets derived from human experts is… valuable in practice because expert data is “often expensive, unreliable or simply unavailable.”

AlphaGo Zero’s neural network was trained using TensorFlow. The robot engaged in reinforcement learning, playing against itself until it could anticipate its own moves and how those moves would affect the game’s outcome

So the robot’s training is by playing against itself, not studying past games by other players.

The robot discovered many playing strategies that human players never thought of. In the first three days AlphaGo Zero played 4.9 million games against itself and learned more strategies than any human can.

In the game of Go, the world’s strongest players are no longer humans. The strongest players are all robots. The strongest strategies humans have developed are easily beaten by these robots. Human players can watch these top (robot) players fight against each other, and try to understand why their strategies work.

MSOL defer-send rule #3spaces

I have used this type of rules in many companies. My current set-up is —

Apply this rule after I submit a msg marked as Normal importance: defer delivery by 1 minute.

I wish there were an “importance = A or B” condition.

generate loopfree paths: graph node A→B|anyPair

Q1: given 2 nodes in a graph containing N (eg 121) nodes, potentially with cycles, generate all simple paths between the pair. A simple path has no cycle. (In other words, for a simple path, length + 1 ==  # unique nodes)

I think there are classic math algorithms for it, because this is part of basic graph theory. Here are some applications of this type of algorithms —

  • Q1b (special case of Q1): given 2 nodes in a C-by-R rectangular grid, where every node is connected to (up to) four neighbors, generate all cycle-free paths between the pair.
    • Below, I solved this problem in python
  • Q2 (simple application of Q1 algo): generate all simple paths between ALL node pairs in a graph. The shortest simple path has length=0. Longest simple path can potentially visit every node exactly once.
  • A: first generate all 121-Choose-2 node pairs. For each pair, solve Q1. Lastly generate the 121 trivial paths of length=0.
    • repetitive 😦
  • Q2b (special case of Q2): In a C-by-R rectangular grid, where every node is connected to (up to) four neighbors, generate all cycle-free paths between ALL pairs. I believe this simple leetcode problem#79 does it.
    • Now I think the algo is much simpler than I imagined. Should really code it by hand.
  • Q2c (easy one based on Q2): given a binary tree containing no cycles, generate all paths.

— A1b: my DFT implementation (probably not 100% correct) , where each “trail” either fails or becomes a path.

  1. from NodeA start a breadcrumb/trail. We can’t revisit any node already visited on current breadcrumb,
    1. if this is a matrix, then instead of a hashtable, we can also use a shadow matrix, but the breadcrumb is much smaller than a shadow matrix
  2. if we reach a node surrounded by nodes on the same breadcrumb, then the trail fails at a dead-end
  3. else we will reach NodeB 🙂 Print the breadcrumb

By construction, we won’t see duplicate paths 🙂

https://github.com/tiger40490/repo1/blob/py1/py/algo_grid/classic_count4waySimplePaths.py is the implementation
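
Independently of the linked file, the breadcrumb steps above can be sketched as below for the grid case (Q1b). This is a minimal sketch, not the linked implementation; the function name simple_paths, the (row, col) node representation and the use of a set alongside the trail list are all my own assumptions.

```python
def simple_paths(rows, cols, start, end):
    """Yield every cycle-free path from start to end in a rows-by-cols grid.

    Nodes are (row, col) tuples; each node connects to up to four neighbors.
    """
    trail = [start]       # the breadcrumb, in visiting order
    on_trail = {start}    # hashtable for fast "already on this breadcrumb?"

    def dft(node):
        if node == end:
            yield list(trail)          # reached NodeB: emit a copy of the trail
            return
        r, c = node
        for nxt in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nxt[0] < rows and 0 <= nxt[1] < cols
            if inside and nxt not in on_trail:
                trail.append(nxt)
                on_trail.add(nxt)
                yield from dft(nxt)
                trail.pop()            # backtrack: shorten the breadcrumb
                on_trail.remove(nxt)
        # falling out of the loop with no yield means a dead-end trail

    yield from dft(start)


paths_2x2 = list(simple_paths(2, 2, (0, 0), (1, 1)))
count_3x3 = sum(1 for _ in simple_paths(3, 3, (0, 0), (2, 2)))
```

Because every trail is extended one unvisited neighbor at a time and undone on the way back, no path is produced twice. The number of paths grows very quickly with grid size, so this is only practical for small grids.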

–BFT? I don’t think BFT can print each unique path

inserting interval #merging

Q (Leetcode): Given a set of non-overlapping intervals, insert a new interval into existing intervals (merge if necessary) and print updated list of intervals. Intervals were a vector sorted according to their start times.

–analysis–

Now I feel the #1 main data structure is a doubly linked list (dlist) of Segment objects:

  • { segment_left_mark,
  • ptr to next node, ptr to prev node
  • optionally a (bool or) enum having A/B, where A means current segment is AboveWater (an interval) or BelowWater i.e. a gap}.

Every time this dlist is modified, we would update a “helper container”, a tree of node pointers sorted by the segment_left_mark value. The tree helps successive inserts. However, if each insert(vector intervals) has a sorted vector then we can binary search the vector and don’t need the tree.

First, binary search to locate the left mark among all existing marks. Ditto right mark. Based on these 2 results, there are many cases.

  1. done — Case (simple) both fall into the same existing interval. No op
  2. done — case (simple) both fall into the same gap segment. Create 2 new segments and insert into the dlist
  3. done — case (simple) one boundary falls into a gap, the other falls into an adjacent interval — just adjust the segment_left_mark without inserting new segment
  4. done — case — bridge: both boundaries fall into different intervals. Adjust segment_left_mark of 2 affected segments, then link up the two to skip the intermediate segments
  5. done — case — wipeout: both boundaries fall into different gaps, wiping out at least 1 interval.
  6. done — case (most complex) — one falls into an interval, the other into a non-adjacent gap.
  7. case — incoming interval left boundary is lower than all boundaries, but right boundary falls into some segment
  8. case — incoming interval is very low
  9. case (special) — if an interval becomes adjacent to another, then merge the two.

Need a sorted tree of all marks + array of segments. Redundant but helpful.

Each segment (interval or gap) is represented by {left mark, right mark} where left <= right. I will save the segment objects into (a linked list and) an array. Even elements are interval objects and odd elements are gap objects. Now superseded by dlist.

I think this problem is all about corner cases. Perhaps start with the complex cases which will take care of the simpler cases. No need to pass Leetcode tests. Due to the pointer complexity, I prefer python.

https://github.com/tiger40490/repo1/blob/py1/py/linklist/insertInterval.py is my solution but I dare not test on Leetcode
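
For comparison, here is a short sketch of the well-known linear-scan approach. It is a different technique from the dlist design above and is not the linked solution. It assumes the intervals come as a sorted list of (start, end) tuples and that touching intervals should be merged; the function name is hypothetical.

```python
def insert_interval(intervals, new):
    """intervals: sorted, non-overlapping list of (start, end).
    Returns a new list with `new` inserted and overlaps merged."""
    lo, hi = new
    out = []
    i, n = 0, len(intervals)
    # 1) intervals entirely to the left of the newcomer are copied as-is
    while i < n and intervals[i][1] < lo:
        out.append(intervals[i])
        i += 1
    # 2) intervals that overlap (or touch) the newcomer are absorbed into it
    while i < n and intervals[i][0] <= hi:
        lo = min(lo, intervals[i][0])
        hi = max(hi, intervals[i][1])
        i += 1
    out.append((lo, hi))
    # 3) everything else lies entirely to the right
    out.extend(intervals[i:])
    return out


bridge = insert_interval([(1, 2), (3, 5), (6, 7), (8, 10), (12, 16)], (4, 8))
very_low = insert_interval([(5, 6)], (1, 2))
```

The bridge, wipeout and "very low" cases listed above all fall out of the same three steps, which is why this version needs no case analysis.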

pick java if you aspire 2be arch #py,c#

If you want to be architect, you need to pick some domains.

Compared to python.. c#.. cpp, Java appears to be the #1 best language overall for most enterprise applications.

  • Python performance limitations seem to require proprietary extensions. I rarely see pure python server that’s heavy-duty.
  • c# is less proven and less mature. More importantly, it doesn’t work well with the #1 platform, linux.
  • cpp is my 2nd pick. Some concerns:
    • much harder to find talent
    • Fewer open-source packages
    • java is one of the cleanest languages. cpp is a blue-collar language, rough around the edges and far more complex.

specify(by ip:port) multicast group to join

http://www.nmsl.cs.ucsb.edu/MulticastSocketsBook/ has zipped sample code showing

mc_addr.sin_port = htons(thePort); // port must be in network byte order

bind(sock, (struct sockaddr *) &mc_addr, sizeof(mc_addr)); // set the group port, not local port!
—-
mc_req.imr_multiaddr.s_addr = inet_addr("224.1.2.3");

setsockopt(sock, IPPROTO_IP, IP_ADD_MEMBERSHIP,
(void*) &mc_req, sizeof(mc_req)); // set the group IP by sending an IGMP join-request

Note setsockopt() actually sends a request! (IP_DROP_MEMBERSHIP is the matching option for leaving the group.)

====That’s for multicast receivers.  Multicast senders use a simpler procedure —

mc_addr.sin_addr.s_addr = inet_addr("224.1.2.3");
mc_addr.sin_port = htons(thePort);

sendto(sock, send_str, send_len, 0, (struct sockaddr *) &mc_addr, …
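
The same receiver and sender steps, sketched with Python's standard socket module. This is an illustration under assumptions, not the book's sample code: the port number is made up, and the receiver lets the kernel choose the local interface (0.0.0.0).

```python
import socket

GROUP = "224.1.2.3"
PORT = 5000          # hypothetical group port


def build_mreq(group_ip, local_ip="0.0.0.0"):
    """Pack the equivalent of struct ip_mreq: group address + local interface."""
    return socket.inet_aton(group_ip) + socket.inet_aton(local_ip)


def open_receiver(group_ip=GROUP, port=PORT):
    """Receiver: bind to the group port, then join the group by IP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))                      # the group port
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    build_mreq(group_ip))      # triggers the IGMP join
    return sock


def send_once(payload, group_ip=GROUP, port=PORT):
    """Sender: no join needed, just sendto() the group ip:port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        return sock.sendto(payload, (group_ip, port))
    finally:
        sock.close()
```

A receiver would call open_receiver() and then recvfrom() in a loop; a sender only needs send_once(b"hello").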

limit-IOC ^ market-IOC

Limit IOC (Immediate-or-Cancel): Can be used for FX Spot and CFD.

An instruction to fill as much of an order as possible within pre-defined tolerances of a limit price, immediately (5 second Time-to-Live).

Unlike Market IOC orders, Limit IOC orders allow a Client to control the maximum slippage that they are willing to accept.

Under normal market conditions a Market IOC order will be filled in full immediately. In the event that it isn’t, any residual amount will be cancelled. Price Tolerance cannot be added on a Market IOC order, meaning that a client cannot control slippage.

java: protected^package-private i.e.default

https://docs.oracle.com/javase/tutorial/java/javaOO/accesscontrol.html (java8) shows two nice tables:

  • There’s no more “private protected”
  • default access level is better known as “package-private”. It is strictly more restrictive than protected. (Protected is more like public.) The 2nd table shows that
    • a “package-private” member of Alpha is accessible by Beta (same package) only, whereas
    • a “protected” member of Alpha is accessible by Beta and Alphasub
    • Therefore, “neighbors are more trusted than children”

I find it hard to remember so here are some “sound bites”

  1. “protected” keyword only increases visibility never decreases it.
  2. So a protected field is more accessible than package-private (default) field
    • As an example, without “protected” label on my field1, my subclasses outside the package cannot see field1.
  3. same-package neighbors are local and trusted more than (overseas) children outside the package, possibly scattered over external jars

For a “protected” field1, a non-subclass in the same package can see it just as it can see a default-accessible field2

Not mentioned in the article, but when we say “class Beta can access a member x of Alpha”, it means that the compiler allows you to write, inside Beta methods, code that mentions x. It could be myAlpha.x or it could be Alpha.x for a static member.

fiber^thread, conceptually #shallow

I believe the concept of fiber is not standardized across languages. Here are some general observations

  • fibers are unknown to kernel. They are similar to userland threads that are implemented in userland thread libraries, rather than implemented by kernel and system calls.
  • like userland threads, a fiber adds less load on the kernel. See [[pthreads]]
  • diff: fibers are even more light-weight than threads
  • diff: fibers are usually short-lived, perhaps similar to tasks
  • diff: fibers have smaller stacks
  • a few languages (only heard about scala?) support millions of concurrent fibers in one OS. For threads, with an IO-heavy workload, you probably can run tens of thousands of threads on a single JVM.

IDE IV=easier4some !!4me

I guess many (majority ?) candidates consider IDE coding IV easier than white-board. I think they rely on IDE as a coding aid. Many programmers are not used to “demo” coding using white board … stressful, unnatural?

The “coding aid” feature helps me too, but it helps my competitors more !

If on white-board (or paper), my relative rating is A-, then on IDE i would score B

  • my relative weakness on IDE — ECT speed
  • my relative strength on white-board — clarity of thinking/explanation

design IV: telltale signs of disapproval

(Note a subtype of design interview is the SDI.) When interviewer asks, as described in https://www.susanjfowler.com/blog/2016/10/7/the-architecture-interview:

  • why you choose this approach (this design, this direction…)
  • what are the pros and cons of this approach
  • what alternatives there might be

It’s 80% a sign of disapproval.

If they are only curious, they would phrase it differently.

Remember the interviewer has a rather fixed idea of how this design should go. If you propose something unfamiliar to her, she can’t “lead” the interview with confidence. She risks losing control. Therefore, she has no choice but to steer you back to her familiar territory.

Most interviewers won’t want to admit that your idea is plausible but different from theirs.

y FIX needs seqNo over TCP seqNo

My friend Alan Shi said … Suppose your FIX process crashed or lost power, reloaded (from disk) the last sequence received and reconnected (resetting tcp seq#). It would then receive a live seq # higher than expected. This could mean some execution reports were missed. If the exchange notices a sequence gap, then it could mean some order cancellation was missed.  Both scenarios are serious and require a resend request. CME documentation states:

… a given system, upon detecting a higher than expected message sequence number from its counterparty, requests a range of ordered messages resent from the counterparty.

Major difference from TCP sequence number: FIX specifies no Ack, though some exchanges do. See Ack in FIX^TCP

Q: how could FIX miss messages given TCP reliability? I guess tcp session disconnect is the main reason.

https://kb.b2bits.com/display/B2BITS/Sequence+number+handling has details:

  • two streams of seq numbers, each controlled by exch vs trader
  • how to react to unexpected high/low values received. Note “my” outgoing seq is controlled by me hence never “unexpected”
  • Sequence number reset policy: after a logout, sequence numbers are supposed to reset to 1, but if the connection is terminated ‘non-gracefully’ sequence numbers will continue when the session is restored. In fact a lot of service providers (eg: Trax) never reset sequence numbers during the day. There are also some who reset sequence numbers once per week, regardless of logout.
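
A toy sketch of the receiving side's reaction to high/low sequence numbers. The class and method names are made up. A real FIX engine does more (PossDupFlag handling, persistence, sending the actual ResendRequest message), so treat this only as an illustration of gap detection.

```python
class InboundSeqTracker:
    """Tracks the next expected inbound sequence number for one FIX session."""

    def __init__(self, next_expected=1):
        # after a crash, next_expected would be reloaded from disk
        self.next_expected = next_expected

    def on_message(self, seq_no):
        """Return ('ok', None), ('gap', (begin, end)) or ('too_low', None)."""
        if seq_no == self.next_expected:
            self.next_expected += 1
            return ("ok", None)
        if seq_no > self.next_expected:
            # higher than expected: request a resend of the missed range
            return ("gap", (self.next_expected, seq_no - 1))
        # lower than expected: serious unless flagged as a possible duplicate
        return ("too_low", None)


tracker = InboundSeqTracker(next_expected=5)   # pretend 1..4 were seen before the crash
first = tracker.on_message(5)
after_reconnect = tracker.on_message(9)
```

Note that the tracker does not advance on a gap: the expected number stays put until the resent messages fill the hole.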

 

[10]y memory footprint is key to latency#JGC

see also post on large in-memory search

I suspect there’s a dilemma —

  • large heap allocation request -> free list search is harder and slower. Need to avoid ad-hoc/unplanned request for large chunks.
  • many small heap allocation requests -> free list mgr becomes hotspot
  • .. in reality, pre-allocating large arrays is probably a performance win.

I thought in low latency, time costs outweigh space costs, but no. Garbage collection is a major performance issue in low latency systems. I guess so is object creation. I guess that’s why memory efficiency affects latency. GC probably takes less cpu time if there’s less stuff to scan.

Distributed cache (memory virtualization) isn’t much used in low latency systems, possibly because

  • serialization
  • network latency
  • TCP? I feel the fastest HFT systems have very limited IO perhaps in the form of shared memory? FIX is based on TCP so I would assume very high overhead, but in reality, it may be a small overhead.

mlock() : low-level syscall to prevent paging ] real-time apps

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux_for_real_time/7/html/reference_guide/using_mlock_to_avoid_page_io has sample code

See also https://eklitzke.org/mlock-and-mlockall

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_MRG/1.3/html/Realtime_Reference_Guide/sect-Realtime_Reference_Guide-Memory_allocation-Using_mlock_to_avoid_memory_faults.html says

If the application is entering a time sensitive region of code, an mlockall call prior to entering, followed by munlockall can reduce paging while in the critical section. Similarly, mlock can be used on a data region that is relatively static or that will grow slowly but needs to be accessed without page faulting.
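
To get a feel for the call sequence without writing C, here is a sketch that calls mlock()/munlock() through ctypes. It assumes a POSIX system; the buffer size and function name are hypothetical. The kernel may refuse the lock (for example when RLIMIT_MEMLOCK is too low), so the sketch reports failure instead of assuming success.

```python
import ctypes
import ctypes.util

# Load the C library; if find_library returns None, CDLL(None) falls back
# to the symbols already loaded in this process.
libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
libc.mlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
libc.mlock.restype = ctypes.c_int
libc.munlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
libc.munlock.restype = ctypes.c_int


def with_locked_buffer(nbytes=4096):
    """Allocate a buffer, try to pin it in RAM, touch it, then unpin it.
    Returns (locked, errno)."""
    buf = ctypes.create_string_buffer(nbytes)   # stands in for a hot data region
    addr = ctypes.addressof(buf)
    if libc.mlock(addr, nbytes) != 0:
        return False, ctypes.get_errno()
    try:
        buf[0] = b"x"     # the time-sensitive work would touch the buffer here
    finally:
        libc.munlock(addr, nbytes)
    return True, 0


locked, err = with_locked_buffer()
```

In a real-time app the lock would be taken once at start-up and held, rather than taken and released around each access.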

4buffers in 1 TCP connection #full-duplex

Many socket programmers may not realize there are four transmission buffers in a single TCP connection or TCP session. These buffers are set aside by the kernel when the socket is created.

Two buffers in socket AA (say in U.S.) + two buffers in socket BB (say, in Singapore)

Two receive buffers + two send buffers

At any time, there could be data in all four buffers. AA can initiate a send in the middle of receiving data because TCP is a full-duplex protocol.

Whenever one of the two sockets initiates a send, it has a duty to ensure it never overflows the receiving buffer on the other end. This is the essence of flow-control. This flow-control relies on the 2 buffers involved (not the other two buffers). P 247 [[computer networking]] has details. Basically, the sender has an estimate of the remaining free space in the receive buffer so the sender never sends too many bytes. It keeps unsent data in its send buffer.

Is UDP full-duplex? My book doesn’t mention it. I think it is.
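
A quick way to see the four TCP buffers is to build one connection over loopback and ask the kernel for each buffer size. This is only a sketch: both ends live in one process here, whereas AA and BB in the text are on different continents. The function name is made up.

```python
import socket


def four_buffer_sizes():
    """Create one TCP connection over loopback and report all four buffer sizes."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))        # port 0: let the kernel pick a free port
    listener.listen(1)
    aa = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    aa.connect(listener.getsockname())
    bb, _ = listener.accept()
    try:
        return {
            "AA send": aa.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF),
            "AA recv": aa.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF),
            "BB send": bb.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF),
            "BB recv": bb.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF),
        }
    finally:
        for s in (aa, bb, listener):
            s.close()


sizes = four_buffer_sizes()
```

Flow control for data going from AA to BB involves only "AA send" and "BB recv"; the other pair serves the opposite direction.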

CHANNEL for multicast; TCP has Connection

In NYSE market data lingo, we say “multicast channel”.

  • analogy: TV channel — you can subscribe but can’t connect to it.
  • analogy: Twitter hashtag — you can follow it, but can’t connect to it.

“Multicast connectivity” is barely tolerable but not “connection”. A multicast end system joins or subscribes to a group. You can’t really “connect” to a group as there could be zero or a million different peer systems without a “ring leader” or a representative.

Even for unicast UDP, “connect” is the wrong word as UDP is connectionless.

Saying nonsense like “multicast connection” is an immediate giveaway that the speaker isn’t familiar with UDP or multicast.

dominant server-side language@WallSt: evolution

Don’t spend too much time.. Based on my limited observations,

  • As of 2007, the top dog was java.
  • The dominance is even stronger in 2018.
  • Q: how about in 10 years?
  • A: I feel java will remain #1

Look at the innovation leaders — West coast. For their (web) server side, they seem to have shifted slightly towards python, javascript, RoR

Q: Why do I consider buy-side, sell-side and other financial tech shops as a whole and why don’t I include google finance?
A: … because there’s mobility between sub-domains within, and entry barrier from outside.

Buy-side tends to use more c++; banks usually favor java; exchanges tend to use … both c++ and java. The latency advantage of c++ isn’t that significant to a major exchange like Nsdq.

 

%%FIX xp

Experience: special investment upfront-commission. Real time trading app. EMS. Supports
* cancel
* rebook
* redemption
* sellout

Experience: Eq volatile derivatives. Supports C=amend/correction, X=cancel, N=new

FIX over RV for market data dissemination and cache invalidation

Basic FIX client in MS coding interview.

addicted to c++for()loop@@

I find myself thinking in c++ while writing python for loops. Inefficient.

for i in range(0,len(mystrLiTup),1): # explicit is best
  mystrLiTup[i]...

What if you need to adjust the index “i” in the loop? A transitional construct (until we get rid of c++ thinking):

mystrLiTup='sentence. with. periods. end'
i=-1
while True:
  i+=1 # increment explicitly
  if i >= len(mystrLiTup): break
  print(mystrLiTup[i]) # print() call works in python 3
  if mystrLiTup[i] == '.':
    i+=1 # adjust the index: skip the char right after each period
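
Once the c++ thinking is gone, one possible replacement for the index juggling is to loop over an explicit iterator and call next() whenever an element should be skipped. This is a sketch of that idea; the function name is made up, and it collects the characters instead of printing them.

```python
def chars_skipping_after_period(text):
    """Visit each character; after a '.', skip exactly one character."""
    visited = []
    it = iter(text)
    for ch in it:
        visited.append(ch)
        if ch == '.':
            next(it, None)   # consume the following element; None avoids StopIteration at the end
    return visited


result = "".join(chars_skipping_after_period('a. b.c'))
```

This visits the same characters as the while-loop version, without any index variable.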

mystr[:-2] COPY_truncate last 2 chars #mystr[2:]

  • Python string (tuple too) is immutable, so mystr[:-2] returns a copy with last 2 chars truncated
  • Even for a mutable list, this slicing syntax returns a copy.
  • …. This seems to be the best syntax to truncate list, string and tuple.

See https://www.dotnetperls.com/slice-python

— how about

mystr[2:] # As expected, this clones then chops First 2 chars

— If you want to truncate a list in-place, use the q(del) keyword (not a function)

Syntax is easy for single-element deletion. Tricky for slice deletion

list tuple str comment
del immutable in-place truncate?
var[:-2] tested copy_truncate LAST 2
var[2:] tested copy_truncate FIRST 2
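
A small demo of the points above: slicing copies, while del with a slice truncates a list in place. The variable names are arbitrary.

```python
mystr = "abcdef"
head = mystr[:-2]        # copy without the LAST 2 chars
tail = mystr[2:]         # copy without the FIRST 2 chars

mytuple = (0, 1, 2, 3)
tuple_head = mytuple[:-2]

mylist = [0, 1, 2, 3, 4, 5]
list_copy = mylist[:-2]  # a new list; mylist itself is untouched
alias = mylist           # second name for the same list object
del mylist[-2:]          # in-place: drop the LAST 2 elements
del mylist[:2]           # in-place: drop the FIRST 2 elements
del mylist[0]            # single-element deletion

try:
    del mytuple[0]       # tuples (and strings) are immutable
    tuple_del_ok = True
except TypeError:
    tuple_del_ok = False
```

Because del works in place, the alias sees the truncation too, whereas the slice copy does not.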

malloc=long considered costly

Heap allocation is extremely slow compared to other operations.

 

c++GC interface

https://stackoverflow.com/questions/27728142/c11-what-is-its-gc-interface-and-how-to-implement

GC interface is partly designed to enable

  • reachability-based leak detectors
  • garbage collection

The probe program listed in the URL shows that as of 2019, all major compilers provide trivial support for GC.

Q: why does c++ need GC, given RAII and smart pointers?
A: system-managed automatic GC instead of manual deallocation, without smart pointers

c++QQ/zbs Expertise: I got some

As stated repeatedly, c++ is the most complicated and biggest language used in industry, at least in terms of syntax (tooManyVariations) and QQ topics. Well, I have impressed many expert interviewers on my core-c++ language insight.

That means I must have some expertise in c++ QQ topics. For my c++ zbs growth, see separate blog posts.

Note socket, shared mem … are c++ ecosystem, like OS libraries.

Deepak, Shanyou, Dilip .. are not necessarily stronger. They know some c++ sub-domains better, and I know some c++ sub-domains better, in both QQ and zbs.

–Now some of the topics to motivate myself to study

  • malloc and relatives … internals
  • enable_if
  • email discussion with CSY on temp obj
  • UDP functions

have used Briefly : sharedMem/lockfree/..

Just like my early learning curve in sockets, Dynamic Programming and swing, I have yet to achieve a breakthrough in these topics. There are too many topics and I don’t know what to focus on.

It’s important not to exaggerate your expertise in these areas. Once interviewers find out your exaggeration, subconsciously they would discount other parts of your resume.

  • c++ lock-free — “Used in my project but not written by me”
  • Shared mem
  • Boost::*
  • Epoll
  • Multiple inheritance
  • Python multiprocessing

UDP/TCP socket read buffer size: can be 256MB

For my UDP socket, I use 64MB.
For my TCP socket, I use 64MB too!

These are large values and required kernel tuning. In my linux server, /etc/sysctl.conf shows these permissible read buffer sizes:

net.core.rmem_max = 268435456 # —–> 256 MB
net.ipv4.tcp_rmem = 4096   10179648   268435456 # —–> 256 MB

Note a read buffer of any socket is always maintained by the kernel and can be shared across processes [1]. In my mind, the TCP/UDP code using these buffers is kernel code, like hotel service. Application code is like hotel guests.

[1] Process A will use its file descriptor 3 for this socket, while Process B will use its file descriptor 5 for this socket.
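
From application code, the request for a big read buffer is a setsockopt() call, and the kernel decides how much to grant. A sketch, assuming a UDP socket and the 64MB figure mentioned above; the function name is hypothetical. On Linux the granted value is capped by net.core.rmem_max, and getsockopt() reports double the requested size because the kernel reserves room for bookkeeping.

```python
import socket

WANTED = 64 * 1024 * 1024    # 64MB, as used for my sockets above


def request_rcvbuf(sock, nbytes):
    """Ask for a receive buffer of nbytes; return what the kernel reports back."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, nbytes)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
granted = request_rcvbuf(udp, WANTED)
udp.close()
```

If granted comes back far below the request, the sysctl limits are the first thing to check.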

multiple hits: lower_bound gives Earliest

Looking for lower_bound (2) in {0,1,2,2,3,4}, you get the earliest perfect hit among many, i.e. the left-most “2”.

No such complexity in upper_bound since upper_bound never returns the perfect hit.

No such complexity in set.lower_bound since it won’t hold duplicates.

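#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;

int main(){
  vector<int> s{0,1,2,2,3,4};
  vector<int>::iterator it = lower_bound(s.begin(), s.end(), 2); // left-most 2, at index 2
  cout<<"to my left: "<<*(it-1)<<endl; // 1
  cout<<"to my right: "<<*(it+1)<<endl; // 2, the other perfect hit
  cout<<"to my right's right: "<<*(it+2)<<endl; // 3
}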
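
The same behavior can be checked in Python, where bisect_left and bisect_right play roughly the roles of lower_bound and upper_bound. A sketch on the same data:

```python
import bisect

s = [0, 1, 2, 2, 3, 4]
lo = bisect.bisect_left(s, 2)    # index of the left-most 2
hi = bisect.bisect_right(s, 2)   # index just past the right-most 2
hits = s[lo:hi]                  # all the perfect hits
```

As with upper_bound, bisect_right never lands on a perfect hit: s[hi] is the first element greater than 2.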

socket accept() key points often missed

I have studied accept() many times but still unfamiliar.

Useful as zbs, and perhaps QQ, rarely for GTD…

Based on P95-97 [[tcp/ip socket in C]]

  • used in tcp only
  • used on server side only
  • usually called inside an endless loop
  • blocks most of the time, when there are no new incoming connections. The existing clients don’t bother us as they communicate with the “child” sockets independently. The accept() “show” starts only upon a new incoming connection
    • thread remains blocked, from the arrival of the incoming connection request until a newborn socket is fully Established.
    • at that juncture the new remote client is probably connected to the newborn socket, so the “parent thread[2]” has the opportunity/license to let-go and return from accept()
    • now, parent thread has the newborn socket, it needs to pass it to a child thread/process
    • after that, parent thread can go back into another blocking accept()
  • new born or other child sockets all share the same local port, not some random high port! Until now I still find this unbelievable. https://stackoverflow.com/questions/489036/how-does-the-socket-api-accept-function-work confirms it.
  • On a host with a single IP, 2 sister sockets would share the same local ip too, but luckily each socket structure has at least 4 [1] identifier keys — local ip:port / remote ip:port. So our 2 sister sockets are never identical twins.
  • [1] I omitted a 5th key — protocol as it’s a distraction from the key point.
  • [2] 2 variations — parent Thread or parent Process.
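
The "same local port" claim is easy to verify over loopback. The sketch below does one accept() in a single process (no endless loop, no child thread) and returns the addresses involved; the function name is made up.

```python
import socket


def accept_one():
    """Accept one loopback connection and return the addresses involved."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))          # port 0: kernel picks a free port
    listener.listen(5)
    client = socket.create_connection(listener.getsockname())
    child, _peer = listener.accept()         # returns once the newborn socket is established
    try:
        return (listener.getsockname(),      # parent socket: local ip:port
                child.getsockname(),         # newborn socket: local ip:port
                child.getpeername(),         # newborn socket: remote ip:port
                client.getsockname())        # client's own ip:port
    finally:
        for s in (child, client, listener):
            s.close()


listener_addr, child_local, child_peer, client_local = accept_one()
```

The parent and the newborn socket report the same local ip:port; only the remote ip:port tells sister sockets apart.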

addiction2low-level hacking:keep doing; no shame

Update: low-level hacking is generally easier in c++ than java.

When I become interested in a tech topic, I often throw cold water over my head — “This is such a /juvenile/, albeit productive and wholesome, hobby. Look at ex-classmates/colleagues so and so, with their business wing. They deal with business strategies. My tech stuff is so low-level and boring compared to what they deal with.”

Damaging, harmful, irrational, demoralizing SMS! Get Real, Man! Let’s assess our own situation

  • A) On one hand, I need to avoid spending too much time becoming expert in some low-leverage or high-churn technology (php? XML? ASP?).
  • B) On the other hand, enthusiasm and keen interest are hard to get and extremely valuable. They could be the catalyst that grows my zbs and transforms me into a veteran over a short few years. Even with this enthusiasm and depth of interest, such a quick ascent is not easy and not likely. Without them, it’s simply impossible.

Case: grandpa. His research domain(s) is considered unglamorous 冷门 but he is dedicated and passionate about it. He knows that in the same Academy of social sciences, economics, geopolitics and some other fields are more important. He often feels outside the spotlight (kind of sidelined but for valid reasons). That is a fact which had a huge impact on my own choice of specialization. But once he decided to dig in and invest his whole life, he needed to deal with that fact and not let it affect his motivation and self-image. As a senior leader of these unglamorous research communities, he has to motivate the younger researchers.

Case: Greg Racioppo, my recruiter, treats his work as his own business. The successful recruiters are often in the same business for many years and make a long term living and even create an impact for their employees (and people like me). They could easily feel “boring” compared to the clients or the candidates, but they don’t have to.

Case: PWM wealth advisors. They could feel “boring” compared to the filthy rich clients they deal with, but in reality, these advisors are more successful than 99% of the population.

Case: The ratio of support staff to traders is about 50:1, but I don’t feel “boring” because of them.

Case: Look at all the staff in a show, movie, supporting the stars.

JGC duration=100→10ms ] “Z-GC” #frequency

This blog has many posts on JGC overhead.

  • For low-Latency JVM, duration (pause time) outweighs frequency
  • For mainstream JVM, overhead + throughput outweighs duration

–frequency

Could be Every 10 sec , as documented in my blogpost

–stop-the-world duration:

A 100 ms pause is probably good enough for most apps but too long for latency sensitive apps, according to my blogpost.

For a 32GB JVM in a latency-sensitive Barclays system, worst long pause == 300ms.

The new Z-GC features GC pause times below 10ms on multi-terabyte heaps. This is cutting-edge low pause.

blockchain #phrasebook

A blockchain is a peer-to-peer network that timestamps records by hashing them into an ongoing chain of hash-based proof-of-work, forming a record that cannot be changed without redoing the proof-of-work.

In contrast, a distributed ledger is a peer-to-peer network that uses a defined consensus mechanism to prevent modification of an ordered series of time-stamped records. All blockchains are distributed ledgers, but not all distributed ledgers are blockchains.

Keywords:

  • Peer-to-peer — no central single-point-of-failure
  • Immutable — records of past transactions
  • Ever-growing — the chain keeps growing and never shrinks. Is there some capacity issue in terms of storage, backup, search performance?
  • Double-spend — is a common error to be prevented by blockchain
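
To make "cannot be changed without redoing the proof-of-work" concrete, here is a toy hash chain. It is a sketch only: the block layout, the difficulty (2 leading hex zeros) and all names are invented, and there is no peer-to-peer network or consensus in it.

```python
import hashlib

GENESIS = "0" * 64


def block_hash(prev_hash, record, nonce):
    return hashlib.sha256(f"{prev_hash}|{record}|{nonce}".encode()).hexdigest()


def mine(prev_hash, record, difficulty=2):
    """Proof-of-work: search for a nonce whose hash starts with enough zeros."""
    nonce = 0
    while True:
        digest = block_hash(prev_hash, record, nonce)
        if digest.startswith("0" * difficulty):
            return nonce, digest
        nonce += 1


def build_chain(records, difficulty=2):
    chain, prev = [], GENESIS
    for record in records:
        nonce, digest = mine(prev, record, difficulty)
        chain.append({"record": record, "prev": prev, "nonce": nonce, "hash": digest})
        prev = digest          # each block commits to the one before it
    return chain


def is_valid(chain, difficulty=2):
    prev = GENESIS
    for block in chain:
        digest = block_hash(block["prev"], block["record"], block["nonce"])
        if (block["prev"] != prev or digest != block["hash"]
                or not digest.startswith("0" * difficulty)):
            return False
        prev = block["hash"]
    return True


chain = build_chain(["A pays B 5", "B pays C 2", "C pays A 1"])
valid_before = is_valid(chain)
chain[1]["record"] = "B pays C 200"     # tamper with a past record
valid_after = is_valid(chain)
```

To make the tampered chain valid again, the attacker would have to re-mine block 1 and every block after it, which is the immutability argument in miniature.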

##google-searchable dev technologies(!!Qz):lower stress

This topic is Rather important to my stress level, my available time for learning, my available time for family….

For example, with Quartz, I must ask around and experiment (millions of times). Slow to build a coherent understanding. Slow ramp-up. (In contrast, with Python I could do that by reading good articles online.) So my productivity lag/gap remains even after a few years.

Other Negative examples – Tibrv, Autosys, Accurev, Less-known boost libraries,..

MSVS? Most of the search results are about c#, so it’s somewhat harder to solve problems.

Eclipse CDT? Most of the search results are about Eclipse java.

Positive examples – vbscript, DOS batch,

Yet, this stressor is mild compared to “performance warnings”.

linker dislikes [non-generic]function definition in shared header

I used to feel header files are optional so we can make do without them if they get in our way. This post shows they aren’t optional in any non-trivial c++ project. There is often only one (or few) correct way to structure the header vs implementation files. You can’t make do without them.

Suppose MyHeader.h is included in 2 cpp files and they are linked to create an executable.

A class definition is permitted in MyHeader.h:

class Test89{
void test123(){}
};

However, if the test123() is a free function, then linker will fail with “multiple definition” of this function when linking the two object files.

http://stackoverflow.com/questions/29526585/why-defining-classes-in-header-files-works-but-not-functions explains the rules

  • repeated definition of function (multiple files including the same header) must be inlined
  • repeated class definition (in a shared header) is permitted for a valid reason (sizing…). Since programmers could not only declare but define a member function in such a class, in a header, the compiler silently treats such member functions as inline

c++parse DateTime using stringstream #no boost

This is the simplest way I have found.

#include <ctime>
#include <iomanip>
#include <iostream>
#include <sstream>
using namespace std;

//without Boost, parsing string to DateTime and back
// from http://arsenmk.blogspot.sg/2014/07/converting-string-to-datetime-and-vice.html
int main(){
 stringstream ss{ "1970-01-01 8:00:01" };
 tm simpleStruct{}; //zero-initialized placeholder on stack, so no field holds rubbish
 //parse and output to the placeholder
 ss >> get_time(&simpleStruct, "%Y-%m-%d %H:%M:%S");
 if (ss.fail()) { //get_time sets failbit on a format mismatch
 cout << "parsing failed. (Very strict.)" << endl;
 return -1;
 }
 simpleStruct.tm_isdst = -1; //let mktime work out daylight saving
 time_t secSinceEpoch = mktime(&simpleStruct); //interprets the struct as local time
 cout << secSinceEpoch <<" seconds since Epoch (1970/1/1 midnight GMT) is -> ";
 cout << asctime(localtime(&secSinceEpoch));
}

ask`lower base2reduce mgr expectation@@ No

This plan didn’t work out at Macq. Expectation was still too high.

The logic is, if my coworkers get total comp 200k and I ask only 160k, then I’m more likely to get some bonus. Even if I underperform them, I would still hit somewhere below 200k.

Now I think if I qualify to stay, then there will be some bonus even if my base is, say 190k. Hiring managers would not agree to a 200k base and run the risk of paying a doughnut bonus to a qualified employee.

## tips: quickly get into shape4algo IV

Q: how many days (immersion) do you need to get back to shape for coding interview?
A: a week to 2 months

This question becomes more relevant when you realize your proficiency is way off your peak level achieved x years ago.

xp: I was full time job hunting after layoff from BofA. At the same time Ashish was also trying. He took a few months to get into shape.

Note “shape” doesn’t involve advanced skills. It’s all about basic know-how. Therefore, this goal is a relatively low-hanging fruit and within my reach, so I feel motivated. I feel interview preparation (including coding) is my favorite sport. I feel like a prize fighter. When I prepare for this, I never feel I’m spinning my wheels as I do in other endeavors. My spare time utilization is highest in this area. In this area, I can more easily sink my teeth in, engage, and dig in my heels (as in tug-of-war).

  1. review my eclipse IV code in java and c++. Real code supposed to help memory more.
    1. review the tricky algo in my blog
  2. books like [[Elements of Programming Interview]]
    1. careercup? less ideal than EPI
  3. paste everywhere tiny stickers with Q&A? less important for pure algo quizzes
  4. self-tests like gaokao

 

##thread cancellation techniques: java #pthread,c#

Cancellation is required when you decide a target thread should be told to give up halfway. Cancellation is a practical technique, too advanced for most IV.

Note in both java and c#, cancellation is cooperative. The requester (on its own thread) can’t force the target thread to stop.

C# has comprehensive support for thread cancellation (CancellationToken etc). Pthreads also offer a cancellation feature. Java uses a number of simpler constructs, described concisely in [[thinking in java]]. Doug Lea discussed cancellation in his book.

Here are the java techniques

  • interrupt
  • loop polling – the preferred method if your design permits.
  • thread pool shutdown, which calls thread1.interrupt(), thread2.interrupt() …
  • Future — myFuture.cancel(true) can call underlyingThread.interrupt()

Some blocking conditions are clearly interruptible — indicated by the compulsory try block surrounding the wait() and sleep(). Other blocking conditions are immune to interrupt.

NIO is interruptible but the traditional I/O isn’t.

The new Lock objects supports lockInterruptibly(), but the traditional synchronized() lock grab is immune to interrupt.
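
The loop-polling technique is language-neutral, so here is a sketch in Python rather than java. A threading.Event stands in for the volatile flag or interrupt status a java thread would poll; all names are made up. The point to notice is that the requester only sets the flag, and the target thread stops itself at its next check.

```python
import threading
import time


def worker(stop_requested, progress):
    """Target thread: does small units of work and polls the flag in between."""
    while not stop_requested.is_set():
        progress.append(1)            # one unit of (pretend) work
        stop_requested.wait(0.01)     # sleeps, but wakes early if cancellation arrives


def run_and_cancel():
    stop_requested = threading.Event()
    progress = []
    target = threading.Thread(target=worker, args=(stop_requested, progress))
    target.start()
    time.sleep(0.05)                  # let the target run for a while
    stop_requested.set()              # the cancellation request: cooperative, not forced
    target.join(timeout=2)
    return target.is_alive(), len(progress)


still_alive, units_done = run_and_cancel()
```

If the worker were stuck in a blocking call that never checks the flag, the request would simply go unnoticed, which is the "immune to interrupt" problem described above.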

## Y in sg(^U.S.)u can’t be developer till 65,succinctly

In the US, at 65 you could work as a developer. (Actually that’s not the mainstream for most immigrant techies. What do they do? Should ask Ed? Anirudh? Liu Shuo, ZR…)

Why is SG different? Here’s my answer, echoing my earlier posts.

  1. US culture (job market, managers…) has a tradition of being more open to older techies
  2. US culture respects technologists. Main street techies get paid significantly higher than SG main street techies
  3. high-end (typical VP-level) technical work – more common to get in the US than SG, partly because wage premium is smaller, like 100k -> 150k

socket stats monitoring tools – on-line resources

This is a rare interview question, perhaps asked 1 or 2 times. I don’t want to overspend.

In ICE RTS, we use built-in statistics modules written in C++ to collect the throughput statistics.

If you don’t have source code to modify, I guess you need to rely on standard tools.

[14]JGC tuning: will these help@@

Q: have a small nursery generation, so as to increase the frequency of GC collections?

AA: An optimal nursery size for maximum application throughput is such that as many objects as possible are garbage collected by young collection rather than old collection. This value approximates to about half of the free heap.

Q: have maximum heap size exceeding RAM capacity.
A: 32-bit JVM won’t let you specify more than 4G even with 32 GB RAM. With a 64-bit JVM, the JVM would start, would likely use up all available RAM and would then start paging.

which socket/port is hijacking bandwidth

I guess some HFT machine might be dedicated to one (or few) process, but in general, multiple applications often share one host. A low latency system may actually prefer this, due to the shared memory messaging advantage.  In such a set-up, It’s extremely useful to pinpoint exactly which process, which socket, which network port is responsible for high bandwidth usage.

Solaris 10? Using Dtrace? tough? See [[solaris performance and tools]]

Linux? doable

# use iptraf to see how much traffic flowing through a given network interface.
# given a specific network interface, use iptraf to see the traffic break down by individual ports. If you don’t believe it, [[optimizing linux perf ]] P202 has an iptraf screenshot showing the per-port volumes
# given a specific port, use netstat or lsof to see the process PID using that port.
# given a PID, use strace and /proc/[pid]/fd to drill down to the socket (among many) responsible for the traffic. Socket is seldom shared (see other posts) between processes. I believe strace/ltrace can also reveal which user functions make those socket system calls.

"uninitialized" is usually either a pointer||primitive type

See also c++ uninitialized “static” objects ^ stackVar

1) uninitialized variable of primitive types — contains rubbish
2) uninitialized pointer — very dangerous.

We are treating rubbish as an address! This address may happen to be Inside or Outside this process’s address space.

Read/write on this de-referenced pointer can lead to crashes. See P161 [[understanding and using C pointers]].

There are third-party tools to help identify uninitialized pointers. I think it’s by source code analysis. If function3 receives an uninitialized pointer it would look completely normal to the compiler or runtime.

3) uninitialized class instance? Possible. Every class instance in c++ will have its memory layout well defined, though a field therein may fall into category 1) or 2) above.

Ashish confirmed that a pointer field of a class will be uninitialized by default, unless a ctor sets it or the instance is value-initialized or has static storage.

4) uninitialized array of pointers could hold wild pointers

5) I think POD class instances can also show up uninitialized. See https://stackoverflow.com/questions/4674332/declaring-a-const-instance-of-a-class

[13] mutable data types: python ilt java #dictKey

As illustrated in python object^variable, Type, Value, immutable, initialize.. and P29 [[python essential ref]], a simple python int variable like Age is implicitly a pointer to a reference-counted, IMmutable pointee object. (Not really copy-on-write: “modifying” the variable simply re-binds it to a different object.) That raises the question ..

Q: so, how do 2 variables share a Mutable object?
A: Use instance methods like mutablePerson.setAge()???
A: Lists and dictionaries also offer versatile “shared mutable objects”. listofMutables[0] would return a reference to the first element.

I feel these 2 answers cover 80% of the use cases.

In summary,
– “scalar” or primitive type Variables point to immutable Objects. Example — Everyday strings and numbers
– most “composite” objects are mutable, such as dict, list and user-defined objects
– between the 2 well-understood categories, there exist some special data types
** tuples are composite but immutable
** method objects?
** class objects?
** modules?

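Incidentally, (in a bold departure from java/c#[2]) the built-in types usable as dictionary keys[1] are the immutables (string, tuple, “myInt” variables). Strictly, the rule is hashability: lists and dictionaries are Mutable and unhashable, therefore disqualified, whereas user-defined objects are hashable by identity by default and remain usable unless the class defines __eq__ without __hash__. A “fake” immutable such as a tuple containing a list is also disqualified (just try it). In real projects, we only use strings and ints as keys.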

[1] the underlying Object must be immutable, even though the variables can re-bind.
[2] c# expects but does not require keys to be immutable
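
To make the two answers and the dictionary-key rule concrete, here is a small sketch. The Person class and the helper usable_as_key are hypothetical names for illustration only; the helper simply asks whether hash() accepts the object, which is what a dict does with a key.

```python
class Person:
    def __init__(self, age):
        self.age = age

    def set_age(self, age):
        self.age = age

# Two variables bound to one mutable object: a change via either is seen by both.
alice = Person(30)
alias = alice
alias.set_age(31)
shared_age = alice.age                 # both names see the new age

# Re-binding an int variable never touches the old int object.
a = 5
b = a
a += 1
ints_after_rebind = (a, b)             # a moved to a new object, b did not

# A list element is a reference to the same object, not a copy.
people = [alice]
people[0].set_age(32)
age_via_list = alice.age

def usable_as_key(obj):
    """True if obj can serve as a dict key, i.e. hash(obj) succeeds."""
    try:
        hash(obj)
        return True
    except TypeError:
        return False

key_check = {
    "str": usable_as_key("IBM"),
    "int": usable_as_key(42),
    "tuple of ints": usable_as_key((1, 2)),
    "list": usable_as_key([1, 2]),
    "dict": usable_as_key({}),
    "tuple holding a list": usable_as_key((1, [2])),
}
```

The first three entries of key_check come out True and the last three False, so the tuple holding a list fails exactly like a bare list.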

##[12] some c++(mostly IV)questions

Q: what if I declare but not implement my dtor?
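A: Linker will break as soon as some code destroys an instance, since the dtor is then needed. By contrast, a private copy ctor is often deliberately left that way.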
Q: how to access a c++ api across network
A: corba?
Q: default behavior of the q(less) class-template
A: I have at least 2 posts on this. Basically it calls operator< in the target type (say Trade class). If missing, then compiler fails.
Q: how do you detect memory leak?
A: valgrind
%%A: customize op-new and op-delete to maintain counters?
Q: in java, i can bind a template type param — bound params. c++?
%%A: no (not before c++20 concepts), but I have a blog post on the alternative solutions
Q891: are assignment op usually marked const?
%%A: no it must modify the content.
Q891a: what if i override the default one with a non-const op=? will the default one still exist?
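A: No. Declaring any copy-assignment op (parameter type X, X& or const X&) suppresses the compiler-generated one, so only yours exists.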
Q: do people pass std::strings by ref? See Absolute C++
%%A: yes they should
Q: unwrap a null ptr
A: undefined behavior. Often termination.
Q: delete a null ptr
A: C++ guarantees that operator delete checks its argument for null-ness. If the argument is 0, the delete expression has no effect.
Q: comparing null pointers?
A: In C and C++ programming, two null pointers are guaranteed to compare equal
Q: use a ref to a reclaimed address?
A: Rare IV… undefined. May crash.
Q: can i declare a Vector to hold both Integers and Floats?
A: a vector of values would slice. To get polymorphism, store pointers (a vector can’t hold references).
Q: what op must NOT be overloaded as member func?
A: rare IV…… qq(.) is one of five.
Q: Can i declare an Insect variable to hold an Ant object, without using pointers? “Ant a; Insect i = a;”?
A: see post on slicing
Q: Without STL or malloc (both using heap), how does C handle large data structures.
%%A: I used to deal with large amounts of data without malloc. Global var, linked data structure, arrays
%%A: I can have a large array in main() or as a static variable, then use it as a “heap”

RAM insufficient@@ telltale signs #scanRate

Scan Rate is the most important indicator. There’s a threshold to look for. It indicates “number of pages per second scanned by the page stealing daemon” in unix but not windows. http://www.dba-oracle.com/oracle10g_tuning/t_server_ram.htm is concise

Paging (due to insufficient Physical memory) is another important indicator, correlated with Scan Rate. There’s a threshold to look for.

dv01 ^ duration – software algorithm

Q: Do dv01 and duration present the same level of software complexity? Note most bonds I deal with have embedded options.

I feel answer is no. dv01 is “simulated” with a small (25 bps?) bump to yield… Eff Duration involves complex OAS. See the Yield Book publication on Durations.
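
The “bump” idea for dv01 can be sketched as below. This is a simplified illustration that assumes an option-free fixed-coupon bond priced off a flat yield. For the bonds with embedded options mentioned in the question, the repricing step would need an OAS model, which is not shown. The function names and the central-difference choice are my own assumptions, not how any particular risk system does it.

```python
def bond_price(face, coupon_rate, yield_rate, years, freq=2):
    """Price of an option-free fixed-coupon bond, discounted at a flat yield."""
    n = int(round(years * freq))          # number of coupon periods
    coupon = face * coupon_rate / freq    # coupon paid each period
    y = yield_rate / freq                 # yield per period
    pv_coupons = sum(coupon / (1 + y) ** k for k in range(1, n + 1))
    return pv_coupons + face / (1 + y) ** n

def dv01_by_bump(face, coupon_rate, yield_rate, years, bump_bps=1.0, freq=2):
    """Price change per 1 bp of yield, from an up-bump and a down-bump."""
    bump = bump_bps / 10000.0
    p_down = bond_price(face, coupon_rate, yield_rate - bump, years, freq)
    p_up = bond_price(face, coupon_rate, yield_rate + bump, years, freq)
    return (p_down - p_up) / (2 * bump_bps)
```

For a 10-year bond with coupon and yield both at 545 bps/year, dv01_by_bump(100, 0.0545, 0.0545, 10) and the same call with bump_bps=25 give nearly the same number, so the bump size matters little for a plain bond.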

In AutoReo, eff duration is computed in a separate risk system — a batch system… No real time update.

By contrast, eq option (FX option probably similar) positions need to have their delta and other sensitivities updated more frequently.

self-management: Part 2: demonstrate@ CV+interview

Tip: See blog on [[vague vs normal vs specific answers]], but in this case, be specific if you want to demonstrate these qualities.

Tip: you don’t need to prove anything if you already have convincing team-lead experience on your CV.
Tip: Tell stories
Tip: show your familiarity with the typical front office culture.
Tip: demonstrate you know how to work with QA team – give them the docs on build/deploy, DB deployment
Tip: demonstrate your documentation best practices – wiki (GS), jira (GS, ML), cheatsheet/runbook

Tip: Humor? If you are 100% sure about side effects, then I think it can help demonstrate communication effectiveness, but I feel a lot of good self-running developers aren’t humorous.

bond duration(n KeyRateDuration) #learning notes 2

Jargon warning: yield is best written in bps/year, like 545bps/year. If you say 5.45% it gets ambiguous in some contexts such as modified duration. “1% rise in yield” could mean 2 things

– 5.45% --> 6.45% (an absolute rise of 100 bps), the intended meaning
– 5.45% -x-> 5.50% (a relative rise of 1%) is a misunderstanding

This is not academic; this is real. Portfolio sensitivity to yield fluctuations is a key concern of banks on Wall St or Main St. It’s all about x bps change in yield. (From now on, always use bps to describe yield; avoid percentage.)

DV01 is dollar value of a “basis point”, free of any ambiguity.

DV01 and modified duration are 2 of the most widely used bond math numbers. Both are derived from bond cash flow.

Mac duration — definition — weighted average of wait time for the cash flows.
Mac duration — usage — not much in real world trading

Modified duration — definition — Mac duration modified “slightly”, by a tiny factor: REDUCED by a factor of (1+r), i.e. divided by (1+r), where r is the yield per compounding period
Modified duration — usage — more useful than Mac duration. It measures price sensitivity to a yield shift, on a given bond.

For a simple example, take a bond with a modified duration of 5 years. A 100 bps yield change results in roughly a 5% dollar price change, in the opposite direction.
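
The definitions above can be checked with a few lines of arithmetic. This is a sketch assuming an option-free fixed-coupon bond, a flat yield and semi-annual compounding by default. The function names are hypothetical. Mac duration is computed as the PV-weighted average wait time, modified duration divides it by (1 + yield per period), and the price estimate is the first-order rule only (no convexity).

```python
def cash_flows(face, coupon_rate, years, freq=2):
    """(time_in_years, amount) pairs; the last flow includes the redemption."""
    n = int(round(years * freq))
    coupon = face * coupon_rate / freq
    flows = [(k / freq, coupon) for k in range(1, n + 1)]
    last_time, last_amount = flows[-1]
    flows[-1] = (last_time, last_amount + face)
    return flows

def durations(face, coupon_rate, yield_rate, years, freq=2):
    """Return (price, macaulay_duration, modified_duration)."""
    y = yield_rate / freq                                  # yield per period
    pvs = [(t, amount / (1 + y) ** (t * freq))
           for t, amount in cash_flows(face, coupon_rate, years, freq)]
    price = sum(pv for _, pv in pvs)
    mac = sum(t * pv for t, pv in pvs) / price             # weighted wait time
    mod = mac / (1 + y)                                    # the "slight" modification
    return price, mac, mod

def estimated_pct_change(mod_duration, shift_bps):
    """First-order price change (as a fraction) for a yield shift in bps."""
    return -mod_duration * shift_bps / 10000.0
```

estimated_pct_change(5.0, 100) reproduces the 5% rule of thumb (as -0.05). Repricing via durations() at the shifted yield shows a somewhat smaller loss than the estimate for a 100 bps rise, because the estimate ignores convexity.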

Key Rate Duration is a natural (and intuitive) extension of the duration concept, useful in MBS etc.

AtomicReference = a global lookup table

As briefly mentioned in another post, an AtomicReference object is (serious!) a global [1] lookup table.

Example 1: Imagine a lookup service to return the last trade on NYSE. If 2 trades execute in the same clock cycle (rare in a GHz server), then return a Collection.

This is a lookup table with just one default key and a value which is an object’s address. A simple example, but we may not need CAS or AtomicReference here, as the threads just overwrite each other.

Example 2: say a high frequency trading account must maintain accurate cash balance and can’t afford to lose an update, because each cash balance change is large. CashBalance isn’t a float (there’s no AtomicFloat anyway) but an object.

This is also a single-entry lookup table. Now how do you update it lock-free? CAS and AtomicReference.

[1] In reality, the AtomicReference is not globally accessible. You need to get a reference to it, but it’s better to ignore this minor point and focus on the key concept.

named pipe simple eg #unix socket

$ mkfifo --mode=0666 /tmp/myfifo # create a named pipe, with a file name

$ cat /etc/passwd > /tmp/myfifo # writer will block until some reader comes up on the receiving end.

Now open another terminal

$ cat < /tmp/myfifo # or

$ tail -f /tmp/myfifo

http://developers.sun.com/solaris/articles/named_pipes.html shows pros and cons. http://www.linuxjournal.com/article/2156 is a simple tutorial.

Motivation? Allow totally unrelated programs to communicate with each other

A side note to be elaborated in another blog: a named pipe is a FIFO stream, whereas a unix domain socket can be datagram or stream (like TCP)
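
The same writer-blocks-until-reader behaviour can be reproduced from a program. Below is a minimal sketch using os.mkfifo from the Python standard library (Unix only). To keep it self-contained the writer is a thread in the same process, whereas in real use the two ends would be unrelated programs. The function name fifo_round_trip and the temporary path are hypothetical.

```python
import os
import tempfile
import threading

def fifo_round_trip(message):
    """Send message through a named pipe from a writer thread to this thread."""
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "myfifo")
    os.mkfifo(path, 0o666)              # same idea as the mkfifo command

    def writer():
        # open() blocks here until some reader opens the other end
        with open(path, "w") as pipe:
            pipe.write(message)

    thread = threading.Thread(target=writer)
    thread.start()
    try:
        with open(path) as pipe:        # the reader; this unblocks the writer
            received = pipe.read()      # returns at EOF, once the writer closes
        thread.join()
    finally:
        os.remove(path)                 # a fifo is removed like a regular file
        os.rmdir(workdir)
    return received
```

Calling fifo_round_trip("hello\n") returns the same string after it has travelled through the pipe.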

initialize array-of-pointer to all nulls

Suggestion 1: LimitOrder* orders[howManyOrders] = {};

Suggestion 0: P37 [[c++ coding standards]] suggests “.. = {NULL}“, consistent with P115 [[c++primer]].

See also http://bigblog.tanbin.com/2012/07/default-initializevalue-initialize-new.html

See also http://www.informit.com/articles/article.aspx?p=1852519 on c++11 array-init:

For an array (on stack or heap), an empty pair of braces indicates default initialization. Default initialization of POD types usually means initialization to binary zeros, whereas for non-POD types default initialization means default construction

[09] insightful article: managing tight deadlines

http://fh.rolia.net/f0/c1050/hit/post/6237273.html (removed?) + my comments

“A job usually involves many phases: investigation, brainstorming, design, implementation, testing and documentation. A quality job requires effort and time in every phase. However, when push comes to shove, many of the phases can be skipped. The problems will show up much later. By that time, nobody would care who’s to blame. And companies are more than willing to budget for these problems, in the form of increased support, more bug fixes, or even a brand-new system. You just have to be WILLING and ABLE to produce imperfect software.”

“The second important thing to managing work load is that you have to be the master of your domain, not your boss. This means you don’t tell your boss everything. And you make a lot of decisions yourself. Otherwise, you lose control.” — My GS team lead knows too much about my project. I tell him everything about my project.

“It starts from estimates. You know better than anyone else how long each piece will take. A hard piece to others might be easy for you. But a simple task might end up taking a lot of your time. Don’t tell your boss that you’ve worked on something before and can borrow a lot of code from previous projects.”

“The same applies in the middle of your project. A seemingly complicated piece could turn out to be smooth sailing. Yet a small issue could bog you down for many hours. Again, don’t tell your boss you finished something in an hour which was budgeted for half a day. But do tell him that a bug from another team cost you many hours unexpectedly.”

“What do you do when you see something wrong in the requirement? Or something wrong with other people’s work which you depend on? If you’re pressed for time, act as if you didn’t see them. Act like a fool. You may be punished for missing your own deadline, but you’re unlikely to be punished for not spotting other people’s mistakes.” — By not reporting the issues early, project will suffer but you will not, but avoid making your boss look bad — try to give him a scapegoat. Project will suffer — project will need more time, and the reason is other people’s mistakes. You expand the impact of their mistakes to get more time for yourself. Conversely, when your mistake affects them, they might do the same.

“What do you do when deadline approaches and you discovered a big hole in your work? Again, act like a fool. Act as if you didn’t see them. You hand in your work. And a week later, people would find issues. But that’s normal. Nobody’s perfect. You and your boss get punished for missing deadline; but neither(???) of you would be held responsible for (non-critical?) bugs. Rather you will be given new budget to fix things, probably after a relaxing break in the sun.”

“Every decision you make affects your schedule. Be flexible. Be creative. Be able to accept imperfections. Be a liar if need be. The important thing is to look good, not to be good. Image is everything. And you can cut a lot of corners without affecting your image.”

— GS Slogan says “Tell your manager the bad news early”. If it’s your mistake, then decide if he will find out you were hiding it. Some managers periodically ask “Any problem?” Often he can’t be sure you were knowingly hiding problems — you could act like a fool.

— There are different bugs and different levels of impact. Manager may say some functionality is important or some time line is important, but question them in your head.  It takes a lot of observations to figure out which ones are really important to your manager’s bonus. Many delays and missing features are manageable or expected.

— My GS team peers know what bugs are tolerable to the manager. If manager must explain it to users and other teams, then you know it’s a visible bug. This knowledge comes from experience. However, initially you need to do some quality work to win trust.

— In fact, the best GS (and other) team members often contribute non-trivial bugs, partly because they are more productive and change lots and lots of code quickly.

c++ stream format flags and manipulators – key words

bitVector — probably there’s a hidden bitArray of boolean flags for each stream instance. http://www.cplusplus.com/reference/iostream/ios_base/flags/

1-to-1 — one flag for one manipulator function

Concrete eg — http://www.cplusplus.com/reference/iostream/manipulators/showpos/

transformer — a typical manipulator is a function accepting a stream by reference, and returning the same stream object by reference