fascinated by analysis@fail`tech brands

I tend to learn something (very indirectly) about competition, churn, focus, …

  • Nokia — refocused on its traditional strength in telecom infrastructure
  • Erickson
  • RIM
  • Yahoo
  • Novell

I feel my choice of java and c++ have been good, but regret my investment in dotnet, pthreads

I feel any dominant technology can get displaced. A few examples (not intended to be complete) — C/C++ (by java), RDBMS, Solaris (by Linux)

My tech bets for 2019-2020:

  1. system knowledge #CSY
  2. socket
  3. c++ TMP — a pure QQ domain for IV muscle-building
  4. c++ threading

pass temp into func by val: mv-ctor skipped #RVO CSY

Suppose we have a class MoveOnlyStr which has only move-ctor, no copy-ctor. Suppose we pass an unnamed temporary instance of this class into a function by value, like void func1(MoverOnlyStr arg_mos).

Q: Will the move-ctor be used to create the argument object arg_mos? We discussed this in your car last time we met up.

A: If the temp is produce by a function, then No. My test shows I was right to predict that compiler optimizes away the temporary, due to RVO. So move-ctor is NOT used. This RVO optimization has existed long before c++11.

A: if the temp is not produced by a function, then RVO is irrelevant (nothing “Returned”) but I don’t know if there’s still some copy-elision.

##mkt-data jargon terms: fakeIV

Context — real time high volume market data feed from exchanges.

  • Q: what’s depth of market? Who may need it?
  • Q: what’s the different usages of FIX vs binary protocols?
  • Q: when is FIX not suitable? (market data dissemination …)
  • Q: when is binary protocol not suitable? (order submission …)
  • Q: what’s BBO i.e. best bid offer? Why is it important?
  • Q: How do you update BBO and when do you send out the update?
  • Q[D]: how do you support trade-bust that comes with a trade id? How about order cancellation — is it handled differently?
  • Q: how do you handle an order modification?
  • Q: how do you handle an order replacement?
  • Q: How do you handle currency code? (My parser sends the currency code for each symbol only once, not on every trade)
  • Q: do you have the refresh channel? How can it be useful?
  • Q[D]: do you have a snapshot channel? How can it be useful?
  • Q: when do you shutdown your feed handler? (We don’t, since our clients expect to receive refresh/snapshots from us round the clock. We only restart once a while)
  • Q: if you keep running, then how is your daily open/close/high/low updated?
  • Q: how do you handle partial fill of a big order?
  • Q: what’s an iceberg order? How do you handle it?
  • Q: what’s a hidden order? How do you handle it?
  • Q: What’s Imbalance data and why do clients need it?
    • Hint: they impact daily closing prices which can trigger many derivative contracts. Closing prices are closely watched somewhat like LIBOR.
  • Q: when would we get imbalance data?
  • Q: are there imbalance data for all stocks? (No)
  • ——— symbol/reference data
  • Q[D]: Do you send security description to downstream in every message? It can be quite long and take up lots of bandwidth?
  • Q: what time of the day do symbol data come in?
  • Q5[D]: what essential attributes are there in a typical symbol message?
  • Q5b: Can you name one of the key reasons why symbol message is critical to a market data feed?
  • Q6: how would dividend impact the numbers in your feed? Will that mess up the data you send out to downstream?
  • Q6b[D]: if yes how do you handle it?
  • Q6c: what kind of exchanges don’t have dividend data?
  • Q: what is a stock split? How do you handle it?
  • Q: how are corporate actions handled in your feed?
  • Q: what are the most common corporate actions in your market data? (dividends, stock splits, rights issue…)
  • ——— resilience, recovery, reliability, capacity
  • Q: if a security is lightly traded, how do you know if you have missed some updates or there’s simply no update on this security?
  • Q25: When would your exchange send a reset?
  • Q25b: How do you handle resets?
  • Q26[D]: How do you start your feed handler mid-day?
  • Q26b: What are the concerns?
  • Q[D]: how do you handle potential data loss in multicast?
  • Q[D]: how do you ensure your feed handler can cope with a burst in messages in TCP vs multicast?
  • Q[D]: what if your feed handler is offline for a while and missed some messages?
  • Q21: Do you see gaps in sequence numbers? Are they normal?
  • Q21b: How is your feed handler designed to handle them?
  • Q[D]: does your exchange have a primary and secondary line? How do you combine both?
  • Q[D]: between primary and secondary channel, if one channel has lost data but the other channel did receive the data, how does your system combine them?
  • Q[D]: how do you use the disaster recovery channel?
  • Q[D]: If exchange has too much data to send in one channel, how many more channels do they usually use, and how do they distribute the data among multiple channels?
  • Q: is there a request channel where you send requests to exchange? What kind of data do you send in a request?
  • Q: Is there any consequence if you send too many requests?
    Q: what if some of your requests appear to be lost? How does your feed handler know and react?

[D=design question]

python % string formatting #padding

Many details I tend to forget:

  • the actual data can be either a tuple or dict. The dict version is powerful but less known
    • if there’s just one item in the tuple, you don’t need to parenthesis —  myStr = “%d” % var1
  • for a tuple, the format specifier count must match the tuple length
  • for a dict, each format specifier must name a valid key value.
myStr = "%(var1)d" % locals()) # locals() returns a dict including var1 
  • There are at least two q(%) in the above expression
  • extra parentheses after the 2nd % are required:

“#%03d” % (id(node)%1000)  # return last 3 digit of object id, with 0-padding

coreJava^big-data java job #XR

In the late 2010’s, Wall street java jobs were informally categorized into core-java vs J2EE. Nowadays “J2EE” is replaced by “full-stack” and “big-data”.

The typical core java interview requirements have remained unchanged — collections, threading, JVM tuning, compiler details (including keywords, generics, overriding, reflection, serialization ), …, but relatively few add-on packages.

(With the notable exception of java collections) Those add-on packages are, by definition, not part of the “core” java language. The full-stack and big-data java jobs use plenty of add-on packages. It’s no surprise that these jobs pay on par with core-java jobs. More than 5 years ago J2EE jobs, too, used to pay on par with core-java jobs, and sometimes higher.

My long-standing preference for core-java rests on one observation — churn. The add-on packages tend to have a relatively short shelf-life. They become outdated and lose relevance. I remember some of the add-on

  • Hadoop, Spark
  • functional java
  • SOAP, REST
  • GWT
  • NIO
  • Protobuf, json
  • Gemfire, Coherence, …
  • ajax integration
  • JDBC
  • Spring
  • Hibernate, iBatis
  • EJB
  • JMS, Tibco EMS, Solace …
  • XML-related packages (more than 10)
  • Servlet, JSP
  • JVM scripting including scala, groovy, jython, javascript@JVM… (I think none of them ever caught on outside one or two companies.)

None of them is absolutely necessary. I have seen many enterprise java systems using only one or two of these add-on packages.

2 uses@q[del] ] python

https://stackoverflow.com/questions/6146963/when-is-del-useful-in-python

  • Usage 1: in-place delete from a container — dict and list (usually inefficient)
    • You can also slice-delete from a list
    • You can even del mylist[start:end:stride]
  • Usage 2: remove a local(or global) name such as MyClass, myVar
    • the name is erased from the symbol table
    • dir() will no longer show it
    • usage? metaprogramming might use runtime introspection on dir()

c++enum GTD tips: index page

choose python^c++for cod`IV

1) Some hiring teams have an official policy — focus on coding skills and let candidate pick any language they like. I can see some interviewers are actually language-agnostic.

2) 99% of the hiring teams also have a timing expectation. Many candidates are deemed too slow. This is obvious in online coding tests. We all have failed those due to timing. (However, I guess on white-board python is not so much faster to code.)

If these two factors are important to a particular position, then python is likely better than c++ or java or c#.

  • Your code is shorter and easier to edit. No headers to include.
  • Compiler errors are shorter
  • c++ pointer or array indexing errors can crash without error message. Dynamic languages like python always give an error message.
  • STL iterator can become end() or invalidated. Result would often look normal, so we can spend a long time struggling a hidden bug.
  • Edit-Compile-Test cycle is reduced to Edit-Test
  • no uninitialized variables
  • Python offers some shortcuts to tricky c++ tasks, such as
    1. string: split,search, and many other convenient features. About 1/3 of the coding questions require non-trivial string manipulation.
    2. vector (a.k.a List): slicing, conversion(to string etc), insertion, deletion…, are much faster to write. Shorter code means lower chance of mistakes
    3. For debugging: Easy printing of vector, map and nested containers. No iteration required.
    4. easy search in containers
    5. iterating over any container or iterating over the characters of a string — very easy. Even easier than c++11 for-loop
    6. Dictionary lookup failure can return a default value
    7. Nested containers. Basically, the more complex the data structure , the more time-saving python is.
    8. multiple simultaneous return values — very very easy in python functions.
    9. a python function can return a bool half the times and a list otherwise!

If the real challenge lies in the algorithm, then solving it in any language is equally hard, but I feel timed coding tests are never that hard. A Facebook seminar presenter emphasized that across tech companies, every single coding problem is always, always solvable within the time limit.

hackerrank IV: full query

/*requirement: given 99 sentences and 22 queries, check each query against all 99 sentences. All the words in the query must show up in the sentence to qualify.
*/
bool check(string & q, string & sen){
    istringstream lineStream(q);
	string qword;
    while(getline(lineStream, qword, ' '))
	  if ( string::npos == sen.find(qword)) return false;
	return true;
}
void textQueries(vector <string> sentences, vector <string> queries) {
  for (string & qr: queries){
	string output1query;
    for(int i=0; i<sentences.size(); ++i){
	  if (check(qr, sentences[i]))	    output1query.append(to_string(i)+" ");
	}
	if (output1query.empty()) output1query = "-1";
    cout<<output1query<<endl;	 
  }
}
int main(){
  vector<string> sen={"a b c d", "a b c", "b c d e"};
  vector<string> que={"a b", "c d", "a e"};
  textQueries(sen, que);
}

multiple inheritance !! always trouble-maker

Many interviewers ask about MI.

  • Both java and c# officially support MI via interface inheritance
  • python supports MI but seldom used in my projects
  • c++ MI is safe if superclasses are protocol classes , just like java/c#.
  • c++ MI is OK if the superclasses do not conflict, and subclass is final, preventing the diamond problem
  • [[effC++]] also says it’s OK to publicly subclass an interface and private subclass a concrete class. I don’t fully understand it and not widely used IMHO
  • [[Alaxandrescu]] pointed out a combination of MI+TMP is essential for library designs. STL doesn’t use it, but ETSFlow does.

%%lead ] theoretical+lowLevel IV topics

See also the marketable_xp spreadsheet… I have consistently demonstrated strength in

1) TT: theoretical complexity: %%strength
2) LL: lowLevel IV topics (seldom needed in GTD) — threads, dStruct, vptr, language rules…

So how could TT/LL influence my 10Y career direction?

  • research domain? benefits from TT
  • mkt risk? TT
  • mkt data? LL
  • algo trading? Too competitive and poor market depth
  • quant dev? my TT advantage is not enough .. Too competitive and poor market depth
  • network optimization? My LL advantage is not enough
  • app owner? No. Not benefiting from my strengths and I tend to lose interest quickly

c++GC interface

https://stackoverflow.com/questions/27728142/c11-what-is-its-gc-interface-and-how-to-implement

GC interface is partly designed to enable

  • reachability-based leak detectors
  • garbage collection

The probe program listed in the URL shows that as of 2019, all major compilers provide trivial support for GC.

Q: why does c++ need GC, given RAII and smart pointers?
A: system-managed automatic GC instead of manual deallocation, without smart pointers

pass in func name^func ptr^functor object

Let’s set the stage — you can either pass in a function name (like myFunc), or a ptr to function (&myFunc) or a functor object. The functor is recommended even though it involves more typing. Justification — inlining. I remember a top c++ expert said so.

I believe “myFunc” is implicitly converted to & myFunc by compiler, so these two forms are equivalent.

 

3gradual changes]SG job market #cautious optimism

  1. c++ (and c#) is /conceding/ market share to java, partly due to the two categories above. Apparently, java is growing more dominant than before. I guess java is more proven, better supported, by a bigger ecosystem and have bigger talent pool. In contrast, c++ skill is harder to find in Singapore?
    1. Overall good news for me since my java arm is still stronger than c++ arm
  2. remote hiring — more Singapore teams are willing to hire from overseas. Lazada said “mostly over skype”
  3. Many non-finance companies now can pay 150k base or higher for a senior dev role. In my 2015 job search, I didn’t find any
  4. Many smaller fintech companies (not hedge funds) can pay 150k base or higher
  5. contracts becoming slightly more common
  6. lighter-blue-collar — programmer used to be blue-collar support staff for the revenue staff. Some of the companies listed above treat programmers as first-class citizens.

Reality warning — every time I try the SG job market, recruiters would tell me there are so many “new” employers or new markets, but invariably, i need to focus on the old guards .. mostly ibanks, as the new market is not open to me. This is similar to Deepak, Shanyou trying the wall St c++ job market.

I must stop /romanticizing/ about the “improvement” in SG job market.

  • Singapore tech shops are mostly not keen about my profile. U.S.? Not sure.
  • Singapore fintech shops ? zero interest shown, even when I asked 150k
  • Singapore buy-sides are interested but way too selective and kinda slow.
  • Note except GS I didn’t try the ibank jobs this time round.

Basically no change in the landscape since 2015. The jobs available to me are mostly ibanks. Cherish the MLP job but beware attachment. If this job goes sour, I would have to consider WallSt, rather than another perm job in SG.

Y allocate static field in .c file %%take

why do we have to define static field myStaticInt in a cpp file?

For a non-static field myInt, the allocation happens when the class instance is allocated on stack, on heap (with new()) or in global area.

However, myStaticInt isn’t take care of. It’s not on the real estate of the new instance. That’s why we need to declare it in the class header, and then define it exactly once (ODR) in a cpp file. It is allocated at compile time — static allocation.

RVO^move : on return value

Let’s set the stage. A function returns a local Trade object “myTrade” by value. Will RVO kick in or move-semantic kicks in? Not both!

I had lots of confusions about these 2 features.[[effModernC++]] P176 has a long discussion and an advice — do not write std::move() hoping to “help” compiler on a local object being returned from a function

  • If the local object is eligible for RVO then all compilers would elide the copy. Your std::move() would hinder the compiler and back fire
  • if the local object is ineligible for RVO then compiler are required to return an rvalue object, often implicitly using st::move(), so your help is unneeded.
    • Note local object returned by clone is a naturally-occurring temp object.

P23 [[c++stdLib]] gave 2-line answer:

  1. if Trade class has a suitable copy or move ctor, then compiler may choose to “elide the copy”. This was long implemented as RVO optimization in most compilers before c++11. https://github.com/tiger40490/repo1/blob/cpp1/cpp/rvr/moveOnlyType_pbvalue.cpp is my experiment.
  2. otherwise, if Trade class has a move ctor, the myTrade object is robbed

So if condition for RVO is present, then most likely your move-ctor will NOT run.

live updated hitCount over last5s#presumably Indeed

I hit a similar question in NY, possibly LiquidNet or CVA

Q: Make a system (perhaps a function?) that returns the average number of hits per minute from the past 5 minutes.

I will keep things simple by computing the total hit over the last 300 seconds. (Same complexity if you want average order amount at Amazon over last 5 minutes.)

Let’s first build a simple system before expanding it for capacity.

Let’s first design a ticking system that logs an update every time there’s an update. The log can be displayed or broadcast like a “notice board”, or we can update a shared atomic<int>.

Whenever we get a new record (a hit), we save it in a data structure stamped with an expiry date (datetime). At any time, we want to quickly find the earliest unexpired record i.e. the blue record. There’s only one blue at any time.

What data structure? RingBuffer with enough capacity to hold the last 5 minutes worth of record.

I will keep the address of the current blue record which is defined as the earliest unexpired record in the last update. When a new record comes in, i check “Is the blue expired?” If NO, then easy.. this new record is too close to the last new record. I simply update my “notice board” in O(1). If YES then we run a binary search for the new blue. Once we find it, we have to compute a new update in O(W), where W is the minimum of two counts, A) recently expired records B) still unexpired records.  After the update, we remove the expired items from our data structure.

–That concludes my first design. Now what if we also need to update the notice board even when there is no new record?

I would need an alarm set to the expiry time of the current blue.

–Now what if the updates are too frequent? I can run a schedule update job. I need to keep the address of a yellow record, defined as the newest record of the last update.

When triggered, routine is familiar. I check “Is the blue expired?” If NO then easy… If YES then binary-search for the new blue.

c++compiler select`move-ctor

This is a __key__ part of understanding move-semantics, seldom quizzed. Let’s set the stage:

  • you overload a traditional insert(Amount const &)  with a move version insert(Amount &&)
    • by the way, Derrick of TrexQuant introduced a 3rd alternative emplace()
  • without explicit std::move, you pass in an argument into insert()
  • (For this example, I want to keep things simple by avoid constructors, but the rules are the same.)

Q1: When would the compiler select the rvr version?

P22 [[c++stdLib]] has a limited outline. Here’s my illustration

  • if I pass in a temporary like insert(originalAmount + 15), then this argument is a rvalue obj so the rvr version is selected
  • if I pass in a regular variable like insert(originalAmount), then this argument is an lvalue obj so the traditional version is selected
  • … See also my dedicated blogpost in c++11overload resolution TYPE means..

After we are clear on Q1, we can look at Q2

Q2: how would std::move help?
A: insert(std::move(originalAmount)); // if we know the object behind originalAmount is no longer needed.

https://github.com/tiger40490/repo1/blob/cpp1/cpp1/rvrDemo.cpp shows when we need to use std::move() and when we don’t need.

Q:Just when do App(!! lib)devs write std::move

I feel move ctor (and move-assignment) is extremely implicit and “in-the-fabric”. I don’t know of any user function with a rvr parameter. Such a function is usually in some library. Consequently, in my projects I have not seen any user-level code that shows “std::move(…)”

Let’s look at move ctor. “In the fabric” means it’s mostly rather implicit i.e. invisible. Most of the time move ctor is picked by compiler based on some rules, and I have basically no influence over it.

https://github.com/tiger40490/repo1/blob/cpp1/cpp1/rvrDemo.cpp shows when I need to call move() but it’s a contrived example — I have some object (holding a resource via heap pointer), I use it once then I don’t need it any more, so I “move” its resource into a container and abandon the crippled object.

Conclusion — as app developers I seldom write code using std::move.

  • P20 [[c++ std lib] shows myCollection.insert(std::move(x)); // where x is a local nonref variable, not a heap pointer!
    • in this case, we should provide a wrapper function over std::move() named getRobberAliasOf()
    • I think you do this only if x has part of its internal storage allocated on heap, and only if the type X has a move ctor.

I bet that most of the time when an app developer writes “move(…)”, she doesn’t know if the move ctor will actually get picked by compiler. Verification needed.

— P544 [[c++primer]] offers a “best practice” — Outside of class implementations (like big4++), use std::move only when you are certain that you need to do a move and it is guaranteed safe.

Basically, the author believes user code seldom needs std::move.

— Here’s one contrived example of app developer writing std::move:

string myStr=input;
vectorOfString.push_back(std::move(myStr)); //we promise to compiler we won’t use myStr any more.

Without std::move, a copy of myStr is constructed in the vector. I call this a contrived example because

  • if input is a char-array, then emplace_back() is more efficient
  • if input is another temp string, then we can simply use push_back(input), which would bind to the rvr overload anyway.

c++11 atomic{int}^AtomicInteger.java #Bool can be half-written

The #1 usage of atomic<int> is load() and store(). I will use short form “load/store” or “l/s”.

The #2 usage is CAS.  Interviewers are mostly interested in this usage, though I won’t bother to remember the function names —
* compare_exchange_strong()
* compare_exchange_week()


The CAS usage is same as AtomicInteger.java, but the load/store usage is more like the thread-safety feature of Vector.java. To see the need for load/store, we need to realize the simple “int” type assignment is not atomic [1]:

  • P1012 [[c++ standard library]] shocked me by affirming that without locks, you can read a “half-written Boolean” [1].

To solve this problem, atomic<int> uses internal locks (just like Vector.java) to ensure load() and store() is always atomic.

[1] different from java. https://stackoverflow.com/questions/11459543/should-getters-and-setters-be-synchronized points out that 32-bit int in java is never “half-written”. If you read a shared mutable int in java, you can hit a stale value but never a c++ style half-written value. Therefore, java doesn’t need guarded load()/store() functions on an integer.

Q: are these c++ atomic types lock-free?
A: for load/store — not lock-free. See P 1013
A: for CAS — lock-free CPU instructions are used, if available.

c++QQ/zbs Expertise: I got some

As stated repeatedly, c++ is the most complicated and biggest language used in industry, at least in terms of syntax (tooManyVariations) and QQ topics. Well, I have impressed many expert interviewers on my core-c++ language insight.

That means I must have some expertise in c++ QQ topics. For my c++ zbs growth, see separate blog posts.

Note socket, shared mem … are c++ ecosystem, like OS libraries.

Deepak, Shanyou, Dilip .. are not necessarily stronger. They know some c++ sub-domains better, and I know some c++ sub-domains better, in both QQ and zbs.

–Now some of the topics to motivate myself to study

  • malloc and relatives … internals
  • enable_if
  • email discussion with CSY on temp obj
  • UDP functions

trySomethingNew : avoid perm jobs@@

There’s a high risk of under-performing. In a perm job, that invites warning, perf improvement, bonus fear — all forms of stigma-phobias.

With contract jobs, I can operate without the fear of stigma!

Here, “under-performing” mostly refers to “figure-things-out slower than team peers”, which (usually but) doesn’t always attracts those stigmas. Ultimately it’s the manager’s assessment.

Stirt/Quartz – For example, my figure-things-out speed was not slower than my peers and not slower than Barcap 2nd half, but still i got the stigma.

Citi — for an opposite example, my figure-things-out speed was rather slow but I didn’t get the stigma. I got renewed once.

making out a little-endian 32-bit int

Let’s set the stage — We have a stream of bytes in little-endian format. Let’s understand it according to the spec. The struct is packed tight without padding.

Spec says left-most field is a char. It is always on the left-most, regardless of endianness. If we look at the 8 bits, they are normal. 0x41 is ‘A’. Within the 8 bits, no reordering due to endianness.

Spec says next four bytes is an integer. Most significant bit (suppose a one) is on the right end, representing 2^31. What’s the integer value? To work out by hand, we need to pick the four bytes as is — Byte1 Byte2 Byte3 Byte4. Then we reverse them into Byte4 Byte 3 Byte2 Byte1. Now this 32-bit integer is human-readable. The human-readable form is now a binary number taught in classrooms.

Note the software program still uses the original “Byte1 Byte2 Byte3 Byte4” and can print out the correct integer value.

Spec says next four bytes is a float. There’s nothing I can do to make out its value without a computer, so I don’t bother to rearrange the bytes.

Next 2 bytes is a string like “XY”. First byte is “X”. Endian-ness doesn’t bother us.

posix^SysV-sharedMem^MMF

http://www.boost.org/doc/libs/1_65_0/doc/html/interprocess/sharedmemorybetweenprocesses.html#interprocess.sharedmemorybetweenprocesses.sharedmemory.xsi_shared_memory points out

  • Boost.Interprocess provides portable shared memory in terms of POSIX semantics. I think this is the simplest or default mode of Boost.Interprocess. (There are at least two other modes.)
  • Unlike POSIX shared memory segments, SysV shared memory segments are not identified by names but by ‘keys’. SysV shared memory mechanism is quite popular and portable, and it’s not based in file-mapping semantics, but it uses special system functions (shmgetshmatshmdtshmctl…).
  • We could say that memory-mapped files offer the same interprocess communication services as shared memory, with the addition of filesystem persistence. However, as the operating system has to synchronize the file contents with the memory contents, memory-mapped files are not as fast as shared memory. Therefore, I don’t see any market value in this knowledge.

how I achieved%% ComfortableEconomicProfile by44

(I want to keep this blog in recrec, not tanbinvest. I want to be brief yet incisive.)

See 3 ffree scenarios: cashflow figures. What capabilities enabled me to achieved my current Comfortable Economic profile?

  • — top 3
  • by earning SGP citizenships
  • by developing my own investment strategies, via trial-n-error
  • by staying healthy
  • — the obvious
  • by high saving rate thanks to salary + low burn rate — efficiency inspired by SG gov
  • by consistent body-building with in-demand skills -> job security. I think rather few of my peers have this level of job security. Most of them work in one company for years. They may be lucky when they need a new job, but they don’t have my inner confidence and level of control on that “luck”. Look at Y.W.Chen. He only developed that confidence/control after he changed job a few times.

When I say “Comfortable” I don’t mean “above-peers”, and not complete financial freedom, but rather … easily affordable lifestyle without the modern-day pressure to work hard and make a living. In my life there are still too many pressures to cope with, but I don’t need to work so damn hard trying to earn enough to make ends meet.

A higher salary or promotion is “extremely desirable” but not needed. I’m satisfied with what I have now.

I can basically retire comfortably.

have used Briefly : sharedMem/lockfree/..

Just like my early learning curve in sockets, Dynamic Programming and swing, I have yet to achieve a breakthrough in these topics, So there are too many topics and I don’t know what to focus on.

It’s important not to exaggerate your expertise in these areas. Once interviewers find out your exaggeration, subconscious they would discount other parts of your resume.

  • c++ lock-free — “Used in my project but not written by me”
  • Shared mem
  • Boost::*
  • Epoll
  • Multiple inheritance
  • Pyton multiprocessing

implement an exchange #Trex Kenny

Q: create an exchange with messaging for NewOrderSingle, ExecutionReport etc. (I think interviewer means the matching server.)

  • need to support restart in any client or the exchange itself.
  • (Use ack or other techniques) basic reliability such as
    • Client needs to know for sure if new order is received.
    • Exchange needs to ensure execution report is received.
  • Please write c++ code, and compile it if possible. One hour given.
  • no need to send out market data to subscribers — not the focus

POSIX^SysV sempaphores

https://www.ibm.com/developerworks/library/l-semaphore/index.html — i have not read it.

My [[beginning linux programming]] book also touches on the differences.

I feel this is less important than the sharedMem topic.

  • The posix semaphore is part of pthreads i.e. Posix Threads
  • The sysV semaphore is part of IPC and often mentioned along with sysV sharedMem

The counting semaphore is best known and easy to understand.

  • The pthreads semaphore can be used this way or as a binary semaphore.
  • The system V semaphore can be used this way or as a binary semaphore. See http://portal.unimap.edu.my/portal/page/portal30/Lecturer%20Notes/KEJURUTERAAN_KOMPUTER/SEM10809/EKT424_REAL_TIME_SYSTEM/LINUX_FOR_YOU/12_IPC_SEMAPHORE.PDF

Linux manpage pointed out — System V semaphores (semget(2)semop(2), etc.) are an older semaphore API. POSIX semaphores provide a simpler, and better designed interface than System V semaphores; on the other hand POSIX semaphores are less widely available (especially on older systems) than System V semaphores.

The same manage implies both APIs use a _counting_ semaphore semantic, without notification semantics

git | backup b4 history rewrite

Personal best practice. Git History rewrite is not always no-brainer and riskless. Once in 100 times it can become nasty.

It should Never affect file content, so at end of the rewrite we need to diff against a before-image to confirm no change.

The dumbest (and most foolproof) before-image is a zip of entire directory but here’s a lighter alternative:

  1. git branch b4rewrite/foo
  2. git reset or rebase or cherry-pick
  3. git diff b4rewrite/foo

Note the branch name can be long but always explicit so I can delete it later without doubt.

posix sharedMem: key points { boost

http://www.boost.org/doc/libs/1_65_0/doc/html/interprocess/sharedmemorybetweenprocesses.html#interprocess.sharedmemorybetweenprocesses.sharedmemory.shared_memory_steps is excellent summary

* We (the app developer) need to pick a unique name for the shared memory region, managed by the kernel.

* we can use create_only, open_only or open_or_create

* When we link (or “attach” in sysV lingo) App1’s memory space to the shared memory region, the operating system looks for a big enough memory address range in App1’s address space and marks that address range as an special range. Changes in that address range are automatically seen by App2 that also has mapped the same shared memory object.

* As shared memory has kernel or filesystem persistence, we must explicitly destroy it.

Above is the posix mode. The sysV mode is somewhat different.

ranged-based for-loop over a map or vector

Hi Kam,

Thanks for your valuable tip. I found this 3-point summary on https://stackoverflow.com/questions/15176104/c11-range-based-loop-get-item-by-value-or-reference-to-const

1) Choose for(auto x : myVector) when you want to work with copies.
2) Choose for(auto &x : myVector) when you want to work with original items and may modify them.
3) Choose for(auto const &x : myVector) when you want to work with original items and will not modify them.

In the case of myMap, the first form for(auto myPair: myMap) still clones the pairs from myMap. To use a reference instead of a clone, we need the 2nd or 3rd forms.

nearest-rank percentile definition

https://en.wikipedia.org/wiki/Percentile#The_nearest-rank_method has a few concise pointers

  • a percentile value is always an original object, never interpolated
  • we can think of it as a mapping function percentile(int100) -> an original object, where int100 data type can be a value from 1 to 100 (zero disallowed). Two distinct inputs can map to the same output object.
  • 100th percentile is always the largest object in the list. No such thing as 0th. 1st percentile is not always the smallest object.

python: value@ECT+syntax > deep insight

c++ interviews value deep insight more than any language. Java and c# interviews also value them highly, but not python interviews.

Reminder — zoom in and dig deep in c++, java and c# only. Don’t do that in python too much.

Instead of deep insight, accumulate ECT syntax … highly valued in TIMED coding tests.

Use brief blog posts with catchy titles

depth-first-traversal, height-aware

Latest: https://github.com/tiger40490/repo1/blob/cpp1/cpp1/binTree/DFT_show_level.cpp


struct Node {
    int data;
    Node *left, *right, *next;
    Node(int x, Node * le = NULL, Node * ri = NULL) : data(x), left(le), right(ri), next(NULL) {}
};
    Node _15(15);
    Node _14(14);
    Node _13(13);
    Node _12(12);
    Node _11(11);
    Node _10(10);
    Node _9(9);
    Node _8(8);
    Node _7(7, &_14, &_15);
    Node _6(6, NULL, &_13);
    Node _5(5, &_10, NULL);
    Node _4(4, NULL, &_9);
    Node _3(3, &_6,  &_7);
    Node _2(2, &_4,  &_5);
    Node root(1, &_2, &_3);

int maxD=0;
void recur(Node * n){
  static int lvl=0;
  ++lvl;
  if (lvl>maxD) maxD = lvl;
  if (n->left){ recur(n->left); }
  cout<<n->data<<" processed at level = "<<lvl<<endl;
  if (n->right){ recur(n->right); }
  --lvl;
}
int maxDepth(){
    recur(&root);
    cout<<maxD;
}
int main(){
   maxDepth();
}

p2p messaging beats MOM ] low-latency trading

example — RTS exchange feed dissemination infrastructure uses raw TCP and UDP sockets and no MOM

example — the biggest sell-side equity OMS network uses MOM only for minor things (eg?). No MOM for market data. No MOM carrying FIX order messages. Between OMS nodes on the network, FIX over TCP is used

I read and recorded the same technique in 2009… in this blog

Q: why is this technique not used on west coast or main street ?
%%A: I feel on west coast throughput outweighs latency. MOM enhances throughput.

after fork(): threads,sockets.. #Trex

I have read about fork() many times without knowing these details, until Trex interviewer asked !

–based on http://man7.org/linux/man-pages/man2/fork.2.html

The child process is created with a single thread—the one that called fork(). The entire virtual address space of the parent is replicated in the new process, including the states of pthread mutexes, pthread condition variables, and other pthreads objects In particular, if in parent process a lock was held by some other thread t2, then child process only has the main thread (which called fork()) and no t2 but the lock is still unavailable. This is a common problem, addressed in http://poincare.matf.bg.ac.rs/~ivana/courses/ps/sistemi_knjige/pomocno/apue/APUE/0201433079/ch12lev1sec9.html.

The very 1st instruction executed in Child is the instruction after fork() — as proven in https://github.com/tiger40490/repo1/blob/cpp1/cpp1/fork3times.cpp

The child inherits copies of the parent’s set of open file descriptors, including stdin/stdout/stderr. Child process should usually close them.

Special case — socket file descriptor inherited. See https://bintanvictor.wordpress.com/2017/04/29/socket-shared-between-2-processes/

 

gdb to show c++thread wait`for mutex/condVar

https://github.com/tiger40490/repo1/blob/cpp1/cpp/thr/pthreadCondVar.cpp shows my experiment using gdb supplied by StrawberryPerl.

On this g++/gdb set-up, “info threads” shows thread id number 1 for main thread, “2” for the thread whose pthread_self() == 2 … matching 🙂

The same “info-threads” output also shows

  • one of the worker threads is executing sleep() while holding lock (by design)
  • the other worker threads are all waiting for the lock.
  • At the same time, the main thread is waiting in a conditional variable, so info-threads shows it executing a different function.

multicast: IV care only about bookish nlg !!practical skills

Hi friends,

I recently used multicast for a while and I see it as yet another example of the same pattern — technical interviewers care about deep theoretical knowledge not practical skills.

Many new developers don’t know multicast protocol uses special IP addresses. This is practical knowledge required on my job, but not asked by interviewers.

Unlike TCP, there’s not a “server” or a “client” in a multicast set-up. This is practical knowledge in my project but not asked by interviewers.

When I receive no data from a multicast channel, it’s not obvious whether nobody is sending or I have no connectivity. (In contrast, with TCP, you get connection error if there’s no connectivity. See tcp: detect wire unplugged.) This is practical knowledge, but never asked by interviewers.

I never receive a partial message by multicast, but I always receive partial message by TCP when the message is a huge file. This is reality in my project, but never asked by any interviewer.

So what do interviewers focus on?

  • packet loss — UDP (including multicast) lacks delivery guarantee. This is a real issue for system design, but I seldom notice it.
  • higher efficiency than TCP — I don’t notice it, though it’s a true.
  • socket buffer overflow — should never happen in TCP but could happen in UDP including multiast. This knowledge is not needed in my project.
  • flow control — TCP receiver can notify sender to reduce sending speed. This knowledge is not needed in many projects.
  • non-blocking send/receive — not needed in any project.

So what can we do? Study beyond what’s needed in the project. (The practical skills used is only 10% of the interview requirements.) Otherwise, even after 2 years using multicast in very project, I would still look like as a novice to an interviewer.

Without the job interviews, it’s hard to know what theoretical details are required. I feel a multicast project is a valuable starting point to get me started. I can truthfully mention multicast in my resume. Then I need to attend interviews and study the theoretical topics.

breakdown heap/non-heap footprint@c++app #massif

After reading http://valgrind.org/docs/manual/ms-manual.html#ms-manual.not-measured, I was able to get massif to capture non-heap memory:

valgrind --tool=massif  --pages-as-heap=yes --massif-out-file=$massifOut .../xtap -c ....
ms_print $massifOut

Heap allocation functions such as malloc are built on top of system calls  such as mmapmremap, and brk. For example, when needed, an allocator will typically call mmap to allocate a large chunk of memory, and then hand over pieces of that memory chunk to the client program in response to calls to malloc et al. Massif directly measures only these higher-level malloc et al calls, not the lower-level system calls.

Furthermore, a client program may use these lower-level system calls directly to allocate memory. By default, Massif does not measure these. Nor does it measure the size of code, data and BSS segments. Therefore, the numbers reported by Massif may be significantly smaller than those reported by tools such as top that measure a program’s total size in memory.



			

q[java ecosystem]==jxee+tools+..

This classification helps me organize my java learning, but let’s not spend too much time on this imprecise concept —

So-called “java ecosystem” is anything outside the “core java” stack and include jxee plus ..

  • GC, JIT
  • JNI
  • swing, AWT
  • ajax integration
  • protobuf/json
  • tools: eclipse, Maven, CI tools,
  • tools: JDK bundled tools like jhat, visualvm

 

3players1coin #MS probability

Q: Three players A/B/C flipping a fair coin one after each other until the first head is thrown, What’s the probability of Alice winning.

I think the problem is the same if coin is biased P(H)=0.6


Denote Pr(Alice eventually wins) as x.

Pr(first 3 are TTT AND Alice eventually wins) = 1/8 * x

x = 1/2 + 1/8 * x  —>  x=4/7

hacking imported module #homemade trick

Background: Suppose in a big python application your main script imports a few packages and modules. One of them is mod2.py, which in turn imports mod2a.py.

Now You need to add invesgative logging/instrumentation to mod2a.py but this file is loaded from a readonly firm-wide repository, common practice in big teams. Here’s my tested technique:

1. clone mod2a.py to your home dir and add the logging. Now we need to import this modified version.
2. clone mod2.py to your home dir and open it to locate the importation of mod2a
3. edit mod2.py to change the importation of mod2a:

sys.path.insert(0, ‘/home/dir’)
import mod2a # via /home/dir
sys.path.remove(‘/home/dir’)

All other imports should be unaffected.

4. edit main.py and update the importation of mod2.py to load it too from /home/dir

As an additional hack, Some people may rename the modified mod2a.py file. This is doable IIF the import line is

from mod2a import someSymbol

Otherwise, every mention of “mod2a” in mod2.py needs a change

bashslash escape: bash tricky rules

This is about shell interpreting the backslash sequence inside single-quote or double-quote.

Once bash does its parsing, it can pass the result to a command like perl or grep.

----Most escape sequences don't care about single-quote vs double-quote
$ echo "msgType\t"
msgType\t

$ echo "msgType\b"
msgType\b

# \b is meaningful in perl regex 🙂

----double backslash -- single-quote is simpler than double-quote
$ echo 'msgType\\'
msgType\\

$ echo "msgType\\"
msgType\

----single quote within single-quoted string is very tricky:
$ echo 'msgType\'\' 
msgType\'

# in the above, the last \' is a second token, a single-char string.

$ echo $'msgType\''  # dollar sign is crucial
msgType'

$ echo 'msgType\'' # somehow doesn't work without $
>

microservices “MSA” #phrasebook

I feel MSA is more of a architect interview topic, not a developer interview topic. Dev complexity is low by design.

eg: error acct lookup, receiving productId + possibly a clientId, returning an error acct

Now the phrasebook:

  • jxee — As of 2019, I guess jxee has the best support for MSA
  • enterprise — enterprise-bias. Most of the practices used in SOA/MSA come from developers who have created software applications for large enterprise organizations.
  • SOA — is the ancestor and now out of fashion. I think MSA will also fall out of fashion.
  • stateless — stateless microservice is best. Can be highly concurrent and scaled out
  • scalability — hopefully better
  • decentralized — rather than monolithic
  • modularity
  • communication protocol — supposedly lightweight, but more costly than in-process communication
    • http — is commonly used for communication. Presumably not asynchronous
    • messaging — metaphor is often used for communication. I doubt there’s any MOM of message queue.
  • cloud-friendly — cheaper
  • flexible — in the face of changing requirements, though I’m not sure time-to-market will improve
  • simple-facade — (of a big monolithic service) is now replaced by more complex interface, so I suspect this is not always popular.
  • complexity — (various forms) is the public enemy but I don’t know which weapon (REST,SOA,ESB,MOM,Spring) actually works
  • in-process — services can be hosted in a single process, but less common
  • devops — is a driver
    • testability — each service is easy to test, but not integration test
    • loosely coupled — decentralized, autonomous dev teams
    • deployment — is ideally independent for each service, and continuous, but overall system deployment is complicated

contents to keep in .C rather than .H file

1) Opening example — Suppose a constant SSN=123456789 is used in a1.cpp only. It is therefore a “local constant” and should be kept in a1.cpp not some .H file.  Reason?

The .H file may get included in some new .cpp file in the future. So we end up with multiple .cpp files dependent (at compile-time) on this .H file. Any change to the value or name of this SSN constant would require recompilation to not only a1.cpp but unnecessarily to other .cpp files 😦

2) #define and #include directives — should be kept in a1.cpp as much as possible, not .H files. This way, any change to  the directives would only require recompiling a1.cpp.

The pimpl idiom and forward-declaration use similar techniques to speed up recompile.

3) documentation comments — some of these documentations are subject to frequent change. If put in .H then any comment change would trigger recompilation of multiple .cpp files

## mkt data: avoid byte-copying #NIO

I would say “avoid” or “eliminate” rather than “minimize” byte copying. Market data volume is gigabytes so we want and can design solutions to completely eliminate byte copying.

  • RTS uses reinterpret_cast but still there’s copying from kernel socket buffer to userland buffer.
  • Java NIO buffers can remove the copying between JVM heap and the socket buffer in C library. See P226 [[javaPerf]]
  • java autoboxing is highly unpopular for market data systems. Use byte arrays instead

CMS JGC: deprecated in java9

Java9/10 default GC is G1. CMS is officially deprecated in Java 9.

Java8/7 default GC is ParallelGC, CMS. See https://stackoverflow.com/questions/33206313/default-garbage-collector-for-java-8

Note parallelGC uses

  • parallel in most generations
  • serial in old gen

…whereas parallelOldGC uses parallel in all generations.

Q: why is CMS deprecated?
A: one blogger seems to know the news well. He said JVM engineering team needs to focus on new GC engines and need to let go the most high-maintenance but outdated codebase — the CMS, As a result, new development will cease on CMS but CMS engine is likely to be available for a long time.

[19] attitude@jxee^coreJava

jxee (esp. web java) is fashionable … high growth, big job pool

  • –in terms of churn resistance and shelf life .. jxee << core java

Q: Is there some jxee component with stable demand and accu? Spring? Servlet is very relevant from 1999 to 2019 but not quizzed in IV!

  • –in terms of project LOE … core java =< jxee

Some developers are afraid of the unique challenges [1] in core java, but I’m more afraid of complexities in jxee packages esp. when combined in non-standard combinations. See my blogpost on python routine tasks and my blogpost on spring.

[1] threading, latency, collections .. but I don’t want to elaborate here.

  • –in terms of IV body building and entry barrier, jxee < core java < cpp. I struggled with cpp IV for years but was able to crack c# IV within 2 years.

Without enough evidence, I feel jxee skills are less elite, easy to self-study, and shallow until you hit project issues.

  • –In terms of salary, jxee = core java = cpp .. I was proven wrong! Even though c++ and core java IVs are arguably harder and more elitist, they don’t pay higher than jxee jobs.
  • –in terms of market depth+size, cpp < core java < jxee

core java is mostly limited to ibanks + buy-side. jxee presumably offers better market depth and breadth. Without enough evidence, I feel job pool is growing for jxee not for core java or cpp. Similarly, job pool is growing for javascript, mobile, big data, cloud..

No need to experiment at home or read books like I did on JMS, EJB, Spring. It takes too much time but doesn’t really give me …

latency QQ ]WallSt IV #java,c++..

Latency knowledge

  • is never needed on the job but … high Market Value
  • is not GTD at all but … is part of zbs
  • is not needed in py jobs
  • is needed in many c++ interview topics but .. in java is concentrated in JIT and GC
  • is an elite skill but … many candidates try
  • some depth is needed for IV and other discussions but … relatively low-complexity .. low-complexity topics #eg:GC/socket

Arrays.sort(primitiveArray) beats List.sort() #defaultMethod

In terms of sorting performance, Arrays.sort(primitiveArray) is a few times faster than Collections.sort() even though both are O(N logN). My learning notes:

  • Arrays.sort(int []) is a double-pivot quicksort, probably using random access
  • Arrays.sort(Object []) is a mergesort
  • Collections.sort(List) defers to List.sort()
    • List.sort() is a Java8 default method in the List.java interface. It copies data to an array then runs a mergesort
    • ArrayList.java overrides the default method, so no copying for ArrayList from  java8 onwards

RandomAccess marker interface (ArrayList implements) is completely irrelevant. That’s because any List.java subtype that provides RandomAccess can simply override (at source code level) the default method as demonstrated in ArrayList.java. This is cleaner than checking RandomAccess at runtime. One or Both designs could potentially be JIT-compiled to remove the runtime check.

 

ADL #namespace

  • “Put the function in the same namespace as the classes it operates on.” is a one-liner summary
  • If you want to write a function that needs only a class’s public interface – then that function doesn’t have to be a (static/non-static) member. The function can become a free function placed in the same name space as the class. This increases encapsulation and data hiding, and reduces coupling as only the public interface is needed.. P79 [[c++codingStd]]
  • It’s also an idiom/pattern to impress interviewers. The published experts all seem to view ADL as a library feature mainly for overloaded operators. Read cppreference and [[c++codingStd]]. I’m not interested in that usage.
  • boost seems to use a lot of ADL
  • I feel namespace loves ADL more than any other c++ feature 🙂
  • ADL is an endorsed, prized compiler feature but still with criticisms [1]. To eliminate the confusion, simply fully qualify the function call.

[1] https://stackoverflow.com/questions/8111677/what-is-argument-dependent-lookup-aka-adl-or-koenig-lookup has the best quickguide with simple examples.

https://softwareengineering.stackexchange.com/questions/274306/free-standing-functions-in-global-namespace is a short, readable discussion.

My annotations on the formal, long-winded definition — ADL governs look-up of unqualified function names in function-call expressions (including operator calls). These function names are looked up in the namespaces of their arguments in addition to the usual namespace-based lookup.

JDK + eclipse install: %%preference

  • C:/j8 or j801 # I prefer shorter path. I almost never have two versions of java on one machine
    • There’s an embedded ./jre directory for a so-called “private JRE”
  • public JRE is optional and should be installed outside JDK directory, but I have not needed it since I started using java.
  • JAVA_HOME and PATH
  • —-eclipse: I prefer the simple zip file download
  • C:/ide/elcipse #worked OK
  • c++ide? can use c:/ide/eclipseCDT.

 

github tips #email

  • q(git config credential.helper store) is the command that finally fixed my git-bash forgetting-password problem on my win7 Dell. The github “recommended” command only worked in my win10 Lenovo and office win7:
    • git config --global credential.helper wincred 
      
  • On Linux, I only needed to run a simple command to cache my credentials
$ git config credential.helper store
$ git push 
Username for 'https://github.com': <USERNAME>
Password for 'https://USERNAME@github.com': <PASSWORD>
  • to comply with email privacy,
    • git config --global user.email {ID}+{username}@users.noreply.github.com
      You can find it in https://github.com/settings/emails
  • –to download an entire branch on the github web interface: CloneOrDownload button can download a zipfile
  • –to mass upload using drag-n-drop
  • I realize that the zipfile downloaded has a problem. When I open the zipfile (but not extract) and select the files, the drag-n-drop interface fails. I had to copy the files to a tmp directory.
  • –delete folder: not so easy
  • –file naming tips
    • ^ is slightly less ideal
    • I would use underscore as default and use dash sparingly (like 0-N). Camel case is even better.
  • –rename file @web interface is easy
  • –rename branch: no web interface. I had to …
    • checkout branch locally,
    • rename branch
    • commit
    • push, which triggered a login pop-up.
  • –rename folder: no web interface. I had to
    • git mv dir1 com/xx/dir1 # close MSWE if you get perm error
    • git commit
    • git push

c++static field init: rules

See also post on extern…

These rules are mostly based on [[c++primer]], about static Field, not local statics or file-scope static variables.

Rule 1 (the “Once” rule) — init must appear AND execute exactly once for each static field.

In my Ticker Plant xtap experience, the static field definition crucially sets aside storage for the static field. The initial value is often a dummy value.

Corollary: avoid doing init in header files, which is often included multiple times. See exception below.

Rule 2 (the “Twice” rule) — static field Must (See exception below) be DECLARED in the class definition block, and also DEFINED outside. Therefore, the same variable is “specified” exactly twice [1]. However, the run time would “see” the declaration multiple times if it’s included in multiple places.

Corollary: always illegal to init a static field at both declaration and definition.

[1] Note ‘static’ keyword should be at declaration not definition. Ditto for static methods. See P117 [[essential c++]]

The Exception — static integer constant Fields are special, and can be initialized in 2 ways
* at declaration. You don’t define it again.
* at definition, outside the class. In this case, declaration would NOT initialize — Rule 1

The exception is specifically for static integer constant field:

  • if non-const, then you can only initialize it in ctor
  • if non-integer static, then you need to define it outside
  • if non-const static, then ditto
  • if not a field, then a different set of rules apply.

Rule 3: For all other static fields, init MUST be at-definition, outside the class body.

Therefore, it’s simpler to follow Rule 3 for all static fields including integer constants, though other people’s code are beyond my control.

——Here’s an email I sent about the Exception —–
It turned these are namespace variables, not member variables.

Re: removing “const” from static member variables like EXCHANGE_ID_L1

Hi Dilip,

I believe you need to define such a variable in the *.C file as soon as you remove the “const” keyword.

I just read online that “integer const” static member variables are special — they can be initialized at declaration, in the header file. All other static member variables must be declared in header and then defined in the *.C file.

Since you will overwrite those EXCHANE_ID_* member variables, they are no longer const, and they need to be defined in Parser.C.

efficient swap(): two containers-of-T

Background — template function std::swap(T&, T&) works for int, float etc, but the same implementation will not work efficiently for vector, list, map or set. Therefore I suspected there might be specializations of swap() template function.

As it turns out, vector (and the other containers) provides a swap() member function. So the implementation of vector swap is indeed different from std::swap().

RandomAccess #ArrayList sorting

Very few JDK containers implement the RandomAccess marker interface. I only know Stack.java, ArrayList.java and subclass Vector.java. Raw array isn’t.

Only List.java subtypes can implement RandomAccess. Javadoc says

“The primary purpose of this interface is to allow generic algorithms to alter their behavior when applied to either random or sequential access lists.”

Q: which “generic algos” actually check RamdonAccess?
AA: Collections.binarySearch() in https://docs.oracle.com/javase/7/docs/api/java/util/Collections.html
AA: to my surprise, Collections.sort() does NOT care about RandomAccess, so ArrayList sorting is no different from LinkedList sorting! See my blogpost Arrays.sort(primitiveArray) beat List.sort()

http://etutorials.org/Programming/Java+performance+tuning/Chapter+11.+Appropriate+Data+Structures+and+Algorithms/11.6+The+RandomAccess+Interface/ has more details

 

CPU(data)cache prefetching

https://stackoverflow.com/questions/1950878/c-for-loop-indexing-is-forward-indexing-faster-in-new-cpus top answer is concise. I think the observations may not be relevant in x years but the principles are.

  • adjacent cache line (ACL) prefetcher — simple to understand
  • cpu can detect streams of memory accesses in forward or backward directions

Note L1/L2/L3 caches are considered part of the CPU even if some of them are physically outside the microprocessor.

private-header^shared-header

In our discussions on ODR, global variables, file-scope static variables, global functions … the concept of “shared header” is often misunderstood.

  • If a header is only included in one *.cpp, then its content is effectively part of a *.cpp.

Therefore, you may experiment by putting “wrong” things in such a private header and the set-up may work or fail, but it’s an invalid test. Your test is basically putting those “wrong” things in an implementation file!

 

python random-generate/shuffle list@int

I now think the simplest is xrange() followed by random.shuffle. Don’t bother with random.sample(). See my github code.

https://stackoverflow.com/questions/22842289/generate-n-unique-random-numbers-within-a-range shows

https://github.com/tiger40490/repo1/blob/py1/py/array/qsort.py uses

random.sample(xrange(99, 100), 19)

In general, below construct is useful because sampleSize must never exceed size of the range i.e. the population:

random.sample(xrange(-99, -99+sampleSize), sampleSize)

I think the generated list has no duplicates, so I had to manually create some duplicates. I guess random.shuffle can work on a list containing duplicate…

–to generate an array of 8 integers between 0 and 100

>>> import random
>>> random.sample(xrange(100), 8)
[39, 53, 1, 80, 54, 61, 4, 26]

##past easy jobs : my80% did exceed benchmark

Ideally, I want to get a job role slightly lower than the highest salary, where my 80% effort can start to exceed the expectation of THE appraiser.

Grandpa said “At 100% if you don’t hit their requirement, then you don’t need to put in 120% and sacrifice family. It’s their hiring mistake. Their problem. if they don’t pay a compensation package then just leave.” Am I afraid of job change? See separate blogpost.

Looking at past jobs, the numbers below are very imprecise and subjective.

[G/g=Greenfield]
[B/b=brownfield]

  • [b] Citi muni — my 70<-80% might be enough, or might earn me no bonus like many Citi guys.
  • [B/g] GS — after the steep learning curve, my 100% was indeed enough to meet the high bar, but I feel 95% would not be
  • [G] 95G — after I proved myself, my 80<-90% was exceeding.
    • Big factor — colleagues were weaker
    • Big factor — my designs were favored by the boss
  • [G] Barc — after I proved myself, my 70<-90% was exceeding.
    • Big factor — I built a high-value, high visibility part of the system
    • Big factor — no one else was qualified to work on that
  • [g] Macq role was very senior — my 100% was NOT enough even on the basic devops part of the job. I once felt my 80% was fine in the 1st year, but actually expectation is much higher than I thought.
  • [B] Stirt — my 100% was barely good enough but not enough to earn a bonus. The project was a few months old when I joined but I struggled with the Qz platform:(. I was unfairly bench-marked against 3Y veterans! I would look decent if bench-marked against freshers.
  • [b] OC — my 70% would be enough but too relaxed
  • [b] RTS — After initial 9 months (NYSE+Aquis), my 40<-50% would be enough