__attribute__((packed))

Posted on January 31, 2018 (updated July 19, 2018) by BinTAN

#define PACKED __attribute__ ((packed))

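This is used in our parsers. https://stackoverflow.com/questions/11770451/what-is-the-meaning-of-attribute-packed-aligned4 explains that this is a compiler-specific extension (GCC/Clang), not standard C++: it tells the compiler to lay out the struct without any padding.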
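A minimal sketch of what the macro buys a parser, assuming GCC or Clang. PlainHeader and PackedHeader are hypothetical names for a 1-byte message type followed by a 4-byte sequence number. Without the attribute the compiler may pad after the char (on x86-64 PlainHeader is typically 8 bytes); with it, the struct is exactly 5 bytes and the int starts at offset 1, matching a tightly packed wire format.

```cpp
#include <cassert>
#include <cstddef>

// Same macro as in the post; __attribute__((packed)) is a GCC/Clang extension.
#define PACKED __attribute__ ((packed))

// Hypothetical wire header: a 1-byte message type followed by a 4-byte sequence number.
struct PlainHeader { char msgType; int seqNum; };         // compiler may insert padding after msgType
struct PACKED PackedHeader { char msgType; int seqNum; }; // no padding: matches a tightly packed wire layout

static_assert(sizeof(PackedHeader) == sizeof(char) + sizeof(int), "packed struct has no padding");
```

One caveat: seqNum inside a packed struct may be misaligned, so taking a pointer or reference to it is risky on some CPUs, and GCC can warn about it.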

fascinated by analysis@fail`tech brands

Posted on January 30, 2018 (updated December 5, 2018) by BinTAN

I tend to learn something (very indirectly) about competition, churn, focus, …

Nokia — refocused on its traditional strength in telecom infrastructure
Ericsson
RIM
Yahoo
Novell

I feel my choice of java and c++ has been good, but I regret my investment in dotnet and pthreads.

I feel any dominant technology can get displaced. A few examples (not intended to be complete): C/C++ (by java), RDBMS, Solaris (by Linux).

My tech bets for 2019-2020:

system knowledge #CSY
socket
c++ TMP — a pure QQ domain for IV muscle-building
c++ threading

core file location

Posted on January 30, 2018 by BinTAN

Still no definitive answers…

On one machine, q(ulimit -c) returned 0, meaning core files were suppressed. I had to run q(ulimit -c unlimited) to get my core files generated.
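A minimal Linux sketch of the same two checks done from inside a C++ program, in case the shell's ulimit is out of our control. getrlimit/setrlimit with RLIMIT_CORE are the POSIX calls behind ulimit -c (the value here is in bytes); an unprivileged process can raise its soft limit only up to the hard limit, which is what enableCoreDumps() does. /proc/sys/kernel/core_pattern is Linux-specific and says where cores go (it may name a pipe to a handler rather than a file). The three helper names are hypothetical.

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <sys/resource.h>

// Current soft limit on core-file size, in bytes (0 means core files are suppressed).
rlim_t currentCoreLimit() {
  rlimit rl{};
  getrlimit(RLIMIT_CORE, &rl);
  return rl.rlim_cur;
}

// Raise the soft limit as far as the hard limit allows: roughly an in-process `ulimit -c unlimited`.
bool enableCoreDumps() {
  rlimit rl{};
  if (getrlimit(RLIMIT_CORE, &rl) != 0) return false;
  rl.rlim_cur = rl.rlim_max;
  return setrlimit(RLIMIT_CORE, &rl) == 0;
}

// Linux only: the kernel's core-file naming/location rule, e.g. "core" or "|/usr/lib/...".
std::string corePattern() {
  std::ifstream in("/proc/sys/kernel/core_pattern");
  std::string line;
  std::getline(in, line);
  return line;
}
```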

pass temp into func by val: mv-ctor skipped #RVO CSY

Posted on January 30, 2018 (updated August 19, 2019) by BinTAN

Suppose we have a class MoveOnlyStr which has only a move-ctor and no copy-ctor. Suppose we pass an unnamed temporary instance of this class into a function by value, like void func1(MoveOnlyStr arg_mos).

Q: Will the move-ctor be used to create the argument object arg_mos? We discussed this in your car last time we met up.

A: If the temp is produced by a function, then No. My test confirmed my prediction that the compiler optimizes away the temporary, due to RVO. So the move-ctor is NOT used. This RVO optimization existed long before c++11.

A: if the temp is not produced by a function, then RVO is irrelevant (nothing “Returned”) but I don’t know if there’s still some copy-elision.
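A minimal test of both cases, assuming a C++17 compiler. Since C++17, initializing an object (a by-value parameter included) from a prvalue of the same type is guaranteed to construct it in place, so neither func1(makeStr()) nor func1(MoveOnlyStr("...")) should call the move-ctor; before C++17 this elision was permitted but not required. The counter, makeStr() and the string payload are hypothetical additions for the experiment. Only an explicit std::move on a named object forces a move here.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical move-only class that counts how often its move-ctor runs.
struct MoveOnlyStr {
  static inline int moveCount = 0;  // C++17 inline static member
  std::string s;
  explicit MoveOnlyStr(std::string v) : s(std::move(v)) {}
  MoveOnlyStr(const MoveOnlyStr&) = delete;
  MoveOnlyStr(MoveOnlyStr&& other) noexcept : s(std::move(other.s)) { ++moveCount; }
};

// Returns a prvalue, so the caller's object is constructed directly.
MoveOnlyStr makeStr() { return MoveOnlyStr("from a function"); }

// Pass by value, as in the post.
std::size_t func1(MoveOnlyStr arg_mos) { return arg_mos.s.size(); }
```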

##mkt-data jargon terms: fakeIV

Posted on January 29, 2018 (updated March 21, 2018) by BinTAN

Context — real time high volume market data feed from exchanges.

Q: what’s depth of market? Who may need it?
Q: what are the different usages of FIX vs binary protocols?
Q: when is FIX not suitable? (market data dissemination …)
Q: when is binary protocol not suitable? (order submission …)
Q: what’s BBO i.e. best bid offer? Why is it important?
Q: How do you update BBO and when do you send out the update?
Q[D]: how do you support trade-bust that comes with a trade id? How about order cancellation — is it handled differently?
Q: how do you handle an order modification?
Q: how do you handle an order replacement?
Q: How do you handle currency code? (My parser sends the currency code for each symbol only once, not on every trade)
Q: do you have the refresh channel? How can it be useful?
Q[D]: do you have a snapshot channel? How can it be useful?
Q: when do you shut down your feed handler? (We don’t, since our clients expect to receive refresh/snapshots from us round the clock. We only restart once in a while)
Q: if you keep running, then how is your daily open/close/high/low updated?
Q: how do you handle partial fill of a big order?
Q: what’s an iceberg order? How do you handle it?
Q: what’s a hidden order? How do you handle it?
Q: What’s Imbalance data and why do clients need it?
- Hint: they impact daily closing prices which can trigger many derivative contracts. Closing prices are closely watched somewhat like LIBOR.
Q: when would we get imbalance data?
Q: are there imbalance data for all stocks? (No)
——— symbol/reference data
Q[D]: Do you send the security description downstream in every message? It can be quite long and take up lots of bandwidth.
Q: what time of the day do symbol data come in?
Q5[D]: what essential attributes are there in a typical symbol message?
Q5b: Can you name one of the key reasons why symbol message is critical to a market data feed?
Q6: how would dividend impact the numbers in your feed? Will that mess up the data you send out to downstream?
Q6b[D]: if yes how do you handle it?
Q6c: what kind of exchanges don’t have dividend data?
Q: what is a stock split? How do you handle it?
Q: how are corporate actions handled in your feed?
Q: what are the most common corporate actions in your market data? (dividends, stock splits, rights issue…)
——— resilience, recovery, reliability, capacity
Q: if a security is lightly traded, how do you know if you have missed some updates or there’s simply no update on this security?
Q25: When would your exchange send a reset?
Q25b: How do you handle resets?
Q26[D]: How do you start your feed handler mid-day?
Q26b: What are the concerns?
Q[D]: how do you handle potential data loss in multicast?
Q[D]: how do you ensure your feed handler can cope with a burst in messages in TCP vs multicast?
Q[D]: what if your feed handler is offline for a while and missed some messages?
Q21: Do you see gaps in sequence numbers? Are they normal?
Q21b: How is your feed handler designed to handle them?
Q[D]: does your exchange have a primary and secondary line? How do you combine both?
Q[D]: between primary and secondary channel, if one channel has lost data but the other channel did receive the data, how does your system combine them?
Q[D]: how do you use the disaster recovery channel?
Q[D]: If exchange has too much data to send in one channel, how many more channels do they usually use, and how do they distribute the data among multiple channels?
Q: is there a request channel where you send requests to exchange? What kind of data do you send in a request?
Q: Is there any consequence if you send too many requests?
Q: what if some of your requests appear to be lost? How does your feed handler know and react?

[D=design question]
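For the primary/secondary line questions above, one common approach is arbitration by sequence number: accept whichever copy of a message arrives first, drop the later copy, and remember any jump as a gap. The sketch below assumes both lines carry identical sequence numbers starting at 1; LineArbiter is a hypothetical name, not any exchange's API. It passes a late gap-fill straight through (out of order); a real feed handler might instead buffer briefly to preserve order, or request retransmission for whatever stays missing.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical A/B line arbiter: each sequence number is accepted once,
// whichever line delivers it first.
class LineArbiter {
public:
  // Returns true if this copy should be processed, false if it is a duplicate.
  bool onMessage(std::uint64_t seq) {
    if (seq > lastSeq_) {
      for (std::uint64_t s = lastSeq_ + 1; s < seq; ++s) missing_.insert(s);  // open a gap
      lastSeq_ = seq;
      return true;
    }
    // Older seq: accept only if it fills a gap (e.g. the other line was slower).
    return missing_.erase(seq) == 1;
  }
  // Still missing on both lines: candidates for a retransmission request.
  const std::set<std::uint64_t>& missing() const { return missing_; }
private:
  std::uint64_t lastSeq_ = 0;        // assumes sequence numbers start at 1
  std::set<std::uint64_t> missing_;  // fine for small gaps; large gaps would need ranges
};
```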

python % string formatting #padding

Posted on January 28, 2018 (updated March 16, 2019) by BinTAN

Many details I tend to forget:

the actual data can be either a tuple or a dict. The dict version is powerful but less known
- if there’s just one item in the tuple, you don’t need the parentheses: myStr = "%d" % var1
for a tuple, the format specifier count must match the tuple length
for a dict, each format specifier must name a valid key.

myStr = "%(var1)d" % locals() # locals() returns a dict including var1

The expression below contains two q(%). The extra parentheses after the 2nd % are required; without them, the second % would be applied to the already-formatted string and raise a TypeError:

"#%03d" % (id(node)%1000) # returns the last 3 digits of the object id, with 0-padding

coreJava^big-data java job #XR

Posted on January 28, 2018 (updated March 13, 2019) by BinTAN

In the late 2000s, Wall Street java jobs were informally categorized into core-java vs J2EE. Nowadays “J2EE” is replaced by “full-stack” and “big-data”.

The typical core java interview requirements have remained unchanged: collections, threading, JVM tuning, compiler details (including keywords, generics, overriding, reflection, serialization), …, but relatively few add-on packages.

(With the notable exception of java collections) Those add-on packages are, by definition, not part of the “core” java language. The full-stack and big-data java jobs use plenty of add-on packages. It’s no surprise that these jobs pay on par with core-java jobs. More than 5 years ago J2EE jobs, too, used to pay on par with core-java jobs, and sometimes higher.

My long-standing preference for core-java rests on one observation: churn. The add-on packages tend to have a relatively short shelf-life. They become outdated and lose relevance. I remember some of the add-on packages:

Hadoop, Spark
functional java
SOAP, REST
GWT
NIO
Protobuf, json
Gemfire, Coherence, …
ajax integration
JDBC
Spring
Hibernate, iBatis
EJB
JMS, Tibco EMS, Solace …
XML-related packages (more than 10)
Servlet, JSP
JVM scripting including scala, groovy, jython, javascript@JVM… (I think none of them ever caught on outside one or two companies.)

None of them is absolutely necessary. I have seen many enterprise java systems using only one or two of these add-on packages.

2 uses@q[del] ] python

Posted on January 27, 2018 (updated March 17, 2019) by BinTAN

https://stackoverflow.com/questions/6146963/when-is-del-useful-in-python

Usage 1: in-place delete from a container — dict and list (usually inefficient)
- You can also slice-delete from a list
- You can even del mylist[start:end:stride]
Usage 2: remove a local(or global) name such as MyClass, myVar
- the name is erased from the symbol table
- dir() will no longer show it
- usage? metaprogramming might use runtime introspection on dir()

q[+=] Yes, but no q[++] in python #str

Posted on January 27, 2018 (updated March 17, 2019) by BinTAN

q(+=) is well supported for numbers, timedelta and … STRING

q(++) is unsupported, so you would need

myInt += 1

bash: iterate over an integer range

Posted on January 25, 2018 by BinTAN

https://stackoverflow.com/questions/169511/how-do-i-iterate-over-a-range-of-numbers-defined-by-variables-in-bash

c++enum GTD tips: index page

Posted on January 25, 2018 (updated June 14, 2020) by BinTAN

cast between int and enum? static_cast<MyEnum>(age); static_cast<int>(color)
assignment between enum and int variable? int to enum always needs a cast; an unscoped enum converts to int implicitly, but an enum class needs a cast
enum type is often defined “nested” as a class member
print enum as a character — C++
c++user-defined constant: enum^const static
clever use of enum(char?) in demanding c++ app
how to avoid virtual functions — use enum; CRTP
switch on enum? OK
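A small sketch tying several of the tips above together, with a hypothetical Order class: a nested unscoped enum with char as its underlying type (so it converts to a printable letter), a nested enum class, static_cast in both directions, and a switch over the enum. With -Wall, GCC warns via -Wswitch if an enumerator is missing from such a switch.

```cpp
#include <cassert>

// Hypothetical Order class with nested enums.
struct Order {
  enum Side : char { Buy = 'B', Sell = 'S' };   // unscoped, char-based: prints/serializes as a letter
  enum class Status { New, Filled, Cancelled }; // scoped: no implicit conversion to int
};

const char* describe(Order::Status s) {
  switch (s) {  // switching on an enum is fine
    case Order::Status::New:       return "new";
    case Order::Status::Filled:    return "filled";
    case Order::Status::Cancelled: return "cancelled";
  }
  return "unknown";  // only reachable for out-of-range values cast into the enum
}
```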

choose python^c++for cod`IV

Posted on January 24, 2018 (updated March 17, 2019) by BinTAN

1) Some hiring teams have an official policy — focus on coding skills and let candidate pick any language they like. I can see some interviewers are actually language-agnostic.

2) 99% of the hiring teams also have a timing expectation. Many candidates are deemed too slow. This is obvious in online coding tests. We all have failed those due to timing. (However, I guess on white-board python is not so much faster to code.)

If these two factors are important to a particular position, then python is likely better than c++ or java or c#.

Your code is shorter and easier to edit. No headers to include.
Compiler errors are shorter
c++ pointer or array indexing errors can crash without error message. Dynamic languages like python always give an error message.
STL iterator can become end() or invalidated. The result would often look normal, so we can spend a long time struggling with a hidden bug.
Edit-~~Compile~~-Test cycle is reduced to Edit-Test
no uninitialized variables
Python offers some shortcuts to tricky c++ tasks, such as
1. string: split, search, and many other convenient features. About 1/3 of the coding questions require non-trivial string manipulation.
2. vector (a.k.a List): slicing, conversion(to string etc), insertion, deletion…, are much faster to write. Shorter code means lower chance of mistakes
3. For debugging: Easy printing of vector, map and nested containers. No iteration required.
4. easy search in containers
5. iterating over any container or iterating over the characters of a string — very easy. Even easier than c++11 for-loop
6. Dictionary lookup failure can return a default value
7. Nested containers. Basically, the more complex the data structure, the more time-saving python is.
8. multiple simultaneous return values — very very easy in python functions.
9. a python function can return a bool half the times and a list otherwise!

If the real challenge lies in the algorithm, then solving it in any language is equally hard, but I feel timed coding tests are never that hard. A Facebook seminar presenter emphasized that across tech companies, every single coding problem is always, always solvable within the time limit.

hackerrack IV: balanced braces

Posted on January 24, 2018 (updated July 6, 2019) by BinTAN

Every char in a given long string is one of the 6 unique characters {}[](). Write a bool isStrictlyBalanced(string)

Note {[}] is not strictly balanced.

I wrote and tested a c++ stack/map solution within 30 minutes
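A sketch of one stack/map solution (not necessarily the one I wrote in the test): push every opener; on a closer, the stack top must be the matching opener. It assumes, as the question states, that the input contains only the 6 characters {}[](). The {[}] case fails because ']' arrives while '{'... wait no, while '[' is not on top: after "{[", the closer '}' sees '[' on top and is rejected.

```cpp
#include <cassert>
#include <stack>
#include <string>
#include <unordered_map>

// Push openers; each closer must match the opener on top of the stack.
bool isStrictlyBalanced(const std::string& s) {
  static const std::unordered_map<char, char> openerOf{{')', '('}, {']', '['}, {'}', '{'}};
  std::stack<char> st;
  for (char ch : s) {
    auto it = openerOf.find(ch);
    if (it == openerOf.end()) { st.push(ch); continue; }  // input is only {}[](), so this is an opener
    if (st.empty() || st.top() != it->second) return false;
    st.pop();
  }
  return st.empty();  // leftover openers mean unbalanced
}
```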

hackerrank IV: full query

Posted on January 24, 2018 (updated July 6, 2019) by BinTAN

/*requirement: given 99 sentences and 22 queries, check each query against all 99 sentences. All the words in the query must show up in the sentence to qualify.
*/
bool check(string & q, string & sen){
    istringstream lineStream(q);
	string qword;
    while(getline(lineStream, qword, ' '))
	  if ( string::npos == sen.find(qword)) return false;
	return true;
}
void textQueries(vector <string> sentences, vector <string> queries) {
  for (string & qr: queries){
	string output1query;
    for(int i=0; i<sentences.size(); ++i){
	  if (check(qr, sentences[i]))	    output1query.append(to_string(i)+" ");
	}
	if (output1query.empty()) output1query = "-1";
    cout<<output1query<<endl;	 
  }
}
int main(){
  vector<string> sen={"a b c d", "a b c", "b c d e"};
  vector<string> que={"a b", "c d", "a e"};
  textQueries(sen, que);
}

git | list untracked files

Posted on January 23, 2018 (updated April 10, 2019) by BinTAN

git clean -n #dry-run

git ls-files -o #also works but harder to remember

I still don’t know how to list all files each with a status like uncommitted/untracked/committed

multiple inheritance !! always trouble-maker

Posted on January 22, 2018 (updated June 22, 2019) by BinTAN

Many interviewers ask about MI.

Both java and c# officially support MI via interface inheritance
python supports MI but it is seldom used in my projects
c++ MI is safe if the superclasses are protocol classes, just like java/c#.
c++ MI is OK if the superclasses do not conflict, and subclass is final, preventing the diamond problem
[[effC++]] also says it’s OK to publicly subclass an interface and privately subclass a concrete class. I don’t fully understand it, and IMHO it’s not widely used
[[Alexandrescu]] pointed out that a combination of MI+TMP is essential for library designs. STL doesn’t use it, but ETSFlow does.
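A minimal sketch of the [[effC++]] idiom (public interface + private concrete class), as I understand it; all class names are hypothetical. Public inheritance from the protocol class means "HitTracker is-a IHitTracker". Private inheritance from Counter means "HitTracker is implemented in terms of Counter": it reuses Counter's code, but Counter's members are not part of HitTracker's interface and outsiders cannot convert a HitTracker* to a Counter*.

```cpp
#include <cassert>
#include <type_traits>

class Counter {  // concrete helper class: reused for its implementation only
public:
  void bump() { ++n_; }
  int count() const { return n_; }
private:
  int n_ = 0;
};

class IHitTracker {  // interface (protocol class): what clients program against
public:
  virtual ~IHitTracker() {}
  virtual void recordHit() = 0;
  virtual int hits() const = 0;
};

// public: "is-a" IHitTracker; private: "implemented in terms of" Counter.
class HitTracker final : public IHitTracker, private Counter {
public:
  void recordHit() override { bump(); }
  int hits() const override { return count(); }
};
```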

c++protocol class==a java interface

Posted on January 22, 2018 (updated May 5, 2019) by BinTAN

Python uses “protocol” to mean something else.

[[EffC++]] P 201 has a succinct definition for a protocol class

no field
no ctor
(not pure) virtual dtor
all methods are pure virtual
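A sketch of two hypothetical protocol classes that follow the four rules above, plus a class that inherits from both, the c++ counterpart of a java class implementing two interfaces. Since neither base has data or method bodies, the MI has nothing to conflict over. The non-pure virtual dtor is what makes deleting a Stock through a Priced pointer safe.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical protocol classes: no fields, no ctor, virtual (non-pure) dtor, all else pure virtual.
class Priced {
public:
  virtual ~Priced() {}
  virtual double price() const = 0;
};

class Named {
public:
  virtual ~Named() {}
  virtual std::string name() const = 0;
};

// Like a java class implementing two interfaces.
class Stock final : public Priced, public Named {
public:
  Stock(std::string sym, double px) : sym_(std::move(sym)), px_(px) {}
  double price() const override { return px_; }
  std::string name() const override { return sym_; }
private:
  std::string sym_;
  double px_;
};

static_assert(std::is_abstract_v<Priced> && std::is_abstract_v<Named>);
```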

%%lead ] theoretical+lowLevel IV topics

Posted on January 22, 2018 (updated April 25, 2018) by BinTAN

See also the marketable_xp spreadsheet… I have consistently demonstrated strength in

1) TT: theoretical complexity: %%strength
2) LL: lowLevel IV topics (seldom needed in GTD) — threads, dStruct, vptr, language rules…

So how could TT/LL influence my 10Y career direction?

research domain? benefits from TT
mkt risk? TT
mkt data? LL
algo trading? Too competitive and poor market depth
quant dev? my TT advantage is not enough .. Too competitive and poor market depth
network optimization? My LL advantage is not enough
app owner? No. Not benefiting from my strengths and I tend to lose interest quickly

c++GC interface

Posted on January 22, 2018 (updated August 11, 2019) by BinTAN

https://stackoverflow.com/questions/27728142/c11-what-is-its-gc-interface-and-how-to-implement

GC interface is partly designed to enable

reachability-based leak detectors
garbage collection

The probe program listed in the URL shows that, as of 2019, all major compilers provide only trivial support for GC. (This GC interface was later removed from the standard in C++23.)

Q: why does c++ need GC, given RAII and smart pointers?
A: system-managed automatic GC instead of manual deallocation, without smart pointers

pass in func name^func ptr^functor object

Posted on January 22, 2018 (updated September 12, 2019) by BinTAN

Let’s set the stage — you can either pass in a function name (like myFunc), or a ptr to function (&myFunc) or a functor object. The functor is recommended even though it involves more typing. Justification — inlining. I remember a top c++ expert said so.

“myFunc” is implicitly converted to &myFunc by the compiler (function-to-pointer decay), so these two forms are equivalent.
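A sketch of the three forms passed to a template algorithm, with hypothetical names. The static_assert confirms that a bare function name decays to the same pointer type as &lessByAbs, so both instantiate sortWith with Cmp = bool(*)(int,int) and the call goes through a pointer value. With the functor, Cmp is the distinct type LessByAbs whose operator() is known at compile time, which is the usual argument for why functors inline more easily.

```cpp
#include <algorithm>
#include <cassert>
#include <type_traits>
#include <vector>

inline int absVal(int x) { return x < 0 ? -x : x; }
bool lessByAbs(int a, int b) { return absVal(a) < absVal(b); }  // plain function
struct LessByAbs {                                              // functor
  bool operator()(int a, int b) const { return absVal(a) < absVal(b); }
};

// A bare function name decays to a function pointer: same type as &lessByAbs.
static_assert(std::is_same_v<std::decay_t<decltype(lessByAbs)>, decltype(&lessByAbs)>,
              "function-to-pointer decay");

template <class Cmp>
void sortWith(std::vector<int>& v, Cmp cmp) {
  // Cmp = bool(*)(int,int) for lessByAbs and &lessByAbs; Cmp = LessByAbs for the functor.
  std::sort(v.begin(), v.end(), cmp);
}
```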

3gradual changes]SG job market #cautious optimism

Posted on January 22, 2018 (updated June 6, 2020) by BinTAN

c++ (and c#) is /conceding/ market share to java, partly due to the two categories above. Apparently, java is growing more dominant than before. I guess java is more proven, better supported by a bigger ecosystem, and has a bigger talent pool. In contrast, c++ skill is harder to find in Singapore?
1. Overall good news for me since my java arm is still stronger than c++ arm
remote hiring — more Singapore teams are willing to hire from overseas. Lazada said “mostly over skype”
Many non-finance companies now can pay 150k base or higher for a senior dev role. In my 2015 job search, I didn’t find any
Many smaller fintech companies (not hedge funds) can pay 150k base or higher
contracts becoming slightly more common
lighter-blue-collar: programmers used to be blue-collar support staff for the revenue staff. Some of the companies listed above treat programmers as first-class citizens.

Reality warning: every time I try the SG job market, recruiters tell me there are so many “new” employers or new markets, but invariably I need to focus on the old guards, mostly ibanks, as the new market is not open to me. This is similar to Deepak and Shanyou trying the Wall St c++ job market.

I must stop /romanticizing/ about the “improvement” in SG job market.

Singapore tech shops are mostly not keen about my profile. U.S.? Not sure.
Singapore fintech shops ? zero interest shown, even when I asked 150k
Singapore buy-sides are interested but way too selective and kinda slow.
Note except GS I didn’t try the ibank jobs this time round.

Basically no change in the landscape since 2015. The jobs available to me are mostly ibanks. Cherish the MLP job but beware attachment. If this job goes sour, I would have to consider WallSt, rather than another perm job in SG.

char* {-cast-} const-char*

Posted on January 22, 2018 (updated January 30, 2018) by BinTAN

From ptr-to-const-char, you need an explicit cast (const_cast) to get a ptr-to-char, removing constness.

The reverse needs no cast: the language/compiler automatically converts a regular ptr-to-char to a ptr-to-const-char. Why? I think it’s relevant to consider the rationale behind

int i=5;
int const ci = i; // no cast needed.

How about smart pointers?
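For std::shared_ptr, as far as I know the rule mirrors raw pointers: shared_ptr<char> converts implicitly to shared_ptr<const char>, while going back needs std::const_pointer_cast (the shared_ptr counterpart of const_cast). A minimal sketch; addConst/removeConst are hypothetical helper names. As with const_cast, writing through the result is undefined behavior if the object was originally const.

```cpp
#include <cassert>
#include <memory>

// Raw pointers: adding const is implicit, removing it needs const_cast.
const char* addConst(char* p) { return p; }
char* removeConst(const char* p) { return const_cast<char*>(p); }  // only safe if the object isn't really const

// shared_ptr follows the same rule.
std::shared_ptr<const char> addConst(std::shared_ptr<char> p) { return p; }  // implicit conversion
std::shared_ptr<char> removeConst(std::shared_ptr<const char> p) {
  return std::const_pointer_cast<char>(p);  // shares ownership with p
}
```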

Y allocate static field in .c file %%take

Posted on January 22, 2018 (updated July 7, 2019) by BinTAN

why do we have to define static field myStaticInt in a cpp file?

For a non-static field myInt, the allocation happens when the class instance is allocated on stack, on heap (with new()) or in global area.

However, myStaticInt isn’t taken care of. It’s not part of the real estate of any new instance. That’s why we need to declare it in the class header, and then define it exactly once (ODR) in a cpp file. It has static storage, laid out at compile/link time rather than per instance. (Since C++17, an inline static data member can be defined right inside the class instead.)
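A single-file sketch of both styles, with hypothetical Widget classes. The out-of-class line is the one definition that would normally live in exactly one .cpp file. Widget17 uses the C++17 inline form and has no non-static data at all, so std::is_empty reports it as an empty class, which confirms the static member lives outside every instance.

```cpp
#include <cassert>
#include <type_traits>

struct Widget {
  int myInt = 0;           // part of every instance
  static int myStaticInt;  // declaration only: no storage inside Widget objects
};
int Widget::myStaticInt = 42;  // the one definition (ODR), normally in exactly one .cpp file

struct Widget17 {
  static inline int myStaticInt = 42;  // C++17: defined in-class, no separate .cpp line needed
};
```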

RVO^move : on return value

Posted on January 21, 2018 (updated August 23, 2019) by BinTAN

Let’s set the stage. A function returns a local Trade object “myTrade” by value. Will RVO kick in, or will move semantics? Not both!

I had lots of confusion about these 2 features. [[effModernC++]] P176 has a long discussion and a piece of advice: do not write std::move() hoping to “help” the compiler on a local object being returned from a function.

If the local object is eligible for RVO, then mainstream compilers elide the copy. Your std::move() would hinder the compiler and backfire, because return std::move(myTrade) is no longer a plain local name and so no longer qualifies for RVO.
If the local object is not elided, then the compiler is required to treat it as an rvalue anyway, i.e. implicitly apply std::move(), so your help is unneeded.
- Note local object returned by clone is a naturally-occurring temp object.

P23 [[c++stdLib]] gave 2-line answer:

if Trade class has a suitable copy or move ctor, then compiler may choose to “elide the copy”. This was long implemented as RVO optimization in most compilers before c++11. https://github.com/tiger40490/repo1/blob/cpp1/cpp/rvr/moveOnlyType_pbvalue.cpp is my experiment.
otherwise, if Trade class has a move ctor, the myTrade object is robbed

So if condition for RVO is present, then most likely your move-ctor will NOT run.
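A minimal experiment in the same spirit, with a hypothetical Trade that counts copies and moves. In makePlain() the local is eligible for (named) RVO; the standard does not guarantee that elision, but if it does not happen the local is moved, never copied. In makeWithMove() the explicit std::move disables RVO, so exactly one move always happens. GCC and Clang can flag this with -Wpessimizing-move.

```cpp
#include <cassert>
#include <utility>

// Hypothetical Trade that counts how it gets constructed.
struct Trade {
  static inline int copies = 0;
  static inline int moves = 0;
  Trade() = default;
  Trade(const Trade&) { ++copies; }
  Trade(Trade&&) noexcept { ++moves; }
};

Trade makePlain() {
  Trade myTrade;
  return myTrade;             // eligible for RVO; otherwise implicitly moved
}

Trade makeWithMove() {
  Trade myTrade;
  return std::move(myTrade);  // not a plain name any more: RVO is off, one move happens
}
```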

live updated hitCount over last5s#presumably Indeed

Posted on January 21, 2018 (updated April 4, 2019) by BinTAN

I hit a similar question in NY, possibly LiquidNet or CVA

Q: Make a system (perhaps a function?) that returns the average number of hits per minute from the past 5 minutes.

I will keep things simple by computing the total hit over the last 300 seconds. (Same complexity if you want average order amount at Amazon over last 5 minutes.)

Let’s first build a simple system before expanding it for capacity.

Let’s first design a ticking system that publishes an update every time a new hit arrives. The update can be displayed or broadcast like a “notice board”, or we can update a shared atomic<int>.

Whenever we get a new record (a hit), we save it in a data structure stamped with an expiry date (datetime). At any time, we want to quickly find the earliest unexpired record i.e. the blue record. There’s only one blue at any time.

What data structure? RingBuffer with enough capacity to hold the last 5 minutes worth of record.

I will keep the address of the current blue record, defined as the earliest unexpired record as of the last update. When a new record comes in, I check “Is the blue expired?” If NO, then easy: this new record is too close to the last new record, so I simply update my “notice board” in O(1). If YES, then I run a binary search for the new blue. Once I find it, I compute a new update in O(W), where W is the minimum of two counts: A) recently expired records, B) still unexpired records. After the update, I remove the expired items from the data structure.

–That concludes my first design. Now what if we also need to update the notice board even when there is no new record?

I would need an alarm set to the expiry time of the current blue.

–Now what if the updates are too frequent? I can run a scheduled update job. I need to keep the address of a yellow record, defined as the newest record as of the last update.

When triggered, routine is familiar. I check “Is the blue expired?” If NO then easy… If YES then binary-search for the new blue.
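A simplified sketch of the first design, under these assumptions: timestamps are whole seconds and never decrease, the window is 300 seconds, and HitCounter is a hypothetical name. It swaps in std::deque for the ring buffer and evicts expired records from the front one at a time (amortized O(1) per record) instead of binary-searching for the new blue. After eviction, the front of the deque is the blue record and the deque size is the hit count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical sliding-window hit counter.
class HitCounter {
public:
  explicit HitCounter(std::int64_t windowSec = 300) : window_(windowSec) {}
  void hit(std::int64_t nowSec) { evict(nowSec); stamps_.push_back(nowSec); }
  std::size_t count(std::int64_t nowSec) { evict(nowSec); return stamps_.size(); }
  double avgPerMinute(std::int64_t nowSec) { return count(nowSec) / (window_ / 60.0); }
private:
  // A record expires at stamp + window; the front is always the oldest record.
  void evict(std::int64_t nowSec) {
    while (!stamps_.empty() && stamps_.front() + window_ <= nowSec) stamps_.pop_front();
  }
  std::int64_t window_;
  std::deque<std::int64_t> stamps_;
};
```

For the second design (updating the notice board without a new hit), a timer could call count() at the blue record's expiry time, i.e. front() + window.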

c++compiler select`move-ctor

Posted on January 21, 2018 (updated August 23, 2019) by BinTAN

This is a __key__ part of understanding move-semantics, seldom quizzed. Let’s set the stage:

you overload a traditional insert(Amount const &) with a move version insert(Amount &&)
- by the way, Derrick of TrexQuant introduced a 3rd alternative emplace()
without explicit std::move, you pass in an argument into insert()
(For this example, I want to keep things simple by avoiding constructors, but the rules are the same.)

Q1: When would the compiler select the rvr version?

P22 [[c++stdLib]] has a limited outline. Here’s my illustration

if I pass in a temporary like insert(originalAmount + 15), then this argument is a rvalue obj so the rvr version is selected
if I pass in a regular variable like insert(originalAmount), then this argument is an lvalue obj so the traditional version is selected
… See also my dedicated blogpost in c++11overload resolution TYPE means..

After we are clear on Q1, we can look at Q2

Q2: how would std::move help?
A: insert(std::move(originalAmount)); // if we know the object behind originalAmount is no longer needed.

https://github.com/tiger40490/repo1/blob/cpp1/cpp1/rvrDemo.cpp shows when we need to use std::move() and when we don’t need.
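A compact illustration of Q1 and Q2, independent of the repo file: a hypothetical Amount type and an overload pair that simply report which version the compiler selected. A temporary (prvalue) binds to the rvr version, a named variable (lvalue) to the const& version, and std::move() turns the named variable into an rvalue so the rvr version wins.

```cpp
#include <cassert>
#include <string>
#include <utility>

struct Amount { int value; };  // hypothetical
Amount operator+(Amount a, int d) { return Amount{a.value + d}; }

// Hypothetical overload pair; each reports which version was selected.
std::string insert(const Amount&) { return "const&"; }  // traditional version
std::string insert(Amount&&)      { return "&&"; }      // move (rvr) version
```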

Q:Just when do App(!! lib)devs write std::move

Posted on January 21, 2018 (updated July 7, 2020) by BinTAN

I feel move ctor (and move-assignment) is extremely implicit and “in-the-fabric”. I don’t know of any user function with an rvr parameter; such a function is usually in some library. Consequently, in my projects I have not seen any user-level code that calls “std::move(…)”.

Let’s look at move ctor. “In the fabric” means it’s mostly rather implicit i.e. invisible. Most of the time move ctor is picked by compiler based on some rules, and I have basically no influence over it.

https://github.com/tiger40490/repo1/blob/cpp1/cpp1/rvrDemo.cpp shows when I need to call move() but it’s a contrived example — I have some object (holding a resource via heap pointer), I use it once then I don’t need it any more, so I “move” its resource into a container and abandon the crippled object.

Conclusion — as app developers I seldom write code using std::move.

P20 [[c++ std lib]] shows myCollection.insert(std::move(x)); // where ~~x is a local nonref variable, not a heap pointer~~!
- in this case, we should provide a wrapper function over std::move() named getRobberAliasOf()
- I think you do this only if x has part of its internal storage allocated on heap, and only if the type X has a move ctor.

I bet that most of the time when an app developer writes “move(…)”, she doesn’t know if the move ctor will actually get picked by compiler. Verification needed.

— P544 [[c++primer]] offers a “best practice” — Outside of class implementations (like big4++), use std::move only when you are certain that you need to do a move and it is guaranteed safe.

Basically, the author believes user code seldom needs std::move.

— Here’s one contrived example of app developer writing std::move:

string myStr=input;
vectorOfString.push_back(std::move(myStr)); //we promise to compiler we won’t use myStr any more.

Without std::move, a copy of myStr is constructed in the vector. I call this a contrived example because

if input is a char-array, then emplace_back() is more efficient
if input is another temp string, then we can simply use push_back(input), which would bind to the rvr overload anyway.

c++11 atomic{int}^AtomicInteger.java #Bool can be half-written

Posted on January 21, 2018 (updated August 23, 2019) by BinTAN

The #1 usage of atomic<int> is load() and store(). I will use short form “load/store” or “l/s”.

operator++() also works correctly and is singled out by Scott Meyers as a common usage. See also atomic{int} offers operator+=()

The #2 usage is CAS. Interviewers are mostly interested in this usage, though I won’t bother to remember the function names —
* compare_exchange_strong()
* compare_exchange_weak()

The CAS usage is same as AtomicInteger.java, but the load/store usage is more like the thread-safety feature of Vector.java. To see the need for load/store, we need to realize the simple “int” type assignment is not atomic [1]:

P1012 [[c++ standard library]] shocked me by affirming that without locks, you can read a “half-written Boolean” [1].

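To solve this problem, atomic<int> guarantees that load() and store() are always atomic. The standard permits an implementation to do this with internal locks (just like Vector.java), though on mainstream hardware it uses plain atomic CPU instructions.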

[1] different from java. https://stackoverflow.com/questions/11459543/should-getters-and-setters-be-synchronized points out that 32-bit int in java is never “half-written”. If you read a shared mutable int in java, you can hit a stale value but never a c++ style half-written value. Therefore, java doesn’t need guarded load()/store() functions on an integer.

Q: are these c++ atomic types lock-free?
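A: for load/store: it depends on the platform. The standard allows std::atomic<int> to use an internal lock (see P 1013), but on mainstream platforms such as x86-64 and ARM it is lock-free; is_lock_free() tells you at runtime.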
A: for CAS — lock-free CPU instructions are used, if available.
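A single-threaded sketch of both usages. load()/store() and operator++ are shown directly; atomicMax() is a hypothetical helper showing the usual CAS retry loop. compare_exchange_weak() may fail spuriously, so it sits in a loop; on failure it writes the current value back into its first argument, so the loop re-checks with fresh data. compare_exchange_strong() does not fail spuriously, which makes it simpler for a one-shot attempt.

```cpp
#include <atomic>
#include <cassert>

// Atomically replace the value with max(current, candidate) using a CAS loop.
void atomicMax(std::atomic<int>& a, int candidate) {
  int cur = a.load();
  while (cur < candidate && !a.compare_exchange_weak(cur, candidate)) {
    // failed (another writer, or spurious failure): cur now holds the latest value
  }
}
```

On C++17, std::atomic<int>::is_always_lock_free answers the lock-free question at compile time.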

c++QQ/zbs Expertise: I got some

Posted on January 21, 2018 (updated August 11, 2019) by BinTAN

As stated repeatedly, c++ is the most complicated and biggest language used in industry, at least in terms of syntax (tooManyVariations) and QQ topics. Well, I have impressed many expert interviewers with my core-c++ language insight.

That means I must have some expertise in c++ QQ topics. For my c++ zbs growth, see separate blog posts.

Note socket, shared mem … are part of the c++ ecosystem, like OS libraries.

Deepak, Shanyou, Dilip .. are not necessarily stronger. They know some c++ sub-domains better, and I know some c++ sub-domains better, in both QQ and zbs.

–Now some of the topics to motivate myself to study

malloc and relatives … internals
enable_if
email discussion with CSY on temp obj
UDP functions

trySomethingNew : avoid perm jobs@@

Posted on January 21, 2018 (updated April 21, 2018) by BinTAN

There’s a high risk of under-performing. In a perm job, that invites warning, perf improvement, bonus fear — all forms of stigma-phobias.

With contract jobs, I can operate without the fear of stigma!

Here, “under-performing” mostly refers to “figure-things-out slower than team peers”, which usually, but not always, attracts those stigmas. Ultimately it’s the manager’s assessment.

Stirt/Quartz: for example, my figure-things-out speed was not slower than my peers’ and not slower than in Barcap 2nd half, but still I got the stigma.

Citi — for an opposite example, my figure-things-out speed was rather slow but I didn’t get the stigma. I got renewed once.

making out a little-endian 32-bit int

Posted on January 20, 2018 (updated March 12, 2018) by BinTAN

Let’s set the stage — We have a stream of bytes in little-endian format. Let’s understand it according to the spec. The struct is packed tight without padding.

Spec says left-most field is a char. It is always on the left-most, regardless of endianness. If we look at the 8 bits, they are normal. 0x41 is ‘A’. Within the 8 bits, no reordering due to endianness.

Spec says the next four bytes are an integer. The most significant bit (suppose it’s a one) is at the right end, representing 2^31. What’s the integer value? To work it out by hand, we take the four bytes as is: Byte1 Byte2 Byte3 Byte4. Then we reverse them into Byte4 Byte3 Byte2 Byte1. Now this 32-bit integer is human-readable: it is the binary number taught in classrooms.

Note the software program still uses the original “Byte1 Byte2 Byte3 Byte4” and can print out the correct integer value.

Spec says next four bytes is a float. There’s nothing I can do to make out its value without a computer, so I don’t bother to rearrange the bytes.

Next 2 bytes is a string like “XY”. First byte is “X”. Endian-ness doesn’t bother us.
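A decoding sketch for the layout described above, assuming a hypothetical 11-byte message (char, 4-byte little-endian int, 4-byte little-endian float, 2-byte string) and IEEE-754 floats. Instead of reversing bytes, readLE32() shifts each byte into its place (Byte1 lowest, Byte4 highest), so it gives the same answer on a little- or big-endian host. The float is reassembled the same way and its bits copied into a float with memcpy. The char and the string need no reordering.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical decoded form of the 11-byte message.
struct Decoded { char c; std::uint32_t n; float f; std::string s; };

// Byte1 is least significant, Byte4 most significant.
std::uint32_t readLE32(const unsigned char* p) {
  return std::uint32_t(p[0]) | std::uint32_t(p[1]) << 8 |
         std::uint32_t(p[2]) << 16 | std::uint32_t(p[3]) << 24;
}

Decoded decode(const unsigned char* buf) {
  Decoded d;
  d.c = static_cast<char>(buf[0]);         // single byte: endianness irrelevant
  d.n = readLE32(buf + 1);
  std::uint32_t bits = readLE32(buf + 5);  // the float's bit pattern
  std::memcpy(&d.f, &bits, sizeof d.f);    // assumes IEEE-754 binary32
  d.s.assign(reinterpret_cast<const char*>(buf + 9), 2);  // bytes stay in order
  return d;
}
```

For example, bytes 01 00 00 80 decode to 0x80000001 (the top bit set, as in the text), and 00 00 80 3F decode to 1.0f.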

paste commented code into q[ vi ]

Posted on January 20, 2018 (updated April 3, 2019) by BinTAN

vim by default would mess up the pasting, by inserting // before every line after the first comment.

:set paste

was able to disable this default behavior for me, but this command has side effects, so I usually turn it off immediately:

:set nopaste

https://vim.fandom.com/wiki/Toggle_auto-indenting_for_code_paste has some details and also links to the official documentation.

posix^SysV-sharedMem^MMF

Posted on January 19, 2018 (updated September 27, 2020) by BinTAN

http://www.boost.org/doc/libs/1_65_0/doc/html/interprocess/sharedmemorybetweenprocesses.html#interprocess.sharedmemorybetweenprocesses.sharedmemory.xsi_shared_memory points out

Boost.Interprocess provides portable shared memory in terms of POSIX semantics. I think this is the simplest or default mode of Boost.Interprocess. (There are at least two other modes.)
Unlike POSIX shared memory segments, SysV shared memory segments are not identified by names but by ‘keys’. SysV shared memory mechanism is quite popular and portable, and it’s not based on file-mapping semantics, but it uses special system functions (shmget, shmat, shmdt, shmctl…).
We could say that memory-mapped files offer the same interprocess communication services as shared memory, with the addition of filesystem persistence. However, as the operating system has to synchronize the file contents with the memory contents, memory-mapped files are not as fast as shared memory. Therefore, I don’t see any market value in this knowledge.
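A minimal sketch of the SysV calls named above, to make the "keys, not names" point concrete. The segment is created with IPC_PRIVATE (real cooperating processes would usually derive a shared key with ftok()), attached, written, detached, and marked for removal, since a SysV segment otherwise outlives the process. The POSIX counterpart would instead use shm_open() with a name like "/myseg" plus mmap(); that version is not shown. sysvShmRoundTrip is a hypothetical name and assumes msg fits in 4096 bytes.

```cpp
#include <cassert>
#include <cstring>
#include <sys/ipc.h>
#include <sys/shm.h>

// Create, attach, write, read back, detach, remove. Returns true on success.
bool sysvShmRoundTrip(const char* msg) {
  int id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);  // identified by key/id, no filesystem name
  if (id == -1) return false;
  bool ok = false;
  void* addr = shmat(id, nullptr, 0);
  if (addr != reinterpret_cast<void*>(-1)) {             // shmat returns (void*)-1 on error
    std::strcpy(static_cast<char*>(addr), msg);           // any attached process would see this
    ok = std::strcmp(static_cast<char*>(addr), msg) == 0;
    shmdt(addr);
  }
  shmctl(id, IPC_RMID, nullptr);  // without this, the segment persists after the process exits
  return ok;
}
```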

how I achieved%% ComfortableEconomicProfile by44

Posted on January 19, 2018 (updated June 27, 2020) by BinTAN

(I want to keep this blog in recrec, not tanbinvest. I want to be brief yet incisive.)

See 3 ffree scenarios: cashflow figures. What capabilities enabled me to achieve my current Comfortable Economic profile?

— top 3
by earning SGP citizenships
by developing my own investment strategies, via trial-n-error
by staying healthy
— the obvious
by high saving rate thanks to salary + low burn rate — efficiency inspired by SG gov
by consistent body-building with in-demand skills -> job security. I think rather few of my peers have this level of job security. Most of them work in one company for years. They may be lucky when they need a new job, but they don’t have my inner confidence and level of control on that “luck”. Look at Y.W.Chen. He only developed that confidence/control after he changed job a few times.

When I say “Comfortable” I don’t mean “above-peers”, and not complete financial freedom, but rather … ~~easily affordable lifestyle without the modern-day pressure to work hard and make a living~~. In my life there are still too many pressures to cope with, but I don’t need to work so damn hard trying to earn enough to make ends meet.

A higher salary or promotion is “extremely desirable” but not needed. I’m satisfied with what I have now.

I can basically retire comfortably.

a typical speed-coding test

Posted on Jan 19, 2018 (updated May 14, 2019) by BinTAN

parent/child pairs→tree algos #Indeed

have used Briefly : sharedMem/lockfree/..

Posted on Jan 19, 2018 (updated Jun 20, 2019) by BinTAN

As with my early learning curve in sockets, Dynamic Programming and Swing, I have yet to achieve a breakthrough in these topics. There are too many of them and I don’t know what to focus on.

It’s important not to exaggerate your expertise in these areas. Once interviewers find out your exaggeration, subconsciously they would discount other parts of your resume.

c++ lock-free — “Used in my project but not written by me”
Shared mem
Boost::*
Epoll
Multiple inheritance
Python multiprocessing

implement an exchange #Trex Kenny

Posted on Jan 19, 2018 (updated May 25, 2018) by BinTAN

Q: create an exchange with messaging for NewOrderSingle, ExecutionReport etc. (I think interviewer means the matching server.)

Need to support restart of any client or of the exchange itself.
Basic reliability (using acks or other techniques) such as
- Client needs to know for sure if new order is received.
- Exchange needs to ensure execution report is received.
Please write c++ code, and compile it if possible. One hour given.
no need to send out market data to subscribers — not the focus

POSIX^SysV semaphores

Posted on Jan 19, 2018 (updated Jun 22, 2020) by BinTAN

https://www.ibm.com/developerworks/library/l-semaphore/index.html — I have not read it.

My [[beginning linux programming]] book also touches on the differences.

I feel this is less important than the sharedMem topic.

The posix semaphore is part of pthreads i.e. Posix Threads
The sysV semaphore is part of IPC and often mentioned along with sysV sharedMem

The counting semaphore is best known and easy to understand.

The pthreads semaphore can be used this way or as a binary semaphore.
The system V semaphore can be used this way or as a binary semaphore. See http://portal.unimap.edu.my/portal/page/portal30/Lecturer%20Notes/KEJURUTERAAN_KOMPUTER/SEM10809/EKT424_REAL_TIME_SYSTEM/LINUX_FOR_YOU/12_IPC_SEMAPHORE.PDF

Linux manpage pointed out — System V semaphores (semget(2), semop(2), etc.) are an older semaphore API. POSIX semaphores provide a simpler, and better designed interface than System V semaphores; on the other hand POSIX semaphores are less widely available (especially on older systems) than System V semaphores.

The same manpage implies that both APIs use _counting_ semaphore semantics, without notification semantics.

git | backup b4 history rewrite

Posted on Jan 19, 2018 (updated Mar 21, 2019) by BinTAN

Personal best practice. Git history rewrite is not always a riskless no-brainer. Once in 100 times it can turn nasty.

It should Never affect file content, so at the end of the rewrite we need to diff against a before-image to confirm no change.

The dumbest (and most foolproof) before-image is a zip of entire directory but here’s a lighter alternative:

git branch b4rewrite/foo
git reset or rebase or cherry-pick
git diff b4rewrite/foo

Note the branch name can be long but always explicit so I can delete it later without doubt.

posix sharedMem: key points { boost

Posted on Jan 17, 2018 (updated Jun 2, 2018) by BinTAN

http://www.boost.org/doc/libs/1_65_0/doc/html/interprocess/sharedmemorybetweenprocesses.html#interprocess.sharedmemorybetweenprocesses.sharedmemory.shared_memory_steps is excellent summary

* We (the app developer) need to pick a unique name for the shared memory region, managed by the kernel.

* we can use create_only, open_only or open_or_create

* When we link (or “attach” in sysV lingo) App1’s memory space to the shared memory region, the operating system looks for a big enough memory address range in App1’s address space and marks that address range as a special range. Changes in that address range are automatically seen by App2, which has also mapped the same shared memory object.

* As shared memory has kernel or filesystem persistence, we must explicitly destroy it.

Above is the posix mode. The sysV mode is somewhat different.

range-based for-loop over a map or vector

Posted on Jan 17, 2018 (updated Sep 14, 2018) by BinTAN

Hi Kam,

Thanks for your valuable tip. I found this 3-point summary on https://stackoverflow.com/questions/15176104/c11-range-based-loop-get-item-by-value-or-reference-to-const

1) Choose for(auto x : myVector) when you want to work with copies.
2) Choose for(auto &x : myVector) when you want to work with original items and may modify them.
3) Choose for(auto const &x : myVector) when you want to work with original items and will not modify them.

In the case of myMap, the first form for(auto myPair: myMap) ~~still clones the pairs~~ from myMap. To use a reference instead of a clone, we need the 2nd or 3rd forms.

empty sys.exit() to end program from any func

Posted on Jan 16, 2018 (updated Mar 17, 2019) by BinTAN

In a timed coding test, I think it saves precious time to do

  if some_condition: print some_result; sys.exit()  # can use in any function :)

If the pre-exit routine is needed at several exit-points in the program flow, then extract the routine as a function that ends in sys.exit().

Note: the plain exit() function (without sys.) is completely irrelevant here.

nearest-rank percentile definition

Posted on Jan 16, 2018 (updated Apr 3, 2019) by BinTAN

https://en.wikipedia.org/wiki/Percentile#The_nearest-rank_method has a few concise pointers

a percentile value is always an original object, never interpolated
we can think of it as a mapping function ~~percentile(int100) -> an original object~~, where int100 data type can be a value from 1 to 100 (zero disallowed). Two distinct inputs can map to the same output object.
100th percentile is always the largest object in the list. No such thing as 0th. 1st percentile is not always the smallest object.

python: value@ECT+syntax > deep insight

Posted on Jan 15, 2018 (updated Apr 13, 2019) by BinTAN

c++ interviews value deep insight more than interviews in any other language. Java and c# interviews also value it highly, but not python interviews.

Reminder — zoom in and dig deep in c++, java and c# only. Don’t do that in python too much.

Instead of deep insight, accumulate ECT syntax … highly valued in TIMED coding tests.

Use brief blog posts with catchy titles

depth-first-traversal, height-aware

Posted on Jan 15, 2018 (updated Jul 17, 2019) by BinTAN

Latest: https://github.com/tiger40490/repo1/blob/cpp1/cpp1/binTree/DFT_show_level.cpp


#include <cstddef>
#include <iostream>
using namespace std;

struct Node {
    int data;
    Node *left, *right, *next;
    Node(int x, Node * le = NULL, Node * ri = NULL) : data(x), left(le), right(ri), next(NULL) {}
};
Node _15(15);
Node _14(14);
Node _13(13);
Node _12(12);
Node _11(11);
Node _10(10);
Node _9(9);
Node _8(8);
Node _7(7, &_14, &_15);
Node _6(6, NULL, &_13);
Node _5(5, &_10, NULL);
Node _4(4, NULL, &_9);
Node _3(3, &_6,  &_7);
Node _2(2, &_4,  &_5);
Node root(1, &_2, &_3);

int maxD=0;
void recur(Node * n){
  static int lvl=0;   // depth of the node currently being visited (root = 1)
  ++lvl;
  if (lvl>maxD) maxD = lvl;
  if (n->left){ recur(n->left); }
  cout<<n->data<<" processed at level = "<<lvl<<endl;
  if (n->right){ recur(n->right); }
  --lvl;              // back up one level before returning to the parent
}
int maxDepth(){
    recur(&root);
    cout<<maxD<<endl;
    return maxD;      // the original fell off the end of a non-void function (undefined behaviour)
}
int main(){
   maxDepth();
}

p2p messaging beats MOM ] low-latency trading

Posted on Jan 14, 2018 (updated Oct 11, 2019) by BinTAN

example — RTS exchange feed dissemination infrastructure uses raw TCP and UDP sockets and no MOM

example — the biggest sell-side equity OMS network uses MOM only for minor things (eg?). No MOM for market data. No MOM carrying FIX order messages. Between OMS nodes on the network, FIX over TCP is used

I read and recorded the same technique in 2009… in this blog

Q: why is this technique not used on west coast or main street ?
%%A: I feel on west coast throughput outweighs latency. MOM enhances throughput.

after fork(): threads,sockets.. #Trex

Posted on Jan 14, 2018 (updated Jun 6, 2020) by BinTAN

I have read about fork() many times without knowing these details, until Trex interviewer asked !

–based on http://man7.org/linux/man-pages/man2/fork.2.html

The child process is created with a single thread—the one that called fork(). The entire virtual address space of the parent is replicated in the new process, including the states of pthread mutexes, pthread condition variables, and other pthreads objects. In particular, if a lock in the parent process was held by some other thread t2, then the child process has only the thread that called fork() and no t2, yet the lock is still unavailable. This is a common problem, addressed in http://poincare.matf.bg.ac.rs/~ivana/courses/ps/sistemi_knjige/pomocno/apue/APUE/0201433079/ch12lev1sec9.html.

The very 1st instruction executed in Child is the instruction after fork() — as proven in https://github.com/tiger40490/repo1/blob/cpp1/cpp1/fork3times.cpp

The child inherits copies of the parent’s set of open file descriptors, including stdin/stdout/stderr. The child process should usually close the ones it doesn’t need.

Special case — socket file descriptor inherited. See https://bintanvictor.wordpress.com/2017/04/29/socket-shared-between-2-processes/

gdb to show c++thread wait`for mutex/condVar

Posted on Jan 14, 2018 (updated Mar 21, 2019) by BinTAN

https://github.com/tiger40490/repo1/blob/cpp1/cpp/thr/pthreadCondVar.cpp shows my experiment using gdb supplied by StrawberryPerl.

On this g++/gdb set-up, “info threads” shows thread id number 1 for main thread, “2” for the thread whose pthread_self() == 2 … matching 🙂

The same “info-threads” output also shows

one of the worker threads is executing sleep() while holding lock (by design)
the other worker threads are all waiting for the lock.
At the same time, the main thread is waiting on a condition variable, so info-threads shows it executing a different function.

c++log current fileName+lineNo via macro

Posted on Jan 14, 2018 (updated Jul 20, 2018) by BinTAN

stringstream ss; ss<<"before processMessage(), at "<<__FILE__<<":"<<__LINE__;

multicast: IV care only about bookish nlg !!practical skills

Posted on Jan 13, 2018 (updated Apr 15, 2018) by BinTAN

Hi friends,

I recently used multicast for a while and I see it as yet another example of the same pattern — technical interviewers care about deep theoretical knowledge not practical skills.

Many new developers don’t know multicast protocol uses special IP addresses. This is practical knowledge required on my job, but not asked by interviewers.

Unlike TCP, there’s not a “server” or a “client” in a multicast set-up. This is practical knowledge in my project but not asked by interviewers.

When I receive no data from a multicast channel, it’s not obvious whether nobody is sending or I have no connectivity. (In contrast, with TCP, you get connection error if there’s no connectivity. See tcp: detect wire unplugged.) This is practical knowledge, but never asked by interviewers.

I never receive a partial message by multicast, but I always receive partial message by TCP when the message is a huge file. This is reality in my project, but never asked by any interviewer.

So what do interviewers focus on?

packet loss — UDP (including multicast) lacks delivery guarantee. This is a real issue for system design, but I seldom notice it.
higher efficiency than TCP — I don’t notice it, though it’s true.
socket buffer overflow — should never happen in TCP but could happen in UDP including multicast. This knowledge is not needed in my project.
flow control — TCP receiver can notify sender to reduce sending speed. This knowledge is not needed in many projects.
non-blocking send/receive — not needed in any project.

So what can we do? Study beyond what’s needed in the project. (The practical skills used are only 10% of the interview requirements.) Otherwise, even after 2 years using multicast in my project, I would still look like a novice to an interviewer.

Without the job interviews, it’s hard to know what theoretical details are required. I feel a multicast project is a valuable starting point to get me started. I can truthfully mention multicast in my resume. Then I need to attend interviews and study the theoretical topics.

python 2D array init #list !! tuple

Posted on Jan 13, 2018 (updated Mar 17, 2019) by BinTAN

>>> li=[[-2]] # simplest
>>> tu=((-2,),) # more keyboard work
>>> len(li[0])
1

https://github.com/tiger40490/repo1/blob/py1/py/2d/printDiagonally.py uses similar list syntax to populate a bigger matrix

breakdown heap/non-heap footprint@c++app #massif

Posted on Jan 13, 2018 (updated Nov 3, 2020) by BinTAN

After reading http://valgrind.org/docs/manual/ms-manual.html#ms-manual.not-measured, I was able to get massif to capture non-heap memory:

valgrind --tool=massif  --pages-as-heap=yes --massif-out-file=$massifOut .../xtap -c ....
ms_print $massifOut

Heap allocation functions such as malloc are built on top of system calls such as mmap, mremap, and brk. For example, when needed, an allocator will typically call mmap to allocate a large chunk of memory, and then hand over pieces of that memory chunk to the client program in response to calls to malloc et al. Massif directly measures only these higher-level malloc et al calls, not the lower-level system calls.

Furthermore, a client program may use these lower-level system calls directly to allocate memory. By default, Massif does not measure these. Nor does it measure the size of code, data and BSS segments. Therefore, the numbers reported by Massif may be significantly smaller than those reported by tools such as top that measure a program’s total size in memory.

q[java ecosystem]==jxee+tools+..

Posted on Jan 13, 2018 (updated Aug 11, 2019) by BinTAN

This classification helps me organize my java learning, but let’s not spend too much time on this imprecise concept —

So-called “java ecosystem” is anything outside the “core java” stack and includes jxee plus ..

GC, JIT
JNI
swing, AWT
ajax integration
protobuf/json
tools: eclipse, Maven, CI tools,
tools: JDK bundled tools like jhat, visualvm

3players1coin #MS probability

Posted on Jan 13, 2018 (updated Feb 11, 2018) by BinTAN

Q: Three players A/B/C take turns flipping a fair coin (A first, then B, then C, and so on) until the first head is thrown; whoever throws it wins. What’s the probability of Alice (A) winning?

I think the same method works if the coin is biased, e.g. P(H)=0.6, though the answer changes.

Denote Pr(Alice eventually wins) as x.

Pr(first 3 are TTT AND Alice eventually wins) = 1/8 * x

x = 1/2 + 1/8 * x —> x=4/7
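Below is the same one-step argument with a general head probability p. It assumes A throws first and the order is A, B, C. Alice wins on her first throw with probability p. Otherwise, all three throw tails with probability (1-p)^3 and the game restarts from scratch.

```latex
\begin{aligned}
x &= p + (1-p)^3\,x \quad\Longrightarrow\quad x = \frac{p}{1-(1-p)^3} \\
p=\tfrac12 &: \quad x = \frac{1/2}{1-1/8} = \frac{1/2}{7/8} = \frac{4}{7} \\
p=0.6 &: \quad x = \frac{0.6}{1-0.4^3} = \frac{0.6}{0.936} = \frac{25}{39} \approx 0.641
\end{aligned}
```

With p = 1/2 this reproduces the equation x = 1/2 + 1/8 * x above.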

hacking imported module #homemade trick

Posted on Jan 13, 2018 (updated May 5, 2019) by BinTAN

Background: Suppose in a big python application your main script imports a few packages and modules. One of them is mod2.py, which in turn imports mod2a.py.

Now you need to add investigative logging/instrumentation to mod2a.py, but this file is loaded from a readonly firm-wide repository, a common practice in big teams. Here’s my tested technique:

1. clone mod2a.py to your home dir and add the logging. Now we need to import this modified version.
2. clone mod2.py to your home dir and open it to locate the importation of mod2a
3. edit mod2.py to change the importation of mod2a:

sys.path.insert(0, '/home/dir')
import mod2a # via /home/dir
sys.path.remove('/home/dir')

All other imports should be unaffected.

4. edit main.py and update the importation of mod2.py to load it too from /home/dir

As an additional hack, some people may rename the modified mod2a.py file. This is easy only if the import line is

from mod2a import someSymbol

Otherwise, every mention of “mod2a” in mod2.py needs a change

backslash escape: bash tricky rules

Posted on Jan 12, 2018 (updated Mar 23, 2019) by BinTAN

This is about shell interpreting the backslash sequence inside single-quote or double-quote.

Once bash does its parsing, it can pass the result to a command like perl or grep.

----Most escape sequences don't care about single-quote vs double-quote
$ echo "msgType\t"
msgType\t

$ echo "msgType\b"
msgType\b

# \b is meaningful in perl regex 🙂

----double backslash -- single-quote is simpler than double-quote
$ echo 'msgType\\'
msgType\\

$ echo "msgType\\"
msgType\

----single quote within single-quoted string is very tricky:
$ echo 'msgType\'\' 
msgType\'

# in the above, the last \' is a second token, a single-char string.

$ echo $'msgType\''  # dollar sign is crucial
msgType'

$ echo 'msgType\'' # fails without $: inside '...' a backslash is literal, so the 2nd quote ends the string and the 3rd opens an unterminated one
>

microservices “MSA” #phrasebook

Posted on Jan 12, 2018 (updated Mar 16, 2019) by BinTAN

I feel MSA is more of an architect interview topic, not a developer interview topic. Dev complexity is low by design.

eg: error acct lookup, receiving productId + possibly a clientId, returning an error acct

Now the phrasebook:

jxee — As of 2019, I guess jxee has the best support for MSA
enterprise — enterprise-bias. Most of the practices used in SOA/MSA come from developers who have created software applications for large enterprise organizations.
SOA — is the ancestor and now out of fashion. I think MSA will also fall out of fashion.
stateless — stateless microservice is best. Can be highly concurrent and scaled out
scalability — hopefully better
decentralized — rather than monolithic
modularity
communication protocol — supposedly lightweight, but more costly than in-process communication
- http — is commonly used for communication. Presumably not asynchronous
- messaging — metaphor is often used for communication. I doubt there’s any MOM or message queue.
cloud-friendly — cheaper
flexible — in the face of changing requirements, though I’m not sure time-to-market will improve
simple-facade — (of a big monolithic service) is now replaced by more complex interface, so I suspect this is not always popular.
complexity — (various forms) is the public enemy but I don’t know which weapon (REST,SOA,ESB,MOM,Spring) actually works
in-process — services can be hosted in a single process, but less common
devops — is a driver
- testability — each service is easy to test, but not integration test
- loosely coupled — decentralized, autonomous dev teams
- deployment — is ideally independent for each service, and continuous, but overall system deployment is complicated

typedef as class member

Posted on Jan 12, 2018 (updated Mar 23, 2019) by BinTAN

member typedef (usually public) is widely used in the standard library, so we should use it in place of “global” typedef as much as possible.
local typedef is also recommended .. https://stackoverflow.com/questions/10103453/is-typedef-inside-of-a-function-body-a-bad-programming-practice

show your best practice in coding tests!

contents to keep in .C rather than .H file

Posted on Jan 11, 2018 (updated Apr 4, 2019) by BinTAN

1) Opening example — Suppose a constant SSN=123456789 is used in a1.cpp only. It is therefore a “local constant” and should be kept in a1.cpp not some .H file. Reason?

The .H file may get included in some new .cpp file in the future. So we end up with multiple .cpp files dependent (at compile-time) on this .H file. Any change to the value or name of this SSN constant would require recompiling not only a1.cpp but also, unnecessarily, the other .cpp files 😦

2) #define and #include directives — should be kept in a1.cpp as much as possible, not .H files. This way, any change to the directives would only require recompiling a1.cpp.

The pimpl idiom and forward-declaration use similar techniques to speed up recompile.

3) documentation comments — some of these documentations are subject to frequent change. If put in .H then any comment change would trigger recompilation of multiple .cpp files

## mkt data: avoid byte-copying #NIO

Posted on Jan 10, 2018 (updated May 5, 2019) by BinTAN

I would say “avoid” or “eliminate” rather than “minimize” byte copying. Market data volume is gigabytes so we want and can design solutions to completely eliminate byte copying.

RTS uses reinterpret_cast but still there’s copying from kernel socket buffer to userland buffer.
- kernel bypass : possible usage ] RTS describes how to avoid that.
Java NIO buffers can remove the copying between JVM heap and the socket buffer in C library. See P226 [[javaPerf]]
java autoboxing is highly unpopular for market data systems. Use byte arrays instead
- java^c# generic: struct, boxing, erasure describes how c# generics addressed that

notepad++saving real-estate

Posted on Jan 10, 2018 (updated Feb 10, 2018) by BinTAN

Survival tip — Alt-t gets to Settings, if you need to unhide …

Motivation — most monitors are too “thin”, so the bars take up vertical space.

Menu bar? Better hide by default. can toggle using Alt or F10
Toolbar? Better hide
Document tab? Vertical is best

std::forward_list = slist

Posted on Jan 10, 2018 (updated Mar 21, 2019) by BinTAN

http://www.stroustrup.com/C++11FAQ.html#std-forward_list highlights the extreme space-efficiency

Not sure of the use cases but I suspect it is a choice in extremely space-efficient designs when arrays won’t work. GeeksforGeeks mentions two use cases:

hash table
represent graph: matrix ^ edgeSet #forward_list

CMS JGC: deprecated in java9

Posted on Jan 10, 2018 (updated Apr 13, 2019) by BinTAN

Java9/10 default GC is G1. CMS is officially deprecated in Java 9.

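Java8/7 default GC (on server-class machines) is ParallelGC; CMS was never the default and had to be enabled with -XX:+UseConcMarkSweepGC. See https://stackoverflow.com/questions/33206313/default-garbage-collector-for-java-8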

Note parallelGC uses

parallel in most generations
serial in old gen

…whereas parallelOldGC uses parallel in all generations.

Q: why is CMS deprecated?
A: one blogger seems to know the news well. He said the JVM engineering team needs to focus on new GC engines and needs to let go of the most high-maintenance but outdated codebase, CMS. As a result, new development will cease on CMS, but the CMS engine is likely to be available for a long time.

TrySomethingNew@lower salary: seldom worked:(

Posted on Jan 9, 2018 (updated Apr 8, 2019) by BinTAN

I feel ICE worked, because I quickly became confident with C++ GTD.

[19] attitude@jxee^coreJava

Posted on Jan 9, 2018 (updated May 19, 2019) by BinTAN

jxee (esp. web java) is fashionable … high growth, big job pool

–in terms of churn resistance and shelf life .. jxee << core java

Q: Is there some jxee component with stable demand and accu? Spring? Servlet is very relevant from 1999 to 2019 but not quizzed in IV!

–in terms of project LOE … core java =< jxee

Some developers are afraid of the unique challenges [1] in core java, but I’m more afraid of complexities in jxee packages esp. when combined in non-standard combinations. See my blogpost on python routine tasks and my blogpost on spring.

[1] threading, latency, collections .. but I don’t want to elaborate here.

–in terms of IV body building and entry barrier, jxee < core java < cpp. I struggled with cpp IV for years but was able to crack c# IV within 2 years.

Without enough evidence, I feel jxee skills are less elite, easy to self-study, and shallow until you hit project issues.

–In terms of salary, jxee = core java = cpp .. I was proven wrong! Even though c++ and core java IVs are arguably harder and more elitist, they don’t pay higher than jxee jobs.
–in terms of market depth+size, cpp < core java < jxee

core java is mostly limited to ibanks + buy-side. jxee presumably offers better market depth and breadth. Without enough evidence, I feel job pool is growing for jxee not for core java or cpp. Similarly, job pool is growing for javascript, mobile, big data, cloud..

No need to experiment at home or read books like I did on JMS, EJB, Spring. It takes too much time but doesn’t really give me …

latency QQ ]WallSt IV #java,c++..

Posted on Jan 9, 2018 (updated Mar 21, 2019) by BinTAN

Latency knowledge

is never needed on the job but … high Market Value
is not GTD at all but … is part of zbs
is not needed in py jobs
is needed in many c++ interview topics but .. in java is concentrated in JIT and GC
is an elite skill but … many candidates try
some depth is needed for IV and other discussions but … relatively low-complexity .. low-complexity topics #eg:GC/socket

Arrays.sort(primitiveArray) beats List.sort() #defaultMethod

Posted on Jan 9, 2018 (updated Jul 30, 2019) by BinTAN

In terms of sorting performance, Arrays.sort(primitiveArray) is a few times faster than Collections.sort() even though both are O(N logN). My learning notes:

Arrays.sort(int []) is a dual-pivot quicksort, probably using random access
Arrays.sort(Object []) is a mergesort (TimSort since Java 7)
Collections.sort(List) defers to List.sort()
- List.sort() is a Java8 default method in the List.java interface. It copies data to an array then runs a mergesort
- ArrayList.java overrides the default method, so no copying for ArrayList from java8 onwards

RandomAccess marker interface (ArrayList implements) is completely irrelevant. That’s because any List.java subtype that provides RandomAccess can simply override (at source code level) the default method as demonstrated in ArrayList.java. This is cleaner than checking RandomAccess at runtime. One or Both designs could potentially be JIT-compiled to remove the runtime check.

damaged-goods self-image .. due to codility!

Posted on Jan 9, 2018 (updated Jun 26, 2019) by BinTAN

One of the most devastating damaged-goods experiences was the layoff at baml.

However, if I had done better at Codility I would have transferred and I would not have felt like damaged goods!

Of course there are multiple contributing factors to "damaged goods", but in this case, codility is one.

ADL #namespace

Posted on Jan 9, 2018 (updated Jun 15, 2020) by BinTAN

“Put the function in the same namespace as the classes it operates on.” is a one-liner summary
If you want to write a function that needs only a class’s public interface, then that function doesn’t have to be a (static/non-static) member. The function can become a free function placed in the same namespace as the class. This increases encapsulation and data hiding, and reduces coupling as only the public interface is needed. P79 [[c++codingStd]]
It’s also an idiom/pattern to impress interviewers. The published experts all seem to view ADL as a library feature mainly for overloaded operators. Read cppreference and [[c++codingStd]]. I’m not interested in that usage.
boost seems to use a lot of ADL
I feel namespace loves ADL more than any other c++ feature 🙂
ADL is an endorsed, prized compiler feature but still with criticisms [1]. To eliminate the confusion, simply fully qualify the function call.

[1] https://stackoverflow.com/questions/8111677/what-is-argument-dependent-lookup-aka-adl-or-koenig-lookup has the best quickguide with simple examples.

https://softwareengineering.stackexchange.com/questions/274306/free-standing-functions-in-global-namespace is a short, readable discussion.

My annotations on the formal, long-winded definition — ADL governs look-up of unqualified function names in function-call expressions (including operator calls). These function names are looked up in the ~~namespaces of their arguments~~ in addition to the usual namespace-based lookup.

## VP jobs: I did get a few

Posted on Jan 8, 2018 (updated Dec 7, 2018) by BinTAN

SCB-FM
MS-Shanghai under Nitin. He clearly wanted to offer me the VP role.
baml-Ravi — promised a Chicago team lead. No interview needed.
baml-stirt
Mac — job was well above AVP in both salary and responsibility. Required real leadership
EMPworld — CTO

JDK + eclipse install: %%preference

Posted on Jan 8, 2018 (updated May 11, 2018) by BinTAN

C:/j8 or j801 # I prefer shorter path. I almost never have two versions of java on one machine
- There’s an embedded ./jre directory for a so-called “private JRE”
public JRE is optional and should be installed outside JDK directory, but I have not needed it since I started using java.
JAVA_HOME and PATH
—-eclipse: I prefer the simple zip file download
C:/ide/eclipse #worked OK
c++ide? can use c:/ide/eclipseCDT.

%% rehash table #Broadway

Posted on Jan 7, 2018 (updated Mar 9, 2018) by BinTAN

https://github.com/tiger40490/repo1/blob/py1/py/rehash_table.py

Showcase: simple link Node class with a __str__()

Showcase : populate a python list with None’s

github tips #email

Posted on Jan 7, 2018 (updated Jul 6, 2019) by BinTAN

q(git config credential.helper store) is the command that finally fixed my git-bash forgetting-password problem on my win7 Dell. The github “recommended” command only worked in my win10 Lenovo and office win7:
```
git config --global credential.helper wincred
```
On Linux, I only needed to run a simple command to cache my credentials

$ git config credential.helper store
$ git push 
Username for 'https://github.com': <USERNAME>
Password for 'https://USERNAME@github.com': <PASSWORD>

to comply with email privacy,

git config --global user.email {ID}+{username}@users.noreply.github.com
You can find it in https://github.com/settings/emails

–to download an entire branch on the github web interface: CloneOrDownload button can download a zipfile
–to mass upload using drag-n-drop
I realize that the zipfile downloaded has a problem. When I open the zipfile (but not extract) and select the files, the drag-n-drop interface fails. I had to copy the files to a tmp directory.
–delete folder: not so easy
–file naming tips
- ^ is slightly less ideal
- I would use underscore as default and use dash sparingly (like 0-N). Camel case is even better.
–rename file @web interface is easy
–rename branch: no web interface. I had to …
- checkout branch locally,
- rename branch
- commit
- push, which triggered a login pop-up.
–rename folder: no web interface. I had to
- git mv dir1 com/xx/dir1 # close MSWE if you get perm error
- git commit
- git push

hot key; copy current line

Posted on Jan 7, 2018 (updated Oct 8, 2019) by BinTAN

First learn the keyboard accelerator to select current line.

Remember shift-HOME selects back till beginning; shift-END selects till the end.

HOME then shift-END

END then shift-HOME

c++static field init: rules

Posted on Jan 7, 2018 (updated Feb 21, 2018) by BinTAN

efficient swap(): two containers-of-T

Posted on Jan 6, 2018 (updated Sep 14, 2018) by BinTAN

Background — template function std::swap(T&, T&) works for int, float etc, but the same implementation will not work efficiently for vector, list, map or set. Therefore I suspected there might be specializations of swap() template function.

As it turns out, vector (and the other containers) provides a swap() member function. So the implementation of vector swap is indeed different from std::swap().

RandomAccess #ArrayList sorting

Posted on Jan 6, 2018 (updated Nov 3, 2020) by BinTAN

Very few JDK containers implement the RandomAccess marker interface. I only know ArrayList.java, Vector.java and its subclass Stack.java. A raw array doesn’t (it isn’t a List at all).

Only List.java subtypes can implement RandomAccess. Javadoc says

“The primary purpose of this interface is to allow generic algorithms to alter their behavior when applied to either random or sequential access lists.”

Q: which “generic algos” actually check RandomAccess?
AA: Collections.binarySearch() in https://docs.oracle.com/javase/7/docs/api/java/util/Collections.html
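AA: to my surprise, Collections.sort() does NOT check RandomAccess. Since Java 8, ArrayList sorting differs from LinkedList sorting only because ArrayList overrides List.sort() to sort its backing array without copying, not because of RandomAccess. See my blogpost Arrays.sort(primitiveArray) beat List.sort()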

http://etutorials.org/Programming/Java+performance+tuning/Chapter+11.+Appropriate+Data+Structures+and+Algorithms/11.6+The+RandomAccess+Interface/ has more details

CPU(data)cache prefetching

Posted on Jan 6, 2018 (updated May 5, 2019) by BinTAN

https://stackoverflow.com/questions/1950878/c-for-loop-indexing-is-forward-indexing-faster-in-new-cpus top answer is concise. I think the observations may not be relevant in x years but the principles are.

adjacent cache line (ACL) prefetcher — simple to understand
cpu can detect streams of memory accesses in forward or backward directions

Note L1/L2/L3 caches are considered part of the CPU even if some of them are physically outside the microprocessor.

private-header^shared-header

Posted on Jan 4, 2018 (updated Jul 7, 2019) by BinTAN

In our discussions on ODR, global variables, file-scope static variables, global functions … the concept of “shared header” is often misunderstood.

If a header is only included in one *.cpp, then its content is effectively part of a *.cpp.

Therefore, you may experiment by putting “wrong” things in such a private header and the set-up may work or fail, but it’s an invalid test. Your test is basically putting those “wrong” things in an implementation file!

many C library calls use thread synchronization !

Posted on Jan 3, 2018 (updated Apr 3, 2018) by BinTAN

“Most C library calls (such as I/O and memory allocation functions) perform thread synchronization underneath.” according to [[Java Native Interface]].

I guess this is because memory allocation by default uses a process-wide shared heap rather than a thread-specific heap, so concurrent allocation calls must be synchronized.

scala: usage and adoption

Posted on January 3, 2018 (updated August 3, 2018) by BinTAN

Outside Morgan, I only know (from reliable sources) that some machine learning teams use scala in addition to python and java.

python random-generate/shuffle list@int

Posted on January 2, 2018 (updated July 30, 2019) by BinTAN

I now think the simplest approach is xrange() (range() in Python 3) followed by random.shuffle(). Don’t bother with random.sample(). See my github code.
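A minimal sketch of that approach in Python 3, where range() replaces xrange(). The helper name shuffled_ints and its optional seed parameter are hypothetical, not taken from the github code. The list() copy is needed because random.shuffle() works in place on a mutable sequence.

```python
import random


def shuffled_ints(lo, hi, seed=None):
    """Return the integers lo..hi-1 in random order (hypothetical helper)."""
    rng = random.Random(seed)   # private generator; a fixed seed gives repeatable order
    nums = list(range(lo, hi))  # materialize the range so it can be shuffled in place
    rng.shuffle(nums)
    return nums


if __name__ == "__main__":
    print(shuffled_ints(-99, -80))
```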

https://stackoverflow.com/questions/22842289/generate-n-unique-random-numbers-within-a-range shows

random.sample() to generate ..
random.shuffle() to shuffle a list in-place, used in https://github.com/tiger40490/repo1/blob/py1/py/tree/simpleBST.py

https://github.com/tiger40490/repo1/blob/py1/py/array/qsort.py uses

random.sample(xrange(-99, 100), 19)

In general, the construct below is useful because sampleSize must never exceed the size of the range, i.e. the population; otherwise random.sample() raises ValueError:

random.sample(xrange(-99, -99+sampleSize), sampleSize)
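A small Python 3 sketch (range() instead of xrange()) of why the population size matters. sample_from is a hypothetical helper. When sample_size equals the population size, the result is simply a random permutation of the range. Asking for more than the population holds raises ValueError.

```python
import random


def sample_from(lo, sample_size, pop_size):
    """Draw sample_size distinct ints from range(lo, lo + pop_size) (hypothetical helper)."""
    return random.sample(range(lo, lo + pop_size), sample_size)


# A sample as large as the population is just a random permutation of it.
perm = sample_from(-99, 10, 10)
print(sorted(perm) == list(range(-99, -89)))  # True

# Asking for more items than the population holds raises ValueError.
try:
    sample_from(-99, 11, 10)
except ValueError as exc:
    print("ValueError:", exc)
```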

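The generated list has no duplicates, because random.sample() picks without replacement from a population of distinct values, so I had to manually create some duplicates. random.shuffle() works fine on a list containing duplicates, since it only permutes positions.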
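One way to manufacture duplicates, sketched in Python 3 under the assumption that any repeated values will do. It re-draws a few values from the distinct list with random.choice(), which samples with replacement, and then shuffles. with_duplicates is a hypothetical helper. The seeded random.Random(42) is only there to make runs repeatable.

```python
import random


def with_duplicates(nums, dup_count, rng):
    """Return a copy of nums plus dup_count values re-drawn from nums (hypothetical helper).

    Every appended value already occurs in nums, so the result contains duplicates.
    """
    out = list(nums)
    out.extend(rng.choice(nums) for _ in range(dup_count))
    return out


rng = random.Random(42)
base = rng.sample(range(-99, 100), 19)  # 19 distinct values, like the qsort.py input
data = with_duplicates(base, 5, rng)
rng.shuffle(data)                       # shuffle only permutes positions
print(len(data), len(set(data)))        # 24 19
```

The printed pair shows 24 items but only 19 distinct values, confirming the duplicates survive the shuffle.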

–to generate a list of 8 distinct integers from 0 to 99

>>> import random
>>> random.sample(xrange(100), 8)
[39, 53, 1, 80, 54, 61, 4, 26]

##past easy jobs: my 80% did exceed benchmark

Posted on January 2, 2018 (updated August 20, 2019) by BinTAN

Ideally, I want a job role paying slightly less than the highest salary level, where 80% of my effort can already exceed the expectations of THE appraiser.

Grandpa said “If at 100% you don’t hit their requirement, then you don’t need to put in 120% and sacrifice family. It’s their hiring mistake. Their problem. If they don’t pay a compensation package, then just leave.” Am I afraid of job change? See separate blogpost.

Looking at past jobs, the numbers below are very imprecise and subjective.

[G/g=Greenfield]
[B/b=brownfield]

[b] Citi muni — my 70<-80% might be enough, or might earn me no bonus like many Citi guys.
[B/g] GS — after the steep learning curve, my 100% was indeed enough to meet the high bar, but I feel 95% would not have been
[G] 95G — after I proved myself, my 80<-90% was exceeding.
- Big factor — colleagues were weaker
- Big factor — my designs were favored by the boss
[G] Barc — after I proved myself, my 70<-90% was exceeding.
- Big factor — I built a high-value, high visibility part of the system
- Big factor — no one else was qualified to work on that
[g] Macq role was very senior — my 100% was NOT enough even on the basic devops part of the job. I once felt my 80% was fine in the 1st year, but actually the expectation was much higher than I thought.
[B] Stirt — my 100% was barely good enough but not enough to earn a bonus. The project was a few months old when I joined but I struggled with the Qz platform:(. I was unfairly bench-marked against 3Y veterans! I would look decent if bench-marked against freshers.
[b] OC — my 70% would be enough but too relaxed
[b] RTS — After initial 9 months (NYSE+Aquis), my 40<-50% would be enough

2000 billable hour/Y assumes 2D furlough

Posted on January 1, 2018 (updated January 30, 2019) by BinTAN

There are 52 x 2 = 104 weekend days and 9 public holidays, a total of 113 non-billable days.

If there are 2 furlough days, the total becomes 115 non-billable days, leaving 365 - 115 = 250 billable days, i.e. 2000 billable hours at 8 hours per day.

In my past experience, furlough was rare, but I usually take 15 vacation days each year.
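The same arithmetic as a tiny Python function, assuming a 365-day year and 8 billable hours per day (the ratio implied by 250 days giving 2000 hours). billable_hours is a hypothetical name. The second call plugs in the 15 vacation days instead of furlough.

```python
HOURS_PER_DAY = 8        # assumption: implied by 250 days -> 2000 hours
DAYS_PER_YEAR = 365
WEEKEND_DAYS = 52 * 2    # 104
PUBLIC_HOLIDAYS = 9


def billable_hours(furlough_days=0, vacation_days=0):
    """Billable hours per year after removing weekends, holidays, furlough and vacation."""
    non_billable = WEEKEND_DAYS + PUBLIC_HOLIDAYS + furlough_days + vacation_days
    return (DAYS_PER_YEAR - non_billable) * HOURS_PER_DAY


print(billable_hours(furlough_days=2))   # 2000
print(billable_hours(vacation_days=15))  # 1896
```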
