[20]OC-effective: overrated

Today on my jog along the Marina Promenade, I reminded myself that my parents gave me a good body and I have taken good care of it. That’s why I’m able to enjoy life to the fullest.

Then it occurred to me that those “effective” people (in the OC sense) usually don’t or can’t make the same claim. They are often overworked, overweight, overeating, and short on exercise. It’s rare to see one of them fit and slim (TYY might be).

Remember, OC-effective is defined in the corporate context, mostly among corporate managers.

— OC-effective people are no healthier than average, and their life expectancy is no longer. I actually believe health is wealth.

— OC-effective ≠ (Fuller) wealth, whether measured in Brbr or otherwise. Many wealthier people are not MNC managers.

— OC-effective ≠ “effective with the team”, as in the SG sense. Sometimes it is largely inter-personal (with the boss) effectiveness.

— OC-effective is mostly theoretical, and assessment can be very biased. Promotion is decided by upper management, so what team members feel doesn’t count. 360-review is marketing.

— OC-effective ≠ true leadership. We all know lousy managers who get promoted (RTS, deMunk). However, I think many good leaders do have OC-effectiveness. Listening is essential.

— OC-effective ≠ satisfaction with life. Grandpa often says these individuals live exhausting lives (活得好累). They often have more stress due to heavy responsibilities.

— OC-effective is effective in that one organization, and may not be in another organization. Consider Honglin. In contrast, hands-on developers like me possess more portable skills, mostly in the form of IV.

— OC-effective ≠ adding social value. The corporation may not create lasting social value. OC-effectiveness means being effective at helping the company reach its goals, regardless of those goals’ value. In the SG context, social value is rather important.

Is java/c# interpreted@@No; CompiledTwice!

category? same as JIT blogposts

Q: are java and c# interpreted? QQ topic — academic but quite popular in interviews.

https://stackoverflow.com/questions/8837329/is-c-sharp-partially-interpreted-or-really-compiled shows one explanation among many:

The term “interpreter”, referencing a runtime, generally means existing code interpreting some non-native code. There are two large paradigms — parsing: reads the raw source code and takes logical actions; bytecode execution: first compiles the code to a non-native binary representation, which requires much fewer CPU cycles to interpret.

Java originally compiled to bytecode, then ran it through an interpreter; now the JVM reads the bytecode and just-in-time compiles it to native code. CIL works the same way: the CLR uses just-in-time compilation to native code.

C# compiles to CIL, which the JIT then compiles to native code; by contrast, Perl immediately compiles a script to bytecode, and then runs this bytecode through an interpreter.
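As an aside, stock CPython is a handy live illustration of the “bytecode execution” paradigm described above (an analogy only; the JVM and CLR add the JIT step on top). A minimal sketch using the standard dis module:

```python
import dis

def add(a, b):
    return a + b

# CPython compiled this source to bytecode (stored on add.__code__);
# the interpreter loop then executes that bytecode -- no JIT in stock CPython.
opnames = [ins.opname for ins in dis.Bytecode(add)]
```

Printing `opnames` shows the instruction stream the interpreter walks, ending in a RETURN_VALUE opcode.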

save iterators and reuse them+!invalidation risk

For an STL container that undergoes no structural change, the iterator objects can be safely stored and reused. The dreaded iterator invalidation is a risk only under structural changes.

Many coding interview questions allow (even require) me to save those iterators and store them in a vector, a hash table … Afterwards, we retrieve a saved iterator and visit its next/previous element.

standard SQL to support pagination #seek#Indeed

Context — In the Indeed SDI, each company page shows the latest “reviews” or “articles” submitted by members. When you scroll down (or hit the Next10 button) you see the next most-recent articles.

Q: What standard SQL query can support pagination? Suppose each record is an “article” and page size is 10 articles.

I will assume each article has an auto-increment id (easier than a unique timestamp) maintained in the database table. This id enables the “seek” method. The first page (usually the latest 10 articles) is sent to the browser. Then the “fetch-next” command from the browser would contain the last id fetched. When this command hits the server, should we (AA) return the next 10 articles after that id, or (BB) check the latest articles again and skip the first 10? I prefer AA. BB can wait until the user refreshes the web page.

The SQL:2008 industry standard supports both (XX) the top-N feature and (YY) the offset feature, but for several reasons [1], only XX is recommended:

select * from Articles where id < lastFetchedId order by id desc fetch first 10 rows only

[1] http://www.use-the-index-luke.com clearly explains why the “seek” method is superior to the “offset” method. The BB scenario above is one confusing scenario affecting the offset method. Performance is also problematic when the offset value is high: fetching the 900,000th page is roughly 900,000 times slower than fetching the first page.
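A minimal runnable sketch of the seek method, using Python’s sqlite3 and a hypothetical Articles table (LIMIT is SQLite’s spelling of the top-N feature):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Articles (id INTEGER PRIMARY KEY AUTOINCREMENT, title TEXT)")
conn.executemany("INSERT INTO Articles (title) VALUES (?)",
                 [(f"article {i}",) for i in range(1, 26)])

def fetch_page(last_fetched_id=None, page_size=10):
    """Seek method: latest-first, keyed on the indexed id, no OFFSET."""
    if last_fetched_id is None:   # first page: just the latest N
        rows = conn.execute(
            "SELECT id, title FROM Articles ORDER BY id DESC LIMIT ?",
            (page_size,))
    else:                         # next page: seek past the last id fetched
        rows = conn.execute(
            "SELECT id, title FROM Articles WHERE id < ? ORDER BY id DESC LIMIT ?",
            (last_fetched_id, page_size))
    return rows.fetchall()
```

The browser’s “fetch-next” command would simply pass back the last id of the previous page.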

signed-int: longest K-sum subArray #Exact_sum 80%

Q: given a signed-int array, find the longest subarray having sum=K

Realistic scenario: we know someone executed a buy and a sell and made a $K profit. Reviewing the price points, I see many possible buy/sell pairs. We know this trader held the position the longest to gain exposure, so which buy and sell points did he use?

====analysis

First scan: build the cumulative sum — the cumsum array. It looks like a price chart. The initial price level is the first original element.

Now scan this sum-array. At each element s[j],

  1. I save s[j] in the hashmap {s[j] -> j} iff no such key (price level) exists yet. This hashmap records the earliest occurrence of each price level s[j].
  2. Then I look for the (by design, earliest) element s[i] such that s[i] == s[j] – K. If found in the hash table, then compute hold := j – i and update the global longest_hold if necessary.

Note the hashmap values (original subscripts) are all earlier than j, since we are scanning forward.

We can consolidate into one scan (an equivalent-complexity optimization) as above, or separate into two scans for simplicity.

Aha! Reusable technique — hash table is usable thanks to “Exactly K”.
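A minimal Python sketch of the one-scan version (hypothetical function name; I seed the map with prefix-sum 0 at index -1 so a subarray starting at element 0 is also found):

```python
def longest_subarray_with_sum(arr, K):
    """Longest subarray summing exactly to K, O(N): prefix sums plus a
    hashmap recording the earliest index of each prefix-sum (price level)."""
    first_seen = {0: -1}            # empty prefix, "before" index 0
    prefix, best, best_range = 0, 0, None
    for j, x in enumerate(arr):
        prefix += x
        first_seen.setdefault(prefix, j)      # keep earliest occurrence only
        i = first_seen.get(prefix - K)        # earliest s[i] == s[j] - K
        if i is not None and j - i > best:
            best, best_range = j - i, (i + 1, j)
    return best, best_range
```

For [1, -1, 5, -2, 3] with K=3, the longest match is the 4-element subarray [1, -1, 5, -2].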

4 common mem techniques]%%HRT assignment

Technique: reinterpret_cast

Technique: memcpy

Technique: pre-allocated buffer as a local static object … can be better than “stack” allocation, but only for a large buffer.

Unlike local variables, local Static objects can be returned by pointer — major flexibility in low-level memory management.

https://stackoverflow.com/questions/3835857/in-c-does-using-static-variables-in-a-function-make-it-faster?rq=1 discusses the runtime cost of static local vs local

heap allocation: java Can beat c++

  • case 1 (standard java): you allocate heap memory. After you finish with it you wait for the java GC to clean it up.
  • case 2 (low latency java): you allocate heap memory but disable java GC. Either you hold on to all your objects, or you leave unreachable garbage orbiting the earth forever.
  • case 3 (c++): you allocate heap memory with the expectation of releasing it, so the compiler sets up housekeeping in advance for the anticipated delete(). This housekeeping overhead is somewhat similar to try/catch overhead before c++11 ‘noexcept’.

Stroustrup suggested that #2 will be faster than #3, but #3 is faster than #1. I asked, “But can’t c++ emulate the allocation the way the jvm does?” Stroustrup said C++ is not designed for that. I think he meant it is impractical/invalid. I have seen online posts about this “emulation”, but I would trust Stroustrup more.

  • case 4 (C): C/c++ can sometimes use local variables to beat heap allocation. C programmers use rather few heap allocations, in my experience.

Note the jvm allocator and malloc are both userland allocators, not part of the kernel, and usually not using system calls. You can substitute your own malloc.

https://stackoverflow.com/questions/18268151/java-collections-faster-than-c-containers top answer by Kanze is consistent with what Stroustrup told me.

  • zero dynamic allocation (Similar to Case 4) is always faster than even the fastest dynamic allocation.
  • jvm allocation (without the GC clean-up) can be 10 times faster than c++ allocation. Similar to Case 2^3
    • Q: Is there a free list in JVM allocator? Yes

https://softwareengineering.stackexchange.com/questions/208656/java-heap-allocation-faster-than-c claims

  • c++ Custom allocators managing a pool of fixed-sized objects can beat jvm
  • jvm allocation often requires little more than one pointer addition, which is certainly faster than typical C heap allocation algorithms in malloc

a standard representation of binTree

I think Leetcode now has a standard representation of a simple binTree. https://leetcode.com/problems/convert-sorted-array-to-binary-search-tree/ shows one example.

see also (de)serialize binary tree #funptr ECT

Let’s not spend too much time here. This representation is simple but not very visual. It can’t deal with recombinant trees, and is useful only for binary trees, not general graphs. Graphs need an adjacency matrix or an edgeSet.

— My own idea is probably similar:

Can we make do without id and design the serialization sequence to save/reconstruct the tree structure?
I can output layer by layer. If the 2nd layer has no missing node, then the 3rd layer consists of exactly 4 nodes, using a sentinel value for any NUL. If the 2nd node of a layer is NUL, as in [A/NUL/C/D], then the next layer would have “2 child nodes for A, 2 child nodes for C, 2 child nodes for D”, where the 2 child nodes for A could be NUL/NUL.

What if all integer values are valid values, leaving no spare value for the sentinel? A trivial problem: I could add a one-bit flag to each serialized payload.
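A minimal Python sketch of the layer-by-layer idea, using ‘#’ as the sentinel for NUL (names are my own; a real solution would also need the one-bit-flag trick if ‘#’ can collide with data):

```python
from collections import deque

class TreeNode:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def serialize(root):
    """Level-order output; '#' marks a missing (NUL) child."""
    if root is None:
        return []
    out, q = [], deque([root])
    while q:
        node = q.popleft()
        if node is None:
            out.append('#')
        else:
            out.append(str(node.val))
            q.append(node.left)    # real children get expanded later;
            q.append(node.right)   # NULs only emit their sentinel
    return out

def deserialize(tokens):
    """Rebuild the tree by consuming two child tokens per real node."""
    if not tokens:
        return None
    it = iter(tokens)
    root = TreeNode(int(next(it)))
    q = deque([root])
    while q:
        node = q.popleft()
        for side in ('left', 'right'):
            tok = next(it, '#')
            if tok != '#':
                child = TreeNode(int(tok))
                setattr(node, side, child)
                q.append(child)
    return root
```

Round-tripping a tree through serialize/deserialize reproduces the same token sequence.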

The id/edgeList-based solution is general-purpose and efficient. The techniques above are trivial tricks (雕虫小技). Venkat of OC said something similar about regex state machines. In the same vein, disjoint set is a general solution, though many problems have simplified union-find solutions.

template specialization based on NDTTP=true

We know it’s possible to specialize a template for a concrete type like int or std::string, but I didn’t know that it’s also possible to

… specialize a (class or function) template for a particular compile-time const value (like “true”) of a NDTTP (like “bool flag”)

  • On [[Alexandrescu]] Page xii, Scott Meyers shows an elegant example of specializing for “true”. Note “true” is a value, not a data type!
  • P 34 has a longer example.

Note on reading TMP code — the template specialization syntax is clumsy and can add noise to the signal. Better to ignore the syntax rules for now and focus on the gist.

#1(reusable)AuxDS for algo challenges

Here’s a Reusable data structure for many pure algo challenges:

Pre-process to construct a static data store holding a bunch of “structs” in a linked list -OR- growing vector -OR- growing RBTree (O(1) insertion for the first two, O(logN) for the RBTree) :). Then we can build multiple “indices” pointing to the nodes.

Here are a few O(1) indices (note O(1) lookup is the best we can dream of):

  • hashtable {a struct field like a string -> iterator into the data store}
  • array indexed by a struct field like small int id, where payload is an iterator from the data store
  • If the structs have some non-unique int field like age, then we can use the same key lookup to reach a “group”, and within the group use one (or multiple) hashtable(s) keyed by another struct field

I think this is rather powerful, but needed only in the most challenging problems like the LRU cache.
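For illustration, Python’s OrderedDict fuses the data store and the hash index into one object, so an LRU cache becomes a short sketch; in C++ the same shape is a std::list data store plus an unordered_map of list iterators:

```python
from collections import OrderedDict

class LRUCache:
    """OrderedDict = doubly-linked store + hash index in one object,
    mirroring the C++ idiom of std::list + unordered_map<key, iterator>."""
    def __init__(self, capacity):
        self.cap = capacity
        self.od = OrderedDict()

    def get(self, key):
        if key not in self.od:
            return -1
        self.od.move_to_end(key)   # O(1) splice, like relinking a list node
        return self.od[key]

    def put(self, key, value):
        if key in self.od:
            self.od.move_to_end(key)
        self.od[key] = value
        if len(self.od) > self.cap:
            self.od.popitem(last=False)   # evict the least-recently-used
```

Every operation is O(1) because the “index” hands us the node directly, with no search.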

checked STL^checked java Collections

jdk checkedList, checkedMap etc. are designed to catch type errors — checking that any newly added item has the correct type for the collection. See P246 [[java generics]]

STL checked containers check for very different coding errors. See http://www.informit.com/articles/article.aspx?p=373341, which is extracted from my book [[c++codingStd]]

##java heap allocation+!explicit q[new]

Most of these are java compiler tricks. Here are a few java heap allocations without an explicit q[new]:

  • (pre-runtime) enum instance instantiation — probably at class-loading time
    • P62 [[java precisely]]
  • String myStr = “string1”; // see string pool blogpost
    • P11 [[java precisely]]
    • NOT anonymous temp object like in c++
  • “string1” + “str2” — is same as myStr.concat(..). So a method can often new up an object like this.
    • P10/11 [[java precisely]]
  • boxing
  • (most tricky) array initialization
    • int[] days = {31, 28, 31 /* instantiates the array on heap */};
    • P17 [[java precisely]] has examples
    • P16 [[java precisely]] also shows an alternative syntax “new int[]”

try{}must be completed by ..#j^c++^c#

— Java before 7, and c#:
try{} must be completed by at least a catch or a finally. A lone-wolf try{} block won’t compile. See https://www.c-sharpcorner.com/UploadFile/skumaar_mca/exception-handling-in-C-Sharp/

In particular, try/finally without catch is a standard idiom.

— java 7:
try{} can be completed by a catch, a finally, both, or neither .. four configurations 🙂

The try/finally configuration now has an important special case, i.e. try-with-resources, where the finally is implicit so you won’t see it anywhere.

— c++ as of 2019
C++ has no finally.

try{} must be followed by one or more catch blocks.

REST^SOAP

REST stands for Representational State Transfer … basically, each unique URL is a representation of some object. You can get the contents of that object using an HTTP GET, and use a POST, PUT, or DELETE to modify the object (in practice most services use a POST for this).

— soap vs REST (most interviewers probably focus here) —

  • REST has only GET POST PUT DELETE; soap uses custom methods like “setAge()”
  • SOAP takes more dev effort, despite its name (Simple Object Access Protocol)
  • SOAP used to dominate enterprise apps, though XR used REST in ibanks.

–real REST URLs

https://restfulapi.net/resource-naming/ shows some examples. It also says

URIs should not be used to indicate that a CRUD function is performed. URIs should be used to uniquely identify resources and not any action upon them. HTTP request methods should be used to indicate which CRUD function is performed.

HTTP GET http://api.example.com/device-management/managed-devices  //Get all devices
HTTP POST http://api.example.com/device-management/managed-devices  //Create new Device
HTTP GET http://api.example.com/device-management/managed-devices/{id}  //Get device for given Id
HTTP PUT http://api.example.com/device-management/managed-devices/{id}  //Update device for given Id
HTTP DELETE http://api.example.com/device-management/managed-devices/{id}  //Delete device for given Id

##tech skill superficial exposure: 5 eg

see also exposure: semi-automatic(shallow)Accu #$valuable context and low-complexity topics #JGC..

  • –(defining?) features (actually limitations) of superficial exposure
  • accu — (except bash, sql) I would say very few of the items below offer any accu comparable to my core-c++ and core-java accu
  • entry barrier — (except SQL) none is created, since you didn’t gain a real edge
  • commodity skill —
  • traction — spinning the wheel
  1. JGC — I think most guys have only textbook knowledge, no GTD knowledge.
  2. lockfree — most guys have only textbook knowledge
  3. [e] spring, tibrv — xp: I used them many times and read many books but made no breakthrough. In-depth knowledge is never quizzed.
  4. bash scripting — xp: I read books for years but only gained some traction by writing many non-trivial bash scripts
  5. SQL — xp: 5 years of typical experience is worth less than 1Y@PWM
  6. –other examples (beware oth)
  7. [e] EJB, JPA (hibernate), JMS, Hadoop(?)
  8. memory profilers; leak detectors
  9. design patterns — really just for show.
  10. Ajax
  11. [e] Gemfire
  12. java latency tuning — is an art not a science.
    • Poor portability. JVM tuning is mostly about .. GC, but at the application level the real problem is usually I/O, i.e. data store + serialization. Data structure + threading also matter, yet GC receives a disproportionate share of the limelight.
    • conclusion — superficial, textbook knowledge in this skillset probably won’t help with latency tuning.
  13. [e = ecosystem, i.e. add-on packages outside the core language.] Ecosystem skills rarely enable deep learning and traction, as employers don’t need deep insight. GTD is all they need.

average_case ^ amortized

>> bigO (and big-Theta) is about input SIZE, and is therefore usable for both average-case and worst-case data quality, as explained in my blogpost for CSY.

There’s no special notation like bigX specifically for average case.

>> Worst vs average case … is about data quality, or the random choices made by the algorithm. Note if an algo makes a random choice, it is always explicit, using a random numgen. Rather few algos do that; QuickSelect is one.

With hash tables, we (novices) are permitted to say “quality of hash function”, but this is technically incorrect. My book [[algorithms in a nutshell]] P93 states that iff given a uniform hash function, then even in the worst case bucket sort runs in linear time O(N).

>> Amortized … is about one expensive step among a large number K (≅ input size N) of steps like inserts. Think of amortizing a car purchase over x years…

Amortized analysis makes no assumption about quality of data, and therefore is a worst-case analysis.

Note Sorting has no amortized analysis AFAIK.

eg: quicksort+quickSelect can degrade to O(NN). Amortization doesn’t apply.

— worst-case amortized analysis — is a standard analysis
eg: vector expansion. The worst case doesn’t bother us.
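A quick simulation of the vector-expansion example: count the element copies caused by capacity doubling, and observe that they stay under 2N, which is why the amortized cost per append is O(1):

```python
def append_with_doubling(n):
    """Simulate a dynamic array that doubles capacity when full;
    return the total element copies triggered by n appends."""
    capacity, size, copies = 1, 0, 0
    for _ in range(n):
        if size == capacity:
            copies += size       # relocate every existing element
            capacity *= 2
        size += 1
    return copies
```

For n = 1000 the total is 1 + 2 + 4 + … + 512 = 1023 copies, i.e. barely one copy per append on average, even though a single unlucky append pays for 512 copies.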

— statistical-average amortized analysis — is a standard analysis
eg: Hashtable insert is O(1) based on both statistical-average AND amortized analysis:

  • Some inserts hit a long chain, but most insertions hit a short chain, so it is O(1) on average, statistically
  • Amortization is relevant iff an insert triggers a hash table expansion.

What if the input is so bad that all hash codes are identical? Then amortization won’t help. For hash tables, people don’t worry about worst-case data, because a decent, usable hash function is always possible.

eg: iterating an RBTree. Data quality doesn’t bother us, as RBTree has guaranteed time complexity :). Traversing the entire tree is O(N); therefore the statistical average per-iteration cost is O(1). The worst case per-iteration is O(log N).

http://www.cs.cornell.edu/courses/cs312/2006sp/lectures/lec18.html says “amortized analysis is a worst case analysis”, where “worst” refers to data quality

average case per-operation — can be quite pessimistic
eg: vector insert

eg: RBTree iteration (see above)

worst case per-operation — not common; relevant to realtime systems only.

I feel this analysis fixates on the extreme far-outliers, so the per-operation cost looks far worse than typical.

nonVirtual1() calling this->virt2() #templMethod

http://www.cs.technion.ac.il/users/yechiel/c++-faq/calling-virtuals-from-base.html has a simple sample code. Simple idea but there are complexities:

  • the given print() should never be called from the base class ctor/dtor. In general, I believe virt2(), like any virtual function, behaves as non-virtual inside a ctor/dtor.
  • superclass now depends on subclass. The FAQ author basically says this dependency is by-design. I believe this is template-method pattern.
  • pure-virtual is probably required here.
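The same template-method shape is easy to sketch outside C++; here is a hypothetical Python example where a stable skeleton method calls a hook that subclasses must override:

```python
class Report:
    """Template-method pattern: render() is the stable, 'non-virtual'
    skeleton; body() is the hook, pure-virtual in spirit."""
    def render(self):
        return "header\n" + self.body() + "\nfooter"

    def body(self):
        raise NotImplementedError   # subclasses must override

class SalesReport(Report):
    def body(self):
        return "sales figures"
```

The base class deliberately depends on the subclass filling in body(), exactly the by-design dependency the FAQ author describes.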

lazy evaluation: a common lambda usage

http://dobegin.com/lambda-functions-everywhere/

Lambda functions fit lazy evaluation nicely. The syntax captures an expression or a block to be evaluated later:

// Java: delay counting lines of an input text
Supplier<Integer> createLineCounter(String text) {
    return () -> countLines(text);
}

Here, the method creates and returns a Supplier-of-integer, a callable object that produces an integer not now, but at a later time.

Q: at registration time, the string text is received but used only at counting time, so where is this string stored?
%%A: probably “captured” in some hidden field of the Supplier object
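A Python analogue supports that guess: the closure cell is exactly such a hidden field (hypothetical line-counting example):

```python
def create_line_counter(text):
    # Python analogue of the Java Supplier: the returned lambda closes
    # over `text`; the counting is deferred until the lambda is invoked.
    return lambda: text.count("\n") + 1

counter = create_line_counter("line1\nline2\nline3")
# Nothing has been counted yet. The captured string lives in a closure
# cell attached to the function object:
captured = counter.__closure__[0].cell_contents
```

Calling `counter()` later performs the count; inspecting `captured` shows the stored string.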

I suspect our dropcopy framework uses lambda mostly for this purpose.

intervalTree: classic RBTree to find overlapp`event

Notable detail — a non-balancing BST won’t give logN performance, since the tree depth may degrade.

Q1: given N event objects {start, end, id}, use a RBTree to support a O(logN) query(interval INT) that returns any one event that overlaps with INT, or NULL if none.

If end time == start time in the input INT, then the problem becomes today’s focus:

Q2: given N event objects {start, end, id}, use a RBTree to support a O(logN) query(timestamp ii) that returns any one event that’s ongoing at time ii, or NULL if none.

P312 [[intro to algorithms]] describes an elegant solution.

The text didn’t highlight this — I think the end time of the input interval INT is … ignored. (Therefore, in my Q2, the input ii is a single timestamp, not an event.) Precisely because of this conscious decision, the tree is sorted by event start time, and the additional payload in each node N is the subtree end time, i.e. the latest end time among all events in N’s subtree (including N itself). By construction, N’s payload is equal to or higher than N’s own end time, hence higher than its start time. (Note our input ii can fall into a gap between the event intervals.)

Eg: suppose N starts at 2:22 and its left-child payload says 3:33; then we know for sure there is at least one ongoing event during the interval 2:22 to 3:33. Not sure if this is a useful insight.

The text illustrates and proves why this design enables a clean and simple binary search, i.e. after each decision, we can safely discard one of “my” two subtrees. Here’s my (possibly imperfect) recall:

Let’s suppose each time the current node/event is not “ongoing”. (If it is then we can return it 🙂  ) So here’s the gist of the algorithm:

Suppose I’m the “current node”, which is the root node initially. Compare ii to my left child L’s payload (not my own payload).

As defined, L’s payload is the highest end time among L and all its subnodes. In other words, this payload is the highest end time of all events starting before me.

Note the algo won’t compare L’s start time with ii
Note the algo won’t compare my start time with ii, though the scenario analysis below considers it.

  • case 1: if the payload is lower than ii, then all events in my left subtree ended before ii, so we can discard the left subtree completely. The only way to go is down the right subtree.
  • the other case (case 2): if the payload is higher than or equal to ii, then my left subtree contains some events (the “candd”, aka candidates) that end at or after ii.
  • case 2a: suppose my start time is before ii; then by the BST definition every candidate has started before ii, so at least one candidate is ongoing at ii. We are sure to find it in the left subtree.
  • case 2b: suppose my start time is after ii; then by the BST definition all right-subtree events have not yet started at ii. The right subtree can be discarded.
  • In both case 2a and case 2b, we can discard the right subtree.
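A minimal Python sketch of the augmented tree and the point query (a plain unbalanced BST for brevity; as noted above, a real solution needs RBTree balancing for the O(logN) guarantee):

```python
class EventNode:
    """BST keyed by event start time; max_end is the subtree payload,
    i.e. the latest end time anywhere in this node's subtree."""
    def __init__(self, start, end):
        self.start, self.end = start, end
        self.max_end = end
        self.left = self.right = None

def insert(root, start, end):
    # plain BST insert, maintaining max_end along the path
    if root is None:
        return EventNode(start, end)
    if start < root.start:
        root.left = insert(root.left, start, end)
    else:
        root.right = insert(root.right, start, end)
    root.max_end = max(root.max_end, end)
    return root

def query(root, ii):
    """Return (start, end) of any event ongoing at time ii, else None."""
    node = root
    while node is not None:
        if node.start <= ii <= node.end:
            return (node.start, node.end)
        if node.left is not None and node.left.max_end >= ii:
            node = node.left    # case 2: some candidate ends at/after ii
        else:
            node = node.right   # case 1: left subtree all ended before ii
    return None
```

Each loop iteration discards one subtree, so the query cost is proportional to tree depth.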

 

xchg between register/memory triggers fencing

I guess xchg is used in some basic concurrency constructs, but I need to research more.

See my blogpost hardware mutex, based@XCHG instruction

https://c9x.me/x86/html/file_module_x86_id_328.html points out the implicit locking performed by CPU whenever one of the two operands is a memory location. The other operand must be a register.

This implicit locking is quite expensive according to https://stackoverflow.com/questions/50102342/how-does-xchg-work-in-intel-assembly-language, but presumably cheaper than user-level locking.

This implicit locking involves a memory fence.

370,000 MPS isn’t tough for multicast #CSY

370,000 msg/sec isn’t too high. Typical exchange message size is 100 bits, so we are talking about 37 Mbps, less than 0.1% of a 100 Gbps network’s capacity.

My networking mentor CSY and I both believe it’s entirely possible to host 8 independent threads in a single process to handle the 8 independent message channels. Capacity would be 296 Mbps on a single NIC and single PID.
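The arithmetic, spelled out with the post’s assumed 100-bit message size:

```python
MSG_PER_SEC = 370_000
MSG_BITS = 100        # the post's assumed message size, in bits
CHANNELS = 8

# one channel: 370,000 msg/s x 100 bits = 37 Mbps
mbps_per_channel = MSG_PER_SEC * MSG_BITS / 1_000_000
# all 8 channels on one NIC / one PID
total_mbps = mbps_per_channel * CHANNELS
```

That total of 296 Mbps is still a small fraction of a 100 Gbps link.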

See also mktData parser: multi-threaded]ST mode #Balaji

I feel a more bandwidth-demanding multicast application is video-on-demand where a single server may need to simultaneously stream different videos.

Q: how about world cup final real-time multicast video streaming to millions of subscribers?
%%A: now I think this is not so demanding, because the number of simultaneous video streams is one

::value^::type #typedef

This TMP technique is among the simplest, but still not for the uninitiated.

In type_traits.h, many templates expose a ::value or ::type member. Often these two “members” are the only visible output of the type-trait meta-functions.

  • ::value is a static field, typically true/false
  • ::type is a member typedef, typically related to the type argument T, but can be “void” like in enable_if

 

message fragment in Internet Protocol !! TCP

The IP layer handles fragmentation/defrag. UDP and TCP are one layer above IP and rely on this “service” of the IP layer.

UDP may (TCP is different) occasionally lose an entire “logical” packet, but never part of a logical packet.

In my own words, if the IP layer loses a “fragment”, it discards the entire packet.

When a logical packet is broken up at IP layer into physical packets, the constituent physical packets will either be delivered altogether or lost altogether. The frag/defrag IP service is transparent to upper layers so UDP/TCP don’t need to worry about basic data integrity.

I will attempt to contrast this with TCP flow control, which breaks up a megabyte file into smaller chunks. Each chunk is a “logical” packet. TCP (not UDP) uses sequence numbers in the packets.

celebrity search in O(N) #MS

Q (MS 2018): write an algorithm to search for celebrities in a collection of people, with the help of a blackbox predicate that can tell whether one person knows the other (bool a.knows(b)), which is an asymmetric relation. Celebrities are those who know no one but are known by everyone else.


Aha : There can be 0 or 1 celebrity.

There are exactly N*(N-1) ordered relationships in the system, so brute force is O(N*N), but I propose a 2-pass O(N) solution:

  1. First pass: shrink the collection of nodes [A, B .. N]
    1. Check a random pair. If A knows B, then A can’t be the celebrity, so remove A from the list; else B isn’t known by A, so B can’t be the celebrity; remove B. Either way, the first step always shrinks the list by one.
    2. Now check any random pair of persons in the remaining list. Checking either direction will remove exactly one person.
    3. The list shrinks to 1 person (Dan) after N-1 steps.
  2. Second pass: check this Dan thoroughly. Dan must be known to everyone and must not know anyone. We end up with either no celebrity OR one celebrity, Dan.

Aha — better use A/B/C or Alice/Bob/Cody instead of #1, #2, #3.
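The two passes can be sketched as follows (hypothetical function name):

```python
def find_celebrity(n, knows):
    """knows(a, b) -> bool is the blackbox predicate.
    Pass 1 eliminates one non-celebrity per comparison, leaving one
    survivor after n-1 steps; pass 2 verifies the survivor thoroughly.
    Returns the celebrity's index, or None if there is none."""
    cand = 0
    for other in range(1, n):
        if knows(cand, other):
            cand = other   # cand knows someone, so cand is out
        # else: `other` is unknown to cand, so `other` is out
    for other in range(n):
        if other == cand:
            continue
        if knows(cand, other) or not knows(other, cand):
            return None
    return cand
```

Both passes are O(N) calls to the predicate, well under the N*(N-1) brute-force count.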

shortest path:2nodes]binary matrix #BFT

Q: given 2 cells in a binary matrix (1=black, 0=white=blocked), check whether the pair are connected and, if yes, return the shortest path. There exists a path of length 1 between 2 cells IFF they are side by side or stacked atop each other.

count paths between 2 bTree nodes #PimcoQ9 Ashish is arguably harder than this problem, but this problem allows moving in four directions.

The binary-matrix island count #DeepakM technique is more applicable. A BFT should work.

  • every reachable node is painted Green (like 2)
  • we give up after our queue is empty

https://github.com/tiger40490/repo1/blob/py1/py/grid/classic_connectedPair.py is the implementation, briefly tested.
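For reference, a minimal BFS sketch of the same idea (independent of the linked implementation; the parent map doubles as the visited set and lets us rebuild the path):

```python
from collections import deque

def shortest_path(grid, src, dst):
    """BFS from src to dst over black cells (value 1), 4-directional.
    Returns the list of cells on a shortest path, or None."""
    rows, cols = len(grid), len(grid[0])
    parent = {src: None}
    q = deque([src])
    while q:
        cell = q.popleft()
        if cell == dst:
            path = []
            while cell is not None:      # walk parent links back to src
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols \
                    and grid[nr][nc] == 1 and nxt not in parent:
                parent[nxt] = cell
                q.append(nxt)
    return None
```

Because BFS explores cells in order of distance, the first time we pop dst we have a shortest path.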

longest consecutive ints]O(N) #zebra

Popularity — 1000+ likes on Leetcode … possibly popular

Q(Leetcode #128): Given an unsorted array of integers, find the longest consecutive element sequence, in O(N) time. Eg: given [100, 4, 200, 1, 3, 2] return [1,2,3,4]

I call this the zebra problem because every consecutive sequence of ints is a black stripe and the gaps between them are white stripes. We want the widest black stripe. Obviously, each stripe has a minimum size of 1.

https://github.com/tiger40490/repo1/blob/py1/py/array/zebra.py is my O(N) solution, not tested on Leetcode.

========

What’s UnionFind? A reusable technique?

Like inserting interval #merging #80% done, I feel this is a data structure problem.

To keep things simple, I will first run one iteration to remove all duplicate items.

I will use a hashtable where each key is a known item. The value is a pointer to a “segment” object.

A segment stores the min and max values. All integers within [min, max] of the segment are always known items during my scan of the input array.

When a new item is either min-1 or max+1, we expand the segment by adjusting the extremes…

The trick is joining two segments without link pointers. After joining, we don’t really adjust the min/max fields; we only update the max-length global variable if needed.

To keep the hashtable small, I could optionally delete from it, but we don’t want to do a range delete within the loop — O(NN)
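One way to realize the segment-join idea in Python: store each stripe’s length at its two boundary entries; interior entries may go stale, but boundaries stay correct, so joining two stripes needs no link pointers:

```python
def longest_consecutive(nums):
    """Width of the widest black stripe (longest run of consecutive
    ints), O(N). seg[x] = length of the stripe whose boundary is x."""
    seg, best = {}, 0
    for x in nums:
        if x in seg:                   # duplicate, already absorbed
            continue
        left = seg.get(x - 1, 0)       # stripe ending just below x
        right = seg.get(x + 1, 0)      # stripe starting just above x
        total = left + right + 1       # join the two neighbouring stripes
        seg[x] = total
        seg[x - left] = total          # refresh both boundaries
        seg[x + right] = total
        best = max(best, total)
    return best
```

On the Leetcode example [100, 4, 200, 1, 3, 2], the stripes 1..1 and 3..4 merge when 2 arrives, giving width 4.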

##OK,%%spare time had low ROTI..how about Theirs@@

I often feel bad that all of my efforts in my spare time had no tangible ROTI, but looking around, who fared better?

Note this post is more about peer comparison (recrec blog), less about my spare time usage (open blog)

For the record my spare time effort did produce some tangible outcomes

  • coding drill in github and wordpress. I think most of my friends didn’t bother to record. Their practice is short-term.
  • yoga, jogging
  • blogging on housing (and school districts) — our real needs. The time spent convinced me to live in Bayonne
  • blogging on personal investment — complex. The time spent directly affected my investment decisions
  • blogging, discussion on boy. The time spent directly influenced my views, decisions and communications with family members
  • helping friends (Deepak,Ashish,YH) with job hunting
  • helping my kids with sports, piano, renzi
  • –less tangible:
  • studying risk systems, data science, crypto-currency? Helps way-finding and navigating the job market

[18] ##spend more$ to prolong engagement+absorbency

  • increase spend on good mobile data and mobile devices to capture the creativity, absorbency …
  • increase spend on printer
  • increase spend on hotel stay near office (or taxi home) to capture the engagement on localSys
  • spend on flights to gain family time, engaged
  • spend unpaid leave to attend interviews .. to gain joy, motivation, engagement, precious insight into their selection priority

Not so sure about …

  • Regrettable — spent unpaid leave before starting Macq job .. to gain peaceful family time? low ROI
  • questionable spend (gave up higher pay rate) to gain … c++ skills like QQ, GTD, zbs

Q: bond price change when yield goes to zero

Can bond yield become negative? Yes: in 2015–2017 many bonds traded at negative yields. https://www.ft.com/content/312f0a8c-0094-11e6-ac98-3c15a1aa2e62 shows a realistic example of a vanilla bond trading at $104. The yield is negative: you pay $104 now and will get only $100 in repayment, so you are guaranteed to lose money.

Mathematically, as yield approaches negative 100 percent, price goes to infinity.

When yield approaches zero, bond price would go to the arithmetic sum of all coupons + repayment.
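A quick numeric check of both limits, with hypothetical bond parameters (annual coupons, bullet repayment):

```python
def bond_price(face, coupon_rate, years, ytm):
    """Price of an annual-coupon bullet bond discounted at yield ytm."""
    coupon = face * coupon_rate
    pv_coupons = sum(coupon / (1 + ytm) ** t for t in range(1, years + 1))
    return pv_coupons + face / (1 + ytm) ** years
```

At ytm = 0 the discount factors are all 1, so the price is exactly the sum of coupons plus the principal; at a negative yield the price exceeds the total undiscounted cash flows, reproducing the pay-$104-get-$100 situation.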

detach()^pthread_exit()^pthread_join() at end@main()

See https://stackoverflow.com/questions/3559463/is-it-ok-to-call-pthread-exit-from-main

Q: When would main() call ..join() vs ..exit()?

  • I would call pthread_join() if the main thread has something to do after the child threads complete.
  • I would call pthread_exit() if the main thread has nothing more to do and should leave the child threads running.
    • However, if the child thread object is destructed…
  • I would call detach() to create a daemon child thread, which terminates automatically with the host process — according to MS training.
  • I would not call any of them from main() if I want main() to bring down the entire process… The normal return of main() kills all threads, unlike in the jvm — see Anthony Williams’ comment in https://stackoverflow.com/questions/3692591/return-versus-pthread-exit-in-pthread-start-functions

For a c++11 std::thread object, the rules are different from pthreads. If a still-joinable std::thread object is destructed at the end of main(), it invokes std::terminate()! So before any return statement, main() needs to call child.detach(), so as to leave the child thread running.

Explained more in std::thread dtor std::terminate()

##simplicity@design pushed2the limit #

Here I collect simple concepts that have proven rather versatile, resilient and adaptable. Note in these designs the complexity never disappears or shrinks — it shifts to somewhere else more manageable.

  • [c] stateless — http
  • [!s] microservices — complexity moves out of a big service into the architecture
  • [c] pure functions — without side effects
  • use the database concept in solving algo problems such as the skyline #Gelber
  • stateless static functions in java — my favorite
  • [c !s] garbage collection — as a concept. Complexity shifts from application into the GC codebase
  • REST
  • in c# and c++, all nested classes are static, unlike in java
  • [!s] python for-loop iteration over a dir, a file, a string … See my blog post
  • [c] immutable — objects in concurrent systems
  • [c] STM i.e. single-threaded mode, without shared mutable #Nsdq
  • [c] pipe — the pipe concept in unix is a classic
  • JSON
  • [c] hash table as basis of OO — py, javascript, perl..
  • [c] sproc (+trigger) — as a simple concept “data storage as guardian of its data, and a facade hiding the internal complexities”
  • [!s] dependency injection
  • [c !s] EDT — swing EDT and WPF
  • [c] RAII
  • [!s] smart pointers as a concept
  • singleton implemented as a static local object, #Scott Meyers
  • DougLea’s singleton — all-static class
  • [c=celebrated, classic, time-honored, ..]
  • [!s = not so simple in implementation]

 

most(new/old)specializations turn out non-strategic

Look at my past vindicative specializations vs 

The Singapore government makes technology bets. Venture capitalists make bets. Many organizations also make technology bets. Every technology bet can lose or win.

In this post, I’m not advocating trySomethingNew. I am advocating specialization, which often requires accumulation and sticking to something not so new, like FIX, threading, SQL, sockets, bond math, …

 

blockingMutex implementation ] kernel!!userland

Background — the Linux kernel provides two types of locks — spinlock and blocking mutex, as in https://www.kernel.org/doc/htmldocs/kernel-locking/locks.html . Here I focus on the mutex, which I think is far more relevant to userland applications.

https://lwn.net/Articles/575460/ has good pointers:

  • I believe context switch is expensive, since CPU cache has to be replaced. Therefore, optimistic spin is beneficial.

https://github.com/torvalds/linux/blob/master/kernel/locking/mutex.c shows

  • a blocking mutex is used in kernel, perhaps not directly used by userland apps
  • implemented using spin lock + some wait_lock
  • maintains a wait_list. Not visible to any userland app.

 

y FIX needs seqNo over TCP seqNo

My friend Alan Shi said … Suppose your FIX process crashed or lost power, reloaded (from disk) the last sequence number received, and reconnected (resetting the tcp seq#). It would then receive a live seq# higher than expected — this could mean some execution reports were missed. If the exchange notices a sequence gap, it could mean some order cancellation was missed. Both scenarios are serious and require a resend request. CME documentation states:

… a given system, upon detecting a higher than expected message sequence number from its counterparty, requests a range of ordered messages resent from the counterparty.

Major difference from TCP sequence number — FIX specifies no Ack though some exchange do. See Ack in FIX^TCP

Q: how could FIX miss messages given TCP reliability? I guess tcp session disconnect is the main reason.

https://kb.b2bits.com/display/B2BITS/Sequence+number+handling has details:

  • two streams of seq numbers, one controlled by the exchange and one by the trader
  • how to react to unexpected high/low values received. Note “my” outgoing seq is controlled by me, hence never “unexpected”
  • Sequence number reset policy — after a logout, sequence numbers are supposed to reset to 1, but if the connection is terminated non-gracefully, sequence numbers continue when the session is restored. In fact a lot of service providers (eg: Trax) never reset sequence numbers during the day. Some reset sequence numbers once per week, regardless of logout.

 

TCP blocking send() timeout #details

See also recv()/send() with timeout #CSY

I see three modes

  1. non-blocking send() — immediate return if unable to send
  2. blocking send() without timeout — blocks indefinitely, so the thread can’t do anything else.
  3. blocking send() with timeout —

SO_SNDTIMEO: sets the timeout value specifying the amount of time that an output function blocks because flow control prevents data from being sent. If a send operation has blocked for this time, it shall return with a partial count or with errno set to [EAGAIN] or [EWOULDBLOCK] if no data is sent. The default for this option is zero, which indicates that a send operation shall not time out. This option stores a timeval structure. Note that not all implementations allow this option to be set.

In the xtap library, timeout isn’t implemented at all; the default is non-blocking. If we configure it to use mode 2), we can hit a strange problem — one of three receivers gets stuck but keeps its connection open, and the other receivers are starved even though their receive buffers are free.

linux hardware interrupt handler #phrasebook

I think some interrupts are generated by software, but here I focus on hardware interrupt handlers.

  • pseudo-function — Each handler is like a pseudo-function containing a series of instructions that will run on a cpu core
  • top priority — interrupt context is higher priority than kernel context or process context. You can say “interrupt” means “emergency”. Emergency vehicles don’t obey traffic rules.
    • However, an interrupt handler “function” can get interrupted by another [1]. The kernel somehow remembers the “stack”
  • not preemptible — except in the [1] scenario, the kernel can’t suspend a hardware interrupt handler in mid-stream and put another series of instructions in the “driver’s seat”
  • no PID — since there’s no preemption, we don’t need a pid associated with this series of instructions.

if thread fails b4 releasing mutex #CSY

My friend Shanyou asked:

Q: what if a thread somehow fails before releasing mutex?

I see only three scenarios:

  • If machine loses power, then releasing mutex or not makes no difference.
  • If process crashes but the mutex is in shared memory, then we are in trouble. The mutex will be seen as forever in-use. The other process can’t get this mutex. I feel this could be a practical problem, with practical solutions like reboot or process restart.
  • If process is still alive, I rely on stack unwinding.

Stack unwinding is set up by the compiler. The only situation where this compiler-generated stack unwinding is incomplete is when the failing function is declared noexcept. (In such a case the failure is your self-inflicted problem, since you promised the compiler it would never throw an exception.) I will assume we have no noexcept function. Therefore, I assume stack unwinding is robust and all stack objects will be destructed.

If one of the stack objects is a std::unique_lock, then compiler guarantees an unlocked status on destruction. That’s the highest reliability and reassurance I can hope for.

FIX heartBtInt^TCP keep-alive^XDP heartbeat^xtap inactivity^MSFilter heartbeat

When either end of a FIX connection has not Sent any data for [HeartBtInt] seconds, it should send a Heartbeat message. This is first layer of defense.

When either end of the connection has not Received any data for (HeartBtInt + “some reasonable transmission lag”) seconds, it will send a TestRequest probe message to verify the connection. This is 2nd layer of defense. If there is still no message received after (HeartBtInt + “some reasonable transmission lag”) seconds then the connection should be considered lost.

Heartbeats issued as a response to a TestRequest must contain the TestReqID transmitted in the TestRequest. This is useful to verify that the Heartbeat is the result of the TestRequest and not of a HeartBtInt timeout. If a response doesn’t carry the expected TestReqID, I think we shouldn’t ignore the mismatch — if everyone were to ignore it, TestReqID would be a worthless feature of the FIX protocol.

FIX clients should reset the heartbeat interval timer after every transmitted message (not just heartbeats).

TCP keep-alive is an optional, classic feature.

NYSE XDP protocol uses heartbeat messages in both multicast and TCP request server. The TCP heartbeat requires client response, similar to FIX probe message.

Xtap also has an “inactivity timeout”. Every incoming exchange message (including heartbeats) is recognized as “activity”. 60 sec of Inactivity triggers an alert in the log, the log watchdog…

MS Filter framework supports a server-initiated heartbeat, to inform clients that “I’m alive”.  An optional feature — heartbeat msg can carry an expiration value a.k.a dynamic frequency value like “next heartbeat will arrive in X seconds”.

Tag 112 = TestReqID. There can be many test requests, each with its own id.

boost::optional #NaN #a G9 boost construct

https://www.boost.org/doc/libs/1_65_1/libs/optional/doc/html/index.html has illustrations using optional<int>.

— #1 Usage: possibly-empty argument-holder
I encountered this in an MS library:

void myFunc(optional<ReportType &> reportData_){
  if(reportData_) cout << "non-empty";
}

In addition, declaration of this function features a default-arg:

void myFunc(optional<ReportType &> reportData_ = optional<ReportType &>());

Q: for an int param, how does this compare with a simple default-arg value of -1?
A: boost::optional wins when all integer values are valid, so you can’t sacrifice one of them to mean “missing”.

StopB4: arg to range()/slice: simplistic rule

I believe range() and slicing operators always generate a new list (or string)

If you specify a stopB4 value of 5, then “5” will not be produced, because the “generator” stops right before this value.

In the simplest usage, START is 0 and STEP is 1 or -1.

…. If StopB4 is 5 then five integers are generated. If used in a for loop then we enter the loop body five times.

In a rare usage (avoid such confusion in a coding test!), STEP is neither 1 nor -1, or START is not zero, so StopB4 is used in something like “if generated_candidate >= StopB4 then exit before entry into loop body”.

Code below proves slicing operator is exactly the same. See https://github.com/tiger40490/repo1/blob/py1/py/slice%5Erange.py

word[:2]    # The first two characters, Not including position 2
word[2:]    # Everything except the first two characters
s[:i] + s[i:] equals s
length of word[1:3] is 3-1==2

bbg: allocate half the players to NY^SF

Always good to be concrete, so without loss of generality, let’s say J=99 i.e. 198 players.

Q1: Given 2J players, we need to allocate exactly half of them to NY and the rest to San Francisco, to form two teams for an inter-city contest. We receive the NY fare and SF fare for each player, as 2J (i.e. 198) pairs of integers. How can we minimize the total fares?

Q2(bonus): what if 2J+1 players, and you can decide which city gets an extra player?

— Analysis: how many possible allocations? 2J-choose-J — a huge number that grows exponentially in J (roughly 4^J), so brute force is impractical.

If for any player the two fares both go up by $9800, it doesn’t change our algorithm at all. Therefore, we only care about the fare difference (N-S) for each player.

— solution: I will assume most players live near SF so SF fares are lower. I tabulate and sort the 198 fare differences “NY fare – SF fare” and suppose indeed at least 99 (half) are positive[1]. Therefore, my “base” is SF i.e. my base allocation is everyone-allocated-to-SF. Next I must pick 99 (half) of them and allocate to NY.

I will avoid the higher values in the table.

I simply pick the lower 99 (half) in the table, because these are the 99 “extra” costs I will incur. I want to minimize the total of these 99 values, whether they are mostly positive or mostly negative.

  • The highest number in the table marks the player who is most expensive for NY relative to SF — equivalently, the least expensive for SF relative to NY. If this number is negative, then SF is more expensive than NY for all players, so follow the rule in [1]. Either way, regardless of sign, we want to keep this person in SF.
  • The lowest number in the table marks the player who is least expensive for NY relative to SF. If this number is negative, then SF is more expensive than NY for this player — the normal case. Regardless of sign, we want to allocate her to NY.

[1] if proven otherwise (by the data), then we could easily treat NY as base to keep the solution intuitive. Actually even if we don’t change the base, the algorithm still works, albeit unintuitively.

A2: Say J=99, so 199 players in total. We already picked 99 for NY. Do we pick one more for NY, or keep the remaining 100 in SF?

Just look at the next lowest value in the table. If positive, then NY is more expensive for him, so keep him in SF.

forward^move on various reference types

Suppose there are 3 overloads of a sink() function, each with a lvr, rvr or nonref parameter.

(In reality these three sink() functions will not compile together due to ambiguity. So I need to remove one of the 3 just to compile.)

We pass string objects into sink() after going through various “routes”. In the table’s body, if you see “passing on as lvr” it means compiler picks the lvr overload.

For each “thingy” passed in, the four routes (as-is ^ move() ^ fwd without move() ^ fwd+move) behave as follows:

  • output from move() (without any named variable) — as-is: pbclone or pbRvr?; move(): should be fine; fwd without move(): pbref, passing on a lvr; fwd+move: pbRvr
  • natural temp (no named variable) — as-is: pbclone or pbRvr?; move(): should be fine; fwd without move(): pbref, passing on a lvr; fwd+move: pbRvr
  • named rvr variable from move() ! [2] — as-is: pbref or pbclone [1]; move(): pbRvr; fwd: pbref?
  • lvr variable — as-is: pbref or pbclone; move(): passing on a rvr; fwd without move(): pbref?; fwd+move: should never occur
  • nonref variable — as-is: pbref or pbclone; move(): passing on a rvr; fwd without move(): pbref?; fwd+move: should never occur

https://github.com/tiger40490/repo1/blob/cpp1/cpp1/rvrDemo.cpp
https://github.com/tiger40490/repo1/blob/cpp1/cpp1/mv_fwd.cpp are my experiment.  Take-aways:

— when a non-template function has a rvr param like f(MyType && s), we know the object behind s is a naturally-occurring temporary object (OR someone explicitly marked a non-temp as no-longer-in-use), but in order to call the move-ctor of MyType we have to write MyType(move(s)). Otherwise, to my surprise, s is treated as a lvr and we end up calling the MyType copy-ctor. [1]

  • with a real rvr (not a universal ref), you either directly steal from it, or pass it to another Func2 via Func2(move(rvrVar)). Func2 is typically mv-ctor (including assignment)

[2] Surprise Insight: Any time you save the output of move() in a Named rvr variable like q(string && namedRvrVar), this variable itself has a Location and is a l-value expression, even though the string object is now marked as abandoned. Passing this variable would be passing a lvr!

  • If you call move(), you must not save move() output to a named variable! This rule is actually consistent with…
  • with a naturally-occurring temp like string(“a”)+”1″, you must not save it to named variable, or it will become non-temp object.

— std::forward requires a universal reference i.e. type deduction.

If I call factory<MyType>(move(mytype)), there’s no type deduction. This factory<MyType>(..) is a concretized function and is equivalent to factory(MyType && s). It is fundamentally different from the function template. It can Only be called with a true rvr. If you call it like factory<MyType>(nonref), the compiler will complain no-matching-function, because the nonref argument can only match a lvr parameter or a by-value parameter.

https://gcc.gnu.org/onlinedocs/libstdc++/libstdc++-api-4.6/a01033_source.html shows that every time any function gets a variable var1 as a universal ref or a true rvr, we always apply either move(var1) or forward(var1) before passing to another function. I think as-is passing would unconditionally pass var1 as a lvr.

ValueType default ctor needed in std::map[]

Gotcha — Suppose you define a custom Value type for your std::map and you call mymap[key] = Value(…). There’s an implicit call to the no-arg ctor of Value class!

If you don’t have a no-arg ctor, then avoid operator[](). Use insert(make_pair()) and find(key) methods.

Paradox — for both unordered_map and map, code that only ever has lookup HITs via operator[] still requires a no-arg ctor to be available. The reason is:

  • since your code uses map operator[], compiler will “see” the template member function operator[]. This function explicitly calls the ctor within a if-block.
    • If you have any lookup miss, this ctor is called at run time
    • If you have no lookup miss, this ctor is never called at run time
    • compiler doesn’t know if lookup miss/hit may happen in the future, so both branches are compiled into a.out
  • If your source code doesn’t use operator[], then at compile time, the template function operator[] is not included in a.out, so you don’t need the default ctor.

 

python % string formatting #padding

Many details I tend to forget:

  • the actual data can be either a tuple or a dict. The dict version is powerful but less known
    • if there’s just one item in the tuple, you don’t need the parentheses — myStr = "%d" % var1
  • for a tuple, the format specifier count must match the tuple length
  • for a dict, each format specifier must name a valid key.
myStr = "%(var1)d" % locals() # locals() returns a dict including var1
  • There are at least two q(%) in the above expression
  • extra parentheses after the 2nd % are required:

“#%03d” % (id(node)%1000)  # return last 3 digit of object id, with 0-padding

after fork(): threads,sockets.. #Trex

I have read about fork() many times without knowing these details, until Trex interviewer asked !

–based on http://man7.org/linux/man-pages/man2/fork.2.html

The child process is created with a single thread — the one that called fork(). The entire virtual address space of the parent is replicated in the new process, including the states of pthread mutexes, pthread condition variables, and other pthreads objects. In particular, if in the parent process a lock was held by some other thread t2, then the child process has only the main thread (the one that called fork()) and no t2 — yet the lock is still unavailable. This is a common problem, addressed in http://poincare.matf.bg.ac.rs/~ivana/courses/ps/sistemi_knjige/pomocno/apue/APUE/0201433079/ch12lev1sec9.html.

The very 1st instruction executed in Child is the instruction after fork() — as proven in https://github.com/tiger40490/repo1/blob/cpp1/cpp1/fork3times.cpp

The child inherits copies of the parent’s set of open file descriptors, including stdin/stdout/stderr. The child should usually close the descriptors it doesn’t need.

Special case — socket file descriptor inherited. See https://bintanvictor.wordpress.com/2017/04/29/socket-shared-between-2-processes/

 

gdb to show c++thread wait`for mutex/condVar

https://github.com/tiger40490/repo1/blob/cpp1/cpp/thr/pthreadCondVar.cpp shows my experiment using gdb supplied by StrawberryPerl.

On this g++/gdb set-up, “info threads” shows thread id number 1 for main thread, “2” for the thread whose pthread_self() == 2 … matching 🙂

The same “info-threads” output also shows

  • one of the worker threads is executing sleep() while holding lock (by design)
  • the other worker threads are all waiting for the lock.
  • At the same time, the main thread is waiting on a condition variable, so info-threads shows it executing a different function.

breakdown heap/non-heap footprint@c++app #massif

After reading http://valgrind.org/docs/manual/ms-manual.html#ms-manual.not-measured, I was able to get massif to capture non-heap memory:

valgrind --tool=massif  --pages-as-heap=yes --massif-out-file=$massifOut .../xtap -c ....
ms_print $massifOut

Heap allocation functions such as malloc are built on top of system calls such as mmap, mremap, and brk. For example, when needed, an allocator will typically call mmap to allocate a large chunk of memory, and then hand over pieces of that memory chunk to the client program in response to calls to malloc et al. Massif directly measures only these higher-level malloc et al calls, not the lower-level system calls.

Furthermore, a client program may use these lower-level system calls directly to allocate memory. By default, Massif does not measure these. Nor does it measure the size of code, data and BSS segments. Therefore, the numbers reported by Massif may be significantly smaller than those reported by tools such as top that measure a program’s total size in memory.


RandomAccess #ArrayList sorting

Very few JDK containers implement the RandomAccess marker interface. I only know Stack.java, ArrayList.java and subclass Vector.java. Raw array isn’t.

Only List.java subtypes can implement RandomAccess. Javadoc says

“The primary purpose of this interface is to allow generic algorithms to alter their behavior when applied to either random or sequential access lists.”

Q: which “generic algos” actually check RandomAccess?
AA: Collections.binarySearch() in https://docs.oracle.com/javase/7/docs/api/java/util/Collections.html
AA: to my surprise, Collections.sort() does NOT care about RandomAccess, so ArrayList sorting is no different from LinkedList sorting! See my blogpost Arrays.sort(primitiveArray) beat List.sort()

http://etutorials.org/Programming/Java+performance+tuning/Chapter+11.+Appropriate+Data+Structures+and+Algorithms/11.6+The+RandomAccess+Interface/ has more details

 

private dtor restricts instantiation to heap only

My test proves that declaration (even without definition) of a stack instance or global instance is illegal, because the compiler generates an (illegal) call to the private dtor!

A friend or a static method can also call the private dtor.

One usage of such a class — force all users to go through a factory to instantiate it.

#include <iostream>
using std::cout;

class PrivDtor{
  ~PrivDtor(){ // private by default: declared above the public: label
        cout<<"dtor\n";
  }
public:
  static void destroy(PrivDtor* p){p->~PrivDtor();}
  static void del(PrivDtor* p){delete p;}
};

//PrivDtor globalInstance;  //won't compile
int main(){
  //PrivDtor stackInstance; //won't compile

  PrivDtor * p = new PrivDtor;
  //PrivDtor::del(p); // works!
  PrivDtor::destroy(p);  //works!
}

 

mktData conflation: design question

I have hit this same question twice — Q: in a streaming price feed, you get IBM prices in the queue, but you don’t want consumer thread AA to use “outdated” prices, while consumer BB needs a full history of the prices.

I see two conflicting requirements by the interviewer. I will point out to the interviewer this conflict.

I see two channels — in-band + out-of-band needed.

  1. in-band only — if full tick history is important, then the consumers have to /process/ every tick, even if outdated. We can have dedicated systems just to record ticks, with latency. For example, Rebus receives every tick, saves it and sends it out without conflation.
  2. out-of-band — If your algo engine needs to catch opportunities at minimal latency, then it can’t afford to care about history. It must ignore history. I will focus on this requirement.
  3. dual-band — Combining the two, if your super-algo-engine needs to analyze tick-by-tick history and also react to the opportunities, then the “producer” thread alone has to do all work till order transmission, but I don’t know if it can be fast enough. In general, the fastest data processing system is single-threaded without queues and minimal interaction with other data stores. Since the producer thread is also the consumer thread for the same message, there’s no conflation. Every tick is consumed! I am not sure about the scalability of this synchronous design. FIFO Queue implies latency. Anyway, I will not talk further about this stringent “combo” requirement.

https://tabbforum.com/opinions/managing-6-million-messages-per-second?print_preview=true&single=true says “Many firms mitigate the data they consume through the use of simple time conflation. These firms throw data on the floor based solely on the time that data arrived.”

In the Wells interview, I proposed a two-channel design. The producer simply updates a “notice board” with latest prices for each of 999 tickers. Registered consumers get notified out-of-band to re-read the notice board[1], on some messaging thread. Async design has a latency. I don’t know how tolerable that is. I feel async and MOM are popular and tolerable in algo trading. I should check my book [[all about HFT]]…

In-band only — However, the HSBC manager (Brian?) seems to imply that for minimum latency, the socket reader thread must run the algo all the way and send order out to exchange in one big function.

Out-of-band only — For slightly slower markets, two market-leading investment bank gateways actually publish periodic updates regardless how many raw input messages hit it. Not event-driven, not monitoring every tick!

  • Lehman eq options real time vol publisher
  • BofA Stirt Sprite publishes short-term yield curves on the G10 currencies.
    • Not EURUSD spot prices

[1] The notification can only contain indicative price numbers and serve to invite clients to check the notice board. If clients rely solely on the indicative, it defeats conflation and brings us back to a FIFO design.

no overflow]TCP slow receiver #non-blocking sender

Q: Does TCP receiver ever overflow due to a fast sender?

A: See http://www.mathcs.emory.edu/~cheung/Courses/455/Syllabus/7-transport/flow-control.html

A: It should not. When the receive buffer is full, the receiver sends its AdvertizedWindowSize to inform the sender. If the sender app ignores this and continues to send, the sent data will remain in the send buffer, not sent over the wire. Soon the send buffer fills up and send() blocks. On a non-blocking TCP socket, send() returns with an error only when it can’t send a single byte. (UDP is different.)

Non-blocking send/receive operations either complete the job or return an error.

Q: Do they ever return with part of the data processed?
A: Yes — they return the number of bytes transferred. A partial transfer is considered “completed”.

 

declare variables ] loop header: c^j #IF/for/while

Small trick to show off in your coding test…

Background — In short code snippet, I want to minimize variable declarations. The loop control variable declaration is something I always want to avoid.

https://stackoverflow.com/questions/38766891/is-it-possible-to-declare-a-variable-within-a-java-while-conditional shows java WHILE-loop header allows assignment:

List<Object> processables;
while ((processables = retrieveProcessableItems(..)).size() > 0) { /*…*/ }

But only the c++ WHILE-loop header (I’m 99% sure) allows variable declaration.

The solution — both java/c++ FOR-loop headers allow variable declaration. Note the condition is checked Before first iteration, in both for/while loops.

update — c++ (since well before c++11) allows a variable declaration in the IF-statement condition, designed to limit the variable’s scope.

if (int a=string().size()+3) cout<<a << " = a \n"; // shows 3

discover cpu count ] my machine #lscpu

NIC — There’s another blog post https://bintanvictor.wordpress.com/2018/03/19/discover-nic-my-machine-lspci/

$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 1
Core(s) per socket: 8
CPU socket(s): 2
NUMA node(s): 2
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 20480K


$ less /proc/cpuinfo|egrep 'core id|processor'  # on the same machine shows

16 “processors”, but #0 and #1 have the same core-id! There are only 8 distinct core-id values in total. I think this is because each of the two sockets holds 8 cores with unique core-id values.

dmesg | grep rocessor # doesn’t work any more

dxdiag # windows

dxdiag /t c:\deldel.txt
find “CPU” c:\deldel.txt # copy-paste may not work

(Revisit) senior manager IV: what they watch out for

Common, non-trivial, perhaps hidden issues in a candidate, ranked:

  • twist and turn and not /candid/ about his past
  • not listening
    • jumping to conclusions
    • assuming too much
    • not waiting for a long-winded interviewer to finish, perhaps interrupting
    • jumping to the defensive, when deliberately provoked
  • desperate for the job and losing his perspective
  • opinionated
  • choosy on projects
  • conceited beyond “cool confidence”
  • —–secondary
  • impulsive answers ^ elusive, guarded answers
    • elusive answers? common
  • bad-mouth past managers
  • lack of humor? common
  • tensed up and not cool and calm, perhaps prone to workplace stress
  • exaggeration about his past? common
  • not articulate enough? common
  • poor eye contact? common

c++debug^release build can modify app behavior #IV

This was actually asked in an interview, but it’s also good GTD knowledge.

https://stackoverflow.com/questions/4012498/what-to-do-if-debug-runs-fine-but-release-crashes points out —

  • fewer uninitialized variables — Debug mode is more forgiving because it is often configured to initialize variables that have not been explicitly initialized.
    • For example, perhaps you’re deleting an uninitialized pointer. In debug mode it works because the pointer was nulled, and delete on a NULL pointer is ok. In release it holds some rubbish, and the delete actually causes a problem.

https://stackoverflow.com/questions/186237/program-only-crashes-as-release-build-how-to-debug points out —

  • guard bytes on the stack frame– The debugger puts more on the stack, so you’re less likely to overwrite something important.

I had frequent experience reading/writing beyond an array limit.

https://stackoverflow.com/questions/312312/what-are-some-reasons-a-release-build-would-run-differently-than-a-debug-build?rq=1 points out —

  • relative timing between operations is changed by debug build, leading to race conditions

Echoed on P260 of [[art of concurrency]], which says it is (in theory) possible to hit a threading error with optimization and no such error without optimization — which would represent a bug in the compiler.

P75 [[moving from c to c++]] hints that compiler optimization may lead to “critical bugs” but I don’t think so.

  • poor use of assert can have side effects in the debug build. A release build typically turns off all assertions (assertion failure messages are unwelcome in production), so those side effects silently disappear.

collection-of-abstract-shape: j^c++

In java, this usage pattern is simple and extremely common — Shape interface.

In c++, we need a container of shared_ptr to pure abstract base class 😦

  • pure abstract interface can support MI
  • shared_ptr supports reference counting, in place of garbage collection
  • pointer instead of nonref payload type, to avoid slicing.

This little “case study” illustrates some fundamental differences between java and c++, and showcases some of the key advantages of java.

SFINAE #sizeof#ptr2member as template-param

https://jguegant.github.io/blogs/tech/sfinae-introduction.html is relatively simple, concise. Shows how to test T has method1()

https://en.wikibooks.org/wiki/More_C%2B%2B_Idioms/SFINAE is shorter and uses the same sizeof trick.

https://en.wikibooks.org/wiki/More_C%2B%2B_Idioms/Member_Detector is another illustration

–all 3 resource above use sizeof and function template (not class template) —

https://github.com/tiger40490/repo1/blob/cpp1/cpp/template/ includes a useful demo of my own code in production, powering the nyse+nyseAmerican real-time market data parsers behind most of the biggest financial data sites.

When the compiler evaluates sizeof — a compile-time task — it tries the 3 func() overloads and checks the winner’s return type[1]. The catch-all default overload always compiles; overload resolution picks the most specific overload that survives substitution.

When T is AddRefreshOrderStruct, the compiler tries 1st overload, where AddRefreshOrderStruct needs a long field, and AddRefreshOrderStruct needs a sendTimeNS field. Pass! So the return type is int.

When T is EmptyStruct, the compiler tries the 1st overload, where EmptyStruct needs a long field … failed:( Only the last overload, the default overload, passes.

[1] the size of the return type is used to initialize the static const field!

The asterisk at the end of the func declarations is needed because the func() argument will be a NULL pointer, and a NULL pointer can match a pointer to any type.

##[17] 4 tech career growth directions #aggressive

In spite of the threats and pains, I remain hopeful about my technical growth. Ideally, I hope to

  1. maintain competitiveness in java. Consider body-building
  2. selectively build up my c++ IV competitiveness, in terms of QQ/BP/ECT, from 6 to 7, over 2 years
    1. sockets, concurrency, sharedMem, templates ..
  3. try to recover the /sunk cost/ in quant domain
  4. slowly branch out to data analytics or data science

[1 java] feels achievable, based on my recent interviews. [4 data] is secondary and nice to have, not a must at age 43, so [2 cpp] is my main drive.

The interviews are crucial. Without IV, I can improve my GTD or zbs, but my IV competitiveness would improve painfully slowly.

Q: Suppose we don’t know java. Based on our c++ IV performance, can we survive as a c++ developer?
A: I will survive. My family needs me.

std::vector-of-containers: initializer list

Typical example: If you heavily use a vector of map, it’s tempting to use a vector of pointers to maps. The java way.

If you drop the “pointers to”, then when you retrieve the map from the vector, you often get a copy, unless you save the return value in a reference variable.

By the way, here’s an initializer for std::map:

vec.push_back(map<int, int>{{32,1}} );

LD_LIBRARY_PATH ^ objdump RUNPATH

This is related to q[cannot open shared object file] abc.so

See https://amir.rachum.com/blog/2016/09/17/shared-libraries/#rpath-and-runpath for the RUNPATH

q(objdump) can inspect the binary file better than q(ldd) does.

q(ldd) shows the final, resolved path of each .so file, but (AFAIK) doesn’t show how it’s resolved. The full steps of resolution are described in http://man7.org/linux/man-pages/man8/ld.so.8.html

q(objdump) can shed some light … in terms of DT_RUNPATH section of the binary file.

[17] 5 unusual tips@initial GTD

See also https://bintanvictor.wordpress.com/wp-admin/edit.php?s&post_status=all&post_type=post&action=-1&m=0&cat=560907660&filter_action=Filter&paged=1&action2=-1

* build up instrumentation toolset
* Burn weekends, but first … build momentum and foundation including the “instrumentation” detailed earlier
* control distractions — parenting, housing, personal investment, … I didn’t have these in my younger years. I feel they take up O2 and also sap the momentum.
* Focus on output that’s visible to the boss — output your colleagues could also deliver, so you have nowhere to hide. Clone if you need to. CSDoctor told me to buy time, so that later you can rework things “under the hood”, like quality or design.

–secondary suggestions:
* Limit the amount of “irrelevant” questions/research, when you notice they are taking up your O2 or dispersing the laser. Perhaps delay them.

Inevitably, this analysis relies on past work experiences. Productivity (aka GTD) is a subjective, elastic yardstick. #1 Most important is the GTD rating by the boss. It sinks deep… #2 is self-rating https://bintanvictor.wordpress.com/2016/08/09/productivity-track-record/

MOM+threading Unwelcome ] low latency@@ #FIX/socket

Piroz told me that trading IT job interviews tend to emphasize multi-threading and MOM. Some use SQL too. I now feel all of these are unwelcome in low latency trading.

A) MOM — see also the post on HFT mktData redistribution via MOM. For order processing, FIX is the standard. FIX can use MOM as a transport, but that setup is not popular and is unfamiliar to me.

FIX does use buffers to hold a burst of incoming or outgoing messages. The buffers resemble message queues.

B) threading — Single-Threaded-Mode is generally the fastest in theory and in practice. (I only have a small observed sample size.) I feel the fastest trading engines are STM. No shared mutable state. Nsdq’s new platform (in java) is STM.

Multithreading is OK if the threads don’t compete for resources like CPU, I/O or locks. Compared to STM, most lockfree systems introduce latency, such as retries and additional memory barriers. A single-threaded build needs no such memory barriers, leaving the compiler free to optimize.

C) SQL – as stated elsewhere, flat files are much faster than relational DB. How about in-memory relational DB?

Rebus, the order book engine, is in-memory.

blockchain #phrasebook

A blockchain is a peer-to-peer network that timestamps records by hashing them into an ongoing chain of hash-based proof-of-work, forming a record that cannot be changed without redoing the proof-of-work.

In contrast, a distributed ledger is a peer-to-peer network that uses a defined consensus mechanism to prevent modification of an ordered series of time-stamped records. All blockchains are distributed ledgers, but not all distributed ledgers are blockchains.

Keywords:

  • Peer-to-peer — no central single-point-of-failure
  • Immutable — records of past transactions
  • Ever-growing — the chain keeps growing and never shrinks. Is there some capacity issue in terms of storage, backup, search performance?
  • Double-spend — a common error that blockchain is designed to prevent

[[Focus]] for technical xx

See also my email to Raja, XR and ZengSheng on “the zone”, the “creative engine”…

Book author Dan’s concept of focus is narrower — it’s about a student’s kind of focus. I told my son he needs better concentration. As a student I had very good concentration and engagement.

Compared to my engaged “traction” days, I now spend too much time emailing, blogging on non-tech, calling home, research about housing, shopping, food preparation … But that’s not the key. During the work or study hours, I am not focusing as I did in primary school.

Traction/engaged days are with growth, concentration, positive feedback loop, tech blogging….

I think Yang was the first manager who pointed out I need better focus. Perhaps he meant I tried to study all the systems in PWM Comm all at once. I made real progress (with ErrorMemos, SQL, Java …) after I sent my wife/son back to Singapore.

I made real progress with quant stuff during the Barcap days. However, I also had to keep job hunting remotely.

Avichal, who sat beside me, said I had too many distractions… Disengaged.

Similar to dabao’s case, I guess the initial code complexity requires focus… Without sufficient focus, the complexity is just too much for my capacity. Better shut out everything else and focus on delivering something non-trivial. Remember PWM, Quartz, …

Grandpa had a lot to say about focus…

TCP listening socket shared by2processes #mcast

Common IV question: In what scenarios can a listening socket (in memory) be shared between 2 listening processes?

Background — a socket is a special type of file descriptor (at least in unix). Consider an output file handle. By default, this “channel” isn’t shared between 2 processes. Similarly, when a packet (say a price) is delivered to a given network endpoint, the kernel must decide which process receives the data — usually not both.

To have two processes both listening on the same listening-socket, one of them is usually a child of the other. The webpage in [1] and my code in https://github.com/tiger40490/repo1/blob/py1/py/sock/1sock2server.py show a short python script illustrating this scenario. I tested it. The q(lsof) and q(ss) commands (but not netstat) both show the 2 processes listening on the same endpoint. The OS delivers the data to A B A B…

https://bintanvictor.wordpress.com/2017/04/29/so_reuseport-socket-option/ shows an advanced kernel feature to let multiple processes bind() to the same endpoint.

For multicast (UDP only) two processes can listen to the same UDP endpoint. See [3] and [2]

A Unix domain socket can be shared between two unrelated processes.

See

[1] http://stackoverflow.com/questions/670891/is-there-a-way-for-multiple-processes-to-share-a-listening-socket

[2] http://stackoverflow.com/questions/1694144/can-two-applications-listen-to-the-same-port

[3] http://www.tldp.org/HOWTO/Multicast-HOWTO-2.html

##which common UDP/TCP functions are blocking

Q: which blocking functions support a timeout?
A: All

A non-blocking send fails when it can’t send a single byte, usually because the local send buffer (for TCP) is full. For UDP, see [4].

Note read() and write() have similar behavior. See send()recv() ^ write()read() @sockets and http://www.mathcs.emory.edu/~cheung/Courses/455/Syllabus/7-transport/flow-control.html

[1] meaning of non-blocking connect() on TCP is special. See P51 [[tcp/ip sockets in C]] and https://www.scottklement.com/rpg/socktut/nonblocking.html
[2b] accept() returning with timeout is obscure knowledge — see accept(2) manpage
[2c] accept() is designed to return immediately on a nonblocking socket — https://stackoverflow.com/questions/12861956/is-it-possible-and-safe-to-make-an-accepting-socket-non-blocking and https://www.scottklement.com/rpg/socktut/nonblocking.html
[3] select() on a non-blocking socket is obscure knowledge —  See https://www.scottklement.com/rpg/socktut/nonblocking.html
[4] UDP has no send buffer but some factors can still cause blocking … obscure knowledge
[5] close() depends on SO_LINGER and non-blocking by default … https://stackoverflow.com/questions/6418277/how-to-close-a-non-blocking-socket
[6] MSG_DONTWAIT flag

| t/out | function | default | flags arg to func | fcntl on entire socket | touching TCP buffers? |
| y | select | blocking | not supported? | still blocking! [3] | no |
|  | epoll | blocking | not supported? |  | no |
| y | recv | blocking | can change to NB [6] | can change to NB | yes |
| y | send/write TCP | blocking | can change to NB [6] | can change to NB | yes |
| y | recvfrom | blocking | can change to NB [6] | can change to NB | yes |
| y | sendto UDP | can block [4] | can change to NB [6] | can change to NB | yes |
| y | close() with SO_LINGER | not the default | not supported | yes | yes |
|  | close() | by default NB [5] | not supported | can change to blocking | yes |
| y [2b] | accept | by default blocking | not supported | yes | yes |
|  | accept on NB socket [2c] |  | not supported | yes | no |
| LG | connect() TCP | blocking | not supported? | can change to NB [1] | no |
| LG | connect() UDP | NB | NotApplicable |  |  |

support MyType a=7 despite q[ explicit ] conversion constructor@@

Background:

  explicit MyType(int); // “explicit” would disallow the next line:
  MyType a = 77; // error: copy-initialization requires an implicit conversion

http://en.cppreference.com/w/cpp/language/converting_constructor has a solution:

  MyType a = (MyType) 77; // Static cast would invoke the explicit conversion constructor!

In general, most custom types should make conversion constructors explicit to prevent hidden bugs, but smart pointers need an implicit conversion constructor, to support

  SmartPtr myPtr = new int(77);

–A real example from CFM quant code

 FwdCurve::const_iterator iter = find( key ); //non-const itr

 QL_ASSERT( iter != ((FwdCurve * const) this)->end(), "not found" ); // original code probably before "explicit". 

// I had to change it to
 FwdCurve_iterator endItr = ((FwdCurve * const) this)->end();
 QL_ASSERT( iter != FwdCurve_const_iterator(endItr), "not found" ); //call the conversion ctor explicitly

## insertion sort: phrasebook

  • nearly sorted — for a nearly-sorted sample, beats all other sorts, including counting sort
  • small — for a small sample, beats MSD radix sort and all comparison sorts
    • switch — many comparison sort implementations switch to insertion-sort when the sub-array becomes small
  • poker — Intuitively, this is how players sort poker cards
    • shift — requires shifting part of the sorted portion.

python has limited treeMap, array

Relevant to some coding questions (like Facebook) that emphasize big-O.

— vector — The default list is a vector, with constant time random access.

Just like std::vector, deleting from beginning is slow, according to https://stackoverflow.com/questions/19441488/efficiency-of-len-and-pop-in-python

— fixed array — https://stackoverflow.com/questions/176011/python-list-vs-array-when-to-use shows array type is rarely needed.

— tree map — not popular

https://github.com/tiger40490/repo1/blob/py1/py/tree/simpleBST.py is my own BST without self-balancing.

##c++good habits like java q[final]

  • Good habit — internal linkage
    • ‘static’ keyword on file-level static variables
    • anonymous namespace
  • if (container.end() ==itrFromLower_bound) {…}
  • use typedef to improve readability of nested containers
  • typedef — “typedef int FibNum” as source documentation
  • initialize local variables upon declaration.
  • use ‘const/constexpr’ whenever it makes sense. The c++11 constexpr is even better.
  • [c++11] – use ‘override’ or ‘final’ keywords when overriding inherited virtual methods. Compilation will fail if there’s nothing to override.

##string utilities in c++std lib

Needed for coding IV and GTD

  1. stringstream
  2. cin
  3. vector<char> for sorting and lower_bound
  4. list<char> for splice? could be slow but who cares in a quick coding test? Simpler than swap
  5. iomanip for output to stringstream
  6. vector<std::string>, set<std::string>
  7. getline on stringstream or cin
  8. std::string methods. Some are powerful but unfamiliar
    1. find_last_not_of()
  9. c_str and char array? seldom needed nowadays

establish tech stronghold(GTD+nlg)]each team

Note – focus is job performance, not IV.

To survive in a team, you need a unique technical specialty in a key area.

With the possible exception of the Citi Muni team, I have not seen a banking IT team that tolerates a “dead weight” guy. To do reasonably well in any financial IT team, I often need some “hard” strength (like a subject-matter expertise), not just soft skill. If within your team you have a unique technical strength in a complex and important sub-system, then you can “name your price”, and you are kind of irreplaceable. Perhaps a go-to person. Some team members don’t have any expertise but survive on strong relationships. I’m unable to do that. In past teams, I think many team members had such a strength. If I don’t have it, my position is less established, less solid.

  • 😦 😦 stirt? Worst experience. Was good at the insignificant preferences framework; canceled trades
  • 😦 GS? Uncomfortable position. My Perl was not top notch. My java was below Yang. I became good at Error Memos (EOS) and trailers/upfronts
  • 🙂 OC? was good at Bloomberg adapter and Guardian builtin web server to show the log files
  • 🙂 🙂 95 Green? My DB design was convincing to Ravi. My wait-notify based design was pretty hard to match.
  • 🙂 Barc? FMD, fmath
  • 😦 Macquarie? In the quant team, whatever I accomplish seems trivial to manager, since he’s too strong. I’m seen as very slow on small tasks.

factory returning smart ptr^pbclone #Sutter2013

Background — This is one example of “too many variations”. A simple idea becomes popular… People start to mix-n-match it with other ideas and create many, many interesting questions. In practice we should stick to one or two best practices and avoid the other 50 variations.

Update: https://herbsutter.com/2013/05/30/gotw-90-solution-factories/ is a little too dogmatic as an article. A few take-aways:

  • returning raw ptr can indeed leak in real life. I would still do this in simple projects.
  • For a polymorphic type, return a unique_ptr by default. See also [[effModernC++]]. (Return shared_ptr only if the factory saves it in some internal state, but who does?)
  • For a non-polymorphic type, return a clone. Safely assume it’s copyable or movable in c++11.

I think it’s quite common (see [[safe c++]] P51) to have a factory function creating a heap “resource”. Given it’s on the heap, the factory must return the object by pointer — easy to hit a memory leak. The standard solution is to return a ref-counted smart pointer.

Q: what if I forget to destruct the obj returned?
A: I feel it’s uncommon and unnatural. (More common mistake is to forget deletion.) Often the object returned (i.e. the smart pointer object) is on the stack – no way to “forget”.

Note if a (64-bit) raw pointer variable is on the stack, then stack unwinding will deallocate/reclaim those 64 bits, but won’t call delete on the pointee!

Q: what if the object returned is saved as a field like myUmbrella.resource1
A: I feel this is fine because the myUmbrella object will be destructed by someone’s responsibility. That would automatically destruct the smart pointer field this->resource1.

Alternative to the ref count smart ptr, we can also return a (boost) scoped pointer. See [[safe c++]].

Q: what if the factory needs to return a pointer to a stack object?
A: I don’t think so. The pointee goes out of scope….
A: I feel most “resources” are represented by a heapy thingy. Alternatively, it’s also conceivable to hold a “permanent” resource globally. Memory management is then a non-issue.


By the way, [[safe c++]] also mentions the __common__ pattern of a (non-factory) function returning a pointer to an existing object, rather than a “new SomeClass”. The challenges and solutions are different.

struct packing^memory alignment %%experiment

Sound bite — packing is for structs, but alignment and padding are Not only for structs

Longer version — alignment and padding apply to Any object placement, whether that object is a field of a struct or an independent object on stack. In contrast, the packing technique is specific to a struct having multiple fields.

https://github.com/tiger40490/repo1/blob/cpp1/cpp1/memAlign.cpp has my test code to reveal the rules:

  1. For a (8-byte) long long object on stack (or in a struct), the address is 8-byte aligned. So padding is added by compiler, unless you say “packed” on a struct.
  2. For a (4-byte) int object on stack (or in a struct), the address is 4-byte aligned.
  3. For a (2-byte) short object on stack (or in a struct), the address is 2-byte aligned.
  4. For a char object on stack (or in a struct), the address is 1-byte aligned. All memory address are 1-byte aligned, so compiler adds no padding.

http://stackoverflow.com/questions/11108328/double-alignment Wug’s answer echoes Ashish’s comment that tiny fields like char should be grouped together, due to Rule #4. This applies to stack layout as well [1]. However, compiler optimization can be subversive:

Not only can the compiler reorder the stack layout of the local variables, it can assign them to registers, assign them to live sometimes in registers and sometimes on the stack, it can assign two locals to the same slot in memory (if their live ranges do not overlap) and it can even completely eliminate variables.

[1] See https://stackoverflow.com/questions/1061818/stack-allocation-padding-and-alignment

how many degrees(upTo180)btw hr^min hands #Dilip

Dilip had an elegant solution by hand.

3:15 — MH is at 90 degrees; HH is slightly over 90. It’s 1/4 way towards 120 degrees i.e. 90+7.5 degrees. Therefore, the answer is 7.5 degrees

3:10 — MH is at 60 degrees; HH is slightly over 90. It’s 1/6 of the way towards 120 degrees, i.e. 95 degrees. Answer is 95 – 60 = 35 degrees.

Note the MH is always at an integer degree but the HH can be at xxx.5 degrees.

app arch – civil engineer or a salesman, sg^U.S.

I feel in the Singapore context, “architect” is more likely to refer to a hands-off role. In the US, I never met an architect who doesn’t write code hands-on at least half his time.

A real app architect in finance (different from professional software product vendors) really needs hands-on capabilities. Her output is not just on paper, but in code. If her design (not the document but the “key ideas” implemented) doesn’t work, she must roll up her sleeves and make it work. Most of those ideas are low level, like some serialization or some threading construct, and require just one good developer. In that sense it’s not unlike a library module.

In this context, architect is the name of the technical lead of ANY dev team. Often known as lead developer, in a smaller team.

Any dev team needs a technical leader. Half the time, the team manager or the project manager is technical enough to lead a support team but not a dev team, so an architect is needed. Often, the architect is the only leader.

The pre-sales architect is very different. Castle in the sand. Imaginary buildings.

Update: I feel in the US I could become a lead developer, once I become familiar with a codebase, but I’m not so sure about any role that requires a lot of persuasion. I feel it’s possible if my technical grasp of everything is higher than the rest. It’s all relative.

WallSt friends’ comment@slow coder,deadlines

This is life-n-death: if you are not adding enough value you are out…

With important exceptions (Stirt, Lab49..) Wall Street systems are stringent about timelines, less about system failures, even less about maintainability, total cost of ownership, or testing. I feel very few (like 5%) Wall St systems are high precision, and I include the pricing, risk, and trade execution systems. Numerical accuracy is important to the users though, because those numbers are about the only thing they can understand. Their job is about control of those numbers.

In Citi muni, Boris’s code was thrown out because it didn’t make it to production. Any production code is controlled not by the dev team but by many, many layers of control measures. So my production code in Citi will live.

If you are slow, Anthony Lin feels they may remove you and get a replacement to hit the deadline. If they feel it’s hard to find a replacement and train him up, then they keep you — it’s all about timelines.

Hou Li felt your effort does protect you – 8.30 – 6pm everyday. If still slow, then manager may agree estimate is wrong. She felt deadline and effort estimate are arbitrary. However, if you are obviously slower than peers, then boss knows it.

non-virtual dtor: some random scenarios tested #IV

Background — How important are these scenarios? First off, tech quizzes are extremely important since you are judged over just a few questions. Second, these scenarios pop up by accident, rather than by design, all the time in real projects. You’d better learn to deal with a drunken driver while on the road.

Q1: what if Base and Derived dtor both non-virtual and an autoVar is destroyed?
AA: Tested — ~Derived() and then ~Base().  See post on DCBC.

Q2: What if Base dtor is non-virtual but Derived is virtual, and a Derived auto variable is destroyed on the stack?
AA: Same as Q1. For an autoVariable that’s not deleted via a ptr, the Derived dtor (virtual or not) runs, followed by the Base dtor. Same DCBC.

Q3: Base dtor is non-virtual but Derived is virtual. Derived heap pointer is assigned to a B pointer and deleted?
AA: only ~Base runs , in my g++ test, though officially UB.

Q4: Base and Derived are virtual. Derived heap pointer is assigned to a B pointer and deleted?
AA: ~Derived() then ~Base(). DCBC + dynamic dispatch

Note the well-known __undefinedBehavior__ affects delete only, not stack variables or static variables. Note the virtual keyword matters only when destructing via a pointer; non-pointer variables aren’t affected.

The object to destruct is a Derived
| # | ~B | ~D | stack, or delete heap D via B*/D* | test result | notes |
| 1 | nv | nv | stack | ~D ~B | DCBC + static dispatch |
| 2 | nv | virtual | stack | ~D ~B | DCBC + static dispatch |
| 3 | nv | virtual | B* | ~B only | static dispatch; “virtual” ignored |
| 4 | v | virtual | D* | ~D ~B | DCBC + dynamic dispatch |
| 5 | v | implicitly v | D* | ditto | ditto |
#include <iostream>
using namespace std;

struct B{
  ~B(){ cout<<"~B\n";} // non-virtual
};
struct D: public B{
  virtual ~D(){ cout<<"~D\n";}
};
int main(){
  B* myD=new D();
  delete myD; // scenario 3: officially UB; my g++ test prints ~B only
}

PURE-interface types ] c++^java

http://stackoverflow.com/questions/318064/how-do-you-declare-an-interface-in-c
Someone pointed out

“The whole reason you have a special Interface type-category in addition to abstract base classes in C#/Java is because C#/Java do not support multiple inheritance. C++ supports multiple inheritance, and so a special type isn’t needed. An abstract base class with only pure virtual methods is functionally equivalent to a C#/Java interface.”…. actually with minor differences.

The sample code shows no special ctor, though the dtor is public virtual but without “=0” (not pure virtual), so I assume an “interface” type in c++ should have a virtual dtor, since it is designed to be subclassed.

Google style guide is more strict — A class is a pure interface if it meets the following requirements:

  • It has only public pure virtual methods, non-pure virtual destructor, and static methods
    • I feel the dtor should be empty since there’s no instance field.
  • It may not have non-static data members — same as java8 interfaces
  • It need not have any constructors defined. If a constructor is provided, it must take no arguments and it must be protected.

calling unknown code while hold`(or Not hold`) lock #XR

In c++, if the unknown code does a fork(), then all locks are replicated into the newborn process. See after fork(): threads

[[eff c#]] warns against calling “unknown” code while holding a lock. This advice applies to java and c++ too.

Now, what defines “unknown”? Any delegate can have an invocation list containing unknown callbacks. Raising an event involves delegate invocation. If you receive an Action instance, realize it’s a delegate instance too.
Another category of “unknown code” might be implementations of an interface. You get an IComparable instance, and you just call Compare() without knowing the implementation.
Suppose within obj1.method1() we grab a lock and then call the unknown code. Such unknown code could circle back to call obj1.method2(). If method2() also grabs the same lock, then no issue, thanks to c# reentrancy. Note c# reentrancy is implemented by counting — http://stackoverflow.com/questions/3687505/recursive-nested-locking-in-c-sharp-with-the-lock-statement
If method2 is on another thread, then we get a classic single-lock deadlock — method2 would be blocked by the lock held by method1(), hopefully for a short time only. If method1() is unfortunately waiting for method2 to finish its work on the other thread, then the deadlock would persist.
Even if there’s no locking involved, method2 may circle back to invoke method1 (on the same or a separate thread, on the same obj1 or another object), recursively. If this recursion is unplanned (like unplanned pregnancy:), then stack overflow may ensue.
Dumb luck if unintentional recursion doesn’t hit stack overflow.

c++advanced IV questions – contributed by real interviewers

http://www.quora.com/Anyone-can-list-good-common-and-advanced-questions-for-testing-C++-STL-skill-during-interview

These authors are real interviewers. They really do care about the topics I mentioned to you Monday night. They are seeking c++ specialists who understand the low level details. They don't care about the high level, architecture, design, picking the right framework or library etc.
Victor

maturity bucketing for VaR

[[complete guide]] P 457 pointed out VaR systems often need to aggregate cashflow amounts across different deals/positions, based on the “due date” or “maturity date”.

Example — On 12/31 if there are 33 payable amounts and 88 receivable amounts, then they get aggregated into the same bucket.

I think bucketing is more important in these cases:

  • a bond has maturity date and coupon dates
  • a swap has multiple reset dates
  • most fixed income products
  • derivative products — always have expiry dates

In StirtRisk, I think we also break down that 12/31 one day bucket by currency — 12/31 USD bucket, 12/31 JPY bucket, 12/31 AUD bucket etc.

Q: why is this so important to VaR and other market risk systems? (I do understand it hits “credit risk”.)
%%A: For floating rate products, the cashflow amount on a future date is subject to market movements
%%A: FX rate on a future date 12/31 is subject to market movements
%%A: contingent claim cashflow depends heavily on market prices.
%%A: if 12/31 falls within 10D, then 10D VaR would be impacted by the 12/31 market factors

subverting kernel’s resource-allocation – a few eg

[[Linux sys programming]] explains several techniques to subvert OS resource-allocation decisions. Relevant to performance-critical apps.

P275 mlockall() — mlock() : prevent paging ] real time apps

P173 sched_setaffinity(). A strongly cache-sensitive app could benefit from hard affinity (stronger than the default soft affinity), which prevents the kernel scheduler migrating the process to another processor.

[[The Linux Programmer’s Toolbox]]. A write-only variable will be removed by the compiler’s optimizer, but such a variable could be useful to a debugger. I read somewhere that you can mark it volatile — subversive.

Any way to prevent “my” data or instruction leaving the L1/L2 cache?

Any way to stay “permanently” in the driver’s seat, subverting the thread scheduler’s time-slicing?

cache-miss in(CPU hogging)hot-function

P245 [[Optimizing Linux Perf]] (2005) points out this “intriguing” scenario. Here are the investigation steps to identify it —

First, we use _oprofile_ to identify a function(s) taking up the most application time. In other words, the process is spending a large portion (I would imagine 5%+) of cpu cycles in this function. However, the cycles could be spent in one lengthy entry or a million quick re-entries. Either way, this would be a hotspot. Then we use oprofile/valgrind(cachegrind)/kcache on the same process, and check if the hot function generates high cache misses.

The punch line – the high cache misses could be the cause of the observed process hogging. I assume the author has experienced the same but I’m not sure how rare or common this scenario is.

Some optimization guy in a HFT shop told me main memory is now treated as IO, so cache miss is treated seriously. http://programmers.stackexchange.com/questions/142864/writing-low-latency-java mentions that “Cache misses are your biggest cost to performance. Use algorithms that are cache friendly.”

By the way, instruction cache miss is worse than data cache miss. My friend Shanyou also said the same.

empty while(true) loop hogging CPU

Someone (barclays?) gave me a basic tuning quiz —

Q: what cpu utilization will you see if a program executes an empty while(1==1){} ?

A: On a 16-core machine, 3 instances of this program each take up 4% of the aggregate CPU according to windows taskmgr. I think 4% means hogging one entire core.

A: on my RedHat linux, the same program has 99.8% CPU usage meaning one entire core.

A: On a dual-core, if I start 2 instances, each takes up about 50% i.e. one entire core, keeping all other processes off both cores. With 3 instances, system becomes visibly slow. Taskmgr itself becomes a bit hard to use and reports about 30%-40% by each instance.

I think this would count as a cpu-intensive job.

python pseudo constructors4everyday tasks #[]^()

These Python builtin functions have something in common:

* pseudo [1] constructors — manufacture an object of the specified type
* conversion constructors — converting some input value into an object of the specified type

[1] Actually builtin functions rather than ctor. I guess the fine differences between builtin functions, keywords and operators are not that important at this stage.

Because these are functions, they use “()”. The real list/dict ctors use other brackets.

P64 [[essential ref]] lists these and more, as “type conversion” functions.

– str()
– dict()
– list() — see py list initialize []^list()
– tuple()
– set()
– int()

– file() — very similar to open()

IPC mutex #boost

favorite IV topic

Update — both pthreads mutex and pthreads condVar can be configured as cross-process. Note — both are used for inter-thread coordination. For inter-process, better use semaphore.

https://stackoverflow.com/questions/6477525/interprocess-mutex-with-pthreads favors IPC mutex over IPC semaphore.

IPC mutex is supported in many operating systems such as ….?

A mutex in boost::thread is effective only within a single process [1]. To share a mutex between 2 processes, you can use either

A) an anonymous mutex in shared mem managed by your processes

B) a named mutex managed by the kernel (“hotel service desk”). They don’t live in shm. In this case, I guess the mutex is treated like a modem or any other hardware device, whereby any access is regulated by the kernel.

  • In B, the mutex is a boost::interprocess::named_mutex
  • In A, the mutex is a boost::interprocess::interprocess_mutex

If you want recursive,

  • In B, you can use a boost::interprocess::named_recursive_mutex
  • In A, you can use a boost::interprocess::interprocess_recursive_mutex

But why bother with a shared mutex in the first place? Usually to guard a shared resource such as an object (say object K) in shared memory. By definition K is accessible from 2 (or more) processes P1 and P2. They can both ignore and bypass the mutex and access K directly, so the guardian mutex lock is only advisory.

[1] because the mutex object lives in the private memory of the process
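
For the pthreads route mentioned in the Update, here is a minimal sketch (assuming a POSIX platform such as Linux; in real use the pthread_mutex_t would be placed in a shm_open/mmap segment shared by both processes, while ordinary memory stands in for shm here):

```cpp
#include <pthread.h>
#include <cassert>

// Sketch of option A with raw pthreads instead of boost. The mutex struct
// is meant to live inside shared memory; the cross-process aspect is
// therefore illustrative only in this single-process snippet.
int initProcessSharedMutex(pthread_mutex_t* m) {
    pthread_mutexattr_t attr;
    if (pthread_mutexattr_init(&attr) != 0) return -1;
    // This one attribute is what turns an inter-thread mutex into an
    // inter-process mutex.
    if (pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED) != 0) {
        pthread_mutexattr_destroy(&attr);
        return -1;
    }
    int rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}
```

boost::interprocess::interprocess_mutex does essentially this under the covers on pthreads platforms.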

4 Scopes for operator-overloads #new()

  1. non-static member operators are very common, such as operator++() and operator<() on iterators, or operator*() and operator->() on smart pointers
  2. static member operator new() is sometimes needed. ARM explains why static.
  3. friend operators are fairly common, such as operator<<()
  4. class-specific free-standing operators are recommended by Sutter/Andrei in [[C++ Coding Standards]], placed in the same namespace as the target class so that argument-dependent lookup finds them. Advanced technique; I need to understand it more.
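
A toy sketch of scopes 1, 3 and 4 (scope 2, static operator new, omitted for brevity; the toy namespace and Counter type are made up for illustration):

```cpp
#include <cassert>
#include <sstream>

// Toy Counter illustrating three of the four scopes.
namespace toy {
struct Counter {
    int n;
    Counter& operator++() { ++n; return *this; }          // 1. non-static member
    friend std::ostream& operator<<(std::ostream& os,     // 3. friend
                                    const Counter& c) { return os << c.n; }
};
// 4. class-specific free-standing operator in the same namespace as Counter,
// so argument-dependent lookup finds it:
inline bool operator<(const Counter& a, const Counter& b) { return a.n < b.n; }
}
```

Putting operator< in namespace toy rather than at global scope is exactly the Sutter/Andrei recommendation: callers never need to name the namespace, ADL does it.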

remain relevant(survival)to your team^global economy#localSys

When we feel job insecurity, we look around and see who’s more secure, more relevant, more current, and more in-demand. We may find some colleagues more relevant to our team or to the company. Maybe they are local system experts. Maybe they have deep expertise in the base vendor product (Tibco, Murex, LotusNotes, WebLogic…). We may want to _specialize_ like them.

Resist!

Job security derives from skill relevance, but relevant to whom? The industry, not the local team, which could disappear if the entire team becomes obsolete and irrelevant. I have seen people (including myself) specializing in DNS, VB.NET, Swing, Perl, EJB, let alone home-grown systems such as SecDB. These guys are (extremely) relevant, but only for a limited time.

Have a long-term perspective — If you want job security through skill relevance, then invest in more permanent skills such as Math, algorithms, data structures, threading, SQL,…

Have a global perspective — Job security is a serious, almost life-and-death issue. You probably should look beyond your local industry. What skills are truly relevant on the global arena? Some skill (say Murex) might be more relevant locally but less globally.

Avoid niche domains unless it helps boost your mainstream skills.

undefined behavior C++: #1 famous, unknown secrets

(See below for smart ptr, template, non-RTTI)

Deleting [1] a derived[3] object via a base[4] pointer is undefined behavior if base[6] class has non-virtual dtor, with or without vtable.

This is well-known but it applies to a very specific situation. Many similar situations aren’t described by this rule —
[1a] This rule requires pointer delete. In contrast, automatic destruction of a non-ref “auto” variable (on stack) is unrelated.

[1b] This rule requires a heap object. Deleting a pointee on stack is a bug but it’s outside this rule.

[1c] This rule is about delete-expression, not delete[]

[3] if the object’s run-time type is base, then this rule is Inapplicable

[4] if the pointer is declared as pointer-to-derived, then Inapplicable, as there is no ambiguity which dtor to run

[3,4] if the object run time type is base, AND pointer is declared pointer-to-derived? Inapplicable — compiler or runtime would have failed much earlier before reaching this point.

[6] what if the derived class has a non-virtual dtor? A derived dtor is implicitly virtual whenever the base dtor is virtual, so a genuinely non-virtual derived dtor implies a non-virtual base dtor. Yes, the rule applies.

*) P62 [[effC++]] points out that even in the absence of virtual functions (i.e. in a world of non-RTTI objects), you can still hit this UB by deleting a subclass instance via a base pointer.

**) The same example also shows a derived class-template is considered just like a derived class. Let me spell out the entire rule — deleting an instance of a derived-class-template via a pointer to base-class-template is UB if the base class-template has a non-virtual dtor.

What if the pointee is deleted by a smart pointer’s destructor? With unique_ptr&lt;Base&gt; you can still hit this UB. shared_ptr&lt;Base&gt; typically avoids it, because it captures a deleter based on the static type of the constructor argument.
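
The well-defined side of the rule can be shown in a small sketch; the UB variant (non-virtual base dtor) is deliberately left out, since by definition its behavior can’t be demonstrated reliably:

```cpp
#include <cassert>

// With a virtual base dtor, deleting via a base pointer runs the derived
// dtor first. Making Base::~Base non-virtual would turn the same delete
// into the UB described above.
struct Base { virtual ~Base() {} };
struct Derived : Base {
    bool* flag;
    explicit Derived(bool* f) : flag(f) {}
    ~Derived() { *flag = true; }   // must run, or derived resources leak
};

bool derivedDtorRuns() {
    bool ran = false;
    Base* p = new Derived(&ran);
    delete p;                      // well-defined because ~Base is virtual
    return ran;
}
```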

eg – c++ exception-passing ^ param-passing

Update — [[c++primer]] points out the key concept of ExceptionObject (exo for short) in a try/catch. This is the object thrown, assuming we don’t throw a pointer.

exo isn’t the object created (“object1”) and passed to the “throw” statement. That object1 can have a short lifespan and be destroyed much sooner than the exo, which needs a longer lifespan, as described on P1039. Note the exo is copy-constructed from object1. See R1 below.

exo isn’t the object caught either, unless you catch by reference. P1037.

exo IS the object re-thrown if you use the empty throw statement.
—–
Based on [[More effC++]] item on the difference between exception-passing and function-parameter-passing. Illustrates a few rules — R1 R2 R3

class B: public std::exception{};
class C: public B{}; //and so on
class D: public C{}; //and so on
class K: public D{}; //deepest subclass

void f(){
  D* ex = new K();
  throw *ex; // Copy#1 created [R1] and thrown, sliced [1] to D
}
int main(){
  try{ f(); }
  catch (std::exception * ex){} // silently skipped because no “address” is thrown — type mismatch [R2]

  //catch (B ex){} //sliced Copy#2a created, just like function parameter pbclone
  // we will leave the above catch clause commented without further discussion.

  catch(B & exCaught){ //Copy#1 caught without slicing
      //throw; //re-throws Copy#1 [R3], type D
      throw exCaught; //Copy#2b created [R1] and thrown, sliced[1] to B
  }
}
[1] slicing by copy-ctor.
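
R1 and R3 can be verified with a stripped-down sketch (B and D here are bare classes rather than subclasses of std::exception):

```cpp
#include <cassert>
#include <string>
#include <typeinfo>

struct B { virtual ~B() {} };
struct D : B {};

// "throw;" re-throws the ExceptionObject itself, so the dynamic type D
// survives [R3]. "throw exCaught;" copy-constructs a NEW exception using the
// static type of exCaught (B), i.e. slicing [R1].
std::string caughtTypeName(bool rethrowByName) {
    try {
        try {
            throw D();                         // ExceptionObject has dynamic type D
        } catch (B& exCaught) {
            if (rethrowByName) throw exCaught; // Copy#2b, sliced to B
            else throw;                        // same object, still a D
        }
    } catch (B& outer) {
        return typeid(outer).name();           // mangled, but B differs from D
    }
    return ""; // unreachable
}
```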

java/c++overriding: 8 requirements #CRT

Here’s another angle on runtime binding, i.e. dynamic dispatch, i.e. virtual functions. Note [[effModernC++]] P80 has a list of requirements, but I wrote mine before reading it.

For runtime binding to work its magic (via vptr/vtbl), you must, must, must meet all of these conditions.

  • method must be –> non-static.
  • member must be –> non-field. vtbl offers no runtime resolution of FIELD access. See [[java precisely]]. A frequent IV topic.
  • host object must be accessed via a –> ref/ptr, not a non-ref variable. P163 [[essential c++]]. P209 [[ARM]] explains that with a nonref, the object’s runtime type is known in advance at compile time, so runtime dispatch is unnecessary and would be inefficient.
  • method’s parameter types must be —> an exact copy-paste from parent to subclass. No subsumption allowed in Java [2]. C++ ARM P210 briefly explains why.
  • method is invoked not during ctor/dtor (c++). In contrast, Java/c# ctor can safely call virtual methods, while the base object is under construction and incomplete, and subclass object is uninitialized!
  • method must be –> virtual, so as to hit the vtbl. In Java, all non-static non-final methods are virtual.
  • in c++ the call or the function must NOT be scope-qualified like ptr2->B::virtF() — subversion. See P210 ARM
  • the 2 methods (to choose from) must be defined INSIDE 2 classes in a hierarchy. In contrast, a call to 2 overload methods accepting a B param vs a D param respectively will never be resolved at runtime — no such thing as “argument-based runtime binding”. Even if the argument is a D instance, its declared type (B) is always used to statically resolve the method call. This is the **least-understood** restriction among the restrictions. See http://bigblog.tanbin.com/2010/08/superclass-param-subclass-argument.html

If you miss any one condition, then without any run/compile-time warning the compiler will __silently__ forgo runtime binding and assume you want compile-time binding. The c++11 “override” keyword and java @Override help break the silence by generating compiler errors.

However, return type of the 2 functions can be slightly different (see post on covariant return type). P208 ARM says as of 1990 it was an error for the two to differ in return type only, but [[c++common knowledge]] P100 gives a clear illustration of clone() method i.e. virtual ctor. See also [[more eff C++]] P126. CRT was added in 1998.

[2] equals(SubclassOfObject) is overloading, not overriding. @Override disallowed — remember Kevin Hein’s quiz.

Here’s a somewhat unrelated subtle point. Suppose you have a B extended by C, and a B pointer/ref variable “o” seated at a C object, you won’t get runtime binding in these cases:

– if you have a non-static field f defined in both B/C, then o.f is compile-time binding, based on declared type. P40 [[java precisely]]
– if you have a static method m() defined in both B/C, then o.m() is compile-time binding, based on declared type. [1]
– if you have a nonref B variable receiving a C object, then slicing — you can’t access any C part.

[1] That’s well-known in java. In C++, You can also “call a static member function using the this pointer of a non-static member function.”
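
Several of the conditions above can be checked in one sketch (hypothetical classes B and C, with a non-static field deliberately defined in both):

```cpp
#include <cassert>
#include <string>

struct B {
    std::string f = "B::f";                           // non-static field
    virtual std::string m() const { return "B::m"; }  // virtual instance method
    virtual ~B() {}
};
struct C : B {
    std::string f = "C::f";                           // hides, not overrides, B::f
    std::string m() const override { return "C::m"; }
};

std::string bindingDemo() {
    C c;
    B& o = c;                  // B ref seated at a C object
    std::string r = o.m();     // "C::m" : virtual + ref => runtime binding
    r += ',' + o.f;            // "B::f" : field access resolved at compile time
    r += ',' + o.B::m();       // "B::m" : scope-qualified call subverts the vtbl
    B sliced = c;              // nonref variable: sliced copy
    r += ',' + sliced.m();     // "B::m" : no runtime binding on a nonref
    return r;
}
```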

##[11]upstreams(sg gov)for next 10Y #wallSt tech

(See also blog post on 2010 GS annual report.) Singapore government is smart to always look out for upstream domains. Similarly, in financial IT, there are some upstream *domains* too.

Upstream: real time risk — upstream in derivative risk … but poor Market Depth
Upstream: grid — Memory virtualization, grid computing
Upstream: Eq(and FX) HFT and algo trading is upstream to other asset classes
Upstream: Option-related (including mtg) analytics tend to be more advanced, and probably upstream. Any derivative with optionality tends to be dominated by vol in terms of pricing complexity.
Upstream: connectivity — collocation, streaming,
Upstream: latency — sub-m, multi-core, non-blocking…
Upstream: voluminous market data processing — storage, distribution… FX and eq options have the highest volume, and often dominate the entire infrastructure. Many asset classes are far behind
Upstream: SecDB — firmwide singleton mkt risk engine. Upstream in m-risk technology. Most shops have data silos.

However, upstream technologies often become fads.
– C++? not really upstream. A strong c++ guy isn’t necessarily strong in java or C#.
– web service
– solace-like messaging hardware
– CEP? How much real value does it add?
– rule engines
– MortgageBackedSecurity pricing? Not really upstream, though most complex analytically.
– ETL?
– hibernate?
– python?

pipe, stream, pseudo file, tcp-socket#xLang

* Java has a family of IO classes;
* c++ has a family of stream classes derived from class ios;
* unix has various file-like things.

“pipe” and “stream” are two general terms for an input/output destination with sequential[1] access. A file is one subtype of stream, so java and c++ access files using their families of classes above.

[1] c++ gives each sequential stream a get-ptr and a put-ptr.
A named pipe is an entity in the file system….

TCP (not UDP) is a stream-oriented protocol. Sockets, either unix-domain or TCP (not UDP), are also like pipes/streams. In fact, Unix treats any socket as a file, so you can read from and write to it like a file. Perl liberally “confuses” file handles and sockets, just as C uses the same type of file-descriptor number for files and sockets.

TCP (not UDP) is ordered — if two messages are sent over a connection in sequence, the first message will reach the receiving application first. When data segments arrive in the wrong order, TCP buffers the out-of-order data until all data can be properly re-ordered and delivered to the application.

Python wraps the socket descriptor number in a socket object, but you can still get the underlying descriptor number via mySocket.fileno() —

import socket
mySocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
print(mySocket.fileno())

concurrent JGC does occasional stop-the-world #GS

Folks often mention concurrent GC as the opposite of stop-the-world (STW) GC, but in Java lingo a “concurrent” collector still involves short stop-the-world pauses. In the initial-mark phase, the GC root objects are marked as alive, and during this phase all application threads are suspended.

* STW means always-STW.
* Concurrent means occasional-STW.

STW is simpler and cleaner than concurrent GC because during concurrent marking you can miss some reachable objects. See https://java.sun.com/j2se/reference/whitepapers/memorymanagement_whitepaper.pdf for the exact details.

Even in single-threaded STW GC, you could design finalize() to perform resurrection, as in http://www.xyzws.com/Javafaq/what-is-resurrection-in-garbage-collection/47

sizeof : array on stack^heap

Here are 3 int pointers — almost indistinguishable at runtime, but at compile time…?

#include &lt;iostream&gt;
using namespace std;

int main(){
  cout&lt;&lt;sizeof(char)&lt;&lt;endl; // always 1 by definition
  cout&lt;&lt;sizeof(void*)&lt;&lt;endl; // shows 8 on my 64-bit machine, the size of any address
  cout&lt;&lt;sizeof(int)&lt;&lt;endl; // shows 4 on my 64-bit machine

  int * ipNew = new int;
  cout&lt;&lt;sizeof(ipNew)&lt;&lt;endl; // 8, size of an address

  int * iaNew = new int[5];
  cout&lt;&lt;sizeof(iaNew)&lt;&lt;endl; // 8 too, since array-new returns just an address

  int ia[] = {1,3,9};
  cout&lt;&lt;sizeof(ia)&lt;&lt;endl; // 12, size of the Entire array variable on stack
}

At run time, ia decays to a plain pointer in most expressions, and the array length isn’t stored anywhere. However, sizeof is evaluated at compile time, when the compiler still knows the array length!

const nonref variable: reasonably immutable

Reading the c++ FAQ 18.10, I got the impression that most const variables are ptr or ref.

const nonref primitive is like a java constant or java immutable.

Q: are there const nonref class object? Are they immutable?
A: Yes according to http://www.learncpp.com/cpp-tutorial/810-const-class-objects-and-member-functions/

Once a const class object has been initialized via constructor, any attempt to modify the member variables of the object is disallowed, as it would violate the constness of the object.

Just like the built-in data types (int, double, char, etc…), class objects can be made const by using the const keyword.

Q: can you cast away its constness?
A: very tricky. See effC++ and the IBM article. Short answer: no.
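
A minimal sketch with a hypothetical Point class; a const nonref object only permits const member functions, which is what makes it reasonably immutable:

```cpp
#include <cassert>

class Point {
    int x_;
public:
    explicit Point(int x) : x_(x) {}
    int get() const { return x_; }   // const member: callable on a const object
    void set(int x) { x_ = x; }      // non-const: rejected on a const object
};

int constDemo() {
    const Point p(42);   // const nonref class object
    // p.set(7);         // would not compile: violates the object's constness
    return p.get();
}
```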

up and down casting nonref/ref/ptr

Primary usage of dynamic_cast is down-cast
* base/derived class must have vptr, or you get compile-time error
* original and target types must be ptr/ref, or you get compile-time error [1]
* there’s just one object involved, not 2
** That one object must be a complete and full[2] Derived object, otherwise you get runtime (not compile time) failure, indicated by 1) null ptr or 2) exception (during reference down-cast)
*** boost polymorphic_cast adds uniformity by throwing exception for both

[1] static_cast can cast nonref.
[2] static_cast doesn’t incur the overhead to check that

Q: up-casting a nonref??
A: no casting operator required, but you get sliced. qq/ myB = myD; /
A: by the way, no such thing in java, since java doesn’t use “nonref”

Q: down-casting a nonref??
A: impossible in dynamic_cast. “dynamic_cast can be used only with pointers and references to objects”

Q: down-casting a ptr (to a polymorphic object only)?
A: dynamic_cast. May return NULL. java has a similar feature.
A: see also boost polymorphic_cast

Q: down-casting a ref (to a polymorphic object only)?
A: dynamic_cast. Never returns NULL; on failure it throws std::bad_cast instead. See post on down-casting a reference.

Q: down-cast a base ptr (or ref) to a derived object but no vtbl/vptr no virt func?
A: impossible. dynamic_cast won’t compile.

Q: up-cast a ptr?
A: common in c++ and java. everyday virtual function scenario. no operator required.

Q: up-cast a ref?
A: legit but less common than ptr. See post on virtual^redefining
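A sketch of the two failure modes, null pointer vs std::bad_cast:

```cpp
#include <cassert>
#include <typeinfo>

struct B { virtual ~B() {} }; // polymorphic: required, or dynamic_cast won't compile
struct D : B {};

// Pointer form: failure is reported as a null pointer.
bool isFullD(B* p) { return dynamic_cast<D*>(p) != 0; }

// Reference form: failure is reported by throwing std::bad_cast.
bool refDowncastThrows(B& r) {
    try {
        D& d = dynamic_cast<D&>(r);
        (void)d;
        return false;
    } catch (std::bad_cast&) {
        return true;
    }
}
```

boost polymorphic_cast unifies the two by throwing in both cases.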

java: best way2report bad arguments #exceptions

Say you are writing a utility Class1.getByIndex(int arrayIndex). Best way to enforce that arrayIndex is positive?

  • throw an unchecked exception and potentially crash the jvm? Recommended by many experts. I guess the downside is tolerable, and there is no better candidate for a standard, universal solution.
    • What if it fires too often? Then caller code needs a fix!
  • throw a checked exception and force other programmers to try-catch? I think the overhead grows when you set this as standard practice.
  • return a boolean result? Impossible for many methods and constructors.

Ideally, you want to remind other programmers to check their input first, but not force them to put up a useless try-catch.
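Since this is a Java question, here is only an analogous C++ sketch: std::invalid_argument plays the role of Java’s unchecked IllegalArgumentException (the getByIndex signature is hypothetical):

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical getByIndex: reject bad input with an exception rather than a
// boolean, the C++ counterpart of the unchecked-exception choice above.
int getByIndex(const std::vector<int>& v, int arrayIndex) {
    if (arrayIndex < 0 || arrayIndex >= static_cast<int>(v.size()))
        throw std::invalid_argument("arrayIndex out of range");
    return v[arrayIndex];
}
```

Callers are reminded by documentation, not forced by the compiler, exactly the unchecked trade-off.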

O(n) sort – radix sort

The principles are relevant to pure-algo interviews.

Q: quicksort is universally usable. How about radix sort?
A: radix sort is less widely applicable, though this is not often cited as a drawback against quicksort. (In contrast, I think counting sort is clearly not universally usable…)
A: Wikipedia notes that since integers could [1] represent limited-length strings of characters (e.g., names or dates) and specially formatted floating point numbers, radix sort is not limited to integers.

Radix sort is good for parallel computing. See http://www.drdobbs.com/parallel/parallel-in-place-radix-sort-simplified/229000734

Radix sort’s expected running time can beat quicksort’s, but the hidden constant factor in the O() is much worse than quicksort’s.

[1] It’s hard to prove “cannot”
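
A minimal LSD radix sort sketch for non-negative 32-bit ints, base 256 (four passes, each a stable counting sort on one byte):

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// LSD (least-significant-digit) radix sort. Stability of each counting pass
// is what lets later (more significant) passes preserve earlier orderings.
void radixSort(std::vector<std::uint32_t>& a) {
    std::vector<std::uint32_t> buf(a.size());
    for (int shift = 0; shift < 32; shift += 8) {
        std::array<std::size_t, 257> count{};        // count[d+1] = frequency of digit d
        for (std::uint32_t x : a) ++count[((x >> shift) & 0xFF) + 1];
        for (int d = 0; d < 256; ++d) count[d + 1] += count[d]; // prefix sums = start offsets
        for (std::uint32_t x : a) buf[count[(x >> shift) & 0xFF]++] = x; // stable scatter
        a.swap(buf);
    }
}
```

Each pass is O(n + 256); four passes give the O(n) total, and the per-pass constant is the "hidden constant factor" mentioned above.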