c++debug^release build can modify app behavior #IV

This was actually asked in an interview, but it’s also good GTD knowledge.

https://stackoverflow.com/questions/4012498/what-to-do-if-debug-runs-fine-but-release-crashes points out —

  • fewer uninitialized variables — debug mode is more forgiving because it is often configured to zero-initialize variables that have not been explicitly initialized.
    • For example, perhaps you’re deleting an uninitialized pointer. In debug mode it works because the pointer was nulled, and delete ptr is harmless on NULL. In release the pointer holds some rubbish, so delete ptr will actually cause a problem.

https://stackoverflow.com/questions/186237/program-only-crashes-as-release-build-how-to-debug points out —

  • guard bytes on the stack frame — the debug build puts more on the stack, so you’re less likely to overwrite something important.

I frequently experienced reads/writes beyond an array’s limit.

https://stackoverflow.com/questions/312312/what-are-some-reasons-a-release-build-would-run-differently-than-a-debug-build?rq=1 points out —

  • relative timing between operations is changed by debug build, leading to race conditions

This is echoed on P260 of [[art of concurrency]], which says it’s possible (in theory) to hit a threading error with optimization and see no such error without optimization — which would represent a bug in the compiler.

P75 [[moving from c to c++]] hints that compiler optimization may lead to “critical bugs” but I don’t think so.

  • poor use of assert can have side effects in the debug build only. Release builds typically define NDEBUG to turn off all assertions, since assertion-failure messages are unwelcome in production.

asymmetry lower_bound^upper_bound #IIF lookup miss

For a “perfect” hit, both set::lower_bound() and std::lower_bound() return a position whose value equals the target, whereas upper_bound returns a position strictly higher than the target.

To achieve symmetry, we need to decrement (if legal) the iterator returned from upper_bound.
———-
If no perfect hit, then lower_bound() and upper_bound() both give the next higher node, i.e. where you would insert the target value.

#include <iostream>
#include <algorithm>
#include <vector>
using namespace std;

vector<int> v{1,3,5};
int main(){
  vector<int>::iterator it;
  it = lower_bound(v.begin(), v.end(), 2); cout<<*it<<endl; // prints 3 -- first element not less than 2
  it = upper_bound(v.begin(), v.end(), 2); cout<<*it<<endl; // prints 3 -- first element greater than 2
}

multiple hits: lower_bound gives earliest

Looking for lower_bound (2) in {0,1,2,2,3,4}, you get the earliest perfect hit among many, i.e. the left-most “2”.

No such complexity in upper_bound since upper_bound never returns the perfect hit.

No such complexity in set::lower_bound since a set won’t hold duplicates.

#include <iostream>
#include <algorithm>
#include <vector>
using namespace std;

int main(){
  vector<int> s{0,1,2,2,3,4};
  vector<int>::iterator it = lower_bound(s.begin(), s.end(), 2); // earliest "2", at index 2
  cout<<"to my left: "<<*(it-1)<<endl;          // prints 1
  cout<<"to my right: "<<*(it+1)<<endl;         // prints 2 -- the duplicate
  cout<<"to my right's right: "<<*(it+2)<<endl; // prints 3
}

non-local static class-instance: pitfalls

Google style guide and this MSDN article both warn against non-local static objects with a ctor/dtor.

  • (MSDN) construction order is tricky, and not thread-safe
  • dtor order is tricky. Some code might access an object after destruction 😦
  • (MSDN) regular access is also thread-unsafe, unless immutable, for any static object.
  • I feel any static object including static fields and local statics can increase the risk of memory leak since they are destructed very very late. What if they hold a growing container?

I feel stateless global objects are safe, but perhaps they don’t need to exist.

mgr position stress: project delay #cf FTE/contractor

The contractor is the most care-free. Even as an employee, the pressure to deliver is lower than for the mgr.

As a junior VP (perhaps a system owner) you could still stay behind a shield (defend yourself) — “I did my best given the limitations and constraints”. However, as mgr, you are expected to own the task and solve those problems at a higher level of effectiveness, including negotiations with other departments.

“Results or reasons?” … is the manager’s performance review.

Recall Yang, Stirt-risk …

  • —- past barometer due to project delivery pressure —-
  • GS – 10/10,  “if i quit GS I may have to quit this country; I must Not quit”
  • Stirt – 8
  • Mac – 7
  • OC – 5, largely due to fear of bonus stigma
  • 95G, Barc – 3, due to mgr pressurizing
  • Citi – 2

+! trySomethingNew] sg, what could I have (possibly) got

See also past vindicative specializations

  • I would still do my MSFM
  • I would still fail to get into algo trading or quant dev — too few jobs and extremely high bar
  • I would likely fail to get into leadership roles. I was considered for leadership roles at 1 to 3 companies

However,

  • I could possibly have focused on a specialization such as risk system + some analytics
  • would probably have joined citi, barc, baml, UBS, SC or .. in sg
  • probably java or swing or connectivity
  • would Not have achieved the c#/py/c++ ZBS growth
  • would Not have the skills to get this ICE raw mkt data job or the other c++ job offers.
  • no guarantee to become a manager or app owner. There could be many old timers in the team.
  • possibly less stress and pain. Lower chance of performance stress (#1 biggest stressor), because my GTD/KPI would be higher due to my java/SQL zbs.

lower_bound() may return end() #gotcha

lower_bound() may return end().

If your target value is too high and nothing qualifies, all 6 functions return the right end of the range. Regarding the key value in that end-of-range return value:

  • This end-of-range node is a dummy. Never read its key value.
  • After lower_bound or upper_bound, always validate before reading the return value

I spent hours puzzled by the wrong data returned from lower_bound()->first. Specifically, if the map/set is keyed by integer, then end()->first can look like a normal int value even when the lookup fails and returns map.end()!

Consistent across 6 functions:

  • std::lower_bound
  • std::upper_bound
  • set/map methods

What if the target value is too low? Easier — upper bound should return left boundary iterator, and lower_bound returns the same iterator! See https://github.com/tiger40490/repo1/blob/cpp1/cpp1/miscIVQ/curveInterpolation_CVA.cpp


ETF share creation #over-demand context

http://www.etf.com/etf-education-center/7540-what-is-the-etf-creationredemption-mechanism.html is detailed.

Imagine a DJ-tracking ETF from Vanguard has NAV = $99,000 per creation unit, but the unit is trading at $101,000. Overpriced. So the AP will jump in for arbitrage — by Buying the underlying stocks and Selling a single ETF unit. Here’s how the AP does it.

  1. AP Buys the underlying DJ constituent stocks at the exact composition, for $99,000
  2. AP exchanges those for one unit of ETF from Vanguard.
    1. No one is buying the ETF in this step, contrary to the intuition.
    2. So now a brand new unit of this ETF is created and is owned by the AP
  3. AP SELLs this ETF unit on the open market for $101,000 putting downward pressure on the price.

Q: Does the hot money get used to create the new ETF shares?
A: No. The hot money becomes profit to the earlier ETF investors. Neither the ETF provider nor the AP receives the hot money.

## vi (+less) cheatsheet

https://github.com/tiger40490/repo1/blob/bash/bash/vimrc has some tricks including how to make vim remember last edit location.

  • ~~~~ command mode #roughly ranked
  • [3] dt — “dta” deletes up to (but not including) the next “a”
  • [2] 9s — wipe out 9 characters (including current) and enter insert-mode. Better than R when you know how many chars (9) to change
    • to delete 5 characters … there is NO simpler keystroke sequence
  • R — Overwrite each character one by one until end of line. Useful if the replacement content is similar to original?
  • Ctrl-R to re-do
  • cw — wipe out from cursor to end of word and puts you into insert mode
    • c2w or 2cw
  • :se list (or nolist) to reveal invisible chars
  • C — wipe out from cursor to END of line and puts you into insert-mode
  • capital O — open new line above cursor
  • A — to append at END of current line
  • from inside q(LESS), type a single “v” to launch vi

–paging commands in vi and less

  • jump to end of file: capital G == in both vi and LESS
  • jump to head of file: 1G == in both vi and LESS
  • page dn: Ctrl-f == in both; LESS also uses space
  • page up: Ctrl-b == in both; LESS also uses b

[3/4] means vi receives 3 keystrokes; we hit 4 keys including shift or ctrl …

vi on multiple files


–“split” solution by Deepak M

vi file1 # load 1st file

  • :sp file2 # to show 2nd file upstairs
  • :vsp file3 # to show 2nd file side by side
  • You end up with  — file2 and file3 side by side upstairs, and file1 downstairs!
  • [2/3] ctrl-ww # To move cursor to the “next” file, until it cycles back

–the q( :e ) solution

vi file1 # load 1st file

  • :e file2 # to put 2nd file on foreground
  • [1/3] ctrl-^ — to switch to “the other file”
  • This solution is non-ideal for moving data between files, since you must save active file before switching and you can’t see both files

–editing 3 or more files

  1. vi file1 file2 file3
  2. q(:n) to switch to next, q(:N) for previous…
  3. q(:args) shows all files
  • –Suppose now you are at file2.
  • q(:e file4) works. q(ctrl-^) will toggle between file2 and file4
  • However, q(:n :N  :args) only work on the original list, not the new files from q(:e)

q(:n :N ^) always shows the current filename in status bar:)

## 9 specific short-term goals I look fwd 2keep me Motivated

This question is relevant to “meaningful endeavors”, “next direction” and “sustained focus”.

Q: Over the last 10Y, what I looked forward to :

  • before GS — it’s all about earning capacity.
  • GS– promotion and salary increment, but I soon realized limitations in me, in the environment etc
  • contracting phase — in-demand, muscle building; try something new; billing rate
  • sg 3 jobs– personal investment
  • after re-entry to U.S. — IV batting average, as gauge of my market value

Q: what positive feedbacks can I look forward to, to keep me motivated?

  1. success with tricky coding questions from real interviews (perhaps from friends)
  2. more time for myself (but not in bad mood) — blogging, reading, exercise, sight-seeing.
  3. more time to reunion with family and grandparents. Remember [[about time]] movie theme?
  4. more income to provide for kids, grandparents and my dear wife
  5. more savings — to achieve more investment success
  6. more savings — buy a home nearer to office to cut commute
  7. more IV success, perhaps in quant or HFT domains?
  8. growing IV capabilities towards better jobs
  9. positive feedback from mgr like Anand and Ravi K.
    • promotion?
  10. build zbs in c++/py — unrelated to IV, but gives me the much-needed respect, cool confidence, freedom from stress …?
  11. weight and fitness improvement
  12. more insights to publish on my blog, a sign of my accumulation

[17]orgro^unconnecteDiversify: tech xx ROTI

Update — Is the xx fortified with job IV success? Yes to some extent.

Background – my learning capacity is NOT unlimited. In terms of QQ and ZZ (see post on tough topics with low leverage), many technical subjects require a substantial amount of /laser energy/, not just a few weeks of cramming — remember FIX, tibrv and focus+engagement2dive into a tech topic#Ashish. With limited resources, we have to economize and plan long-term with vision, instead of shooting in all directions.

Actually, at the time, c#+java was a common combination, and FIX, tibrv … were all considered orgro to some extent.

Example – my time spent on XAML now looks not organic growth, so the effort is likely wasted. So is Swing…

Similarly, I always keep a distance from the new web stuff — spring, javascript, mobile apps, cloud, big data …

However, on the other extreme, staying in my familiar zone of java/SQL/perl/Linux is not strategic. I feel stagnant and left behind by those who branch out (see https://bintanvictor.wordpress.com/2017/02/22/skill-deependiversifystack-up/). More seriously, I feel my GTD capabilities are possibly reducing as I age, so I feel a need to find new “cheese station”.

My Initial learning curves were steeper and exciting — cpp, c#, SQL.

Since 2008, this has felt like a fundamental balancing act in my career.

Unlike most of my peers, I enjoy (rather than hate) learning new things. My learning capacity is 7/10 or 8/10 but I don’t enjoy staying in one area too long.

How about data science? I feel it’s kind of organic based on my pricing knowledge and math training. Also it could become a research/teaching career.

I have a habit of “touch and go”. Perhaps more appropriately, “touch, deep dive and go”. I deep dived on 10 to 20 topics and decided to move on (ranked by significance):

  • sockets
  • linux kernel
  • classic algorithms for IV #2D/recur
  • py/perl
  • bond math, forex
  • black Scholes and option dnlg
  • pthreads
  • VisualStudio
  • FIX
  • c#, WCF
  • Excel, VBA
  • xaml
  • swing
  • in-mem DB #gemfire
  • ION
  • functional programming
  • java threading and java core language
  • SQL joins and tuning, stored proc

Following such a habit, I could spread myself too thin.

mgr position limitation: never became a manager, but switching jobs/countries is easier

I chose to remain a programmer. One benefit: it is relatively easy to return to Singapore to work, and easy to come back to the U.S. a few years later.

Managers don’t have that flexibility. They can’t switch jobs too frequently, because the resume would suffer. Manager openings are also far fewer in number. Many types of manager positions exist only in China, with no equivalent in another country — for example, the state-owned enterprise where Lu Nuo works, or the classmate who is president of the China branch of a foreign company.

Companies hire managers very cautiously, while hiring programmers is relatively quick and simple. That works in my favor when job hunting.

personal learn`]difficult workplace: %%tips #XR

I understand your boss could be pushing hard and you have colleagues around who may notice what you do…. Well, not much wiggle room for self-study. A few suggestions:

  • try to put some technical ideas into the code. I did manage to put in some threading, some anonymous inner classes, some memcpy(), some local byte array buffer into my project, so I get to practice using them. (I understand the constraints of a tight time line…)
  • it takes a few minutes to write down new findings discovered at work. I put them in my blog. I also try to spend a little more time later on to research on the same topic, until I feel confident enough to talk about it in an interview.
  • I try to identify some colleagues who are willing to discuss technical issues in my project. I try to discuss only when boss is not paying attention. Boss is likely to feel we are taking too much time on some unimportant issue.

If the learning topic is not related to work, then I feel it’s similar to checking personal investment account at work. (In the ICE office now, some employees get a cubicle with 4 walls so they get more freedom than me.) Do your colleagues check their investment accounts at lunch time? I believe they always get a bit of personal time. In GS, on average very roughly 1 out of 9 working hours is spent on personal matters, and the other companies have higher than that. We all need personal time to call insurance, immigration, repair, … The managers might need even more personal time. I would guess at least 60 minutes a day is yours. Question is how to avoid drawing attention. I don’t care that much about drawing attention, so I often print technical articles to read, or read on-line, or blog on-line.

It’s more discreet to write emails to record your technical learning. I often send those emails to my blog (email-to-publish) or to my personal email address.


Personal time (be it 60 minutes or 3 hours at some banks) is never enough. We just have to try harder to squeeze a bit more out of the 9 hours. If you are serious about learning in your personal time, then I see two much bigger obstacles
1) family responsibility and distractions
2) insufficient motivation and persistent effort (三天打鱼两天晒网 — “three days fishing, two days drying nets”, i.e. working in fits and starts)

In my Singapore years (4.5 years), I felt overwhelmed not by work but family duties, so my weekends/evenings were almost never put to good use for personal learning. I can’t just blame my family members though. I do get quiet time 10.30 pm to 12.30 and many hours on weekends. Somehow, I didn’t put in persistent effort so I didn’t experience significant growth in my technical capabilities.

A very capable colleague was able to do his math research at home and make progress. His wife is probably a full time home maker and takes care of their 3 kids. He is wealthy so he may have a maid and a separate study room at home. However, I feel a more important factor is his persistent effort and focus. A rolling stone gathers no moss. By the way, this same guy runs about 5 miles at least 4 times a week. Determined and efficient. Good time management and correct priorities.

If (a big IF) we are sufficiently motivated, we will find time or make time — the 60 minutes at work, on trains, at home, or in Starbucks. In reality, very few individuals have that level of motivation, so I believe some external factors can help, such as (my favorite) —

* jot down an idea in a draft email and do a bit of research whenever I get time to build on it, until it’s complete and fairly substantial. The idea could be something I overheard, or something I’m truly interested in. The learning is mostly in the research but also in the subsequent reviews. If I don’t review the email, I will forget most of it. When I do review it, I not only refresh my memory, but often discover connections with other things I studied, or find new ideas to learn — 温故而知新 (reviewing the old, one learns the new). Learning is associative, like growing a spider web.

c#/c++/quant – accumulated focus

Update — such a discussion is a bit academic. I don’t always have a choice to focus on one area. I can’t afford to focus too much. Many domains are very niche and there are very few jobs.

If you choose the specialist route instead of the manager route, then you may find that many of the successful role models need focus and accumulation. An individual’s laser energy is a scarce resource. Most people can’t focus on multiple things, but look at Hu Kun!

eg: I think many but not all the traders I know focus for a few years on an asset class to develop insight, knowledge, … Some do switch to other asset classes though.
eg: I feel Sun L got to focus on trading strategies….
eg: my dad

All the examples I can think of fall into a few professions – medical, scientific, research, academic, quant, trading, risk management, technology.

By contrast, in the “non-specialist” domains focus and accumulation may not be important. Many role models in the non-specialist domains do not need focus. Because focus+accumulation requires discipline, most people would not accumulate. “Rolling stone gathers no moss” is not a problem in the non-specialist domains.

I have chosen the specialist route, but it takes discipline, energy, foresight … to achieve the focus. I’m not a natural. That’s why I chose to take on full time “engagements” in c#, c++ and UChicago program. Without these, I would probably self-teach these same subjects on the side line while holding a full time java job, and juggling the balls of parenting, exercise, family outings, property investment, retirement planning, home maintenance….[1] It would be tough to sustain the focus. I would end up with some half-baked understanding. I might lose it due to lack of use.

In my later career, I might choose a research/teaching domain. I think I’m reasonably good at accumulation.

–See also
[1]  home maintenance will take up a lot more time in the US context. See Also
https://1330152open.wordpress.com/2015/08/22/stickyspare-time-allocation-history/ — spare time allocation
https://1330152open.wordpress.com/2016/04/15/set-measurable-target-with-definite-time-frame-or-waste-your-spare-time/
https://1330152open.wordpress.com/2016/04/26/spare-time-usage-luke-su-open/

mgr position risk: forced out

An engineer can be forced out, too, due to performance or attitude, but a mgr can be forced out for no fault of her own — change of upper management.

The “like” factor is more important in a manager than an engineer. In a sense, a mgr keeps her place by pleasing her superior, in addition to doing her job (of getting things done.)

Therefore, a mgr position can feel more /precarious/ than an engineer position.

mgr position risk: targeted hatred

“Hatred” is stronger word than “dislike”. Hatred demands actions.

Hatred can emerge among subordinate employees, superiors, downstream teams, or lateral colleagues.

If an employee feels unfairly treated, usually she puts up with it or quits, but a fair percentage (30%?) could decide to take action. I once reached out to HR at OC. Some lodge an official complaint.

Even if the employee doesn’t take action, the intense dislike is bound to spread and become infectious.

How easy is it to neutralize or contain hatred? Get real.

How easy is it to remain fair to every employee? Get real.

mgr position stress: inferiority,rivalry

At the senior mgr level, your position in the hierarchy is highly visible to everyone and also in your own mind. The higher, the more visible. You are more likely to feel inferior (+superior) to other people across the industry. In contrast, the regular employee and contractors are not in a position to feel that much inferiority — /blissful oblivion/ means happiness and carefree.

Some would say the inferiority is /part and parcel/ of moving up, so most people would willingly accept it. I think each individual reacts differently. Some may be more affected by it when they move up.

Rivalry is another side of the same coin. It can get ruthless. I remember Mark in PWM.

Demotions and promotions are more intense than the annual bonus situation.


senior mgr position risk: temptations

A risk underestimated at the senior mgr position — seduction, temptation. You will be a target. I guess the operators are sharp observers. They could possibly spot your weakness.  It’s really human to be attracted to the opposite sex. You can’t completely hide your vulnerability.

A friend ML told me it can be very hard to resist at the right time and right place.

Alcohol is a common “weakening” factor, or possibly a weapon used by the operator.

python to dump binary data in hex digits

Note hex() is a built-in, but I find it inconvenient. I need to print in two-digits with leading 0.

Full source is hosted in https://github.com/tiger40490/repo1/blob/py1/tcpEchoServer.py

def Hex(data): # a generator function
  i=0
  for code in map(ord,data): # Python 2: data is a str of raw bytes
    yield "%02x " % code     # two hex digits with leading zero, plus a space
    i += 1
    if i%8==0: yield ' '     # extra gap every 8 bytes

print ''.join(Hex("\x0a\x00")); exit(0) # Python 2 print statement; shows q(0a 00 )

housekeeping^payload fields: vector,string,shared_ptr

See also std::string/vector are on heap; reserve() to avoid re-allocation

std::vector — payload is an array on heap. Housekeeping fields hold things like size, capacity, pointer to the array. These fields are allocated either on stack or heap or global area depending on your variable declaration.

  • Most STL (and boost) containers are similar to vector in terms of memory allocation
  • std::string — payload is a char-array on heap, so it can expand both ways. Housekeeping data includes size…
  • shared_ptr — payload includes a ref counter and a raw-pointer object [1] on heap. This is the control-block shared by all “club members”. There’s still some housekeeping data (pointer to the control block), typically allocated on stack if you declare the shared_ptr object on stack and then use RAII.

If you use “new vector” or “new std::string”, then the housekeeping data will also live on the heap, but I find this practice less common.

[1] this is a 32-byte pointer object, not a pure address. See 3 meanings of POINTER + tip on q(delete this)

array^pointer variables types: indistinguishable

  • int i; // a single int object
  • int arr[9]; //a nickname for the starting address of an array, very similar to a pure-address const pointer
  • int * const constPtr;
  • <— above two data types are similar; below two data types are similar —->
  • int * pi; //a regular pointer variable,
  • int * heapArr = new int[9]; //data type is same as pi

c++big4: prefer synthesized

I think it’s the author of [[safe c++]] who pointed out that if we have to maintain a non-default big4, then it’s extra workload for the maintenance programmer. He argued convincingly that it’s not a good idea to require other programmers (or yourself) to “always remember to do something”.

pointer as field –#1 pattern in c++ explains that shared_ptr as a pointer field allows us to use the default big4.

array as field #implementation pattern

An array field is less common in java/c# than in c++; in those languages the usual choices are collections like vector, hashtable, deque.

As an alternative, please consider replacing the array with a vector. This would use heap memory but total memory usage is probably similar.

  • benefit — lower risk of seg fault due to index out of range
  • benefit — growable, though in many cases this is unneeded
  • benefit — different instances can have different sizes, and the size is accessible at run time.
  • benefit — compared to a heap array as a field, vector offers RAII safety

ensure operator<< is visible via header file

If you define operator<<() for a basic ValueObject class like Cell, to be used in higher-level class like Board, then you need to make this declaration visible to Board.cpp via header files.

If you only put the definition of operator<<() in ValueObj.cpp and not in ValueObj.h, the linker can still combine ValueObj.o and Board.o, but Board.cpp cannot see your overload: the call either fails to compile, or (if Cell implicitly converts to some printable type) silently binds to a built-in operator<< rather than yours.

2obj files compiled@different c++toolchains can link@@

(many interviewers asked…)

Most common situation — two static libs pre-compiled on toolchains A and B, then linked together. Usually we just try our luck; if that doesn’t work, we compile all source files on the same toolchain.

Toolchain A and B could differ by version, compiler brand, or C vs C++ … Compatibility is governed by the Application Binary Interface (ABI), which can differ between toolchains.

https://gcc.gnu.org/onlinedocs/libstdc++/manual/using_dual_abi.html says that it’s possible (“straightforward”) to link C++03 and C++11 code together.

##teaching the privileged to get ahead@@

It’s often easier, more lucrative to focus on the affluent consumers, but consider “value”.

Example — trading techniques. This kinda teaching doesn’t really have much social value, except .. risk reduction? Zero-sum game … you help some win, so other investors must lose.

Example — coaching brainy kids to get into gifted classes. This is gaming the competitive “system”. Actually the poor kids need your help more.

Example — coach table tennis kids win competitions. Arguably you help improve the table tennis game, but how much social value is there? Mostly you are helping those few individual kids get-ahead

Many other teaching subjects do have social value

  • languages, writing
  • tech, math, science
  • programming
  • health care
  • financial literacy
  • arts

EarlyRetireExtreme: learning as pastime !! mainstay

The ERE author enjoys learning practical skills as a hobby. In fact, his learning programs could be more than a hobby, since he has no full time job.

However, I am a very different human being from him. I feel very few such learning programs can be the mainstay during my semi- or full retirement. Why?

  • I need to work towards some level of commitment, and a daily routine.
  • I need to make some contribution and be paid for it
  • I prefer interaction with other people

q[less] functor ^ operator<() # map/sort/lower_bound

  • In coding tests, you can use any solution below, so I will use a global operator<(Trade, Trade)
  • In QQ, we need some basic know-how to discuss the alternatives but …. seldom quizzed

Let’s summarize the rules for QQ — I wanted to say “simple rules” but… non-trivial.

1) multiset/multimap/set/map use the functor “less” by default.
** You can (though there’s probably no practical reason to) specify any functor when instantiating the multiset class template. See post on [[allocator, vptr…]]
** Note “less” is a class template; each instantiation of it is a functor class.

2) each instantiation of the functor template “less” typically calls operator<()
** note this “operator<()” can be a non-static method, or a global/namespace thing
** If you have an entity class Trade, you can customize by overloading this operator as a global function accepting 2 Trade arguments.
** However, Warren of CS (market data system) said it should often be a (non-static?) method of the Trade entity class. I feel this is best practice.

The dogmatic [[effective stl]] P179 doesn’t mention overloading operator< but advocates subclassing binary_function and giving it a unique name…. But that is significantly more (and unfamiliar) code to write — and std::binary_function was deprecated in C++11 and removed in C++17 — so for simplicity, simply overload operator<() and don’t touch q(less).

ptr-to-Trade as a key — see [[effSTL]] P88. Basically, you need a custom functor class deriving from std::binary_function. Beware the syntax pitfall highlighted in my blog post. Note shared_ptr is an option, not a must.

If you don’t need a red-black tree container, but need sorting, binary_search, lower_bound etc — then you have flexibility. Simplest is a pointer to a global bool function. See https://bintanvictor.wordpress.com/2017/10/01/binary-search-in-sorted-vector-of-tick-pointer/


How about std::lower_bound()? Same defaults — less and operator<()

How about std::sort()? Same defaults — less and operator<()


big-data arch job market #FJS Boston

Hi YH,

My friend JS left the hospital architect job and went to some smaller firm, then to Nokia. After Nokia was acquired by Microsoft he stayed for a while then moved to the current employer, a health-care related big-data startup. In his current architect role, he finds the technical challenges too low so he is also looking for new opportunities.

JS has been a big-data architect for a few years (current job 2Y+ and perhaps earlier jobs). He shared many personal insights on this domain. His current technical expertise include noSQL, Hadoop/Spark and other unnamed technologies.

He also used various machine-learning software packages, either open-source or in-house, but when I asked him for package names, he cautioned me that there’s probably no need to research any one of them. I get the impression that the number of software tools in machine-learning is rather high and there’s no consensus yet — presumably no consolidation among the products so far. If that’s the case, then learning a few well-known machine-learning tools won’t enable us to add more value to a new team using a different tool. I feel these are the signs of a nascent “cottage industry” in its early formative phase, before the much-needed consolidation and consensus-building among competing vendors. The value proposition of machine-learning is proven, but the technologies are still evolving rapidly. In one word — churning.

If one were to switch career and invest oneself into machine-learning, there’s a lot of constant learning required (more than in my current domain). The accumulation of knowledge and insight is lower due to the churn. Job security is also affected by the churn.

Bright young people are drawn into new technologies such as AI, machine-learning and big data, and less drawn into “my current domain” — core java, core c++, SQL, script-based batch processing… With the new technologies, since I can’t effectively accumulate my insight (and value-add), I am less able to compete with the bright young techies.

I still doubt how much value machine-learning and big data technologies add in a typical set-up. I feel 1% of the use-cases have high value-add, but the others are embarrassingly trivial when you actually look into them. I guess it mostly consists of

  1. collecting lots of data
  2. storing it in SQL or noSQL, perhaps on a grid or “cloud”
  3. running clever queries to look for patterns — data mining

See https://bintanvictor.wordpress.com/2017/11/12/data-mining-vs-big-data/. Such a set-up has been around for 20 years, long before big-data became popular. What’s new in the last 10 years probably include

  • new technologies to process unstructured data (requires human intelligence or AI)
  • new technologies to store the data
  • new technologies to run queries against the data store

container of smart^raw pointer

In many cases, people need to store addresses in a container. Let’s use std::vector as an example. Both smart pointers and raw pointers are common and practical.

  • Problem with raw ptr — stray pointer. Usually the vector doesn’t “own” the pointees and won’t delete them. But what if a pointee is deleted elsewhere and we access the stray pointer in this vector? A smart pointer would solve this problem nicely.
  • J4 raw ptr — footprint efficiency. Raw ptr object is smaller.

##fastest container choices: array of POD #or pre-sized vector

relevant to low-latency market data.

  • raw array is “lean and mean” — the most memory efficient; vector is very close, but we need to avoid reallocation
  • std::array is less popular but should offer similar performance to vector
  • all other containers are slower, with bigger footprint
  • For high-performance, avoid containers of nodes/pointers — cache affinity loves contiguous memory. After accessing the 1st element, accessing the 2nd element is likely a cache hit
    • set/map, linked list suffer the same

q[less] functor ^ operator<() again, briefly

[[effSTL]] P177 has more details than I need. Here are a few key points:

std::map and std::set — by default use less<Trade>, which typically calls a member operator<() in the Trade class

  • If you omit this operator, you get verbose STL build errors about a missing operator<()
  • This operator<() must be a const method; otherwise you get lengthy STL build errors.
  • See https://stackoverflow.com/questions/1102392/stdmaps-with-user-defined-types-as-key
  • The other two (friendly) alternatives are
    • function pointer — easiest choice for a quick coding test
    • binary functor class with an operator()(Trade, Trade) — slightly more complex, but most efficient; best practice.


compute FX swap bid/ask quotes from spotFX+IR quotes #eg calc

Trac Consultancy’s coursebook has an example —

USD/IDR spot = 9150 / 9160
1m USD = 2.375% / 2.5%
1m IDR = 6.125% / 6.25%

Q: USD/IDR forward outright = ? / ?

Rule 1: treat first currency (i.e. USD) as a commodity like silver. Like all currency commodities, this one has a positive carry i.e. interest.

Rule 2: Immediately, notice our silver earns lower interest than IDR, so silver is at fwd Premium, i.e. fwd price must be higher than spot.

Rule 3: in a simple zero-spread context, we know fwd price = spot * (1 + interest differential). This same formula still holds, but now we need to decide which spot bid/ask to use, which 1m-USD bid/ask to use, which 1m-IDR bid/ask to use.

Let’s say we want to compute the fwd bid price (rather than the ask) of the silver. The only fulfillment mechanism is — we the sell-side would borrow IDR, buy silver, and lend out the silver. At maturity, the total amount of IDR owed divided by the total amount of silver held equals my fwd bid price. In these 3 trades, we the sell-side would NOT cross the bid/ask spread even once, so we always use the favorable side of each quote, meaning

Use the Lower 1m-IDR
Use the Lower spot silver price
Use the Higher 1m-silver

Therefore fwd bid = 9150 * [1 + (6.125% – 2.5%)/12] ≈ 9178

…… That’s the conclusion. Let’s reflect —

Rule 4: if we arrange the 4 numbers in ascending order – 2.375 / 2.5 / 6.125 / 6.25 – then the interest differential is always between either the middle pair (6.125 – 2.5) or the outside pair (6.25 – 2.375). This is because the dealer always uses the favorable quote for the lend and the borrow.

Rule 5: We are working out the bid side, which is always lower than the ask, so the spot quote to use has to be the bid. If the spot ask were used, it could be so much higher than the bid (for an illiquid pair) that the final fwd bid price would exceed the fwd ask! In fact this echoes Rule 9 below.

Rule 5b: once we acquire the silver, we always lend it at the ask (i.e. 2.5). From Rule 4, the interest differential is (6.125-2.5)

Rule 9: As a dealer/sell-side, always pick the favorable side when picking the spot, the IR on ccy1 and IR on ccy2.  If at any step you were to pick the unfavorable number, that number could be so extreme (huge bid/ask spread exists) as to make the final fwd bid Exceed the ask.

Let’s apply the rules on the fwd ask = 9160 * [1 + (6.25% – 2.375%)/12] ≈ 9190

Rule 1/2/3/4 same.

Apply Rule 5 – use spot ask (which is the higher quote). Once we sell silver spot, we lend the IDR sales proceeds at the higher side which is 6.25%….

##xp@career diversification #instead of stack-up/deepen

  • biz wing — in addition to my tech wing. I learned a bit but not enough. Not strategic
  • quant? diversify. The on-the-job learning was effective and helped me with subsequent interviews, but the further push (UChicago) is not bearing fruit
  • data science? diversify
  • big data java jobs? stack-up
  • —-diversify within the tech space, where I have proven strengths
  • py? bearing fruits. Confidence.
  • swing? positive experience
  • unix -> web dev -> java? extremely successful
  • c++? slowly turning positive
  • dotnet? reasonable
  • FIX? diversify

some international securities lack cusip or isin, but never both

A BAML collateral system dev told me some securities in his system lack a cusip or lack an isin, but every security must have at least one of the two.

I believe some international assets pledged as collateral could be missing one of them.

Japanese gov bond is a common repo asset — cross-currency repo. The borrower needs USD but uses Japanese bond as collateral.

In MS product reference database, I see these identifiers:

  • internal cusip
  • external cusip – used in U.S./Canada
  • cins – CUSIP International Numbering System, for “foreign” securities
  • isin – if you want to trade something inter-nationally
  • sedol
  • bloomberg id
  • Reuters RIC code, RT symbol and RT tick


collateral: trade booked before confirmation

In collateral system, a margin call requires the counter party to post additional collateral (within a short window like a day). If the collateral is in the form of a bond (or another security), then it’s considered a “trade”. There are often pre-agreed procedures to automatically transfer the bond.

So the IT system actually books the trade automatically, even before the collateral operations team gets to confirm the trade with the counter party. That’s what I heard from an application owner. However, I suspect these bonds could be held in some special account and transferred and confirmed automatically when required. In such a case, the trade booking is kind of straight-through-processing.

I guess the counter-party is often a margin account owner, perhaps a hedge fund in a prime brokerage system.

tail-recursion Fibonacci # tricky]python

Tail recursion is a “halo” skill in coding interviews. It turns out that most recursive functions can be reworked into the tail-call form, according to http://chrispenner.ca/posts/python-tail-recursion.

The same author also demonstrates

  1. Python’s recursion stack is limited to a depth of about 1000 by default, so deep recursion is unpopular in Python
  2. Python doesn’t optimize tail recursion
  3. a decorator trick can simulate tail recursion in Python

—————-

Easiest demo problem is factorial(N). For Fibonacci, https://stackoverflow.com/questions/22111252/tail-recursion-fibonacci has a very short python implementation (though I suspect python doesn’t optimize tail recursion). Let me rephrase the question:

Q: Given f(firstVal=0, secondVal=1, length=0) returns 0, f(0,1,1) returns 1, can you implement f(0,1,N) using recursion but in O(N) time and O(1) space? Note Fib(N) ==f(0,1,N)

Key points in the python solution:

  • Start with iterative algo, then convert it to tail recursion.
  • use 2 extra arguments to hold last two intermediate values like Fib(2) Fib(3) etc
  • We saw in the iterative solution that memory usage is O(1), a good sign that tail recursion might be possible.
  • if you observe the sequence of Fib() values computed in the blackbox, actually, you see Fib(2), Fib(3) … up to Fib(N), exactly like the iterative solution.
  • solution is extremely short but non-trivial

https://github.com/tiger40490/repo1/blob/cpp1/cpp1/FibTailRecurse.cpp is my very brief implementation

real-time symbol reference-data: arch #RTS

Real Time Symbol Data is responsible for sending out all security/product reference data in real time, without duplication.

  • latency — typically 2ms (not microseconds) from receiving a message to sending out the enriched reference data to downstream.
  • persistence — any data worth sending out needs to be saved. In fact, every hour the same system sends a refresh snapshot to downstream.
    • performance penalty of disk writes — handled by InnoDB. Most database access is in-memory; disk writes are rare. There is enough memory to hold 30GB of data. https://bintanvictor.wordpress.com/2017/05/11/exchange-tickers-and-symbols/ shows how many symbols there are across all trading venues.
  • an insert is actually slower than an update. But first, the system must check whether an insert or update is needed at all. If there is no change, don’t save the data or send it out.
  • burst / surge — the main performance headache. We could have a million symbols/messages flooding in
  • relational DB with mostly in-memory storage

peers’priority^%%priority: top5 #beyond gz

In a nutshell, some of my peers’ priorities I have decidedly given up, and some of my priorities are fairly unique to me.

Actually I only spoke to a small number (about 10) of peers, mostly Chinese and Indian. There are big differences among their views. However, here’s a grossly oversimplified sample of “their” priorities.

  • theirs — top school district
  • theirs — move up (long-term career growth) in a good firm?
    • build long-term relationships with big bosses
  • theirs — green card
  • theirs — early retirement
  • ———-
  • mine — diversify (instead of deepen or stack-up) and avoid stagnation
  • mine — stay technical and marketable, till 70
  • mine — multiple properties and passive income
  • mine — shorter commute
  • mine — smaller home price tag


converting btw epoch, HHMMSS ..

  • Note the epoch count is at second granularity, not millisecond.
  • Note the epoch is timezone-agnostic — it’s always defined in UTC.
  • Note struct tm is the workhorse. It breaks a time value into year/weekday/…/second components.

—to convert from epoch to HHMMSS:
time_t tt = secSinceEpoch_as_integer;
//time_t tt = time(NULL); //Or you can get current time
struct tm * ptm = gmtime(&tt);

—to convert current time to int (epoch):
long secSinceEpoch = time(NULL);

//alternatively, using C++11 chrono. Note the raw count() is in
//implementation-defined ticks, so cast to a known unit first:
long msSinceEpoch = std::chrono::duration_cast<std::chrono::milliseconds>(
    std::chrono::system_clock::now().time_since_epoch()).count();

—epoch timestamp is typically in seconds

  • 1513961081 — 10 digits, seconds since Epoch
  • 1511946930032722000 — 19 digits, nanosec since Epoch

##c++dev tools for GTD+zbs

The primary tool chain (IDE, gcc, gdb…) often provides similar features, so some of these standalone tools are no longer needed.

  • c++filt
  • [so] ctags, cxref, cflow
  • depend and cppdepend
  • [s] lint — outdated and no longer popular
  • [x] prof/gprof
  • [x] valgrind
  • [o] nm and objdump
  • [s=works on source files]
  • [o=works on object files]
  • [x=works on executables only]

g++ -D_GLIBCXX_DEBUG #impractical

This is a good story for interviews.

In a simple program I wrote from scratch, this flag saved the day. My input to std::set_difference was not sorted, which this flag detected at runtime. Without the flag, nothing complained and I had some deceptively successful runs, but with more data I hit runtime errors.

I had less luck using this flag with an existing codebase. After building my program with this flag, I got random run-time crashes due to “invalid pointer at free()” whenever I used a std::stringstream.


custom delimiter for cin operator>> #complicated

Tested, but too hard to remember. Better to use the getline() trick in https://bintanvictor.wordpress.com/2017/11/05/simplest-cway-to-split-string-on-custom-delimiter/

#include <iostream>
#include <locale>
#include <sstream>
#include <string>
using namespace std;

struct comma_is_space : std::ctype<char> { //use comma as delimiter
  comma_is_space() : std::ctype<char>(get_table()) {}
  static mask const* get_table() {
    static mask rc[table_size]; //zero-initialized, so comma becomes the only "whitespace"
    rc[','] = std::ctype_base::space;
    return &rc[0];
  }
};

string line = "a,bb,ccc"; //sample input, for illustration
istringstream iss(line);
iss.imbue(locale(cin.getloc(), new comma_is_space));
string token;
while (iss >> token) cout << token << '\n'; //prints a, bb, ccc on separate lines

binary search in sorted vector of Tick pointer

Note the mismatched argument types to the comparator functions: std::lower_bound expects comp(element, target), whereas std::upper_bound expects comp(target, element).

(I was unable to use a functor class.)

#include <algorithm>
#include <vector>
using namespace std;

struct Tick { unsigned int ts; }; //minimal definition, for illustration

vector<Tick const*> vec; //must be sorted by ts, ascending
unsigned int target;

//lower_bound calls comp(element, target):
bool mylessFunc(Tick const* tick, unsigned int target) {
     //cout<<tick->ts<<" against "<<target<<endl;
     return tick->ts < target;
}
lower_bound(vec.begin(), vec.end(), target, mylessFunc);

//upper_bound calls comp(target, element):
bool mygreaterFunc(unsigned int target, Tick const* tick) {
     //cout<<tick->ts<<" against "<<target<<endl;
     return tick->ts > target;
}
upper_bound(vec.begin(), vec.end(), target, mygreaterFunc);