async messaging-driven #FIX

A few distinct architectures:

  • architecture based on UDP multicast. Most of the other architectures are based on TCP.
  • architecture based on FIX messaging, modeled after the exchange-bank messaging, using multiple request/response messages to manage one stateful order
  • architecture based on pub-sub topics, much more reliable than multicast
  • architecture based on one-to-one message queue
Advertisements

## notable linux system calls: QQ question

https://syscalls.kernelgrok.com can sort the functions by function id

http://asm.sourceforge.net/syscall.html is ordered by function id

  • fork()
  • waitpid()
  • open() close() read() write()
  • –socket
  • socket() connect() accept()
  • recvfrom() sendto()
  • shutdown() is for socket shutdown and is more granular than the generic close()
  • select()
  • epoll family
  • –memory
  • brk
  • mmap

## G5 algoIV constructs x-lang #dataStruct++

  1. elegant, legit simplifications
  2. Judicious use of global -vs- argument -vs- local variables
  3. iterator and  Pointer in node/VO class
  4. Construct tree, hashmap or Sorted array as auxiliary, or for memoisation
  5. Recursion
  6. sentinel node trick: array/slists

The tools below are more specialized and less widely relevant

  • Graph traversal, recursive or iterative..
  • Permutations etc, recursive or iterative ..
  • Std::lower_bound etc
  • sorting

The topics below are Not really for a coding test:

  • DP
  • concurrency

latency QQ ]WallSt IV #java,c++..

Latency knowledge

  • is never needed on the job but … high Market Value
  • is not GTD at all but … is part of zbs
  • is not needed in py jobs
  • is needed in many c++ interview topics but .. in java is concentrated in JIT and GC
  • is an elite skill but … many candidates try
  • some depth is needed for IV and other discussions but … relatively low-complexity .. low-complexity topics #eg:GC/socket

%%3 GTD concerns as lead dev #instru/code reading..

Below are my biggest concerns when considering a lead dev role. Actually, many of my peers often hints “My role (as a mgr, or architect, or app owner..) is even harder” but I think they are Not more effective if asked to play the hands-on developer role.

  1. code reading — requires focus and stamina.
    • In addition, taking ownership of a big code base requires determination and confidence
  2. instrumentation — using prints or special tools, troubleshooting, error reproducing, ..
  3. pressure due to major feature not completed by impending deadlines

Note none of these is tested in any interview!

  • How about firefighting? I tend to panic but my peers aren’t stronger than me technically!
  • How about design? Not really my weakness. My peers’ designs are not necessarily better than mine
  • knowledge of generic tech (rather than local system knowledge)? I’m actually stronger than most peers
  • quality, attention to details? is usually my strength
  • fine tuning? Most peers are not good at this
(not ranked) bonus peers are .. generic or local nlg?
code reading no impact some are stronger  local sys nlg
investigate boss needs answers usually no diff  local
deadline yes ?  local
firefighting yes not stronger  local
design somewhat visible not stronger  some generic

defining+using your swap() #gotcha

Gotcha! If you define your own swap() then it may not get picked up depending on some subtle factors. In the demo below, when the args are declared as “int” variables, then the hidden std::swap() gets picked up instead of your custom swap(size_t, size_t)!

Note there’s no specific #include required.

  • Solution : if practical, avoid “using namespace std” even in cpp files
  • solution : except the outermost main(), enclose everything  in an anonymous namespace, to force the unqualified swap() to resolve to your custom version.
#include <iostream>
#include <assert.h>

//namespace{

using namespace std;

int value1 = 5;
int calls=0;
template <typename T> void swap(size_t a, size_t b){
        ++calls;
        std::cout<<"in my swap()"<<std::endl;
}

int mymain(){
  int a=0, b=1;
  int oldCount = calls;
  swap<int>(a,b); //int arguments won't invoke my swap()
  assert (calls == oldCount);

  std::cout<<"after 1st call to swap()"<<std::endl;
  swap<int>(0,1); //calls my swap()
  assert (calls == 1+oldCount);

  std::swap<int>(a,b);  //can compile even without any #include
  // no return required
}

//}

int main(){
  mymain();
}

##[17] 4 tech career growth directions #aggressive

In spite of the threats and pains, I remain hopeful about my technical growth. Ideally, I hope to

  1. maintain competitiveness in java. Consider body-building
  2. selectively build up my c++ IV competitiveness, in terms of QQ/BP/ECT, from 6 to 7, over 2 years
    1. sockets, concurrency, sharedMem, templates ..
  3. try to recover the /sunk cost/ in quant domain
  4. slowly branch out to data analytics or data science

[1 java] feels achievable, based on my recent interviews. [4 data] is secondary and nice to have, not a must at age 43, so [2 cpp] is my main drive.

The interviews are crucial. Without IV, I can improve my GTD or zbs but painfully slow improvement of IV competitiveness.

Q: Suppose we don’t know java. Based on our c++ IV performance, can we survive as a c++ developer?
A: I will survive. My family needs me.

some Technicalities for QnA IV

We often call these “obscure details”. At the same level, these are a small subset of a large amount of details, so we can’t possibly remember them all 😦

Surprisingly, interviewers show certain patterns when picking which technicality to ask. Perhaps these “special” items aren’t at the same level as the thousands of other obscure details??

These topics are typical of QQ i.e. tough topics for IV only, not tough topics for GTD.

  • archetypical: which socket syscalls are blocking and when
  • $LD_LIBRARY_PATH
  • hash table theoretical details? too theoretical to be part of this discussion
  • select() syscall details
  • vptr, vtable

 

gdb skill level@Wall St

I notice that, absolutely None of my c++  veteran colleagues (I asked only 3) [2] is a gdb expert as there are concurrency experts, algo experts [1], …

Most of my c++ colleagues don’t prefer (reluctance?) console debugger. Many are more familiar with GUI debuggers such as eclipse and MSVS. All agree that prints are often a sufficient debugging tool.

[1] Actually, these other domains are more theoretical and produces “experts”.

[2] maybe I didn’t meet enough true c++ alpha geeks. I bet many of them may have very good gdb skills.

I would /go out on a limb/ to say that gdb is a powerful tool and can save lots of time. It’s similar to adding a meaningful toString() or operator<< to your custom class.

Crucially, it could help you figure things out faster than your team peers. I first saw this potential when learning remote JVM debugging in GS.

— My view on prints —
In perl and python, I use prints exclusively and never needed interactive debuggers. However, in java/c++/c# I heavily relied on debuggers. Why the stark contrast? No good answer.

Q: when are prints not effective?
A: when the edit-compile-test cycle is too long, not automated but too frequent (like 40 times in 2 hours) and when there is real delivery pressure. Note the test part could involve many steps and many files and other systems.
A: when you can’t edit the file at all. I have not seen it.

A less discussed fact — prints are simple and reliable. GUI or console debuggers are often poorly understood. Look at step-through. Optimization, threads, and exceptions often have unexpected impacts. Or look at program state inspection. Many variables are hard to “open up” in console debuggers. You can print var.func1()

 

waste time@c++IDE

c++ IDE build is 20% (up to 50%) more complicated than java, partly because c++ compilation is more complicated than java. Another reason is my previous investment in learning java compilation.

It can take many hours to fix an IDE project, but the long term return and leverage is questionable, since I use c++ IDE much fewer than java IDE.

Eclipse is my choice, although there are many usability issues and apparent bugs. Far from ideal.

Experience — devcpp is simpler.

Experience — codeBlocks xx isn’t worth the effort

 

RVR^LVR %% Lesson #1 #400w

A working definition of lval/rval on http://thbecker.net/articles/rvalue_references/section_01.html says — An L-value expression is an expression that

1) refers to a memory location (heap/stack but not a literal value), and allows us to take the address of that memory location via the address-of (&) operator.

The other key features are all obscure:
2) (my suggestion) can pass into a reference parameter of a function

3) can be the RHS of a reference variable initializer

If overwhelming/mind-boggling, just focus on 1). In short. L-value expression evaluates to a LLLocation in memory

The book [I] introduces a simple test of Rval vs Lval, introduced by Stephan T. Lavavej — An Lval expression is anything that has a name. Minor qualifications:
* function returning a reference…

An R-value expression is any expression that is not an L-value expression. C++ standard decrees “A function call is an L-value only if the result type is a reference”
Note both lval and rval refer to “expressions” including variables, but do not refer to objects. Explained more in Q: What is a rvr %%take
ALL expressions can evaluate to some value, so can appear on the RHS, but only some can appear on the LHS. But this has nothing to do with the lval/rval definitions!
Something, like a subscript expression, can appear on the RHS but it is strictly a lval expression, not a rval expression. If you (like me) find there are too many categories of lval expression, then just remember an lval expression is typically a variable.
! An expression is either a rval or lval, never both !
! An lval expression can bind only to a lval ref (, never a rval ref ) expression
! [P] a rval expression binds to either:
!! a rval reference variable, or
!! const lval reference variable
!! never to a non-const lval reference variable
* a regular variable can appear on RHS/LHS but is always always a lval, never an rval expression.
** a function call is usually an rval expression, but  sometimes an lval expression
* a literal is always an rval expression, since it is not a “place holder” and has no address.
* [P] arithmetic expressions are rval expressions
* [P] subscript expressions and unwrapped pointers are lval expressions. Most common lval expression is the variable name
[P] rval ref variable must bind to an rval expression, never a lval expression!
[P] rval ref indicates the object referenced will be “relocated” soon. Therefore it should bind to temp objects…
E=[[Eff Modern C++]]
P=[[c++primer]]
I=[[c++for the impatient]]

dtor is simple thing.. really@@

update: [[safe c++]] has a concise argument that if ctor is allowed to throw exception, then it’s possible to have dtor skipped accidentally. Only an empty dtor can be safely skipped.

This is typical of c++ — destructor (dtor) is one of the most important features but quite tricky beneath the surface

* exception
* dtor sequence – DCBC
* virtual dtor
* synthesized dtor is usually no good if there’s any pointer field
* lots of undefined behaviors
* there are guidelines for dtor in a base class vs leaf class — never mindless
* objects put into containers need a reasonable dtor
* when the best practice is to leave the dtor to the compiler, you could look stupid by writing one, esp. in an interview.

* smart pointer classes is all about dtor

* RAII is all about dtor

* interplay with delete

*** placement new, array-delete vs delete

*** override operator delete

*** double-delete

*** ownership !

local jargon,local domain knowledge = localSys knowledge

I agree that finance IT skills tend to “converge” across big banks, so we poor developers need not learn so many damn shit new stuff like git, python, continuous build…

On the domain knowledge side, the “local system knowledge” has no convergence as such, so each time I take up a new system, I must spend the time on the stored procs, table relationships, batch jobs, feed files, etc. There are also a lot of non-technical local knowledge. I hear you when your manager complained “what do you know about the system?”

Local system knowledge can be a long learning curve. Those who write the code in the first place has a huge advantage —

“It takes me 3 months to write this damn shit, then we add new features and fix bugs, but it’s still familiar to me. A new guy would take a year to get familiar code-wise, provided he has chance to explore all the key components. If he’s only exposed to half the system, then he would never know  the other half.”

(Therefore, I like to rewrite stuff.)

In http://bigblog.tanbin.com/2011/04/3-types-of-domain-knowledge-in-finance.html I mentioned 3 types of domain knowledge — Jargon, Math, and Architecture.The Math part is stable. Local system knowledge includes a lot of local jargon. Sometimes generic jargon can help us pick up the local jargon.

subverting kernel’s resource-allocation – a few eg

[[Linux sys programming]] explains several techniques to subvert OS resource-allocation decisions. Relevant to performance-critical apps.

P275 mlockall() — mlock() : prevent paging ] real time apps

P173 sched_setaffinity(). A strongly cache-sensitive app could benefit from hard affinity (stronger than the default soft affinity), which prevents the kernel scheduler migrating the process to another
processor.

[[The Linux Programmer’s Toolbox]]. A write-only variable will be removed by the compiler’s optimizer, but such a variable could be useful to debugger. I read somewhere that you can mark it volatile — subversive.

Any way to prevent “my” data or instruction leaving the L1/L2 cache?

Any way to stay “permantly” in the driver’s seat, subverting the thread scheduler’s time-slicing?

##[12] some c++(mostly IV)questions

Q: what if I declare but not implement my dtor?
A: private copy ctor is often left that way. Linker will break.
Q: how to access a c++ api across network
A: corba?
Q: default behavior of the q(less) class-template
A: I have at least 2 posts on this. Basically it calls operator< in the target type (say Trade class). If missing, then compiler fails.
Q: how do you detect memory leak?
A: valgrind
%%A: customize op-new and op-delete to maintain counters?
Q: in java, i can bind a template type param — bound params. c++?
%%A: no, but I have a blog post on the alternative solutions
Q891: are assignment op usually marked const?
%%A: no it must modify the content.
Q891a: what if i override the default one with a non-const op=? will the default one still exist?
A: I think both will co-exist
Q: do people pass std::strings by ref? See Absolute C++
%%A: yes they should
Q: unwrap a null ptr
A: undefined behavior. Often termination.
Q: delete a null ptr
A: C++ guarantees that operator delete checks its argument for null-ness. If the argument is 0, the delete expression has no effect.
Q: comparing null pointers?
A: In C and C++ programming, two null pointers are guaranteed to compare equal
Q: use a ref to a reclaimed address?
A: Rare IV… undefined. May crash.
Q: can i declare a Vector to hold both Integers and Floats?
A: slicing. To get polymorphism, use ptr or ref.
Q: what op must NOT be overloaded as member func?
A: rare IV…… qq(.) is one of five.
Q: Can i declare an Insect variable to hold an Ant object, without using pointers? “Ant a; Insect i = a;”?
A: see post on slicing
Q: Without STL or malloc (both using heap), how does C handle large data structures.
%%A: I used to deal with large amounts of data without malloc. Global var, linked data structure, arrays
%%A: I can have a large array in main() or as a static variable, then use it as a “heap”

remain relevant(survival)to your team^global economy#localSys

When we feel job insecurity, we look around and see who’s more secure, more relevant, more current, and more in-demand. We may find some colleagues more relevant to your team or to the company. Maybe they are local system experts. Maybe they have deep expertise in the base vendor product (Tibco, Murex, LotusNotes, WebLogic…). We may want to _specialize_ like them.

Resist!

Job security derives from skill relevance, but relevant to who? The industry, not the local team, which could disappear if entire team becomes obsolete and irrelevant. I have seen people (including myself) specializing in DNS, VB.NET, Swing, Perl, EJB, let alone home-grown systems such as SecDB. These guys are (extremely) relevant but for a limited time only.

Have a long-term perspective — If you want job security through skill relevance, then invest in more permanent skills such as Math, algorithms, data structures, threading, SQL,…

Have a global perspective — Job security is a serious, almost life-and-death issue. You probably should look beyond your local industry. What skills are truly relevant on the global arena? Some skill (say Murex) might be more relevant locally but less globally.

Avoid niche domains unless it helps boost your mainstream skills.

most fundamental data type in c#/c++/java

Notice the inversion —
* C# — Simple types like “int” are based on struct — by aliaing
* C — struct is based on primitives like “int” — by composition
* C++ — classes are based on the C struct

See http://msdn.microsoft.com/en-us/library/s1ax56ch(v=vs.80).aspx. Ignoring enum types for a moment, all c# value types are struct.

Now we are ready to answer the big Question
Q: beside pointers, what’s the most fundamental data type in this language?
A: C# — most fundamental type is the struct. Simple ints, classes and all other types (except enums) are built on top of struct.
A: Java/C++ — most fundamental types are primitives. In C++ all other types are derived using pointer and composition.
A: Once again, java emerges as the cleanest

Now we know why c# doesn’t call int a PRIMITIVE type because it’s not monolithic nor a truly fundamental type.

socket, tcp/udp, FIX

I was told both UDP and TCP are used in many ibanks. I feel most java shops generally prefer the simpler TCP.
Specifically, I learnt from many veterans that some ibanks use multicast (UDP) for internal messaging.

FIX is not much used. It is sometimes used as purely message formatting protocol (without any session management) on top of multicast UDP. (In contrast, full-featured FIX communication probably prefer TCP for connection/session control.)

Similar – Object serialization like protobuf.

another basic difference iterator^container^algo

(Why bother? Well, you need to know these when you debug STL or extend STL.)

– Containers — are class templates. 100%
– Algorithms — are function templates. 100%

– iterators? A beginner can safely assume that Most of the time iterators are defined inside each container template as a Member type. Since a container has a dummy type T, the iterator must be a class template of T (or a typedef thereof), a Nested class template of T, presumably, inferred from the syntax :

vector<int>::iterator

– The container/algo/iterator adapters are typically class templates
– functor are typically class templates

A trivial consequence — the declarations of containers and iterators are complicated by the templates. Algorithm declarations are simpler in comparison.

## c++ syntax monsters

* constructor calls without new — see other posts.
* initializers in constructors
* array of pointers (P234 [[24]])
* function parameters — reference types
* copy constructor
* constant pointer to constant object
* assignment operator overloading
* function pointers, and (worse still) function pointers to member functions
* generic functions and (worse still) generic classes

A crude progress bar of a learner is these “chapters”….

FX settlement system

I once interfaced with the FX settlement (clearance) system of a major US bank (let’s call it CB). There are always 2 counterparties to each trade. Each counterparty has a nostro account with an “clearance bank” [1]. The settlement process must transfer one currency from clearance bank acct A to clearance bank acct B, and transfer the other currency from B to A.

For example, JPM could be a counter party, but for one particular FX trade, the HKD clearance bank could be HSBC in Hongkong.

In our case, one of the 2 clearance banks is always CB itself. In fact one of the 2 counter parties is usually a CB-controlled trading account.

The settlement system is also know as “cash manager”.

[1] For example nostro acct to receive HKD must exist in a HK bank — the nostro bank. Only HK banks can be the clearance bank for HKD. In the special case of Euro, nostro acct can be in any Eurozone country.

java hash table taking in a new node

incoming “guest” — a key-value pair, to be stored. We attach a 3rd item — computed hash code or compressed index — to each guest.

Actually there are other “things” attached to each guest. These tend to distract you when you first study hash tables —

* a pointer. When storing data, we usually store pointers to original data. In Java, Original Keys and Values are always Objects never primitives, so they are always always stored as pointers.

* a bucket id ie the array slot

* a next-pointer since each node is in a linked list

sql to list mutual funds

we don’t know when the database may get updated. we only know it’s very rare

Jingsong J. Feng if the change frequency is very low, create a trigger in the database, when data change, trigger a query, and update result

Jingsong J. Feng schedule Thread to run it every miniute or every 5 miniutes
Bin X. Tan/VEN… to update the local cache?
Jingsong J. Feng yes

Jingsong J. Feng oh, set a timeout variable — Hibernate

How about an update/insert/delete trigger to send a GET request to our servlet and expire the cache immediately? The next visitor to the website would see the updated list fresh from the database.

If there’s a jms broker, trigger can send a cache-expiry alert to the queue even when the web system is unavailable. When the web site comes up again, it receives the jms message and expires the cache?

Without a JMS broker, the GET request could update some persistent expiry-flag on the cache server. Even with frequent reboots, the cache server could check the expiry-flag periodically.

Q: Can a trigger send a GET?
A: yes

Q: Can a trigger be a jms client?
A: Not so common. http://www.oracle.com/technology/products/integration/bam/10.1.3/TechNotes/TechNote_BAM_AQ_Configuration.pdf
(AQ = Oracle Advanced Queuing )

cross rate derivation illustrated

There’s confusion over the term “cross”. I believe it means the pair is not “native” to the broker/dealer you are using, so they have to synthesize 2 “native” trades. If your broker/dealer offers only USD/XXX , then a lot of non-USD currency pairs will be crosses.

If you are a big dealer bank using EBS and Reuters, then you can combine those 2 brokers’ offerings together and present a consolidated list of “native” pairs. Anything not in the list is considered a cross currency pair. Note EBS/Reuters “native” spreads are very tight. This is a defining feature of “native” pairs.

Since a cross is always synthesized using 2 pairs, transaction cost (and bid/ask spread) is higher.
——————-
In the context of “cross rate”, this term means “an exchange rate between 2 non-USD currencies”. That’s a USD cross. There are also Euro crosses — any non-Eur currency pair whose rate is computed from 2 Eur rates.

If USD/AAA has a bid/offer and USD/BBB has a bid/offer, and CCC/USD has a bid/offer. When computing the cross rates between A B and C, just remember to always take the worse prices, which are the safest prices for the market-maker.
– Offer of the cross pair is worse joining worse
– Bid of the cross pair is worse joining worse too

One principal, but many scenarios. Here’s one scenario.Given
1) USD/CHF = 0.9648/52
2) USD/JPY = 82.64/69
CHF/JPY offer must be computed as USDJPY / USDCHF, but which number by which number? Worst offer quote is the highest, so 82.69 / 0.9648 = 85.71.

Here’s the math explained in English. As a market maker who is publishing the 4 quotes of 1) and 2), at what price am i willing to convert my CHF into JPY? Well, I would convert to USD then JPY, using the safest/greediest conversion rates among my outgoing quotes. I would safely convert from CHF to USD at 0.9648, then safely convert from USD to JPY at 82.69.