Asia catch`up with U.S. exchanges

Thanks Sonny,

I could imagine that U.S. exchanges are more advanced in terms of trading rules, operational complexity, matching engines, order types, etc., presumably because U.S. exchanges have more volume and more variety in securities. It’s like how a bigger, older hospital is more sophisticated because it has treated many more patients.

On the other hand, I have read more than once (over the last 7 years) that in terms of pure latency the bigger U.S. exchanges are often slower, but only moderately so, and everything still works fine. I don’t know of any capacity concerns on the horizon. Some of the smaller exchanges in Asia have aggressively tried to beat the bigger exchanges on latency. The world’s #1 fastest exchange is now in Bombay.

Good sharing.

Victor


bash script: demo`trap,read,waiting for gdb-attach

#1 valuable feature: waiting for gdb to attach before unleashing the data producer.

#2 valuable feature: the signal trap, so I don’t have to remember to kill off background processes.

# Better to source this script than execute it. One known benefit: the q(jobs) command then works

sigtrap(){
  echo Interrupted
  kill %1 %2 %3 %4 # q(jobs) can show the process %1 %2 etc
  set -x
  trap - INT
  trap # show active signal traps
  sleep 1
  jobs
  set +x
}

set +x
ps4 # my alias to show relevant processes
echo -e "\njobs:"
jobs
echo -en "\nContinue? [y/any_other_key] "
unset REPLY; read -r REPLY # name the variable; "read $REPLY" only worked because $REPLY expanded to nothing
[ "$REPLY" = "y" ] || return

trap "sigtrap" INT # not sure what would happen to the current cmd and to the shell

base=/home/vtan/repo/tp/plugins/xtap/nysemkt_integrated_parser
pushd "$base" || return
make NO_COMPILE=1 || { popd; return; } # don't leave the calling shell stranded in $base
echo '---------------------'
popd
/bin/rm --verbose $base/*_vtan.*log /home/vtan/nx_parser/working/CSMIParser_StaticDataMap.dat

set -x

#If our parser is a client to rebus server, then run s2o as a fake rebus server:
s2o 40490|tee $base/2rebus_vtan.bin.log | decript2 ctf -c $base/etc/cdd.cfg > $base/2rebus_vtan.txt.log 2>&1 &

#If our parser is a server outputting rtsd to VAP, then run c2o as a fake client:
c2o localhost 40492|tee $base/rtsd_vtan.bin.log | decript2 ctf -c $base/etc/cdd.cfg > $base/rtsd_vtan.txt.log 2>&1 &

# run a local xtap process:
$base/shared/tp_xtap/bin/xtap -c $base/etc/test_replay.cfg > $base/xtap_vtan.txt.log 2>&1 &
#sleep 3; echo -en "\n\n\nDebugger ready? Start pbflow? [y/any_other_key] "
#unset REPLY; read -r REPLY; [ "$REPLY" = "y" ] || return

# playback some historical data, on a multicast port:
pbflow -r999 ~/captured/ax/arcabookxdp1-primary 224.0.0.7:40491 &

set +x
jobs
trap

c++variables: ! always objects

Every variable that holds data is an object. Objects are created with static duration (sometimes by defining rather than merely declaring the variable), with automatic duration (a local declaration alone) or with dynamic duration via new or malloc().

That’s the short version. Here’s the long version:

  • heap objects: have no name, no host variable, no door plate. They only have addresses. The address could be saved in a “pointer-object”, which is a can of worms.
    • In many cases, this heap address is passed around without any pointer object.
  • stack variables (including function parameters): each stack object has a name, i.e. the host variable name, like a door plate on the memory location. Multiple names are possible, since a reference is another door plate on the same object. This memory is allocated when the stack frame is created. When you copy a stack variable you get a separate, cloned object.
    • Advanced: you could create a reference to the stack object when you pass the host variable by reference into another function. However, never return a stack variable by reference, because the object dies when its stack frame is popped.
  • static locals: the name myStaticLocal is a door plate on the memory location. The storage lasts for the whole program, but the object is initialized the first time control passes through its declaration. You can safely return a reference to myStaticLocal.
  • file-scope static objects: memory is allocated at program initialization, but if you have many of them scattered in many files, their order of initialization across files is unpredictable. The name myStaticVar is a door plate on that memory location, but this name is visible only within this /compilation-unit/. If you declare and define it (in one step!) in a shared header file (bad practice), then every file including that header gets its own instance :(
  • extern objects (static duration, external linkage, no static keyword): again, you declare and define it in one step, in exactly one file, per the one-definition rule (ODR). All other compilation units need an extern declaration, which doesn’t allocate storage for this object :)
  • static fields: are tricky. The variable name is there after you declare it, but it is a door plate without a door. It only becomes a door plate on a storage location when you allocate storage, i.e. create the object by “defining” the host variable. The one-definition rule (ODR) also applies to static fields, so you first declare the field inside the class without defining it, then define it elsewhere, in exactly one file. See https://bintanvictor.wordpress.com/2017/05/30/declared-but-undefined-variable-in-c/

in-depth article: epoll illustrated #SELECT

(source code is available for download in the article)

Compared to select(), the newer Linux epoll API (epoll_create1(), epoll_ctl(), epoll_wait()) is designed to be more performant, especially with many file descriptors.

Ticker Plant uses epoll. No select() at all.

https://banu.com/blog/2/how-to-use-epoll-a-complete-example-in-c/ is a nice article with sample code of a TCP server.

  • bind(), listen(), accept()
  • main() function with an event loop. In the loop, epoll_wait() detects
    • new clients
    • new data on existing clients
    • (Using the timeout parameter, it could also react to timer events.)

I think this toy program is more readable than a real-world epoll server with thousands of lines.

incisive example showing diff: with^without extern-C

---- dummy8.c ----
#include <stdio.h> 
//without this "test", we could be using c++ compiler unknowingly 😦
int cfunc(){ return 123; }
---- dummy9.cpp ----
#include <iostream>
extern "C" // Try removing this line and see the difference
  int cfunc();
int main(){std::cout << cfunc() <<std::endl; }

Above is complete source of a c++ application using a pre-compiled C function. It shows the need for extern-C.

/bin/rm -v *.*o *.out
### 1a
g++ -v -c dummy8.c # g++ compiles even a .c file as C++
objdump --syms dummy8.o # would show mangled function name _Z5cfuncv
### 1b
gcc -v -x c -c dummy8.c # Without the -x c, we could end up with c++ compiler 😦
objdump --syms dummy8.o # would show unmangled function name "cfunc"
### 2
g++ -v dummy8.o dummy9.cpp  # link the dummy8.o into executable

# The -v flag reveals the c vs c++ compiler versions 🙂
### 3
./a.out

So that’s how to compile and run it. Note you need both a c compiler and a c++ compiler. If you only use a c++ compiler, then you won’t have any pre-compiled C code. You can still make the code work, but you won’t be mixing C and C++ and you won’t need extern-C.

My goal is not merely “make the code work”. It’s very easy to make the code work if you have full source code. You won’t need extern-C. You have a simpler alternative — compile every source file in c++ after trivial adjustments to #include.

c++dynamicLoading^dynamicLinking^staticLinking, basics

https://en.wikipedia.org/wiki/Dynamic_loading

*.so and *.dll files are libraries for dynamic linking.
*.a and *.lib files are libraries for static linking.

“Dynamic loading” allows an executable to start up in the absence of these libraries and integrate them at run time, rather than at link time.

You call the dlopen(“path/to/some.so”) library function (not a system call), then dlsym() to look up a symbol by name. In Windows it’s LoadLibrary(“path/to/some.dll”), then GetProcAddress().

low-complexity topics #eg:GC/socket

Java GC is an example of a “low-complexity domain”: isolated knowledge pearls. (Complexity would be high if you delved into the implementation.)

Other examples

  • FIX? Slightly more complex when you need to debug source code. Java GC has no “source code” for us.
  • tibrv set-up
  • socket programming? relatively small number of variations and combinations.
  • stateless feed parser against an exchange spec. Can become more complex when the code size increases.

C++build error: declared but undefined variable

I sometimes declare a static field in a header, but fail to define it (i.e. give it storage). It compiles fine and may even link successfully. When you run the executable, you may hit

error loading library /home/nysemkt_integrated_parser.so: undefined symbol: _ZN14arcabookparser6Parser19m_srcDescriptionTknE

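Note this is a shared library. By default the linker allows undefined symbols in a shared library and leaves them to be resolved at load time, which is why the link succeeded.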
Note the field name is mangled. You can un-mangle it using c++filt:

c++filt _ZN14arcabookparser6Parser19m_srcDescriptionTknE -> arcabookparser::Parser::m_srcDescriptionTkn

According to Deepak Gulati, the binary files only contain mangled names. The linker and all subsequent programs deal exclusively with mangled names.

If no code uses this field, the undefined variable will not bother you at all! I think that’s because no compiled code references the symbol, so there is nothing for the linker or loader to resolve.

OPRA: name-based sharding by official feed provider

https://www.opradata.com/specs/48_Line_Notification_Common_IP_Multicast_Specification.pdf shows 48 multicast groups, each carrying a subset of the option symbols. When there were only 24 groups, the symbols from RU to SMZZZ were too voluminous for one multicast group, more so than any of the other 23 groups.

Our OPRA parser instance is simple and efficient (probably stateless) so presumably capable of handling multiple OPRA multicast groups per instance. We still use one parser per MC group for simplicity and ease of management.

From: Chen, Tao
Sent: Tuesday, May 30, 2017 8:24 AM
To: Tan, Victor
Subject: RE: OPRA feed volume

OPRA data is provided by SIAC (Securities Industry Automation Corporation). The data is disseminated on 53 multicast channels. TP runs 53 parser instances and 48 rebus instances across 7 servers to handle it.