Locating a message]binary feed#multiple issues solved

Hi guys, thanks to all your help, I managed to locate the very first trading session message in the raw data file.

We hit and overcame multiple obstacles in this long “needle search in a haystack”.

· Big Obstacle 1: endian-ness. It turned out the raw data is little-endian. For my “needle”, the symbol id 15852 (in decimal)/3dec(in hex) is printed swapped as “ec3d” when I finally found it.

Solution: the exchange spec mentions this.

· Big Obstacle 2: my hex viewers (like “xxd”) adds line breaks to the output, so my needle can be missed by my search. (Thanks to Vishal for pointing this out.)

Solution: xxd -c 999999 raw/feed/file > tmp.txt; # then grep in tmp.txt

The default xxd column size is 16 so every 16 bytes output will get a line break — unwanted! So I set a very large column size.

· Obstacle 3: identify the data file among 20 files. Thanks to this one obstacle, I spent most of my time searching in the wrong files L

Solution: remove each file successively, starting from the later hours, and retest, until the needle stops showing. The last removed file must contain needle. That file is a much smaller haystack.

o one misleading info is the “9.30 am” mentioned in the spec. Actually the message came long before that.

o Another misleading info is the timestamp passed to my function. Not sure where it comes from, but it says 08:00:00.1 am, so I thought the needle must be in the 8am-8.30am file, but actually, it is in the 4am-4.30am file

· Obstacle 4: my “needle” was too short so there are too many useless matches.

Solution: find a longer and more unique needle, such as the SourceTime field, which is a 32-bit integer. When I convert it to hex digits I get 8 hex digits. Then I flip it due to endian-ness. Then I get a more unique needle “008e0959”. I was then able to search in all data files:

for f in arca*0; do

xxd -c999999 -p $f > $f.hex

grep -ioH 008e0959 $f.hex && echo found in $f

done

· Obstacle 5: it’s easy to print out wrong hex representation using C/C++, so for most of this exercise I wasn’t sure if I was looking at correct hex values in my c++ log

o If you convert a long byte array to hex and print without whitespace, you could see 15002100ffffe87600,but when I added a space after each byte, it looks like 15 00 21 00 ffffe876 00, so the 3rd byte was overflowing without

o If you forget padding, then you can see a lot of single “0” when you should get “00”. Again, if you don’t include white space you won’t notice.

o

Solution: I have worked out some simplified code that works. I have a c++ solution and c solution. You can ask me if you need it.

· Obstacle 6: In some cases, sequence number is not in the raw feed. In this case the sequence number is in the feed, so Nick’s suggestion was valid, but I was blocked by other obstacles.

If sequence number is in the feed, you would see incrementing hex numbers periodically in the hex viewer (but shown in little-endian)

Victor.Tan@theICE.com 100 Hillside Ave, White Plains, NY 10603

*******************************************************
This message may contain confidential information and is intended for specific recipients unless explicitly noted otherwise. If you have reason to believe you are not an intended recipient of this message, please delete it and notify the sender. This message may not represent the opinion of Intercontinental Exchange, Inc. (ICE), its subsidiaries or affiliates, and does not constitute a contract or guarantee. Unencrypted electronic mail is not secure and the recipient of this message is expected to provide safeguards from viruses and pursue alternate means of communication where privacy or a binding message is desired.
*******************************************************

bash script show`trap,read,waiting for gdb-attach

#1 valuable feature is the wait for gdb to attach, before unleashing the data producer.

#2 signal trap. I don’t have to remember to kill off background processes.

# Better source this script. One known benefit -- q(jobs) command would now work

sigtrap(){
  echo Interrupted
  kill %1 %2 %3 %4 # q(jobs) can show the process %1 %2 etc
  set -x
  trap - INT
  trap # show active signal traps
  sleep 1
  jobs
  set +x
}

set +x
ps4 # my alias to show relevant processes
echo -e "\njobs:"
jobs
echo -en "\nContinue? [y/any_other_key] "
unset REPLY; read $REPLY
[ "$REPLY" = "y" ] || return

trap "sigtrap" INT # not sure what would happen to the current cmd and to the shell

base=/home/vtan//repo/tp/plugins/xtap/nysemkt_integrated_parser/
pushd $base
make NO_COMPILE=1 || return
echo '---------------------'
popd
/bin/rm --verbose $base/*_vtan.*log /home/vtan/nx_parser/working/CSMIParser_StaticDataMap.dat

set -x

#If our parser is a client to rebus server, then run s2o as a fake rebus server:
s2o 40490|tee $base/2rebus_vtan.bin.log | decript2 ctf -c $base/etc/cdd.cfg > $base/2rebus_vtan.txt.log 2>&1 &

#if our parser is a server outputing rtsd to VAP, then run c2o as a fake client:
c2o localhost 40492|tee $base/rtsd_vtan.bin.log | decript2 ctf -c $base/etc/cdd.cfg > $base/rtsd_vtan.txt.log 2>&1 &

# run a local xtap process:
$base/shared/tp_xtap/bin/xtap -c $base/etc/test_replay.cfg > $base/xtap_vtan.txt.log 2>&1 &
#sleep 3; echo -en "\n\n\nDebugger ready? Start pbflow? [y/any_other_key] "
#unset REPLY; read $REPLY; [ "$REPLY" = "y" ] || return

# playback some historical data, on a multicast port:
pbflow -r999 ~/captured/ax/arcabookxdp1-primary 224.0.0.7:40491 &

set +x
jobs
trap