[18] latency in a typical sell-side DMA box: 10micros

(This topic is not GTD not zbs, but relevant to some QQ interviewers.)

https://www.youtube.com/watch?v=BD9cRbxWQx8 is a 2018 presentation.

  1. AA is when a client order hits a broker
  2. Between AA and BB is the entire broker DMA engine in a single process, which parses client order, maintains order state, consumers market data and creates/modifies the outgoing FIX msg
  3. BB is when the broker ships the FIX msg out to exchange.

Edge-to-edge latency from AA to BB, if implemented in a given language:

  • python ~ about 50 times longer than java
  • java – can aim for 10 micros if you are really really good. Dan recommends java as a “reasonable choice” iFF you can accept 10+ micros. Single-digit microsecond shops should “take a motorbike not a bicycle”.
  • c# – comparable to java
  • FPGA ~ about 1 micro
  • ASIC ~ 400 ns

— c/c++ can only aim for 10 micros … no better than java.

The stronghold of c++, the space between java and fpga, is shrinking … “constantly” according to Dan Shaya. I think “constantly” is like the growth of Everest.. perhaps by 2.5 inches a year

I feel c++ is still much easier, more flexible than FPGA.

I feel java programming style would become more unnatural than c++ programming in order to compete with c++ on latency.

— IPC latency

Shared memory beats TCP hands down. For an echo test involving two processes:

Using an Aeron-based messaging application, 50th percentile is 250 ns. I think NIC and possibly kernel (not java or c++) are responsible for this latency.

sponsored DMA

Context — a buy-side shop (say HRT) uses a DMA connection sponsored by a sell-side like MS (or Baml or Instinet) to access NYSE. MS provides a DMA platform like Speedway.

The HRT FIX gateway would implement the NYSE FIX spec. Speedway also has a FIX spec for HRT to implement. This spec should include minor customization on the NYSE spec.

I have seen the HPR spec. (HPR is like an engine running in Baml or GS or whatever.) HPR spec seems to talks about customization for NYSE, Nsdq etc …re Gary chat.

Therefore, the HRT FIX gateway to NYSE must implement, in a single codebase,

  1. NYSe spec
  2. Speedway spec
  3. HPR spec
  4. Instinet spec
  5. other sponsors’ spec

The FIX session would be provided (“sponsored”) by MS or Baml, or Instinet. I think the HRT FIX gateway would connect to some IP address belonging to the sponsor like MS. Speedway would forward the FIX messages to NYSE, after some risk checks.

too many DB-writes: sharding insufficient #Indeed

Context: each company receives many many reviews. In a similar scenario, we can say that too many user comments flood in during the soccer world cup.

Aha — the updates don’t need to show up on browser in strict order

Aha — commenting user only need to see her own comment on top, and need not see other users’ latest comments. The comments below her own could be browser-cached content. Ajax…

Interviewer: OK you said horizontal sharding by company id can address highly concurrent data store updates. Now what if one company, say, Amazon, by itself gets 20% of the updates so sharding can’t help this one company.

me: I guess the update requests would block and possibly time out. We can use a task queue.

This is similar to WordPress import/export requests.

  • Each task takes a few seconds, so we don’t want user to wait.
  • If high server load, the wait could be longer.
  • More important — this task need not be immediate. It can be scheduled.
  • We should probably optimize for server efficiency rather than latency or throughput

So similarly, all the review submissions on Amazon can be queued and scheduled.

In FIX protocol, an order sender can receive 150=A meaning PendingNew. I call it “queued for exchange”.  The OMS Server sends the acknowledgement in two steps.

  1. Optional. Execution Report message on order placement in OMS Server’s Order Book. ExecType (150) = A (Pending New)
  1. Execution Report message on receiving acknowledgement from exchange. ExecType (150) = 0 (New). I guess this indicates placement in exchange order book

Note this optimization is completely different from HFT latency engineering.

late fill: fills after order is closed

In FIX protocol, the concept of “closed order” is not so well defined IMO — I would say an order is closed after it is cancelled, expired, fully filled, rejected, replaced, DFD (closed for current day), , ,

Broker can send exec report on a closed order, known as a late fill, usually as a protocol violation.

As a buy-side, is it OK to ignore the late fill? Is it OK to pass on the late fill to other systems?

mktData direct^Reuter feeds: realistic ibank set-up

Nyse – an ibank would use direct feed to minimize latency.

For some newer exchanges – an ibank would get Reuters feed first, then over a few years replace it with direct feed.

A company often gets duplicate data feed for the most important feeds like NYSE and Nasdaq. The rationale is discussed in [[all about hft]]

For some illiquid FX pairs, they rely on calculated rates from Reuters. BBG also calculate a numerically very similar rate, but BBG is more expensive. Bbg also prohibits real time data redistribution within the ibank.

FIX^tibrv: order flow

Background — In a typical sell-side an buy/sell order goes through multiple nodes of a flow chain before it goes out to the trading venue. The order message is payload in a messaging platform.

FIX and Tibrv (and derivatives) are dominant and default choices as messaging platforms. Both are highly adaptable to complex combinations of requirements.

TCP UDP transport
unicast multicast pub/sub fan-out
designed for low-latency eq trading popular for bond trading latency
order msg gets modified along the flow chain ? editable payload

restatement in FIX: triggers

http://fixwiki.org/fixwiki/ExecutionReport — represents an ExecutionRpt sent by the sellside communicating a change in the order (or a restatement of the order”s parameters) WITHOUT an electronic [1] request from the buyside.

“unsolicited order replace”

[1] this exec report can be triggered by a non-electronic request from buyside

A less common trigger — repricing of orders by the sellside such as making Sell Short orders compliant with uptick / downtick rules

DoneForDay report message

https://www.onixs.biz/fix-dictionary/4.2/app_d.html shows many comparable scenarios. D2 is a typical usage of DFD

I think a DFD report is sent (from venue to trader) for each partially filled or unfilled but closed order. Exchange sends an exec report with 39=10 to indicate DFD i.e. Order not completed but no more fills.

Someone said online — DFD (done for day) is a concept that applies to an order, not a stock.


new orderId is generated for order-replace, !! order-modify

Click to access XDP_Integrated_Feed_Client_Specification_v2.1g.pdf

shows that NYSE matching engine has specific business logic for order-repace vs order-modify

  • order-modify reuses existing orderId
  • order-replace would create a new orderId — The “sitting” order must be removed from the book and replaced with the new order.

My friend Ashish said in more than one forex ECN’s,  order update always generates new orderId. He actually implemented the order update business logic in FIX clients that connect to the ECN’s.

overcoming exchange-FIX-session throughput limit

Some exchanges (CME?) limits each client to 30 orders per second. If we have a burst of order to send , I can see two common solutions A) upstream queuing B) multiple sessions

  1. upstream queuing is a must in many contexts. I think this is similar to TCP flow control.
    • queuing in MOM? Possible but not the only choice
  2. an exchange can allow 100+ FIX sessions for one big client like a big ibank.
    • Note a big exchange operator like nsdq can have dozens of individual exchanges.

Q: is there any (sender self-discipline) flow control in intranet FIX?
A: not needed.

## 35+150 in FIX

— FIX Tag 35:
35=8 Execution Report
35=D Order – Single
35=9 Order Cancel Reject
35=F Order Cancel Request
35=G Order Replace Request
35=3 means rejection response due to session level errors, or “protocol-level” reject
35=9 means rejection response to a cancellation/replacement request. ‘UnableToHonorYourCancel’

— FIX tag 150:
150=0 new
150=2 fill
150=5 replace
150=6 pending cancel
150=8 Rejected. There should be a reason. Note 35 is also 8 in this case
150=A pending new — To my surprise, this one also come with 35=8 !

ExecType[150] to complement 35=8

Tag 150 shows the execution report type, so it only (and always, according to Rahul) accompanies 35=8 not other 35=* messages.

With 35=8 alone, we won’t know what type report this is.

The first execution report we get on a given order should show 150=0 i.e. new, not a fill. I believe only the real exchange (not the intermediate routing nodes) would send this message.

I think sometimes we get 35=8;150=A i.e. pending-new, presumably before 150=0. Why? I don’t think we need to bother now.

FIX^MOM: exchange connectivity

The upshot — some subsystem use FIX not MOM; some subsystem use MOM not FIX. A site could send FIX over MOM (such as RV) but not common.

It’s important to realize FIX is not a type of MOM system. Even though FIX uses messages, I believe they are typically sent over persistent TCP sockets, without middle-ware or buffering/queuing. RTS is actually a good example.

Q: does FIX have a huge message buffer? I think it’s small, like TCP, but TCP has congestion control, so sender will wait for receiver.

Say we (sell-side or buy-side) have a direct connectivity to an exchange. Between exchange and our exterior gateway, it’s all sockets and possibly java NIO — no MOM. Data format could be FIX, or it could be the exchange’s proprietary format — here’s why.

The exchange often has an internal client-server protocol — the real protocol and a data flow “choke point” . All client-server request/response relies solely on this protocol. The exchange often builds a FIX translation over this protocol (for obvious reasons….) If we use their FIX protocol, our messages are internally translated from FIX to the exchange proprietary format. Therefore it’s obviously faster to directly pass messages in the exchange proprietary format. The exchange offers this access to selected hedge funds.

As an espeed developer told me, they ship a C-level dll (less often *.so [1]) library (not c++, not static library), to be used by the hedge fund’s exterior gateway. The hedge fund application uses the C functions therein to communicate with the exchange. This communication protocol is possibly proprietary. HotspotFX has a similar client-side library in the form of a jar. There’s no MOM here. The hedge fund gateway engine makes synchronous function calls into the C library.

[1] most trader workstations are on win32.

Note the dll or jar is not running in its own process, but rather loaded into the client process just like any dll or jar.

Between this gateway machine and the trading desk interior hosts, the typical communication is dominated by MOM such as RV, 29West or Solace. In some places, the gateway translates/normalizes messages into an in-house standard format.

y FIX needs seqNo over TCP seqNo

My friend Alan Shi said … Suppose your FIX process crashed or lost power, reloaded (from disk) the last sequence received and reconnected (resetting tcp seq#). It would then receive a live seq # higher than expected. This could mean some executions reports were missed. If exchange notices a sequence gap, then it could mean some order cancellation was missed.  Both scenarios are serious and requires request for resend. CME documentation states:

… a given system, upon detecting a higher than expected message sequence number from its counterparty, requests a range of ordered messages resent from the counterparty.

Major difference from TCP sequence number — FIX specifies no Ack though some exchange do. See Ack in FIX^TCP

Q; how could FIX miss messages given TCP reliability? I guess tcp session disconnect is the main reason.

https://kb.b2bits.com/display/B2BITS/Sequence+number+handling has details:

  • two streams of seq numbers, each controlled by exch vs trader
  • how to react to unexpected high/low values received. Note “my” outgoing seq is controlled by me hence never “unexpected”
  • Sequence number reset policy — After a logout, sequence numbers is supposed to reset to 1, but if connection is terminated ‘non-gracefully’ sequence numbers will continue when the session is restored. In fact a lot of service providers (eg: Trax) never reset sequence numbers during the day. There are also some, who reset sequence numbers once per week, regardless of logout.


binary data ] FIX

It’s possible to “embed” arbitrary binary data in a FIX tag-value pair. However, the parser has to know where it ends. Therefore, the “value” consist of a length and a binary payload.

Q: Any example?

Q: can we send a image?

Q: can we send a document?

Q: Encryption? Need to read more

crossed orderbook detection@refresh #CSY

At least one interviewer asked me — in your orderbook replication (like our rebus), how do you detect a crossed orderbook? I have an idea for your comment.

Rebus currently has a rule to generate top-book-marker. Rebus puts this token on the best bid (+ best ask) whenever a new top-of-book bid emerges.

I think rebus can have an additional rule to check the new best bid against the reining best ask. If the new best bid is ABOVE the best ask, then rebus can attach a “crossed-orderbook-warning” flag to the top-book-marker. This way, downstream gets alerted and have a chance to take corrective action such as removing the entire orderbook or blocking the new best bid until rebus sends a best-bid without this warning.

Such a warning could help protect the customer’s reputation and integrity of the data.

retrans: FIX^TCP^xtap

The FIX part is very relevant to real world OMS.. Devil is in the details.

IP layer offers no retrans. UDP doesn’t support retrans.



TCP FIX xtap
seq# continuous no yes.. see seq]FIX yes
..reset automatic loopback managed by application seldom #exchange decision
..dup possible possible normal under bestOfBoth
..per session per connection per clientId per day
..resumption? possible if wire gets reconnected quickly yes upon re-login unconditional. no choice
Ack positive Ack needed only needed for order submission etc not needed
gap detection sophisticated every gap should be handled immediately since sequence is critical. Out-of-sequence is unacceptable. gap mgr with timer
retrans sophisticated receiver(ECN) will issue resend request; original sender to react intelligently gap mgr with timer
Note original sender should be careful resending new orders.
See https://drivewealth-fix-api.readme.io/docs/resend-request-message


session^msg^tag level concepts

Remember every session-level operation is implemented using messages.

  • Gap management? Session level
  • Seq reset? Session level, but seq reset is a msg
  • Retrans? per-Msg
  • Checksum? per-Msg
  • MsgType tag? there is one per-Msg
  • Header? per-Msg
  • order/trade life-cycle? session level
  • OMS state management? None of them …. more like application level than Session level

##order-book common data structureS #array

Rebus has two levels of trees, according to Steve.

  1. For each feed there’s a separate symbol lookup tree — one would think a hashtable would be faster but given our symbol data size (a few thousands per feed) AVL tree is proven faster.
  2. A per-symbol bid-tree (and ask-tree) sorted by price


A common interview question. http://fixmeisters.blogspot.com/2006/04/possdupflag-or-possresend-this.html has details.

msg with PossDupFlag msg with PossResend
handled by FIX engine application to check many details. Engine can’t help.
seq # old seq # new seq #
triggered by retrans request not automatically
generated by FIX engine not automatically
reactive action by engine proactive decision by application

FIX.5 + FIXT.1 breaking changes

I think many systems are not yet using FIX.5 …

  1. FIXT.1 protocol [1] is a kinda subset of the FIX.4 protocol. It specifies (the traditional) FIX session maintenance over naked TCP
  2. FIX.5 protocol is a kinda subset of the FIX.4 protocol. It specifies only the application messages and not the session messages.

See https://www.fixtrading.org/standards/unsupported/fix-5-0/. Therefore,

  • FIX4 over naked TCP = FIX.5 + FIXT
  • FIX4 over non-TCP = FIX.5 over non-TCP

FIX.5 can use a transport messaging queue or web service, instead of FIXT

In FIX.5, Header now has 5 mandatory fields:

  • existing — BeginString(8), BodyLength(9), MsgType(35)
  • new — SenderCompId(49), TargetCompId(56)

Some applications also require MsgSeqNo(34) and SendingTime(52), but these are unrelated to FIX.5

Note BeginString actually look like “8=FIXT1.1

[1] FIX Session layer will utilize a new version moniker of “FIXTx.y”

FIX over non-TCP

based on https://en.wikipedia.org/wiki/Financial_Information_eXchange…

Background — The traditional FIX transport is naked TCP. I think naked TCP is most efficient.


Some adaptations of FIX can use messaging queue or web services. I think these are built on top of TCP.

MOM would add latency.


FIX is traditionally used for trading. In contrast, (high volume) market data usually go over multicast for efficency. However, there are now  adaptations (like FAST FIX) to send market data over multicast.

checksum[10] value is ascii

https://www.onixs.biz/fix-dictionary/4.2/tagNum_10.html asserts the value has 3 characters, always unencrypted.

In contrast, some non-FIX protocols use an int16 binary integer, which is more efficient.

However, very few protocols would use 10 bits to represent a 3-digit natural number. The only usage I can imagine is a bit-array like 24 bits used to combine 5 different fields, so bit 0 to 9 could be clientId…

SOH can be embedded in binary field value

Usually, a FIX tag/value “field” is terminated by the one-byte SOH character, but if the value is binary, then SOH can be embedded in it at the beginning or the END (resulting in two SOH in a row?)…

https://stackoverflow.com/questions/4907848/whats-the-most-efficient-way-to-parse-fix-protocol-messages-in-net confirms SOH can be embedded.

https://www.ibm.com/support/knowledgecenter/en/SSVSD8_8.3.0/com.ibm.websphere.dtx.packfix.doc/concepts/c_pack_fix_message_structur.htm says in a binary field, there must be a length

crossed orderbook mgmt: real story4IV

disappears after a while; Some days better some days worse

data quality visible to paying customers

–individual contribution?
not really, to be honest.

–what would you do differently?
Perhaps detect and set a warning flag when generating top-book token? See other post on “crossed”


  • query the exchange if query is supported – many exchanges do.
  • compare across ticker plants? Not really done in our case.
  • replay captured data for investigation
  • retrans as the main solution
  • periodic snapshot feed is supported by many exchanges, designed for late-starting subscribers. We could (though we don’t) use it to clean up our crossed orderbook
  • manual cleaning via the cleaner script, as a “2nd-last” resort
  • hot failover.. as last resort

–the cleaner script:

This “depth-cleaner” tool is essentially a script to delete crossed/locked (c/l) entries from our replicated order books. It is run by a user in response to an alert.

… The script then compares the Ask & Bid entries in the order book. If it finds a crossed or locked order, where the bid price is greater than (crossed) or equal to (locked) the ask price, it writes information about that order to a file. It then continues checking entries on both sides of the book until it finds a valid combination. Note that the entries are not checked for “staleness”. As soon as it finds non-crossed/locked Ask and Bid entries, regardless of their timestamps, it is done checking that symbol.

The script takes the entries from the crossed/locked file, and creates a CIM file containing a delete message for every order given. This CIM data is then sent to (the admin port of) order book engine to execute the deletes.

Before the cleaner is invoked manually, we have a scheduled scanner for crossed books. This scanner checks every symbol once a minute. I think it uses a low-priority read-only thread.

FIX heartBtInt^TCP keep-alive^XDP heartbeat^xtap inactivity^MSFilter hearbeat

When either end of a FIX connection has not Sent any data for [HeartBtInt] seconds, it should send a Heartbeat message. This is first layer of defense.

When either end of the connection has not Received any data for (HeartBtInt + “some reasonable transmission lag”) seconds, it will send a TestRequest probe message to verify the connection. This is 2nd layer of defense. If there is still no message received after (HeartBtInt + “some reasonable transmission lag”) seconds then the connection should be considered lost.

Heartbeats issued as a response to TestRequest must contain the TestReqID transmitted in the TestRequest. This is useful to verify that the Heartbeat is the result of the TestRequest and not as the result of a HeartBtInt timeout. If a response doesn’t have the expected TestReqId, I think we shouldn’t ignore it. If everyone were to ignore it, then TestReqId would be a worthless feature added to FIX protocol.

FIX clients should reset the heartbeat interval timer after every transmitted message (not just heartbeats).

TCP keep-alive is an optional, classic feature.

NYSE XDP protocol uses heartbeat messages in both multicast and TCP request server. The TCP heartbeat requires client response, similar to FIX probe message.

Xtap also has an “inactivity timeout”. Every incoming exchange message (including heartbeats) is recognized as “activity”. 60 sec of Inactivity triggers an alert in the log, the log watchdog…

MS Filter framework supports a server-initiated heartbeat, to inform clients that “I’m alive”.  An optional feature — heartbeat msg can carry an expiration value a.k.a dynamic frequency value like “next heartbeat will arrive in X seconds”.

#112=testRequest id. There can be many test requests, each with an id.

%%FIX xp

Experience: special investment upfront-commission. Real time trading app. EMS. Supports
* cancel
* rebook
* redemption
* sellout

Experience: Eq volatile derivatives. Supports C=amend/correction, X=cancel, N=new

FIX over RV for market data dissemination and cache invalidation

Basic FIX client in MS coding interview.

Ack in FIX #TCP

TCP requires Ack for every segment? See cumulative acknowledgement in [[computerNetworking]]

FIX protocol standard doesn’t require Ack for every message.

However, many exchanges spec would require an Ack message for {order submission OR execution report}. At TrexQuant design interview, I dealt with this issue —

  • –trader sends an order and expecting an Ack
  • exchange sends an Ack and expecting another Ack … endless cycle
  • –exchange sends an exec report and expecting an Ack
  • trader sends an Ack waiting for another Ack … endless cycle
  • –what if at any point trader crashes? Exchange would assume the exec report is not acknowledged?

My design minimizes duty of the exchange —

  • –trader sends an order and expecting an Ack
  • exchange sends an Ack and assumes it’s delivered.
  • If trader misses the Ack she would decide either to resend the order (with PossResend flag) or query the exchange
  • –exchange sends an exec report and expecting an Ack. Proactive resend (with PossResend) when appropriate.
  • trader sends an Ack and assumes it’s delivered.
  • If exchanges doesn’t get any Ack it would force a heartbeat with TestRequest. Then exchange assumes trader is offline.

##mkt-data jargon terms: fakeIV

Context — real time high volume market data feed from exchanges.

  • Q: what’s depth of market? Who may need it?
  • Q: what’s the different usages of FIX vs binary protocols?
  • Q: when is FIX not suitable? (market data dissemination …)
  • Q: when is binary protocol not suitable? (order submission …)
  • Q: what’s BBO i.e. best bid offer? Why is it important?
  • Q: How do you update BBO and when do you send out the update?
  • Q[D]: how do you support trade-bust that comes with a trade id? How about order cancellation — is it handled differently?
  • Q: how do you handle an order modification?
  • Q: how do you handle an order replacement?
  • Q: How do you handle currency code? (My parser sends the currency code for each symbol only once, not on every trade)
  • Q: do you have the refresh channel? How can it be useful?
  • Q[D]: do you have a snapshot channel? How can it be useful?
  • Q: when do you shutdown your feed handler? (We don’t, since our clients expect to receive refresh/snapshots from us round the clock. We only restart once a while)
  • Q: if you keep running, then how is your daily open/close/high/low updated?
  • Q: how do you handle partial fill of a big order?
  • Q: what’s an iceberg order? How do you handle it?
  • Q: what’s a hidden order? How do you handle it?
  • Q: What’s Imbalance data and why do clients need it?
    • Hint: they impact daily closing prices which can trigger many derivative contracts. Closing prices are closely watched somewhat like LIBOR.
  • Q: when would we get imbalance data?
  • Q: are there imbalance data for all stocks? (No)
  • ——— symbol/reference data
  • Q[D]: Do you send security description to downstream in every message? It can be quite long and take up lots of bandwidth?
  • Q: what time of the day do symbol data come in?
  • Q5[D]: what essential attributes are there in a typical symbol message?
  • Q5b: Can you name one of the key reasons why symbol message is critical to a market data feed?
  • Q6: how would dividend impact the numbers in your feed? Will that mess up the data you send out to downstream?
  • Q6b[D]: if yes how do you handle it?
  • Q6c: what kind of exchanges don’t have dividend data?
  • Q: what is a stock split? How do you handle it?
  • Q: how are corporate actions handled in your feed?
  • Q: what are the most common corporate actions in your market data? (dividends, stock splits, rights issue…)
  • ——— resilience, recovery, reliability, capacity
  • Q: if a security is lightly traded, how do you know if you have missed some updates or there’s simply no update on this security?
  • Q25: When would your exchange send a reset?
  • Q25b: How do you handle resets?
  • Q26[D]: How do you start your feed handler mid-day?
  • Q26b: What are the concerns?
  • Q[D]: how do you handle potential data loss in multicast?
  • Q[D]: how do you ensure your feed handler can cope with a burst in messages in TCP vs multicast?
  • Q[D]: what if your feed handler is offline for a while and missed some messages?
  • Q21: Do you see gaps in sequence numbers? Are they normal?
  • Q21b: How is your feed handler designed to handle them?
  • Q[D]: does your exchange have a primary and secondary line? How do you combine both?
  • Q[D]: between primary and secondary channel, if one channel has lost data but the other channel did receive the data, how does your system combine them?
  • Q[D]: how do you use the disaster recovery channel?
  • Q[D]: If exchange has too much data to send in one channel, how many more channels do they usually use, and how do they distribute the data among multiple channels?
  • Q: is there a request channel where you send requests to exchange? What kind of data do you send in a request?
  • Q: Is there any consequence if you send too many requests?
    Q: what if some of your requests appear to be lost? How does your feed handler know and react?

[D=design question]

complexities: replicate exchange order book #Level1+2

–For a generic orderbook, the Level 1 complexity is mostly about trade cancel/correct.
All trades must be databased (like our TickCache). In the database, each trade has a trade Id but also an arrival time.

When a trade is canceled by TradeId, we need to regenerate LastPrice/OpenPrice, so we need the ArrivalTime attribute.

VWAP/High/Low are all generated from TickCache.

The other complexity is BBO generation from Level 2.

–For a Level-based Level 2 order book, the complexity is higher than Level 1.

OrderId is usually required so as to cancel/modify.

MOM^sharedMem ring buffer^UDP : mkt-data transmission

I feel in most environments, the MOM design is most robust, relying on a reliable middleware. However, latency sensitive trading systems won’t tolerate the additional latency and see it as unnecessary.

Gregory (ICE) told me about his home-grown simple ring buffer in shared memory. He used a circular byte array. Message boundary is embedded in the payload. When the producer finishes writing to the buffer, it puts some marker to indicate end of data. Greg said the consumer is slower, so he makes it a (periodic) polling reader. When consumer encounters the marker it would stop reading. I told Gregory we need some synchronization. Greg said it’s trivial. Here are my tentative ideas —

Design 1 — every time the producer or the consumer starts it would acquire a lock. Coarse-grained locking

But when the consumer is chipping away at head of the queue, the producer can simultaneously write to the tail, so here’s

Design 2 — the latest message being written is “invisible” to the consumer. Producer keeps the marker unchanged while adding data to the tail of queue. When it has nothing more to write, it moves the marker by updating it.

The marker can be a lock-protected integer representing the index of the last byte written.

No need to worry about buffer capacity, or a very slow consumer.

MOM UDP multicast or TCP or UDS shared_mem
how many processes 3-tier 2-tier 2-tier
1-to-many distribution easy easiest doable
intermediate storage yes tiny. The socket buffer can be 256MB yes
producer data burst supported message loss is common in such a situation supported
async? yes yes, since the receiver must poll or be notified I think the receiver must poll or be notified
additional latency yes yes minimal

TimeInForce [59] FIX

0 = Day (or session)
1 = Good Till Cancel (GTC)
2 = At the Opening (OPG)
3 = Immediate or Cancel (IOC)
4 = Fill or Kill (FOK)
5 = Good Till Crossing (GTX)
6 = Good Till Date
7 = At the Close

Many of these values are interesting. Most confusing is IOC ^ FOK. The IOC order can (but does not have to) be filled in full. If it cannot be filled immediately or fully, the pending part gets canceled (a partial execution is generated) by the market. It differs from a fill or kill order in that in fill or kill orders, partial executions are not accepted.

mkt-data tech skills: portable/shared@@

Raw mkt data tech skill is better than soft mkt data even though it’s further away from “the money”:

  • standard — Exchange mkt data format won’t change a lot. Feels like an industry standard
  • the future — most OTC products are moving to electronic trading and will have market data to process
  • more necessary than many modules in a trading system. However ….. I guess only a few systems need to deal with raw market data. Most down stream systems only deal with the soft market data.

Q1: If you compare 5 typical market data gateway dev [1] jobs, can you identify a few key tech skills shared by at least half the jobs, but not a widely used “generic” skill like math, hash table, polymorphism etc?

Q2: if there is at least one, how important is it to a given job? One of the important required skills, or a make-or-break survival skill?

My view — I feel there is not a shared core skill set. I venture to say there’s not a single answer to Q1.

In contrast, look at quant developers. They all need skills in c++/excel, BlackScholes, bond math, swaps, …

In contrast, also look at dedicated database developers. They all need non-trivial SQL, schema design. Many need stored procs. Tuning is needed if large tables

Now look at market data gateway for OPRA. Two firms’ job requirements will share some common tech skills like throughput (TPS) optimization, fast storage.

If latency and TPS requirements aren’t stringent, then I feel the portable skill set is an empty set.

[1] There are also many positions whose primary duty is market data but not raw market data, not large volume, not latency sensitive. The skill set is even more different. Some don’t need development skill on market data — they only configure some components.

FIX support for multi-leg order # option strategy

  • There are multi-leg FIX orders to trade two stocks without options
    • I believe there’s no valid multi-leg order to trade the same stock using market orders. The multiple legs simply collapse to one. How about two-leg limit orders? I would say Please split into two disconnected limit orders.
  • There are multi-leg FIX orders to trade two futures contracts
  • There are multi-leg FIX orders to trade 3 products like a butterfly.
  • There are multi-leg FIX orders with a special ratio like an option “ratio spread”

But today I show a more common multi-leg strategy order — buy 5 buy-write with IBM. At least four option exchanges support multi-leg FIX orders — Nasdaq-ISE / CBOE / ARCA / Nasdq-PHLX

38=5; // quantity = 5 units
555=2; // how many legs in this order. This tag precedes a repeatingGroup!
654=Leg1; //leg id for one leg
600=IBM; // same value repeated later!
608=OC; //OC=call option
610=201007; //option expiry
612=85; //strike price
624=2; //side = Sell
600=IBM; …

Note that in this case, the entire strategy is defined in FIX, without reference to some listed symbol or productID.

## 4 exchange feeds +! TCP/multicast

  • eSpeed
  • Dalian Commodity Exchange
  • SGX Securities Market Direct Feed
  • CanDeal (http://www.candeal.com) mostly Canadian dollar debt securities

I have evidence (not my imagination) to believe that these exchange data feeds don’t use vanilla TCP or multicast but some proprietary API (based on them presumably).

I was told other China feeds (probably Shenzhen feed) is also API-based.

ESpeed would ship a client-side library. Downstream would statically link or dynamically link it into parser. The parser then communicates to the server in some proprietary protocol.

IDC is not an  exchange but we go one level deeper, requiring our downstream to provide rack space for our own blackbox “clientSiteProcessor” machine. I think this machine may or may not be part of a typical client/server set-up, but it communicates with our central server in a proprietary protocol.

OPRA: name-based sharding by official feed provider

2008(GFC)peak OPRA msg rate — Wikipedia “low latency” article says 1 million updates per second. Note My NYSE parser can handle 370,000 messages per second per thread !

https://www.opradata.com/specs/48_Line_Notification_Common_IP_Multicast_Specification.pdf shows 48 multicast groups each for a subset of the option symbols. When there were 24 groups, the symbols starting with RU to SMZZZ were too heavy too voluminous for one multicast group, more so than the other 23 groups.

Our OPRA parser instance is simple and efficient (probably stateless) so presumably capable of handling multiple OPRA multicast groups per instance. We still use one parser per MC group for simplicity and ease of management.

From: Chen, Tao
Sent: Tuesday, May 30, 2017 8:24 AM
To: Tan, Victor
Subject: RE: OPRA feed volume

Opra data is provided by SIAC(securities industry automation corporation). The data is disseminated on 53 multicast channels. TP runs 53 instances of parser and 48 instances of rebus across 7 servers to handle it.

stale (replicated) orderbook

This page is intended to list ideas on possible ways of improving the orderbook replication system so that it can better determine when an instrument or instruments are stale. It further lists possible ways of automating retransmission of lost and/or stale data to improve the recovery time in the event of an outage.

The current system is largely “best effort”; at each level of the ticker-plant and distribution network components are capable of dropping data should a problem arise. There are few processes capable of correlating what was lost in a format that is useful to a customer. To account for the possibility of lost data, the orderbook replication components constantly generates “refresh” messages with the most up-to-date state of an instrument.

Although this system works well in practice, it leaves open the possibility that a customer may have a cache with “stale” data for an unbounded length of time. This inability to track staleness can be a point of concern for customers.

= Recovery types =

When a downstream component loses one or more update messages for an instrument it is generally safe to assume the instrument is stale. However there can be two different kinds of recoveries:

== Retransmit the latest snapshot ==

This method of retransmission and stale detection revolves around keeping the current tick snapshot database up to date. It is useful for customers that need an accurate tick cache. It may not be a full solution to customers that need an accurate time and sales database.

== Retransmit all lost ticks ==

It is also possible to retransmit all lost ticks after an outage. This is typically useful when trying to repair a time-and-sales database.

Although it is possible to build an accurate “current” snapshot record when all lost ticks are retransmitted, it is a very tedious and error-prone process. It is expected that customers will, in general, be unwilling to rebuild the “current” from the retransmission of lost ticks.

So, a scheme that involves retransmission of lost ticks will still likely require a scheme that retransmits the latest snapshot.

Most of the following discussions are centered around the concept of latest snapshot recovery.

= Gap prevention =

There may be simple ways to reduce the number of times gaps occur. This process could be called “gap prevention”.

In general, it is not possible to eliminate gaps, because severe outages and equipment failure can always occur. The process of gap prevention may be useful, however, where the alternative gap recovery solution is expensive or undesirable. It is also useful in systems that need full lost tick recovery.

There are two possible ways of preventing gaps from occurring. Both trade bandwidth and latency for increased reliability during intermittent outages.

== Wait for retransmit ==

The simplest form of gap prevention involves retransmitting any packets lost on the network. The sender keeps a buffer of recently sent messages, and the receiver can request retransmissions. In the event of packet loss, the receiver waits for the retransmissions before processing data.

This form of gap recovery is a basic feature of the TCP/IP transmission protocol.

== Forward error correction ==

It is also possible to prevent gaps by sending additional data on the feed.

The most basic form of this is used in the “best-of-both” system. It sends two or more copies of the data, and the receiver can fill lost ticks from the additional copies.

It is not necessary to send a full additional feed. For example, one could send a block of parity codes on every tenth packet. A receiver could then theoretically recover from up to ten percent packet loss by using the parity code packets.

Although the forward error correction scheme uses additional bandwidth, additional bandwidth may be available due to line redundancy.

= Snapshot recovery types =

In order to correct a stale instrument, it may be necessary to send the full contents of the instrument. When doing so, one may send them serialized in the real-time feed or out of order.

== In sequence snapshots ==

The simplest form of snapshot transmission involves using the same socket or pipe as the real-time feed itself. In this case, the receiver can simply apply the snapshot to its database; it does not need to handle the cases where the snapshot arrives after/before a real-time update arrives.

The downside of this scheme, however, is that a single upstream supplier of snapshots might get overloaded with requests for retransmissions. If additional distributed databases are used to offload processing, then each additional component will add latency to the real-time feed.

== Out of sequence snapshots ==

It is also possible to send snapshot transmissions using sockets and/or pipes separate from the real-time feed. The advantage of this scheme, is it is relatively cheap and easy to increase the number of distributed snapshot databases from which one can query. However, it requires the receiver of the snapshot to work harder when attempting to apply the response to its database.

One way to apply out of order snapshots is to build a “reorder buffer” into the receiver. This buffer would store the contents of the real-time feed. When a snapshot response arrives, the receiver could locate where the sender was in the real-time stream when it generated the snapshot (possibly by using sequence numbers). It can then apply the snapshot and internally replay any pending updates from the reorder buffer. In the case where a snapshot arrived that was based on real-time traffic that the receiver has yet to receive, the receiver must wait for that traffic to arrive before applying the snapshot.

This scheme is thought to be complex, resource intensive, and error-prone.

If, however, the feed were changed to eliminate the distributed business rules, it may be possible to implement a much simpler out-of-order snapshot application system. See [[Out of sequence snapshots idea]] for a possible implementation.

= Gap detection =

In order to accurately determine when an instrument is “stale”, it is necessary to be able to determine when one or more update messages have been lost. The following sections contains notes on various schemes that can provide this information.

Note that some of the schemes may be complementary. That is, a possible future solution might use parts of several of these methods.

== Sequence numbers with keep-alive on idle ==

The most common method to detect a gap involves placing a monotonically incrementing number on all outgoing messages or message blocks. The receiver can then detect a gap when a message (or message block) arrives with a sequence number that is not one greater than the last sequence number.

In order to account for the case where all messages from a source are lost, or where a source goes idle just after a message loss, the sender needs to arrange to transmit a “keep alive” indicator periodically when the source is otherwise idle. With knowledge of the keep-alive period, the receiver can detect a gap by timing out if it does not receive a message from a source within the specified period. The larger the period, the less keep-alive messages need to be sent when idle. However, it also increases the worst case time to detect a message gap.

It is possible for the sender to generate multiple sequence number series simultaneously by separating instruments into multiple categories. For example, the outgoing feed currently generates an independent sequence number for each “exchange”. At the extreme, it is possible to generate a sequence number stream per instrument; however this would increase the bandwidth due to larger number of keep-alive messages necessary. (One would also not be able to optimize bandwidth by sequencing only message blocks.)

When a sequence number gap is detected, the receiver must consider all instruments using the sequence series as suspect.

Only a single source can reliably generate a sequence number. If multiple ticker-plants generate a feed, they need to use different sequence series. If an upstream ticker-plant switch occurs, the receiver needs to mark the full range of affected instruments as suspect.

Finally, some exchanges provide sequenced data feeds. However, there are [[issues with exchange provided sequence numbers]]. Due to this, it may be difficult to rely on exchange sequencing as a basis for a distribution network sequencing.

== Sequence number of last message ==

A variant of the basic sequencing scheme involves the sequence number of the last message (SNLM) that updates an instrument. This field would be kept by both sender and receiver, and included with real-time messages. If SNLM matches the receiver’s record, implying that the receiver has not missed any updates for this instrument, then the instrument can transition from “suspect” to “clean”. Conversely, a non-match should force the instrument to “stale”.

An advantage of this scheme is that ordinary real-time traffic could reduce the number of suspect records after an outage. It may also make using exchange provided sequence numbers more practical.

As a disadvantage, however, it would require that both a sequence number and SNLM be provided on every real-time update. This might significantly increase bandwidth.

== Periodic message count checks ==

It is also possible to detect gaps if the sender periodically transmits an accounting of all messages sent since the last period. This scheme may use less bandwidth than sequence numbers, because it is not necessary to send a sequence number with every message (or message block).

The scheme still has the same limitations as sequence numbers when ticker-plant switches occur and when trying to determine what was lost when a gap occurs.

== Periodic hash checks ==

Another possible method of detecting gaps is by having the sender generate a hash of the contents of its database. The receiver can then compare the sender’s hash to the same hash generated for its database. If the two differ, a gap must have occurred. (If the two match, however, a gap may have occurred but already been corrected; this method is therefore not useful when full tick recovery is necessary.)

This scheme may be beneficial when ticker-plant switches occur. If two senders have identical databases and no data is lost during a network switch, then the hash checks should still match at the receiver. This scheme, however, still faces the problem of determining which instruments from the set are actually stale when a gap is detected.

Technically, it is possible that two databases could differ while sharing the same hash key. However, it is possible to choose a hash function that makes the possibility of this extremely small.

Finally, this system may face challenges during software upgrades and rollouts. If either the sender or the receiver change how or what they database, it may be difficult to maintain a consistent hash representation.

== Sender tells receiver of gaps ==

If a reliable transmission scheme (eg, tcp) is in use between the sender and receiver, then it may be possible for the sender to inform the receiver when the receiver is unable to receive some portion of the content.

For example, if a sender can not transmit a block of messages to a receiver because the receiver does not have sufficient bandwidth at the time of the message, then it is possible for the sender to make a note of all instruments that receiver was unable to receive. When the receiver has sufficient bandwidth to continue receiving updates, the sender can iterate through the list of lost instruments and inform the receiver.

The scheme has the advantage that it allows the receiver to quickly determine what instruments are stale. It may also be useful when a component upstream in the ticker-plant detects a gap – it can just push down the known stale messages to all components down-stream from it. (For example, an exchange parser might detect a gap and send a stale indicator downstream while it attempts to fill the gap from the exchange.)

As a disadvantage, it may significantly complicate data senders. It also does not help in cases where a receiver needs to change to a different sender.

== Receiver analyzes gapped messages ==

In some systems, the receiver may need to obtain all lost messages (eg, to build a full-tick database). If the receiver knows the contents of messages missing earlier in the stream it can determine which messages are stale. Every instrument that contains an update message in the list of missing messages would be stale; instruments that did not have update messages would be “clean”.

An advantage of this system is that it is relatively simple to implement for receivers that need full tick retransmissions.

However, in the general case, it is not possible to implement full tick retransmissions due to the possibility of hard failures and ticker-plant switches. Therefore this scheme would only be useful to reduce the number of stale instruments in certain cases.

Also, the cost of retransmitting lost ticks may exceed the benefits found from reducing the number of instruments marked stale. This makes the scheme less attractive for receivers that do not need all lost ticks retransmitted.

= Stale correction =

This section discusses possible methods of resolving “suspect conditions” that occur when it is detected that an instrument may have missed real-time update messages.

There are likely many other possible schemes not discussed here. It is also possible that a combination of one or more of these schemes may provide a useful solution.

These solutions center around restoring the snapshot database. Restoration of a tick history database is left for a separate discussion.

== Background refresh ==

The simplest method of clearing stale records is to have the ticker-plant generate a periodic stream of refresh messages. This is what the system currently does.

This system is not very good at handling intermittent errors, because it could take a very long time to refresh the full instrument database. However, if enough bandwidth is allocated, it is a useful system for recovering from hard failures where the downstream needs a full refresh anyway. It is also possible to combine this with one of the gap prevention schemes discussed above to help deter intermittent outages.


* simple to implement at both receiver and sender


* time to recovery can be large

* can be difficult to detect when an instrument should be deleted, or when an IPO should be added

== Receiver requests snapshot for all stale instruments ==

In this system, the receiver would use one of the above gap detection mechanisms to determine when an instrument may be stale. It then issues a series of upstream requests until all such instruments are no longer stale.

In order to reduce the number of requests during an outage, the instruments on the feed could be broken up into multiple sets of sequenced streams (eg, one per exchange).


* could lead to faster recovery when there is available bandwidth and few other customers requiring snapshots


* could be complex trying to request snapshots for instruments where the initial create message is lost


* see discussion on [[#Snapshot recovery types]]

* see discussion on [[#Gap detection]] for possible methods of reducing the universe of suspect instruments during an outage

== Sender sends snapshots ==

This is a variant of [[#Sender tells receiver of gaps]]. However, in this scheme, the sender would detect a gap for a receiver and automatically send the snapshot when bandwidth becomes available. (It may also be possible to send only the part of the snapshot that is necessary.)


* Simple for receiver


* Could be complex for sender

* Isn’t useful if receiver needs to change upstream sources.

== Receiver requests gapped sequences ==

This method involves the receiver detecting when an outage occurs and making an upstream request for the sequence numbers of all messages (or message blocks) not received. The sender would then retransmit the lost messages (or blocks) to the receiver.

The receiver would then place the lost messages along with all subsequently received messages into a “reorder” buffer. The receiver can then internally “play back” the messages from the reorder buffer to rebuild the current state.


* Useful for clients that need to build full-tick databases and thus need the lost messages anyway.


* Thought to be complex and impractical to implement. The reorder buffer could grow to large sizes and might take significant resources to store and apply.
* The bandwidth necessary to retransmit all lost messages may exceed the bandwidth necessary to retransmit the current state of all instruments.
* Doesn’t help when a ticker-plant switch occurs.

== Sender analyzes gapped sequences ==

This scheme is a variant on [[#Receiver requests gapped sequences]]. The receiver detects when an outage occurs and makes an upstream request for the sequence numbers of all messages (or message blocks) not received.

Upon receipt of the request the sender would generate a series of snapshots for all instruments that had real-time updates present in the lost messages. It can do this by analyzing the contents of the messages that it sent but the receiver did not obtain. The sender would also have to inform the receiver when all snapshots have been sent so the receiver can transition the remaining instruments into a “not stale” state.


* May be useful in conjunction with gap prevention. That is, the sender could try resending the lost messages themselves if there is a good chance the receiver will receive them before timing out. If the receiver does timeout, the sender could fall back to the above snapshot system.

* May be simple for receivers


* May be complicated for senders
* Doesn’t help when a ticker-plant switch occurs.


* Either in-sequence or out-of-sequence snapshot transmissions could be used. (See [[#Snapshot recovery types]] for more info.) The receiver need not send the requests to the sender – it could send them to another (more reliable) receiver.

== Receiver could ask if update necessary ==

This is a variant of [[#Receiver requests snapshot for all stale instruments]], however, in this system the receiver sends the sequence number of the last message that updated the instrument (SNLM) with the request. The sender can then compare its SNLM with the receiver’s and either send an “instrument okay” message or a full snapshot in response.


* Reduces downstream bandwidth necessary after an outage


* Doesn’t work well in cases where instruments are updating, because the receiver and sender may be at different points in the update stream
* Lost create messages – see disadvantages of [[#Receiver requests snapshot for all stale instruments]]

== Receiver could ask with hash ==

This is a variant of [[#Receiver could ask if update necessary]], however, in this system the receiver sends a hash value of the current instrument’s database record with the request. The sender can then compare its database hash value with the receiver’s and either send an “instrument okay” message or a full snapshot in response.


* Works during tp switches


* Doesn’t work well in cases where instruments are updating, because the hash values are unlikely to match if sender and receiver are at a different point in the update stream.

* Rollout issues – see [[#Periodic hash checks]]

* Lost create messages – see disadvantages of [[#Receiver requests snapshot for all stale instruments]]

= Important considerations =

In many stale detection and correction system there are several “corner cases” that can be difficult to handle. Planning for these cases in advance can simplify later development issues.

The following is a list of “corner cases” and miscellaneous ideas:

== Ticker plant switches ==

It can be difficult to handle the case where a receiver starts obtaining messages from a different ticker-plant. Our generated sequence numbers wont be synchronized between the ticker-plants. Many of the above schemes would need to place any affected instruments into a “suspect” state should a tp switch occur.

Even if one could guarantee that no update messages were lost during a tp switch (for example by using exchange sequence numbers) there might still be additional work. The old ticker-plant might have been sending incorrect or incomplete messages — indeed, that may have been the reason for the tp switch.

== Lost IPO message ==

When the real-time feed gaps, it is possible that a message that would have created a new instrument was lost. An automatic recovery process should be capable of recovering this lost information.

There are [[schemes to detect extra and missing records]].

== Lost delete message ==

Similar to the IPO case, a real-time gap could have lost an instrument delete message. An automatic recover process should be able to properly handle this.

A more strange, but technically possible situation, involves losing a combination of delete and create messages for the same instrument. The recovery process should be robust enough to ensure that full resynchronization is possible regardless of the real-time message update content.

There are [[schemes to detect extra and missing records]].

== Exchange update patterns ==

Some exchanges have a small number of instruments that update relatively frequently (eg, equities). Other exchanges have a large number of instruments that individually update infrequently, but have a large aggregate update (eg, US options).

Schemes for gap detection and correction should be aware of these differences and be capable of handling both gracefully.

== Orderbooks ==

Recovering orderbooks can be a difficult process. However, getting it right can result in dramatic improvements to their quality, because orderbooks are more susceptible to problems resulting from lost messages.

The key to getting orderbooks correct is finding good solutions to all of the above corner cases. Orderbooks have frequent record creates and deletes. They also have the peculiar situation where some of the orders (those at the “top”) update with very high frequency, but most other orders (not at the “top”) update very infrequently.

== Sources can legitimately idle ==

Many exchanges follow a pattern of high traffic during market hours, but little to no traffic on off hours. Ironically, the traffic near idle periods can be extremely important (eg, opens, closes, deletes, resets).

It is important to make sure a detection scheme can handle the case where a gap occurs around the time of a legitimate feed idle. It should also be able to do so in a reasonable amount of time. (An example of this is the “keep alive” in the above sequence number scheme.)

== Variations of stale ==

A record is sometimes thought to be either “clean” or “stale”. However, it is possible to graduate and/or qualify what stale means. That is, it is possible to be “suspect” or “suspect for a reason” instead of just being “stale”.

Possible per-instrument stale conditions:

  • ; clean : the instrument is not stale
  • ; possible gap : gap in sequence number that could affect many instruments
  • ; definite gap : some recovery schemes can determine when an instrument has definitely lost updates
  • ; upstream possible gap : the tp might have seen a sequence gap from the exchange
  • ; upstream definite gap : the tp might have deduced which instruments actually gapped from exchange
  • ; stale due to csp startup : the csp was recently started and has an unknown cache state
  • ; suspect due to tp switch : a ticker-plant switch occurred
  • ; pre-clean : possible state in out-of-order snapshot recovery schemes
  • ; downstream gap : in some schemes a sender can inform a receiver that it lost updates

essential FIX messages

Focus on just 2 app messages – new order single + a partial fill. Logon/Logout/heartbeat… LG2 — important session messages.

Start with just the 10 most common fields in each message. Later, you need to know 20 but no more.

For the most important fields – tag 35, exectype, ordstatus, focus on the most important enum values.

This is the only way to get a grip on this slippery fish

25-13:34:27:613440 |250—-> 16 BC_ITGC_14_CMAS_104 1080861988 |



— A NewOrderSingle followed by 2 exec reports —

25-13:34:27:802535 |<—-252 13 BC_ITGC_14_CMAS_104 1079105241 |



25-13:34:27:990564 |<—-253 13 BC_ITGC_14_CMAS_104 1079105241 |



client conn IV#FIX/java

Q: OrdStatus vs ExecType in FIX?
http://fixprotocol.org/FIXimate3.0/en/FIX.5.0SP2/tag39.html — good to memorize all of these

Each order can generate multiple execution reports. Each report has an exectype describing the report. Once an order status changes form A to B, OrdStatus field changes from A to B, but in the interim the exectype in the interim execution reports can take on various values.

Enum values of exectype tag and ordstatus tag are very similar. For example, a partial fill msg will have exectype=F (fill) and ordstatus=1 (partial fillED).

exectype is the main carrier of the cancellation message and the mod message.

In a confusing special case, The exectype value can be the letter “i”, representing OrderStatus, spelt “OrderStatus”.

Q: what’s a test request in FIX
AA: force opposite party to send a heartbeat.

Q: what’s send Message() vs post Message in COM?

Q: intern()?

Q: are string objects created in eden or ….?

design considerations – my first homemade FIX client

In a typical sell-side / market maker, the FIX gateway is a FIX client for a liquidity venue. The liquidity venue (typically an exchange, but also ECNs, dark pools) runs a massively multi-threaded, FIX server farm, possibly with custom-made FPGA hardware. I believe the technical requirements for an exchange FIX server is different from a sell-side FIX client.

Therefore, I assume my audience are far more interested in the client code than the server code.

My mock server implementation is a mock-up, with no optimization, no resilience/robust design, no fault-tolerance, no select()-based sockets, no high-availability, no checksum validation.

— design priorities —
Since this is a from-scratch implementation of a large spec, I focused on basic functionality. Debugging and instrumentation is important in the early stage. A lot of extra code was created to help developers verify the correct operation of all those nitty gritty details.

I don’t aim to build a complete engine, but the basic functionality I build, i try to build it on solid ground, so hopefully it doesn’t become throw-away code.

At the heart of the OO design was the abstraction expressed by AbstractClient, AbstractState and ClientSession. These form the backbone of the core object graph. The fields and constructors were chosen with care. Cohesion and low-coupling were my ideals but sometimes hard to achieve.

These core classes help meet most of the fundamental requirements related to object state of a client connection, sequence number, state transition, output stream sharing, connection liveness monitoring… I wouldn’t say these are well-design classes, but they are a reasonable first attempt at modelling the problem using java classes.

Reusable components were created whenever possible, such as the message parser, message formatter.

FIX engine are highly configurable. All config parameters are extracted into a config object, which is obtained from a config factory. One config for Test, one for Production or UAT etc.

— features —
– parsing a raw FIX msg into an array, which is faster to access than a hashmap. By our design, we avoid hash lookup completely.
– Destroyer.java is a home-made solution to the potential forever-blocking of readLine(). On some platforms, socket read is not interruptible and the only way to unfreeze a socket read is to close the socket. We do that in this implementation but only after sending a final logout message.
– Various template methods to guarantee close() and guard against resource leak
– AbstractClient has most of the generic logic, while SingleOrderClient implements very small amount of custom logic.
– FIX msg formatter is generic enough to accept a map of tags and values — any tags and values
– sequence num validation
– state machines on both client and (simplified) server. State design pattern.
– server side — very few features, but ConnectionManager is reasonable encapsulation of new socket creation.

—- client-side features to be implemented —-
– heart beat — more robust. Should never fail.
– timeout waiting for ack
– re-logon if no heartbeat for 60 seconds
– checksum validation
– storing exchange order-ack into database
– Right now, given time constraint, i didn’t use nonblocking IO. All FIX system should use non-blocking IO.
———- Disclaimer ————-
I spent about 4 hours on this project, as suggested by Jonathan. I had to do some basic research on FIX since I have never programmed FIX (except some ECN connectivity modules that probably used FIX under the hood).
————— How to run the server/client —————–
Run ServerMain.java, then ClientMain.java.
To see the heart beat monitoring/intervention (Destroyer) in action, put 500 in getSilenceThreshold().

OPRA feed processing – load sharing

On the Frontline, one (or more) socket receives the raw feed. Number of sockets is dictated by the data provider system. Socket somehow feeds into tibco (possibly on many boxes). Optionally, we could normalize or enrich the raw data.

Tibco then multicasts messages on (up to) 1 million channels i.e. hierarchical subjects. For each underlier, there are potentially hundreds of option tickers. Each option is traded on (up to) 7 exchanges (CME is #1). Therefore there’s one tibco subject for each ticker/exchange. These add up to 1 million hierarchical subjects.

Now, that’s the firmwide market data distributor, the so-called producer. My system, one of the consumers, actually subscribes to most of these subjects. This entails a dual challenge

* we can’t run one thread per subject. Too many threads.
* Can we subscribe to all the subjects in one JVM? I guess possible if one machine has enough kernel threads. In reality, we use up to 500 machines to cope with this volume of subjects.

We ended up grouping thousands of subjects into each JVM instance. All of these instances are kept busy by the OPRA messages.

Note it’s not acceptable to skip any single message in our OPRA feed because the feed is incremental and cumulative.

order book live feed – stream^snapshot

Refer to the blog on order-driven vs quote-driven markets. Suppose a ECN receives limit orders or executable quotes (like limit orders) from traders. ECN maintains them in an order book. Suppose you are a trader and want to receive all updates to that order book. There are 2 common formats

1) The snapshot feed can simplify development/integration with the ECN as the subscriber application does not need to manage the internal state of the ECN order book. The feed publishes a complete snapshot of the order book at a fixed interval; this interval could be something like 50ms. It is possible to subscribe to a specific list of securities upon connecting.

2) Streaming event-based format. consumes more bandwidth (1Mb/sec) and requires that the subscriber manages the state of the ECN order book by applying events received over the feed. The advantage of this feed is that subscriber’s market data will be as accurate as possible and will not have the possible 50ms delay of the snapshot feed.

G10 fields in NOS ^ exec report

—- top 5 fields common to NOS (NewOrderSingle) and exec report —
#1) 22 securityIDSource. Notable enum values
5 = RIC code
*** 48 SecurityID, Requires tag 22 ==NOS/exec
*** 55 optional human-readable symbol ==NOS/exec

#2) 54 side. Notable enum values
1 = Buy
2 = Sell

#3) 40 ordType. Notable enum values
1 = Market order
2 = Limit order

#4) 59 TimeInForce. Notable enum values
0 = Day (or session)
1 = Good Till Cancel (GTC)
3 = Immediate Or Cancel (IOC)
4 = Fill Or Kill (FOK)

—- other fields common to NOS and exec type
*11 ClOrdID assigned by buy-side
*49 senderCompID
*56 receiving firm ID
*38 qty
*44 price

—- exec report top 5 fields
* 150 execType
* 39 OrdStatus. Notable enum values
0 = New
1 = Partially filled
2 = Filled

* 37 OrderID assigned by sell-side. Not in NOS. Less useful than the ClOrdID

simplified wire data format in mkt-data feed

struct quote{
  char[7] symbol; // null terminator not needed
  int bidPrice; // no float please
  int bidSize;
  int offerPrice;
  int offerSize;
This is a 23-byte fixed-width record, extremely network friendly.

Q: write a hash function q[ int hashCode(char * char7) ] for 300,000 symbols
%%A: (c[0]*31 + c[1])*31 …. but this is too slow for this options exchange

I was told there’s a solution with no loop no math. I guess some kind of bitwise operation on the 56bits of char7.

basic order book data structure

An order book quickly adds/removes order objects against a sorted data structure. Sorting by price and (within the same price) sequence #. Both price and seq are represented by integers, not floats. Sequence # is never reused. If due to technical reason the same seq# comes in, then data structure should discard it.
This calls for a sorted set, not multiset. Maps are less memory-efficient.

You can actually specialize a STL set with a comparator functor.

The overloaded () operator should return a bool, where true means A is less than B, and false means A >= B. Note equality should return false, as pointed by [[effective stl]]. This is important to duplicate detection. Suppose we get 2 dupe order objects with different timestamps. System compares price, then compares sequence #. We see A is NOT less than B, and B is NOT less than A, thanks to the false return values. System correctly concludes both are equivalent and discards one duplicate.

socket, tcp/udp, FIX

I was told both UDP and TCP are used in many ibanks. I feel most java shops generally prefer the simpler TCP.
Specifically, I learnt from many veterans that some ibanks use multicast (UDP) for internal messaging.

FIX is not much used. It is sometimes used as purely message formatting protocol (without any session management) on top of multicast UDP. (In contrast, full-featured FIX communication probably prefer TCP for connection/session control.)

Similar – Object serialization like protobuf.

##essential features@FIX session #vague

Top 5 session-message types– Heartbeat(0), Test Request(1) ResendRequest(2), Logout(5) / Logon(A).[1] This short list highlight the essential functionality of the concept known as “FIX session”. I feel the most important features are sequence number and heart beat.

FIX session concept is different from a database server session. I feel FIX session is more like a TCP connection with keep-alive — http://fixprotocol.org/documents/742/FIX_Session_Layer_rev1.ppt says A FIX Session can be defined as “a bi-directional stream of ordered messages between two parties within a continuous sequence number series”… very much like TCP. Each FIX Engines is expected to keep track of both sequence numbers — The Incoming Sequence Number expected on inbound messages received from the counter-party.

A broker FIX server maintains many FIX sessions. Each FIX session has its own pair of sequence numbers. When a FIX client connects to a FIX server, server would check the SenderCompID and TargetCompID values to pick the pair. If there are some persisted sequence values, then FIX server resumes the lost session. If another FIX client also uses the same SenderCompID/TargetCompID values, then the FIX server might assign the lost session to this client. Sometimes server requires password login before resuming a lost session.

http://javarevisited.blogspot.com/2011/02/fix-protocol-session-or-admin-messages.html is a blog post by a fellow developer in finance IT.

http://aj-sometechnicalitiesoflife.blogspot.com/2010/04/fix-protocol-interview-questions.html is another blogger

[1] In parentheses are MsgType values, which are ALWAYS single-characters.

10 coding tips of high-frequency exchange connectivity

[c] use enum/char rather than vptr #Miami
[c] hand-crafted hash table, which is often more optimized for our specific requirement #Barcap
[c] Fixed-width, padded char-array without null terminator #Miami
[c] favor Set to Map
[c] hand-crafted reference counted data structure #Miami
[c] monolithic network-friendly wire format STRUCT for high volume data transfer. Nothing fancy.
[c] java finalizers slowing down garbage collection
[c] favor arrays to object graphs #UBS
[c] avoid boxing and java colllections. Favor primitives. #CS
specialized operator-new. See [[eff c++]]

Note — all of these are considered optimizations on memory. That’s where C++ outsmarts java.

[c=according to insiders]

parse FIX msg to key/value pairS #repeating group

http://bigblog.tanbin.com/2012/03/design-considerations-my-first-homemade.html is the context.

     * Parsing to array is faster than parsing to a HashMap
    public static ArrayList quickParse(String fromServer) {
        ArrayList ret = new ArrayList(Collections.nCopies(
            MAX_FIX_TAG, (String) null));
        for (String pair : fromServer.split(SOH)) {
            String[] tmp = pair.split("=");
            if (tmp.length != 2)  continue;
            int tag = Integer.parseInt(tmp[0]);
            ret.set(tag, tmp[1]); //treat ArrayList as array -- efficient
        return ret;

Repeating group is not supported in this simplistic scheme, but could be accommodated. I can think one simple and one robust design:

Design 1: array<list<string>>, so a regular non-repeating tag will hit a singular list.

Design 2: we ought to know which tags are repeating. The message specification must tell us which tag (NoRoutingIds in the example) signals the arrival of a repeating group. Therefore, at compile time (not runtime) system knows what tags are special i.e. potentially repeating. We will build a separate data structure for them, such as a list<groupMember>. So each FIX message will have one regular array + one list

state maintenance in FIX gateway

Even the fastest, most dumb FIX gateways need to maintain order state, because exchange can send (via FIX) partial fills and acks, and customer can send (via other MOM) cancels and amends, in full-duplex.

When an order is completed, it should be removed from memory to free up. Before that, the engine needs to maintain its state. State maintenance is a major cost and burden on a low-latency system.

A sell-side FIX engine is a daemon process waiting for Response from liquidity venue, and mkt/limit orders from buy-side clients.

[[Complete guide to capital markets]] has some brief coverage on OMS and state maintenance.

FIX session msg vs session tags

Update, here’s an example of a session-related business message: 35=3

The “content” of FIX data is surrounded by session related data, including session-Messages and session-Tags. As a beginner, I was constantly confused over the difference between session-Messages vs session-Tags

A) There are only a few (I guess 5 to 10) session message types, such as heart beat, logon/logout, request for resend…
** All the types — session msg subtypes or app msg subtypes — show up under Tag 35.

B) In ANY (session/app) Message, there are (3 to 30) session-related header tags such often routing-related, such as (34)seqNum, (10)checksum, (8)beginString, (9)bodyLength, (52) timeStamp, encryption. I’d say even the all-important (35)MsgType tag qualifies as session tag.

Most of the standard header tags are optional.

In fact, application data are “surrounded” by session data. App data are the stars of the show, surrounded by a huge “supporting crew” —
– A session msg’s fields are 100% session-related;
– An app msg is not free of session information. It is partly a session msg — due to those session tags.
– If you snoop the conversation, most messages are session messages — heartbeats
– every business data exchange starts with and ends with session messages — logon and logout.

FIX head^tail

Every Single msg (session or app) has exactly 1 head and 1 tail.

Up to FIX.4.4, the header contained three mandatory fields (beside a few dozen optional ones): 8 (BeginString), 9 (BodyLength), and 35 (MsgType) tags.

Every Single msg (session or app) has a tail, containing a single tag (10) ie the checksum.

==> Given a long string, how do you carve cout a full message (session or app)? Look for 8= and 10=

I feel head/tail =~= tcp envelope, and Body =~= payload

sequence num ] FIX

Sequence num can be considered part of the header for many app messages. It’s a control device. I consider it as a session-related tag on a non-session message.

Officially, sequece num is NOT part of the header. Up to FIX.4.4, the header contained three fields: 8 (BeginString), 9 (BodyLength), and 35 (MsgType) tags.

There are several tags about sequence num:
(34) MsgSeqNum Integer message sequence number.
(36) NewSeqNo New sequence number
(43) PossDupFlag Indicates possible retransmission of message with this sequence number
(45) RefSeqNum Reference message sequence number

https://drivewealth-fix-api.readme.io/docs/resend-request-message describes a resend request message i.e. a 35=2 message, containing —

(7) BeginSeqNo and (16) EndSeqNo specifies the requested messages

FIX session^app messages


I feel even an app message has session-related tags.

Top-down learning of the FIX tags:

* learn basic fields of head and tail — see another post
* learn basic message types (msgType or Tag 35) within session messages aka session messages
** remember — Each session starts with a _session_ message LOGON and ends with session message LOGOUT
*** LOGON is a full message with a head containing a field msgType=A meaning LOGON

* learn basic message types (msgType or Tag 35) within _app_ messages
** learn basic Execution Report (35=8), the most common app message
** learn basic trading scenarios consisting of FIX dialogs

FIX common app message types

This post is about “Business / Application messages”, not “session/admin messages

http://en.wikipedia.org/wiki/Order_(exchange) — different order types

common tag=value pairs

35=D — msgType = NewOrderRequest
35=8 — msgType = executionReport
35=G — msgType = requestForCancel/Replace
12=0.05 — Commission (Tag = 12, Type: Amt) Note if CommType (13) is percentage, Commission of 5% should be represented as .05.

Used in:

* Quote Status Report (AI)
* Quote Response (AJ)
* Quote (S)

FIX session is like TCP + sqlplus

Each session starts with an session-message LOGON and ends with session-message LOGOUT. Session-messages are also known as “admin messages”. Main purpose is session management and flow control as in TCP

To maintain the session, I think FIX uses not only the session messages — each app message also has session-related tags like (10)checksum, (9)bodyLength and (34)sequence numbers — used to maintain a session between client and server.

retrans-request — is a message to ask for a resend of a missing message

heartbeat messages — between 2 parties

FIX keywords and intro

tag=value, where tag is always a number, and value is often char(1) — Both need the dictionary for interpretation.

Whenever there’s an enum of possible values, FIX would encode them into very short codes (often char(1) ). As a result, encoding/decoding using dictionary is prevalent.