mktData direct^Reuters feeds: realistic ibank set-up

NYSE – an ibank would use a direct feed to minimize latency.

For some newer exchanges, an ibank would take the Reuters feed first, then over a few years replace it with a direct feed.

A company often gets duplicate data feeds for the most important venues like NYSE and Nasdaq. The rationale is discussed in [[all about hft]].

For some illiquid FX pairs, they rely on calculated rates from Reuters. BBG also calculates a numerically very similar rate, but BBG is more expensive. BBG also prohibits real-time data redistribution within the ibank.


new orderId is generated for order-replace, !! order-modify

https://www.nyse.com/publicdocs/nyse/data/XDP_Integrated_Feed_Client_Specification_v2.1g.pdf

shows that the NYSE matching engine has specific business logic for order-replace vs order-modify

  • order-modify reuses existing orderId
  • order-replace would create a new orderId — The “sitting” order must be removed from the book and replaced with the new order.

My friend Ashish said that in more than one forex ECN, an order update always generates a new orderId. He actually implemented the order-update business logic in FIX clients that connect to those ECNs.
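Below is a minimal sketch (my own simplification, not the XDP spec or any ECN's FIX logic) of how a book keyed by orderId might apply the two operations: modify updates in place under the same orderId, while replace removes the sitting order and inserts under a new orderId. The Book/Order names are hypothetical.

#include <cstdint>
#include <unordered_map>

struct Order { double price; uint32_t qty; };

class Book {
    std::unordered_map<uint64_t, Order> orders_; // keyed by orderId
public:
    void add(uint64_t id, Order o) { orders_[id] = o; }
    // order-modify: same orderId, update price/qty in place
    void modify(uint64_t id, double px, uint32_t qty) {
        auto it = orders_.find(id);
        if (it != orders_.end()) { it->second.price = px; it->second.qty = qty; }
    }
    // order-replace: remove the sitting order, insert under the new orderId
    void replace(uint64_t oldId, uint64_t newId, Order o) {
        orders_.erase(oldId);
        orders_[newId] = o;
    }
};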

crossed orderbook detection@refresh #CSY

At least one interviewer asked me — in your orderbook replication (like our rebus), how do you detect a crossed orderbook? Here is my idea, for your comment.

Rebus currently has a rule to generate a top-book-marker. Rebus puts this token on the best bid (and best ask) whenever a new top-of-book bid emerges.

I think rebus can have an additional rule to check the new best bid against the reigning best ask. If the new best bid is ABOVE the best ask, then rebus can attach a “crossed-orderbook-warning” flag to the top-book-marker. This way, downstream gets alerted and has a chance to take corrective action, such as removing the entire orderbook or blocking the new best bid until rebus sends a best bid without this warning.
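A minimal sketch of that additional rule, using my own struct and function names (not rebus internals):

#include <cstdio>

struct TopOfBook { double bestBid; double bestAsk; bool crossedWarning; };

// Called whenever the book replica is about to publish a new best bid.
inline void onNewBestBid(TopOfBook& tob, double newBestBid) {
    tob.bestBid = newBestBid;
    // crossed: bid above ask (a locked book would be bid == ask)
    tob.crossedWarning = (tob.bestAsk > 0 && newBestBid > tob.bestAsk);
    if (tob.crossedWarning)
        std::printf("crossed-orderbook-warning: bid %.4f > ask %.4f\n",
                    newBestBid, tob.bestAsk);
}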

Such a warning could help protect the customer’s reputation and the integrity of the data.

crossed orderbook mgmt: real story4IV

–challenges:
The crossed condition disappears after a while; some days are better, some days worse.

–impact
data quality visible to paying customers

–individual contribution?
not really, to be honest.

–what would you do differently?
Perhaps detect and set a warning flag when generating top-book token? See other post on “crossed”

–details:

  • query the exchange if query is supported – many exchanges do.
  • compare across ticker plants? Not really done in our case.
  • replay captured data for investigation
  • retrans as the main solution
  • periodic snapshot feed is supported by many exchanges, designed for late-starting subscribers. We could (though we don’t) use it to clean up our crossed orderbook
  • manual cleaning via the cleaner script, as a “2nd-last” resort
  • hot failover.. as last resort

–the cleaner script:

This “depth-cleaner” tool is essentially a script to delete crossed/locked (c/l) entries from our replicated order books. It is run by a user in response to an alert.

… The script then compares the Ask & Bid entries in the order book. If it finds a crossed or locked order, where the bid price is greater than (crossed) or equal to (locked) the ask price, it writes information about that order to a file. It then continues checking entries on both sides of the book until it finds a valid combination. Note that the entries are not checked for “staleness”. As soon as it finds non-crossed/locked Ask and Bid entries, regardless of their timestamps, it is done checking that symbol.

The script takes the entries from the crossed/locked file, and creates a CIM file containing a delete message for every order given. This CIM data is then sent to (the admin port of) order book engine to execute the deletes.
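A rough sketch of the detection pass, under my assumptions about the book layout (bids sorted descending, asks ascending); the real script walks the two sides until a valid combination is found and writes the offending orders to a file for the CIM delete step.

#include <cstddef>
#include <cstdint>
#include <vector>

struct Entry { uint64_t orderId; double price; };
struct CrossReport { std::vector<uint64_t> toDelete; };

// Walk both sides from the top; collect crossed/locked entries until a
// valid (bid < ask) combination is found. Staleness is not checked.
inline CrossReport findCrossedLocked(const std::vector<Entry>& bids,
                                     const std::vector<Entry>& asks) {
    CrossReport rpt;
    std::size_t b = 0, a = 0;
    while (b < bids.size() && a < asks.size() &&
           bids[b].price >= asks[a].price) {         // > crossed, == locked
        rpt.toDelete.push_back(bids[b].orderId);     // report both sides here;
        rpt.toDelete.push_back(asks[a].orderId);     // the real script may be more selective
        ++b; ++a;
    }
    return rpt;
}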

Before the cleaner is invoked manually, we have a scheduled scanner for crossed books. This scanner checks every symbol once a minute. I think it uses a low-priority read-only thread.

##mkt-data jargon terms: fakeIV

Context — real time high volume market data feed from exchanges.

  • Q: what’s depth of market? Who may need it?
  • Q: what are the different usages of FIX vs binary protocols?
  • Q: when is FIX not suitable? (market data dissemination …)
  • Q: when is binary protocol not suitable? (order submission …)
  • Q: what’s BBO i.e. best bid offer? Why is it important?
  • Q: How do you update BBO and when do you send out the update?
  • Q[D]: how do you support trade-bust that comes with a trade id? How about order cancellation — is it handled differently?
  • Q: how do you handle an order modification?
  • Q: how do you handle an order replacement?
  • Q: How do you handle currency code? (My parser sends the currency code for each symbol only once, not on every trade)
  • Q: do you have the refresh channel? How can it be useful?
  • Q[D]: do you have a snapshot channel? How can it be useful?
  • Q: when do you shut down your feed handler? (We don’t, since our clients expect to receive refresh/snapshots from us round the clock. We only restart once in a while)
  • Q: if you keep running, then how is your daily open/close/high/low updated?
  • Q: how do you handle partial fill of a big order?
  • Q: what’s an iceberg order? How do you handle it?
  • Q: what’s a hidden order? How do you handle it?
  • Q: What’s Imbalance data and why do clients need it?
    • Hint: they impact daily closing prices which can trigger many derivative contracts. Closing prices are closely watched somewhat like LIBOR.
  • Q: when would we get imbalance data?
  • Q: are there imbalance data for all stocks? (No)
  • ——— symbol/reference data
  • Q[D]: Do you send the security description to downstream in every message? It can be quite long and take up a lot of bandwidth.
  • Q: what time of the day do symbol data come in?
  • Q5[D]: what essential attributes are there in a typical symbol message?
  • Q5b: Can you name one of the key reasons why symbol message is critical to a market data feed?
  • Q6: how would dividend impact the numbers in your feed? Will that mess up the data you send out to downstream?
  • Q6b[D]: if yes how do you handle it?
  • Q6c: what kind of exchanges don’t have dividend data?
  • Q: what is a stock split? How do you handle it?
  • Q: how are corporate actions handled in your feed?
  • Q: what are the most common corporate actions in your market data? (dividends, stock splits, rights issue…)
  • ——— resilience, recovery, reliability, capacity
  • Q: if a security is lightly traded, how do you know if you have missed some updates or there’s simply no update on this security?
  • Q25: When would your exchange send a reset?
  • Q25b: How do you handle resets?
  • Q26[D]: How do you start your feed handler mid-day?
  • Q26b: What are the concerns?
  • Q[D]: how do you handle potential data loss in multicast?
  • Q[D]: how do you ensure your feed handler can cope with a burst in messages in TCP vs multicast?
  • Q[D]: what if your feed handler is offline for a while and missed some messages?
  • Q21: Do you see gaps in sequence numbers? Are they normal?
  • Q21b: How is your feed handler designed to handle them?
  • Q[D]: does your exchange have a primary and secondary line? How do you combine both?
  • Q[D]: between primary and secondary channel, if one channel has lost data but the other channel did receive the data, how does your system combine them?
  • Q[D]: how do you use the disaster recovery channel?
  • Q[D]: If exchange has too much data to send in one channel, how many more channels do they usually use, and how do they distribute the data among multiple channels?
  • Q: is there a request channel where you send requests to exchange? What kind of data do you send in a request?
  • Q: Is there any consequence if you send too many requests?
  • Q: what if some of your requests appear to be lost? How does your feed handler know and react?

[D=design question]

complexities: replicate exchange order book #Level1+2

–For a generic orderbook, the Level 1 complexity is mostly about trade cancel/correct.
All trades must be databased (like our TickCache). In the database, each trade has a trade Id but also an arrival time.

When a trade is canceled by TradeId, we need to regenerate LastPrice/OpenPrice, so we need the ArrivalTime attribute.

VWAP/High/Low are all generated from TickCache.
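A hedged sketch of the trade-bust handling: TickCache is the real component’s name, but the layout and regeneration logic below are my guesses.

#include <cstdint>
#include <map>
#include <optional>

struct Trade { uint64_t tradeId; double price; uint64_t arrivalTime; };

class TickCache {
    std::map<uint64_t, Trade> byId_; // tradeId -> trade
public:
    void onTrade(const Trade& t) { byId_[t.tradeId] = t; }

    // trade-bust by tradeId: erase, then recompute LastPrice from the
    // remaining trade with the latest arrival time
    std::optional<double> onTradeCancel(uint64_t tradeId) {
        byId_.erase(tradeId);
        std::optional<double> last;
        uint64_t latest = 0;
        for (const auto& [id, t] : byId_)
            if (t.arrivalTime >= latest) { latest = t.arrivalTime; last = t.price; }
        return last; // new LastPrice, if any trades remain
    }
};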

The other complexity is BBO generation from Level 2.

–For a Level-based Level 2 order book, the complexity is higher than Level 1.

OrderId is usually required so as to cancel/modify.
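A sketch of an order-based Level 2 book (my own simplification) showing why the orderId map is needed for cancel/modify and how BBO falls out of the aggregated price levels:

#include <cstdint>
#include <functional>
#include <map>
#include <unordered_map>

struct L2Book {
    struct Order { bool isBid; double price; uint32_t qty; };
    std::unordered_map<uint64_t, Order> orders;                  // keyed by orderId
    std::map<double, uint64_t, std::greater<double>> bids;       // price -> aggregate qty, best first
    std::map<double, uint64_t> asks;                             // price -> aggregate qty, best first

    void add(uint64_t id, bool isBid, double px, uint32_t qty) {
        orders[id] = {isBid, px, qty};
        if (isBid) bids[px] += qty; else asks[px] += qty;
    }
    void cancel(uint64_t id) {
        auto it = orders.find(id);
        if (it == orders.end()) return;              // unknown orderId (possibly gapped)
        const Order& o = it->second;
        if (o.isBid) reduce(bids, o.price, o.qty); else reduce(asks, o.price, o.qty);
        orders.erase(it);
    }
    // BBO: best bid is bids.begin(), best ask is asks.begin()
private:
    template <class Map>
    static void reduce(Map& m, double px, uint32_t qty) {
        auto lv = m.find(px);
        if (lv == m.end()) return;
        lv->second = (lv->second > qty) ? lv->second - qty : 0;
        if (lv->second == 0) m.erase(lv);            // level empties out
    }
};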

MOM^sharedMem ring buffer^UDP : mkt-data transmission

I feel that in most environments, the MOM design is the most robust, relying on reliable middleware. However, latency-sensitive trading systems won’t tolerate the additional latency and see it as unnecessary.

Gregory (ICE) told me about his home-grown simple ring buffer in shared memory. He used a circular byte array. Message boundary is embedded in the payload. When the producer finishes writing to the buffer, it puts some marker to indicate end of data. Greg said the consumer is slower, so he makes it a (periodic) polling reader. When consumer encounters the marker it would stop reading. I told Gregory we need some synchronization. Greg said it’s trivial. Here are my tentative ideas —

Design 1 — every time the producer or the consumer starts, it would acquire a lock. This is coarse-grained locking.

But when the consumer is chipping away at the head of the queue, the producer can simultaneously write to the tail, so here’s

Design 2 — the latest message being written is “invisible” to the consumer. The producer keeps the marker unchanged while adding data to the tail of the queue. When it has nothing more to write, it moves the marker by updating it.

The marker can be a lock-protected integer representing the index of the last byte written.

In this set-up there is no need to worry about buffer capacity, or a very slow consumer.
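Below is a minimal sketch of Design 2, assuming a single producer and a single polling consumer; I use an atomic published-index as the marker instead of a lock-protected integer. The producer advances the marker only after the bytes are fully written, so the in-flight message stays invisible to the consumer. Placing the object in shared memory, and all the names here, are my assumptions.

#include <atomic>
#include <cstddef>

template <std::size_t N>
class SpscRing {
    char buf_[N];
    std::atomic<std::size_t> published_{0}; // the marker: one past the last visible byte
    std::size_t readPos_ = 0;               // touched only by the consumer
public:
    // Producer: copy the message bytes, then move the marker (release).
    void write(const void* msg, std::size_t len) {
        const char* src = static_cast<const char*>(msg);
        std::size_t w = published_.load(std::memory_order_relaxed);
        for (std::size_t i = 0; i < len; ++i)
            buf_[(w + i) % N] = src[i];              // wraps around the circular array
        published_.store(w + len, std::memory_order_release); // marker moved last
    }
    // Consumer: periodic poll; returns how many bytes became visible.
    std::size_t poll(void* out, std::size_t maxLen) {
        char* dst = static_cast<char*>(out);
        std::size_t p = published_.load(std::memory_order_acquire);
        std::size_t avail = p - readPos_;
        if (avail > maxLen) avail = maxLen;
        for (std::size_t i = 0; i < avail; ++i)
            dst[i] = buf_[(readPos_ + i) % N];
        readPos_ += avail;
        return avail;
    }
};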

|                        | MOM       | UDP multicast or TCP or UDS                        | shared_mem                                    |
|------------------------|-----------|-----------------------------------------------------|-----------------------------------------------|
| how many processes     | 3-tier    | 2-tier                                              | 2-tier                                        |
| 1-to-many distribution | easy      | easiest                                             | doable                                        |
| intermediate storage   | yes       | tiny; the socket buffer can be 256MB                | yes                                           |
| producer data burst    | supported | message loss is common in such a situation          | supported                                     |
| async?                 | yes       | yes, since the receiver must poll or be notified    | I think the receiver must poll or be notified |
| additional latency     | yes       | yes                                                 | minimal                                       |

mkt-data tech skills: portable/shared@@

Raw mkt-data tech skill is better than soft mkt-data skill even though it’s further away from “the money”:

  • standard — Exchange mkt data format won’t change a lot. Feels like an industry standard
  • the future — most OTC products are moving to electronic trading and will have market data to process
  • more necessary than many modules in a trading system. However ….. I guess only a few systems need to deal with raw market data. Most downstream systems only deal with the soft market data.

Q1: If you compare 5 typical market data gateway dev [1] jobs, can you identify a few key tech skills shared by at least half the jobs, but not a widely used “generic” skill like math, hash table, polymorphism etc?

Q2: if there is at least one, how important is it to a given job? One of the important required skills, or a make-or-break survival skill?

My view — I feel there is not a shared core skill set. I venture to say there’s not a single answer to Q1.

In contrast, look at quant developers. They all need skills in c++/excel, BlackScholes, bond math, swaps, …

In contrast, also look at dedicated database developers. They all need non-trivial SQL and schema design. Many need stored procs. Tuning is needed if there are large tables.

Now look at market data gateway for OPRA. Two firms’ job requirements will share some common tech skills like throughput (TPS) optimization, fast storage.

If latency and TPS requirements aren’t stringent, then I feel the portable skill set is an empty set.

[1] There are also many positions whose primary duty is market data but not raw market data, not large volume, not latency sensitive. The skill set is even more different. Some don’t need development skill on market data — they only configure some components.

## 4 exchange feeds +! TCP/multicast

  • eSpeed
  • Dalian Commodity Exchange
  • SGX Securities Market Direct Feed
  • CanDeal (http://www.candeal.com) mostly Canadian dollar debt securities

I have evidence (not my imagination) to believe that these exchange data feeds don’t use vanilla TCP or multicast but some proprietary API (presumably built on top of them).

I was told other China feeds (probably the Shenzhen feed) are also API-based.

eSpeed would ship a client-side library. Downstream would statically or dynamically link it into the parser. The parser then communicates with the server in some proprietary protocol.

IDC is not an exchange, but we go one level deeper, requiring our downstream to provide rack space for our own blackbox “clientSiteProcessor” machine. This machine may or may not be part of a typical client/server set-up, but it communicates with our central server in a proprietary protocol.

OPRA: name-based sharding by official feed provider

2008 (GFC) peak OPRA msg rate — the Wikipedia “low latency” article says 1 million updates per second. Note: my NYSE parser can handle 370,000 messages per second per thread!

https://www.opradata.com/specs/48_Line_Notification_Common_IP_Multicast_Specification.pdf shows 48 multicast groups, each for a subset of the option symbols. When there were only 24 groups, the symbols from RU through SMZZZ were too voluminous for one multicast group, more so than the other 23 groups.

Our OPRA parser instance is simple and efficient (probably stateless) so presumably capable of handling multiple OPRA multicast groups per instance. We still use one parser per MC group for simplicity and ease of management.
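A toy illustration of name-based sharding; the ranges below are made up, not from the OPRA spec. Each multicast group owns a contiguous symbol range, so the group for a symbol is a range lookup.

#include <map>
#include <string>

// upper bound of each symbol range -> group number (hypothetical ranges)
const std::map<std::string, int> kGroupByUpperBound = {
    {"AZZZZ", 1}, {"FZZZZ", 2}, {"MZZZZ", 3}, {"SMZZZ", 4}, {"ZZZZZ", 5}};

int groupFor(const std::string& symbol) {
    // first range whose upper bound is >= symbol
    auto it = kGroupByUpperBound.lower_bound(symbol);
    return it == kGroupByUpperBound.end() ? -1 : it->second;
}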

From: Chen, Tao
Sent: Tuesday, May 30, 2017 8:24 AM
To: Tan, Victor
Subject: RE: OPRA feed volume

Opra data is provided by SIAC(securities industry automation corporation). The data is disseminated on 53 multicast channels. TP runs 53 instances of parser and 48 instances of rebus across 7 servers to handle it.

stale (replicated) orderbook

This page is intended to list ideas on possible ways of improving the orderbook replication system so that it can better determine when an instrument or instruments are stale. It further lists possible ways of automating retransmission of lost and/or stale data to improve the recovery time in the event of an outage.

The current system is largely “best effort”; at each level of the ticker-plant and distribution network, components are capable of dropping data should a problem arise. There are few processes capable of correlating what was lost in a format that is useful to a customer. To account for the possibility of lost data, the orderbook replication components constantly generate “refresh” messages with the most up-to-date state of an instrument.

Although this system works well in practice, it leaves open the possibility that a customer may have a cache with “stale” data for an unbounded length of time. This inability to track staleness can be a point of concern for customers.

= Recovery types =

When a downstream component loses one or more update messages for an instrument it is generally safe to assume the instrument is stale. However there can be two different kinds of recoveries:

== Retransmit the latest snapshot ==

This method of retransmission and stale detection revolves around keeping the current tick snapshot database up to date. It is useful for customers that need an accurate tick cache. It may not be a full solution to customers that need an accurate time and sales database.

== Retransmit all lost ticks ==

It is also possible to retransmit all lost ticks after an outage. This is typically useful when trying to repair a time-and-sales database.

Although it is possible to build an accurate “current” snapshot record when all lost ticks are retransmitted, it is a very tedious and error-prone process. It is expected that customers will, in general, be unwilling to rebuild the “current” from the retransmission of lost ticks.

So, a scheme that involves retransmission of lost ticks will still likely require a scheme that retransmits the latest snapshot.

Most of the following discussions are centered around the concept of latest snapshot recovery.

= Gap prevention =

There may be simple ways to reduce the number of times gaps occur. This process could be called “gap prevention”.

In general, it is not possible to eliminate gaps, because severe outages and equipment failure can always occur. The process of gap prevention may be useful, however, where the alternative gap recovery solution is expensive or undesirable. It is also useful in systems that need full lost tick recovery.

There are two possible ways of preventing gaps from occurring. Both trade bandwidth and latency for increased reliability during intermittent outages.

== Wait for retransmit ==

The simplest form of gap prevention involves retransmitting any packets lost on the network. The sender keeps a buffer of recently sent messages, and the receiver can request retransmissions. In the event of packet loss, the receiver waits for the retransmissions before processing data.

This form of gap recovery is a basic feature of the TCP/IP transmission protocol.

== Forward error correction ==

It is also possible to prevent gaps by sending additional data on the feed.

The most basic form of this is used in the “best-of-both” system. It sends two or more copies of the data, and the receiver can fill lost ticks from the additional copies.

It is not necessary to send a full additional feed. For example, one could send a block of parity codes on every tenth packet. A receiver could then theoretically recover from up to ten percent packet loss by using the parity code packets.
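A toy sketch of the simplest such scheme, assuming XOR parity over each block of ten fixed-size packets: any single lost packet in a block can be rebuilt from the other nine plus the parity packet. The packet size and block size are illustrative.

#include <array>
#include <cstddef>
#include <vector>

constexpr std::size_t kPktLen = 1024;                 // hypothetical fixed packet size
using Packet = std::array<unsigned char, kPktLen>;

// Sender side: the parity packet is the XOR of the ten data packets in the block.
Packet makeParity(const std::vector<Packet>& block) {
    Packet parity{};                                   // zero-initialized
    for (const Packet& p : block)
        for (std::size_t i = 0; i < kPktLen; ++i) parity[i] ^= p[i];
    return parity;
}

// Receiver side: the one missing packet is the XOR of the parity packet and
// the nine packets that did arrive.
Packet recoverLost(const std::vector<Packet>& received, const Packet& parity) {
    Packet lost = parity;
    for (const Packet& p : received)
        for (std::size_t i = 0; i < kPktLen; ++i) lost[i] ^= p[i];
    return lost;
}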

Although the forward error correction scheme uses additional bandwidth, additional bandwidth may be available due to line redundancy.

= Snapshot recovery types =

In order to correct a stale instrument, it may be necessary to send the full contents of the instrument. When doing so, one may send them serialized in the real-time feed or out of order.

== In sequence snapshots ==

The simplest form of snapshot transmission involves using the same socket or pipe as the real-time feed itself. In this case, the receiver can simply apply the snapshot to its database; it does not need to handle the cases where the snapshot arrives after/before a real-time update arrives.

The downside of this scheme, however, is that a single upstream supplier of snapshots might get overloaded with requests for retransmissions. If additional distributed databases are used to offload processing, then each additional component will add latency to the real-time feed.

== Out of sequence snapshots ==

It is also possible to send snapshot transmissions using sockets and/or pipes separate from the real-time feed. The advantage of this scheme is that it is relatively cheap and easy to increase the number of distributed snapshot databases that one can query. However, it requires the receiver of the snapshot to work harder when attempting to apply the response to its database.

One way to apply out of order snapshots is to build a “reorder buffer” into the receiver. This buffer would store the contents of the real-time feed. When a snapshot response arrives, the receiver could locate where the sender was in the real-time stream when it generated the snapshot (possibly by using sequence numbers). It can then apply the snapshot and internally replay any pending updates from the reorder buffer. In the case where a snapshot arrived that was based on real-time traffic that the receiver has yet to receive, the receiver must wait for that traffic to arrive before applying the snapshot.
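A rough sketch of that reorder-buffer idea, with my own (hypothetical) structure and a dummy “apply” step standing in for the real update merge:

#include <cstdint>
#include <deque>
#include <string>

struct Update   { uint64_t seq; std::string payload; };
struct Snapshot { uint64_t asOfSeq; std::string state; };

class Receiver {
    std::deque<Update> reorderBuf_;   // real-time updates retained while the record is suspect
    std::string dbRecord_;            // the instrument's current record
    uint64_t lastSeq_ = 0;
public:
    void onRealtime(const Update& u) { reorderBuf_.push_back(u); lastSeq_ = u.seq; }

    // Apply the snapshot, then internally replay the buffered updates newer
    // than it. If the snapshot is ahead of us (asOfSeq > lastSeq_), we must
    // wait for that traffic before applying.
    bool onSnapshot(const Snapshot& s) {
        if (s.asOfSeq > lastSeq_) return false;       // not yet applicable
        dbRecord_ = s.state;
        while (!reorderBuf_.empty()) {
            const Update& u = reorderBuf_.front();
            if (u.seq > s.asOfSeq) dbRecord_ += u.payload; // dummy apply of a pending update
            reorderBuf_.pop_front();
        }
        return true;
    }
};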

This scheme is thought to be complex, resource intensive, and error-prone.

If, however, the feed were changed to eliminate the distributed business rules, it may be possible to implement a much simpler out-of-order snapshot application system. See [[Out of sequence snapshots idea]] for a possible implementation.

= Gap detection =

In order to accurately determine when an instrument is “stale”, it is necessary to be able to determine when one or more update messages have been lost. The following sections contains notes on various schemes that can provide this information.

Note that some of the schemes may be complementary. That is, a possible future solution might use parts of several of these methods.

== Sequence numbers with keep-alive on idle ==

The most common method to detect a gap involves placing a monotonically incrementing number on all outgoing messages or message blocks. The receiver can then detect a gap when a message (or message block) arrives with a sequence number that is not one greater than the last sequence number.

In order to account for the case where all messages from a source are lost, or where a source goes idle just after a message loss, the sender needs to arrange to transmit a “keep alive” indicator periodically when the source is otherwise idle. With knowledge of the keep-alive period, the receiver can detect a gap by timing out if it does not receive a message from a source within the specified period. The larger the period, the fewer keep-alive messages need to be sent when idle. However, it also increases the worst-case time to detect a message gap.
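A bare-bones sketch of per-source gap detection with a keep-alive timeout; the clock source and interface are my assumptions.

#include <chrono>
#include <cstdint>

class GapDetector {
    uint64_t nextExpected_ = 1;
    std::chrono::steady_clock::time_point lastHeard_ = std::chrono::steady_clock::now();
    std::chrono::milliseconds keepAlivePeriod_;
public:
    explicit GapDetector(std::chrono::milliseconds keepAlive) : keepAlivePeriod_(keepAlive) {}

    // Returns true if a gap is detected on this message (or message block).
    bool onMessage(uint64_t seq) {
        lastHeard_ = std::chrono::steady_clock::now();
        bool gap = (seq != nextExpected_);
        nextExpected_ = seq + 1;
        return gap;
    }
    void onKeepAlive() { lastHeard_ = std::chrono::steady_clock::now(); }

    // Called periodically: if the source has been silent longer than the
    // keep-alive period, everything on this sequence series becomes suspect.
    bool timedOut() const {
        return std::chrono::steady_clock::now() - lastHeard_ > keepAlivePeriod_;
    }
};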

It is possible for the sender to generate multiple sequence number series simultaneously by separating instruments into multiple categories. For example, the outgoing feed currently generates an independent sequence number for each “exchange”. At the extreme, it is possible to generate a sequence number stream per instrument; however this would increase the bandwidth due to larger number of keep-alive messages necessary. (One would also not be able to optimize bandwidth by sequencing only message blocks.)

When a sequence number gap is detected, the receiver must consider all instruments using the sequence series as suspect.

Only a single source can reliably generate a sequence number. If multiple ticker-plants generate a feed, they need to use different sequence series. If an upstream ticker-plant switch occurs, the receiver needs to mark the full range of affected instruments as suspect.

Finally, some exchanges provide sequenced data feeds. However, there are [[issues with exchange provided sequence numbers]]. Due to this, it may be difficult to rely on exchange sequencing as a basis for a distribution network sequencing.

== Sequence number of last message ==

A variant of the basic sequencing scheme involves the sequence number of the last message (SNLM) that updates an instrument. This field would be kept by both sender and receiver, and included with real-time messages. If SNLM matches the receiver’s record, implying that the receiver has not missed any updates for this instrument, then the instrument can transition from “suspect” to “clean”. Conversely, a non-match should force the instrument to “stale”.

An advantage of this scheme is that ordinary real-time traffic could reduce the number of suspect records after an outage. It may also make using exchange provided sequence numbers more practical.

As a disadvantage, however, it would require that both a sequence number and SNLM be provided on every real-time update. This might significantly increase bandwidth.

== Periodic message count checks ==

It is also possible to detect gaps if the sender periodically transmits an accounting of all messages sent since the last period. This scheme may use less bandwidth than sequence numbers, because it is not necessary to send a sequence number with every message (or message block).

The scheme still has the same limitations as sequence numbers when ticker-plant switches occur and when trying to determine what was lost when a gap occurs.

== Periodic hash checks ==

Another possible method of detecting gaps is by having the sender generate a hash of the contents of its database. The receiver can then compare the sender’s hash to the same hash generated for its database. If the two differ, a gap must have occurred. (If the two match, however, a gap may have occurred but already been corrected; this method is therefore not useful when full tick recovery is necessary.)

This scheme may be beneficial when ticker-plant switches occur. If two senders have identical databases and no data is lost during a network switch, then the hash checks should still match at the receiver. This scheme, however, still faces the problem of determining which instruments from the set are actually stale when a gap is detected.

Technically, it is possible that two databases could differ while sharing the same hash key. However, it is possible to choose a hash function that makes the possibility of this extremely small.

Finally, this system may face challenges during software upgrades and rollouts. If either the sender or the receiver change how or what they database, it may be difficult to maintain a consistent hash representation.

== Sender tells receiver of gaps ==

If a reliable transmission scheme (eg, tcp) is in use between the sender and receiver, then it may be possible for the sender to inform the receiver when the receiver is unable to receive some portion of the content.

For example, if a sender can not transmit a block of messages to a receiver because the receiver does not have sufficient bandwidth at the time of the message, then it is possible for the sender to make a note of all instruments that receiver was unable to receive. When the receiver has sufficient bandwidth to continue receiving updates, the sender can iterate through the list of lost instruments and inform the receiver.

The scheme has the advantage that it allows the receiver to quickly determine what instruments are stale. It may also be useful when a component upstream in the ticker-plant detects a gap – it can just push down the known stale messages to all components down-stream from it. (For example, an exchange parser might detect a gap and send a stale indicator downstream while it attempts to fill the gap from the exchange.)

As a disadvantage, it may significantly complicate data senders. It also does not help in cases where a receiver needs to change to a different sender.

== Receiver analyzes gapped messages ==

In some systems, the receiver may need to obtain all lost messages (eg, to build a full-tick database). If the receiver knows the contents of messages missing earlier in the stream it can determine which messages are stale. Every instrument that contains an update message in the list of missing messages would be stale; instruments that did not have update messages would be “clean”.

An advantage of this system is that it is relatively simple to implement for receivers that need full tick retransmissions.

However, in the general case, it is not possible to implement full tick retransmissions due to the possibility of hard failures and ticker-plant switches. Therefore this scheme would only be useful to reduce the number of stale instruments in certain cases.

Also, the cost of retransmitting lost ticks may exceed the benefits found from reducing the number of instruments marked stale. This makes the scheme less attractive for receivers that do not need all lost ticks retransmitted.

= Stale correction =

This section discusses possible methods of resolving “suspect conditions” that occur when it is detected that an instrument may have missed real-time update messages.

There are likely many other possible schemes not discussed here. It is also possible that a combination of one or more of these schemes may provide a useful solution.

These solutions center around restoring the snapshot database. Restoration of a tick history database is left for a separate discussion.

== Background refresh ==

The simplest method of clearing stale records is to have the ticker-plant generate a periodic stream of refresh messages. This is what the system currently does.

This system is not very good at handling intermittent errors, because it could take a very long time to refresh the full instrument database. However, if enough bandwidth is allocated, it is a useful system for recovering from hard failures where the downstream needs a full refresh anyway. It is also possible to combine this with one of the gap prevention schemes discussed above to help deter intermittent outages.

Advantages:

* simple to implement at both receiver and sender

Disadvantages:

* time to recovery can be large

* can be difficult to detect when an instrument should be deleted, or when an IPO should be added

== Receiver requests snapshot for all stale instruments ==

In this system, the receiver would use one of the above gap detection mechanisms to determine when an instrument may be stale. It then issues a series of upstream requests until all such instruments are no longer stale.

In order to reduce the number of requests during an outage, the instruments on the feed could be broken up into multiple sets of sequenced streams (eg, one per exchange).

Advantages:

* could lead to faster recovery when there is available bandwidth and few other customers requiring snapshots

Disadvantages:

* could be complex trying to request snapshots for instruments where the initial create message is lost

Notes:

* see discussion on [[#Snapshot recovery types]]

* see discussion on [[#Gap detection]] for possible methods of reducing the universe of suspect instruments during an outage

== Sender sends snapshots ==

This is a variant of [[#Sender tells receiver of gaps]]. However, in this scheme, the sender would detect a gap for a receiver and automatically send the snapshot when bandwidth becomes available. (It may also be possible to send only the part of the snapshot that is necessary.)

Advantages:

* Simple for receiver

Disadvantages:

* Could be complex for sender

* Isn’t useful if receiver needs to change upstream sources.

== Receiver requests gapped sequences ==

This method involves the receiver detecting when an outage occurs and making an upstream request for the sequence numbers of all messages (or message blocks) not received. The sender would then retransmit the lost messages (or blocks) to the receiver.

The receiver would then place the lost messages along with all subsequently received messages into a “reorder” buffer. The receiver can then internally “play back” the messages from the reorder buffer to rebuild the current state.

Advantages:

* Useful for clients that need to build full-tick databases and thus need the lost messages anyway.

Disadvantages:

* Thought to be complex and impractical to implement. The reorder buffer could grow to large sizes and might take significant resources to store and apply.
* The bandwidth necessary to retransmit all lost messages may exceed the bandwidth necessary to retransmit the current state of all instruments.
* Doesn’t help when a ticker-plant switch occurs.

== Sender analyzes gapped sequences ==

This scheme is a variant on [[#Receiver requests gapped sequences]]. The receiver detects when an outage occurs and makes an upstream request for the sequence numbers of all messages (or message blocks) not received.

Upon receipt of the request the sender would generate a series of snapshots for all instruments that had real-time updates present in the lost messages. It can do this by analyzing the contents of the messages that it sent but the receiver did not obtain. The sender would also have to inform the receiver when all snapshots have been sent so the receiver can transition the remaining instruments into a “not stale” state.

Advantages:

* May be useful in conjunction with gap prevention. That is, the sender could try resending the lost messages themselves if there is a good chance the receiver will receive them before timing out. If the receiver does timeout, the sender could fall back to the above snapshot system.

* May be simple for receivers

Disadvantages:

* May be complicated for senders
* Doesn’t help when a ticker-plant switch occurs.

Notes:

* Either in-sequence or out-of-sequence snapshot transmissions could be used. (See [[#Snapshot recovery types]] for more info.) The receiver need not send the requests to the sender – it could send them to another (more reliable) receiver.

== Receiver could ask if update necessary ==

This is a variant of [[#Receiver requests snapshot for all stale instruments]], however, in this system the receiver sends the sequence number of the last message that updated the instrument (SNLM) with the request. The sender can then compare its SNLM with the receiver’s and either send an “instrument okay” message or a full snapshot in response.

Advantages:

* Reduces downstream bandwidth necessary after an outage

Disadvantages:

* Doesn’t work well in cases where instruments are updating, because the receiver and sender may be at different points in the update stream
* Lost create messages – see disadvantages of [[#Receiver requests snapshot for all stale instruments]]

== Receiver could ask with hash ==

This is a variant of [[#Receiver could ask if update necessary]], however, in this system the receiver sends a hash value of the current instrument’s database record with the request. The sender can then compare its database hash value with the receiver’s and either send an “instrument okay” message or a full snapshot in response.

Advantages:

* Works during tp switches

Disadvantages:

* Doesn’t work well in cases where instruments are updating, because the hash values are unlikely to match if sender and receiver are at a different point in the update stream.

* Rollout issues – see [[#Periodic hash checks]]

* Lost create messages – see disadvantages of [[#Receiver requests snapshot for all stale instruments]]

= Important considerations =

In many stale detection and correction system there are several “corner cases” that can be difficult to handle. Planning for these cases in advance can simplify later development issues.

The following is a list of “corner cases” and miscellaneous ideas:

== Ticker plant switches ==

It can be difficult to handle the case where a receiver starts obtaining messages from a different ticker-plant. Our generated sequence numbers won’t be synchronized between the ticker-plants. Many of the above schemes would need to place any affected instruments into a “suspect” state should a tp switch occur.

Even if one could guarantee that no update messages were lost during a tp switch (for example by using exchange sequence numbers) there might still be additional work. The old ticker-plant might have been sending incorrect or incomplete messages — indeed, that may have been the reason for the tp switch.

== Lost IPO message ==

When the real-time feed gaps, it is possible that a message that would have created a new instrument was lost. An automatic recovery process should be capable of recovering this lost information.

There are [[schemes to detect extra and missing records]].

== Lost delete message ==

Similar to the IPO case, a real-time gap could have lost an instrument delete message. An automatic recovery process should be able to properly handle this.

A stranger, but technically possible, situation involves losing a combination of delete and create messages for the same instrument. The recovery process should be robust enough to ensure that full resynchronization is possible regardless of the real-time message update content.

There are [[schemes to detect extra and missing records]].

== Exchange update patterns ==

Some exchanges have a small number of instruments that update relatively frequently (eg, equities). Other exchanges have a large number of instruments that individually update infrequently, but have a large aggregate update (eg, US options).

Schemes for gap detection and correction should be aware of these differences and be capable of handling both gracefully.

== Orderbooks ==

Recovering orderbooks can be a difficult process. However, getting it right can result in dramatic improvements to their quality, because orderbooks are more susceptible to problems resulting from lost messages.

The key to getting orderbooks correct is finding good solutions to all of the above corner cases. Orderbooks have frequent record creates and deletes. They also have the peculiar situation where some of the orders (those at the “top”) update with very high frequency, but most other orders (not at the “top”) update very infrequently.

== Sources can legitimately idle ==

Many exchanges follow a pattern of high traffic during market hours, but little to no traffic on off hours. Ironically, the traffic near idle periods can be extremely important (eg, opens, closes, deletes, resets).

It is important to make sure a detection scheme can handle the case where a gap occurs around the time of a legitimate feed idle. It should also be able to do so in a reasonable amount of time. (An example of this is the “keep alive” in the above sequence number scheme.)

== Variations of stale ==

A record is sometimes thought to be either “clean” or “stale”. However, it is possible to graduate and/or qualify what stale means. That is, it is possible to be “suspect” or “suspect for a reason” instead of just being “stale”.

Possible per-instrument stale conditions:

  • clean: the instrument is not stale
  • possible gap: gap in sequence number that could affect many instruments
  • definite gap: some recovery schemes can determine when an instrument has definitely lost updates
  • upstream possible gap: the tp might have seen a sequence gap from the exchange
  • upstream definite gap: the tp might have deduced which instruments actually gapped from the exchange
  • stale due to csp startup: the csp was recently started and has an unknown cache state
  • suspect due to tp switch: a ticker-plant switch occurred
  • pre-clean: possible state in out-of-order snapshot recovery schemes
  • downstream gap: in some schemes a sender can inform a receiver that it lost updates

5 retrans questions from interviewers + from me

Q1 (IV): do you track gaps for Line A and also track gaps in Line B?
A: No, according to Deepak and Shanyou. Deepak said we treat the seq from Line A/B as one combined sequence (with duplicates) and track gaps therein

Q2 (IV): Taking parser + orderbook (i.e. rebus) as a single black box, when you notice a gap (say seq #55/56/57), do you continue to process seq # 58, 59 … or do you warehouse these #58, 59… messages and wait until you get the resent #55/56/57 messages?
A: no warehousing. We process #58 right away.

Q2b (IV): In your orderbook engine (like Rebus), suppose you get a bunch of order delete/exec/modify messages, but the orderId is unrecognized and possibly pending retrans. Rebus doesn’t know about any pending retrans. What would rebus do about those messages? See other posts on retrans.

Q3 (IV): do you immediately send a retrans request every time you see a gap like (1-54, then 58,59…)? Or do you wait a while?
A: I think we do need to wait since UDP can deliver #55 out of sequence. Note Line A+B are tracked as a combined stream.

Q3b: But how long do we wait?
A: 5 ms according to Deepak

Q3c: how do you keep a timer for every gap identified?
%%A: I think we could attach a timestamp to each gap.

Q4: after you send a retrans request but gets no data back, how soon do you resend the same request again?
A: a few milliseconds, according to Deepak, I think.

Q4b: Do you maintain a timer for every gap?
%%A: I think my timestamp idea will help.
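A sketch of my timestamp idea (Q3c/Q4b): track each gap with the time it was detected and the time of its last retrans request, so a periodic sweep can decide when to send or resend. The 5 ms values quoted above are used only as illustrative defaults; the structure is my own.

#include <chrono>
#include <cstdint>
#include <map>

using Clock = std::chrono::steady_clock;

struct Gap {
    uint64_t firstSeq, lastSeq;          // e.g. 55..57
    Clock::time_point detectedAt;
    Clock::time_point lastRequestAt{};   // unset until the first request goes out
};

class RetransTracker {
    std::map<uint64_t, Gap> gaps_;       // keyed by firstSeq
    std::chrono::milliseconds initialWait_{5}, resendWait_{5};
public:
    void onGapDetected(uint64_t first, uint64_t last) {
        gaps_[first] = Gap{first, last, Clock::now(), {}};
    }
    void onGapFilled(uint64_t first) { gaps_.erase(first); }

    // Periodic sweep: request gaps that have waited long enough, or whose
    // earlier request has gone unanswered.
    template <class RequestFn>
    void sweep(RequestFn sendRequest) {
        auto now = Clock::now();
        for (auto& [first, g] : gaps_) {
            bool neverRequested = (g.lastRequestAt == Clock::time_point{});
            bool due = neverRequested ? (now - g.detectedAt >= initialWait_)
                                      : (now - g.lastRequestAt >= resendWait_);
            if (due) { sendRequest(g.firstSeq, g.lastSeq); g.lastRequestAt = now; }
        }
    }
};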

Q: You said the retrans processing in your parser shares the same thread as regular (main publisher) message processing. What if the publisher stream is very busy so the gaps are neglected? In other words, the thread is overloaded by the publisher stream.
%%A: We accept this risk. I think this seldom happens. The exchange uses sharding to ensure each stream is never overloaded.

OPRA feed processing – load sharing

On the Frontline, one (or more) sockets receive the raw feed. The number of sockets is dictated by the data provider’s system. The sockets somehow feed into Tibco (possibly on many boxes). Optionally, we could normalize or enrich the raw data.

Tibco then multicasts messages on (up to) 1 million channels i.e. hierarchical subjects. For each underlier, there are potentially hundreds of option tickers. Each option is traded on (up to) 7 exchanges (CME is #1). Therefore there’s one tibco subject for each ticker/exchange. These add up to 1 million hierarchical subjects.

Now, that’s the firmwide market data distributor, the so-called producer. My system, one of the consumers, actually subscribes to most of these subjects. This entails a dual challenge

* we can’t run one thread per subject. Too many threads.
* Can we subscribe to all the subjects in one JVM? I guess possible if one machine has enough kernel threads. In reality, we use up to 500 machines to cope with this volume of subjects.

We ended up grouping thousands of subjects into each JVM instance. All of these instances are kept busy by the OPRA messages.
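A toy illustration of one way to spread the subjects across a few hundred consumer instances; the hashing approach is my assumption, since the real grouping may follow the subject hierarchy instead.

#include <functional>
#include <string>

// Map a tibco subject to one of the consumer instances (e.g. numInstances = 500).
int instanceFor(const std::string& subject, int numInstances) {
    return static_cast<int>(std::hash<std::string>{}(subject) % numInstances);
}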

Note it’s not acceptable to skip any single message in our OPRA feed because the feed is incremental and cumulative.

order book live feed – stream^snapshot

Refer to the blog on order-driven vs quote-driven markets. Suppose an ECN receives limit orders or executable quotes (like limit orders) from traders. The ECN maintains them in an order book. Suppose you are a trader and want to receive all updates to that order book. There are 2 common formats.

1) The snapshot feed can simplify development/integration with the ECN as the subscriber application does not need to manage the internal state of the ECN order book. The feed publishes a complete snapshot of the order book at a fixed interval; this interval could be something like 50ms. It is possible to subscribe to a specific list of securities upon connecting.

2) Streaming event-based format. It consumes more bandwidth (1Mb/sec) and requires that the subscriber manage the state of the ECN order book by applying events received over the feed. The advantage of this feed is that the subscriber’s market data will be as accurate as possible and will not have the possible 50ms delay of the snapshot feed.

simplified wire data format in mkt-data feed

struct quote{
  char symbol[7]; // null terminator not needed
  int bidPrice;   // no float please
  int bidSize;
  int offerPrice;
  int offerSize;
};                // pack to 1-byte alignment (e.g. #pragma pack(1)) so the record is exactly 7 + 4*4 = 23 bytes
This is a 23-byte fixed-width record, extremely network friendly.

Q: write a hash function q[ int hashCode(char * char7) ] for 300,000 symbols
%%A: (c[0]*31 + c[1])*31 …. but this is too slow for this options exchange

I was told there’s a solution with no loop and no math. I guess some kind of bitwise operation on the 56 bits of char7.
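My guess at the loop-free idea, as a sketch: pack the 7 chars (56 bits) into a 64-bit integer and use it (or a folded version of it) directly as the hash key. This is speculation, not a confirmed answer.

#include <cstdint>
#include <cstring>

inline uint64_t hashCode(const char* char7) {
    uint64_t key = 0;
    std::memcpy(&key, char7, 7);   // 56 bits of symbol copied into the key; compiles to a single load
    return key;                    // unique per symbol; mask or fold down if a smaller bucket index is needed
}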

10 coding tips of high-frequency exchange connectivity

[c] use enum/char rather than vptr #Miami
[c] hand-crafted hash table, which is often more optimized for our specific requirement #Barcap
[c] Fixed-width, padded char-array without null terminator #Miami
[c] favor Set over Map
[c] hand-crafted reference counted data structure #Miami
[c] monolithic network-friendly wire format STRUCT for high volume data transfer. Nothing fancy.
[c] avoid java finalizers, which slow down garbage collection
[c] favor arrays over object graphs #UBS
[c] avoid boxing and java collections. Favor primitives. #CS
specialized operator-new. See [[eff c++]]
unions

—-
Note — all of these are considered optimizations on memory. That’s where C++ outsmarts java.

[c=according to insiders]
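To illustrate the first tip, a minimal sketch of an enum/char tag dispatched with a switch instead of a vptr and virtual call; the message names here are made up.

#include <cstdint>

enum class MsgType : char { Add = 'A', Modify = 'M', Delete = 'D' };

struct Msg {                 // no virtual functions, hence no vptr per object
    MsgType  type;
    uint64_t orderId;
    int32_t  price;
    uint32_t qty;
};

inline void dispatch(const Msg& m) {
    switch (m.type) {        // direct branch instead of virtual dispatch
        case MsgType::Add:    /* handleAdd(m);    */ break;
        case MsgType::Modify: /* handleModify(m); */ break;
        case MsgType::Delete: /* handleDelete(m); */ break;
    }
}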