This page is intended to list ideas on possible ways of improving the orderbook replication system so that it can better determine when an instrument or instruments are stale. It further lists possible ways of automating retransmission of lost and/or stale data to improve the recovery time in the event of an outage.
The current system is largely “best effort”; at each level of the ticker-plant and distribution network, components are capable of dropping data should a problem arise. There are few processes capable of correlating what was lost in a format that is useful to a customer. To account for the possibility of lost data, the orderbook replication components constantly generate “refresh” messages with the most up-to-date state of an instrument.
Although this system works well in practice, it leaves open the possibility that a customer may have a cache with “stale” data for an unbounded length of time. This inability to track staleness can be a point of concern for customers.
= Recovery types =
When a downstream component loses one or more update messages for an instrument, it is generally safe to assume the instrument is stale. However, there can be two different kinds of recoveries:
== Retransmit the latest snapshot ==
This method of retransmission and stale detection revolves around keeping the current tick snapshot database up to date. It is useful for customers that need an accurate tick cache. It may not be a full solution to customers that need an accurate time and sales database.
== Retransmit all lost ticks ==
It is also possible to retransmit all lost ticks after an outage. This is typically useful when trying to repair a time-and-sales database.
Although it is possible to build an accurate “current” snapshot record when all lost ticks are retransmitted, it is a very tedious and error-prone process. It is expected that customers will, in general, be unwilling to rebuild the “current” from the retransmission of lost ticks.
So, a scheme that involves retransmission of lost ticks will still likely require a scheme that retransmits the latest snapshot.
Most of the following discussions are centered around the concept of latest snapshot recovery.
= Gap prevention =
There may be simple ways to reduce the number of times gaps occur. This process could be called “gap prevention”.
In general, it is not possible to eliminate gaps, because severe outages and equipment failure can always occur. The process of gap prevention may be useful, however, where the alternative gap recovery solution is expensive or undesirable. It is also useful in systems that need full lost tick recovery.
There are two possible ways of preventing gaps from occurring. Both trade bandwidth and latency for increased reliability during intermittent outages.
== Wait for retransmit ==
The simplest form of gap prevention involves retransmitting any packets lost on the network. The sender keeps a buffer of recently sent messages, and the receiver can request retransmissions. In the event of packet loss, the receiver waits for the retransmissions before processing data.
This form of gap recovery is a basic feature of the TCP/IP transmission protocol.
== Forward error correction ==
It is also possible to prevent gaps by sending additional data on the feed.
The most basic form of this is used in the “best-of-both” system. It sends two or more copies of the data, and the receiver can fill lost ticks from the additional copies.
It is not necessary to send a full additional feed. For example, one could send a block of parity codes on every tenth packet. A receiver could then theoretically recover from up to ten percent packet loss by using the parity code packets.
Although the forward error correction scheme uses additional bandwidth, additional bandwidth may be available due to line redundancy.
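The parity-block scheme described above, as an added example (not code from the original page): every tenth data packet is followed by one XOR parity packet, so a receiver that loses exactly one packet of a group rebuilds it, while two losses in the same group remain a gap. Fixed-length, zero-padded packets are an assumption.

```python
"""XOR parity forward error correction for a packetised feed (added example)."""
from typing import Dict, List, Tuple

GROUP = 10      # parity packet after every tenth data packet, as in the text
PKT_LEN = 16    # assumption: fixed-length packets, shorter payloads zero-padded

Packet = Tuple[str, int, bytes]   # ('D', seq, payload) or ('P', group_index, parity)


def pad(p: bytes) -> bytes:
    return p.ljust(PKT_LEN, b"\0")


def xor_all(packets: List[bytes]) -> bytes:
    acc = bytearray(PKT_LEN)
    for p in packets:
        for i, b in enumerate(pad(p)):
            acc[i] ^= b
    return bytes(acc)


def encode(data: List[bytes]) -> List[Packet]:
    """Sender side: interleave one parity packet after each group of data packets."""
    stream: List[Packet] = []
    for g in range(0, len(data), GROUP):
        group = data[g:g + GROUP]
        stream.extend(("D", g + i, pad(p)) for i, p in enumerate(group))
        stream.append(("P", g // GROUP, xor_all(group)))
    return stream


def decode(received: List[Packet], total: int) -> Tuple[Dict[int, bytes], List[int]]:
    """Receiver side: return (data packets by seq, seqs that are still lost)."""
    got = {seq: p for kind, seq, p in received if kind == "D"}
    parity = {idx: p for kind, idx, p in received if kind == "P"}
    lost: List[int] = []
    for g in range(0, total, GROUP):
        seqs = range(g, min(g + GROUP, total))
        missing = [s for s in seqs if s not in got]
        if len(missing) == 1 and g // GROUP in parity:
            # parity XOR (every packet that did arrive) == the single missing packet
            present = [got[s] for s in seqs if s in got]
            got[missing[0]] = xor_all([parity[g // GROUP]] + present)
        else:
            lost.extend(missing)   # zero losses adds nothing; two or more are unrecoverable
    return got, lost


def demo() -> Tuple[bool, List[int]]:
    # Constructed sample feed of 20 small tick payloads.
    data = [f"tick{n:02d}".encode() for n in range(20)]
    stream = encode(data)
    # Simulated loss: seq 3 alone in group 0 (recoverable), seqs 14 and 15 in group 1 (not).
    arrived = [p for p in stream if not (p[0] == "D" and p[1] in (3, 14, 15))]
    got, lost = decode(arrived, len(data))
    return got[3] == pad(data[3]), lost
```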
= Snapshot recovery types =
In order to correct a stale instrument, it may be necessary to send the full contents of the instrument. When doing so, one may send them in sequence within the real-time feed or out of sequence.
== In sequence snapshots ==
The simplest form of snapshot transmission involves using the same socket or pipe as the real-time feed itself. In this case, the receiver can simply apply the snapshot to its database; it does not need to handle the cases where the snapshot arrives after/before a real-time update arrives.
The downside of this scheme, however, is that a single upstream supplier of snapshots might get overloaded with requests for retransmissions. If additional distributed databases are used to offload processing, then each additional component will add latency to the real-time feed.
== Out of sequence snapshots ==
It is also possible to send snapshot transmissions using sockets and/or pipes separate from the real-time feed. The advantage of this scheme is that it is relatively cheap and easy to increase the number of distributed snapshot databases that can be queried. However, it requires the receiver of the snapshot to work harder when attempting to apply the response to its database.
One way to apply out of order snapshots is to build a “reorder buffer” into the receiver. This buffer would store the contents of the real-time feed. When a snapshot response arrives, the receiver could locate where the sender was in the real-time stream when it generated the snapshot (possibly by using sequence numbers). It can then apply the snapshot and internally replay any pending updates from the reorder buffer. In the case where a snapshot arrived that was based on real-time traffic that the receiver has yet to receive, the receiver must wait for that traffic to arrive before applying the snapshot.
This scheme is thought to be complex, resource intensive, and error-prone.
If, however, the feed were changed to eliminate the distributed business rules, it may be possible to implement a much simpler out-of-order snapshot application system. See [[Out of sequence snapshots idea]] for a possible implementation.
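The reorder-buffer approach from the out-of-sequence section, as an added example (not code from the original page): the receiver keeps recent real-time updates, applies a snapshot, replays the buffered updates newer than the snapshot's position, and holds any snapshot that is ahead of the stream until that traffic arrives. The as_of_seq field and the buffer bound are assumptions.

```python
"""Reorder buffer for out-of-sequence snapshots (added example, not from the page).
Updates carry the feed seq; a snapshot carries as_of_seq, the seq the sender had
applied when it took it (field names are assumptions)."""
from dataclasses import dataclass
from typing import Dict, List, Optional

MAX_BUFFER = 10_000   # assumption: the page warns this buffer can grow large, so bound it

@dataclass
class Update:
    seq: int
    symbol: str
    fields: Dict[str, float]

@dataclass
class Snapshot:
    as_of_seq: int          # last real-time seq the sender had applied before snapshotting
    symbol: str
    fields: Dict[str, float]

class SnapshotReceiver:
    def __init__(self) -> None:
        self.db: Dict[str, Dict[str, float]] = {}
        self.state: Dict[str, str] = {}      # symbol -> "clean" | "stale"
        self.last_seq: Optional[int] = None
        self.reorder: List[Update] = []      # recent real-time updates in seq order
        self.pending: List[Snapshot] = []    # snapshots ahead of our stream position

    def on_update(self, u: Update) -> None:
        if self.last_seq is not None and u.seq != self.last_seq + 1:
            for sym in self.state:           # gap: all known instruments are suspect
                self.state[sym] = "stale"
        self.last_seq = u.seq
        self.reorder.append(u)
        del self.reorder[:-MAX_BUFFER]
        self.db.setdefault(u.symbol, {}).update(u.fields)
        self.state.setdefault(u.symbol, "clean")   # lost creates are a separate topic on the page
        # Snapshots that were waiting for this traffic can now be applied.
        ready = [s for s in self.pending if s.as_of_seq <= u.seq]
        self.pending = [s for s in self.pending if s.as_of_seq > u.seq]
        for s in ready:
            self._apply(s)

    def on_snapshot(self, s: Snapshot) -> None:
        if self.last_seq is None or s.as_of_seq > self.last_seq:
            self.pending.append(s)           # sender is ahead of us: wait for the traffic
        else:
            self._apply(s)

    def _apply(self, s: Snapshot) -> None:
        rec = dict(s.fields)
        for u in self.reorder:               # replay this instrument's updates newer than the snapshot
            if u.symbol == s.symbol and u.seq > s.as_of_seq:
                rec.update(u.fields)
        self.db[s.symbol] = rec
        self.state[s.symbol] = "clean"

def demo() -> SnapshotReceiver:
    r = SnapshotReceiver()
    # Constructed stream: seq 3 is lost, so AAPL is stale once seq 4 arrives.
    r.on_update(Update(1, "AAPL", {"bid": 10.0}))
    r.on_update(Update(2, "AAPL", {"bid": 11.0}))
    r.on_update(Update(4, "AAPL", {"bid": 13.0}))
    r.on_snapshot(Snapshot(3, "AAPL", {"bid": 12.0, "ask": 12.5}))  # late: apply, then replay seq 4
    r.on_snapshot(Snapshot(6, "MSFT", {"bid": 51.0, "ask": 51.2}))  # early: held until seq 6 arrives
    r.on_update(Update(5, "MSFT", {"bid": 49.0}))
    r.on_update(Update(6, "MSFT", {"bid": 51.0}))
    return r
```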
= Gap detection =
In order to accurately determine when an instrument is “stale”, it is necessary to be able to determine when one or more update messages have been lost. The following sections contain notes on various schemes that can provide this information.
Note that some of the schemes may be complementary. That is, a possible future solution might use parts of several of these methods.
== Sequence numbers with keep-alive on idle ==
The most common method to detect a gap involves placing a monotonically incrementing number on all outgoing messages or message blocks. The receiver can then detect a gap when a message (or message block) arrives with a sequence number that is not one greater than the last sequence number.
In order to account for the case where all messages from a source are lost, or where a source goes idle just after a message loss, the sender needs to arrange to transmit a “keep alive” indicator periodically when the source is otherwise idle. With knowledge of the keep-alive period, the receiver can detect a gap by timing out if it does not receive a message from a source within the specified period. The larger the period, the fewer keep-alive messages need to be sent when idle. However, it also increases the worst-case time to detect a message gap.
It is possible for the sender to generate multiple sequence number series simultaneously by separating instruments into multiple categories. For example, the outgoing feed currently generates an independent sequence number for each “exchange”. At the extreme, it is possible to generate a sequence number stream per instrument; however, this would increase the bandwidth due to the larger number of keep-alive messages necessary. (One would also not be able to optimize bandwidth by sequencing only message blocks.)
When a sequence number gap is detected, the receiver must consider all instruments using the sequence series as suspect.
Only a single source can reliably generate a sequence number. If multiple ticker-plants generate a feed, they need to use different sequence series. If an upstream ticker-plant switch occurs, the receiver needs to mark the full range of affected instruments as suspect.
Finally, some exchanges provide sequenced data feeds. However, there are [[issues with exchange provided sequence numbers]]. Due to this, it may be difficult to rely on exchange sequencing as a basis for distribution network sequencing.
== Sequence number of last message ==
A variant of the basic sequencing scheme involves the sequence number of the last message (SNLM) that updates an instrument. This field would be kept by both sender and receiver, and included with real-time messages. If SNLM matches the receiver’s record, implying that the receiver has not missed any updates for this instrument, then the instrument can transition from “suspect” to “clean”. Conversely, a non-match should force the instrument to “stale”.
An advantage of this scheme is that ordinary real-time traffic could reduce the number of suspect records after an outage. It may also make using exchange provided sequence numbers more practical.
As a disadvantage, however, it would require that both a sequence number and SNLM be provided on every real-time update. This might significantly increase bandwidth.
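A receiver-side detector combining the two preceding schemes, as an added example (not code from the original page): per-series sequence numbers mark every instrument on a series suspect when a gap appears, an idle timer catches a series that falls silent without keep-alives, and the SNLM on each real-time update moves an individual instrument back to clean or on to stale. The message fields and the keep-alive period are assumptions.

```python
"""Per-series sequence numbers, idle keep-alives and SNLM (added example, not from the page).
Field names and the keep-alive period are assumptions."""
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

KEEPALIVE_PERIOD = 5.0                 # assumption: a keep-alive every 5 s when a series is idle
IDLE_TIMEOUT = KEEPALIVE_PERIOD * 1.5  # assumption: slack for scheduling jitter

@dataclass
class Msg:
    series: str                   # sequence series, e.g. one per exchange
    seq: int                      # increments by one within the series
    ts: float                     # receive time in seconds
    symbol: Optional[str] = None  # None marks a keep-alive
    snlm: Optional[int] = None    # seq of the sender's previous update to symbol (None if first)

@dataclass
class Series:
    last_seq: Optional[int] = None
    last_ts: float = 0.0
    symbols: Set[str] = field(default_factory=set)

class GapDetector:
    def __init__(self) -> None:
        self.series: Dict[str, Series] = {}
        self.state: Dict[str, str] = {}            # symbol -> clean | suspect | stale
        self.last_update_seq: Dict[str, int] = {}  # receiver's own SNLM per symbol

    def _suspect_series(self, name: str) -> None:
        # A gap taints every instrument sequenced on the series; stale ones stay stale.
        for sym in self.series[name].symbols:
            if self.state.get(sym) == "clean":
                self.state[sym] = "suspect"

    def on_message(self, m: Msg) -> None:
        s = self.series.setdefault(m.series, Series())
        if s.last_seq is not None and m.seq != s.last_seq + 1:
            self._suspect_series(m.series)
        s.last_seq, s.last_ts = m.seq, m.ts
        if m.symbol is None:
            return                                 # keep-alive: only refreshes the idle timer
        s.symbols.add(m.symbol)
        # An SNLM match proves nothing was missed for this instrument, so even a suspect record
        # becomes clean. A mismatch, including a symbol never seen here that the sender has
        # updated before (a lost create), makes it stale.
        prev = self.last_update_seq.get(m.symbol)
        self.state[m.symbol] = "clean" if m.snlm == prev else "stale"
        self.last_update_seq[m.symbol] = m.seq

    def check_idle(self, now: float) -> List[str]:
        """A series silent for longer than the keep-alive allows is treated as gapped."""
        timed_out = [n for n, s in self.series.items() if now - s.last_ts > IDLE_TIMEOUT]
        for name in timed_out:
            self._suspect_series(name)
        return timed_out

def demo() -> GapDetector:
    d = GapDetector()
    # Constructed traffic; NYSE seq 4 (an IBM update) and seq 5 (a GOOG create) are lost.
    d.on_message(Msg("NYSE", 1, 0.0, "AAPL"))
    d.on_message(Msg("NYSE", 2, 1.0, "IBM"))
    d.on_message(Msg("NYSE", 3, 2.0, "AAPL", snlm=1))
    d.on_message(Msg("NASD", 1, 0.0, "MSFT"))
    d.on_message(Msg("NYSE", 6, 3.0, "IBM", snlm=4))   # gap: NYSE suspect; SNLM 4 != 2 -> IBM stale
    d.on_message(Msg("NYSE", 7, 4.0, "AAPL", snlm=3))  # SNLM matches -> AAPL back to clean
    d.on_message(Msg("NYSE", 8, 4.5, "GOOG", snlm=5))  # unknown here, not to the sender -> stale
    d.check_idle(now=10.0)   # NASD silent for 10 s with no keep-alive -> MSFT suspect
    return d
```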
== Periodic message count checks ==
It is also possible to detect gaps if the sender periodically transmits an accounting of all messages sent since the last period. This scheme may use less bandwidth than sequence numbers, because it is not necessary to send a sequence number with every message (or message block).
The scheme still has the same limitations as sequence numbers when ticker-plant switches occur and when trying to determine what was lost when a gap occurs.
== Periodic hash checks ==
Another possible method of detecting gaps is by having the sender generate a hash of the contents of its database. The receiver can then compare the sender’s hash to the same hash generated for its database. If the two differ, a gap must have occurred. (If the two match, however, a gap may have occurred but already been corrected; this method is therefore not useful when full tick recovery is necessary.)
This scheme may be beneficial when ticker-plant switches occur. If two senders have identical databases and no data is lost during a network switch, then the hash checks should still match at the receiver. This scheme, however, still faces the problem of determining which instruments from the set are actually stale when a gap is detected.
Technically, it is possible that two databases could differ while sharing the same hash key. However, it is possible to choose a hash function that makes the possibility of this extremely small.
Finally, this system may face challenges during software upgrades and rollouts. If either the sender or the receiver changes how or what it stores in its database, it may be difficult to maintain a consistent hash representation.
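A periodic hash check over a snapshot database, as an added example (not code from the original page): both sides hash a canonical serialisation restricted to an agreed field set, which is what keeps one side's local bookkeeping or a rollout from disturbing the comparison. The sample update stream and the field set are constructed.

```python
"""Periodic hash check of a snapshot database (added example, not from the page)."""
import hashlib
import json
from typing import Dict, Tuple

Record = Dict[str, float]
Database = Dict[str, Record]

# Assumption: sender and receiver agree on exactly these fields being hashed. Anything a
# side stores locally (receive timestamps, flags) is excluded, so that a change on one
# side in what else it stores does not break the check.
HASHED_FIELDS = ("bid", "ask", "last")


def canonical(db: Database) -> bytes:
    # Sorted keys, fixed separators and rounded floats give an encoding-independent form.
    view = {sym: {f: round(rec[f], 6) for f in HASHED_FIELDS if f in rec}
            for sym, rec in db.items()}
    return json.dumps(view, sort_keys=True, separators=(",", ":")).encode()


def db_hash(db: Database) -> str:
    return hashlib.sha256(canonical(db)).hexdigest()


def apply_update(db: Database, sym: str, fields: Record) -> None:
    db.setdefault(sym, {}).update(fields)


def demo() -> Tuple[bool, bool, bool]:
    # Constructed update stream: (seq, symbol, fields)
    stream = [(1, "AAPL", {"bid": 10.0, "ask": 10.1}),
              (2, "IBM", {"bid": 50.0, "ask": 50.2}),
              (3, "AAPL", {"last": 10.05}),
              (4, "IBM", {"last": 50.1})]
    sender: Database = {}
    receiver: Database = {}
    for seq, sym, f in stream:
        apply_update(sender, sym, f)
        if seq != 3:                      # the receiver loses seq 3
            apply_update(receiver, sym, f)
    mismatch_after_gap = db_hash(sender) != db_hash(receiver)

    # A refresh of AAPL repairs the record; the hashes now agree even though a gap
    # did occur, which is why this check cannot serve full-tick recovery.
    receiver["AAPL"] = dict(sender["AAPL"])
    match_after_refresh = db_hash(sender) == db_hash(receiver)

    # Receiver-local bookkeeping outside HASHED_FIELDS leaves the hash unchanged.
    receiver["AAPL"]["recv_ts"] = 1_700_000_000.0
    unaffected_by_local_field = db_hash(sender) == db_hash(receiver)
    return mismatch_after_gap, match_after_refresh, unaffected_by_local_field
```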
== Sender tells receiver of gaps ==
If a reliable transmission scheme (eg, TCP) is in use between the sender and receiver, then it may be possible for the sender to inform the receiver when the receiver is unable to receive some portion of the content.
For example, if a sender cannot transmit a block of messages to a receiver because the receiver does not have sufficient bandwidth at the time of the message, then it is possible for the sender to make a note of all instruments that receiver was unable to receive. When the receiver has sufficient bandwidth to continue receiving updates, the sender can iterate through the list of lost instruments and inform the receiver.
The scheme has the advantage that it allows the receiver to quickly determine what instruments are stale. It may also be useful when a component upstream in the ticker-plant detects a gap – it can just push down the known stale messages to all components down-stream from it. (For example, an exchange parser might detect a gap and send a stale indicator downstream while it attempts to fill the gap from the exchange.)
As a disadvantage, it may significantly complicate data senders. It also does not help in cases where a receiver needs to change to a different sender.
== Receiver analyzes gapped messages ==
In some systems, the receiver may need to obtain all lost messages (eg, to build a full-tick database). If the receiver knows the contents of messages missing earlier in the stream it can determine which messages are stale. Every instrument that contains an update message in the list of missing messages would be stale; instruments that did not have update messages would be “clean”.
An advantage of this system is that it is relatively simple to implement for receivers that need full tick retransmissions.
However, in the general case, it is not possible to implement full tick retransmissions due to the possibility of hard failures and ticker-plant switches. Therefore this scheme would only be useful to reduce the number of stale instruments in certain cases.
Also, the cost of retransmitting lost ticks may exceed the benefits found from reducing the number of instruments marked stale. This makes the scheme less attractive for receivers that do not need all lost ticks retransmitted.
= Stale correction =
This section discusses possible methods of resolving “suspect conditions” that occur when it is detected that an instrument may have missed real-time update messages.
There are likely many other possible schemes not discussed here. It is also possible that a combination of one or more of these schemes may provide a useful solution.
These solutions center around restoring the snapshot database. Restoration of a tick history database is left for a separate discussion.
== Background refresh ==
The simplest method of clearing stale records is to have the ticker-plant generate a periodic stream of refresh messages. This is what the system currently does.
This system is not very good at handling intermittent errors, because it could take a very long time to refresh the full instrument database. However, if enough bandwidth is allocated, it is a useful system for recovering from hard failures where the downstream needs a full refresh anyway. It is also possible to combine this with one of the gap prevention schemes discussed above to help deter intermittent outages.
Advantages:
* simple to implement at both receiver and sender
Disadvantages:
* time to recovery can be large
* can be difficult to detect when an instrument should be deleted, or when an IPO should be added
== Receiver requests snapshot for all stale instruments ==
In this system, the receiver would use one of the above gap detection mechanisms to determine when an instrument may be stale. It then issues a series of upstream requests until all such instruments are no longer stale.
In order to reduce the number of requests during an outage, the instruments on the feed could be broken up into multiple sets of sequenced streams (eg, one per exchange).
Advantages:
* could lead to faster recovery when there is available bandwidth and few other customers requiring snapshots
Disadvantages:
* could be complex trying to request snapshots for instruments where the initial create message is lost
Notes:
* see discussion on [[#Snapshot recovery types]]
* see discussion on [[#Gap detection]] for possible methods of reducing the universe of suspect instruments during an outage
== Sender sends snapshots ==
This is a variant of [[#Sender tells receiver of gaps]]. However, in this scheme, the sender would detect a gap for a receiver and automatically send the snapshot when bandwidth becomes available. (It may also be possible to send only the part of the snapshot that is necessary.)
Advantages:
* Simple for receiver
Disadvantages:
* Could be complex for sender
* Isn’t useful if receiver needs to change upstream sources.
== Receiver requests gapped sequences ==
This method involves the receiver detecting when an outage occurs and making an upstream request for the sequence numbers of all messages (or message blocks) not received. The sender would then retransmit the lost messages (or blocks) to the receiver.
The receiver would then place the lost messages along with all subsequently received messages into a “reorder” buffer. The receiver can then internally “play back” the messages from the reorder buffer to rebuild the current state.
Advantages:
* Useful for clients that need to build full-tick databases and thus need the lost messages anyway.
Disadvantages:
* Thought to be complex and impractical to implement. The reorder buffer could grow to large sizes and might take significant resources to store and apply.
* The bandwidth necessary to retransmit all lost messages may exceed the bandwidth necessary to retransmit the current state of all instruments.
* Doesn’t help when a ticker-plant switch occurs.
== Sender analyzes gapped sequences ==
This scheme is a variant on [[#Receiver requests gapped sequences]]. The receiver detects when an outage occurs and makes an upstream request for the sequence numbers of all messages (or message blocks) not received.
Upon receipt of the request the sender would generate a series of snapshots for all instruments that had real-time updates present in the lost messages. It can do this by analyzing the contents of the messages that it sent but the receiver did not obtain. The sender would also have to inform the receiver when all snapshots have been sent so the receiver can transition the remaining instruments into a “not stale” state.
Advantages:
* May be useful in conjunction with gap prevention. That is, the sender could try resending the lost messages themselves if there is a good chance the receiver will receive them before timing out. If the receiver does time out, the sender could fall back to the above snapshot system.
* May be simple for receivers
Disadvantages:
* May be complicated for senders
* Doesn’t help when a ticker-plant switch occurs.
Notes:
* Either in-sequence or out-of-sequence snapshot transmissions could be used. (See [[#Snapshot recovery types]] for more info.) The receiver need not send the requests to the sender – it could send them to another (more reliable) receiver.
== Receiver could ask if update necessary ==
This is a variant of [[#Receiver requests snapshot for all stale instruments]]; however, in this system the receiver sends the sequence number of the last message that updated the instrument (SNLM) with the request. The sender can then compare its SNLM with the receiver’s and either send an “instrument okay” message or a full snapshot in response.
Advantages:
* Reduces downstream bandwidth necessary after an outage
Disadvantages:
* Doesn’t work well in cases where instruments are updating, because the receiver and sender may be at different points in the update stream
* Lost create messages – see disadvantages of [[#Receiver requests snapshot for all stale instruments]]
== Receiver could ask with hash ==
This is a variant of [[#Receiver could ask if update necessary]]; however, in this system the receiver sends a hash value of the current instrument’s database record with the request. The sender can then compare its database hash value with the receiver’s and either send an “instrument okay” message or a full snapshot in response.
Advantages:
* Works during tp switches
Disadvantages:
* Doesn’t work well in cases where instruments are updating, because the hash values are unlikely to match if sender and receiver are at a different point in the update stream.
* Rollout issues – see [[#Periodic hash checks]]
* Lost create messages – see disadvantages of [[#Receiver requests snapshot for all stale instruments]]
= Important considerations =
In many stale detection and correction systems there are several “corner cases” that can be difficult to handle. Planning for these cases in advance can simplify later development issues.
The following is a list of “corner cases” and miscellaneous ideas:
== Ticker plant switches ==
It can be difficult to handle the case where a receiver starts obtaining messages from a different ticker-plant. Our generated sequence numbers won’t be synchronized between the ticker-plants. Many of the above schemes would need to place any affected instruments into a “suspect” state should a tp switch occur.
Even if one could guarantee that no update messages were lost during a tp switch (for example by using exchange sequence numbers) there might still be additional work. The old ticker-plant might have been sending incorrect or incomplete messages — indeed, that may have been the reason for the tp switch.
== Lost IPO message ==
When the real-time feed gaps, it is possible that a message that would have created a new instrument was lost. An automatic recovery process should be capable of recovering this lost information.
There are [[schemes to detect extra and missing records]].
== Lost delete message ==
Similar to the IPO case, a real-time gap could have lost an instrument delete message. An automatic recovery process should be able to properly handle this.
A stranger, but technically possible, situation involves losing a combination of delete and create messages for the same instrument. The recovery process should be robust enough to ensure that full resynchronization is possible regardless of the real-time message update content.
There are [[schemes to detect extra and missing records]].
== Exchange update patterns ==
Some exchanges have a small number of instruments that update relatively frequently (eg, equities). Other exchanges have a large number of instruments that individually update infrequently, but have a large aggregate update (eg, US options).
Schemes for gap detection and correction should be aware of these differences and be capable of handling both gracefully.
== Orderbooks ==
Recovering orderbooks can be a difficult process. However, getting it right can result in dramatic improvements to their quality, because orderbooks are more susceptible to problems resulting from lost messages.
The key to getting orderbooks correct is finding good solutions to all of the above corner cases. Orderbooks have frequent record creates and deletes. They also have the peculiar situation where some of the orders (those at the “top”) update with very high frequency, but most other orders (not at the “top”) update very infrequently.
== Sources can legitimately idle ==
Many exchanges follow a pattern of high traffic during market hours, but little to no traffic on off hours. Ironically, the traffic near idle periods can be extremely important (eg, opens, closes, deletes, resets).
It is important to make sure a detection scheme can handle the case where a gap occurs around the time of a legitimate feed idle. It should also be able to do so in a reasonable amount of time. (An example of this is the “keep alive” in the above sequence number scheme.)
== Variations of stale ==
A record is sometimes thought to be either “clean” or “stale”. However, it is possible to graduate and/or qualify what stale means. That is, it is possible to be “suspect” or “suspect for a reason” instead of just being “stale”.
Possible per-instrument stale conditions:
; clean : the instrument is not stale
; possible gap : gap in sequence number that could affect many instruments
; definite gap : some recovery schemes can determine when an instrument has definitely lost updates
; upstream possible gap : the tp might have seen a sequence gap from the exchange
; upstream definite gap : the tp might have deduced which instruments actually gapped from the exchange
; stale due to csp startup : the csp was recently started and has an unknown cache state
; suspect due to tp switch : a ticker-plant switch occurred
; pre-clean : possible state in out-of-order snapshot recovery schemes
; downstream gap : in some schemes a sender can inform a receiver that it lost updates