private bank trade/order/quote/execution flow

Remember — most non-exchange-traded products are voice-executed. Only a few very dominant, high-volume products are executed electronically. Perhaps 1% of the products account for 99% of the trades by trade count. By dollar amount, interest-rate swaps (IRS) alone are probably 50% of all trades, and IRS is probably voice-executed, given the large notional amounts.

The products traded between the bank and its clients (not interbank) are often customized private offerings, unavailable from any other bank. (Remember the BofA puttable floats.)

The RM / PWA would get live quotes from the dealer and relay them to a client. Sometimes the dealer publishes quotes on an internal network, but RFQ is more common. At any point while a quote is live, it can be executed between RM and client. The RM would then book the new position into the bank's database. As soon as the trade is executed (before the booking), the bank has a position, but the dealer knows about it only after the booking, and would hedge quickly.

The dealer initially only responds to RFQs. The trade itself is usually executed without her knowledge, just like an ECN flow.

I think on BofA's wealth-management platform, many non-equity products (muni bonds are largely sold to retail clients) trade the same way. The dealer publishes quotes on an intranet website. The RM negotiates with the client and executes over the phone. During trade booking, the price and quantity are validated. Occasionally (in a volatile market) a trade fails to go through, and the RM must ask the client to retry, perhaps after a requote. Fundamentally, the dealer gets a last look, unlike in the exchange flow.

I believe structured products (traded between bank and clients) are usually not fast-moving or volatile, so there are fewer requotes. However, when the dealer hedges the position, I think she often uses vanilla instruments.

Terminology warning — some places use “trade” loosely to mean many things, including orders. In exchange flow, I think “order” is the precise word.

credit e-trading – basics

2 “external” venues —
$ (ECN) interdealer electronic brokers — Bloomberg, MarketAxess, Tradeweb, BondDesk, NYSE, TMC. These are like the exchange-connectivity interface.
$ Retail-facing Distributors – Fidelity, Charles Schwab etc. These are often the retail-oriented portfolio/wealth managers. These portfolio managers are like the “client connectivity” interface.
* Each external connectivity above can have a customized FIX protocol.

Volume — 300 trades/day, 1,000 RFQs (client inquiries)/day, but 4,000 price updates per SECOND at peak! These are probably incoming quotes, and they update an internal cache within the bank. These “incoming” updates must be synchronized with other updates initiated from “within” the bank — probably the trickiest technical challenge in this platform.
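One way the two update streams could be reconciled is a per-instrument cache entry carrying the last applied external sequence number, with internal re-marks serialized under the same lock. This is only a sketch under my own assumptions (all names are illustrative; a real system would shard the lock by symbol):

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical quote cache absorbing both external ECN updates and
// internal (dealer-initiated) updates. The per-entry sequence number
// lets us drop stale external updates that arrive late.
struct QuoteEntry {
    double bid = 0, ask = 0;
    long   seq = 0;          // last applied external sequence number
};

class QuoteCache {
    std::unordered_map<std::string, QuoteEntry> entries_;
    std::mutex mtx_;         // one lock for brevity; real systems shard by symbol
public:
    // External feed update: applied only if newer than what we hold.
    bool applyExternal(const std::string& sym, long seq, double bid, double ask) {
        std::lock_guard<std::mutex> g(mtx_);
        QuoteEntry& e = entries_[sym];
        if (seq <= e.seq) return false;    // stale or duplicate — drop it
        e.seq = seq; e.bid = bid; e.ask = ask;
        return true;
    }
    // Internal update (e.g. the dealer re-marks a quote) takes the same
    // lock, so it serializes cleanly against the external stream.
    void applyInternal(const std::string& sym, double bid, double ask) {
        std::lock_guard<std::mutex> g(mtx_);
        QuoteEntry& e = entries_[sym];
        e.bid = bid; e.ask = ask;
    }
    QuoteEntry get(const std::string& sym) {
        std::lock_guard<std::mutex> g(mtx_);
        return entries_[sym];
    }
};
```

The sequence-number check is what keeps a late-arriving external quote from clobbering a fresher one; the shared lock is what keeps internal and external writers from interleaving half-updates.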

Most trades are institutional (often from wealth management firms) but there are retail trades as small as $5000/trade.

The treasury work-up process also takes place on some of these ECNs.

Most important products — corporate bonds, CDS, CDX. 

(Based on a major credit trading sell-side.)

simplified order book design doc – jump

It’s tempting to use a virtual function processMessage() to process the various order types (A, M, X …) and trade types, but virtual functions add runtime overhead. Template specialization is a more efficient design, but given the limited timeframe I implemented an efficient and simple alternative.
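The “simple alternative” isn’t spelled out above; one plausible sketch is a plain switch on the message-type code, which compilers typically turn into a jump table — no vtable lookup per message. All names here are illustrative, not the original code:

```cpp
#include <cassert>

// Dispatch on the one-byte message type without virtual functions.
enum class Action { Added, Modified, Removed, Traded, Ignored };

struct Message {
    char type;   // 'A' = add, 'M' = modify, 'X' = cancel, 'T' = trade
};

inline Action processMessage(const Message& m) {
    switch (m.type) {            // resolved via a jump table, no virtual call
        case 'A': return Action::Added;
        case 'M': return Action::Modified;
        case 'X': return Action::Removed;
        case 'T': return Action::Traded;
        default:  return Action::Ignored;
    }
}
```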

Assumption: M messages can only change quantity. A price change is not allowed — the sender would need to cancel and submit a new order. The B/S and price fields of the order should not change, but validation is omitted in this version.

Assumption: T messages and the corresponding M and X messages (and the initiating A message) are assumed consistent and complete. Validation is technically possible but omitted; a validation failure would indicate lost messages.

The cornerstone of the design is the order book’s data structure — an RB-tree of linked lists. Add is O(logN) due to the tree insert. Modify is O(1) thanks to the lookup array. Remove is also O(1), eliminating the tree search; this is achieved with the lookup array, and by saving an iterator inside the order object.
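The structure could be sketched like this (field and class names are my own, not the original code): std::map plays the RB-tree, with one node per price level; each node holds a FIFO list of orders at that price; the lookup vector is indexed by orderID; and every order remembers its own list iterator, so Modify and Remove skip the tree search entirely.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <map>
#include <vector>

struct Order;
using Level = std::list<Order*>;     // FIFO queue at one price level

struct Order {
    long   id;
    double price;
    int    qty;
    Level::iterator pos;             // where this order sits in its level
};

class OrderBook {
    std::map<double, Level> levels_; // the RB-tree: O(logN) per new price
    std::vector<Order*>     lookup_; // orderID -> Order*, O(1)
public:
    explicit OrderBook(std::size_t maxId) : lookup_(maxId, nullptr) {}

    void add(long id, double price, int qty) {           // O(logN)
        Order* o = new Order{id, price, qty, {}};
        Level& lvl = levels_[price];
        lvl.push_back(o);
        o->pos = std::prev(lvl.end());
        lookup_[id] = o;
    }
    void modify(long id, int newQty) {                   // O(1)
        lookup_[id]->qty = newQty;   // price change not allowed (see above)
    }
    void remove(long id) {                               // O(1)*
        Order* o = lookup_[id];
        Level& lvl = levels_[o->price];
        lvl.erase(o->pos);           // no tree search, thanks to the iterator
        if (lvl.empty()) levels_.erase(o->price);  // *logN only when a level empties
        lookup_[id] = nullptr;
        delete o;
    }
    int qty(long id) const { return lookup_[id]->qty; }
};
```

The saved iterator is safe because std::list iterators stay valid as long as the element itself is not erased.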

There are 2 containers of pointers — the map of lists and the lookup array. It would be better to use containers of smart pointers to ease memory management, but the (pre-C++11) STL provides no smart pointer usable in containers — auto_ptr doesn’t qualify.

All equality tests on doubles are done with “==”. A tolerance-based comparison should be used if time permits.
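What that tolerance could look like — the epsilon here is an arbitrary illustration; a real book would derive it from the instrument’s tick size:

```cpp
#include <cassert>
#include <cmath>

// Compare two prices within an epsilon instead of with ==.
inline bool priceEqual(double a, double b, double eps = 1e-9) {
    return std::fabs(a - b) < eps;
}
```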

Here’s the documentation from the lookup-array class:

/* This class encapsulates an array of pointers.
 Assumption 1 — An exchange is likely to generate auto-incrementing orderIDs. Here’s my reasoning. OrderIDs are unique, as stated in the question. If orderID generation isn’t continuous, then the generator has 2 choices about the inevitable gap between 2 adjacent ID numbers: it can leave the gap forever wasted, or somehow “go back” into a gap and pick a number therein as a new orderID. To do the latter it must keep track of which numbers are already assigned — rather inefficient. There are proven in-memory algorithms to generate auto-incrementing identity numbers, and I assume an exchange would use them. Auto-increment numbers make a good candidate for an array index, but what about the total number range?

 Assumption 2 — each day the number range has an upper limit. The exchange must agree with exchange members on the format of the orderID. It’s likely to be 32 bits, 64 bits etc, and won’t be a million bits.

 Question 1: Can the exchange use OrderID 98761234 for both IBM and MSFT during one trading day? I don’t know, and I feel it doesn’t matter. Here’s the reason.

 Case 1: suppose the exchange uses an *independent* auto-increment generator for each stock, so the IBM and MSFT generators can both generate 98761234. My design would use one array for IBM and one for MSFT. For basket orders, additional generator instances might be needed.

 Case 2: suppose the exchange uses an independent auto-increment generator for each stock, but each stock gets a non-overlapping number range, so 98761234 falls into exactly one stock’s range (say IBM’s). My design would need to know the number range so as to convert orderID to array index and conserve memory.

 Case 3: suppose the exchange uses a singleton auto-increment generator across all stocks (a bottleneck inside the exchange). My design would use one gigantic array. Given Assumption 1, the numbers would be quasi-continuous rather than sparse (sparse meaning well below 50% of the range gets assigned). Suppose the range is S+1, S+2 … S+N; then my array would be allocated N elements (where S is the orderIDBase). There’s a limit on N in reality — every system is designed for a finite N; no system can handle 10^9999 (a one followed by 9,999 zeros) orders in a day. Each array element is a pointer, so on a 64-bit machine N elements take 64N bits or 8N bytes. With 640GB of memory, N can be 80 billion but not higher. To scale out horizontally, we would hope Case 1 or 2 applies.

 Therefore the answer to Question 1 shows an array of pointers is feasible for lookup by orderID. In a real system a hash table is likely more time/space-efficient, but in this exercise only the STL is available, which provides no hash table. A tree-based map has O(logN) lookup — too slow. My choice is between a built-in array and a non-expanding vector; I chose the array for simplicity. */
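Under Case 2 or 3, the orderID-to-index conversion the comment describes could be as simple as subtracting the range base S. A minimal sketch, with illustrative names (the stored pointer is left untyped here for brevity):

```cpp
#include <cassert>
#include <vector>

// Lookup array for a quasi-continuous orderID range S+1 .. S+N.
class LookupArray {
    long base_;                 // S: one below the first orderID of today's range
    std::vector<void*> slots_;  // N pointer-sized slots, all nullptr initially
public:
    LookupArray(long base, std::size_t n) : base_(base), slots_(n, nullptr) {}
    void  put(long orderId, void* p) { slots_[orderId - base_ - 1] = p; }
    void* get(long orderId) const    { return slots_[orderId - base_ - 1]; }
};
```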

hide client names and addresses

I proposed a system to a buy-side asset-manager shop. I said client names don't need to be stored in the central database. Maybe the sales force and investment advisors need the names, but they don't need to save those in a shared central database for everyone else to see.

Each client is identified by account id, which might include an initial.

When a client logs in to the client-facing website, they will see not their name but relatively public information such as a self-chosen nickname, investment objectives, account balance, and last login time.

Client postal address is needed only for those who opt for paper statement. And only one system needs to access it — the statement printing shop.

A veteran in a similar system told me this is feasible and proposed an enhancement — encrypt sensitive client information.

What are some of the inconveniences in practice?

strictly 1 thread/symbol – fastest market data distributor

Quotes (and other market data) sent downstream should be in FIFO sequence, not out-of-sequence (OOS).

In FX and cash equities (e.g. EURUSD), I know many major market-data aggregators design the core of the feed engine to be single-threaded — each symbol is confined to a single “owning” thread. I was told the main reason is to avoid synchronization between 2 load-sharing threads: 2 threads improve throughput but can introduce OOS risk.
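A minimal sketch of the confinement rule, assuming symbols are partitioned by hash (the worker-pool plumbing is omitted; only the routing rule is shown):

```cpp
#include <cassert>
#include <functional>
#include <string>

// Every update for a symbol is routed to the same worker thread, so that
// symbol's updates are processed strictly in arrival order with no
// cross-thread synchronization.
inline std::size_t ownerThread(const std::string& symbol, std::size_t nThreads) {
    return std::hash<std::string>{}(symbol) % nThreads;
}
```

The key property is determinism: the same symbol always maps to the same thread, so FIFO order within a symbol is preserved for free.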

You can think of a typical stringent client as a buy-side high-frequency trader (HFT). Such a client assumes a later-delivered quote was physically “generated” later. If 2 quotes arrive on the same name, one after the other, the later one always overwrites the earlier one – conflation.

An HFT client can react in microseconds, from receiving a quote (data entering the client’s network) to placing orders (data leaving the client’s network). For such a fast client, a little delay is bad, but not as bad as OOS. I feel OOS delivery makes the data feed unreliable.

I was told many automated algo-trading engines (including automatic bond offer/bid pricers) send fake orders just to test the market. The engine sends a test order and waits for the response in the data feed. An OOS delivery would confuse this “observer”.

An HFT could be trend-sensitive: it monitors the rise and fall of quote sizes on a given name (say SPX), and it assumes the market data are delivered in sequence.

common technical challenges in buy-side software systems

10–30% of Wall St IT jobs are on the buy-side — funds, portfolio and asset management.

* Core challenge – the sub-ledger. In one system there were more than 100,000 client accounts in the sub-ledger. Each is managed by a professional financial advisor. I believe each account on average could have hundreds of positions. (Thousands would be overwhelming, I feel.) Since clients (and FAs) need instant access to their “portfolio”, there are a lot of positions to keep up to date. The sub-ledger is the core of the entire IT infrastructure.

** Number of trades per day is a second challenge. These aren’t high-frequency traders, but many Asian clients do nothing but brokerage (equity) trades. Per account there aren’t many, but all the accounts combined mean a lot of overnight processing. These must be added to the sub-ledger by the next day.

* quarterly performance reporting
** per-fund, per-account, per-position
** YTD, quarterly, annual etc

I guess there is also monthly performance reporting requirement in some cases.

* asset allocation and periodic portfolio re-balancing — for each client. These are key differentiators. Investors get a large “menu” of funds and products; for comparison, they may want performance metrics.

– VaR or realtime risk? Probably important to large funds
– pricing? Probably important to large funds

– Swing/WPF not really required. Web is adequate.
– trade booking? not a challenge

OPRA feed processing – load sharing

On the front line, one (or more) socket receives the raw feed. The number of sockets is dictated by the data-provider system. The socket somehow feeds into Tibco (possibly on many boxes). Optionally, we could normalize or enrich the raw data.

Tibco then multicasts messages on (up to) 1 million channels, i.e. hierarchical subjects. For each underlier there are potentially hundreds of option tickers, and each option is traded on (up to) 7 exchanges (CME is #1). Therefore there’s one Tibco subject per ticker/exchange, and these add up to 1 million hierarchical subjects.
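The subject scheme might look like the sketch below. The exact subject format and the static consumer-assignment rule are my assumptions, not documented details of the actual system:

```cpp
#include <cassert>
#include <functional>
#include <string>

// One hierarchical subject per option ticker per exchange (format assumed).
inline std::string subjectFor(const std::string& underlier,
                              const std::string& optionTicker,
                              const std::string& exchange) {
    return "OPRA." + underlier + "." + optionTicker + "." + exchange;
}

// Statically assign each subject to one of the consumer instances,
// so subscriptions partition the 1M subjects without coordination.
inline std::size_t instanceFor(const std::string& subject, std::size_t nInstances) {
    return std::hash<std::string>{}(subject) % nInstances;
}
```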

Now, that’s the firm-wide market-data distributor — the so-called producer. My system, one of the consumers, actually subscribes to most of these subjects. This entails a dual challenge:

* we can’t run one thread per subject — too many threads.
* can we subscribe to all the subjects in one JVM? I guess it’s possible if one machine has enough kernel threads. In reality, we use up to 500 machines to cope with this volume of subjects.

We ended up grouping thousands of subjects into each JVM instance. All of these instances are kept busy by the OPRA messages.

Note it’s not acceptable to skip any single message in our OPRA feed because the feed is incremental and cumulative.

database: limited usage in real time trading

“Database” and “Real-time trading” don’t rhyme!

See “trading systems use lots of MOM and distributed cache”.

In comparison, a DB offers perhaps the most effective logging/audit. I feel every update sent to MOM or the cache should ideally be asynchronously persisted in a DB. I would probably build an optimized DB-persistence service to be used across the board.

Just about any update in the cache needs to be persisted, because the cache is volatile memory. A flat file is worth considering as an alternative.

real time high volume FX quote processing #letter

Horizontal scale-out (distributing across different boxes) is the design of choice when we are CPU-bound — for instance, if we get hundreds of updates a second and each update requires repricing a large number of objects.

Ideally, you want the CPU saturated: by doubling the hardware threads, you want throughput to double. Our pricing engine didn’t have that much CPU load, so we didn’t scale out beyond a few boxes.

The complication of scale-out is that the data required to reprice one object may reside on different boxes. People try many solutions — memory virtualization (non-trivial synchronization cost plus network latency), message passing, RMI — but I personally prefer the one-big-machine approach: throw in 16 (or 128) processors, each with say 4 to 8 hardware threads, run 64-bit, and throw in 32GB of RAM. No network latency; no RMI/messaging latency. But this hardware is rather costly. Eight smaller machines with comparable total CPU power would cost much less, so most big banks prefer them — so-called grid computing.

According to my observations, most practitioners in your type of situations eventually opt for scale-out.

It sounds like after routing a message, your “worker” process has all it needs in its local memory. That would be an ideal use case for parallel processing.

I don’t know if FX spot real-time pricing is that ideal. Specifically, suppose a worker process is *dedicated* to updating and publishing the EUR/USD spot quote. I know you would listen to EUR/USD quotes from all liquidity providers, but do you also need to watch USD/JPY and EUR/JPY?

15,000 quotes repriced within a minute

One of my bond pricing engines could price about 15,000 offers/bids in about a minute. Four slow lanes to avoid:
1) database persistence is done asynchronously via GemFire write-behind.

2) the offers/bids we produce must be verified by another system, which officially owns the OutgoingQuote table. The verification takes a long time. We avoid that overhead by pricing all the offers/bids in GemFire, then sending them out in batches and waiting for the result. The one-minute figure is without the verification.

3) all reference data is preloaded into GemFire, so there is no further disk I/O.

4) minimal serialization overhead, since most of the objects needed are in local JVM.

In contrast, a more complex engine — the mark-to-market engine — needs a few minutes to price 15,000 positions. That engine doesn't need real-time performance.