Merrill S’pore: fastest stock broadcast

Updates — RV or multicast topic; msg selector

I think this is a typical wall-street interview question for a senior role. System requirement as remembered by my friend the interviewee: ML needs a new relay system to receive real-time stock updates from the stock exachange such as SGX. Each ML client, one of many thousand[1], will each install a new client-software [3] to receive updates on the stocks [2] she is interested. Some clients use algorithmic trading system and need the fastest feed.

[1] Not clear about the order of magnitude. Let’s target 10,000
[2] Not clear how many stocks per client on average. Let’s target 100.
[3] Maintence and customer support for a custom client-software is nightmare and perhaps impractical. Practically, the client-software has to be extremely mature such as browsers or email clients.

Q: database locking?
A: I don’t think so. only concurrent reading. No write-contention.

Key#1 to this capacity planning is how to identify bottlenecks. Bandwidth might be a more severe bottleneck than other bottlenecks described below.

Key#2 — 2 separate architectures for algorithmic clients and traditional clients. Each architecture would meet a different minimum latency standard, perhaps a few seconds for traditional and sub-second for algorithmic.

Solution 0: Whatever broadcasting system SGX uses. In an idea world, no budget constraint. Highest capacity desired.

Solution 2: no MQ? No asynchronous transmission? As soon as an update is received from SGX, the relay calls each client directly. Server-push.

Solution 1: MQ — the standard solution in my humble opinion.

Solution 1A: topics. One topic per stock. If 2000 clients want IBM updates, they all subscribe to this topic.

Q: client-pull? I think this is the bottleneck.

Q: Would Client-pull introduce additional delays?

Solution 1B: queues. one queue for each client each stock.

If 2000 clients want IBM updates, Relay need to make that many copies of an update and send to that many queues — duplication of effort. I think this is the bottleneck. Not absolutely sure if this affects relay system performance. Massively parallel processing is required, with thousands of native CPU threads (not java green threads)

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s