Q1: pros and cons of vector vs linked list?
Q1b: Given a 100-element collection, compare performance of … (iteration? Lookup?)
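A quick Java sketch of the classic answer, using ArrayList and LinkedList as stand-ins for vector and linked list: indexed lookup is O(1) on the array-backed list but O(n) node traversal on the linked list, while iteration is O(n) on both (the contiguous array wins on cache locality).

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class ListCompare {
    public static void main(String[] args) {
        List<Integer> arrayList = new ArrayList<>();
        List<Integer> linkedList = new LinkedList<>();
        for (int i = 0; i < 100; i++) { arrayList.add(i); linkedList.add(i); }

        // Index lookup: O(1) array offset vs O(n) node walk.
        System.out.println(arrayList.get(50));   // direct offset into backing array
        System.out.println(linkedList.get(50));  // walks ~50 nodes to reach index 50

        // Iteration: both O(n), but the contiguous array is cache-friendly.
        long sum = 0;
        for (int v : arrayList) sum += v;
        System.out.println(sum); // sum of 0..99 = 4950
    }
}
```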
Q: UDP vs TCP diff?
%%A: multicast needs UDP.
Q: How would you add reliability to multicast?
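One standard answer is sequence numbers plus NAKs: the sender numbers every message, each receiver detects gaps and asks only for the missing ones, which is roughly how NAK-based reliable multicast schemes (e.g. PGM) work. A hypothetical gap-detection sketch, not any particular product's protocol:

```java
import java.util.ArrayList;
import java.util.List;

// Receiver-side gap detection for NAK-based reliable multicast (hypothetical
// sketch): track the next expected sequence number; anything skipped gets a
// NAK so the sender can retransmit just those messages.
public class GapDetector {
    private long nextExpected = 1;
    private final List<Long> naks = new ArrayList<>();

    // Returns true if the message is new; records NAKs for any gap before it.
    public boolean onMessage(long seq) {
        if (seq < nextExpected) return false;        // duplicate, drop it
        for (long missing = nextExpected; missing < seq; missing++)
            naks.add(missing);                        // would send a NAK upstream
        nextExpected = seq + 1;
        return true;
    }

    public List<Long> pendingNaks() { return naks; }

    public static void main(String[] args) {
        GapDetector d = new GapDetector();
        d.onMessage(1); d.onMessage(2); d.onMessage(5); // 3 and 4 were lost
        System.out.println(d.pendingNaks()); // [3, 4]
    }
}
```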
Q: How would you use tibco for trade messages vs pricing messages?
Q5: In your systems, how serious was data loss in non-CM multicast?
%%A: Usually not a big problem, but during peak volatile periods messaging rates could surge 500%, and data loss would worsen accordingly.
Q5b: how would you address the high data loss?
%%A: test with a target message rate. Beyond the target rate, we don’t feel confident.
Q7: how is order state managed in your OMS engine?
%%A: if an order is half-processed and pending the 3rd reply from the ECN, the single thread would block.
Q7b: even if multiple orders (for the same security) are waiting in the queue?
%%A: yes. To allow multiple orders to enter the “stream” would be dangerous.
Now I think the single thread should pick up and process all new orders while keeping all pending orders in a cache. Any incoming exchange messages would join the same task queue (or a separate task queue) serviced by the same single thread.
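A minimal sketch of that design (all names hypothetical): one worker thread drains a single task queue, pending orders live in a plain cache owned by that thread, and ECN replies are just another task on the same queue, so order state needs no locking and new orders are never blocked behind a half-processed one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Single-threaded OMS sketch: every event (new order, ECN reply) becomes a
// task on one queue, processed by one thread, so the pending-order cache is
// only ever touched by that thread.
public class SingleThreadOms {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Map<String, String> pendingOrders = new HashMap<>(); // id -> state

    public void onNewOrder(String orderId) {
        worker.execute(() -> pendingOrders.put(orderId, "SENT_TO_ECN"));
    }

    public void onEcnReply(String orderId, String reply) {
        worker.execute(() -> pendingOrders.put(orderId, reply));
    }

    public static void main(String[] args) throws InterruptedException {
        SingleThreadOms oms = new SingleThreadOms();
        oms.onNewOrder("ORD-1");
        oms.onNewOrder("ORD-2");             // not blocked behind ORD-1's pending reply
        oms.onEcnReply("ORD-1", "FILLED");   // reply joins the same queue
        oms.worker.shutdown();
        oms.worker.awaitTermination(1, TimeUnit.SECONDS);
        System.out.println(oms.pendingOrders.get("ORD-1")); // FILLED
        System.out.println(oms.pendingOrders.get("ORD-2")); // SENT_TO_ECN
    }
}
```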
3 main infrastructure teams
* exchange connectivity – order submission
* exchange connectivity – pricing feed. I think this is incoming-only, probably higher volume. Probably similar to Zhen Hai’s role.
* risk infrastructure – no VaR mathematics.
According to http://solacesystems.com/news/fastest-jms-broker/, the Solace JMS broker (Solace Message Router) supports 100,000 messages per second in persistent mode and 10 million per second non-persistent. A more detailed article, http://solacesystems.com/solutions/messaging-middleware/jms/, cites 11 million 100-byte non-persistent messages per second.
A major sell-side’s messaging platform chief said his most important consideration was the deviation of peak-to-average latency and outliers. A small amount of deviation and (good) predictability were key. They chose Solace.
In all cases (Solace, Tibco, Tervela), hardware-based appliances *promise* at least a 10-fold performance boost compared to software solutions. Latency within the appliance is predictably low, but end-to-end latency is not: because of the separate /devices/ and the network hops between them, the best-case latency is in the tens of microseconds. The next logical step is to integrate the components into a single system to avoid all the network latency and intermediate memory copies (including serializations). Solace has demonstrated sub-microsecond latencies by adding support for inter-process communications (IPC) via shared memory. Developers will be able to fold the ticker feed function, the messaging platform, and the algorithmic engine into the same “application”, and use shared-memory IPC as the data transport (though I feel a single-application design needs no IPC).
For best results you want to keep each “application” on the same multi-core processor, and pin individual application components (like the feed handler and algo engine) to specific cores. That way, application data can be shared between the cores in the Level 2 cache.
 Each “application” is potentially a multi-process application with multiple address spaces, and may need IPC.
Benchmark — Solace ran tests with a million 100-byte messages per second, achieving an average latency of less than 700 nanoseconds using a single Intel processor. As of 2009, OPRA topped out at about a million messages per second. OPRA hit 869,109 mps (msg/sec) in Apr 2009.
Solace vs RV appliance — Solace already offers its own appliance, but the Tibco appliance runs different messaging software: Rendezvous (implemented in ASIC+FPGA), a clear differentiator between the Tibco and Solace appliances.
Solace 3260 Message Router is the product chosen by most Wall St. customers.
http://kirkwylie.blogspot.com/2008/11/meeting-with-solace-systems-hardware.html provides good tech insights.
Update – Silicon Valley also needs elites – a small number of expert developers.
Financial (esp. trading) IT feels like an elite sector – small number of specialists
– with multi-skilled track record
– familiar with rare, specialized tools — JNI, KDB, FIX, tibrv, sockets, sybase
– Also, many mainstream tools used in finance IT are used to an advanced level — threading, memory, SQL tuning, large complex SQL
If you compare the track record and skills of a finance IT guy with a “mainstream” tech MNC consultant, the finance guy probably appears too specialized.
That’s one psychological resistance facing a strong techie contemplating a move into finance. It appears risky to move from mainstream into a specialized field.
Culprit: all threads in the pool are blocked in wait(), lock() or …
Culprit: bounded queue is full. Sometimes the thread that adds task to the queue is blocked while doing that.
Culprit: in some systems there’s a single task-dispatcher thread, like the Swing EDT. That thread can sometimes get stuck.
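The bounded-queue culprit is easy to demonstrate: with `ArrayBlockingQueue`, `put()` blocks the producer once the queue is full, whereas `offer()` fails fast (or with a timeout), which makes the stall observable instead of silent.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Minimal illustration of the bounded-queue culprit: once the queue is full,
// put() would block the submitting thread indefinitely; offer() returns
// false instead, so the stall can be detected and logged.
public class BoundedQueueStall {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> tasks = new ArrayBlockingQueue<>(2);
        tasks.put("task-1");
        tasks.put("task-2");                       // queue is now full
        boolean accepted = tasks.offer("task-3");  // fails fast, no blocking
        System.out.println(accepted);              // false
        // tasks.put("task-3") here would block until a consumer drains one.
    }
}
```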
Suggestion: dynamically turn on verbose logging in the messaging module within the engine, so it always logs something to indicate activity. It’s like the flashing LED on your router. You can turn on such logging via JMX.
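A sketch of that heartbeat idea (JMX wiring omitted, all names hypothetical): a volatile flag gates a per-message “I’m alive” log line; in production the flag would be exposed as a JMX MBean attribute so an operator can flip it at runtime.

```java
// Heartbeat logging for the messaging module, like the router's flashing
// LED: silent by default, and an operator can flip the switch at runtime
// (e.g. via a JMX attribute setter) to confirm the engine is still consuming.
public class HeartbeatLogger {
    // Toggled at runtime; volatile so the change is visible to all threads.
    static volatile boolean verbose = false;

    static void onMessage(String subject) {
        if (verbose) System.out.println("alive: processed message on " + subject);
    }

    public static void main(String[] args) {
        onMessage("PRICES.IBM");      // verbose off: silent
        verbose = true;               // operator flips the switch
        onMessage("PRICES.IBM");      // now logs activity
    }
}
```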
Suggestion: for tibrv, you can easily start a Windows tibrv listener on the same subject as the listener inside the trading engine. This can reveal activity on the subject.
swing trader station + OMS on the server-side + smart order router over low-latency connectivity layer
* gemfire distributed cache. why not DB? latency too high.
* tibrv is the primary MOM
* between internal systems — FIX based protocol over tibrv, just like Lehman equities. Compare to protobuf object serialization
* there’s more math in the risk system, but the most demanding latency requirements are on the equities front-office systems.
jms message selector is executed on the broker.
rvd executes the same duty — “Filter subject-addressed messages.”
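Conceptually, a JMS selector such as `symbol = 'IBM' AND qty > 100` is a predicate the broker evaluates against message header/property values, so non-matching messages never reach the consumer. This standalone stand-in (hypothetical, not the JMS SQL-92 parser) mimics that check:

```java
import java.util.Map;

// Broker-side filtering in miniature: evaluate a selector-like predicate
// against a message's property map before delivering to the consumer.
public class SelectorDemo {
    // Stand-in for the selector "symbol = 'IBM' AND qty > 100".
    static boolean matches(Map<String, Object> props) {
        return "IBM".equals(props.get("symbol"))
                && ((Integer) props.get("qty")) > 100;
    }

    public static void main(String[] args) {
        System.out.println(matches(Map.of("symbol", "IBM", "qty", 500)));  // delivered
        System.out.println(matches(Map.of("symbol", "MSFT", "qty", 500))); // filtered out
    }
}
```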