MOM + threading in low latency apps

Piroz told me that trading IT job interviews tend to emphasize multi-threading and MOM, and some also test SQL. I now feel all of these are unwelcome in low latency trading.

A) MOM – The HSBC interviewer was the first to point out to me that MOM adds latency. Their goal is to get the (market) data from producer to consumer as quickly as possible, with a minimum of stops in between.

Then I found that the ICE/IDC system has no middleware between the feed parser and the order book engine (named Rebus). The 29West documentation echoes this: “Instead of implementing special messaging servers and daemons to receive and re-transmit messages, Ultra Messaging routes messages primarily with the network infrastructure at wire speed. Placing little or nothing in between the sender and receiver is an important and unique design principle of Ultra Messaging.”

B) threading – Single-threaded (ST) is generally the fastest, in theory and in practice (though my sample size is small). I feel the fastest trading engines are single-threaded, with no shared mutable state.

Multi-threading (MT) is OK if the threads don’t compete for resources like CPU, I/O or locks. Compared to ST, even most lock-free designs introduce latency, such as retries.
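To illustrate the retry cost, here is a minimal sketch (a generic counter, not from any real engine): a lock-free increment via compareAndSet spins whenever another thread wins the race, and every failed attempt is pure extra latency.

    import java.util.concurrent.atomic.AtomicLong;

    public class LockFreeCounter {
        private final AtomicLong value = new AtomicLong();

        /** Lock-free increment: retries (spins) whenever another thread
         *  updates the value first. Each failed CAS is wasted work, i.e. latency. */
        long incrementAndGetManually() {
            while (true) {
                long current = value.get();
                long next = current + 1;
                if (value.compareAndSet(current, next)) {
                    return next; // success: no other thread interfered
                }
                // CAS failed: another thread won the race; loop and retry
            }
        }
    }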

C) SQL – as stated elsewhere, flat files are much faster than a relational DB. How about an in-memory relational DB?

Rebus, the order book engine, is in-memory.


MOM advantage over RMI

A Singapore ANZ telephone interviewer (Ivan? 2011?) drilled me down: “just why is MOM more reliable than a blocking synchronous call without middleware?” I feel this is a typical “insight” question, but by no means academic or theoretical. There are theories and (more importantly) there is empirical evidence. Here I will just cover the theoretical explanations.
Capacity — MOM can hold far more pending requests than a synchronous service. An RMI or web server has a limited queue; the TCP socket can hold requests in a backlog, but that too is limited. In contrast, a MOM queue can live on disk or in the broker host’s memory: hundreds or even millions of times the capacity.
A burst of requests can bring down an RMI system even if it is lightly loaded 99% of the time.

But what if the synchronous service has enough capacity so that no caller needs to wait? I feel this is wishful thinking; for the same hardware capacity, MOM can support 10x or 100x more concurrent requests. For now, though, let’s assume capacity isn’t the issue.

Long-running — if some requests take a long time (a few seconds) to complete, then we don’t want too many tasks in flight at the same time. They compete for CPU/memory/bandwidth and reduce stability and reliability. Even logging can benefit from an async MOM design.
But again let’s assume the requests take no time to complete.
ACID — Reliable MOM always persists messages before replying with a positive ACK.
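To make the persist-before-ACK point concrete, here is a minimal JMS sketch (the queue name and JNDI entries are made up): with PERSISTENT delivery the broker writes the message to stable storage before acknowledging the send, and any backlog sits in the broker rather than overwhelming the consumer.

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.DeliveryMode;
    import javax.jms.MessageProducer;
    import javax.jms.Queue;
    import javax.jms.Session;
    import javax.naming.InitialContext;

    public class ReliableRequestSender {
        public static void main(String[] args) throws Exception {
            InitialContext jndi = new InitialContext(); // broker details come from jndi.properties
            ConnectionFactory factory = (ConnectionFactory) jndi.lookup("ConnectionFactory");
            Queue requestQueue = (Queue) jndi.lookup("queue/TRADE_REQUEST"); // hypothetical name

            Connection conn = factory.createConnection();
            Session session = conn.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(requestQueue);

            // PERSISTENT: the broker stores the message before acknowledging the send,
            // so a burst of requests queues up in the broker (possibly on disk)
            // instead of hitting a slow or unavailable consumer directly.
            producer.setDeliveryMode(DeliveryMode.PERSISTENT);
            producer.send(session.createTextMessage("<newOrder .../>"));
            conn.close();
        }
    }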

java calling win32 tibrvsend.exe

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Arrays;

// the enclosing class and calling-method names are reconstructed; only the fragments were in my notes
public class TibrvSendLauncher {
      static void sendViaTibrvsend(String xml) { // xml = the trade-request payload
            /**
             * daemon and the arg must be distinct strings
             *
             * quotes aren't necessary
             */
            String[] strArrayWithoutQuote = new String[] {
                        "C:\\tibrv\\8.3\\bin\\tibrvsend", "-daemon",
                        "localhost:7500", "-service", "9013", "-network",
                        ";239.193.224.50,239.193.224.51;239.193.224.50",
                        "GENERIC.g.TO_MTS_TRADE_REQUEST", xml };
            System.out.println(Arrays.asList(strArrayWithoutQuote));
            execAndWait(strArrayWithoutQuote);
      }

      /**
       * http://www.rgagnon.com/javadetails/java-0014.html says if you need to
       * pass arguments, it's safer to use a String array, especially if they
       * contain spaces.
       */
      static private void execAndWait(String[] command) {
            try {
                  Runtime runtime = Runtime.getRuntime();
                  Process p = runtime.exec(command);
                  // drain the child's stdout
                  BufferedReader stdInput = new BufferedReader(new InputStreamReader(
                              p.getInputStream()));
                  String s = null;
                  while ((s = stdInput.readLine()) != null) {
                        System.out.println(s);
                  }
                  // drain the child's stderr
                  BufferedReader stdError = new BufferedReader(new InputStreamReader(
                              p.getErrorStream()));
                  while ((s = stdError.readLine()) != null) {
                        System.err.println(s);
                  }
                  p.waitFor(); // advised to do this after reading the streams
            } catch (IOException e) {
                  throw new RuntimeException(e);
            } catch (InterruptedException e) {
                  throw new RuntimeException(e);
            }
      }
}

Chicago/Sing HFT IV Aug 2012 (master copy in pearl)

Q1: pros and cons of vector vs linked list?

Q1b: Given a 100-element collection, compare performance of … (iteration? Lookup?)

Q: UDP vs TCP diff?
%%A: multicast needs UDP.

Q: How would you add reliability to multicast?

Q: How would you use tibco for trade messages vs pricing messages?

Q5: In your systems, how serious was data loss in non-CM multicast?
%%A: Usually not a big problem. During peak volatile periods, message rates could surge 500%, and data loss would get worse.

Q5b: how would you address the high data loss?
%%A: test with a target message rate. Beyond the target rate, we don’t feel confident.

Q7: how is order state managed in your OMS engine?
%%A: if an order is half-processed and pending the 3rd reply from the ECN, the single thread would block.

Q7b: even if multiple orders (for the same security) are waiting in the queue?
%%A: yes. To allow multiple orders to enter the “stream” would be dangerous.

Now I think the single thread should pick up and process all new orders, keeping all pending orders in a cache. Any incoming exchange message would join the same task queue (or a separate task queue) serviced by the same single thread, as sketched below.
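A minimal sketch of that idea (class and field names are my own, not the actual OMS): one thread drains a single task queue carrying both new client orders and ECN replies, and half-processed orders wait in a cache keyed by order id instead of blocking the thread.

    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    public class SingleThreadedOms implements Runnable {
        /** A task is either a new client order or an incoming ECN/exchange message. */
        static class Task {
            final String orderId;
            final String payload;
            final boolean isEcnReply;
            Task(String orderId, String payload, boolean isEcnReply) {
                this.orderId = orderId; this.payload = payload; this.isEcnReply = isEcnReply;
            }
        }

        private final BlockingQueue<Task> taskQueue = new LinkedBlockingQueue<Task>();
        /** Half-processed orders waiting for further ECN replies, keyed by order id. */
        private final Map<String, Task> pendingOrders = new HashMap<String, Task>();

        /** Called by gateway/feed threads; the queue is the only shared structure. */
        public void submit(Task t) { taskQueue.add(t); }

        public void run() { // the single processing thread
            try {
                while (true) {
                    Task t = taskQueue.take();
                    if (!t.isEcnReply) {
                        pendingOrders.put(t.orderId, t); // cache it, don't block the thread
                        // ... send the order out to the ECN here ...
                    } else {
                        Task pending = pendingOrders.remove(t.orderId);
                        if (pending != null) {
                            // ... advance the order's state machine using the reply ...
                        }
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // shut down the loop
            }
        }
    }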

3 main infrastructure teams
* exchange connectivity – order submission
* exchange connectivity – pricing feed. I think this is incoming-only, probably higher volume. Probably similar to Zhen Hai’s role.
* risk infrastructure – no VaR mathematics.

request/reply in one MOM transaction

If, in one transaction, you send a request and then read the reply off the queue/topic, I think you will get stuck. With the commit pending, the send never reaches the broker, so you, the requester, deadlock with yourself forever.
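Here is a minimal JMS sketch of the trap (queue names made up): because the session is transacted, the send() is invisible to the broker until commit(), so the blocking receive() never returns and the commit is never reached.

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.MessageConsumer;
    import javax.jms.MessageProducer;
    import javax.jms.Queue;
    import javax.jms.Session;
    import javax.naming.InitialContext;

    public class DeadlockedRequestReply {
        public static void main(String[] args) throws Exception {
            InitialContext jndi = new InitialContext();
            ConnectionFactory factory = (ConnectionFactory) jndi.lookup("ConnectionFactory");
            Queue requestQ = (Queue) jndi.lookup("queue/REQUEST"); // hypothetical
            Queue replyQ = (Queue) jndi.lookup("queue/REPLY");     // hypothetical

            Connection conn = factory.createConnection();
            conn.start();
            Session session = conn.createSession(true, Session.SESSION_TRANSACTED); // transacted!

            MessageProducer producer = session.createProducer(requestQ);
            producer.send(session.createTextMessage("<request/>")); // buffered, NOT yet on the broker

            MessageConsumer consumer = session.createConsumer(replyQ);
            // The service never sees the request (it is held back until commit),
            // so this blocks forever and the commit below is never reached.
            consumer.receive();

            session.commit(); // unreachable: we have deadlocked with ourselves
        }
    }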

An unrelated design of transactional request/reply is “receive, then send the 2nd request” within a transaction. This is obviously for a different requirement, but it is known to be popular. See the O’Reilly book [[JMS]].

tibrv supports no rollback – non-transactional transport

Non-transactional “transports” such as TIBCO Rendezvous Certified Messaging and plain sockets do not allow message rollback, so delivery is not guaranteed. Non-transactional transports can be problematic because the operation is committed to the transport immediately after a get or put occurs, rather than after you finish further processing and issue a Commit command.

JMS does support transactional “transport”, so you can “peek” at a message before issuing a Commit command to physically remove it from the queue.

http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.ebd.eai.help.src/configuring_a_rendevous_reliable_session_and_transport.htm
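A minimal sketch of the JMS “peek before commit” behaviour described above (destination name made up): inside a transacted session, a rolled-back message stays on the queue and is redelivered, which is exactly what a non-transactional transport like tibrv CM cannot offer.

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.Message;
    import javax.jms.MessageConsumer;
    import javax.jms.Queue;
    import javax.jms.Session;
    import javax.naming.InitialContext;

    public class TransactedConsumer {
        public static void main(String[] args) throws Exception {
            InitialContext jndi = new InitialContext();
            ConnectionFactory factory = (ConnectionFactory) jndi.lookup("ConnectionFactory");
            Queue queue = (Queue) jndi.lookup("queue/TRADE_REQUEST"); // hypothetical

            Connection conn = factory.createConnection();
            conn.start();
            Session session = conn.createSession(true, Session.SESSION_TRANSACTED);
            MessageConsumer consumer = session.createConsumer(queue);

            Message m = consumer.receive(); // "peek": the message is ours, but not yet removed
            try {
                process(m);                 // hypothetical business logic
                session.commit();           // now the broker physically removes the message
            } catch (Exception e) {
                session.rollback();         // message stays on the queue and is redelivered
            }
        }

        private static void process(Message m) { /* ... */ }
    }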