Say you have market data feeds from Reuters, Wombat, Bloomberg, eSpeed, BrokerTec, ION… The data covers some 4000 underliers and about half a million derivative instruments on these underliers. For each instrument, there can be new bid/offer/trade ticks at any millisecond mark. The volume is comparable to an options data feed such as OPRA.
Say you have institutional clients (in addition to in-house systems) who register to receive IBM ticks when a combination of conditions occurs, like “when the bid/ask spread reaches X, and when some other pricing pattern occurs”. There are other conditions like “send me the 11am IBM snapshot best bid/ask”, but let’s put those aside. For each instrument there are probably a few combinations of conditions, but each client could have a different target value for a condition: 2% for you, 2.5% for me. Assuming just 10 combinations per instrument across the half a million instruments, we have 5 million combinations to monitor. To serve clients, we must continuously evaluate these conditions. CEP engines and Gemfire continuous queries provide this functionality.
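To make the per-client target values concrete, here is a minimal sketch of the spread condition only. It assumes "reaches X" means the relative spread (spread divided by mid price) is at or above the client's threshold, and every name in it (`SpreadWatch`, `register`, `onQuote`) is hypothetical. Registrations are indexed by symbol, so a tick only re-evaluates the conditions attached to its own instrument rather than all 5 million.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the names and the spread formula are illustrative assumptions.
final class SpreadWatch {
    /** One client's registration for one symbol. */
    private static final class Subscription {
        final String clientId;
        final double thresholdPct;
        Subscription(String clientId, double thresholdPct) {
            this.clientId = clientId;
            this.thresholdPct = thresholdPct;
        }
    }

    // symbol -> registrations on that symbol
    private final Map<String, List<Subscription>> bySymbol = new HashMap<>();

    void register(String clientId, String symbol, double thresholdPct) {
        bySymbol.computeIfAbsent(symbol, k -> new ArrayList<>())
                .add(new Subscription(clientId, thresholdPct));
    }

    /** Returns the ids of clients whose spread threshold is met by this quote. */
    List<String> onQuote(String symbol, double bid, double ask) {
        List<String> hits = new ArrayList<>();
        double mid = (bid + ask) / 2.0;
        if (mid <= 0) {
            return hits; // ignore nonsensical quotes
        }
        double spreadPct = (ask - bid) / mid * 100.0;
        List<Subscription> subs = bySymbol.getOrDefault(symbol, Collections.emptyList());
        for (Subscription s : subs) {
            if (spreadPct >= s.thresholdPct) {
                hits.add(s.clientId);
            }
        }
        return hits;
    }
}
```

With "you" registered at 2% and "me" at 2.5% on IBM, a quote of 99.0/101.2 (roughly a 2.2% spread) would notify only "you". This sketch is single-threaded; a real system would shard symbols across threads or use a concurrent map.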
I proposed a heavily multi-threaded architecture. Each thread is event-driven: it wakes up on a primary event (an incoming tick), re-evaluates a bunch of conditions, and generates secondary events to be sent out. It can drop each new secondary event into a queue so as to return quickly. A “consumer” then picks up the secondary events and sends them out by multicast.
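The queue hand-off could look roughly like the sketch below. It is not the actual design, just one way to decouple evaluation from sending: `onTick` stands in for an evaluator thread reacting to a primary event, and the multicast transport is replaced by a plain callback, since the real sender is not described here. All names (`SecondaryEventPipeline`, `onTick`, `spreadTrigger`) are hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch; class, method and parameter names are illustrative.
final class SecondaryEventPipeline {
    private final BlockingQueue<String> outbound = new LinkedBlockingQueue<>();
    private final Thread sender;
    private volatile boolean running = true;

    /** transport stands in for the multicast sender (an assumption). */
    SecondaryEventPipeline(Consumer<String> transport) {
        this.sender = new Thread(() -> drain(transport), "secondary-event-sender");
    }

    void start() {
        sender.start();
    }

    /** Called on an evaluator thread for each primary event (tick). */
    void onTick(String symbol, double bid, double ask, double spreadTrigger) {
        if (ask - bid >= spreadTrigger) {
            // offer() on an unbounded queue does not block, so the evaluator returns quickly
            outbound.offer("SPREAD:" + symbol);
        }
    }

    /** Consumer loop: keeps sending until shutdown is requested and the queue is empty. */
    private void drain(Consumer<String> transport) {
        try {
            while (running || !outbound.isEmpty()) {
                String event = outbound.poll(20, TimeUnit.MILLISECONDS);
                if (event != null) {
                    transport.accept(event);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Stops the sender after it has flushed everything already queued. */
    void shutdown() {
        running = false;
        try {
            sender.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

An unbounded queue keeps the evaluator threads from ever blocking, at the cost of unbounded memory if the sender falls behind. A bounded queue with a drop or conflate policy is the usual alternative when stale secondary events are worthless.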
Each market data vendor (Reuters, eSpeed, ION, even tibrv) provides a “client-runtime” in the form of a jar or DLL. You embed the client-runtime into your VM, and it may create private threads dedicated to communicating with the remote publisher.
Each IBM tick actually has about 10 fields, but each IBM update from the vendor contains only 2 fields if the other fields for the symbol didn’t change. So we need something like Gemfire to reconstruct the entire 10-field object.
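The reconstruction step amounts to merging each partial update into the last known full tick for that symbol. The sketch below uses a plain in-process `ConcurrentHashMap` as a stand-in for the cache; it does not use any Gemfire API, and the class name `TickCache` and the field names in the usage note are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-process stand-in for a distributed cache; not a Gemfire API.
final class TickCache {
    // symbol -> last known full tick (immutable snapshot)
    private final Map<String, Map<String, Object>> lastKnown = new ConcurrentHashMap<>();

    /**
     * Merges a partial update (only the changed fields) into the last known
     * tick for the symbol and returns the full, read-only result.
     */
    Map<String, Object> apply(String symbol, Map<String, Object> delta) {
        // compute() runs atomically per key, so concurrent updates to one symbol do not interleave
        return lastKnown.compute(symbol, (key, old) -> {
            Map<String, Object> next = new HashMap<String, Object>();
            if (old != null) {
                next.putAll(old);   // carry forward unchanged fields
            }
            next.putAll(delta);     // overwrite the fields that changed
            return Collections.unmodifiableMap(next);
        });
    }
}
```

If the first IBM update carries `bid`, `ask` and `bidSize` and the next one carries only `bid`, the second call returns all three fields with the new `bid` and the old `ask` and `bidSize`. Copying on every update keeps readers safe without locks, though at OPRA-like volumes a mutable per-symbol object may be the more practical choice.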