async messaging-driven #FIX

A few distinct architectures:

  • architecture based on UDP multicast. Most of the other architectures are based on TCP.
  • architecture based on FIX messaging, modeled after the exchange-bank messaging, using multiple request/response messages to manage one stateful order
  • architecture based on pub-sub topics, much more reliable than multicast
  • architecture based on one-to-one message queue

non-blocking socket readiness: alternatives2periodic poll

Some Wells interviewer once asked me —

Q: After your non-blocking send() fails due to a full buffer, what can you do to get your data sent ASAP?

Simple solution is retrying after zero or more millisecond. Zero would be busy-weight i.e. spinning CPU. Non-zero means unwanted latency.

A 1st alternative is poll()/select() with a timeout, and immediately retry the same. There’s basically no latency. No spinning either. The linux proprietary epoll() is more efficient than poll()/select() and a popular solution for asynchronous IO.

2nd alternative is SIGIO. says it doesn’t waste CPU. P52 [[tcp/ip sockets in C]] also picked this solution to go with non-blocking sockets. is actually a concise overview of several alternatives

  • non-blocking socket
  • select/poll/epoll
  • .. other tricks

async – 2 threads 2 different execution contexts

See also

Sync call is simple — the output of the actual operation is processed on the same thread, so all the objects on the stack frame are available.

In an async call, the RR fires and forgets. Firing means registration.

The output data is processed …. usually not RR thread. Therefore the host object of the callback and other objects on the RR stack frame are not available.

If one of them is made available to the callback, then the object must be protected from concurrent access. – initial phrasebook

producer/consumer – upgraded.

buffer – as explained elsewhere in my blog, there’s buffer in any async design. In the ExecutorCompletionService scenario, the buffer is the “completion queue”. In the classic producer/consumer scenario, buffer is the item queue.

items = “tasks” – In P/C setup, Thread 1 could be producing “items” and Thread 2 could be taking up the items off the buffer and using them. In the important special case of task items, the consumer thread
(possibly worker thread) would pick up the task from queue and execute them. CompletionService is all about task items.

tasks executed by..? – in P/C with task queue, tasks are executed by consumer. In CompletionService, tasks are executed by the mysterious “Service”, not consumer. See CompletionService javadoc.

3-party – 1 more than P/C. Beside the P and C threads, the task executor could run on another thread.

dotnet wait handles – phrasebook

win32 – wrapper over win32 native constructs, (presumably) like file handles and other OS handles.
** p/invoke – these wrappers save you the p/i calls

kernel – the underlying are kernel constructs and probably involve kernel “Services”

predate – the kernel constructs predate the dotnet framework. I think they are part of win32 API.

conditionVar – I feel these are not like the condition variables offered by thread libraries

–some important dotnet constructs using wait handles
* IAsynchResult
* Mutex class
* Semaphore class
* signal events like AutoResetEvent and ManualResetEvent. Despite the confusing name, unrelated to the dotnet events.

http web service in async mode@@ #YH

I think in my dotnet client/server system, we use WCF duplex extensively, meaning the server can push updates to the clients. Each client must subscribe though. I think this is exactly event-driven as you said.

I remember that if I put a break point in the client’s updateReceived() callback method, it gets hit automatically, without the client polling the server.

WCF duplex is a standard and mature feature in microsoft WCF.

The server endpoint uses https, and I believe it’s a web service.

Is it possible that the server push is implemented actually by client poll under the hood? I don’t think so. There’s a polling duplex ..


every async operation involves a sync call

I now feel just about every asynchronous interaction involves a pair of (often remote) threads. (Let’s give them simple names — The requester RR vs the provider PP). An async interaction goes through 2 phases —

Phase 1 — registration — RR registers “interest” with PP. When RR reaches out to PP, the call must be synchronous, i.e. Blocking. In other words, during registration RR thread blocks until registration completes. RR thread won’t return immediately if the registration takes a while.

If PP is remote, then I was told there’s usually a local proxy object living inside the RR Process. Registration against proxy is faster, implying the proxy schedules the actual, remote registration. Without the scheduling capability, proxy must complete the (potentially slow) remote registration on the RR thread, before the local registration call returns. How slow? If remote registration goes over a network or involves a busy database, it would take many milliseconds. Even though the details are my speculation, the conclusion is fairly clear — registration call must be synchronous, at least partially.

Even in Fire-and-forget mode, the registration can’t completely “forget”. What if the fire throws an exception at the last phase after the “forget” i.e. after the local call has returned?

Phase 2 — data delivery — PP delivers the data to an RR2 thread. RR2 thread must be at an “interruption point” — Boost::thread terminology. I was told RR2 could be the same RR thread in WCF.

async web service – windows app ^ browser app mentioned that …

If the application needs the web service output in the later stages of the thread, then the WaitHandle is the best approach. For example, if the web service queries a database and retrieves a value needed in the parent process and then displays the final result on GUI, then WaitHandle should be used. We need the parent thread to block at a certain stage.

Most web apps come under this scenario. In all other scenarios, we can use callback. Mostly Windows apps use this approach.

In my mind this is the inherent difference between windows and web apps — web app has a dominant thread returning data to the browser while other threads are Unwanted or optional.

Windows apps (swing, winform, wpf etc) typically use a lot of threads beside the UI thread. Most tasks should avoid the UI thread, and keep the UI snappy.

select() syscall and its drv

select() is the most “special” socket kernel syscall (not a “standard library function”). It’s treated special.

– Java Non-blocking IO is related to select().
– Python has at least 3 modules — select, asyncore, asynchat all built around select()
– dotnet offers 2 solutions to the same problem:

  1. a select()-based and
  2. a newer asynchronous solution to the same problem

producer/consumer – fundamental to MT(+MOM/trading) apps

I hardly find any non-trivial threading app without some form of producer/consumer.

#1) Async — always requires a buffer in a P/C pattern. See

# dispatcher/worker — architectures use a task-queue in a P/C pattern.
# thread pools — all (java or C++) use a task-queue in a P/C pattern.
# Swing — EDT comes with a event queue in a P/C pattern
# mutex — Both producer and consumer needs write access to a shared object, size 1 or higher. Always needs a mutex.
# condVar — is required in most cases.

async (almost)always requires buffer and additional complexity

Any time I see asynchronous (swing, MOM etc), i see additional complexity. Synchronous is simpler. Synchronous means blocking, and requires no object beside the caller actor and service actor. The call is confined to a single call stack.

In contrast, async almost always involves 2 call stacks, requires a 3rd object in the form of a buffer [1]. Async means caller/sender can return before responder/callback even gets the message. In that /limbo/, the message must be kept in the buffer. If responder were a doctor then she might be “not accepting new patients“.

Producer/consumer pattern … (details omitted)
Buffer has capacity and can overflow.
Buffer is usually shared by different producer threads.
Buffer can resend.
Buffer can send the messages out of order.

[1] I guess the swing event object must be kept not just on the 2 call stacks, but on the event queue — the buffer

Q: single-threaded can be async?
A: yes the task producer can enqueue to a buffer. The same thread periodically dequeues. I believe swing EDT thread can be producer and consumer of tasks i.e. events. Requirement — each task is short and the thread is not overloaded.

Q: timer callback in single-threaded?
A: yes. Xtap is single-threaded and uses epoll timeout to handle both sockets and timer callbacks. If the thread is busy processing socket buffers it has to ignore timer otherwise socket will get full. Beware of the two “buffers”:

  • NIC hardware buffer is very small, perhaps a few bytes only, processed by hardware interrupt handler, without pid.
  • kernel socket buffer is typically 64-256MB, processed under my parser pid.
    • some of the functions are kernel tcp/udp functions, but running under my parser pid

See which thread/pid drains NIC_buffer}socket_buffer

real Protagonist in an asynchronous system

When discussing asynchronous, folks say “this system” will send request and go on with other work, and will handle the response when it comes in. When we say that, our finger is pointing at some code Modules — functions, methods, classes — business logic. By “this system” we implicitly refer to that code, not data. That code has life, intelligence, and behavior and looks like the agonist of the show.

However, a code module must run on a thread but such a thread can’t possibly “react” or “handle” an event. It has to be handled on another thread. Therefore the code we are looking at technically can’t react to events. The code handling it is another code module unrelated to our code module. So what’s “this system“?

A: a daemon…. Well not a bad answer, but I prefer …
A: the cache. Possibly in the form of disk/memory database, or in the form of coherence/gemfire, or a lowly hashmap, or more generally a buffer ( In every asynchrous design, I have always been able to uncover a bunch of stateful objects, living in memory long enough for the event/response. It’s clear to me the cache is one eternal, universal, salient feature across all asynchronous systems.

The “cache” in this context can be POJO with absolute zero behavior, but often people put business logic around/in it —
* database triggers, stroed procs
* coherence server-side modules
* gemfire cache listeners
* wrapper classes around POJO
* good old object-oriented techniques to attach logic to the cache
* more creative ways to attach logic to the cache

However, I feel it’s critically important to see through the smoke and the fluff and recognize that even the barebones POJO qualifies as the “this system” (as in the opening sentence). It qualifies better than anyone else.

Swing/WPF can have one thread displaying data according to some logic, and another thread (EDT?) responding to events. Again, there are stateful objects at the core.

gemfire write-behind and gateway queue #conflation, batched update says (simplified by me) —
In the Write-Behind mode, updates are asynchronously written to DB. GemFire uses Gateway Queue. Batched DB writes. A bit like a buffered file writer.

With the asynch gateway, low-latency apps can run unimpeded. See blog on offloading non-essentials asynchronously.

GemFire’s best known use of Gateway Queue technology is for the distribution/propagation of cache update events between clusters separated by a WAN (thus they are referred to as ‘WAN Gateways’).

However, Gateways are designed to solve a more fundamental integration problem shared by both disk and network IO — 1) disk-based databases and 2) remote clusters across a WAN. This problem is the impedance mismatch when update rates exceed absorption capability of downstream. For remote WAN clusters the impedance mismatch is network latency–a 1 millisecond synchronously replicated update on the LAN can’t possibly be replicated over a WAN in the same way. Similarly, an in-memory replicated datastore such as GemFire with sustained high-volume update rates provides a far greater transaction throughput than a disk-based database. However, the DB actually has enough absorption capacity if we batch the updates.

Application is insulated from DB failures as the gateway queues are highly available by default and can be configured to allow zero data loss.

Reduce database load by enabling conflation — Multiple updates of the same key can be conflated and only the final entry (containing all updates combined) written to the database.

Each Gateway queue is maintained on at least 2 nodes, internally arranged in a primary + (one or multiple) secondary configuration.

async JMS — briefly

Async JMS (onMessage()) is generally more efficient than sync (polling loop).

* when volume is low — polling is wasteful. onMessage() is like wait/notify in a producer/consumer pattern
* when volume is high — async can efficiently transfer large volume of messages in one lot. See P236 [[Weblogic the definitive guide]]

Q3: if I implement the MessageListener interface, which method in which thread calls my onMessage()[Q1]? I believe it’s similar to RMI, or doGet() in a servlet. [Q2]

Q1: is wait/notify used?
A1: i think so, else there’s a network blocking call like socket.accept

Q2: Servlets must exist in a servlet container (a network daemon), so how about a message listener? I believe it must exist in a network daemon, too.

sync/async available for both consumer types

Both sync and async are available to both queue and topic consumers. Receive() method is specified in the MessageConsumer interface.

Asynchronous onMessage() is the celebrated selling point of JMS and MOM in general.

I used queue receiver.receive() in the SIM upfront project.

Unfamiliar to some people, a topic subscriber can also call the blocking receive() method or the non-blocking receiveNoWait() method. What if you have logic in onMessage but also need to use receive() occasionally? P74 [[JMS]] has an exmple, where a topic subscriber, upon returning from receive(), immediately calls this.onMessage(theMsgReceived).

I think this technique can be useful in Swing, where a thread sends a request to a JMS broker, and blocks in receive(). Useful if the sender thread has a lot of local data unavilable to any other (like a listener) thread.

SL WCF async call – a real example

In one of my SL apps, client need to
1) connect (via ConnectAsync/ConnectCompleted) to a WCF server so as to subscribe to a stream of server-push heartbeat messages. These heartbeats must update the screen if screen is ready.

2) client also need a timer to drive periodic pings using getXXXAsync/getXXXCompleted. If ping succeeds/fails, we would like to update the status bar. Therefore the callbacks had batter come after screen setup.

3) at start time client makes one-time get1Async/get1Completed, get2Async/get2Completed, get3Async/get3Completed calls to receive initial static data so as to populate the GUI. I wrapped these calls in a static Download6() method.

*) basically all these data from server (either server-pushed or callbacks) “WANT” to update screen provided static data is loaded, which is done by D6 callbacks. So D6 callbacks had better happen before heartbeats come in, or heartbeat processing would hit empty static data and cause problems. Also, D6 callbacks must execute after screen setup, because those callbacks need to update screen.

That’s basically the requirements. In my initial (barely working) solution everything except timer stuff happens on the main thread.

1) at start time, main thread invokes D6. The callbacks would come back and wait for the main thread to execute them — troublesome but doable
2) main thread makes initial ConnectAsync. The callback would hit main thread too
3) before these callbacks, main thread finishes painting the screen and setting up the view model objects.
4) now the D6 and ConnectCompleted callbacks hit main thread in random order. D6 callbacks tend to happen before server pushes heartbeats.
5) upon ConnectCompleted, we subscribe to heartbeats
6) Timer has an initial delay (about 10s) so usually starts pinging after the dust settles. The Async and Completed calls execute on 2 different non-main-threads.

All the “scheduling” depends on event firing. It’s not possible to “do step 1, then at end of it do step 2…” on main thread.

I now prefer to move the connect, d6 etc off main thread, so the callbacks would hit some other threads. More importantly, this way I have a chance to utilize locks/wait-handles/events to properly serialize the Get/Connect calls, the timer-start and the subscription to server-push heartbeats. This means Step B starts only after Step A completes all callbacks. This was not possible earlier as it would block the main thread and freeze the screen.

WCF async call in SL – 2 funny rules

Rule AB0) in all cases, Main thread must not be blocked during the WCF call. Some part of the flow must happen on the Main thread.[1]

A) You can make an xxxAsync call from either Main thread (Thread1 in debugger) or a non-Main-thread such as thread pool or timer.

Rule 1) Main thread will execute xxxAsync(), then finish whatever it has to do, then execute the callback when it becomes idle. The callback ALWAYS executes on the same Main thread.

Rule 2) non-Main thread executes xxxAsync(), then the callback usually executes on the same (or another) non-main-thread.

B) Synchronous? It took me a bit of effort to make the call synchronously. But obey Rule AB0. See

 var channelFactory = new ChannelFactory("*");
var simpleService = channelFactory.CreateChannel();
var asyncResult = simpleService.BeginGetGreeting("Daniel", null, null);
string greeting = simpleService.EndGetGreeting(asyncResult); 

I find it a good rule to avoid calling xxxAsync on Main thread in non-trivial SL apps. I now do it on other threads. It often involves more BeginInvoke, but eliminates a lot of future restrictions.

Incidentally, if the callback is on non-Main thread but needs to modify View or Model, then you need to use BeginInvoke or SynchronizationContext

[1] Suppose you put UI thread on some wait handle or grabbing a lock so it blocks, to be released by the “wcf thread”. If you do that, then “wcf thread” would also block waiting for some part of the processing to run on the Main thread. This is a fairly classic deadlock situation.

asynchronous: meaning…@@

When people talk about async, they could mean a few things.

Meaning — non-blocking thread. HTTP interaction is not async because the browser thread blocks.
* eg: email vs browser
* eg: async query

Meaning — initiating thread *returns* immediately, before the task is completed by another thread.

Meaning — no immediate response, less stringent demand on response time. HTTP server need to service the client right away since the client is blocking. As a web developer i constantly worry about round-trip duration.

Meaning — delayed response. In any design where a response is provided by a different thread, the requester generally don’t need immediate response. Requester thread goes on with its business.
* eg: email, MQ, producer/consumer
* eg: jms onMessage() vs poll, but RV is purely async
* eg: acknowledgement messages in zed, all-http

You can (but i won’t) think of these “meanings” as “features” of async.

Async query (Sybase + others)

Q: What do we mean by Asynchronous query or Asynchronous DB connection?

A: Best answer, based on
The Sybase API knows about 1) synchronous (ie blocking, the default), 2) deferred IO (polling) and 3) async IO (call-back) modes. In the synchronous mode database requests block(see below) until a response is available. In deferred IO mode database requests return immediately (with a return code of CS_PENDING), and you check for completion with the ct_poll() call. In async IO mode the API uses a completion callback to notify the application of pending data for async operations.

—– blocking ^ polling in Sybase and other databases, based on

Asynchronous connection to a database means that multiple commands can be passed to the database server and processed simultaneously. For example, there is no need to wait for the query to return before sending another command to the database server.

Instead of waiting for a “Finished” message to be sent back from the database server, the calling application sends a query status message to the database server at set times asking “Are you done yet?” (polling) When the database server says “Yes”, the result packets start coming back.

Client can send 2 signals to server
* a periodic status check — “my query is completed and ready for download?”
* a cancel

—– polling ^ call-back, based on

To enable fully async mode, you need to have your completion callback Perl-subroutine registered, and you need to also enable async notifications, as well as setting up a timeout for requests.

Your completion callback perl-subroutine is either invoked when ct_poll() is notified of a request completion, or by the OpenClient API directly in full async mode.