SCB-FM design IV #Ashish #80%

Q: Your parser class is a plug-in in a framework. The framework would call your parser’s member function onData(seqNum, packet)  whenever the framework receives a packet on a UDP socket. You need to deal with

  • out of sequence packets
  • duplicate packets

Inside your onData(), you need to invoke a client callback function like theClient->callback(ptr2packet) but you need to invoke it 1) in correct sequence and 2) without duplicates.

Note the requirement is part of TCP’s job functions. TCP receives out-of-sequence packets (and/or duplicates) from IP layer and must deliver sequenced packets to the application layer.

Does TCP use ring buffer or hashtable? I doubt it, but we are building simpler solutions and are free to choose our data structure.

====My solution=====

an illustration
seq # received warehoused send in-use region of ring buffer
1 1
5 5 2-5
9 5,9 2-9
3 3,5,9
2 2-3 4-9
6 5,6,9
8 5,6,8,9
7 5-9
11 5-9,11 4-11
4 4-9 10-11
  • keep the packets (like fixed-size struct instances) in a large singleton circular array (or a deque). Save each packet in a slot keyed by the seq number of the packet (modulus the array size). Remember the nextSeqToSend. If we get a higher sequence than that, just warehouse it in the circular buffer.
  • (Interviewer didn’t ask) How do you reuse slots in the circular buffer? Given ten thousand slots #0~#9999, when I’m warehousing packet #109,999 in slot #9999, then conceptually the old packet in the #0 slot was already sent out, so I can safely “wrap around” to save next packet (#110,000) in there. I can implement my system to ensure it is actually safe.
  • What if the sequence numbers I receive jump wildly? Well, in real systems this will never happen (except an explicit seq reset). At most the sequence numbers jump ahead by a few hundreds. Assuming the sequence numbers arrive largely in correct order with occasional out-of-order arrivals, ring buffer is a natural choice. Without this assumption, dictionary solutions (Ashish saw in QuickFix) might be more suitable.
  • permanent gaps? If I see an old gap (like nextSeqToSend == #55 and no #56 but #57~#8000 all received) then we need a policy to mark the gap as permanent. Otherwise we would have to wait for it indefinitely.

Q (from interviewer): if you use a deque, how do you allocate slot for packet #5 while waiting for #4?
%%A: i would allocate for both, but keep #4 slot vacant. Not sure if std::deque has this API. I think my deque will hold pointers … dummy pointer represents vacant.

Justification for deque is similar to ring buffer — to keep the queue length short and release once-used memory.

I haven’t analyzed hashtables i.e. dictionaries. I believe it’s a proven solution with no major drawbacks.

#1 minor drawback of hashtable-based (or deque) relative to ring buffer is runtime allocation which is about 100 to 1000 times slower than arithmetic operations. For this reason, I always favor ring buffers when it’s a natural and logical data structure choice. This is my bias in many system designs. Sequence-number-based systems can often use ring buffers.

Another minor drawback of hashtable is memory overhead . Ring buffer has relatively small overhead in addition to the packet footprints, Hashtable wraps each packet in a link node. Hashtable also needs an expandable bucket array.

In terms of runtime efficiency, I am not so sure. I feel circular array has faster read/write. Hashtable depends on the hash function, which can degrade due to hash collisions.



Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s