Q: Your parser class is a plug-in in a framework. The framework would call your parser’s member function onData(seqNum, packet) whenever the framework receives a packet on a UDP socket. You need to deal with
- out of sequence packets
- duplicate packets
Inside your onData(), you need to invoke a client callback function like theClient->callback(ptr2packet) but you need to invoke it 1) in correct sequence and 2) without duplicates.
Note the requirement is part of TCP’s job functions. TCP receives out-of-sequence packets (and/or duplicates) from IP layer and must deliver sequenced packets to the application layer.
Does TCP use ring buffer or hashtable? I doubt it, but we are building simpler solutions and are free to choose our data structure.
|seq # received||warehoused||send||in-use region of ring buffer|
- keep the packets (like fixed-size struct instances) in a large singleton circular array (or a deque). Save each packet in a slot keyed by the seq number of the packet (modulus the array size). Remember the nextSeqToSend. If we get a higher sequence than that, just warehouse it in the circular buffer.
- (Interviewer didn’t ask) How do you reuse slots in the circular buffer? Given ten thousand slots #0~#9999, when I’m warehousing packet #109,999 in slot #9999, then conceptually the old packet in the #0 slot was already sent out, so I can safely “wrap around” to save next packet (#110,000) in there. I can implement my system to ensure it is actually safe.
- What if the sequence numbers I receive jump wildly? Well, in real systems this will never happen (except an explicit seq reset). At most the sequence numbers jump ahead by a few hundreds. Assuming the sequence numbers arrive largely in correct order with occasional out-of-order arrivals, ring buffer is a natural choice. Without this assumption, dictionary solutions (Ashish saw in QuickFix) might be more suitable.
- permanent gaps? If I see an old gap (like nextSeqToSend == #55 and no #56 but #57~#8000 all received) then we need a policy to mark the gap as permanent. Otherwise we would have to wait for it indefinitely.
Q (from interviewer): if you use a deque, how do you allocate slot for packet #5 while waiting for #4?
%%A: i would allocate for both, but keep #4 slot vacant. Not sure if std::deque has this API. I think my deque will hold pointers … dummy pointer represents vacant.
Justification for deque is similar to ring buffer — to keep the queue length short and release once-used memory.
I haven’t analyzed hashtables i.e. dictionaries. I believe it’s a proven solution with no major drawbacks.
#1 minor drawback of hashtable-based (or deque) relative to ring buffer is runtime allocation which is about 100 to 1000 times slower than arithmetic operations. For this reason, I always favor ring buffers when it’s a natural and logical data structure choice. This is my bias in many system designs. Sequence-number-based systems can often use ring buffers.
Another minor drawback of hashtable is memory overhead . Ring buffer has relatively small overhead in addition to the packet footprints, Hashtable wraps each packet in a link node. Hashtable also needs an expandable bucket array.
In terms of runtime efficiency, I am not so sure. I feel circular array has faster read/write. Hashtable depends on the hash function, which can degrade due to hash collisions.