It’s wrong to say UDP uses a small receive buffer but doesn’t use a send buffer.
Receive — https://access.redhat.com/documentation/en-US/JBoss_Enterprise_Web_Platform/5/html/Administration_And_Configuration_Guide/jgroups-perf-udpbuffer.html shows how to increase the UDP receive buffer to 25MB
Send — https://stackoverflow.com/questions/2031109/understanding-set-getsockopt-so-sndbuf shows you CAN configure the send buffer on a UDP socket.
Send — https://www.ibm.com/support/knowledgecenter/en/SSB23S_126.96.36.199/gtpc2/cpp_sendto.html also confirms the SO_SNDBUF send-buffer size applies to UDP sockets
In addition, the application is free to use its own custom buffer, for example in the form of a vector.
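As a minimal sketch of the point above, both buffers can be requested on a UDP socket with setsockopt() and read back with getsockopt(). The helper name set_and_get_bufsize is hypothetical. On Linux the value read back is normally larger than the request (the kernel doubles it for bookkeeping) and is capped by net.core.rmem_max / net.core.wmem_max, so the code reports the granted size rather than assuming the request was honoured.

```c
#include <assert.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/* Hypothetical helper: request a buffer size on a socket and return the size
 * the kernel actually granted. opt must be SO_RCVBUF or SO_SNDBUF.
 * Returns -1 if either socket call fails. */
int set_and_get_bufsize(int sock, int opt, int requested)
{
    int granted = 0;
    socklen_t len = sizeof(granted);

    assert(opt == SO_RCVBUF || opt == SO_SNDBUF);
    if (setsockopt(sock, SOL_SOCKET, opt, &requested, sizeof(requested)) < 0)
        return -1;
    if (getsockopt(sock, SOL_SOCKET, opt, &granted, &len) < 0)
        return -1;
    return granted;
}
```

Typical use is to create the socket with socket(AF_INET, SOCK_DGRAM, 0), call the helper once for SO_RCVBUF and once for SO_SNDBUF, and log the granted sizes so that a silently capped request is visible.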
http://www.nmsl.cs.ucsb.edu/MulticastSocketsBook/ has zipped sample code showing
mc_addr.sin_port = htons(thePort); // port must be in network byte order
bind(sock, (struct sockaddr *) &mc_addr, sizeof(mc_addr)); // set the group port, not local port!
mc_req.imr_multiaddr.s_addr = inet_addr("188.8.131.52");
setsockopt(sock, IPPROTO_IP, IP_ADD_MEMBERSHIP,
(void*) &mc_req, sizeof(mc_req)); // set the group IP by sending an IGMP join request
Note setsockopt() with IP_ADD_MEMBERSHIP actually sends a request!
That’s for multicast receivers. Multicast senders use a simpler procedure:
mc_addr.sin_addr.s_addr = inet_addr("184.108.40.206");
mc_addr.sin_port = htons(thePort);
sendto(sock, send_str, send_len, 0, (struct sockaddr *) &mc_addr, …
In NYSE market data lingo, we say “multicast channel”.
- analogy: TV channel — you can subscribe but can’t connect to it.
- analogy: Twitter hashtag — you can follow it, but can’t connect to it.
“Multicast connectivity” is barely tolerable but not “connection”. A multicast end system joins or subscribes to a group. You can’t really “connect” to a group as there could be zero or a million different peer systems without a “ring leader” or a representative.
Even for unicast UDP, “connect” is the wrong word as UDP is connectionless.
Saying nonsense like “multicast connection” is an immediate giveaway that the speaker isn’t familiar with UDP or multicast.
P116 [[tcp/ip sockets in C]] made it very clear.
A call to recv on the receiver machine will return data from at most one send() on the sender machine.
It can be a partial message, but would be the first part. See https://stackoverflow.com/questions/13317532/receiving-a-part-of-packet-via-recvfrom-udp
I believe the entire payload of one send()/sendto() is packaged into an envelope. The kernel would never deliver two envelopes to one recv()/recvfrom() call. Therefore the receiver can only receive one envelope at a time. If the entire envelope is too large for the receiver’s buffer, then only the first part of the payload is delivered.
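The one-envelope-per-recv() behaviour can be checked on the loopback interface. This is a sketch under assumptions: the function name two_sends_two_recvs is hypothetical, the payloads are arbitrary, and a one-second receive timeout is set only so that the demo cannot block forever.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

/* Hypothetical demo: send two datagrams to a loopback UDP socket, then call
 * recv() twice with a buffer large enough to hold both payloads together.
 * sizes[0] and sizes[1] receive the two recv() return values.
 * Returns 0 if the sockets could be set up and both sends succeeded. */
int two_sends_two_recvs(ssize_t sizes[2])
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof(addr);
    struct timeval tv = { 1, 0 };          /* 1 s receive timeout */
    char buf[1024];
    int rc = -1;
    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);

    assert(sizes != NULL);
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                     /* let the kernel pick a port */

    if (rx >= 0 && tx >= 0 &&
        bind(rx, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
        getsockname(rx, (struct sockaddr *)&addr, &alen) == 0 &&
        setsockopt(rx, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == 0 &&
        sendto(tx, "hello", 5, 0, (struct sockaddr *)&addr, sizeof(addr)) == 5 &&
        sendto(tx, "world!!", 7, 0, (struct sockaddr *)&addr, sizeof(addr)) == 7) {
        /* Each recv() returns the payload of exactly one sendto(). */
        sizes[0] = recv(rx, buf, sizeof(buf), 0);
        sizes[1] = recv(rx, buf, sizeof(buf), 0);
        rc = 0;
    }
    if (rx >= 0) close(rx);
    if (tx >= 0) close(tx);
    return rc;
}
```

Even though the 1024-byte buffer could hold both payloads, the first recv() is expected to return 5 bytes and the second 7. A TCP stream gives no such guarantee.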
A lot of discussions fail to acknowledge the existence (and importance) of UDP receiver.
- Socket tuning
- receive buffer configuration
- linux kernel tuning
- blocking socket calls
I guess you could simulate broadcast using TCP. You may need to maintain 5 sockets to 5 clients…
I think this is highly inefficient.
I recently used multicast for a while and I see it as yet another example of the same pattern — technical interviewers care about deep theoretical knowledge not practical skills.
Many new developers don’t know multicast protocol uses special IP addresses. This is practical knowledge required on my job, but not asked by interviewers.
Unlike TCP, there’s not a “server” or a “client” in a multicast set-up. This is practical knowledge in my project but not asked by interviewers.
When I receive no data from a multicast channel, it’s not obvious whether nobody is sending or I have no connectivity. (In contrast, with TCP, you get connection error if there’s no connectivity. See tcp: detect wire unplugged.) This is practical knowledge, but never asked by interviewers.
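A receiver can at least notice the silence instead of blocking forever. The sketch below uses poll() with a timeout; the names recv_with_timeout and RECV_SILENT are hypothetical. Note that a timeout only says nothing arrived in that window. It cannot tell an idle sender from a broken network path, which is exactly the ambiguity described above.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

#define RECV_SILENT (-2)   /* hypothetical code: nothing arrived in time */

/* Wait up to timeout_ms for one datagram on sock.
 * Returns the datagram length, RECV_SILENT on timeout, or -1 on error. */
int recv_with_timeout(int sock, void *buf, size_t cap, int timeout_ms)
{
    struct pollfd pfd;
    int ready;

    assert(buf != NULL);
    pfd.fd = sock;
    pfd.events = POLLIN;
    pfd.revents = 0;

    ready = poll(&pfd, 1, timeout_ms);
    if (ready < 0)
        return -1;
    if (ready == 0)
        return RECV_SILENT;   /* idle sender or no connectivity: unknown */
    return (int)recv(sock, buf, cap, 0);
}
```

In practice many feeds include periodic heartbeat messages so that a receiver can treat prolonged silence as a connectivity problem rather than a quiet market.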
I never receive a partial message by multicast, but I always receive partial message by TCP when the message is a huge file. This is reality in my project, but never asked by any interviewer.
So what do interviewers focus on?
- packet loss — UDP (including multicast) lacks delivery guarantee. This is a real issue for system design, but I seldom notice it.
- higher efficiency than TCP — I don’t notice it, though it’s true.
- socket buffer overflow — should never happen in TCP but could happen in UDP including multicast. This knowledge is not needed in my project.
- flow control — TCP receiver can notify sender to reduce sending speed. This knowledge is not needed in many projects.
- non-blocking send/receive — not needed in any project.
So what can we do? Study beyond what’s needed in the project. (The practical skills used are only 10% of the interview requirements.) Otherwise, even after 2 years using multicast in every project, I would still look like a novice to an interviewer.
Without the job interviews, it’s hard to know what theoretical details are required. I feel a multicast project is a valuable starting point to get me started. I can truthfully mention multicast in my resume. Then I need to attend interviews and study the theoretical topics.
This is often mentioned in IV. At least you can demonstrate your knowledge.
What if the UDP datagram is too big for recv(), i.e. the specified buffer length is too small? P116 [[tcp/ip sockets in C]] seems to say the oversize message is silently truncated.
UDP recv() will only return a single “logical” message. I believe TCP can put partial or multiple messages into one “buffer” for recv().
Q: if my buffer is big enough, will my UDP recv() ever truncate a msg?
Note IP would always deliver a whole msg or miss a whole msg, never a partial msg. See P 329 [[comp networking]]
 a logical msg is the payload from one send()
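The truncation case can be observed with recvmsg(), which reports MSG_TRUNC in msg_flags when the datagram did not fit in the buffer. This is a sketch on the loopback interface; the names trunc_report and demo_truncation, the payloads, and the 4-byte buffer are all hypothetical choices for illustration.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/uio.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

typedef struct {
    ssize_t first_len;   /* bytes copied out of the oversize datagram */
    int     truncated;   /* 1 if the kernel flagged MSG_TRUNC */
    ssize_t second_len;  /* length of the datagram read next */
} trunc_report;

/* Hypothetical demo: send a 10-byte and a 3-byte datagram over loopback,
 * read the first into a 4-byte buffer, then read again.
 * Returns 0 if the sockets could be set up and both sends succeeded. */
int demo_truncation(trunc_report *out)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof(addr);
    struct timeval tv = { 1, 0 };          /* 1 s receive timeout */
    char small[4], big[64];
    struct iovec iov;
    struct msghdr msg;
    int rc = -1;
    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);

    assert(out != NULL);
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (rx >= 0 && tx >= 0 &&
        bind(rx, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
        getsockname(rx, (struct sockaddr *)&addr, &alen) == 0 &&
        setsockopt(rx, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == 0 &&
        sendto(tx, "0123456789", 10, 0, (struct sockaddr *)&addr, sizeof(addr)) == 10 &&
        sendto(tx, "abc", 3, 0, (struct sockaddr *)&addr, sizeof(addr)) == 3) {
        iov.iov_base = small;
        iov.iov_len = sizeof(small);
        memset(&msg, 0, sizeof(msg));
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;

        out->first_len = recvmsg(rx, &msg, 0);
        out->truncated = (msg.msg_flags & MSG_TRUNC) != 0;
        /* The unread tail of the first datagram is gone; this reads "abc". */
        out->second_len = recv(rx, big, sizeof(big), 0);
        rc = 0;
    }
    if (rx >= 0) close(rx);
    if (tx >= 0) close(tx);
    return rc;
}
```

The second read returning the 3-byte datagram, rather than the remaining 6 bytes of the first, is what distinguishes UDP truncation from a TCP partial read.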
Reason: data rate constraints inherent in TCP protocol. Congestion Control?
Reason: TCP to a large group would be one-by-one unicast, highly inefficient and too much load on the sender.
Reason: TCP has more data overhead in the form of non-payload data.
* TCP header is typically 20 bytes vs 8 bytes for UDP
* Receiver needs to acknowledge
If you receive seq #13 before 11, the receiver will detect a gap between 10 and 13 and send a retransmission request to the exchange. If 11 and 12 arrive later on the live feed, they will be discarded.
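The gap rule above can be sketched as a small sequence tracker. The type and function names (seq_tracker, on_sequence) are hypothetical, and the policy of discarding late arrivals is taken from the description above rather than from any particular exchange's specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { SEQ_IN_ORDER, SEQ_GAP, SEQ_STALE } seq_result;

typedef struct {
    uint64_t next_expected;  /* sequence number we expect to see next */
    uint64_t gap_first;      /* first missing number, valid after SEQ_GAP */
    uint64_t gap_last;       /* last missing number, valid after SEQ_GAP */
} seq_tracker;

/* Classify an incoming sequence number.
 * SEQ_GAP   : messages gap_first..gap_last were skipped; the caller would
 *             request retransmission of that range.
 * SEQ_STALE : the number is older than expected and should be discarded. */
seq_result on_sequence(seq_tracker *t, uint64_t seq)
{
    assert(t != NULL);
    if (seq < t->next_expected)
        return SEQ_STALE;
    if (seq == t->next_expected) {
        t->next_expected = seq + 1;
        return SEQ_IN_ORDER;
    }
    t->gap_first = t->next_expected;
    t->gap_last = seq - 1;
    t->next_expected = seq + 1;
    return SEQ_GAP;
}
```

With next_expected at 11, feeding 13 reports a gap of 11 to 12, and a later 11 or 12 is classified as stale. A real feed handler would also need to merge the retransmitted messages, which this sketch leaves out.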
https://www.iana.org/assignments/multicast-addresses/multicast-addresses.xhtml shows a few hundred big companies including exchanges. For example, one exchange multicast address 220.127.116.11 falls within the range
18.104.22.168 to 22.214.171.124 Inter-continental Exchange, Inc.
It’s educational to compare with a unicast IP address. If you own such a unicast address, you can put it on a host and bind an http server to it. No one else can bind a server to that unicast address. Any client connecting to that IP will hit your host.
As owner of a multicast address, you alone can send datagrams to it and (presumably) you can restrict who can send or receive on this group address. Alan Shi pointed out the model is pub-sub MOM.
Twitter hashtag is another analogy.
http://www.diffen.com/difference/TCP_vs_UDP is relevant.
- FIFO — TCP; UDP — packet sequencing is uncontrolled
- Virtual circuit — TCP; UDP — datagram network
- Connectionless — UDP ; TCP — Connection-oriented
- Channel vs Connection — In RTS and xtap, we use the analogy of “TV channel” for multicast. TCP uses “Connection”.
With http, ftp etc, you establish a Connection (like a session). No such connection for UDP communication.
Retransmission is part of — TCP; UDP — application layer (not network layer) on receiving end must request retransmission.
To provide guaranteed FIFO data delivery over an unreliable channel, TCP must be able to detect loss and request retransmission. UDP doesn’t bother. An application built on UDP needs to create that functionality, as in the IDC (Interactive Data Corp) ticker plant. Here’s one simple scenario (easy to set up as a test):
- sender keeps multicasting
- shut down and restart receiver.
- receiver detects the sequence number gap, indicating message loss during the down time.
- receiver requests retransmission.
Every multicast address is a group address. In other words, a multicast address identifies a group.
Sending a multicast datagram is much simpler than receiving…
 http://www.tldp.org/HOWTO/Multicast-HOWTO-2.html is a concise 4-page introduction. Describes joining/leaving.
 http://ntrg.cs.tcd.ie/undergrad/4ba2/multicast/antony/ has sample code to send/receive. Note there’s no server/client actually.
By definition, multicast addresses all start with 1110 in the first half byte. Routers seeing such a destination (never a source) address know the msg is a multicast msg.
However, routers don’t forward any msg with destination address 126.96.36.199 through 188.8.131.52 because these are local multicast addresses. I guess these local multicast addresses are like 192.168.* addresses.
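The 1110 prefix test is a one-line bit check. The sketch below uses hypothetical helper names and assumes the non-forwarded local range meant above is 224.0.0.0/24, the Local Network Control Block in the IANA registry.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Parse a dotted-quad string into a host-byte-order integer.
 * Returns 1 on success, 0 if the string is not a valid IPv4 address. */
static int parse_ipv4(const char *dotted, uint32_t *host_order)
{
    struct in_addr a;
    if (inet_pton(AF_INET, dotted, &a) != 1)
        return 0;
    *host_order = ntohl(a.s_addr);
    return 1;
}

/* True if the top four bits are 1110, i.e. 224.0.0.0 to 239.255.255.255. */
int is_multicast_ipv4(const char *dotted)
{
    uint32_t h;
    return parse_ipv4(dotted, &h) && (h >> 28) == 0xEu;
}

/* True if the address is in 224.0.0.0/24 (assumed local, never forwarded). */
int is_local_control_multicast(const char *dotted)
{
    uint32_t h;
    return parse_ipv4(dotted, &h) && (h & 0xFFFFFF00u) == 0xE0000000u;
}
```

For example, 224.0.0.251 passes both checks, 239.1.2.3 passes only the first, and 192.168.1.1 passes neither.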
In terms of kernel implementation, my linux kernel book clearly hints UDP is much simpler than TCP.
However, because UDP doesn’t provide lots of guarantee, fault tolerance etc, we the users of UDP must provide those features.
http://en.wikipedia.org/wiki/Multicast shows(suggests?) that broadcast is also time-efficient since sender only does one send. However, multicast is smarter and more bandwidth-efficient.
IPv6 disabled broadcast — to prevent disturbing all nodes in a network when only a few are interested in a particular service. Instead it relies on multicast addressing, a conceptually similar one-to-many routing methodology. However, multicasting limits the pool of receivers to those that join a specific multicast receiver group.
(Note virtually all MC apps use UDP.)
To understand MC efficiency, we must compare with UC (unicast) and BC (broadcast). First we need some “codified” metrics —
- TT = imposing extra Traffic on network, which happens when the same packet is sent multiple times through the same network.
- RR = imposing extra processing workload on the Receiver host, because the packet is addressed TO “me” (pretending to be a receiver). If “my” address were not mentioned in the packet, then I would have ignored it without processing.
- SS = imposing extra processing workload by the Sender — a relatively low priority.
Now we can contrast MC, UC and BC. Suppose there are 3 receiver hosts to be notified, and 97 other hosts to leave alone, and suppose you send the message via —
- UC – TT not RR — sender dispatches 3 copies each addressed to a single host.
- BC – RR not TT — every host on the network sees a packet addressed to it though most would process then ignore it, wasting receiver’s time. When CEO sends an announcement email, everyone is in the recipient list.
- MC – not RR not TT. However, MC can still flood the network.
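The comparison above can be put into numbers with a deliberately simplified model: one network segment, one message, and a count of copies sent (TT and SS) and hosts that must process the packet (RR). The type and function names are hypothetical, and the model ignores IGMP traffic and switches that flood multicast.

```c
#include <assert.h>

typedef enum { UNICAST, BROADCAST, MULTICAST } delivery_mode;

typedef struct {
    int copies_sent;       /* packets the sender puts on the wire (TT, SS) */
    int hosts_processing;  /* hosts that must process the packet (RR) */
} delivery_cost;

/* Cost of delivering one message to `interested` hosts out of `total_hosts`
 * under the simplified single-segment model described in the text. */
delivery_cost cost_of(delivery_mode mode, int interested, int total_hosts)
{
    delivery_cost c = { 0, 0 };

    assert(interested >= 0 && interested <= total_hosts);
    switch (mode) {
    case UNICAST:      /* one copy per receiver, nobody else is disturbed */
        c.copies_sent = interested;
        c.hosts_processing = interested;
        break;
    case BROADCAST:    /* one copy, but every host has to look at it */
        c.copies_sent = 1;
        c.hosts_processing = total_hosts;
        break;
    case MULTICAST:    /* one copy, only group members process it */
        c.copies_sent = 1;
        c.hosts_processing = interested;
        break;
    }
    return c;
}
```

With 3 interested hosts out of 100, unicast gives 3 copies and 3 processing hosts, broadcast gives 1 copy and 100 processing hosts, and multicast gives 1 copy and 3 processing hosts.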
These are the 2 main usages of IP multicast. In both, Lost packets are considered lost forever. Resend is sometimes considered “too late”.
I think some of the world’s most cutting-edge network services — live price feed, live event broadcast, VOD — rely on IP multicast.
Multicast is more intelligent data dissemination than broadcast, and faster than unicast. Intelligence is built into routers.
I believe JMS publish is unicast based, not broadcast based. The receivers don’t comprise an IP broadcast group. Therefore JMS broker must deliver to one receiver at a time.
I guess a digest of the msg + a sequence number is sent out along with the msg itself.
One of the common designs is PGM —
While TCP uses ACKs to acknowledge groups of packets sent (something that would be uneconomical over multicast), PGM uses the concept of Negative Acknowledgements (NAKs). A NAK is sent unicast back to the host via a defined network-layer hop-by-hop procedure whenever there is a detection of data loss of a specific sequence. As PGM is heavily reliant on NAKs for integrity, when a NAK is sent, a NAK Confirmation (NCF) is sent via multicast for every hop back. Repair Data (RDATA) is then sent back either from the source or from a Designated Local Repairer (DLR).
PGM is an IETF experimental protocol. It is not yet a standard, but has been implemented in some networking devices and operating systems, including Windows XP and later versions of Microsoft Windows, as well as in third-party libraries for Linux, Windows and Solaris.
First, use a distinct sequence number for each packet. When one of the receivers notices a missed packet, it asks sender to resend ….to all receivers.
As an optimization, use bigger chunks. Use a window of packets. If the transmission is reliable, then expand the window size, so each sequence number covers a (much) larger chunk of packets.
These are the basic reliability techniques of TCP. Reliable multicast could borrow these from TCP.
Note real TCP isn’t usable for multicast as each TCP transmission has exactly one sender and one receiver. I think entire TCP protocol is based on that premise — unicast circuit.
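For the resend step, the sender has to keep recently sent packets around so it can answer a request by sequence number. One possible sketch is a fixed-size ring indexed by sequence number; all names and sizes here (resend_store, HISTORY, MAX_PAYLOAD) are hypothetical, and older packets are simply overwritten once the ring wraps.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HISTORY 8        /* hypothetical: how many recent packets are kept */
#define MAX_PAYLOAD 64   /* hypothetical: largest payload stored */

typedef struct {
    int      used;
    uint64_t seq;
    size_t   len;
    char     data[MAX_PAYLOAD];
} resend_slot;

typedef struct {
    resend_slot slots[HISTORY];   /* zero-initialise before first use */
} resend_store;

/* Remember a packet that was just sent, overwriting the oldest entry
 * that maps to the same slot. */
void store_sent(resend_store *r, uint64_t seq, const void *data, size_t len)
{
    resend_slot *s;

    assert(r != NULL && data != NULL && len <= MAX_PAYLOAD);
    s = &r->slots[seq % HISTORY];
    s->used = 1;
    s->seq = seq;
    s->len = len;
    memcpy(s->data, data, len);
}

/* Copy the payload for seq into out. Returns its length, or -1 if the
 * packet is no longer held or does not fit in cap bytes. */
int lookup_for_resend(const resend_store *r, uint64_t seq, void *out, size_t cap)
{
    const resend_slot *s;

    assert(r != NULL && out != NULL);
    s = &r->slots[seq % HISTORY];
    if (!s->used || s->seq != seq || s->len > cap)
        return -1;
    memcpy(out, s->data, s->len);
    return (int)s->len;
}
```

A request for a sequence number that has already been overwritten returns -1, which corresponds to the "too late to resend" case. How large the history must be depends on the send rate and on how quickly receivers report gaps.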
My own Q: How do you make UDP reliable?
A: sequence number + gap management + retransmission
My own Q: can q(snoop) capture UDP traffic?
Q: Why would multicast use a different address space? Theoretical question?
A: each MC address is a group…
Q: why would a server refuse connection? (Theoretical question?)
%%A: perhaps tcp queue is full, so application layer won’t see anything
Q: How do you avoid full GC
Q: what’s the impact of 64 bit JVM?
Q: how many package break releases did your team have in a year?
Q: In a live production system, how do you make configuration changes with minimal impact to existing modules?