https://www.usenix.org/conference/usits-99/scalable-web-caching-frequently-updated-objects-using-reliable-multicast is a 1999 research paper. I hope multicast has since grown more mature and better proven. I'm not sure where this is used, perhaps within certain network boundaries such as a private network of data servers.
This paper examines reliable multicast for invalidation and delivery of popular, frequently updated objects to web cache proxies.
The generic routing algorithm in multicast has optimization features that offer efficient data duplication.
— optimized message duplication at routing layer (Layer 3?)
https://www.metaswitch.com/knowledge-center/reference/what-is-multicast-ip-routing explains that
Routers duplicate data packets and forward multiple copies wherever the path to recipients diverges. Group membership information is used to calculate optimal branch points, i.e. the best routers at which to duplicate the packets to optimize the use of the network.
The multicast distribution tree of receiving hosts holds the route to every recipient that has joined the multicast group, and is optimized so that
- multicast traffic does not reach networks that do not have any such recipients (unless the network is a transit network on the way to other recipients)
- duplicate copies of packets are kept to a minimum.
Many people say that in general, multicast over public internet is unsupported. I think many retail websites (including the dominant west coast firms) are technically unable to use multicast.
As an end-user, you can multicast across the public Internet to another site by using a tunnel that supports multicast.
A larger organization, like a video provider or an ISP, can certainly forward multicast packets across its domain boundary (i.e. across an internet).
To forward those multicast packets to another ISP, you would need a peering agreement with them and use the Multicast Source Discovery Protocol (MSDP), configured on both ends.
While you won’t propagate your multicast across the global Internet, crossing network boundaries with multicast packets is not impossible.
- data loss is tolerable
- many-to-many messaging — with multiple senders and multiple receivers
— https://www.cisco.com/c/en/us/about/press/internet-protocol-journal/back-issues/table-contents-19/reliable-multicast.html says
Collaborative applications such as data conferences (whiteboarding) and network-based games are many-to-many applications with ….
… modest scaling requirements of less than 100 participants.
This kind of application requires low latency of less than 400 millisec.
Transmission does not always need strict reliability; for example, refresh of background information for a network game could wait for the next refresh.
It’s wrong to say UDP uses a small receive buffer but doesn’t use a send buffer.
Receive — https://access.redhat.com/documentation/en-US/JBoss_Enterprise_Web_Platform/5/html/Administration_And_Configuration_Guide/jgroups-perf-udpbuffer.html shows how to increase UDP receive buffer to 25MB
Send: https://stackoverflow.com/questions/2031109/understanding-set-getsockopt-so-sndbuf shows you CAN configure the send buffer on a UDP socket.
Send — https://www.ibm.com/support/knowledgecenter/en/SSB23S_188.8.131.52/gtpc2/cpp_sendto.html also confirms SO_SNDBUF send-buffer size applies to UDP sockets
In addition, the application is free to keep its own custom buffer, for example in the form of a vector.
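Here is a minimal sketch of setting both buffers on a UDP socket with SO_RCVBUF and SO_SNDBUF and reading back what the kernel granted. The function names set_udp_buffers() and get_udp_buffer() are my own, and the sizes in the usage note are arbitrary.

```c
#include <sys/socket.h>
#include <sys/types.h>

/* Ask the kernel for the given receive and send buffer sizes on a UDP socket.
 * Returns 0 on success, -1 if either setsockopt() call fails. */
int set_udp_buffers(int sock, int rcv_bytes, int snd_bytes)
{
    if (setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &rcv_bytes, sizeof(rcv_bytes)) < 0)
        return -1;
    if (setsockopt(sock, SOL_SOCKET, SO_SNDBUF, &snd_bytes, sizeof(snd_bytes)) < 0)
        return -1;
    return 0;
}

/* Read back the size the kernel actually granted.
 * opt is SO_RCVBUF or SO_SNDBUF. Returns bytes, or -1 on error. */
int get_udp_buffer(int sock, int opt)
{
    int bytes = 0;
    socklen_t len = sizeof(bytes);
    if (getsockopt(sock, SOL_SOCKET, opt, &bytes, &len) < 0)
        return -1;
    return bytes;
}
```

Typical use would be set_udp_buffers(sock, 262144, 65536) right after socket(). On Linux the value read back usually differs from the request, because the kernel doubles it for bookkeeping and caps it at net.core.rmem_max / net.core.wmem_max.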
370,000 msg/sec isn’t too high. A typical exchange message is about 100 bits, so we are talking about 370,000 × 100 bits = 37 Mbps, less than 0.1% of a 100 Gbps network capacity.
My networking mentor CSY and I both believe it’s entirely possible to host 8 independent threads in a single process to handle the 8 independent message channels. Aggregate throughput would be 8 × 37 = 296 Mbps on a single NIC and a single PID.
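The arithmetic above can be checked with a tiny sketch. The helper names feed_bps() and link_utilization_pct() are made up for illustration; the inputs are the figures from this post.

```c
/* Bits per second consumed by a feed of fixed-size messages. */
long long feed_bps(long long msgs_per_sec, long long bits_per_msg)
{
    return msgs_per_sec * bits_per_msg;
}

/* Share of a link that is used, as a percentage. */
double link_utilization_pct(long long used_bps, long long link_bps)
{
    return 100.0 * (double)used_bps / (double)link_bps;
}
```

feed_bps(370000, 100) gives 37,000,000 bits/sec, i.e. 37 Mbps per channel, and eight such channels add up to 296 Mbps. Against a 100 Gbps link, one channel is 0.037% of capacity.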
See also mktData parser: multi-threaded]ST mode #Balaji
I feel a more bandwidth-demanding multicast application is video-on-demand where a single server may need to simultaneously stream different videos.
Q: how about world cup final real-time multicast video streaming to millions of subscribers?
%%A: now I think this is not so demanding, because the number of simultaneous video streams is one
I think there are many application-layer multicast protocols. http://pages.cs.wisc.edu/~suman/pubs/sigcomm02.pdf says
Unlike native multicast (i.e. IP-multicast) where data packets are replicated at routers inside the network, in application-layer multicast data packets are replicated at end hosts.
When we hear “multicast” we should not assume it means IP multicast over UDP! “Multicast” is not a specific protocol like UDP, but a generic, loosely defined term.
https://pdfs.semanticscholar.org/87f8/41def2a4e5740a13589d4b60d7a2406da866.pdf says “many current systems support RGDD at application layer using unicast TCP”
http://www.nmsl.cs.ucsb.edu/MulticastSocketsBook/ has zipped sample code showing
mc_addr.sin_port = htons(thePort); // port must be in network byte order
bind(sock, (struct sockaddr *) &mc_addr, sizeof(mc_addr)); // set the group port, not local port!
mc_req.imr_multiaddr.s_addr = inet_addr("239.0.0.1"); // placeholder group address; must be within 224.0.0.0 to 239.255.255.255
setsockopt(sock, IPPROTO_IP, IP_ADD_MEMBERSHIP,
(void*) &mc_req, sizeof(mc_req)); // set the group IP by sending an IGMP join request
Note setsockopt() actually sends a request!
That’s for multicast receivers. Multicast senders use a simpler procedure:
mc_addr.sin_addr.s_addr = inet_addr("239.0.0.1"); // placeholder group address; must be within 224.0.0.0 to 239.255.255.255
mc_addr.sin_port = htons(thePort);
sendto(sock, send_str, send_len, 0, (struct sockaddr *) &mc_addr, …
In NYSE market data lingo, we say “multicast channel”.
- analogy: TV channel — you can subscribe but can’t connect to it.
- analogy: Twitter hashtag — you can follow it, but can’t connect to it.
“Multicast connectivity” is barely tolerable as a term, but “multicast connection” is not. A multicast end system joins or subscribes to a group. You can’t really “connect” to a group as there could be zero or a million different peer systems without a “ring leader” or a representative.
Even for unicast UDP, “connect” is the wrong word as UDP is connectionless.
Saying nonsense like “multicast connection” is an immediate giveaway that the speaker isn’t familiar with UDP or multicast.
P116 [[tcp/ip sockets in C]] made it very clear.
A call to recv on the receiver machine will return data from at most one send() on the sender machine.
It can be a partial message, but would be the first part. See https://stackoverflow.com/questions/13317532/receiving-a-part-of-packet-via-recvfrom-udp
I believe the entire payload of one send()/sendto() is packaged into one envelope. The kernel would never deliver two envelopes to one recv()/recvfrom() call. Therefore the receiver can only receive one envelope at a time. If the entire envelope is too large for the receiver's buffer, then only the first part of the payload is delivered.
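The envelope behaviour is easy to observe over loopback. Below is a small experiment of my own (the name udp_boundary_demo() is made up): it sends a 10-byte datagram and then a 2-byte datagram to itself, and reads both with a 4-byte buffer.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Fills first/second with what the two recv() calls delivered and n1/n2
 * with their return values. Returns 0 if the experiment ran, -1 on a
 * socket setup or send error. */
int udp_boundary_demo(char first[4], ssize_t *n1, char second[4], ssize_t *n2)
{
    int rc = -1;
    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in addr;
    socklen_t alen = sizeof(addr);
    struct timeval tv = {2, 0};   /* give up after 2 s instead of blocking forever */

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;            /* kernel picks a free port */

    if (rx >= 0 && tx >= 0 &&
        bind(rx, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
        getsockname(rx, (struct sockaddr *)&addr, &alen) == 0 &&
        setsockopt(rx, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == 0 &&
        sendto(tx, "AAAAAAAAAA", 10, 0, (struct sockaddr *)&addr, sizeof(addr)) == 10 &&
        sendto(tx, "BB", 2, 0, (struct sockaddr *)&addr, sizeof(addr)) == 2) {
        *n1 = recv(rx, first, 4, 0);   /* first 4 bytes of datagram 1 */
        *n2 = recv(rx, second, 4, 0);  /* datagram 2, not the rest of datagram 1 */
        rc = 0;
    }
    if (rx >= 0) close(rx);
    if (tx >= 0) close(tx);
    return rc;
}
```

On Linux I would expect the first recv() to return 4 with "AAAA" and the second to return 2 with "BB": the remaining six bytes of the first datagram are dropped rather than handed to the next call.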
A lot of discussions fail to acknowledge the existence (and importance) of UDP receiver.
- Socket tuning
- receive buffer configuration
- linux kernel tuning
- blocking socket calls
I guess you could simulate broadcast using TCP. You may need to maintain 5 sockets to 5 clients…
I think this is highly inefficient.
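To make the inefficiency concrete, here is a sketch of what simulated broadcast over TCP boils down to, assuming the client sockets are already connected. fanout_send() is a hypothetical helper name.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Send the same message once per connected stream socket.
 * Returns how many sockets accepted the full message. */
int fanout_send(const int *socks, int count, const void *msg, size_t len)
{
    int delivered = 0;
    for (int i = 0; i < count; i++) {
        /* One send() per client: the sender's work and its outbound
         * traffic both grow linearly with the number of clients. */
        if (send(socks[i], msg, len, 0) == (ssize_t)len)
            delivered++;
    }
    return delivered;
}
```

With 5 clients the same bytes leave the sender 5 times. With multicast the sender would issue a single sendto() regardless of the number of receivers.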
I recently used multicast for a while and I see it as yet another example of the same pattern — technical interviewers care about deep theoretical knowledge not practical skills.
Many new developers don’t know multicast protocol uses special IP addresses. This is practical knowledge required on my job, but not asked by interviewers.
Unlike TCP, there’s not a “server” or a “client” in a multicast set-up. This is practical knowledge in my project but not asked by interviewers.
When I receive no data from a multicast channel, it’s not obvious whether nobody is sending or I have no connectivity. (In contrast, with TCP, you get connection error if there’s no connectivity. See tcp: detect wire unplugged.) This is practical knowledge, but never asked by interviewers.
I never receive a partial message by multicast, but I always receive partial message by TCP when the message is a huge file. This is reality in my project, but never asked by any interviewer.
So what do interviewers focus on?
- packet loss — UDP (including multicast) lacks delivery guarantee. This is a real issue for system design, but I seldom notice it.
- higher efficiency than TCP — I don’t notice it, though it’s true.
- socket buffer overflow — should never happen in TCP but could happen in UDP including multicast. This knowledge is not needed in my project.
- flow control — TCP receiver can notify sender to reduce sending speed. This knowledge is not needed in many projects.
- non-blocking send/receive — not needed in any project.
So what can we do? Study beyond what’s needed in the project. (The practical skills used are only 10% of the interview requirements.) Otherwise, even after 2 years using multicast in every project, I would still look like a novice to an interviewer.
Without the job interviews, it’s hard to know what theoretical details are required. I feel a multicast project is a valuable starting point. I can truthfully mention multicast in my resume. Then I need to attend interviews and study the theoretical topics.
This is often mentioned in IV. At least you can demonstrate your knowledge.
What if the UDP datagram is too big for recv(), i.e. the specified buffer length is too small? P116 [[tcp/ip sockets in C]] seems to say the oversize message is silently truncated.
UDP recv() will only return a single “logical” message. I believe TCP can put partial or multiple messages into one “buffer” for recv().
Q: if my buffer is big enough, will my UDP recv() ever truncate a msg?
Note IP would always deliver a whole msg or miss a whole msg, never a partial msg. See P 329 [[comp networking]]
Here a logical msg is the payload from one send().
Reason: data rate constraints inherent in TCP protocol. Congestion Control?
Reason: TCP to a large group would be one-by-one unicast, highly inefficient and too much load on the sender.
Reason: TCP has more data-overhead in the form of non-payload data.
* TCP header is typically 20 bytes vs 8 bytes for UDP
* Receiver needs to acknowledge
If the receiver gets seq #13 right after #10, i.e. before #11 and #12, it detects the gap between 10 and 13 and sends a retransmission request to the exchange. If the original 11 and 12 arrive later, they are discarded.
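A minimal sketch of that gap logic, assuming one sequence counter per channel and the "discard late originals" policy described above. The names seq_tracker and seq_on_packet() are mine, not from any feed handler.

```c
#include <stdint.h>

enum seq_action { SEQ_DELIVER, SEQ_GAP, SEQ_DISCARD };

struct seq_tracker {
    uint64_t next_expected;  /* sequence number we expect to see next */
    uint64_t gap_first;      /* first missing number of the latest gap */
    uint64_t gap_last;       /* last missing number of the latest gap */
};

/* Classify an incoming sequence number.
 * SEQ_DELIVER: in order, pass it to the application.
 * SEQ_GAP:     gap_first..gap_last were skipped; the caller should request
 *              retransmission of that range. The packet itself is usable.
 * SEQ_DISCARD: older than what we already moved past (late or duplicate). */
enum seq_action seq_on_packet(struct seq_tracker *t, uint64_t seq)
{
    if (seq < t->next_expected)
        return SEQ_DISCARD;
    if (seq == t->next_expected) {
        t->next_expected = seq + 1;
        return SEQ_DELIVER;
    }
    t->gap_first = t->next_expected;
    t->gap_last = seq - 1;
    t->next_expected = seq + 1;
    return SEQ_GAP;
}
```

Starting with next_expected = 11, feeding 13 reports a gap of 11..12, and a late 11 or 12 is then classified as SEQ_DISCARD. A real handler would also have to decide how to merge the retransmitted copies back in, which this sketch leaves out.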
https://www.iana.org/assignments/multicast-addresses/multicast-addresses.xhtml shows a few hundred big companies including exchanges. For example, one exchange multicast address 18.104.22.168 falls within the range
22.214.171.124 to 126.96.36.199 Inter-continental Exchange, Inc.
It’s educational to compare with a unicast IP address. If you own such a unicast address, you can put it on a host and bind an http server to it. No one else can bind a server to that unicast address. Any client connecting to that IP will hit your host.
As owner of a multicast address, you are presumably the only one meant to send datagrams to it, and (presumably) you can restrict who can send or receive on this group address. Alan Shi pointed out the model is pub-sub MOM.
Twitter hashtag is another analogy.
http://www.diffen.com/difference/TCP_vs_UDP is relevant.
- FIFO — TCP; UDP — packet sequencing is uncontrolled
- Virtual circuit — TCP; UDP — datagram network
- Connectionless — UDP ; TCP — Connection-oriented
- Channel vs Connection — In RTS and xtap, we use the analogy of “TV channel” for multicast. TCP uses “Connection”.
With http, ftp etc, you establish a Connection (like a session). No such connection for UDP communication.
- Retransmission: built into TCP; with UDP, the application layer (not the transport layer) on the receiving end must request retransmission.
To provide guaranteed FIFO data delivery over an unreliable channel, TCP must be able to detect loss and retransmit. UDP doesn’t bother. An application built on UDP needs to create that functionality, as in the IDC (Interactive Data Corp) ticker plant. Here’s one simple scenario (easy to set up as a test):
- sender keeps multicasting
- shut down and restart receiver.
- receiver detects the sequence number gap, which indicates message loss during the down time.
- receiver requests retransmission.
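For the request in the last step to be answerable, the sender has to remember recent messages. Here is one possible sketch of such a store, a fixed ring indexed by sequence number. This is purely my assumption about how it could be done, not how IDC does it; the names and the two size constants are arbitrary.

```c
#include <stdint.h>
#include <string.h>

#define STORE_SLOTS 1024   /* how many recent messages are kept (arbitrary) */
#define MAX_MSG     256    /* largest message the store accepts (arbitrary) */

struct retrans_store {     /* must start zeroed, e.g. a static object */
    uint64_t seq[STORE_SLOTS];
    uint16_t len[STORE_SLOTS];
    char     data[STORE_SLOTS][MAX_MSG];
};

/* Remember a message just before it is multicast. Returns 0, or -1 if the
 * message is empty or too large for a slot. */
int store_put(struct retrans_store *s, uint64_t seq, const void *msg, size_t len)
{
    if (len == 0 || len > MAX_MSG)
        return -1;
    size_t slot = seq % STORE_SLOTS;   /* newer messages overwrite old ones */
    s->seq[slot] = seq;
    s->len[slot] = (uint16_t)len;
    memcpy(s->data[slot], msg, len);
    return 0;
}

/* Look up a message for a retransmission request. Returns its length and
 * sets *msg, or returns 0 if that sequence number is no longer held. */
size_t store_get(const struct retrans_store *s, uint64_t seq, const char **msg)
{
    size_t slot = seq % STORE_SLOTS;
    if (s->len[slot] == 0 || s->seq[slot] != seq)
        return 0;
    *msg = s->data[slot];
    return s->len[slot];
}
```

If the receiver was down for longer than STORE_SLOTS messages, store_get() returns 0 and the loss is permanent, so the slot count is really a statement about how long an outage the sender is willing to cover.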
Every multicast address is a group address. In other words, a multicast address identifies a group.
Sending a multicast datagram is much simpler than receiving…
 http://www.tldp.org/HOWTO/Multicast-HOWTO-2.html is a concise 4-page introduction. Describes joining/leaving.
 http://ntrg.cs.tcd.ie/undergrad/4ba2/multicast/antony/ has sample code to send/receive. Note there’s no server/client actually.
By definition, multicast addresses all start with 1110 in the first half byte, i.e. 224.0.0.0 through 239.255.255.255. Routers seeing such a destination (never a source) address know the msg is a multicast msg.
However, routers don’t forward any msg with destination address 224.0.0.0 through 224.0.0.255 because these are local multicast addresses. I guess these local multicast addresses are like 192.168.* addresses.
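Both rules are simple bit tests on the 32-bit address. A small sketch with helper names of my own choosing, assuming IPv4 addresses in host byte order:

```c
#include <arpa/inet.h>
#include <stdint.h>

/* True if the top four bits are 1110, i.e. 224.0.0.0 to 239.255.255.255.
 * addr is in host byte order. */
int is_multicast(uint32_t addr)
{
    return (addr >> 28) == 0xE;
}

/* True for 224.0.0.0 to 224.0.0.255, which routers do not forward.
 * addr is in host byte order. */
int is_local_multicast(uint32_t addr)
{
    return (addr >> 8) == 0xE00000;
}

/* Convenience wrapper for dotted-quad text; returns 0 for unparsable input. */
int is_multicast_text(const char *ip)
{
    struct in_addr a;
    return inet_pton(AF_INET, ip, &a) == 1 && is_multicast(ntohl(a.s_addr));
}
```

For example, is_multicast_text("239.1.2.3") is true and is_multicast_text("192.168.1.1") is false. The system header's IN_MULTICAST() macro performs the same top-nibble test.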
In terms of kernel implementation, my linux kernel book clearly hints UDP is much simpler than TCP.
However, because UDP doesn’t provide many guarantees, fault tolerance etc, we the users of UDP must provide those features ourselves.
http://en.wikipedia.org/wiki/Multicast shows (suggests?) that broadcast is also time-efficient since the sender only does one send. However, multicast is smarter and more bandwidth-efficient.
IPv6 disabled broadcast — to prevent disturbing all nodes in a network when only a few are interested in a particular service. Instead it relies on multicast addressing, a conceptually similar one-to-many routing methodology. However, multicasting limits the pool of receivers to those that join a specific multicast receiver group.
(Note virtually all MC apps use UDP.)
To understand MC efficiency, we must compare with UC (unicast) and BC (broadcast). First we need some “codified” metrics:
- TT = imposing extra Traffic on network, which happens when the same packet is sent multiple times through the same network.
- RR = imposing extra processing workload on the Receiver host, because the packet is addressed TO “me” (pretending to be a receiver). If “my” address were not mentioned in the packet, then I would have ignored it without processing.
- SS = imposing extra processing workload by the Sender — a relatively low priority.
Now we can contrast MC, UC and BC. Suppose there are 3 receiver hosts to be notified, and 97 other hosts to leave alone, and suppose you send the message via —
- UC – TT not RR — sender dispatches 3 copies each addressed to a single host.
- BC – RR not TT — every host on the network sees a packet addressed to it though most would process then ignore it, wasting receiver’s time. When CEO sends an announcement email, everyone is in the recipient list.
- MC – not RR not TT. However, MC can still flood the network.
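The 3-out-of-100 comparison can be written down as a toy model. This is a deliberate simplification of my own: it counts only packets leaving the sender and hosts that have to process them, and ignores routers, switches and flooding.

```c
enum delivery { UNICAST, BROADCAST, MULTICAST };

struct cost {
    int copies_sent;       /* packets the sender puts on the wire (TT, SS) */
    int hosts_processing;  /* hosts that must process the packet (RR) */
};

/* Cost of notifying `receivers` interested hosts on a network of
 * `total_hosts` hosts (sender not counted). */
struct cost delivery_cost(enum delivery mode, int receivers, int total_hosts)
{
    struct cost c;
    switch (mode) {
    case UNICAST:      /* one copy per receiver, nobody else disturbed */
        c.copies_sent = receivers;
        c.hosts_processing = receivers;
        break;
    case BROADCAST:    /* one copy, but every host has to look at it */
        c.copies_sent = 1;
        c.hosts_processing = total_hosts;
        break;
    default:           /* MULTICAST: one copy, only group members process it */
        c.copies_sent = 1;
        c.hosts_processing = receivers;
        break;
    }
    return c;
}
```

With 3 receivers among 100 hosts the model gives UC = 3 copies / 3 hosts, BC = 1 copy / 100 hosts, MC = 1 copy / 3 hosts, which matches the TT and RR labels above.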
These are the 2 main usages of IP multicast. In both, lost packets are considered lost forever. Resend is sometimes considered “too late”.
I think some of the world’s most cutting-edge network services — live price feed, live event broadcast, VOD, multiplayer gaming — rely on multicast.
Multicast is more intelligent data dissemination than broadcast, and faster than unicast. Intelligence is built into routers.
I believe JMS publish is unicast based, not broadcast based. The receivers don’t comprise an IP broadcast group. Therefore JMS broker must deliver to one receiver at a time.
I guess a digest of the msg + a sequence number is sent out along with the msg itself.
One of the common designs is PGM —
While TCP uses ACKs to acknowledge groups of packets sent (something that would be uneconomical over multicast), PGM uses the concept of Negative Acknowledgements (NAKs). A NAK is sent unicast back to the host via a defined network-layer hop-by-hop procedure whenever there is a detection of data loss of a specific sequence. As PGM is heavily reliant on NAKs for integrity, when a NAK is sent, a NAK Confirmation (NCF) is sent via multicast for every hop back. Repair Data (RDATA) is then sent back either from the source or from a Designated Local Repairer (DLR).
PGM is an IETF experimental protocol. It is not yet a standard, but has been implemented in some networking devices and operating systems, including Windows XP and later versions of Microsoft Windows, as well as in third-party libraries for Linux, Windows and Solaris.
First, use a distinct sequence number for each packet. When one of the receivers notices a missed packet, it asks sender to resend ….to all receivers.
As an optimization, use bigger chunks. Use a window of packets. If the transmission is reliable, then expand the window size, so each sequence number covers a (much) larger chunk of packets.
These are the basic reliability techniques of TCP. Reliable multicast could borrow these from TCP.
Note real TCP isn’t usable for multicast as each TCP transmission has exactly one sender and one receiver. I think the entire TCP protocol is based on that premise: a unicast circuit.
My own Q: How do you make UDP reliable?
A: sequence number + gap management + retransmission
My own Q: can q(snoop) capture UDP traffic?
Q: Why would multicast use a different address space? Theoretical question?
A: each MC address is a group…
Q: why would a server refuse connection? (Theoretical question?)
%%A: perhaps tcp queue is full, so application layer won’t see anything
Q: How do you avoid full GC?
Q: what’s the impact of 64 bit JVM?
Q: how many package break releases did your team have in a year?
Q: In a live production system, how do you make configuration changes with minimal impact to existing modules?