specify (by ip:port) multicast group to join

http://www.nmsl.cs.ucsb.edu/MulticastSocketsBook/ has zipped sample code showing

mc_addr.sin_port = htons(thePort);

bind(sock, (struct sockaddr *) &mc_addr, sizeof(mc_addr)); // bind to the group port, not a local port!
----
mc_req.imr_multiaddr.s_addr = inet_addr("224.1.2.3");

setsockopt(sock, IPPROTO_IP, IP_ADD_MEMBERSHIP,
(void*) &mc_req, sizeof(mc_req)); // join the group by sending an IGMP join-request

Note setsockopt() with IP_ADD_MEMBERSHIP actually sends a join request!
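Putting the fragments together, a minimal receiver sketch might look like this. Error checks are omitted, and the port 4321 plus the bind to INADDR_ANY are my own illustrative choices, not taken from the book's sample:

/* minimal multicast receiver sketch -- group/port are illustrative */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
  unsigned short thePort = 4321;                 /* arbitrary example port */
  int sock = socket(AF_INET, SOCK_DGRAM, 0);

  struct sockaddr_in mc_addr;
  memset(&mc_addr, 0, sizeof(mc_addr));
  mc_addr.sin_family = AF_INET;
  mc_addr.sin_addr.s_addr = htonl(INADDR_ANY);   /* receive on any local interface */
  mc_addr.sin_port = htons(thePort);             /* the group port */
  bind(sock, (struct sockaddr *) &mc_addr, sizeof(mc_addr));

  struct ip_mreq mc_req;                         /* the join request */
  mc_req.imr_multiaddr.s_addr = inet_addr("224.1.2.3");
  mc_req.imr_interface.s_addr = htonl(INADDR_ANY);
  setsockopt(sock, IPPROTO_IP, IP_ADD_MEMBERSHIP,
             (void *) &mc_req, sizeof(mc_req));  /* kernel sends the IGMP join */

  char buf[1500];
  ssize_t n = recvfrom(sock, buf, sizeof(buf), 0, NULL, NULL);
  if (n > 0) printf("received %zd bytes\n", n);

  close(sock);
  return 0;
}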

====That’s for multicast receivers.  Multicast senders use a simpler procedure —

mc_addr.sin_addr.s_addr = inet_addr("224.1.2.3");
mc_addr.sin_port = htons(thePort);

sendto(sock, send_str, send_len, 0, (struct sockaddr *) &mc_addr, sizeof(mc_addr));
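For completeness, here is a hedged end-to-end sender sketch (same illustrative group and port as above; the TTL setting is my own addition, commonly needed so the datagram can cross routers):

/* minimal multicast sender sketch -- group/port/message are illustrative */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
  unsigned short thePort = 4321;                /* same arbitrary example port */
  int sock = socket(AF_INET, SOCK_DGRAM, 0);

  unsigned char ttl = 8;                        /* let the datagram cross a few routers */
  setsockopt(sock, IPPROTO_IP, IP_MULTICAST_TTL, &ttl, sizeof(ttl));

  struct sockaddr_in mc_addr;
  memset(&mc_addr, 0, sizeof(mc_addr));
  mc_addr.sin_family = AF_INET;
  mc_addr.sin_addr.s_addr = inet_addr("224.1.2.3");
  mc_addr.sin_port = htons(thePort);

  const char *send_str = "hello group";
  sendto(sock, send_str, strlen(send_str), 0,
         (struct sockaddr *) &mc_addr, sizeof(mc_addr));   /* no join needed just to send */

  close(sock);
  return 0;
}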


multicast: IV care only about bookish nlg !!practical skills

Hi friends,

I have been using multicast for a while recently, and I see it as yet another example of the same pattern: technical interviewers care about deep theoretical knowledge, not practical skills.

Many new developers don't know the multicast protocol uses a special range of IP addresses. This is practical knowledge required on my job, but not asked by interviewers.

Unlike TCP, there's no "server" or "client" in a multicast set-up. This is practical knowledge in my project but not asked by interviewers.

When I receive no data from a multicast channel, it's not obvious whether nobody is sending or I have no connectivity. (In contrast, with TCP you get a connection error if there's no connectivity. See tcp: detect wire unplugged.) This is practical knowledge, but never asked by interviewers.

I never receive a partial message over multicast, but I routinely receive partial messages over TCP when the message is a huge file. This is reality in my project, but never asked by any interviewer.

So what do interviewers focus on?

  • packet loss — UDP (including multicast) lacks delivery guarantees. This is a real issue for system design, but I seldom notice it.
  • higher efficiency than TCP — I don't notice it, though it's true.
  • socket buffer overflow — should never happen with TCP but can happen with UDP, including multicast. This knowledge is not needed in my project.
  • flow control — a TCP receiver can tell the sender to slow down. This knowledge is not needed in many projects.
  • non-blocking send/receive — not needed in any of my projects.

So what can we do? Study beyond what's needed in the project. (The practical skills used are only 10% of the interview requirements.) Otherwise, even after 2 years of using multicast in every project, I would still look like a novice to an interviewer.

Without the job interviews, it's hard to know which theoretical details are required. I feel a multicast project is a valuable starting point: I can truthfully mention multicast on my resume. Then I need to attend interviews and study the theoretical topics.

UDP/TCP socket read buffer size: can be 256MB

For my UDP socket, I use 64MB.
For my TCP socket, I use 64MB too!

These are large values and require kernel tuning. On my Linux server, /etc/sysctl.conf shows these permissible read buffer sizes:

net.core.rmem_max = 268435456                  # 256 MB
net.ipv4.tcp_rmem = 4096 10179648 268435456    # max 256 MB

Note the read buffer of any socket is always maintained by the kernel and can be shared across processes [1]. In my mind, the TCP/UDP code using these buffers is kernel code, like hotel service staff; application code is like hotel guests.

[1] Process A will use its file descriptor 3 for this socket, while Process B will use its file descriptor 5 for this socket.
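To see how much buffer a particular socket actually got, a sketch like the one below works; the 64 MB request mirrors my own setting, and on Linux the kernel silently caps the request at net.core.rmem_max and reports roughly double the granted value:

/* sketch: request a 64 MB socket read buffer and check what the kernel granted */
#include <stdio.h>
#include <sys/socket.h>

void set_big_rcvbuf(int sock) {
  int requested = 64 * 1024 * 1024;                        /* 64 MB, my own choice */
  setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &requested, sizeof(requested));

  int granted = 0;
  socklen_t len = sizeof(granted);
  getsockopt(sock, SOL_SOCKET, SO_RCVBUF, &granted, &len); /* Linux reports the doubled value */
  printf("requested %d, kernel granted %d\n", requested, granted);
}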

TCP/UDP: partial or multiple messages in one buffer

This is often mentioned in IV. At least you can demonstrate your knowledge.

What if the UDP datagram is too big for recv(), i.e. the specified buffer length is too small? P116 [[tcp/ip sockets in C]] seems to say the oversize message is silently truncated.

UDP recv() will only return a single “logical” message [1]. I believe TCP can put partial or multiple messages into one “buffer” for recv().

Q: if my buffer is big enough, will my UDP recv() ever truncate a msg?
%%A: never

Note IP would always deliver a whole msg or miss a whole msg, never a partial msg. See P 329 [[comp networking]]

[1] a logical msg is the payload from one send()
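A receiver can at least detect the silent truncation. Here is a sketch (my own, not from the book) using recvmsg(), which sets MSG_TRUNC in msg_flags when the datagram didn't fit; the buffer is deliberately tiny for illustration:

/* sketch: detect a truncated UDP datagram via the MSG_TRUNC flag of recvmsg() */
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <sys/socket.h>

void read_one_datagram(int sock) {
  char buf[64];                                 /* deliberately small, to force truncation */
  struct iovec iov = { buf, sizeof(buf) };
  struct msghdr msg;
  memset(&msg, 0, sizeof(msg));
  msg.msg_iov = &iov;
  msg.msg_iovlen = 1;

  ssize_t n = recvmsg(sock, &msg, 0);           /* returns at most sizeof(buf) bytes */
  if (n >= 0 && (msg.msg_flags & MSG_TRUNC))
    printf("datagram was truncated; only %zd bytes kept\n", n);
}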

##y MultiCast favored over TCP

  • Reason: data-rate constraints inherent in the TCP protocol (congestion control?).
  • Reason: TCP to a large group would be one-by-one unicast, highly inefficient and too much load on the sender.
  • Reason: TCP has more overhead in the form of non-payload data: the TCP header is typically 20 bytes vs 8 bytes for UDP, and the receiver needs to send acknowledgements.

multicast address ownership#eg exchanges

https://www.iana.org/assignments/multicast-addresses/multicast-addresses.xhtml shows a few hundred big companies including exchanges. For example, one exchange multicast address 224.0.59.76 falls within the range

224.0.58.0 to 224.0.61.255 Inter-continental Exchange, Inc.

It's educational to compare with a unicast IP address. If you own such a unicast address, you can put it on a host and bind an http server to it. No one else can bind a server to that unicast address. Any client connecting to that IP will hit your host.

As owner of a multicast address, you are presumably the only legitimate sender to it, and (presumably) you can restrict who may send or receive on this group address, though any enforcement is administrative rather than built into IP multicast itself. Alan Shi pointed out the model is pub-sub MOM.

UDP^TCP again#retrans

http://www.diffen.com/difference/TCP_vs_UDP is relevant.

FIFO — TCP; UDP — packet sequencing is uncontrolled
Virtual circuit — TCP; UDP — datagram network
Connectionless — UDP; TCP — connection-oriented

With http, ftp etc. you establish a connection (like a session). There is no such connection for UDP communication.

Retransmission — built into TCP; with UDP, the application layer (not the transport layer) on the receiving end must request retransmission.

To provide guaranteed FIFO data delivery over an unreliable channel, TCP must be able to detect loss and request retransmission. UDP doesn't bother. An application built on UDP needs to create that functionality itself, as in the IDC (Interactive Data Corp) ticker plant. Here's one simple scenario (easy to set up as a test; a receiver sketch follows the list):

  • sender keeps multicasting
  • shut down and restart the receiver
  • receiver detects the sequence-number gap, indicating message loss during the down time
  • receiver requests retransmission
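A bare-bones sketch of the receiver-side gap detection. The 32-bit sequence number in the first 4 bytes of each datagram and the request_retransmission() helper are hypothetical conventions of mine, not the IDC design:

/* sketch: detect sequence-number gaps in a multicast feed;
   assumes each datagram starts with a 32-bit sequence number in network byte order */
#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

static void request_retransmission(uint32_t from, uint32_t to) {
  /* hypothetical: send a NAK for [from, to) back to the publisher over a side channel */
  printf("lost seq %u..%u -- requesting retransmission\n", from, to - 1);
}

void receive_loop(int sock) {
  uint32_t expected = 0;                      /* next sequence number we expect */
  char buf[1500];
  for (;;) {
    ssize_t n = recv(sock, buf, sizeof(buf), 0);
    if (n < (ssize_t) sizeof(uint32_t)) continue;

    uint32_t seq;
    memcpy(&seq, buf, sizeof(seq));
    seq = ntohl(seq);

    if (expected != 0 && seq > expected)      /* gap ==> messages lost while we were down */
      request_retransmission(expected, seq);
    expected = seq + 1;                       /* duplicates/reordering ignored for brevity */
  }
}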

 

joining/leaving a multicast group

Every multicast address is a group address. In other words, a multicast address identifies a group.

Sending a multicast datagram is much simpler than receiving…

[1] http://www.tldp.org/HOWTO/Multicast-HOWTO-2.html is a concise 4-page introduction. Describes joining/leaving.

[2] http://ntrg.cs.tcd.ie/undergrad/4ba2/multicast/antony/ has sample code to send/receive. Note there’s no server/client actually.
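Leaving is the mirror image of joining: the same ip_mreq, but with IP_DROP_MEMBERSHIP. A minimal sketch (group address is the same illustrative one used earlier); closing the socket also drops the membership implicitly:

/* sketch: explicitly leave the multicast group joined earlier */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

void leave_group(int sock) {
  struct ip_mreq mc_req;
  mc_req.imr_multiaddr.s_addr = inet_addr("224.1.2.3");   /* illustrative group */
  mc_req.imr_interface.s_addr = htonl(INADDR_ANY);
  setsockopt(sock, IPPROTO_IP, IP_DROP_MEMBERSHIP,
             (void *) &mc_req, sizeof(mc_req));           /* kernel sends IGMP leave if needed */
}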

 

multicast address 1110xxxx #briefly

By definition, multicast addresses all start with 1110 in the first half-byte. Routers seeing such a destination (never a source) address know the msg is a multicast msg.

However, routers don't forward any msg with a destination address in 224.0.0.0 through 224.0.0.255, because these are local multicast addresses. I guess these local multicast addresses are like 192.168.* addresses.
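A small sketch of both checks: the standard IN_MULTICAST macro tests exactly that 1110 nibble (on a host-byte-order address), and the 224.0.0.x "local" range is a plain range comparison. The sample addresses are arbitrary:

/* sketch: classify an IPv4 address as multicast, and flag the link-local 224.0.0.0/24 range */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>

void classify(const char *dotted) {
  in_addr_t host_order = ntohl(inet_addr(dotted));
  int is_mc = IN_MULTICAST(host_order);                           /* first nibble == 1110 */
  int is_local_mc = (host_order & 0xffffff00u) == 0xe0000000u;    /* 224.0.0.0/24, not forwarded */
  printf("%s: multicast=%d local-only=%d\n", dotted, is_mc, is_local_mc);
}

int main(void) {
  classify("224.1.2.3");     /* multicast, routable */
  classify("224.0.0.5");     /* multicast, local only */
  classify("192.168.0.1");   /* not multicast */
  return 0;
}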

send()recv() ^ write()read() @sockets

Q: A socket is also a file descriptor, so why bother with send()recv() when you can use write()read()?

A: See https://stackoverflow.com/questions/9475442/unix-domain-socket-vs-named-pipes

send()recv() are recommended, more widely used and better documented.

[[linux kernel]] P623 actually uses read()/write() for UDP, instead of sendto()/recvfrom(), but only after a call to connect() to set the remote address.
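A sketch of that connected-UDP pattern (the peer address and port are illustrative): after connect() the socket has a fixed peer, so plain write()/read() work, and datagrams from other peers are filtered out by the kernel.

/* sketch: "connected" UDP socket, then plain write()/read() instead of sendto()/recvfrom() */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
  int sock = socket(AF_INET, SOCK_DGRAM, 0);

  struct sockaddr_in peer;
  memset(&peer, 0, sizeof(peer));
  peer.sin_family = AF_INET;
  peer.sin_addr.s_addr = inet_addr("10.0.0.5");            /* illustrative peer */
  peer.sin_port = htons(7000);                             /* illustrative port */
  connect(sock, (struct sockaddr *) &peer, sizeof(peer));  /* no handshake; just fixes the peer */

  write(sock, "ping", 4);                                  /* equivalent to sendto(peer) */
  char buf[512];
  read(sock, buf, sizeof(buf));                            /* only datagrams from that peer arrive */

  close(sock);
  return 0;
}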

broadcast^multicast

http://en.wikipedia.org/wiki/Multicast shows (suggests?) that broadcast is also time-efficient, since the sender only does one send. However, multicast is smarter and more bandwidth-efficient.

IPv6 disabled broadcast — to prevent disturbing all nodes in a network when only a few are interested in a particular service. Instead it relies on multicast addressing, a conceptually similar one-to-many routing methodology. However, multicasting limits the pool of receivers to those that join a specific multicast receiver group.

multicast – highly efficient? my take

(Note virtually all MC apps use UDP.)
To understand MC efficiency, we must compare it with UC (unicast) and BC (broadcast). First we need some "codified" metrics —
  • TT = imposing extra Traffic on the network, which happens when the same packet is sent multiple times through the same network.
  • RR = imposing extra processing workload on a Receiver host, because the packet is addressed TO "me" (pretending I'm one of the hosts). If "my" address were not mentioned in the packet, I would have ignored it without processing.
  • SS = imposing extra processing workload on the Sender — a relatively low priority.
Now we can contrast MC, UC and BC. Suppose there are 3 receiver hosts to be notified and 97 other hosts to leave alone, and suppose you send the message via —
  1. UC – TT but not RR — the sender dispatches 3 copies, each addressed to a single host.
  2. BC – RR but not TT — every host on the network sees a packet addressed to it; most would process then ignore it, wasting the receiver's time. When the CEO sends an announcement email, everyone is in the recipient list.
  3. MC – neither RR nor TT. However, MC can still flood the network.

multicast – video streaming^live price feed

These are the 2 main usages of IP multicast. In both, lost packets are considered lost forever; a resend would be "too late".

I think some of the world’s most cutting-edge network services — live price feed, live event broadcast, VOD — rely on IP multicast.

Multicast is more intelligent data dissemination than broadcast, and faster than unicast. Intelligence is built into routers.

I believe JMS publish is unicast-based, not broadcast-based. The receivers don't comprise an IP broadcast group, so the JMS broker must deliver to one receiver at a time.

how does reliable multicast work #briefly

I guess a digest of the msg + a sequence number is sent out along with the msg itself.

See wiki.

One of the common designs is PGM —

While TCP uses ACKs to acknowledge groups of packets sent (something that would be uneconomical over multicast), PGM uses the concept of Negative Acknowledgements (NAKs). A NAK is sent unicast back to the host via a defined network-layer hop-by-hop procedure whenever there is a detection of data loss of a specific sequence. As PGM is heavily reliant on NAKs for integrity, when a NAK is sent, a NAK Confirmation (NCF) is sent via multicast for every hop back. Repair Data (RDATA) is then sent back either from the source or from a Designated Local Repairer (DLR).

PGM is an IETF experimental protocol. It is not yet a standard, but has been implemented in some networking devices and operating systems, including Windows XP and later versions of Microsoft Windows, as well as in third-party libraries for Linux, Windows and Solaris.

reliable multicast – basics

First, use a distinct sequence number for each packet. When one of the receivers notices a missed packet, it asks the sender to resend … to all receivers.

As an optimization, use bigger chunks: use a window of packets. If the transmission is reliable, expand the window size, so each sequence number covers a (much) larger chunk of packets.

These are the basic reliability techniques of TCP. Reliable multicast could borrow these from TCP.

Note real TCP isn't usable for multicast, as each TCP connection has exactly one sender and one receiver. I think the entire TCP protocol is based on that premise — a unicast circuit.
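A toy sketch of the sender side of such a scheme: keep the last N payloads keyed by sequence number, and re-multicast one on request. The NAK arrival mechanism is left out and entirely hypothetical:

/* sketch: sender keeps a ring of recent packets so it can re-multicast on a retransmission request */
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

#define RING 1024                 /* keep the last 1024 payloads (arbitrary) */
#define MAX_PAYLOAD 1400

static struct { uint32_t seq; size_t len; char data[MAX_PAYLOAD]; } ring[RING];

/* remember a packet just after multicasting it */
void remember(uint32_t seq, const char *data, size_t len) {
  ring[seq % RING].seq = seq;
  ring[seq % RING].len = len;
  memcpy(ring[seq % RING].data, data, len);
}

/* re-multicast a requested sequence number, if it is still in the ring */
void handle_nak(int sock, uint32_t seq,
                const struct sockaddr *group, socklen_t group_len) {
  if (ring[seq % RING].seq == seq)
    sendto(sock, ring[seq % RING].data, ring[seq % RING].len, 0, group, group_len);
  /* otherwise the packet has aged out -- a real design needs a policy for that */
}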

IV – UDP/java

My own Q: How do you make UDP reliable?
A: sequence number + gap management + retransmission

My own Q: can q(snoop) capture UDP traffic?

Q: Why would multicast use a different address space? Theoretical question?
A: each MC address is a group…

Q: why would a server refuse connection? (Theoretical question?)
%%A: perhaps the TCP backlog queue is full, so the application layer won't see anything

——————

Q: How do you avoid full GC
Q: what’s the impact of 64 bit JVM?
Q: how many package break releases did your team have in a year?
Q: In a live production system, how do you make configuration changes with minimal impact to existing modules?

y UDP uses recvfrom() !! read()

http://www.cs.cmu.edu/afs/cs/academic/class/15441-f01/www/lectures/lecture03.ppt shows that a UDP server must use recvfrom() and not read(), because only recvfrom() returns (via a reference param) the client's address.

In contrast, TCP establishes a connection/session/virtual-circuit, so the thread calling read() already knows the other side's address; recvfrom(oppositeAddr, …) and sendto(oppositeAddr, …) aren't required. The logic is all in the names!
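A minimal echo-server sketch of that point (the port and buffer size are illustrative): recvfrom() hands back the client's address, which is exactly what sendto() needs for the reply; read() could not tell us who sent the datagram.

/* sketch: UDP echo server -- recvfrom() captures the client's address for the sendto() reply */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

void echo_loop(int port) {
  int sock = socket(AF_INET, SOCK_DGRAM, 0);

  struct sockaddr_in local;
  memset(&local, 0, sizeof(local));
  local.sin_family = AF_INET;
  local.sin_addr.s_addr = htonl(INADDR_ANY);
  local.sin_port = htons(port);
  bind(sock, (struct sockaddr *) &local, sizeof(local));

  char buf[1500];
  for (;;) {
    struct sockaddr_in client;                      /* filled in by recvfrom() */
    socklen_t clen = sizeof(client);
    ssize_t n = recvfrom(sock, buf, sizeof(buf), 0,
                         (struct sockaddr *) &client, &clen);
    if (n > 0)
      sendto(sock, buf, n, 0, (struct sockaddr *) &client, clen);
  }
}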