UDP: send/receive buffers are configurable #CSY

It’s wrong to say UDP uses a small receive buffer but no send buffer.

Receive: https://access.redhat.com/documentation/en-US/JBoss_Enterprise_Web_Platform/5/html/Administration_And_Configuration_Guide/jgroups-perf-udpbuffer.html shows how to increase the UDP receive buffer to 25MB.

Send: https://stackoverflow.com/questions/2031109/understanding-set-getsockopt-so-sndbuf shows you CAN configure the send buffer on a UDP socket.

Send: https://www.ibm.com/support/knowledgecenter/en/SSB23S_1.1.0.15/gtpc2/cpp_sendto.html also confirms that the SO_SNDBUF send-buffer size applies to UDP sockets.

In addition, the application is free to use its own custom buffer, for example in the form of a vector.
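To make the point concrete, here is a minimal sketch of setting either buffer on a UDP socket and reading back what the kernel granted. The helper name set_udp_buffer is my own invention, not a library call; only setsockopt(), getsockopt(), SO_RCVBUF and SO_SNDBUF are real.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Ask the kernel for a socket buffer of `requested_bytes` in one direction
 * (`which` is SO_RCVBUF or SO_SNDBUF), then read back what was granted.
 * Returns the granted size in bytes, or -1 on error.
 * set_udp_buffer is a hypothetical helper name, not a library call. */
int set_udp_buffer(int sock, int which, int requested_bytes)
{
    assert(which == SO_RCVBUF || which == SO_SNDBUF);

    if (setsockopt(sock, SOL_SOCKET, which,
                   &requested_bytes, sizeof(requested_bytes)) < 0)
        return -1;

    int granted = 0;
    socklen_t len = sizeof(granted);
    if (getsockopt(sock, SOL_SOCKET, which, &granted, &len) < 0)
        return -1;
    return granted;
}
```

For example, set_udp_buffer(sock, SO_RCVBUF, 25 * 1024 * 1024) requests the 25MB mentioned above. On Linux the kernel caps the request at net.core.rmem_max (wmem_max for the send side) and reports back double the value it stored, so check the returned number rather than assuming the request was honoured.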

specify(by ip:port) multicast group to join

http://www.nmsl.cs.ucsb.edu/MulticastSocketsBook/ has zipped sample code showing

mc_addr.sin_port = htons(thePort); // port must be in network byte order

bind(sock, (struct sockaddr *) &mc_addr, sizeof(mc_addr)); // set the group port, not local port!
----
mc_req.imr_multiaddr.s_addr = inet_addr("224.1.2.3");

setsockopt(sock, IPPROTO_IP, IP_ADD_MEMBERSHIP,
(void*) &mc_req, sizeof(mc_req)); // set the group IP by sending an IGMP join-request

Note setsockopt() actually sends a request!

That’s for multicast receivers. Multicast senders use a simpler procedure:

mc_addr.sin_addr.s_addr = inet_addr("224.1.2.3");
mc_addr.sin_port = htons(thePort);

sendto(sock, send_str, send_len, 0, (struct sockaddr *) &mc_addr, …

CHANNEL for multicast; TCP has Connection

In NYSE market data lingo, we say “multicast channel”.

  • analogy: TV channel — you can subscribe but can’t connect to it.
  • analogy: Twitter hashtag — you can follow it, but can’t connect to it.

“Multicast connectivity” is barely tolerable, but “multicast connection” is not. A multicast end system joins or subscribes to a group. You can’t really “connect” to a group, as there could be zero or a million different peer systems without a “ring leader” or a representative.

Even for unicast UDP, “connect” is the wrong word as UDP is connectionless.

Saying nonsense like “multicast connection” is an immediate giveaway that the speaker isn’t familiar with UDP or multicast.

UDP recv() from 1 send() at most

P116 [[tcp/ip sockets in C]] made it very clear.

A call to recv on the receiver machine will return data from at most one send() on the sender machine.

It can be a partial message, but would be the first part. See https://stackoverflow.com/questions/13317532/receiving-a-part-of-packet-via-recvfrom-udp

I believe the entire payload of one send()/sendto() is packaged into one envelope (a datagram). The kernel never delivers two envelopes to one recv()/recvfrom() call, so the receiver can only receive one envelope at a time. If the envelope is larger than the receive buffer, only the first part of the payload is delivered and the rest is discarded.
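A small loopback experiment can check both claims. The sketch below is my own test harness (udp_boundary_demo is a made-up name): it sends a 10-byte and a 5-byte datagram to 127.0.0.1, then calls recvfrom() twice with a buffer capacity you choose. It assumes loopback delivery is in-order and lossless, which is usual but not guaranteed by UDP.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Send two datagrams (10 bytes, then 5 bytes) over loopback and read them
 * back with a receive buffer of recv_cap bytes. got[0] and got[1] receive
 * the byte counts of the two recvfrom() calls.
 * Returns 0 on success, -1 on any socket error or timeout. */
int udp_boundary_demo(size_t recv_cap, ssize_t got[2])
{
    char buf[256];
    assert(recv_cap > 0 && recv_cap <= sizeof(buf));

    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);
    if (rx < 0 || tx < 0) {
        if (rx >= 0) close(rx);
        if (tx >= 0) close(tx);
        return -1;
    }

    /* Bind the receiver to 127.0.0.1 and let the kernel pick a free port. */
    struct sockaddr_in addr;
    socklen_t addr_len = sizeof(addr);
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(rx, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        getsockname(rx, (struct sockaddr *)&addr, &addr_len) < 0) {
        close(rx);
        close(tx);
        return -1;
    }

    /* Time out after 2 seconds instead of blocking forever on a lost datagram. */
    struct timeval tv = { 2, 0 };
    setsockopt(rx, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

    /* Two separate sends, i.e. two separate envelopes. */
    sendto(tx, "0123456789", 10, 0, (struct sockaddr *)&addr, sizeof(addr));
    sendto(tx, "abcde", 5, 0, (struct sockaddr *)&addr, sizeof(addr));

    /* Each call can return data from at most one of the sends. */
    got[0] = recvfrom(rx, buf, recv_cap, 0, NULL, NULL);
    got[1] = recvfrom(rx, buf, recv_cap, 0, NULL, NULL);

    close(rx);
    close(tx);
    return (got[0] < 0 || got[1] < 0) ? -1 : 0;
}
```

With a roomy buffer (say 256 bytes) the two calls should report 10 and 5 bytes, never 15 in one call. With a 4-byte buffer each call should report 4: the first part of each envelope, with the remainder thrown away rather than carried over to the next call.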

multicast: IV care only about bookish nlg !!practical skills

Hi friends,

I recently used multicast for a while and I see it as yet another example of the same pattern — technical interviewers care about deep theoretical knowledge not practical skills.

Many new developers don’t know multicast protocol uses special IP addresses. This is practical knowledge required on my job, but not asked by interviewers.

Unlike TCP, there’s not a “server” or a “client” in a multicast set-up. This is practical knowledge in my project but not asked by interviewers.

When I receive no data from a multicast channel, it’s not obvious whether nobody is sending or I have no connectivity. (In contrast, with TCP, you get connection error if there’s no connectivity. See tcp: detect wire unplugged.) This is practical knowledge, but never asked by interviewers.

I never receive a partial message by multicast, but I always receive partial message by TCP when the message is a huge file. This is reality in my project, but never asked by any interviewer.

So what do interviewers focus on?

  • packet loss: UDP (including multicast) lacks a delivery guarantee. This is a real issue for system design, but I seldom notice it.
  • higher efficiency than TCP: I don’t notice it, though it’s true.
  • socket buffer overflow: should never happen in TCP but could happen in UDP, including multicast. This knowledge is not needed in my project.
  • flow control: a TCP receiver can notify the sender to reduce sending speed. This knowledge is not needed in many projects.
  • non-blocking send/receive: not needed in any project.

So what can we do? Study beyond what’s needed in the project. (The practical skills used are only 10% of the interview requirements.) Otherwise, even after 2 years using multicast in every project, I would still look like a novice to an interviewer.

Without the job interviews, it’s hard to know what theoretical details are required. I feel a multicast project is a valuable starting point to get me started. I can truthfully mention multicast in my resume. Then I need to attend interviews and study the theoretical topics.

TCP/UDP: partial or multiple messages in one buffer

This is often mentioned in IV. At least you can demonstrate your knowledge.

What if the UDP datagram is too big for recv(), i.e. the specified buffer length is too small? P116 [[tcp/ip sockets in C]] seems to say the oversize message is silently truncated.

UDP recv() will only return a single “logical” message [1]. I believe TCP can put partial or multiple messages into one “buffer” for recv().

Q: if my buffer is big enough, will my UDP recv() ever truncate a msg?
%%A: never

Note IP would always deliver a whole msg or miss a whole msg, never a partial msg. See P 329 [[comp networking]]

[1] a logical msg is the payload from one send()
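Because one TCP recv() can hand back partial or multiple messages, the application needs its own framing to find the message boundaries. Here is a minimal sketch assuming a hypothetical wire format of my own choosing (2-byte big-endian length followed by the payload); the function name is also made up. It counts the complete messages in a buffer and reports how many bytes they occupy, so the leftover partial message can wait for the next recv().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical framing: each message is a 2-byte big-endian payload length
 * followed by that many payload bytes.
 * Scans buf[0..len) and returns the number of complete messages found.
 * *consumed is set to the number of bytes those messages occupy; any bytes
 * after that belong to a partial message that needs more data. */
size_t count_complete_messages(const unsigned char *buf, size_t len,
                               size_t *consumed)
{
    assert(buf != NULL && consumed != NULL);

    size_t pos = 0;
    size_t count = 0;
    while (len - pos >= 2) {
        size_t payload = ((size_t)buf[pos] << 8) | buf[pos + 1];
        if (len - pos - 2 < payload)
            break;              /* header present, payload incomplete */
        pos += 2 + payload;     /* skip over one whole message */
        count++;
    }
    *consumed = pos;
    return count;
}
```

A UDP receiver doesn’t need any of this, since each recv() already corresponds to exactly one logical message.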

##y MultiCast favored over TCP

Reason: data rate constraints inherent in the TCP protocol. Congestion control?
Reason: TCP to a large group would be one-by-one unicast, highly inefficient and too much load on the sender.
Reason: TCP has more data overhead in the form of non-payload data.
* TCP header is typically 20 bytes vs 8 bytes for UDP
* Receiver needs to acknowledge

multicast address ownership#eg exchanges

https://www.iana.org/assignments/multicast-addresses/multicast-addresses.xhtml shows a few hundred big companies including exchanges. For example, one exchange multicast address 224.0.59.76 falls within the range

224.0.58.0 to 224.0.61.255 Inter-continental Exchange, Inc.

It’s educational to compare with a unicast IP address. If you own such a unicast address, you can put it on a host and bind an http server to it. No one else can bind a server to that unicast address. Any client connecting to that IP will hit your host.

As owner of a multicast address, you alone can send datagrams to it and (presumably) you can restrict who can send or receive on this group address. Alan Shi pointed out the model is pub-sub MOM.

Twitter hashtag is another analogy.

UDP^TCP #TV-channel

http://www.diffen.com/difference/TCP_vs_UDP is relevant.

  • FIFO — TCP; UDP — packet sequencing is uncontrolled
  • Virtual circuit — TCP; UDP — datagram network
  • Connectionless — UDP ; TCP — Connection-oriented
  • Channel vs Connection — In RTS and xtap, we use the analogy of “TV channel” for multicast. TCP uses “Connection”.

With http, ftp etc, you establish a Connection (like a session). No such connection for UDP communication.

Retransmission: built into TCP. With UDP, the application layer (not the network layer) on the receiving end must request retransmission.

To provide guaranteed FIFO data delivery over an unreliable channel, TCP must be able to detect loss and request retransmission. UDP doesn’t bother. An application built on UDP needs to create that functionality, as in the IDC (Interactive Data Corp) ticker plant. Here’s one simple scenario (easy to set up as a test):

  • sender keeps multicasting
  • shut down and restart the receiver.
  • receiver detects the sequence number gap, indicating message loss during the down time.
  • receiver requests retransmission.

 

joining/leaving a multicast group

Every multicast address is a group address. In other words, a multicast address identifies a group.

Sending a multicast datagram is much simpler than receiving…

[1] http://www.tldp.org/HOWTO/Multicast-HOWTO-2.html is a concise 4-page introduction. Describes joining/leaving.

[2] http://ntrg.cs.tcd.ie/undergrad/4ba2/multicast/antony/ has sample code to send/receive. Note there’s no server/client actually.

 

multicast address 1110xxxx #briefly

By definition, multicast addresses all start with 1110 in the first half byte. Routers seeing such a destination (never a source) address know the msg is a multicast msg.

However, routers don’t forward any msg with a destination address from 224.0.0.0 through 224.0.0.255 because these are local multicast addresses. I guess these local multicast addresses are like 192.168.* addresses.

broadcast^multicast

http://en.wikipedia.org/wiki/Multicast shows(suggests?) that broadcast is also time-efficient since sender only does one send. However, multicast is smarter and more bandwidth-efficient.

IPv6 disabled broadcast — to prevent disturbing all nodes in a network when only a few are interested in a particular service. Instead it relies on multicast addressing, a conceptually similar one-to-many routing methodology. However, multicasting limits the pool of receivers to those that join a specific multicast receiver group.

multicast – highly efficient@@ # my take

(Note virtually all MC apps use UDP.)
To understand MC efficiency, we must compare with UC (unicast) and BC (broadcast). First we need some “codified” metrics:
  • TT = imposing extra Traffic on network, which happens when the same packet is sent multiple times through the same network.
  • RR = imposing extra processing workload on the Receiver host, because the packet is addressed TO “me” (pretending to be a receiver). If “my” address were not mentioned in the packet, then I would have ignored it without processing.
  • SS = imposing extra processing workload by the Sender — a relatively low priority.
Now we can contrast MC, UC and BC. Suppose there are 3 receiver hosts to be notified, and 97 other hosts to leave alone, and suppose you send the message via —
  1. UC – TT not RR — sender dispatches 3 copies each addressed to a single host.
  2. BC – RR not TT — every host on the network sees a packet addressed to it though most would process then ignore it, wasting receiver’s time. When CEO sends an announcement email, everyone is in the recipient list.
  3. MC – not RR not TT. However, MC can still flood the network.

multicast: video streaming^mkt data

These are the 2 main usages of IP multicast. In both, lost packets are considered lost forever. Resend is sometimes considered “too late”.

I think some of the world’s most cutting-edge network services — live price feed, live event broadcast, VOD — rely on IP multicast.

Multicast is more intelligent data dissemination than broadcast, and faster than unicast. Intelligence is built into routers.

I believe JMS publish is unicast based, not broadcast based. The receivers don’t comprise an IP broadcast group. Therefore JMS broker must deliver to one receiver at a time.

how does reliable multicast work #briefly

I guess a digest of the msg + a sequence number is sent out along with the msg itself.

See wiki.

One of the common designs is PGM —

While TCP uses ACKs to acknowledge groups of packets sent (something that would be uneconomical over multicast), PGM uses the concept of Negative Acknowledgements (NAKs). A NAK is sent unicast back to the host via a defined network-layer hop-by-hop procedure whenever there is a detection of data loss of a specific sequence. As PGM is heavily reliant on NAKs for integrity, when a NAK is sent, a NAK Confirmation (NCF) is sent via multicast for every hop back. Repair Data (RDATA) is then sent back either from the source or from a Designated Local Repairer (DLR).

PGM is an IETF experimental protocol. It is not yet a standard, but has been implemented in some networking devices and operating systems, including Windows XP and later versions of Microsoft Windows, as well as in third-party libraries for Linux, Windows and Solaris.

reliable multicast – basics

First, use a distinct sequence number for each packet. When one of the receivers notices a missed packet, it asks the sender to resend ….to all receivers.

As an optimization, use bigger chunks. Use a window of packets. If the transmission is reliable, then expand the window size, so each sequence number covers a (much) larger chunk of packets.

These are the basic reliability techniques of TCP. Reliable multicast could borrow these from TCP.

Note real TCP isn’t usable for multicast as each TCP transmission has exactly one sender and one receiver. I think entire TCP protocol is based on that premise — unicast circuit.

IV – UDP/java

My own Q: How do you make UDP reliable?
A: sequence number + gap management + retransmission

My own Q: can q(snoop) capture UDP traffic?

Q: Why would multicast use a different address space? Theoretical question?
A: each MC address is a group…

Q: why would a server refuse connection? (Theoretical question?)
%%A: perhaps tcp queue is full, so application layer won’t see anything

——————

Q: How do you avoid full GC
Q: what’s the impact of 64 bit JVM?
Q: how many package break releases did your team have in a year?
Q: In a live production system, how do you make configuration changes with minimal impact to existing modules?