convert any-size int to my host endianness

https://linux.die.net/man/3/be64toh is what I was looking for.

There are many unofficial solutions on StackOverflow etc. They unconditionally swap the bytes, but what if the host endianness is the same as the input's? I don't like these flaky solutions.

The standard ntoh functions behave like betoh, because "n" (network byte order) means "be", i.e. big-endian.
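
A minimal sketch of parsing a big-endian 64-bit field off the wire (assuming glibc's <endian.h>; BSD systems put these functions in <sys/endian.h>):

#include <endian.h>   /* glibc; may require _DEFAULT_SOURCE. BSD uses <sys/endian.h> */
#include <stdint.h>
#include <string.h>

uint64_t parse_be64(const char *wire) {
  uint64_t raw;
  memcpy(&raw, wire, sizeof raw); /* copy to avoid unaligned access */
  return be64toh(raw);            /* byte-swap on little-endian hosts, no-op on big-endian */
}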

[19]with3sockets: select^non-block recv#CSY

Update — CSY said when you increase from 3 to 99, the difference becomes obvious.

Q (TradeWeb mkt data team): given 3 non-blocking receive-sockets, you can either read them sequentially in a loop or use select(). What’s the difference?

http://compgeom.com/~piyush/teach/4531_06/project/hell.html compares many alternatives. It says a non-blocking socket read() loop is the least efficient as it wastes CPU.

If you have 3 receive-sockets to monitor, select()/poll() is designed just for your situation.

Using read()/recv() instead, you may try specifying a timeout on each socket, like

//set socket option via SO_RCVTIMEO to 5 seconds
while(1){
  recv(sock1, buf, sizeof buf, 0); // may block up to 5 seconds
  recv(sock2, buf, sizeof buf, 0); // starved while sock1 is waiting
}
Low latency systems can't use this design because while you wait up to 5 seconds on socket1, new data on socket2 sits unprocessed 😦

Therefore, low-latency systems MUST disable SO_RCVTIMEO. In such a design, the loop wastes cpu in a spin-wait.

–another advantage of select(): when select() reports multiple sockets ready, you can process a high-priority socket first.
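
A minimal select() sketch for the 3-socket scenario (hypothetical descriptors sock1/sock2/sock3; error handling omitted):

#include <sys/select.h>

fd_set readfds;
int maxfd = sock1 > sock2 ? sock1 : sock2;
if (sock3 > maxfd) maxfd = sock3;
while (1) {
  FD_ZERO(&readfds);
  FD_SET(sock1, &readfds);
  FD_SET(sock2, &readfds);
  FD_SET(sock3, &readfds);
  select(maxfd + 1, &readfds, NULL, NULL, NULL);   /* blocks until at least one socket is readable */
  if (FD_ISSET(sock1, &readfds)) { /* recv(sock1, ...) — high-priority socket checked first */ }
  if (FD_ISSET(sock2, &readfds)) { /* recv(sock2, ...) */ }
  if (FD_ISSET(sock3, &readfds)) { /* recv(sock3, ...) */ }
}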

I gave this answer on the spot.

recv()/send() with timeout #CSY

I was right — timeouts are supported not only in select()/poll(), but also in read()/recv()/send().

SO_RCVTIMEO and SO_SNDTIMEO

Timeouts only have effect for system calls that perform socket I/O (e.g., read(2), recvmsg(2), send(2), sendmsg(2)); timeouts have no effect for select(2), poll(2), epoll_wait(2), and so on. These latter functions only check socket status… no I/O.

I believe this socket option has no effect on non-blocking sockets. A non-blocking socket can be created with either

  • O_NONBLOCK in fcntl() or
  • SOCK_NONBLOCK in socket()
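
A minimal sketch of the timeout option on a blocking socket (hypothetical descriptor sockfd; 5-second timeout):

#include <sys/socket.h>
#include <sys/time.h>

struct timeval tv = { .tv_sec = 5, .tv_usec = 0 };
setsockopt(sockfd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);
/* a later recv() returns -1 with errno EAGAIN/EWOULDBLOCK if nothing arrives within 5 seconds */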

xtap check`if UDP channel=healthy #CSY

The xtap library needs a reliable way to check whether "connectivity" is down.

UDP (including multicast) is a connectionless protocol; TCP is connection-oriented. Therefore the xtap “connection” class cannot be used for multicast channels. Multicast Channel is like a TV-channel (NYSE terminology).

–UDP is connectionless — no session, no virtual circuit, so no "established" state. So how do we know whether the exchange server is dead or just quiet?

After discussing with CSY, I feel neither UDP unicast nor UDP multicast can tell me.

heartbeat — I think we must rely on the heartbeat. I actually helped create an inactivity timeout alert precisely because in a multicast channel, we don’t know if exchange is down or just quiet.

–TCP should be easier.

There are many online resources, such as https://stackoverflow.com/questions/4142012/how-to-find-the-socket-connection-state-in-c.

According to CSY, as a receiver, we don't need to send a probe message. If the exchange has closed the TCP socket, the four-way close (FIN/ACK exchange) would have informed my socket that the connection is closed. So my select() would report my TCP socket as ready for reading, and when I read it I would get 0 bytes.

I believe there’s a one-to-one mapping between a socket and a “connection” for TCP only

 

de-multiplex packets bearing Same dest ip:port Different source

see de-multiplex by-destPort: UDP ok but insufficient for TCP

For UDP, the 2 packets are always delivered to the same destination socket. Source IP:port are ignored.

For TCP, if there are two matching worker sockets (perhaps two ssh sessions), then each packet is delivered to its own worker socket.

If there's only a listening socket, then both packets are delivered to that same socket, which has wildcards for the remote ip:port.

UDP socket is identified by two-tuple; TCP socket is by four-tuple

Based on [[computer networking]] P192. See also de-multiplex by-destPort: UDP ok but insufficient for TCP.

  • Note the term in subject is “socket” not “connection”. UDP is connection-less.

Each arriving TCP segment carries four relevant header fields — source IP:port and destination IP:port (the ports live in the TCP header, the IPs in the enclosing IP header).

A TCP socket has an internal data structure holding a four-tuple — remote IP:port and local IP:port.

A regular TCP “Worker socket” has all four items populated, to represent a real “session/connection”, but a Listening socket could have wild cards in all but the local-port field.

retrans: FIX^TCP^xtap

The FIX part is very relevant to real-world OMS. The devil is in the details.

IP layer offers no retrans. UDP doesn’t support retrans.

 

 

|                 | TCP                                       | FIX                                                                                                 | xtap                      |
| seq# continuous | no                                        | yes (see seq]FIX)                                                                                   | yes                       |
| ..reset         | automatic                                 | loopback managed by application                                                                     | seldom #exchange decision |
| ..dup           | possible                                  | possible                                                                                            | normal under bestOfBoth   |
| ..per           | session/connection                        | clientId                                                                                            | day                       |
| ..resumption?   | possible if wire gets reconnected quickly | yes upon re-login                                                                                   | unconditional, no choice  |
| Ack             | positive Ack needed                       | only needed for order submission etc                                                                | not needed                |
| gap detection   | sophisticated                             | every gap should be handled immediately since sequence is critical; out-of-sequence is unacceptable | gap mgr with timer        |
| retrans         | sophisticated                             | receiver (ECN) will issue resend request; original sender to react intelligently                    | gap mgr with timer        |
Note the original sender should be careful when resending new orders.
See https://drivewealth-fix-api.readme.io/docs/resend-request-message

 

de-multiplex by-destPort: UDP ok but insufficient for TCP

When people ask me the purpose of the port number in networking, I used to say it helps demultiplex. Now I know that's the full story only for UDP — TCP uses more than the destination port number.

Background — Two processes X and Y on a single-IP machine need to maintain two private, independent ssh sessions. The incoming packets need to be directed to the correct process, based on the port numbers of X and Y… or is that so?

If X is sshd with a listening socket on port 22, and Y is a forked child process from accept(), then Y’s “worker socket” also has local port 22. That’s why in our linux server, I see many ssh sockets where the local ip:port pairs are indistinguishable.

TCP demultiplex uses not only the local ip:port, but also remote (i.e. source) ip:port. Demultiplex also considers wild cards.

|                                                          | TCP                                    | UDP                          |
| socket has local IP:port                                 | yes                                    | yes                          |
| socket has remote IP:port                                | yes                                    | no such thing                |
| 2 sockets with same local port 22                        | can live in two processes              | not allowed                  |
|                                                          | can live in one process                | not allowed                  |
| 2 msg with same dest ip:port but different source ports  | addressed to 2 sockets; 2 ssh sessions | addressed to the same socket |

Q: which thread/PID drains NicBuffer→socketBuffer

Too many kernel concepts here. I will use a phrasebook format. I have also separated some independent tips into hardware interrupt handler #phrasebook

  1. Scenario 1: A single CPU. I start my parser, which creates the multicast receiver socket, but no data is coming. My process (PID 111) gets preempted on a timer interrupt. The CPU is running unrelated PID 222 when my data washes up on the NIC.
  2. Scenario 2: pid111 is running handleInput() while additional data comes in on the NIC.

Some key points about all scenarios:

  • context switching — There’s context switch to interrupt handler (i-handler). In all scenarios, the running process gets suspended to make way for the interrupt handler function. I-handler’s instruction address gets loaded into the cpu registers and this function starts “driving” the cpu. Traditionally, the handler function would use the suspended process’s existing stack.
    • After the i-handler completes, the suspended “current” process resumes by default. However, the handler may cause another pid333 to be scheduled right away [1 Chapter 4.1].
  • no pid — interrupt handler execution has no pid, though some authors say it runs on behalf of the suspended pid. I feel the suspended pid may be unrelated to the socket (as in Scenario 1), rather than the socket's owner process pid111.
  • kernel scheduler — In Scenario 1, pid111 would not get to process the data until it gets in the “driver’s seat” again. However, the interrupt handler could trigger a rescheduling and push pid111 “to the top of the queue” so to speak. [1 Chapter 4.1]
  • top-half — drains the tiny NIC ring-buffer into main memory (presumably socket buffer) as fast as possible [2] as NIC buffer can only hold a few packets — [[linux kernel]] P 629.
  • bottom-half — (i.e. deferrable functions) includes lengthy tasks like copying packets. Deferrable functions run in interrupt context [1 Chapter 4.8], under nobody's pid
  • sleeping — the socket owner pid 111 would be technically "sleeping" in the socket's wait queue initially. After the data is copied into the socket receive buffer (kernel memory), I think the kernel scheduler would locate pid111 in the socket's wait queue and make pid111 the cpu-driver. This pid111 would call read() on the socket.
    • wait queue — How the scheduler does it is non-trivial. See [1 Chapter 3.2.4.1]
  • burst — What if there’s a burst of multicast packets? The i-handler would hog or steal the driver’s seat and /drain/ the NIC ring-buffer as fast as possible, and populate the socket receive buffer. When the i-handler takes a break, our handleInput() would chip away at the socket buffer.
    • priority — is given to the NIC’s interrupt handler as NIC buffer is much smaller than socket buffer.
    • UDP could overrun the socket receive buffer; TCP uses transmission control to prevent it.

Q: What if the process scheduler is triggered to run (on timer interrupt) while i-handler is busy draining the NIC?
A: Well, all interrupt handlers can be interrupted, but I would doubt the process scheduler would suspend the NIC interrupt handler.

One friend said that while the i-handler runs on our single CPU, the executing pid is 1, the kernel process. I doubt it.

[1] [[UnderstandingLinuxKernel, 3rd Edition]]

[2] https://notes.shichao.io/lkd/ch7/#top-halves-versus-bottom-halves

IP4 fragmentation+reassembly #MTU,offset

I consider this a “halo” knowledge pearl because it is part of an essential everyday service. We can easily find an opportunity to inject it into an IV.

A Trex interviewer said something questionable. I said fragmentation is done at IP layer and he said yes but not reassembly. He was wrong. See P329 [[Computer Networking]]

I was talking about the IP layer breaking up, say, a 4KB packet (TCP or UDP packet) into three IP-fragments no bigger than 1500B [1]. The reassembly task is to put all 3 fragments back together in sequence (and detect missing fragments) and hand the result over to TCP or UDP.

This reassembly is done in IP layer. IP4 uses an “offset” number in each fragment to identify the sequencing and to detect missing fragments. The fragment with the highest offset also has a flag indicating it’s the last fragment of a given /logical/ packet.
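
A worked example, assuming an MTU of 1500B (20-byte IP header, so 1480B of payload per fragment) and a 4000-byte UDP datagram (UDP header included):

fragment 1: payload bytes 0–1479, offset field = 0, more-fragments flag = 1
fragment 2: payload bytes 1480–2959, offset field = 185 (i.e. 1480/8), more-fragments flag = 1
fragment 3: payload bytes 2960–3999, offset field = 370 (i.e. 2960/8), more-fragments flag = 0 (last)

The offset field counts 8-byte units, which is why every non-final fragment carries a payload that is a multiple of 8 bytes.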

Therefore, IP4 detects and will never deliver partial packets to UDP/TCP (P328 [[computer networking]]), even though IP is considered an unreliable service. IP4 can detect missing/incomplete IP-datagram and will refuse to “release” it to upper layer (i.e. UDP/TCP).

  • If TCP doesn’t get one “segment” (a.k.a. IP-datagram), it will request retransmission
  • UDP does no retransmission
  • IP4 does no retransmission

[1] MTU for some hardware is lower than 1500 Bytes …

accept()+select() : multiple persistent worker-sockets

I feel it is not that common. https://stackoverflow.com/questions/3444729/using-accept-and-select-at-the-same-time is very relevant.

The naive design — a polling-thread (select/poll) to monitor new data on 2 worker-sockets + accept-thread to accept on the listening socket. The accept-thread must inform the polling thread after a worker-socket is born.

The proposed design —

  1. a single polling thread to watch two existing worker sockets W1/W2 + listening socket LL. select() or poll() would block.
  2. When LL is seen “ready”, select() returns, so the same thread will run accept() on LL and immediately get a 3rd worker-socket W3. No blocking:)
  3. process the data on the new W3 socket
  4. go back to select() on W1 W2 W3 LL
  • Note if any worker socket has data our polling thread must process it quickly. If any worker socket is hogging the polling thread, then we need another thread to offload the work.
  • Note all worker sockets, by definition, have identical local (i.e. server-side) port, since they all inherit the local port from LL.

[[tcp/ip socket programming in C]] shows a select() example with multiple server ports.
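
A minimal single-threaded sketch of this design (hypothetical names LL and workers[]; W1/W2 assumed already accepted; error handling omitted):

#include <sys/select.h>
#include <sys/socket.h>

int workers[FD_SETSIZE];      /* workers[0]=W1, workers[1]=W2 already connected */
int nWorkers = 2;
while (1) {
  fd_set readfds;
  FD_ZERO(&readfds);
  FD_SET(LL, &readfds);
  int maxfd = LL;
  for (int i = 0; i < nWorkers; ++i) {
    FD_SET(workers[i], &readfds);
    if (workers[i] > maxfd) maxfd = workers[i];
  }
  select(maxfd + 1, &readfds, NULL, NULL, NULL);     /* the only blocking point */
  if (FD_ISSET(LL, &readfds))                        /* new connection pending: accept() won't block */
    workers[nWorkers++] = accept(LL, NULL, NULL);
  for (int i = 0; i < nWorkers; ++i)
    if (FD_ISSET(workers[i], &readfds)) {
      /* recv() and process quickly, or hand off to another thread if it would hog this loop */
    }
}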

 

simultaneous send to 2 tcp clients #multicast emulation

Consider a multi-core machine hosting either a forking or multi-threaded tcp server. The accept() call would return twice, with file descriptors 5 and 6 for two new-born worker sockets. Both share the same server-side address, and their local ports are identical to the listening port, e.g. 80.

There will be 2 dedicated threads (or processes) serving file descriptors 5 and 6, so they can both send the same data simultaneously. The two data streams will not be exactly in sync because the two threads are uncoordinated.

My friend Alan confirmed this is possible. Advantages:

  • can handle slow and fast receivers at different data rates. Each tcp connection maintains its own state
  • guaranteed delivery

For multicast, a single thread would send just a single copy  to the network. I believe the routers in between would create copies for distribution.  Advantages:

  • Efficiency — Router is better at this task than a host
  • throughput — tcp flow control is a speed bump

 

after fork(): threads,sockets.. #Trex

I have read about fork() many times without knowing these details, until a Trex interviewer asked!

–based on http://man7.org/linux/man-pages/man2/fork.2.html

The child process is created with a single thread — the one that called fork(). The entire virtual address space of the parent is replicated in the new process, including the states of pthread mutexes, pthread condition variables, and other pthreads objects. In particular, if in the parent process a lock was held by some other thread t2, then the child process has only the thread that called fork() and no t2, but the lock is still unavailable. This is a common problem, addressed in http://poincare.matf.bg.ac.rs/~ivana/courses/ps/sistemi_knjige/pomocno/apue/APUE/0201433079/ch12lev1sec9.html.

The very 1st instruction executed in Child is the instruction after fork() — as proven in https://github.com/tiger40490/repo1/blob/cpp1/cpp1/fork3times.cpp

The child inherits copies of the parent's set of open file descriptors, including stdin/stdout/stderr. The child process should usually close the inherited descriptors it doesn't need.

Special case — socket file descriptor inherited. See https://bintanvictor.wordpress.com/2017/04/29/socket-shared-between-2-processes/
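
A minimal sketch of the classic forking TCP server, showing who closes which inherited descriptor (hypothetical listenfd, assumed already listening; error handling omitted):

#include <sys/socket.h>
#include <unistd.h>

int connfd = accept(listenfd, NULL, NULL);
pid_t pid = fork();
if (pid == 0) {              /* child: the very 1st instruction after fork() */
  close(listenfd);           /* child won't accept new connections */
  /* serve connfd ... */
  close(connfd);
  _exit(0);
} else {
  close(connfd);             /* parent's copy only; the socket stays open in the child */
}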

 

check a receiver socket is alive: tcp^udp

Q: given a TCP receiver socket, how do you tell if it’s connected to a session or disconnected?

Shanyou said when you recv() on the socket and get 0, it means disconnected.

http://man7.org/linux/man-pages/man2/recv.2.html#RETURN_VALUE shows recv() return value of 0 indicates dead connection i.e. disconnected.

https://stackoverflow.com/questions/4142012/how-to-find-the-socket-connection-state-in-c uses getsockopt()
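
A small sketch combining both checks without blocking (hypothetical descriptor sockfd):

#include <errno.h>
#include <sys/socket.h>

char buf[1];
ssize_t n = recv(sockfd, buf, sizeof buf, MSG_PEEK | MSG_DONTWAIT);
if (n == 0) {
  /* peer performed an orderly shutdown: disconnected */
} else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
  /* still connected, just quiet */
}
int err = 0;
socklen_t len = sizeof err;
getsockopt(sockfd, SOL_SOCKET, SO_ERROR, &err, &len);   /* non-zero err = pending socket error */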

Q: given a UDP multicast receiver socket, how do you tell if it still has a live subscription to the multicast group?

%%A: I guess you can use getsockopt() to check socket /aliveness/. If alive but no data, then the group is quiet

non-blocking socket readiness: alternatives2periodic poll

A Wells interviewer once asked me —

Q: After your non-blocking send() fails due to a full buffer, what can you do to get your data sent ASAP?

A simple solution is retrying after zero or more milliseconds. Zero would be busy-wait, i.e. a spinning CPU. Non-zero means unwanted latency.

A 1st alternative is poll()/select() with a timeout, then immediately retrying the send. There's basically no latency. No spinning either. The Linux-specific epoll() is more efficient than poll()/select() and a popular solution for asynchronous IO.
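
A minimal sketch of that 1st alternative (hypothetical sockfd/buf/len from the failed send):

#include <poll.h>
#include <sys/socket.h>

struct pollfd pfd = { .fd = sockfd, .events = POLLOUT };
poll(&pfd, 1, -1);                          /* block until writable; or pass a millisecond timeout */
if (pfd.revents & POLLOUT)
  send(sockfd, buf, len, MSG_DONTWAIT);     /* retry the send that failed earlier */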

A 2nd alternative is SIGIO. http://compgeom.com/~piyush/teach/4531_06/project/hell.html says it doesn't waste CPU. P52 [[tcp/ip sockets in C]] also picked this solution to go with non-blocking sockets.

http://compgeom.com/~piyush/teach/4531_06/project/hell.html is actually a concise overview of several alternatives

  • non-blocking socket
  • select/poll/epoll
  • SIGIO
  • .. other tricks

## Y avoid blocking design

There are many contexts. I only know a few.

1st, let's look at a socket context. Suppose there are many (like 50 or 500) sockets to process. We don't want one thread per socket. We prefer fewer, perhaps 1 thread, to check each "ready" socket, transfer whatever data can be transferred, then go back to waiting. In this context, we need either

  • /readiness notification/, or
  • polling
  • … Both are compared on P51 [[TCP/IP sockets in C]]

2nd scenario — GUI. Blocking a UI-related thread (like the EDT) would freeze the screen.

3rd, let’s look at some DB request client. The request thread sends a request and it would take a long time to get a response. Blocking the request thread would waste some memory resource but not really CPU resource. It’s often better to deploy this thread to other tasks, if any.

Q: So what other tasks?
A: ANY task, in the thread pool design. The requester thread completes the sending task, and returns to the thread pool. It can pick up unrelated tasks. When the DB server responds, any thread in the pool can pick it up.

This can be seen as a "server bound" system, rather than IO bound or CPU bound. Both the CPU task queue and the IO task queue get drained quickly.

 

UDP/TCP socket read buffer size: can be 256MB

For my UDP socket, I use 64MB.
For my TCP socket, I use 64MB too!

These are large values and require kernel tuning. On my linux server, /etc/sysctl.conf shows these permissible read buffer sizes:

net.core.rmem_max = 268435456 # —–> 256 MB
net.ipv4.tcp_rmem = 4096   10179648   268435456 # —–> 256 MB

Note a read buffer of any socket is always maintained by the kernel and can be shared across processes [1]. In my mind, the TCP/UDP code using these buffers is kernel code, like hotel service. Application code is like hotel guests.

[1] Process A will use its file descriptor 3 for this socket, while Process B will use its file descriptor 5 for this socket.
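
A sketch of requesting a 64MB receive buffer (hypothetical sockfd). The request is capped by net.core.rmem_max unless the privileged SO_RCVBUFFORCE option is used, and getsockopt() reports roughly double the granted value to account for bookkeeping overhead:

#include <stdio.h>
#include <sys/socket.h>

int rcvbuf = 64 * 1024 * 1024;                                       /* ask for 64 MB */
setsockopt(sockfd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof rcvbuf);
socklen_t len = sizeof rcvbuf;
getsockopt(sockfd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, &len);
printf("effective receive buffer: %d bytes\n", rcvbuf);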

select^poll # phrasebook

Based on https://www.ulduzsoft.com/2014/01/select-poll-epoll-practical-difference-for-system-architects/, which I respect.

  • descriptor count — up to 200 is fine with select(); 1000 is fine with poll(); Above 1000 consider epoll
  • time-out precision — poll()/epoll have millisecond precision. select() takes a struct timeval with microsecond precision (pselect() even takes nanoseconds), but only embedded devices need such precision.
  • single-threaded app — poll is just as fast as epoll. epoll() excels in MT.

https://www.ibm.com/support/knowledgecenter/ssw_ibm_i_71/rzab6/poll.htm has sample code on poll().

sharing port or socket #index page

Opening example – we all know that once a local endpoint is occupied by a tcp server process, another process can’t bind to it.

However, various scenarios exist to allow some form of sharing.

https://bintanvictor.wordpress.com/2017/04/29/socket-shared-between-2-processes/

https://bintanvictor.wordpress.com/2017/04/29/so_reuseport-socket-option/

https://bintanvictor.wordpress.com/2017/04/29/2-tcp-clients-connected-to-a-single-server-socket/

https://bintanvictor.wordpress.com/2017/03/29/multiple-sockets-can-attach-to-the-same-addressport/

socket accept() key points often missed

I have studied accept() many times but am still unfamiliar with it.

Useful as zbs, and perhaps QQ, rarely for GTD…

Based on P95-97 [[tcp/ip socket in C]]

  • used in tcp only
  • used on server side only
  • usually called inside an endless loop
  • blocks most of the time, when there are no incoming new connections. The existing clients don't bother us as they communicate with the "child" sockets independently. The accept() "show" starts only upon a new incoming connection
    • the thread remains blocked from receiving the incoming connection request until a newborn socket is fully established.
    • at that juncture the new remote client is probably connected to the newborn socket, so the "parent thread [2]" has the opportunity/license to let go and return from accept()
    • now, parent thread has the newborn socket, it needs to pass it to a child thread/process
    • after that, parent thread can go back into another blocking accept()
  • newborn or other child sockets all share the same local port, not some random high port! Until now I still find this unbelievable. https://stackoverflow.com/questions/489036/how-does-the-socket-api-accept-function-work confirms it.
  • On a host with a single IP, 2 sister sockets would share the same local ip too, but luckily each socket structure has at least 4 [1] identifier keys — local ip:port / remote ip:port. So our 2 sister sockets are never identical twins.
  • [1] I omitted a 5th key — protocol as it’s a distraction from the key point.
  • [2] 2 variations — parent Thread or parent Process.

in-depth article: epoll illustrated #SELECT

(source code is available for download in the article)

Compared to select(), the newer linux system call epoll() is designed to be more performant.

Ticker Plant uses epoll. No select() at all.

https://banu.com/blog/2/how-to-use-epoll-a-complete-example-in-c/ is a nice article with sample code of a TCP server.

  • bind(), listen(), accept()
  • main() function with an event loop. In the loop
  • epoll_wait() to detect
    • new client
    • new data on existing clients
    • (Using the timeout parameter, it could also react to timer events.)

I think this toy program is more readable than a real-world epoll server with thousands of lines.
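
A minimal level-triggered epoll skeleton of that event loop (hypothetical listenfd, assumed already listening; error handling omitted):

#include <sys/epoll.h>
#include <sys/socket.h>

int epfd = epoll_create1(0);
struct epoll_event ev = { .events = EPOLLIN, .data.fd = listenfd };
epoll_ctl(epfd, EPOLL_CTL_ADD, listenfd, &ev);
struct epoll_event events[64];
while (1) {
  int n = epoll_wait(epfd, events, 64, -1);     /* -1 blocks; a millisecond timeout enables timer handling */
  for (int i = 0; i < n; ++i) {
    if (events[i].data.fd == listenfd) {        /* new client */
      int connfd = accept(listenfd, NULL, NULL);
      struct epoll_event cev = { .events = EPOLLIN, .data.fd = connfd };
      epoll_ctl(epfd, EPOLL_CTL_ADD, connfd, &cev);
    } else {
      /* new data on an existing client: recv(events[i].data.fd, ...) */
    }
  }
}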

##which common UDP/TCP functions are blocking

Q: which blocking functions support a timeout?
A: All

A non-blocking send fails when it can't send a single byte, usually because the socket's send buffer (for TCP) is full. For UDP, see [4]

Note read() and write() have similar behavior. See send()recv() ^ write()read() @sockets and http://www.mathcs.emory.edu/~cheung/Courses/455/Syllabus/7-transport/flow-control.html

[1] the meaning of non-blocking connect() on TCP is special. See P51 [[tcp/ip sockets in C]] and https://www.scottklement.com/rpg/socktut/nonblocking.html
[2b] accept() returning with timeout is obscure knowledge — see accept(2) manpage
[2c] accept() is designed to return immediately on a nonblocking socket — https://stackoverflow.com/questions/12861956/is-it-possible-and-safe-to-make-an-accepting-socket-non-blocking and https://www.scottklement.com/rpg/socktut/nonblocking.html
[3] select() on a non-blocking socket is obscure knowledge —  See https://www.scottklement.com/rpg/socktut/nonblocking.html
[4] UDP has no send buffer but some factors can still cause blocking … obscure knowledge
[5] close() depends on SO_LINGER and non-blocking by default … https://stackoverflow.com/questions/6418277/how-to-close-a-non-blocking-socket
[6] MSG_DONTWAIT flag

| t/out  | function                 | default             | flags arg to func    | fcntl on entire socket | touching TCP buffers? |
| y      | select                   | blocking            | not supported?       | still blocking! [3]    | no                    |
|        | epoll                    | blocking            | not supported?       |                        | no                    |
| y      | recv                     | blocking            | can change to NB [6] | can change to NB       | yes                   |
| y      | send/write TCP           | blocking            | can change to NB [6] | can change to NB       | yes                   |
| y      | recvfrom                 | blocking            | can change to NB [6] | can change to NB       | yes                   |
| y      | sendto UDP               | can block [4]       | can change to NB [6] | can change to NB       | yes                   |
| y      | close() with SO_LINGER   | not the default     | not supported        | yes                    | yes                   |
|        | close()                  | by default NB [5]   | not supported        | can change to blocking | yes                   |
| y [2b] | accept                   | by default blocking | not supported        | yes                    | yes                   |
|        | accept on NB socket [2c] |                     | not supported        | yes                    | no                    |
| LG     | connect() TCP            | blocking            | not supported?       | can change to NB [1]   | no                    |
| LG     | connect() UDP            | NB                  | NotApplicable        |                        |                       |

send()recv() ^ write()read() @sockets

Q: A socket is also a file descriptor, so why bother with send()recv() when you can use write()read()?

A: See https://stackoverflow.com/questions/9475442/unix-domain-socket-vs-named-pipes

send()recv() are recommended, more widely used and better documented.

[[linux kernel]] P623 actually uses read()/write() for udp, instead of sendto()/recvfrom(), but only after a call to connect() to set the remote address

socket stats monitoring tools – on-line resources

This is a rare interview question, perhaps asked 1 or 2 times. I don’t want to overspend.

In ICE RTS, we use built-in statistics modules written in C++ to collect the throughput statistics.

If you don’t have source code to modify, I guess you need to rely on standard tools.

message fragment in Internet Protocol !! TCP

The IP layer handles fragmentation/defrag. UDP and TCP are one layer above IP and rely on this "service" of the IP layer.

UDP may (TCP is different) occasionally lose an entire "logical" packet, but never part of a logical packet.

In my own words, If IP layer loses a “fragment” it discards the entire packet.

When a logical packet is broken up at IP layer into physical packets, the constituent physical packets will either be delivered altogether or lost altogether. The frag/defrag IP service is transparent to upper layers so UDP/TCP don’t need to worry about basic data integrity.

I will attempt to contrast it with TCP's own segmentation (driven by MSS and flow control), which breaks up a megabyte file into smaller chunks. Each chunk is a "logical" packet. TCP (not UDP) uses sequence numbers in the packets.

tcp/udp use a C library, still dominating

History – The socket library was created in the 1980’s and has stood the test of time. Similar resilience is seen in SQL, Unix, and mutex/condition constructs

In the socket programming space, I feel C still dominates, though java is a contender (but I don't understand why).

Perl and python both provide thin wrappers over the C socket API. (Python’s wrapper is a thin OO wrapper.)

Sockets are rather low level /constructs/ and performance-critical — latency and footprint. OO adds overhead on both counts without adding well-appreciated or much-needed OO niceties. If you need flexibility, consider c++ templates. All modern languages try to encapsulate/wrap the low-level details and present an easier API to upper-layer developers.

Choose one option only between —
AA) If you write tcp/ip code by hand, then you probably don’t need OO wrappers in c#, java or python
BB) If you like high-level OO wrappers, then don’t bother with raw sockets.

My bias is AA, esp. on Wall St. Strong low-level experience always beats (and often compensates for lack of) upper-layer experience. If you have limited time, invest wisely.

I feel one problem with java is that sockets are low-level "friends" of ints and chars, but java collections need auto-boxing. If you write fast java sockets, you must avoid auto-boxing everywhere.

Q: is java socket based on C api?
A: yes http://stackoverflow.com/questions/12064170/java-net-socket-implementation

b4 and af select() syscall

Note select() is usually used on the server-side. It allows a /single/ server thread to handle hundreds of concurrent clients.
— B4 —
open the sockets. Each socket is represented by an integer file descriptor, which can be saved in an int array. (In C++ a vector would be nicer; in C the array decays to an int pointer.)

FD_SET(socketDes1, &readfds); /* add socketDes1 to the readfds */
——–

One select() argument is readfds — the set of existing sockets [1]. select() will test each socket in the set.

— After —
check the set of incoming sockets and see which socket is “ready”.

FD_ISSET(socketDes1, &readfds)

If ready, then you can either read() or recvfrom() it.

http://www.cs.cmu.edu/afs/cs/academic/class/15441-f01/www/lectures/lecture03.ppt has sample code.

[1] Actually Three independent sets of file descriptors are watched, but for now let’s focus on the first — the incoming sockets

select() syscall and its drv

select() is the most "special" socket kernel syscall (not a "standard library function"). It's treated specially.

– Java Non-blocking IO is related to select().
– Python has at least 3 modules — select, asyncore, asynchat all built around select()
– dotnet offers 2 solutions to the same problem:

  1. a select()-based solution, and
  2. a newer asynchronous solution

socket file desc in main() function

I see real applications with main() function declaring a stack variable my_sockfd = socket(…).

Q: is the my_sockfd stackVar in scope throughout the app's lifetime?

Q2: what if the main() function exits with some threads still alive? Will the my_sockfd object (and variable) disappear?
A: yes see Q2b.

Q2b: Will the app exit?
A: yes. See blog http://bigblog.tanbin.com/2008/12/main-thread-early-exit-javac.html

5 parts in socket data structure

— Adapted from http://stackoverflow.com/questions/489036/how-does-the-socket-api-accept-function-work

Note accept() instantiates a socket object and returns a file descriptor for it. accept() doesn’t open a new port.

A socket object in memory consists of 5 things – (source ip, source port, destination ip, destination port, protocol). Here the protocol could be TCP or UDP [1]. The protocol is identified from the 'protocol' field in the IP datagram.

Thus it is possible to have 2 different applications on the server communicating with the same client on exactly the same 4-tuple, differing only in the protocol field. For example:

Apache at server talking on (server1.com:880-client1:1234 on TCP) and
World of warcraft talking on (server1.com:880-client1:1234 on UDP)

Both the client and server will handle it, since the protocol field in the IP packet differs in the two cases even though the other 4 fields are the same.

[1] or others

tcp/udp client+server briefly #connect()UDP

(Note no unix-domain sockets covered here.)

As elaborated in http://bigblog.tanbin.com/2010/12/socket-bind-listen-accept.html

TCP server is socket()-bind-listen-accept
TCP client is socket()-connect

UDP server is socket()-bind
UDP client is socket()-bind { optional }

(UDP both sides similar. See the nice diagram and sample code in http://www.tenouk.com/Module41a.html)

However, after the set-up, to move the data, UDP supports several choices.
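
For example, one of those choices is to connect() the UDP socket once and then use plain send()/recv(). A minimal client sketch (hypothetical server IP/port; error handling omitted):

#include <arpa/inet.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int fd = socket(AF_INET, SOCK_DGRAM, 0);
struct sockaddr_in srv;
memset(&srv, 0, sizeof srv);
srv.sin_family = AF_INET;
srv.sin_port = htons(5000);                          /* hypothetical port */
inet_pton(AF_INET, "192.168.1.10", &srv.sin_addr);   /* hypothetical server IP */
connect(fd, (struct sockaddr *)&srv, sizeof srv);    /* UDP connect(): just records the peer address */
send(fd, "ping", 4, 0);                              /* no destination argument needed after connect() */
char buf[512];
recv(fd, buf, sizeof buf, 0);                        /* only datagrams from the connected peer are delivered */
close(fd);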

2011 BGC c++IV #socket #done

Q3: what can go wrong during a write to a socket??? (LG)
Q3b: if buffer is full, will the writing thread block???
%A: by default it blocks, but it's probably possible to configure it to return an error code or even throw an exception

Q: blocking vs non-blocking socket?

Q: socket programming – what’s select ?

Q: Should base class dtor always be virtual?
%%A: I would say yes. If a dtor is really not virtual, then it's not supposed to be a base class. However, SOF mentions: "If you want to prevent the deletion of an instance through a base class pointer, you can make the base class destructor protected and non-virtual; by doing so, the compiler won't let you call delete on a base class pointer."

Q: how many ways to share data between 2 processes? How about shared memory

Q: synchronize 2 unix processes accessing a shared file?
A: named semaphore

fd_set in select() syscall, learning notes

First thing to really understand in select() is the fd_set. Probably the basis of the java Selector.

An fd_set is a struct holding a bunch [2] of file descriptors (internally it's essentially a fixed-size bit mask indexed by descriptor number). An fd_set instance is used as an in/out parameter to select().
– upon entry, it carries the list of sockets to check
– upon return, it carries the subset of those sockets found “dirty” [1]

FD_SET(fd, fdSet) adds a file descriptor “fd” to fdSet. Used before select().
FD_ISSET(fd, fdSet) checks if fd is part of fdSet. Used after select().

—-
Now that we understand fd_set, let's look at …
First parameter to select() is max_descriptor. File descriptors are numbered starting at zero, so the max_descriptor parameter must specify a value that is one greater than the largest descriptor number to be tested. I see a lot of confusion in how programmers populate this parameter — see the small sketch after the footnotes.

See http://publib.boulder.ibm.com/infocenter/iseries/v5r3/index.jsp?topic=%2Frzab6%2Frzab6xnonblock.htm

[1] “ready” is the better word
[2] if you don’t have any file descriptor, then you should pass in NULL, not an empty fd_set
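
A small sketch of populating max_descriptor, assuming two hypothetical sockets fd1 and fd2:

fd_set readfds;
FD_ZERO(&readfds);
FD_SET(fd1, &readfds);
FD_SET(fd2, &readfds);
int maxfd = (fd1 > fd2) ? fd1 : fd2;
select(maxfd + 1, &readfds, NULL, NULL, NULL);   /* first argument = largest descriptor + 1 */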

select() syscall lesson1: fd_set …

An fd_set instance is a struct (remember — C API) holding a bunch of file descriptors; internally it's essentially a fixed-size bit mask indexed by descriptor number.

An fd_set instance is used as an in/out parameter to select(). Only pointer arguments support in/out.
– upon entry, it carries the list of sockets to check
– upon return, it carries the subset of those sockets found “dirty” [1]

FD_SET(fd, fdSet) is a free function (remember — C API) which adds a file descriptor “fd” into a fdSet. Used before select().

FD_ISSET(fd, fdSet) is a free function (remember — C API) which checks if fd is part of fdSet. Used after select().

First parameter to select() is max_descriptor. File Descriptors are numbered starting at zero, so the max_descriptor parameter must specify a value that is one greater than the largest descriptor number that is to be tested.

See http://publib.boulder.ibm.com/infocenter/iseries/v5r3/index.jsp?topic=%2Frzab6%2Frzab6xnonblock.htm

[1] “ready” is the better word

select() syscall multiplex vs 1 thread/socket ]mkt-data gateway

Low-volume market data gateways could multiplex using the select() syscall — Warren of CS. A single thread can service thousands of low-volume clients. (See my brief write-up on epoll.) With blocking sockets and one thread per socket, each read() and write() could block an entire thread — if 90% of 1000 sockets have full send buffers, then 900 threads would block in write(). Too many threads slow down the entire system.

A standard blocking socket server's main thread blocks in accept(). Upon return, it gets a file handle. It could save the file handle somewhere then go back to accept(). Over time it will collect a bunch of file handles, each being a socket for a particular network client. Another server thread can then use select() to talk to multiple clients, while the main accept() thread continues to wait for new connections.

However, in high volume mkt data gateways, you might prefer one dedicated thread per socket. This supposedly reduces context switching. I believe in this case there's a small number of preconfigured sockets, perhaps one socket per exchange. In such a case there's no benefit in multiplexing. Very different from a google web server.

This dedicated thread may experience short periods of silence on the socket – I guess market data could come in bursts. I was told the “correct” design is spin-wait, with a short sleep between iterations. I was told there’s no onMsg() in this case. I guess onMsg() requires another thread to wake up the blocking thread. Instead, the spin thread simply sleeps briefly, then reads the socket until there’s no data to read.

If this single thread and this socket are dedicated to each other like husband and wife, then there’s not much difference between blocking vs non-blocking read/write. The reader probably runs in an endless loop, and reads as fast as possible. If non-blocking, then perhaps the thread can do something else when socket buffer is found empty. For blocking socket, the thread is unable to do any useful work while blocked.

I was told UDP asynchronous read will NOT block.

socket operations in perl, python and c

(“i-socket” means internet socket. Our discussion doesn't cover UD-socket ie unix domain sockets.)

1st and easy lesson – both perl and python provide wrappers over C-level syscalls, so function names are identical across C, perl and python.

2nd lesson – python uses a socket object as the Dictator (remember Benevolent Dictator For Life?) whereas Perl puts a socket FILEhandle as the first argument to those functions.

Like Perl, C puts a socket FILE descriptor as first argument to those functions. However, the C file descriptor is different from a Perl file handle —
– C file descriptor is an integer either for a socket or for a disk file
– Perl file handle is internally implemented like c++ i/o streams. As compared in the post on [[pipes, streams, io classes]], all of these are built on the PIPE concept

However, both C and Perl treat a socket like a file, to support read/write.

NIO q&&a

Prerequisite: blocking. See other posts

Q: Typically, how many active clients can an NIO server support?
A: thousands for a single-thread NIO server

Q: How about traditional IO?
A: typically hundreds

Q: can we have a single-thread server in traditional IO? P221
A: no. When data is unavailable on the socket, the server thread would "block" and stop doing anything at all. Since there are no other threads to do anything either, the entire server freezes up waiting, ignoring any events, signals or inputs.

Q: basic NIO server uses a single thread or multiple threads?
A: p 233

Q: in traditional IO, what’s the threading, session, socket … allocation?
A: each client is linked to a server-side socket and a thread.

Q: why traditional io supports limited number of clients?
A: threads can't multiply infinitely. (No real limit on sockets.) P227 shows a 2nd reason

Q: problem of thread pool?
A: thousands of active clients would require a thread pool of the same size (heavy!)

Q: in one sentence, explain a major complexity in NIO programming. P232

What q[bind]means to a java socket

Both types — ServerSocket and Socket — are bound to a [1] local address:port and never a remote one. “Bind” always implies “local”.

[1] one and only 1 local address:port.

Look at the ServerSocket.bind() and Socket.bind() API.

It may help to think of a server jvm and a client jvm.

It’s possible for a socket object (in JVM) to be unbound.

java clientside inet-sockets — briefly

Look at Socket.java constructor signatures. As a client side internet socket (not UD-socket), the most basic address:port pair needed is the REMOTE address:port.

Q: so how about the local address:port?
A: Usually, only after a socket is created with the remote address:port, does the socket need to bind() to a local address:port.

Q: Can a Socket object be on the server side or client side?
A: I think both. See ServerSocket.accept() javadoc. accept() manufactures a socket object in the server jvm.

Q: Can a java Socket object be associated with 2 connections? Would the output data broadcast into both channels?