Based on https://www.ulduzsoft.com/2014/01/select-poll-epoll-practical-difference-for-system-architects/, which I respect.
* descriptor count: up to 200 is fine with select(); up to 1000 is fine with poll(); above 1000, consider epoll
* single-threaded app: poll is just as fast as epoll. epoll excels in multi-threaded (MT) apps.
* time-out precision: poll/epoll have millisecond precision. select() takes a struct timeval (microseconds) and pselect() takes a struct timespec (nanoseconds), but only embedded devices need such precision.
* linux-only: epoll
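To make the time-out point concrete, here is a minimal sketch using Python's standard `select` module, which wraps all three calls. The helper names (`wait_readable_select` etc.) are my own, not from the article. Note that the Python wrappers take the time-out as float seconds for select() and epoll but integer milliseconds for poll(), and the epoll helper only runs on Linux.

```python
import select
import socket

def wait_readable_select(sock, timeout_s):
    """Wait with select(); the timeout is a float in seconds."""
    readable, _, _ = select.select([sock], [], [], timeout_s)
    return bool(readable)

def wait_readable_poll(sock, timeout_ms):
    """Wait with poll(); the timeout is a whole number of milliseconds."""
    poller = select.poll()
    poller.register(sock, select.POLLIN)
    events = poller.poll(timeout_ms)
    return any(fd == sock.fileno() and bool(ev & select.POLLIN) for fd, ev in events)

def wait_readable_epoll(sock, timeout_s):
    """Wait with epoll (Linux only); Python takes the timeout in seconds."""
    ep = select.epoll()
    try:
        ep.register(sock.fileno(), select.EPOLLIN)
        events = ep.poll(timeout_s)
        return any(fd == sock.fileno() and bool(ev & select.EPOLLIN) for fd, ev in events)
    finally:
        ep.close()

if __name__ == "__main__":
    a, b = socket.socketpair()          # two connected sockets in one process
    print(wait_readable_select(b, 0.05))   # nothing sent yet, so this times out
    a.send(b"x")
    print(wait_readable_poll(b, 50))       # one byte is now waiting on b
    a.close()
    b.close()
```

With only one descriptor all three behave the same; the differences in the list above only show up at higher descriptor counts.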
There are dozens of sub-topics but in my small sample of interviews, the following sub-topics have received disproportionate attention:
- blocking vs non-blocking
- accept() + threading
- add basic reliability over UDP (many blog posts); how is TCP transmission control implemented
Background — The QQ/ZZ framework was first introduced in this post on c++ learning topics
Only c/c++ positions need socket knowledge. However, my perl/py/java experience with socket API is still relevant.
Socket is a low-level subject. The tough socket topics don't feel as complex as concurrency, algorithms, probabilities, OO design, MOM …
Interviews are mostly a knowledge test; but to do well in real projects, you probably need experience.
Coding practice? No need. Just read and blog.
Socket knowledge is seldom the #1 selection criterion for a given position, but could be #3. (In contrast, concurrency or algorithm skill could be #1.)
- [ZZ] tweaking
- [ZZ] exception handling in practice
- ----Above topics are still worth studying to some extent----
- [QQ] tuning
- [QQ] buffer management
Reason: data rate constraints inherent in TCP protocol. Congestion Control?
Reason: TCP to a large group would be one-by-one unicast, highly inefficient and too much load on the sender.
Reason: TCP has more data-overhead in the form of non-payload data.
* TCP header is at least 20 bytes (more with options) vs 8 bytes for UDP
* Receiver needs to acknowledge
I have studied accept() many times but am still unfamiliar with it.
Useful as zbs, and perhaps QQ, rarely for GTD…
Based on P95-97 [[tcp/ip socket in C]]
- (probably) used in tcp only
- (probably) used on server side only
- usually called inside an endless loop
- blocks most of the time, when there are no new incoming connections. The existing clients don’t bother us as they communicate with the “child” sockets independently. The accept() “show” starts only upon a new incoming connection
- thread remains blocked from the arrival of the incoming connection request until a newborn socket is fully Established.
- at that juncture the new remote client is probably connected to the newborn socket, so the “parent thread” has the opportunity/license to let go and return from accept()
- now, parent thread has the newborn socket, it needs to pass it to a child thread/process
- after that, parent thread can go back into another blocking accept()
- newborn and other child sockets all share the same local port, not some random high port! Even now I still find this unbelievable. https://stackoverflow.com/questions/489036/how-does-the-socket-api-accept-function-work confirms it.
- On a host with a single IP, 2 sister sockets would share the same local ip too, but luckily each socket structure has at least 4 identifier keys: local ip:port / remote ip:port. So our 2 sister sockets are never identical twins.
- I omitted a 5th key, protocol, as it’s a distraction from the key point.
- 2 variations: parent Thread or parent Process.
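The accept() points above can be sketched in Python as below. This is the "parent Thread" variation, under my own assumptions: the names `accept_loop`, `handle_client` and `demo` are hypothetical, the server stops after a fixed number of clients instead of looping forever, and each child simply echoes one message. The interesting part is the recorded local port of every newborn socket, which matches the listening port.

```python
import socket
import threading

def handle_client(conn):
    """Child thread: echo one message back, then close the newborn socket."""
    with conn:
        data = conn.recv(1024)
        if data:
            conn.sendall(data)

def accept_loop(listener, max_clients, local_ports):
    """Parent thread: block in accept(), hand each newborn socket to a child."""
    for _ in range(max_clients):            # a real server would loop forever
        conn, _peer = listener.accept()     # blocks until a connection is established
        # The newborn socket reports the same local port as the listener.
        local_ports.append(conn.getsockname()[1])
        threading.Thread(target=handle_client, args=(conn,), daemon=True).start()

def demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", 0))         # port 0: let the OS pick a free port
    listener.listen(5)
    port = listener.getsockname()[1]

    local_ports = []
    parent = threading.Thread(target=accept_loop, args=(listener, 2, local_ports))
    parent.start()

    replies = []
    for msg in (b"one", b"two"):
        with socket.create_connection(("127.0.0.1", port)) as client:
            client.sendall(msg)
            replies.append(client.recv(1024))

    parent.join()
    listener.close()
    return port, local_ports, replies

if __name__ == "__main__":
    port, local_ports, replies = demo()
    print(port, local_ports, replies)
```

The two newborn sockets differ only in the remote ip:port half of the 4-key identifier.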
Compared to select(), the newer Linux epoll API (epoll_create / epoll_ctl / epoll_wait) is designed to be more performant.
Ticker Plant uses epoll. No select() at all.
https://banu.com/blog/2/how-to-use-epoll-a-complete-example-in-c/ is a nice article with sample code of a TCP server.
- main() function with an event loop
I think this toy program is more educational than a real-world epoll server with thousands of lines of code.
No sample client but I think a standard TCP client will do.
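For comparison, here is a much smaller sketch of the same idea in Python (Linux only). It is not the article's code: it uses the default level-triggered mode for simplicity, `serve_epoll` and its `stop_after` parameter are names I made up so the loop can terminate, and the `demo` function plays the role of the standard TCP client.

```python
import select
import socket
import threading

def serve_epoll(listener, stop_after):
    """Single-threaded echo server driven by one epoll event loop.

    stop_after is a hypothetical knob: return once that many clients have closed.
    """
    listener.setblocking(False)
    ep = select.epoll()
    ep.register(listener.fileno(), select.EPOLLIN)
    conns = {}                               # fd -> connected socket
    closed = 0
    try:
        while closed < stop_after:
            for fd, _events in ep.poll(1.0):     # timeout in seconds
                if fd == listener.fileno():
                    # Listening socket is readable: a new connection is waiting.
                    conn, _peer = listener.accept()
                    conn.setblocking(False)
                    ep.register(conn.fileno(), select.EPOLLIN)
                    conns[conn.fileno()] = conn
                else:
                    conn = conns[fd]
                    try:
                        data = conn.recv(4096)
                    except ConnectionError:
                        data = b""               # treat a reset like a close
                    if data:
                        # Fine for tiny messages; a real server would buffer
                        # and wait for EPOLLOUT if the send would block.
                        conn.sendall(data)
                    else:
                        ep.unregister(fd)
                        conn.close()
                        del conns[fd]
                        closed += 1
    finally:
        ep.close()
    return closed

def demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))          # port 0: let the OS pick a free port
    listener.listen(8)
    port = listener.getsockname()[1]
    result = []
    server = threading.Thread(target=lambda: result.append(serve_epoll(listener, 2)))
    server.start()
    replies = []
    for msg in (b"ping", b"pong"):
        with socket.create_connection(("127.0.0.1", port)) as client:
            client.sendall(msg)
            replies.append(client.recv(4096))
    server.join()
    listener.close()
    return replies, result[0]

if __name__ == "__main__":
    print(demo())
```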
TCP Client-side is a 2-stepper (look at Wikipedia and [[python ref]], among many references)
1) [SC] socket()
2) [C] connect()
[SC = used on server and client sides]
[C=client-only. seldom/never used on server-side.]
Note UDP is connection-less but connect() can be used too — to set the default destination. See https://stackoverflow.com/questions/9741392/can-you-bind-and-connect-both-ends-of-a-udp-connection.
Under TCP, the verb connect() means something quite different: “reach across and build connection”. You see it when you telnet … Also, the server side doesn’t make outgoing connections, so this is used by TCP client only. When making a connection, we often see error messages about the server refusing the connection, because no server is listening or “accepting” on that port.
Think of a foreign businessman traveling to China to build guanxi with local government officials.
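A small Python sketch of both meanings of connect(), with hypothetical helper names of my own (`tcp_connect_result`, `udp_connected_send`). The TCP helper does the 2-stepper and reports whether the server refused; the UDP helper uses connect() only to set the default destination, so a plain send() works without an address.

```python
import socket

def tcp_connect_result(host, port):
    """TCP client 2-stepper: socket() then connect(). Returns 'connected' or 'refused'."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)   # step 1: socket()
    try:
        sock.connect((host, port))                              # step 2: connect()
        return "connected"
    except ConnectionRefusedError:
        return "refused"                # nobody is listening on that port
    finally:
        sock.close()

def udp_connected_send(dest_addr, payload):
    """UDP connect() builds no connection; it only records a default destination."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.connect(dest_addr)
        sock.send(payload)              # no address argument needed after connect()
    finally:
        sock.close()

if __name__ == "__main__":
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))       # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    print(tcp_connect_result("127.0.0.1", port))   # a listener exists
    server.close()
    print(tcp_connect_result("127.0.0.1", port))   # listener is gone
```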
If the receiver gets seq #13 before #11, it will detect a gap between 10 and 13 and send a retransmission request to the exchange. If 11 and 12 arrive later, they will be discarded.
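The gap logic can be modelled in a few lines of Python. This is a simplified sketch with a hypothetical `GapDetector` class of my own: the retransmission request is just recorded in a list rather than sent anywhere, and the retransmitted copies are assumed to arrive on a separate channel that is not modelled here.

```python
class GapDetector:
    """Tracks sequence numbers on one feed and records gaps (hypothetical model)."""

    def __init__(self, first_expected=1):
        self.next_expected = first_expected
        self.retransmit_requests = []   # (first_missing, last_missing) ranges
        self.accepted = []
        self.discarded = []

    def on_message(self, seq):
        if seq < self.next_expected:
            # Late original (or duplicate): the gap was already reported,
            # so the straggler is dropped.
            self.discarded.append(seq)
            return
        if seq > self.next_expected:
            # Gap detected: ask for everything between the last seen and this one.
            self.retransmit_requests.append((self.next_expected, seq - 1))
        self.accepted.append(seq)
        self.next_expected = seq + 1

if __name__ == "__main__":
    detector = GapDetector(first_expected=9)
    for seq in (9, 10, 13, 11, 12, 14):
        detector.on_message(seq)
    print(detector.retransmit_requests)   # [(11, 12)]
    print(detector.discarded)             # [11, 12]
```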
https://www.iana.org/assignments/multicast-addresses/multicast-addresses.xhtml shows a few hundred big companies including exchanges. For example, one exchange multicast address 126.96.36.199 falls within the range 188.8.131.52 to 184.108.40.206
Intercontinental Exchange, Inc.
It’s educational to compare with a unicast IP address. If you own such a unicast address, you can put it on a host and bind an http server to it. No one else can bind a server to that unicast address. Any client connecting to that IP will hit your host.
As owner of a multicast address, in principle you alone send datagrams to it and (presumably) you can restrict who can send or receive on this group address. Alan Shi pointed out the model is pub-sub MOM.
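The pub-sub flavour shows up in the socket calls: a subscriber joins the group, and a publisher just sends to the group address. Below is a rough Python sketch under my own assumptions. The group `239.1.1.1` and port `5007` are made-up values from the administratively scoped range, not an exchange address, and all function names are hypothetical. Whether datagrams actually flow depends on the host's multicast routing, so the demo part only exercises the pure helpers.

```python
import ipaddress
import socket
import struct

GROUP = "239.1.1.1"   # hypothetical group address, for illustration only
PORT = 5007           # hypothetical port

def is_multicast(addr):
    """True if addr lies in the IPv4 multicast range 224.0.0.0 to 239.255.255.255."""
    return ipaddress.ip_address(addr).is_multicast

def membership_request(group, interface="0.0.0.0"):
    """Pack the ip_mreq structure: group address followed by local interface."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(interface))

def open_subscriber(group=GROUP, port=PORT):
    """Subscriber side: bind to the port, then join the group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock

def publish(payload, group=GROUP, port=PORT, ttl=1):
    """Publisher side: an ordinary UDP send to the group address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
        sock.sendto(payload, (group, port))
    finally:
        sock.close()

if __name__ == "__main__":
    print(is_multicast(GROUP))              # True
    print(len(membership_request(GROUP)))   # 8
```

Note the publisher never joins the group and never learns who the subscribers are, which is what makes it feel like MOM.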