make_shared() cache efficiency, forward()

This low-level topic is apparently important to multiple interviewers. There are similarly low-level topics like lockfree, wait/notify, hashmap, const correctness.. These topics are mostly for theoretical QQ interviews; I don’t think app developers ever need to write forward() in their code. Still, the topic touches on a few low-level optimizations. Suppose you follow Herb Sutter’s advice and write a factory that accepts a Trade ctor arg and returns a shared_ptr<Trade>:

  • your factory’s parameter should be a universal reference. You should then std::forward() it to make_shared(). See the make_shared() source in gcc.
  • make_shared() makes one allocation holding the Trade object and an adjacent control block, which is cache efficient: any read access through the Trade pointer will likely pull the control block into cache too
  • if the arg object is a temp object, then an rvr would be forwarded to the Trade ctor. Scott Meyers says the (lvalue-named) parameter would be cast back to an rvr. The Trade ctor would still need to move() it.
  • if the runtime object is carried by an lvr (arg object not a temp object), then the lvr would be forwarded as-is to the Trade ctor.

Q: What if I omit std::forward()?
AA: The Trade ctor would always receive an lvr. See ScottMeyers P162; my github code has my experiment. A sketch follows.
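A minimal sketch of such a factory, assuming a hypothetical Trade class whose ctor takes a std::string:

#include <memory>
#include <string>
#include <utility>

struct Trade {                       // hypothetical Trade, for illustration
    std::string symbol;
    explicit Trade(std::string s) : symbol(std::move(s)) {}
};

template <typename A>                // A&& is a universal reference here
std::shared_ptr<Trade> makeTrade(A&& arg) {
    return std::make_shared<Trade>(std::forward<A>(arg));
    // omit forward() and arg is always passed as an lvalue,
    // so the Trade ctor copies even when the caller passed a temp
}

Usage: makeTrade(std::string("ibm")) forwards an rvr (moved all the way, no copy); given std::string s, makeTrade(s) forwards an lvr (one copy).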



move/forward used beyond argument passing: rare

I believe move() and forward() are most often used when passing an argument into a worker function (including a ctor). I see very few exceptions with move():

1) immediate steal:

Badstr&& alias = move(passedIn);
alias.ptrField = NULL;

2) move-assignment:

existingBadstr = move(passedIn); // could be written as
existingBadstr.operator=(move(passedIn)); // NOT an exception to the norm
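For context, a minimal sketch of the Badstr assumed above (hypothetical class with a raw owning ptrField, which is what the steal targets):

#include <cstdlib>
#include <cstring>

struct Badstr {
    char* ptrField;                            // raw owning pointer, hence "Bad"
    explicit Badstr(const char* s) : ptrField(strdup(s)) {}
    Badstr& operator=(Badstr&& rhs) {          // move-assignment
        if (this != &rhs) {                    // guard against self-move
            free(ptrField);
            ptrField = rhs.ptrField;           // the steal
            rhs.ptrField = NULL;
        }
        return *this;
    }
    ~Badstr() { free(ptrField); }
};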

Tested in


linux tcp buffer^AWS tuning

—receive buffer configuration
In general, there are two ways to control how large a TCP socket receive buffer can grow on Linux:

  1. You can call setsockopt(SO_RCVBUF) to set the receive buffer size explicitly on an individual TCP/UDP socket
  2. Or you can leave it to the operating system, which auto-tunes each TCP receive buffer dynamically, using the global tcp_rmem values as hints.
  3. … both are capped by the kernel limits below:

/proc/sys/net/core/rmem_max — is a global hard limit on all sockets (TCP/UDP). I see 256M on my system. Can you set it to 1GB? I’m not sure, but it’s probably unaffected by the boolean flag below.

/proc/sys/net/ipv4/tcp_rmem — doesn’t limit SO_RCVBUF; it governs the auto-tuning. The max value on my system is again 256M. The kernel dynamically adjusts each socket’s receive buffer within these bounds, at runtime.

The linux “tcp” manpage explains the relationship.
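A minimal sketch of option 1, setting the buffer explicitly on one socket (the 4MB figure is arbitrary):

#include <sys/socket.h>

void setRcvBuf(int fd) {
    int bytes = 4 * 1024 * 1024;                                // ask for 4MB
    setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes));
    // kernel doubles the value for bookkeeping overhead and caps it at rmem_max;
    // read back the effective size with getsockopt(SO_RCVBUF)
}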

Note a large TCP receive buffer is usually required for high-latency, high-bandwidth, high-volume connections, to cover the bandwidth-delay product. Low-latency systems should use smaller TCP buffers.

For high-volume multicast connections, you need large receive buffers to guard against data loss, since a UDP sender has no flow control to prevent receiver overflow.


/proc/sys/net/ipv4/tcp_window_scaling is a boolean configuration, turned on by default. With window scaling on, the AWS (advertised window size) limit rises to 1GB. If turned off, the AWS is constrained to the 16-bit window field in the TCP header, i.e. at most 65535.

I think this flag affects AWS and not receive buffer size.

  • if turned on, and if the buffer is configured to grow beyond 64KB, then the Ack can advertise an AWS above 64KB.
  • if turned off, then we don’t need a large buffer, since the AWS can only be 65535 or lower.



Ack in FIX^TCP

Does TCP require an Ack for every segment? Not individually; see cumulative acknowledgement in [[computerNetworking]]

FIX protocol standard doesn’t require Ack for every message.

However, many exchange specs require an Ack message for order submission or execution report. At a TrexQuant design interview, I dealt with this issue —

  • trader sends an order and expects an Ack
  • exchange sends an Ack and expects another Ack … endless cycle
  • exchange sends an exec report and expects an Ack
  • trader sends an Ack and waits for another Ack … endless cycle
  • what if at any point the trader crashes? The exchange would assume the exec report is never acknowledged

My design minimizes the duties of the exchange —

  • trader sends an order and expects an Ack
  • exchange sends an Ack and assumes it’s delivered
  • if the trader misses the Ack, she decides either to resend the order (with the PossResend flag) or to query the exchange
  • exchange sends an exec report and expects an Ack, proactively resending (with PossResend) when appropriate
  • trader sends an Ack and assumes it’s delivered
  • if the exchange doesn’t get any Ack, it forces a heartbeat with a TestRequest; if there’s still no response, the exchange assumes the trader is offline
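A toy sketch of the trader side of this design; Order, sendOrder() and awaitAck() are hypothetical names:

#include <chrono>

struct Order { bool possResend = false; /* ... */ };
void sendOrder(const Order&);                 // assumed transport helper
bool awaitAck(std::chrono::seconds timeout);  // assumed: blocks until Ack or timeout

void submitOrder(Order order) {
    sendOrder(order);
    if (!awaitAck(std::chrono::seconds(2))) {
        order.possResend = true;   // PossResend(97)=Y on the resent copy
        sendOrder(order);          // alternatively, query the exchange instead
    }
}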

y FIX needs session seq# over TCP seq#

My friend Alan said … Suppose your FIX process crashes or loses power, and reloads from disk the last sequence number received. It would then receive a live seq # higher than expected. CME documentation states:

… a given system, upon detecting a higher than expected message sequence number from its counterparty, requests a range of ordered messages resent from the counterparty.

A major difference from the TCP sequence number — FIX has no Ack. See the section “Ack in FIX^TCP” above.
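A minimal sketch of the gap-detection rule in the CME quote; sendResendRequest() is a hypothetical helper:

void sendResendRequest(int beginSeq, int endSeq);  // would emit ResendRequest(35=2)

int expectedSeq = 1;
void onIncomingFixMsg(int msgSeqNum) {
    if (msgSeqNum > expectedSeq)
        sendResendRequest(expectedSeq, msgSeqNum - 1);  // ask for the missing range
    // a seq# lower than expected (without PossDupFlag) is a serious session error
    expectedSeq = msgSeqNum + 1;
}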

— Sequence number reset policy:

After a logout, sequence numbers are supposed to reset to 1, but if the connection is terminated ‘non-gracefully’, sequence numbers continue when the session is restored. In fact, a lot of service providers never reset sequence numbers during the day. There are also some who reset sequence numbers once per week, regardless of logout.


FIX.5 + FIXT.1 breaking changes

I think many systems are not yet using FIX.5 …

  1. The FIXT.1 protocol [1] is, roughly, one subset of the FIX.4 protocol. It specifies (the traditional) FIX session maintenance over naked TCP
  2. The FIX.5 protocol is, roughly, the complementary subset of FIX.4. It specifies only the application messages, not the session messages.

Therefore,

  • FIX4 over naked TCP = FIX.5 + FIXT
  • FIX4 over non-TCP = FIX.5 over non-TCP

FIX.5 can use a messaging queue or web service as its transport, instead of FIXT

In FIX.5, the header now has 5 mandatory fields:

  • existing — BeginString(8), BodyLength(9), MsgType(35)
  • new — SenderCompId(49), TargetCompId(56)

Some applications also require MsgSeqNum(34) and SendingTime(52), but these requirements are unrelated to FIX.5

Note BeginString actually looks like “8=FIXT.1.1”
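An illustrative header (values hand-made; “|” stands for the SOH delimiter):

8=FIXT.1.1|9=72|35=A|49=TRADER1|56=EXCHG1|34=1|52=20230101-12:00:00|…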

[1] FIX Session layer will utilize a new version moniker of “FIXTx.y”


HFT mktData redistribution via MOM

Several practitioners say MOM is unwelcome due to added latency:

  1. The HSBC hiring manager Brian R was the first to point out to me that MOM adds latency. Their goal is to get the raw (market) data from producer to consumer as quickly as possible, with minimum stops in between.
  2. 29West documentation echoes: “Instead of implementing special messaging servers and daemons to receive and re-transmit messages, Ultra Messaging routes messages primarily with the network infrastructure at wire speed. Placing little or nothing in between the sender and receiver is an important and unique design principle of Ultra Messaging.”
  3. Then I found that the ICE/RTS systems (not ultra-low-latency) have no middleware between the feed parser and the order book engine (named Rebus).

However, HFT doesn’t always avoid MOM. P143 [[all about HFT]] (published 2010) says an HFT shop such as Citadel often subscribes to both individual stock exchanges and CTS/CQS [1], and multicasts the market data to other components of the HFT system. This design inherently has additional buffers: the first layer receives raw external data via a socket buffer, and the 2nd-layer components receive the multicast data via their own socket buffers.

[1] one key reason to subscribe redundant feeds — CTS/CQS may deliver a tick message faster!

Lehman’s market data is re-distributed over tibco RV, in FIX format.


IP (de)fragmentation #MTU,offset

A Trex interviewer said something questionable. I said fragmentation is done at the IP layer, and he said yes, but claimed reassembly is not.

I was talking about the IP layer breaking up, say, a 4KB packet (a TCP segment or UDP datagram) into three IP fragments no bigger than 1500B [1]. The reassembly task is to put all 3 fragments back together in sequence (and detect missing fragments) and hand the result over to TCP or UDP.

This reassembly is done in the IP layer. IP uses an “offset” field in each fragment to identify the sequencing and to detect missing fragments. The fragment with the highest offset has the more-fragments flag cleared, indicating it’s the last fragment of a given /logical/ packet.
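A worked example with assumed round numbers: a 4000-byte UDP payload over a 1500B MTU. A 20-byte IP header leaves 1480 data bytes per fragment, and offsets count 8-byte units:

  • fragment 1: data bytes 0-1479, offset 0, more-fragments flag set
  • fragment 2: data bytes 1480-2959, offset 185 (=1480/8), flag set
  • fragment 3: data bytes 2960-3999, offset 370, flag cleared (last fragment)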

Therefore, IP detects missing fragments and will never deliver a partial packet to UDP/TCP (P328 [[computer networking]]), even though IP is considered an unreliable service.

[1] MTU for some hardware is lower than 1500 Bytes …


tcp: detect wire unplugged

In general for either producer or consumer, the only way to detect peer-crash is by probing (eg: keepalive, 1-byte probe, RTO…).

  • The receiver generally doesn’t probe and will remain oblivious.
  • The sender will always notice a missing Ack. After retrying, the TCP module will give up and generate SIGPIPE.
scenario | visible symptom | retrans by sender | SIGPIPE
send/recv buffers full | 1-byte probe from sender triggers an Ack containing AWS=0 | yes | no
buffer full, then receiver crashes | the same Ack suddenly stops coming | yes | probably
receiver crashes, then sender has data to send | the very first expected Ack never comes | yes | yes
receiver crashes amid active transmission | Acks were coming in, then suddenly stop | yes | probably

Q20: if a TCP producer process dies After transmission, what would the consumer get?
AA: nothing. The receiver is ready to receive data, and has no idea that the sender has crashed.

Q21: if a TCP producer process dies During transmission, what would the consumer get?
%A: ditto. The receiver has to assume the sender stopped.

Q30: if a TCP consumer process dies during a quiet period After transmission, what would the producer get?
AA: P49 [[tcp/ip sockets in C]]: the sender doesn’t know right away. At the next send(), the sender will get -1 as the return value. In addition, SIGPIPE will be delivered, unless configured otherwise.
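A minimal sender-side sketch of this; fd is an assumed connected TCP socket:

#include <cerrno>
#include <cstddef>
#include <sys/socket.h>

void trySend(int fd, const char* buf, size_t len) {
    ssize_t n = send(fd, buf, len, MSG_NOSIGNAL);  // suppress SIGPIPE for this call
    if (n == -1 && (errno == EPIPE || errno == ECONNRESET)) {
        // peer is gone; note the first send() after the crash often "succeeds"
        // (data is buffered, peer replies RST) and only the next send() fails
    }
}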

Q30b: Is SIGPIPE generated immediately or after some retries?
AA: after some retries. The sender will notice a missing Ack, and RTO (retransmission timeout) will kick in.
%A: I feel the TCP module will not give up prematurely. Sometimes a wire is quickly restored during ftp, without error. If the wire remains unplugged, it looks exactly like a peer crash.

Q31: if a TCP consumer dies During transmission, what would the producer get?
%A: see Q30.

Q32: if a TCP consumer process dies some time after buffer full, what would the producer get?
%A: probably similar to the above, since the sender would send a 1-byte probe to trigger an Ack. Not getting the Ack tells the sender something. This probe is built-in and mandatory, but functionally similar to the (optional) TCP keepalive feature
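For comparison, enabling the optional keepalive on Linux (a sketch; timing values arbitrary):

#include <sys/socket.h>
#include <netinet/tcp.h>

void enableKeepalive(int fd) {
    int on = 1, idle = 30, intvl = 5, cnt = 3;
    setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on));
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle));    // probe after 30s idle
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)); // 5s between probes
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt));       // declare peer dead after 3 misses
}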

I never studied these topics but they are quite common.


Q: same local IP@2 worker-sockets: delivery to who

Suppose two TCP server worker-sockets both have the same local address and port 80. Both connections are active.

When a packet comes in addressed to that local IP:port, which socket will the kernel deliver it to? Not both.

Not sure about UDP, but a TCP connection is a virtual circuit or “private conversation”, so socket W1 knows its client’s IP:port. If the incoming packet’s source IP:port doesn’t match that, then this packet doesn’t belong to W1’s conversation.
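A sketch of the demux rule; listenFd is an assumed listening socket already bound to local port 80:

#include <sys/socket.h>
#include <netinet/in.h>

void acceptTwo(int listenFd) {
    int w1 = accept(listenFd, NULL, NULL);    // worker socket for client 1
    int w2 = accept(listenFd, NULL, NULL);    // worker socket for client 2
    sockaddr_in peer; socklen_t len = sizeof(peer);
    getpeername(w1, (sockaddr*)&peer, &len);  // client 1's ip:port
    getpeername(w2, (sockaddr*)&peer, &len);  // a different ip:port
    // both workers share local :80; the kernel routes each incoming segment
    // to the socket whose {local ip:port, remote ip:port} 4-tuple matches
}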