static local var^thread local var

Within a function f, a static local int is shared mutable across threads and unsafe!

Solution — declare the variable with thread local storage, so each thread get a distinct static local object, not shared.

Advertisements

no overflow]TCP slow receiver #non-blocking sender

Q: Does TCP receiver ever overflow due to a fast sender?

A: See http://www.mathcs.emory.edu/~cheung/Courses/455/Syllabus/7-transport/flow-control.html

A: should not. When the receiver buffer is full, the receiver sends ACK to informs the sender. If sender app ignores it and continues to send, then sent data will remain in the send buffer and not sent over the wire. Soon the send buffer will fill up and send() will block. On a non-blocking TCP socket, send() returns with error only when it can’t send a single byte. (UDP is different.)

Non-block send/receive calls either complete the job or returns an error.

Q: Do they ever return with part of the data processed?
A: Yes they return the number of bytes transferred. Partial transfer is considered “completed”.

 

UDP/TCP socket read buffer size: can be 256MB

For my UDP socket, I use 64MB.
For my TCP socket, I use 64MB too!

These are large values and required kernel turning. In my linux server, /etc/sysctl.conf shows these permissible read buffer sizes:

net.core.rmem_max = 268435456 # —–> 256 MB
net.ipv4.tcp_rmem = 4096   10179648   268435456 # —–> 256 MB

Note a read buffer of any socket is always maintained by the kernel and can be shared across processes [1]. In my mind, the TCP/UDP code using these buffers is kernel code, like hotel service. Application code is like hotel guests.

[1] Process A will use its file descriptor 3 for this socket, while Process B will use its file descriptor 5 for this socket.

c++template^java generics #%%take

A 2017 Wells Fargo interviewer asked me this question. There are many many differences. Here I list my top picks. I feel c# is more like java.

  1. (1st word indicates the category winner)
  2. C++ TMP is quite an advanced art and very powerful. Java generics is useful mostly on collections and doesn’t offer equivalents to most of the TMP techniques.
  3. java List<Student> and List<Trade> shares a single classfile, with uniform implementation of all the methods. In c++ there are distinct object files. Most of the code is duplicated leading to code bloat, but it also supports specialization and other features.
  4. java generics supports extends/super. C# is even “richer”. I think c++ can achieve the same with some of the TMP tricks
  5. c++ supports template specialization
  6. C++ wins — java doesn’t allow primitive type arguments and requires inefficient boxing. C# improved on it. This is more serious than it looks because most c++ templates use primitive type arguments.
  7. c++ supports non-dummy-type template param, so you can put in a literal argument of “1.3”
  8. c++ actual type argument is available at runtime. Java erases it, but I can’t give a concrete example illustrating the effect.

 

POSIX-sharedMem^SysV-sharedMem^MMF

http://www.boost.org/doc/libs/1_65_0/doc/html/interprocess/sharedmemorybetweenprocesses.html#interprocess.sharedmemorybetweenprocesses.sharedmemory.xsi_shared_memory points out

  • Boost.Interprocess provides portable shared memory in terms of POSIX semantics. I think this is the simplest or default mode of Boost.Interprocess. There are at least two other modes.
  • Unlike POSIX shared memory segments, SysV shared memory segments are not identified by names but by ‘keys’. SysV shared memory mechanism is quite popular and portable, and it’s not based in file-mapping semantics, but it uses special system functions (shmgetshmatshmdtshmctl…).
  • We can say that memory-mapped files offer the same interprocess communication services as shared memory, with the addition of filesystem persistence. However, as the operating system has to synchronize the file contents with the memory contents, memory-mapped files are not as fast as shared memory. Therefore, I don’t see any market value in this knowledge.

3rd effect@volatile ] java5

A Wells Fargo java interviewer said there are 3 effects. I named

  1. load/store to main memory on the target variable
  2. disable statement reordering

I think interviewer mentioned a 3rd effect about memory barrier.

This quote from Java Concurrency in Practice, chap. 3.1.4 may be relevant:

The visibility effects of volatile variables extend beyond the value of the volatile variable itself. When thread A writes to a volatile variable and subsequently thread B reads that same variable, the values of all variables that were visible to A prior to writing to the volatile variable become visible to B after reading the volatile variable. So from a memory visibility perspective, writing a volatile variable is like exiting a synchronized block and reading a volatile variable is like entering a synchronized block.

https://stackoverflow.com/questions/9169232/java-volatile-and-side-effects address my doubt about “other writes“. Nialscorva’s answer echoes the interviewer:

Before java 1.5, the compiler can reorder the two steps

  1. construction of the new object
  2. assigning the new address to the variable

In such a scenario, other threads (unsynchronized) can see the address in the variable and use the incomplete object, while the construction thread is preempted indefinitely like for 3 hours!

So in java 1.5, the construction is among the “other writes” by the volatile-writing thread! Therefore, the construction is flushed to memory before the address assignment. Below is my own solution, using a non-static volatile field:

public class DoubleCheckSingleton {
	private static DoubleCheckSingleton inst = null;
	private volatile boolean isConstructed = false;
	private DoubleCheckSingleton() {
		/* other construction steps */
		this.isConstructed = true; //last step
	}
	DoubleCheckSingleton getInstance() {
		if (inst != null && inst.isConstructed) return inst;
		synchronized(DoubleCheckSingleton.class) {
			if (inst != null && inst.isConstructed) return inst;
			
/**This design makes uses of volatile feature that's reputed to be java5
*
* Without the isConstructed volatile field, an incomplete object's 
* address can be assigned to inst, so another thread entering getInstance()
* will see a non-null inst and use the half-cooked object 😦
* 
* The isConstructed check ensures the construction has completed
*/
			return inst = new DoubleCheckSingleton();
		}
	}
}

Wells IV #socket/c++/threading

This is one of the longest tech interviews and one of the most enriching 🙂 even though I don’t “like” all their questions — very academic/theoretical, text-book driven, too low-level to be practical, not testing practical zbs.

—— C++ questions:
Q1: how can you avoid the cost of virtual function?
%%A: enum/switch or CRTP

Q1b: what’s CRTP

Q: what if your copy ctor signature is missing the “const”?
%%A: you can’t pass in temporaries or literal values. Such arguments must pass into “const &” parameter or “const” parameter. Correct.

Q: double delete?

Q: policy class? traits class?

Q: STL binders.. use case?

Q: how many types of smart pointers do you use?

Q: difference between java generics vs c++ templates?
%%A: type erasure. Java Compiler enforces many rules, but bytecode saves no info about the type argument, so we can get strange runtime errors.
%%A: template specialization. Template meta-programming
%%A (accepted): code bloat since each template instantiation is a separate chunk of code in the object file and in memory.
A: I have a dedicated blog post on this.

Q: what’s returned by a std::queue’s dequeue operation when it’s empty?
AA: undefined behavior, so we must check empty() before attempting dequeue

Q: why template specialization?
%%A: customize behavior for this particular vector since the standard implementation is not suitable.

Q: how do you implement a thread-safe c++singleton
A: not obvious. See concurrent lazy singleton using static-local var

Q12: in a simple function I have
vector v1 = {“a”, “b”}; vector v2 = v1; cout<<….
What happens to the ctor, dtor etc?
A: 2 std::strings constructed on heap, vector constructed on stack; 2nd vector copy-constructed on stack; 2 new strings constructed on heap; vector destructors deletes all four strings
A: the actual char array is allocated only once for each actual value, due to reference counting in the current std::string implementation.

Q12b: you mean it’s interned?

Coding question: implement void remove_spaces(char * s) //modify the s char array in place. See %%efficient removeSpaces(char*) #Wells

—— threading, mostly in java
Q: What are the problems of CAS solutions?
A: too many retries. CAS is optimistic, but if there are too many concurrent writes, then the assumption is invalid.
%%A: missed update? Not a common issue so far.

%%Q: Synchronized keyword usage inside a static method?
AA: you need be explicit about the target object, like synchronized(MyClass.class)

Q21: Name 3 effects of java volatile keyword — Advanced. See 3effects@volatile ] java5.

Q21b: analyze the double-checking singleton implementation.
staticInst = new Student(); // inside synchronized block, this can assign an incomplete object’s address to staticInst variable, which will be visible to an unsynchronized reader. Solution – declare staticInst as volatile static field

—— System programming (ANSI-C) questions
Q: have you used kernel bypass?

Q5: how do you construct a Student object in a memory-mapped-file?
%%A: placement new

Q5b: what if we close the file and map it again in the same process?
%%A: the ptr I got from malloc is now a dangling ptr. Content will be intact, as in Quest

Q5c: what if Student has virtual functions and has a vptr.
%%A: I see no problem.

Q: you mentioned non-blocking TCP send(), so when your send fails, what can you do?
%%A: retry after completing some other tasks.

Q4: why is UDP faster than TCP
%%A: no buffering; smaller envelopes; no initial handshake; no ack required

Q4b: why buffering in tcp?
%%A: resend

Q: can tcp client use bind()?
%%A: bind to a specific client port, rather than a random port

Q6: is there buffer overflow in TCP?
A: probably not

Q6b: what if I try to send a big file by TCP, and the receiver’s buffer is full? Will sender care about it? Is that reliable transmission?
A: Aquis sends 100MB file. See no overflow]TCP slow receiver #non-blocking sender

Q: in your ticking risk system, how does the system get notified with new market data?
%%A: we use polling instead of notification. The use case doesn’t require notification.

—— Stupid best-practice questions:
Q: what’s the benefit of IOC?

Q: Fitnesse, mock objects

Q6: motivation of factory pattern?
Q6b: why prevent others calling your new()?
%%A: there are too many input parameters to my ctor and I want to provide users a simplified façade
%%A: some objects are expensive to construct (DbConnection) and I need tight control.
%%A: similarly, after construction, I have some initialization in my factory, but I may not be allowed to modify the ctor, or our design doesn’t favor doing such things in the ctor. I make my factory users’ life easier if they don’t call new() directly.
%%A: I want to centralize the logic of how to decide what to return. Code reuse rather than duplication
%%A: I can have some control over the construction. I could implement some loose singleton

java lock: atomic^visibility

A typical lock() operation in a java method has two effects:

  1. serialized access — unless I release the lock, all other threads will be blocked grabbing this lock
  2. memory barrier — after I acquire the lock, all the changes (on shared variables) by other threads are now visible. The exactly details are to be detailed.

I now feel the 2nd effect is often more important and more tricky than the 1st effect. See P94 [[Doug Lea]] . I like this simple summary, even if not 100% complete —

“In essence, releasing a lock forces a flush of all writes from working memory….and acquiring a lock forces reload of accessible fields.”

Q: what are the subset of “accessible fields” in a class with 9 fields?
A: I believe the compiler knows “in advance” what subset of fields are accessed after lock acquisition.

Q: what if I acquire a lock, do nothing and release the lock? Is there the #2 effect?
A: I doubt it. You need to enclose all the “tricky” operations between a lock grab/release. If you leave some update (to a shared mutable) outside in the “cold”, then #1 will fail and #2 may also fail.

You may want to use “synchronized” on a simple getter, for the #2 effect.