coding practice ≈ yoga + weight loss

I had a brief discussion with my friend Ashish.. Coding test improvement is similar to weight or flexibility improvement. I also tell my son about diet control. You probably can think of many other examples.

  • we don’t notice any improvement for a long time. Still, I believe the practice makes me better than before.
    • Optimism and faith is important here. A pessimist tends to give up before seeing any ROI
  • when we stop the practice, we are likely to lose part of the effort and continue the downward spiral
  • as we grow older, we feel we tend to lose the competition, even though there’s not enough evidence.
  • it can feel boring and repetitive
  • pair exercise is easier
    • Otherwise, promise a friend
Advertisements

FIX^MOM: exchange connectivity

The upshot — some subsystem use FIX not MOM; some subsystem use MOM not FIX. A site could send FIX over MOM (such as RV) but not common.

It’s important to realize FIX is not a type of MOM system. Even though FIX uses messages, I believe they are typically sent over persistent TCP sockets, without middle-ware or buffering/queuing. RTS is actually a good example.

Q: does FIX have a huge message buffer? I think it’s small, like TCP

Say we (sell-side or buy-side) have a direct connectivity to an exchange. Between exchange and our exterior gateway, it’s all sockets and possibly java NIO — no MOM. Data format could be FIX, or it could be the exchange’s proprietary format — here’s why.

The exchange often has an internal client-server protocol — the real protocol and a data flow “choke point” . All client-server request/response relies solely on this protocol. The exchange often builds a FIX translation over this protocol (for obvious reasons….) If we use their FIX protocol, our messages are internally translated from FIX to the exchange proprietary format. Therefore it’s obviously faster to directly pass messages in the exchange proprietary format. The exchange offers this access to selected hedge funds.

As an espeed developer told me, they ship a C-level dll (less often *.so [1]) library (not c++), to be used by the hedge fund’s exterior gateway. The hedge fund application uses the C functions therein to communicate with the exchange. This communication protocol is possibly proprietary. HotspotFX has a similar client-side library in the form of a jar. There’s no MOM here. The hedge fund gateway engine makes synchronous function calls into the C library.

[1] most trader workstations are on win32.

Note the dll or jar is not running in its own process, but rather loaded into the client process just like any dll or jar.

Between this gateway machine and the trading desk interior hosts, the typical communication is dominated by MOM such as RV, 29West or Solace. In some places, the gateway translates/normalizes messages into an in-house standard format.

thread^process: lesser-known differences #kernel

  • To the kernel, there are man similarities between the “thread” construct vs the “process” construct. In fact, a (non-kernel) thread is often referenced as a LightWeightProcess in many kernels such as Solaris and Linux.
  • socket — threads in a process can access the same socket; two processes usually can’t access the same socket, unless … parent-child. See post on fork()
  • memory — thread AA can access all heap objects, and even Thread BB’s stack objects. Two process can’t share these, except via shared memory.
  • context switching — is faster between threads than between processes.
  • creation — some thread libraries can create threads without the kernel knowing. No such thing for a process.
  • a non-kernel thread can never exist without an owner process. In contrast, every process always has a parent process which could be long gone.

(This is largely a QQ question.  Some may consider it zbs.)

calling database access method while holding mutex

Hi XR,

Q: If some vendor provides a database access function (perhaps dbget()) that may be slow and may acquire some lock internally, is it a good idea to call this function while holding an unrelated mutex?

We spoke about this. Now I think this is so bad it should be banned. This dbget() could take an unknown amount of time. If someone deleted a row from a table that dbget() needs, it could block forever. The mutex you hold is obviously shared with other threads, so those threads would be starved. Even if this scenario happens once in a million times, it is simply unacceptable.

Here, I’m assuming the dbget() internal lock is completely unrelated to the mutex. In other words, dbget() doesn’t know anything about your mutex and will not need it.

As a rule of thumb, we should never call reach-out functions while holding a mutex. Reach-out functions need to acquire some shared resources such as a file, a web site, a web service, a socket, a remote system, a messaging system, a shared queue, a shared-memory mapping… Most of these resources are protected and can take some amount of time to become available.

(I remember there’s some guideline named open-call but I can’t find it.)

That’s my understanding. What do you think?

## proliferation -> consolidation.. beware of churn

This is an extension of my 2015 blog post https://bintanvictor.wordpress.com/2015/03/31/some-of-the-worst-technology-churns-letter-to-tanko/

Imagine you spent months of serious personal effort [1] learning to use, debug, and tune, say, MongoDB but after this project you only find projects that need just superficial Mongo knowledge. Developer time-investment has no recurring return. I think this is widespread in tech: A domain heats up attracting too many players creating competing products with varying degrees of similarity. We wish these products are mostly similar so we developers’ time investment can help us more than once, rather than learn-n-forget like 狗熊掰棒子. Often it takes decades to see some consolidation among the competitors, when most of them drop out of the race and we one player emerges dominant, or a common standard [2] is adopted with limited vendor extensions.

Therefore I see two phases : Proliferation -> Consolidation. The churn in the early phase represents a hazardous pitfall.

If we invest too much learning effort there we get burned.

  • Javascript packages
  • JVM languages — javascript, Scala, Groovy, jython
    • I don’t even know which company uses them
  • ORM and database access “frameworks”–ADO.net, LINQ, EntityFramework, SpringJDBC,  iBatis,
  • Data Grid and NoSQL — Terracotta, Hazelcast, Gigaspace, Gemfire, Coherence, …
  • MOM — tibco, solace, 29west, Tervela, zeroc
  • machine learning?
  • web app languages
    • php, perl and the LAMP stack
    • Javascript MEAN stack
    • ASP and the Microsoft stack
    • Java stack

[1] You could have spent the time on personal investment, or something else if not required by the project.

[2] Some positive examples of standardization —

  1. RDBMS vendors
  2. Unix vendors
  3. c++ vendors — mostly GCC vs Microsoft VC++
  4. Java IDEs; c++/java/c# debuggers
  5. cvs, svn, git

A few development technologies “free” of proliferation pains —

  1. socket and system programming — complexities are low level and in C not c++
  2. core java
  3. SQL
  4. c/c++
  5. Unix know-how for testing, investigation, devops and process management
    1. shell scripting,
    2. regular expressions
    3. searching

online solutions to algo Q: I always try myself 1st #XR

https://bintanvictor.wordpress.com/2018/04/07/check-array-of-0-to-n-nasdaq/ is the Nasdaq question. It includes a link to my solution in github. I came up with my own solution. Then I tried your solution found online.

https://bintanvictor.wordpress.com/2018/04/10/check-array-can-be-preorder-bst-walk/ is another interview question. I tried my own and came up with a “better” solution.

If I don’t try my own solution, I would not learn a lot, and I would not have fun implementing my own idea (even if not optimal).

Solving problems myself also grows my confidence when facing unfamiliar problems. I do read online solutions when I have no clue.

By the way, one advantage of my solution is — it can identify the missing numbers, all within O(N) time and O(1) space.

make_shared() cache efficiency, forward()

This low-level topic is apparently important to multiple interviewers. I guess there are similarly low-level topics like lockfree, wait/notify, hashmap, const correctness.. These topics are purely for theoretical QQ interviews. I don’t think app developers ever need to write forward() in their code.

https://stackoverflow.com/questions/18543717/c-perfect-forwarding/18543824 touches on a few low-level optimizations. Suppose you follow Herb Sutter’s advice and write a factory accepting Trade ctor arg and returning a shared_ptr<Trade>,

  • your factory’s parameter should be a universal reference. You should then std::forward() it to make_shared(). See gcc source code See make_shared() source in https://gcc.gnu.org/onlinedocs/libstdc++/libstdc++-api-4.6/a01033_source.html
  • make_shared() makes one allocation for a Trade and an adjacent control block, with cache efficiency — any read access on the Trade pointer will cache the control block too
  • if the arg object is a temp object, then the rvr would be forwarded to the Trade ctor. Scott Meryers says the lvr would be cast to a rvr. The Trade ctor would need to move() it.
  • if the runtime object is carried by an lvr (arg object not a temp object), then the lvr would be forwarded as is to Trade ctor?

Q: What if I omit std::forward()?
AA: Trade ctor would receive always a lvr. See ScottMeyers P162 and my github code

https://github.com/tiger40490/repo1/blob/cpp1/cpp1/rvrDemo.cpp is my experiment.

 

deadlock avoidance: release all locks fail-fast

In some deadlock avoidance algorithms, at runtime we need to ensure we immediately release every lock already acquired, without hesitation, so to speak. Here’s one design my friend Shanyou shared with me.

  • Tip: limit the number of “grabbing” functions, functions that explicit grab any lock.

Suppose a grabbing function f1 acquires a lock and calls another function f2 (a red flag to concurrency designers). In such a situation, f2() will use -1 return value to indicate “must release lock”. If f2() calls a grabbing function f3(), and if f3() fails to grab, it propagates “-1” to f2 and f1.

  1. f3() could use trylock.
  2. f3() could check lock hierarchy and return -1.

which thread/pid drains NicBuffer->socketBuffer

Too many kernel concepts. I will use a phrasebook format. I have also separated some independent tips into hardware interrupt handler #phrasebook

  1. Scenario 1 : A single CPU. I start my parser which creates the multicast receiver socket but no data coming. My pid111 gets preempted. CPU is running unrelated pid222 when data /wash up/.
  2. Scenario 2: pid111 is running handleInput() while additional data comes in on the NIC.
  • context switching — to interrupt handler (i-handler). In all scenarios, the running process gets suspended to make way for the interrupt handler function. I-handler’s instruction address gets loaded into the cpu registers and it starts “driving” the cpu. Traditionally, the handler used the suspended process’s existing stack.
    • After the i-handler completes, the suspended “current” process resumes by default. However, the handler may cause another pid to be scheduled right away [1 Chapter 4.1].
  • no pid — interrupt handler execution has no pid, though some authors say it runs on behalf of the suspended pid. I feel the suspended pid may be unrelated to the socket, rather than the socket’s owner process (pid111).
  • kernel scheduler — In Scenario 1, pid111 would not get to process the data until it gets in the “driver’s seat” again. However, the interrupt handler could trigger a rescheduling and push pid111 “to the top” so to speak. [1 Chapter 4.1]
  • top-half — drains the tiny NIC buffer into main memory as fast as possible [2]
  • bottom-half — (i.e. deferrable functions) includes lengthy tasks like copying packets. Deferrable function run in interrupt context [1 Chapter 4.8], so there’s no pid
  • sleeping — the socket owner pid 111 would be technically “sleeping” in the socket’s wait queue initially. After the data is copied into the socket receive buffer, I think the kernel scheduler would locate pid111 in the socket’s wait queue and make pid111 the driver. Pid111 would call read() on the socket.
    • wait queue — How the scheduler does it is non-trivial. See [1 Chapter 3.2.4.1]
  • burst — What if there’s a burst of multicast packets? The i-handler would hog or steal the driver’s seat and /drain/ the NIC buffer as fast as possible, and populate the socket receive buffer. When the i-handler takes a break our handleInput() would chip away at the socket buffer.
    • priority — is given to the NIC’s interrupt handler, since we have a single CPU.

Q: What if the process scheduler wants to run while i-handler is busy draining the NIC?
A: Well, all interrupt handlers can be interrupted, but I would doubt the process scheduler would suspend the NIC interrupt handler.

One friend said the pid is 1, the kernel process.

[1] [[UnderstandingLinuxKernel, 3rd Edition]]

[2] https://notes.shichao.io/lkd/ch7/#top-halves-versus-bottom-halves

if thread fails b4 releasing mutex #CSY

My friend Shanyou asked:

Q: what if a thread somehow fails before releasing mutex?

I see only three scenarios:

  • If machine loses power, then releasing mutex or not makes no difference.
  • If process crashes but the mutex is in shared memory, then we are in trouble. The mutex will be seen as forever in-use. The other process can’t get this mutex. I feel this could be a practical problem, with practical solutions like reboot or process restart.
  • If process is still alive, I rely on stack unwinding.

Stack unwinding is set up by compiler. The only situation when this compiler-generated stack unwinding is incomplete is — if the failing function is declared noexcept. (In such a case, the failure is your self-inflicted problem since you promised to compiler it should never throw exception.) I will assume we don’t have a noexcept function. Therefore, I assume stack unwinding is robust and all stack objects will be destructed.

If one of the stack objects is a std::unique_lock, then compiler guarantees an unlocked status on destruction. That’s the highest reliability and reassurance I can hope for.