analyzing my perception of reality

Using words and numbers, I am trying to “capture” my perceptions (intuitions + observations + a bit of insight) of the c++/java job market trends, past and future. There’s some reality out there, but each person, including the expert observer, has only a limited view of that reality, based on limited data.

Those numbers look impressive, but they are actually similar to the words — mostly personal perceptions dressed up as objective measurements.

If you don’t use words or numbers then you can’t capture any observation of the “reality”. Your impression of that reality [1] remains hopelessly vague. I now believe vagueness is the lowest level of comprehension, usually as bad as a biased comprehension. Using words + numbers we have a chance to improve our perception.

[1] (without words you can’t even refer to that reality)

My perceptions shape my decisions, and my decisions affect my family’s life chances.

My perceptions shape my selective listening. Gradually and actively, that selective listening modifies the very “membrane” through which I listen! All great thinkers and writers update their membrane.

I am not analyzing reality. Instead, I am basically analyzing my perception of that reality, but that’s the best I can do. I’m good at analyzing myself as an object.

Refusing to plan ahead because of high uncertainty is lazy, is pessimistic, is doomed.

CPU run-queue #java perspective

— Mostly based on Charlie Hunt’s [[JavaPerf]] P28

Runtime.availableProcessors() returns the count of virtual processors, i.e. the count of hardware threads. This is an important number for CPU tuning and bottleneck analysis.

When the run-queue depth exceeds 4 times the processor count, the host system becomes visibly slow (presumably due to excessive context switching). For a host dedicated to a JVM, this is a second cause of CPU saturation. The first cause is high CPU usage, which can become high even with a single CPU-hog.

Note the run-queue depth is the first column (“r”) in vmstat output.
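For comparison (a C++ aside of my own, not from Charlie Hunt): std::thread::hardware_concurrency() plays the same role as Runtime.availableProcessors(), and the 4x rule above can be turned into a quick threshold check.

    #include <iostream>
    #include <thread>

    int main() {
        // C++ counterpart of Java's Runtime.availableProcessors():
        // count of hardware threads (may report 0 if unknown).
        unsigned hw = std::thread::hardware_concurrency();

        // Heuristic from above: a run-queue depth beyond 4x the processor
        // count suggests the host is saturated by context switching.
        std::cout << "hardware threads: " << hw
                  << ", run-queue warning threshold: " << 4 * hw << '\n';
    }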

suffix array of a haystack #[[Pearls]]

[[ProgrammingPearls]] P172 is concise with a single-sentence definition — “Initialize an array of pointers to every character (or word) in your text, sort them, and you have a suffix array”

For a usage example, see find All hiding places of needle]haystack #suffixArray

— LCP array + suffix array

The enhanced suffix array comes with additional tables that reproduce the full functionality of suffix trees. The basic suffix array is a construct simplified from a suffix tree.

  • Suffix table — stores the starting position of each suffix of the haystack, listed in lexicographic (dictionary) order.
  • LCP table — stores the length of the longest common prefix between each pair of consecutive suffixes in that sorted order.

Each array element describes one suffix.

Both are integer arrays of length N (the haystack length). The LCP array saves prefix-match lengths; the suffix array saves head positions (offsets into the haystack string).

By definition, the LCP is always “helper” for the suffix array. In contrast, the suffix array can be useful by itself.
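A minimal C++ sketch of the two tables, using the naive sort-all-suffixes construction from the one-sentence definition above (function names are mine; production code would use an O(N log N) builder):

    #include <algorithm>
    #include <numeric>
    #include <string>
    #include <vector>

    // Suffix array: starting positions of all suffixes, sorted lexicographically.
    std::vector<int> buildSuffixArray(const std::string& haystack) {
        std::vector<int> sa(haystack.size());
        std::iota(sa.begin(), sa.end(), 0);               // 0,1,2,...,N-1
        std::sort(sa.begin(), sa.end(), [&](int a, int b) {
            return haystack.compare(a, std::string::npos,
                                    haystack, b, std::string::npos) < 0;
        });
        return sa;
    }

    // LCP[i] = length of the longest common prefix of the suffixes at
    // sa[i-1] and sa[i]; LCP[0] is conventionally 0.
    std::vector<int> buildLcpArray(const std::string& haystack,
                                   const std::vector<int>& sa) {
        std::vector<int> lcp(sa.size(), 0);
        for (size_t i = 1; i < sa.size(); ++i) {
            size_t a = sa[i - 1], b = sa[i], len = 0;
            while (a + len < haystack.size() && b + len < haystack.size()
                   && haystack[a + len] == haystack[b + len])
                ++len;
            lcp[i] = static_cast<int>(len);
        }
        return lcp;
    }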

TCP_NODELAY to improve latency #lower efficiency

https://www.extrahop.com/company/blog/2016/tcp-nodelay-nagle-quickack-best-practices/#3

https://stackoverflow.com/questions/3761276/when-should-i-use-tcp-nodelay-and-when-tcp-cork

The default Nagle’s algo helps in applications like telnet. However, it may increase latency when sending streaming data.

For interactive applications or chatty protocols with a lot of handshakes, such as SSL, Citrix and Telnet, Nagle’s algorithm can cause a drop in performance, whereas enabling TCP_NODELAY can improve latency at the expense of efficiency, as briefly mentioned in the 2011 white paper@high-perf messaging.

In such cases, disabling Nagle’s algorithm is a better option. Enabling the TCP_NODELAY option disables Nagle’s algorithm.

NoDelay means noNagle.
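A minimal POSIX sketch (the helper name is mine): setting TCP_NODELAY on a socket turns Nagle off.

    #include <netinet/in.h>   // IPPROTO_TCP
    #include <netinet/tcp.h>  // TCP_NODELAY
    #include <sys/socket.h>   // setsockopt

    // Disable Nagle's algorithm on an already-created TCP socket.
    // Returns 0 on success, -1 on error (errno set), like setsockopt itself.
    int enableNoDelay(int sockFd) {
        int flag = 1;  // non-zero turns the option on
        return setsockopt(sockFd, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(flag));
    }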

 

reference(instead of ptr) to smart ptr instance

I usually pass smart pointers by value (copy-constructor or move-constructor), just like copying a raw ptr.  Therefore the code below looks unnatural:

unique_ptr<Trade> & ref2smartPtr

Well, my pass-by-clone (“pbclone”) intuition was incorrect. Actually, passing a reference to a smart pointer is rather common because

  • As Herb Sutter suggested, when we need to put pointers into containers, we should avoid raw ptr. unique_ptr is the default and first choice, followed by shared_ptr.
  • I often use unique_ptr as a map value. The operator[] return type is a reference to the mapped type, i.e. a reference to the unique_ptr (see the sketch after this list).
  • I may need to put unique_ptr into a vector…. ditto for vector’s operator[]
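A small sketch of the map case (Trade is the type from the snippet above; everything else is illustrative):

    #include <map>
    #include <memory>
    #include <string>

    struct Trade { double price = 0; };

    int main() {
        std::map<std::string, std::unique_ptr<Trade>> trades;
        trades["IBM"] = std::make_unique<Trade>();

        // operator[] returns a reference to the mapped type, i.e. a
        // reference to the unique_ptr -- not a copy (unique_ptr is move-only).
        std::unique_ptr<Trade>& ref2smartPtr = trades["IBM"];
        ref2smartPtr->price = 101.5;
    }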

[18] G4 IV(!! GTD)domains 2 provide 20Y job security

See also

Let’s ignore zbs or GTD or biz domains like mktData/risk here …

  • –roughly ranked by value-to-me
  • [c s] java? resilient in the face of c# and dynamic languages. At least 10Y relevance.
  • [c s] c++? resilient in the face of java. Time-honored like SQL
  • [c] abstract algorithm and data structures, comp science problem solving
  • [c n] tcp/udp optimization + other hardware/kernel/compiler optimizations
  • ……….No more [c]
  • py + shell scripting? no [c] rating since depth unappreciated
  • Linux and windows? at least 10Y growth, but no [c]
  • [s] SQL? resilient in the face of noSQL, but no [c]
  • bond math?
  • [n s] FIX? At least 10Y relevance
  • [c=high complexity in IV; shelf-life; depth appreciated …]
  • [n=niche, but resilient]
  • [s=survived serious challenges]

de-multiplex by-destPort: UDP ok but insufficient for TCP

When people ask me the purpose of the port number in networking, I used to say it helps demultiplexing. Now I know that’s true for UDP, but TCP uses more than the destination port number.

Background — Two processes X and Y on a single-IP machine need to maintain two private, independent ssh sessions. The incoming packets need to be directed to the correct process, based on the port numbers of X and Y… or is it?

If X is sshd with a listening socket on port 22, and Y is a forked child process from accept(), then Y’s “worker socket” also has local port 22. That’s why in our linux server, I see many ssh sockets where the local ip:port pairs are indistinguishable.

TCP demultiplexing uses not only the local ip:port but also the remote (i.e. source) ip:port. Demultiplexing also considers wildcards.

|                                                           | TCP                                      | UDP                          |
|-----------------------------------------------------------|------------------------------------------|------------------------------|
| socket has local IP:port                                  | yes                                      | yes                          |
| socket has remote IP:port                                 | yes                                      | no such thing                |
| 2 sockets with same local port 22, in two processes       | allowed (e.g. sshd listener + worker)    | not allowed                  |
| 2 sockets with same local port 22, in one process         | allowed                                  | not allowed                  |
| 2 msgs with same dest ip:port but different source ports  | addressed to 2 sockets (2 ssh sessions)  | addressed to the same socket |
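A conceptual C++ sketch of the two demultiplexing keys (my own illustration, not how the kernel actually stores its tables):

    #include <cstdint>
    #include <map>
    #include <tuple>

    // UDP can demultiplex on the local (destination) ip:port alone ...
    using UdpKey = std::tuple<uint32_t /*localIp*/, uint16_t /*localPort*/>;

    // ... whereas TCP keys on the full 4-tuple, so two established ssh
    // connections can both use local port 22 yet map to different sockets.
    using TcpKey = std::tuple<uint32_t /*localIp*/,  uint16_t /*localPort*/,
                              uint32_t /*remoteIp*/, uint16_t /*remotePort*/>;

    std::map<UdpKey, int /*socket fd*/> udpTable;
    std::map<TcpKey, int /*socket fd*/> tcpTable;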

long term planning can be demoralizing

My father often tells me I plan ahead too much…

Q: where will I be, what job will I have 5 years from now?

Such questions can be demoralizing and sometimes can dampen a precious spirit of optimism. I sometimes perform better by focusing on here and now.

I think the reality may be quite bland and uninspiring — same job, with declining income, not much “offensive” to mount …

## notable linux system calls: QQ question

https://syscalls.kernelgrok.com can sort the syscalls by syscall number

http://asm.sourceforge.net/syscall.html is ordered by syscall number

  • fork()
  • waitpid() — a tiny fork/waitpid sketch follows this list
  • open() close() read() write()
  • –socket
  • socket() connect() accept()
  • recvfrom() sendto()
  • shutdown() is for socket shutdown and is more granular than the generic close()
  • select()
  • epoll family
  • –memory
  • brk
  • mmap
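A tiny sketch exercising a few of the calls above (fork, write, waitpid); error handling omitted for brevity:

    #include <sys/wait.h>   // waitpid
    #include <unistd.h>     // fork, write
    #include <cstdio>

    int main() {
        pid_t child = fork();                // duplicate the process
        if (child == 0) {
            const char msg[] = "hello from child\n";
            write(1, msg, sizeof(msg) - 1);  // write straight to stdout (fd 1)
            _exit(0);
        }
        int status = 0;
        waitpid(child, &status, 0);          // reap the child
        std::printf("child exited with %d\n", WEXITSTATUS(status));
    }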

##shared_ptr thr-safety: 3 cases

This topic is worthwhile as it is about two high-value topics … threading + smart ptr

At least 3 interviewers pointed out thread safety issues …

http://stackoverflow.com/questions/14482830/stdshared-ptr-thread-safety first answer shows many correct and incorrect MT usages. Looking at that answer, I see at least 3 distinct “objects” that could be “shared-mutable” (sketched after this list):

  1. control block — shared by all the shared_ptr copies (“club members”) that co-own the pointee, such as global_instance below. Concurrent access to the control block (reference counting) is managed internally by the shared_ptr implementation and is thread-safe.
  2. pointee on heap — shared and mutable. If 2 threads call mutator methods on this object, you can hit a race condition.
  3. global_instance variable — a shared, mutable instance of shared_ptr itself. Unsynchronized concurrent access to this one shared_ptr object is a race condition 😦
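A small sketch of the three cases (Trade and the thread bodies are my own illustration, loosely following that stackoverflow answer):

    #include <memory>
    #include <thread>

    struct Trade { int qty = 0; };

    std::shared_ptr<Trade> global_instance = std::make_shared<Trade>();

    int main() {
        // Case 1 (safe): each thread copies the shared_ptr into a local copy.
        // Only the control block's reference count is touched concurrently,
        // and the implementation manages that internally (thread-safe).
        std::thread t1([] { auto local = global_instance; (void)local; });
        std::thread t2([] { auto local = global_instance; (void)local; });
        t1.join();
        t2.join();

        // Case 2 (race on the pointee): two threads running
        //     global_instance->qty++;
        // concurrently would be a plain data race on the Trade object.

        // Case 3 (race on the shared_ptr object itself): one thread doing
        //     global_instance = std::make_shared<Trade>();
        // while another copies global_instance is a race, unless both sides
        // are synchronized (a mutex, or the atomic shared_ptr operations).
    }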


9tips: hacker rank test #cout-macro

similar to Codility…

  • I may need a macro for cout, so I can disable all cout quickly before uploading my source.
    • #define ss if(1>2)cout //1>0
  • I keep my own container data dumper function for instrumentation (see the sketch at the end of this post).
  • you can submit multiple times. So as soon as you can pass some tests, please submit.
  • 🙂 You can look at multiple questions at the same time, so feel free to skip tough or time-consuming questions.
  • You absolutely need your own local compiler and edit-compile-test (ECT) environment.
    • I put a small dos window (for g++ and a.exe) over a notepad++ editor. I use F2 to save.

The online editor is too slow and provides only a single screen. You can copy-paste code from your local IDE, but you can’t copy-paste test data. You could try downloading the test data, but you need to be really efficient there. Setting up a test can take 5 minutes out of an average 20 minutes per question.

  • There could be hidden test cases. Don’t worry too much about them. It’s a real achievement if you can pass all the visible test cases.
  • Ignore integer overflow. If there’s a hidden test for it, you will see the failure
  • Don’t ever worry about minor inefficiency. You won’t have the time. Just get it to work first.
  • pass by clone by default. Easier to debug
  • avoid long variable names. Use std abbreviations like li, vec, ma (for map)
  • I maintain many (unnecessary) global variables of generic names like i1, s2
  • 😦 lost connectivity? timer keeps ticking
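A sketch of the cout switch plus a (hypothetical) dumper, the way I use them together:

    #include <iostream>
    #include <vector>
    using namespace std;

    // The on/off switch from the tip above: flip 1>2 to 1>0 to re-enable output.
    #define ss if (1 > 2) cout

    // A hypothetical container dumper for quick instrumentation.
    template <typename C>
    void dump(const C& c) {
        for (const auto& x : c) ss << x << ' ';
        ss << endl;
    }

    int main() {
        vector<int> vec{3, 1, 4};
        ss << "debug: starting" << endl;  // silently discarded while 1>2
        dump(vec);
    }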

 

##respect,$,familyTime,spareTime..all benefit from my strengths

If I take a job that plays to my strengths, I kind of sacrifice my muscle building and TrySomethingNew, so I’d better get good money …, but now I feel I don’t have to.

A typical strengths-based job would use (the most complete list):

  • math, analytics
  • high volume data processing
  • unix wizardry
  • heavy text processing
  • heavy SQL
  • heavy scripting
  • some http programming
  • data analysis, perhaps using SQL, scripting etc
  • some combo of java/c++/c#/swing
  • threading