[17]MS java threading IV#phone

See also https://bintanvictor.wordpress.com/2009/03/21/realtime-inter-vm-communication-in-front-desk-trading-sys/

These are in the QQ category i.e. skills required for QnA IV only.

Q1: 3 threads to print the numbers 1,2,3,4,5… in deterministic, serial order. Just like single-threaded.
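Here is a minimal Java sketch of one classic answer (my own illustration): a shared monitor plus a turn counter, where thread k prints only when (next-1) % 3 == k.

public class SerialPrinter {
    private static final Object LOCK = new Object();
    private static int next = 1;        // next number to print
    private static final int MAX = 15;  // print 1..15 for the demo

    public static void main(String[] args) {
        for (int id = 0; id < 3; id++) {
            final int myTurn = id;      // T0 prints 1,4,7..; T1 prints 2,5,8..; T2 prints 3,6,9..
            new Thread(() -> {
                while (true) {
                    synchronized (LOCK) {
                        while (next <= MAX && (next - 1) % 3 != myTurn) {
                            try { LOCK.wait(); } catch (InterruptedException e) { return; }
                        }
                        if (next > MAX) { LOCK.notifyAll(); return; }
                        System.out.println("T" + myTurn + " prints " + next);
                        next++;
                        LOCK.notifyAll();
                    }
                }
            }).start();
        }
    }
}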

Q1b: what if JVM A has T1, T2, and JVM B has T3? How do they coordinate?
%%A: in C++ shared memory is the fastest IPC solution for large data volume. For signaling, perhaps a semaphore or named pipe
%%A: I feel the mutex is probably a kernel object, accessible by multiple processes.

On Windows, mutex, wait handle, … are all accessible cross-process, but java (on Windows or unix) is designed differently and doesn’t have these cross-process synchronization devices.

%%A: use a database table with one row one column. Database can notify a JVM.
AA: The java.nio.file package provides a file change notification API, called the Watch Service API. The registered JVM has a thread dedicated to watching.
AA: in java, the JDK Semaphore is NOT a wrapper of the operating-system semaphore, so it is not usable for IPC.
A: java Semaphore? Probably not an IPC construct in java.

Q2: have you used any optimized Map implementations outside the JDK?

Q3: to benchmark your various threading solutions how do you remove the random effects of GC and JIT compilation?
%%A: allocate enough memory to avoid GC. Either turn off JIT, or give the JVM a warm-up period to complete the JIT compilation of every code path before we start the benchmark.
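A hedged sketch of the warm-up idea (my own toy harness, not a real benchmark framework such as JMH):

public class WarmupBench {
    static long workload() {            // the code path we want to time
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) sum += i;
        return sum;
    }
    public static void main(String[] args) {
        for (int i = 0; i < 20_000; i++) workload();  // warm-up: let the JIT compile the hot path
        long t0 = System.nanoTime();
        workload();                                    // the measured run
        System.out.println((System.nanoTime() - t0) + " ns");
        // For GC noise: give the heap generous headroom (-Xmx) or log GC to confirm none occurred;
        // -Xint forces interpreted-only mode if you want to rule out JIT entirely.
    }
}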


[17] 5 thrusts/directions over next5-10Y]U.S.

1) buy 1st home as soon as financially feasible. Before that consider REITs.

2) shift more focus to academic parenting

3) start PhD if everything works out. Will pave the way to a research/teaching career till age 75

4) consider some Chinese-language-teaching business for wife

5) may need to change gear to a relaxed job, but maintain competitiveness on job market (I didn’t say “on the job”)

I feel my health and job market competitiveness (#5) are the foundation.

ms-outlook personal-folders — 2 instances running

There’s no workaround — having 2 instances (ie on 2 machines) of outlook running will inevitably mess up your personal folders.

Symptoms? all email-rules using personal folders will lose the folder names.

What can users do? Not much. If you must run 2 instances, then try to disable the rules that use personal folders.

Location of the personal folders doesn’t matter, even if located on a shared drive.

ms-outlook mailbox folder size limit

A corporate Exchange server typically allocates 100MB for your account, covering
– data in your calendar
– data in your sent items

NOT covering
– archive folders (usually on your PC)
– personal folders (usually on your PC)

— To investigate
Choose your mailbox (or archive / personal folders) -> right-click -> properties -> folder size
Choose your mailbox (or archive / personal folders) -> right-click -> properties -> Advanced. You can see storage locations.

— to “shrink” your data
choose your mailbox -> right click -> advanced find -> more choices -> size greater than 999KB -> advanced tab -> choose Received -> on or before -> 2008/12/31 or 2008-12-31 formats

You can
– remove attachments and add a comment [attachment removed tanbin 9feb]
– permanently delete them or
– move them into personal folders on your PC

noSQL landscape is fragmented;SQL is standardized

Reality — Many people spend more time setting up SQL infrastructure than writing queries. Set-up includes integration with a host application. They then use _simple_ queries including simple joins, as I saw in ALL my jobs except GS.

The advanced SQL programmers (like in GS) specialize in joins, stored procs, table and index designs. For the large tables in PWM, every query needs to be checked. By default they will run forever. In terms of complex joins, a lot of business logic is implemented in those joins.

Good thing is, most of this skill is portable and long-lasting, based on a consistent and standard base technology.

Not the case with noSQL. I don’t have real experience, but I know there are a few distinct types such as distributed hashmap, document stores (mongo) and columnar. So if there’s a query language it won’t have the complexity of SQL joins. Without the complexity it can’t implement the same amount of business logic. So GS type of SQL expertise is not relevant here.

SQL is slow even if completely in-memory

Many people told me flat-file data store is always faster than RDBMS.

For writing, I guess one reason is — transactions. RDBMS may have to populate the redo log etc.

For reading, I am less sure. I feel noSQL (including flat-file) simplifies the query operation and avoids scans or full-table joins. So is that faster than an in-memory SQL query? Not sure.

Instead of competing with RDBMS on the traditional game, noSQL products change the game and still get the job done.

44tasks@array,str,dict ] algoIV

See also ##9dataStruct(+..)for TIMED c++cod`IV

See also EPI300

With STL, py, java, c#, and every other language, we need to develop fast fingers (ECT) over these basic tasks. Muscle memory is best.

  1. custom comparator for a payload Trade class; Specify it in vector sorting or BST
    • custom hashcode is only popular in Java algoIV
  2. — custom tree/link node structs
  3. declare node class
  4. initialize a bunch of nodes
  5. — #1 vector of int (vector of other data type is rarely quizzed):
  6. populate with hard-coded data
  7. populate with default values to a fixed size
  8. copy the container — py/java require explicit ctor
  9. join 3 vectors
  10. slice — harder for c++
  11. max
  12. truncate
  13. reverse-copy and reverse-in-place
  14. append, prepend, insert
  15. binary search, lower_bound
  16. dump, iterate, size
  17. — #2 string:
  18. convert between string and vector<char>
  19. initialize with content
  20. prepend, append, trim
  21. insert_by_pos — not easy
  22. replace_substr
  23. read-write char by index
  24. reverse-copy
  25. successive find of target string from left (or from right)
  26. comp, sort among strings
  27. split at custom delimiter
  28. join 3 strings
  29. search
  30. substr
  31. dump, iterate each char, size
  32. — #3 lookup (usually dict or std::map)
  33. populate with hard-coded data
  34. strictly insert
  35. strictly update
  36. lookup, possibly with failure
  37. dump, iterate, size
  38. — tree map: lower_bound
  39. — Queue: deque, enque, front, back,
  40. — Stack: push, pop, top
  41. — linked list? no special tasks
  42. — misc essential iterator tasks
  43. prev(), next(),
  44. distance()

 

##which common UDP/TCP functions are blocking

Q: which blocking functions support a timeout?
A: All

A non-blocking send fails when it can’t send a single byte, usually because the local send buffer (for TCP) is full. For UDP, see [4]

Note read() and write() have similar behavior. See send()recv() ^ write()read() @sockets and http://www.mathcs.emory.edu/~cheung/Courses/455/Syllabus/7-transport/flow-control.html

[1] meaning of non-blocking connect() on TCP is special. See P51 [[tcp/ip sockets in C]] and https://www.scottklement.com/rpg/socktut/nonblocking.html
[2b] accept() returning with timeout is obscure knowledge — see accept(2) manpage
[2c] accept() is designed to return immediately on a nonblocking socket — https://stackoverflow.com/questions/12861956/is-it-possible-and-safe-to-make-an-accepting-socket-non-blocking and https://www.scottklement.com/rpg/socktut/nonblocking.html
[3] select() on a non-blocking socket is obscure knowledge —  See https://www.scottklement.com/rpg/socktut/nonblocking.html
[4] UDP has no send buffer but some factors can still cause blocking … obscure knowledge
[5] close() depends on SO_LINGER and non-blocking by default … https://stackoverflow.com/questions/6418277/how-to-close-a-non-blocking-socket
[6] MSG_DONTWAIT flag

t/out? | function                 | default behavior    | flags arg to func    | fcntl on entire socket | touching TCP buffers?
y      | select                   | blocking            | not supported?       | still blocking! [3]    | no
       | epoll                    | blocking            | not supported?       |                        | no
y      | recv                     | blocking            | can change to NB [6] | can change to NB       | yes
y      | send/write TCP           | blocking            | can change to NB [6] | can change to NB       | yes
y      | recvfrom                 | blocking            | can change to NB [6] | can change to NB       | yes
y      | sendto UDP               | can block [4]       | can change to NB [6] | can change to NB       | yes
y      | close() with SO_LINGER   | not the default     | not supported        | yes                    | yes
       | close() by default       | NB [5]              | not supported        | can change to blocking | yes
y [2b] | accept                   | blocking by default | not supported        | yes                    | yes
       | accept on NB socket [2c] |                     | not supported        | yes                    | no
LG     | connect() TCP            | blocking            | not supported?       | can change to NB [1]   | no
LG     | connect() UDP            | NB                  | NotApplicable        |                        |

9tips: hacker rank test #cout-macro

similar to codility…

  • I may need a macro for cout, so I can disable all cout quickly before uploading my source.
    • #define ss if(1>2)cout //1>0
  • I keep my own container data dumper function for instrumentation.
  • you can submit multiple times. So as soon as you can pass some tests, please submit.
  • 🙂 You can look at multiple questions at the same time, so feel free to skip tough or time-consuming questions.
  • You absolutely need your own local compiler and ECT environment.
    • I put a small dos window (for g++ and a.exe) over a notepad++ editor. I use F2 to save

The online editor is too slow and provides only a single screen. You can copy-paste code from a local IDE, but you can’t copy-paste test data. You could try downloading test data, but you need to be really efficient there. Setting up tests can take 5 minutes out of the average 20 minutes per question.

  • There could be hidden test cases. Don’t worry too much about them. It’s a real achievement if you can pass all the visible test cases.
  • Ignore integer overflow. If there’s a hidden test for it, you will see the failure
  • Don’t ever worry about minor inefficiency. You won’t have the time. Just get it to work first.
  • pass by clone by default. Easier to debug
  • avoid long variable names. Use std abbreviations like li, vec, ma (for map)
  • I maintain many (unnecessary) global variables of generic names like i1, s2
  • 😦 lost connectivity? timer keeps ticking

 

construct graph from list of connections #BGC java

Given an input file showing a list of {string, string} pairs, build a connection graph.

If you have a correct connection graph, then you can easily determine the connectedness (bool) of any 2 nodes. In a social-network, this bool flag indicates whether 2 individuals are completely unconnected or somehow connected.

—-analysis:
I see this as a social-network. Any pair represents an edge connecting 2 nodes.  At any time there are a number of disconnected islands. The next pair could 1) merge 2 islands or 2) add a node to an existing island or 3) create a new island 4) do nothing, if the 2 nodes are already in some existing island

  • Any known node appears exactly once in the entire graph, in exactly one of the islands.
  • All nodes are contained in a lookup table or hashmap  {node -> island}
  • Each island can be implemented as a hashset of nodes.

So here’s a proposed algo to process a new pair {A, B}; a Java sketch follows at the end of this section. Look for A and B in the graph. 3 scenarios + a dummy scenario:

  • (Scenario 3) If both A and B are newcomers, then they form a new island.
  • if both A and B are already in the graph,
    • (Scenario 4) if they are in the same island, then exit. Nothing to do
    • (Scenario 1) else we can merge the 2 islands
  • (Scenario 2) If A is in island 3 but B is a newcomer, then B joins island 3

The merge operation is expensive. The big lookup table needs updating, but here’s an alternative:

  • At merge time, the smaller island would have all the nodes moved to the bigger island. When the island is empty, it gets a pointer “this.redirect” to the bigger island.
  • lookup table needs no update, avoiding locking a global object.
  • At query time, we look up the table to get the original island, then we follow its pointer (defaults to null) until the island is non-empty.
  • endless loop? would only be a programming error.
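Here is a hedged Java sketch of the proposed algo (class and method names are mine). For simplicity it updates the lookup table at merge time, i.e. it does not use the redirect-pointer optimization above.

import java.util.*;

public class ConnectionGraph {
    private final Map<String, Set<String>> islandOf = new HashMap<>(); // node -> its island

    public void addEdge(String a, String b) {
        Set<String> ia = islandOf.get(a), ib = islandOf.get(b);
        if (ia == null && ib == null) {                  // Scenario 3: two newcomers form a new island
            Set<String> island = new HashSet<>(Arrays.asList(a, b));
            islandOf.put(a, island);
            islandOf.put(b, island);
        } else if (ia != null && ib != null) {
            if (ia == ib) return;                        // Scenario 4: already connected, nothing to do
            Set<String> small = ia.size() < ib.size() ? ia : ib;  // Scenario 1: merge smaller into bigger
            Set<String> big = (small == ia) ? ib : ia;
            for (String node : small) {
                big.add(node);
                islandOf.put(node, big);
            }
        } else {                                         // Scenario 2: the newcomer joins the existing island
            Set<String> island = (ia != null) ? ia : ib;
            String newcomer = (ia != null) ? b : a;
            island.add(newcomer);
            islandOf.put(newcomer, island);
        }
    }

    public boolean connected(String a, String b) {       // the connectedness query described above
        return islandOf.containsKey(a) && islandOf.get(a) == islandOf.get(b);
    }
}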

memory fencing: terminology

Let’s standardize our terminology in this blog to ease searching — “fence instruction” and “memory-fencing” are more flexible words than “barrier”… also shorter.

Visibility-of-update is part of memory fencing.

Q: does memory fencing imply flush to main memory and bypassing register?
A: I think memory fencing is more about statement reordering and less about “flush to main memory”. All of these affect global visibility of a shared mutable.
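A minimal Java illustration of both effects (my own example, the classic volatile publication idiom):

class Publication {
    private int payload;            // plain field
    private volatile boolean ready; // the volatile acts as the fence

    void producer() {
        payload = 42;               // (1) cannot be reordered after (2)
        ready = true;               // (2) volatile write, roughly a release fence
    }

    void consumer() {
        if (ready) {                // (3) volatile read, roughly an acquire fence
            assert payload == 42;   // (4) guaranteed to see the write made in (1)
        }
    }
}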

 

substring search – Boyer-Moore

I think single-string coding questions are evergreen. The insights and techniques might be …reusable?

Brute force search — shift by one, then attempt to match the entire substring. My code below is more efficient than brute-force. Mostly it shifts by one each time, but checks just one char.

Q: write a python function match(string haystack, string nonRegexNeedle), following the simplified Boyer-Moore algo:

Try matching at the left end. If mismatch, then identify the LEFT-most offending char (J) in haystack. Now locate the RIGHT-most position of J in needle (if not found then simply shift all the way past the J.) and right-shift the needle to align the two J.

Q: So when would we right-shift  by one? Not sure. So I don’t know if my code below is 100% correct in all cases.

def visualize(haystack, substring, offset):
  print '\t--' + '0123456789 123456789 12345-----'
  print '\t--' + haystack
  print '\t--' + ' '*offset + substring
  print '\t--' + '-------------------------------'

def match(haystack, substring):
  offset = 0 #offset from start of haystack, where substring is now
  while(True):
    #now locate first offender in haystack
    offenderOffset, offender = -9999, ''
    for i, char in enumerate(substring):
       offenderOffset = offset+i
       if substring[i] != haystack[offenderOffset]:
         offender = haystack[offenderOffset]
         print offender + ' <-- spotted at offenderOffset = ', offenderOffset
         visualize(haystack, substring, offset)       
         break;
         
    if offender == '': return True
    
    # now back-scan substring to locate offender, and then shift to align
    # not really forward-scan haystack
    while True:
      offset += 1
      if offset + len(substring) > len(haystack):
           print 'Gave up completely:'
           visualize(haystack, substring, offset)       
           return False
      if offenderOffset - offset == -1: 
        print 'gave up on aligning: '
        visualize(haystack, substring, offset)
        break
      if substring[offenderOffset - offset] == offender:
        print 'aligned: '
        visualize(haystack, substring, offset)
        break # go back to start of outer while loop

print match('aborb bdb', 'abors') 

 

stress@jobHunt^GTD

Both mental stress and physical stress. Let’s take a step back and compare the stress intensity during job hunt vs GTD stress on the job.

Many people say it’s too stressful and tiring to keep interviewing compared to a long-term job in a big company. Well, I blogged many times that I need to keep interviewing…. The stress is so far manageable.

On a regular job, the GTD stress levels range from 5 to 7 on a scale of 10 (Donald Trump on women;). Often rise to 8.

Became 9 or 10 whenever (as an FTE) a boss gave me negative feedback. I know this from several past experiences. In contrast, contract projects felt much better.

(To be fair, I did improve after the negative feedback.)

During my job hunt including the challenging Singapore lap, my stress level felt like 4 to 7, but often positive stress, perhaps due to positive progress and positive energy.

Conclusion — I always felt more confident on the open market than recovering from setback on a job.

skill: deepen^diversify^stack up

Since 2010, I have carefully evaluated and executed 3 broad strategies:

  1. deepen – for zbs + IV
  2. diversify or branch-out. Breaking into new markets
  3. stack-up – cautiously
  • eg: Deepened my java/SQL/c++/py knowledge for IV and GTD. See post on QQ vs ZZ.
  • eg: diversified to c++, c#, quant, swing…
  • eg: diversify? west coast.
  • eg: diversify? data science
  • eg: diversify? research + teach
  • eg: stack-up to learn spring, hibernate, noSQL, GWT.

Stack-up — These skills are unlikely to unlock new markets. Lower leverage.

GTD stress/survival on the job? None of these help directly, but based on my observation GTD skills seldom advance my career as a contractor. They could create a bit of spare time, but it’s a challenge to make use of that spare time.

I feel German’s strategy goes beyond tech skills. He treats tech skills as a tool, used to implement his business ideas to create business value. I feel his target role is a mix of architect/BA but more like “product visionary”. It’s somewhat like stack-up.

## retirement disposable time usage

See also my framework: Chore^Pleasure activities

  • exercise in the park everyday .. like grandma
  • reflective blogging — likely to be a big time-killer
  • reading as a pastime? GP said at his age, he still loves reading and has many good books at home, but has insufficient physical energy
  • sight-seeing, burning your cash reserve? Grandpa said he is physically unable to
  • — now the more productive endeavors:
  • volunteering for a worthy cause?
  • helping out as grandparents
  • ! … semi-retirement is clearly superior as I would have a real occupation with a commitment and a fixed work schedule

Grandpa pointed out that there are actually bigger factors than finding things to do

  1. cash flow
  2. health

java RTTI n reflection

I feel this is potentially powerful in GTD, but it is seldom tested in interviews.

[[thinking in java]] has a concise chapter on RTTI:

  • class object internals
  • class literals
  • new cast syntax, esp. useful in generic programming
  • dynamic proxies
  • dynamically created or hand-written null objects — a technique designed to save you the effort of checking nulls.
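A small sketch of two items on the list above, class literals and a dynamic proxy serving as a null object (the Quote interface and all names are my own, purely illustrative):

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

interface Quote { double price(); }

public class ProxyDemo {
    public static void main(String[] args) {
        Class<Quote> literal = Quote.class;            // class literal
        System.out.println(literal.getName());

        // a dynamically created null object: every call returns a harmless default
        InvocationHandler nullHandler = (proxy, method, margs) ->
                method.getReturnType() == double.class ? 0.0 : null;
        Quote nullQuote = (Quote) Proxy.newProxyInstance(
                Quote.class.getClassLoader(), new Class<?>[]{Quote.class}, nullHandler);
        System.out.println(nullQuote.price());         // 0.0, so callers need no null checks
    }
}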

%% autoPtr class template

#include <iostream>
using namespace std;

struct Dummy {
	void play() {
		cout << "play " << payload << endl;
	}
	float payload = 0.40490;
};
ostream& operator<<(ostream& os, Dummy const & d){
	os << "[ Dummy object "<<d.payload<<" ]";
	return os;
}
template<typename T> class autoPtr {
public:
	autoPtr(): autoPtr(nullptr){}
	autoPtr(T* raw): needle(raw) {
		needle = raw;
		if (raw)
			cout << "ctor called with *raw == " << *raw << endl;
	}
	autoPtr(autoPtr<T> & other) { //steal the resource
		cout << "copier called with other.real ";
		needle = other.needle;
		if (other.needle) {
			cout << "pointing to "<<*needle;
			other.needle = nullptr;
		}else {
			cout << "== nullptr";
		}
		cout << endl;
	}
	autoPtr<T> & operator=(autoPtr<T> & other) {
		if (&other == this) return *this;
		cout << "op= called ...";
		if (needle) {
			cout << " ... deleting original raw ptr *real = " << *needle;
			delete needle;
		}
		cout << endl; this->needle = other.needle;
		other.needle = nullptr;
		return *this;
	}
	T* operator->() {
		cout << "calling operator->"<<endl;
		return needle;
	}
	T& operator* () {
		cout << "dereferencing this ...";
		if (needle)
			cout << ".... *real == " << *needle << endl;
		else {
			cout << ".... returning a null ptr" << endl; throw "dereferencing a null autoPtr!"; } return *needle; //how do we return a T when real is nullptr? } void set(T * const raw) { delete this->needle;
		this->needle = raw;
	}
	~autoPtr() {
		cout << "Dtor called for *real == ";
		if (needle)
			cout << *needle;
		else
			cout << "(null ptr)";
		cout<< endl;
		delete needle;
	}
private:
	T* needle;
};
int main() {
	autoPtr<Dummy> pdummy(new Dummy);
	pdummy->play();
	autoPtr<int> p1 = (autoPtr<int>)new int(12);
	{
		autoPtr<int> p2(new int(35));
		cout << "Testing deference: *ptr == "<<*p2<< endl;
		autoPtr<int> p3(p2);
		p1 = p3;
	}
	return 0;
}

support MyType a=7 despite q[ explicit ] conversion constructor@@

Background:

  explicit MyType(int); // would disallow
  MyType a = 77;

http://en.cppreference.com/w/cpp/language/converting_constructor has a solution:

  MyType a = (MyType) 77; // Static cast would invoke the explicit conversion constructor!

In general, most custom types should make conversion constructors explicit to prevent hidden bugs, but smart pointers need an implicit conversion constructor, to support

  SmartPtr myPtr = new int(77);

–A real example from CFM quant code

 FwdCurve::const_iterator iter = find( key ); //non-const itr

 QL_ASSERT( iter != ((FwdCurve * const) this)->end(), "not found" ); // original code probably before "explicit". 

// I had to change it to
 FwdCurve_iterator endItr = ((FwdCurve * const) this)->end();
 QL_ASSERT( iter != FwdCurve_const_iterator(endItr), "not found" ); //call the conversion ctor explicitly

QQ^ZZ mutual exclusion: c++/java..

Background — The QQ/ZZ framework was first introduced in this post on c++ learning topics

Focus on tough not ordinary topics in c++ or java.

QQ: By and large, tough topics in job interviews are never needed in projects.
ZZ: By and large, tough topics in work projects (zbs) are never tested in job interviews.

Mutually exclusive =>

  • effort (laser energy) I spent on textbook knowledge during job hunt basically hurts my work,
  • effort I spent on tough zbs topics for a project very seldom help with job interviews.

Therefore, an optimizing candidate would make the minimum effort to get by on projects, and channel the energy into QQ knowledge or algo challenges.

java collection in a static field/singleton can leak memory

Beware of collections in static fields or singletons. They are unbounded by default, so they pose a risk of unexpected growth leading to a memory leak.

Solution — Either soft or weak reference could help.

Q: why is soft reference said to support memory/capacity-sensitive cache?
A: only when memory capacity becomes a problem, will this soft reference show its value.

Q: is WeakHashMap designed for this purpose?
A: not an ideal solution. See other posts about the typical usage of WHM

Q: what if I make an unmodifiable facade?
A: you would need to ensure no one has the original, read-write interface. So if everyone uses the unmodifiable facade, then it should work.
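A hedged sketch of the soft-reference idea (names are mine): the static map stays, but the GC may clear the values under memory pressure, so the cache can shrink instead of growing without bound.

import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentHashMap;

public class SoftCache {
    private static final ConcurrentHashMap<String, SoftReference<byte[]>> CACHE =
            new ConcurrentHashMap<>();

    static void put(String key, byte[] value) {
        CACHE.put(key, new SoftReference<>(value));
    }

    static byte[] get(String key) {
        SoftReference<byte[]> ref = CACHE.get(key);
        byte[] value = (ref == null) ? null : ref.get();
        if (ref != null && value == null) CACHE.remove(key); // GC cleared it: drop the stale entry
        return value;
    }
}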

seek successor of a given node in a BST # no uplinks

Input: root node and an arbitrary node A.

We can’t start from A because by moving left/right, we may not be able to locate the successor. So we start from root and will encounter A.

I think this is a simple, barebones in-order walk entering at root. We will encounter A, and the next node encountered would be A’s successor.

Note this barebones walk requires no uplink.
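A minimal Java sketch of that barebones walk (the Node class is my own; no parent pointer is needed):

class Node { int val; Node left, right; }

class InOrderSuccessor {
    private boolean seenA = false;

    // plain in-order walk entering at root; the first node visited after A is the successor
    Node successorOf(Node root, Node a) {
        if (root == null) return null;
        Node fromLeft = successorOf(root.left, a);
        if (fromLeft != null) return fromLeft;
        if (seenA) return root;            // we just passed A, so this node is the answer
        if (root == a) seenA = true;
        return successorOf(root.right, a);
    }
}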

assessing GTD competence with c++/python etc

See also tag t_GTD_zbs

Q: What are some of the yardsticks to assess someone’s practical GTD competence in a specific technology like c++?

Note every time I say “practical”, there’s actually some theoretical knowledge required, so a guy without a middle-school education probably can’t do a good job.

Fundamental factors:

1. Local system knowledge

2. Troubleshooting

3. instrumentation tools and other tools

4. Devops tools for the chosen language – IDE, build, automated test

5. 3rd-party libraries in the chosen language

6. Only a small subset of theoretical knowledge is relevant to GTD

7. Practical architecture and low-level design

8. Practical tuning in the chosen language

Here are some specific areas to assess a candidate’s GTD competence:

· Finding the best way to make a (localized) change in the local system

· Practical abstraction and refactor to avoid code duplication

· (Devops) Practical error reporting

· (Devops) Practical logging and log management

· (Devops) monitoring, profiling

· Architecture to support multi-module development, often requires a language-specific architecture

seek successor of a given node in a BST #root node unknown

EPI300 Q14.2.

I feel code is hard to write, as it’s hard to visualize or build the tree.

In-order walk must enter at root. If we have only the current node as input, then I assume there’s an uplink in each node. Here’s my algo:

Case 1: if i have Right child, then descend there and then descend Left all the way to a leaf node and return. Easy case.

Case 2: now we know i have no Right child. Am i a Left child? If yes then return my parent. Easy case.

Case 3: Now we know I am the Right child, without my own right child. This is the worst case.   My left child (if any) is irrelevant, so effectively I am a leaf node. A right leaf node. Solution: Move up step by step. After the first up-right move,  descend Left all the way.

For predecessor, I found this algo online, but it can’t move up, so it is unable to support Cases 2 and 3.

  1. /* Find the inorder predecessor of current */
  2. pre = current->left;
  3. while (pre->right != NULL && pre->right != current)
  4.    pre = pre->right;

Based on that, here’s my successor algo:

  1. /* Find the inorder sucessor of current */
  2. pre = current->right;
  3. while (pre->left != NULL && pre->left != current)
  4.    pre = pre->left;
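For comparison, here is a hedged Java sketch of the three cases above, assuming each node has an uplink. (In Case 3 the climb stops at the first ancestor reached by an up-right move; that ancestor itself is the successor.)

class NodeWithParent { int val; NodeWithParent left, right, parent; }

class BstSuccessor {
    static NodeWithParent successor(NodeWithParent n) {
        if (n.right != null) {                 // Case 1: go right once, then Left all the way
            NodeWithParent cur = n.right;
            while (cur.left != null) cur = cur.left;
            return cur;
        }
        NodeWithParent cur = n, p = n.parent;  // Cases 2 and 3: climb while we are a right child
        while (p != null && p.right == cur) {
            cur = p;
            p = p.parent;
        }
        return p;  // Case 2 returns the parent immediately; null means n has no successor
    }
}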

binary tree in-order walk : notes

This is a favorite algo interview topic.

  • I believe in-order walk will print a binary search tree in left-to-right order.
  • De-confuse — visiting is not same as printing. I believe we need to visit a node P to access its left child, but we must not print P so early!
  • Binary tree is the only thing that uses in-order
  • De-confuse — “node.parent” is not available. Singly-linked tree, not doubly-linked.

Let’s write some python code to

maxProfit, maxSubArraySum #Kadane

One of the top 3 all-time favorite algo questions, worth studying in-depth. I know only two algorithms — (Kadane) disposableCurSubArray and lowWaterMark (my own and Xinfeng Zhou). I think both are equivalent from a few angles.

My algo is intuitive (to me) for level input (delta input can be transformed into levels) . One pass. I feel it’s not inferior in any way.

In contrast, disposableCurSubArray is intuitive for delta input i.e. maxSubArray.

https://github.com/tiger40490/repo1/tree/py1/py/array has a tested solution with lots of explanations.

https://www.geeksforgeeks.org/largest-sum-contiguous-subarray/ has a simpler solution, to be understood. I rewrote it in https://github.com/tiger40490/repo1/blob/cpp1/cpp/array/maxSubarraySum.cpp

  1. special case: If all numbers are negative, then final answer is the “smallest” negative number as a lonewolf array.
  2. special case: If all numbers are positive, then final answer is the entire array
  3. so the tricky case must have a mix of negative/positive

Here’s my explanation of Kadane’s algo (a short code sketch follows this list):

  • Delta Input array has arr[0], arr[1] .. arr[n]. Let’s denote maxSubsumEndingAt(i) as B[i]. The max Subarray Ending At #55 is a contiguous subarray ending at #55, starting at or before #55. It can be a lone-wolf containing only #55, as an important special case. In fact, in the mixed positive/negative array, we usually encounter this case.
  • (2-pointer algo) We maintain a left marker and a right marker. Both move to the right. maxSubArrayEndingAt(i) is basically an accumulating subarray from left marker to right marker.
  • B[i] == max(B[i-1] + arr[i]    vs  arr[i] )
    • if  B[3] is negative i.e. all 4 sub-array sums ending in arr[3] are all negative, then B[4] should not include any of them. We can discard them for good and “start afresh“(while keeping the global maxSumSeen)
    • else, there exists a “positive” sub-array ending at arr[3], so we keep growing it until it becomes negative.
    • (See github python code) I can imagine a left-marker, the start of the current max subarray. We will move the right marker to grow the sub-array if the current sub-array is useful i.e. positive. Otherwise, we start afresh and the left marker jumps and coincide with right-marker
  • Note sign(B[i-1]) is key but sign(arr[i]) is irrelevant
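Here is a minimal Java version of the algo explained above, tracking B[i] (the max subarray sum ending at i) plus the global max seen so far:

class Kadane {
    static int maxSubArraySum(int[] arr) {
        int endingHere = arr[0];              // B[0]
        int globalMax = arr[0];
        for (int i = 1; i < arr.length; i++) {
            // if B[i-1] is negative, discard it and start afresh at arr[i]
            endingHere = Math.max(endingHere + arr[i], arr[i]);
            globalMax = Math.max(globalMax, endingHere);
        }
        return globalMax;                     // also covers the all-negative (lone-wolf) special case
    }
}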

Here’s my attempt to connect my lowWaterMark to Kadane’s algo:

I suspect that whenever we move the left marker, we always have a new lowWaterMark.

(Before the first element, Level is defined as 0. In case III, the low water mark is usually a negative level.) Suppose a few elements after hitting the low water mark level=-66, we hit level=-22. This is not a new water mark. maxSubsumEndingAt[this element] is actually positive since there exists a subarray starting right after the “level=-66” element!

When we hit level=-77, a new water mark, the maxSubsumEndingAt[this element] is basically zero, as the disposableCurSubArray is discarded. We start afresh to accumulate. Essentially, we reset the “base” to be the new water mark.



			

##tough n high-leverage c++topics#IV QQ

I used to feel I have so much learning (absorption) capacity, but now I feel in my finite career I can’t really master and remember all the tough c++ topics.

Practical solution — Classify each difficult c++ topic into one of

  1. QQ: high impact on QnA interview, probably the only type of high-leverage tough topic. Largely textbook knowledge. As such I’m basically confident I can learn all the basics on my own (perhaps slower than Venkat), provided I know the topics.
    1. including “coding questions” designed really for knowledge test, rather than algorithm thinking
    2. eg: HFT, Venkat…
  2. high impact on algo coding IV? No such topic. These coding IV are not about knowledge in tough topics!
  3. ZZ: high impact on GTD zbs — inevitably Low leverage during job hunt
  4. 00: no high impact on anything

Q: Is there a tough topic in both QQ and ZZ? I don’t know any.

I think 00 will be the biggest category:

  • [00] template wizardry;
  • [00] template specialization
  • [00] MI;
  • [00] operator overloading;
  • [00] pthread
  • ————-
  • [QQ]
  • [QQ] move semantics
  • [QQ] [p] boost common lib
  • [QQ] optimization tricks. Remember MIAX and SCB IV by Dmitry
  • [QQ] [p] singleton implementation — not really tough
  • [QQ] pimpl — not really tough
  • [QQ] op-new/malloc (interacting with ctor)
  • [QQ] memory layout
  • [QQ] [p] struct alignment
  • [QQ] specific threading constructs
  • [QQ] smart ptr details
  • [QQ] ptr as a field
  • [QQ] implement auto_ptr or ref-counting string
  • [QQ] [p] UDP —
  • [QQ] [p] multicast
  • [QQ] select()
  • [QQ]
  • [ZZ] IDE set-up
  • [ZZ] compiler/linker/makefile details
  • [ZZ] debuggers
  • [ZZ] crash analysis, like segmentation fault
  • [ZZ] c^c++:SAME debugging/tracing/instrumentation skills #ZZ

[p=paper tiger. ]

Trex QnA IV #low-level C

Q: false sharing? cache coherence? cache line?

Q5: 3 threads in my process, and one of them calls fork(). What are the threads in the child process and which thread will be running?
Q5b: what potential problems do you see?
%%A: some resources like locks could be in a bad state

Q: how do you start a thread? %%A: pthread_create()
Q1: how do you start a process? %%A: fork()
Q1b: what happens to the parent’s signal handlers, file handles etc

Q: how does the parent process wait for the child process to end?
AA: wait()

Q: sigkill vs sigterm?
%%A: can’t catch sigkill

See other questions discussed in unix signal #Trexquant IV

Q: why do we need isnan(myFloat) rather than if (myFloat == NaN)?
Q2: I have a float variable1=20180213, and I cast it to int. What’s the int value?
Q2b: two floating point numbers are both very close to 1. In fact they are the closest possible pair. What’s your estimate of their difference?
Q2c: what’s the representation of a float object?

##respect,$,familyTime,spareTime..all benefit from my strong suit (强项)

If I take a strong-suit (强项) job, I kind of sacrifice my muscle building and TrySomethingNew, so I had better get good money …, but now I feel I don’t have to.

A typical strong-suit job would use (the most complete list)

  • math, analytics
  • high volume data processing
  • unix wizardry
  • heavy text processing
  • heavy SQL
  • heavy scripting
  • some http programming
  • data analysis, perhaps using SQL, scripting etc
  • some combo of java/c++/c#/swing
  • threading

unix signal Lesson 1 #Trex IV

Q: why did you say the signals sent are pending and there’s a delay in response?
%%A: if the target process is executing some special instructions the kernel won’t deliver the signal.
AA: For example waiting for network/disk IO — such a process may be almost unkillable. https://major.io/2010/03/18/sigterm-vs-sigkill/ says a reboot is required.

%%A: It’s possible that when target process gets a time slice it then gets a chance to check the “signal table”.
AA: A Unix signal is picked up only at start of a time slice by the recipient “cpu-driver”. See P126[[art of debugging]].

So I gave two correct answers to Olgun of Trexquant but he didn’t acknowledge. I also described the (common) scenario of a one-core machine. The signaling process needs to give up the CPU before the receiving process can react to it.

Postman drops a letter on your doorstep when you are out. This is the best way to deliver the signal. It’s uncommon (impossible?) to interrupt a process /midstream/ a sequence of instructions. I guess the reason is efficiency — no efficient implementation of such a “real-time” interrupt mechanism. In Unix, what is common is the signal mechanism.

Pending signals for a parent process are not inherited by a forked child process.

Unix signals can be generated from interactive user, from any other process, from kernel or from hardware, but they all have a target PID.

Unix Signal is at a lower level than threading. Thread preemption often depends on Unix signals

Unix Signal is a kernel-level mechanism.

Unix Signals target a process, not a thread.

A Unix Signal is an event. A Signal handler is an event callback function whose address is associated with one specific signal.

voluntary pay cut – often unnecessary

A few times in my career I considered taking (or did take) a pay cut, mostly to break into some domain or learn something hard to acquire, things like

* quant
* C++ HFT
* dotnet
* oracle DBA
* technical pre-sales
* threading, MOM for trading engine

Each time, it’s crucial to question and be critical about the promised benefits. Most of the time, we would regret the sacrifice.

https://bintanvictor.wordpress.com/2015/11/14/c-learning-aids/ has some c++ learning aids that could possibly reduce the need for those pay-cuts

quant^HFT^WestCoast, again

After I felt fairly established on Wall St (around 2011), I looked for greener pastures:

  1. quant dev
  2. HFT
  3. high-end positions in the west coast. Not sure what positions exactly — mobile? machine learning? cloud?
  4. regular Wall St c++  job

In fact my java jobs on Wall St are not that inferior, and have the advantage of reachability. In contrast, those greener pastures still look rather distant. They look closer when I’m in a positive mood. But let’s put on the black hat and be critical and skeptical:

Quant dev is low-churn but demand is kinda shrinking.

HFT is a rather niche skillset, possibly less relevant to the west coast than java and regular c++.

1) 2) 3) are mostly FTE, not contracts. Regular c++ is contract-friendly and more reachable.

I used to feel the quant domain is hardest. Now I feel it’s shrinking. Now I feel I’m in much better shape. I made the decision to focus on quant early, assuming I could self-study and break into HFT later.

Data science? Math is much easier than quant finance, and I have some training in it.

C++? I now have more hands-on experience

c++method default arg isn’t saved]vtable

  1. Google c++ style guide forbids default args in virtual functions. It’s error-prone.
  2. [[c++primer]] P607 has a slightly more liberal advice — on a virtual function, subclass/baseclass always use same default arg.

Ashish said he was asked a common question in multiple c++ interviews. The default arg value is “declared” in base class and (or) derived class but never saved in the vtable. It’s basically decided statically (i.e. at compile time).

If you use a pointer to Base to invoke the virtual method, then the Base version’s declared default arg applies unconditionally, even though subclass override is chosen at run time, via the vtable. Highly confusing.

## insertion sort: phrasebook

  • nearly sorted — for a nearly-sorted sample, beats all other sorts including counting sort
  • small — for a small sample, beats MSD radix sort and all comparison sorts
    • switch — many comparison sort implementations switch to insertion-sort when the sub-array becomes small
  • poker — Intuitively, this is how players sort poker cards
    • shift — requires shifting part of the sorted portion.
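A minimal Java sketch of the shift-based insertion described in the last bullet:

class InsertionSort {
    static void sort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int key = a[i];                  // the next "poker card" to place
            int j = i - 1;
            while (j >= 0 && a[j] > key) {   // shift part of the sorted portion to the right
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;
        }
    }
}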

waste time@c++IDE

c++ IDE build is 20% (up to 50%) more complicated than java, partly because c++ compilation is more complicated than java. Another reason is my previous investment in learning java compilation.

It can take many hours to fix an IDE project, but the long-term return and leverage are questionable, since I use a c++ IDE much less than a java IDE.

Eclipse is my choice, although there are many usability issues and apparent bugs. Far from ideal.

Experience — devcpp is simpler.

Experience — codeBlocks xx isn’t worth the effort

 

java generics wild cards – too many warnings/errors

Intro: If your project requires generic wild cards that are too hard for your team’s knowledge level, then sooner or later you need to make a choice.

The complexity may grow out of hand. The compiler errors are non-trivial. Worse still, some Generics errors are runtime errors.

Sugg: see if you can remove generics completely from some classes. Use cast instead.

I feel in most cases, you only need to use "extends" and not "super". I think it can still be too hard.

Here’s one of my projects — the EventQueue project in the 2017 HSBC coding interview. I had to use generic wildcards like

Subscriber<T extends BaseMessage>

SubsriberFilter<T extends BaseMessage>

CallbackTask<T extends BaseMessage>

When we pass these objects into methods, we face annoying compiler errors or warnings. Most warnings are unnecessary (I think the compiler is not smart enough).

Some methods are designed for BaseMessage like …

Other methods are often designed for “T extends BaseMessage”

Yet other methods are designed for a specific subtype PriceMessage.

I feel it’s often easier to use the BaseMessage as argument type. If un-compilable, I often remove the type parameter.

Small tip: if “instanceof ArrayList” gives generics warning, then use ArrayList.class.isInstance().

small tip: using Subscriber<?> can sometimes suppress a warning

small tip: some casts can suppress a warning
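A hedged sketch of the kind of hierarchy described above. The class names follow this post, but the method signatures are my own guesses, not the actual HSBC code. The unchecked cast in publish() is the kind of warning-suppressing shortcut mentioned above, and it can fail at runtime if a subtype-specific subscriber receives the wrong message type:

import java.util.ArrayList;
import java.util.List;

class BaseMessage {}
class PriceMessage extends BaseMessage {}

interface Subscriber<T extends BaseMessage> { void onMessage(T msg); }

class EventQueue {
    // the wildcard lets us hold subscribers of any message subtype...
    private final List<Subscriber<? extends BaseMessage>> subs = new ArrayList<>();

    void register(Subscriber<? extends BaseMessage> s) { subs.add(s); }

    @SuppressWarnings("unchecked")
    void publish(BaseMessage msg) {
        for (Subscriber<? extends BaseMessage> s : subs) {
            // ...but then we cannot call onMessage() without a cast
            ((Subscriber<BaseMessage>) s).onMessage(msg);
        }
    }
}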

##[18] Algo problem solving as hobby@@

  • how does this compare to board game?
  • how does this compare to jigsaw puzzles
  • how does this compare to small mobile app development as a hobby?
    • small mobile game development
  • how does this compare to photography as a hobby?
  • how does this compare to blogging as a hobby
  • how does this compare with DIY home improvement
    • woodwork
  • how does this compare to auto tuning and bike building
  • how does this compare with hi-fi tuning

Every one of them is better than TV, gaming, sight-seeing or drinking. Each one requires consistent effort and sustained focus. Each one will become more frustrating and less exciting before you reach the next level.

instance field: char-array without terminating null

This is a best practice if those char-arrays are short and this class gets instantiated or (de)serialized frequently.

Drawback — when you print this field, the printing function would keep printing beyond the field boundary looking for the terminating null. I saw this in my nyse-intg parser

So as a personal best practice I often save the terminating null in my char-array field.

find sub-array with exact target sum #O(N)#1pass clever

#include <cstdio> // for printf

/* determines if there is a subarray of arr[] with sum equal to 'sum'.
Nice and simple algorithm. Works for exact match only.
*/
void subArraySum(int const arr[], int const n, int const sum)
{
	int curr_sum = arr[0], le = 0, ri;

	/* Add elements one by one to curr_sum and if the curr_sum exceeds
	the sum, then remove starting element */
	for (ri = 0; ri <= n-1; ri++) {
		/* If curr_sum exceeds the sum and sub-array isn't single,
		then remove the starting elements */
		while (curr_sum > sum && le < ri) {
			printf("curr_sum = %d too high. le = %d; ri = %d\n", curr_sum, le, ri);
			curr_sum = curr_sum - arr[le];
			le++;
		}

		if (curr_sum == sum) {
			printf("Sum found between indexes %d and %d\n", le, ri);
			return;
		}
		// now curr_sum is too small or sub-array is single
		if (ri + 1 < n) curr_sum = curr_sum + arr[ri+1]; // guard: don't read past the end
	}
	printf("No subarray found\n");
}
int main()
{
	int const arr[] = { 11,24, 2, 4, 8, 7 };
	int const n = sizeof(arr) / sizeof(arr[0]);
	int const sum = 7;
	subArraySum(arr, n, sum);
}

 

python has limited treeMap, array

Relevant to some coding questions (like Facebook) that emphasize big-O.

— vector — The default list is a vector, with constant time random access.

Just like std::vector, deleting from beginning is slow, according to https://stackoverflow.com/questions/19441488/efficiency-of-len-and-pop-in-python

— fixed array — https://stackoverflow.com/questions/176011/python-list-vs-array-when-to-use shows array type is rarely needed.

— tree map — not popular

https://github.com/tiger40490/repo1/blob/py1/py/tree/simpleBST.py is my own BST without self-balancing.

personal advantages: trying west coast

See also https://1330152open.wordpress.com/2016/06/27/stickyusa-sgp-5-advantages-each-personal-view-610/

  1. Advantage: no kids with me. Easier to adjust
  2. Advantage: no house yet. Easier to move
  3. strength: algo interviews
  4. strength: I could be good at optimizing, research, scalability — natural not artificial complexities
  5. strength: attention to details — more valued in product companies than on Wall St
  6. Advantage: web technology is probably easier than c++
  7. strength: I’m a calculated risk taker

git | 3trees

Three trees are 1) workTree 2)staging i.e. Index 3)HEAD or any commit before

A change can be in three states (https://git-scm.com/book/en/v2/Getting-Started-Git-Basics#_the_three_states): 1) modified locally 2) staged 3) committed. “Uncommitted” changes include 1/2.

http://blog.osteele.com/2008/05/my-git-workflow/ has a nice flow chart:

$ git diff HEAD # compares workTree vs HEAD
$ git diff # by default compares workTree vs staging/index
$ git diff --staged # compares staging vs HEAD, least useful.

https://www.atlassian.com/git/tutorials/undoing-changes/git-reset is my first and most memorable introduction on the three trees of git, in the context of git-reset.

$ git reset # by default means q(git reset --mixed HEAD) and updates the HEAD ref + staging
$ git reset --hard # also modifies workTree

---- git diff --help shows a summary

$ git diff (1)
$ git diff --cached (2)
$ git diff HEAD (3)

1) Changes in the working tree not yet staged for the next commit.
2) Changes between the index (i.e. staged/cached) and your last commit; what you would be committing if you run “git commit” without “-a” option.
3) Changes in the working tree since your last commit; what you would be committing if you run “git commit -a”

python initializing list^dict differently #ECT

Favor {} over dict() due to performance. Also dict() can’t take numbers as keys 😦

from pprint import pprint; pprint({1:'value1', 3+2:'value2'})

Empty [] is faster than list() but I prefer list() because

See https://stackoverflow.com/questions/5790860/and-vs-list-and-dict-which-is-better

0%ROTI:php,mysql,javascript..really@@

2000–2002 were the first few years I spent in IT and they had a deep impact on my outlook. However, there are many overstatements:

  • Too early to say — javascript had a surprise revival, even on Wall St! I have not decided to go back there.
  • Too early to say — perl was widely used on Wall St and was a key factor to my survival in GS.
  • SQL — skills I acquired in GS are not completely irrelevant. Many (financial etc) systems still use it. Perhaps less used on the west coast in web 2.0 shops.
  • php — investment was not 100% lost. It did provide me a job at NBC. I think this is still a valuable skill on west coast. My php confidence is an asset.
  • mysql — investment was not completely lost. I would say my mysql experience gave me enough confidence and competence to take on other database systems.
  • apache — investment gave me valuable insight into network servers. I think apache is still widely used outside Wall St.
  • weblogic — investment was 90% lost but luckily I didn’t invest too much

 

##a few projects technically too tough4me

See also https://bintanvictor.wordpress.com/2017/05/29/transparentsemi-transparentopaque-languages/

I was technically very, very confident in my 20’s to early 30’s until I was bogged down in some tough projects. I think many successful and competent techies experience the same at least once in their career. They thought they were powerful, invincible, but then a tough project was given to them that they struggled to get done. Later someone else got it done without much effort. (For a public domain example, look at the kernel project in GNU.)

It can be harmful to dive so deep into past personal limitations but then a hard, honest review could be life-enhancing. I see myself as a tough grown-up guy so I can take the pain.

  • Eg: GS rule-engine re-architecture in 2007 — too early
  • Eg: code generator in Mac Quant team
  • eg: see the opacity issues in https://bintanvictor.wordpress.com/2017/06/12/xp-3typestech-zbs-challenges/
  • eg: the mysterious “tp” in the g++ command line when I run “make debug”

stay long→move up to hands-off roles: hurts fitness

I think a few people I know said they deliberately avoided hands-off roles. Avichal, Nitin of Morgan Stanley, .. If I ever become a manager, I feel 100% sure I would adopt the same career strategy.

(How about German Cheung?)

In the US and in Singapore, I stayed away from hands-off manager or client-facing architect/pre-sales positions. One of the top 3 justifications is to stay fit. Hands-on roles keep me fit for the job market.

In 80 to 90% of the financial IT job interviews I attended, there’s a non-trivial technical screening, usually a coding test. However, I am a biased statistician, or rather there’s no science here — nothing but personal observations.

Backgrounder — (See also https://bintanvictor.wordpress.com/2016/03/03/long-term-system-expertcontractor/) I did hope to stay in a job for a few years and grow into a system expert. In a few places, it turned out that in one team (say up to 10 people), there are multiple system experts each with many years of experience in the local system. Even if you have 3 years experience in this local system, you may not be appointed the team lead. The chance is 50/50 … could be 30% or 70%. So do you want to remain here for 3 more years and hope for a leadership role?

When I couldn’t rise up in the team, I always decide to move “out” of the comfort zone into the “cold” — every job interview has a technical screening. I soon came to the conclusion that to stay relevant and marketable, I must maintain portable (non-local) tech skills. Many techies at my age wouldn’t choose this route partly because only a minority of techies can stay fit after 45.

Well, I see myself in that (lucky) minority. I don’t mean to say I’m as fit as at 30, but I am fit enough to pass the tech screening.

my quicksort in python/c++

import random
# partition the given section by fixing the anchor
def partitionUsingFarRight(A, le, ri):
    pivotValue = A[ri] # immutable value
    pivotPos = i = ri
    while True:
        if (i <= le): return pivotPos
        i -= 1
        if A[i] > pivotValue:  
          swap(A, pivotPos-1, i)
          swap(A, pivotPos-1, pivotPos) 
          pivotPos -= 1
          
def partitionUsingFarLeft(A, le, ri): 
    # optional: swap a random object to the far left
    swap(A, random.randint(le, ri), le)
    benchmark=A[le]
    ret = i = le
    while True:
        if i == ri: return ret
        i +=1 #move pointer
        if A[i] < benchmark: # 3-step adjustment
            swap(A, ret+1, i)
            swap(A, ret+1, ret)
            ret +=1
    
def partition(A, le, ri):
    return partitionUsingFarLeft(A, le,ri)
def recurse(A, le, ri): 
    if le>=ri: return
    print 'entering partition()...',
    print(A[le:ri+1]), ' pivotVal =', A[ri]
    anchor = partition(A, le, ri)
    print '...after partition()   ',
    print(A[le:ri+1])
    recurse(A, le, anchor-1)
    recurse(A, anchor+1, ri)
def swap(A, x,y):
    tmp = A[x]
    A[x] = A[y]
    A[y] = tmp
def qsort(A):
    recurse(A, 0, len(A)-1)
    print(A)
 
qsort([222,77,6,55,3,11,5,2,88,9,66,22,8,44,1,33,99,7,66])

Above is py, below is c++


#include <iostream>
#include <vector>

std::vector<int> A{77, 11, 66,22,33,99,44,88, 77, 55, 0};
int const size = A.size();

void dump(int l=0, int r=size-1) {
	for (int i = l; i <= r; ++i)
		std::cout << A[i] << ' ';
	std::cout <<std::endl;
}

template <typename T>
void swap(int pos1, int pos2) {
	if (A[pos1] == A[pos2]) return;
	T tmp = A[pos1];
	A[pos1] = A[pos2];
	A[pos2] = tmp;
}

/*partition the region [l,r] such that all elements smaller than
pivotValue are on the left of pivotPosition
*/
template <typename T>
int partitionUsing1st(int l, int r) {
	T const pivotVal = A[l];
	int pivotPos = l;
	for (int i = l+ 1; i <= r; ++i) { 
		if (A[i] >= pivotVal) continue;
		swap<int>(pivotPos + 1, i);
		swap<int>(pivotPos + 1, pivotPos);
		++pivotPos;
	}
	return pivotPos;
}
template <typename T>
int partitionUsingLast(int l, int r) {
	T const pivotVal = A[r];
	int pivotPos = r;
	for (int i = r - 1; i >= l; --i) {
		if (A[i] <= pivotVal) continue;
		swap<int>(pivotPos - 1, i);
		swap<int>(pivotPos - 1, pivotPos);
		--pivotPos;
	}
	return pivotPos;
}
/*based on [[Algorithms unlocked]], should but doesn't minimize swaps!
Lime zone -- items smaller than pivot value
Red zone -- items bigger than pivot value
Ugly zone -- items yet to be checked
*/
template <typename T>
int partitionMinimalSwap(int le, int ri) {
	T const pivotVal = A[ri];
	// two boundaries exist between zones
	int redStart = le;
	// we start with redStart == uglyStart == l, which means item at le is Unknown
	for (int uglyStart = le; uglyStart < ri; ++uglyStart) {
		if (A[uglyStart] < pivotVal) {
			swap<int>(uglyStart, redStart);
			redStart++;
		}
	}
	swap<int>(ri, redStart);
	return redStart;
}

template <typename T>
void recurse(int l, int r) {
	if (l >= r) return; // recursion exit condition
	int const anchor = partitionMinimalSwap<T>(l, r);
	recurse<T>(l, anchor-1);
	recurse<T>(anchor+1, r);
}

int main() {
	recurse<int>(0, size-1);
	dump();
	return 0;
}