Optional.java notes

Q: if an optional is empty, will it remain forever empty?

— An Optional.java variable could but should never be null, as this instance need a state to hold at least the boolean isPresent.

If a method is declared to return Optional<C>, then the author need to ensure she doesn’t return a null Optional ! This is not guaranteed by the language.

https://dzone.com/articles/considerations-when-returning-java-8s-optional-from-a-method illustrates a simple rule — use a local var retVal throughout then, at the very last moment, return Optional.ofNullable(retVal). This way, retVal can be null but the returned reference is never null.

If needed, an Optional variable should be initialized to Optional.empty() rather than null.

–immutability is tricky

  1. the referent object is mutable
  2. the Optional reference can be reseated, i.e. not q[ final ]
  3. the Optional instance itself is immutable.
  4. Therefore, I think an Optional == a mutable ptr to a const wrapper object enclosing a regular ptr to a mutable java object.

Similarity to String.java — [B/C]

Compare to shared_ptr instance — [A] is true.

  • C) In contrast, a shared_ptr instance has mutable State, in terms of refcount etc
  • B) I say Not-applicable as I seldom use a pointer to a shared_ptr

— get() can throw exception if not present

— not serializable

— My motivation for learning Optional is 1) QQ 2) simplify my design in a common yet simple scenario

https://www.mkyong.com/java8/java-8-optional-in-depth/ is a demo, featuring … flatMap() !!

array-based order book: phrasebook

For HFT + mkt data + .. this is a favorite interview topic, kind of similar to retransmission.

Perhaps look at … https://web.archive.org/web/20141222151051/https://dl.dropboxusercontent.com/u/3001534/engine.c has a brief design doc, referenced by https://quant.stackexchange.com/questions/3783/what-is-an-efficient-data-structure-to-model-order-book

  • TLB?
  • cache efficiency

heap allocation: java Can beat c++

  • case 1 (standard java): you allocate heap memory. After you finish with it you wait for the java GC to clean it up.
  • case 2 (low latency java): you allocate heap memory but disable java GC. Either you hold on to all your objects, or you leave unreachable garbage orbiting the earth forever.
  • case 3 (c++): you allocate heap memory with the expectation of releasing it, so the compiler sets up housekeeping in advance for the anticipated delete(). This housekeeping overhead is somehow similar to try/catch before c++11 ‘noexcept’.

Stroustrup suggested that #2 will be faster than #3, but #3 is faster than #1. I said “But c++ can emulate the allocation as jvm does?” Stroustrup said C++ is not designed for that. I have seen online posts about this “emulation” but I would trust Stroustrup more.

  • case 4 (C): C/c++ can sometimes use local variables to beat heap allocation. C programmers use rather few heap allocations, in my experience.

Note jvm or malloc are all userland allocators, not part of kernel and usually not using system calls. You can substitute your own malloc.

https://stackoverflow.com/questions/18268151/java-collections-faster-than-c-containers top answer by Kanze is consistent with what Stroustrup told me.

  • no dynamic allocation is always faster than even the fastest dynamic allocation. Similar to Case 4
  • jvm allocation (without the GC clean-up) can be 10 times faster than c++ allocation. Similar to Case 2^3
    • Q: Is there a free list in JVM allocator?

https://softwareengineering.stackexchange.com/questions/208656/java-heap-allocation-faster-than-c claims

  • c++ Custom allocators managing a pool of fixed-sized objects can beat jvm
  • jvm allocation often requires little more than one pointer addition, which is certainly faster than typical C++ heap allocation algorithms

max rectangle formed us`any 4points#Google#Rahul 50%

Q (Google onsite): given N Points defined by x,y coordinates, find the largest possible rectangle formed with four corner points drawn from the N points.


worst input? many points line up along the north and west of a rectangle?

From N points, we get (roughly) NN line segments, actual count is up N(N-1)/2.

Any 3 points forming a right-angle corner is represented as a Corner object with 5 fields (or more)

  • 3 Points: clockwise start point, corner point, clockwise end point
  • 2 line segments

A FloatingLineSegment object has 2 fields {length, slope}

Any 2 line segments of equal length and slope forming a parallelogram is represented as a Para object having fields (or more):

  • 4 Points
  • 4 line segments
  • 2 FLSs
  • a single point representing the central intersection

Each line segment has one FLS and one slope. So we scan the NN line segments to populate some hashmaps.

A hashmap of {FLS -> set of lineSegments}

A hashmap of {slope -> set of FLS }. From any one slope value, we can look up the FLSs of that slope + the FLSs of the opposite slope.

Scan the NN line segments. For each line segment L1, compute the opposite slope. Use slope hashmap to get the FLSs. Use these FLSs to get the line segments perpendicular to L1. Discard any of those line segments not joining L1. This is how to populate .. A hashmap of {lineSegment -> set of Corners}

Scan the NN line segments. For each line segment L1, compute FLS. Use the FLS hashmap to get the other line segments. This is how to populate … A hashmap of {lineSegment -> set of Paras}

Are there more Corners or more Paras? I hope at least one of them is O(NN)

(From NN line segments, we could get a large number of Corners objects. I think the count of Corners is too big if all points line up along some big rectangle. I feel we should start from one corner and try all possible rectangles rooted there, then remove this corner from the search space.)

Solution 1: At any point C1, there are N-1 relevant line segments. Using a hashmap {slope ->?}, it takes up to O(NN) time to identify all Corners at C1. If N-1 is 8, then at most 4×4 such Corners, so worst case O(NN) such corners. For each Corner, check if the opposite corner point exists. This check is O(1). This entire process at C1 takes O(NN).

Overall O(NNN) since there are N points like C1. Average case is much better.

I feel O(NN logN) is possible.

— O(NNN) by Rahul — pick any 3 points, check if they form a corner. If yes, compute the opposite corner and check its existence.

–Idea 3: for the N(N-1)/2 line segments, sort by slope, length, lowest x-coordinate, lowest y-coordinate…?
sorting takes O(NN logN)

–idea 4: group N(N-1)/2 line segments by slope and length.

Within each group, there are up to N points and O(NN) line segments.

— O(NN) solution … wrong!

Pick any 2 points as a pair-of-opposite-corners. Work out the locations of the other 2 corners and check if they exist. This check is O(1).

There are O(NN) such pairs, so overall O(NN).


G9 workhorse-algos ] speedCoding #XR

XR said we should take speed coding contests to find out what basic algos are needed.

Opening example — parent/child pairs→tree algos #Indeed is one example of realistic problem that requires common comp-science constructs…

Need to memorize enough to implement on the spot, as these are basic “comp science constructs” needed for “realistic problems”

  • dft/bft, with levels
  • BST find predecessor
  • Binary search in array
  • Merge-sort/insertion-sort/partitioning
  • .. realistic string problems:
  • .. realistic int array problems:
  • max profit; max subarray sum
  • .. realistic matrix problems:

Below are basic algos for hackerrank/codility but NOT applicable to realistic problems typical of Indeed/FB

  • Linked list: remove dupes
  • string: collapse consecutive chars

77 c++IV paperTigers

Avoid spending too much time on this list…. These c++ topics appeared non-trivial (perhaps daunting, intimidating) for years, until I started cracking the QnA interviews. Then I realized in-depth expertise isn’t required.


  1. — real tigers i.e. non-trivial nlg is quizzed
  2. [A] CRPP, SFINAE — real tigers. I got asked about these around 5 times, sometimes in-depth
  3. socket: non-blocking
  4. — Now the paper tigers
  5. open source or commercial instrumentation for memory, dependency instrumentation. See blogpost to Ashish
  6. [s] what debuggers and memory leak detectors are you familiar with?
  7. [a] singleton, factory, pimpl and other design patterns
  8. [s] c++ regex
  9. [s] stream I/O and manipulators
  10. —sockets # many more paper tigers to be listed
  11. udp, multicast, select()
  12. [s] socket buffers
  13. [a] byte alignment
  14. endianness
  15. TCP flow control
  16. TCP handshake and disconnect
  17. ack numbers
  18. partial send
  19. close() vs shutdown()
  20. — STL # many more paper tigers to be listed
  21. [s] STL binders
  22. [s] STL allocators
  23. adapters for containers, iterators and functors
  24. [a] iterator invalidation rules
  25. [s] how is deque different from a vector
  26. RBtree
  27. —concurrency # many more paper tigers to be listed
  28. [A] atomic types
  29. pthread functions
  30. [A] IPC mutex
  31. mutex in shared memory
  32. recursive lock
  33. read-write lock
  34. what if a thread dies while holding a lock?
  35. [s] RAII scoped lock
  36. — multi-file build
  37. forward class declarations (as required in pimpl) and their limitations
  38. [s] C/C++ integration, extern-C — heavily quizzed, but no in-depth
  39. [s] what C features are not supported by c++ compiler
  40. circular dependency between libraries — confusing. real tiger but seldom quizzed
  41. [As] shared lib vs static lib
  42. —integration and data exchange
  43. [A] shared memory
  44. [A] CORBA, RPC
  45. [a] serialization in compact wire format — only primitive data types!
  46. [s] OS system calls vs std library (glibc) — sounds daunting to most developers
  47. —exception
  48. catch by ref or by value or by pointer?
  49. [s] exception guarantees
  50. [s] stack unwinding due to exception
  51. throwing destructors — various questions
  52. —memory
  53. which part of memory do static data members go? How about file scope static variables? How about global variables
  54. [s] preventing heap/stack allocation of my class
  55. [s] custom new/delete,  set_new_handler()
  56. [s] intrusive smart ptr, weak_ptr
  57. [s] ref counting
  58. custom deleter in shared_ptr
  59. [s] reinterpret_cast  # always on pointers
  60. [A] custom allocators
  61. [A] free list in the free store
  62. what if you call delete on a pointer that’s created by array-new?
  63. placement new
  64. out-of-memory in operator-new
  65. —inheritance
  66. dynamic_cast
  67. [A] multiple inheritance
  68. [s] virtual inheritance… which base class ctor gets called first? See https://isocpp.org/wiki/faq/multiple-inheritance#mi-vi-ctor-order
  69. [a] slicing problem
  70. [a] private inheritance
  71. [s] pure virtual
  72. —other low level topics
  73. [s] setjmp, jmp_buf… See the dedicated blog post jmp_buf/setjmp() basics for IV #ANSI-C
  74. [s] cpu cache levels
  75. [s] translation lookaside buffer
  76. [s] what data structures are cache-friendly?
  77. [a] memcpy, memset
  78. [s] ++myItr vs myItr++ how are they implemented differently?
  79. —other language features
  80. [s] RAII
  81. [s] operator-overload
  82. [A] template specialization — part of the STL fabric but largely transparent
  83. [s] ptr to member (function) — seldom used outside library code. I tried the syntax in my binary tree serializer
  84. [A] std::forward() std::move(), rvalue-ref
  85. const and constexp
  86. [a] lambda with capture
  87. [a] double-pointer .. why do you need it?
  88. —-
  89. [s == shallow book knowledge is enough]
  90. [a == actually not that deep, IMHO]
  91. [A == actually deep topic]

At what juncture would kernel scheduler run@@

Kernel scheduler has an algorithm and therefore implemented as a sequence of instructions. You can think of it as some black-box function/routine.

I think it is Not really a long-running background process. In Linux, I believe it is an on-demand routine, but not run on behalf of any process.

Background — Many familiar on-demand kernel routines do run on behalf of an “owner process” —

  • accessing a file or socket
  • accessing some device such as NIC
  • accessing memory

However, other on-demand kernel routines (often interrupt handlers) do not have an “owner process”. Here are some routines —

  • reacting to timer interrupts
  • reacting to low-level emergency hardware interrupts like …. ?

So the scheduler is a classic example. I think scheduler can get triggered by timer interrupts. See P 350 [[linux kernel]]

make_shared() cache efficiency, forward()

This low-level topic is apparently important to multiple interviewers. I guess there are similarly low-level topics like lockfree, wait/notify, hashmap, const correctness.. These topics are purely for theoretical QQ interviews. I don’t think app developers ever need to write forward() in their code.

https://stackoverflow.com/questions/18543717/c-perfect-forwarding/18543824 touches on a few low-level optimizations. Suppose you follow Herb Sutter’s advice and write a factory accepting Trade ctor arg and returning a shared_ptr<Trade>,

  • your factory’s parameter should be a universal reference. You should then std::forward() it to make_shared(). See gcc source code See make_shared() source in https://gcc.gnu.org/onlinedocs/libstdc++/libstdc++-api-4.6/a01033_source.html
  • make_shared() makes a single allocation for a Trade and an adjacent control block, with cache efficiency — any read access on the Trade pointer will cache the control block too
  • I remember reading online that one allocation vs two is a huge performance win….
  • if the arg object is a temp object, then the rvr would be forwarded to the Trade ctor. Scott Meryers says the lvr would be cast to a rvr. The Trade ctor would need to move() it.
  • if the runtime object is carried by an lvr (arg object not a temp object), then the lvr would be forwarded as is to Trade ctor?

Q: What if I omit std::forward()?
AA: Trade ctor would receive always a lvr. See ScottMeyers P162 and my github code

https://github.com/tiger40490/repo1/blob/cpp1/cpp1/rvrDemo.cpp is my experiment.


## optimize code for i-cache: few tips

I don’t see any ground-breaking suggestions. I think only very hot functions (confirmed by oprofile + cachegrind) requires such micro-optimization.

I like the function^code based fragmentation framework on https://www.eetimes.com/document.asp?doc_id=1275472 (3 parts)

  • inline: footprint+perf can backfire. Can be classified as embedding
  • use table lookup to replace “if” ladder — minimize jumps
  • branching — refactor a lengthy-n-corner-case (not “hot”) code chunk out to a function, so 99% of the time the instruction cache (esp. the pre-fetch flavor) doesn’t load a big chunk of cold stuff.
    • this is the opposite of embedding !
  • Trim the executable footprint. Reduce code bloat due to inlining and templates?
  • loop unrolling to minimize jumps. I think this is practical and time-honored — at aggressive optimization levels some compilers actually perform loop unrolling! Programmers can do it manually.
  • Use array (anything contiguous) instead of linked list or maps to exploit d-cache + i-cache
  • https://software.intel.com/en-us/blogs/2014/11/17/split-huge-function-if-called-by-loop-for-best-utilizing-instruction-cache is a 2014 Intel paper — split huge function if it’s invoked in a loop.


check a receiver socket is alive: tcp^udp

Q: given a TCP receiver socket, how do you tell if it’s connected to a session or disconnected?

Shanyou said when you recv() on the socket but got 0 it means disconnected.

http://man7.org/linux/man-pages/man2/recv.2.html#RETURN_VALUE shows recv() return value of 0 indicates dead connection i.e. disconnected.

https://stackoverflow.com/questions/4142012/how-to-find-the-socket-connection-state-in-c uses getsockopt()

Q: given a UDP multicast receiver socket, how do you tell if it’s still has a live subscription to the multicast group?

%%A: I guess you can use getsockopt() to check socket /aliveness/. If alive but no data, then the group is quiet

TCP/UDP: partial or multiple messages in one buffer

This is often mentioned in IV. At least you can demonstrate your knowledge.

What if the UDP datagram is too big for recv() i.e. specified buffer length is too small? P116 [[tcp/ip soclets in C]] seems to say the oversize message is silently truncated.

UDP recv() will only return a single “logical” message [1]. I believe TCP can put partial or multiple messages into one “buffer” for recv().

Q: if my buffer is big enough, will my UDP recv() ever truncate a msg?
%%A: never

Note IP would always deliver a whole msg or miss a whole msg, never a partial msg. See P 329 [[comp networking]]

[1] a logical msg is the payload from one send()

generate paths in ternary matrix #Promethean TomJerry


Similar to binary-matrix cluster count #DeepakM #DST+fullScan but the connection is modeled differently.

–Original requirements: Input matrix has an object in each cell

  • 1 means wall or barrier
  • 0 means passage
  • 2 means cheese station

Jerry is [x][y]. Tom starts at [0][0] and must collect all cheese and deliver to Jerry. Find shortest path. Tom can only move one (U/D/L/R) step at a time.

https://github.com/tiger40490/repo1/blob/cpp1/cpp1/2d/TomJerry.cpp is my half-finished code. It builds a graph connecting all the cells (each a Node object). It’s able to check feasibility. Todo:

  • breadth-first walk from Tom to Jerry, without revisiting any node. Hopefully we can discover all possible paths (i.e. collecting all cheese)
  • determine the shortest.


memory fencing: terminology

Let’s standardize our terminology in this blog to ease searching — “fence instruction” and “memory-fencing” are more flexible words than “barrier”… also shorter.

Visibility-of-update is part of memory fencing.

Q: does memory fencing imply flush to main memory and bypassing register?
A: I think memory fencing is more about statement reordering and less about “flush to main memory”. All of these affect global visibility of a shared mutable.


locate a pair with targetSum==55 #bbg IV #Morris


Q: any O(1) space sort?

Q: From an unsorted array of positive integers, is it possible to find a pair of integers that sum up to a given sum?

Constraints: This should be done in O(n) and in-place without any external storage like arrays, hash-maps, but you can use extra variables/pointers.

If this is not possible, can there be a proof given for the same?

—–Initial analysis—-
I wish I were allowed to use a hash table of “wanted ” values. (iterate once and build hashtable. For Each new value encountered, check if it is in the “wanted” list…)

I feel this is typical of west coast algorithm quiz.

I feel it’s technically impossible, but proof?  I don’t know the standard procedure to prove O(n) is impossible. Here’s my feeble attempt:

Worst case — the pair happens to be the 1st and last in the array. Without external storage, how do we remember the 1st element?  We can only use a small number of variables, even if the array size is 99999999. As we iterate the array, I guess there would be no clue  that 1st element is worth remembering. Obviously if we forget the 1st element, then when we see the last element we won’t recognize they are the pair.
—–2nd analysis—-
If we can find a O(n) in-place sort then problem is solvable [1]. Let’s look at Radix sort, one of the most promising candidates. https://stackoverflow.com/questions/463105/in-place-radix-sort has an implementation for DNA strings.

Assumption 1: the integers all have a maximum “size” in terms of digits. Let’s say 32-bit. then yes radix is O(n) but not sure about space complexity. Now, with any big-O analysis we impose no limit on the sample size. For example we could have 999888777666555444333 integers. Now, 32-bit gives about 4 billion distinct “pigeon-holes”, so by the pigeon-hole principle most integers in our sample have to be duplicates!

Therefore, Assumption 1 is questionable. In fact, some programming languages impose no limit on integer size. One integer, be it 32 thousand bits or 32 billion bits, could use up as much memory as there is in the system. Therefore, Assumption 1 is actually superfluous.

Without Assumption 1, and if we allow our sample to be freely distributed, we must assume nothing about the maximum number of digits. I would simply use

Assumption 2: maximum number of digits is about log(n). In that case radix sort is O(n log(n)), not linear time:(

—– new analysis as of Jun 2017 —-
Can Morris algo sort an array (external storage)? If yes,  then use the 2-moving-pointer algo in locate a pair whose diff=55

However, Morris needs a tree not an array.

[1] locate a pair with targetSum=55 : pre-sorted #O(N) 1-pass

## c++0x features I understand as significant

Here are the subset of C++11 features I understand as significant. Some significant c++11 features I don’t appreciate, such as lambda.

#1 move semantics
– std::move and forward

#2 Lib: smart pointers
– make_shared() etc

#3 Lib: thread library
– atomic<int>


#4 type traits

nested class has special capabilities ] C++^java

Q: Can nested class (NN) methods access a private member of the enclosing class EE? See c++nested class accessing host private member

Can EE methods access NN’s private members? Yes. Tested. See below.

Below is based on P186 ARM, perhaps outdated.
Q: in terms of syntax, how does Inner class methods refer to Outer class’s non-static[1] field?
A: no special access. Unlike java, there’s no hidden “this” field pointing to the Outer _i n s t a n c e_
A: yes if the Inner method has a pointer, reference or a nonref variable of an Outer class _i n s t a n c e_,

Java’s nested class N1 is in the same “private club” along with E2() an Enclose class private method.
– N1 methods can access private members of E2
– E2 methods can access private members of N1
–> Not in C++

Q: So what special capabilities does a nested class has? Why make a class nested at all?
A: to mention the “Inner” type name, you need to prefix it as “Outer::Inner”
A: another specialness — inner class methods can access outer private members. As pointed out on
— in the original C++98 the inner class has no special privileges accessing the outer class.With C++98 compiler you’d either have to give the inner class the necessary privileges (friendship) or expose
the outer members x and y as public. However, this situation was classified as a defect in C++98, and it was decided that inner classes should have full access to outer class members (even private ones).

[1] static is fine — Outer::f