fastestSortFor: one str; arrayOfStr; 32bit ints #non-comparison

I think the big-O coding question may require this knowledge, because

….real world sorting problems can often use key-based sorting, rather than comparison-based sorting.

Therefore, if your overall solution requires sorting, don’t assume it would become O(J*K+M^2+P*N logN) based on your O(N logN) sorting assumption. Here are some achievable linear-time sorts:

sort O() data type ref notes
counting O(N) chars
 radix O(N) 64-bit ints O(Nw) -> O(N)
 radix O(N) variable-size strings  [1][2] fastest
 burst O(N) variable-size strings 2nd fastest. key-based, trie-based
 radix O(N) floats  [3] IEEE format
 radix O(N) date/time converting to fixed-length string or fixed-size integers
 bucket? O(N) random ints in python/perl unbounded. No limit on #digits

Note if data is nearly sorted, then insertion-sort will outperform everyone else.

[1] https://www.cs.princeton.edu/~rs/AlgsDS07/18RadixSort.pdf

[2] https://link.springer.com/chapter/10.1007%2F978-3-540-89097-3_3 paper presented a MSD radix sort for large collections of strings, faster than any other string sorting algorithm.

[3] http://www.boost.org/doc/libs/1_59_0/libs/sort/doc/html/sort/sort_hpp/float_sort.html

Advertisements

How we achieved 400 to 700 kmps #reinterpret_cast

  • ! Many threads but each in single threaded mode. No shared mutable.
  • ! allocation avoided — on millions of Output message objects. Pre-allocated ring buffer eliminates new()
  • ! allocation avoided — on millions of Input message objects, thanks to reinterpret_cast() on pointers. (Allocation would have returned a new address!)
  • ! Stateless Parser. Downstream component (Rebus) handles cancels, modifications..
  • Hot fictions use pbref or RVO or move semantics, minimizing stack allocations
  • Socket tuning
  • Output list (of messages) buffering
  • aggressively simplified parsing. Minimal logic
  • large containers are pre sized. No reallocation
  • Very fast duplicate seq check — a hot function
  • Special Logger to reduce I/O cost
  • Sharding to speed up symbol lookup
  • No virtual functions
  • Inline
  • —-Speed with control
  • Best of both to reduce the real risk of our connection becoming slow or lossy
  • Depth monitor to detect missed messages
  • Capacity tracker to monitor mps rates
  • Missed seq reporting

##elegant/legit simplifications ] algoIV

  • eg: consumer thread dequeue() method. When empty, it “should” be waiting for a notification, according to Jun of Wells Fargo, but a simpler design returns a special value to indicate empty. The thread can then do other things or return to the thread pool or just die. I think the wait model is not ideal. It can waste a thread resource. We could end up with lots of waiting threads.
  • Eg: https://bintanvictor.wordpress.com/2017/09/10/lru-cache-concise-c-implementation/ requirement is daunting, so it’s important and completely legitimate to simplify lookup() so it doesn’t insert any data. API is simpler, not incomplete
  • Eg: find every combination adding up to a given target #Ashish permutation within each combination is a separate issue and not the main issue
  • recursive solution is often a quicker route to working code. Once we cross that milestone, we could optimize away the recursion.
  • eg: extract isPrime() as a unimplemented function and simply assume it is easy to implement when I get around to do it.
  • Eg: show free slots between meetings #bbg I solved a similar and more familiar problem.

## how might jvm surpass c++]latency

A Shanghai Morgan Stanley interviewer asked in a 2017 java interview.

One hypothesis — no free() or delete() in java, so the memory manager doesn’t need to handle reclaiming and reusing the memory.  [[optimizedC++]] P333 confirmed the c++ mem mgr need that. Instead the GC uses a very different algorithm — see below.

One hypothesis — after a warm-up period, based on heuristics JIT compiler could aggressively compile bytecode into machine code with speedy shortcuts for the “normal” code path + special code path to handle the abnormal conditions. Here’s an analogy — if every borrower seen so far has acceptable credit score, the bank may simplify credit check and have special procedure to deal with defaults.  For most of the cases this work flow is faster than the traditional.

http://www.javaworld.com/article/2076593/performance-tests-show-java-as-fast-as-c–.html is a 1998 research.

In my GS-PWM days, a colleague circulated a publication that java could match C in performance, but they didn’t say “surpass”

https://stackoverflow.com/questions/1984856/java-runtime-performance-vs-native-c-c-code is not a published expert but he says:

On average, a garbage collector is far faster than manual memory management, for many reasons:

  • on a managed heap, dynamic allocations can be done much faster than the classic heap
  • shared ownership can be handled with negligible amortized cost, where in a native language you’d have to use reference counting which is awfully expensive
  • in some (possibly rare and contrived) cases, object destruction is vastly simplified as well (Most Java objects can be reclaimed just by GC’ing the memory block. In C++ destructors must always be executed)

STOP arg to range()/slice: simplistic rule

In the simplest usage, START is 0 and STEP is 1 or -1.

…. If STOP is 5 then five integers are generated. If used in a for loop then we enter the loop body five iterations.

In a rare usage (avoid such confusion in coding test!), STEP is neither 1 or -1, or START is not zero, so STOP is a used in something like “if generated_candidate < STOP then exit”

I think slicing operator is exactly the same. See https://github.com/tiger40490/repo1/blob/py1/py/slice%5Erange.py

word[:2]    # The first two characters
word[2:]    # Everything except the first two characters
s[:i] + s[i:] equals s
length of word[1:3] is 3-1==2

allocate half the players to NY ^ SF

Always good to be concrete, so without loss of generality, let’s say J=99 i.e. 198 players.

Q1: Given 2J players, we need to allocate exactly half of them to NY and the rest to San Francisco, to form two teams for a inter-city contest. We receive the NY fare and SF fare for each player, as 2J (i.e. 198) pairs of integers. How can we minimize the total fares?

Q2(bonus): what if 2J+1 players, and you can decide which city gets an extra player?

— Analysis: how many possible allocations? 2J-choose-J. Very large number of allocations. Something like O(J!) so brute force is impractical.

If for any player the two fares both go up to $9800, it doesn’t change our algorithm at all. Therefore, we only care about the fare difference (N-S) for each player.

— solution: I will assume most players live near SF so SF fares are lower. I tabulate and sort the 198 fare differences “NY fare – SF fare” and suppose indeed at least 99 (half) are positive[1]. Therefore, my “base” is SF i.e. my base allocation is everyone-allocated-to-SF. Next I must pick 99 (half) of them and allocate to NY.

I will avoid the higher values in the table.

I simply pick the lower 99 (half) in the table, because these are the 99 “extra” costs I will incur. I want to minimize the total of these 99 values, whether they are mostly positive or mostly negative.

  • Note the highest number in the table indicates the most expensive player for NY relative to SF. If this number is negative, then SF is more expensive than NY for All players so follow the rule in [1] but notice he is the least expensive players for SF relative to NY. Regardless of positive/negative sign, we want to keep this person in SF .
  • Note the lowest number in the table indicates the least expensive player for NY relative to SF. If this number is negative, then SF is more expensive than NY for this player — normal. Regardless, we want to allocate her to NY.

[1] if not the case, then we could easily treat NY as base to keep the solution intuitive. Actually even if we don’t change the base, the algorithm still works, albeit unintuitively.

A2: Say J=99 so total 199 players We already picked 99 for NY. Do we pick one more for NY or keep remaining 100 in SF?

Just look at the next lowest value in the table. If positive, then NY is more expensive for him, so keep him in SF.

parseContinuousSentenceUsingDictionary

Q: Given a dictionary of English words (containing no numbers no underscores) and a very long sentence, someone has removed all the spaces. Please write a program to restore it by adding a space after each word. If there’s no way to parse the sentence just return False, otherwise return True to indicate you found at least one way to parse the sentence.

The sentence can have repeated words.  Ignore case.

I was allowed to use either white-board or computer.

Brute force? I struggled for a long time with some vague idea. Interviewer asked me to implement any solution I have, however inefficient but I was unable to.

https://github.com/tiger40490/repo1/blob/py1/py/str/continuousSentence.py is my solution. Interviewers basically said it works.

binary-matrix cluster count #DeepakM #DST+fullScan

https://github.com/tiger40490/repo1/blob/py1/py/2d/clusterCount.py is my solution.

https://www.geeksforgeeks.org/find-number-of-islands/ is conceptually identical, though using a different criteria for “connected”

Hi Deepak

I realize for “connected” problems, there’s definitely a graph underneath, so graph traversal is required. I guess BFT or DST will both find all the nodes connected to “me”.

Given a 1000 x 1000 all-black matrix, I think DFT recursion will go into exactly 1,000,000 levels and overflow stack space.

A high-level (vague) idea is

  • Scan in type-writer fashion. Suppose there are 10×10 = 100 cells either black or write. I paint each cell brown [1] once visited. I also use a integer counter to keep track how many cells already visited.
  • During the scan. If a cell is already visited, I will ignore it and move on
  • Any time I find a black cell, I will start a BFT (I dislike DST) to find all connected black cells.  Once I find all the black cells connected to “me”, I resume the type-writer scan.
  • Once my counter hits 100, I have visited all cells and will exit.

[1] you paint it white, which is probably better.

I thought this problem was too hard and not worth study, but you convinced me that it may come up and I can at least give a vague solution  … better than no solution.

Compared to a binary tree, walking over a matrix is /operationally/ (not conceptually) harder because

  • need to check array bounds. Error-prone and time consuming in coding tests
  • The (invisible) links will not cover all cells. To avoid missing a cell, we still need a full matrix scan. The integration of tree walk + matrix scan is unintuitive, to put it mildly.
  • During the tree-walk, you don’t want to visit any cell too many times or get lost in an endless loop. A brute-force technique — shadow matrix to remember all visited cells.
  • … However, once you get over these nitty-gritty operational complexities, then tree-walk in matrix is not really harder. These algo questions can therefore frustrate and fail many candidates untrained on The technique.

describe Latest project #weak@22Feb

See also my notes in C:\0\z0\cv,iview.0

Gayle has a good one-pager too for tech companies. There are similar resources.

Current project is supposed to be fresh in your mind, so interviewer has high expectation of details. Better avoid the current project.

Option: you could talk about a “past” project, but bring in details from another project, such as your actual current project, but beware you may be asked “what’s your current project?”

Avoid naive honesty, which often backfires. Such honesty doesn’t serve any purpose. We are professionals, expected to communicate truthfully but also stay relevant. The major challenge or a key component I created may be irrelevant to the interviewer or impossible to describe over phone.

Q: which component did you *design* ?

  • A: wait/notify framework to associate order requests sent out with response received. See the SCB concurrency design questionA
  • A: orderbook + reset, refresh. Orderbook alone could be too familiar to some interviewers .. hard to score points
  • A: Guardian — monitoring, with microagent + server + GUI. Easy to describe with some complexity
  • A: retrans gap mgmt. More details needed
  • A: NYSE+Arca+American integrated feed parser
  • A: Preferences framework
  • A: vol model serialization

Q: major challenges?

%%A: wait/notify framework

Q: what does the system do, in layman’s term

4×4 sudoku: backtracking #FB

4-queen is one of the simplest problems in the backtracking category, but I will look at a mini sudoku with any input.

So I must remember all my “placements” so far. Each placement is {piece, row, col}. I will keep my placements in a LIFO stack. I could write an iterative, but recursive might be easier.

I also need to keep track of all the “available pieces” and all the “empty cells” as global variables.

find any black-corner submatrix #open

https://www.geeksforgeeks.org/find-rectangle-binary-matrix-corners-1/

Q: given a black/white matrix, find any rectangle whose all four corners are black.

Brute force — typewriter-scan. For each black cell, treat it as a top-left, and look rightward for a top-right. If found, then scan down for a pair of bottom-left/bottom-right. If no then complete the given row and discard the row.

For a space/time trade-off,

  • once I find a black cell on current row, I put it in a “horizontal” vector.

## HFT c++IV is like .. swing IV

I figured out long ago that java swing is a distinct skill from mainstream java.

Now I feel HFT c++ is also a distinct skill from mainstream c++. Interviewers tend to ask question (almost) irrelevant to mainstream c++. Shanyou said focus is on system-level not application-level. I feel superficial knowledge is enough often.

  1. avoid “virtual” for latency
  2. CPU cache management
  3. translation lookaside buffer
  4. kernel bypass
  5. socket tuning
  6. memory and socket performance monitoring
  7. shared memory
  8. placement new and custom allocator
  9. system calls
  10. in-lining

–These topics are relevant to mainstream c++ but more relevant to HFT

  • setjmp
  • IPC mutex
  • reference counting