STOP arg to range()/slice: simplistic rule

In the simplest usage, START is 0 and STEP is 1 or -1.

…. If STOP is 5 then five integers are generated. If used in a for loop then we enter the loop body five iterations.

In a rare usage (avoid such confusion in coding test!), STEP is neither 1 or -1, or START is not zero, so STOP is a used in something like “if generated_candidate < STOP then exit”

I think slicing operator is exactly the same. See

word[:2]    # The first two characters
word[2:]    # Everything except the first two characters
s[:i] + s[i:] equals s
length of word[1:3] is 3-1==2

allocate half the players to NY ^ SF

Always good to be concrete, so without loss of generality, let’s say J=99 i.e. 198 players.

Q1: Given 2J players, we need to allocate exactly half of them to NY and the rest to San Francisco, to form two teams for a inter-city contest. We receive the NY fare and SF fare for each player, as 2J (i.e. 198) pairs of integers. How can we minimize the total fares?

Q2(bonus): what if 2J+1 players, and you can decide which city gets an extra player?

— Analysis: how many possible allocations? 2J-choose-J. Very large number of allocations. Something like O(J!) so brute force is impractical.

If for any player the two fares both go up to $9800, it doesn’t change our algorithm at all. Therefore, we only care about the fare difference (N-S) for each player.

— solution: I will assume most players live near SF so SF fares are lower. I tabulate and sort the 198 fare differences “NY fare – SF fare” and suppose indeed at least 99 (half) are positive[1]. Therefore, my “base” is SF i.e. my base allocation is everyone-allocated-to-SF. Next I must pick 99 (half) of them and allocate to NY.

I will avoid the higher values in the table.

I simply pick the lower 99 (half) in the table, because these are the 99 “extra” costs I will incur. I want to minimize the total of these 99 values, whether they are mostly positive or mostly negative.

  • Note the highest number in the table indicates the most expensive player for NY relative to SF. If this number is negative, then SF is more expensive than NY for All players so follow the rule in [1] but notice he is the least expensive players for SF relative to NY. Regardless of positive/negative sign, we want to keep this person in SF .
  • Note the lowest number in the table indicates the least expensive player for NY relative to SF. If this number is negative, then SF is more expensive than NY for this player — normal. Regardless, we want to allocate her to NY.

[1] if not the case, then we could easily treat NY as base to keep the solution intuitive. Actually even if we don’t change the base, the algorithm still works, albeit unintuitively.

A2: Say J=99 so total 199 players We already picked 99 for NY. Do we pick one more for NY or keep remaining 100 in SF?

Just look at the next lowest value in the table. If positive, then NY is more expensive for him, so keep him in SF.


Q: Given a dictionary of English words (containing no numbers no underscores) and a very long sentence, someone has removed all the spaces. Please write a program to restore it by adding a space after each word. If there’s no way to parse the sentence just return False, otherwise return True to indicate you found at least one way to parse the sentence.

The sentence can have repeated words.  Ignore case.

I was allowed to use either white-board or computer.

Brute force? I struggled for a long time with some vague idea. Interviewer asked me to implement any solution I have, however inefficient but I was unable to. is my solution. Interviewers basically said it works.

binary-matrix cluster count #DeepakM #DST+fullScan is my solution. is conceptually identical, though using a different criteria for “connected”

Hi Deepak

I realize for “connected” problems, there’s definitely a graph underneath, so graph traversal is required. I guess BFT or DST will both find all the nodes connected to “me”.

Given a 1000 x 1000 all-black matrix, I think DFT recursion will go into exactly 1,000,000 levels and overflow stack space.

A high-level (vague) idea is

  • Scan in type-writer fashion. Suppose there are 10×10 = 100 cells either black or write. I paint each cell brown [1] once visited. I also use a integer counter to keep track how many cells already visited.
  • During the scan. If a cell is already visited, I will ignore it and move on
  • Any time I find a black cell, I will start a BFT (I dislike DST) to find all connected black cells.  Once I find all the black cells connected to “me”, I resume the type-writer scan.
  • Once my counter hits 100, I have visited all cells and will exit.

[1] you paint it white, which is probably better.

I thought this problem was too hard and not worth study, but you convinced me that it may come up and I can at least give a vague solution  … better than no solution.

Compared to a binary tree, walking over a matrix is /operationally/ (not conceptually) harder because

  • need to check array bounds. Error-prone and time consuming in coding tests
  • The (invisible) links will not cover all cells. To avoid missing a cell, we still need a full matrix scan. The integration of tree walk + matrix scan is unintuitive, to put it mildly.
  • During the tree-walk, you don’t want to visit any cell too many times or get lost in an endless loop. A brute-force technique — shadow matrix to remember all visited cells.
  • … However, once you get over these nitty-gritty operational complexities, then tree-walk in matrix is not really harder. These algo questions can therefore frustrate and fail many candidates untrained on The technique.

describe Latest project #weak@22Feb

See also my notes in C:\0\z0\cv,iview.0

Gayle has a good one-pager too for tech companies. There are similar resources.

Current project is supposed to be fresh in your mind, so interviewer has high expectation of details. Better avoid the current project.

Option: you could talk about a “past” project, but bring in details from another project, such as your actual current project, but beware you may be asked “what’s your current project?”

Avoid naive honesty, which often backfires. Such honesty doesn’t serve any purpose. We are professionals, expected to communicate truthfully but also stay relevant. The major challenge or a key component I created may be irrelevant to the interviewer or impossible to describe over phone.

Q: which component did you *design* ?

  • A: wait/notify framework to associate order requests sent out with response received. See the SCB concurrency design questionA
  • A: orderbook + reset, refresh. Orderbook alone could be too familiar to some interviewers .. hard to score points
  • A: Guardian — monitoring, with microagent + server + GUI. Easy to describe with some complexity
  • A: retrans gap mgmt. More details needed
  • A: NYSE+Arca+American integrated feed parser
  • A: Preferences framework
  • A: vol model serialization

Q: major challenges?

%%A: wait/notify framework

Q: what does the system do, in layman’s term

4×4 sudoku: backtracking #FB

4-queen is one of the simplest problems in the backtracking category, but I will look at a mini sudoku with any input.

So I must remember all my “placements” so far. Each placement is {piece, row, col}. I will keep my placements in a LIFO stack. I could write an iterative, but recursive might be easier.

I also need to keep track of all the “available pieces” and all the “empty cells” as global variables.

find any black-corner submatrix #open

Q: given a black/white matrix, find any rectangle whose all four corners are black.

Brute force — typewriter-scan. For each black cell, treat it as a top-left, and look rightward for a top-right. If found, then scan down for a pair of bottom-left/bottom-right. If no then complete the given row and discard the row.

For a space/time trade-off,

  • once I find a black cell on current row, I put it in a “horizontal” vector.

## HFT c++IV is like .. swing IV

I figured out long ago that java swing is a distinct skill from mainstream java.

Now I feel HFT c++ is also a distinct skill from mainstream c++. Interviewers tend to ask question (almost) irrelevant to mainstream c++. Shanyou said focus is on system-level not application-level. I feel superficial knowledge is enough often.

  1. avoid “virtual” for latency
  2. CPU cache management
  3. translation lookaside buffer
  4. kernel bypass
  5. socket tuning
  6. memory and socket performance monitoring
  7. shared memory
  8. placement new and custom allocator
  9. system calls
  10. in-lining

–These topics are relevant to mainstream c++ but more relevant to HFT

  • setjmp
  • IPC mutex
  • reference counting