deque with uneven segments #GS

Q: (2017 GS interview question) We know std::deque offers efficient insert at both ends. But now we want efficient insert mid-stream, and are willing to relax O(1) lookup.

I proposed a solution using uneven segments. Each segment has a max capacity. When it is reached, a new segment is allocated and linked in. Lookup uses a staircase function.

Segment sizes (not capacities) are kept in a mutable int array. From this array, we lazily derive another int array, the “threshold array” or staircase array, holding the cumulative sizes. Every lookup requires a binary search in this threshold array to locate the target segment. If we keep a hard limit (like 1024) on segment count, then the threshold array would be at most 1024 long, so within 10 comparisons we always locate the target segment. That is a constant bound, better than O(log N).

The two int arrays should be L1-cached.
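Here is a minimal Python sketch of the idea, not the design I gave in the interview. The class name `SegmentedSeq`, the parameter `max_seg` and the choice to split a full segment in half are all my assumptions for illustration. The staircase array is rebuilt lazily after each insert, and lookup is a binary search over it.

```python
from bisect import bisect_right
from itertools import accumulate


class SegmentedSeq:
    """Sequence stored in uneven segments (hypothetical sketch)."""

    def __init__(self, max_seg=4):
        self.max_seg = max_seg    # assumed per-segment capacity
        self.segs = [[]]          # the segments, in order
        self._thresholds = None   # lazily derived staircase array

    def _staircase(self):
        # thresholds[i] = total number of items in segments 0..i
        if self._thresholds is None:
            self._thresholds = list(accumulate(len(s) for s in self.segs))
        return self._thresholds

    def _locate(self, idx):
        # binary search the staircase for the segment holding position idx
        th = self._staircase()
        s = bisect_right(th, idx)
        return s, idx - (th[s - 1] if s else 0)

    def __len__(self):
        return self._staircase()[-1]

    def __getitem__(self, idx):
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        s, off = self._locate(idx)
        return self.segs[s][off]

    def insert(self, idx, value):
        n = len(self)
        if not 0 <= idx <= n:
            raise IndexError(idx)
        if idx == n:                       # append to the last segment
            s, off = len(self.segs) - 1, len(self.segs[-1])
        else:
            s, off = self._locate(idx)
        seg = self.segs[s]
        seg.insert(off, value)             # cost bounded by segment capacity
        if len(seg) > self.max_seg:        # over capacity: split in two (assumption)
            half = len(seg) // 2
            self.segs[s:s + 1] = [seg[:half], seg[half:]]
        self._thresholds = None            # staircase is now stale
```

A mid-stream insert only shifts items inside one segment, so its cost depends on the segment capacity rather than on the total length. This sketch does not enforce the hard limit on segment count.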

generate simple paths between 2 graph nodes

Q1: given 2 nodes in a graph containing N (eg 121) nodes, potentially with cycles, generate all simple paths between the pair. A simple path has no cycle. (In other words, length + 1 == number of unique nodes in a simple path.)

  • I think there are classic math algorithms for it, because this is part of basic graph theory. Here are some applications of this type of algorithm:
  • Q1b (special case of Q1): given 2 nodes in a C by R matrix grid, where every node is connected to (up to) four neighbors, generate all cycle-free paths.
    • I can solve this problem in Python
  • Q2 (easy one based on Q1): generate all simple paths between any node pair in a graph. The shortest simple path has length=0. Longest simple path can potentially visit every node exactly once.
  • A: first generate all 121-Choose-2 node pairs. For each pair, solve Q1. Lastly generate the 121 trivial paths of length=0.
  • Q2b (special case of Q2): given a C by R (eg 11×11) matrix grid, where every node is connected to (up to) four neighbors, generate all simple paths.
  • Q2c (easy one based on Q2): given a binary tree containing no cycles, generate all paths.

–my DFT implementation (probably not 100% correct), where each “trail” either fails or becomes a path.

  1. from NodeA start a breadcrumb/trail. We can’t revisit any node already visited on the current breadcrumb, so we record the breadcrumb’s nodes in a hashtable.
    1. if this is a matrix, then instead of a hashtable, we can also use a shadow matrix, but the breadcrumb is much smaller than a shadow matrix
  2. if we reach a node whose neighbors are all on the same breadcrumb, then the trail fails
  3. else we will reach NodeB 🙂 Print the breadcrumb

By construction, we won’t see duplicate paths 🙂
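The breadcrumb idea can be sketched as follows. This is a minimal illustration, not the code in my github repo. The names `simple_paths` and `grid_graph` are made up here, and the graph is assumed to be an adjacency dict.

```python
def simple_paths(graph, start, goal):
    """Yield every cycle-free path from start to goal as a list of nodes."""
    trail = [start]      # the breadcrumb, in visiting order
    on_trail = {start}   # same nodes in a set, for O(1) membership tests

    def extend(node):
        if node == goal:
            yield list(trail)          # the trail became a path; emit a copy
            return
        for nxt in graph[node]:
            if nxt in on_trail:        # never revisit a node on this breadcrumb
                continue
            trail.append(nxt)
            on_trail.add(nxt)
            yield from extend(nxt)
            on_trail.remove(trail.pop())   # backtrack
        # falling out of the loop without reaching goal: this trail failed

    yield from extend(start)


def grid_graph(cols, rows):
    """Adjacency dict for a cols-by-rows grid with (up to) four neighbours."""
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
    return {
        (x, y): [(x + dx, y + dy) for dx, dy in steps
                 if 0 <= x + dx < cols and 0 <= y + dy < rows]
        for x in range(cols) for y in range(rows)
    }
```

For example, `list(simple_paths(grid_graph(3, 3), (0, 0), (2, 2)))` enumerates the corner-to-corner paths on a 3×3 grid. The number of paths grows very fast with grid size, so an 11×11 grid is not practical to enumerate this way.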

https://github.com/tiger40490/repo1/blob/py1/py/grid/classic_count4waySimplePaths.py is the implementation

–BFT? I don’t think it can print each unique path

generate all abbr from longest.. +! recursion

I won’t insist on relative ordering among the shortest.

Idea 1: start with the longest abbreviation, i.e. the original string S, assumed here to have 5 characters.

  1. populate the smallHM with the original word
  2. copy every char except the first. Save the copy into bigHM, then print/process this abbreviation.
  3. copy every char except the 2nd and print
  4. ..
  5. copy every char except the last. Now we have 5 strings in bigHM (a Level-4 hashmap), each of length 5-1=4
  6. make smallHM point to the bigHM object; point bigHM to an empty hashmap
  7. now take a string from smallHM (the Level-4 collection), generate 4 shorter strings and save them in bigHM (a Level-3 collection), each of length 5-2=3
  8. now take the 2nd string from Level-4 …
  9. After we finish Level-4, we have generated 20 strings for Level-3, but only 10 of them are distinct (5-choose-3, assuming 5 distinct characters)! So Level-3 must be a hashmap to absorb the duplicates.
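The level-by-level steps above can be sketched without recursion. This is only an illustration: the function name `abbreviations_by_level` is made up, a Python set stands in for each hashmap, and I assume the empty string is not wanted as an abbreviation.

```python
def abbreviations_by_level(word):
    """Yield the distinct abbreviations level by level, longest first."""
    level = {word}                     # plays the role of smallHM
    for _ in range(len(word)):         # lengths len(word) down to 1
        yield sorted(level)            # sorted only to make the output stable
        # plays the role of bigHM: drop one char from every string,
        # and let the set absorb the duplicates
        level = {s[:i] + s[i + 1:] for s in level for i in range(len(s))}
```

For a 5-character word with distinct characters, the levels hold 1, 5, 10, 10 and 5 strings, matching the Level-4 and Level-3 counts above.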

 

O(1) space or O(1) search or O(N) sort : tricks

Every time I see O(1) space required on an array problem, I think of …. swapping.

Every time I see O(1) space required on a list problem, I ask is it a ….. linked list.

Every time I see O(N) time required on an array problem, I think of … radix sort, applicable to 64-bit integers, 64-bit floats and strings.

Every time I see O(1) search, I think of … hash table and radix array

Tower IV: c++,algo,past prj..

Q1: how could you use enum to improve performance?
AA: use it to replace strings
AA: enum is used in template programming to move computation from run time to compile time. See Q4 and SFINAE in github.
%%A: see clever use of enum(char?) in demanding c++ app

Q1b: you put in your change but it turns out to hurt performance. Any idea?

Q4: how do you compute Fib(N) like Fib(99999) at compile time using template programming?
A: See my Fibonacci code in github

Q: how would you use multicast to send a message to a group of registered users?
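%%A: less data copying than TCP. The sender transmits each message once and the network delivers it to every subscribed receiver, whereas TCP needs a separate connection and a separate copy per user.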

Q: In O(1) space, given an unsorted array of natural numbers, how do you find any pair to produce a target sum? What’s your best time complexity?
%%A: if I assume the integer size is 32-bit then a fixed-size O(1) space radix structure can sort the array in O(N). See also my blog post on [[locate a pair with targetSum=55 #bbg IV #Morris]]

Q: How is your muni bond pricing engine designed internally?
Q2b: what kind of pricing rules live in the cache?
Q2c: how many threads to react to a given event?

Q: how did you calculate FX fwd points?
%%A: use the interest rates in the two currencies and the spot FX rate.

Q: how is your implied volatility computed from an option price?
%%A: if there’s not a closed-form formula, then Newton’s method or something similar should be able to converge quickly.
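As a rough illustration of the Newton approach, here is a sketch that assumes a Black-Scholes European call with no dividends. The interview answer did not name a model, so the model choice, the function names and the starting guess of 0.2 are all my assumptions.

```python
from math import erf, exp, log, pi, sqrt


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def _d1(spot, strike, rate, t, vol):
    return (log(spot / strike) + (rate + 0.5 * vol * vol) * t) / (vol * sqrt(t))


def bs_call(spot, strike, rate, t, vol):
    """Black-Scholes price of a European call (assumed model, no dividends)."""
    d1 = _d1(spot, strike, rate, t, vol)
    d2 = d1 - vol * sqrt(t)
    return spot * norm_cdf(d1) - strike * exp(-rate * t) * norm_cdf(d2)


def vega(spot, strike, rate, t, vol):
    """Derivative of the call price with respect to vol."""
    d1 = _d1(spot, strike, rate, t, vol)
    return spot * sqrt(t) * exp(-0.5 * d1 * d1) / sqrt(2.0 * pi)


def implied_vol(price, spot, strike, rate, t, guess=0.2, tol=1e-10, max_iter=100):
    """Newton's method: step vol by (model price - market price) / vega."""
    vol = guess
    for _ in range(max_iter):
        diff = bs_call(spot, strike, rate, t, vol) - price
        if abs(diff) < tol:
            return vol
        vol -= diff / vega(spot, strike, rate, t, vol)
    raise ValueError("Newton iteration did not converge")
```

A quick sanity check is a round trip: price an option at a known vol with `bs_call`, then feed that price to `implied_vol` and expect the same vol back. For deep in-the-money or out-of-the-money options vega becomes tiny and plain Newton can misbehave, so a bracketing method may be a safer fallback.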

Q3 (OO design): How would you design game software with Animal objects, various birds, and fish?
Q3a: Say the system is running fine, but now you need to add an Ostrich class?

binary search in rotated sorted array

https://leetcode.com/problems/search-in-rotated-sorted-array/description/ has the requirement. I don’t want to look at other people’s solutions, so I have reproduced the requirements below. I have not encountered this problem in any coding interview.

Q: Suppose an array sorted in ascending order is rotated at some pivot unknown to you beforehand. (i.e., [0,1,2,4,5,6,7] might become [4,5,6,7,0,1,2]). You are given a target value to search. If found in the array return its index, otherwise return -1. You may assume no duplicate exists in the array. Your algorithm’s runtime complexity must be in the order of O(log n).

https://github.com/tiger40490/repo1/blob/cpp1/cpp/array/binSearchRoatedArr.cpp is my solution

–Solution 2:

First run a binary search to locate the pivot point. Then run an O(1) test to discard one of the 2 segments. We are left with the remaining segment as a regular sorted array. Run binary search on it.
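A minimal Python sketch of Solution 2 follows. It is not the C++ code in my github repo. The function names are made up, and "pivot" is taken to mean the index of the smallest element.

```python
from bisect import bisect_left


def find_pivot(a):
    """Index of the smallest element in a rotated ascending array (no duplicates)."""
    lo, hi = 0, len(a) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] > a[hi]:
            lo = mid + 1     # smallest element is to the right of mid
        else:
            hi = mid         # smallest element is at mid or to its left
    return lo


def search_rotated(a, target):
    """Return the index of target in a, or -1 if absent. Two binary searches."""
    if not a:
        return -1
    p = find_pivot(a)
    # O(1) test: a[p:] holds the values <= a[-1], a[:p] holds the larger ones
    lo, hi = (p, len(a)) if target <= a[-1] else (0, p)
    i = bisect_left(a, target, lo, hi)   # plain binary search on one segment
    return i if i < hi and a[i] == target else -1
```

With `a = [4, 5, 6, 7, 0, 1, 2]`, the pivot is at index 4, so a search for 5 only looks at `a[0:4]` and a search for 1 only looks at `a[4:7]`.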

bbg: allocate half the players to NY^SF

Always good to be concrete, so without loss of generality, let’s say J=99 i.e. 198 players.

Q1: Given 2J players, we need to allocate exactly half of them to NY and the rest to San Francisco, to form two teams for an inter-city contest. We receive the NY fare and SF fare for each player, as 2J (i.e. 198) pairs of integers. How can we minimize the total fares?

Q2(bonus): what if there are 2J+1 players, and you can decide which city gets the extra player?

–Analysis: how many possible allocations? 2J-choose-J, a very large number. It grows exponentially in J (roughly 4^J / sqrt(pi*J)), so brute force is impractical.

If for any player the two fares both go up by $9800, it doesn’t change our algorithm at all. Therefore, we only care about the fare difference (N-S) for each player.

— solution: I will assume most players live near SF so SF fares are lower. I tabulate and sort the 198 fare differences “NY fare – SF fare” and suppose indeed at least 99 (half) are positive[1]. Therefore, my “base” is SF i.e. my base allocation is everyone-allocated-to-SF. Next I must pick 99 (half) of them and allocate to NY.

I will avoid the higher values in the table.

I simply pick the lower 99 (half) in the table, because these are the 99 “extra” costs I will incur. I want to minimize the total of these 99 values, whether they are mostly positive or mostly negative.

  • Note the highest number in the table indicates the most expensive player for NY relative to SF. If this number is negative, then SF is more expensive than NY for all players, so follow the rule in [1]; note that he is still the least expensive player for SF relative to NY. Regardless of the sign, we want to keep this person in SF.
  • Note the lowest number in the table indicates the least expensive player for NY relative to SF. If this number is negative, then SF is more expensive than NY for this player — normal. Regardless, we want to allocate her to NY.

[1] if proven otherwise (by the data), then we could easily treat NY as base to keep the solution intuitive. Actually even if we don’t change the base, the algorithm still works, albeit unintuitively.
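The Q1 solution can be sketched in a few lines of Python. The function name `min_total_fare` and the input layout (a list of `(ny_fare, sf_fare)` pairs) are assumptions for illustration.

```python
def min_total_fare(fares):
    """fares: list of (ny_fare, sf_fare) pairs, even length.

    Returns (minimum total fare, set of player indices sent to NY).
    """
    half = len(fares) // 2
    # sort players by the extra cost of NY relative to SF (NY fare - SF fare)
    order = sorted(range(len(fares)), key=lambda i: fares[i][0] - fares[i][1])
    ny = set(order[:half])             # the lower half of the table goes to NY
    total = sum(f[0] if i in ny else f[1] for i, f in enumerate(fares))
    return total, ny
```

For example, with fares `[(10, 20), (30, 200), (400, 50), (30, 20)]` the differences are -10, -170, 350 and 10, so players 0 and 1 go to NY and the total is 10 + 30 + 50 + 20 = 110. The sort makes this O(N log N).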

A2: Say J=99, so 199 players in total. We have already picked 99 for NY. Do we pick one more for NY or keep the remaining 100 in SF?

Just look at the next lowest value in the table. If positive, then NY is more expensive for him, so keep him in SF. If negative, send him to NY.

bucketSort: O(N+k) could be better or worse than O(N)

Based on the top answer in https://stackoverflow.com/questions/7311415/how-is-the-complexity-of-bucket-sort-is-onk-if-we-implement-buckets-using-lin

It’s obvious (by definition) that O(2N) is equivalent to O(N).

O(N+k) is worse than O(2N) when k is much larger than N, like 4000 times N. For a concrete illustration, say we have N=5 million strings to sort, using k = 20 billion buckets.

  • It takes constant time to bucket each string, so the first step takes O(N) i.e. grows proportional to N.
  • 2nd step is now dominated by k, since all 20 billion buckets have to be examined. Time complexity is O(N+k). This is explained in the stackoverflow answer.
  • Therefore, total complexity is O(N+k) i.e. cN + bk with two constant factors c and b. Even if b is small (the cost of looking at an empty bucket), as k grows, at some point the k term overtakes the N term and becomes the dominant factor.

In practice, k is often chosen to be a smaller integer than N, so I don’t think this is a practical issue.

25-horse puzzle #Google

Q: A single racetrack. P=25 mechanical horses each with a unique speed. You can race up to 5 horses each time, and get a printout of the ranking, but no timing.

What’s the minimum number of races to identify the fastest #1 #2 #3 horses?

Youtube and my green quant-IV book both have the analysis.

——–analysis——–

Quick elimination — each time I can eliminate minimum 2 horses but can we eliminate faster?

What if T (the number of top horses to identify) is 1 and P is 9? Two races exactly.

Here’s my algo #1

  1. first pick 5 random horses,
  2. race them and eliminate the slowest 2 (reduce population by 2). Name the top 3 as AA/BB/CC
  3. race CC with 4 untested random horses. Four scenarios:
    1. if 3 or more beat CC, then eliminate the two slowest including CC, and put the top 3 in a group with AA/BB. Go back to step 2
    2. if 2 beat CC, then eliminate the three slowest including CC, and put the top 2 and an untested horse in a group with AA/BB. Go back to step 2
    3. if only one horse beats CC, then eliminate the four slowest including CC, and put the #1 and 2 untested horses in a group with AA/BB. Go back to step 2
    4. if CC is #1 (happy path), then eliminate the other four horses and go back to step 3

Worst case: the first race actually had the slowest 5, then we hit scenario 3.1 every time. So we only eliminate 2 horses each race.

Best case: the first race had the 5 fastest. Total 6 races [1], but this is statistically unlikely. Without that luck, we need more than 6 races.

[1] first race eliminates 2… next 5 races each eliminates 4 only if CC is always fastest in every race.

–algo #2:

  1. [5 races] Split into 5 groups and run 5 unrelated races. Discard slowest 2 in each group. We end up with 15 horses.
  2. [1 race] Now race the 5 number ones. Name them ABCDE in finishing order. The D and E groups are completely eliminated (-6), the C group is left with C alone (-2), and the B group’s #3 is eliminated (-1). We end up with 6 horses. I will name the six as A/B/C and the less-tested A2 A3 B2. There are only three possibilities:
    1. ABC
    2. A A2 A3
    3. A B B2
  3. [1 race] so I will race 5 of the 6 horses, sparing A since it’s already the final champion.
  4. total 7 races, best or worst case.
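Algo #2 can be checked with a small simulation. This is a sketch under my own assumptions: the function name `top3_in_7_races` is made up, horses are identified by index 0..24, and `speeds` is only consulted inside `race()`, which returns a ranking and no timing.

```python
def top3_in_7_races(speeds):
    """speeds: 25 distinct numbers, higher is faster.

    Returns (ids of the three fastest horses in order, number of races used).
    """
    races = 0

    def race(horses):
        nonlocal races
        assert len(horses) <= 5
        races += 1
        # the printout: ranking only, fastest first
        return sorted(horses, key=lambda h: speeds[h], reverse=True)

    ids = list(range(25))
    groups = [race(ids[i:i + 5]) for i in range(0, 25, 5)]   # races 1 to 5
    winners = race([g[0] for g in groups])                   # race 6
    by_winner = {g[0]: g for g in groups}
    a, b, c = (by_winner[w] for w in winners[:3])            # A, B, C groups
    # race 7: A2 A3 B B2 C compete for 2nd and 3rd place; A is spared
    final = race([a[1], a[2], b[0], b[1], c[0]])
    return [a[0]] + final[:2], races
```

Whatever the speeds, the simulation always uses exactly 7 races, which matches the count above.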