bbg: allocate half the players to NY^SF

Always good to be concrete, so without loss of generality, let’s say J=99, i.e. 2J=198 players.

Q1: Given 2J players, we need to allocate exactly half of them to NY and the rest to San Francisco, to form two teams for an inter-city contest. We receive the NY fare and SF fare for each player, as 2J (i.e. 198) pairs of integers. How can we minimize the total fares?

Q2(bonus): what if 2J+1 players, and you can decide which city gets an extra player?

— Analysis: how many possible allocations? 2J-choose-J, a very large number (by Stirling’s approximation it grows like 4^J/sqrt(pi*J)), so brute force is impractical.

If both fares for any player go up by the same amount (say $9,800), the optimal allocation doesn’t change at all. Therefore, we only care about the fare difference (N-S) for each player.

— solution: I will assume most players live near SF so SF fares are lower. I tabulate and sort the 198 fare differences “NY fare – SF fare” and suppose indeed at least 99 (half) are positive[1]. Therefore, my “base” is SF i.e. my base allocation is everyone-allocated-to-SF. Next I must pick 99 (half) of them and allocate to NY.

I will avoid the higher values in the table.

I simply pick the lower 99 (half) in the table, because these are the 99 “extra” costs I will incur. I want to minimize the total of these 99 values, whether they are mostly positive or mostly negative.

  • Note the highest number in the table indicates the most expensive player for NY relative to SF. If even this number is negative, then SF is more expensive than NY for all players, so follow the rule in [1]; equivalently, he is the least expensive player for SF relative to NY. Regardless of positive/negative sign, we want to keep this person in SF.
  • Note the lowest number in the table indicates the least expensive player for NY relative to SF. If this number is negative, then SF is more expensive than NY for this player — normal. Regardless, we want to allocate her to NY.

[1] if proven otherwise (by the data), then we could easily treat NY as base to keep the solution intuitive. Actually even if we don’t change the base, the algorithm still works, albeit unintuitively.
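Here is a minimal C++ sketch of my own for Q1 (function name and types are hypothetical, not from the original question): tabulate the 2J differences, sort them, and incur the lowest J of them on top of the all-SF base.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Sketch of the approach above. fares[i] = {NY fare, SF fare} for player i.
// Base everyone in SF, then send to NY the J players with the lowest
// (NY - SF) differences; those J differences are the "extra" costs incurred.
long long minTotalFare(const std::vector<std::pair<int, int>>& fares) {
    long long total = 0;
    std::vector<int> diff;
    for (const auto& [ny, sf] : fares) {
        total += sf;              // base allocation: everyone in SF
        diff.push_back(ny - sf);  // the fare difference "NY - SF"
    }
    std::sort(diff.begin(), diff.end());
    const std::size_t J = fares.size() / 2;
    for (std::size_t i = 0; i < J; ++i)
        total += diff[i];         // the J lowest "extra" costs
    return total;
}
```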

A2: Say J=99, so 199 players in total. We already picked 99 for NY. Do we pick one more for NY, or keep the remaining 100 in SF?

Just look at the next lowest value in the table. If positive, then NY is more expensive for him, so keep him in SF.


quickSort worst case O(N^2)

https://www.khanacademy.org/computing/computer-science/algorithms/quick-sort/a/analysis-of-quicksort

In the worst case, the partition “tree” degenerates into a chain of length N; typically, though, the height of this tree is logN.

On each level of the tree, the time to “go through” all nodes is O(N), so the typical total is O(N logN), while the worst case is O(N * N) = O(N^2).
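Here is a minimal quicksort sketch of my own (Lomuto partition, last element as pivot) that exhibits exactly this chain-shaped worst case on already-sorted input:

```cpp
#include <utility>
#include <vector>

// Minimal quicksort: Lomuto partition with the last element as pivot.
// On an already-sorted input every partition is maximally lopsided, so
// the recursion "tree" degenerates into a chain of length N => O(N^2).
void quickSort(std::vector<int>& a, int lo, int hi) {
    if (lo >= hi) return;
    int pivot = a[hi];
    int p = lo;                       // boundary of the "< pivot" region
    for (int i = lo; i < hi; ++i)
        if (a[i] < pivot) std::swap(a[i], a[p++]);
    std::swap(a[p], a[hi]);           // pivot lands in its final position
    quickSort(a, lo, p - 1);          // worst case: one side gets N-1 items
    quickSort(a, p + 1, hi);
}
```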

bucketSort: O(N+k) could be better or worse than O(N)

Based on the top answer in https://stackoverflow.com/questions/7311415/how-is-the-complexity-of-bucket-sort-is-onk-if-we-implement-buckets-using-lin

It’s obvious (by definition) that O(2N) is equivalent to O(N).

O(N+k) is worse than O(2N) when k is much larger than N, say 4,000 times N. For a concrete illustration, suppose we have N=5 million strings to sort, using k=20 billion buckets.

  • It takes constant time to bucket each string, so the first step takes O(N) i.e. grows proportional to N.
  • 2nd step is now dominated by k, since all 20 billion buckets have to be examined. Time complexity is O(N+k). This is explained in the stackoverflow answer.
  • Therefore, total complexity is O(N+k) i.e. cN + bk with two constant factors c and b. Even if b is small (the cost of looking at an empty bucket), as k grows, at some point the k term overtakes the N term and becomes the dominant factor.

In practice, k is often chosen to be a smaller integer than N, so I don’t think this is a practical issue.
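To make the two steps visible, here is a toy sketch of my own: a degenerate (counting-sort flavored) bucket sort where bucket i holds only the value i, for ints in [0, k).

```cpp
#include <vector>

// Step 1 touches each of the N inputs once; step 2 must walk all k
// buckets, empty or not, which is exactly where the "+k" term comes from.
std::vector<int> bucketSort(const std::vector<int>& in, int k) {
    std::vector<std::vector<int>> buckets(k);   // assumes 0 <= x < k
    for (int x : in)                            // O(N): constant time per item
        buckets[x].push_back(x);
    std::vector<int> out;
    out.reserve(in.size());
    for (const auto& b : buckets)               // O(k): every bucket is examined
        for (int x : b) out.push_back(x);
    return out;
}
```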

25-horse puzzle #Google #unsolved

Q: A single racetrack. P=25 mechanical horses each with a unique speed. You can race up to 5 horses each time, and get a printout of the ranking, but no timing.

What’s the minimum number of races needed to identify the fastest T=3 horses?


——–analysis——–
Quick elimination — each race can eliminate a minimum of 2 horses, but can we eliminate faster?

What if T is 1 and P is 9? Two races exactly.

What if T is 3 and P is 6? Two races.

I will use an RB-tree to keep as many of the horses as possible.

Worst case — the first race actually had the top 3 and bottom 2 of the entire population. After the 1st race, I would re-race #3 and #4 with 3 new horses.

 

fastestSortFor: one str; arrayOfStr; 32bit ints #non-comparison

I think the big-O coding question may require this knowledge, because

… real-world sorting problems can often use key-based sorting, rather than comparison-based sorting.

Therefore, if your overall solution requires sorting, don’t assume the sort must contribute an O(N logN) term to your overall complexity (as in O(J*K + M^2 + P*N logN)). Here are some achievable linear-time sorts:

sort     | O()  | data type                  | ref    | notes
---------|------|----------------------------|--------|--------------------------------------------------------
counting | O(N) | chars                      |        |
radix    | O(N) | 64-bit ints                |        | O(Nw) -> O(N)
radix    | O(N) | variable-size strings      | [1][2] | fastest
burst    | O(N) | variable-size strings      |        | 2nd fastest; key-based, trie-based
radix    | O(N) | floats                     | [3]    | IEEE format
radix    | O(N) | date/time                  |        | convert to fixed-length strings or fixed-size integers
bucket?  | O(N) | random ints in python/perl |        | unbounded; no limit on #digits

Note if data is nearly sorted, then insertion-sort will outperform everyone else.

[1] https://www.cs.princeton.edu/~rs/AlgsDS07/18RadixSort.pdf

[2] https://link.springer.com/chapter/10.1007%2F978-3-540-89097-3_3 presents an MSD radix sort for large collections of strings, faster than any other string sorting algorithm.

[3] http://www.boost.org/doc/libs/1_59_0/libs/sort/doc/html/sort/sort_hpp/float_sort.html
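As a concrete illustration of the table’s fixed-width-integer row, here is a minimal LSD radix sort sketch of my own for unsigned 32-bit ints:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One byte per pass: 4 stable counting-sort passes => O(4N + 4*256) = O(N).
void radixSortU32(std::vector<std::uint32_t>& a) {
    std::vector<std::uint32_t> buf(a.size());
    for (int shift = 0; shift < 32; shift += 8) {
        std::size_t count[257] = {};                  // 256 byte buckets
        for (std::uint32_t x : a)
            ++count[((x >> shift) & 0xFF) + 1];
        for (int i = 0; i < 256; ++i)
            count[i + 1] += count[i];                 // prefix sums = start offsets
        for (std::uint32_t x : a)
            buf[count[(x >> shift) & 0xFF]++] = x;    // stable scatter
        a.swap(buf);                                  // this pass is done
    }
}
```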

find lowest positive int missing from my list #codility sample

—– the question ——
Write a function:
class Solution { public int solution(int[] A); }
that, given an array A of N integers, returns the smallest positive integer (greater than 0) that does not occur in A.
For example, given A = [1, 3, 6, 4, 1, 2], the function should return 5.
Given A = [1, 2, 3], the function should return 4.
Given A = [−1, −3], the function should return 1.
Assume that:
• N is an integer within the range [1..100,000];
• each element of array A is an integer within the range [−1,000,000..1,000,000].
Complexity:
• expected worst-case time complexity is O(N);
• expected worst-case space complexity is O(N), beyond input storage (not counting the storage required for input arguments).
Elements of input arrays can be modified.

—— my analysis —–
This is not testing coding abilities, but theoretical knowledge.

Codility is a fully automated system. I believe the O(N) requirement will not be checked by a human, and the Codility system probably cannot determine O() by parsing the source code. In fact, an O(N) solution may run slower than an O(N logN) solution and fail the hidden performance tests. Therefore, we can put aside the O(…) requirement initially.

One way to achieve O(N) is radix sort on A, dropping any negative values, then iterating the sorted array to find the earliest “gap”. Worst case — you get 1,2,3… with no gap, so the answer is 1 + the largest array element.
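As a sketch of that O(N) idea in C++ (the official signature above is Java), here is a presence-array variant of my own: the answer never exceeds N+1, so values outside [1, N+1] can be dropped just like negatives. The counting-style array replaces the full radix sort but keeps the same bound.

```cpp
#include <vector>

// O(N) time and space: mark which values in [1, N+1] occur, then scan.
int solution(const std::vector<int>& A) {
    const int n = static_cast<int>(A.size());
    std::vector<bool> seen(n + 2, false);      // seen[v]: does v occur in A?
    for (int v : A)
        if (v >= 1 && v <= n + 1) seen[v] = true;
    for (int v = 1; v <= n + 1; ++v)
        if (!seen[v]) return v;                // earliest "gap"
    return n + 1;                              // unreachable: n elements cannot cover n+1 values
}
```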

coin problem #all large-enough amounts are decomposable

This is not really an algo IV question, but more like a brain-teaser.

Based on https://en.wikipedia.org/wiki/Coin_problem — For example, the largest amount that cannot be obtained using only coins of 3 and 5 units is 7 units. The solution to this problem for a given set of coin denominations is called the Frobenius number of the set. The Frobenius number exists as long as the set of coin denominations has no common divisor greater than 1.

Note if a common divisor exists, as in {2,4}, then all the odd amounts will be non-decomposable.

Q: why is a very large amount always decomposable? Give an intuitive explanation for 2 coin values like 3 and 5.

Here’s an incomplete answer — 15 (=3*5), 16 and 17 are all decomposable. Any larger number can be reached by adding 3’s to one of these three consecutive values.

In fact, it was proven that any amount greater than (not equal to) xy-x-y is always decomposable. So if we are given 2 coin values (like 4 and 5, where x is the smaller value) we can easily figure out that the range

xy-x-y+1 to xy-y

is entirely decomposable. Note this range has x distinct values, so any higher amount is easily solved by adding x’s.

Also note xy-y is obviously decomposable as (x-1)y.
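A tiny brute-force check of my own confirms the {3,5} numbers quoted above:

```cpp
#include <initializer_list>
#include <iostream>
#include <vector>

// dp[a] is true iff amount a is decomposable into 3's and 5's. The largest
// false entry below the limit should be the Frobenius number xy-x-y = 7.
int main() {
    const int limit = 50;
    std::vector<bool> dp(limit + 1, false);
    dp[0] = true;                              // zero coins make amount 0
    for (int a = 1; a <= limit; ++a)
        for (int c : {3, 5})
            if (a >= c && dp[a - c]) dp[a] = true;
    for (int a = limit; a >= 0; --a)
        if (!dp[a]) {
            std::cout << "largest non-decomposable amount: " << a << "\n"; // 7
            break;
        }
}
```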

 

GS c++/dataStruct IV

Q1: what’s the problem to be solved by virtual inheritance?
A: the diamond problem..

Q2: consider such a diamond with A as grandparent, B/C virtually deriving from A, then grandchild D. Suppose class C has a foo() method that uses a field A.a. Now, within C, the offset of A.a depends on the inheritance hierarchy, so how does the compiler compile C.foo()?

A: I do see a problem here. In this ABCD case, the offset of A.a is one value, depending on the size of B. However, in an ABJKCD case (where J/K also virtually subclass A), the offset of A.a will be a different value, depending on the sizes of B/J/K. So C.foo() would have to compile to different object code! Yet the compiler emits a single version of C.foo().

The trick is the offset of the A sub-object within C. This offset value can be saved in the vtable of C.

Note if there’s no virtual inheritance, things are simpler. Within C, the A sub-object’s offset is a compile-time constant, so there is no need to look up the vtable.
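Here is a minimal sketch of my own for the Q2 diamond; the pad member stands in for whatever B contributes to the layout:

```cpp
#include <iostream>

// B and C virtually derive from A, so a D object holds a single A
// sub-object. C::foo() is compiled once, yet the A sub-object's offset
// differs between a standalone C and the C inside D; the compiler
// resolves it through a virtual-base offset reached via C's vtable.
struct A { int a = 42; };
struct B : virtual A { long pad[2]; };  // B's size shifts A within D
struct C : virtual A {
    int foo() const { return a; }       // uses field A.a
};
struct D : B, C {};

int main() {
    C c;
    D d;
    std::cout << c.foo() << " " << d.foo() << "\n";  // prints "42 42"
}
```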

Q3: appending a million integers to a list vs a vector, which is faster and by how much?
%%A: vector requires reallocation, roughly 20 times assuming doubling each time (2^20 is about 1M), far fewer than the 1M allocations for the list. Each reallocation entails element-by-element copying, but copying is cheaper than allocation. If I must estimate how much faster, I need to estimate how many allocations and how much copying:

  • for list, 1M allocations + 1M copying
  • for vector, 20 allocations + about 2M copying

%%A: I still think allocation is a hotspot and much more expensive than copying, perhaps by 10 or 100 times.

%%A: of course, you can pre-allocate for vector, but for this problem we ignore that feature.
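For a rough empirical check of Q3, here is a timing sketch of my own; absolute numbers vary by platform and allocator:

```cpp
#include <chrono>
#include <iostream>
#include <list>
#include <vector>

// Append 1M ints to each container. The point: list pays one heap
// allocation per node, while vector reallocates only ~20 times.
template <class Container>
long long microsToAppend(Container& c, int n) {
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < n; ++i) c.push_back(i);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
}

int main() {
    const int n = 1000000;
    std::vector<int> v;
    std::list<int> l;
    std::cout << "vector: " << microsToAppend(v, n) << " us\n";
    std::cout << "list:   " << microsToAppend(l, n) << " us\n";
}
```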

Q4: deque — known for fast append/prepend, and random access. How?
%%A: I think it’s implemented as segments of vectors, with the front and back segments not fully occupied and growable. Once one of them reaches a certain size, a new segment is created. RandomAccess is implemented by identifying the host segment. To optimize, segments should have equal length.

Q4b: what if we want to support fast midstream insertion?
%%A: this is a vector limitation inherited by deque. If we need to support it, we may need to allow unequal segment lengths. Random access is now slightly slower. For a given subscript (say 16), we need to maintain a step function like

  • 5-10 in Segment 2
  • 11-13 in Segment 3
  • 14-20 in Segment 4
  • 21-30 in Segment 5
  • 31-32 in Segment 6
  • … so each time any segment expands, the step function needs an update
  • Once we have the step function, a binary search will do, at log(S) where S = Segment count, as sketched below
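Here is a sketch of my own of that step-function lookup; the segment sizes are chosen to reproduce the ranges listed above:

```cpp
#include <algorithm>
#include <iostream>
#include <vector>

// prefix[i] is the cumulative element count of segments 1..i+1; locating
// a subscript is then an upper_bound over these prefix sums, O(log S).
int segmentOf(const std::vector<int>& prefix, int subscript) {
    auto it = std::upper_bound(prefix.begin(), prefix.end(), subscript);
    return static_cast<int>(it - prefix.begin()) + 1;   // 1-based segment id
}

int main() {
    // segment sizes 5,6,3,7,10,2 give the ranges listed above
    std::vector<int> prefix = {5, 11, 14, 21, 31, 33};  // cumulative sizes
    std::cout << segmentOf(prefix, 16) << "\n";         // prints 4
}
```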

## insertion sort: phrasebook

  • nearly sorted — for a nearly-sorted sample, it beats all other sorts, including counting sort
  • small — for a small sample, it beats MSD radix sort and all comparison sorts
    • switch — many comparison-sort implementations switch to insertion-sort when the sub-array becomes small
  • poker — intuitively, this is how players sort poker cards
    • shift — requires shifting part of the sorted portion; see the sketch below
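Here is a minimal insertion-sort sketch of my own tying these phrases together:

```cpp
#include <vector>

// The inner while loop is the "shift": elements of the sorted prefix
// slide right until key's slot opens up. On nearly-sorted input the
// loop rarely runs, giving near-linear time.
void insertionSort(std::vector<int>& a) {
    for (std::size_t i = 1; i < a.size(); ++i) {
        int key = a[i];
        std::size_t j = i;
        while (j > 0 && a[j - 1] > key) {
            a[j] = a[j - 1];   // shift the sorted portion right
            --j;
        }
        a[j] = key;            // insert key into its slot
    }
}
```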