versioned-queue problem

I think this problem is mostly about data structure, not algorithm.

Q: Design and implement a Version-Queue. A Version-Queue maintains a version number along with normal Queue functionality. Every version is a snapshot of the entire queue. Every operation[Enqueue/Dequeue] on the Queue increments its version.

Implement the following functions:

1. Enqueue – appends an element at the end of the queue.
2. Dequeue – returns the top element of the queue.
3. Print – it takes a version number as input and prints the elements of the queue of the given version. The version number input can also be an old/historical version number.

E.g. if the current version number of the queue is 7 and the input to this function is 5, then it should print the elements of the queue when its version number was 5.

For simplicity, assume the elements are integers.

We expect you to write a helper program to test the above data structure which will read input from stdin and prints output to stdout.

Input format:
First line should have an integer n (number of operations). This should be followed by n number of lines, each denoting one operation.
e.g.
6
e 1
e 4
d
e 5
p 2
p 4

‘e’ stands for enqueue

— My design —
In addition to the regular queue data structure, we need a few helper data structures.

All current + historical queue elements are saved as individual elements in a “Snapshot” vector, in arrival order. This vector never decreases in size even during dequeue. Two pointers represent the trailing edge (dequeue) and leading edge (enqueue).

(minor implementation detail — Since it’s a vector, the pointers can be implemented as 2 integer index variables. Pointers and iterators get invalidated by automatic resizing.)

Every enqueue operation increments the right pointer to the right;
Every dequeue operation Increments the left pointer to the Right;
(No decrement on the pointers.)

With this vector, I think we can reconstruct each snapshot in history.

Every pointer increment is recorded in a Version table, which is a vector of Version objects {Left pointer, Right pointer}. For example, if Version 782 has {L=22, R=55} then the snapshot #782 is the sub-array from Index 22 to 55.

Additional space costs:
O(K) where K = number of operations

Additional time costs:
Enqueue operation — O(1). Insert at end of the Snapshot vector and Version vector
Dequeue operation — O(1). Move a pointer + insert into Version vector Print operation — O(M) where M = maximum queue capacity

Minor point — To avoid vector automatic resize, we need to estimate in advance the limit on K i.e. number of operations. If you tell me you get millions of operations a microsecond, then over a day there
could be trillions of “versions” so the Versions vector and the Snapshot vector need sufficient initial capacity.

Note we could get 9000 enqueue operations in a row.

Advertisements

find median@2sorted arrays #Trex untested

https://leetcode.com/problems/median-of-two-sorted-arrays/description/ is similar except X and Y can be unequal length. My solution solves the harder, generalized problem.

This “coding” question is really math problem. Once you work out the math techniques, the coding is simple.

Designate arr1 as the shorter array. compare med(arr1) vs med(arr2)

Suppose former is lower, i can discard lower half of arr1 (s items). Can i discard highest s items in arr2? I think so because upper half of arr2 cannot have that median element, so any subset of it can be discarded

repeat until arr1 is completely discarded or left to a single element .. might be the final median. Now answer is close to the med of the remaining arr2.

–For the equal-length problem, My own idea on the spot — find the median of X and median of Y. If med(X) < med(Y) then discard the lower portion of X i.e. the “XB group”, and higher portion of Y (“YA group”). Then repeat.

  • Note len(XB) == len(YA) == min(len(X), len(Y))/2 := K. So every iteration would shrink the shorter array by half (i.e. K), and shrink the longer array by K. K would drop in value in next iteration.
  • loop exit — When the shorter of the two (say it’s X) shrinks to length 1, we are lucky — find the numbers around median(Y) and adjust the answer based on X[0].

Insight — Why can’t the final “winner”be somewhere in XB group? Because XA + YA already constitute half the population, and all of them are higher.

I always like concrete examples. So Suppose there are 512 items in the lower portion “XB group”, and the higher portion “XA” has 512 items. Suppose there are 128 items each in YB and YA groups. So in this iteration, we discard YA and the lowest 128 items in XB.

Definition of lower portion —
* all lower items up to but not including med(X) If len(X) is odd
* exactly the lower half of X if len(X) is even

maxProfit, max-sum subArray #Kadane

One of the top 3 all-time favorite algo questions, worth studying in-depth. I know only two algorithms — (Kanade) disposableCurSubArray and lowWaterMark (my own and Xinfeng Zhou). I think both are equivalent from a few angles.

My algo is intuitive (to me) for level input (delta input can be transformed into levels) . One pass. I feel it’s not inferior in any way.

In contrast, disposableCurSubArray is intuitive for delta input i.e. maxSubArray.

https://github.com/tiger40490/repo1/tree/py1/py/array has a tested solution with lots of explanations.

https://www.geeksforgeeks.org/largest-sum-contiguous-subarray/ has a simpler solution, to be understood. I rewrote it in https://github.com/tiger40490/repo1/blob/cpp1/cpp/array/maxSubarraySum.cpp

  1. special case: If all numbers are negative, then final answer is the “smallest” negative number as a lonewolf array.
  2. special case: If all numbers are positive, then final answer is the entire array
  3. so the tricky case must have a mix of negative/positive

Here’s my explanation of Kadane’s algo:

  • Delta Input array has arr[0], arr[1] .. arr[n]. let’s denote maxSubsumEndingAt(i) as B[i]. The max Subarray Ending At #55 is a continuous subarray starting somewhere before #55. It can be a lone-wolf containing only #55, as an important special case. In fact, in the mixed positive/negative array, we usually encounter this case.
  • (2-pointer algo) We maintain a left marker and a right marker. Both move to the right. maxSubArrayEndingAt(i) is basically an accumulating subarray from left marker to right marker.
  • B[i] == max(B[i-1] + arr[i]    vs  arr[i] )
    • if  B[3] is negative i.e. all 4 sub-array sums ending in arr[3] are all negative, then B[4] should not include any of them. We can discard them for good and “start afresh“(while keeping the global maxSumSeen)
    • else, there exists a “positive” sub-array ending at arr[3], so we keep growing it until it becomes negative.
    • (See github python code) I can imagine a left-marker, the start of the current max subarray. We will move the right marker to grow the sub-array if the current sub-array is useful i.e. positive. Otherwise, we start afresh and the left marker jumps and coincide with right-marker
  • Note sign(B[i-1]) is key but sign(arr[i]) is irrelevant

Here’s my attempt to connect my lowWaterMark to Kanade’s algo:

I suspect that whenever we move the left marker, we always have a new lowWaterMark.

(Before the first element, Level is defined as 0 . In case III, the low water mark is usually a negative level.) Suppose A few elements after hitting low water mark level=-66, we hit level=-22. This is not a new water mark. maxSubsumEndingAt[this element] is actually positive since there exists a subarray starting right after the “level=-66” element!

When we hit level=-77, a new water mark, the maxSubsumEndingAt[this element] is basically zero, as the disposableCurSubArray is discarded. We start afresh to accumulate. Essentially, we reset the “base” to be the new water mark.



			

locate a pair with targetSum==55 #bbg IV #Morris

Update

Q: any O(1) space sort?

Q: From an unsorted array of positive integers, is it possible to find a pair of integers that sum up to a given sum?

Constraints: This should be done in O(n) and in-place without any external storage like arrays, hash-maps, but you can use extra variables/pointers.

If this is not possible, can there be a proof given for the same?

—–Initial analysis—-
I wish I were allowed to use a hash table of “wanted ” values. (iterate once and build hashtable. For Each new value encountered, check if it is in the “wanted” list…)

I feel this is typical of west coast algorithm quiz.

I feel it’s technically impossible, but proof?  I don’t know the standard procedure to prove O(n) is impossible. Here’s my feeble attempt:

Worst case — the pair happens to be the 1st and last in the array. Without external storage, how do we remember the 1st element?  We can only use a small number of variables, even if the array size is 99999999. As we iterate the array, I guess there would be no clue  that 1st element is worth remembering. Obviously if we forget the 1st element, then when we see the last element we won’t recognize they are the pair.
—–2nd analysis—-
If we can find a O(n) in-place sort then problem is solvable [1]. Let’s look at Radix sort, one of the most promising candidates. https://stackoverflow.com/questions/463105/in-place-radix-sort has an implementation for DNA strings.

Assumption 1: the integers all have a maximum “size” in terms of digits. Let’s say 32-bit. then yes radix is O(n) but not sure about space complexity. Now, with any big-O analysis we impose no limit on the sample size. For example we could have 999888777666555444333 integers. Now, 32-bit gives about 4 billion distinct “pigeon-holes”, so by the pigeon-hole principle most integers in our sample have to be duplicates!

Therefore, Assumption 1 is questionable. In fact, some programming languages impose no limit on integer size. One integer, be it 32 thousand bits or 32 billion bits, could use up as much memory as there is in the system. Therefore, Assumption 1 is actually superfluous.

Without Assumption 1, and if we allow our sample to be freely distributed, we must assume nothing about the maximum number of digits. I would simply use

Assumption 2: maximum number of digits is about log(n). In that case radix sort is O(n log(n)), not linear time:(

—– new analysis as of Jun 2017 —-
Can Morris algo sort an array (external storage)? If yes,  then use the 2-moving-pointer algo in locate a pair whose diff=55

However, Morris needs a tree not an array.

[1] locate a pair with targetSum=55 : pre-sorted #O(N) 1-pass

island rainfall problem

#include
#include
#include
#include
#include
#include
using namespace std;
int const island[] = { 54, 50, 54, 54, 52, 55, 51, 59, 50, 56, 52, 50 };
///////////////   Pos # 0   1   2   3   4   5   6   7   8   9  10  11
int const size = sizeof(island) / sizeof(int);
int accu = 0;
//adapted from STL
template
ForwardIterator max_element_last(ForwardIterator scanner, ForwardIterator const end) {
ForwardIterator ret = scanner;
if (scanner == end)
return ret;//empty range, with zero element!
while (++scanner != end)
if (*ret <= *scanner) //"=" means find LAST
ret = scanner;
return ret;
}
//print height and address of a column
void print1(int const* const pos, char const * const label) {
//int const height = *pos;
printf(“%s=%d/%d “, label, *pos, pos – island);
}
void printAll(int const* const L, int const* const l, int const* const h,
int const* const H) {
if (l < h) {
print1(L, “wallL”);
print1(l, “ptr”);
printf(”  “);
print1(h, “ptr”);
print1(H, “wallH”);
} else {
print1(H, “wallH”);
print1(h, “ptr”);
printf(”  “);
print1(l, “ptr”);
print1(L, “wallL”);
}
printf(“%d=Accu\n”, accu);
}
//Rule: move the lo-side pointer only
void onePassAlgo(){
int*loptr; //moving pointer, moving-inward.
int*wallLo, *wallHi; //latest walls
int*h;

//1st we ASSUME the first left side wall will be lower than the first right side wall
wallLo = loptr = const_cast (island);
wallHi = h = const_cast (island) + size – 1;
//2nd, we validate that assumption
if (*wallLo > *wallHi) {
std::swap(wallLo, wallHi);
std::swap(loptr, h);
}
// now lo is confirmed lower than the hi side
printAll(wallLo,loptr,h,wallHi);
printf(“All pointers initialized (incl. 2 walls\n”);
while (loptr != h) {
if (*loptr > *wallHi) {
wallLo = wallHi;
wallHi = loptr;
std::swap(loptr, h);
//printf(“new wallHi:”);
} else if (*loptr >= *wallLo) {//see the >=
wallLo = loptr;
//printf(“wallLo updated:”);
} else {
assert (*loptr < *wallLo);
accu += (*wallLo – *loptr);
printf(“adding %d liter of water at Pos_%d (%d=A\n”, *wallLo – *loptr,
loptr – island, accu);
}
printAll(wallLo,loptr,h,wallHi);
// only by moving the loptr (not h) can we confidently accumulate water
if (loptr < h)
++loptr; //lo side is on the left, move loptr right
else
–loptr; //lo side is on the right, move loptr left
}
}
void twoPassAlgo() {//less convoluted
int const* const peak = max_element_last(island, island + size);
printf(“highest peak (last if multiple) is %d, at Pos %d\n”, *peak, peak
– island);
//(island, island + size, ostream_iterator (cout, ” “));

//forward scan towards peak
int* pos = const_cast (island); //left edge of island
int* wall = pos;
for (++pos; pos < peak; ++pos) {
if (*wall > *pos) {
accu += *wall – *pos; // accumulate water
printf(“adding %d liter of water at Pos#%d (T=%d)\n”, *wall – *pos,
pos – island, accu);
continue;
}
//ALL new walls must match or exceed previous wall.
printf(“found new wall of %d^ at Pos#%d\n”, *pos, pos – island);
wall = pos;
}
cout << "^^^ end of fwd scan ; beginning backward scan vvv\n";
//backward scan
pos = const_cast (island) + size – 1;
wall = pos;
for (–pos; pos > peak; –pos) {
if (*wall > *pos) {
accu += *wall – *pos; // accumulate water
printf(“adding %d liter of water at Pos#%d (T=%d)\n”, *wall – *pos,
pos – island, accu);
continue;
}
//Note all new walls must match or exceed previous wall.
printf(“found new wall of %d^ at Pos#%d\n”, *pos, pos – island);
wall = pos;
}
}
int main(int argc, char *argv[]) {
twoPassAlgo();
accu = 0;
cout<<"—————————–\n";
onePassAlgo();
}
/*
 Requirement — a one-dimentional island is completely covered with columns of bricks.
 If  between Column 
 A(height 9) and Column B(10) all columns are lower, then we get a basin to
 collect rainfall. Watermark height (absolute) will be 9.  We can easily calculate the
 amount of water. If I give you all the column heights, give me total rainfall collected.
 Code showcasing
 – stl algo over raw array
 – array/pointer manipulation
 – array initialization
 – array size detection
 – std::max_element modified
 – std::swap
 */