max-sum non-adjacent subSequence: signedIntArr#70%

Q: given a signed int array a[], find the best sub-sequence sum. Any Adjacent elements in a[] should not both show up. O(N) time and O(1) space.

====analysis

— O(1) idea 2:

Two simple variables needed: m_1 is the m[cur-1] and m_2 is m[cur-2]. So compare a[cur] + m_2 vs m_1.

https://github.com/tiger40490/repo1/blob/py1/py/algo_arr/nonAdjSeqSum.py is self-tested DP solution.

count lower ints to my right

https://leetcode.com/problems/count-of-smaller-numbers-after-self/ is labelled “hard”.

Q: given an integer array nums , return a new counts array, wherein counts[i] is the number of smaller elements to the right of nums[i]

====analysis

Order statistics tree  (i.e. an augmented RBTree) should make O(N logN). However, the actual algorithm is not clear to me.

One scan from right. Insert each node into this tree. Before inserting a node of value like 22, we will query the tree getRank(22).

Implementation wise, it’s hard to create a self-balancing BST from scratch. So I think an unbalanced BST might do.

Also, there might be some alternative solutions, like mergesort??

min int_range2include all clubs#presorted 60% careercup

Q (careercup): You have k uneven lists of pre-sorted integers, N ints in total. Find the smallest range that includes at least one number from each of the k lists. For example,
List 1: [4, 10, 15, 24, 26]
List 2: [0, 9, 12, 20]
List 3: [5, 18, 22, 30]
The smallest range here would be [20, 24] as it contains 24 from list 1, 20 from list 2, and 22 from list 3
==== Analysis
Looks rather contrived but very popular.
— solution 1 O(N): moving window algo:
O(N) merge all K clubs’ items into a big array, where payload includes the original clubId of each item. We end up with
[0/2nd 4/1st 5/3rd 9/2nd 10/1st …]

Once I come up with this big array, I feel confident to conceive a O(N) moving-window algo.

A ‘basic window’ is any subarray having representatives from each club.
A ‘tight window’ is a basic window that can’t shrink further, i.e. the 1st and last items are the only reps of their clubs.

Now findInitiaLwindow(), similar to findNextWin().
Simple procedure —
As we build the window, we update a hashtable like {club1: position 1,3,6; club2: position 0,5; club3: position 2,4,..}
Basically an array of queues.
We update this table each time we increment the front pointer in search of the next basic window.
When we find a basic window, this table should have all K queues non-empty and at least one of them singular, probably the last updated queue.

Now shrink this window via shrink(). Here’s a fast algo but more tricky and no O() impact —
For each queue, find the last added i.e. right-most position.
Put these K positions in a container(actually no container needed).
Now out of these K positions, find the minimum. Say 22. I claim 22 is the left edge of a smaller window.
Keep shrinking if we can but I doubt we can.

Here’s a slow shrink algo —
For the current window, look at the clubId of the left edge (back ptr).
Use the clubId to locate a queue in the table.
Head item in the queue should match the position of back ptr.
If not empty, pop this queue and increment the back ptr by one; else exit shrink()

Now we have a tight window, we remember its size (and update current winner).

After shrink(), we call findNextWin() —
Suppose the basic window’s earliest item is a clubX.
Now we move back ptr (clubX queue is now empty).
Now we move front pointer i.e. right edge of the window, looking for the next clubX item.

* findNextWin() is O(1) per front ptr increment. Only touches the tail of queues
* shrink() is O(1) per back ptr increment. Only touches the head of queues
* therefore, at most N front ptr moves and N back ptr moves.. O(N)

PriorityQ can speed up some, but won’t improve bigO

max proceeds selling 2 homes #Kyle

Q: you own a residential and an investment property. You have the historical price time series on both. You want to see the max aggregate amount you could have sold both, provided you must sell the investment property before the residential property. You can pick a time to sell the investment and a later time to sell the residential.

I created this questions for

find offending section ] originally-sorted array

I rephrase Leetcode 581 Q: There was an ascending int array containing duplicates. Someone reshuffled some elements so no longer sorted. Write a function to determine the shortest subarray needs reshuffle. Before or after this subarray, no change needed.

O(N) time and O(1) space.

==== analysis
Not “easy” as labelled.

I want to find lowest integer that’s out of place. (Do the same on the other end and problem completed.)

first scan to find global min and max values, and ensure they are correctly placed.

Next scan from beginning on the rising slope. At first down-turn, treat the array as two partitions. In the right partition after the first peak, we determine the lowest value, say, -55. Then we will find the correct position for -55, within the Left partition! We will look for the last item equal-or-lower-than -55…. This is the key constraint of this entire problem.

Insight — We know -55 must move left and we don’t care about any other item in the right partition.

Insight — As a concrete illustration, if within left partition, -55 should stand after #7, then we know all Earlier items up to #7 are already correctly placed. Why? All Subsequent items in the left section are strictly higher than -55; and all other items in right section are all (equal or)higher than -55.

— my idea 1 in ON) space : radix sort a clone of the array, then compare to the input array.

signed-int: longest K-sum subArray #60%

Q: given a signed-int array, find the longest subarray having sum=K

realistic scenario: we know someone executed a buy and a sell and made $K profit. There are many possible buy-sell points when I review the price points. We know this trader held the position longest to gain exposure, so which buy-sell points did he use?

====analysis

First scan to build cumulative sum — the sum-array. Looks like a price chart. Initial value is the first original element.

Now scan this sum-array. At each element s[j],

  • I save s[j] in hashtable {s[j] -> j] iFF no such key yet.
  • Then I look for the (by design, earliest) element s[i] such that s[i] + K == s[j]. If found in the hash table, then compute hold:=j-i and update the global longest_hold if necessary.

Note the hash table’s values (positions) are all earlier than j, since we are scanning forward.

Can consolidate to one scan.

Tip — hash table is usable to reduce O() thanks to exactly-K.

append+maxInRangeK #Rahul

Q: given a sequence of integers, implement two operations

  1. Operation 1: append another integer. Size becomes N
  2. Operation 2: given two arbitrary subscripts into the sequence, (they define a sub-array of size K), find the max

Rahul saw discussions without any optimal solution. I think a simple vector-based algo can achieve amortized O(1) for Append and O(logK) for Max.

AuxDS required.

— idea 1: segment the sequence into fixed segments

 

max-product subArray #51% untested

Q: Leetcode 152: given signed int array, find the subarray (minimum size 1) with highest product. Return the product.

Note if there’s 0 in your subarray you must return 0.

====analysis
peek? Not yet

— solution 1: O(N)

One scan to identify all the zeros. They partition the array.  For each partition, we need algorithm-A:

Count neg elements. If even, then return product of entire array. Else

Accumulate from left until last neg element -> prodL.
Accumulate from right until last neg element -> prodR. Compare the two.

trivial edge case — if the zeros create only single-neg partitions, then zero is the best.

trivial edge case — If subarray has only one element, be it zero or negative, then that’s the best for that subarray.

 

 

best2subArrays show`max diff in sums #70%

Q (careercup): Given an array of N signed integers. Find two disjoint contiguous sub-arrays such that the absolute difference between the two sub-array sums is maximized.  The sub-arrays should not overlap. O(n^2) algorithm was not accepted.

eg: [2 -1 -2 1 -4 2 8] ans: (-1 -2 1 -4) (2 8), diff = 16

==== analysis
int-array problems are my relative weakness but I have done quite a few. I feel this is more about the algo rather the implementation

Without loss of generality, let’s assume the best result includes a min subarray on the left and a max subarray on the right.

–idea 1: a partition point at 5 splits the array to 0-4 and 5-end
For a given partition point, we can find the min subarray on the left and also max subarray on the right.

A O(N^2) solution can be designed based on this idea — at each partition point, O(N) search of min subarray sum on the left and max subarray sum on the right

—idea 2: 2 colliding pointers le and ri.
0-le contains a min subarray, and ri-end contains a max subarray. But how do I move the 2 pointers?

— idea 3: O(N) scan fwd to record at each position “max sum ending here” and “min sum ending here” .. two numbers. Ditto reverse scan. Total 4 numbers per tuple.

A subarray is never empty, as produced from my minSubarraySum.py.

End up with N-array of tuples like {a,b,c,d} . Then scan this array in O(N)

3-sum #52%

Q[L] Given an array nums of K integers, are there elements a, b, c in nums such that a + b + c = 0? Find all unique triplets in the array which gives the sum of zero. Doable in O(KK)

====analysis

— Solution 1: separate into (M) non-negatives and (N) negatives. M+N=K.

— solution 1a: given M targets and N negative items,

if M > N, then check O(NN) pairs among negatives …. and look up the sum in a pre-populated hash table of M items.
If M < N, then use the two-sum algo O(MN)

Total O(MN + MM) for small M
Total O(MN + NN) for small N

Both comes out to O(MN+max(M,N)), smaller than O( [N+M]^2 )

Note some bigO students don’t like too many tokens and may simplify it to O(KK), but I think it’s imprecise… see when to introduce new tokens in O()

— solution 1c: given M targets and N negative items,

I can sort the N-array. For each target, I can binary-search, but this is possibly O(MN logN)

streaming data median #70%

Q (Leetcode 295): Design a data structure that supports the following two operations:

  • void addNum(int num) – Add a integer number from the data stream to the data structure.
  • double findMedian() – Return the median of all elements so far.

Follow up:

Q2 If all integer numbers from the stream are between 0 and 100, how would you optimize it?

Q3: If 99% of all integer numbers from the stream are between 0 and 100, how would you optimize it?

===== analysis
For Q2, I use a fixed array to keep the frequency. Based on this array, I maintain a segment tree.
For Q3, I will add two additional “buckets” i.e. array elements for ‘below0’ and ‘above99’.
Below is for Q1
— keeping a size-constrained minHeap holding the “upper house”. Similarly, a maxHeap to hold the “lower house”.

If N is even, then both heaps have same size. If N is odd, then lower half (maxHeap) gets 1 more item and its top is the median

if a new number is below current median it goes into the lower house (max heap) whose top may relocate to the other heap if necessary.

pop() is O(log N/2), so addNum() is O(logN) whereas findMedian() is O(1). Can RBTree achieve the same?

given unsorted ints, find median in O(N)

Classic problem — Q: Given an unsorted int array, find its median in O(N) time. For simplicity, we can first assume array size is odd.

I think we can use any data structure. We can scan the array multiple times.

—-simple partition algo

Pick a random item in the array as pivot value and partition the array. Let’s say we are unlucky and get 12% low vs 88% high. So we discard the 12% and repeat. Suppose then we get 66% vs 22% (within high section). We would then discard the 22%.

So we are likely to require 2N “visits”, I don’t think it would degrade to ((N logN).

—-fancy idea — weighted pivot

In one scan find the max and min. Calculate the mid value. Say this is 23.5, not even an integer. I will use this as a pivot value to partition the array into 2 segments.

Suppose N=101, so I’m looking for #51 item. Suppose left segment has 10 and right segment has 91, so I discard left segment. Now I’m looking for the #41 in the remaining array.

Now I find max and min again (if necessary). Now, Instead of getting the mid point between them, I will use a weighted pivot of (10*min+91*max)/101, so this pivot is shifted to the right, based on suspicion of a right-heavy histogram.

–complexity?

On average, I should get N+N/2+N/4 … <2N

Worst case? The earlier illustration is rather unlucky since that histogram happens to be right-heavy. In such a case, my “weighted pivot” idea should alleviate it.

max-profit at-most-2K trades #proven but untested

Q(Leetcode): Say you have a price history as array. Design an algorithm to find the maximum profit. You may complete at most 2K transactions, consisting of exactly K (eg 2) buys and K sells. You may not engage in multiple transactions at the same time (i.e. you must sell the stock before you buy again). No short sell please.

No O() requirement.

====analysis=====

I feel first challenge is to list all (not more than 10) scenarios. This step has taken me a few days, even though I drew many examples.

–Idea 3 for 2K, based on Leetcode discussion

f[2k, ii] represents the max profit up until prices[ii] using at most 2k transactions. 
f[2k, ii] = max(f[2k, ii-1], prices[ii] + max_for_all_jj(f[2k-2, jj-1] - prices[jj]))Two possibilities
  1. the optimal solution at [2k,ii] doesn’t involve the price point ii, so solution is f[2k,ii-1]
  2. the optimal solution at [2k,ii] involves a Sell at prince point ii. In this scenario, the last buy is at some previous price point jj, and before jj we have an optimal solution at [2k-2, jj-1]

–solution 1 (O(NN) brute force): construct all possible pairs, rank them and pick top 2.

–solution 2 (O(N) only works for K=2)

  1. Identify all the turning points so we end up with HLHLHL… We can eliminate or ignore the other points.
  2. * identify the best pair using the max-profit algo. denote them as L1/Hj
  3. * In the subarray before L1, find the best pair
  4. * in the subarray after Hj, find the best pair
  5. pick the best among the two an denote it as p2
  6. Now look into the subarray L1 to Hj. If there’s no enclosed pairs within then we have a simple case — use L1/Hj and p2. But let’s assume there are at least 2 nodes enclosed. I will denote entire subarray as L1 H1 L2 H2 … Lj Hj (where L1-Hj is the max-profit)
  7. * use max-profit algo to find the worst loss from H1 to Lj. Suppose it’s H3 to L5.
  8. If this loss exceeds p2, then the we will return L1/H3 and l5/Hj. Otherwise, return L1/Hj and p2

This solution uses the watermark algo 4 times (*).

I feel basic decision is to break the best pair or keep it

case: need to break the highest pair into 2 pairs,
case: best pair + another pair outside. I think this is easy..
case: 1,18,2,19,15,16. Perhaps the hardest case to solve.

–other ideas, for K > 2

can we use a matrix?

We can keep track of all profitable pairs i.e. le/ri indices, and also a pointer to the current best pair that’s not overlapping with “me”.

After creating 2nd pair, IFF no overlap, then we update the pointers in both instances.

After creating 7th pair, if it doesn’t overlap with the #3 highest pair, then check-update the pointer in #3.

I think if we can efficiently keep track of these then it should work.

size-N array find The duplicate int #1~N+1#Abhinav

https://leetcode.com/problems/find-the-duplicate-number/description/: Given an immutable int array nums containing n + 1 elements where each element is between 1 and n (inclusive), prove that at least one duplicate number must exist. You are guaranteed that there is only one duplicate number, find the duplicate value in O(1) space, below O(NN) time. The culprit may repeat many times.

I didn’t bother to write the code.

===== analaysis =====

contributed by a user and highly contrived:(
many likes:)

–bisection solution in O(N logN) time and O(1) space. I came up with this solution within a minute.

  1. Divide the full range [1 to n] into 2 almost-equal ranges (i.e. if n = 2K+1, then i use [1 to K] and [K+1 to n] as 2 ranges)
  2. Count how many nodes are in each range. Clearly one of the two ranges must have too many elements.
  3. Remember the boundary of that bad range so from now on we will ignore those nodes falling into the good range. We will use 2 variables to update/improve the boundary, until they coincide.
  4. within the bad range, repeat Step 1.

Key insight — progressive bisection.. non-recursive.

Key insight — applying pigeon-hold principle, we split the conceptual range. The more common (but ineffective) technique would split the physical array.

fewest jumps to reach right end #triple jump

Q(Leetcode 45): Given an array of non-negative integers, you are initially positioned at the first index of the array. Each element in the array represents the maximum permitted jump length from that position.

==== analysis =====
https://github.com/tiger40490/repo1/blob/py1/py/array/tripleJump.py is my solution, NOT tested on Leetcode. I won’t bother to test on leetcode. Protect my joy, momentum, absorbency

My solution not only determines feasibility but also finds the fewest jump.

Typical DP+

greedy algorithm. I will jump leftward starting from right end  [1].

Suppose there are N=99 nodes in the array. I will pre-scan the N nodes to build a shadow array of integer records, each a BestLefNode. (The first record is unused.)

Eg: If BestLefNode[44] == 33, it means that based on known data so far, the left-most (furthest) node we can jump to from Node #44 is Node #33.

Suppose the original array shows that from Node #7 we can jump 11 steps ahead. When we visit Node #7 during the rightward scan, we will update (up to) 11 BestLefNode records. These 11 records are located at #8 onwards. Each record, will be updated with “7” if appropriate.

As soon as we update BestLefNode[N-1] i.e. right-most record, we exit the initial scan since the optimal solution is now available. For example, if rightmost BestLefNode has value #88, that means the furthest node we can reach from the right end is Node #88, so we will jump to #88 and then check the best destination From #88.

[1] why not start from left end and jump rightward? No I think there’s no symmetry in this problem. From Node 1 the immediately-reachable nodes are a continuous region.

longest consecutive ints]O(N) #zebra

Popularity — 1000+ likes on Leetcode … possibly popular

Q(Leetcode #128): Given an unsorted array of integers, find the longest consecutive element sequence, in O(N) time. Eg: given [100, 4, 200, 1, 3, 2] return [1,2,3,4]

I call this the zebra problem because  every consecutive sequence of int is a black stripe and the gaps between them are white stripes. We want the widest black stripe. Obviously, each stripe has minimum size 1.

https://github.com/tiger40490/repo1/blob/py1/py/array/zebra.py is my O(N) solution, not tested on Leetcode.

========

What’s UnionFind? A reusable technique?

Like inserting interval #merging #80% done, I  feel this is a data structure problem,

To keep things simple, i will first run one iteration to remove all duplicate items.

I will use hashtable where key a known item. The value is a pointer to a “segment” object.

A segment stores the min and max values. All integers within [min, max] of the segment are always known-items during my scan of input array.

When a new item is either min-1 or max+1, we expand the segment by adjusting the extremes…

The trick is joining two segments, without link pointers. After joining, we don’t really adjust the min/max fields. We only update the max-length global variable if needed.

To keep the hashtable small, I can optionally delete from it but we don’t want to do a range delete within the loop — O(NN)

max rectangle ] histogram

Q: https://leetcode.com/problems/largest-rectangle-in-histogram/description/. Given N possibly recurring non-negative integers representing the histogram’s bar heights, and given the width of each bar is 1, find the area of largest rectangle in the histogram.

Visually well-defined problem. Kind of naturally-occurring. Very simple data structure. No O() requirement, so I will just try my own solution.

https://github.com/tiger40490/repo1/blob/py1/py/array/maxHistoBox.py is my solution. 100% passed on Leetcode.

==== analysis — heavy on data structure design.

Key insight — one scan to update a clever data structure.

key insight — data structure is not per bar, but per height!

For every bar J, there exists an enclosing max-rectangle of J’s height. We can just compare all of these rectangles.

We might start with two extreme candidates
1) the peak — whose enclosing rectangle is likely slender — O(N) one scan to find all the peaks
2) the lowest bar — whose enclosing rectangle has width N — O(N)

If we paint the histogram as a binary matrix, then this is equivalent to anther problem max all-black submatrix #DP #zhurongbut I think there exists better solutions like O(N logN) or O(N*S) …

–homegrown algo with O[N*S] where S:= #unique heights. The binary search doesn’t show up as logS.

A pre-scan to get all distinct heights. For each distinct height, we maintain a RunRecord object {bestRun, currentRunStart, height}, in a sorted map {height -> record}. In py, I can use a pre-sorted vector of Records, sorted on height

In main scan, As we encounter a new bar of height J, we update these records.

  • if not falling or rising
    • record-J and each record-H below J must have a current run … extend that run (no-op)
  • if rising from height H
    • each record up to H must have a current run … extend that run by no-op
      • iterate the treemap up to H
    • iterate treemap from H+1 to J. start a new run for each record
  • if falling from height P to J
    • record-J and each record-H (where H <J) must have a current run … extend that run
    • iterate treemap from J+1 to P … each record-K must have a current run, indicated by a valid currentRunStart, then this record’s current run has just ended. We update bestRun and put a invalid value into currentRunStart.

At end of the main scan, every record has a bestRun i.e. the duration. I can then calc the area under each bestRun and return the max.

sum@arbitrarySubArray, mutable int #Rahul#segmentTree

Q: given an array of mutable integers, implement subArraySum(le, ri), and updateElement(idx, newVal)

This is data-structure heavy. You need correct data structure to support efficient update/query.

Assumption A: Without loss of generality, i will assume original array length is a power of two, such as 8

— Idea 1: carefully segment the array. Maintain array of Segment object {le, sum}

The segments can shrink/expand based on heuristics. For now, I will assume “Segment.le” is immutable.

Every update() will update the Segment.sum in exactly one segment per level.

At the leaf level, there are 8 segments of length one or two. (Given Assumption A, it would be two.)

Next level I will have 4 segments. Each segment at this level consists of exactly 2 leaf segments. Similar to Fenwick tree and segmented binary tree, update() and query() are both O(log N)

check array@0-N +! auxDS #Nsdq#contrived

— Q1: Given a size-5 array of integers. For every element x: 0<= x <=4. Array would be considered Good if all elements are unique, i.e. all the numbers 0 to 4 show up exactly once each. Please write a program to check if the array is good. If Bad, then print every number that’s showing more than once. O(N) time and O(1) space please.

We will denote array size as sz, so N == sz-1.  N=4 and sz=5 in the opening example.

— comments:

I feel this question (even without the bonus) is too hard to complete on the spot, unless you have done it before.

I made a mistake that would be unforgivable in west coast and Bloomberg interviews — failed to clarify requirements and assumed input array is immutable.

— Q1b (bonus): Also print every missing number. (Requires iterating entire input.)

Example A {1,2,0,2,0} has repeating 0 and 2, and missing 3 and 4. Example B {3,1,4,2,0} is Good.

— comments:

If you feel the O() complexity requirements are daunting, then work out a less efficient solution first, and improve on it.

I feel in real projects, we always have enough memory to build a separate array, so the O(1) space requirement is contrived. A O(N) time and O(N) space solution is easier. Therefore, I rate this coding question as contrived and relatively non-standard. Yet the required techniques are “universal” in many high-end interviews.

https://github.com/tiger40490/repo1/blob/cpp1/cpp/algo_arr/checkArr0-N_Nsdq.cpp

has my 99% solution. The unfinished part is trivial.

 

 

max-sum subMatrix O(N^3)

Q: given a square matrix (height N) of signed ints, find a submatrix with maxi sum

====analysis

compute submatrix sum by hashmap lookup is a O(N^4) solution.

https://www.geeksforgeeks.org/dynamic-programming-set-27-max-sum-rectangle-in-a-2d-matrix/ has a O(N^3) solution for square matrix. I think brute force is O(N^6). If you know the trick it is doable.

Clearly the max-profit algo is inapplicable. The Kadane algo may help.

Need auxDS for sure.

— simple, elegant DP idea based on hint:

For every left-right column pair, find the best submatrix bounded by those two columns . For example, given column pair 3/7, the best submatrix must extend from column3 to column7. I think this constraint is extremely liberating, and leads to an elegant DP solution:

For pair 1/2, construct a new column array temp[] where temp[6]:= row-wise sum across columns 1/2. Once we have temp[], apply Kadane’s algo on temp[] to find the max submatrix sum. Entire process is O(N)

Q: Is each pair evaluated as an independent problem? I doubt it. I think after we have done pair 1/2, adding column3 into the mix is O(N), so the pairs 1/2, 1/3, 1/4 .. 1/N can be completed in O(NN). All pairs would O(NNN)

So how is adding column3 into the mix O(N)? I think we don’t use the prev Kadane result, but we use the previous temp[] to gain efficiency! We can update existing temp[] in O(N), then apply Kadane’s algo on it in O(N)

What if I have a 99×88 matrix? Do we use a O(NNM) or O(NMM) algo? Since there are fewer columns, I will stick to the column-pair algo presented.  (If there are fewer rows i.e. flat matrix then I would use row pairs.)

find sub-array with exact target sum #O(N)#1pass clever

#include <iostream>

/* determines if the there is a subarray of arr[] with sum equal to 'sum' 
Nice and simple algorithm. Works for exact match only.
*/
void subArraySum(int const arr[], int const n, int const sum)
{
	int curr_sum = arr[0], le = 0, ri;

	/* Add elements one by one to curr_sum and if the curr_sum exceeds
the sum, then remove starting element */
	for (ri = 0; ri <= n-1; ri++) {
                /* If curr_sum exceeds the sum and sub-array isn't single,
then remove the starting elements*/
		while (curr_sum > sum && le < ri) {
			printf("curr_sum = %d too high. le = %d; ri = %d\n", curr_sum, le, ri);
			curr_sum = curr_sum - arr[le];
			le++;
		}

		if (curr_sum == sum) {
			printf("Sum found between indexes %d and %d\n", le, ri);
			return;
		}
		// now curr_sum is too small or sub-array is single
		curr_sum = curr_sum + arr[ri+1];
	}
	printf("No subarray found\n");
}
int main()
{
	int const arr[] = { 11,24, 2, 4, 8, 7 };
	int const n = sizeof(arr) / sizeof(arr[0]);
	int const sum = 7;
	subArraySum(arr, n, sum);
}

 

1st lonewolf]unsorted int array

Q: if an integer appears only once in an array, then it’s a lone wolf. Find the first lone wolf in a given array.

This problem is mostly about efficient data structures.

— one idea — O(N) one-pass. linked list + hash table. https://github.com/tiger40490/repo1/blob/cpp1/cpp1/array/1stLoneWolf.cpp is my tested implementation.

In a green linked list, each link node holds {index}. Every index “point to” a lone wolf.

Every lone wolf int or repeated int is also in a hashtable {int -> {color, iterator in linked list}}

For every node encountered, we add it to the hashtable. If it is a new int, it’s appended to the linked list; if it turns from green to brown, then we remove it from linked list.

— one idea — O(N) two-pass. linkedHashMap
* any “new” element will be appended to a LinkedHashMap and painted green
* any “seen” element will be repainted brown in the same container.
* How to test for newness? Any examined element is tested by contains().
* 2nd scan to return 1st green.

— One idea — hash table + heap in fwd scan

  • use a hash table to hold payload objects, and progressively mark each payload green or brown
  • each payload consists of {color, index}. The key in the hash table is arr[index]
  • As we create a payload (to put into hash table), we also save its address in a min-heap, sorted on the index
  • After the scan, extract minimum from the heap, until we get a green element.

 

rainfall: streaming mode #Ashish/CSY

In the static, non-streaming context, the optimal solution is perhaps in my gmail (need to check). Now I feel this is identical to the  “medium” leetcode water-container problem:

Q: Given an array of size N (>=2) of non-negative numbers representing N walls, find a pair of walls that can hold the most water.

https://github.com/tiger40490/repo1/blob/cpp1/cpp/array/rainfallMaxContainer.cpp is my solution + an equivalent but much shorter solution by CSY, basically same as the idea below !

The brute force would evaluate (N-1)*N/2 pairs. We can reduce that significantly. Start with 1st/last, which is possibly the final winner. Save this as maxProductOfLoopA and evaluate a[0] vs a[9 i.e. last].

  • Suppose a[0] > a[9], then 1/9 , 2/9 , 3/9 etc are automatically eliminated.
  • Suppose a[0] < a[9], then 0/8, 0/7, 0/6 etc are automatically eliminated.
  • if a[0] == a[9], then 0/8, 0/7 etc and 1/9, 2/9 etc are automatically eliminated

You can visualize it as removing an outer layer from a NxN matrix. Note the matrix is triangular and has exactly one outer row and one outer column at any moment. In the first step, you either remove the outer row or outer column, or both.

Supposed you removed the “*/9” column. In 2nd step, we compute 2nd product at 0/8, and 2nd evaluation of a[0] vs a[8] and remove another outer layer.

In about N steps we should reduce the matrix to 1 remaining cell. This cell could be the final winner so we must evaluate it.

—-

Hi Ashish,

Let me change your question slightly. Input numbers come individually in a stream. Among all possible pairs of numbers, we want to compute and publish the latest maximum product:

    A(i, j) * B(i, j)

, where A(i, j) is the time lag j-i and B(i, j) is minimum(arr[i], arr[j])

NB: We will need to keep all array elements seen so far, since the the winning pair might include the first few.

At any time there’s a current max. When we receive the 89th element, enter loop A:

compute the product for 1st/89th i.e. arr[0]/arr[88]
* If arr[0] > arr[88], then just exit the loop with this product as maxProductOfLoopA. No need to try 2nd/89th or 3rd/89th, as all those products are all smaller than 1st/89th. (This is the genius of the solution you told me.)
* otherwise, compute the product for 2nd/89th. If it exceeds maxProductOfLoopA, then update maxProductOfLoopA. Now check if arr[1] > arr[88]. If yes just exit loop A with maxProductOfLoopA
* otherwise compute the product for 3rd/89th….

Once we exit loopA, update the current max product using maxProductOfLoopA.

For streaming situation, I think this is one good solution, if not the optimal solution.

[15] EPI300 skyline #event-handler

Reusable technique – SQL

PROBLEM STATEMENT (equivalent to Leetcode Q 218)
We have to design a program which helps drawing the skyline of a two-dimensional city given the locations of the rectangular buildings in the city. Each building B_i is represented by a triplet of (L_i, R_i, h_i) where L_i and R_i are the left and right coordinates of the ith building, and h_i is the height. In the diagram below there are 8 buildings, represented from left to right by the triplets (1, 5, 11), (2, 7, 6), (3, 9, 13), (12, 16, 7), (14, 25, 3), (19, 22, 18), (23, 29, 13) and (24, 28, 4).

Input
The input of the program is a sequence of building triplets. The triplets are sorted by L_i (the left coordinate of the building) in ascending order.

====analysis====

A data structure challenge. Once I had the correct data structure … 迎刃而解

Q1 —– For any given x, determine the height in the skyline.
Note If x == R_i, then the ith building doesn’t count. In other words, If you look at the first building, the 1-to-5 range it covers is a half-open interval, sometimes written as [1,5) as this range includes 1 but excludes 5. You can think of [1,5) as [1, 4.99999999] approximately.

A1(brute force): look at each building and decide if it “covers” the point X. Given the pre-sort, most buildings aren’t relevant. A complete solution would be

Select max(h) from myTable t where t.l =< x < t.r

To support this solution, the objects could be stored in two sorted data structures, one sorted by L_i and one sorted by R_i.

Q2 —- draw the skyline.
A2: evaluate Q1 (i.e. get the height) at every L_i and R_i value. This solution is probably suboptimal.

Q2b (Leetcode Q218) —- draw the skyline by outputting a sequence of {x,h} pairs showing the change in height (represented by a new height h) at each vertical edge (marked by x).

Is it possible to do this in one scan after preprocessing the triplets with some clever data structures? [[EPI300]] may have a solution. Here’s my own proposal —

Pre-sort the N buildings into a list by L_i, and sort the same buildings into another list by R_i, and merge the 2 lists into a big sorted list of 2N pointers to N unique objects. Each building shows up twice. Each of the 2N entries consists of {building_object_id, x, boolean flag left_or_right }. We will go through this big list one-pass.

I convert this list into a sequence of “events” as we move along the x-axis. That’s why all left/right edges must be sorted into a single sequence.

Main scan — As we hit the left edge of a building, we include this building in the Alive container (probably a BST). In Alive, we keep the buildings sorted by height. We also maintain a lookup table { building_id, pointer/iterator into Alive}. As we hit the right edge of a building, we remove it from Alive. (No need to remove from lookup since we won’t see the same building again.)

As we hit any edge, we need to determine the impact on the skyline if any. This is when we make use of the Alive system. For any edge, there’s impact iff the target building is taller than all other in Alive.

locate a pair with targetDdiff==55

http://www.geeksforgeeks.org/write-a-c-program-that-given-a-set-a-of-n-numbers-and-another-number-x-determines-whether-or-not-there-exist-two-elements-in-s-whose-sum-is-exactly-x/ has a nice algo for “sum=55”

This technique becomes more interesting when the problem is modified to “find a pair whose difference is 55”. Without loss of generality, let’s keep everything positive.

–my solution:

  1. sort the original integers
  2. duplicate the array, flip the signs and reverse it
  3. append positive array to get a combined, sorted array, whose left half are all negavie
  4. now start the 2 pointers, but each constrained within its half.
//Requirement: find all pairs in the array whose difference = D
//
//If there are negative numbers, then the simplest solution is
//to shift everything up to positive

vector<int> v{3,1,1,7,4,9};
int const SZ = v.size();
int const D = 2;

void dump(){
  for(int i=0; i<v.size(); ++i) cout<<setw(3)<<v[i];
  cout<<endl;
  for(int i=0; i<v.size(); ++i) cout<<setw(3)<<i;
  cout<<endl;
}
int main(){
  sort(v.rbegin(), v.rend());
  for (int i=SZ-1; i>=0; --i) v.push_back(-1 * v[i]);
  dump();
  for(int left=0, right=v.size()-1; left < SZ && right>=SZ;){
        int diff = v[left] + v[right];
        if (diff == D) {
                cout<<v[left]<<" - "<<-1*v[right]<<endl;
                if (v[left+1] == v[left]) ++left;
                else --right;
        }
        else if (diff < D) --right;
        else ++left;
  }
}

locate a pair with targetSum==55 : pre-sorted #O(N) 1-pass

I came up with this idea within 5 sec — carefully adjust le and ri pointers from both ends towards center.

  • if arr[le] + arr[ri] too big, then move le
  • else move ri
  • repeat until success or le+1==ri

My friend Ashish was asked a related question — unsorted array .. must return original indices of the pair. It’s relatively easy to adjust my algo to meet this requirement.

  • before sorting, save the original indices in a multimap or map<val, set<index>>
  • after we found a pair, look up to get the indices.

gym scheduler #greedy#Junli

I find this algorithm extremely simple but not easy to recognize. The extreme simplicity indicates some important/valuable tip but not sure what.

Write a scheduler for a single gym equipment(like a room). Every morning at 8am before start of day, we receive a batch file containing all booking requests for that day. We run the scheduler once, and all accepted bookings are confirmed and finalized. Each request is identified by a pair of start/end times on the current day. No 2 requests share the same start/end times (perhaps because the duplicates are already filtered out). Exercise duration is in blocks of 1 minute. We earn $1 rental for each session regardless of duration.  Optimization goal is to allocate as many sessions as possible. We don’t care idle time; it doesn’t cost us. Let’s say the candidate for the “optimal” allocation plan is PlanK.
—–solution——
Pick the booking that ends earliest. Alternatively, pick the booking that starts later than all others.
(I feel this is not a dynamic programming algorithm, but a greedy algorithm.)

If there’s a last-minute user (4.59pm – 5pm, let’s call it AA), then it’s always good to accept. This last minute user AA doesn’t block others. AA is a minimum-hindrance session.

If PlanK ends with any other session (call it BB), BB surely starts before 4.59pm. BB is a bigger hindrance than AA is. If we replace BB with the last-minute user AA, we would produce a “safer” plan – PlanL. PlanL is at least as optimal as PlanK, and can potentially /accommodate/ additional sessions. Therefore PlanK without AA isn’t the optimal. The optimal plan ought to include the last-minute allocation.

That’s the key to the optimization. The last-to-begin session has minimum hindrance and should always be selected. After that selection, eliminate all requests that are now hindered. Then among the remaining  requests, look for the last-to-begin.

———-
Exploiting symmetry, you can reverse the algorithm by picking the first-to-end session.

island rainfall problem

#include
#include
#include
#include
#include
#include
using namespace std;
int const island[] = { 54, 50, 54, 54, 52, 55, 51, 59, 50, 56, 52, 50 };
///////////////   Pos # 0   1   2   3   4   5   6   7   8   9  10  11
int const size = sizeof(island) / sizeof(int);
int accu = 0;
//adapted from STL
template
ForwardIterator max_element_last(ForwardIterator scanner, ForwardIterator const end) {
ForwardIterator ret = scanner;
if (scanner == end)
return ret;//empty range, with zero element!
while (++scanner != end)
if (*ret <= *scanner) //"=" means find LAST
ret = scanner;
return ret;
}
//print height and address of a column
void print1(int const* const pos, char const * const label) {
//int const height = *pos;
printf(“%s=%d/%d “, label, *pos, pos – island);
}
void printAll(int const* const L, int const* const l, int const* const h,
int const* const H) {
if (l < h) {
print1(L, “wallL”);
print1(l, “ptr”);
printf(”  “);
print1(h, “ptr”);
print1(H, “wallH”);
} else {
print1(H, “wallH”);
print1(h, “ptr”);
printf(”  “);
print1(l, “ptr”);
print1(L, “wallL”);
}
printf(“%d=Accu\n”, accu);
}
//Rule: move the lo-side pointer only
void onePassAlgo(){
int*loptr; //moving pointer, moving-inward.
int*wallLo, *wallHi; //latest walls
int*h;

//1st we ASSUME the first left side wall will be lower than the first right side wall
wallLo = loptr = const_cast (island);
wallHi = h = const_cast (island) + size – 1;
//2nd, we validate that assumption
if (*wallLo > *wallHi) {
std::swap(wallLo, wallHi);
std::swap(loptr, h);
}
// now lo is confirmed lower than the hi side
printAll(wallLo,loptr,h,wallHi);
printf(“All pointers initialized (incl. 2 walls\n”);
while (loptr != h) {
if (*loptr > *wallHi) {
wallLo = wallHi;
wallHi = loptr;
std::swap(loptr, h);
//printf(“new wallHi:”);
} else if (*loptr >= *wallLo) {//see the >=
wallLo = loptr;
//printf(“wallLo updated:”);
} else {
assert (*loptr < *wallLo);
accu += (*wallLo – *loptr);
printf(“adding %d liter of water at Pos_%d (%d=A\n”, *wallLo – *loptr,
loptr – island, accu);
}
printAll(wallLo,loptr,h,wallHi);
// only by moving the loptr (not h) can we confidently accumulate water
if (loptr < h)
++loptr; //lo side is on the left, move loptr right
else
–loptr; //lo side is on the right, move loptr left
}
}
void twoPassAlgo() {//less convoluted
int const* const peak = max_element_last(island, island + size);
printf(“highest peak (last if multiple) is %d, at Pos %d\n”, *peak, peak
– island);
//(island, island + size, ostream_iterator (cout, ” “));

//forward scan towards peak
int* pos = const_cast (island); //left edge of island
int* wall = pos;
for (++pos; pos < peak; ++pos) {
if (*wall > *pos) {
accu += *wall – *pos; // accumulate water
printf(“adding %d liter of water at Pos#%d (T=%d)\n”, *wall – *pos,
pos – island, accu);
continue;
}
//ALL new walls must match or exceed previous wall.
printf(“found new wall of %d^ at Pos#%d\n”, *pos, pos – island);
wall = pos;
}
cout << "^^^ end of fwd scan ; beginning backward scan vvv\n";
//backward scan
pos = const_cast (island) + size – 1;
wall = pos;
for (–pos; pos > peak; –pos) {
if (*wall > *pos) {
accu += *wall – *pos; // accumulate water
printf(“adding %d liter of water at Pos#%d (T=%d)\n”, *wall – *pos,
pos – island, accu);
continue;
}
//Note all new walls must match or exceed previous wall.
printf(“found new wall of %d^ at Pos#%d\n”, *pos, pos – island);
wall = pos;
}
}
int main(int argc, char *argv[]) {
twoPassAlgo();
accu = 0;
cout<<"—————————–\n";
onePassAlgo();
}
/*
 Requirement — a one-dimentional island is completely covered with columns of bricks.
 If  between Column 
 A(height 9) and Column B(10) all columns are lower, then we get a basin to
 collect rainfall. Watermark height (absolute) will be 9.  We can easily calculate the
 amount of water. If I give you all the column heights, give me total rainfall collected.
 Code showcasing
 – stl algo over raw array
 – array/pointer manipulation
 – array initialization
 – array size detection
 – std::max_element modified
 – std::swap
 */