get_majority_elem]unsorted array,O(1)space #90%

Q: Given an array of size n, find the majority element. The majority element is the element that appears more than ⌊ n/2 ⌋ times. You may assume that the array is non-empty and the majority element always exist in the array.

====analysis
worst input: odd length, only X and Y. X occurs once more than Y.

hash table solution needs O(N) space since there can be N/2 distinct values. To improve space complexity, how about quick select? Discard the smaller side and pick another random pivot.

Median-finder algorithm can solve this problem, using std::nth_element() which uses QuickSelect… O(1) space despite recursive set-up.

— idea 3 O(1) space: random pick then verify
Random pick and count the occurrence of this pick. Is it more than N/2? Within a few trials we should find a good pick.

nth largest element in unsorted array #QuickSelect

Q: Find the kth largest element in an unsorted array. Note that it is the kth largest element in the sorted order, not the kth distinct element.

I think this is mostly a bigO algoQQ problem.

std::nth_element is linear on average .. https://stackoverflow.com/questions/11068429/nth-element-implementations-complexities talks about QuickSelect algo

— idea 6: priority Q (Fib-heap) of size k
if any item is higher than the min, then pop min O(logK) and insert in O(1)
— idea 6: priority Q
Linear time to build it
— idea 5: 95th percentile problem from NYSE
— idea 4: multiple scans
— idea 3: segments
— Sol2: O(N). use the O(N) algo in the blog on “given int array, find median ] O(N)”. Then discard one of the two segments. Then repeat.
Note: Each time the target element must exist in one of the 2 segments.

O(N) + O(N/2) + O(N/4) … -> O(N)

— Sol2a: Use the O(N) qsort partition algo to anchor a (random) pivot element to create two segments. Our target must exist in one of the two, so discard the other by adjusting the le/ri boundaries.

This idea is same as the most voted solution in leetcode discussion.
O(N) on average — we get O(N)+O(N/2) + O(N/4) + … < O(2N)

Note average complexity is acceptable in hashtable!

in-place merge: 2 sorted arrays

Q: Given two sorted integer arrays nums1 and nums2, merge nums2 into nums1 as one sorted array.

  • The number of elements initialized in nums1 and nums2 are m and n respectively.
  • You may assume that nums1 has enough space (size that is greater or equal to m + n) to hold additional elements from nums2.

I would add a requirement — O(1) additional space. so you can’t create another array. This can be realistic if allocation is strictly controlled to prevent fragmentation in embedded environment.

====analysis:

Rather contrived, so I won’t spend too much time

–Idea: use num1’s right portion as the “new” array.

Suppose the allocated array of num1 has capacity k >= m + n. I will call it array KK. Note The right portion of KK is currently unused, so I can wipe it clean with some dummy value.

( If no dummy value is possible, then I probably can still solve the problem but with less clarity. )

Now backscan both arrays and put the highest value in KK[m+n-1] , filling KK leftward. The spare capacity to the right of this position will remain unused forever.

Implementation note — We need a back-scanner pointer into num1 as “cur” + another pointer to the right, “lastPicked”… meaning the item at this position has been copied to KK.

(We may not need lastPicked pointer, but it is less ambiguous more clear, easier to reason with.  You may say it’s a device for analysis and communication, not necessarily for coding.)

We also need such a pointer into num2.

inserting interval #merging

Q (Leetcode): Given a set of non-overlapping intervals, insert a new interval into existing intervals (merge if necessary) and print updated list of intervals. Intervals were a vector sorted according to their start times.

–analysis–

Now I feel the #1 main data structure is a doubly linked list (dlist) of Segment objects:

  • { segment_left_mark,
  • ptr to next node, ptr to prev node
  • optionally a (bool or) enum having A/B, where A means current segment is AboveWater (an interval) or BelowWater i.e. a gap}.

Every time this dlist is modified, we would update a “helper container” — a tree of node pointers, sorted by the segment_left_mark value. Tree to help successive inserts. However, if each insert(vector intervals) has a sorted vector then we can binary search the vector and don’t need to tree.

First, binary search to locate the left mark among all existing marks. Ditto right mark. Based on these 2 results, there are many cases.

  1. done — Case (simple) both fall into the same existing interval. No op
  2. done — case (simple) both fall into the same gap segment. Create 2 new segments and insert into the dlist
  3. done — case (simple) one boundary falls into a gap the other falls into a adjacent interval — just adjust the segment_left_mark without inserting new segment
  4. done — case — bridge: both boundaries fall into different intervals. Adjust segment_left_mark of 2 affected segments, then link up the two to skip the intermediate segments
  5. done — case — wipeout: both boundaries fall into different gaps, wiping out at least 1 interval.
  6. done — case (most complex) — one falls into an interval, the other into a non-adjacent gap.
  7. case — incoming interval left boundary is lower than all boundaries, but right boundary falls into some segment
  8. case — incoming interval is very low
  9. case (special) — if an interval becomes adjacent to another, then merge the two.

Need a sorted tree of all marks + array of segments. Redundant but helpful.

Each segment (interval or gap) is represented by {left mark, right mark} where left <= right. I will save the segment objects into (a linked list and) an array. Even elements are interval objects and odd elements are gap objects. Now superceded by dlist.

I think this problem is all about corner cases. Perhaps start with the complex cases which will take care of the simpler cases. No need to pass Leetcode tests. Due to the pointer complexity, I prefer python.

https://github.com/tiger40490/repo1/blob/py1/py/linklist/insertInterval.py is my solution but I dare not test on Leetcode

sequence@{peak within sliding window} O(N) no dequeue

Q (leetcode hard problem 239): Given an array nums, there is a sliding window of size k which is moving from the very left of the array to the very right. You can only see the k numbers in the window. Each time the sliding window moves right by one position, return the max item in it. Total run time should be O(N) but I think within each window we may not always get O(1).

https://www.interviewbit.com/problems/sliding-window-maximum/ is the same problem stated more clearly.

====analysis====

Since there exists a O(N) solution i’m confident I can figure out a O(N) solution

–idea — first scan to identify all the troughs. Items around trough to be removed, but always keep original subscripts of remaining nodes

–latest idea, not using dequeue therefore likely original —

Update — consider a rising sequence before going any further. Virtually all items are showmen…

in my showman-stack, every element {idx,height} will show up on stage at least one episode. I would say each showman goes on stage either when it first enters the sliding window, or before it leaves the sliding window. If that’s the case at each slide i only need to consider the head and tail.

(Now i think this is in the right direction but is messier than it needs to be.)

first forward scan populates this stack and ensures only showmen are included. Let’s work out some examples. First part of first scan would simply find the max among first K items. 2nd scan might be combined with first scan, but for now, my goal is clear and focused — produce a clean stack of showmen-only

  • degenerate case — monotonic sequence requires virtually all elements be included
  • rules on the forward scan —
    • every new item needs to be pushed to the stack as it could be higher than all followers. The tricky logic is “what before the push”
    • Before we examine a new item, top of the stack is always the immediate predecessor. Need to add assert.

pseudo code:

O(N) pre-scan to find all troughs. For each of N items, record the preceding trough. No stack yet.

First scan: check if the most recent trough is within range. If NO then nothing to pop from stack, so simply push new item and advance. Now assuming YES.

while new item >= top of stack __&&__ the 2nd top is within range (distance <=K):

keep pop

while new item >= top of stack && distance between is below K && there would still exist an earlier higher item within range:
//at end of the while loop, we could actually produce some output, but I would postpone it to 2nd scan

(For the last K items in array, we need some O(K) special handling.)

I think both conditions are necessary for popping the stack. Are these two conditions alone sufficient justification to remove the top?

—- segment-based solution #elegant
https://stackoverflow.com/questions/8031939/finding-maximum-for-every-window-of-size-k-in-an-array shows an elegant segment-based algo.

Suppose windows size := 10 and subscripts start at 0.
A) Conceptually we split the array into fixed segments of length 10. (The stub at the end is LG2.) Note unlike the ring algorithm, no physical data structure is implemented for this purely conceptual segments, but this conceptual data structure is essential to this clever solution.
B) Pre-scan – pre-scan the array twice .. rightward and leftward to populate two physical data structures, the LR-lookup and RL-lookup i.e. segment-wise max-till-here. For example,

RL[22] := max(#29 to #22) := Leftward max-till-22 within the enclosing egment@20 enclosing #20 to #29.

With the two auxiliary data structures RL lookup and LR lookup, the solution is mind-boggling simple and elegant. For example,

C) window@22 will overlap segment@20 and segment@30. The 29-to-22 section is covered by RL[22] and the 30-to-31 section is covered by LR[31]. So simply compare RL[22] vs LR[31].

I find this solution more intuitive, more visual than the ring solution. After the pre-scans, there’s no data structure to maintain so the iterations are stateless and extremely straightforward.

Q: what if increasing sequence?
Q: what if decreasing sequence?

—- The same webpage also has a concise explanation of the deque-based solution, kind of similar to but cleaner than my showman idea

• I will use a ring buffer of size=W as a ring is allocated at start-up time (at compile time if W is compile-time constant) and requires no dynamic allocation.
• Ring is FIFO. Terminology — I will say the earlier elements are removed from the Head like a ticketing queue; new elements are appended at the Tail.
• Invariant — I think within this ring, payloads are descending from head to tail.
• Ring holds subscripts, not payloads. Payloads could be strings or floats.
• At each iteration, 0 or 1 head item is removed if it is out of reach.
• At each iteration, incoming item kicks out all smaller tail items, as they are too close to the incoming giant. This operation enforces the invariant.

As a result, head of the ring (as a subscript) always points to the current max payload.

 

sorted circular array max() in O(log N)

Q: A sorted array A[] with distinct elements is rotated at some unknown point, the task is to find the max element in it.

Expected Time Complexity : O(Log n)

–Analysis —
It takes a const time to determine if the list is ascending or descending.

(Take 1st 3 elements. If not sorted, then entire task is simple — answer is the max among the three because we hit the turning point)

Suppose we know it’s ascending, use binary search forward to locate the turn.

rotate array by K slots in O(1)space O(N) time

Can write a c++ solution as an ECT practice. Easy to verify. Added to enroute.txt.

[[EPI300]] 6.13 is probably a much shorter and more readable solution!

A: find the largest common denominator between N and K. If it’s 3, then the array consists of 3 interleaved subsequences (abbreviations). We will “process” each subsequences simultaneously since they are completely independent. First subsequences consists of Slots 0,3,6,9… Within each subsequences , K’ and N’ (scaled down by the denominator) have no common denominator, therefore an iteration will visit every element exactly once until we revisit the first element.

Q: rotate a large array (N elements) by K positions. Both N and K are potentially large, so we need 1) time efficiency 2) memory efficiency. Ideally, O(1) space and O(N) time.

–analysis
If K = 1, then just shift every element right, rotating the last element to the head. If K = 2, just repeat once. But K is large.

You can assume it’s an array of pointers, or you can assume it’s an array of integers — Same challenge.

The brute-force solution requires O(K*N), shifting all N elements K times.

Without loss of generality, let’s rotate by 5 slots. If N is multiple of 5, like 30, then there are 5 individual, independent, interleaved arrays of 10. Just rotate each by 1 slot.

If N is not a multiple of 5, like 31, then view the array as a circular array and we hop by 5 slots each time. So our journey will start from position 0 (or any position) and cover every position and come back to the starting position. This can be proven by contradiction —

Sooner or later we must be re-visiting some position. We know that position is reachable from position 0, so the distance between them must be multiple of 5. So position 0 must be revisited.