longest run@same char,allow`K replacements #70%

https://leetcode.com/problems/longest-repeating-character-replacement/

Q: Given a string s that consists of only uppercase English letters, you can perform at most k operations on that string. In one operation, you can choose any character of the string and change it to any other character. Find the length of the longest sub-string containing all repeating letters you can get after performing the above operations.

Here’s my description of the same problem:

Q: Suppose we stand by a highway and watch cars of each color. There are only 26 possible colors. Cars pass fast, so sometimes we miscount.

My son says “I saw 11 red cars in a row in the fast lane”.
My daughter says “I saw 22 blue cars in a row in the middle lane”
We allow kids to miss up to 3 cars in their answer. In other words, my son may have seen only 8, 9 or 10 red cars in a row.

When we review the traffic video footage of N cars in a single lane, determine the max X cars in a row of the same color, allowing k mistakes, where k < N.
====analysis
Suppose k is 3

— solution 1: O(N), using 2 variables to maintain topFrq and w, i.e. winSize

Within a sliding window of size w, maintain a frq table. Initialize w to a good conservative value of 4 (i.e. k+1).

If we notice topFrq is 2, better than (w-k)=1, i.e. w-k <= topFrq, then we are lucky and can be less conservative: we can expand the current window backward (possibly safer than forward).

After expansion, immediately try further expansion. Only when expansion is impossible, i.e. w - topFrq > k, do we slide the window.

If the correct answer is 11, i.e. there's an 11-substring containing 8 reds, I feel my sliding window will not miss it.
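
Here's a minimal sketch of solution 1 (my own naming, untested on Leetcode). It expands forward rather than backward, but the key idea is the same: the window never shrinks, it only slides when w - topFrq > k. topFrq may go stale after a slide, which is harmless because only a larger topFrq can justify a larger window.

#include <string>
#include <algorithm>
using namespace std;

int longestRun(string const & s, int k){
  int frq[26] = {0};
  int topFrq = 0, left = 0;
  for (int right = 0; right < (int)s.size(); ++right){
    topFrq = max(topFrq, ++frq[s[right]-'A']);
    if ((right - left + 1) - topFrq > k){ // too many mistakes: slide, don't shrink
      --frq[s[left]-'A'];
      ++left;
    }
  }
  return (int)s.size() - left; // window never shrinks, so final size == best size
}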

clean-up: non-overlapping intervals #70%

Q (L435): Given a collection of intervals, find the minimum number of intervals you need to remove to make the rest of the intervals non-overlapping.

Note:

  1. You may assume the interval’s end point is always bigger than its start point.
  2. Intervals like [1,2] and [2,3] have borders “touching” but they don’t overlap each other.

==== analysis:

I think this is the same as the greedy room scheduler: schedule the earliest-ending task, so as to maximize accepted meetings.

A deselected meeting is an interval removed.
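
A minimal sketch of that greedy idea (my own naming, untested), with intervals as {start, end} pairs:

#include <vector>
#include <algorithm>
#include <climits>
using namespace std;

int eraseOverlapIntervals(vector<pair<int,int>> intervals){
  // greedy room scheduler: always keep the earliest-ending interval
  sort(intervals.begin(), intervals.end(),
    [](pair<int,int> const & a, pair<int,int> const & b){ return a.second < b.second; });
  int removed = 0;
  long lastEnd = LONG_MIN;
  for (auto const & iv : intervals){
    if (iv.first >= lastEnd) lastEnd = iv.second; // accept; touching borders don't overlap
    else ++removed;                               // deselected meeting = removed interval
  }
  return removed;
}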

lone-wolf hidden in AAABBBCCC unsorted #52%

Q[Lv] 60%: Given an array of integers where every element appears three times except for one, which appears exactly once, find that single one in O(N) time. Could you implement it without using extra memory?

====analysis

peek? Not yet

well-defined problem:)
greedy?
O(1) space probably means swapping
— Idea 1: with more space, I can use a hashtable of counts to achieve O(N)
— Idea 2: with O(1) space I can also sort in O(N log N)
— Idea 9: my O(N) algo, applicable to any-size integers and to repetition counts other than “three”.

Pick a random pivot and partition in O(N) time and O(1) space. Also keep track of how many repetitions of the pivot value there are (probably 3; if only 1, the pivot is the answer). Exclude the pivot value, count the size of both partitions and discard the one whose size is a multiple of 3. Repeat on the other partition.

N+N/2+N/4+N/8 …= 2N
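
A minimal sketch of Idea 9 (my own naming, untested). Each pass 3-way-partitions the current range around a random pivot; if the pivot value occurs exactly once it is the answer, otherwise we discard the partition whose size is a multiple of 3:

#include <vector>
#include <cstdlib>
#include <algorithm>
using namespace std;

int loneWolf(vector<int> v){ // by value: we may reorder freely
  int lo = 0, hi = (int)v.size(); // current search range [lo, hi)
  for (;;){
    int pivot = v[lo + rand() % (hi - lo)];
    // 3-way partition: [lo,lt) < pivot, [lt,gt) == pivot, [gt,hi) > pivot
    int lt = lo, i = lo, gt = hi;
    while (i < gt){
      if      (v[i] < pivot) swap(v[lt++], v[i++]);
      else if (v[i] > pivot) swap(v[i], v[--gt]);
      else ++i;
    }
    if (gt - lt == 1) return pivot;  // pivot repeated only once: found
    if ((lt - lo) % 3 == 1) hi = lt; // lone wolf hides in the < partition
    else lo = gt;                    // ... or in the > partition
  }
}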

Roman-to-integer converter

https://leetcode.com/problems/roman-to-integer/

Roman numerals are usually written largest to smallest from left to right. However, the numeral for four is not IIII. Instead, the number four is written as IV. Because the one is before the five we subtract it making four. The same principle applies to the number nine, which is written as IX. There are six instances where subtraction is used:

  • I can be placed before V (5) and X (10) to make 4 and 9.
  • X can be placed before L (50) and C (100) to make 40 and 90.
  • C can be placed before D (500) and M (1000) to make 400 and 900.

Input: “MCMXCIV” Output: 1994 Explanation: M = 1000, CM = 900, XC = 90 and IV = 4.

==== analysis

Too complicated for a speed coding test?

Reusable technique — one back scan might be enough. Scanning right-to-left, normally the rank of the letters encountered would increase; a decrease (by only one position) means subtraction. See code in https://leetcode.com/problems/roman-to-integer/discuss/6547/Clean-O(n)-c%2B%2B-solution

Reusable technique — hardcode the 6 “subtraction” cases to simplify the if/else logic.
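
A minimal sketch of the back-scan (untested on Leetcode): walking right to left, subtract whenever the rank drops relative to the letter just seen.

#include <string>

int romanToInt(std::string const & s){
  auto val = [](char c) -> int {
    switch(c){ case 'I': return 1;   case 'V': return 5;   case 'X': return 10;
               case 'L': return 50;  case 'C': return 100; case 'D': return 500;
               default : return 1000; } // 'M'
  };
  int total = 0, prev = 0; // prev = value of the letter to the right
  for (int i = (int)s.size() - 1; i >= 0; --i){
    int cur = val(s[i]);
    if (cur < prev) total -= cur; // rank decreased: one of the 6 subtraction cases
    else            total += cur;
    prev = cur;
  }
  return total; // "MCMXCIV" -> 1994
}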

merge two pre-sorted halves: array||slist

Q: Given an integer array of which both first half and second half (unequal sizes) are sorted. Task is to merge two sorted halves of array into single sorted array. 

====analysis

Much easier if we have a slist rather than array.

— solution 1 O(N) time but O(N) space:

First replicate the array to a slist. Then maintain 2 pointers. Pick the smaller front element and relocate it to the end of the merged segment (on the left).

— solution 2 in-place without extra space

quick sort will need extra stack space O(log N)
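
A minimal sketch of solution 1's two-pointer merge, writing into a fresh array rather than a slist (the split index mid is assumed given; untested):

#include <vector>
using namespace std;

vector<int> mergeHalves(vector<int> const & a, size_t mid){ // a[0..mid) and a[mid..) each sorted
  vector<int> out; out.reserve(a.size());
  size_t i = 0, j = mid;
  while (i < mid && j < a.size())
    out.push_back(a[i] <= a[j] ? a[i++] : a[j++]); // pick the smaller front element
  while (i < mid)      out.push_back(a[i++]);      // drain the leftovers
  while (j < a.size()) out.push_back(a[j++]);
  return out;
}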

staircase:1or2 each step

Q: You are climbing a stair case. It takes n steps to reach to the top. Each time you can either climb 1 or 2 steps. In how many distinct ways can you climb to the top?

I think this problem is not “easy”. It is similar to Fib:

f(3) = f(1) + f(2), where f(1) = how many unique paths remain after taking a double-step, and f(2) = how many unique paths remain after taking a single step.
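
A minimal iterative sketch of that Fib-style recurrence f(n) = f(n-1) + f(n-2) (untested):

long climbStairs(int n){
  long a = 1, b = 1; // f(0)=1 (do nothing), f(1)=1
  for (int i = 2; i <= n; ++i){
    long c = a + b; // f(i) = f(i-1) + f(i-2)
    a = b; b = c;
  }
  return b;
}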

friend-circle #Union-Find#60%

Q (Leetcode 547 union-find): There are N students in a class. Some of them are friends, while some are not. If A is a direct friend of B, and B is a direct friend of C, then A is an indirect friend of C. We define a friend circle as a group of students who are direct or indirect friends.

Given an N*N matrix M representing the friend relationships between students in the class: if M[i][j] = 1, then the ith and jth students are direct friends with each other, otherwise not. You have to output the total number of friend circles among all the students.

— analysis:
Rated “medium” on leetcode but my Design #1 is easier than many “easy” questions. Clearly this is a data-structure question … my traditional stronghold.

Challenge is merging.

— design 3: island count by BFS, but I think DFS might be easier

— design 1:
lookup map{studentId -> circleId}
Circle class{ circleId, presized vector of studentId}

When we merge two circles, the smaller circle’s students would /each/ update their circleId. This merge process has modest performance but is simple.

In reality, students outnumber circles, so here’s an alternative ..

— design 2:
map remains the same (not optional!).
Circle class is now {circleId, parentCircleId (default -1)}

The swallowed circle will have this->parentCircleId set to a top-level circleId… Path-compression as described in disjoint set.
The merge would only update this one field in one or more Circles. O(H) i.e. height of tree. H is usually very small because at any time, each circle’s parentCircleId is either -1 or a top-level circle — I hope to maintain this invariant.

Scenario:

  1. circles AA, BB, CC created
  2. circle a2 acquired by AA
  3. circle a3 acquired by a2 ultimately “branded” by AA
  4. circle b2 and b3 acquired by BB
  5. a2 swallows b2 –> need to update BB as acquired. When we try to update b2.parentCircleId, we realize it’s already set, so we follow the uplink to trace to the top-level node BB, and update ALL nodes on the path (including b2, since b2 is on the path to BB). But do we have to update b3, which is off the path? Suppose I don’t. I think it’s safe.
  6. circle c2 acquired by CC
  7. c2 now swallowed by b3. Now c2 will get branded by AA, and so should the nodes on the path (b3 -> BB -> AA). This chain-update would speed up future mergers. Should c2’s old parent (CC) also get branded by AA? I think so.

After the data structures are fully updated, we simply return the count of top-level circles. (Each time a top-level circle gets created or disappears, we update that count.)

Additional field in Circle: The vector of studentId is needed only if we need to output the individual students in a given circle.
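
Design 2 is essentially the classic disjoint-set with path compression. A minimal sketch (my own naming, untested), flattening the Circle objects into a parent array where parent[x] == x plays the role of parentCircleId == -1:

#include <vector>
#include <numeric>
using namespace std;

struct DisjointSet {
  vector<int> parent;
  int count; // number of top-level circles
  DisjointSet(int n): parent(n), count(n){ iota(parent.begin(), parent.end(), 0); }
  int find(int x){ // follow uplinks, compressing the path along the way
    while (parent[x] != x){ parent[x] = parent[parent[x]]; x = parent[x]; }
    return x;
  }
  void merge(int a, int b){
    int ra = find(a), rb = find(b);
    if (ra == rb) return;
    parent[ra] = rb; // one circle swallows the other ...
    --count;         // ... so one top-level circle disappears
  }
};

int friendCircles(vector<vector<int>> const & M){
  int n = (int)M.size();
  DisjointSet ds(n);
  for (int i = 0; i < n; ++i)
    for (int j = i+1; j < n; ++j)
      if (M[i][j]) ds.merge(i, j);
  return ds.count;
}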

balloon burst #DP optimization #50%

Q [Leetcode 312]: not really classic: Given n (up to 500) balloons, indexed from 0 to n-1. Each balloon is painted with a number on it, represented by the array “nums”. You are asked to burst all the balloons one by one. If you burst balloon i you will get nums[left] * nums[i] * nums[right] coins, where left and right are the indices adjacent to i. After the burst, left and right become adjacent. Find the maximum coins you can collect by bursting the balloons wisely.

If you burst the leftmost balloon, you collect 1*it*rightNeighbor coins. In other words, when multiplying the 3 numbers, any absent neighbor counts as a one.

0 ≤ nums[i] ≤ 100

Example: Input: [3,1,5,8]
Output: 167
Explanation: nums = [3,1,5,8] –> [3,5,8] –> [3,8] –> [8] –> []
coins = 3*1*5 + 3*5*8 + 1*3*8 + 1*8*1 = 167
==analysis:
int-array optimization problem.
Might be related to some classic problem.

Let’s define a generic math-function of 3 balloon IDs score(myle, me, myri). In this problem, score() is simply “return myle*me*myri “, but in the next problem, score() could be any math function of the three inputs.

I see each possible snapshot (having K balloons, i.e. at level K) as a graph node. There are exactly 2^N nodes in the graph, i.e. 2^N possible snapshots, i.e. 2^N subsets of these N balloons.

Every edge has a score. To compute the score, we only need the two nodes (snapshots) of the edge to identify the 3 balloons for score().

Pyramid — let’s put the “origin”, i.e. the snapshot of the original array, at the bottom (Level 500); on top is “phi”, i.e. the snapshot of the empty array (Level 0).

The problem transforms into a max path sum problem between these 2 nodes.

–solution-1 DP
From origin to any given node, there are many distinct paths each with a total score up to that node. If a node has 55 paths to it, the max sum among the 55 paths would be the uprank (upward rank) of the node.

If the node also has 44 paths from phi, the max sum among the 44 paths would be the downrank (downward rank) of the node. This is an interesting observation, but not needed in this solution since every edge is evaluated exactly once.

To our delight, the uprank of a node AA at Level-5 depends only on the upranks of its Level-6 parent nodes (N-5 of them), so we don’t need to remember all the distinct paths to AA:). Our space complexity is the size of the previous level + the current level.

We just need to compute the uprank of every node at Level 6, then use those numbers to work out Level 5… then Level 4… all the way to phi.

If there are x nodes at Level 6 and y nodes at Level 5, then there are 6x == (N-5)y edges linking the two levels: each Level-6 node has 6 outgoing edges, and each Level-5 node has N-5 incoming edges.

Time complexity is O(V+E) i.e. visit every edge.

Level n: 1 node
Level n-1: n nodes
Level n-2: nc2 nodes

Level 2: nc2 nodes
Level 1: n nodes
Level 0: 1 node

Each node at level K has K child nodes above. This graph now suggests the max-path-sum algo (with edge scores), but it might be the only way to solve the problem, like the bbg odometer.

consider a DP algo to update the score at each node at level K, i.e. the max sum from origin till here, via one of the node’s parents at level K+1

But Level 2 has too many (N-choose-2) nodes. Can we prune the tree, from either origin or phi?
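
For the record, the well-known polynomial solution is an interval DP, a different technique from the snapshot graph above. A minimal sketch (untested): best[l][r] = max coins from bursting every balloon strictly between border indices l and r, choosing k as the last balloon burst in that open interval.

#include <vector>
#include <algorithm>
using namespace std;

int maxCoins(vector<int> nums){
  nums.insert(nums.begin(), 1); // virtual border balloons: any absentee is a one
  nums.push_back(1);
  int n = (int)nums.size();
  vector<vector<int>> best(n, vector<int>(n, 0));
  for (int len = 2; len < n; ++len)     // distance between the two borders
    for (int l = 0; l + len < n; ++l){
      int r = l + len;
      for (int k = l + 1; k < r; ++k)   // k = last balloon burst within (l,r)
        best[l][r] = max(best[l][r],
                         best[l][k] + nums[l]*nums[k]*nums[r] + best[k][r]);
    }
  return best[0][n-1]; // [3,1,5,8] -> 167
}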

rangeAND: bitwise AND of all ints in range

Q: Given a continuous range [m, n] where 0 <= m <= n <= 2147483647, return the bitwise AND of all numbers in this range, inclusive.

  • the lowest bit will most likely see both 0 and 1 across the range, so it becomes zero
  • (turns out to be a minor tip) if the range has size > 8 then lowest 3 bits all zeroed out
  • imagine the bit array object incrementing from m to n. We want to find out if there’s a stable higher portion
  • Key technique — look at some examples. We can convert m and n to two bit images and compare them, as in the examples below.
  • we can compare the two bit images left to right to find the “higher portion”. All lower bits are probably zeroed out in the result

10110101
10110100
10110011
10110010
10110001
—–
10110000

–my solution not tested on Leetcode: https://github.com/tiger40490/repo1/blob/cpp1/cpp/rangeAnd.cpp
* compare m and n left to right. If m is shorter, then return 0.
* if same length, then compare left to right until a difference is found. Until that bit, all left-end bits are “retained”.
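
A minimal sketch of that compare-left-to-right idea (untested on Leetcode): shift both numbers right until they agree, i.e. isolate the common prefix, then shift it back.

#include <cstdint>

uint32_t rangeBitwiseAnd(uint32_t m, uint32_t n){ // requires m <= n
  int shifted = 0;
  while (m != n){ m >>= 1; n >>= 1; ++shifted; } // unstable low bits zero out
  return m << shifted; // the retained "higher portion"
}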


versioned-queue problem

I think this problem is mostly about data structure, not algorithm.

Q: Design and implement a Version-Queue. A Version-Queue maintains a version number along with normal Queue functionality. Every version is a snapshot of the entire queue. Every operation[Enqueue/Dequeue] on the Queue increments its version.

Implement the following functions:

1. Enqueue – appends an element at the end of the queue.
2. Dequeue – returns the top element of the queue.
3. Print – it takes a version number as input and prints the elements of the queue of the given version. The version number input can also be an old/historical version number.

E.g. if the current version number of the queue is 7 and the input to this function is 5, then it should print the elements of the queue when its version number was 5.

For simplicity, assume the elements are integers.

We expect you to write a helper program to test the above data structure which will read input from stdin and prints output to stdout.

Input format:
First line should have an integer n (number of operations). This should be followed by n number of lines, each denoting one operation.
e.g.
6
e 1
e 4
d
e 5
p 2
p 4

‘e’ stands for enqueue

— My design —
In addition to the regular queue data structure, we need a few helper data structures.

All current + historical queue elements are saved as individual elements in a “Snapshot” vector, in arrival order. This vector never decreases in size even during dequeue. Two pointers represent the trailing edge (dequeue) and leading edge (enqueue).

(minor implementation detail — Since it’s a vector, the pointers can be implemented as 2 integer index variables. Pointers and iterators get invalidated by automatic resizing.)

Every enqueue operation advances the right pointer to the right;
every dequeue operation advances the left pointer to the right.
(The pointers never decrement.)

With this vector, I think we can reconstruct each snapshot in history.

Every pointer increment is recorded in a Version table, which is a vector of Version objects {Left pointer, Right pointer}. For example, if Version 782 has {L=22, R=55} then the snapshot #782 is the sub-array from Index 22 to 55.

Additional space costs:
O(K) where K = number of operations

Additional time costs:
Enqueue operation — O(1): insert at the end of the Snapshot vector and the Version vector
Dequeue operation — O(1): move a pointer + insert into the Version vector
Print operation — O(M), where M = maximum queue capacity

Minor point — To avoid vector automatic resize, we need to estimate in advance the limit on K, i.e. the number of operations. If you tell me you get millions of operations a microsecond, then over a day there could be trillions of “versions”, so the Versions vector and the Snapshot vector need sufficient initial capacity.

Note we could get 9000 enqueue operations in a row.
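
A minimal sketch of this design (my own naming, untested). The Snapshot vector is append-only; each Version records a {left, right} pair, so Print is a sub-array walk:

#include <vector>
#include <iostream>
using namespace std;

class VersionedQueue {
  vector<int> snap;                   // all current + historical elements, arrival order
  struct Ver { size_t left, right; }; // snapshot = half-open range [left, right)
  vector<Ver> versions{ {0,0} };      // version 0 = empty queue
public:
  void enqueue(int x){
    snap.push_back(x);
    versions.push_back({versions.back().left, snap.size()});
  }
  int dequeue(){ // assumes non-empty queue
    Ver v = versions.back();
    versions.push_back({v.left + 1, v.right});
    return snap[v.left];
  }
  void print(size_t version) const {
    Ver v = versions.at(version);
    for (size_t i = v.left; i < v.right; ++i) cout << snap[i] << ' ';
    cout << '\n';
  }
};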

LFU cache #cf.LRU #72%

Q: implement an LFU (Least-Frequently-Used) cache to support the following operations in O(1): get and put.
* get(key) – Get the value (will always be positive) of the key if the key exists in the cache, otherwise return -1.
* put(key, value) – Set or insert the value if the key is not already present. When the cache reaches its capacity, it should invalidate the least frequently used item before inserting a new item. For the purpose of this problem, when there is a tie (i.e., two or more keys that have the same frequency), the least recently used key would be evicted.

====Analysis

  1. dstruct — centry, i.e. CacheEntry node {key, value, hitCount, lastHit (timestamp), (optional) ptr to host LinkNode}, to be used in an inner linked list.
    • invariant: hitCount can only increase
  2. dstruct — inner minilist of centry nodes
    • invariant: list always sorted by lastHit. We can remove any intermediate node, but incoming node is always added to the Tail
  3. dstruct — fixed-sized (rehash-free) hashtable {key -> ptr to centry}, needed for mid-stream laser-removal
  4. dstruct — LinkNode {level, minilist-of-centry} where all centry objects share the same hitCount denoted “level”.
  5. dstruct — outer list of LinkNodes, always sorted by level

“bubble-up” operation — Whenever a centry gets a cache-hit, its hitCount increments. It immediately and unconditionally bubbles up to the LinkNode one level higher (to be created in O(1) if necessary) ((
* [o1] query the hashtable and follow ptr to remove the centry from the minilist in an old LinkNode
* [o1] insert the centry to the new level, at Tail of minilist. The new LinkNode could be non-existent but Never empty!
* [o1] optionally, new host LinkNode’s address is saved in the centry
))

  • Get() hit — relatively easy. Update the hitCount and bubble up
  • Get() miss — trivial
  • Put() Update — similar to get-hit
  • Insertion (possibly after deletion) — [o1] append to the minilist Tail in the Level-1 LinkNode (to be created if necessary) and add to hashtable
  • Deletion — always from list to hashtable, never the converse
    • [o1] identify lowest level present, then delete the head (i.e. eviction target) of minilist
    • when a linkNode becomes empty, it must disappear from the outer list, to prevent build-up of consecutive empty LinkNodes leading to linear search for eviction target. Imagine aaaaa bbbbb c[Now need to evict an “a”]. Therefore, array of LinkNode is unacceptable.
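
A minimal sketch of the design (untested). One simplification versus the sorted outer list above: I key the LinkNodes by level in a hashtable and track minLevel separately, which keeps eviction O(1) and sidesteps the empty-LinkNode build-up:

#include <unordered_map>
#include <list>
using namespace std;

class LFUCache {
  struct CEntry { int key, value, level; list<int>::iterator pos; };
  int capacity, minLevel = 0;
  unordered_map<int, CEntry> table;     // key -> centry
  unordered_map<int, list<int>> levels; // level -> minilist of keys, LRU at head
  void bubbleUp(CEntry & ce){ // unconditionally move one level higher
    levels[ce.level].erase(ce.pos);
    if (levels[ce.level].empty()){
      levels.erase(ce.level);
      if (minLevel == ce.level) ++minLevel;
    }
    ++ce.level;
    levels[ce.level].push_back(ce.key); // incoming node always goes to the Tail
    ce.pos = --levels[ce.level].end();
  }
public:
  explicit LFUCache(int cap): capacity(cap) {}
  int get(int key){
    auto it = table.find(key);
    if (it == table.end()) return -1;
    bubbleUp(it->second);
    return it->second.value;
  }
  void put(int key, int value){
    if (capacity <= 0) return;
    auto it = table.find(key);
    if (it != table.end()){ it->second.value = value; bubbleUp(it->second); return; }
    if ((int)table.size() == capacity){ // evict Head of the lowest minilist
      int victim = levels[minLevel].front();
      levels[minLevel].pop_front();
      if (levels[minLevel].empty()) levels.erase(minLevel);
      table.erase(victim);
    }
    levels[1].push_back(key);
    table[key] = {key, value, 1, --levels[1].end()};
    minLevel = 1;
  }
};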

reverse slist in K-groups

https://leetcode.com/problems/reverse-nodes-in-k-group/description/ is the problem I tried today, not a classic problem. The challenge is not the algorithm per se but the Edit-Compile-Test-Debug cycle. I think some of us can come up with a conceptual algorithm quickly, but implementing it correctly took me hours.

Similarly, the problems below are not tough algorithm-wise, but the ECTD cycle can take hours, sometimes due to c++ iterator pitfalls, sometimes because we can’t easily visualize the data structure. I wrestled with all of these problems, so please feel free to try them and discuss with me.

* print any tree (you can start with a binary) by level, in zigzag sequence
* given a linked list, write a function to remove all nodes greater than 55 (or any user input). Return the head of the modified list.
* https://www.geeksforgeeks.org/zigzag-or-diagonal-traversal-of-matrix/
* https://www.geeksforgeeks.org/create-a-matrix-with-alternating-rectangles-of-0-and-x/
* https://bintanvictor.wordpress.com/2018/02/06/spiral-number-printer/

As decided last week, I didn’t bother to run the Leetcode test suite. Those tests make me feel frustrated, worthless, defeated, inferior, a weakling, a quitter…. Without those tests I ran my own tests, and I feel like a joyful hacker.

Even though I may not pass all Leetcode tests, I feel my code is reasonable quality and I’m proud of it.

—-Problem is well-defined but not very common.

Given a linked list, reverse the nodes of a linked list k at a time and return its modified list.

k is a positive integer and is less than or equal to the length of the linked list. If the number of nodes is not a multiple of k, then the left-out nodes at the end should remain as they are. O(1) space. Hopefully O(N) time.

—-My sol1: use my existing O(1) solution but now keep a count.

https://github.com/tiger40490/repo1/blob/cpp1/cpp/linkedList/linkedListReverseInK_group.cpp

The first group and the last group are both tricky and can take up hours.
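
A minimal sketch (my own Node type, untested on Leetcode). Recursion keeps it short, though its depth is O(N/k), so this is not the O(1)-space version in my repo:

struct Node { int data; Node * next; };

Node * reverseKGroup(Node * head, int k){
  // 1) confirm at least k nodes remain; otherwise leave the tail group alone
  Node * probe = head;
  for (int i = 0; i < k; ++i){ if (!probe) return head; probe = probe->next; }
  // 2) reverse exactly k nodes
  Node * prev = nullptr, * cur = head;
  for (int i = 0; i < k; ++i){ Node * nxt = cur->next; cur->next = prev; prev = cur; cur = nxt; }
  // 3) old group head is now the group tail; link it to the processed rest
  head->next = reverseKGroup(cur, k);
  return prev; // new head of this group
}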

detect cycle in slist #Part 2

Note there’s an open question at the end

Don’t use fast-slow iterators. Each of the 3 solutions below has advantages over the fast-slow-iterator solution.

–solution: list-reversal, based on https://blog.ostermiller.org/find-loop-singly-linked-list and implemented in https://github.com/tiger40490/repo1/blob/py1/py/slist/slistReverseLoopDetect.py
Basically, iterate from aa to bb to cc … and reverse each arrow. Suppose dd points to bb initially forming a loop. At some point the snapshot is

  • null <- aa <- bb <- cc <- dd  while currentNode is dd and nextNode is bb.

If we go on, we would be back at aa, proving the existence of a loop.

Benefit of this solution — O(1) space complexity and O(N) time complexity

constraint — won’t work on read-only lists.

–solution: count nodes against memory usage — my invention
We can get a reliable estimate of the memory usage of the entire data structure, and divide it by per-node footprint, so we know within good approximation how many nodes exist. If we count more than that, there must be duplicates.

https://blog.ostermiller.org/find-loop-singly-linked-list shows one way to estimate the memory usage — keep track of the maximum and minimum addresses among all visited nodes.

Low-level languages usually provide addresses; higher-level languages usually provide memory usage. In realistic applications, there is always a way to get an estimate of memory usage.

–solution: Brent. See my code https://github.com/tiger40490/repo1/blob/cpp1/cpp/linkedList/cycleDetect.cpp

If we keep count of the total number of increments, we should see that every time we double “power”, that total count is a power of 2. If we have incremented 32 times, then we know { mu + lambda } must exceed 32… Advantages over the simple 2-iterator solution:

  1. tortoise is much faster, so if mu (merge point, or loop starting point) is far out, then we will find it faster. With the standard solution, tortoise takes a long time to reach the merge point.
  2. can return loop size

I now believe I can predict in which “phase” we will detect the loop — If lambda is between 2^5 and 2^6, then we will detect the loop in Phase 6, and we can’t detect in Phase 5

I suspect power-of-3 would also work:

  • If the slower iterator stops moving completely, then the algorithm is broken because the slower iterator could remain outside the loop.
  • If the slower iterator tailgates the faster iterator and the following distance always stays within a static limit (like 99), then the algorithm is broken, because the loop size could exceed 99, so the two iterators would have no chance of rendezvous.
  • ==> If the following distance grows gradually, AND the follower is never stuck forever, I think eventually they will meet —
    • Suppose right after a tortoise jump, powOfN bumps up to 81 but actual loop length is shorter, like 80. Now within the next 81 iterations, the hare would move step by step while the tortoise remains. They would rendezvous for sure.
  • —- pros and cons for power-of-3 over power-of-2 —-
  • pro: if a big loop involves the root, the tortoise won’t have to jump too early too frequently ==> fewer comparisons needed.
  • con: if there’s a tiny loop far out, tortoise would end up jumping too late?
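
A minimal sketch of the power-of-2 version of Brent (my own Node type, untested; my repo link above has the full version). “power” doubles each phase, matching the phase analysis above:

struct Node { int data; Node * next; };

int brentLoopLength(Node * head){ // returns lambda, or 0 if loop-free
  if (!head) return 0;
  Node * tortoise = head, * hare = head->next;
  int power = 1, steps = 1; // steps since the tortoise last teleported
  while (hare && hare != tortoise){
    if (steps == power){ // new phase: teleport tortoise to hare, double the power
      tortoise = hare;
      power *= 2;
      steps = 0;
    }
    hare = hare->next;
    ++steps;
  }
  return hare ? steps : 0; // at rendezvous, steps == loop length
}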

live updated hitCount over last5min #presumably Indeed

I hit a similar question in NY, possibly LiquidNet or CVA

Q: Make a system (perhaps a function?) that returns the average number of hits per minute from the past 5 minutes.

I will keep things simple by computing the total hit count over the last 300 seconds. (Same complexity if you want the average order amount at Amazon over the last 5 minutes.)

Let’s first build a simple system before expanding it for capacity.

Let’s first design a ticking system that publishes an updated count every time there’s a new hit. The update can be displayed or broadcast like a “notice board”, or we can update a shared atomic<int>.

Whenever we get a new record (a hit), we save it in a data structure stamped with an expiry date (datetime). At any time, we want to quickly find the earliest unexpired record i.e. the blue record. There’s only one blue at any time.

What data structure? A RingBuffer with enough capacity to hold the last 5 minutes’ worth of records.

I will keep the address of the current blue record, defined as the earliest unexpired record as of the last update. When a new record comes in, I check “Is the blue expired?” If NO, then easy: this new record is close to the last new record, and I simply update my “notice board” in O(1). If YES, then we run a binary search for the new blue. Once we find it, we compute a new update in O(W), where W is the minimum of two counts: A) recently expired records, B) still-unexpired records. After the update, we remove the expired items from our data structure.

–That concludes my first design. Now what if we also need to update the notice board even when there is no new record?

I would need an alarm set to the expiry time of the current blue.

–Now what if the updates are too frequent? I can run a scheduled update job. I need to keep the address of a yellow record, defined as the newest record as of the last update.

When triggered, the routine is familiar. I check “Is the blue expired?” If NO, then easy… If YES, then binary-search for the new blue.
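
A minimal sketch of the ticking design (names mine, untested). For brevity a deque stands in for the RingBuffer and a linear scan stands in for the binary search; the front of the deque plays the role of the blue record:

#include <deque>
#include <cstdint>
using namespace std;

class HitCounter {
  static constexpr int64_t WINDOW = 300; // seconds
  deque<int64_t> hits; // timestamps, oldest at front
public:
  void hit(int64_t now){ evict(now); hits.push_back(now); }
  size_t count(int64_t now){ evict(now); return hits.size(); } // total over last 300s
private:
  void evict(int64_t now){ // drop expired records; new front = the new blue
    while (!hits.empty() && hits.front() <= now - WINDOW) hits.pop_front();
  }
};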

detect cycle in slist #Part 1

Q: A singly-linked list (slist) contains a loop. You are given nothing but the head node. With O(1) space complexity, how do you locate the join node? For example,

0(head)->1->2->…101->102->103->4(again), so #4 is the merge point

Here’s Deepak’s ingenious trick

  1. first use the 2-pointer trick to find any node inside the loop.
  2. find the length (say, 55, denoted LL) of the loop using a single moving pointer, starting from that node
  3. now we can discard that node
  4. Now start a pointer B from head and move LL steps.
  5. Now put pointer A at head
  6. Now move A and B in lockstep. They will meet for sure, at the merge point. Here’s the proof:

Suppose the merge point is at MM, i.e. MM steps from head. When A and B start moving in lockstep,

  • how far is B from MM? MM steps! B’s head start of LL steps amounts to exactly one full lap of the loop.
  • how far is A from MM? MM steps too, by definition.

Note LL value could be small like 1.

#include <iostream>
#include <cassert>
using namespace std;

struct Node{
  Node const * p;
  int const data;
  Node(int _data, Node const * _p): p(_p), data(_data){}
  friend ostream & operator<<(ostream &os, Node const & node){
        os<<node.data<<" / "<<&node<<" -> "<<node.p<<endl;
        return os;
  }
};

Node _9(9, NULL);
Node _8(8, &_9);
Node _7(7, &_8);
Node _6(6, &_7);
Node _5(5, &_6);
Node _4(4, &_5);
Node _3(3, &_4);
Node _2(2, &_3);
Node _1(1, &_2);
Node _0(0, &_1);
Node & root = _0;
Node const * mergePoint = &_1;

//how many distinct nodes in the loop
size_t getLoopLen(Node const & root){
  Node const * brunner = &root;
  Node const * frunner = &root;
  for(;;){
        frunner = frunner->p->p;
        brunner = brunner->p;
        if (frunner == brunner) break;
  }
  cout<<"now the two runners have met somewhere in the loop: "<<*frunner ;
  for(int ret = 1; ;++ret){
        frunner = frunner->p ;
        if (frunner == brunner) return ret;
  }
}

Node const * getMergePoint(Node const & root){
  size_t LL = getLoopLen(root);
  cout<<"# of nodes in loop = "<<LL<<endl;
  Node const * frunner = &root;
  for(int i = 0; i<LL; ++i){ frunner = frunner->p; }
  //cout<<"front runer is "<<*frunner;
  Node const * brunner = &root;
  for(;frunner != brunner;){
        brunner = brunner->p;
        frunner = frunner->p;
  }
  return frunner;
}

int main(){
  _9.p = mergePoint;
  Node const * ret = getMergePoint(root);
  cout<<"Merge point is "<<*ret;
  assert(ret == mergePoint);
}

[19]hash table expansion: implementation note #GS

(I don’t use the word “rehash” as it has another meaning in java hashmap. See separate blogpost.)

Note this blogpost applies to separate chaining as well as linear probing.

As illustrated in my re_hash_table_LinearProbing.py, the sequence of actions in an expansion is tricky. Here is what worked:

  1. compute bucket id
  2. insert
  3. check new size. If exceeding load factor, then
    1. create new bucket array
    2. insert all entries, including the last one
    3. reseat the pointer at the new bucket array

If you try to reduce the double-insertion of the last entry, you would try moving Step 2 to later. This is tricky and likely buggy.

Say the computed bucket id was 25, so after the expansion, you insert at (or around) Bucket25, but this 25 was based on the old “capacity”. When we look up this key, we would use the current capacity to get a bucket id of 9, so we won’t find this key.
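
A minimal C++ rendering of that safe sequence for separate chaining (my own naming; the real illustration is the Python file above). Step 1 uses the current capacity and the load-factor check happens only after the insert:

#include <vector>
#include <list>
#include <utility>
using namespace std;

class HashTable { // simplified: ignores duplicate keys
  vector<list<pair<int,int>>> buckets;
  size_t count = 0;
public:
  HashTable(): buckets(8) {}
  void put(int key, int value){
    size_t b = static_cast<size_t>(key) % buckets.size(); // 1) bucket id, CURRENT capacity
    buckets[b].push_back({key, value});                   // 2) insert first
    if (++count > buckets.size() * 3 / 4) expand();       // 3) then check load factor
  }
private:
  void expand(){
    vector<list<pair<int,int>>> bigger(buckets.size() * 2);
    for (auto const & chain : buckets)
      for (auto const & kv : chain) // re-insert every entry, including the last one
        bigger[static_cast<size_t>(kv.first) % bigger.size()].push_back(kv);
    buckets.swap(bigger);           // reseat the pointer at the new bucket array
  }
};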

c++CollabEdit/Broadway IV: implement hash table#python

Q: implement a hash table class in any language. You could use existing implementations of linked list, array, hash function…

Q: talk about how you would implement rehash?
%%A: hash code won’t change for the key objects. But I would rerun the modulus against the new bucketCount. Based on the new index values, I would create the linked lists in each new bucket. Every pair needs to be relocated. Lastly I need to get rid of the old bucket array.

Q: how would you test your hash table?
%%A: try inserting (key1, val1), then (key1, val2), then look up key1
%%A: if I know any common weakness of the hash function, then test those.
%%A: trigger rehash

Q: what could go wrong in a multi-threaded context?
%%A: things like lost update or duplicate entries

Q: What concurrency solution would you choose for best performance?
%%A: could use lockfree algo at each point of writing to the bucket array or writing to a linked list.

[12]back-scan any container,print`every Other item #MS

Have I overspent my time on this once-asked question?

The umbrella question — write a utility function to iterate any container and print out every Other element backwards?

Good coding practice! I think this is all about iterator syntax knowledge (my weakness) not algorithm (my strength)!

Note this is really about knowledge, not coding abilities. QQ not ZZ.

Iterator declaration is a can of worms 😦 I might need to give up on this.

#include <iostream>
#include <vector>
#include <list>
#include <set>
using namespace std;

template<class _InIt>  void printAlternateItem2itr(_InIt _First, _InIt _Last){
	bool flag = true;
	// note: ++ on a reverse iterator (from rbegin) moves backward through the container
	for (_InIt it = _First; it != _Last; ++it, flag=!flag) {
		if (flag) cout << *it << ' ';
	}
	cout << endl;
}
template <typename CONT> void printAlternateItemBackward(CONT const & cont) {
	printAlternateItem2itr(cont.rbegin(), cont.rend());
}
int main() {
	//vector<int> cont = { 11,2,3,4,5,6,7,18 };
	//list<int> cont = { 11,2,3,4,5,6,7,18 };
	string cont = "0123456789a";
	set<int> cont2 = { 11,2,33,44,55,66,77,88,99 };
	printAlternateItemBackward(cont);
	printAlternateItemBackward(cont2);
	int arr[] = { 11,2,3,4,5,6,7,18,9 };
	int size = sizeof(arr) / sizeof(arr[0]);
	printAlternateItem2itr(arr, arr + size); //forward only
}

—-
Q: is comparison (operator<) defined on all iterators?
A: I think not. List iterators don’t support it; only random-access iterators do.

%%Q: what’s the signature of STL find()? I will use those declarations of iterators in my function. (Actually the map and set containers have member functions find() outperforming std::find)

%%Q: from a const container, can you get a non-const iterator?

Q: why don’t you take a container as input? Why must you take iterators?
%%A: it’s more common to take iterator, but in this case container will do. All containers provide rbegin() or begin() including string. Raw array doesn’t but the iterator increment won’t work for raw arrays anyway.


Separate question
Q: OO design — how would you represent Order state transition graph in an OMS?

find every 3-subsets adding up to 9 #AshS

Q: You have 9 poker cards, numbered 1 to 9. Given two integers SUM (for example 11) and COUNT (for example 3), construct every combination of 3 cards that adds up to the target 11.

Within each combination, if we need to generate each permutation, it’s as simple as calling next_permutation() on the array holding the combination.

You can only choose each card 0 or 1 time, i.e. no redraw.

I used to feel this was dynamic programming. Now I feel we have no choice but to iterate over all combinations. We have an algo to generate ascending combinations. We can store them in mini arrays, each one ascending. We could use binary search in each mini-array.

https://github.com/tiger40490/repo1/blob/cpp1/cpp/array/sizeN-subset_TargetSum_Ashish.cpp is a very short recursive solution by my friend CSY.

//There are N distinct poker cards numbered 1 to N. Find all combinations
//of C cards such that each combo adds up to the same given Target
//
//Based on https://bintanvictor.wordpress.com/2017/11/08/generate-next_combo3-boys-out5-recursive/
//
//Note some range of the generated sequence is ascending, permitting
//binary search like lower_bound, but it's not easy to tweak the algo
//to skip ahead. There's no random access iterator here!
#include <iostream>
#include <sstream>
#include <deque>
#include <iomanip> //setw
#include <algorithm>  //sort
#include <assert.h>
//#define DEBUG
using namespace std;
size_t calls=0, combos=0;
size_t const C=3; //how many in each combination
size_t const Target=12;
deque<int> pool{1,3,4,5,8};
deque<int> prefix;

template<typename T> void dumpDeque(deque<T> const & p, string const & headline){
  cout<<"-- "<<headline<<" -- size = "<<p.size()<<endl;
  for(int i=0; i<p.size(); ++i) cout<<setw(5)<<p[i];
  cout<<endl;
}
template<typename T> int showCombo(deque<T> const * p){
  ++ combos;
  size_t sum = 0;
  stringstream ss;
  for(int i=0; i<p->size(); ++i){
        ss<<setw(5)<<(*p)[i];
        sum+= (*p)[i];
  }
  static string last;
  string combo=ss.str();
  cout<<"combo: "<<combo;
  if (sum == Target) cout<<" <- Hit!";
  cout<<endl;
  assert(last <= combo && "should be ascending");
  last = combo;
  return 0;
}

template<typename T> int recurs(){
  ++calls;
#ifdef DEBUG
  cout<<"-------------\nentering "; dumpDeque(prefix, "prefix"); dumpDeque(pool, "pool");
#endif
  if (prefix.size() == C) return showCombo(&prefix); //prefix alone is the combo
  if (pool.empty()) return 0;
  T poolhead = pool.front(); pool.pop_front();

  prefix.push_back(poolhead); //add poolhead to prefix

  //this 1st recursive function call starts a rather deep call stack and prints
  //all combinations that start with the given (new longer) prefix
  recurs<T>();//use the longer prefix and the shorter pool

  prefix.pop_back();//restore prefix
  recurs<T>();
  pool.push_front(poolhead); //restore pool, needed by the 2nd call in the parent stack
#ifdef DEBUG
  cout<<"^^^^^^ restored before returning "; dumpDeque(prefix, "prefix"); dumpDeque(pool, "pool");
#endif
  return 0;
}

int main() {
  assert(C <= pool.size());
  sort(pool.begin(), pool.end());
  recurs<int>();
  cout<<calls<<" calls to the recursive function to generate "<<combos<<" combos"<<endl;
}

mode finder algo #java boxing

“We have a set of integers, the size is 1 mill. The value of
those integers can go as large as 1 trillion. So the set might look
like [1,3, 3, 100, 23, 100, 100, 89,999999, 89, 100000000, 10000000,
10000000000000, 39, 3, 23, 1,1,1,1,1,1000, 10000000000000….], now we
need to find out the individual single integer which is mostly
repeated. Say if integer 3 repeated 1000 times, no other integers have
dups as many as it does, integer 3 will be the one we are looking for.”

I did a java solution using HashMap. Now I feel auto-boxing is a growing performance drag with increasing volume. Whenever you compute the hashcode of a key, the primitive must first be boxed into a heap object; counting up also requires unboxing and then boxing. C++ would use simple integers.

A Java solution can avoid auto-boxing completely. Build a custom HMap class with an array of buckets, each containing a pointer to a custom LList of Pair objects. Each Pair instance consists of 2 primitive fields (a long key + an int count) + a pointer to the “next” Pair object.
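
A C++ rendering of that boxing-free layout (in Java the same shape works with primitive long/int fields; names mine, untested). The caller tracks the running max count to find the mode:

#include <vector>
#include <cstdint>
using namespace std;

struct Pair { int64_t key; int count; Pair * next; }; // 1-trillion values need 64 bits

class CounterMap { // sketch only: nodes are never freed
  vector<Pair*> buckets;
public:
  explicit CounterMap(size_t n): buckets(n, nullptr) {}
  int increment(int64_t key){ // returns the new count, boxing-free
    Pair *& head = buckets[static_cast<uint64_t>(key) % buckets.size()];
    for (Pair * p = head; p; p = p->next)
      if (p->key == key) return ++p->count;
    head = new Pair{key, 1, head}; // prepend a new chain node
    return 1;
  }
};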

How about a sparse array?

##classic algos using 2 cooperating data structures

* LRU cache — i.e. least-recently-used cache. requires a hash table + some kind of sequence container

Hash table is the only choice to provide constant-time lookup.

To maintain the order, we need some kind of sequence container.

* non-recursively print a directory tree breadth-first (then again depth-first). P171 [[c#precisely]] has a concise 10-line solution.

You need a queue (for breadth-first) or stack (for depth-first) to hold the nodes to print

You need a “seen” Set to remember which nodes were already printed.

* tree-print is the “most-interviewed” example of recursion-to-iteration problem. All such problems require at least 2 data structures, at least one of them a stack

* implement a queue using 2 stacks.
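
A minimal sketch of that last one (untested): the outbox reversal restores FIFO order, giving amortized O(1) per operation.

#include <stack>
#include <cassert>
using namespace std;

class TwoStackQueue {
  stack<int> inbox, outbox;
public:
  void enqueue(int x){ inbox.push(x); }
  int dequeue(){
    if (outbox.empty()) // refill: popping inbox into outbox reverses the order
      while (!inbox.empty()){ outbox.push(inbox.top()); inbox.pop(); }
    assert(!outbox.empty() && "dequeue on empty queue");
    int front = outbox.top(); outbox.pop();
    return front;
  }
};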