get collection sum after K halving operations #AshS

Q: given a collection of N positive integers, you perform K operations of the form “take the biggest element, halve it, and replace it with the ceiling of that half”. Find the collection sum afterwards.

Note the collection size is always N. Note K (e.g. 5) could exceed N (e.g. 2), but I feel that case is trivial.

====analysis====

This is a somewhat contrived problem.

I think O(N + K log min(N,K)) is pretty good if feasible.
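Here is a hedged max-heap sketch (mine, untested): O(N) heapify plus K pops gives O(N + K log N), close to the target above. The function name sumAfterKHalvings is my own.

#include <vector>
#include <queue>
#include <numeric>
using namespace std;

// hedged sketch: max-heap, O(N + K log N). Each operation pops the biggest
// value x and pushes ceil(x/2), which for positive ints is (x+1)/2.
long long sumAfterKHalvings(const vector<int>& nums, int K){
  priority_queue<long long> heap(nums.begin(), nums.end()); // O(N) heapify
  long long total = accumulate(nums.begin(), nums.end(), 0LL);
  for (int op = 0; op < K; ++op){
    long long biggest = heap.top(); heap.pop();
    long long halved = (biggest + 1) / 2;                   // ceiling of biggest/2
    total -= biggest - halved;
    heap.push(halved);
  }
  return total;
}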

K-palindrome #careercup 20%

Q (Facebook India, careercup): A k-palindrome is a string which transforms into a palindrome on removing at most k characters. Implement check(str, k). str has at most 20,000 characters. 0<=k<=30

Sample Test Case: Input : abdxa 1 -> false
====analysis
It’s crucial to zoom into the worst input early on — a binary string with the required “kills” (character removals) well-dispersed.

Insight — once we group the chars by ascii code, I can see a successful palindrome is really an A-palindrome interleaved with a B-palindrome and C-palindrome…

This problem is related to the (tougher) problem of “max palindrome sub-sequence”
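To make that relation concrete, a hedged sketch (mine, untested): the minimum number of removals to reach a palindrome equals n minus the longest palindromic subsequence (LPS), so check n - LPS(s) <= k. O(n^2) time is slow for 20,000 chars (hence the ideas below), but two rolling rows of LCS(s, reverse(s)) keep it at O(n) space.

#include <string>
#include <vector>
#include <algorithm>
using namespace std;

// hedged sketch: s is a k-palindrome iff (n - LPS(s)) <= k.
// LPS computed as LCS(s, reverse(s)) with two rolling rows.
bool isKPalindrome(const string& s, int k){
  string r(s.rbegin(), s.rend());
  int n = s.size();
  vector<int> prev(n+1, 0), cur(n+1, 0);
  for (int i = 1; i <= n; ++i){
    for (int j = 1; j <= n; ++j){
      if (s[i-1] == r[j-1]) cur[j] = prev[j-1] + 1;
      else                  cur[j] = max(prev[j], cur[j-1]);
    }
    swap(prev, cur);
  }
  return n - prev[n] <= k;   // prev[n] == LPS length
}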

Eg k=33.
–idea 3: per-club distribution table
each char gets a dedicated club/distro, where we record all positions occupied by this char.
Since K is small, every distro should be almost “balanced” around the midpoint.
— idea 7
There can be only up to 33 possible midpoints, including those in-between. (If all kills hit the upper end, then midpoint drops by 33/2.) Let’s try each possible midpoint.

Insight — once we pick a midpoint, we know how many we can kill above/below. For example, if we pick the highest midpoint, then all kills must hit lower half. The extreme midpoints are easier due to stringent constraints.

If we pick a central midpoint, then we can kill about 33/2 in each half, but the kills must match — nice constraint.

For each midpoint picked, we run Algo B:
At init-time, we compute the defects in each club/distro using Algo B1:
given the midpoint and the positions of all J’s, the defect count is the number of kills (killing a J or non-J) to balance J’s distro…

We will compile all 26 defect counts. Now we start growing our seed at the chosen midpoint, expanding both ways. If the next two positions belong to different clubs, J vs L, we evaluate the two kill options using Algo B2.
One of the two choices will reduce the overall defects. We first look at J’s distro: the J kill would either improve or worsen the J club’s defect count.

If there’s an efficient way to evaluate and compare the two kills, or to update the defect counts, then we have a decent solution.

–idea 9: frequency table. If there are 35 odd clubs and only 33 kills, then at most 33 of them can become even, leaving 2 odd clubs –> immediate failure. However, this idea fails with the binary string (only two clubs exist, so the check almost never triggers).
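A hedged sketch of the idea-9 pruning (mine, untested): each removal flips exactly one club’s parity and a palindrome has at most one odd club, so more than k+1 odd clubs means immediate failure.

#include <string>
#include <array>
using namespace std;

// necessary condition only: at most one odd club may remain,
// and each of the k kills can fix at most one odd club.
bool passesParityPruning(const string& s, int k){
  array<int,256> freq{};                     // per-club (per-char) frequency
  for (unsigned char c : s) ++freq[c];
  int oddClubs = 0;
  for (int f : freq) if (f % 2 == 1) ++oddClubs;
  return oddClubs <= k + 1;
}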

signed-int: longest K-sum subArray #60%

Q: given a signed-int array, find the longest subarray having sum=K

realistic scenario: we know someone executed a buy and a sell and made $K profit. There are many possible buy-sell points when I review the price points. We know this trader held the position longest to gain exposure, so which buy-sell points did he use?

====analysis

First scan to build cumulative sum — the sum-array. Looks like a price chart. Initial value is the first original element.

Now scan this sum-array. At each element s[j],

  • I save s[j] in the hashtable {s[j] -> j} iff no such key exists yet.
  • Then I look for the (by design, earliest) element s[i] such that s[i] + K == s[j]. If found in the hash table, then compute hold:=j-i and update the global longest_hold if necessary.

Note the hash table’s values (positions) are all earlier than j, since we are scanning forward.

Can consolidate to one scan.

Tip — the hash table can reduce the big-O precisely because the target sum is exactly K.
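A hedged one-scan sketch of the above (mine, untested). The only extra detail is seeding the table with an empty prefix at index -1, so subarrays starting at position 0 are covered.

#include <vector>
#include <unordered_map>
#include <algorithm>
using namespace std;

// earliest[p] = smallest index j whose prefix sum equals p.
// Returns length of the longest subarray summing to exactly K (0 if none).
int longestKSumSubarray(const vector<int>& a, long long K){
  unordered_map<long long,int> earliest;
  earliest[0] = -1;                        // empty prefix
  long long prefix = 0;
  int best = 0;
  for (int j = 0; j < (int)a.size(); ++j){
    prefix += a[j];
    auto it = earliest.find(prefix - K);   // earlier s[i] with s[i] + K == s[j]
    if (it != earliest.end()) best = max(best, j - it->second);
    earliest.emplace(prefix, j);           // inserts only if key absent -> keeps earliest j
  }
  return best;
}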

max-product subArray #51% untested

Q: Leetcode 152: given signed int array, find the subarray (minimum size 1) with highest product. Return the product.

Note if there’s 0 in your subarray you must return 0.

====analysis
peek? Not yet

— solution 1: O(N)

One scan to identify all the zeros. They partition the array.  For each partition, we need algorithm-A:

Count neg elements. If even, then return product of entire array. Else

Accumulate from the left, stopping just before the last (rightmost) negative element -> prodL.
Accumulate from the right, stopping just after the first (leftmost) negative element -> prodR. Compare the two.
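A hedged sketch of solution 1 (mine, untested); it also covers the edge cases noted below. The helper name algoA mirrors “algorithm-A” above.

#include <vector>
#include <algorithm>
#include <climits>
using namespace std;

// [lo, hi) is one zero-free partition, hi > lo
long long algoA(const vector<int>& a, int lo, int hi){
  int negCount = 0, firstNeg = -1, lastNeg = -1;
  for (int i = lo; i < hi; ++i)
    if (a[i] < 0){ ++negCount; if (firstNeg < 0) firstNeg = i; lastNeg = i; }
  auto prod = [&](int from, int to){        // product over [from, to)
    long long p = 1;
    for (int i = from; i < to; ++i) p *= a[i];
    return p;
  };
  if (negCount % 2 == 0) return prod(lo, hi);
  if (hi - lo == 1) return a[lo];           // single negative element
  // note: an empty range yields 1, but in that case the other candidate is
  // all-positive and at least as large, so the max is still correct
  long long prodL = prod(lo, lastNeg);      // drop the last negative and everything after
  long long prodR = prod(firstNeg + 1, hi); // drop the first negative and everything before
  return max(prodL, prodR);
}

long long maxProductSubarray(const vector<int>& a){
  long long best = LLONG_MIN;
  bool sawZero = false;
  int start = 0;
  for (int i = 0; i <= (int)a.size(); ++i){
    if (i == (int)a.size() || a[i] == 0){
      if (i < (int)a.size()) sawZero = true;
      if (i > start) best = max(best, algoA(a, start, i));
      start = i + 1;
    }
  }
  if (sawZero) best = max(best, 0LL);       // a zero itself is a size-1 subarray
  return best;
}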

trivial edge case — if the zeros create only single-negative-element partitions, then zero is the best.

trivial edge case — if the subarray has only one element, be it zero or negative, then that’s the best for that subarray.

lone-wolf hidden ] AAABBBCCC unsorted #52%

Q[Lv] 60%: Given an array of integers where every element appears three times except for one, which appears exactly once, find that single one in O(N) time. Could you implement it without using extra memory?

====analysis

peek? Not yet

well-defined problem:)
greedy?
O(1) space probably means swapping
— Idea 1: with more space, I can use a hashtable of count to achieve O(N)
— Idea 2: with O(1) space I can also sort in O(N logN)
— Idea 9: My O(N) algo, applicable for any-size integers and also other than “three”.

Pick a random pivot and partition in O(N) time and O(1) space. Also keep track of how many copies of the pivot value there are (3, unless the pivot is the lone wolf). Excluding the pivot value, count the size of both partitions and discard the one whose size is a multiple of 3 (it holds only complete triples). Repeat on the remaining partition.

N+N/2+N/4+N/8 …= 2N
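A hedged sketch of idea 9 (mine, untested): in-place 3-way partitioning, expected O(N) time, O(1) extra space. The name loneWolf and the fixed RNG seed are my own choices.

#include <vector>
#include <algorithm>
#include <random>
using namespace std;

// assumes exactly one value appears once and every other value appears 3 times.
// mutates the input (partitions in place).
int loneWolf(vector<int>& a){
  int lo = 0, hi = (int)a.size();             // active window [lo, hi)
  mt19937 gen(12345);                         // any seed works; fixed for repeatability
  while (hi - lo > 1){
    int pivot = a[lo + (int)(gen() % (unsigned)(hi - lo))];
    int lt = lo, i = lo, gt = hi;             // 3-way partition: <pivot | ==pivot | >pivot
    while (i < gt){
      if      (a[i] < pivot){ swap(a[i], a[lt]); ++i; ++lt; }
      else if (a[i] > pivot){ --gt; swap(a[i], a[gt]); }
      else ++i;
    }
    if (gt - lt == 1) return pivot;           // only one copy of pivot -> pivot is the lone wolf
    if ((lt - lo) % 3 != 0) hi = lt;          // the "<" side is not all triples -> wolf is there
    else                    lo = gt;          // otherwise it hides in the ">" side
  }
  return a[lo];
}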

3-sum #52%

Q[L] Given an array nums of K integers, are there elements a, b, c in nums such that a + b + c = 0? Find all unique triplets in the array which give a sum of zero. Doable in O(KK).

====analysis

— Solution 1: separate into (M) non-negatives and (N) negatives. M+N=K.

— solution 1a: given M targets and N negative items,

if M > N, then check the O(NN) pairs among negatives … and look up each pair’s (negated) sum in a pre-populated hash table of the M non-negatives.
If M < N, then use the two-sum algo: O(MN)

Total O(MN + MM) for small M
Total O(MN + NN) for small N

Both come out to O(MN): the squared term always sits on the smaller of M and N, so it is dominated by MN, which is smaller than O( [N+M]^2 )

Note some bigO students don’t like too many tokens and may simplify it to O(KK), but I think it’s imprecise… see when to introduce new tokens in O()

— solution 1c: given M targets and N negative items,

I can sort the N-array. For each target, I can binary-search, but this is possibly O(MN logN)
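For comparison, a hedged sketch (mine, untested) of the plain O(KK) baseline hinted at in the question (sort + two pointers with duplicate skipping), rather than the M/N split above.

#include <vector>
#include <algorithm>
using namespace std;

// standard sort + two-pointer 3-sum: O(K log K + KK) time, unique triplets only
vector<vector<int>> threeSum(vector<int> nums){
  sort(nums.begin(), nums.end());
  vector<vector<int>> res;
  int K = (int)nums.size();
  for (int i = 0; i < K - 2; ++i){
    if (i > 0 && nums[i] == nums[i-1]) continue;            // skip duplicate anchors
    int lo = i + 1, hi = K - 1;
    while (lo < hi){
      int sum = nums[i] + nums[lo] + nums[hi];
      if      (sum < 0) ++lo;
      else if (sum > 0) --hi;
      else {
        res.push_back({nums[i], nums[lo], nums[hi]});
        while (lo < hi && nums[lo] == nums[lo+1]) ++lo;     // skip duplicate 2nd elements
        while (lo < hi && nums[hi] == nums[hi-1]) --hi;
        ++lo; --hi;
      }
    }
  }
  return res;
}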

friend-circle #Union-Find#60%

Q (Leetcode 547 union-find): There are N students in a class. Some of them are friends, while some are not. If A is a direct friend of B, and B is a direct friend of C, then A is an indirect friend of C. We define a friend circle as a group of students who are direct or indirect friends.

Given an N*N matrix M representing the friend relationships between students in the class: if M[i][j] = 1, then the ith and jth students are direct friends with each other; otherwise not. You have to output the total number of friend circles among all the students.

— analysis:
Rated “medium” on leetcode but my Design #1 is easier than many “easy” questions. Clearly this is a data-structure question … my traditional stronghold.

Challenge is merging.

— design 3: island count by BFS, but I think DFS might be easier

— design 1:
lookup map{studentId -> circleId}
Circle class{ circleId, presized vector of studentId}

When we merge two circles, the smaller circle’s students would /each/ update their circleId. This merge process has modest performance but is simple.

In reality, students outnumber circles, so here’s an alternative ..

— design 2:
The map remains the same (not optional!).
Circle class is now {circleId, parentCircleId (default -1)}

The swallowed circle will have this->parentCircleId set to a top-level circleId… Path-compression as described in disjoint set.
The merge would only update this one field in one or more Circles: O(H), i.e. the height of the tree. H is usually very small because, at any time, each circle’s parentCircleId is either -1 or a top-level circle — I hope to maintain this invariant. (A code sketch follows the scenario below.)

Scenario:

  1. circles AA, BB, CC created
  2. circle a2 acquired by AA
  3. circle a3 acquired by a2 ultimately “branded” by AA
  4. circle b2 and b3 acquired by BB
  5. a2 swallows b2 –> BB needs to be marked as acquired. When we try to update b2.parentCircleId, we realize it’s already set, so we follow the uplink to the top-level node BB and update ALL nodes on the path, including b2 (b2 is on the “path” to BB). Do we have to update b3, which is off the path? Suppose I don’t; I think it’s safe.
  6. circle c2 acquired by CC
  7. c2 is now swallowed by b3. Now c2 will get branded by AA, and so should the nodes on the path (b3 -> BB -> AA). This chain-update would speed up future mergers. Should c2’s old parent (CC) also get branded by AA? I think so.

After the data structures are fully updated, we simply return the count of top-level circles. (Each time a top-level circle gets created or disappears, we update that count.)

Additional field in Circle: The vector of studentId is needed only if we need to output the individual students in a given circle.
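A hedged sketch in the spirit of design 2 (mine, untested). Here circleId simply starts out equal to studentId, parentCircleId == -1 marks a top-level circle, and findTop re-brands the whole path (path compression).

#include <vector>
using namespace std;

struct Circles {
  vector<int> parent;                        // parentCircleId per circle; -1 means top-level
  int topLevelCount;
  explicit Circles(int n) : parent(n, -1), topLevelCount(n) {}
  int findTop(int c){                        // follow uplinks, then re-brand the path
    if (parent[c] == -1) return c;
    return parent[c] = findTop(parent[c]);   // path compression
  }
  void merge(int a, int b){                  // a's circle swallows b's circle
    int ta = findTop(a), tb = findTop(b);
    if (ta == tb) return;
    parent[tb] = ta;
    --topLevelCount;                         // one fewer top-level circle
  }
};

int friendCircleCount(const vector<vector<int>>& M){
  int n = (int)M.size();
  Circles c(n);
  for (int i = 0; i < n; ++i)
    for (int j = i + 1; j < n; ++j)          // matrix is symmetric, so j > i suffices
      if (M[i][j] == 1) c.merge(i, j);
  return c.topLevelCount;
}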

generate loopfree paths: graph node A→B|anyPair

Q1: given 2 nodes in a graph containing N (eg 121) nodes, potentially with cycles, generate all simple paths between the pair. A simple path has no cycle. (In other words, for a simple path, length + 1 ==  # unique nodes)

I think there are classic math algorithms for it, because this is part of basic graph theory. Here are some applications of this type of algorithms —

  • Q1b (special case of Q1): given 2 nodes in a C-by-R rectangular grid, where every node is connected to (up to) four neighbors, generate all cycle-free paths between the pair.
    • Below, I solved this problem in python
  • Q2 (simple application of Q1 algo): generate all simple paths between ALL node pairs in a graph. The shortest simple path has length=0. The longest simple path can potentially visit every node exactly once.
  • A: first generate all 121-Choose-2 node pairs. For each pair, solve Q1. Lastly generate the 121 trivial paths of length=0.
    • repetitive 😦
  • Q2b (special case of Q2): In a C-by-R rectangular grid, where every node is connected to (up to) four neighbors, generate all cycle-free paths between ALL pairs. I believe this simple leetcode problem#79 does it.
    • Now I think the algo is much simpler than I imagined. Should really code it by hand.
  • Q2c (easy one based on Q2): given a binary tree containing no cycles, generate all paths.

— A1b: my DFT implementation (probably not 100% correct) , where each “trail” either fails or becomes a path.

  1. from NodeA start a breadcrumb/trail. We can’t revisit any node already visited on the current breadcrumb.
    1. if this is a matrix, then instead of a hashtable, we can also use a shadow matrix, but the breadcrumb is much smaller than a shadow matrix
  2. if we reach a node surrounded by nodes on the same breadcrumb, then the trail fails at a dead-end
  3. else we will reach NodeB 🙂 Print the breadcrumb

By construction, we won’t see duplicate paths 🙂

https://github.com/tiger40490/repo1/blob/py1/py/algo_grid/classic_count4waySimplePaths.py is the implementation
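For a general adjacency-list graph (not just the grid), a hedged C++ sketch of A1b (mine, untested): the breadcrumb is the current path, and a node already on it can’t be revisited, so every printed path is simple and no path is printed twice.

#include <vector>
#include <iostream>
using namespace std;

void dft(int cur, int target, const vector<vector<int>>& adj,
         vector<bool>& onTrail, vector<int>& trail){
  trail.push_back(cur);
  onTrail[cur] = true;
  if (cur == target){                        // reached NodeB: print the breadcrumb
    for (int node : trail) cout << node << ' ';
    cout << '\n';
  } else {
    for (int nxt : adj[cur])
      if (!onTrail[nxt]) dft(nxt, target, adj, onTrail, trail);
    // if every neighbor is already on the trail, this branch dies at a dead-end
  }
  onTrail[cur] = false;                      // backtrack
  trail.pop_back();
}

void allSimplePaths(int A, int B, const vector<vector<int>>& adj){
  vector<bool> onTrail(adj.size(), false);
  vector<int> trail;
  dft(A, B, adj, onTrail, trail);
}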

–BFT? I don’t think BFT can print each unique path

bbg-FX: check if a binary tree is BST #no deep recursion

A binary search tree has an “ascending” property — every node in my left sub-tree is smaller than me.

Q1: given a binary tree, check if it’s a BST. (No performance optimization required.) Compile and run the program.

Suppose someone tells you that the lastSeen variable can start with a random (high) initial value: what kind of test tree would flush out the bug? My solution is below, but let’s look at Q1b first.

Q1b: what if the tree could be deep so you can’t use recursion?
%%A: use BFT. When we actually “print” each node, we check the greatest node in its left subtree (and the least node in the right subtree).


I made a mistake with lastSeen initial value. There’s no “special” initial value to represent “uninitialized”. Therefore I added a boolean flag.

Contrary to the interviewer’s claim, local statics are automatically initialized to zero (https://stackoverflow.com/questions/1597405/what-happens-to-a-declared-uninitialized-variable-in-c-does-it-have-a-value), but zero or any other fixed initial value is unsuitable (the payload can be any large negative value), so we still need the flag.

#include <iostream>
#include <climits>
using namespace std;

struct Node {
    int data;
    Node *le, *ri;
    Node(int x, Node * left = NULL, Node * right = NULL) : data(x), le(left), ri(right){}
};
/*    5
  2       7
 1 a
*/
Node _7(7);
Node _a(6); // 6 sits in the left subtree of root 5, violating the BST property
Node _1(1);
Node _2(2, &_1, &_a);
Node _5(5, &_2, &_7); //root

bool isAscending = true; //first assume this tree is BST
void recur(Node * n){
//static int lastSeen; // I don't know what initial value can safely represent a special value

  //simulate a random but unlucky initial value, which can break us if without isFirstNodeDone
  static int lastSeen = INT_MAX; //simulate a random but unlucky initial value
  static bool isFirstNodeDone=false; //should initialize to false

  if (!n) return;
  if (n->le) recur(n->le);

  // used to be open(Node *):
  if (!isAscending) return; //check before opening any node
  cout<<"opening "<<n->data<<endl; 
  if (!isFirstNodeDone){ 
        isFirstNodeDone = true; 
  }else if (lastSeen > n->data){
        isAscending=false;
        return; //early return to save time
  }
  lastSeen = n->data;

  if (n->ri) recur(n->ri);
}

int main(){
  recur(&_5);
  cout<< (isAscending?"ok":"nok");
}
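For Q1b, a hedged sketch (mine, untested) appended to the program above: an explicit-stack in-order walk that avoids deep recursion. This is an alternative to the BFT idea, not the same algo; it reuses the Node struct defined earlier.

#include <stack>

bool isBstIterative(Node* root){
  stack<Node*> stk;
  Node* cur = root;
  bool seenAny = false;
  int lastSeen = 0;                  // only meaningful once seenAny is true
  while (cur || !stk.empty()){
    while (cur){ stk.push(cur); cur = cur->le; }       // drill down to the leftmost node
    cur = stk.top(); stk.pop();
    if (seenAny && lastSeen > cur->data) return false; // same check as the recursion
    seenAny = true;
    lastSeen = cur->data;
    cur = cur->ri;
  }
  return true;
}
// e.g. isBstIterative(&_5) returns false for the sample tree, since 6 sits under 2.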