lowest missing +ve int #Codility #70%

Q: Write a function int solution(int[] A);  that, given an array A of N integers, returns the smallest natural number that does not occur in A. For example, given A = [1, 3, 6, 4, 1, 2], the function should return 5.
Given A = [1, 2, 3], the function should return 4.
Given A = [−1, −3], the function should return 1.

• each element of array A is a 32-bit signed int
• expected worst-case time complexity is O(N);
• expected worst-case space complexity is O(N).
* Elements of input arrays can be modified.

https://leetcode.com/problems/first-missing-positive/description/ is similar but O(1) space and average O(N) time!

—— my analysis —–

The mutability hint and the O(1)-space constraint (in the LeetCode variant) point at — saving array indices as payload!

—- Idea A:

first scan to swap non-positives to the end and remember the new boundary (say #111).

In the same scan also determine min and max. Suppose min=55.

Another scan to reduce all payloads by 55 so they fall in the range of 0 to max-55.

Now use CSY technique .. check array@0-N in-situ #Nsdq#contrived
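
For illustration, a minimal sketch of the index-as-payload idea (the classic LeetCode 41 approach, not exactly Idea A above): every value v in 1..N is swapped to its home slot v-1, then the first slot not holding its own value reveals the answer.

#include <vector>
#include <iostream>
using namespace std;
int firstMissingPositive(vector<int> A){
  int const N = A.size();
  for (int i = 0; i < N; ++i)
    while (A[i] > 0 && A[i] <= N && A[A[i]-1] != A[i])
      swap(A[i], A[A[i]-1]);          // park A[i] at its "home" slot
  for (int i = 0; i < N; ++i)
    if (A[i] != i+1) return i+1;      // first slot not holding its own value
  return N+1;
}
int main(){
  cout << firstMissingPositive({1,3,6,4,1,2}) << endl; // 5
}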

—-solutionB: make_heap O(1) space but O(N logN) time. Build a min-heap based on the array in O(1) space, then keep calling min().

  • make_heap shows random access container (vector used in demo), and “rearrange”, implying O(1) space
  • make_heap shows O(N) to construct the heap

min() is O(1) but delete_min() is O(log N), so overall we probably get O(N logN)
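
A rough sketch of solution B with std::make_heap / pop_heap (O(1) extra space, O(N logN) time), assuming we may mutate the input:

#include <vector>
#include <algorithm>
#include <functional>
#include <iostream>
using namespace std;
int smallestMissingPositive(vector<int> A){
  make_heap(A.begin(), A.end(), greater<int>());    // min-heap, O(N) to build
  int expected = 1;                                  // the natural number we hope to pop next
  for (auto end = A.end(); end != A.begin(); --end){
    pop_heap(A.begin(), end, greater<int>());        // current min moves to position end-1
    int cur = *(end - 1);
    if (cur < expected) continue;                    // non-positive or duplicate
    if (cur > expected) return expected;             // found the gap
    ++expected;
  }
  return expected;
}
int main(){
  cout << smallestMissingPositive({1,3,6,4,1,2}) << endl; // 5
}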

—-solutionC: radix sort. https://en.wikipedia.org/wiki/Radix_sort#In-place_MSD_radix_sort_implementations shows an in-place binary radix sort.

First pass to transform all negative numbers to 0. Then iterate the sorted array and check for the earliest “gap”. Worst case — you get 1,2,3… without gap, so answer is the 1+ largest array element.

O(W*N) where W is width of the largest integer. If we assume 64-bit then W is a constant:)

http://www.drdobbs.com/parallel/parallel-in-place-radix-sort-simplified/229000734 is another in-place radix sort.


parent/child pairs→tree algos #Indeed

As a speed-coding test, this problem requires you to apply common computer science constructs to a realistic problem, and then challenges you:

“How many seconds do you need to build a working directed graph from raw data, and run BFT/DFT on it?”

45 minutes given. Target is to complete first 2 questions with working code. Can some candidates complete all 3 questions? I guess so.

Q: You are given some raw data like

parent_child_pairs = [ (1, 3), (2, 3), (3, 6), (5, 6), (5, 7), (4, 5), (4, 8), (8, 10), (11,2) ]

Suppose we have some input data describing a graph of relationships between parents and children over multiple generations. The data is formatted as a list of (parent, child) pairs, where each individual is assigned a unique integer identifier.

For example, in this diagram, 3 is a child of 1 and 2, and 5 is a child of 4:

  11
   \
1   2   4
 \ /   / \
  3   5   8
   \ / \   \
    6   7   10

Q1: write a function to output all individuals having no parents (like 1 and 4) in a list, and another list of individuals having a single parent (like 8 and 7)
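
A rough sketch for Q1, counting parents per node in a map (the container choices are mine, not from the interview):

#include <map>
#include <vector>
#include <iostream>
using namespace std;
int main(){
  vector<pair<int,int> > pairs{{1,3},{2,3},{3,6},{5,6},{5,7},{4,5},{4,8},{8,10},{11,2}};
  map<int,int> parentCount;                    // node id -> number of parents
  for (auto const & pc : pairs){
    parentCount[pc.first];                     // make sure the parent node exists with count >= 0
    ++parentCount[pc.second];
  }
  cout << "zero parents:";
  for (auto const & kv : parentCount) if (kv.second == 0) cout << ' ' << kv.first;
  cout << "\none parent:";
  for (auto const & kv : parentCount) if (kv.second == 1) cout << ' ' << kv.first;
  cout << endl;
}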

Q2: write a bool function to determine if two named individuals have any common ancestor. 3 and 6 yes; 3 and 1 no!

I wrote a DFT solution .. https://github.com/tiger40490/repo1/blob/py1/py/tree/commonAncestor_Indeed.py Not very efficient, but I really should care less about that since the real challenge is .. timely completion. I was not used to writing DFT on the spot within minutes, but I hacked it together under a ticking clock, a first in my career!

To find if two sets intersect, I was forced to make a quick judgment call to write my own loop.

  • I didn’t know if there’s a simple and reliable solution online
  • i didn’t know how much effort is required to search online and understand it
  • i didn’t know how much effort is required to adapt standard solution to suit my needs
  • My own loop gives more legwork but more control if requirements turn out to be non-standard.

Q3: (original wording) Write a function that, for a given individual in our dataset, returns their earliest known ancestor — the one at the farthest distance from the input individual. If there is more than one ancestor tied for “earliest”, return any one of them. If the input individual has no parents, the function should return null (or -1).

Sample input and output:

findEarliestAncestor(parentChildPairs, 8) => 4
findEarliestAncestor(parentChildPairs, 7) => 4
findEarliestAncestor(parentChildPairs, 6) => 11
findEarliestAncestor(parentChildPairs, 1) => null or -1
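
A rough sketch for Q3, a level-by-level BFT towards the ancestors (function and variable names are hypothetical, not from the original dataset code):

#include <map>
#include <vector>
#include <queue>
#include <iostream>
using namespace std;
int findEarliestAncestor(vector<pair<int,int> > const & pairs, int start){
  map<int, vector<int> > parentsOf;
  for (auto const & pc : pairs) parentsOf[pc.second].push_back(pc.first);
  int earliest = -1;
  queue<int> q; q.push(start);
  while (!q.empty()){                       // walk one generation further back per iteration
    size_t levelSize = q.size();
    vector<int> level;
    for (size_t i = 0; i < levelSize; ++i){
      int cur = q.front(); q.pop();
      for (int p : parentsOf[cur]){ level.push_back(p); q.push(p); }
    }
    if (!level.empty()) earliest = level.front(); // any node on the deepest level so far
  }
  return earliest;                          // -1 when the input individual has no parents
}
int main(){
  vector<pair<int,int> > pairs{{1,3},{2,3},{3,6},{5,6},{5,7},{4,5},{4,8},{8,10},{11,2}};
  cout << findEarliestAncestor(pairs, 8) << " "    // 4
       << findEarliestAncestor(pairs, 6) << endl;  // 11
}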

max-profit buy 1 sell any-bought #pimco java Q8

Pimco Java HackerRank Q8

Q8: Each minute, your trading platform allows you to either buy one share, sell any number of shares that you own (short sell forbidden), or not make any transaction at all. Your task is to find the maximum profit you can obtain with an optimal trading strategy.

I remember having issues with some HackerRank test cases. Should use 64-bit int rather than the default java int.

This problem appears to be very similar to day-trading in hindsight #Nsdq but requires drastically different thinking:

  • a single scan to the left, starting from the last price point.
  • any time there’s a left-ward drop, we have a trading opportunity!

https://github.com/tiger40490/repo1/blob/py1/py/array/maxProfit_buy1sellAny.py is my tested solution, peer-reviewed by Ashish.
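
A minimal sketch of that right-to-left scan (the github file above is the tested version; this is only to illustrate the idea):

#include <vector>
#include <algorithm>
#include <iostream>
using namespace std;
long long maxProfit(vector<int> const & price){
  long long profit = 0;                 // 64-bit, as the HackerRank test cases demand
  int highestSeen = 0;                  // highest price to my right, i.e. in the future
  for (int i = (int)price.size() - 1; i >= 0; --i){
    highestSeen = max(highestSeen, price[i]);
    profit += highestSeen - price[i];   // buy 1 share now, sell it at the future peak
  }
  return profit;
}
int main(){
  cout << maxProfit({1, 3, 1, 2}) << endl; // 3: buy@1 sell@3, plus buy@1 sell@2
}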

count paths between 2 binTree nodes #PimcoQ9 Ashish,Pinsky

The DP idea — compare matrix-path-counter and EditDistance

  • showcasing efficient queue in python.
  • showcasing using (x,y) coordinates as dictionary key
  • showcasing find max value in a dictionary

— Requirement: (https://leetcode.com/problems/unique-paths-ii/description/ is similar)
See Q9.pdf in the email to Ashish. Here are some key points:

Consider a maze mapped to a 2d-grid with an upper left corner at coordinates (row, column) = (0, 0). Any movement must be in increasing row or column direction. You must determine the number of distinct paths through the maze. You will always start at position (0, 0), the top left, and end up at (max(row), max(column)), the bottom right.
1 1 0 1
1 1 1 1
As an example, consider the above matrix where 1 indicates an open cell and 0 indicates blocked. You can only travel through open cells, so no path can go through the cell at (0, 2). There are two distinct paths to the goal. As a 2nd example, matrix below has 10 paths:
1 1 1 1
1 1 1 1
1 1 1 1

https://github.com/tiger40490/repo1/blob/py1/py/2d/classic_pathCountQ9.py is my solution.

==== Analysis ====
I see a binary tree, where each node is a cell. Cell (0,0) has a down child (1,0) and a right child (0,1). I feel this is similar to a more generic problem “count paths between 2 nodes in a binary tree”

— DFT:
😦 I implemented something like a DFT but it was too slow with some test cases
😦 can overflow stack
🙂 DFT can print each unique path. I think BFT can’t.
🙂 DFT is easier to implement than BFT

— DynamicProgramming BFT solution from origin, as I described to Bill Pinsky:
This solution is more versatile than the one from Ashish. It can handle any directed graph.

Mark each node as “computed” once we have computed a score denoting how many paths-from-root there are. Save the score in a shadow matrix.

Origin node has score 1. Blocked cell has score 0.

Start BFT from origin. The two immediate neighbors are set to 1 (or 0 if blocked). Every node can be computed by adding up above-score + left-score. (This algo is simpler than the EditDistance algo.)
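
For illustration, a row-by-row variant of the same shadow-matrix DP (filling the score matrix in row-major order instead of via a BFT queue, so every cell's above/left scores are already available):

#include <vector>
#include <iostream>
using namespace std;
long long countPaths(vector<vector<int> > const & grid){  // 1 = open cell, 0 = blocked
  size_t rows = grid.size(), cols = grid[0].size();
  vector<vector<long long> > score(rows, vector<long long>(cols, 0));
  score[0][0] = grid[0][0];                      // origin has score 1 unless blocked
  for (size_t r = 0; r < rows; ++r)
    for (size_t c = 0; c < cols; ++c){
      if (grid[r][c] == 0 || (r == 0 && c == 0)) continue;
      long long above = r ? score[r-1][c] : 0;
      long long left  = c ? score[r][c-1] : 0;
      score[r][c] = above + left;
    }
  return score[rows-1][cols-1];
}
int main(){
  cout << countPaths({{1,1,0,1},{1,1,1,1}}) << endl;            // 2
  cout << countPaths({{1,1,1,1},{1,1,1,1},{1,1,1,1}}) << endl;  // 10
}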

Memoization Performance note — My implementation was extremely slow (albeit correct) until I added an “if already computed, then continue” early in the loop

This Layered DP algo is also described in max-path-sum problems

Q: why is score[1,1] accessed 4 times?
A: node[1,1] is added to the queue twice. Each dequeue would need one check.
A: Also score[1,2] needs to access score[1,1] as a parent. Ditto score[2,1]

— Ashish Singh gave me a much simpler solution, as shown in my github code.

partition list into good^bad sections #c++LiquidNet

Q: template<typename I, typename P>
I myPartition(I bb, I const end, P isgood); //return first Bad node in the range after shuffling all Bad ones to the end. If no Bad node, then return the original end.

You can call it like firstBad = myPartition(li.begin(), li.end(), isEven). Itr and Pred are template params or well-defined typedefs.

Note we can't assume the iterators support ordering comparison. We can only inc/dec and check equality.

https://github.com/tiger40490/repo1/blob/cpp1/cpp/linkedList/partition_LNet.cpp is my solution but it’s hard to reason because I move two rather than one pointer. Interviewer gave a simpler solution with similar time complexity.

Lessons:

  • in a two-iterator algo, if every move on each iterator has some logic, then it is probably over-complicated. In this case, we could move one iterator according to some logic, and simply move the other iterator to the same spot.
  • “bb should be at or before ee” — the invariant is critical but hard to maintain in the face of complex logic
  • There were many scenarios in my design. There are fewer scenarios in the interviewer’s design.
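
A minimal sketch in the spirit of the interviewer's simpler design: one scanning iterator carries the logic, while the other (firstBad) simply parks at the earliest Bad element seen so far. Note this sketch swaps payloads rather than relinking nodes; it is essentially how std::partition works for forward iterators.

#include <list>
#include <algorithm>
#include <iostream>
using namespace std;

template<typename I, typename P>
I myPartition(I bb, I const end, P isgood){
  I firstBad = bb;
  for (; bb != end; ++bb){
    if (!isgood(*bb)) continue;
    swap(*bb, *firstBad);   // move the Good payload in front of the Bad block
    ++firstBad;             // invariant: [begin, firstBad) holds only Good values
  }
  return firstBad;          // first Bad node, or end if there is none
}
int main(){
  list<int> li{3, 8, 5, 2, 6, 7};
  auto firstBad = myPartition(li.begin(), li.end(), [](int x){ return x % 2 == 0; });
  for (int x : li) cout << x << ' ';   // all even (Good) values, then the odd ones
  cout << "| first bad = " << (firstBad == li.end() ? -1 : *firstBad) << endl;
}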

staircase problem #CSY@Bbg

Q (bbg question posed to CSY): given a staircase of height N (eg 3), you can reach the top by three steps 1,1,1 or two steps 1,2 or 2,1, or a single step of 3. Generate all paths for a given N.

I think the solution(s) need more review and internalization.

This problem has many classic solutions

  • bottom-up
  • top-down recursion with memoization
    • yield-based generator function
  • BFT without recursion without duplicates

I feel this is simpler than AQR factorization and the CombinationSum problems.

Is this same as N-boy-split? No… split is harder. With the split, ABC can have a “step” of AC.

easy to test —  https://github.com/tiger40490/repo1/blob/cpp1/cpp/algo_comboPermu/staircase_CSY.cpp was my non-recursive BFT solution.

https://github.com/tiger40490/repo1/blob/py1/py/algo_combo_perm/staircase_CSY.py is my yield-based recursive solution. Only 5 lines in the implementation. I added lots of tests.

Q: why path count == 2^(n-1)?
Answer from CSY: f(1)=1, f(2)=2, f(n) = f(n-1)+f(n-2)+…+f(1)+1 = 2^(n-1)
A: For a staircase of n, you can take a step of 1 and have f(n-1) paths, or a step of 2 and have f(n-2) paths … or a single step of n, which is 1 more path.

—jargon file:

a STEP from LEVEL 0 to 2 has LENGTH=2, and skips level 1; a PATH consists of 1 or more STEPS; a FORMULA is a complete PATH, and always starts at level 0 and may hit intermediate levels #2, #5, #6 …

— key observations:
* Observation 1: Every formula starts with firstStepOf1, or firstStepOf2, (or firstStepOf3…) with no duplicates between these groups!
* Observation 2: After taking firstStepOf1, the remaining steps are really a formula for staircase of N-1, a smaller problem. This invites a bottom-up DP solution.

—BFT solution. suppose N = 5. We model each path as a directory path from root. Each queue item is a path represented by a linked list or vector of capacity N.

I hope this solution avoids duplicates… Yes it does:)

  1. enqueue all “first levels” /1; /2; /3; /4; /5
  2. then dequeue and visit first path i.e. “1”. In this case, there are 4 levels remaining in the staircase, so enqueue /1/1;  /1/2;  /1/3;  /1/4
  3. then dequeue and visit 2nd path in the queue i.e. “/2”. enqueue /2/1; /2/2;  /2/3

Every time (after dequeue) we see a path has no remaining level, we have a formula.

—DP (bottom-up) iterative solution to be implemented

  1. first, build the formulas for a length-2 staircase. They are /1/1 + /2 i.e. two formulas saved in FormulaArrayForStaircase2 i.e. “fa2”
  2. then build the formulas for a length-3 staircase — fa3 = firstStepOf2 -> fa1 + firstStepOf1 -> fa2
  3. then build the formulas for a length-4 — fa4 = firstStepOf3 -> fa1 + firstStepOf2 -> fa2 + firstStepOf1 -> fa3
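
A minimal sketch of this bottom-up DP, where fa[k] holds every formula for a k-level staircase and is built by prepending a first step of length s to each formula in fa[k-s]:

#include <vector>
#include <string>
#include <iostream>
using namespace std;
int main(){
  int const N = 4;
  vector<vector<string> > fa(N + 1);       // fa[k] = all formulas for a staircase of height k
  fa[0] = {""};                            // one empty formula: already at the top
  for (int k = 1; k <= N; ++k)
    for (int s = 1; s <= k; ++s)           // s = length of the first step
      for (string const & rest : fa[k - s])
        fa[k].push_back("/" + to_string(s) + rest);
  for (string const & f : fa[N]) cout << f << endl;   // 2^(N-1) = 8 formulas for N = 4
}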

— recursive: CSY gave a recursive solution and interviewer asked “Can you find a non-recursive solution”?

I think the recursive solution repeats the same sub-problem (such as a 3-level staircase) over and over again, but we can make the recursive function more efficient with memoization — it should short-circuit when the input already has a solution saved in a global hashtable.

—- python yield/generator recursive solution

My github code shows simple generator function without memoization is about half the speed of bottom-up. With memoization, it’s 20% faster than bottom-up.

monkey crossing river #rare, codility

https://github.com/tiger40490/repo1/blob/cpp1/cpp1/66miscIVQ/monkey_Mak.cpp is my tested solution. Correct, but performance was rated at 33%. I have since used a set<Timestamp> to improve performance, but it still can't meet the required time complexity.

Q: A monkey is on one bank (position −1) of a river and wants to get to the opposite bank (position N). The monkey can jump any (integer) distance between 1 and D. If D is less than or equal to N then the monkey cannot jump right across the river. Luckily, there are many stones under water. Water is progressively draining away and soon some stones will be out of the water. The monkey can jump to and from any stone, but only when the stone is out of the water.

The stones in the river are described in array A consisting of N integers for N stones labeled 0 to N-1. A[K] represents the first time the stone at position K will emerge. (A[K] = −1 means never). You can assume that no two stones will surface simultaneously. The goal is to find the earliest time our monkey can reach other bank.

For example, consider D = 3 and array A consisting of N = 6 integers:
A[0] = 1
A[1] = -1
A[2] = 0
A[3] = 2
A[4] = 3
A[5] = 5

At time 2, there will be three stones out of the water.

   p o s i t i o n
----------------------
  -1 0 1 2 3 4 5 6 |<-- 6 is the destination; -1 is Starting point
     1 ~ 0 2 3 5   |<-- when each stone surfaces
         0         |Time 0 //one stone at position 2
     1   0         |Time 1 //two stones 0,2
     1   0 2       |Time 2 //three stones 0,2,3
                time ⤴

Time 2 is the earliest moment when the monkey can cross the river (for example, by jumps of length 1, 3 and 3).

Write a function: int solution(vector<int> &A, int D); that, given an array A consisting of N integers, and integer D, returns the earliest time when the monkey can cross the river. If the monkey can do so in one jump, just return 0. Return −1 to mean never.

For A = [3, 2, 1] and D = 1, the function should return 3, since the monkey has to wait for each stone.

For A = [1, 2, 3, 4, −1, −1, −1] and D = 3, the function should return −1 meaning never.

Each element of array A is an integer within the range [−1..100,000], where -1 means no-stone.

Complexity:
expected worst-case time complexity is O(N+max(A));
expected worst-case space complexity is O(N+max(A)).
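
For reference, a different O(N) idea (not the set<Timestamp> attempt above): the monkey can cross at time T iff every window of D consecutive positions contains a stone emerged by T, so the answer is the maximum, over all windows, of the window-minimum emerge time. A sliding-window minimum computes this in one pass:

#include <vector>
#include <deque>
#include <algorithm>
#include <iostream>
#include <climits>
using namespace std;
int solution(vector<int>& A, int D){
  int const N = A.size();
  if (D > N) return 0;                        // one jump clears the river
  vector<int> emerge(N);
  for (int i = 0; i < N; ++i) emerge[i] = (A[i] == -1 ? INT_MAX : A[i]);
  int answer = 0;
  deque<int> dq;                              // indices; emerge values increase front to back
  for (int i = 0; i < N; ++i){
    while (!dq.empty() && emerge[dq.back()] >= emerge[i]) dq.pop_back();
    dq.push_back(i);
    if (dq.front() <= i - D) dq.pop_front();  // slide the window of width D
    if (i >= D - 1){                          // window [i-D+1, i] is complete
      if (emerge[dq.front()] == INT_MAX) return -1;
      answer = max(answer, emerge[dq.front()]);
    }
  }
  return answer;
}
int main(){
  vector<int> A{1,-1,0,2,3,5};
  cout << solution(A, 3) << endl;             // 2
}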

given int array,find triplets: x divides y;y divides z #Trex

Within a sequence of unique natural numbers, count the occurrence of so-called triplets of (x,y,z) where x divides y and y divides z.

I feel this is not a classic, but the technique is somewhat reusable.

https://github.com/tiger40490/repo1/blob/py1/py/array/tripletsInPreSorted_Trex.py is the improved solution after lots of hints.

— a useful technique:

Construct each {factor, multiple} pair and save it in a collection. Any new pair constructed will be checked against all existing pairs. The “check” is the trick. My 25 Apr 2018 commit shows the most efficient “check” I can think of.

list of pairs -> list of triplet -> list of quadruplet -> list of quintuplet … Build up pairs to construct triplets. Use triplets to construct quadruplets. Use quadruplets to construct quintuplets.

The pairs collection doesn’t need to be fully built before we start constructing triplets.
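
A rough sketch of the pair-based technique for a pre-sorted array of unique naturals (my own rendering, not the github commit): every new {factor, multiple} pair (a[i], a[j]) is checked against the pairs already ending at a[i].

#include <vector>
#include <iostream>
using namespace std;
long long countTriplets(vector<int> const & a){   // assumed sorted, unique, positive
  size_t const n = a.size();
  vector<long long> divCount(n, 0);   // divCount[j] = how many earlier elements divide a[j]
  long long triplets = 0;
  for (size_t j = 0; j < n; ++j)
    for (size_t i = 0; i < j; ++i)
      if (a[j] % a[i] == 0){
        ++divCount[j];                // (a[i], a[j]) is a new {factor, multiple} pair
        triplets += divCount[i];      // each pair (x, a[i]) extends it to a triplet (x, a[i], a[j])
      }
  return triplets;
}
int main(){
  vector<int> a{1,2,3,4,6,12};
  cout << countTriplets(a) << endl;   // 10
}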

3-way sorter/classifier #FB

— Requirement — You are given a list of possibly repeating integers. Each belongs to exactly one category, either category Low, Medium or High. There is a function to return the category for any integer. Please write a constant-space function to output all category-L members, followed by all the category-M members, and finally the category-H members.

If the input list has 9 trillion integers, the output will have 9 trillion integers too.
Within category Low, you can print in any order.
Within category Medium, you can print in any order.
Within category High, you can print in any order.

Personally, I would suggest (to make things concrete) that L means single-digit; M means double-digit and H means 3-digit or higher.

I failed to clarify requirements — input is array or linked list?

=== solutions

— 1st design — edit the same list in-place. Extract-insert each L or M element to group all elements into three sections. L on the left, H on the right. The first design uses an array, so I maintain 2 markers (integers).

Invariant — the mm marker points to start of the M section. hh marker points to start of the H section, and a current marker points to start of unclassified section. L section is on the left of mm.

  • Every time I see a L element, I move it to beginning of array, and increment mm. (It’s more efficient to move it to left of mm.)
  • Every time I see a M element, I move it to the left of hh, and increment hh
  • Every time I see a H element, I do nothing.

— Second design avoids the inefficient array shift, using STL linked list.

Q: When to adjust mm and hh iterators?
A: Only when erase() hits mm or hh. This is important and easily missed. There’s no error message. Result may not show any difference. As a general rule

“Always check every iterator (except unneeded) when you do a list::erase()”

I plan to implement 1st design in python list (vector internally) and 2nd design in std::list.

–3rd design swap-based. If we only have an array and not a linked list, then swap-based design is efficient

https://github.com/tiger40490/repo1/blob/cpp1/cpp1/array/3waySorterArray.cpp is my tested solution
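
A rough sketch of the swap-based (3rd) design, the classic Dutch-national-flag partition; the category() helper below just uses the single-digit / double-digit / 3-digit rule suggested earlier and is an assumption of this sketch:

#include <vector>
#include <algorithm>
#include <iostream>
using namespace std;
int category(int x){ return x < 10 ? 0 : (x < 100 ? 1 : 2); }  // 0=Low, 1=Medium, 2=High
void threeWaySort(vector<int>& a){
  size_t lo = 0, mid = 0, hi = a.size();      // [0,lo)=L  [lo,mid)=M  [mid,hi)=unclassified  [hi,end)=H
  while (mid < hi){
    int cat = category(a[mid]);
    if (cat == 0)      swap(a[lo++], a[mid++]);
    else if (cat == 1) ++mid;
    else               swap(a[mid], a[--hi]); // don't advance mid: the swapped-in value is unclassified
  }
}
int main(){
  vector<int> v{150, 3, 42, 7, 999, 18, 5};
  threeWaySort(v);
  for (int x : v) cout << x << ' ';           // all L, then all M, then all H
  cout << endl;
}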

punctuate ContinuousSentence

I believe https://leetcode.com/problems/word-break/description/ is identical.

Q (bbg): Given a dictionary of English words (containing no numbers no underscores) and a very long sentence, someone has removed all the spaces. Please write a program to restore it by adding a space after each word. If there’s no way to parse the sentence just return False, otherwise return True to indicate you found at least one way to parse the sentence.

The sentence can have repeated words.

I was allowed to use either white-board or computer.

Brute force? I struggled for a long time with some vague idea. Interviewer asked me to implement any solution I have, however inefficient , but I was unable to.

Key observation — This is a typical QQ type of algo question. If you remember one key point, then you have an advantage.

Key observation — syntax and ECT is not really hard. Only simple string operations. Data structure is simple. Challenge is an elegant (precious) pure algo.

Key observation — the recursive backtracking solution is working and clever-looking but actually not the very best. Still, recursive thinking is valuable and gave me a first solution. I’m not ashamed of my recursive solution.

Key observation — This problem is considered not too hard because …. the solution is short! But most of us can’t conceive such a solution! I won’t trust the difficulty level in leetcode.com

https://github.com/tiger40490/repo1/blob/py1/py/str/continuousSentenceBBG.py shows my recursive backtracking solution. Interviewers basically said it works.

There’s an inherent tree structure in this greedy backtracking algo. This tree has the power to meet the tougher requirement in Q2. As such, this algorithm is more powerful than the algo below.

— More efficient solution: non-recursive, no backtracking

Suppose the sentence has 99 chars ch[0], ch[1] … ch[98]. We maintain an “aboveWater[99]” bool array. For example, ch[33] is above water if ch[30..33] is a 4-char dictionary word (a “bridge”) and ch[29] is already aboveWater.

The bool array is assumed underwater initially. We scan it forward once to check and mark selected char as aboveWater.

Once the left section of the array is analyzed, we don’t need to revisit it by backtracking.

The simplest algo — forward iterate every position. At each position j,
look at the last 2 chars j-1~j
look at the last 3 chars j-2~j
look at the last 4 chars j-3~j
… If any such segment is a dictionary word and the position just before it is aboveWater (or the segment starts at position 0), then mark j as aboveWater and move on.
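
A minimal sketch of this aboveWater DP; the dictionary, the maxWordLen cap and the canParse name are my own assumptions, not from the interview:

#include <string>
#include <unordered_set>
#include <vector>
#include <iostream>
using namespace std;
bool canParse(string const & s, unordered_set<string> const & dict, size_t maxWordLen){
  size_t const n = s.size();
  vector<bool> aboveWater(n + 1, false);   // aboveWater[j]: the prefix s[0..j-1] can be fully parsed
  aboveWater[0] = true;                    // the empty prefix
  for (size_t j = 1; j <= n; ++j)
    for (size_t len = 1; len <= maxWordLen && len <= j; ++len)
      if (aboveWater[j - len] && dict.count(s.substr(j - len, len))){
        aboveWater[j] = true;              // found a bridge word ending at position j-1
        break;
      }
  return aboveWater[n];
}
int main(){
  unordered_set<string> dict{"cats", "cat", "and", "sand", "dog"};
  cout << canParse("catsanddog", dict, 4) << endl;  // 1
  cout << canParse("catsandog", dict, 4) << endl;   // 0
}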

— Q2(Leetcode): Given a non-empty string s and a dictionary wordDict containing a list of non-empty words, add spaces in s to construct a sentence where each word is a valid dictionary word. Return ALL such possible sentences.

%%A: My github solution should solve it even if not optimal. Need to test.

binary-matrix island count #DeepakCM

Q: https://leetcode.com/problems/number-of-islands/description/

https://github.com/tiger40490/repo1/blob/py1/py/grid/ has my solution. I don’t want to spend the time passing all leetcode tests! Diminishing return

https://www.geeksforgeeks.org/find-number-of-islands/ is conceptually identical, though using a different criterion for “connected” — diagonal neighbors are “connected”

Hi Deepak

I realize for “connected” problems, there’s definitely a graph underneath, so graph traversal is required. I guess BFT or DFT will both find all the nodes connected to “me”.

Given a 1000 x 1000 all-black matrix, I think DFT recursion will go into exactly 1,000,000 levels and overflow stack space.

A high-level (vague) idea is

  • Scan in type-writer fashion. Suppose there are 10×10 = 100 cells, either black or white. I paint each cell brown [1] once visited. I also use an integer counter to keep track of how many cells are already visited.
  • During the scan. If a cell is already visited, I will ignore it and move on
  • Any time I find a black cell, I will start a BFT (I dislike DFT) to find all connected black cells.  Once I find all the black cells connected to “me”, I resume the type-writer scan.
  • Once my counter hits 100, I have visited all cells and will exit.

[1] you paint it white, which is probably better.
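
A rough BFT flood-fill sketch of the scan-and-explore idea above (4-way connectivity; it paints visited cells in place rather than keeping a separate counter):

#include <vector>
#include <queue>
#include <iostream>
using namespace std;
int countIslands(vector<vector<int> > grid){        // 1 = black/land, 0 = white/water
  int const R = grid.size(), C = grid[0].size();
  int const dr[] = {1,-1,0,0}, dc[] = {0,0,1,-1};
  int islands = 0;
  for (int r = 0; r < R; ++r)
    for (int c = 0; c < C; ++c){                    // type-writer scan
      if (grid[r][c] != 1) continue;
      ++islands;
      queue<pair<int,int> > q;
      q.push(make_pair(r, c));
      grid[r][c] = 2;                               // paint as visited
      while (!q.empty()){                           // BFT over this one island
        pair<int,int> cur = q.front(); q.pop();
        for (int k = 0; k < 4; ++k){
          int nr = cur.first + dr[k], nc = cur.second + dc[k];
          if (nr < 0 || nr >= R || nc < 0 || nc >= C || grid[nr][nc] != 1) continue;
          grid[nr][nc] = 2;
          q.push(make_pair(nr, nc));
        }
      }
    }
  return islands;
}
int main(){
  cout << countIslands({{1,1,0,0},{0,1,0,1},{0,0,0,1},{1,0,1,1}}) << endl; // 3
}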

I thought this problem was too hard and not worth study, but you convinced me that it may come up and I can at least give a vague solution  … better than no solution.

Compared to a binary tree, walking over a matrix is /operationally/ (not conceptually) harder because

  • need to check array bounds. Error-prone and time consuming in coding tests
  • The (invisible) links will not cover all cells. To avoid missing a cell, we still need a full matrix scan. The integration of tree walk + matrix scan is unintuitive, to put it mildly.
  • During the tree-walk, you don’t want to visit any cell too many times or get lost in an endless loop. A brute-force technique — shadow matrix to remember all visited cells.
  • … However, once you get over these nitty-gritty operational complexities, then tree-walk in a matrix is not really harder. These algo questions can therefore frustrate and fail many candidates untrained in the technique.

FB2: look-n-say sequence #80%

This is one of the 3 sample FB coding questions, fairly popular

https://github.com/tiger40490/repo1/blob/py1/py/algo_arr/ez_lookAndSay_FB.py has my self-tested solution. Among other things it showcases a reusable technique — VO to simplify state maintenance.

Q: Implement a function that outputs the Look and Say sequence, as explained in https://en.wikipedia.org/wiki/Look-and-say_sequence
1
11
21
1211
111221
312211
13112221
1113213211
31131211131221
13211311123113112211
–Analysis
Iterative solution should do. Based on the last string, we can generate a new string. Simply scan and split into segments (runs of identical digits). Each segment produces a count followed by the digit, appended to the new string.

Corner cases? No
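
A minimal iterative sketch of that scan-and-split idea (not the github VO-based solution):

#include <string>
#include <iostream>
using namespace std;
string nextTerm(string const & s){
  string out;
  for (size_t i = 0; i < s.size(); ){
    size_t runStart = i;
    while (i < s.size() && s[i] == s[runStart]) ++i;   // one segment = one run of identical digits
    out += to_string(i - runStart) + s[runStart];      // "count" then the digit itself
  }
  return out;
}
int main(){
  string term = "1";
  for (int i = 0; i < 10; ++i){ cout << term << endl; term = nextTerm(term); }
}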

hackerrank IV: full query

/*requirement: given 99 sentences and 22 queries, check each query against all 99 sentences. All the words in the query must show up in the sentence to qualify.
*/
#include <iostream>
#include <sstream>
#include <string>
#include <unordered_set>
#include <vector>
using namespace std;

// every word in the query must appear as a whole word in the sentence
// (a plain sen.find(qword) would also match substrings of longer words)
bool check(string const & q, string const & sen){
    unordered_set<string> senWords;
    istringstream senStream(sen);
    string w;
    while (senStream >> w) senWords.insert(w);

    istringstream lineStream(q);
    string qword;
    while (lineStream >> qword)
        if (!senWords.count(qword)) return false;
    return true;
}
void textQueries(vector<string> const & sentences, vector<string> const & queries){
    for (string const & qr : queries){
        string output1query;
        for (size_t i = 0; i < sentences.size(); ++i)
            if (check(qr, sentences[i])) output1query.append(to_string(i) + " ");
        if (output1query.empty()) output1query = "-1";
        cout << output1query << endl;
    }
}
int main(){
    vector<string> sen = {"a b c d", "a b c", "b c d e"};
    vector<string> que = {"a b", "c d", "a e"};
    textQueries(sen, que);
}

bbg-Eq: longest abbreviation #easier

Given a long word with N non-unique characters, we know there can be 2^N abbreviations (including the original word, and including the empty string).

Requirement: I give you a list of strings and you suspect some of them could be an abbreviation of your word. Find the longest string among them that’s a valid abbreviation. Optimize for time complexity.

I feel this problem appears simple but not easy to complete quickly

https://github.com/tiger40490/repo1/blob/py1/py/str/testAbbr_ez.py is my python solution, different from the c++ solution below. Not sure which has more bugs.


I started with string::find() and hit the problem of non-unique characters. Interviewer hinted char-matching — the key point I needed for this type of problem.

#include <iostream>
#include <algorithm>
#include <vector>
using namespace std;

typedef string candidate;

vector<candidate> v{"monkey","aaaaaaaaaaaaaaaaaa", "mamalon", "monk"};
string const haystack{"mamalonkey"};
size_t hsize=haystack.size();

//Tip: can use typedef alias in argument:)
bool lenComp(candidate const & a, candidate const & b){
  return a.size()<b.size(); //std::sort needs a strict weak ordering, so < not <=
}
ostream & operator<<(ostream & os, vector<candidate> const & c){
  size_t const sz = c.size();
  for (int i=0;i<sz; ++i){
        os<<c[i]<<" ";
  }
  os<<endl; return os;
}

bool check1candidate(candidate const & c){
  if (c.size() > hsize) return false;

  for(int i=0, h=0; h<hsize; ++h){
        if (c[i] == haystack[h]){
          if (i==c.size()-1) return true;
          ++i;
        }
  }
  return false;
}
int main(){
  sort(v.begin(), v.end(), lenComp);
  cout<<v;
  for (int i=v.size()-1; i>=0; --i){
    if (check1candidate(v[i])){
        cout<<"winner is "<<v[i];
        return 0;
    }
  }
}

 

bbg-FX: in-place array shuffle #splice^evict

Update: In-Place usually requires 1) splice 2) swap.

Requirement: given a list like

{1,2,3,4,  25,29,44,33,  159,139,155,150,  177,176,169,180} which describes four Persons
{id1,id2,..age1,age2, weight1,weight2,…height1,height2,..}

Write a program to output each person’s full attributes like

{1, 25, 159, 177,   2, 29, 139, 176,   3, 44, 155, 169,   4, 33, 150, 180}

Note the number of attributes (in this case 4) is an input parameter to your program. You don’t know (or care about) the name of each attribute. If you are told there are 37 attributes in each person, just output attr1 .. attr37.

Note the input and output sequence are important.

Q (Bonus question): do it in-place without creating a new collection. You can still have a small number of variables in your code.
A: one way to “cheat” is to append each output person to end of the original vector. It will double in size, so we will truncate the first half.
A: I believe I can use the “homeless/displacement” technique.
A: I now feel many in-place array reshuffle can be done in place with a single temp variable but this may not be one of them.
A: how about starting with a linked list?

https://github.com/tiger40490/repo1/blob/cpp1/cpp/array/insitu_reshuffle_attrib_bbg.cpp has ..
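
A rough sketch of one in-place approach: a cycle-following transpose with a visited bitmap (so not strictly O(1) extra space); the helper names are mine, not from the github file.

#include <iostream>
#include <vector>
using namespace std;
// data is attribute-major: {id1..idP, age1..ageP, ...}; rearrange to person-major in place
void reshuffle(vector<int>& data, size_t numAttrs){
  size_t const total = data.size(), P = total / numAttrs;
  auto dest = [&](size_t i){ return (i % P) * numAttrs + i / P; };  // new index of the element at i
  vector<bool> moved(total, false);   // 1 bit per element, so not a full extra copy
  for (size_t start = 0; start < total; ++start){
    if (moved[start]) continue;
    size_t cur = start;
    int carried = data[cur];
    do {                              // follow one permutation cycle
      size_t nxt = dest(cur);
      swap(carried, data[nxt]);
      moved[nxt] = true;
      cur = nxt;
    } while (cur != start);
  }
}
int main(){
  vector<int> v{1,2,3,4, 25,29,44,33, 159,139,155,150, 177,176,169,180};
  reshuffle(v, 4);
  for (int x : v) cout << x << ' ';   // 1 25 159 177  2 29 139 176  3 44 155 169  4 33 150 180
  cout << endl;
}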

bbg-FX: check if a binary tree is BST #no deep recursion

Binary search tree has an “ascending” property — all nodes in my left sub-tree are smaller than me.

Q1: given a binary tree, check if it’s a BST. (No performance optimization required.) Compile and run the program.

Suppose someone tells you that the lastSeen variable can start with a random (high) initial value; what kind of test tree would flush out the bug? My solution is below, but let’s look at Q1b.

Q1b: what if the tree could be deep so you can’t use recursion?
A: use BFT. When we actually “print” each node, we check left/right child nodes.


I made a mistake with lastSeen initial value. There’s no “special” initial value to represent “uninitialized”. Therefore I added a boolean flag.

Contrary to the interviewer’s claim, local statics are automatically initialized to zero (https://stackoverflow.com/questions/1597405/what-happens-to-a-declared-uninitialized-variable-in-c-does-it-have-a-value), but zero or any other fixed initial value is unsuitable (the payload can be any large negative value), so we still need the flag.

#include <iostream>
#include <climits>
using namespace std;

struct Node {
    int data;
    Node *le, *ri;
    Node(int x, Node * left = NULL, Node * right = NULL) : data(x), le(left), ri(right){}
};
/*    5
  2       7
 1 a
*/
Node _7(7);
Node _a(6);
Node _1(1);
Node _2(2, &_1, &_a);
Node _5(5, &_2, &_7); //root

bool isAscending = true; //first assume this tree is BST
void recur(Node * n){
//static int lastSeen; // I don't know what initial value can safely represent a special value

  //simulate a random but unlucky initial value, which can break us if without isFirstNodeDone
  static int lastSeen = INT_MAX; //simulate a random but unlucky initial value
  static bool isFirstNodeDone=false; //should initialize to false

  if (!n) return;
  if (n->le) recur(n->le);

  // used to be open(Node *):
  if (!isAscending) return; //check before opening any node
  cout<<"opening "<<n->data<<endl;
  if (!isFirstNodeDone){
        isFirstNodeDone = true;
  }else if (lastSeen > n->data){
        isAscending=false;
        return; //early return to save time
  }
  lastSeen = n->data;

  if (n->ri) recur(n->ri);
}

int main(){
  recur(&_5);
  cout<< (isAscending?"ok":"nok");
}

first N prime Fibonacci: cache server #Thesys

The challenge is not in Fib or prime, but in the algo around them.

//Requirement: For a given input number X, print
//the first X prime Fib numbers. Each output is a Fib and prime
//
//This version is memory efficient as it only remembers the last
//two Fib numbers + all previous prime Fib numbers
//
//The typedef of Fib is nice documentation
//
//My version of isPrime() is simpler
#include <iostream>
#include <string>
#include <vector>
#include <cstdlib>
using namespace std;

typedef unsigned int Fib;
bool isPrime(Fib i){
        if (i <= 1) return false;
        for(Fib j=2; j<=i/2; ++j) {
            if(i%j==0) return false;
        }
        return true;
}

pair<Fib,bool> generate1Fib(){
  static Fib parent = 1;
  static Fib grandparent = 0;
  Fib newFib = parent + grandparent;
  grandparent = parent;
  parent = newFib;
  return make_pair(newFib, isPrime(newFib));
}

void print1stXFibPrimes(size_t const X){
    static vector<Fib> fibPrimes;
    int count = 0;
    for(int j=0; j<fibPrimes.size(); ++j){
                cout<<fibPrimes[j]<<" ";
                if (++count >= X) return;
    }

    while( count<X ) {
          pair<Fib,bool> new1 = generate1Fib();
          if (new1.second){
                fibPrimes.push_back(new1.first);
                cout<<new1.first<<" ";
                ++count;
          }
    }
}
inline bool isInteger(const string & s) {
   if(s.empty() || ((!isdigit(s[0])) && (s[0] != '-') && (s[0] != '+'))) return false ;
   char * p ;
   strtol(s.c_str(), &p, 10) ;
   return (*p == 0) ;
}
int main(int argc, char *argv[]){
  for(int i=1; i<argc; ++i){
    string arg(argv[i]);
    if (isInteger(arg)){
      cout<<"calculate the first "<<arg<<" prime fibs"<<endl;
      print1stXFibPrimes( stoi(arg)) ;
      cout<<endl;
    }else
      cout<<arg<<" is not an integer\n";
  }
}

find anagram groups]file #1-pass, FB

Requirement: if any 2 words are anagrams, that’s a group. Within a file, identify all such groups.

Normalize each word using counting sort on the characters of the word. There’s no competing alternative in this isolated part of the solution. Main part of the solution is sorting all words based on the normalized version. There are many alternative solutions to this sort.

desirable — memory efficiency. create no or few objects besides the strings read in from the file.

desirable — avoid the need to create/allocate linkNode or wrapper objects around original words. In c++, such a wrapper is smaller than in java, probably a pointer to the original word + a link pointer

Solution 1 — insert into a sorted set, using a customized comparison (subclass binary_function) based on the normalized string and something unique, such as line number or object address. After all the inserts, when you iterate the set, an anagram group would show up in a row. We can also keep a separate array or iterators (pointers) that point to the first word in each group — emulating a multimap.

Without the something unique, the set would drop words.

I think insertion is inefficient since a normalized string is regenerated for the same word over and over. To overcome this, we could create a wrapper object with 2 fields — normalized string + a pointer to the original word. However, if “SAT” is a very common normalized string then we create multiple copies of it.

Solution 2 — separate chaining. Maintain a hashed map (faster) or sorted map keyed by the normalized strings, just a single copy for each normalized string. Value of each map entry is a linked list of the original words.

#include <iostream>
#include <iomanip>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
#include <map>
#include <algorithm>
using namespace std;

map<vector<char>, size_t> m;
template<typename T> ostream & operator<<(ostream & out , vector<T> const & v){
        for(int i=0; i<v.size(); ++i) out<<setw(1)<<v[i];
        out<<" ";
        return out;
}
int main(){
  ifstream fs("words.txt");
  string line, word;
  while(getline(fs, line)){
     stringstream ss(line);
     while(getline(ss, word, ' ')){
        //trim
        int endpos = word.find_last_not_of(" ");
        if (endpos < string::npos) word = word.substr(0, endpos+1);
        if (word.empty()) continue;
        //lower case
        transform(word.begin(), word.end(), word.begin(), ::tolower);
        //sort
        vector<char> sorted(word.begin(), word.end()); sort(sorted.begin(), sorted.end());

        int count = ++m[sorted]; //1st time we get 1 🙂
        cout<<sorted<<"\t<- "<<word<<" :" <<count<<endl;
     }
  }
}

 

show free slots between meetings #bbg c++11

Q: You are given a list of teleconference booking like {9,10} meaning from 9am to 10am. We can multi-task, so overlap is fine. Can you print out all the free slots within the day? You can safely assume all the input data are sensible and between 9 to 18, i.e. our workday.

I solved this problem on the spot. My weakness is still the ECT cycle esp. the compiler errors.

I was familiar with the inverse problem of listing all busy sessions. So I decided to *focus* on that. I took a calculated risk that once I get that to work, I will have the cool confidence to solve the original problem.

corner case: two adjacent meetings. I added comment to specify exactly how to check for that
corner case: what if the entire day is one busy session?

#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;

int const SOD = 0, EOD=24;

struct Intv{ //can be a single meeting, a joined session or a free window
  int bgn, end; //end = 10 means 10 o'clock sharp. Will become float
  Intv(int a, int b): bgn(a), end(b){}
  bool operator<(Intv const & other) const { return this->bgn < other.bgn;}

  //can be a free function if no private fields
  friend ostream & operator<<(ostream & os, Intv const & a){
        os<<a.bgn<<'-'<<(float)a.end-0.41; //print a 10 o'clock end as 9.59
        return os;
  }
};
template<typename T> ostream & operator<<(ostream & os, vector<T> const & v){
        for(int i = 0; i<v.size(); ++i) os<<v[i]<<"  ";
        os<<endl;
        return os;
}
void printBusy(vector<Intv> const & arg){
  if (arg.empty()) return;
  vector<Intv> & v = const_cast<vector<Intv>&> (arg);
  sort(v.begin(), v.end());
  size_t const sz = v.size(); //cache
  Intv growable = v[0];
  for(size_t i = 1; i<sz; ++i){
        Intv & req = v[i];
        if (growable.end < req.bgn){
          cout<<growable<<endl; //close current session
          growable = req;//start new session
          continue;
        }
        growable.end = max(growable.end, req.end);
  }
  cout<<"last session : "<<growable<<endl;
}
void printWindow(vector<Intv> const & arg){ //subsequent exercise
        if (arg.empty()){
                cout<<"free slot "<<Intv(SOD, EOD)<<endl;
                return;
        }
        vector<Intv> &v = const_cast<vector<Intv> &>(arg); //avoid vector deep cloning
        sort(v.begin(), v.end());
        cout<<v<<endl;
        if (v[0].bgn > SOD){
          cout<<"SOD free slot "<<Intv(SOD, v[0].bgn)<<endl;
        }
        Intv growing(v[0]); //current busy period
        for(int i=1; i<v.size(); ++i){
          //Heart of the algo. each new meeting either extends the current
          //busy period or prints a free window before it
          Intv & meeting = v[i];
          //cout<<i<<" a meeting: "<<meeting<<endl;
          //cout<<"growing = "<<growing<<endl;
          if (growing.end < meeting.bgn){
                cout<<"free slot "<<Intv(growing.end, meeting.bgn)<<endl;
                growing = meeting;
          }else{
                growing.end = max(growing.end, meeting.end);
          }
        }
        if (growing.end < EOD)
                cout<<"EOD free slot "<<Intv(growing.end, EOD)<<endl;
}
int main() {
        //pass an initializer list to initialize a const vector
        printBusy({Intv(15,17), Intv(7,9), Intv(5,7)});
        printWindow({Intv(15,17), Intv(7,9), Intv(5,7)});
}

remove subsequent duplicates in slist #SocGen

Q (SocGen hackerrank): given the head node of a linked list that might have duplicate values,  return the head node of a sanitized linked list without duplicates.

eg: Node@0x7089 (value=3) ->Node@0x6481 (value=2) ->Node@0x7921 (value=3) ->Node@0x6308 (value=7) ->null

would become Node@0x7089 (value=3) ->Node@0x6481 (value=2) ->Node@0x6308 (value=7) ->null

Note the first node (value=3) is still in the list, but all subsequent nodes with (value=3) are discarded.

https://github.com/tiger40490/repo1/blob/cpp1/cpp/linkedList/removeDupNodes_SocGen.cpp is my tested solution

  • The two-pointer solution sounds simple but is over-complicated and error-prone during implementation.
  • fake-link-head technique is again useful
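
A rough sketch using the fake-link-head technique plus a hash set of seen values (the Node layout is assumed, and unlinked nodes are not freed here):

#include <unordered_set>
#include <iostream>

struct Node { int value; Node* next; };

Node* dedup(Node* head){
  Node dummy{0, head};                 // fake link head
  std::unordered_set<int> seen;
  for (Node* prev = &dummy; prev->next; ){
    if (seen.insert(prev->next->value).second)
      prev = prev->next;               // first occurrence: keep the node
    else
      prev->next = prev->next->next;   // subsequent duplicate: unlink it
  }
  return dummy.next;
}

int main(){
  Node d{7, nullptr}, c{3, &d}, b{2, &c}, a{3, &b};   // 3 -> 2 -> 3 -> 7
  for (Node* p = dedup(&a); p; p = p->next) std::cout << p->value << " ";  // 3 2 7
  std::cout << std::endl;
}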

Monkey-jump problem{Ashish IV@Baml #solved

On the 2D-coordinate plane, a monkey starts from point [a,b] and wants to reach [P,Q]. From any point, the monkey can only jump to one of two points:

  1. [x,y] up to [x, x+y]
  2. [x,y] out to [x+y, y]

Q: Can you write bool canReach(unsigned long long a,b,P,Q)
A: start from [P,Q] and have no decision to make!

#include <iostream>
using namespace std;
struct Point {
    int x, y;
    Point(int x, int y) : x(x), y(y) {}
    friend ostream& operator<<(ostream& os, Point & n){
      os<<n.x<<","<<n.y;
      return os;
    }
};
bool canReach(unsigned int a, unsigned int b, unsigned int P, unsigned int Q){
  Point lower(a,b), higher(P,Q);
  for(Point p=higher;;cout<<p<<endl){
    if (p.x == lower.x && lower.y == p.y){
        cout<<"Good:)"<<endl;
        return true;
    }
    if (p.x==p.y || p.x<lower.x || p.y<lower.y){
        cout<<"Failed:( Can't jump further"<<endl;
        return false;
    }
    if (p.x<p.y){ p.y = p.y-p.x ; continue;} 
    if (p.x>p.y){ p.x = p.x-p.y ; continue;}
  }
}
int main() {
  cout<<canReach(1,10, 57,91);
}

 

minimum-cost array shrinking #Citadel

Input is a vector of positive integers. You are to shrink it progressively. Each step you remove 2 selected elements, and replace with their sum. Therefore vector size drops by 1 at each step, until there’s one element left.

At each step there’s a cost, which is defined as that sum.

Eg: {4,2,1} becomes {4,3} after you combine 2/1. The cost at this step is the sum 2+1=3.

Q1: For a given vector, find a sequence of steps with the lowest total cost. Test your code in c++
Q2: how would you optimize your code, assuming high data volume.

#include <vector>
#include <queue>
#include <algorithm>
#include <iostream>
#include <string>
#include <functional> // std::greater
using namespace std;

vector<int> vec = { 3,2,1 }; // a std::set will fail when duplicate values show up like {3,3}
priority_queue<int, vector<int>, std::greater<int> > pq(vec.begin(), vec.end());

void dumpVector(string msg) {
 cout << msg << ", size = " << vec.size() << endl;
 for (auto it = vec.begin(); it != vec.end(); ++it) cout << *it << ' ';
 cout << endl;
}

int operateVector(int sum = 0) {
 auto lowestItem = min_element(vec.begin(), vec.end());
 sum += *lowestItem;
 vec.erase(lowestItem); // now lowestItem is an invalidated iterator and unusable

 //remove() is bad as it removes all not one item having target value!
 //v.erase(std::remove(v.begin(), v.end(), *lowestItem), v.end()); 

 dumpVector("afer erase");
 return sum;
}

void dumpHeap(string msg) {
 auto clone = pq;
 cout << msg << ", size = " << clone.size() << endl;
 for (; !clone.empty();clone.pop()) {
 std::cout << clone.top() << ' ';
 }
 cout << endl;
}
int operateHeap(int sum = 0) {
 sum += pq.top();
 pq.pop();
 //dumpHeap("afer pop");
 return sum;
}

int f1(int sum = 0) {
 return operateHeap(sum);
}
int main() {
 int totalCost = 0;
 for (; pq.size()>1;) {
 int sum = f1(f1()); //call f1() twice, popping and summing the two smallest elements
 pq.push(sum);
 dumpHeap("afer push");
 totalCost += sum;
 }
 cout << "total cost = " << totalCost << endl;
 return 0;
}

maxProfit, max-sum subArray #Kadane

One of the top 3 all-time favorite algo questions, worth studying in-depth. I know only two algorithms — (Kadane) disposableCurSubArray and lowWaterMark (my own and Xinfeng Zhou’s). I think both are equivalent from a few angles.

My algo is intuitive (to me) for level input (delta input can be transformed into levels) . One pass. I feel it’s not inferior in any way.

In contrast, disposableCurSubArray is intuitive for delta input i.e. maxSubArray.

https://github.com/tiger40490/repo1/tree/py1/py/array has a tested solution with lots of explanations.

https://www.geeksforgeeks.org/largest-sum-contiguous-subarray/ has a simpler solution, to be understood. I rewrote it in https://github.com/tiger40490/repo1/blob/cpp1/cpp/array/maxSubarraySum.cpp

  1. special case: If all numbers are negative, then final answer is the “smallest” negative number as a lonewolf array.
  2. special case: If all numbers are positive, then final answer is the entire array
  3. so the tricky case must have a mix of negative/positive

Here’s my explanation of Kadane’s algo:

  • Delta Input array has arr[0], arr[1] .. arr[n]. let’s denote maxSubsumEndingAt(i) as B[i]. The max Subarray Ending At #55 is a continuous subarray starting somewhere before #55. It can be a lone-wolf containing only #55, as an important special case. In fact, in the mixed positive/negative array, we usually encounter this case.
  • (2-pointer algo) We maintain a left marker and a right marker. Both move to the right. maxSubArrayEndingAt(i) is basically an accumulating subarray from left marker to right marker.
  • B[i] = max( B[i-1] + arr[i],  arr[i] )
    • if  B[3] is negative i.e. all 4 sub-array sums ending in arr[3] are all negative, then B[4] should not include any of them. We can discard them for good and “start afresh“(while keeping the global maxSumSeen)
    • else, there exists a “positive” sub-array ending at arr[3], so we keep growing it until it becomes negative.
    • (See github python code) I can imagine a left-marker, the start of the current max subarray. We will move the right marker to grow the sub-array if the current sub-array is useful i.e. positive. Otherwise, we start afresh and the left marker jumps and coincide with right-marker
  • Note sign(B[i-1]) is key but sign(arr[i]) is irrelevant
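
A minimal Kadane sketch over a delta array, matching the B[i] recurrence above:

#include <algorithm>
#include <vector>
#include <iostream>
using namespace std;
int maxSubarraySum(vector<int> const & arr){
  int best = arr[0], endingHere = arr[0];            // endingHere plays the role of B[i]
  for (size_t i = 1; i < arr.size(); ++i){
    endingHere = max(endingHere + arr[i], arr[i]);   // discard a negative prefix and start afresh
    best = max(best, endingHere);
  }
  return best;
}
int main(){
  vector<int> a{-2, 1, -3, 4, -1, 2, 1, -5, 4};
  cout << maxSubarraySum(a) << endl;  // 6, from the subarray {4,-1,2,1}
}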

Here’s my attempt to connect my lowWaterMark to Kadane’s algo:

I suspect that whenever we move the left marker, we always have a new lowWaterMark.

(Before the first element, Level is defined as 0. In case III, the low water mark is usually a negative level.) Suppose a few elements after hitting the low water mark level=-66, we hit level=-22. This is not a new water mark. maxSubsumEndingAt[this element] is actually positive since there exists a subarray starting right after the “level=-66” element!

When we hit level=-77, a new water mark, the maxSubsumEndingAt[this element] is basically zero, as the disposableCurSubArray is discarded. We start afresh to accumulate. Essentially, we reset the “base” to be the new water mark.



			

3c++London bbg algo IV #all about array

Q: given a string of characters, find the longest “run” of any repeating character.

Q: given an array of integers, and a target, determine if there’s any pair that add up to the target. Return true/false.

Q: maximum profit problem. Given a time series of historical spot FX rates, find the maximum possible trading profit by exactly one buy one sell.
A: I have seen this many times. First write pseudo-code.

FB 2questions #regex..

—Q1: write a stateless utility function to take in any string consisting of a-z, and return a string with all the original characters appearing once only. All 2nd and subsequent occurrences should be removed. eg: abcbdc → abcd

I don’t want to new up a string in my utility, so I decided to take in an output char* buffer which should be large enough.

I guess this is all about efficient speed coding.
—Q2: Suppose there’s a regex grammar that can accept exactly 3 constructs
1) literals — any combination of abcdefghijklmnopqrstuvwxyz,
2) dot (.) that matches any one character
3) asterisk — a standard quantifier

For example,

“a*” can match empty string, a single a, or aa, or aaa etc.
“.*” matches any string including the empty string.
“a*b*a” should match “a” and also match “aa”, or “ba”, or “aaa”

—-
Now, write a utility function bool match(char const * regex, char const * S);

Note entire string S must match entire regex. We don’t want to get a false positive when only a substring of S matches the regex.
—– some tips—–

  1. Easy part — if regex ends in literals, then they must “eat up” the tail of S exactly.
  2. After this step, if regex ends with a dot, it will just eat up the last char of S. After this step, regex definitely ends in a star.
  3. If 2nd char in regex is not a star, then first char must eat up head of S.
  4. Repeat previous step until we see star as 2nd char.
  5. Now we can basically split the regex into segments. Each segment is either a single literal, a single dot, or a non-star followed by a star.

Interviewer said — First assume dot is not allowed in regex. We will deal with the dot later.

—– analysis
EPI Question 6.23 is very similar to my Facebook interview. I told FB interviewer that we don’t NEED such problem solving at work. He said this is one method (standard method?) they assess a candidate’s coding abilities.

I feel it (and google questions) is like IQ test or probability test or abstract reasoning test — irrelevant to work, but used to assess learning capacity. They don’t want a dumb person on their team.

The truth is, practice is key. There are patterns. A very very smart person will still fail if without enough practice. A mediocre guy can speed up significantly if practised enough. I said the same thing to My Hwa Chong math teacher Tony Tan. This factor is also the reason I became so strong in physics.

Q: A simple regex (SRE) can use alphanumerics, dot and star. It is enclosed between implicit ^ and $. Write a match() function taking a SRE and a word. No white space allowed.

https://github.com/tiger40490/repo1/blob/py1/py/regexFB.py is my implementation. Here are a few test cases:

assert not match('', '.')
assert match('xyab-abc', '.*ab.*c.*x*')
assert match('xyab-abc', '.*ab.*c.*.*')
assert match('xyab-abc', '.*ab.*c')
assert match('aaaaaaab-abc', 'a*aab.*c')
assert match('aab-abc', 'a*aab.*c')
assert match('aabc', 'a*aab.*..*x*')
assert not match('abc', 'a*aab.*c')
assert match('a-aab', 'a*.*a*aab')
assert not match('a-aab', 'a*a*aab')
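
For comparison, a minimal C++ sketch of the classic recursion for this grammar, following the bool match(char const * regex, char const * S) signature above (note the python tests pass the word first). It is not the github solution:

#include <cassert>
bool match(char const* re, char const* s){
  if (*re == '\0') return *s == '\0';                // regex exhausted: S must be exhausted too
  bool firstOk = (*s != '\0') && (*re == '.' || *re == *s);
  if (re[1] == '*')                                  // "x*": either skip it, or consume one char and retry
    return match(re + 2, s) || (firstOk && match(re, s + 1));
  return firstOk && match(re + 1, s + 1);
}
int main(){
  assert( match("a*b*a", "aa"));
  assert( match(".*ab.*c", "xyab-abc"));
  assert(!match("a*aab.*c", "abc"));
}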

c++ Push all zeros in array to the end #easy

Basic programming exercise

#include <cstdlib>
#include <iostream>
#include <iterator>
#include <assert.h>

using namespace std;
int a[] = {0,0,-1,2,0,4,0,0,8,0};
/*
Push all the zero's of a given array to the end of the array. In place only. Ex 1,2,0,4,0,0,8 becomes 1,2,4,8,0,0,0
*/
int main(int argc, char *argv[])
{
   size_t size = sizeof(a)/sizeof(a[0]);
   copy(a, a+size, ostream_iterator<int>(cout, " "));
   int left = 0;
   int right = size -1;
   // note: swapping with the right-most non-zero does NOT preserve the relative order
   // of the non-zero elements (the sample 1,2,0,4,0,0,8 -> 1,2,4,8,0,0,0 implies a stable order)
   while(left < right){
       if (0==a[left]){
           while (0==a[right]) --right;
           if (left >= right) break;
           swap(a[left], a[right]);
       }
       ++left;
   }
   cout<<"\n new: \n";
   copy(a, a+size, ostream_iterator<int>(cout, " "));
   cin.get();
}

slist: remove consecutive dupes #Macq

#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;
struct litem {
     char data;
     litem* next;
};
void myDelete(litem * i){
  delete i;
}
int remove_consecutive_duplicates( litem*& list ){
  if (!list) return 0;
  vector<litem*> trash; // collection of bad nodes' addresses, to be DELETE'd
  litem* keep = list; // this pointer only points to "good" nodes to keep
  for (litem* p = keep->next; p;  p=p->next ){
    if (keep->data != p->data) {
      keep->next=p; // often unnecessary, but very cheap
      keep=p;
      continue;
    }
    trash.push_back(p);
  }
  keep->next = 0; // keep is now the last good node. It may or may not point to null. This  is cheap insurance.
  int ret = (int) trash.size();
  std::for_each(trash.begin(), trash.end(), myDelete);
  // now, trash.size() == ret probably, as trash holds a bunch of stray pointers, but just in case size changes in the foreach loop, we save the original size in advance.
  return ret;
}
void dump( litem*& list){
    for (litem * p=list; p; p=p->next){
      cout<<p->data<<" ";
    }
    cout<<endl;
}
int main(){
   litem* a = new litem();
   a->data='a';
   litem* a2 = new litem();
   a2->data='a';
   litem* a3 = new litem();
   a3->data='a';
  
   litem* b = new litem();
   b->data='b';
   litem* b2 = new litem();
   b2->data='b';
   litem* b3 = new litem();
   b3->data='b';
   litem* b4 = new litem();
   b4->data='b';
  
   litem* c = new litem();
   c->data='c';
   litem* c2 = new litem();
   c2->data='c';
   litem* c3 = new litem();
   c3->data='c';
  
   a->next = b;
   b->next = c;
   c->next = c2;
   c2->next = a2;
   a2->next = b2;
   b2->next = b3;
   b3->next=b4;
   b4->next=a3;
   dump(a);
   remove_consecutive_duplicates(a);
   dump(a);
   cin.get();
}