bbg: allocate half the players to NY^SF

Always good to be concrete, so without loss of generality, let’s say J=99 i.e. 198 players.

Q1: Given 2J players, we need to allocate exactly half of them to NY and the rest to San Francisco, to form two teams for a inter-city contest. We receive the NY fare and SF fare for each player, as 2J (i.e. 198) pairs of integers. How can we minimize the total fares?

Q2(bonus): what if 2J+1 players, and you can decide which city gets an extra player?

— Analysis: how many possible allocations? 2J-choose-J. Very large number of allocations. Something like O(J!) so brute force is impractical.

If for any player the two fares both go up by $9800, it doesn’t change our algorithm at all. Therefore, we only care about the fare difference (N-S) for each player.

— solution: I will assume most players live near SF so SF fares are lower. I tabulate and sort the 198 fare differences “NY fare – SF fare” and suppose indeed at least 99 (half) are positive[1]. Therefore, my “base” is SF i.e. my base allocation is everyone-allocated-to-SF. Next I must pick 99 (half) of them and allocate to NY.

I will avoid the higher values in the table.

I simply pick the lower 99 (half) in the table, because these are the 99 “extra” costs I will incur. I want to minimize the total of these 99 values, whether they are mostly positive or mostly negative.

  • Note the highest number in the table indicates the most expensive player for NY relative to SF. If this number is negative, then SF is more expensive than NY for All players so follow the rule in [1] but notice he is the least expensive players for SF relative to NY. Regardless of positive/negative sign, we want to keep this person in SF .
  • Note the lowest number in the table indicates the least expensive player for NY relative to SF. If this number is negative, then SF is more expensive than NY for this player — normal. Regardless, we want to allocate her to NY.

[1] if proven otherwise (by the data), then we could easily treat NY as base to keep the solution intuitive. Actually even if we don’t change the base, the algorithm still works, albeit unintuitively.

A2: Say J=99 so total 199 players We already picked 99 for NY. Do we pick one more for NY or keep remaining 100 in SF?

Just look at the next lowest value in the table. If positive, then NY is more expensive for him, so keep him in SF.

Advertisements

celebrity search #MS

Q: write a Function to search for celebrities in a collection of people with the help of a function that can tell whether one person knows the other (bool knows(a, b)) which is an asymmetric relation. Celebrities are those who know no one but are known by anyone else.


Analysis: There can be 0 or 1 celebrity. There are exactly N*N relationships in the system, so brute force is O(N*N) but I propose a 2-pass O(N) solution:

  1. First pass: scan the list from left Node A, B .. to N
    1. if A knows B, then remove A from list; else, remove B from list. So first step always shrinks list by 1
    2. Now check 1st two (or any two) persons in remaining list. Check either direction. Will remove exactly one person.
    3. list will shrink to 1 person (Dan) after N-1 steps
  2. Second pass: check this Dan throughly. Dan must be known to everyone and must not know anyone. We end up with either no celebrity OR one celebrity in Dan.

Lesson learned — better use A/B/C or Alice/Bob/Cody instead of #1, #2, #3.

first N prime Fibonacci: cache server #Thesys

The challenge is not in Fib or prime, but in the algo around them.

//This version is memory efficient as it only remembers the last
//two Fib numbers + all previous prime Fib numbers

//Requirement: For a given input number X, print
//the first X prime Fib numbers. Each output is a Fib and prime
//
//This version is memory efficient as it only remembers the last
//two Fib numbers + all previous prime Fib numbers
//
//The typedef of Fib is nice documentation
//
//My version of isPrime() is simpler
#include <iostream>
#include <string>
#include <vector>
#include <cstdlib>
using namespace std;

typedef unsigned int Fib;
bool isPrime(int i){
        if (i <= 1) return false;
        for(int j=2; j<=i/2; ++j) {
            if(i%j==0) return false;
        }
        return true;
}

pair<Fib,bool> generate1Fib(){
  static Fib parent = 1;
  static Fib grandparent = 0;
  Fib newFib = parent + grandparent;
  grandparent = parent;
  parent = newFib;
  return make_pair(newFib, isPrime(newFib));
}

void print1stXFibPrimes(size_t const X){
    static vector<Fib> fibPrimes;
    int count = 0;
    for(int j=0; j<fibPrimes.size(); ++j){
                cout<<fibPrimes[j]<<" ";
                if (++count >= X) return;
    }

    while( count<X ) {
          pair<Fib,bool> new1 = generate1Fib();
          if (new1.second){
                fibPrimes.push_back(new1.first);
                cout<<new1.first<<" ";
                ++count;
          }
    }
}
inline bool isInteger(const string & s) {
   if(s.empty() || ((!isdigit(s[0])) && (s[0] != '-') && (s[0] != '+'))) return false ;
   char * p ;
   strtol(s.c_str(), &p, 10) ;
   return (*p == 0) ;
}
int main(int argc, char *argv[]){
  for(int i=1; i<argc; ++i){
    string arg(argv[i]);
    if (isInteger(arg)){
      cout<<"calculate the first "<<arg<<" prime fibs"<<endl;
      print1stXFibPrimes( stoi(arg)) ;
      cout<<endl;
    }else
      cout<<arg<<" is not an integer\n";
  }
}

find anagram groups]file #1-pass, FB

Requirement: if any 2 words are anagrams, that’s a group. Within a file, identify all such groups.

Normalize each word using counting sort on the characters of the word. There’s no competing alternative in this isolated part of the solution. Main part of the solution is sorting all words based on the normalized version. There are many alternative solutions to this sort.

desirable — memory efficiency. create no or few objects besides the strings read in from the file.

desirable — avoid the need to create/allocate linkNode or wrapper objects around original words. In c++, such a wrapper is smaller than in java, probably a pointer to the original word + a link pointer

Solution 1 — insert into a sorted set, using a customized comparison (subclass binary_function) based on the normalized string and something unique, such as line number or object address. After all the inserts, when you iterate the set, an anagram group would show up in a row. We can also keep a separate array or iterators (pointers) that point to the first word in each group — emulating a multimap.

Without the something unique, the set would drop words.

I think insertion is inefficient since a normalized string is regenerated for the same word over and over. To overcome this, we could create a wrapper object with 2 fields — normalized string + a pointer to the original word. However, if “SAT” is a very common normalized string then we create multiple copies of it.

Solution 2 — separate chaining. Maintain a hashed map (faster) or sorted map keyed by the normalized strings, just a single copy for each normalized string. Value of each map entry is a linked list of the original words.

#include <iostream>
#include <iomanip>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
#include <map>
#include <algorithm>
using namespace std;

map<vector<char>, size_t> m;
template<typename T> ostream & operator<<(ostream & out , vector<T> const & v){
        for(int i=0; i<v.size(); ++i) out<<setw(1)<<v[i];
        out<<" ";
}
int main(){
  ifstream fs("words.txt");
  string line, word;
  while(getline(fs, line)){
     stringstream ss(line);
     while(getline(ss, word, ' ')){
        //trim
        int endpos = word.find_last_not_of(" ");
        if (endpos < string::npos) word = word.substr(0, endpos+1);
        if (word.empty()) continue;
        //lower case
        transform(word.begin(), word.end(), word.begin(), ::tolower);
        //sort
        vector<char> sorted(word.begin(), word.end()); sort(sorted.begin(), sorted.end());

        int count = ++m[sorted]; //1st time we get 1 🙂
        cout<<sorted<<"\t<- "<<word<<" :" <<count<<endl;
     }
  }
}

 

ref-counted copy-on-write string #MIAX exchange IV

I completely failed this 2011 IV question from MIAX options exchange:

Q: outline a ref-counted copy-on-write string class, showing all the function declarations
A: here’s my 2017 answer

struct Payload{ //this object must live outside any Str instance. If 1st Str instance in the group is destroyed, this object must live on.
	char * const arr;
	size_t const length;
	mutable size_t refCount;
	Payload(std::string const &);
};
class Str{
	Payload const * payload;
public:
	~Str(); //decrement refCount and if needed, delete the payload
	//Str(); //default ctor is useless since after construction we can't really modify this instance, due to copy-on-write
	Str(std::string const &);
	Str(Str const &); // similar to shared_ptr

	Str & Operator=(Str const & other) const; // will return a reference to a new instance constructed on heap (never on stack!). The new instance will have a ref-count initialized to 1
	Str & replace_with(std::string const &) const; //ditto

// optional utilities
	char const * c_str()    const; // <-- Hey mine is very similar to std::string
	Str(char const * arr, size_t const len); // <-- Hey mine looks similar to std::string
	friend ostream & operator<<(ostream &, Str const &);
};

minimum-cost array shrinking #Citadel

Input is a vector of positive integers. You are to shrink it progressively. Each step you remove 2 selected elements, and replace with their sum. Therefore vector size drops by 1 at each step, until there’s one element left.

At each step there’s a cost, which is defined as that sum.

Eg: {4,2,1} becomes {4,3} after you combine 2/1. The cost at this step is the sum 2+1=3.

Q1: For a given vector, find a sequence of steps with the lowest total cost. Test your code in c++
Q2: how would you optimize your code, assuming high data volume.

#include <vector>
#include <queue>
#include <algorithm>
#include <iostream>
#include <string>
#include <functional> // std::greater
using namespace std;

vector<int> vec = { 3,2,1 }; // a std::set will fail when duplicate values show up like {3,3}
priority_queue<int, vector<int>, std::greater<int> > pq(vec.begin(), vec.end());

void dumpVector(string msg) {
 cout << msg << ", size = " << vec.size() << endl;
 for (auto it = vec.begin(); it != vec.end(); ++it) cout << *it << ' ';
 cout << endl;
}

int operateVector(int sum = 0) {
 auto lowestItem = min_element(vec.begin(), vec.end());
 sum += *lowestItem;
 vec.erase(lowestItem); // now s1 is an invalidated iterator and unusable

 //remove() is bad as it removes all not one item having target value!
 //v.erase(std::remove(v.begin(), v.end(), *lowestItem), v.end()); 

 dumpVector("afer erase");
 return sum;
}

void dumpHeap(string msg) {
 auto clone = pq;
 cout << msg << ", size = " << clone.size() << endl;
 for (; !clone.empty();clone.pop()) {
 std::cout << clone.top() << ' ';
 }
 cout << endl;
}
int operateHeap(int sum = 0) {
 sum += pq.top();
 pq.pop();
 //dumpHeap("afer pop");
 return sum;
}

int f1(int sum = 0) {
 return operateHeap(sum);
}
int main87() {
 int totalCost = 0;
 for (; pq.size()>1;) {
 int sum = f1(f1()); //call f1() twice recursively.
 pq.push(sum);
 dumpHeap("afer push");
 totalCost += sum;
 }
 cout << "total cost = " << totalCost << endl;
 return 0;
}

construct graph from list of connections #BGC java

Given an input file showing a list of {string, string} pairs, build a connection graph.

If you have a correct connection graph, then you can easily determine the connectedness (bool) of any 2 nodes. In a social-network, this bool flag indicates whether 2 individuals are completely unconnected or somehow connected.

—-analysis:
I see this as a social-network. Any pair represents an edge connecting 2 nodes.  At any time there are a number of disconnected islands. The next pair could 1) merge 2 islands or 2) add a node to an existing island or 3) create a new island 4) do nothing, if the 2 nodes are already in some existing island

  • Any known node appears exactly once in the entire graph, in exactly one of the islands.
  • All nodes are contained in a lookup table or hashmap  {node -> island}
  • Each island can be a implemented as a hashset of nodes.

So here’s a proposed algo to process a new pair {A, B}. Look for A and B in the  graph. 3 scenarios + a dummy scenario:

  • (Scenario 3) If both A an B are new comers, then they form a new island.
  • if both A and B are already in the graph,
    • (Scenario 4) if they are in the same island, then exit. Nothing to do
    • (Scenario 1) else we can merge the 2 islands
  • (Scenario 2) If A is in island 3 but B is new comer, then B joins island 3

The merge operation is expensive. The big lookup table needs update but here’s an alternative:

  • At merge time, the smaller island would have all the nodes moved to the bigger island. When the island is empty, it gets a pointer “this.redirect” to the bigger island.
  • lookup table needs no update, avoiding locking a global object.
  • At query time, we look up the table to get the original island, then we follow its pointer (defaults to null) until the island is non-empty.
  • endless loop? would only be a programming error.

max product (j-i)*min(arr[i], arr[j]), streaming mode #Ashish

In the static, non-streaming context, the optimal solution is perhaps in my gmail. The brute force would evaluate roughly N*N/2 pairs. We can reduce that significantly.

Start with 1st/last, which is quite possibly the final winner. Save this as maxProductOfLoopA and evaluate a[0] vs a[9 i.e. last].

  • Suppose a[0] > a[9], then 1/9 , 2/9 , 3/9 etc are automatically eliminated.
  • Suppose a[0] < a[9], then 0/8, 0/7, 0/6 etc are automatically eliminated.
  • if a[0] == a[9], then 0/8 etc are automatically eliminated

You can visualize it as removing an outer layer from a NxN matrix. Note the matrix is triangular and has exactly one outer row and one outer column at any moment. In the first step, you either remove the outer row or outer column.

Supposed you removed */9 column. In 2nd step, we compute 2nd product at 0/8, and 2nd evaluation of a[0] vs a[8] and remove another outer layer.

In about N steps we should reduce the matrix to 1 remaining cell. This cell could be the final winner so we must evaluate it.

—-

Hi Ashish,

Let me change your question slightly. Input numbers come individually in a stream. Among all possible pairs of numbers, we want to compute and publish the latest maximum product:

    A(i, j) * B(i, j)

, where A(i, j) is the time lag j-i and B(i, j) is minimum(arr[i], arr[j])

NB: We will need to keep all numbers seen so far, since the the winning pair might include the first elements.

At any time there’s a current max. When we receive the 89th element, enter loop A:

compute the product for 1st/89th i.e. arr[0]/arr[88]
* If arr[0] > arr[88], then just exit the loop with this product as maxProductOfLoopA. No need to try 2nd/89th or 3rd/89th, as all those products are all smaller than 1st/89th. (This is the genius of the solution you told me.)
* otherwise, compute the product for 2nd/89th. If it exceeds maxProductOfLoopA, then update maxProductOfLoopA. Now check if arr[1] > arr[88]. If yes just exit loop A with maxProductOfLoopA
* otherwise compute the product for 3rd/89th….

Once we exit loopA, update the current max product using maxProductOfLoopA.

For streaming situation, I think this is one good solution, if not the optimal solution.

count coordinate points enclosed by polygon #codecon

Q: On a 49-by-49 super-sized GO-board, there are 2500 coordinate points. Now I give you a 15-side polygon described using the [x,y] coordinates of the 15 vertexes. List all enclosed points.

A: Here’s my tentative algorithm. We will paint and repaint all coordinate points in lime and red as we process the 15 vertexes one by one. At the end of drawing the polygon, we will know which color is enclosed. (Without loss of generality, assume vertexes are never on board boundary.)

1st, assume lines are never completely vertical. We will easily deal with vertical lines later.

  1. with 1st vertex, nothing to paint
  2. After 2nd vertex, we have a closed line (i.e. both ends defined), but we will treat it as an “open line” which is a line with both ends on board boundary.
    1. We paint half the board lime and the other half red. All 2500 coordinate points are painted. Any point on the actual closed line (not the open line) is left unpainted.
    2. how we decide the color? Basically when moving from Vertex 1 to 2, points on the left are painted lime. Specifically, if line is to the right, then below is painted red; if line is to the left, then below is lime.
  3. When we process 3rd vertex, we create a half-open line, meaning it has only one end defined (the previous vertex). The other end is actually the new vertex but we pretend the line extends beyond it and ends at boundary of the board. This is a divider into either lime or red section — new vertex’s current color will tell us. Suppose it cuts into the red, then part of the red must turn lime , i.e. we repaint the affected points lime. We will repeat this step for each remaining vertex.
    1. so which part to change color? follow the rule in 2.2
    2. Note we only repaint part of the newly-divided region. We don’t touch points in the unaffected region.
  4. When we process 4th vertex, we again create a dividing half-open line, either in the red or lime region.
  5. After we process the last vertex, in addition to the half-open line, we create a final line, this time a closed line with both ends defined. We repaint the “outside” points
  6. Completed. The color of [0,0] is the outside color.

Once we have a “database” or “lookup function” of all the “inside” coordinate points, let’s return to the original Bloomberg algo question (https://codecon.bloomberg.com/contest/5948031613881024512/3908)

First, we paint all unpainted points (i.e. on the closed lines) as inside points. Then we iterate over the 49 x 49 squares on the board one by one.

  • IFF all 4 corners are “inside” then it’s an inside square.

Math-wise, the main challenge appears to be Step 2.2. Solution: Suppose we hold x=2 constant. Check all the coordinate points. All the points “below” the “open line” can be pained lime .

IFF the line is exactly vertical, then all the points to the right are considered “below”.

Data structure — 2D array for the board, similar to my Tetris program. A Point class

[12]MS IV:func2back-scan any container,print every Other item

Have I overspent my time on this once-asked question?

The umbrella question — write a utility function to iterate any container and print out every Other element backwards?

Good coding practice! I think this is all about iterator syntax knowledge (my weakness) not algorithm (my strength)!

Note this is really about knowledge not  coding abilities. QQ not ZZ.

Iterator declaration is a can of worm 😦 I might need to give up on this.

#include <iostream>
#include <vector>
#include 	<list>
#include <set>
using namespace std;

template<class _InIt>  void printAlternateItem2itr(_InIt _First, _InIt _Last){
	bool flag = true;
	// if the iterator is from rbegin, then ++ would reverse it!
	for (_InIt it = _First; it != _Last; ++it, flag=!flag) {
		if (flag) cout << *it << ' ';
	}
	cout << endl;
}
template <typename CONT> void printAlternateItemBackward(CONT cont) {
	printAlternateItem2itr(cont.rbegin(), cont.rend());
}
int main() {
	//vector<int> cont = { 11,2,3,4,5,6,7,18 };
	//list<int> cont = { 11,2,3,4,5,6,7,18 };
	string cont = "0123456789a";
	set<int> cont2 = { 11,2,33,44,55,66,77,88,99 };
	printAlternateItemBackward(cont);
	printAlternateItemBackward(cont2);
	int arr[] = { 11,2,3,4,5,6,7,18,9 };
	int size = sizeof(arr) / sizeof(arr[0]);
	printAlternateItem2itr(arr, arr + size); //forward only
}

—-
Q: is comparison defined on all iterators?
A: now I think linked list doesn’t. Now I think only random access itr does.

%%Q: what’s the signature of STL find()? I will use those declarations of iterators in my function. (Actually the map and set containers have member functions find() outperforming std::find)

%%Q: from a const container, can u get a non-const iterator?

Q: why don’t you take a container as input? Why must you take iterators?
%%A: it’s more common to take iterator, but in this case container will do. All containers provide rbegin() or begin() including string. Raw array doesn’t but the iterator increment won’t work for raw arrays anyway.


Separate question
Q: OO design — how would you represent Order state transition graph in an OMS?