Pacific/Atlantic #51%

Q: Given an m*n matrix of non-negative integers representing the height of each unit cell in a continent, the “Pacific ocean” touches the left and top edges of the matrix and the “Atlantic ocean” touches the right and bottom edges. Water can only flow in four directions (up, down, left, or right) from a cell to another one with height equal or lower.

Print all cells where water can flow to both the Pacific and Atlantic ocean.

peek? not yet
— solution 1:
Create m * n nodes in a 2D grid. Two more nodes:
All north|west boundary nodes have an edge From Pacific node
All south|east boundary nodes have an edge From Atlantic node

Between two adj cells, there’s 1 or 2 directed edges, from low to high. Intuition — tracing pollutant upstream. An arrow from P to Q means Q is suspect.

How do I represent the graph? Each node can hold a list of “suspect neighbors” (higher or equal).

Run BFT or DFT from Pacific node and turn on flag in each reachable node.

Run the same from Atlantic. Every reachable node with the flag set is a double-suspect to be printed.


find target word in 2D grid #20%

Q (Leetcode 79) Target word can be constructed from letters of sequentially adjacent cell on the matrix, where “adjacent” cells are those horizontally or vertically neighboring. The same cell may not be used more than once.


It’s critical to recognize early on that a real test case probably consists of just 2 letters abaabaababbabba. 

(Otherwise your analysis would be simplistic and a waste of precious time in interview.) Note interviewer will NOT give you this input early on.

This problem is not restricted to 2D grid.. can be generalized to any graph.

— idea 1: locate the first letter i.e. head node. Once found, start DFT. If failed, mark the first letter as “dead head” and look for the next head.

if all the “head nodes” are dead, then give up. Note a dead head can be part of a solution, just not the head.

clone connected graph given any one node #!!need4stdSol

There are a few ways to phrase the same question.

  1. For a connected graph of colored nodes (say, only a few colors or black/white), you are given one node only. Please clone entire graph.
  2. Given one node in a connected graph, visit every node and recreate entire graph in another computer.
  3. Given one node in a directed graph, serialized it, transmit to another computer and deserialize.

===== analysis

I feel this is one of the most practical algo problems. I feel my solution is fairly efficient for binary tree too.

Not too hard conceptually, if your mental picture is clear. I would need to assign node IDs (based on addresses [1]) to differentiate the black nodes. These IDs are serialized. So are the edge lists [1].

— For serialization, Basic idea is a BFT (or DFT) to visit every node. Each time a node address is encountered again (probably among the edge list of some node), we would simply skip it.

Within each node, the this->edges field is “translated” and serialized as a list or treeSet of node_IDs. The serialized file looks like

  • ID=1, edges=3,4,5,6
  • ID=2, edges=3,5,7
  • ID=3, edges=5,7,9
  • ID=4, edges=6,8

I think serialization is O(V+E). I think same O() for deserialization.

— For deserialization, we can simply construct all node objects in an array, using ID as index.  The edge list can stay as is. Optionally, we translate each edge from an ID into an address in the array.

[1] these are basic algoQQ pointers

serialize slist: each node points to a random node#Rahul

Q: a slist has one more link field in each node. It’s the address of a random node in the same slist. How do you serialize this linked list?

I think the generic solution works but there is hopefully a more intuitive one

— solution 1:
first traversal to build a hashtable {node address -> auto-increment id}

2nd scan visits each node to save the random field along with the host node id

DFT|BFT: “seenNodes”

Q: in graph DFT and BFT, do we need a “seen” hashtable to remember visited nodes?

A: I probably needed it in some of my coding problem
A: I think so. The textbook DFT/BFT algos use white/grey/black for this purpose. It’s basically a

hashtable { node addr -> color } esp. if nodes are readonly.

This color technique is not needed for a simple binary tree, but it can become useful in tough CIV such as cyclic graphs.

topological_sort@DAG: linear algo to assign ranks?

Q: topological sort — given a directed graph, any linear algo to assign ranks?

I prefer to use one shared rank for multiple nodes that can be simultaneously started/concretized/evaluated. This feature can increase flexibility and parallelism

terminology — avoid “dependency” — confusing. Prefer “upstream/downstream” or “ancestor/descendant”. Note ancestors should be started/processed first.

rank table — We can use a hashtable (or pre-sized vector) to store the ranks: {rank -> list of nodes of that rank}. Assigning a node means adding the node id to the correct list, in O(1)

Assumption 1: the original graph node contains ancestors but no descendants. spreadsheet-model. I think this is Kahn’s assumption.

Assumption 2: the original graph nodes contain descendants but no ancestors. notification list or “call list”, or “listener list”

In most situations, One of these two assumptions would hold, but rarely both.

Scan-1 O(V+E) — build a hashtable-based two-way edgeSet representation of the graph. For each node, we maintain a hashset (or slist) of ancestors and a hashset of descendants. The duplication is needed, as described below in the Kahn context.

Scan-2 O(V) — assign rank 0 to all top-level nodes (no precedent). Now we can use the rank table to scan rank-0 nodes

Scan-3 — Now scan the last assigned rank, rank-0 in this case.
For each node in that list, check each downstream child.
Unconditionally remove (O(1) thanks to hashset) the precedent from inside the child.
After that, If the child has empty hashset of ancestors it is assigned rank 1.
I now believe the precedent/dependent link is never accessed again, so we can remove both.

Repeat the last scan at Rank 1, then Rank 2..

Every node is assigned only once. Every edge is checked only once or twice.

Can the ancestors hashset become an integer count?

— simplicity
In this design, I use multiple Simple passes and avoid doing too much in one pass. If needed, you can combine Scan-2 and Scan-1.

We treat the original nodes as readonly — nice simplification.
— terminology:
precedent/dependent is accurate but abstract.
“Dependency” is a confusing term. It means someone I depend on. Better avoid this word in graph problems.
uplink/downlink is visual only in a tree with root on top

— Kahn uses “incoming edge” to mean a precedent or an upstream node
“All nodes with no incoming edge” … implies a node object has this->ancestors field

When he visits downstream nodes from “current node”, he needs this->descendants field

This crucial detail is not explained in wikipedia

=== The DFS algo, as described on wikipedia

Basic idea — check each node and start a DFS. Whenever a leaf node is found, remove it and prepend it to output list. Game over when last node is removed.

  • last node removed will be one of the root nodes.
  • first node removed will be one of the original leaf nodes
  • Invariant: At any time, size of output list + size of graph == N
  • Output list will be a sorted list of items to “process”
  • Implementation note: Actual removal of a node is tricky in either matrix representation or edge-list representation, so it’s easier to use a hashtable to hold “removedNodes”

The simple algo above will fail if cycles exist. To check cycles, we need a  small feature — “temporary mark”.

  • I think this feature can detect cycles in any directed graph such as a tree.

BFT phrasebook

BFT is one of the top 10 heavily used constructs in coding tests.

recursion-free — BFS feels like the most intuitive algo, recursion-free.

two hits — each node is accessed twice — appended to the FIFO and then popped from the FIFO. We need to be careful what to do at first vs 2nd visits. You can get confused if not unaware this subtlety.

graph — BFT is defined on graph, not only trees

tree — BFT produces a tree … the BFT tree.

mailer — Example: send a mailer to all linkedin contacts:

# send to each direct contact
# send to each second-degree contacts …


word ladder #50%done

Leetcode Q 127: word ladder — Given two words (beginWord and endWord), and a dictionary’s word list, find the length of shortest transformation sequence from beginWord to endWord, such that:

* Only one letter can be changed at a time.
* Each transformed word must exist in the word list. Note that beginWord is not a transformed word.
* Return 0 if there is no such transformation sequence.
* All words have the same length.
* All words contain only lowercase alphabetic characters.
* You may assume no duplicates in the word list.

Example 1:
beginWord = “hit”,
endWord = “cog”,
wordList = [“hot”,”dot”,”dog”,”lot”,”log”,”cog”]
Output: 5
Explanation: As one shortest transformation is “hit” -> “hot” -> “dot” -> “dog” -> “cog”,
return its length 5.

Example 2:
beginWord = “hit”
endWord = “cog”
wordList = [“hot”,”dot”,”dog”,”lot”,”log”]
Output: 0

First scan O(NN)to build the bidirectional edges of the graph. Given an existing graph of N words (N can be 1), a new word is compared against each to construct the edge list. N(N+1)/2 comparisons. No choice. No need to be extra clever as this simple O(NN) algo is optimal IMO. No need to worry about the visualization of the graph either because the edge list is a proven representation

At the end of this phase, if beginWord and endWord belong to disjoint sets then return 0. However I see no simple implementation of it.

2nd scan O(N+E) BFS. But there are many cycles, so we need a hashset “Seen”, or array of size N. Array is more elegant than hash table in this case.

To compare two word, char-wise subtraction should give all zero except one char. This last routine can be extracted to a “simple routine to be implemented later”, so no need to worry about it in a white board session.


represent graph: matrix^edgeList #std::forward_list

Josh’s book [[data structures and other objects using c++]] has 5 pages devoted to three best-practice data structures representing a generic directed graph. (I think my [[discrete math] also covers the same topic.)

Note all these data structures can serialize any graph.

  1. boolean adjacency matrix — adj mat
  2. edge set — can’t include two identical edges between nodes A and B (rare) but can support complex graph algos having risk of traversing the same edge repeatedly. The set can be a field of a graph node, or a hash table {node-> edgeSet}
  3. edge list — can be implemented using std::forward_list, marginally faster better than set… not discussed here

pros and cons:

  • iterating all edges connected to a node — edge set wins
  • sparse graph — adj mat wastes memory and not scalable.
  • test linkage between 2 nodes — adj mat wins
  • ease of implementation — adj mat wins

Fundamental classification:

  • In a so-called dense graph, number of edges ~ O(VV) i.e. between any pair of nodes there’s likely an edge
  • In a so-called sparse graph, number of edges ~ O(V).

[[algo in a nutshell]] written by some professors, compares matrix and edge list. For a dense graph, P140 describes an intriguing edge list practical implementation to save space and speed up isConnected(node1, node2). Key idea —

   “Number all individual nodes using auto-incrementing ID numbers.”

After this, we will see that a node’s edge list may look like 2,3,4,5, 8,9 …which can be summarized as 2-4, 8-9. I felt this was a trivial trick but is actually a standard implementation. There’s a tradeoff though — inserting edge is now slower, but I feel O() is unchanged.


SCB-FM algo Q2a 2 slists

Q: Given two well-formed singly-linked lists (loop-free), that might be merged at some node, locate the merge point. If not merged, return null.

Note every node has exactly one child node, but may have two parent nodes.


Suppose list A has size P=55 and list B has size Q=32

–SolH: hashtable O(N+M) — optimal in time but not space

First populate hashtable with the short list’s 32 nodes. Then iterate the longer list and check each node’s address.

–SolR: reverse one list … how?

–SolA: array-based.  construct a simple 55-array of A node pointers, and another 32-array of B node pointers. Then compare the two final elements in the two arrays. If same then binary search in B. O(N+M) + O(log M)

–Sol2: 2-pointer algo O(1) space, usable on read-only lists 🙂

first scan to find the 2 end nodes and remember the sizes like 55 vs 32.

IFF the end nodes match, then 2nd scan:

skip first P-Q (23) nodes in the longer list and then increment 2 iterators in lock steps. Compare the 2 iterators at each step.

merge 2 sorted slists #2try

Q: Merge two sorted linked lists and return it as a new (ascending) list. The new list should be made by splicing together the nodes of the first two lists.

A test of implementation skill. A test of clear communication and clear thinking. My solution below is O(1) space and O(N) time without allocating any heap memory.

I would pick the list having the lowest value as the Yellow jersey (leading). The other list is the green. At any time there exist two head nodes for two slists.

I always keep a brown pointer to the last of the leading pack in the yellow’s list. The leading pack are the “retired” nodes.

Swap – In the event of a ‘swap’, the brown’s current child (current yellow jersey list head) gets detached and becomes the new green jersey, and the old green jersey moves into the leading pack. Then we increment the ptr — this old green jersey’s child becomes the yellow jersey.

Every time after we increment the pointer, we compare its child node vs the green jersey.

I won’t bother with std::list::splice() and will use python or my own c++ linked list.

generate no-loop paths: graph node A→B

Q1: given 2 nodes in a graph containing N (eg 121) nodes, potentially with cycles, generate all simple paths between the pair. A simple path has no cycle. (In other words, for a simple path, length + 1 ==  # unique nodes)

I think there are classic math algorithms for it, because this is part of basic graph theory. Here are some applications of this type of algorithms —

  • Q1b (special case of Q1): given 2 nodes in a C-by-R rectangular grid, where every node is connected to (up to) four neighbors, generate all cycle-free paths.
    • Below, I solved this problem in python
  • Q2 (simple application of Q1 algo): generate all simple paths between ALL node pair in a graph. The shortest simple path has length=0. Longest simple path can potentially visit every node exactly once.
  • A: first generate all 121-Choose-2 node pairs. For each pair, solve Q1. Lastly generate the 121 trivial paths of length=0.
    • repetitive 😦
  • Q2b (special case of Q1): In a C-by-R rectangular grid, where every node is connected to (up to) four neighbors, generate all cycle-free paths between all pairs.
  • Q2c (easy one based on Q2): given a binary tree containing no cycles, generate all paths.

— A1b: my DFT implementation (probably not 100% correct) , where each “trail” either fails or becomes a path.

  1. from NodeA start a breadcrumb/trail. We can’t revisit any node already visited on current breadcrumb,
    1. if this is a matrix, then instead of a hashtable, we can also use a shadow matrix, but the breadcrumb is much smaller than a shadow matrix
  2. if we reach a node surrounded by nodes on the same breadcrumb, then the trail fails at a dead-end
  3. else we will reach NodeB 🙂 Print the breadcrumb

By construction, we won’t see duplicate paths 🙂 is the implementation

–BFT? I don’t think BFT can print each unique path

shortest path:2nodes]binary matrix #BFT

Q: given 2 cells in a binary matrix (1=black, 0=white=blocked), check the pair are connected and if yes return the shortest path. There exists a path of length 1 between any 2 cells IFF both are side by side or stack atop.

count paths between 2 bTree nodes #PimcoQ9 Ashish is arguably harder than this problem, but this problem allows moving in four directions.

binary-matrix island count #DeepakM technique is more applicable. A BFT path should work.

  • every reachable node is painted Green (like 2)
  • we give up after our queue is empty is the implementation, briefly tested.

max path sum problemS #high fan-out/fan-in

Note “path” implies directed (not bidirectional) edges i.e. dag

  • variation (rare): bi-directional edges
  • variation: single origin, single destination
  • variation: single origin, any destination among 55 leaf nodes
  • variation: each node has a +/- value
  • variation (rare): each edge has a +/- value. Edges usually outnumber nodes, as illustrated in the balloon-burst problem

–Common Special case: pyramid (see blogpost on recombinant binTree). The balloon problem has high fan-out.

I would try DP. For each node AA, determine (and store) the max cum-sum up to AA. Need to start from the layer 1 nodes (1 hop from origin), then use Layer 1 data to update layer 2 nodes.

Markov-like — For any node AA, the max cum-sum to AA only depends on the previous layer. No need to even care about a “fullpath” to AA. This property leads to dramatic *simplification*. Efficiency is a by-product of this simplification.

–Common Special case: non-recombinant binary tree.

i would use DFT + Kadane. Each origin-to-leaf path is independently handled. No layering.

Note Kadane algo handles +/- node values.

elegant use@priorityQ #indeed

Q: A given system receives a bunch of tasks at beginning of day.
task {
string id #assigned by this system as auto-incrementing string value
int execTime
list<task> children # all children becomes runnable as soon as this task completes. This is a classic edgeList
implement int getTotalTime(task) #including all descendant tasks

Q1: during system initialization we need to calculate how long each task (and its subtree) will execute. I solved this “warm-up” problem using a top-down tree-walk with memoization.

Q2: Now suppose this system expands to a farm of N identical cpu nodes. When a given cpu completes a task, it can only receive one specific runnable task, i.e. the one with smallest task ID. (This is an artificial constraint that hints at priority queue. I feel in a realistic context some other constraints may exist. Hopefully those constraints can be modeled and supported in the same priority queue.)

Overall, this is a realistic problem, not blatantly contrived.

Solution — I use two priorityQueues to hold two mutually exclusive subsets of tasks. Any task not in these PQs are completed (dead) or blocked (unborn). The mutual exclusion is a hallmark of clean designs.

PQ1 “the eligibles” holds all runnable tasks. This queue receives a group of tasks when group’s parent task completes — a (completion) event. This PQ is sorted by task id.

The event time is predictable as soon as this parent task starts running on a cpu.

PQ2: “the running” contains up to N tasks that are in the driver seats, sorted by expected end times.

My main() function is nothing but a while loop. In each iteration, pop the lowest item from the “running” queue, to simulate the earliest task completion event. At the completion, add all its children to the eligible queue. Then pop the leading eligible (possibly a new entry) and append it to the “running” queue.

The main loop is therefore an event loop like Win32 event loop. Therefore, it’s driven by the “running” queue, not the eligibles queue.

Elegant solution based on two priorityQueues. This data structure is not really intuitive so I think such intuition can and should be learned. This is my first enlightenment on the usage of priority queue. PQ1 yields the most eligible task, based on id or other criteria; PQ2 yields the earliest finisher according to each task’s duration and start time.

  • “Running” tasks is a natural usage of priorityQueue;
  • “Eligibles” is based on a stylized scheduling policy for illustration, but I can imagine real scheduling policies to use criteria other than FIFO. Note FIFO queue is less versatile than priorityQueue for scheduling.
  • Kernel scheduler uses priority queue in a similar way

Exit condition — while-loop ends when the running queue becomes empty. The eligibles queue must have emptied out earlier. The last event time is the total cost.

Minor Note on Initial data parsing: When Task 1 is received, it may reference zero or more child tasks by id, but the child Task-55 details are only available when Task-55 description is seen later on. So it’s best to store a list of child ID’s rather than pointers. The ID’s can translate to Task pointers via a lookup table, either a map or array

Minor Note: we don’t enqueue tasks unrelated to the initial task (specified by user input). We only care about the initial task and its subtree.

count paths between 2 binTree nodes #PimcoQ9 Ashish,Pinsky

The DP idea — compare matrix-path-counter and EditDistance

  • showcasing efficient queue in python.
  • showcasing using (x,y) coordinates as dictionary key
  • showcasing find max value in a dictionary

— Requirement: ( is similar)
See Q9.pdf in the email to Ashish. Here are some key points:

Consider a maze mapped to a 2d-grid with an upper left corner at coordinates (row, column) = (0, 0). Any movement must be in increasing row or column direction. You must determine the number of distinct paths through the maze. You will always start at position (0, 0), the top left, and end up at (max(row), max(column)), the bottom right.
1 1 0 1
1 1 1 1
As an example, consider the above matrix where 1 indicates an open cell and 0 indicates blocked. You can only travel through open cells, so no path can go through the cell at (0, 2). There are two distinct paths to the goal. As a 2nd example, matrix below has 10 paths:
1 1 1 1
1 1 1 1
1 1 1 1 is my solution.

==== Analysis ====
I see a binary tree, where each node is a cell. Cell (0,0) has down/left node (1,0) + right node (0,1). I feel this is similar to a more generic problem “count paths between 2 nodes in a binary tree”

— DFT:
😦 I implemented something like a DFT but it was too slow with some test cases
😦 can overflow stack
🙂 DFT can print each unique path. I think BFT can’t.
🙂 DFT is easier to implement than BFT

— DynamicProgramming BFT solution from origin, as I described to Bill Pinsky:
This solution is more versatile than the one from Ashish. It can handle any directed graph.

Mark each node as “computed” once we have computed a score denoting how many paths-from-root there are. Save the score in a shadow matrix.

Origin node has score 1. Blocked cell has score 0.

Start BFT from origin. The two immediate neighbors are set to 1 (or 0 if blocked). Every node can be computed by adding up above-score + left-score. (This algo is simpler than the EditDistance algo.)

Memoization Performance note — My implementation was extremely slow (albeit correct) until I added an “if already computed, then continue” early in the loop

This Layered DP algo is also described in max-path-sum problems

Q: why is score[1,1] accessed 4 times?
A: node[1,1] is added to the queue twice. Each dequeue would need one check.
A: Also score[1,2] need to access score[1,1] as a parent. Ditto score[2,1]

— Ashish Singh gave me a much simpler solution, as shown in my github code.

NxN matrix: graph@N nodes #IV

Simon Ma of CVA team showed me this simple technique. is my first usage of it.

  • I only needed half of all matrix cells (excluding the diagonal cells) because relationships are bilateral.
  • Otherwise, if graph edges are directed, then we need all (N-1)(N-1) cells since A->B is not same as B->A.

My discrete math textbook shows this is a simplified form of representation and can’t handle self-link or parallel edge. The vertex-edge matrix is more robust.


binary-matrix island count #DeepakCM

Q: has my solution. I don’t want to spend the time passing all leetcode tests! Diminishing return is conceptually identical, though using a different criteria for “connected” — diagonal neighbors are “connected”

Hi Deepak

I realize for “connected” problems, there’s definitely a graph underneath, so graph traversal is required. I guess BFT or DST will both find all the nodes connected to “me”.

Given a 1000 x 1000 all-black matrix, I think DFT recursion will go into exactly 1,000,000 levels and overflow stack space.

A high-level (vague) idea is

  • Scan in type-writer fashion. Suppose there are 10×10 = 100 cells either black or write. I paint each cell brown [1] once visited. I also use a integer counter to keep track how many cells already visited.
  • During the scan. If a cell is already visited, I will ignore it and move on
  • Any time I find a black cell, I will start a BFT (I dislike DST) to find all connected black cells.  Once I find all the black cells connected to “me”, I resume the type-writer scan.
  • Once my counter hits 100, I have visited all cells and will exit.

[1] you paint it white, which is probably better.

I thought this problem was too hard and not worth study, but you convinced me that it may come up and I can at least give a vague solution  … better than no solution.

Compared to a binary tree, walking over a matrix is /operationally/ (not conceptually) harder because

  • need to check array bounds. Error-prone and time consuming in coding tests
  • The (invisible) links will not cover all cells. To avoid missing a cell, we still need a full matrix scan. The integration of tree walk + matrix scan is unintuitive, to put it mildly.
  • During the tree-walk, you don’t want to visit any cell too many times or get lost in an endless loop. A brute-force technique — shadow matrix to remember all visited cells.
  • … However, once you get over these nitty-gritty operational complexities, then tree-walk in matrix is not really harder. These algo questions can therefore frustrate and fail many candidates untrained on The technique.

bbg-Eq: fewest increments on odometer

There’s a special odometer with N (eg 5) digits. Each digit can increment (never decrement, to keep the problem simpler) independently and cyclically, like 8->9->0->1. You can think of them as 5 independent mini-odometers. The starting sequence is input (perhaps something like 07748). Ending sequence can be any other value.

Each increment on any mini-odometer has cost of $1. Find the cheapest way to go from start to end. One simple solution (from me) –increment each digit cyclically to the target value. So a “8” will increment to 9 -> 0 -> 1. Other mini-odometers are not affected by this mini-odometer.

However, 18021 is one of many illegal sequences — barriers.

Key insight — When N==2, this becomes a 2D matrix Tom-n-Jerry problem with barriers. We can simplify the problem to a 3×3 matrix i.e. 2-digit odometer from 00 to 22. Nine possible sequences.

We need to build a graph of nine nodes, which helps answer

  • Q: feasible? [2,2] would be unreachable if [1,2] and [2,1] are blocked.
  • Q: which path is shortest path

Once we build the graph, shortest path btw 2 graph nodes #binary matrix as illutration will solve it.

shortest path: linked-in distance #Flextrade

I haven’t searched online for this common question.

Q1: given a snapshot of the entire linked-in network and two personIDs A1 and B1, return the shortest distance between them. If they are directly connected, then return 1.
%%A: breadth-first search

Q2: what if you are given another pair of personID’s like A2 to B2, then A3 to B3 … Will your answer change?
%%A: I will pre-construct a jagged 2D int-array (more efficient than a matrix [1]). The outer array is indexed by personID. The element at position 2503 is for person #2503(say, Ed) and is an int array holding all of Ed’s first-degree friends’ personIDs.

This jagged array offers cache efficiency since I can load entire #2503 child-array in one memory access, even if Ed has 500 friends. Without this jagged array, I need to visit 500 pointers.

The jagged array alone can answer my question, so I don’t need the graph.

O() complexity is identical to graph traversal.

Q3: what if the snapshot is replaced by a live network? I feel we have to assume at the time of query we treat the network as momentarily frozen.
%%A: I would update my jagged array in real time

[1] jagged array can achieve 99% memory footprint saving. If the most connected person has 9999 friends then the matrix needs 9999 columns, but a typical person only has a few hundred friends.

generate paths in ternary matrix #Promethean TomJerry


Similar to binary-matrix cluster count #DeepakM #DST+fullScan but the connection is modeled differently.

–Original requirements: Input matrix has an object in each cell

  • 1 means wall or barrier
  • 0 means passage
  • 2 means cheese station

Jerry is [x][y]. Tom starts at [0][0] and must collect all cheese and deliver to Jerry. Find shortest path. Tom can only move one (U/D/L/R) step at a time. is my half-finished code. It builds a graph connecting all the cells (each a Node object). It’s able to check feasibility. Todo:

  • breadth-first walk from Tom to Jerry, without revisiting any node. Hopefully we can discover all possible paths (i.e. collecting all cheese)
  • determine the shortest.


tokens shared among friends #Promethean

See also NxN matrix for graph of N nodes

Requirement: There are N friends numbered from 1 to N. There are M pairs of links, where each (x , y ) pair is connected by a shared integer token described by tokenId. Any two friends, x and y , can be connected directly by multiple tokens, or indirectly (without directly shared token) because if friends x and y share token t and friends y and z also share token t , then x and z are also said to share token t.

Note if x/y shares t and u/v share t, then x and u may be unconnected!

Find the maximal product of x and y for any directly or indirectly connected (x , y ) pair such that x and y share the maximal number of tokens with each other. If x/y have 3 tokens connecting them, and u/v also have 3 tokens, then we compare x*y vs u*v. showcasing nested containers like map<int, list<set<int>>>. I believe STL is harder than java, python etc.

construct graph from list of connections #BGC java

Given an input file showing a list of {string, string} pairs, build a connection graph.

If you have a correct connection graph, then you can easily determine the connectedness (bool) of any 2 nodes. In a social-network, this bool flag indicates whether 2 individuals are completely unconnected or somehow connected.

I see this as a social-network. Any pair represents an edge connecting 2 nodes.  At any time there are a number of disconnected islands. The next pair could 1) merge 2 islands or 2) add a node to an existing island or 3) create a new island 4) do nothing, if the 2 nodes are already in some existing island

  • Any known node appears exactly once in the entire graph, in exactly one of the islands.
  • All nodes are contained in a lookup table or hashmap  {node -> island}
  • Each island can be a implemented as a hashset of nodes.

So here’s a proposed algo to process a new pair {A, B}. Look for A and B in the  graph. 3 scenarios + a dummy scenario:

  • (Scenario 3) If both A an B are new comers, then they form a new island.
  • if both A and B are already in the graph,
    • (Scenario 4) if they are in the same island, then exit. Nothing to do
    • (Scenario 1) else we can merge the 2 islands
  • (Scenario 2) If A is in island 3 but B is new comer, then B joins island 3

The merge operation is expensive. The big lookup table needs update but here’s an alternative:

  • At merge time, the smaller island would have all the nodes moved to the bigger island. When the island is empty, it gets a pointer “this.redirect” to the bigger island.
  • lookup table needs no update, avoiding locking a global object.
  • At query time, we look up the table to get the original island, then we follow its pointer (defaults to null) until the island is non-empty.
  • endless loop? would only be a programming error.

equals method of a class


You are spot on about linked list — If a class N has-a field of type N, then N is almost always, by definition, a node in a graph. That N field is probably a parent node. So allow me to put in some meaningful names — Each GraphNode has-a field named this.parent. Now the question becomes “how to override equals() in GraphNode and deal with the unbounded recursion”.

It’s an unusual technical requirement to make equals() to compare all ancestor nodes. However, It’s a reasonable business requirement to compare 2 GraphNodes by comparing all ancestors. Such a business requirement calls for a (static) utility method, NOT an instance method in A static utility method like compareAllAncestor(GraphNode, GraphNode) can be iterative and avoid recursion and stack overflow. Once this static method is in place, I might (grudgingly) create an instance method compare(GraphNode other) which simply returns compareAllAncestor(this, other), without unbounded recursion or stack overflow.

If 2 threads both perform this comparison, then I feel the method may need to lock the entire graph — expensive.

Even in a single-threaded environment, this comparison is expensive. (The recursive version would add an additional memory cost.) Potentially a performance issue. For most graph data structures in business applications, GraphNode should be Serializable and collections-friendly. Therefore hashCode() and equals() should be cheap.

For most graph data structures in business applications, each graph node usually represents a real world entity like a member in a MLM network. Now, if a graph node represents a real world entity, then it’s always, without exception, identifiable by an immutable and unique ID. Usually this ID is saved in database (could also be generated in application). Therefore, in most cases, equals() should compare ID only.