longest descending path through matrix #60%

https://leetcode.com/problems/longest-increasing-path-in-a-matrix/

Q: given an int-matrix that allows 4-way moves,  find longest path with strictly descending nodes. Lateral move disallowed.

====analysis

I have code to generate all paths… generate loopfree paths: graph node A→B

— Solution 1: O(N logN)
O(N) First pass  to construct “waterflow” graph — visit N nodes. For each, check the four neighbors. Add an outgoing edge (in a Node.edgeList) to a neighbor node if it is strictly lower.

Now put all nodes into a min-heap, according to node height. Can be part of first pass.

Second pass we visit each item in the min-heap. For each node, compute longest descending path starting therefrom. Record this path length as Node.score. Note if a node is surrounded by higher or equal nodes, then it has score 0, as in the lowest node.

A higher node AA is always upstream to some “computed” nodes. (We don’t care about any incoming edges into AA). So pick the max of (up to four) lower neighbors’ scores. Add 1 to get AA’s score. This computation is O(1) but the heap pop() makes second pass O(N logN)

Note the lowest node may not be part of the longest path, if it is surrounded by the highest nodes.

Pacific/Atlantic #51%

https://leetcode.com/problems/pacific-atlantic-water-flow/

Q: Given an m*n matrix of non-negative integers representing the height of each unit cell in a continent, the “Pacific ocean” touches the left and top edges of the matrix and the “Atlantic ocean” touches the right and bottom edges. Water can only flow in four directions (up, down, left, or right) from a cell to another one with height equal or lower.

Print all cells where water can flow to both the Pacific and Atlantic ocean.

====Analysis
peek? not yet
— solution 1:
Create m * n nodes in a 2D grid. Two more nodes:
All north|west boundary nodes have an edge From Pacific node
All south|east boundary nodes have an edge From Atlantic node

Between two adj cells, there’s 1 or 2 directed edges, from low to high. Intuition — tracing pollutant upstream. An arrow from P to Q means Q is suspect.

How do I represent the graph? Each node can hold a list of “suspect neighbors” (higher or equal).

Run BFT or DFT from Pacific node and turn on flag in each reachable node.

Run the same from Atlantic. Every reachable node with the flag set is a double-suspect to be printed.

Accounts merge #disjoint set

https://leetcode.com/problems/accounts-merge/

Q: .. too long

====Analysis
Union-find + BFT

— solution 1:
Data structure:

  • Acct object {name, vector of Email objects, isVisited flag }
  • Email object {addr, set of Accts }
  • hash table {address -> Email objects}

First scan to Create one acct object for each row. When populating the vector, check if the address exists in the hash table. If yes, then save the existing email object in the vector. Also the Email object’s set is updated with the new Acct object.

Second scan to print everything. Iterate over the accounts. Need to merge all the emails belong to John —

Scan the vector. If any Email  object has more than one Acct (named John), visit each Acct following BFT, until all John accounts are visited. Collect all of these accounts’ emails in one tree-set, then print them.

 

clone connected graph given any one node #!!need4stdSol

There are a few ways to phrase the same question.

  1. For a connected graph of colored nodes (say, only a few colors or black/white), you are given one node only. Please clone entire graph.
  2. Given one node in a connected graph, visit every node and recreate entire graph in another computer.
  3. Given one node in a directed graph, serialized it, transmit to another computer and deserialize.

===== analysis

I feel this is one of the most practical algo problems. I feel my solution is fairly efficient for binary tree too.

Not too hard conceptually, if your mental picture is clear. I would need to assign node IDs (based on addresses [1]) to differentiate the black nodes. These IDs are serialized. So are the edge lists [1].

— For serialization, Basic idea is a BFT (or DFT) to visit every node. Each time a node address is encountered again (probably among the edge list of some node), we would simply skip it.

Within each node, the this->edges field is “translated” and serialized as a list or treeSet of node_IDs. The serialized file looks like

  • ID=1, edges=3,4,5,6
  • ID=2, edges=3,5,7
  • ID=3, edges=5,7,9
  • ID=4, edges=6,8

I think serialization is O(V+E). I think same O() for deserialization.

— For deserialization, we can simply construct all node objects in an array, using ID as index.  The edge list can stay as is. Optionally, we translate each edge from an ID into an address in the array.

[1] these are basic algoQQ pointers

target-graphNode is usually given by node address

It’s easy to get mislead when a question says a target node to search for is given by payload value or by key.

I always assume payload can be black/white, and key can be repeating! Therefore, the most generic way to identify a node is by address.

However, in the rare case where the graph nodes are movable in the given problem, then we must use node keys. What if the graph has no key? Then it’s not a search graph!

(de)serialize slist: each node points to a random node#Rahul

Q: a slist has one more link field in each node. It’s the address of a random node in the same slist. How do you serialize this linked list?

https://leetcode.com/problems/copy-list-with-random-pointer/ is a similar question where input is head node.

====analysis
I think the generic solution works but there is hopefully a more intuitive one

— solution 1:
first traversal to build a hashtable {node address -> auto-increment id}

2nd scan visits each node to save the random field along with the host node id

If goal is deep-clone, then 2nd scan can build clone list without the random link. Then another pass to create the links.

DFT|BFT: “seenNodes”

Q: in graph DFT and BFT, do we need a “seen” hashtable to remember visited nodes?

A: I probably needed it in some of my coding problem
A: I think so. The textbook DFT/BFT algos use white/grey/black for this purpose. It’s basically a

hashtable { node addr -> color } esp. if nodes are readonly.

This color technique is not needed for a simple binary tree, but it can become useful in tough CIV such as cyclic graphs.

topological_sort@DAG: linear algo to assign ranks?

Q: topological sort — given a directed graph, any linear algo to assign ranks?

I prefer to use one shared rank for multiple nodes that can be simultaneously started/concretized/evaluated. This feature can increase flexibility and parallelism

terminology — avoid “dependency” — confusing. Prefer “upstream/downstream” or “ancestor/descendant”. Note ancestors and upstreams should be started/processed first.

rank table — We can use a hashtable (or pre-sized vector) to store the ranks: {rank -> list of nodes of that rank}. Assigning a node means adding the node id to the correct list, in O(1)

Assumption 1: the original graph node contains links to ancestors but no descendants. spreadsheet-model. I think this is Kahn’s assumption.

Assumption 2: the original graph nodes contain links to descendants but no ancestors. notification list or “call list”, or “listener list”. I think this model is used the DFS algo.

In most situations, One of these two assumptions would hold, but rarely both.

==== my modified version of Kahn’s algo

Scan-1 O(V+E) — build a hashtable-based two-way edgeSet representation of the graph. For each node, we maintain a hashset (or slist) of ancestors and a hashset of descendants. The duplication is needed, as described below in the Kahn context. (I think the DFS algo needs no duplication.)

Scan-2 O(V) — assign rank 0 to all top-level nodes (no precedent). Now we can use the rank table to scan rank-0 nodes

Scan-3 — Now scan the last assigned rank, rank-0 in this case. For each node in that list, check each downstream child. Unconditionally remove (O(1) thanks to hashset) the upstream link from inside the child. After that, If the child has empty hashset of ancestors it is assigned rank 1. I now believe the precedent/dependent link is never accessed again, so we can remove both.

Repeat the last scan at Rank 1, then Rank 2..

Every node is assigned only once. Every edge is checked only once or twice.

Can the ancestors hashset become an integer count?

— simplicity

Insight — In this design, I use multiple Simple passes and avoid doing too much in one pass. If needed, you can combine Scan-2 and Scan-1.

We treat the original nodes as readonly — nice simplification.

— terminology:
precedent/dependent is accurate but abstract.
“Dependency” is a confusing term. It means someone I depend on. Better avoid this word in graph problems.
uplink/downlink is visual only in a tree with root on top

— Kahn uses “incoming edge” to mean a link to an upstream/ancestor
“All nodes with no incoming edge” … implies a Node::ancestors field

When he visits downstream nodes from “current node”, he needs this->descendants field

This crucial detail is not explained in wikipedia

=== Kyle’s favorite DFS algo, as described on wikipedia, and in [[CLRS]]

Basic idea — check each remaining node and start a DFS. Whenever a leaf (downstream) node is found, Remove it from DAG, and prepend it to output list. Game over when last (most upstream) node is removed from DAG.

  • last node removed will be one of the most upstream nodes
  • first node removed will be one of the original leaf nodes
  • Output list has a meaning — a sorted list of items to “process”
  • Invariant — At any time, size of output list + size of graph == N
  • Implementation note: Actual removal of a node is tricky in either matrix representation or edge-list representation, so it’s easier to use a hashtable to hold “removedNodes”

The simple algo above will fail if cycles exist. To check cycles, we need a  small feature — “temporary mark”. I think this feature can detect cycles in any directed graph such as a tree.

BFT phrasebook

BFT is one of the top 10 heavily used constructs in coding tests.

recursion-free — BFS feels like the most intuitive algo, recursion-free.

two hits — each node is accessed twice — appended to the FIFO and then popped from the FIFO. We need to be careful what to do at first vs 2nd visits. You can get confused if not unaware this subtlety.

graph — BFT is defined on graph, not only trees

tree — BFT produces a tree … the BFT tree.

mailer — Example: send a mailer to all linkedin contacts:

# send to each direct contact
# send to each second-degree contacts …

 

represent graph: matrix^edgeList #std::forward_list

Josh’s book [[data structures and other objects using c++]] has 5 pages devoted to three best-practice data structures representing a generic directed graph. (I think my [[discrete math] also covers the same topic.)

Note all these data structures can serialize graphs.

  1. boolean adjacency matrix — adj mat, unable to represent parallel edges between nodes A and B (rare) or self-to-self
  2. edge set — unable to represent parallel edges but can support complex graph algos having risk of traversing the same edge repeatedly. The set can be a field of a graph node, or a hash table {node-> edgeSet}
  3. edge list — can be implemented using std::forward_list, marginally faster better than hashset… not discussed further in this blogpost

pros and cons:

  • iterating all edges connected to a node — edge set wins. adjMat very poor.
  • sparse graph — adj mat wastes memory, not scalable.
  • test linkage between 2 nodes — adj mat and edgeSet are O(1)
  • ease of implementation — adj mat wins

Fundamental classification:

  • In a so-called dense graph, number of edges ~ O(VV) i.e. between any pair of nodes there’s likely an edge
  • In a so-called sparse graph, number of edges ~ O(V).

[[algo in a nutshell]] written by some professors, compares matrix and edge list. For a dense graph, P140 describes an intriguing edge list practical implementation to save space and speed up isConnected(node1, node2). Key insight —

   “Number all individual nodes using auto-incrementing ID numbers.”

After this, we will see that a node’s edge list may look like 2,3,4,5, 8,9 …which can be summarized as 2-4, 8-9. I felt this was a trivial trick but is actually a standard implementation. There’s a tradeoff though — inserting edge is now slower, but I feel O() could remain unchanged.

 

SCB-FM algo Q2a 2 slists

Q: Given two well-formed singly-linked lists (loop-free), that might be merged at some node, locate the merge point. If not merged, return null.

Note every node has exactly one child node, but may have two parent nodes.

====analysis====

Suppose list A has size P=55 and list B has size Q=32

–SolH: hashtable O(N+M) — optimal in time but not space

First populate hashtable with the short list’s 32 nodes. Then iterate the longer list and check each node’s address.

–SolR: reverse one list … how?

–SolA: array-based.  construct a simple 55-array of A node pointers, and another 32-array of B node pointers. Then compare the two final elements in the two arrays. If same then binary search in B. O(N+M) + O(log M)

–Sol2: 2-pointer algo O(1) space, usable on read-only lists 🙂

first scan to find the 2 end nodes and remember the sizes like 55 vs 32.

IFF the end nodes match, then 2nd scan:

skip first P-Q (23) nodes in the longer list and then increment 2 iterators in lock steps. Compare the 2 iterators at each step.

merge 2 sorted slists #2try

Q: Merge two sorted linked lists and return it as a new (ascending) list. The new list should be made by splicing together the nodes of the first two lists.

====analysis
A test of implementation skill. A test of clear communication and clear thinking. My solution below is O(1) space and O(N) time without allocating any heap memory.

I would pick the list having the lowest value as the Yellow jersey (leading). The other list is the green. At any time there exist two head nodes for two slists.

I always keep a brown pointer to the last of the leading pack in the yellow’s list. The leading pack are the “retired” nodes.

Swap – In the event of a ‘swap’, the brown’s current child (current yellow jersey list head) gets detached and becomes the new green jersey, and the old green jersey moves into the leading pack. Then we increment the ptr — this old green jersey’s child becomes the yellow jersey.

Every time after we increment the pointer, we compare its child node vs the green jersey.

I won’t bother with std::list::splice() and will use python or my own c++ linked list.

generate loopfree paths: graph node A→B|anyPair

Q1: given 2 nodes in a graph containing N (eg 121) nodes, potentially with cycles, generate all simple paths between the pair. A simple path has no cycle. (In other words, for a simple path, length + 1 ==  # unique nodes)

I think there are classic math algorithms for it, because this is part of basic graph theory. Here are some applications of this type of algorithms —

  • Q1b (special case of Q1): given 2 nodes in a C-by-R rectangular grid, where every node is connected to (up to) four neighbors, generate all cycle-free paths between the pair.
    • Below, I solved this problem in python
  • Q2 (simple application of Q1 algo): generate all simple paths between ALL node pair in a graph. The shortest simple path has length=0. Longest simple path can potentially visit every node exactly once.
  • A: first generate all 121-Choose-2 node pairs. For each pair, solve Q1. Lastly generate the 121 trivial paths of length=0.
    • repetitive 😦
  • Q2b (special case of Q2): In a C-by-R rectangular grid, where every node is connected to (up to) four neighbors, generate all cycle-free paths between ALL pairs. I believe this simple leetcode problem#79 does it.
    • Now I think the algo is much simpler than I imagined. Should really code it by hand.
  • Q2c (easy one based on Q2): given a binary tree containing no cycles, generate all paths.

— A1b: my DFT implementation (probably not 100% correct) , where each “trail” either fails or becomes a path.

  1. from NodeA start a breadcrumb/trail. We can’t revisit any node already visited on current breadcrumb,
    1. if this is a matrix, then instead of a hashtable, we can also use a shadow matrix, but the breadcrumb is much smaller than a shadow matrix
  2. if we reach a node surrounded by nodes on the same breadcrumb, then the trail fails at a dead-end
  3. else we will reach NodeB 🙂 Print the breadcrumb

By construction, we won’t see duplicate paths 🙂

https://github.com/tiger40490/repo1/blob/py1/py/algo_grid/classic_count4waySimplePaths.py is the implementation

–BFT? I don’t think BFT can print each unique path

shortest path:2nodes]binary matrix #BFT

Q: given 2 cells in a binary matrix (1=black, 0=white=blocked), check the pair are connected and if yes return the shortest path. There exists a path of length 1 between any 2 cells IFF both are side by side or stack atop.

count paths between 2 bTree nodes #PimcoQ9 Ashish is arguably harder than this problem, but this problem allows moving in four directions.

binary-matrix island count #DeepakM technique is more applicable. A BFT path should work.

  • every reachable node is painted Green (like 2)
  • we give up after our queue is empty

https://github.com/tiger40490/repo1/blob/py1/py/grid/classic_connectedPair.py is the implementation, briefly tested.

find target word in 2D grid #20%

Q (Leetcode 79) Target word can be constructed from letters of sequentially adjacent cell on the matrix, where “adjacent” cells are those horizontally or vertically neighboring. The same cell cannot be used more than once. Luckily question only requires yes/no answer. (Q212 is harder, but best to leave it aside.)
====analysis
It’s critical to recognize early on that a real test case matrix probably consists of just 2 letters abaabaababbabba. Otherwise your analysis would be simplistic and a waste of precious time in interview. Note interviewer will NOT give you this input early on.

So far, I have no insight into the structure of word-search-in-graph problems in general. Wikipedia suggests pre-processing the needle. In the end, all Leetcode posted solutions are brute-force.

See my definition of ECO in ## same-complexity optimizations

aha — This problem is not restricted to 2D grid.. can be generalized to any graph.

— idea 2: we could convert the grid to a trie.

  1. O(N) scan needle once. For each two-letter sequence, save it into a hashmap — HM1.
  2. Now build trie. As we build each link, we save the two-letter sequence into a frequency table — F2.
    • [ECO] tree-pruning — if ‘AB’ never shows up in the needle (use HM1), then don’t construct the sub-trie from A to B
    • At the end of trie-building, F2 contains only valid sequences.
  3. [ECO] now query F2 for the least frequent sequence, something like XY. Unfortunately, In the worst data, F2 has only AB and BA 😦
  4. Use this XY to split the needle.

— idea 1: locate the first letter i.e. head node. Once found, start DFT. If failed, mark the first letter as “dead head” and look for the next head. In the end, if all the “head nodes” are dead, then give up. Note a dead head can be part of a match, just not the head.

[ECO] build a frq table from the grid. Sort it by lower frequency. If the least frequent char is in the needle, then great… Eg — “q” shows up in needle ‘oqaque”. I will split needle at first (or 2nd) “Q” and get “oQ” and “Qaque”, then look for “Qaque” and also the reversed needle “Qo”

— tested solution: brute force DFS
Technique — I use a global isDone flag to ensure program exits ASAP. Without isDone, the program actually does the same thing, but less visibly. Am less confident.

public class Solution {
 static int isDone;
 public boolean check(char[][] board, String word) {
    isDone=0;
    for(int i = 0; i < board.length; i++)
        for(int j = 0; j < board[0].length; j++){
            if (exist(board, i, j, word, 0)) {
              assert(isDone==1);
              return true;  
            };
            //if (isDone>0) return true;
        }
    return false;
 }
 private boolean exist(char[][] board, int i, int j, String word, int ind){
    if(ind == word.length()) {
        isDone++; //game over as soon as One match found.
        return true;
    }
    if(i > board.length-1 || i <0 || j<0 || j >board[0].length-1 || board[i][j]!=word.charAt(ind))
        return false;
    board[i][j]='*';
    boolean result =    exist(board, i-1, j, word, ind+1) ||
                        exist(board, i, j-1, word, ind+1) ||
                        exist(board, i, j+1, word, ind+1) ||
                        exist(board, i+1, j, word, ind+1);
    board[i][j] = word.charAt(ind);
    return result;
 }
}

max-sum path problemS #high fan-out/fan-in

Note “path” implies directed (not bidirectional) edges i.e. dag

  • variation (rare): bi-directional edges
  • variation: single origin, single destination
  • variation: single origin, any destination among 55 leaf nodes
  • variation: each node has a +/- value
  • variation (rare): each edge has a +/- value. Edges usually outnumber nodes, as illustrated in the balloon-burst problem

–Common Special case: pyramid (see blogpost on recombinant binTree). The balloon problem has high fan-out.

I would try DP. For each node AA, determine (and store) the max cum-sum up to AA. Need to start from the layer 1 nodes (1 hop from origin), then use Layer 1 data to update layer 2 nodes.

Markov-like — For any node AA, the max cum-sum to AA only depends on the previous layer. No need to even care about a “fullpath” to AA. This property leads to dramatic *simplification*. Efficiency is a by-product of this simplification.

–Common Special case: non-recombinant binary tree.

i would use DFT + Kadane. Each origin-to-leaf path is independently handled. No layering.

Note Kadane algo handles +/- node values.

elegant use@priorityQ #indeed

Q: A given system receives a bunch of tasks at beginning of day.
task {
string id #assigned by this system as auto-incrementing string value
int execTime
list<task> children # all children becomes runnable as soon as this task completes. This is a classic edgeList
}
implement int getTotalTime(task) #including all descendant tasks

Q1: during system initialization we need to calculate how long each task (and its subtree) will execute. I solved this “warm-up” problem using a top-down tree-walk with memoization.

Q2: Now suppose this system expands to a farm of N identical cpu nodes. When a given cpu completes a task, it can only receive one specific runnable task, i.e. the one with smallest task ID. (This is an artificial constraint that hints at priority queue. I feel in a realistic context some other constraints may exist. Hopefully those constraints can be modeled and supported in the same priority queue.)

Overall, this is a realistic problem, not blatantly contrived.

Solution — I use two priorityQueues to hold two mutually exclusive subsets of tasks. Any task not in these PQs are completed (dead) or blocked (unborn). The mutual exclusion is a hallmark of clean designs.

PQ1 “the eligibles” holds all runnable tasks. This queue receives a group of tasks when group’s parent task completes — a (completion) event. This PQ is sorted by task id.

The event time is predictable as soon as this parent task starts running on a cpu.

PQ2: “the running” contains up to N tasks that are in the driver seats, sorted by expected end times.

My main() function is nothing but a while loop. In each iteration, pop the lowest item from the “running” queue, to simulate the earliest task completion event. At the completion, add all its children to the eligible queue. Then pop the leading eligible (possibly a new entry) and append it to the “running” queue.

The main loop is therefore an event loop like Win32 event loop. Therefore, it’s driven by the “running” queue, not the eligibles queue.

Elegant solution based on two priorityQueues. This data structure is not really intuitive so I think such intuition can and should be learned. This is my first enlightenment on the usage of priority queue. PQ1 yields the most eligible task, based on id or other criteria; PQ2 yields the earliest finisher according to each task’s duration and start time.

  • “Running” tasks is a natural usage of priorityQueue;
  • “Eligibles” is based on a stylized scheduling policy for illustration, but I can imagine real scheduling policies to use criteria other than FIFO. Note FIFO queue is less versatile than priorityQueue for scheduling.
  • Kernel scheduler uses priority queue in a similar way

Exit condition — while-loop ends when the running queue becomes empty. The eligibles queue must have emptied out earlier. The last event time is the total cost.

Minor Note on Initial data parsing: When Task 1 is received, it may reference zero or more child tasks by id, but the child Task-55 details are only available when Task-55 description is seen later on. So it’s best to store a list of child ID’s rather than pointers. The ID’s can translate to Task pointers via a lookup table, either a map or array

Minor Note: we don’t enqueue tasks unrelated to the initial task (specified by user input). We only care about the initial task and its subtree.

count paths between 2 binTree nodes #PimcoQ9 Ashish,Pinsky

The DP idea — compare matrix-path-counter and EditDistance

  • showcasing efficient queue in python.
  • showcasing using (x,y) coordinates as dictionary key
  • showcasing find max value in a dictionary

— Requirement: (https://leetcode.com/problems/unique-paths-ii/description/ is similar)
See Q9.pdf in the email to Ashish. Here are some key points:

Consider a maze mapped to a 2d-grid with an upper left corner at coordinates (row, column) = (0, 0). Any movement must be in increasing row or column direction. You must determine the number of distinct paths through the maze. You will always start at position (0, 0), the top left, and end up at (max(row), max(column)), the bottom right.
1 1 0 1
1 1 1 1
As an example, consider the above matrix where 1 indicates an open cell and 0 indicates blocked. You can only travel through open cells, so no path can go through the cell at (0, 2). There are two distinct paths to the goal. As a 2nd example, matrix below has 10 paths:
1 1 1 1
1 1 1 1
1 1 1 1

https://github.com/tiger40490/repo1/blob/py1/py/2d/classic_pathCountQ9.py is my solution.

==== Analysis ====
I see a binary tree, where each node is a cell. Cell (0,0) has down/left node (1,0) + right node (0,1). I feel this is similar to a more generic problem “count paths between 2 nodes in a binary tree”

— DFT:
😦 I implemented something like a DFT but it was too slow with some test cases
😦 can overflow stack
🙂 DFT can print each unique path. I think BFT can’t.
🙂 DFT is easier to implement than BFT

— DynamicProgramming BFT solution from origin, as I described to Bill Pinsky:
This solution is more versatile than the one from Ashish. It can handle any directed graph.

Mark each node as “computed” once we have computed a score denoting how many paths-from-root there are. Save the score in a shadow matrix.

Origin node has score 1. Blocked cell has score 0.

Start BFT from origin. The two immediate neighbors are set to 1 (or 0 if blocked). Every node can be computed by adding up above-score + left-score. (This algo is simpler than the EditDistance algo.)

Memoization Performance note — My implementation was extremely slow (albeit correct) until I added an “if already computed, then continue” early in the loop

This Layered DP algo is also described in max-path-sum problems

Q: why is score[1,1] accessed 4 times?
A: node[1,1] is added to the queue twice. Each dequeue would need one check.
A: Also score[1,2] need to access score[1,1] as a parent. Ditto score[2,1]

— Ashish Singh gave me a much simpler solution, as shown in my github code.

NxN matrix: graph@N nodes #IV

Simon Ma of CVA team showed me this simple technique.

https://github.com/tiger40490/repo1/blob/cpp1/cpp1/miscIVQ/tokenLinked_Friend.cpp is my first usage of it.

  • I only needed half of all matrix cells (excluding the diagonal cells) because relationships are bilateral.
  • Otherwise, if graph edges are directed, then we need all (N-1)(N-1) cells since A->B is not same as B->A.

My discrete math textbook shows this is a simplified form of representation and can’t handle self-link or parallel edge. The vertex-edge matrix is more robust but space-inefficient.

binary-matrix island count #DeepakCM

Q: https://leetcode.com/problems/number-of-islands/description/

https://github.com/tiger40490/repo1/blob/py1/py/grid/ has my solution. I don’t want to spend the time passing all leetcode tests! Diminishing return

https://www.geeksforgeeks.org/find-number-of-islands/ is conceptually identical, though using a different criteria for “connected” — diagonal neighbors are “connected”

Hi Deepak

I realize for “connected” problems, there’s definitely a graph underneath, so graph traversal is required. I guess BFT or DST will both find all the nodes connected to “me”.

Given a 1000 x 1000 all-black matrix, I think DFT recursion will go into exactly 1,000,000 levels and overflow stack space.

A high-level (vague) idea is

  • Scan in type-writer fashion. Suppose there are 10×10 = 100 cells either black or write. I paint each cell brown [1] once visited. I also use a integer counter to keep track how many cells already visited.
  • During the scan. If a cell is already visited, I will ignore it and move on
  • Any time I find a black cell, I will start a BFT (I dislike DST) to find all connected black cells.  Once I find all the black cells connected to “me”, I resume the type-writer scan.
  • Once my counter hits 100, I have visited all cells and will exit.

[1] you paint it white, which is probably better.

I thought this problem was too hard and not worth study, but you convinced me that it may come up and I can at least give a vague solution  … better than no solution.

Compared to a binary tree, walking over a matrix is /operationally/ (not conceptually) harder because

  • need to check array bounds. Error-prone and time consuming in coding tests
  • The (invisible) links will not cover all cells. To avoid missing a cell, we still need a full matrix scan. The integration of tree walk + matrix scan is unintuitive, to put it mildly.
  • During the tree-walk, you don’t want to visit any cell too many times or get lost in an endless loop. A brute-force technique — shadow matrix to remember all visited cells.
  • … However, once you get over these nitty-gritty operational complexities, then tree-walk in matrix is not really harder. These algo questions can therefore frustrate and fail many candidates untrained on The technique.

bbg-Eq: fewest increments on odometer

There’s a special odometer with N (eg 5) digits. Each digit can increment (never decrement, to keep the problem simpler) independently and cyclically, like 8->9->0->1. You can think of them as 5 independent mini-odometers. The starting sequence is input (perhaps something like 07748). Ending sequence can be any other value.

Each increment on any mini-odometer has cost of $1. Find the cheapest way to go from start to end. One simple solution (from me) –increment each digit cyclically to the target value. So a “8” will increment to 9 -> 0 -> 1. Other mini-odometers are not affected by this mini-odometer.

However, 18021 is one of many illegal sequences — barriers.

Key insight — When N==2, this becomes a 2D matrix Tom-n-Jerry problem with barriers. We can simplify the problem to a 3×3 matrix i.e. 2-digit odometer from 00 to 22. Nine possible sequences.

We need to build a graph of nine nodes, which helps answer

  • Q: feasible? [2,2] would be unreachable if [1,2] and [2,1] are blocked.
  • Q: which path is shortest path

Once we build the graph, shortest path btw 2 graph nodes #binary matrix as illutration will solve it.

shortest path: linked-in distance #Flextrade

I haven’t searched online for this common question.

Q1: given a snapshot of the entire linked-in network and two personIDs A1 and B1, return the shortest distance between them. If they are directly connected, then return 1.
%%A: breadth-first search

Q2: what if you are given another pair of personID’s like A2 to B2, then A3 to B3 … Will your answer change?
%%A: I will pre-construct a jagged 2D int-array (more efficient than a matrix [1]). The outer array is indexed by personID. The element at position 2503 is for person #2503(say, Ed) and is an int array holding all of Ed’s first-degree friends’ personIDs.

This jagged array offers cache efficiency since I can load entire #2503 child-array in one memory access, even if Ed has 500 friends. Without this jagged array, I need to visit 500 pointers.

The jagged array alone can answer my question, so I don’t need the graph.

O() complexity is identical to graph traversal.

Q3: what if the snapshot is replaced by a live network? I feel we have to assume at the time of query we treat the network as momentarily frozen.
%%A: I would update my jagged array in real time

[1] jagged array can achieve 99% memory footprint saving. If the most connected person has 9999 friends then the matrix needs 9999 columns, but a typical person only has a few hundred friends.

generate paths in ternary matrix #Promethean TomJerry

–Requirement:

Similar to binary-matrix cluster count #DeepakM #DST+fullScan but the connection is modeled differently.

–Original requirements: Input matrix has an object in each cell

  • 1 means wall or barrier
  • 0 means passage
  • 2 means cheese station

Jerry is [x][y]. Tom starts at [0][0] and must collect all cheese and deliver to Jerry. Find shortest path. Tom can only move one (U/D/L/R) step at a time.

https://github.com/tiger40490/repo1/blob/cpp1/cpp1/2d/TomJerry.cpp is my half-finished code. It builds a graph connecting all the cells (each a Node object). It’s able to check feasibility. Todo:

  • breadth-first walk from Tom to Jerry, without revisiting any node. Hopefully we can discover all possible paths (i.e. collecting all cheese)
  • determine the shortest.

 

tokens shared among friends #Promethean

See also NxN matrix for graph of N nodes

Requirement: There are N friends numbered from 1 to N. There are M pairs of links, where each (x , y ) pair is connected by a shared integer token described by tokenId. Any two friends, x and y , can be connected directly by multiple tokens, or indirectly (without directly shared token) because if friends x and y share token t and friends y and z also share token t , then x and z are also said to share token t.

Note if x/y shares t and u/v share t, then x and u may be unconnected!

Find the maximal product of x and y for any directly or indirectly connected (x , y ) pair such that x and y share the maximal number of tokens with each other. If x/y have 3 tokens connecting them, and u/v also have 3 tokens, then we compare x*y vs u*v.

https://github.com/tiger40490/repo1/blob/cpp1/cpp1/miscIVQ/tokenLinked_Friend.cpp showcasing nested containers like map<int, list<set<int>>>. I believe STL is harder than java, python etc.

construct graph from list of connections #BGC java

Given an input file showing a list of {string, string} pairs, build a connection graph.

If you have a correct connection graph, then you can easily determine the connectedness (bool) of any 2 nodes. In a social-network, this bool flag indicates whether 2 individuals are completely unconnected or somehow connected.

—-analysis:
I see this as a social-network. Any pair represents an edge connecting 2 nodes.  At any time there are a number of disconnected islands. The next pair could 1) merge 2 islands or 2) add a node to an existing island or 3) create a new island 4) do nothing, if the 2 nodes are already in some existing island

  • Any known node appears exactly once in the entire graph, in exactly one of the islands.
  • All nodes are contained in a lookup table or hashmap  {node -> island}
  • Each island can be a implemented as a hashset of nodes.

So here’s a proposed algo to process a new pair {A, B}. Look for A and B in the  graph. 3 scenarios + a dummy scenario:

  • (Scenario 3) If both A an B are new comers, then they form a new island.
  • if both A and B are already in the graph,
    • (Scenario 4) if they are in the same island, then exit. Nothing to do
    • (Scenario 1) else we can merge the 2 islands
  • (Scenario 2) If A is in island 3 but B is new comer, then B joins island 3

The merge operation is expensive. The big lookup table needs update but here’s an alternative:

  • At merge time, the smaller island would have all the nodes moved to the bigger island. When the island is empty, it gets a pointer “this.redirect” to the bigger island.
  • lookup table needs no update, avoiding locking a global object.
  • At query time, we look up the table to get the original island, then we follow its pointer (defaults to null) until the island is non-empty.
  • endless loop? would only be a programming error.

equals method of a GraphNode.java class

XR,

You are spot on about linked list — If a class N has-a field of type N, then N is almost always, by definition, a node in a graph. That N field is probably a parent node. So allow me to put in some meaningful names — Each GraphNode has-a field named this.parent. Now the question becomes “how to override equals() in GraphNode and deal with the unbounded recursion”.

It’s an unusual technical requirement to make equals() to compare all ancestor nodes. However, It’s a reasonable business requirement to compare 2 GraphNodes by comparing all ancestors. Such a business requirement calls for a (static) utility method, NOT an instance method in GraphNode.java. A static utility method like compareAllAncestor(GraphNode, GraphNode) can be iterative and avoid recursion and stack overflow. Once this static method is in place, I might (grudgingly) create an instance method compare(GraphNode other) which simply returns compareAllAncestor(this, other), without unbounded recursion or stack overflow.

If 2 threads both perform this comparison, then I feel the method may need to lock the entire graph — expensive.

Even in a single-threaded environment, this comparison is expensive. (The recursive version would add an additional memory cost.) Potentially a performance issue. For most graph data structures in business applications, GraphNode should be Serializable and collections-friendly. Therefore hashCode() and equals() should be cheap.

For most graph data structures in business applications, each graph node usually represents a real world entity like a member in a MLM network. Now, if a graph node represents a real world entity, then it’s always, without exception, identifiable by an immutable and unique ID. Usually this ID is saved in database (could also be generated in application). Therefore, in most cases, equals() should compare ID only.