topological_sort@DAG: linear algo to assign ranks?

Q: topological sort — given a directed graph, any linear algo to assign ranks?

I prefer to use one shared rank for multiple nodes that can be simultaneously started/concretized/evaluated. This feature can increase flexibility and parallelism

terminology — avoid “dependency” — confusing. Prefer “upstream/downstream” or “ancestor/descendant”. Note ancestors and upstreams should be started/processed first.

rank table — We can use a hashtable (or pre-sized vector) to store the ranks: {rank -> list of nodes of that rank}. Assigning a node means adding the node id to the correct list, in O(1)

Assumption 1: the original graph node contains links to ancestors but no descendants. spreadsheet-model. I think this is Kahn’s assumption.

Assumption 2: the original graph nodes contain links to descendants but no ancestors. notification list or “call list”, or “listener list”. I think this model is used the DFS algo.

In most situations, One of these two assumptions would hold, but rarely both.

==== my modified version of Kahn’s algo

Scan-1 O(V+E) — build a hashtable-based two-way edgeSet representation of the graph. For each node, we maintain a hashset (or slist) of ancestors and a hashset of descendants. The duplication is needed, as described below in the Kahn context. (I think the DFS algo needs no duplication.)

Scan-2 O(V) — assign rank 0 to all top-level nodes (no precedent). Now we can use the rank table to scan rank-0 nodes

Scan-3 — Now scan the last assigned rank, rank-0 in this case. For each node in that list, check each downstream child. Unconditionally remove (O(1) thanks to hashset) the upstream link from inside the child. After that, If the child has empty hashset of ancestors it is assigned rank 1. I now believe the precedent/dependent link is never accessed again, so we can remove both.

Repeat the last scan at Rank 1, then Rank 2..

Every node is assigned only once. Every edge is checked only once or twice.

Can the ancestors hashset become an integer count?

— simplicity

Insight — In this design, I use multiple Simple passes and avoid doing too much in one pass. If needed, you can combine Scan-2 and Scan-1.

We treat the original nodes as readonly — nice simplification.

— terminology:
precedent/dependent is accurate but abstract.
“Dependency” is a confusing term. It means someone I depend on. Better avoid this word in graph problems.
uplink/downlink is visual only in a tree with root on top

— Kahn uses “incoming edge” to mean a link to an upstream/ancestor
“All nodes with no incoming edge” … implies a Node::ancestors field

When he visits downstream nodes from “current node”, he needs this->descendants field

This crucial detail is not explained in wikipedia

=== Kyle’s favorite DFS algo, as described on wikipedia, and in [[CLRS]]

Basic idea — check each remaining node and start a DFS. Whenever a leaf (downstream) node is found, Remove it from DAG, and prepend it to output list. Game over when last (most upstream) node is removed from DAG.

  • last node removed will be one of the most upstream nodes
  • first node removed will be one of the original leaf nodes
  • Output list has a meaning — a sorted list of items to “process”
  • Invariant — At any time, size of output list + size of graph == N
  • Implementation note: Actual removal of a node is tricky in either matrix representation or edge-list representation, so it’s easier to use a hashtable to hold “removedNodes”

The simple algo above will fail if cycles exist. To check cycles, we need a  small feature — “temporary mark”. I think this feature can detect cycles in any directed graph such as a tree.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s