Build Order (CCI 4.7) ✅¶

You are given a list of projects and a list of dependencies (which is a list of pairs of projects, where the second project is dependent on the first project). All of a project's dependencies must be built before the project is. Find a build order that will allow the projects to be built. If there is no valid build order, return an error.

EXAMPLE

Input:
projects: a, b, c, d, e, f
dependencies: (a, d), (f, b), (b, d), (f, a), (d, c)
Output: f, e, a, b, d, c

:::tip Hints: #26,#47, #60, #85, #725, #133

Build a directed graph representing the dependencies. Each node is a project and an edge exists from A to B if B depends on A (A must be built before B). You can also build it the other way if it's easier for you. :::

Solution¶

Visualizing the information as a graph probably works best. Be careful with the direction of the arrows. In the graph below, an arrow from d to g means that d must be compiled before g. You can also draw them in the opposite direction, but you need to consistent and clear about what you mean. Let's draw a fresh

Example.

alt

In drawing this example (which is not the example from the problem description), I looked for a few things.

I wanted the nodes labeled somewhat randomly. If I had instead put a at the top, with b and c as children, then d and e, it could be misleading.The alphabetical order would match the compile order.
I wanted a graph with multiple parts/components, since a connected graph is a bit of a special case.
I wanted a graph where a node links to a node that cannot immediately follow it. For example, f links to a but a cannot immediately follow it (since b and c must come before a and after f).
I wanted a larger graph since I need to figure out the pattern.
I wanted nodes with multiple dependencies.

Now that we have a good example, let's get started with an algorithm.

Solution l TODO¶

Where do we start? Are there any nodes that we can definitely compile immediately?

Yes. Nodes with no incoming edges can be built immediately since they don't depend on anything. Let's add all such nodes to the build order. In the earlier example, this means we have an order of f, d (or d, f).

Once we've done that, it's irrelevant that some nodes are dependent on d and f since d and f have already been built. We can reflect this new state by removing d and f's outgoing edges.

build order: f, d

alt

Next, we know that c, b, and g are free to build since they have no incoming edges. Let's build those and then remove their outgoing edges.

build order: f, d, c, b, g

alt

Project a can be built next, so let's do that and remove its outgoing edges. This leaves just e. We build that next, giving us a complete build order.

build order: f, d, c, b, g, a, e

Did this algorithm work, or did we just get lucky? Let's think about the logic.

We first added the nodes with no incoming edges. If the set of projects can be built, there must be some "first"project, and that project can't have any dependencies. If a project has no dependencies (incoming edges), then we certainly can't break anything by building it first.
We removed all outgoing edges from these roots.This is reasonable. Once those root projects were built, it doesn't matter if another project depends on them.
After that, we found the nodes that now have no incoming edges. Using the same logic from steps 1 and 2, it's okay if we build these. Now we just repeat the same steps: find the nodes with no dependencies, add them to the build order, remove their outgoing edges, and repeat.
What if there are nodes remaining, but all have dependencies (incoming edges)? This means there's no way to build the system. We should return an error.

The implementation follows this approach very closely.¶

a. Initialization and setup:

Build a graph where each project is a node and its outgoing edges represent the projects that depend on it. That is, if A has an edge to B (A-> B), it means B has a dependency on A and therefore A must be built before B. Each node also tracks the number of incoming edges.
Initialize a buildOrder array. Once we determine a project's build order, we add it to the array. We also continue to iterate through the array, using a toBeProcessed pointer to point to the next node to be fully processed.
Find all the nodes with zero incoming edges and add those to a buildOrder array. Set a toBeProcessed pointer to the beginning of the array.

b. Repeat until toBeProcessed is at the end of the buildOrder:

Read node at toBeProcessed. » Ifnode is null, then all remaining nodes have a dependency and we have detected a cycle.
For each child of node: » Decrement child. dependencies (the number of incoming edges). » If child. dependencies iszero, add child to end of buildOrder.
Increment toBeProcessed.

Solution 2¶

Alternatively, we can use depth-first search (DFS) to find the build path.

alt

Suppose we picked an arbitrary node (say b) and performed a depth-first search on it. When we get to the end of a path and can't go any further (which will happen at h and e), we know that those terminating nodes can be the last projects to be built. No projects depend on them.

DFS(b) // Step 1
  DFS(h) // Step 2
    build order ... , h // Step 3
  DFS(a) // Step 4
    DFS(e) // Step 5
      build order ... , e, h // Step 6
      ... // Step 7+
    ...

Now, consider what happens at node a when we return from the DFS of e. We know a's children need to appear after a in the build order. So, once we return from searching a's children (and therefore they have been added), we can choose to add a to the front of the build order.

Once we return from a, and complete the DFS of b's other children, then everything that must appear after b is in the list. Add b to the front.

DFS(b) // Step 1
  DFS(h) // Step 2
    build order = ..., h // Step 3
  DFS(a) // Step 4
    DFS(e) // Step 5
      build order = ... , e, h // Step 6
    build order = ... , a, e, h // Step 7
  DFS(e) -> return // Step 8
  build order = ..., b, a, e, h // Step 9

Let's mark these nodes as having been built too, just in case someone else needs to build them.

alt

Now what? We can start with any old node again, doing a DFS on it and then adding the node to the front of the build queue when the DFS is completed.

DFS(d)
  DFS(g)
    build order = ..., g, b, a, e, h
  build order = ..., d, g, b, a, e, h
DFS(f)
  DFS(c)
    build order = ..., c, d, g, b, a, e, h
  build order= f, c, d, g, b, a, e, h

In an algorithm like this, we should think about the issue of cycles. There is no possible build order if there is a cycle. But still, we don't want to get stuck in an infinite loop just because there's no possible solution.

A cycle will happen if, while doing a DFS on a node, we run back into the same path.What we need therefore is a signal that indicates"I'm still processing this node, so if you see the node again, we have a problem'

What we can do for this is to mark each node as a"partial"(or"is visiting") state just before we start the DFS on it. If we see any node whose state is partial, then we know we have a problem. When we're done with this node's DFS, we need to update the state.

We also need a state to indicate"I've already processed/built this node"so we don't re-build the node. Our state therefore can have three options: COMPLETED, PARTIAL, and BLANK.

C# Solution¶

using System.Collections.Generic;

namespace Algorithm.Medium
{

  public class Project2
  {
    public enum State { COMPLETE, PARTIAL, BLANK };
    public State state = State.BLANK;

    public void SetState(State s) { state = s; }
    public List<Project2> children = new List<Project2>();
    public Dictionary<string, Project2> projectMap = new Dictionary<string, Project2>();
    public string name;

    public Project2(string n)
    {
      name = n;
    }

    public void AddNeighbor(Project2 node)
    {
      if (!projectMap.ContainsKey(node.name))
      {
        children.Add(node);
        projectMap.Add(node.name, node);
      }
    }

  }

  public class Graph
  {
    public List<Project2> nodes = new List<Project2>();
    private Dictionary<string, Project2> map = new Dictionary<string, Project2>();

    public Project2 GetOrCreateNode(string name)
    {
      if (!map.ContainsKey(name))
      {
        var node = new Project2(name);
        nodes.Add(node);
        map.Add(name, node);
      }

      return map[name];
    }

    public void AddEdge(string startName, string endName)
    {
      var start = GetOrCreateNode(startName);
      var end = GetOrCreateNode(endName);
      start.AddNeighbor(end);
    }
  }


  public class BuildOrder2
  {
    public Stack<string> FindBuildOrder(string[] projects, string[][] dependencies)
    {
      Graph graph = BuildGraph(projects, dependencies);
      return OrderProjects(graph.nodes);
    }
    private Stack<string> OrderProjects(List<Project2> projects)
    {
      var stack = new Stack<string>();
      foreach (var project in projects)
      {
        if (project.state == Project2.State.BLANK)
        {
          if (!DoDFS(project, stack))
          {
            return null;
          }
        }
      }
      return stack;
    }

    private bool DoDFS(Project2 project, Stack<string> stack)
    {
      if (project.state == Project2.State.PARTIAL)
      {
        return false;
      }

      if (project.state == Project2.State.BLANK)
      {
        project.SetState(Project2.State.PARTIAL);
        var children = project.children;

        foreach (var child in children)
        {
          if (!DoDFS(child, stack))
          {
            return false;
          }
        }

        project.SetState(Project2.State.COMPLETE);
        stack.Push(project.name);
      }

      return true;
    }

    private Graph BuildGraph(string[] projects, string[][] dependencies)
    {
      var graph = new Graph();
      foreach (var project in projects)
      {
        graph.GetOrCreateNode(project);
      }

      foreach (var dependency in dependencies)
      {
        var first = dependency[0];
        var second = dependency[1];
        graph.AddEdge(first, second);
      }

      return graph;
    }
  }
}

C# Tests¶

using System.Collections.Generic;
using Algorithm.Medium;
using Xunit;

namespace AlgorithmTests.Medium
{
  public class BuildOrder2Test
  {
    [Fact]
    public void TestName()
    {
      var builder = new BuildOrder2();
      var stack = new Stack<string>();
      stack.Push("c");
      stack.Push("d");
      stack.Push("a");
      stack.Push("b");
      stack.Push("e");
      stack.Push("f");
      // var stack = new Stack<string> { "f", "e", "a", "b", "d", "c" };
      var dep = new string[][] { new string[] { "a", "d" }, new string[] { "f", "b" }, new string[] { "b", "d" }, new string[] { "f", "a" }, new string[] { "d", "c" } };
      Assert.Equal(stack, builder.FindBuildOrder(new string[] { "a", "b", "c", "d", "e", "f" }, dep));
    }
  }
}