Kruskal's Algorithm: MST Guide & Implementation

1. Introduction to Minimum Spanning Trees

Welcome to the world of algorithms! We begin our journey by exploring a fundamental problem in computer science and graph theory: finding the Minimum Spanning Tree (MST). Before diving into Kruskal's algorithm, let's establish some foundational definitions.

What is a Graph?

A graph is a non-linear data structure consisting of a finite set of vertices (or nodes) and a set of edges connecting pairs of these vertices. Graphs are used to model pairwise relations between objects.

  • Nodes/Vertices ($V$): The points in the graph.
  • Edges ($E$): The connections between nodes. Edges can be directed (one-way) or undirected (two-way).

What is a Tree?

In graph theory, a tree is a special type of undirected graph that has two key properties:

  • It is connected: There is a path between any two nodes.
  • It has no cycles: There is no way to start at a node, traverse a sequence of edges without repeating any edge, and return to the starting node.

For a graph with $N$ vertices to be a tree, it must have exactly $N-1$ edges.

What is a Spanning Tree?

Given an undirected, connected graph $G=(V, E)$, a spanning tree is a subgraph that is a tree and connects all the vertices of $G$. In essence, it "spans" (reaches) every vertex using the minimum possible number of edges while maintaining connectivity and avoiding cycles.

  • It must include all vertices ($V$) of the original graph.
  • It must be a tree (connected and acyclic).
  • It will have exactly $|V|-1$ edges.

What is a Minimum Spanning Tree (MST)?

When the edges of a graph have weights (costs or distances associated with them), a Spanning Tree can have an associated total weight (the sum of its edge weights). A Minimum Spanning Tree (MST) is a spanning tree whose total weight is less than or equal to the total weight of every other possible spanning tree of the same graph.

The goal of finding an MST is to connect all vertices in a graph with the lowest possible cumulative edge cost, without forming any cycles.

Real-world Applications of MSTs

Minimum Spanning Trees are not just theoretical constructs; they have numerous practical applications across various fields due to their efficiency in connecting points at minimal cost.

🌐 Network Design

Designing efficient communication networks (e.g., telephone, internet, cable TV, computer networks) where vertices are locations and edge weights are cable costs or distances. MST helps minimize the total cable length or cost to connect all locations.

🛣️ Road & Pipeline Networks

Planning the cheapest way to build roads, railway tracks, or oil pipelines to connect a set of cities or regions, minimizing construction costs.

⚡ Electrical Grids

Designing power transmission grids to supply electricity to multiple substations with minimal wiring cost.

🔬 Clustering Analysis

In data science, MSTs can be used for clustering data points. Points connected by "short" edges are likely in the same cluster.

🌳 Facility Location

Determining the optimal placement of facilities or services (e.g., fire stations, hospitals) to cover a region, minimizing travel distances or response times.

💧 Water Supply Systems

Laying out water pipes to connect homes or districts from a source, ensuring all areas receive water with the least amount of piping.

2. Foundational Concepts

Before we dissect Kruskal's algorithm, let's ensure we have a solid understanding of the underlying data structures and operations it relies upon.

Graph Representation

How we store a graph in memory significantly impacts algorithm efficiency. For Kruskal's algorithm specifically, the "edge list" representation is the most natural choice.

Adjacency List

An adjacency list represents a graph as an array of lists. The index of the array represents a vertex, and each element in its list contains the neighbors of that vertex. It's efficient for sparse graphs (few edges) and for traversing neighbors.


// Example for an undirected graph
// Vertex 0: [1, 2]
// Vertex 1: [0, 3]
// Vertex 2: [0, 3]
// Vertex 3: [1, 2]
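
As a minimal C++ sketch of the same idea (using the vertex numbering from the comments above; the helper name is illustrative):

#include <vector>

// Adjacency list for the 4-vertex example above: adj[v] holds the neighbors of v
std::vector<std::vector<int>> build_example_adjacency_list() {
    std::vector<std::vector<int>> adj(4);
    auto add_undirected_edge = [&](int u, int v) {
        adj[u].push_back(v);   // each undirected edge is stored in both directions
        adj[v].push_back(u);
    };
    add_undirected_edge(0, 1);
    add_undirected_edge(0, 2);
    add_undirected_edge(1, 3);
    add_undirected_edge(2, 3);
    return adj;                // adj = {{1,2}, {0,3}, {0,3}, {1,2}}
}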
            

Edge List (Primary for Kruskal's)

An edge list is simply a collection of all edges in the graph. Each edge is typically represented as a tuple or object containing its two connected vertices and, for weighted graphs, its weight. This representation is ideal when the algorithm needs to process all edges, especially when sorting them by weight.

Consider a graph with vertices A, B, C, D and edges:

Source  Destination  Weight
A       B            4
A       C            1
B       C            2
B       D            5
C       D            3
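
As a sketch, this edge list could be stored directly as a list of (u, v, weight) records; here A through D are mapped to indices 0 through 3 purely for illustration:

#include <vector>

struct Edge { int u, v, weight; };

// Edge list for the example above, with A=0, B=1, C=2, D=3
std::vector<Edge> example_edges = {
    {0, 1, 4},   // A-B
    {0, 2, 1},   // A-C
    {1, 2, 2},   // B-C
    {1, 3, 5},   // B-D
    {2, 3, 3},   // C-D
};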

Weighted Graphs

A weighted graph is a graph in which a numerical value (the "weight" or "cost") is assigned to each edge. These weights often represent distances, costs, capacities, or times. Kruskal's algorithm, by definition, operates on weighted graphs to find the minimum cost connections.

Cycles in Graphs

A cycle in an undirected graph is a path that starts and ends at the same vertex, where no edge is repeated. Detecting and preventing cycles is crucial for MST algorithms because a spanning tree, by definition, must be acyclic.

If adding an edge to a partially constructed spanning tree creates a cycle, that edge cannot be part of the final MST.

Disjoint Set Union (DSU) / Union-Find Data Structure

The Disjoint Set Union (DSU), also known as Union-Find, is a data structure that keeps track of a set of elements partitioned into a number of disjoint (non-overlapping) subsets. It performs two primary operations efficiently:

  • `find`: Determines which subset a particular element is in. It returns a "representative" (usually the root) of the set.
  • `union`: Merges two subsets into a single subset.

DSU is invaluable for Kruskal's algorithm as it helps determine if adding an edge will form a cycle. If two vertices of an edge are already in the same set (meaning they are already connected), adding that edge would create a cycle.

DSU Operations Explained:

1. `make_set(x)`

Creates a new set containing only the element $x$. Initially, each element is in its own distinct set. This typically involves setting $x$'s parent to itself.


function make_set(x):
    parent[x] = x
    rank[x] = 0 // or size[x] = 1
            

2. `find(x)` (with path compression)

Returns the representative (root) of the set containing $x$. Path compression is an optimization where every node on the path from $x$ to the root is directly attached to the root, flattening the tree structure for faster future lookups.


function find(x):
    if parent[x] == x:
        return x
    parent[x] = find(parent[x]) // Path compression
    return parent[x]
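
Deep recursion can overflow the stack on very large inputs; an equivalent iterative `find` with path compression, sketched here in C++ over the same `parent` array, avoids that:

#include <vector>

// Iterative find with path compression over the parent array described above
int find(std::vector<int>& parent, int x) {
    int root = x;
    while (parent[root] != root)       // first pass: locate the representative
        root = parent[root];
    while (parent[x] != root) {        // second pass: point every node on the path at the root
        int next = parent[x];
        parent[x] = root;
        x = next;
    }
    return root;
}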
            

3. `union(x, y)` (by rank/size)

Merges the sets containing elements $x$ and $y$. It first finds the roots of $x$ and $y$. If they are different, it merges the two sets by attaching the root of the "smaller" tree (based on rank or size) to the root of the "larger" tree. This optimization (union by rank or size) keeps the trees shallow, improving `find` operation efficiency.

The decision logic is: are `find(x)` and `find(y)` different?

  • Yes (distinct sets): compare the rank/size of the roots `rootX` and `rootY`.
      • If `rank[rootX]` < `rank[rootY]`: set `parent[rootX] = rootY`.
      • If `rank[rootX]` > `rank[rootY]`: set `parent[rootY] = rootX`.
      • If `rank[rootX]` == `rank[rootY]`: set `parent[rootY] = rootX` and increment `rank[rootX]`.
  • No (same set): do nothing. The elements are already connected, so in Kruskal's algorithm the corresponding edge would form a cycle.
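
Expressed in code, a C++-style sketch of this union-by-rank logic might look like the following (it operates on the `parent` and `rank` arrays set up by `make_set`, and assumes a `find(parent, x)` helper such as the iterative sketch above; the function name is illustrative):

#include <vector>

// Union by rank over the parent/rank arrays introduced above.
// Assumes find(parent, x) returns the representative of x's set.
void union_sets(std::vector<int>& parent, std::vector<int>& rank, int x, int y) {
    int rootX = find(parent, x);
    int rootY = find(parent, y);
    if (rootX == rootY) return;        // already in the same set: nothing to do

    if (rank[rootX] < rank[rootY]) {
        parent[rootX] = rootY;         // attach the shallower tree under the deeper one
    } else if (rank[rootX] > rank[rootY]) {
        parent[rootY] = rootX;
    } else {
        parent[rootY] = rootX;         // equal ranks: pick one root...
        rank[rootX]++;                 // ...and its rank grows by one
    }
}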

Sorting Algorithms for Edges

Kruskal's algorithm's efficiency hinges on processing edges in increasing order of their weights. Therefore, a fast sorting algorithm is a prerequisite. Standard comparison-based sorting algorithms such as Merge Sort, Heap Sort, or Quick Sort run in $O(E \log E)$ time (equivalently $O(E \log V)$, since $E \le V^2$) and are well suited to this task.

3. Kruskal's Algorithm: Core Principles

Kruskal's algorithm offers an elegant and efficient method for finding the Minimum Spanning Tree of a connected, undirected, and weighted graph. It stands out for its simplicity and effectiveness, especially for sparse graphs (graphs with relatively few edges compared to vertices).

Algorithm Objective

The primary objective of Kruskal's algorithm is to construct a Minimum Spanning Tree (MST) from a given graph $G=(V, E, W)$, where $V$ is the set of vertices, $E$ is the set of edges, and $W$ is the set of edge weights. The resulting MST must connect all vertices using $|V|-1$ edges such that the sum of the weights of these edges is minimized.

Greedy Approach

Kruskal's algorithm employs a greedy strategy. A greedy algorithm makes the locally optimal choice at each step with the hope that this choice will lead to a globally optimal solution. For Kruskal's, this means always picking the cheapest available edge that doesn't violate the MST properties.

The Greedy Choice Property: For Kruskal's, the greedy choice is always safe. Adding the cheapest edge available that doesn't form a cycle is guaranteed to be part of *some* MST. This property is key to why the greedy approach works here.

Edge Selection Criterion

The core of the greedy strategy lies in its criterion for selecting edges:

  • Sort by Weight: All edges in the graph are first sorted in non-decreasing (ascending) order of their weights.
  • Iterative Selection: The algorithm then iterates through this sorted list of edges, considering them one by one, from the cheapest to the most expensive.

Cycle Prevention Mechanism

The crucial part of maintaining the "tree" property (acyclic) while adding edges is the cycle prevention mechanism. This is where the Disjoint Set Union (DSU) data structure becomes indispensable.

For each candidate edge, the check works as follows:

  • Pick the next cheapest edge $(u, v)$.
  • Use DSU: compute `find(u)` and `find(v)`.
  • If `find(u)` and `find(v)` are the same (same set): skip the edge; adding it would create a cycle.
  • If they are different (different sets): add the edge to the MST and call `union(u, v)` to merge the two sets.
By using DSU, Kruskal's algorithm efficiently checks if the two endpoints of a potential edge already belong to the same connected component. If they do, adding that edge would form a cycle, so it's discarded. If they belong to different components, the edge is added to the MST, and the two components are merged into one using the `union` operation.

4. Step-by-Step Algorithm Breakdown

Let's break down Kruskal's algorithm into its precise sequence of operations. This systematic approach ensures that we build a Minimum Spanning Tree correctly and efficiently.

Initialization

  • Edge List: Create a list of all edges in the graph, where each edge is represented by its two connected vertices and its weight (e.g., $(u, v, \text{weight})$).
  • Disjoint Set Union (DSU) Structure: Initialize a DSU data structure. For each vertex $v$ in the graph, call `make_set(v)`. This means every vertex initially belongs to its own unique component.
  • Result Set: Initialize an empty list or set to store the edges that will form the MST. Let's call this `mst_edges`.

Kruskal's Algorithm Flow

Start: Graph $G=(V, E)$. Initialize the DSU (each vertex in its own set) and set `mst_edges = []`.

Step 1: Sort Edges

Sort all edges in $E$ in non-decreasing order by their weights.

Step 2: Iterate Through Sorted Edges

For each edge $(u, v, \text{weight})$ from the sorted list, perform Steps 3 and 4.

Step 3: Check for Cycles (using DSU)

Compute `find(u)` and `find(v)`.

  • ✅ If they are different (the endpoints lie in different components), proceed to Step 4.
  • ❌ If they are the same (the endpoints are already in one component), skip this edge; adding it would form a cycle in the partial MST.

Step 4: Add Edge to MST

Add $(u, v, \text{weight})$ to `mst_edges` and call `union(u, v)` to merge the components of $u$ and $v$.

Step 5: Check for Termination

Has `mst_edges` collected $|V|-1$ edges? If yes, stop: `mst_edges` contains the MST. If no, continue with the next edge in the sorted list.

Termination Condition

The algorithm terminates when either:

  • The `mst_edges` list contains exactly $|V|-1$ edges (where $|V|$ is the number of vertices in the graph). At this point, a spanning tree with the minimum possible weight has been formed.
  • All edges from the sorted list have been processed. If the graph is disconnected, `mst_edges` will contain fewer than $|V|-1$ edges, meaning a single spanning tree of the entire graph cannot be formed; instead, Kruskal's produces a Minimum Spanning Forest, with an MST for each connected component. For a connected graph, $|V|-1$ edges will always be found. (A small check for the disconnected case is sketched below.)
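
As a minimal sketch of that post-loop check (the `Edge` type and `mst_edges` name are illustrative):

#include <vector>

struct Edge { int u, v, weight; };

// True if mst_edges spans all V vertices; otherwise the graph is disconnected and
// mst_edges is a minimum spanning forest with V - mst_edges.size() components.
bool spans_all_vertices(const std::vector<Edge>& mst_edges, int V) {
    return static_cast<int>(mst_edges.size()) == V - 1;
}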

5. Algorithmic Walkthrough

Let's solidify our understanding of Kruskal's algorithm with a concrete example. We'll trace its execution step-by-step, observing how the DSU data structure helps build the MST.

Example Graph Setup

Consider the following undirected, weighted graph with 5 vertices (A, B, C, D, E) and 7 edges:

(A, B, 4), (A, C, 1), (B, C, 2), (B, D, 5), (C, D, 3), (C, E, 8), (D, E, 6)

We have $|V|=5$ vertices, so our MST will require $|V|-1 = 4$ edges.

Edge List Creation

First, we extract all edges from the graph along with their associated weights:

Edge (u, v)  Weight
(A, B)       4
(A, C)       1
(B, C)       2
(B, D)       5
(C, D)       3
(C, E)       8
(D, E)       6

Sorted Edge List

Next, we sort these edges in non-decreasing order of their weights:

Order  Edge (u, v)  Weight
1      (A, C)       1
2      (B, C)       2
3      (C, D)       3
4      (A, B)       4
5      (B, D)       5
6      (D, E)       6
7      (C, E)       8

Trace: DSU States and MST Construction

We initialize our DSU. Each vertex is its own parent and has a rank of 0 (for simplicity, we'll just track parents). Initially:

  • `parent = {A:A, B:B, C:C, D:D, E:E}`
  • `mst_edges = []`
  • `total_weight = 0`
  • `edges_in_mst = 0`

Step | Edge (u, v), Weight | find(u) | find(v) | Cycle? | Action | DSU State (parent array) | MST Edges | Total Weight
Init | - | - | - | - | - | {A:A, B:B, C:C, D:D, E:E} | {} | 0
1 | (A, C), 1 | A | C | No | Add (A, C), union(A, C) | {A:A, B:B, C:A, D:D, E:E} | {(A,C)} | 1
2 | (B, C), 2 | B | A (find(C) points to A) | No | Add (B, C), union(B, A) | {A:A, B:A, C:A, D:D, E:E} | {(A,C), (B,C)} | 1+2=3
3 | (C, D), 3 | A (find(C) points to A) | D | No | Add (C, D), union(A, D) | {A:A, B:A, C:A, D:A, E:E} | {(A,C), (B,C), (C,D)} | 3+3=6
4 | (A, B), 4 | A | A (find(B) points to A) | Yes | Skip (forms cycle A-C-B-A) | {A:A, B:A, C:A, D:A, E:E} | {(A,C), (B,C), (C,D)} | 6
5 | (B, D), 5 | A (find(B) points to A) | A (find(D) points to A) | Yes | Skip (forms cycle B-C-D-B) | {A:A, B:A, C:A, D:A, E:E} | {(A,C), (B,C), (C,D)} | 6
6 | (D, E), 6 | A (find(D) points to A) | E | No | Add (D, E), union(A, E) | {A:A, B:A, C:A, D:A, E:A} | {(A,C), (B,C), (C,D), (D,E)} | 6+6=12

TERMINATE! 4 edges found ($|V|-1$).

Final MST Identification

The algorithm terminates after adding 4 edges, which is $|V|-1$. The identified Minimum Spanning Tree consists of the following edges:


(A, C) with weight 1
(B, C) with weight 2
(C, D) with weight 3
(D, E) with weight 6
    

The total minimum weight of the spanning tree is $1 + 2 + 3 + 6 = 12$.

At this point, all vertices (A, B, C, D, E) are connected, and there are no cycles. The graph now forms a single connected component, represented by a single root in our DSU structure.

Practice & Application

🎯 Practice: Find the MST of a Network Graph

You are given a network where cities (nodes) are to be connected by cables (edges). Each cable has an associated cost (weight). Your task is to find the minimum cost to connect all cities, effectively finding the Minimum Spanning Tree using Kruskal's algorithm.

Graph:

  • Vertices: 0, 1, 2, 3, 4, 5
  • Edges and their weights: (0,1,7), (0,2,8), (1,2,3), (1,3,6), (2,3,4), (2,4,3), (3,4,2), (3,5,5), (4,5,2)

Determine the edges that form the MST and its total weight.

Click for Solution

1. Initialization:

  • Vertices: 6 ($|V|=6$). We need $|V|-1 = 5$ edges for the MST.
  • DSU: Each vertex is initially in its own set.
    
    parent = [0, 1, 2, 3, 4, 5] // parent[i] stores the parent of node i
                            
  • `mst_edges = []`
  • `total_weight = 0`

2. Edge List Creation:

Edge (u,v)  Weight
(0,1)       7
(0,2)       8
(1,2)       3
(1,3)       6
(2,3)       4
(2,4)       3
(3,4)       2
(3,5)       5
(4,5)       2

3. Sorted Edge List:

Order  Edge (u,v)  Weight
1      (3,4)       2
2      (4,5)       2
3      (1,2)       3
4      (2,4)       3
5      (2,3)       4
6      (3,5)       5
7      (1,3)       6
8      (0,1)       7
9      (0,2)       8

4. Trace: DSU States and MST Construction:

(For readability, the parent states below are shown fully flattened toward a single representative; the exact intermediate states and choice of root in a union-by-rank implementation may differ slightly.)

Step | Edge (u, v), W | find(u) | find(v) | Cycle? | Action | DSU State (parent array) | MST Edges | Total W
Init | - | - | - | - | - | [0, 1, 2, 3, 4, 5] | {} | 0
1 | (3,4), 2 | 3 | 4 | No | Add (3,4), union(3,4) | [0, 1, 2, 3, 3, 5] | {(3,4)} | 2
2 | (4,5), 2 | 3 (find(4)) | 5 | No | Add (4,5), union(3,5) | [0, 1, 2, 3, 3, 3] | {(3,4), (4,5)} | 2+2=4
3 | (1,2), 3 | 1 | 2 | No | Add (1,2), union(1,2) | [0, 1, 1, 3, 3, 3] | {(3,4), (4,5), (1,2)} | 4+3=7
4 | (2,4), 3 | 1 (find(2)) | 3 (find(4)) | No | Add (2,4), union(1,3) | [0, 1, 1, 1, 1, 1] | {(3,4), (4,5), (1,2), (2,4)} | 7+3=10
5 | (2,3), 4 | 1 (find(2)) | 1 (find(3)) | Yes | Skip (2-4-3-2 cycle) | [0, 1, 1, 1, 1, 1] | {(3,4), (4,5), (1,2), (2,4)} | 10
6 | (3,5), 5 | 1 (find(3)) | 1 (find(5)) | Yes | Skip (3-4-5-3 cycle) | [0, 1, 1, 1, 1, 1] | {(3,4), (4,5), (1,2), (2,4)} | 10
7 | (1,3), 6 | 1 (find(1)) | 1 (find(3)) | Yes | Skip (1-2-4-3-1 cycle) | [0, 1, 1, 1, 1, 1] | {(3,4), (4,5), (1,2), (2,4)} | 10
8 | (0,1), 7 | 0 | 1 (find(1)) | No | Add (0,1), union(0,1) | [0, 0, 0, 0, 0, 0] | {(3,4), (4,5), (1,2), (2,4), (0,1)} | 10+7=17

TERMINATE! 5 edges found ($|V|-1$).

5. Final MST Identification:

The Minimum Spanning Tree consists of the following 5 edges:


(3, 4) with weight 2
(4, 5) with weight 2
(1, 2) with weight 3
(2, 4) with weight 3
(0, 1) with weight 7
                

The total minimum weight of the spanning tree is $2 + 2 + 3 + 3 + 7 = \mathbf{17}$.

6. Implementation Considerations (Pseudo-code / Concepts)

Translating Kruskal's algorithm from its theoretical steps into a working program requires careful consideration of the data structures and control flow. Here, we'll outline the common implementation choices using pseudo-code and conceptual explanations.

Data Structure for Edges

To store graph edges efficiently, especially for sorting, a custom structure or class is highly recommended. It should encapsulate the two connected vertices and the edge's weight.


// C++-style struct definition (in Java, a small class with public fields works similarly)
struct Edge {
    int u, v;     // The two endpoints of the edge
    int weight;   // Weight of the edge

    // Constructor (optional, for convenience)
    Edge(int src, int dest, int w) {
        u = src;
        v = dest;
        weight = w;
    }
};

// In Python, a tuple or a dictionary might suffice:
// (weight, u, v) - placing weight first helps with default sorting
// Or a custom class / dataclass
    

A list or vector of these `Edge` objects will form our primary edge list.

DSU (Disjoint Set Union) Data Structure

The DSU data structure is critical for efficient cycle detection. It's typically implemented using two arrays: one for `parent` and one for `rank` (or `size`).

  • `parent` array: `parent[i]` stores the parent of element `i`. If `parent[i] == i`, then `i` is the representative (root) of its set.
  • `rank` (or `size`) array: Used for the union-by-rank (or union-by-size) optimization to keep trees flat, preventing worst-case $O(N)$ `find` operations. `rank[i]` stores the height of the tree rooted at `i` (or `size[i]` stores the number of elements in the set rooted at `i`).

// DSU class structure (C++)
#include <vector>
using std::vector;

class DSU {
    vector<int> parent;
    vector<int> rank; // or size

public:
    DSU(int num_vertices) {
        parent.resize(num_vertices);
        rank.resize(num_vertices);
        for (int i = 0; i < num_vertices; ++i) {
            parent[i] = i; // Each element is initially its own parent
            rank[i] = 0;   // Initial rank/height is 0
        }
    }

    // Find operation with path compression
    int find(int i) {
        if (parent[i] == i)
            return i;
        return parent[i] = find(parent[i]); // Path compression
    }

    // Union operation by rank
    void unite(int i, int j) {
        int root_i = find(i);
        int root_j = find(j);

        if (root_i != root_j) {
            // Union by rank: attach the smaller-rank tree under the root of the higher-rank tree
            if (rank[root_i] < rank[root_j])
                parent[root_i] = root_j;
            else if (rank[root_i] > rank[root_j])
                parent[root_j] = root_i;
            else { // Ranks are equal: pick one root and increment its rank
                parent[root_j] = root_i;
                rank[root_i]++;
            }
        }
    }
};
    

Sorting Function for Edges

A crucial step is sorting all edges by their weights. Most programming languages provide efficient sorting functions (e.g., `std::sort` in C++, `Collections.sort` in Java, `list.sort()` in Python). You'll typically need to provide a custom comparator to tell the sorting function how to compare two `Edge` objects based on their `weight` property.


// Example using C++ std::sort with a lambda comparator
#include <algorithm>   // std::sort
#include <vector>
using std::vector;

vector<Edge> edges;
// ... populate edges ...

std::sort(edges.begin(), edges.end(), [](const Edge& a, const Edge& b) {
    return a.weight < b.weight; // Sort in ascending order of weight
});

// In Python:
// edges.sort(key=lambda edge: edge.weight)
    

Loop Structure for Algorithm

The main logic of Kruskal's algorithm involves iterating through the sorted edges. A simple `for` loop or `while` loop can achieve this. The loop continues until either all edges have been considered or the MST has collected $|V|-1$ edges.


// Assume N is the number of vertices and 'edges' is the sorted list of Edge objects
DSU dsu(N);
vector<Edge> mst_result;
int edges_count = 0;

for (const auto& edge : edges) { // Iterate through each sorted edge
    if (edges_count == N - 1) { // Optimization: Stop if MST is complete
        break;
    }

    // Check for cycles using DSU's find operation
    // Conditional logic is embedded here
    // ...
}
    

Conditional Logic for Adding Edges

Inside the loop, the core conditional logic determines whether an edge should be added to the MST. This uses the `find` operation of the DSU to check if the two vertices of the current edge are already connected (i.e., in the same set).


// Inside the loop, for each 'edge'
int root_u = dsu.find(edge.u);
int root_v = dsu.find(edge.v);

if (root_u != root_v) {
    // If roots are different, adding this edge will NOT form a cycle
    mst_result.push_back(edge); // Add edge to MST
    dsu.unite(edge.u, edge.v);  // Merge the two components
    edges_count++;              // Increment count of edges in MST
}
// Else (root_u == root_v), the edge forms a cycle, so we skip it.
    

Combining these considerations provides a robust framework for implementing Kruskal's algorithm effectively.
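
Putting the pieces above together, here is one possible end-to-end sketch in C++ (names such as `kruskal`, `rnk`, and `unite` are illustrative choices, not fixed by the text above):

#include <algorithm>
#include <utility>
#include <vector>

struct Edge { int u, v, weight; };

struct DSU {
    std::vector<int> parent, rnk;
    DSU(int n) : parent(n), rnk(n, 0) {
        for (int i = 0; i < n; ++i) parent[i] = i;   // each vertex starts in its own set
    }
    int find(int x) {
        if (parent[x] == x) return x;
        return parent[x] = find(parent[x]);          // path compression
    }
    void unite(int x, int y) {
        int rx = find(x), ry = find(y);
        if (rx == ry) return;                        // already connected
        if (rnk[rx] < rnk[ry]) std::swap(rx, ry);    // keep rx as the deeper root
        parent[ry] = rx;                             // attach the shallower tree under it
        if (rnk[rx] == rnk[ry]) rnk[rx]++;           // equal ranks: the merged tree grows
    }
};

// Returns the MST edges (a minimum spanning forest if the graph is disconnected).
std::vector<Edge> kruskal(int num_vertices, std::vector<Edge> edges) {
    std::sort(edges.begin(), edges.end(),
              [](const Edge& a, const Edge& b) { return a.weight < b.weight; });

    DSU dsu(num_vertices);
    std::vector<Edge> mst;
    for (const Edge& e : edges) {
        if ((int)mst.size() == num_vertices - 1) break;   // MST already complete
        if (dsu.find(e.u) != dsu.find(e.v)) {             // no cycle would be formed
            mst.push_back(e);
            dsu.unite(e.u, e.v);
        }
    }
    return mst;
}

Run on the walkthrough graph from Section 5 (with vertices A through E mapped to 0 through 4), this sketch returns four edges with a total weight of 12.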

Practice & Application

🎯 Practice: Simulating Kruskal's with DSU State

Consider a graph with 4 vertices (0, 1, 2, 3) and the following edges:

  • (0, 1, 5)
  • (0, 2, 1)
  • (1, 2, 3)
  • (1, 3, 2)
  • (2, 3, 4)

Walk through the first 3 iterations of Kruskal's algorithm. For each iteration, demonstrate:

  1. The current edge being considered (after sorting).
  2. The `find` operation results for both vertices of the edge.
  3. Whether the edge is added to the MST (and why/why not).
  4. The state of the DSU `parent` array after the operation (assume union by rank as described earlier; when the two roots have equal rank, the second root is attached under the first).
  5. The `mst_edges` list and `total_weight` after the step.
Click for Solution

1. Data Structures Initialization:

  • Number of vertices ($N=4$). We need $N-1 = 3$ edges for the MST.
  • DSU `parent` array: Initially `[0, 1, 2, 3]` (each node is its own parent).
  • `mst_edges = []`
  • `total_weight = 0`

2. Edge List & Sorting:

Original Edges: (0,1,5), (0,2,1), (1,2,3), (1,3,2), (2,3,4)

Sorted Edges:

Order  Edge (u,v)  Weight
1      (0,2)       1
2      (1,3)       2
3      (1,2)       3
4      (2,3)       4
5      (0,1)       5

3. Step-by-Step Simulation (First 3 Iterations):

Iteration 1: Edge (0,2), Weight 1

  • `find(0)`: Returns 0 (root of 0's set).
  • `find(2)`: Returns 2 (root of 2's set).
  • Roots (0 and 2) are different.
  • Action: Add (0,2) to `mst_edges`. Call `unite(0,2)`.
  • DSU `parent` state: `parent[2]` becomes `0`.
    
    // DSU parent array after unite(0,2):
    // parent = [0, 1, 0, 3]
    // Interpretation: 0 is parent of 2. 1 and 3 are self-parents.
                        
  • `mst_edges`: `{(0,2)}`
  • `total_weight`: $1$
  • `edges_in_mst`: $1$

Iteration 2: Edge (1,3), Weight 2

  • `find(1)`: Returns 1 (root of 1's set).
  • `find(3)`: Returns 3 (root of 3's set).
  • Roots (1 and 3) are different.
  • Action: Add (1,3) to `mst_edges`. Call `unite(1,3)`.
  • DSU `parent` state: `parent[3]` becomes `1`.
    
    // DSU parent array after unite(1,3):
    // parent = [0, 1, 0, 1]
    // Interpretation: 0 is parent of 2. 1 is parent of 3.
                        
  • `mst_edges`: `{(0,2), (1,3)}`
  • `total_weight`: $1 + 2 = 3$
  • `edges_in_mst`: $2$

Iteration 3: Edge (1,2), Weight 3

  • `find(1)`: Returns 1 (root of 1's set).
  • `find(2)`: Returns 0 (root of 2's set, because `parent[2]` is 0).
  • Roots (1 and 0) are different.
  • Action: Add (1,2) to `mst_edges`. Call `unite(1,0)`.
  • DSU `parent` state: `parent[0]` becomes `1` (with union by rank, the roots 1 and 0 have equal rank here, so root 0 is attached under root 1 and `rank[1]` is incremented).
    
    // DSU parent array after unite(1,0):
    // parent = [1, 1, 0, 1]
    // (Actual state might vary slightly based on union-by-rank/size exact implementation)
    // Interpretation: 0's parent is now 1. 2's parent is 0, so find(2) eventually goes to 1.
    // All nodes (0,1,2,3) are now connected to the root '1'.
                        
  • `mst_edges`: `{(0,2), (1,3), (1,2)}`
  • `total_weight`: $3 + 3 = 6$
  • `edges_in_mst`: $3$

At this point, `edges_in_mst` is 3, which equals $|V|-1$. The algorithm would terminate, and the MST would be `{(0,2), (1,3), (1,2)}` with a total weight of 6.

7. Complexity Analysis

Understanding the time and space complexity of an algorithm is crucial for evaluating its efficiency and suitability for different problem scales. Let's analyze Kruskal's algorithm.

We'll use standard graph notation:

  • $V$: Number of vertices (nodes) in the graph.
  • $E$: Number of edges in the graph.

Time Complexity

Kruskal's algorithm's execution time is primarily dominated by two major phases: sorting the edges and performing Disjoint Set Union operations.

Edge Sorting Complexity

The very first step of Kruskal's algorithm involves sorting all $E$ edges based on their weights. A standard comparison-based sorting algorithm (like Merge Sort or Heap Sort) will take $O(E \log E)$ time.

In some contexts, particularly for dense graphs where $E$ can be up to $O(V^2)$, this might also be expressed as $O(E \log V)$ because $E \le V^2$, so $\log E \le \log V^2 = 2 \log V$. However, $O(E \log E)$ is the more general and commonly cited complexity for this step, as it holds for both sparse and dense graphs.

DSU Operations Complexity

After sorting, the algorithm iterates through all $E$ edges. For each edge, it performs two `find` operations and potentially one `union` operation.

With the optimized DSU implementation (path compression and union by rank/size), the amortized time complexity for `find` and `union` operations is nearly constant. Specifically, it's $O(\alpha(V))$, where $\alpha(V)$ is the inverse Ackermann function. The inverse Ackermann function grows extremely slowly; for all practical purposes and graph sizes, $\alpha(V)$ is less than 5.

Therefore, performing $E$ DSU operations takes roughly $O(E \cdot \alpha(V))$ time.

The inverse Ackermann function ($\alpha(V)$) is so slow-growing that it is often considered a constant factor in complexity analysis. For any realistic number of vertices $V$, $\alpha(V)$ will never exceed 4 or 5.

Overall Time Complexity

Combining the complexities of sorting and DSU operations:


Total Time = Complexity(Sorting Edges) + Complexity(DSU Operations)
Total Time = O(E log E) + O(E * α(V))
    

Since $\alpha(V)$ is practically a constant, the term $O(E \cdot \alpha(V))$ is typically much smaller than $O(E \log E)$ for most graphs. Thus, the overall time complexity of Kruskal's algorithm is dominated by the sorting step.

Overall Time Complexity: $O(E \log E)$

Space Complexity

Space complexity refers to the amount of memory an algorithm needs to run. For Kruskal's algorithm, we consider the storage required for the graph's edges, the DSU structure, and the resulting MST.

Edge Storage

We need to store all $E$ edges of the input graph, typically in a list or array of custom `Edge` objects. This requires $O(E)$ space.

DSU Storage

The DSU data structure uses two arrays (parent and rank/size), each of size $V$ (for the number of vertices). This requires $O(V)$ space.

MST Storage

The resulting Minimum Spanning Tree will contain exactly $V-1$ edges (for a connected graph). Storing these edges also requires $O(V)$ space.

Overall Space Complexity

Summing up the space requirements:


Total Space = Complexity(Edge Storage) + Complexity(DSU Storage) + Complexity(MST Storage)
Total Space = O(E) + O(V) + O(V)
    

Overall Space Complexity: $O(V + E)$

This makes Kruskal's algorithm quite efficient in terms of memory usage, as it only needs to store the graph's edges and a DSU structure proportional to the number of vertices.

8. Practical Application / Lab Exercise

Now it's your turn to apply Kruskal's algorithm! This lab exercise will guide you through implementing the algorithm and testing it with various graph scenarios.

Problem Statement: Finding MST in a Given Graph

Implement Kruskal's algorithm to find the Minimum Spanning Tree (MST) of a given undirected, weighted graph. Your implementation should output the edges that constitute the MST and the total minimum weight of the spanning tree. If the graph is disconnected, your algorithm should correctly identify this and output the edges for each component's MST (forming a Minimum Spanning Forest) or indicate that a single spanning tree is not possible for the entire graph.

Input Format for Graph Data

The input will be provided in a structured format. Assume the graph vertices are labeled from $0$ to $N-1$.


// First line: Number of vertices (N), Number of edges (M)
N M

// Following M lines: Each line represents an edge
// u v weight
// where 'u' and 'v' are connected vertices, and 'weight' is the edge weight.
// Example:
5 7
0 1 4
0 2 1
1 2 2
1 3 5
2 3 3
2 4 8
3 4 6
    

Using the example above:

Line     Description
5 7      5 vertices (0,1,2,3,4), 7 edges
0 1 4    Edge between vertex 0 and 1 with weight 4
0 2 1    Edge between vertex 0 and 2 with weight 1
...      ...

Expected Output Format for MST Edges and Total Weight

Your program should output the edges of the MST, each on a new line, followed by the total minimum weight. The edges should be listed such that the smaller vertex index comes first (e.g., `0 1` instead of `1 0`).


// Each MST edge (sorted by weight, then by first vertex, then second)
u v weight

// Last line: Total weight
Total MST Weight: [sum_of_weights]
    

For the example input graph (from the walkthrough section or above):

Output Line           Description
0 2 1                 MST edge (A, C)
1 2 2                 MST edge (B, C)
2 3 3                 MST edge (C, D)
3 4 6                 MST edge (D, E)
Total MST Weight: 12  Sum of all MST edge weights
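
A minimal I/O skeleton for this format might look like the following sketch (it assumes a `kruskal` function such as the one outlined in Section 6; all names are illustrative):

#include <algorithm>
#include <iostream>
#include <utility>
#include <vector>

struct Edge { int u, v, weight; };

std::vector<Edge> kruskal(int num_vertices, std::vector<Edge> edges); // defined as in the Section 6 sketch

int main() {
    int n, m;
    std::cin >> n >> m;                          // N vertices, M edges
    std::vector<Edge> edges(m);
    for (Edge& e : edges) {
        std::cin >> e.u >> e.v >> e.weight;
        if (e.u > e.v) std::swap(e.u, e.v);      // smaller vertex index first, as required
    }

    std::vector<Edge> mst = kruskal(n, edges);

    // Order the output: by weight, then by first vertex, then by second vertex
    std::sort(mst.begin(), mst.end(), [](const Edge& a, const Edge& b) {
        if (a.weight != b.weight) return a.weight < b.weight;
        if (a.u != b.u) return a.u < b.u;
        return a.v < b.v;
    });

    long long total = 0;
    for (const Edge& e : mst) {
        std::cout << e.u << " " << e.v << " " << e.weight << "\n";
        total += e.weight;
    }
    std::cout << "Total MST Weight: " << total << "\n";
    return 0;
}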

Test Cases Considerations

When implementing and testing your algorithm, consider the following edge cases and scenarios:

Connected Graph

A standard connected graph where an MST can be easily found. This is your primary test case.

Disconnected Graph

A graph where not all vertices can be reached from each other. Kruskal's will find a Minimum Spanning Forest (MST for each component). Your output should reflect this (e.g., fewer than $|V|-1$ edges in total, or a clear message).

For a disconnected graph, your algorithm should detect that it cannot form a single MST connecting all vertices. The number of edges in the resulting "forest" will be $|V| - \text{number of connected components}$.

Single Node Graph

A graph with only one vertex and no edges. The MST should be empty, and the total weight should be 0.

Graph with Parallel Edges

Multiple edges between the same pair of vertices (but with different weights). Kruskal's will naturally pick the cheapest one if it contributes to the MST.

Graph with Self-Loops

Edges connecting a vertex to itself. These edges should typically be ignored as they cannot be part of a simple spanning tree (and don't connect distinct components).

Graph with Negative Weights

Kruskal's algorithm works correctly with negative edge weights, as it's still about finding the minimum sum of weights. The "minimum" might just be a negative sum.
