SQLServerCentral Article

How To Find A Perfect Match

,

Introduction

This article presents a script that finds a perfect match in any bipartite graph or shows why one cannot exist.

Overview

Recall that a bipartite graph is any undirected graph [1] where its nodes (or vertices) can be partitioned into two sets, and where edges only exist between those two sets. For this article, those sets are equal in size.

For example, consider the following bipartite graph:

Diagram Description automatically generated with medium confidence

The goal is to find a perfect match, which is a collection of edges sharing no nodes and for which every node is part of some edge in that collection.

This script interprets a popular algorithm for finding perfect matchings [2] by first randomly selecting as many disjoint edges as possible until no more can be found:

Diagram Description automatically generated

Here, four disjoint edges (in boldface) have been selected: (1,1), (2,2), (3,3), (4,4). Denoting this collection by M, the script will modify M repeatedly until it covers all nodes or show why it can’t be done. Note that if this algorithm fails to match at least half of all nodes, then no perfect matching exists. This follows from a simple counting argument on how many nodes would be available for building a perfect match.

Denote by U and V the collection of left and right nodes that are not in M. These are the nodes that are missing from the current match. Clearly |U| = |V| since the graph contains an equal number of left and right nodes.

More importantly, no edge connects nodes in U and V, otherwise the so-called greedy algorithm would have selected it.

So, any edge from U must terminate in M. For example, in the above graph, the edges (5,4) and (6,2) from U terminate in M. Similarly, the edges (2,6) and (3,5) from V terminate in M. Note that, by definition, the first coordinate of any edge is always the left node.

Where do we go from here?

Consider the following two bipartite graphs Graph1 and Graph2, each with a partial match:

Diagram Description automatically generated

In the first graph there is a path of length 2 from U to M to M, namely [3,2,2], whose terminal node is connected by an edge to a node in V. Adding that edge to the path enables a bigger matching by alternately adding/deleting its edges to/from M. In the second graph no such path exists, but if we expand the original path [3,2,2] to [3,2,2,1,1] then a bigger matching may be obtained in the same way. This celebrated technique will be used by the script to find a bigger matching.

So, by this bit of inspiration, the next step is to document all paths of length 2 from U to M to M by first building the sets S and T containing their nodes in M. Each node involved in these paths saves its own path from U as an attribute:

Diagram Description automatically generated

Clearly S and T have the same number of nodes. This construction, based on the current partial match, will be called the initialization of S.

After initialization of S, the script looks for an edge from S to V.

If a bigger matching cannot be found using the technique mentioned above, because no edge exists between S and V, then S and T will be expanded to admit longer paths. Hopefully, some new node in S will have an edge to V.

But if it finds an edge, say (2,6), then it has a path (in this case [6,2,2,6]) from U to V where its edges are alternately outside and inside M (starting at U and ending at V). The above technique will transform M into another matching with one more edge. This construction will be called the augmentation of M. Such paths are called “augmenting” because they can be used to augment M by alternately adding and deleting its edges to/from M to yield a different M with an extra edge:

Diagram Description automatically generated

Now both U and V have one less node, and the new M has one more edge. But some paths stored by S and T will now be out of date since U has just lost a member. In particular, the path [6,2,2] for the left node 2 no longer starts from a member of U so augmentation won’t work using it. For that reason, S is always re-initialized by the script after a successful augmentation:

Diagram Description automatically generated

But now augmentation is not possible because no edge exists between S and V. At this point the script will attempt to expand S by finding all paths of length 2 from S to M to M outside of T in the hope that the terminal node of one of them will have an edge to V:

Diagram Description automatically generated

This technique will be called the expansion of S. Fortunately, after expansion, augmentation works here with the new path [5,4,4,3,3,5] from U to V to produce a new M, which is a perfect match:

Diagram Description automatically generated

But suppose expanding S was not possible because the edge (4,3) didn’t exist. Then the neighbors of S would be exactly T (instead of a proper superset of T) because no edge exists between S and V since augmentation just failed! But then U and S will collectively contain more nodes than its neighbors since all edges from U must terminate in T, by construction. This would explain why no matching is possible.

Since the process of augmentation of M + initialization of S + expansion of S always expands M or S, it must eventually halt with either a perfect matching, or U and S will contain more nodes than its neighbors (preventing a perfect match).

Putting it all together, the script’s approach to find a perfect match is the following:

Apply greedy algorithm to get partial matching 
If partial match fails to match at least half of all nodes, return with error set 
If match is perfect, return with result set
 
Call initialization of S 
 
WHILE 1 = 1 
     BEGIN
     Call augmentation of M repeatedly until augmentation fails 
     For each loop: 
     IF augmentation succeeds, 
          BEGIN 
          IF match is now perfect, return with result set
          Call initialization of S 
          END
 
     Call expansion of S since augmentation of M just failed 
     IF unable to expand S, no perfect match is possible so return with error set 
     END

Script

This script calls four stored procedures (spGreedy, spInitializeS, spExpandM, spExpandS) in the following way:

-- Declarations
DECLARE @Result int
DECLARE @Result1 int
DECLARE @Result2 int
DECLARE @Result3 int

-- No counting
SET NOCOUNT ON

--Apply greedy algorithm to get partial matching
EXEC @Result = spGreedy
IF @Result <> 1 RETURN

-- Initialize U, V, S, T, Path
EXEC @Result = spInitializeS

-- Attempt to find perfect matching
-- For each loop, at least one of M or S will expand or no perfect match exists
WHILE 1 = 1
     BEGIN
     -- Repeatedly expand M with a new edge until it fails
     -- Whenever successful: U, V, S, T, Path must be re-initialized
     SET @Result1 = 1
     SET @Result2 = 1
     WHILE @Result1 = 1
          BEGIN
          EXEC @Result1 = spExpandM
          IF @Result1 = 1
               BEGIN
               EXEC @Result2 = spInitializeS
               IF @Result2 = 0 RETURN -- spExpandM may have just built a perfect match
               END
          END

     -- Expand S
     -- Expanding M has just failed so if expanding S fails, no perfect matching exists
     -- If expanding S succeeds, try expanding M again
     EXEC @Result3 = spExpandS
     IF @Result3 <> 1 RETURN
     END

Example

Suppose six pilots at a regional airline have each expressed their preferences for a co-pilot:

Diagram Description automatically generated

Can a perfect match be found, so that each pilot flies with a preferred co-pilot? The script shows why that’s not possible:

NumDistinctLeftNodesNumDistinctRightNodesPerfectMatchNotPossible
43Not enough right nodes to match left nodes, so perfect match impossible

 

LeftNodeRightNode
22
23
43
44
54
62

 

The four left nodes 2,4,5,6 collectively have three neighbors, so no match is possible. But if Jacqui relaxes her preferences and adds Jane to her preference list, then a perfect match exists:

Diagram Description automatically generated

How The Script Works

Three tables are used to define a bipartite graph, where columns exist for LeftNode, RightNode, U, S, V, T, Path, M:

LeftNodes:

LeftNodeUSPath
100
200
300
400
500
600

 

RightNodes:

RightNodeVTPath
100
200
300
400
500
600

 

Graph:

LeftNodeRightNodeM
110
160
220
230
330
340
350
430
440
540
620
660

 

The script attempts to set M to 1 for selected rows in Graph so that each left and right node appears exactly once in that selection (perfect match). For the above example, the table’s perfect match looks like this:

LeftNodeRightNodeM
111
160
221
230
330
340
351
431
440
541
620
661

 

The stored procedures are simple. For example, spExpandM shows how the tables are modified when seeking to expand M:

ALTER PROCEDURE [dbo].[spExpandM]
AS
BEGIN
 
/*
Script:  Try to find an edge from S to V
         If found, add an extra edge to M
Returns: -1 M was not expanded
         0 M was already perfect
         1 M was expanded
*/ 
 
-- No counting
SET NOCOUNT ON

-- Declarations
DECLARE @LeftNode int
DECLARE @RightNode int
DECLARE @Path varchar(max)
DECLARE @Count int
DECLARE @S int
DECLARE @I int
DECLARE @SQL nvarchar(max)
DECLARE @TABLE TABLE(I int IDENTITY, Node varchar(128))

/*
Execution
*/
-- If partial matching is already perfect, display matching and return
IF (SELECT count(*) FROM dbo.Graph WHERE M = 1) = (SELECT count(*) FROM dbo.LeftNodes)
     BEGIN
     SELECT * FROM dbo.Graph WHERE M = 1 ORDER BY LeftNode, RightNode
     RETURN 0
     END

-- Try to find an edge from S to V
SELECT TOP 1 @LeftNode = LeftNode, @RightNode = RightNode FROM dbo.Graph
WHERE
LeftNode IN (SELECT LeftNode FROM dbo.LeftNodes WHERE S = 1)
AND
RightNode IN (
SELECT RightNode FROM dbo.RightNodes
WHERE
RightNode IN (SELECT RightNode FROM dbo.RightNodes WHERE V = 1)
)

-- Return if not possible
IF @LeftNode IS NULL RETURN -1 -- No edge found from S to V

-- Set Path in right node by appending it to Path in left node
-- This Path will show how to get to the right node in V from U
-- It will be used to alternately add/delete edges to/from M
UPDATE dbo.RightNodes
SET Path = (SELECT TOP 1 Path FROM dbo.LeftNodes WHERE LeftNode = @LeftNode) +
',' +
CAST(@RightNode AS VARCHAR(16))
WHERE RightNode = @RightNode

-- Now select this Path to alternately add/delete edges to/from M
-- so a new edge will be added to M.
SET @Path = (SELECT TOP 1 Path FROM dbo.RightNodes WHERE RightNode = @RightNode)

-- Use the function split_string to return a table containing the nodes in the path,
-- along with an IDENTITY column I to provide their order in the path.
INSERT INTO @TABLE SELECT Node from split_string(@Path,',')

-- First count the number of nodes for the subsequent loop
SET @Count = (SELECT count(*) FROM @TABLE)

-- Loop through table of nodes, alternately adding/deleting them in M
SET @I = 1 -- position of row in table
SET @S = 1 -- S = 1 means add, S = 0 means delete
WHILE @I < @Count
     BEGIN
     SELECT @LeftNode = Node FROM @TABLE WHERE I = @I
     SELECT @RightNode = Node FROM @TABLE WHERE I = @I + 1
      
     -- Note that left and right nodes appear alternately in @TABLE
     IF @S = 1
     SET @SQL = 'UPDATE dbo.Graph SET M = ' + CAST(@S AS varchar(2)) +
     ' WHERE LeftNode = ' + CAST(@LeftNode AS varchar(16)) +
     ' AND RightNode = ' + CAST(@RightNode AS varchar(16))
     ELSE
     SET @SQL = 'UPDATE dbo.Graph SET M = ' + CAST(@S AS varchar(2)) +
     ' WHERE RightNode = ' + CAST(@LeftNode AS varchar(16)) +
     ' AND LeftNode = ' + CAST(@RightNode AS varchar(16))

     EXEC sp_executesql @SQL

     -- Update U, V, S, T for first and last nodes
     -- Strictly speaking, it's not required because spInitializeS will be called next
     IF @I = 1 UPDATE dbo.LeftNodes SET U = 0, S = 1 WHERE LeftNode = @LeftNode
     IF @I = @Count - 1 UPDATE dbo.RightNodes SET V = 0, T = 1 WHERE RightNode = @RightNode
     SET @I = @I + 1 -- Go to next node
     SET @S = 1 - @S -- Change add to delete or vice versa
     END

/*
Normal return
*/ 
RETURN 1
 
END

Installation

The resource section contains a script to install the stored procs, along with the following test script to randomly generate bipartite graphs before seeking a perfect match:

-- Declarations
DECLARE @Result int
DECLARE @Result1 int
DECLARE @Result2 int
DECLARE @Result3 int
 
-- No counting
SET NOCOUNT ON
 
-- Build random bipartite graph with selected number of left nodes
-- and probability for edge creation.
-- Disable this section if you've built your own bipartite graph
EXEC @Result = spBuildBipartite 10, 0.25
IF @Result <> 1 RETURN
 
-- Check that graph is bipartite
-- This is only required if a manually created graph is used
-- Enable this section if you've built your own bipartite graph
-- EXEC @Result = spCheckBipartite
-- IF @Result <> 1 RETURN
 
--Apply greedy algorithm to get partial matching
EXEC @Result = spGreedy
IF @Result <> 1 RETURN
 
-- Initialize U, V, S, T, Path
EXEC @Result = spInitializeS
 
-- Attempt to find perfect matching
-- For each loop, at least one of M or S will expand or no perfect match exists
WHILE 1 = 1
     BEGIN
     -- Repeatedly expand M with a new edge until it fails
     -- Whenever successful: U, V, S, T, Path must be re-initialized
     SET @Result1 = 1
     SET @Result2 = 1
     WHILE @Result1 = 1
     BEGIN
     EXEC @Result1 = spExpandM
     IF @Result1 = 1
          BEGIN
          EXEC @Result2 = spInitializeS
          IF @Result2 = 0 RETURN -- spExpandM may have just built a perfect match
          END
     END
 
     -- Expand S
     -- Expanding M has just failed so if expanding S fails, no perfect matching exists
     -- If expanding S succeeds, try expanding M again
     EXEC @Result3 = spExpandS
     IF @Result3 <> 1 RETURN
     END

The stored proc spBuildBipartite accepts two parameters (number of left nodes, probability of edge creation):

EXEC @Result = spBuildBipartite 10, 0.25

For larger values of the first parameter, you’ll want smaller values of the second if you wish to find graphs with no perfect match. Note that processing a large graph with, say, 1000 left nodes may run for 15-30 minutes on a fast machine if the probability of edge creation is at least 25%. For small probabilities, expect a long run. Use the query SELECT count(*) FROM dbo.Graph WHERE M = 1 to monitor the script’s progress. Note that the expansion of M slows down as the script proceeds.

If you disable this command the script will use the current graph tables, so you can run it on your own data. For that reason, the proc spCheckBipartite is available to check that your data represents a bipartite graph with an equal number of left and right nodes.

Remarks

This script provides a reason whenever a bipartite graph fails to have a perfect match: it finds a collection of left nodes with fewer neighbors than the collection itself (so no matching could exist for that collection). This confirms Hall’s theorem [3]: A bipartite graph has a perfect match if and only if every collection of left nodes has at least as many neighbors as the collection itself.

Can a machine be trained to spot the reason for any graph with no perfect match? With infinitely-many randomly generated examples, there’s no shortage of training data.

References

[1] Graph Theory
https://en.wikipedia.org/wiki/Graph_theory
 
[2] Discrete Mathematics
Lovasz, Laszlo
https://www.springer.com/gp/book/9780387955841
 
[3] Hall's marriage theorem
https://en.wikipedia.org/wiki/Hall%27s_marriage_theorem

 

 

Resources

Rate

5 (2)

You rated this post out of 5. Change rating

Share

Share

Rate

5 (2)

You rated this post out of 5. Change rating