Introduction
This article introduces a stored procedure for SQL Server that handles a celebrated matching problem in graph theory. This work formed the basis of a Nobel Prize in 2012 [6], and is currently used by several organizations [4],[5].
In previous articles ([2], [3]) SQL Server scripts used graphs to solve some common database problems in ETL programming.
Those graphs were edge-centric because they stored table edges rather than vertices (which appear as columns of each edge). That worked because the standard database operations on such tables (eg. selects, deletes, insertions) were ideal for some problems, such as locating connected subgraphs to build new objects in the database. But for some problems like this one, array-like thinking would be handy. SQL Server doesn’t have arrays, but a simple workaround for this script will be described.
So let’s get started.
Overview of the Problem
We wish to marry a set of boys to a set of girls in an ideal manner, where both boys and girls have individual preferences about whom they wish to marry. An ideal matching is one where no boy and girl prefer each other to those they’ve married. Take the following tables. The first column in each table is the name of the boy or girl, and the remaining columns are their preference lists. For example, Boy 1 prefers Girls 2, 4, 5, 1, 3 in that order.
We have one for boys, B:
1 | 2 | 3 | 4 | 5 | |
B1 | 2 | 4 | 5 | 1 | 3 |
B2 | 1 | 5 | 2 | 4 | 3 |
B3 | 2 | 1 | 3 | 4 | 5 |
B4 | 1 | 2 | 3 | 4 | 5 |
B5 | 3 | 4 | 1 | 2 | 5 |
Table 1: Boy preference chart
We have another for girls, G:
1 | 2 | 3 | 4 | 5 | |
G1 | 4 | 3 | 5 | 2 | 1 |
G2 | 1 | 2 | 5 | 3 | 4 |
G3 | 2 | 3 | 1 | 5 | 4 |
G4 | 2 | 3 | 4 | 1 | 5 |
G5 | 5 | 2 | 3 | 4 | 1 |
Table 2: Girl preference chart
The following table is a sample marriage between the boys and girls. Here, Boy 1 is married to Girl 2, Boy 2 is married to Girl 5, etc. This particular mapping contains no boy B and girl G that prefer each other to those they’ve married:
B | G |
1 | 2 |
2 | 5 |
3 | 3 |
4 | 1 |
5 | 4 |
Table 3: Marriage chart
To read these tables, let's look at an example. Consider Boy 2 and Girl 3. She prefers Boy 2 to her partner, Boy 3 (see Table 2). He prefers Girl 5 to her (as well as everyone else).
Such mappings are called “stable” because, for ANY boy and girl NOT married to each other, at least one of them prefers their existing partner. That eliminates a potential switching by a boy and girl who prefer each other to those they’ve married.
Why is this useful? Well, the “boys” could be pilots at a regional airline and the “girls” could be co-pilots. Stable matches ensure that no pilot and co-pilot would fly with others when they’d rather fly together.
They could also be graduating medical students assigned to their first hospitals or students at high school who are being matched for their first dance. Stable mappings have many interesting properties. For example, suppose some pilot, P, is every co-pilot’s favorite, and some co-pilot, C, is every pilot’s favorite. Then P will always be mapped to C in any stable mapping. The same holds true for least favorites.
In 1962, David Gale and Lloyd Shapley proved ([1]) that it’s always possible to build stable marriages between an equal number of boys and girls. Presented here is a stored proc for building such marriages, written in SQL Server using their algorithm.
The procedure will randomly generate preference lists for an arbitrary number of boys and girls before building a stable marriage. You may also provide your own preference lists to build customized stable marriages.
Stable Marriage Algorithm
The algorithm is remarkably simple. It goes like this:
WHILE some boy is not engaged:
Select a boy who is not engaged
Select the girl highest on his preference list
If she is not engaged, his proposal is accepted
If she is engaged, his proposal is accepted if she prefers him to her current partner, whom she drops
Remove her from his preference list
Note the following observations:
- The number of engaged girls and boys is always equal
- Once a girl has been proposed, she remains engaged (possibly with existing partner)
- When an engaged girl switches partners, the new partner is preferred to the old partner
- For each loop, some boy's preference list decrements by one girl
Why the algorithm eventually stops
To prove this by contradiction, suppose the algorithm never stops. By observation (4) the preference list of some boy will eventually dwindle to one girl. By observation (2) the other girls on his list are engaged, since he's already proposed to them. By observation (1), the remaining girl on his preference list must not be engaged, so they become engaged.
At this point, everyone's engaged and the algorithm stops. This is a contradiction.
Why the matching is stable
To prove this by contradiction, suppose the matching is not stable. Then there must be a boy B and girl G who prefer each other to their partners. But when B became engaged to his current partner, he would have already proposed to G at an earlier point. Either B became engaged to G at that point, or not. It not, G preferred another boy at that point.
If they became engaged, then at some later point G must have dropped B (which only happens if she preferred another boy). Either way, by (3), the final partner of G is preferred to B. This is a contradiction.
Trace the match making
In the above example, these were the steps logged by the procedure to achieve a stable mapping:
- Boy 1 first proposes to Girl 2
- Girl 2 is not currently engaged
- So Girl 2 accepts proposal from Boy 1
- Boy 2 first proposes to Girl 1
- Girl 1 is not currently engaged
- So Girl 1 accepts proposal from Boy 2
- Boy 3 first proposes to Girl 2
- Unfortunately, Girl 2 prefers Boy 1 to Boy 3
- Boy 3 then proposes to Girl 1
- Fortunately for him, Girl 1 prefers Boy 3 to current Boy 2.
- So Girl 1 drops Boy 2 and becomes engaged to Boy 3
- Boy 2 then proposes to Girl 5
- Girl 5 is not currently engaged
- So Girl 5 accepts proposal from Boy 2
- Boy 4 first proposes to Girl 1
- Fortunately for him, Girl 1 prefers Boy 4 to current Boy 3
- So Girl 1 drops Boy 3 and becomes engaged to Boy 4
- Boy 3 then proposes to Girl 3
- Girl 3 is not currently engaged
- So Girl 3 accepts proposal from Boy 3
- Boy 5 first proposes to Girl 3
- Unfortunately, Girl 3 prefers Boy 3 to Boy 5
- Boy 5 then proposes to Girl 4
- Girl 4 is not currently engaged
- So Girl 4 accepts proposal from Boy 5
After the stable mapping has been built, the table B will have NULLs in each row, representing proposals that were ultimately rejected. That is, except for the last one, where the proposal remained intact. This gives us these tables:
1 | 2 | 3 | 4 | 5 | |
B1 | 2 | 4 | 5 | 1 | 3 |
B2 | 1 | 5 | 2 | 4 | 3 |
B3 | 2 | 1 | 3 | 4 | 5 |
B4 | 1 | 2 | 3 | 4 | 5 |
B5 | 3 | 4 | 1 | 2 | 5 |
Table 4 - Initial preferences
1 | 2 | 3 | 4 | 5 | |
B1 | NULL | 4 | 5 | 1 | 3 |
B2 | NULL | NULL | 2 | 4 | 3 |
B3 | NULL | NULL | NULL | 4 | 5 |
B4 | NULL | 2 | 3 | 4 | 5 |
B5 | NULL | NULL | 1 | 2 | 5 |
Table 5 - Final view of the table
To read this chart, we compare Table 4 with Table 5. We can see that Boy 1 married Girl 2, Boy 2 married Girl 5, Boy 3 married Girl 3, Boy 4 married Girl 1 and Boy 5 married Girl 4. In particular, Boy 3 submitted three proposals (some successful, some not) before finding a partner (Girl 3).
Humour
No scruples exist with the boys or girls during the matching process. Boys freely propose to girls already engaged, and girls happily drop their partners if something better comes along. But when the dust settles, the matching is stable, and everybody lives happily ever after.
This is starting to look like an automated TV soap opera (just keep generating mappings until you get the story you want).
Stored Procedure
The stored procedure for generating stable marriages is called spStableMarriage. It uses the three tables B, G and BG in the above example (with arbitrarily many boys and girls). Tables B and G contain preferences for each boy and girl, while BG stores the stable mapping between them that’s generated by the proc.
You can get the procedure in the Resources section below.
To create a stable marriage for the sample data (stored in the proc spPopulateTables):
exec dbo.spStableMarriage
To create a stable marriage for randomized tables of N boys and girls:
exec dbo.spStableMarriage N
Installation
The resource section contains a single script to build all tables and procedures:
spBuildTables | For a given N, builds BG, B, G for N boys and girls. The column names B1, … BN and G1, … GN are automatically generated. |
spCheckTables | Check that B and G have the same number of boys and girls, with no duplicate, orphan or NULL values in their preference lists. This is automatically called when using hard-wired data. |
spGetColumnNames | For any N, generate a string of N column names for B or G (eg. "B1,B2,B3,B4,B5"). These strings are used when building dynamic queries on the base tables. The actual names are not hard-wired anywhere, except spPopulateTables. |
spPopulateTables | Populate tables B and G with hard-wired data. This data will be used when spStableMarriage is called with no parameter. |
spRandomPermutation | Randomly permute numbers 1 to N, where N = number of boys and girls. Used to build random preference lists of any size. |
spRandomTable | Populate rows of table B or G with random permutations of 1 to N, where N = number of boys and girls. |
spStableMarriage | Builds stable marriages with randomly generated preferences of any size (eg. exec dbo.spStableMarriage 50). To use the hard-wired data in spPopulateTables, omit the size parameter. |
After installation, simply run the proc spStableMarriage as outlined above.
Dealing With No Arrays
There are several instances where treating tables as arrays would be helpful in the script:
- Boy B seeks the first column among columns G1, G2, … that's not NULL (for a given row)
- Boy B determines Girl G for a specific column (for a given row)
- Girl G determines column containing Boy B (for a given row)
To work around this limitation, a key query for the proc uses the UNPIVOT operator, which converts a row of preferences into rows of preference-column pairs for a given boy B:
SELECT * FROM B WHERE B = 3
This returns (for the test data):
1 | 2 | 3 | 4 | 5 | |
B3 | 2 | 1 | 3 | 5 | 4 |
This query will pivot the data:
SELECT u.id, u.mychar, u.morechar, u.X, u.G FROM B UNPIVOT ( G FOR X IN (G1, G2, G3, G4, G5)) AS u WHERE B = 3;
B | G | X |
3 | 2 | G1 |
3 | 1 | G2 |
3 | 3 | G3 |
3 | 5 | G4 |
3 | 4 | G5 |
This allows the proc to determine the column X of any preference G in the preference list for a boy B, as well as determine the value of any preference for a given column. That way, we can determine whether a given preference is before or after another preference.
Setting a preference to NULL for a given column of a given boy B is a simple update:
UPDATE B SET G3 = NULL WHERE B = 3
References
- [1] https://en.wikipedia.org/wiki/Stable_marriage_problem
- [2] https://www.sqlservercentral.com/articles/ordering-tables-to-preserve-referential-integrity
- [3] https://www.sqlservercentral.com/articles/using-graph-theory-to-group-records
- [4] http://www.nrmp.org/
- [5] https://natmatch.com/
- [6] https://www.nytimes.com/2012/10/16/business/economy/alvin-roth-and-lloyd-shapley-win-nobel-ineconomic-science.html