A SQL Sudoko Solver
Introduction
During a long wet May bank holiday weekend I decided to try my hand at a Sudoko puzzle in the Sunday paper.
After all, how hard could it be?
Four frustrating hours later I thought "hang on, this is a set based problem, surely I can use
SQL Server to solve this thing".
Basic Sudoko
Sudoko is usually a 9x9 grid, although any grid that can be regularly divided up is theoretically possible.
For the sake of this article we will assume that we are trying to solve a 9x9 grid.
A Sudoko grid may look something like the following:
As with all the best puzzles the rules are very simple
- Each row must contain the numbers 1 through to 9 only once
- Each column must contain the numbers 1 through to 9 only once
- Each 3x3 cell must contain the numbers 1 through to 9 only once
Looking at our grid this means that in row 9 column 9 we know that its value cannot be
1 because a 1 already exists in that 3x3 cell
- 4 or 9 because those digits already exist in that row
- 2,6 or 7 because those digits already exist in that column
Building the initial Sudoko square.
Our first task is to come up a table structure that will hold our Sudoko information.
| Field | Type |
| RowId | TINYINT |
| ColumnId | TINYINT |
| CellValue | TINYINT |
We can populate this table using simple INSERT statements.
TRUNCATE TABLE dbo.Sudoko
-- Row 1
INSERT INTO dbo.Sudoko (RowId, ColumnId, CellValue) VALUES (1,4,5)
INSERT INTO dbo.Sudoko (RowId, ColumnId, CellValue) VALUES (1,6,8)
...etc
Where we do not yet know the values in the cells we want an entry with a NULL value. Do do this I have a couple of views
and a quick bit of script as follows
IF EXISTS (SELECT * FROM sys.views WHERE Name='Digits')
DROP VIEW dbo.Digits
GO
CREATE VIEW dbo.Digits
AS
SELECT 0 AS value
UNION ALL
SELECT 1 AS value
UNION ALL
SELECT 2 AS value
UNION ALL
SELECT 3 AS value
UNION ALL
SELECT 4 AS value
UNION ALL
SELECT 5 AS value
UNION ALL
SELECT 6 AS value
UNION ALL
SELECT 7 AS value
UNION ALL
SELECT 8 AS value
UNION ALL
SELECT 9 AS value
GO
--------------------------------------------------------------------------
IF EXISTS (SELECT * FROM sys.views WHERE Name='AllValues')
DROP VIEW dbo.AllValues
GO
CREATE VIEW dbo.AllValues
AS
SELECT R.Value AS RowId, C.Value AS ColumnId, V.Value AS CellValue
FROM dbo.Digits AS R,dbo.Digits AS C, dbo.Digits AS V
WHERE R.Value>0 AND C.Value>0 AND V.Value>0
GO
--------------------------------------------------------------------------
-- Insert NULL values into the Sudoko square where row/column coordinates do not yet exist.
INSERT INTO dbo.Sudoko (RowId,ColumnID)
SELECT DT.RowId,DT.ColumnId
FROM dbo.Sudoko AS SD RIGHT JOIN(
SELECT DISTINCT RowId,ColumnId
FROM dbo.AllValues
WHERE RowId>0
AND ColumnId > 0
AND CellValue>0
) AS DT
ON SD.RowId = DT.RowId
AND SD.ColumnId = DT.ColumnID
WHERE SD.RowId IS NULL
GO
Blocked Values Part One
Once we have built our Sudoko square the first step in the solution is to establish which values are blocked from
each individual cell. Remember the rules we outlined earlier.
- Each row must contain the numbers 1 through to 9 only once
- Each column must contain the numbers 1 through to 9 only once
- Each 3x3 cell must contain the numbers 1 through to 9 only once
We have to build a query that will determine these values for us.
To begin with we want to identify all the values that occur in any one row.
SELECT DISTINCT RowId,CellValue
FROM dbo.Sudoko
WHERE CellValue IS NOT NULL
Now we want to join this back to the empty cells of our Sudoko square as follows
SELECT SD.RowId, SD.ColumnId,SR.CellValue
FROM dbo.Sudoko AS SD
INNER JOIN (
-- Get the values for the row.
SELECT DISTINCT RowId,CellValue
FROM dbo.Sudoko
WHERE CellValue IS NOT NULL
) AS SR
ON SD.RowId = SR.RowId
WHERE SD.CellValue IS NULL
We want a similar query for the columns and finally a query that identifies the values in a particular 3x3 cell.
SELECT SD.RowId, SD.ColumnId,CC.CellValue
FROM dbo.Sudoko AS SD
INNER JOIN (
-- Get the values present in a 3x3 cell.
SELECT DISTINCT
(RowId-1)/3+1 AS RowCellId,
(ColumnId-1)/3+1 AS ColumnCellId,
CellValue
FROM dbo.Sudoko
WHERE CellValue IS NOT NULL
) AS CC
ON (SD.ColumnId-1)/3+1 = CC.ColumnCellId
AND (SD.RowId-1)/3+1 = CC.RowCellId
WHERE SD.CellValue IS NULL
We have our 3 queries and we want all of these added together which we can do in a single view as follows:
IF EXISTS (SELECT * FROM sys.views WHERE Name='BlockedValues')
DROP VIEW dbo.BlockedValues
GO
--##SUMMARY Values that cannot be inserted into cells
CREATE VIEW dbo.BlockedValues
AS
SELECT SD.RowId, SD.ColumnId,SR.CellValue
FROM dbo.Sudoko AS SD
INNER JOIN (
SELECT DISTINCT RowId,CellValue
FROM dbo.Sudoko
WHERE CellValue IS NOT NULL
) AS SR
ON SD.RowId = SR.RowId
WHERE SD.CellValue IS NULL
UNION
SELECT SD.RowId, SD.ColumnId,SC.CellValue
FROM dbo.Sudoko AS SD
INNER JOIN (
SELECT DISTINCT ColumnId,CellValue
FROM dbo.Sudoko
WHERE CellValue IS NOT NULL
) AS SC
ON SD.ColumnId = SC.ColumnId
WHERE SD.CellValue IS NULL
UNION
SELECT SD.RowId, SD.ColumnId,CC.CellValue
FROM dbo.Sudoko AS SD
INNER JOIN (
SELECT DISTINCT
(RowId-1)/3+1 AS RowCellId,
(ColumnId-1)/3+1 AS ColumnCellId,
CellValue
FROM dbo.Sudoko
WHERE CellValue IS NOT NULL
) AS CC
ON (SD.ColumnId-1)/3+1 = CC.ColumnCellId
AND (SD.RowId-1)/3+1 = CC.RowCellId
WHERE SD.CellValue IS NULL
GO
We now have all those cell values that are directly blocked by existing data.
This isn't the whole story but it is a good start
Possible Values Part One
The list of possible values that can be inserted into any one cell must be the values still left to solve less those
identified in our "BlockValues" view.
To get to this we can define two new views as follows:
---------------------------------------------------------------------
IF EXISTS (SELECT * FROM sys.views WHERE Name='AllPossibleValues')
DROP VIEW dbo.AllPossibleValues
GO
CREATE VIEW dbo.AllPossibleValues
AS
SELECT AllVal.RowId,
AllVal.ColumnId ,
AllVal.CellValue
FROM dbo.Sudoko AS SD
INNER JOIN dbo.AllValues AS AllVal
ON SD.RowId = AllVal.RowId
AND SD.ColumnID = AllVal.ColumnID
WHERE SD.CellValue IS NULL
GO
---------------------------------------------------------------------
This first view simply produces a total list of all values for those cells that are currently blank.
---------------------------------------------------------------------
IF EXISTS (SELECT * FROM sys.views WHERE Name='PossibleValues')
DROP VIEW dbo.PossibleValues
GO
CREATE VIEW dbo.PossibleValues
AS
SELECT AllVal.RowId,
AllVal.ColumnId ,
AllVal.CellValue
FROM dbo.AllPossibleValues AS AllVal
LEFT JOIN dbo.BlockedValues AS Block
ON AllVal.RowId = Block.RowId
AND AllVal.ColumnId = Block.ColumnId
AND AllVal.CellValue = Block.CellValue
WHERE Block.RowId IS NULL
GO
This 2nd view excludes those values that are deemed to be blocked.
Our first solving query
What we want to do is to populate any cells in our Sudoko puzzle where there is only one possible value,
that is 8 out of 9 values are blocked.
Of course if we solve a cell (or hopefully cells) this means that our list of possible values will be
reduced for other cells and therefore we want to keep hitting our query until it can solve no more values. This means
we need some code that says
WHILE @@ROWCOUNT >0
BEGIN
... exec dbo.OurSolvingSP
END
Firstly we must identify those cells for which there is only one solution.
SELECT RowId, ColumnId, MIN(CellValue) AS CellValue
FROM dbo.PossibleValues
GROUP BY RowId, ColumnId
HAVING COUNT(*)=1
Our stored procedure becomes the following:
------------------------------------------------------------
IF EXISTS(SELECT 1 FROM sys.procedures where name = 'SolveSingleCells')
DROP PROC dbo.SolveSingleCells
GO
CREATE PROCEDURE dbo.SolveSingleCells
AS
SET NOCOUNT ON
DECLARE @RecordCount1 INT ,
@RecordCount2 INT ,
@TotalCount INT
SET @TotalCount=1
WHILE @TotalCount>0
BEGIN
-- Update the Sudoko square where only a single solution exists
-- for any one cell
UPDATE SD
SET SD.CellValue = DT.CellValue
FROM dbo.Sudoko AS SD
INNER JOIN (
SELECT RowId, ColumnId, MIN(CellValue) AS CellValue
FROM dbo.PossibleValues
GROUP BY RowId, ColumnId
HAVING COUNT(*)=1
) AS DT
ON SD.RowId = DT.RowId
AND SD.ColumnId = DT.ColumnId
SET @TotalCount= @@ROWCOUNT
END
GO
In a simple puzzle this will solve quite a few values however our puzzle is a little more complicated than that
so we must think again.
A 2nd attempt
Look at the first 3x3 cell of our grid again. We know that there are no cells with a single possible solution
so we must turn the problem on its head and look for occurrences where a value can only fit in a single cell.
Consider the number 7. The shaded square show where figure 7 cannot go and as you can see there is only one
possible cell where 7 is not blocked.
Using the same method we can see that in the bottom right hand 3x3 cell there is also only one cell where 7 is not blocked.
Clearly we need a query that can identify these occurrences and that we can incorporate into our solving stored procedure!
Our first step is to build a view that can determine the number of different cell values that can go in any particular
cell.
IF EXISTS (SELECT * FROM sys.views WHERE Name='PossibleValuesByCell')
DROP VIEW dbo.PossibleValuesByCell
GO
CREATE VIEW dbo.PossibleValuesByCell
AS
SELECT
(PV.RowId-1)/3+1 AS RowCellId,
(PV.ColumnId-1)/3+1 AS ColumnCellId,
PV.CellValue,
COUNT(*) AS Occurrences
FROM dbo.possiblevalues AS PV
GROUP BY (PV.RowId-1)/3+1 ,
(PV.ColumnId-1)/3+1 ,
PV.CellValue
GO
Our update statement therefore becomes the following
UPDATE SD
SET SD.CellValue = PV.CellValue
FROM dbo.Sudoko AS SD
INNER JOIN dbo.possiblevalues AS PV
ON SD.RowId = PV.RowId
AND SD.ColumnId = PV.ColumnID
INNER JOIN PossibleValuesByCell AS PC
ON PC.RowCellId = (PV.RowId-1)/3+1
AND PC.ColumnCellId = (PV.ColumnId-1)/3+1
AND PC.CellValue = PV.CellValue
WHERE PC.Occurrences=1
Wrapping this up into a single stored procedure becomes the following
------------------------------------------------------------
IF EXISTS(SELECT 1 FROM sys.procedures where name = 'SolveSingleCells')
DROP PROC dbo.SolveSingleCells
GO
CREATE PROCEDURE dbo.SolveSingleCells
AS
SET NOCOUNT ON
DECLARE @RecordCount1 INT ,
@RecordCount2 INT ,
@TotalCount INT
SET @TotalCount=1
WHILE @TotalCount>0
BEGIN
-- Update the Sudoko square where only a single solution exists
-- for any one cell
UPDATE SD
SET SD.CellValue = DT.CellValue
FROM dbo.Sudoko AS SD
INNER JOIN (
SELECT RowId, ColumnId, MIN(CellValue) AS CellValue
FROM dbo.PossibleValues
GROUP BY RowId, ColumnId
HAVING COUNT(*)=1
) AS DT
ON SD.RowId = DT.RowId
AND SD.ColumnId = DT.ColumnId
SET @RecordCount1 = @@ROWCOUNT
UPDATE SD
SET SD.CellValue = PV.CellValue
FROM dbo.Sudoko AS SD
INNER JOIN dbo.PossibleValues AS PV
ON SD.RowId = PV.RowId
AND SD.ColumnId = PV.ColumnID
INNER JOIN PossibleValuesByCell AS PC
ON PC.RowCellId = (PV.RowId-1)/3+1
AND PC.ColumnCellId = (PV.ColumnId-1)/3+1
AND PC.CellValue = PV.CellValue
WHERE PC.Occurrences=1
SET @RecordCount2 = @@ROWCOUNT
SET @TotalCount = @RecordCount1+@RecordCount2
-- Display the number of solved values.
RAISERROR('Single cells solved = %d',10,1,@TotalCount)
END
RETURN @TotalCount
GO
By itterating through this procedure until it can solve no other cells we can see that it will fill in
quite a few of the gaps in our puzzle.
Solving double values
The cells that are left have multiple possible solutions so we need something that will evaluate these cells.
If we look at the row 3 and 4 for column 6 we can see that they are both limited to the same pair of values, 2 &9.
This means that nothing else in that column can take either of these values. If any other cells did take one of these
values it would mean that both of our cells would have exactly the same value, which is against the rules.
What we need is a query that blocks these values from our list of possible values for other cells.
Our next step is to create a view that identifies those cells that have only two possible solutions.
---------------------------------------------------------------------
IF EXISTS (SELECT * FROM sys.views WHERE Name='TwinValues')
DROP VIEW dbo.TwinValues
GO
CREATE VIEW dbo.TwinValues
AS
SELECT RowId, ColumnId,MIN(CellValue) AS Value1, MAX(CellValue) AS Value2
FROM dbo.PossibleValues
GROUP BY RowId, ColumnId
HAVING COUNT(*)=2
GO
---------------------------------------------------------------------
Then we want to identify where a pair of values is shared in the same column.
---------------------------------------------------------------------
SELECT T1.ColumnId, MIN(T1.RowId) AS RowId1, MAX(T1.RowId) AS RowId2, T1.Value1, T1.Value2
FROM dbo.twinvalues AS T1 INNER JOIN dbo.twinvalues AS T2
ON T1.ColumnId = T2.ColumnId -- Must be in the same column
AND T1.Value1 = T2.Value1 -- Must have an identical 1st value
AND T1.Value2 = T2.Value2 -- Must have an identical 2nd value
WHERE T1.RowId<>T2.RowId -- Must be in a different row.
GROUP BY T1.ColumnId,T1.Value1, T1.Value2
---------------------------------------------------------------------
Finally we must combined this back with our possible values so that we can identify any cells in our column, other
than the ones we have identified that have one or more of these values.
---------------------------------------------------------------------
SELECT PV.RowId ,
PV.ColumnId,
PV.CellValue
FROM dbo.PossibleValues AS PV
INNER JOIN (
SELECT T1.ColumnId, MIN(T1.RowId) AS RowId1, MAX(T1.RowId) AS RowId2, T1.Value1, T1.Value2
FROM dbo.twinvalues AS T1 INNER JOIN dbo.twinvalues AS T2
ON T1.ColumnId = T2.ColumnId
AND T1.Value1 = T2.Value1
AND T1.Value2 = T2.Value2
WHERE T1.RowId<>T2.RowId
GROUP BY T1.ColumnId,T1.Value1, T1.Value2
) AS DT
ON PV.ColumnId = DT.ColumnId -- Must be in the same column
AND PV.CellValue IN (DT.Value1, DT.Value2) -- Must have at least one of the values
AND PV.RowId NOT IN (DT.RowId1,DT.RowId2) -- Must not be in the same cells as our paired value cells.
---------------------------------------------------------------------
Blocked Values Part Two
The previous query will form the basis for our 2nd blocked values view. This new view will contain the following:
- Our query identifying values from pairs which occur in column cells other than our paired cells
- A similar query identifying values from pairs which occur in row cells other than our paired cells
- The values from our original BlockedValues view
Putting it all together we can come up with a 2nd stage blocked view
---------------------------------------------------------------------
IF EXISTS (SELECT * FROM sys.views WHERE Name='BlockedValuesStage2')
DROP VIEW dbo.BlockedValuesStage2
GO
--##SUMMARY Values that cannot be inserted into cells
CREATE VIEW dbo.BlockedValuesStage2
AS
-- Identify cell values blocked because they exist in a column pair
SELECT PV.RowId ,
PV.ColumnId,
PV.CellValue
FROM dbo.PossibleValues AS PV
INNER JOIN (
SELECT T1.ColumnId, MIN(T1.RowId) AS RowId1, MAX(T1.RowId) AS RowId2, T1.Value1, T1.Value2
FROM dbo.twinvalues AS T1 INNER JOIN dbo.twinvalues AS T2
ON T1.ColumnId = T2.ColumnId
AND T1.Value1 = T2.Value1
AND T1.Value2 = T2.Value2
WHERE T1.RowId<>T2.RowId
GROUP BY T1.ColumnId,T1.Value1, T1.Value2
) AS DT
ON PV.ColumnId = DT.ColumnId
AND PV.CellValue IN (DT.Value1, DT.Value2)
AND PV.RowId NOT IN (DT.RowId1,DT.RowId2)
UNION
-- Identify cell values blocked because they exist in a row pair
SELECT PV.RowId ,
PV.ColumnId,
PV.CellValue
FROM dbo.PossibleValues AS PV
INNER JOIN (
SELECT T1.RowId, MIN(T1.ColumnId) AS ColumnId1, MAX(T1.ColumnId) AS ColumnId2, T1.Value1, T1.Value2
FROM dbo.twinvalues AS T1 INNER JOIN dbo.twinvalues AS T2
ON T1.RowId = T2.RowId
AND T1.Value1 = T2.Value1
AND T1.Value2 = T2.Value2
WHERE T1.ColumnId<>T2.ColumnId
GROUP BY T1.RowId,T1.Value1, T1.Value2
) AS DT
ON PV.RowId = DT.RowId
AND PV.CellValue IN (DT.Value1, DT.Value2)
AND PV.ColumnId NOT IN (DT.ColumnId1,DT.ColumnId2)
UNION
-- Identify cells from the original blocked query.
SELECT RowId , ColumnId, CellValue
FROM dbo.BlockedValues
GO
A case to ponder
There is one other case that we could consider. Where a paired value exists in the same 3x3 cell but does not share
a row or column. This would indeed pose a challenge as a query because, where as a column pair or row pair share a common
axis, this instance does not share any easily encoded values.
Let us suppose that row 1, column 1 shares the same pairing as row 3 column 3. Writing a query that would investigate
values not in these two cells would be quite tricky. The ways that the preceding queries work would cause row 1 column 3
and row 3 column 1 to be blocked.
There are two reasons why we don't need to worry about this query
- It would only affect the diagonal 3x3 cells 1,1 & 2,2 & 3,3. That is, you cannot transpose row 3 column 4 with
column 4 row 3 as these are in 3x3 cells 1,2 and 2,1 respectively
- The other blocked values actually work around the whole query making our cell test irrelevant.
At worst we end up
with an extra couple of itterations in our loop but the solution remains the same.
Possible Values Part Two
Of course, now we have an additional blocked values query we also need a 2nd stage PossibleValues query.
This will be similar to the first version but will be dbo.PossibleValues minus dbo.BlockedValuesStage2
---------------------------------------------------------------------
IF EXISTS (SELECT * FROM sys.views WHERE Name='PossibleValuesStage2')
DROP VIEW dbo.PossibleValuesStage2
GO
CREATE VIEW dbo.PossibleValuesStage2
AS
SELECT AllVal.RowId,
AllVal.ColumnId ,
AllVal.CellValue
FROM dbo.PossibleValues AS AllVal
LEFT JOIN dbo.BlockedValuesStage2 AS Block
ON AllVal.RowId = Block.RowId
AND AllVal.ColumnId = Block.ColumnId
AND AllVal.CellValue = Block.CellValue
WHERE Block.RowId IS NULL
GO
---------------------------------------------------------------------
Now we can revisit our dbo.SolveSingleCells procedure and dbo.PossibleValuesByCell view and replace
references to dbo.PossibleValues with dbo.PossibleValuesStage2
---------------------------------------------------------------------
IF EXISTS (SELECT * FROM sys.views WHERE Name='PossibleValuesByCell')
DROP VIEW dbo.PossibleValuesByCell
GO
CREATE VIEW dbo.PossibleValuesByCell
AS
SELECT
(PV.RowId-1)/3+1 AS RowCellId,
(PV.ColumnId-1)/3+1 AS ColumnCellId,
PV.CellValue,
COUNT(*) AS Occurrences
FROM dbo.PossibleValuesStage2 AS PV
GROUP BY (PV.RowId-1)/3+1 ,
(PV.ColumnId-1)/3+1 ,
PV.CellValue
GO
------------------------------------------------------------
IF EXISTS(SELECT 1 FROM sys.procedures where name = 'SolveSingleCells')
DROP PROC dbo.SolveSingleCells
GO
CREATE PROCEDURE dbo.SolveSingleCells
AS
SET NOCOUNT ON
DECLARE @RecordCount1 INT ,
@RecordCount2 INT ,
@TotalCount INT
SET @TotalCount=1
WHILE @TotalCount>0
BEGIN
-- Update the Sudoko square where only a single solution exists
-- for any one cell
UPDATE SD
SET SD.CellValue = DT.CellValue
FROM dbo.Sudoko AS SD
INNER JOIN (
SELECT RowId, ColumnId, MIN(CellValue) AS CellValue
FROM dbo.PossibleValuesStage2
GROUP BY RowId, ColumnId
HAVING COUNT(*)=1
) AS DT
ON SD.RowId = DT.RowId
AND SD.ColumnId = DT.ColumnId
SET @RecordCount1 = @@ROWCOUNT
UPDATE SD
SET SD.CellValue = PV.CellValue
FROM dbo.Sudoko AS SD
INNER JOIN dbo.PossibleValuesStage2 AS PV
ON SD.RowId = PV.RowId
AND SD.ColumnId = PV.ColumnID
INNER JOIN PossibleValuesByCell AS PC
ON PC.RowCellId = (PV.RowId-1)/3+1
AND PC.ColumnCellId = (PV.ColumnId-1)/3+1
AND PC.CellValue = PV.CellValue
WHERE PC.Occurrences=1
SET @RecordCount2 = @@ROWCOUNT
SET @TotalCount = @RecordCount1+@RecordCount2
RAISERROR('Single cells solved = %d',10,1,@TotalCount)
-- If any updates occurred then remove them from the possible values list.
IF @TotalCount>0
exec dbo.RemoveSolvedCells
END
RETURN @TotalCount
GO
------------------------------------------------------------
Wrapping it all up
Perhaps the final stage is to write two short views to ensure that the puzzle is truly solved.
The following view should never return any records if the solution is valid. It attempts to find the
following occurrences
- Rows where a single value occurs more than once.
- Columns where a single value occurs more than once.
------------------------------------------------------------
IF EXISTS (SELECT * FROM sys.views WHERE Name='ValidateSolution')
DROP VIEW dbo.ValidateSolution
GO
CREATE VIEW dbo.ValidateSolution
AS
SELECT RowId AS IndexValue,'Row' AS Item,cellvalue
FROM dbo.sudoko
GROUP BY rowid,cellvalue
HAVING count(*)>1
UNION ALL
SELECT ColumnId AS IndexValue,'Column' AS Item,cellvalue
FROM dbo.sudoko
GROUP BY ColumnId,cellvalue
HAVING count(*)>1
GO
------------------------------------------------------------
Concluding thoughts
I have tested this routine on the Sudoko puzzles in the UK's "The Independent" newspaper over the past week
or so and in each case it has produced the correct solution.
I see no reason why this Sudoko solver shouldn't be extended to solve any size of puzzle thought this would almost
certainly move away from the single solving stored procedure.
Of course it is one thing to solve a Sudoko puzzle but the real challenge would be to come up with the necessary
SQL code to generate the solution!