SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 

Advent of Code 2018 – Day 3 (Slicing)

Advent of Code 2018 – Day 3

As I explained in a recent post, I’m participating in this year’s Advent of Code challenge, with the twist of doing the challenges in T-SQL. In case you don’t know what “Advent of Code” is, Eric Wastl (t) created it for the purpose of, as Eric describes it:

Advent of Code is a series of small programming puzzles for a variety of skill levels. They are self-contained and are just as appropriate for an expert who wants to stay sharp as they are for a beginner who is just learning to code. Each puzzle calls upon different skills and has two parts that build on a theme.

The full explanation of “Advent of Code” is at https://adventofcode.com/2018/about. You can see my other posts in the “Advent of Code” category.

2018 Advent of Code recap

We started of with someone going back in time and altering Santa’s past. You went back in time to correct it, but the device to fix the problems hadn’t been calibrated before sending you on your way. You calibrated the device on Day 1. On day 2, we located the boxes in the warehouse where the new (500 years ago) Santa suit material is.

Day 3, Part 1 – Loading the file

Now that the new suit material has been found, all the elves are trying to make suits. They’ve submitted claims for different pieces of the material (the list of claims is the input file). The list being processed looks like this:

Day 3 input file

Day 3 input file

The first like breaks down to:

  • Claim #1
  • 185″ from the left edge
  • 501″ from the top edge
  • 17″ wide
  • 15″ tall

The task is to find the number of square inches of fabric that overlap with multiple claims. The piece of fabric is 1000 square inches. The first thing to do is to load the file and separate out all the values:

SELECT  ca4.claimId,
        ca4.pLeft,
        ca4.pTop,
        ca4.pWidth,
        ca4.pHeight
FROM	OPENROWSET(BULK 'D:\AdventOfCode\2018\Input\Day03.txt', SINGLE_CLOB) dt(FileData)
CROSS APPLY dbo.Split(dt.FileData, CHAR(13)) ca1
CROSS APPLY (VALUES (REPLACE(ca1.Item, CHAR(10), ''))) ca2(Item)
CROSS APPLY (VALUES (CHARINDEX('#', ca2.Item),  -- start of claim ID 
                     CHARINDEX(' @', ca2.Item), -- starting position of rectangle
                     CHARINDEX(': ', ca2.Item), -- size of rectangle
                     CHARINDEX(',', ca2.Item),  -- dimension split of starting position
                     CHARINDEX('x', ca2.Item)   -- split of rectangle size
                     )) ca3(posClaimId, posRectStart, posRectSize, posStartSplit, posSizeSplit)
CROSS APPLY (VALUES (CONVERT(INTEGER, SUBSTRING(ca2.Item, ca3.posClaimId+1, ca3.posRectStart - ca3.posClaimId - 1)),        -- claimID
                     CONVERT(INTEGER, SUBSTRING(ca2.Item, ca3.posRectStart + 3, ca3.posStartSplit - ca3.posRectStart - 3)), -- left
                     CONVERT(INTEGER, SUBSTRING(ca2.Item, ca3.posStartSplit + 1, ca3.posRectSize - ca3.posStartSplit - 1)), -- top
                     CONVERT(INTEGER, SUBSTRING(ca2.Item, ca3.posRectSize + 2, ca3.posSizeSplit - ca3.posRectSize - 2)),    -- width
                     CONVERT(INTEGER, SUBSTRING(ca2.Item, ca3.posSizeSplit+1, LEN(ca2.Item)) )                              -- height
                     )) ca4(claimId, pLeft, pTop, pWidth, pHeight)

As I’ve done with the other puzzles, I used OPENROWSET to load the file. This loads the file as one long string, so I use the SPLIT function to break this down into rows (split on CR). Next, I remove the LF from the string, and then find the starting positions of the various delimiters. Finally, I use the SUBSTRING function with all of those positions to extract the data. This query returns a result set looking like:

Extracted data from input file

Extracted data from input file

Day 3, Part 1 – Processing the data

My first thought was to create a temporary table, 1000 columns by 1000 rows. I would increment the appropriate columns / rows for each part that each claim used. As I thought about the work involved in accomplishing this, I decided that this wasn’t a feasible idea. There would be a lot of dynamically generated code to update the data. Additionally, analyzing the results would be a real pain. I need to come up with another idea.

My next thought was to use a table with three columns. One for the X axis, one for the Y axis, and the third is the counter. I added 1,000,000 rows, to handle 1,000 rows by 1,000 columns. It would be straight-forward to both update the coordinates being used, and to analyze the data. This table looks like this:

IF OBJECT_ID('tempdb.dbo.#Fabric') IS NOT NULL DROP TABLE #Fabric;
CREATE TABLE #Fabric (
    X INTEGER,
    Y INTEGER,
    Z INTEGER,
    CONSTRAINT PK_Fabric PRIMARY KEY CLUSTERED (X,Y));

WITH Tens    (N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
                     SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
                     SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1),
     Hundreds(N) AS (SELECT 1 FROM Tens t1, Tens t2),
     Millions(N) AS (SELECT 1 FROM Hundreds t1, Hundreds t2, Hundreds t3),
     Tally   (N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM Millions)
INSERT INTO #Fabric (X, Y, Z)
SELECT  t1.N AS X, t2.N AS Y, 0 AS Z
FROM	Tally t1     -- for the horizontal (X) axis
CROSS JOIN Tally t2  -- for the vertical (Y) axis
WHERE    t1.N <= 1000
AND      t2.N <= 1000;

This table looks like:

work table

Work Table

However, this would require updating the coordinates claim by claim. If I were to do an update of all the claims at once, each row would only be updated (incremented) one time, and not once per claim as I needed. What I need is a way to count how many times the elves use each coordinate.

Day 3, Part 1 – The final way to process the data

One way to overcome the single update issue is to use a cursor to perform updates claim by claim. I don’t like to use cursors unless there is no other way to do things – they cause a lot of performance issues. I get paid to fix performance issues… not to create them.

The method that I decided to go with is to join the table load query with two tally tables, one for the X-axis and one for the Y-axis. This lets me get the full range of values involved in each axis. I then just group on the X and Y axis, and do a count. This is all put into a sub query, and the outer query performs a count to get the total number of coordinates (square inches) being used by more than one claim. This query looks like this:

SELECT  COUNT(*) AS Part1Answer
FROM    (
         SELECT  x.N AS X, 
                 y.N AS Y,
                 COUNT(*) AS Z
         FROM	OPENROWSET(BULK 'D:\AdventOfCode\2018\Input\Day03.txt', SINGLE_CLOB) dt(FileData)
         CROSS APPLY dbo.Split(dt.FileData, CHAR(13)) ca1
         CROSS APPLY (VALUES (REPLACE(ca1.Item, CHAR(10), ''))) ca2(Item)
         CROSS APPLY (VALUES (CHARINDEX('#', ca2.Item),  -- start of claim ID 
                              CHARINDEX(' @', ca2.Item), -- starting position of rectangle
                              CHARINDEX(': ', ca2.Item), -- size of rectangle
                              CHARINDEX(',', ca2.Item),  -- dimension split of starting position
                              CHARINDEX('x', ca2.Item)   -- split of rectangle size
                              )) ca3(posClaimId, posRectStart, posRectSize, posStartSplit, posSizeSplit)
         CROSS APPLY (VALUES (CONVERT(INTEGER, SUBSTRING(ca2.Item, ca3.posClaimId+1, ca3.posRectStart - ca3.posClaimId - 1)),        -- claimID
                              CONVERT(INTEGER, SUBSTRING(ca2.Item, ca3.posRectStart + 3, ca3.posStartSplit - ca3.posRectStart - 3)), -- left
                              CONVERT(INTEGER, SUBSTRING(ca2.Item, ca3.posStartSplit + 1, ca3.posRectSize - ca3.posStartSplit - 1)), -- top
                              CONVERT(INTEGER, SUBSTRING(ca2.Item, ca3.posRectSize + 2, ca3.posSizeSplit - ca3.posRectSize - 2)),    -- width
                              CONVERT(INTEGER, SUBSTRING(ca2.Item, ca3.posSizeSplit+1, LEN(ca2.Item)) )                              -- height
                              )) ca4(claimId, pLeft, pTop, pWidth, pHeight)
         JOIN    dbo.Tally x ON x.N BETWEEN ca4.pLeft AND ca4.pLeft + ca4.pWidth - 1
         JOIN    dbo.Tally y ON y.N BETWEEN ca4.pTop  AND ca4.pTop + ca4.pHeight - 1
         GROUP BY x.N, y.N
        ) RequiredAlias
WHERE   Z > 1

Two notes to make: First, the results of the sub-query are the same as the used part of the proposed temporary table.

Secondly, I’m not joining to a virtual tally table here – this is a physical tally table. This is for performance reasons – I need an index on the tally table to quickly do these range searches. I created the physical tally table with this code:

CREATE TABLE dbo.Tally (N INTEGER CONSTRAINT PK_Tally PRIMARY KEY);

WITH Tens    (N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
                     SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
                     SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1),
     Hundreds(N) AS (SELECT 1 FROM Tens t1 CROSS JOIN Tens t2),
     Millions(N) AS (SELECT 1 FROM Hundreds t1 CROSS JOIN Hundreds t2 CROSS JOIN Hundreds t3),
     Tally   (N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM Millions)
INSERT INTO dbo.Tally (N) 
SELECT N FROM Tally;

Yes, I use a virtual tally table to create the physical tally table.

Day 3, Part 2

With the correct answer entered in for Part 1, Part 2 unlocks. For this, we need to find out the claims that do not overlap with any other claims.

My approach to this part of the puzzle is to look at all the coordinates of a claim, and to add up their usage using the SUM function. If the SUM is equal to the area of the claim (height x width), this all the coordinates are not shared. This requires getting the actual usage (I can use most of the above sub-query), and then adding this up for each claim.

WITH cte AS
(
SELECT  ca4.claimId,
        ca4.pLeft,
        ca4.pTop,
        ca4.pWidth,
        ca4.pHeight
FROM	OPENROWSET(BULK 'D:\AdventOfCode\2018\Input\Day03.txt', SINGLE_CLOB) dt(FileData)
CROSS APPLY dbo.Split(dt.FileData, CHAR(13)) ca1
CROSS APPLY (VALUES (REPLACE(ca1.Item, CHAR(10), ''))) ca2(Item)
CROSS APPLY (VALUES (CHARINDEX('#', ca2.Item),  -- start of claim ID 
                     CHARINDEX(' @', ca2.Item), -- starting position of rectangle
                     CHARINDEX(': ', ca2.Item), -- size of rectangle
                     CHARINDEX(',', ca2.Item),  -- dimension split of starting position
                     CHARINDEX('x', ca2.Item)   -- split of rectangle size
                     )) ca3(posClaimId, posRectStart, posRectSize, posStartSplit, posSizeSplit)
CROSS APPLY (VALUES (CONVERT(INTEGER, SUBSTRING(ca2.Item, ca3.posClaimId+1, ca3.posRectStart - ca3.posClaimId - 1)),        -- claimID
                     CONVERT(INTEGER, SUBSTRING(ca2.Item, ca3.posRectStart + 3, ca3.posStartSplit - ca3.posRectStart - 3)), -- left
                     CONVERT(INTEGER, SUBSTRING(ca2.Item, ca3.posStartSplit + 1, ca3.posRectSize - ca3.posStartSplit - 1)), -- top
                     CONVERT(INTEGER, SUBSTRING(ca2.Item, ca3.posRectSize + 2, ca3.posSizeSplit - ca3.posRectSize - 2)),    -- width
                     CONVERT(INTEGER, SUBSTRING(ca2.Item, ca3.posSizeSplit+1, LEN(ca2.Item)) )                              -- height
                     )) ca4(claimId, pLeft, pTop, pWidth, pHeight)
)
SELECT  c.claimId, 
        c.pWidth, 
        c.pHeight,
        SUM(F.Z) AS Area
FROM    cte c
JOIN    (
         SELECT  x.N AS X, 
                 y.N AS Y,
                 COUNT(*) AS Z
         FROM	 cte 
         JOIN    dbo.Tally x ON x.N BETWEEN cte.pLeft AND cte.pLeft + cte.pWidth - 1
         JOIN    dbo.Tally y ON y.N BETWEEN cte.pTop  AND cte.pTop +  cte.pHeight - 1
         GROUP BY x.N, y.N
        ) F ON  F.X BETWEEN c.pLeft AND c.pLeft + c.pWidth - 1
            AND F.Y BETWEEN c.pTop  AND c.pTop  + c.pHeight - 1
GROUP BY c.claimId,
         c.pWidth,
         c.pHeight
HAVING SUM(F.Z) = c.pWidth * c.pHeight;

This returns the only claim with no overlapping coordinates.

Summary

And here we have a T-SQL solution for Day 3 of the Advent of Code challenge. The key tasks that we can learn from today are:

  • Loading a file.
  • Split a string on a delimiter.
  • Assigning a sequential number to a set of rows in a specific order.
  • Construction of a physical TALLY table.
  • Common Table Expressions (CTE)

The post Advent of Code 2018 – Day 3 (Slicing) appeared first on Wayne Sheffield.

Comments

Leave a comment on the original post [blog.waynesheffield.com, opens in a new window]

Loading comments...