Blog Post

Advent of Code – Day 4

,

Day 4 of the Advent of Code has us mining for data in MD5 hashes, which is out of the comfort zone for a SQL DBA to say the least.

Some would say it’s a poor use for SQL Server, and I’d have to agree.  However, sometimes things just need to get done, so I’m playing along with a single tool approach.

Now SQL Server isn’t supposed to do well with row-by-row processing, right?  Andy Warren was posing the same question in his post pondering possible solutions to this exact problem. Well…the problem is that row-by-row processing is typically referencing a table every row.  Earlier today I mentioned that I didn’t have a good example of set based being slower, but now I found a way.  Sure, some could say that it’s poor examples on poor ways to use SQL Server, but come on…we can all find production code that meets that criteria.

So, typical DBA fashion, I made it set-based evaluating values in batches of up to 50,000.  It worked.

Here’s my code for Part 2.  It’s so close to Part 1 that you’ll have trouble telling the difference.

IF OBJECT_ID('TempDB..#Numbers') IS NOT NULL BEGIN
DROP TABLE #Numbers 
END
CREATE TABLE #Numbers
(
ID INT IDENTITY(1,1) PRIMARY KEY
, b BIT
)
DECLARE @StartString varchar(100) = 'iwrupvqb'
DECLARE @RowCount Int 
INSERT INTO #Numbers (b)
SELECT 0
SELECT @RowCount = @@RowCount 
WHILE 0 = (SELECT COUNT(1) 
FROM (SELECT TOP (@RowCount) ID 
FROM #Numbers 
ORDER BY ID DESC
) X 
WHERE CONVERT(VarChar(10), HASHBYTES('MD5', @StartString + CAST(ID as VarChar(1000))), 2) LIKE '000000%') 
BEGIN
INSERT INTO #Numbers (b)  
SELECT TOP (50000) b
FROM #Numbers

SELECT @RowCount = @@RowCount 
END
SELECT ID = MIN(ID)
, MD5_Hash = CONVERT(VarChar(1000), HASHBYTES('MD5', @StartString + CAST(min(ID) as VarChar(1000))), 2) 
FROM (SELECT TOP (@RowCount) ID 
FROM #Numbers 
ORDER BY ID DESC
) X 
WHERE CONVERT(VarChar(10), HASHBYTES('MD5', @StartString + CAST(ID as VarChar(1000))), 2) LIKE '000000%'

Then I realized I didn’t even need a table and just ran the code. As you can see, the code was much easier to write, and it also worked. More importantly, it worked in 36 seconds as opposed to 113 seconds in the set-based approach for part 2. Part 1, in case you’re wondering was 1 second compared to 3 seconds, so the non-set based was still about 3 times faster.

DECLARE @i Int = 0
, @StartString varchar(100) = 'iwrupvqb'
WHILE CONVERT(VarChar(10), HASHBYTES('MD5', @StartString + CAST(@i as VarChar(1000))), 2) NOT LIKE '000000%' BEGIN
SET @i = @i + 1
END
SELECT @i 

I’m saving the link to my own post to give to others as an example of when set-based is worse. Hint: it’s ONLY because I wasn’t referencing a table in the loop, and I could probably still find a way to make it more efficient in a set.  It’s still true that row-by-row would be slower, I’m cheating by not using rows.

Rate

Share

Share

Rate