Advent of Code – Day 4

Day 4 of the Advent of Code has us mining for data in MD5 hashes, which is out of the comfort zone for a SQL DBA to say the least.

Some would say it’s a poor use for SQL Server, and I’d have to agree.  However, sometimes things just need to get done, so I’m playing along with a single tool approach.

Now SQL Server isn’t supposed to do well with row-by-row processing, right?  Andy Warren was posing the same question in his post pondering possible solutions to this exact problem. Well…the problem is that row-by-row processing is typically referencing a table every row.  Earlier today I mentioned that I didn’t have a good example of set based being slower, but now I found a way.  Sure, some could say that it’s poor examples on poor ways to use SQL Server, but come on…we can all find production code that meets that criteria.

So, typical DBA fashion, I made it set-based evaluating values in batches of up to 50,000.  It worked.

Here’s my code for Part 2.  It’s so close to Part 1 that you’ll have trouble telling the difference.

IF OBJECT_ID('TempDB..#Numbers') IS NOT NULL BEGIN
	DROP TABLE #Numbers 
END

CREATE TABLE #Numbers
(
	ID INT IDENTITY(1,1) PRIMARY KEY
	, b BIT
)

DECLARE @StartString varchar(100) = 'iwrupvqb'
DECLARE @RowCount Int 

INSERT INTO #Numbers (b)
SELECT 0

SELECT @RowCount = @@RowCount 

WHILE 0 = (SELECT COUNT(1) 
			FROM (SELECT TOP (@RowCount) ID 
					FROM #Numbers 
					ORDER BY ID DESC
				) X 
			WHERE CONVERT(VarChar(10), HASHBYTES('MD5', @StartString + CAST(ID as VarChar(1000))), 2) LIKE '000000%') 
	BEGIN

	INSERT INTO #Numbers (b)  
	SELECT TOP (50000) b
	FROM #Numbers
	
	SELECT @RowCount = @@RowCount 
END

SELECT ID = MIN(ID)
	, MD5_Hash = CONVERT(VarChar(1000), HASHBYTES('MD5', @StartString + CAST(min(ID) as VarChar(1000))), 2) 
FROM (SELECT TOP (@RowCount) ID 
		FROM #Numbers 
		ORDER BY ID DESC
	) X 
WHERE CONVERT(VarChar(10), HASHBYTES('MD5', @StartString + CAST(ID as VarChar(1000))), 2) LIKE '000000%'

Then I realized I didn’t even need a table and just ran the code. As you can see, the code was much easier to write, and it also worked. More importantly, it worked in 36 seconds as opposed to 113 seconds in the set-based approach for part 2. Part 1, in case you’re wondering was 1 second compared to 3 seconds, so the non-set based was still about 3 times faster.

DECLARE @i Int = 0
	, @StartString varchar(100) = 'iwrupvqb'

WHILE CONVERT(VarChar(10), HASHBYTES('MD5', @StartString + CAST(@i as VarChar(1000))), 2) NOT LIKE '000000%' BEGIN
	SET @i = @i + 1
END

SELECT @i 

I’m saving the link to my own post to give to others as an example of when set-based is worse. Hint: it’s ONLY because I wasn’t referencing a table in the loop, and I could probably still find a way to make it more efficient in a set.  It’s still true that row-by-row would be slower, I’m cheating by not using rows.

Advertisement

One thought on “Advent of Code – Day 4

  1. I havent thought of anything better than your loop. Given that we only know its a positive number this could run a very long time. Creating a dictionary on the first pass would have allowed us to rule out some on the second pass, though at the expense of writing it to disk for a “maybe”. Only other option I thought of was service broker to get parallel execution, but I’d bet the overhead cancels out or worse, given that a single hash run adds about 15% load to my laptop.

Questions are some of the sincerest compliments

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s