Day 4 of the Advent of Code has us mining for data in MD5 hashes, which is out of the comfort zone for a SQL DBA to say the least.
Some would say it’s a poor use for SQL Server, and I’d have to agree. However, sometimes things just need to get done, so I’m playing along with a single-tool approach.
Now, SQL Server isn’t supposed to do well with row-by-row processing, right? Andy Warren posed the same question in his post pondering possible solutions to this exact problem. Well…the thing is that row-by-row processing is usually slow because it typically references a table on every row. Earlier today I mentioned that I didn’t have a good example of set-based being slower, but now I’ve found one. Sure, some could say these are poor examples of poor ways to use SQL Server, but come on…we can all find production code that fits that description.
So, in typical DBA fashion, I made it set-based, evaluating values in batches of up to 50,000. It worked.
Here’s my code for Part 2. It’s so close to Part 1 that you’ll have trouble telling the difference.
IF OBJECT_ID('TempDB..#Numbers') IS NOT NULL
BEGIN
    DROP TABLE #Numbers
END

CREATE TABLE #Numbers
(
    ID INT IDENTITY(1,1) PRIMARY KEY
    , b BIT
)

DECLARE @StartString varchar(100) = 'iwrupvqb'
DECLARE @RowCount Int

INSERT INTO #Numbers (b) SELECT 0
SELECT @RowCount = @@RowCount

WHILE 0 = (SELECT COUNT(1)
           FROM (SELECT TOP (@RowCount) ID
                 FROM #Numbers
                 ORDER BY ID DESC) X
           WHERE CONVERT(VarChar(10), HASHBYTES('MD5', @StartString + CAST(ID as VarChar(1000))), 2) LIKE '000000%')
BEGIN
    INSERT INTO #Numbers (b)
    SELECT TOP (50000) b
    FROM #Numbers

    SELECT @RowCount = @@RowCount
END

SELECT ID = MIN(ID)
    , MD5_Hash = CONVERT(VarChar(1000), HASHBYTES('MD5', @StartString + CAST(MIN(ID) as VarChar(1000))), 2)
FROM (SELECT TOP (@RowCount) ID
      FROM #Numbers
      ORDER BY ID DESC) X
WHERE CONVERT(VarChar(10), HASHBYTES('MD5', @StartString + CAST(ID as VarChar(1000))), 2) LIKE '000000%'
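If you want to sanity-check the batch logic outside SQL Server, here’s a minimal Python sketch of the same idea (my own translation, not part of the original T-SQL): candidates get hashed a whole chunk at a time, standing in for the 50,000-row inserts into #Numbers. To keep it quick, it uses the five-zero Part 1 target and the example key 'abcdef' from the puzzle statement rather than the real input.

```python
import hashlib

def find_suffix_batched(secret, prefix, batch_size=50_000):
    """Check candidate integers a whole batch at a time, mimicking the
    set-based approach of growing #Numbers in 50,000-row chunks."""
    start = 1
    while True:
        # Hash the entire batch "set-based", then look for any hit.
        hits = [i for i in range(start, start + batch_size)
                if hashlib.md5(f"{secret}{i}".encode()).hexdigest().startswith(prefix)]
        if hits:
            return min(hits)  # smallest qualifying number, like MIN(ID) above
        start += batch_size
```

With the puzzle’s worked example, `find_suffix_batched('abcdef', '00000')` returns 609043; Part 2 just swaps the prefix for '000000'.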
Then I realized I didn’t even need a table and just ran the loop directly. As you can see, the code was much easier to write, and it also worked. More importantly, it finished in 36 seconds as opposed to 113 seconds for the set-based approach on Part 2. Part 1, in case you’re wondering, was 1 second compared to 3 seconds, so the non-set-based version was still about 3 times faster.
DECLARE @i Int = 0
    , @StartString varchar(100) = 'iwrupvqb'

WHILE CONVERT(VarChar(10), HASHBYTES('MD5', @StartString + CAST(@i as VarChar(1000))), 2) NOT LIKE '000000%'
BEGIN
    SET @i = @i + 1
END

SELECT @i
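For comparison, that loop translates almost line for line into other languages. Here’s a hedged Python equivalent (again my own sketch, not from the post), run against the second example key from the puzzle statement instead of the real input:

```python
import hashlib

def find_suffix(secret, prefix):
    """Increment i until MD5(secret + str(i)) starts with the target
    prefix -- one hash per iteration, just like the WHILE loop above."""
    i = 0
    while not hashlib.md5(f"{secret}{i}".encode()).hexdigest().startswith(prefix):
        i += 1
    return i
```

With the puzzle’s other worked example, `find_suffix('pqrstuv', '00000')` returns 1048970.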
I’m saving the link to my own post to give to others as an example of when set-based is worse. Hint: it’s ONLY because I wasn’t referencing a table in the loop, and I could probably still find a way to make it more efficient as a set. It’s still true that row-by-row against a table would be slower; I’m cheating by not using rows at all.
I haven’t thought of anything better than your loop. Given that we only know it’s a positive number, this could run a very long time. Creating a dictionary on the first pass would have allowed us to rule out some on the second pass, though at the expense of writing it to disk for a “maybe”. The only other option I thought of was Service Broker to get parallel execution, but I’d bet the overhead cancels it out or worse, given that a single hash run adds about 15% load to my laptop.