Once again it is that time. It is TSQL Tuesday. This time it is being hosted by Mike Walsh. Mike has decided to leave the topic open for interpretation like was done last month. So this month we get to incorporate IO into our articles however we wish – so long as we can tie it back to SQL somehow. This time, I will be discussing a topic that I had already once blogged. I had planned on a three part series. Instead, I will write this little excerpt into that series as part 1.5. In that series, my intent was to evaluate several different solutions and compare them, eventually showing my solution and compare it.
In this article, I hope to bridge a gap that I realized would occur due to some omitted information in the first article. Thus a comparison of the IO impact of a poorly performing query and a quick and better performing query. Some setup for this is necessary. I needed a routine to monitor file growth (since this is about IO afterall). I elected to use a method that I have used in the past that is the crux of an article that is to be published on SQLServerCentral March 11. Due to the publication, I will not be discussing that method here.
After establishing the tracking for these experiments, I attempted to create a 1 billion row numbers table. This table was to establish plenty of IO as a baseline for comparison and testing. The table creation failed due to lack of hard drive space. The table creation failed after running for about an hour and then rolled back the transaction. I was able to capture some vital information though. We can see the growth and timeline for the first entry and final entry of this attempt, in the following table.
Notice that the data file grew from 123Mb to 5461MB and the Log grew from 120MB to 17207MB. This also caused 53 log growths with the current log growth settings (10%). The observed IO strain on my laptop was substantial but unmeasured beyond the above table. Under these conditions, the tempdb grew by only about 5MB total.
Since the 1 billion row table was a nogo, I reset to test for a 10 million row test. The table population took 2m18sec this time. The below table shows the database growth and LogGrowths for this test.
Now with a numbers table, I can proceed with the next step in this test. I will use some of the scripts from the FizzBuzz article to demonstrate some of the IO associated with this table (beyond the IO required to create it).
SET STATISTICS io ON SET STATISTICS TIME ON ; WITH Numbers(n) AS ( SELECT 1 UNION ALL SELECT 1 + n FROM Numbers WHERE n < 1000000) SELECT CASE WHEN n%5=0 AND n%3=0 THEN 'FizzBuzz' WHEN n%3 = 0 THEN 'Fizz' WHEN n%5 = 0 THEN 'Buzz' ELSE CAST(n AS VARCHAR(8)) END FROM Numbers OPTION (MAXRECURSION 0); SET STATISTICS TIME OFF SET STATISTICS io OFF
And another query to demonstrate IO.
SET STATISTICS io ON SET STATISTICS TIME ON SELECT TOP 1000000 CASE WHEN ROW_NUMBER() OVER (ORDER BY sc1.someid)%15 = 0 THEN 'FizzBuzz' --Multiples of 3 AND 5 WHEN ROW_NUMBER() OVER (ORDER BY sc1.someid)%3 = 0 THEN 'Pepsi' WHEN ROW_NUMBER() OVER (ORDER BY sc1.someid)%5 = 0 THEN 'Coke' ELSE CAST(ROW_NUMBER() OVER (ORDER BY sc1.someid) AS VARCHAR(8)) END FROM dbo.Numbers sc1 SET STATISTICS TIME OFF SET STATISTICS io OFF
And here is the specific IO stats for this query.
Notice that the IO is better, and might even be considered acceptable for 1 million rows.
The IO-less Solution
What if we could create a numbers table without IO? What if we could perform several different kinds of queries without IO? Itzik Ben Gan proposed a cascading CTE solution that does this sort of thing. Since then, I have seen other solutions that can perform just about as well and without the IO as well. I will demonstrate the cascading CTE solution here.
SET STATISTICS io ON SET STATISTICS TIME ON DECLARE @LIMIT BIGINT SET @LIMIT = 1000000 ; WITH Nbrs_2( n ) AS (SELECT 1 UNION all SELECT 1 UNION all SELECT 1 UNION all SELECT 1 UNION all SELECT 1 UNION all SELECT 1 UNION all SELECT 1 UNION all SELECT 1 UNION all SELECT 1 UNION all SELECT 0), Nbrs_1( n ) AS ( SELECT 1 FROM Nbrs_2 n1 CROSS JOIN Nbrs_2 n2 ), Nbrs_0( n ) AS ( SELECT 1 FROM Nbrs_1 n1 CROSS JOIN Nbrs_1 n2 ), Nbrs ( n ) AS ( SELECT 1 FROM Nbrs_0 n1 CROSS JOIN Nbrs_0 n2 ) SELECT CASE WHEN n % 15 = 0 THEN 'Dr. Pepper' --Multiples of 3 AND 5 WHEN n % 3 = 0 THEN 'Pepsi' WHEN n % 5 = 0 THEN 'Coke' ELSE CAST(n AS VARCHAR(8)) END AS 'FizzBuzz' FROM ( SELECT ROW_NUMBER() OVER (ORDER BY n) FROM Nbrs ) D (n) WHERE n <= ABS(@LIMIT) ; ; SET STATISTICS io OFF SET STATISTICS TIME OFF
IO statistics show no Disk IO for this query. Execution is quite similar on my laptop to the last query – but this query builds both the numbers table and then runs a query against it.
We can see how some queries can create a substantial amount of IO and disk pressure when run. A little work, and we can relieve this IO pressure and come up with a better query.