Data is stored in individual rows in a table, but what is the best method of storing or retrieving that data? Many application developers would be quite familiar with a row-based query approach because it is very similar to the looping constructs found in traditional application development languages. You loop through a set of data and perform an action one element at a time. When this method is used to retrieve data from a data source, it means acting on one row of data at a time. I intentionally used the term "set" there to differentiate how set-based SQL operations work. Rather than process one row at a time, they work against all of the data at once. In the following examples, I will illustrate the differences in design and performance between cursors, while loops, and set operations.
Row-based: Cursors and While loops
Let's take a look at the general syntax to see how it works.
DECLARE MyCursor CURSOR
FOR SELECT ...
OPEN MyCursor
FETCH NEXT FROM MyCursor
INTO @LocalVariable
WHILE @@FETCH_STATUS = 0
BEGIN
-- do something
FETCH NEXT FROM MyCursor
INTO @LocalVariable
END
CLOSE MyCursor
DEALLOCATE MyCursorWe know SELECT is the basic data retrieval command in the SQL language, and here we see it used in the definition of the cursor. Each time we execute the statement "FETCH NEXT", we are essentially re-executing that SELECT statement to get the next top result row. Here is a very similar approach without defining a cursor:
DECLARE @var varchar(50) = '';
WHILE @var is not null
BEGIN
SET @var =
(
SELECT TOP(1) column1 FROM table1
WHERE column1 > @var
ORDER BY column1
)
IF @var is null
BREAK;
-- do something
ENDSet-based
The idea behind a set-based approach is to attempt to retrieve the same final result but with as few set requests as possible. In fact, the ideal scenario is to design a solution that allows the entire final result to be retrieved from a single dataset.
One example to explain them all
Let's take a look at a simple example using AdventureWorks. We want to report the total number of items ordered by all customers during each month of 2007. The first part of the script creates the temp tables. The next 3 parts run the report using a cursor, then a while loop, and finally a single set operation.
BEGIN; -- setup
IF OBJECT_ID('tempdb..#Months') is not null
DROP TABLE #Months;
CREATE TABLE #Months
(
TheMonthNumber char(2)
, TheMonthName varchar(20)
);
INSERT #Months
(TheMonthNumber, TheMonthName)
VALUES
('01', 'January')
, ('02', 'February')
, ('03', 'March')
, ('04', 'April')
, ('05', 'May')
, ('06', 'June')
, ('07', 'July')
, ('08', 'August')
, ('09', 'September')
, ('10', 'October')
, ('11', 'November')
, ('12', 'December')
;
IF OBJECT_ID('tempdb..#MonthlyItemsOrdered') is not null
DROP TABLE #MonthlyItemsOrdered;
CREATE TABLE #MonthlyItemsOrdered
(
TheMonth varchar(20)
, TotalOrdered int
);
END;
GO
PRINT 'Begin cursor';
DBCC DROPCLEANBUFFERS;
DBCC FREEPROCCACHE;
GO
DECLARE @StartTime datetime2(7), @EndTime datetime2(7);
BEGIN; -- cursor
SET @StartTime = SYSDATETIME();
TRUNCATE TABLE #MonthlyItemsOrdered;
DECLARE @MonthNumber char(2), @MonthName varchar(20);
DECLARE @RangeStart date, @RangeEnd date;
DECLARE MyCursor CURSOR FAST_FORWARD FOR
SELECT m.TheMonthNumber, m.TheMonthName
FROM #Months m
;
OPEN MyCursor
FETCH NEXT FROM MyCursor
INTO @MonthNumber, @MonthName
WHILE @@FETCH_STATUS = 0
BEGIN
SET @RangeStart = CONVERT(date, '2007' + @MonthNumber + '01');
SET @RangeEnd = DATEADD(day, -1, DATEADD(month, 1, @RangeStart));
INSERT #MonthlyItemsOrdered
(TheMonth, TotalOrdered)
SELECT
CONCAT(@MonthNumber, ' - ', @MonthName)
, SUM(sod.OrderQty)
FROM AdventureWorks.Sales.SalesOrderDetail sod
INNER JOIN AdventureWorks.Sales.SalesOrderHeader soh
ON soh.SalesOrderID = sod.SalesOrderID
WHERE soh.OrderDate BETWEEN @RangeStart AND @RangeEnd
;
FETCH NEXT FROM MyCursor
INTO @MonthNumber, @MonthName
END
CLOSE MyCursor
DEALLOCATE MyCursor
SET @EndTime = SYSDATETIME();
SELECT * FROM #MonthlyItemsOrdered;
SELECT Style = 'cursor', ElapsedMilliseconds = DATEDIFF(ms, @StartTime, @EndTime);
END;
GO
PRINT 'Begin while loop';
DBCC DROPCLEANBUFFERS;
DBCC FREEPROCCACHE;
GO
DECLARE @StartTime datetime2(7), @EndTime datetime2(7);
BEGIN; -- while loop
SET @StartTime = SYSDATETIME();
TRUNCATE TABLE #MonthlyItemsOrdered;
DECLARE @MonthNumber char(2), @MonthName varchar(20);
DECLARE @RangeStart date, @RangeEnd date;
SET @MonthNumber = '00';
WHILE 1=1 -- break handled within
BEGIN;
SET @MonthName = '';
SELECT TOP(1)
@MonthNumber = m.TheMonthNumber
, @MonthName = m.TheMonthName
FROM #Months m
WHERE m.TheMonthNumber > @MonthNumber
ORDER BY TheMonthNumber
;
IF @MonthName = ''
BREAK;
SET @RangeStart = CONVERT(date, '2007' + @MonthNumber + '01');
SET @RangeEnd = DATEADD(day, -1, DATEADD(month, 1, @RangeStart));
INSERT #MonthlyItemsOrdered
(TheMonth, TotalOrdered)
SELECT
CONCAT(@MonthNumber, ' - ', @MonthName)
, SUM(sod.OrderQty)
FROM AdventureWorks.Sales.SalesOrderDetail sod
INNER JOIN AdventureWorks.Sales.SalesOrderHeader soh
ON soh.SalesOrderID = sod.SalesOrderID
WHERE soh.OrderDate BETWEEN @RangeStart AND @RangeEnd
;
END;
SET @EndTime = SYSDATETIME();
SELECT * FROM #MonthlyItemsOrdered;
SELECT Style = 'while loop', ElapsedMilliseconds = DATEDIFF(ms, @StartTime, @EndTime);
END;
GO
PRINT 'Begin set';
DBCC DROPCLEANBUFFERS;
DBCC FREEPROCCACHE;
GO
DECLARE @StartTime datetime2(7), @EndTime datetime2(7);
BEGIN; -- set
SET @StartTime = SYSDATETIME();
INSERT #MonthlyItemsOrdered
(TheMonth, TotalOrdered)
SELECT
CONCAT(RIGHT(CONCAT('00', MONTH(soh.OrderDate)), 2), ' - ', DATENAME(month, soh.OrderDate))
, SUM(sod.OrderQty)
FROM AdventureWorks.Sales.SalesOrderDetail sod
INNER JOIN AdventureWorks.Sales.SalesOrderHeader soh
ON soh.SalesOrderID = sod.SalesOrderID
WHERE soh.OrderDate BETWEEN CONVERT(date, '20070101') AND CONVERT(date, '20071231')
GROUP BY CONCAT(RIGHT(CONCAT('00', MONTH(soh.OrderDate)), 2), ' - ', DATENAME(month, soh.OrderDate))
;
SET @EndTime = SYSDATETIME();
SELECT * FROM #MonthlyItemsOrdered;
SELECT Style = 'set', ElapsedMilliseconds = DATEDIFF(ms, @StartTime, @EndTime);
END;Performance Results
ElapsedMilliseconds | |||||
Style | Test 1 | Test 2 | Test 3 | Test 4 | Test 5 |
cursor | 451 | 451 | 411 | 457 | 410 |
while loop | 354 | 366 | 364 | 367 | 360 |
set | 258 | 300 | 248 | 196 | 202 |
As you can see from the code, not only is the set-based solution easier to read, it only issues a single statement to the engine as opposed to one statement per row. Looking at the performance results, we see that the set-based solution is consistently much faster than either of the other two.
Conclusion
It turns out what’s good for the goose is not always good for the gander. Those loops that work so well in C# application code aren’t so great in SQL.
It is important to note that set-based design will not automatically result in faster performance, especially if that particular query includes subqueries in the SELECT clause. For example, a query like this:
SELECT a
, (SELECT x FROM table2) AS field1
, (SELECT y FROM table3) AS field2
, (SELECT z FROM table4) AS field3
FROM table1looks like a set-based approach but really isn't because each of those subqueries will have to run for each row returned by the main query. For more information on ways that some attempted set-based solutions really aren't truly set-based after all, read Jeff Moden's article here: http://www.sqlservercentral.com/articles/T-SQL/61539/ entitled "Hidden RBAR: Triangular Joins".