SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 

Get Set For Speed

By Tom Staab, (first published: 2015/12/29)

Data is stored in individual rows in a table, but what is the best method of storing or retrieving that data? Many application developers would be quite familiar with a row-based query approach because it is very similar to the looping constructs found in traditional application development languages. You loop through a set of data and perform an action one element at a time. When this method is used to retrieve data from a data source, it means acting on one row of data at a time. I intentionally used the term "set" there to differentiate how set-based SQL operations work. Rather than process one row at a time, they work against all of the data at once. In the following examples, I will illustrate the differences in design and performance between cursors, while loops, and set operations.

Row-based: Cursors and While loops

Let's take a look at the general syntax to see how it works.

DECLARE MyCursor CURSOR
    FOR SELECT ...
OPEN MyCursor

FETCH NEXT FROM MyCursor
INTO @LocalVariable

WHILE @@FETCH_STATUS = 0
BEGIN
    -- do something
    FETCH NEXT FROM MyCursor
    INTO @LocalVariable
END
CLOSE MyCursor
DEALLOCATE MyCursor

We know SELECT is the basic data retrieval command in the SQL language, and here we see it used in the definition of the cursor. Each time we execute the statement "FETCH NEXT", we are essentially re-executing that SELECT statement to get the next top result row. Here is a very similar approach without defining a cursor:

DECLARE @var varchar(50) = '';
WHILE @var is not null
BEGIN
    SET @var =
    (
        SELECT TOP(1) column1 FROM table1
        WHERE column1 > @var
        ORDER BY column1
    )
    IF @var is null
        BREAK;
    -- do something
END

Set-based

The idea behind a set-based approach is to attempt to retrieve the same final result but with as few set requests as possible. In fact, the ideal scenario is to design a solution that allows the entire final result to be retrieved from a single dataset.

One example to explain them all

Let's take a look at a simple example using AdventureWorks. We want to report the total number of items ordered by all customers during each month of 2007. The first part of the script creates the temp tables. The next 3 parts run the report using a cursor, then a while loop, and finally a single set operation.

BEGIN; -- setup
    IF OBJECT_ID('tempdb..#Months') is not null
        DROP TABLE #Months;
    CREATE TABLE #Months
    (
        TheMonthNumber char(2)
        , TheMonthName varchar(20)
    );
    INSERT #Months
        (TheMonthNumber, TheMonthName)
        VALUES
          ('01', 'January')
        , ('02', 'February')
        , ('03', 'March')
        , ('04', 'April')
        , ('05', 'May')
        , ('06', 'June')
        , ('07', 'July')
        , ('08', 'August')
        , ('09', 'September')
        , ('10', 'October')
        , ('11', 'November')
        , ('12', 'December')
    ;
    IF OBJECT_ID('tempdb..#MonthlyItemsOrdered') is not null
        DROP TABLE #MonthlyItemsOrdered;
    CREATE TABLE #MonthlyItemsOrdered
    (
        TheMonth varchar(20)
        , TotalOrdered int
    );
END;
GO

PRINT 'Begin cursor';
DBCC DROPCLEANBUFFERS;
DBCC FREEPROCCACHE;
GO
DECLARE @StartTime datetime2(7), @EndTime datetime2(7);
BEGIN; -- cursor
    SET @StartTime = SYSDATETIME();
    TRUNCATE TABLE #MonthlyItemsOrdered;
    DECLARE @MonthNumber char(2), @MonthName varchar(20);
    DECLARE @RangeStart date, @RangeEnd date;
    DECLARE MyCursor CURSOR FAST_FORWARD FOR
        SELECT m.TheMonthNumber, m.TheMonthName
        FROM #Months m
    ;
    OPEN MyCursor

    FETCH NEXT FROM MyCursor
    INTO @MonthNumber, @MonthName

    WHILE @@FETCH_STATUS = 0
    BEGIN
        SET @RangeStart = CONVERT(date, '2007' + @MonthNumber + '01');
        SET @RangeEnd = DATEADD(day, -1, DATEADD(month, 1, @RangeStart));

        INSERT #MonthlyItemsOrdered
            (TheMonth, TotalOrdered)
            SELECT
                CONCAT(@MonthNumber, ' - ', @MonthName)
                , SUM(sod.OrderQty)
            FROM AdventureWorks.Sales.SalesOrderDetail sod
                INNER JOIN AdventureWorks.Sales.SalesOrderHeader soh
                    ON soh.SalesOrderID = sod.SalesOrderID
            WHERE soh.OrderDate BETWEEN @RangeStart AND @RangeEnd
        ;

        FETCH NEXT FROM MyCursor 
        INTO @MonthNumber, @MonthName
    END
    CLOSE MyCursor
    DEALLOCATE MyCursor

    SET @EndTime = SYSDATETIME();
    SELECT * FROM #MonthlyItemsOrdered;
    SELECT Style = 'cursor', ElapsedMilliseconds = DATEDIFF(ms, @StartTime, @EndTime);
END;
GO

PRINT 'Begin while loop';
DBCC DROPCLEANBUFFERS;
DBCC FREEPROCCACHE;
GO
DECLARE @StartTime datetime2(7), @EndTime datetime2(7);
BEGIN; -- while loop
    SET @StartTime = SYSDATETIME();
    TRUNCATE TABLE #MonthlyItemsOrdered;
    DECLARE @MonthNumber char(2), @MonthName varchar(20);
    DECLARE @RangeStart date, @RangeEnd date;

    SET @MonthNumber = '00';
    WHILE 1=1 -- break handled within
    BEGIN;
        SET @MonthName = '';
        SELECT TOP(1)
            @MonthNumber = m.TheMonthNumber
            , @MonthName = m.TheMonthName
        FROM #Months m
        WHERE m.TheMonthNumber > @MonthNumber
        ORDER BY TheMonthNumber
        ;

        IF @MonthName = ''
            BREAK;
        SET @RangeStart = CONVERT(date, '2007' + @MonthNumber + '01');
        SET @RangeEnd = DATEADD(day, -1, DATEADD(month, 1, @RangeStart));

        INSERT #MonthlyItemsOrdered
            (TheMonth, TotalOrdered)
            SELECT
                CONCAT(@MonthNumber, ' - ', @MonthName)
                , SUM(sod.OrderQty)
            FROM AdventureWorks.Sales.SalesOrderDetail sod
                INNER JOIN AdventureWorks.Sales.SalesOrderHeader soh
                    ON soh.SalesOrderID = sod.SalesOrderID
            WHERE soh.OrderDate BETWEEN @RangeStart AND @RangeEnd
        ;
    END;

    SET @EndTime = SYSDATETIME();
    SELECT * FROM #MonthlyItemsOrdered;
    SELECT Style = 'while loop', ElapsedMilliseconds = DATEDIFF(ms, @StartTime, @EndTime);
END;
GO

PRINT 'Begin set';
DBCC DROPCLEANBUFFERS;
DBCC FREEPROCCACHE;
GO
DECLARE @StartTime datetime2(7), @EndTime datetime2(7);
BEGIN; -- set
    SET @StartTime = SYSDATETIME();
    INSERT #MonthlyItemsOrdered
        (TheMonth, TotalOrdered)
        SELECT
            CONCAT(RIGHT(CONCAT('00', MONTH(soh.OrderDate)), 2), ' - ', DATENAME(month, soh.OrderDate))
            , SUM(sod.OrderQty)
        FROM AdventureWorks.Sales.SalesOrderDetail sod
            INNER JOIN AdventureWorks.Sales.SalesOrderHeader soh
                ON soh.SalesOrderID = sod.SalesOrderID
        WHERE soh.OrderDate BETWEEN CONVERT(date, '20070101') AND CONVERT(date, '20071231')
        GROUP BY CONCAT(RIGHT(CONCAT('00', MONTH(soh.OrderDate)), 2), ' - ', DATENAME(month, soh.OrderDate))
    ;

    SET @EndTime = SYSDATETIME();
    SELECT * FROM #MonthlyItemsOrdered;
    SELECT Style = 'set', ElapsedMilliseconds = DATEDIFF(ms, @StartTime, @EndTime);
END;

Performance Results

ElapsedMilliseconds

Style

Test 1

Test 2

Test 3

Test 4

Test 5

cursor

451

451

411

457

410

while loop

354

366

364

367

360

set

258

300

248

196

202

As you can see from the code, not only is the set-based solution easier to read, it only issues a single statement to the engine as opposed to one statement per row. Looking at the performance results, we see that the set-based solution is consistently much faster than either of the other two.

Conclusion

It turns out what’s good for the goose is not always good for the gander. Those loops that work so well in C# application code aren’t so great in SQL.

It is important to note that set-based design will not automatically result in faster performance, especially if that particular query includes subqueries in the SELECT clause. For example, a query like this:

SELECT a
    , (SELECT x FROM table2) AS field1
    , (SELECT y FROM table3) AS field2
    , (SELECT z FROM table4) AS field3
FROM table1

looks like a set-based approach but really isn't because each of those subqueries will have to run for each row returned by the main query. For more information on ways that some attempted set-based solutions really aren't truly set-based after all, read Jeff Moden's article here: http://www.sqlservercentral.com/articles/T-SQL/61539/ entitled "Hidden RBAR: Triangular Joins".

 

Resources:

SQL set-based vs row-based.sql
Total article views: 6142 | Views in the last 30 days: 6
 
Related Articles
FORUM

Cursor

cursor

FORUM

Cursor

Cursor Examples Wanted

BLOG

Calculating the next beginning of month and the current end of month

Handling dates is always a fun challenge in T-SQL! Finding the current end of month and next months ...

FORUM

Cursors

How to Update using Cursors?

FORUM

Select all months bewtween two dates

Select all months bewtween two dates

Tags
cursor    
loop    
set-based    
t-sql    
 
Contribute