SQLServerCentral Article

Optimizing a cursor based routine – Part 1

,

When presented with a poorly performing stored procedure, there can be a one single statement that is to be optimized. However, more often than not this is not the case. There will be many statements each of which may perform well in isolation; the issue is the amount of times that they are executed. It’s a “Death by a thousand cuts scenario”.

SQL is a language designed to work with large datasets and SQL Server works best when given instructions that exploit that. Procedural programming techniques such as cursors, looping, multi-line table valued functions (not to be confused with Inline table valued functions), and scalar UDF's, can limit the efficiency of a stored procedure. If you able to work around these, the results can be astounding.

Here, we have a relatively simple stored procedure that produces a sales report from the AdventureWorks database. I have created it echoing the common issues that I see posted on various forums. The code is designed to run against the 2008R2 release of adventure works. This can be obtained from http://msftdbprodsamples.codeplex.com/releases/view/55926 , the executable file name is AdventureWorks2008R2_SR1.exe.

 CREATE PROCEDURE GetSalesReport
 AS
 DECLARE @SalesOrderID INTEGER
 DECLARE @Status TINYINT
 DECLARE @OrderDate DATE,@DueDateDATE,@ShipDateDATE
 DECLARE @CustomerID INTEGER
 DECLARE @CustomerMinOrderDate DATE
 DECLARE @CustomerMaxOrderDate DATE
 DECLARE @OrderCount INTEGER
 DECLARE @OrderTotal1 MONEY,@POrder1VARCHAR(50)
 DECLARE @OrderTotal2 MONEY,@POrder2VARCHAR(50)
 DECLARE @OrderTotal3 MONEY,@POrder3VARCHAR(50)
 DECLARE @OrderTotal4 MONEY,@POrder4VARCHAR(50)
 DECLARE @TotalDue MONEY,@PorderVARCHAR(50)
 DECLARE @ShippedTotal MONEY
 DECLARE @CustomerTotalDue MONEY
 DECLARE @CustomerName NVARCHAR(255)
 DECLARE @CustomerType NCHAR(1)
 DECLARE @Today        DATE
 SELECT @Today = '20040724'-- GETDATE() would normally be used here.
 CREATE TABLE #ReportResults
 (
 CustomerId  INTEGER NOT NULL,
 CustomerName NVARCHAR(255)NOT NULL,
 TotalDue MONEY,
 ShippedTotal MONEY,
 OrderDate DATE NULL,
 DueDate DATE NULL,
 ShipDate DATE NULL,
 CustomerMinOrderDate DATE NULL,
 CustomerMaxOrderDate DATE NULL,
 POrder1 VARCHAR(50)NULL,OrderTotal1 MONEY NULL,
 POrder2 VARCHAR(50)NULL,OrderTotal2 MONEY NULL,
 POrder3 VARCHAR(50)NULL,OrderTotal3 MONEY NULL,
 POrder4 VARCHAR(50)NULL,OrderTotal4 MONEY NULL
 )
 DECLARE CustomerCur CURSOR STATIC READ_ONLY
    FOR SELECT CustomerId,CustomerType
              FROM Sales.Customer
 OPEN CustomerCur
 WHILE(0=0)BEGIN
   FETCH NEXT FROM CustomerCur INTO @CustomerId,@CustomerType
   IF(@@FETCH_STATUS<>0)BREAK
   SELECT @OrderCount =1
   SELECT @OrderDate =NULL,
         @DueDate   =NULL,
         @ShipDate  =NULL
   IF(@CustomerType ='I')BEGIN /* Individual */      SELECT @CustomerName =FirstName+' '+ LastName
        FROM Person.Contact
        JOIN Sales.Individual
         ON Contact.ContactID= Individual.ContactID
           WHERE Individual.CustomerID= @CustomerId
   END ELSE BEGIN /* Store */      SELECT @CustomerName =Name
            FROM Sales.Store
           WHERE CustomerID =@CustomerId
   END
   SELECT TOP(1)
         @OrderDate = OrderDate,
         @DueDate = DueDate,
         @ShipDate = ShipDate
    FROM sales.SalesOrderHeader
    WHERE CustomerID =@CustomerID
      AND DueDate>=@Today
    ORDER BY DueDate
   SELECT @CustomerMinOrderDate = MIN(OrderDate),
         @CustomerMaxOrderDate = MAX(OrderDate)
     FROM sales.SalesOrderHeader
    WHERE CustomerID =@CustomerID
   SELECT @CustomerTotalDue = SUM(TotalDue)
     FROM sales.salesorderheader
    WHERE CustomerID =@CustomerID
   SELECT @ShippedTotal =SUM(totalDue)
     FROM sales.SalesOrderHeader
    WHERE CustomerID =@CustomerID
      AND Status= 5
   SELECT @OrderTotal1 =NULL, @POrder1 =NULL,
              @OrderTotal2 =NULL, @POrder2 =NULL,
              @OrderTotal3 =NULL, @POrder3 =NULL,
              @OrderTotal4 =NULL, @POrder4 =NULL
   SELECT @OrderCount =1
   DECLARE PrevSales CURSOR STATICREAD_ONLY
       FOR SELECT PurchaseOrderNumber,TotalDue
                FROM sales.salesOrderHeader
                  WHERE CustomerId = @CustomerID
                  ORDER BY TotalDue DESC
   OPEN PrevSales
   FETCH NEXT FROM PrevSales INTO@POrder,@TotalDue
   WHILE(@@FETCH_STATUS =0)BEGIN
          IF(@OrderCount = 1 ) BEGIN
             SELECT @OrderTotal1 =@TotalDue,@POrder1= @POrder
          END
          IF(@OrderCount = 2 ) BEGIN
             SELECT@OrderTotal2 = @TotalDue,@POrder2= @POrder
          END
          IF(@OrderCount = 3 ) BEGIN
             SELECT @OrderTotal3 =@TotalDue,@POrder3= @POrder
          END
          IF(@OrderCount = 4 ) BEGIN
             SELECT @OrderTotal4 =@TotalDue,@POrder4= @POrder
          END
          FETCH NEXT FROM PrevSales INTO @POrder,@TotalDue
          SELECT@OrderCount=@OrderCount+1
   END
   CLOSE PrevSales
   DEALLOCATE PrevSales
   INSERT INTO#ReportResults(CustomerId,CustomerName,
                             TotalDue,ShippedTotal,
                             OrderDate,DueDate,ShipDate,
                             CustomerMinOrderDate,CustomerMaxOrderDate,
                             POrder1,POrder2,POrder3,POrder4,
                             OrderTotal1,OrderTotal2,OrderTotal3,OrderTotal4)
                    VALUES(@CustomerId,@CustomerName,
                           @CustomerTotalDue,@ShippedTotal,
                            @OrderDate,@DueDate,@ShipDate,
                          @CustomerMinOrderDate,@CustomerMaxOrderDate,
                          @POrder1,@POrder2,@POrder3,@POrder4,
                           @OrderTotal1,@OrderTotal2,@OrderTotal3,@OrderTotal4)
 END
 CLOSE CustomerCur
 DEALLOCATE Customercur
 SELECT * FROM #ReportResults

Here are the timings, in milliseconds, of 5 runs of the above procedure.

ProcedureVersion Run 1 Run 2 Run 3 Run 4 Run 5 Average Ms
0 12,413 12,496 12,716 14,143 13,873 13,128

If we were to analyse each individual statement here, we would conclude that we could potentially add an index or two onto SalesOrderHeader to support the order by clauses and to remove bookmark lookups and save some resources there.

But, is there anything else that can be done? As stated before, this routine is simply doing far too much work.

In a procedural language such as C#, VB , C etc.., this kind of looping process is the simplest way for humans to relate multiple sets of data, and it is ingrained into our ''developmental' thoughts very early in our careers. There is nothing wrong, with this per-se but within SQL we have much bigger and better tools readily available to relate datasets and we have to recode our SQL into a state which will enable the optimizer to fully utilize ( and/or even consider) its full range of functionality. Dare I quote Yoda? “You must unlearn, what you have learned”. It’s a bit of a cliché but a statement that does apply very well here.

By using a procedural technique, not only is the order of the data operations hard coded but so has the data-access method that SQL Server will be using. Every operation in the code example is effectively a nested loop operation, which are only generally effective for small datasets, but not large ones. ‘Small’ and ‘large’ are relative terms and rather than a strict number of rows being defined as large or small, we are talking about the relationships between tables. If we are joining 100% of two tables together then a hash or merge join would be more appropriate. Not only that, but parallelism cannot be considered as a possible cost/time saving measure when such looping comes into play.

I would visualise this as running a haulage company and only carrying one package in your van at any one time. To run an efficient operation you need to fully load your van AND plan an efficient route around your drop-off/collection points. If you don’t valuable time and money will be wasted.

So, starting from the inside and working out, let’s see how much of the code we can change / replace to create a better performing set based solution.

The first piece of code to be addressed is:

SELECT @CustomerTotalDue = SUM(TotalDue)
  FROM sales.salesorderheader
 WHERE CustomerID =@CustomerID
SELECT @ShippedTotal = SUM(totalDue)
  FROM sales.SalesOrderHeader
 WHERE CustomerID =@CustomerID
   AND Status= 5

What we have here are two very similar pieces of code reading from the same table, Sales.SalesOrderHeader, the only difference being the filter of “Status =5” to sum up the TotalDue of the orders that have shipped.

Note that at present all the orders in the table have a status of 5, so that”Select count(*) from SalesOrderHeader” and “select count(*) from SalesOrderHeader” where status =5 give the same result. This means that @CustomerTotalDue and @ShippedTotal will have the value.

I’m of the opinion that a piece data should be read only once and every read should be for a reason. In the code snippets above, we are instigating two reads of data when one would be perfectly adequate. Generally speaking, we should be concerned with minimizing the amount of data read as THAT is the most expensive operation in that code even if the data is held within cache. To minimise the amount of IO used by the queries, we shall combine the two SUM operations into a single query. Instead of filtering the rows where status =5 in the WHERE clause, a case statement will be used in the sum aggregate to meet the same needs. Additionally, for reasons that we shall see at a later point, we will create an inline table valued function (ITVF).

CREATE FUNCTION SumCustomerTotals(@CustomerId INTEGER)
 RETURNS TABLE
 AS
 RETURN
 (
  SELECT CustomerTotalDue = SUM(TotalDue),
         ShippedTotal =SUM(CASE WHEN status =5 THEN TotalDue ELSE 0 END)
    FROM sales.salesorderheader
   WHERE CustomerID =@CustomerID
 )

Imagine the scenario where this is the only code executed; We have doubled the performance of our system in one easy stroke.

Using a case statement within an aggregate can be a shortcut for multiple count aggregates, as well. Consider the scenario where we wish to count the total number of rows for a specific customer and the number of rows for that same customer that meet a certain condition, say where TotalDue >= 50; Traditionally we would write code such as:

SELECT CustomerID,
       COUNT(*) AS  TotalCount
  FROM sales.SalesOrderHeader
 WHERE CustomerID = 11185
 GROUP BY CustomerID
SELECT CustomerID,
       COUNT(*) AS  HighCost
  FROM sales.SalesOrderHeader
 WHERE CustomerID = 11185
   AND TotalDue>= 50
 GROUP BY CustomerID

Once again two operations are used where we can use the case statement to help combine them into one. By wrapping a case expression that will return 1 if true and 0 if false and then wrapping that within a sum aggregate expression, we can bring the two separate table lookups together into one.

select CustomerID,
       COUNT(*) as  TotalCount,
       SUM(case when TotalDue>=50 then 1 else 0 end)as HighCost
  from sales.SalesOrderHeader
 where CustomerID = 11185
 group by CustomerID

But, I digress.

To execute the SumCustomerTotals function that we built previously, call it like this :

Select @NonSpecialOfferTotal= NonSpecialOfferTotal,
        @Total =Total
   from dbo.SumCustomerTotals(@SalesOrderId)

A similar technique can be applied to the PrevSales cursor loop. Here, we wish to assign the purchase order number and the total amount of the 4 orders with the highest total due into 4 sets of variables.

DECLARE PrevSales CURSOR STATICREAD_ONLY
      FOR SELECT PurchaseOrderNumber,TotalDue
                FROM sales.salesOrderHeader
                  WHERE CustomerId = @CustomerID
                  ORDER BY TotalDue DESC
   OPEN PrevSales
   FETCH NEXT FROM PrevSales INTO@POrder,@TotalDue
   WHILE(@@FETCH_STATUS =0)BEGIN
          IF(@OrderCount = 1 ) BEGIN
             SELECT @OrderTotal1 =@TotalDue,@POrder1= @POrder
          END
          IF(@OrderCount = 2 ) BEGIN
             SELECT @OrderTotal2 =@TotalDue,@POrder2= @POrder
          END
          IF(@OrderCount = 3 ) BEGIN
             SELECT @OrderTotal3 =@TotalDue,@POrder3= @POrder
          END
          IF(@OrderCount = 4 ) BEGIN
             SELECT @OrderTotal4 =@TotalDue,@POrder4= @POrder
          END
          FETCH NEXT FROM PrevSales INTO @POrder,@TotalDue
          SELECT@OrderCount=@OrderCount+1
   END
   CLOSE PrevSales
   DEALLOCATE PrevSales

What the user has attempted to achieve is a pivot operation. Personally, I have found the PIVOT operator itself quite cumbersome, not to mention the fact that a single PIVOT statement does not support multiple pivoted columns, and so I prefer a “Manual pivot” which is also known as a “Cross Tab”.

Firstly we need a value to pivot by. If the data for such a value does not already exist within the table(s) we’re pivoting, as it does not in this case, we can create it by using the ROW_NUMBER() ranking function. We can then use this new value in a case statement to assign a specific value to a specific column, and then utilize the MIN aggregate function to create a single row to be assigned to the variables. Again, we shall create an Inline table valued function

CREATE FUNCTION GetTop4DueOrders(@CustomerId INTEGER)
 RETURNS TABLE
 AS
 RETURN
 (
 WITH cteTop4
 AS
 (
 SELECT TOP(4)
       PurchaseOrderNumber,
       TotalDue,
       ROW_NUMBER()OVER(ORDER BY TotalDue DESC)AS RowN
 FROM  sales.salesOrderHeader
 WHERE CustomerId = @CustomerID
 ORDER BY TotalDue DESC
 )
 SELECT MIN(CASE WHEN RowN = 1 THEN PurchaseOrderNumber END)AS POrder1,
       MIN(CASE WHEN RowN = 1 THEN TotalDue END)AS TotalDue1,
       MIN(CASE WHEN RowN = 2 THEN PurchaseOrderNumber END)AS POrder2,
       MIN(CASE WHEN RowN = 2 THEN TotalDue END)AS TotalDue2,
       MIN(CASE WHEN RowN = 3 THEN PurchaseOrderNumber END)AS POrder3,
       MIN(CASE WHEN RowN = 3 THEN TotalDue END)AS TotalDue3,
       MIN(CASE WHEN RowN = 4 THEN PurchaseOrderNumber END)AS POrder4,
       MIN(CASE WHEN RowN = 4 THEN TotalDue END)AS TotalDue4
  FROM ctetop4
 )
GO

The statement to get the min and max order dates for the Customer can, for the time being, be simply wrapped inside an ITVF, as well.

Create Function GetMinMaxOrderDates(@CustomerIDinteger)
 returns table
 as
 return
 (
 Select CustomerMinOrderDate = MIN(OrderDate),
       CustomerMaxOrderDate = MAX(OrderDate)
  from sales.SalesOrderHeader
 where CustomerID =@CustomerID
 )
Once these functions are folded into the procedure we are left with:
 CREATE PROCEDURE GetSalesReport_Optimize1
 AS
 DECLARE @OrderDate DATE,@DueDateDATE,@ShipDateDATE
 DECLARE @CustomerID INTEGER
 DECLARE @CustomerMinOrderDate DATE
 DECLARE @CustomerMaxOrderDate DATE
 DECLARE @CustomerName  NVARCHAR(255)
 DECLARE @CustomerType  NCHAR(1)
 DECLARE @OrderCount INTEGER
 DECLARE @OrderTotal1 MONEY,@POrder1VARCHAR(50)
 DECLARE @OrderTotal2 MONEY,@POrder2VARCHAR(50)
 DECLARE @OrderTotal3 MONEY,@POrder3VARCHAR(50)
 DECLARE @OrderTotal4 MONEY,@POrder4VARCHAR(50)
 DECLARE @TotalDue MONEY,@PorderVARCHAR(50)
 DECLARE @CustomerTotalDue MONEY,@ShippedTotal MONEY
 DECLARE @Today        DATE
 SELECT @Today = '20040724'-- GETDATE()
 CREATE TABLE #ReportResults
 (
 CustomerId  INTEGER NOT NULL,
 CustomerName NVARCHAR(255)NOT NULL,
 TotalDue MONEY NULL,
 ShippedTotal MONEY NULL,
 OrderDate DATE NULL,
 DueDate DATE NULL,
 ShipDate DATE NULL,
 CustomerMinOrderDate DATE NULL,
 CustomerMaxOrderDate DATE NULL,
 POrder1 VARCHAR(50)NULL,OrderTotal1 MONEY NULL,
 POrder2 VARCHAR(50)NULL,OrderTotal2 MONEY NULL,
 POrder3 VARCHAR(50)NULL,OrderTotal3 MONEY NULL,
 POrder4 VARCHAR(50)NULL,OrderTotal4 MONEY NULL
 )
 DECLARE CustomerCur CURSOR FOR
 SELECT Customer.CustomerId,CustomerwType
  FROM Sales.Customer
 OPEN CustomerCur
 WHILE(0=0)BEGIN
   FETCH NEXT FROM CustomerCur INTO @CustomerId,@CustomerType
   IF(@@FETCH_STATUS<>0)BREAK
   SELECT @OrderDate =NULL,
         @DueDate   =NULL,
         @ShipDate  =NULL
   IF(@CustomerType ='I')BEGIN /* Individual */      SELECT @CustomerName =FirstName+' '+ LastName
        FROM Person.Contact
        JOIN Sales.Individual
         ON Contact.ContactID= Individual.ContactID
           WHERE Individual.CustomerID= @CustomerId
   END ELSE BEGIN /* Store */      SELECT @CustomerName =Name
            FROM Sales.Store
           WHERE CustomerID =@CustomerId
   END
   SELECT @OrderDate =OrderDate,
         @DueDate   =DueDate,
         @ShipDate  =ShipDate
    FROM dbo.GetNextOrderDue(@CustomerId,@ToDay)
   SELECT @CustomerMinOrderDate = CustomerMinOrderDate,
         @CustomerMaxOrderDate = CustomerMaxOrderDate
    FROM dbo.GetMinMaxOrderDates(@CustomerID)
   SELECT @OrderTotal1 =TotalDue1,@POrder1= POrder1,
         @OrderTotal2 = TotalDue2,@POrder1= POrder2,
         @OrderTotal3 = TotalDue3,@POrder1= POrder3,
         @OrderTotal4 = TotalDue4,@POrder1= POrder4
         FROM dbo.GetTop4DueOrders(@CustomerID)PurchOrders
   SELECT @CustomerTotalDue = CustomerTotalDue,
         @ShippedTotal     = ShippedTotal
    FROM dbo.SumCustomerTotals(@CustomerId)
   INSERT INTO#ReportResults(CustomerId,CustomerName,TotalDue,ShippedTotal,
                             OrderDate,DueDate,ShipDate,
                             CustomerMinOrderDate,CustomerMaxOrderDate,
                             POrder1,POrder2,POrder3,POrder4,
                             OrderTotal1,OrderTotal2,OrderTotal3,OrderTotal4)
                    VALUES(@CustomerId,@CustomerName,@CustomerTotalDue,@ShippedTotal,
                           @OrderDate,@DueDate,@ShipDate,
                            @CustomerMinOrderDate,@CustomerMaxOrderDate,
                           @POrder1,@POrder2,@POrder3,@POrder4,
                           @OrderTotal1,@OrderTotal2,@OrderTotal3,@OrderTotal4)
 END
 CLOSE CustomerCur
 DEALLOCATE CustomerCur
 SELECT * FROM #ReportResults

Now things are starting to look a bit cleaner. Let’s see how the timings are shaping up:

Version 1 2 3 4 5 Average % of Orig
0 12,413 12,496 12,716 14,143 13,873 13,128
1 10,496 10,643 10,656 11,340 11,540 10,935 83%

Already a noticeable 17% drop in execution time.

That’s as far as I want to take it for the first half of this series. Next time we shall continue the process of unpicking the report and creating alternative code to improve performance.

Many thanks must go to the Sql Server community in general, but more specifically to those who have given their free time to proof read these articles.

Resources

Rate

3.95 (21)

You rated this post out of 5. Change rating

Share

Share

Rate

3.95 (21)

You rated this post out of 5. Change rating