SQLServerCentral Article

Cursor-Killing: Retrieving Recently Modified Data

,

Problem

A common use case for cursors or WHILE loops is when we have recently inserted data and want to get back the identity value immediately.  This is typically facilitated by executing our INSERT, and then using SCOPE_IDENTITY() in order to retrieve our identity and pass it into whatever TSQL or code will require it.

For a few rows, this is an acceptable solution, but for a potentially large data set, this can be a major performance hog.  Inserting one row at a time, grabbing the identity, and then repeating this process over and over again means that we need to access our data over and over again.

Consider a simple cursor-based example such as this one:

USE AdventureWorks
GO
DECLARE @product_id INT
DECLARE @new_id INT
DECLARE product_cursor CURSOR FOR
SELECT
       ProductID
FROM Production.Product
WHERE MakeFlag = 1
AND ListPrice > 2000
AND Name NOT LIKE '%New%'
OPEN product_cursor
FETCH NEXT FROM product_cursor
INTO @product_id
WHILE @@FETCH_STATUS = 0
BEGIN
       INSERT INTO Production.Product
         (    Name ,
              ProductNumber ,
              MakeFlag ,
              FinishedGoodsFlag ,
              Color ,
              SafetyStockLevel ,
              ReorderPoint ,
              StandardCost ,
              ListPrice ,
              Size ,
              SizeUnitMeasureCode ,
              WeightUnitMeasureCode ,
              Weight ,
              DaysToManufacture ,
              ProductLine ,
              Class ,
              Style ,
              ProductSubcategoryID ,
              ProductModelID ,
              SellStartDate ,
              SellEndDate ,
              DiscontinuedDate ,
              rowguid ,
              ModifiedDate)
       SELECT
              Name + 'New',
              ProductNumber + '-N',
              MakeFlag ,
              FinishedGoodsFlag ,
              Color ,
              SafetyStockLevel ,
              ReorderPoint ,
              StandardCost ,
              ListPrice ,
              Size ,
              SizeUnitMeasureCode ,
              WeightUnitMeasureCode ,
              Weight ,
              DaysToManufacture ,
              ProductLine ,
              Class ,
              Style ,
              ProductSubcategoryID ,
              ProductModelID ,
              SellStartDate ,
              SellEndDate ,
              DiscontinuedDate ,
              (SELECT NEWID()) ,
              ModifiedDate
       FROM Production.Product
       WHERE ProductID = @product_id
       SELECT @new_id = SCOPE_IDENTITY()
       FETCH NEXT FROM product_cursor
       INTO @product_id
END
CLOSE product_cursor
DEALLOCATE product_cursor

Above, we create a cursor for the rows we wish to insert, loop through each row in order to collect our new ProductID, and then do it all over again.  A look at the execution plan shows operation after operation: a long string of inserts, each with an identical execution plan.  Clearly, this isn’t scalable.

Solution

Starting in SQL Server 2005, we were given the ability to use the OUTPUT clause in order to retrieve modified data on-the-fly, with no need for iteration.  Using this, we can place an OUTPUT INSERTED or OUTPUT DELETED in the DML statement in question and get ALL of the changed data that we want back, whether it is an identity or not.

To solve the dilemma above, we can use OUTPUT INSERTED to pull back the new ProductID as well as the Name:

DECLARE @temp_products TABLE
(productID INT, Name NVARCHAR(50))
INSERT INTO Production.Product
  (    Name ,
       ProductNumber ,
       MakeFlag ,
       FinishedGoodsFlag ,
       Color ,
       SafetyStockLevel ,
       ReorderPoint ,
       StandardCost ,
       ListPrice ,
       Size ,
       SizeUnitMeasureCode ,
       WeightUnitMeasureCode ,
       Weight ,
       DaysToManufacture ,
       ProductLine ,
       Class ,
       Style ,
       ProductSubcategoryID ,
       ProductModelID ,
       SellStartDate ,
       SellEndDate ,
       DiscontinuedDate ,
       rowguid ,
       ModifiedDate)
OUTPUT INSERTED.ProductID, INSERTED.Name
INTO @temp_products
SELECT
       Name + 'New',
       ProductNumber + '-N',
       MakeFlag ,
       FinishedGoodsFlag ,
       Color ,
       SafetyStockLevel ,
       ReorderPoint ,
       StandardCost ,
       ListPrice ,
       Size ,
       SizeUnitMeasureCode ,
       WeightUnitMeasureCode ,
       Weight ,
       DaysToManufacture ,
       ProductLine ,
       Class ,
       Style ,
       ProductSubcategoryID ,
       ProductModelID ,
       SellStartDate ,
       SellEndDate ,
       DiscontinuedDate ,
       (SELECT NEWID()) ,
       ModifiedDate
FROM Production.Product
WHERE MakeFlag = 1
AND ListPrice > 2000
SELECT
       *
FROM @temp_products

Note that the data we collect from our OUTPUT can be stored in a temp table or table variable, as shown above, or returned to the application by leaving out the INTO clause altogether.  In addition, we can use OUTPUT on a scalar value insert as well, if you wanted to INSERT INTO table(…)VALUES(…).

In terms of performance, we have replaced a huge batch of individual statements with a single SQL statement, greatly reducing our overhead, not to mention the sheer complexity of our execution plan.

Solution #2

There is an additional use case not covered above that can pose an additional challenge to us.  What if we want to return data beyond what is contained in INSERTED or DELETED?  What if we also wanted to return additional columns from other tables that we joined as part of a query?  The syntax for OUTPUT does not allow us to include data from other tables (though we can include scalar variables in here if we wish).  How do we mix & match a variety of data all in one SQL statement?

Here is what we might try to do if we started down this route with no knowledge of the limitations of OUTPUT:

DECLARE @temp_products TABLE
(productID INT, Name NVARCHAR(50), ProductModelName NVARCHAR(50))
INSERT INTO Production.Product
  (    Name ,
       ProductNumber ,
       MakeFlag ,
       FinishedGoodsFlag ,
       Color ,
       SafetyStockLevel ,
       ReorderPoint ,
       StandardCost ,
       ListPrice ,
       Size ,
       SizeUnitMeasureCode ,
       WeightUnitMeasureCode ,
       Weight ,
       DaysToManufacture ,
       ProductLine ,
       Class ,
       Style ,
       ProductSubcategoryID ,
       ProductModelID ,
       SellStartDate ,
       SellEndDate ,
       DiscontinuedDate ,
       rowguid ,
       ModifiedDate)
OUTPUT INSERTED.ProductID, INSERTED.Name, ProductModel.Name
INTO @temp_products
SELECT
       Product.Name + 'New',
       ProductNumber + '-N',
       MakeFlag ,
       FinishedGoodsFlag ,
       Color ,
       SafetyStockLevel ,
       ReorderPoint ,
       StandardCost ,
       ListPrice ,
       Size ,
       SizeUnitMeasureCode ,
       WeightUnitMeasureCode ,
       Weight ,
       DaysToManufacture ,
       ProductLine ,
       Class ,
       Style ,
       ProductSubcategoryID ,
       Product.ProductModelID ,
       SellStartDate ,
       SellEndDate ,
       DiscontinuedDate ,
       (SELECT NEWID()) ,
       Product.ModifiedDate
FROM Production.Product
LEFT JOIN Production.ProductModel
ON ProductModel.ProductModelID = Product.ProductModelID
WHERE MakeFlag = 1
AND ListPrice > 2000
SELECT
       *
FROM @temp_products

In my attempt to return the product model name, I throw an unlikely error:

The multi-part identifier "ProductModel.Name" could not be bound.

It turns out that the OUTPUT clause doesn’t allow us to pull data from additional tables as we may have thought it could.  OUTPUT only has access to INSERTED and DELETED, but not to any other tables that are joined to below.

The answer is the MERGE statement, which was added in SQL Server 2008.  MERGE allows us to update, insert, and delete data all at once, and is quite efficient at doing so.  Let’s take our failed attempt to get this data and rewrite it using MERGE:

DECLARE @temp_products TABLE
(productID INT, Name NVARCHAR(50), ProductModelName NVARCHAR(50))
MERGE INTO Production.Product AS target
USING (SELECT Product.*, ProductModel.Name AS ModelName FROM Production.Product
LEFT JOIN Production.ProductModel
ON ProductModel.ProductModelID = Product.ProductModelID
WHERE MakeFlag = 1
AND ListPrice > 2000) AS source
ON (1 = 2)
WHEN NOT MATCHED THEN
  INSERT (Name ,
          ProductNumber ,
          MakeFlag ,
          FinishedGoodsFlag ,
          Color ,
          SafetyStockLevel ,
          ReorderPoint ,
          StandardCost ,
          ListPrice ,
          Size ,
          SizeUnitMeasureCode ,
          WeightUnitMeasureCode ,
          Weight ,
          DaysToManufacture ,
          ProductLine ,
          Class ,
          Style ,
          ProductSubcategoryID ,
          ProductModelID ,
          SellStartDate ,
          SellEndDate ,
          DiscontinuedDate ,
          rowguid ,
          ModifiedDate)
       VALUES (
                     source.Name + 'New',
                     source.ProductNumber + '-N',
                     source.MakeFlag ,
                     source.FinishedGoodsFlag ,
                     source.Color ,
                     source.SafetyStockLevel ,
                     source.ReorderPoint ,
                     source.StandardCost ,
                     source.ListPrice ,
                     source.Size ,
                     source.SizeUnitMeasureCode ,
                     source.WeightUnitMeasureCode ,
                     source.Weight ,
                     source.DaysToManufacture ,
                     source.ProductLine ,
                     source.Class ,
                     source.Style ,
                     source.ProductSubcategoryID ,
                     source.ProductModelID ,
                     source.SellStartDate ,
                     source.SellEndDate ,
                     source.DiscontinuedDate ,
                     (SELECT NEWID()) ,
                     source.ModifiedDate)
       OUTPUT INSERTED.ProductID,
              INSERTED.Name,
              source.ModelName
       INTO @temp_products;

In this example, we can use OUTPUT to get back any columns from the source defined above.  This includes the table we are inserting to or any tables we are joining into.  The “ON (1 = 2)” is used as a placeholder as we are not actually comparing our source and target.  If we wanted to perform a conditional insert, we could change that to any comparison and act on it when MATCHED or NOT MATCHED.  Regardless of the complexity of the MERGE statement, the functionality to output any columns from DELETED, INSERTED, or SOURCE is preserved, not to mention very useful.

Conclusion

The ability to retrieve, store, and return data as soon as it is modified is critical to many software applications.  For a very small data set, an iterative solution may be adequate, but when we are working with larger row counts, performance will become a significant problem.  Using OUTPUT as part of your UPDATE, INSERT, or DELETE statements can allow you make changes and return those changes in a single set-based step, greatly improving efficiency.  In addition, using MERGE can allow you to craft even more flexible TSQL by defining a source that can also be OUTPUT from.  Using either of these features can eliminate the need for WHILE loops or cursors while greatly improving the performance and scalability of your code!

Rate

4.88 (42)

You rated this post out of 5. Change rating

Share

Share

Rate

4.88 (42)

You rated this post out of 5. Change rating