Writing Faster T-SQL

Introduction

This is the first installment of a series of articles which would look into some of the performance tips that could be applicable to the day-to-day programming life of most SQL Server programmers. Anyone would agree that performance is the key to the success of any application. In this series, I will try to present some of the common scenarios and explain a few alternate approaches which would provide better performance.

This morning I read the article by Wayne Fillis where he presented an excellent explanation of the NOLOCK query optimizer hint. To demonstrate the performance benefits, he populated a table with 5 million records and ran the queries with and without NOLOCK hint. He warns that the TSQL code that generated 5 million rows will take more than an hours on a high-end machine and may take a few hours on a machine with lower configuration.

Many times we need to generate large number of rows to do various performance tests. However, most of the times, it takes a few hours to generate the data that we need. It would be interesting to look into alternate ways to generate data quickly and efficiently. This article presents a very efficient approach to generate large number of rows.

Most of the time, I have seen people writing a loop that runs for N times and executes an INSERT statement. Thus, to generate 5 million rows, you need to execute 5 million INSERT statements. This adds to the major part of the delay that we experience while generating data. If we could use a single INSERT statement that inserts 5 million records at one go, we could gain amazing performance benefits.

Approach

As I mentioned above, we would be using a different approach to generate records. We will use a single INSERT statement that will insert 5 million records to the target table. To do that, essentially, we need to have a SELECT statement that returns the desired number of rows. How do we get into that? The steps below demonstrate that.

The first step is to write a query that returns 10 rows containing numbers 0 to

9. The following TSQL query does that.

     1 SELECT
 0 as Number
     2     
 UNION SELECT 1
     3     
 UNION SELECT 2
     4     
 UNION SELECT 3
     5     
 UNION SELECT 4
     6     
 UNION SELECT 5
     7     
 UNION SELECT 6
     8     
 UNION SELECT 7
     9     
 UNION SELECT 8
    10     
 UNION SELECT 9

Now let us write a query that uses the digits above and generate a result set containing 100 records (0 to 100).

     1 WITH 
 digits AS (
     2     
 SELECT 0 as Number
     3     
 UNION SELECT 1
     4     
 UNION SELECT 2
     5     
 UNION SELECT 3
     6     
 UNION SELECT 4
     7     
 UNION SELECT 5
     8     
 UNION SELECT 6
     9     
 UNION SELECT 7
    10     
 UNION SELECT 8
    11     
 UNION SELECT 9
    12 )
    13 SELECT
 (tens.Number * 10) + ones.Number as Number
    14 FROM 
 digits as tens 
    15 CROSS
 JOIN digits as ones

I am using a CTE (Common Table Expression) to simplify the code above. A CROSS JOIN is used on the previous query and it generates 100 records on the fly. CTE is introduced by SQL Server 2005. You can get the same results in SQL Server 2000 by running the following query.

     1 SELECT
 (tens.Number * 10) + ones.Number as Number
     2 FROM 
 (
     3     
 SELECT 0 as Number
     4     
 UNION SELECT 1
     5     
 UNION SELECT 2
     6     
 UNION SELECT 3
     7     
 UNION SELECT 4
     8     
 UNION SELECT 5
     9     
 UNION SELECT 6
    10     
 UNION SELECT 7
    11     
 UNION SELECT 8
    12     
 UNION SELECT 9
    13 ) as 
 tens 
    14 CROSS
 JOIN (
    15     
 SELECT 0 as Number
    16     
 UNION SELECT 1
    17     
 UNION SELECT 2
    18     
 UNION SELECT 3
    19     
 UNION SELECT 4
    20     
 UNION SELECT 5
    21     
 UNION SELECT 6
    22     
 UNION SELECT 7
    23     
 UNION SELECT 8
    24     
 UNION SELECT 9
    25 ) as 
 ones

The above example clearly shows how useful it is to use a CTE. At the next step, let us generate 10 million records.

     1 WITH 
 digits AS (
     2     
 SELECT 0 as Number
     3     
 UNION SELECT 1
     4     
 UNION SELECT 2
     5     
 UNION SELECT 3
     6     
 UNION SELECT 4
     7     
 UNION SELECT 5
     8     
 UNION SELECT 6
     9     
 UNION SELECT 7
    10     
 UNION SELECT 8
    11     
 UNION SELECT 9
    12 )
    13 SELECT
  
    14       (millions.Number
 * 1000000)
    15     + (hThousands.Number
 * 100000) 
     16      + (tThousands.Number
 * 10000) 
     17      + (thousands.Number
 * 1000) 
     18      + (hundreds.Number
 * 100) 
     19     + (tens.Number
 * 10) 
     20     + ones.Number
 AS Number
     21 
 FROM digits AS ones 
     22 
 CROSS JOIN digits AS tens
     23 
 CROSS JOIN digits AS hundreds
     24 
 CROSS JOIN digits AS thousands
     25 
 CROSS JOIN digits AS tThousands
    26 CROSS
 JOIN digits AS hThousands
    27 CROSS
 JOIN digits AS millions

The above TSQL generates 10 million records on the fly. It may take 1 to 2 minutes depending upon the configuration of your computer. Now that we have the required number of records, we can write the INSERT statement to populate the table that we need.

For the purpose of this example, I would like to take the same example Wayne Fillis presented. I am going to re-write the first example he presented, using the new approach we discussed above. The following query generates 5 million records and inserts them to the ORDERS table.

     1 WITH  
     2 -- first
 CTE which returns 10 rows (0-9)
     3 digits 
 AS (
     4     
 SELECT 0 as Number
     5     
 UNION SELECT 1
     6     
 UNION SELECT 2
     7     
 UNION SELECT 3
     8     
 UNION SELECT 4
     9     
 UNION SELECT 5
    10     
 UNION SELECT 6
    11     
 UNION SELECT 7
    12     
 UNION SELECT 8
    13     
 UNION SELECT 9
    14 )
    15 -- second
 CTE which returns 10 million rows by using
    16 -- a
 CROSS JOIN on the first CTE
    17 , dig 
 AS (
    18     
 SELECT  
    19        
   (millions.Number * 1000000)
    20        
 + (hThousands.Number * 100000) 
    21        
 + (tThousands.Number * 10000) 
    22        
 + (thousands.Number * 1000) 
    23        
 + (hundreds.Number * 100) 
    24        
 + (tens.Number * 10) 
    25        
 + ones.Number AS Number
    26     
 FROM digits AS ones 
    27     
 CROSS JOIN digits AS tens
    28     
 CROSS JOIN digits AS hundreds
    29     
 CROSS JOIN digits AS thousands
    30     
 CROSS JOIN digits AS tThousands
    31     
 CROSS JOIN digits AS hThousands
    32     
 CROSS JOIN digits AS millions
    33 )
    34 -- Third
 CTE which generates a "Product ID" and "Order Date".  
    35 -- Product
 ID is repeated after 500,000 records.  
    36 -- after
 every 1000 records, "order date" moves backward.
    37 , prod 
 AS (
    38 SELECT
 number, (number % 500000) + 1 AS ProductID,
 GETDATE() - (number / 1000) 
 AS OrderDate
    39 FROM 
 dig 
    40 WHERE
 number < 5000000 -- we need only 5 million records
 
    41 )
    42 -- the
 insert statement goes here
    43 INSERT
    Orders (ProductID, OrderDate)
    44     
 SELECT productID, 
    45     
 DATEADD(dd,0, DATEDIFF(dd,0,OrderDate))
 -- strips off "time" from date value
    46     
 FROM prod

The above code runs in 2.45 minutes on my laptop (Toshiba Satellite Pro, Solo Core, 1 GB RAM). It runs much faster

compared to the loop that inserts same number of records in close to 90 minutes.

Conclusions

I referred to the example presented by Wayne for the purpose of demonstration only. His article presented a practical scenario where we need large number of records generated for various performance tests. This article presents a more optimized TSQL code that generates data.

Introduction

Approach

Conclusions

Rate

Share

Categories

Share

Rate

Writing Faster T-SQL

Introduction

Approach

Conclusions

Rate

Share

Categories

Share

Rate

Related content

Uses For Derived Tables

Making Dynamic Queries Static

Optimizing Stored Procedures To Avoid Recompiles

Tip: Testing for the Existence of Data

Introduction to ADO Part 4 - Combining It All

Cookies on SQLServerCentral