SQLServerCentral Article

Writing Faster T-SQL

,

Introduction

This is the first installment of a series of articles which would look into some of the performance tips that could be applicable to the day-to-day programming life of most SQL Server programmers. Anyone would agree that performance is the key to the success of any application. In this series, I will try to present some of the common scenarios and explain a few alternate approaches which would provide better performance.

This morning I read the article by Wayne Fillis where he presented an excellent explanation of the NOLOCK query optimizer hint. To demonstrate the performance benefits, he populated a table with 5 million records and ran the queries with and without NOLOCK hint. He warns that the TSQL code that generated 5 million rows will take more than an hours on a high-end machine and may take a few hours on a machine with lower configuration.

Many times we need to generate large number of rows to do various performance tests. However, most of the times, it takes a few hours to generate the data that we need. It would be interesting to look into alternate ways to generate data quickly and efficiently. This article presents a very efficient approach to generate large number of rows.

Most of the time, I have seen people writing a loop that runs for N times and executes an INSERT statement. Thus, to generate 5 million rows, you need to execute 5 million INSERT statements. This adds to the major part of the delay that we experience while generating data. If we could use a single INSERT statement that inserts 5 million records at one go, we could gain amazing performance benefits.

Approach

As I mentioned above, we would be using a different approach to generate records. We will use a single INSERT statement that will insert 5 million records to the target table. To do that, essentially, we need to have a SELECT statement that returns the desired number of rows. How do we get into that? The steps below demonstrate that.

The first step is to write a query that returns 10 rows containing numbers 0 to

9. The following TSQL query does that.

    1 SELECT

0 as Number

    2    

UNION SELECT 1

    3    

UNION SELECT 2

    4    

UNION SELECT 3

    5    

UNION SELECT 4

    6    

UNION SELECT 5

    7    

UNION SELECT 6

    8    

UNION SELECT 7

    9    

UNION SELECT 8

   10    

UNION SELECT 9

Now let us write a query that uses the digits above and generate a result set containing 100 records (0 to 100). 

    1 WITH

digits AS (

    2    

SELECT 0 as Number

    3    

UNION SELECT 1

    4    

UNION SELECT 2

    5    

UNION SELECT 3

    6    

UNION SELECT 4

    7    

UNION SELECT 5

    8    

UNION SELECT 6

    9    

UNION SELECT 7

   10    

UNION SELECT 8

   11    

UNION SELECT 9

   12 )

   13 SELECT

(tens.Number * 10) + ones.Number as Number

   14 FROM

digits as tens

   15 CROSS

JOIN digits as ones

I am using a CTE (Common Table Expression) to simplify the code above.  A CROSS JOIN is used on the previous query and it generates 100 records on the fly. CTE is introduced by SQL Server 2005. You can get the same results in SQL Server 2000 by running the following query.

    1 SELECT

(tens.Number * 10) + ones.Number as Number

    2 FROM

(

    3    

SELECT 0 as Number

    4    

UNION SELECT 1

    5    

UNION SELECT 2

    6    

UNION SELECT 3

    7    

UNION SELECT 4

    8    

UNION SELECT 5

    9    

UNION SELECT 6

   10    

UNION SELECT 7

   11    

UNION SELECT 8

   12    

UNION SELECT 9

   13 ) as

tens

   14 CROSS

JOIN (

   15    

SELECT 0 as Number

   16    

UNION SELECT 1

   17    

UNION SELECT 2

   18    

UNION SELECT 3

   19    

UNION SELECT 4

   20    

UNION SELECT 5

   21    

UNION SELECT 6

   22    

UNION SELECT 7

   23    

UNION SELECT 8

   24    

UNION SELECT 9

   25 ) as

ones

The above example clearly shows how useful it is to use a CTE. At the next step, let us generate 10 million records.

    1 WITH

digits AS (

    2    

SELECT 0 as Number

    3    

UNION SELECT 1

    4    

UNION SELECT 2

    5    

UNION SELECT 3

    6    

UNION SELECT 4

    7    

UNION SELECT 5

    8    

UNION SELECT 6

    9    

UNION SELECT 7

   10    

UNION SELECT 8

   11    

UNION SELECT 9

   12 )

   13 SELECT

   14       (millions.Number

* 1000000)

   15     + (hThousands.Number

* 100000)

    16      + (tThousands.Number

* 10000)

    17      + (thousands.Number

* 1000)

    18      + (hundreds.Number

* 100)

    19     + (tens.Number

* 10)

    20     + ones.Number

AS Number

    21

FROM digits AS ones

    22

CROSS JOIN digits AS tens

    23

CROSS JOIN digits AS hundreds

    24

CROSS JOIN digits AS thousands

    25

CROSS JOIN digits AS tThousands

   26 CROSS

JOIN digits AS hThousands

   27 CROSS

JOIN digits AS millions

The above TSQL generates 10 million records on the fly. It may take 1 to 2 minutes depending upon the configuration of your computer. Now that we have the required number of records, we can write the INSERT statement to populate the table that we need.

For the purpose of this example, I would like to take the same example Wayne Fillis presented. I am going to re-write the first example he presented, using the new approach we discussed above. The following query generates 5 million records and inserts them to the ORDERS table. 

    1 WITH

    2 -- first

CTE which returns 10 rows (0-9)

    3 digits

AS (

    4    

SELECT 0 as Number

    5    

UNION SELECT 1

    6    

UNION SELECT 2

    7    

UNION SELECT 3

    8    

UNION SELECT 4

    9    

UNION SELECT 5

   10    

UNION SELECT 6

   11    

UNION SELECT 7

   12    

UNION SELECT 8

   13    

UNION SELECT 9

   14 )

   15 -- second

CTE which returns 10 million rows by using

   16 -- a

CROSS JOIN on the first CTE

   17 , dig

AS (

   18    

SELECT

   19        

  (millions.Number * 1000000)

   20        

+ (hThousands.Number * 100000)

   21        

+ (tThousands.Number * 10000)

   22        

+ (thousands.Number * 1000)

   23        

+ (hundreds.Number * 100)

   24        

+ (tens.Number * 10)

   25        

+ ones.Number AS Number

   26    

FROM digits AS ones

   27    

CROSS JOIN digits AS tens

   28    

CROSS JOIN digits AS hundreds

   29    

CROSS JOIN digits AS thousands

   30    

CROSS JOIN digits AS tThousands

   31    

CROSS JOIN digits AS hThousands

   32    

CROSS JOIN digits AS millions

   33 )

   34 -- Third

CTE which generates a "Product ID" and "Order Date".

   35 -- Product

ID is repeated after 500,000 records.

   36 -- after

every 1000 records, "order date" moves backward.

   37 , prod

AS (

   38 SELECT

number, (number % 500000) + 1 AS ProductID,

GETDATE() - (number / 1000)

AS OrderDate

   39 FROM

dig

   40 WHERE

number < 5000000 -- we need only 5 million records

🙂

   41 )

   42 -- the

insert statement goes here

   43 INSERT

   Orders (ProductID, OrderDate)

   44    

SELECT productID,

   45    

DATEADD(dd,0, DATEDIFF(dd,0,OrderDate))

-- strips off "time" from date value

   46    

FROM prod

The above code runs in 2.45 minutes on my laptop (Toshiba Satellite Pro, Solo Core, 1 GB RAM). It runs much faster

compared to the loop that inserts same number of records in close to 90 minutes.

Conclusions

I referred to the example presented by Wayne for the purpose of demonstration only. His article presented a practical scenario where we need large number of records generated for various performance tests. This article presents a more optimized TSQL code that generates data.

Rate

4.41 (17)

Share

Share

Rate

4.41 (17)