Writing Faster T-SQL

  • Comments posted here are about the content posted at http://www.sqlservercentral.com/columnists/jSebastian/2944.asp

    .

  • Jacob,

    Your idea is great but your code is too complex for what it does and it takes too long to run... for example, you say the code to produce a million numbers takes a minute or two to run... the following does the same thing and only takes 21 seconds to run...

    --===== Create and populate the test table on the fly

     SELECT TOP 10000000

            IDENTITY(INT,1,1) AS Number

       INTO dbo.TestTable

       FROM Master.dbo.SysColumns sc1,

            Master.dbo.SysColumns sc2

    ...and still you end up with nothing real to test on.

    Thought I'd take this opportunity to show you the test table I use for performance testing on a great number of different methods... the basis of the code can be used to gen just about any test scenario you please... and it's stupid simple...

    --===== Create and populate a test table.

         -- Column "SomeDate" has a range of  >=01/01/2000  <01/01/2010

     SELECT TOP 10000000

            RowNum     = IDENTITY(INT,1,1),

            SomeInt    = CAST(RAND(CAST(NEWID() AS VARBINARY))*50000+1 AS INT),

            SomeString = CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*25+65))

                       + CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*25+65)),

            SomeCSV    = 'Part1,Part2,Part3,Part4,Part5,Part6,Part7,Part8,Part9,Part10',

            SomeNumber = CAST(RAND(CAST(NEWID() AS VARBINARY))*100 AS MONEY),

            SomeDate   = CAST(RAND(CAST(NEWID() AS VARBINARY))*3653.0+36524.0 AS DATETIME)

       INTO dbo.JBMTest

       FROM Master.dbo.SysColumns sc1,

            Master.dbo.SysColumns sc2 --Lack of join criteria makes this a CROSS-JOIN

    --===== Add a primary key just to make it a "real" table...

      ALTER TABLE dbo.JBMTest

            ADD PRIMARY KEY CLUSTERED (RowNum)

    Of course, you can easily change the number of rows and, with a bit of forethought, you can change the ranges of numbers/values produced by simply changing some of the multipliers or additive offsets.

    The above code will gen a million rows (including the Clustered PK) in about 43 seconds and 10 million rows in a bit over 8 minutes.

    Like I said, I really appreciate your efforts in your article, but there are easier ways to gen volumes of test data...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff,

    Thanks for sharing your code. This is indeed a nice approach.

    If you look at my last example, I had said 2.45 minutes for 5 million rows, which also include an applicaton specific logic used for the generated records. I used a CTE to present the idea that rows can be really generated ON THE FLY. If we use a physical table, instead of a CTE, the query be would much faster.

    I appreciate your feedback.

    After the introduction of Visual Studio Team Edition For Database Professionals widely known as dbpro or datadude, test data generation has been much easier and productive. dbpro has tools that  can generate test data automatically for the unit-tests that you write.

    Thanks again

    Jacob

    .

  • Heh... Thanks Jacob...

    Ok... apples to apples on the last bit of code... the following takes 1:09 including the PK.

    --===== Create and populate a test table.

         -- Column "SomeDate" has a range of  >=01/01/2000  <01/01/2010

     SELECT TOP 5000000

            RowNum     = IDENTITY(INT,1,1),

            SomeInt    = CAST(RAND(CAST(NEWID() AS VARBINARY))*50000+1 AS INT),

            SomeDate   = CAST(FLOOR(RAND(CAST(NEWID() AS VARBINARY))*3653.0+36524.0) AS DATETIME)

       INTO dbo.JBMTest

       FROM Master.dbo.SysColumns sc1,

            Master.dbo.SysColumns sc2 --Lack of join criteria makes this a CROSS-JOIN

    --===== Add a PK

      ALTER TABLE dbo.JBMTest

            ADD PRIMARY KEY CLUSTERED (RowNum)

    Again, you can tweek the ranges of data to anything you'd like.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff,

    I agree with you. You won 🙂

    Jacob

    .

  • I thought the article was badly named - should have been something 'Building Test Data Faster with TSQL' or something like that.

  • Nice article!

    It is a pretty good example of CTE.... if you think about some other logic in the CTE other than just creating 10, maybe something more complex.... it would be much faster than using a loop....

     

    Mark

  • Actually, I'm sorry... I normally don't turn things into a race.  Just meant to show an alternative. 

    No matter what method you use, your article hits upon something that a lot of developers overlook... testing for performance and creating volumes of data to do just that.  Good article, Jacob.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi jeff,

    I have taken your comment in the CORRECT sense only. It was really valuable and I see that the approach you suggested is better. I would welcome your comments in the future articles too.

    Believe me, I too, did not find it to be a race. It was just a friendly conclusion that I put on my previous comment.

    Jacob

    .

  • thanks for the intro to CTE's! I know I've written the same SELECT statement into 2 parts of a query before. No more ... ah ... once we get all our customers to switch to 2005!

  • I ran across a way to generate the numbers list in Itzik Ben-Gan's book "Inside Microsoft SQL Server 2005 T-SQL Querying".  Itzik is a genious, and I highly recommend his book. Here is the code:

    WITH

    L0

    AS (SELECT 1 AS C UNION ALL SELECT 1), --2 rows

    L1

    AS (SELECT 1 AS C FROM L0 AS A, L0 AS B),--4 rows

    L2

    AS (SELECT 1 AS C FROM L1 AS A, L1 AS B),--16 rows

    L3

    AS (SELECT 1 AS C FROM L2 AS A, L2 AS B),--256 rows

    L4

    AS (SELECT 1 AS C FROM L3 AS A, L3 AS B),--65536 rows

    L5

    AS (SELECT 1 AS C FROM L4 AS A, L4 AS B),--4294967296 rows

    num

    AS (SELECT ROW_NUMBER() OVER(ORDER BY C) AS N FROM L5)

    SELECT

    N FROM NUM WHERE N <= 1000000;

     

    Aunt Kathi Data Platform MVP
    Author of Expert T-SQL Window Functions
    Simple-Talk Editor

  • Jacob,

    Use UNION ALL instead of UNION unless you explicitly need to remove duplicates - this might perform faster.

  • Just out of curiosity, can someone who has SQL Server 2005 (unfortunately, I don't) tell me how long that bit of code takes to run?  Thanks.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff,

       I ran the code on SQL Server 2005 and it generated 1000000 numbers in 9 seconds. Quite impressive.

    (the server in question is using 4 CPUS and 2GB memory.)

    Paul

     

  • This is quite intersting!

    Took 8 seconds on my laptop running SQL Server Express edition.

    (Toshiba Satellite Pro, 1 GB RAM, single processor)

    - Jacob

    .

Viewing 15 posts - 1 through 15 (of 32 total)

You must be logged in to reply to this topic. Login to reply