T-SQL: Why “It Depends”

  • The Dixie Flatline (3/16/2010)


    Jeff: Simply thanks. I've told you before at length how much I owe you. It's good not to be savaged with a pork chop first thing in the morning. 😉

    BWAA-HAAA!!!! Well, "It Depends" on whether it's a smoked pork chop or not and whether there's a bit of apple-sauce to go with it. 🙂 Mmmmmm... pork chops.... gaaahhhh....

    It's difficult to convince new and even some intermediate users that SQL Server has a proverbial "mind of its own" called the "optimizer" and that what works very well for a thousand rows may actually not be "scalable". I think this article demonstrates that quite nicely. Very well done (and I'm not talking pork chops :-)), Bob.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
        Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
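    A minimal sketch of the difference (the dbo.Sales table and Price column here are hypothetical, not from the article):

    -- RBAR: visit one ROW at a time with a cursor
    DECLARE @ID INT;
    DECLARE cur CURSOR LOCAL FAST_FORWARD FOR SELECT ID FROM dbo.Sales;
    OPEN cur;
    FETCH NEXT FROM cur INTO @ID;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        UPDATE dbo.Sales SET Price = Price * 1.10 WHERE ID = @ID;
        FETCH NEXT FROM cur INTO @ID;
    END;
    CLOSE cur;
    DEALLOCATE cur;

    -- Set based: describe what happens to the COLUMN, in one statement
    UPDATE dbo.Sales SET Price = Price * 1.10;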

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Just to chime in, great article.

    This kind of thing is why performance tuning isn't "one and done" but an ongoing battle. Good illustration of that.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Well done.

    Great layout of examples, testing several scenarios to show why you want to test with different data volumes.

    I can see this being pointed to often.

    Greg E

  • I don't tend to praise a lot of articles, but this one has my 5 stars!

    Great Job.


    * Noel

  • Am I the only one who copied the code directly and got 3 different result sets for each query?

    Granted, I used Red Gate to generate 500,000 rows in the example table, but I got 699 rows with the first example, 101 rows with the second, and 102 rows with the third.

    I limited the generated data to 100 products, with all dates falling within a 2-year span.
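    In plain T-SQL terms, the generated data was shaped something like this (the dbo.SalesDetail column names are assumed from the article, with ID as an IDENTITY column):

    -- Roughly equivalent test data: 500,000 rows, 100 products,
    -- sale dates spread across a 2-year window
    WITH N AS (
        SELECT TOP (500000)
               ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
        FROM sys.all_columns a
        CROSS JOIN sys.all_columns b
    )
    INSERT INTO dbo.SalesDetail (product, SaleDate)
    SELECT 'Product' + RIGHT('00' + CAST(n % 100 AS VARCHAR(3)), 3),
           DATEADD(DAY, n % 730, '20080101')
    FROM N;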

  • Well done Bob.

    You're moving up in the world. First a presentation to the Memphis PASS Chapter on SQL Server Spatial Data, now this article. Next step, book. So when's the book coming out? 🙂



    Alvin Ramard
    Memphis PASS Chapter

    All my SSC forum answers come with a money back guarantee. If you didn't like the answer then I'll gladly refund what you paid for it.

    For best practices on asking questions, please read the following article: Forum Etiquette: How to post data/code on a forum to get the best help

  • Not only does this article provide a concise explanation of "It depends" but it also provides a good example of how articles should be written for SQLServerCentral. (well written, to the point, contains easily-consumed example scripts & charts)

    Thanks!

  • Thank you for a very clear and useful article. It provides a good reference to bring to some of those difficult discussions (you know the ones). Now I can start with, "What conditions did you test it under?" and give them a link (or a clue) 🙂

  • TedS (3/16/2010)


    Am I the only one who copied the code directly and got 3 different result sets for each query?

    Granted, I used Red Gate to generate 500,000 rows in the example table, but I got 699 rows with the first example, 101 rows with the second, and 102 rows with the third.

    I limited the generated data to 100 products, with all dates falling within a 2-year span.

    Please go back and look at Paul's comment on the format of the code and see if that explains your issue.

    Greg E

  • Good Job Bob... 🙂

    -Roy

  • I really appreciate the depth on such a common topic. It's a simple scenario, covered in enough depth that it should help any newcomer understand the nuances involved: the number of records, along with the indexes available, can have a significant impact on performance. Thank you; I actually learned from this article even after many years of database development.

    Owen White

  • Greg Edwards-268690 (3/16/2010)


    TedS (3/16/2010)


    Am I the only one who copied the code directly and got 3 different result sets for each query?

    Granted, I used Red Gate to generate 500,000 rows in the example table, but I got 699 rows with the first example, 101 rows with the second, and 102 rows with the third.

    I limited the generated data to 100 products, with all dates falling within a 2-year span.

    Please go back and look at Paul's comment on the format of the code and see if that explains your issue.

    Greg E

    I did, it doesn't.

    My generated data should work just fine; the results of the 3 queries (run both separately and together) are simply different.

    While I do appreciate the effort, and agree with the premise that there isn't a one-size-fits-all solution (it really does depend), after I fix the (formatting-related) syntax errors the results still aren't the same.

  • Great article Bob. It couldn't have been easy to find examples that lined up so nicely to demonstrate your point. I'm guessing there are quite a few hours of work in that piece.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data :: Running Totals :: Tally Table :: Cross Tabs/Pivots :: String Concatenation

  • TedS (3/16/2010)


    I did, it doesn't. My generated data should work just fine; the results of the 3 queries (run both separately and together) are simply different. While I do appreciate the effort, and agree with the premise that there isn't a one-size-fits-all solution (it really does depend), after I fix the (formatting-related) syntax errors the results still aren't the same.

    I think the problem is in CTE1:

    SELECT TOP(@rows) *
    FROM dbo.SalesDetail

    TOP without an ORDER BY returns an arbitrary set of @rows rows, so each of the three queries can end up sampling different data. Add an ORDER BY ID to each CTE1 and they should all produce the same results.
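    To see the effect, run something like this a few times (nothing here forces a scan order):

    -- Without ORDER BY, SQL Server is free to return any 10 rows,
    -- so repeated runs (or different plans) may pick different sets
    SELECT TOP (10) * FROM dbo.SalesDetail;
    SELECT TOP (10) * FROM dbo.SalesDetail;

    -- Deterministic version: tie the TOP to an ordering
    SELECT TOP (10) * FROM dbo.SalesDetail ORDER BY ID;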

    My formatted version of the code for anyone interested (with ORDER BY 'fix'):

    ------------------------------------------------------------
    -- Testing All Three Techniques
    ------------------------------------------------------------
    DECLARE @results TABLE (
        Technique VARCHAR(20) NOT NULL,
        RowsCount INTEGER NOT NULL,
        Millisecs INTEGER NOT NULL
    );

    DECLARE @timer DATETIME;
    DECLARE @rows INTEGER;
    DECLARE @loop INTEGER;

    SET @rows = 1000; -- will be incremented X10 up to a million

    WHILE @rows <= 1000000
    BEGIN -- OUTER LOOP based on number of rows

        SET @loop = 1;

        WHILE @loop <= 50 -- INNER LOOP: run each test 50 times
        BEGIN
            ------------------------------------------------------------
            -- Join Detail to MAX([SaleDate])
            ------------------------------------------------------------
            SET @timer = GETDATE();

            WITH CTE1 AS (
                SELECT TOP(@rows) *
                FROM dbo.SalesDetail
                ORDER BY ID
            ),
            CTE2 AS (
                SELECT product, MAX(SaleDate) AS LastDate
                FROM CTE1
                GROUP BY product
            )
            SELECT S.*
            INTO #results1
            FROM dbo.SalesDetail S
            JOIN CTE2 C
              ON C.product = S.product
             AND S.SaleDate = C.LastDate;

            INSERT @results
            SELECT 'Match MAX()', @rows, DATEDIFF(MILLISECOND, @timer, GETDATE());

            ------------------------------------------------------------
            -- SELECT TOP 1 for each product by SaleDate desc
            ------------------------------------------------------------
            SET @timer = GETDATE();

            WITH CTE1 AS (
                SELECT TOP(@rows) *
                FROM dbo.SalesDetail
                ORDER BY ID
            ),
            CTE2 AS (
                SELECT DISTINCT product
                FROM CTE1
            )
            SELECT CA.*
            INTO #results2
            FROM CTE2 C
            CROSS APPLY (
                SELECT TOP (1) *
                FROM dbo.SalesDetail S
                WHERE S.product = C.product
                ORDER BY SaleDate DESC
            ) CA;

            INSERT @results
            SELECT 'Top (1)', @rows, DATEDIFF(MILLISECOND, @timer, GETDATE());

            --------------------------------------------------------------------
            -- use ROW_NUMBER() function to assign sequence IDs to rows
            --------------------------------------------------------------------
            SET @timer = GETDATE();

            WITH CTE1 AS (
                SELECT TOP(@rows) *
                FROM dbo.SalesDetail
                ORDER BY ID
            ),
            CTE2 AS (
                SELECT *,
                       ROW_NUMBER() OVER (PARTITION BY product ORDER BY SaleDate DESC) AS seqID
                FROM CTE1
            )
            SELECT *
            INTO #results3
            FROM CTE2
            WHERE seqID = 1;

            INSERT @results
            SELECT 'Row_Number()', @rows, DATEDIFF(MILLISECOND, @timer, GETDATE());

            -- note: prior to adding loops, the queries were tested to ensure
            -- they were producing identical result sets
            DROP TABLE #results1;
            DROP TABLE #results2;
            DROP TABLE #results3;

            SET @loop = @loop + 1;
        END; -- INNER LOOP

        SET @rows = @rows * 10;
    END; -- OUTER LOOP

    ------------------------------------------------------------
    SELECT Technique,
           RowsCount,
           AVG(1.0 * Millisecs) AS AvgMS
    FROM @results
    GROUP BY Technique, RowsCount;

    Untested.

  • Nice article with great statistics.

    Although in the real world, I would expect a table containing only the distinct products to be available, and on the SalesDetail table I would have considered placing a unique clustered index on (product, SaleDate DESC, ID) to keep all of the sales data for a particular product in the same area of disk, with the most recent sales at the beginning.
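    As a rough sketch of that suggestion (object names are assumptions based on the article's script, and it presumes SalesDetail doesn't already have a clustered primary key on ID):

    -- Hypothetical dimension table: one row per distinct product
    CREATE TABLE dbo.Product (
        product VARCHAR(20) NOT NULL PRIMARY KEY
    );

    -- Cluster the detail table so each product's sales sit together on disk,
    -- most recent first; ID makes the key unique
    CREATE UNIQUE CLUSTERED INDEX CIX_SalesDetail
        ON dbo.SalesDetail (product, SaleDate DESC, ID);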
