T-SQL: Why “It Depends”

  • The Dixie Flatline (3/16/2010)


    Jeff: Simply thanks. I've told you before at length how much I owe you. It's good not to be savaged with a pork chop first thing in the morning. 😉

    BWAA-HAAA!!!! Well, "It Depends" on whether it's a smoked pork chop or not and whether there's a bit of apple-sauce to go with it. 🙂 Mmmmmm... pork chops.... gaaahhhh....

    It's difficult to convince new and even some intermediate users that SQL Server has a proverbial "mind of its own" called the "optimizer" and that what works very well for a thousand rows may actually not be "scalable". I think this article demonstrates that quite nicely. Very well done (and I'm not talking pork chops :-)), Bob.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
        Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
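    A minimal sketch of the difference (the dbo.Sales table and Price column here are hypothetical, not from the article):

    -- RBAR: visit one ROW at a time with a cursor
    DECLARE @ID INT;
    DECLARE cur CURSOR LOCAL FAST_FORWARD FOR SELECT ID FROM dbo.Sales;
    OPEN cur;
    FETCH NEXT FROM cur INTO @ID;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        UPDATE dbo.Sales SET Price = Price * 1.10 WHERE ID = @ID;
        FETCH NEXT FROM cur INTO @ID;
    END;
    CLOSE cur;
    DEALLOCATE cur;

    -- Set based: describe what happens to the COLUMN, in one statement
    UPDATE dbo.Sales SET Price = Price * 1.10;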

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Just to chime in, great article.

    This kind of thing is why performance tuning isn't "one and done" but an ongoing battle. Good illustration of that.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Well done.

    Great layout of examples, testing several scenarios to show why you want to test with different data volumes.

    I can see this being pointed to often.

    Greg E

  • I don't tend to praise a lot of articles, but this one has my 5 stars!

    Great Job.


    * Noel

  • Am I the only one who copied the code directly and got 3 different result sets for each query?

    Granted, I used Red Gate to generate 500,000 rows in the example table, but I got 699 rows with the first example, 101 rows with the second, and 102 rows with the third.

    I limited the generated data to 100 products, with all dates falling within a 2-year span.
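    In plain T-SQL terms, the generated data was shaped something like this (the dbo.SalesDetail column names are assumed from the article, with ID as an IDENTITY column):

    -- Roughly equivalent test data: 500,000 rows, 100 products,
    -- sale dates spread across a 2-year window
    WITH N AS (
        SELECT TOP (500000)
               ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
        FROM sys.all_columns a
        CROSS JOIN sys.all_columns b
    )
    INSERT INTO dbo.SalesDetail (product, SaleDate)
    SELECT 'Product' + RIGHT('00' + CAST(n % 100 AS VARCHAR(3)), 3),
           DATEADD(DAY, n % 730, '20080101')
    FROM N;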

  • Well done Bob.

    You're moving up in the world. First a presentation to the Memphis PASS Chapter on SQL Server Spatial Data, now this article. Next step, book. So when's the book coming out? 🙂



    Alvin Ramard
    Memphis PASS Chapter

    All my SSC forum answers come with a money back guarantee. If you didn't like the answer then I'll gladly refund what you paid for it.

    For best practices on asking questions, please read the following article: Forum Etiquette: How to post data/code on a forum to get the best help

  • Not only does this article provide a concise explanation of "It depends" but it also provides a good example of how articles should be written for SQLServerCentral. (well written, to the point, contains easily-consumed example scripts & charts)

    Thanks!

  • Thank you for a very clear and useful article. It provides a good reference to bring to some of those difficult discussions (you know the ones). Now I can start with, "What conditions did you test it under?" and give them a link (or a clue) 🙂

  • TedS (3/16/2010)


    Am I the only one who copied the code directly and got 3 different result sets for each query?

    Granted, I used Red Gate to generate 500,000 rows in the example table, but I got 699 rows with the first example, 101 rows with the second, and 102 rows with the third.

    I limited the generated data to 100 products, with all dates falling within a 2-year span.

    Please go back and look at Paul's comment on the format of the code and see if that explains your issue.

    Greg E

  • Good Job Bob... 🙂

    -Roy

  • I really appreciate the depth on such a common topic. It's a simple scenario, covered in enough depth that it should help any newcomer understand the nuances involved: the number of records, along with the indexes available, can have a significant impact on performance. Thank you; I actually learned from this article even after many years of database development.

    Owen White

  • Greg Edwards-268690 (3/16/2010)


    TedS (3/16/2010)


    Am I the only one who copied the code directly and got 3 different result sets for each query?

    Granted, I used Red Gate to generate 500,000 rows in the example table, but I got 699 rows with the first example, 101 rows with the second, and 102 rows with the third.

    I limited the generated data to 100 products, with all dates falling within a 2-year span.

    Please go back and look at Paul's comment on the format of the code and see if that explains your issue.

    Greg E

    I did, it doesn't.

    My generated data should work just fine; the results of the 3 queries (run both separately and together) are simply different.

    While I do appreciate the effort, and agree with the premise that there isn't a one-size-fits-all solution (it really does depend), after I fix the (formatting-related) syntax errors the results still aren't the same.

  • Great article Bob. It couldn't have been easy to find examples that lined up so nicely to demonstrate your point. I'm guessing there are quite a few hours of work in that piece.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data :: Running Totals :: Tally Table :: Cross Tabs/Pivots :: String Concatenation

  • TedS (3/16/2010)


    I did, it doesn't. My generated data should work just fine; the results of the 3 queries (run both separately and together) are simply different. While I do appreciate the effort, and agree with the premise that there isn't a one-size-fits-all solution (it really does depend), after I fix the (formatting-related) syntax errors the results still aren't the same.

    I think the problem is in CTE1:

    SELECT TOP(@rows) *
    FROM dbo.SalesDetail

    TOP without an ORDER BY returns an arbitrary set of @rows rows, so each of the three queries can end up sampling different data. Add an ORDER BY ID to each CTE1 and they should all produce the same results.
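    To see the effect, run something like this a few times (nothing here forces a scan order):

    -- Without ORDER BY, SQL Server is free to return any 10 rows,
    -- so repeated runs (or different plans) may pick different sets
    SELECT TOP (10) * FROM dbo.SalesDetail;
    SELECT TOP (10) * FROM dbo.SalesDetail;

    -- Deterministic version: tie the TOP to an ordering
    SELECT TOP (10) * FROM dbo.SalesDetail ORDER BY ID;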

    My formatted version of the code for anyone interested (with ORDER BY 'fix'):

    ------------------------------------------------------------
    -- Testing All Three Techniques
    ------------------------------------------------------------
    DECLARE @results TABLE (
        Technique VARCHAR(20) NOT NULL,
        RowsCount INTEGER NOT NULL,
        Millisecs INTEGER NOT NULL
    );

    DECLARE @timer DATETIME;
    DECLARE @rows INTEGER;
    DECLARE @loop INTEGER;

    SET @rows = 1000; -- will be incremented X10 up to a million

    WHILE @rows <= 1000000
    BEGIN -- OUTER LOOP based on number of rows

        SET @loop = 1;

        WHILE @loop <= 50 -- INNER LOOP: run each test 50 times
        BEGIN
            ------------------------------------------------------------
            -- Join Detail to MAX([SaleDate])
            ------------------------------------------------------------
            SET @timer = GETDATE();

            WITH CTE1 AS (
                SELECT TOP(@rows) *
                FROM dbo.SalesDetail
                ORDER BY ID
            ),
            CTE2 AS (
                SELECT product, MAX(SaleDate) AS LastDate
                FROM CTE1
                GROUP BY product
            )
            SELECT S.*
            INTO #results1
            FROM dbo.SalesDetail S
            JOIN CTE2 C
              ON C.product = S.product
             AND S.SaleDate = C.LastDate;

            INSERT @results
            SELECT 'Match MAX()', @rows, DATEDIFF(MILLISECOND, @timer, GETDATE());

            ------------------------------------------------------------
            -- SELECT TOP 1 for each product by SaleDate desc
            ------------------------------------------------------------
            SET @timer = GETDATE();

            WITH CTE1 AS (
                SELECT TOP(@rows) *
                FROM dbo.SalesDetail
                ORDER BY ID
            ),
            CTE2 AS (
                SELECT DISTINCT product
                FROM CTE1
            )
            SELECT CA.*
            INTO #results2
            FROM CTE2 C
            CROSS APPLY (
                SELECT TOP (1) *
                FROM dbo.SalesDetail S
                WHERE S.product = C.product
                ORDER BY SaleDate DESC
            ) CA;

            INSERT @results
            SELECT 'Top (1)', @rows, DATEDIFF(MILLISECOND, @timer, GETDATE());

            --------------------------------------------------------------------
            -- use ROW_NUMBER() function to assign sequence IDs to rows
            --------------------------------------------------------------------
            SET @timer = GETDATE();

            WITH CTE1 AS (
                SELECT TOP(@rows) *
                FROM dbo.SalesDetail
                ORDER BY ID
            ),
            CTE2 AS (
                SELECT *,
                       ROW_NUMBER() OVER (PARTITION BY product ORDER BY SaleDate DESC) AS seqID
                FROM CTE1
            )
            SELECT *
            INTO #results3
            FROM CTE2
            WHERE seqID = 1;

            INSERT @results
            SELECT 'Row_Number()', @rows, DATEDIFF(MILLISECOND, @timer, GETDATE());

            -- note: prior to adding loops, the queries were tested to ensure
            -- they were producing identical result sets
            DROP TABLE #results1;
            DROP TABLE #results2;
            DROP TABLE #results3;

            SET @loop = @loop + 1;
        END; -- INNER LOOP

        SET @rows = @rows * 10;
    END; -- OUTER LOOP

    ------------------------------------------------------------
    SELECT Technique,
           RowsCount,
           AVG(1.0 * Millisecs) AS AvgMS
    FROM @results
    GROUP BY Technique, RowsCount;

    Untested.

  • Nice article with great statistics.

    Although in the real world, I would expect a table containing only the distinct products to be available, and on the SalesDetail table I would have considered placing a unique clustered index on (product, SaleDate DESC, ID) to keep all of the sales data for a particular product in the same area of disk, with the most recent sales at the beginning.
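    As a rough sketch of that suggestion (object names are assumptions based on the article's script, and it presumes SalesDetail doesn't already have a clustered primary key on ID):

    -- Hypothetical dimension table: one row per distinct product
    CREATE TABLE dbo.Product (
        product VARCHAR(20) NOT NULL PRIMARY KEY
    );

    -- Cluster the detail table so each product's sales sit together on disk,
    -- most recent first; ID makes the key unique
    CREATE UNIQUE CLUSTERED INDEX CIX_SalesDetail
        ON dbo.SalesDetail (product, SaleDate DESC, ID);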
