November 1, 2010 at 11:01 am
Hi all,
I'm not exactly new to MS SQL, but am facing an optimisation issue that I could use some help on - and the best answers might, I hope, be helpful to complete SQL Newbies as well.
My question is...
I have an "auction-style" application which contains multiple products being sold (with many duplicate products), and with many possible prices for these items in it, from different people selling the item.
I want to write the most efficient "single pass" query for getting the cheapest price for each item, but also identifying the rows which have identically matching cheapest prices, and returning them as well.
My old SQL "Obi-Wan" guru (sadly not available to ask questions of nowadays) always told me to run away from HAVING and IN clauses, and it's proved good advice so far.
I know that answers to these "what works best" questions depend on a wide variety of factors, but assume that the source Table Variable (@ExampleTable) below stands in for a regular table containing many hundreds of thousands (if not millions) of rows; that I have full control over indexes etc.; that the data will be changing *extremely* regularly; that it will be queried often; that there are many other tables hanging off the ExampleTableID field containing various other variables to be filtered on during the select; and that I'm on SQL 2008.
Here are 3 sample queries, illustrating three different ways of getting the same result set from the same base data - and I can think of many more ways than these three (involving CTEs etc).
Can anyone shed some light (given the above) on what the best method would be - and please don't limit yourself to the 3 versions below; I'm absolutely sure that none of them is the most efficient method, but I am really not sure what would be!
Many thanks in advance.
J
declare @ExampleTable table
(
ExampleTableID int identity,
ExampleProduct varchar(50),
ExampleValue int
)
insert into @ExampleTable (ExampleProduct,ExampleValue)
select 'Product 1', 50
insert into @ExampleTable (ExampleProduct,ExampleValue)
select 'Product 1', 120
insert into @ExampleTable (ExampleProduct,ExampleValue)
select 'Product 2', 25
insert into @ExampleTable (ExampleProduct,ExampleValue)
select 'Product 2', 36
insert into @ExampleTable (ExampleProduct,ExampleValue)
select 'Product 3', 14
insert into @ExampleTable (ExampleProduct,ExampleValue)
select 'Product 3', 129
insert into @ExampleTable (ExampleProduct,ExampleValue)
select 'Product 3', 13
insert into @ExampleTable (ExampleProduct,ExampleValue)
select 'Product 2', 25
-- Method 1 of getting the rowset containing the Smallest Value of each Product, using HAVING
select
et.ExampleTableID,
et.ExampleProduct,
et.ExampleValue as SmallestValueForExampleProduct
from @ExampleTable et
group by
et.ExampleTableID,
et.ExampleProduct,
et.ExampleValue
having et.ExampleValue in
(
select MIN(ExampleValue) from @ExampleTable et2
where et.ExampleProduct = et2.ExampleProduct
)
order by et.ExampleProduct asc
-- Method 2 of getting the rowset containing the Smallest Value of each Product, using IN
select
et.ExampleTableID,
et.ExampleProduct,
et.ExampleValue as SmallestValueForExampleProduct
from @ExampleTable et
where et.ExampleValue in (select MIN(et2.ExampleValue) from @ExampleTable et2
where et.ExampleProduct = et2.ExampleProduct)
group by
et.ExampleTableID,
et.ExampleProduct,
et.ExampleValue
order by et.ExampleProduct asc
-- Method 3, separation of smallest values into smaller table and then run from there
declare @SmallestValues table
(
ExampleProduct varchar(50),
ExampleValue int
)
insert into @SmallestValues (ExampleProduct, ExampleValue)
select et.ExampleProduct, MIN(et.ExampleValue)
from @ExampleTable et
group by et.ExampleProduct
select
et.ExampleTableID,
et.ExampleProduct,
et.ExampleValue as SmallestValueForExampleProduct
from @ExampleTable et
where exists (select * from @SmallestValues sv
where sv.ExampleProduct = et.ExampleProduct
and sv.ExampleValue = et.ExampleValue)
order by et.ExampleProduct asc
November 1, 2010 at 12:03 pm
;WITH CTE AS
(
SELECT t.ExampleTableID,
t.ExampleProduct,
t.ExampleValue,
RN = RANK() OVER (PARTITION BY ExampleProduct ORDER BY ExampleValue)
FROM @ExampleTable t
)
SELECT *
FROM CTE
WHERE RN = 1;
This article discusses the RANK() and other Windowing Functions: SQL Server Ranking Functions
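One thing worth pointing out: RANK() gives tied values the same rank, so if two sellers share the same cheapest price for a product (like the two 'Product 2' rows at 25 in your sample), both rows come back; ROW_NUMBER() would only return one of them. A quick throwaway illustration (sample data made up for this, and the row-constructor INSERT syntax needs 2008+):
-- Two rows tie for the cheapest "Product 2" price
DECLARE @t TABLE (ID int IDENTITY, Product varchar(50), Value int);
INSERT INTO @t (Product, Value)
VALUES ('Product 2', 25), ('Product 2', 36), ('Product 2', 25);
SELECT ID, Product, Value,
       RN  = ROW_NUMBER() OVER (PARTITION BY Product ORDER BY Value), -- 1, 2, 3: only one of the tied rows gets 1
       RNK = RANK()       OVER (PARTITION BY Product ORDER BY Value)  -- 1, 1, 3: both 25s get rank 1
FROM @t;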
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
November 1, 2010 at 12:13 pm
WayneS (11/1/2010)
;WITH CTE AS
(
SELECT t.ExampleTableID,
t.ExampleProduct,
t.ExampleValue,
RN = RANK() OVER (PARTITION BY ExampleProduct ORDER BY ExampleValue)
FROM @ExampleTable t
)
SELECT *
FROM CTE
WHERE RN = 1;
This article discusses the RANK() and other Windowing Functions: SQL Server Ranking Functions
Hi Wayne,
That's exactly what I suspected I was missing - and thank you muchly for your reply.
Mods, please feel free to close this thread now.
Rgds,
James N
November 1, 2010 at 12:13 pm
James,
First, nice work on the sample set and work shown. Helped a lot.
I usually use a variant on your Method 2 and 3, along these lines:
select
et.ExampleTableID,
et.ExampleProduct,
et.ExampleValue as SmallestValueForExampleProduct
from
@ExampleTable et
JOIN
(select
ExampleProduct,
MIN(ExampleValue) AS MinVal
from
@ExampleTable
GROUP BY
ExampleProduct
) AS drv
ON et.ExampleProduct = drv.ExampleProduct
AND et.ExampleValue = drv.MinVal
order by
et.ExampleProduct
This will depend on your indexing and whatnot, of course, but with no indexes involved and examining the execution plans, my version does less work than the rest of the options. Combine this with method 3 for a temp table (or tablevar) to relink into a more complex query to get better optimization out of it if needed. I would only do that for roughly 2000+ rows in the grouped min() with 10-20k in the external after a bit of testing if I wasn't happy with the speed.
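Roughly, that combination would look something like the sketch below - the #MinPrices name, dbo.ExampleTable (standing in for your real table), and the commented-out extra join are all just placeholders for whatever your actual schema is:
-- 1) Materialize the per-product minimums once
SELECT ExampleProduct, MIN(ExampleValue) AS MinVal
INTO #MinPrices
FROM dbo.ExampleTable
GROUP BY ExampleProduct;
CREATE CLUSTERED INDEX IX_MinPrices ON #MinPrices (ExampleProduct, MinVal);
-- 2) Relink the small min-price table into the wider query with its other filters
SELECT et.ExampleTableID,
       et.ExampleProduct,
       et.ExampleValue AS SmallestValueForExampleProduct
FROM dbo.ExampleTable et
JOIN #MinPrices mp
  ON mp.ExampleProduct = et.ExampleProduct
 AND mp.MinVal = et.ExampleValue
-- JOIN dbo.SomeOtherTable ot ON ot.ExampleTableID = et.ExampleTableID -- hypothetical extra filter tables hang off here
ORDER BY et.ExampleProduct;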
EDIT: I cross posted with Wayne. 🙂 I took his code and checked the plans against mine. Unindexed, and with this tiny dataset, they're about even in work but his has a few more steps including a compute scalar. I'd try both out and see what results you get with the larger dataset.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. | Tally Tables
Twitter: @AnyWayDBA
November 1, 2010 at 12:20 pm
Craig Farrell (11/1/2010)
James,
First, nice work on the sample set and work shown. Helped a lot.
I usually use a variant on your Method 2 and 3, along these lines:
select
et.ExampleTableID,
et.ExampleProduct,
et.ExampleValue as SmallestValueForExampleProduct
from
@ExampleTable et
JOIN
(select
ExampleProduct,
MIN(ExampleValue) AS MinVal
from
@ExampleTable
GROUP BY
ExampleProduct
) AS drv
ON et.ExampleProduct = drv.ExampleProduct
AND et.ExampleValue = drv.MinVal
order by
et.ExampleProduct
This will depend on your indexing and whatnot, of course, but with no indexes involved and examining the execution plans, my version does less work than the rest of the options. Combine this with method 3 for a temp table (or tablevar) to relink into a more complex query to get better optimization out of it if needed. I would only do that for roughly 2000+ rows in the grouped min() with 10-20k in the external after a bit of testing if I wasn't happy with the speed.
EDIT: I cross posted with Wayne. 🙂 I took his code and checked the plans against mine. Unindexed, and with this tiny dataset, they're about even in work but his has a few more steps including a compute scalar. I'd try both out and see what results you get with the larger dataset.
Hi Craig,
Heh - I crossposted with yours too 🙂
Personally I felt I was missing something with CTEs, and that "felt" like a Eureka moment for me (I've only just moved up to 2008 from 2000, and CTEs have been a bit of an "omg" moment - as have the geospatial datatypes and HierarchyIDs) - but I'll definitely check both methods against each other on the actual data and see where I get with what I've got.
Many thanks to you both.
Rgds,
JamesN
November 1, 2010 at 12:40 pm
jamesniesewand (11/1/2010)
Personally I felt I was missing something with CTEs, and that "felt" like a Eureka moment for me (I've only just moved up to 2008 from 2000, and CTEs have been a bit of an "omg" moment - as have the geospatial datatypes and HierarchyIDs) - but I'll definitely check both methods against each other on the actual data and see where I get with what I've got.
My personal omg moment with ctes was the recursive nature. Everything else is just a pretty subquery organizer to me, but those recursions... those are the true power! (Well, in my head, anyway...)
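Purely as a throwaway illustration of what I mean (nothing to do with James' problem): the anchor query seeds the CTE, and the second query keeps referencing the CTE itself until the WHERE shuts it off.
;WITH Numbers (N) AS
(
    SELECT 1                      -- anchor member: the starting row
    UNION ALL
    SELECT N + 1 FROM Numbers     -- recursive member: references the CTE itself
    WHERE N < 10
)
SELECT N FROM Numbers;            -- returns 1 through 10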
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. | Tally Tables
Twitter: @AnyWayDBA
November 1, 2010 at 12:42 pm
jamesniesewand (11/1/2010)
Hi Wayne,
That's exactly what I suspected I was missing - and thank you muchly for your reply.
Great, I'm glad it's working right. Thanks for posting the DML/DDL to produce what you're trying to do.
Mods, please feel free to close this thread now.
We don't do that around here... there are frequently some awesome discussions that evolve after there's been one solution - and sometimes even better solutions come forth.
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
November 1, 2010 at 12:51 pm
Craig Farrell (11/1/2010)
EDIT: I cross posted with Wayne. 🙂 I took his code and checked the plans against mine. Unindexed, and with this tiny dataset, they're about even in work but his has a few more steps including a compute scalar. I'd try both out and see what results you get with the larger dataset.
The compute scalar is the "1" being used in my plan.
The filter is the "RN = 1".
Your solution performs very well also... I also would really like to see some performance tests on this. Guess it's time to make a million row test rig, and test them.... stay tuned.
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
November 1, 2010 at 12:53 pm
Craig Farrell (11/1/2010)
jamesniesewand (11/1/2010)
Personally I felt I was missing something with CTEs, and that "felt" like a Eureka moment for me (I've only just moved up to 2008 from 2000, and CTEs have been a bit of an "omg" moment - as have the geospatial datatypes and HierarchyIDs) - but I'll definitely check both methods against each other on the actual data and see where I get with what I've got.
My personal omg moment with ctes was the recursive nature. Everything else is just a pretty subquery organizer to me, but those recursions... those are the true power! (Well, in my head, anyway...)
Agreed Craig. With the exception of the recursive CTEs, I tend to think of a CTE as a "pre-defined sub-query". As such, when using them, it makes it obvious how to change your query when working with someone on Sql2000. I just find a CTE a lot easier to test than sub-queries.
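To show what I mean with James' sample data (assuming the @ExampleTable from the first post is still in scope), here's the same per-product minimum written both ways - the derived-table version is roughly what you'd hand back to a 2000 shop:
-- CTE form (2005+): the inner query can be highlighted and run on its own for testing
;WITH MinPrices AS
(
    SELECT ExampleProduct, MIN(ExampleValue) AS MinVal
    FROM @ExampleTable
    GROUP BY ExampleProduct
)
SELECT et.ExampleTableID, et.ExampleProduct, et.ExampleValue
FROM @ExampleTable et
JOIN MinPrices mp
  ON mp.ExampleProduct = et.ExampleProduct
 AND mp.MinVal = et.ExampleValue;
-- Same thing as a derived table
SELECT et.ExampleTableID, et.ExampleProduct, et.ExampleValue
FROM @ExampleTable et
JOIN (SELECT ExampleProduct, MIN(ExampleValue) AS MinVal
      FROM @ExampleTable
      GROUP BY ExampleProduct) mp
  ON mp.ExampleProduct = et.ExampleProduct
 AND mp.MinVal = et.ExampleValue;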
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
November 1, 2010 at 1:26 pm
Okay, I made a dataset with ten million records (over 5000 products), then ran both sets of code:
Test harness / code is attached (trouble posting again) below:
if OBJECT_ID('tempdb..#test') IS NOT NULL DROP TABLE #test;
CREATE TABLE #test
(
ExampleTableID int identity,
ExampleProduct varchar(50),
ExampleValue int
)
-- See Jeff Moden's article
-- The "Numbers" or "Tally" Table: What it is and how it replaces a loop.
-- at http://www.sqlservercentral.com/articles/T-SQL/62867/.
-- NOTE! A permanent tally table will always be MUCH faster
-- than this inline one. See the above article to create your own!
;WITH
Tens (N) AS (SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL
SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL
SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0),
Thousands(N) AS (SELECT 1 FROM Tens t1 CROSS JOIN Tens t2 CROSS JOIN Tens t3),
Millions (N) AS (SELECT 1 FROM Thousands t1 CROSS JOIN Thousands t2),
Tally (N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) FROM Millions),
Products AS (SELECT Product = 'Product ' + CONVERT(varchar(5), N) FROM Tally WHERE N <= 5000)
INSERT INTO #test
SELECT TOP (10000000) p.Product, value = ABS(CHECKSUM(NEWID()))%100+1
FROM Tally t CROSS JOIN Products p;
-- Drop the index first if re-running the harness
IF EXISTS (SELECT * FROM tempdb.sys.indexes
           WHERE [object_id] = OBJECT_ID('tempdb..#test') AND name = 'ProductValue')
    DROP INDEX #test.ProductValue;
CREATE INDEX ProductValue ON #test (ExampleProduct, ExampleValue) INCLUDE (ExampleTableID);
declare @i int = 1;
SET STATISTICS IO,TIME ON;
;WITH CTE AS
(
SELECT t.ExampleTableID,
t.ExampleProduct,
t.ExampleValue,
RN = RANK() OVER (PARTITION BY ExampleProduct ORDER BY ExampleValue)
FROM #test t
)
SELECT *
FROM CTE
WHERE RN = @i;
select
et.ExampleTableID,
et.ExampleProduct,
et.ExampleValue as SmallestValueForExampleProduct
from
#test et
JOIN
(select
ExampleProduct,
MIN(ExampleValue) AS MinVal
from
#test
GROUP BY
ExampleProduct
) AS drv
ON et.ExampleProduct = drv.ExampleProduct
AND et.ExampleValue = drv.MinVal
order by
et.ExampleProduct ;
SET STATISTICS IO,TIME OFF;
Results:
Wayne's Code:
Table '#test_00000000000C'. Scan count 1, logical reads 44024, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 12079 ms, elapsed time = 15344 ms.
Craig's Code:
Table '#test_00000000000C'. Scan count 14, logical reads 44795, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 6594 ms, elapsed time = 6702 ms.
Craig, I guess you whopped up on me! Only 40% of the time of mine... very good.
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
November 1, 2010 at 1:46 pm
WayneS (11/1/2010)
Great, I'm glad it's working right. Thanks for posting the DML/DDL to produce what you're trying to do.
Mods, please feel free to close this thread now.
We don't do that around here... there are frequently some awesome discussions that evolve after there's been one solution - and sometimes even better solutions come forth.
Good Lord - you people are the best 😎
I mean seriously - on almost any forum I can mention, this topic would have been closed at my request (or ignored by everyone else).
The responses posted, and the speed (not just of the queries, but of the postings and testbeds as well!), have been a breathtaking first welcome to this forum and the expertise herein.
If there isn't a :bowdown: smiley there should be - can't thank you guys enough.
Best,
JamesN
November 1, 2010 at 1:51 pm
WayneS (11/1/2010)
Craig, I guess you whopped up on me! Only 40% of the time of mine... very good.
Well, every now and then I get lucky. :w00t: Honestly though, my code beating out yours makes me look closer at why. 🙂 I took your code and ran it and checked through the results.
Couple of notes.
1) DECLARE @i INT = 1; doesn't work in 2k5, had to change it to DECLARE @i INT; SET @i = 1;
2) You've got a better machine than I do. 😛
Table '#test_______________________________________________________________________________________________________________000000000010'. Scan count 1, logical reads 44143, physical reads 0, read-ahead reads 8, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 6688 ms, elapsed time = 39886 ms.
(99667 row(s) affected)
Table '#test_______________________________________________________________________________________________________________000000000010'. Scan count 14, logical reads 48953, physical reads 0, read-ahead reads 5, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 3282 ms, elapsed time = 16956 ms.
Of particular curiosity to me though is that while I had more elapsed time, I used less CPU. About half. Very odd.
3) A review of the execution plans shows that I'm only pulling 22 rows out during the aggregation portion. I'd have expected 5k rows from the product method you used:
Products AS (SELECT Product = 'Product ' + CONVERT(varchar(5), N) FROM Tally WHERE N <= 5000)
/*AND*/
Tally t CROSS JOIN Products p
However:
select count(DISTINCT ExampleProduct) FROM #test -- returns 11
select DISTINCT ExampleProduct FROM #test
Product 1
Product 11
Product 13
Product 15
Product 17
Product 19
Product 2
Product 3
Product 5
Product 7
Product 9
What am I missing in how that result set appears? Seems very strange to me.
I believe, though, that the entire speed difference is the volume of data being carried. If we figure out why the product list out of a million rows is only producing 11 distinct products and we bring this up to the 5000 products you'd expect, we will see closer runtimes, I believe.
EDIT: AHAHA! It's from the top limiter vs. the crossjoin. Need to get that to behave itself. Get back to you with slightly modified code there.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. | Tally Tables
Twitter: @AnyWayDBA
November 1, 2010 at 2:00 pm
Craig Farrell (11/1/2010)
WayneS (11/1/2010)
Craig, I guess you whopped up on me! Only 40% of the time of mine... very good.
Well, every now and then I get lucky. :w00t: Honestly though, my code beating out yours makes me look closer at why. 🙂 I took your code and ran it and checked through the results.
Couple of notes.
1) DECLARE @i INT = 1; doesn't work in 2k5, had to change it to DECLARE @i INT; SET @i = 1;
Whoops... I tried that afterwards - didn't help any.
2) You've got a better machine than I do. 😛
Shhh! It's an old, sucky Dell Latitude D620. (I'd love to run this on my laptop at home!)
Table '#test_______________________________________________________________________________________________________________000000000010'. Scan count 1, logical reads 44143, physical reads 0, read-ahead reads 8, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 6688 ms, elapsed time = 39886 ms.
(99667 row(s) affected)
Table '#test_______________________________________________________________________________________________________________000000000010'. Scan count 14, logical reads 48953, physical reads 0, read-ahead reads 5, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 3282 ms, elapsed time = 16956 ms.
Of particular curiosity to me though is that while I had more elapsed time, I used less CPU. About half. Very odd.
Yours has the worktable (and more reads???). So yours is half the elapsed time also.
3) A review of the execution plans shows that I'm only pulling 22 rows out during the aggregation portion. I'd have expected 5k rows from the product method you used:
Products AS (SELECT Product = 'Product ' + CONVERT(varchar(5), N) FROM Tally WHERE N <= 5000)
/*AND*/
Tally t CROSS JOIN Products p
However:
select count(DISTINCT ExampleProduct) FROM #test -- returns 11
select DISTINCT ExampleProduct FROM #test
Product 1
Product 11
Product 13
Product 15
Product 17
Product 19
Product 2
Product 3
Product 5
Product 7
Product 9
What am I missing in how that result set appears? Seems very strange to me.
Indeed. I've got no clue for that - maybe your system is overtaxed?
I believe, though, that the entire speed difference is the volume of data being carried. If we figure out why the product list out of a million rows is only producing 11 distinct products and we bring this up to the 5000 products you'd expect, we will see closer runtimes, I believe.
EDIT: AHAHA! It's from the top limiter vs. the crossjoin. Need to get that to behave itself. Get back to you with slightly modified code there.
Okay, let's see how it looks now.
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
November 1, 2010 at 2:07 pm
jamesniesewand (11/1/2010)
Good Lord - you people are the best 😎
I mean seriously - on almost any forum I can mention, this topic would have been closed at my request (or ignored by everyone else).
The responses posted, and the speed (not just of the queries, but of the postings, testbeds as well!) has been a breathtaking first welcome for me to this forum and the expertise herein.
If there isn't a :bowdown: smiley there should be - can't thank you guys enough.
Best,
JamesN
James,
:blush: Thanks for the very kind words.
But, get used to it around here. Especially when you take the time to post table DDL and DML in readily consumable format - lots of people "lurk", and will work on things if there is code ready to work with. If not, they'll just skip it.
You'll find that most of the regulars here are always interested in better ways to accomplish things. This is not an attempt to show up others, but a serious approach of learning for all.
And finally... WELCOME!
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
November 1, 2010 at 2:41 pm
James, thank you for the kindness. It's appreciated... especially recently.
Hahah, okay, now that the selectivity is frying the 0.3% rule (1 day in a year, as a rule of thumb, James, since you're still following...), we're still seeing similar times, but the execution plans are VERY different.
And... strangely... faster.
Modify the beginning code as so (I have a real Tally table in my tempdb. 🙂 ):
TRUNCATE TABLE #test
;WITH Products AS (SELECT Product = 'Product ' + CONVERT(varchar(5), N) FROM Tally WHERE N <= 5000)
INSERT INTO #test
SELECT TOP (10000000) p.Product, value = ABS(CHECKSUM(NEWID()))%100+1
FROM Tally t CROSS JOIN Products p
WHERE t.N < 200;
This gives us 995,000 rows (close enough), but 200/product, and a proper count on the # of products for selectivity.
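If you want to sanity-check the spread before re-running the timings, something like this should do it (I'd expect the counts in the comments, but run it yourself rather than taking my word for it):
SELECT COUNT(DISTINCT ExampleProduct) AS DistinctProducts, -- should now be 5000
       COUNT(*) AS TotalRows                               -- should be 995,000
FROM #test;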
Now my results look like this:
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
(11429 row(s) affected)
Table '#test_______________________________________________________________________________________________________________000000000010'. Scan count 1, logical reads 4674, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 844 ms, elapsed time = 5467 ms.
(11429 row(s) affected)
Table '#test_______________________________________________________________________________________________________________000000000010'. Scan count 1, logical reads 4674, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 360 ms, elapsed time = 2563 ms.
That's some huge differences, in both our timing tests... same box. Yes. No, honestly. That's the same box.
The execution plans have changed, well, at least mine has. Wayne's code didn't change much execution plan wise, so I'm highly confused as to the results of the timing test.
I've attached the resultant sqlplans. That filter is apparently incredibly expensive, Wayne. I had figured the larger Segment would be worse. We end up with the same reads, too.
Sometimes this optimizer seriously confuses me. 🙂
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. | Tally Tables
Twitter: @AnyWayDBA