Returning the Top X row for each group

Question

Returning the Top X row for each group

Dave Ballantyne

SSC-Dedicated

Points: 33667
More actions
December 4, 2010 at 1:06 pm

#227855

Comments posted to this topic are about the item Returning the Top X row for each group

Clear Sky SQL
My Blog[/url]

Viewing 15 posts - 1 through 15 (of 45 total)

You must be logged in to reply to this topic. Login to reply

kay SSCommitted Points: 1518 More actions · Answer 1

A nice quest! I like the solution for its clever use of the index.

However, you depend on a table to deliver the lookup-values for the age-column (in this case master..spt_values ) and you need pre-execution knowledge about all the possible ages in your table (in this case 0...100).

Here's a suggestion:

select * from

(select distinct age as AgeGroup from #RunnersBig) S1

cross apply

(select top 2 * from #RunnersBig where age=agegroup order by time)S2

I suspect it is not as fast as your solution, as it scans the index for the "distinct ages". But it would not require any knowledge about the values in the "ages" column.

I wonder if there is a solution having both, a fast execution with index-seek on the one hand, and a general independence of the value distribution in the age-column.

Kay

Madhivanan-208264 SSCertifiable Points: 7516 More actions · Answer 2

More methods are available here

http://beyondrelational.com/blogs/madhivanan/archive/2008/09/12/return-top-n-rows.aspx

Madhivanan

Failing to plan is Planning to fail

Dave Ballantyne SSC-Dedicated Points: 33667 More actions · Answer 3

Kay ,

You are absolutely correct , any solution that relies upon a tally table has to have at least the required number of rows in the said tally table.

If you used a tally table to build all the dates for the next 10 years, you need to ensure that you have at least 3655 (ish) rows.

An alternative to your distinct method would be to grab the max(age) , which will involve reading a single row from the index. Something like this...

with cteTally

as

(

select number from master..spt_values where type = 'P' and number >0 and number <=(Select max(age) from #RunnersBig)

)

select *

from cteTally

cross apply

(

select top 2 * from #RunnersBig where age=number order by time

) as Winners

Clear Sky SQL
My Blog[/url]

darryl_ba Grasshopper Points: 13 More actions · Answer 4

would this exact code work using TOAD? Or will I get errors. Some keywords dont work using TOAD. Also, how would u sum() up the combined selected top row grouping.

Steve Jones - SSC Editor SSC Guru Points: 742701 More actions · Answer 5

Steve Jones - SSC Editor

SSC Guru

Points: 742701

December 6, 2010 at 6:53 am

#1258988

Great job and this is a reference I'll keep around.

RazvanS SSC Enthusiast Points: 115 More actions · Answer 6

Thank you for the article, very useful.

Is it safe to say that it only applies to SQL Server 2005 & 2008 but not to 2000?

rockvilleaustin SSC-Addicted Points: 412 More actions · Answer 7

I got this error message:

Msg 4108, Level 15, State 1, Line 3

Windowed functions can only appear in the SELECT or ORDER BY clauses.

when running the following codes:

select * ,row_number() over (partition by Age order by Time ) as RowN

from #Runners

where row_number() over (partition by Age order by Time ) <=2

order by Age,Rown

From what I can understand, row_number() cannot be used in where clause?

Dave Ballantyne SSC-Dedicated Points: 33667 More actions · Answer 8

@Mihai , Yes this is all 2005 (or greater)

@RockvilleAustin

That code was qualified with "In an ideal world we would be able to execute ..... However we cannot, so the currently suggested....."

So the error is expected.

Clear Sky SQL
My Blog[/url]

Idea Deadbeat SSC-Addicted Points: 416 More actions · Answer 9

Not directly related to article - but - an alternative to spt_values:

create table #Number (number int);

with N4000 as (select 0 as Number union all select Number+1 from N4000 where Number <4000

)insert into #number select * from N4000 option (MAXRECURSION 4000);

create index ix_N on #Number (Number);

with cteTally

as

(

select Number from #Number where number >0 and number <=(Select max(age) from #RunnersBig)

)

select *

from cteTally

cross apply

(

select top 2 * from #RunnersBig where age=number order by time

) as Winners

Alexander Kuznetsov SSCrazy Points: 2217 More actions · Answer 10

Dave,

I liked this article, thanks!

One more thing: this optimization relies on the assumptions that only a small percentage are winners. It might be interesting to research which query is faster if everyone is a winner, and what percentage of winners is the tipping point when both queries run for the same time.

What do you think?

rockvilleaustin SSC-Addicted Points: 412 More actions · Answer 11

Thanks, Dave!! My bad - I didn't read it carefully enough!

Dave Ballantyne SSC-Dedicated Points: 33667 More actions · Answer 12

@alex , Thanks glad you liked it.

The same thought had occurred to me but there complication would be that it would not entirely reproducible due to random data.

Maybe a follow up with published data is in order then after thats all proved decided and measured , i could break it all by adding another included column of dummy data 🙂

@rockvilleaustin , No worries

Clear Sky SQL
My Blog[/url]

mstjean Hall of Fame Points: 3310 More actions · Answer 13

...what about a tie?

INSERT INTO #Runners SELECT 9,10,20

Cursors are useful if you don't know SQL

Nicole Bowman SSCommitted Points: 1555 More actions · Answer 14

Thanks for the article Dave. It is a well explained example and easy to follow.

Cheers,

Nicole Bowman

Nothing is forever.