Returning the Top X row for each group

  • Dave Ballantyne

    SSC-Dedicated

    Points: 33556

    Comments posted to this topic are about the item Returning the Top X row for each group



    Clear Sky SQL
    My Blog[/url]

  • kay

    SSCommitted

    Points: 1517

    A nice quest! I like the solution for its clever use of the index.

    However, you depend on a table to deliver the lookup-values for the age-column (in this case master..spt_values ) and you need pre-execution knowledge about all the possible ages in your table (in this case 0...100).

    Here's a suggestion:

    select * from

    (select distinct age as AgeGroup from #RunnersBig) S1

    cross apply

    (select top 2 * from #RunnersBig where age=agegroup order by time)S2

    I suspect it is not as fast as your solution, as it scans the index for the "distinct ages". But it would not require any knowledge about the values in the "ages" column.

    I wonder if there is a solution having both, a fast execution with index-seek on the one hand, and a general independence of the value distribution in the age-column.

    Kay

  • Madhivanan-208264

    SSCertifiable

    Points: 7516

    More methods are available here

    http://beyondrelational.com/blogs/madhivanan/archive/2008/09/12/return-top-n-rows.aspx


    Madhivanan

    Failing to plan is Planning to fail

  • Dave Ballantyne

    SSC-Dedicated

    Points: 33556

    Kay ,

    You are absolutely correct , any solution that relies upon a tally table has to have at least the required number of rows in the said tally table.

    If you used a tally table to build all the dates for the next 10 years, you need to ensure that you have at least 3655 (ish) rows.

    An alternative to your distinct method would be to grab the max(age) , which will involve reading a single row from the index. Something like this...

    with cteTally

    as

    (

    select number from master..spt_values where type = 'P' and number >0 and number <=(Select max(age) from #RunnersBig)

    )

    select *

    from cteTally

    cross apply

    (

    select top 2 * from #RunnersBig where age=number order by time

    ) as Winners



    Clear Sky SQL
    My Blog[/url]

  • darryl_ba

    Grasshopper

    Points: 13

    would this exact code work using TOAD? Or will I get errors. Some keywords dont work using TOAD. Also, how would u sum() up the combined selected top row grouping.

  • Steve Jones - SSC Editor

    SSC Guru

    Points: 719752

    Great job and this is a reference I'll keep around.

  • RazvanS

    SSC Enthusiast

    Points: 115

    Thank you for the article, very useful.

    Is it safe to say that it only applies to SQL Server 2005 & 2008 but not to 2000?

  • rockvilleaustin

    SSC-Addicted

    Points: 412

    I got this error message:

    Msg 4108, Level 15, State 1, Line 3

    Windowed functions can only appear in the SELECT or ORDER BY clauses.

    when running the following codes:

    select * ,row_number() over (partition by Age order by Time ) as RowN

    from #Runners

    where row_number() over (partition by Age order by Time ) <=2

    order by Age,Rown

    From what I can understand, row_number() cannot be used in where clause?

  • Dave Ballantyne

    SSC-Dedicated

    Points: 33556

    @Mihai , Yes this is all 2005 (or greater)

    @rockvilleaustin

    That code was qualified with "In an ideal world we would be able to execute ..... However we cannot, so the currently suggested....."

    So the error is expected.



    Clear Sky SQL
    My Blog[/url]

  • Idea Deadbeat

    SSC-Addicted

    Points: 416

    Not directly related to article - but - an alternative to spt_values:

    create table #Number (number int);

    with N4000 as (select 0 as Number union all select Number+1 from N4000 where Number <4000

    )insert into #number select * from N4000 option (MAXRECURSION 4000);

    create index ix_N on #Number (Number);

    with cteTally

    as

    (

    select Number from #Number where number >0 and number <=(Select max(age) from #RunnersBig)

    )

    select *

    from cteTally

    cross apply

    (

    select top 2 * from #RunnersBig where age=number order by time

    ) as Winners

  • Alexander Kuznetsov

    SSCrazy

    Points: 2217

    Dave,

    I liked this article, thanks!

    One more thing: this optimization relies on the assumptions that only a small percentage are winners. It might be interesting to research which query is faster if everyone is a winner, and what percentage of winners is the tipping point when both queries run for the same time.

    What do you think?

  • rockvilleaustin

    SSC-Addicted

    Points: 412

    Thanks, Dave!! My bad - I didn't read it carefully enough!

  • Dave Ballantyne

    SSC-Dedicated

    Points: 33556

    @alex , Thanks glad you liked it.

    The same thought had occurred to me but there complication would be that it would not entirely reproducible due to random data.

    Maybe a follow up with published data is in order then after thats all proved decided and measured , i could break it all by adding another included column of dummy data 🙂

    @rockvilleaustin , No worries



    Clear Sky SQL
    My Blog[/url]

  • mstjean

    Hall of Fame

    Points: 3310

    ...what about a tie?

    INSERT INTO #Runners SELECT 9,10,20


    Cursors are useful if you don't know SQL

  • Nicole Bowman

    SSCommitted

    Points: 1555

    Thanks for the article Dave. It is a well explained example and easy to follow.

    Cheers,

    Nicole Bowman

    Nothing is forever.

Viewing 15 posts - 1 through 15 (of 34 total)

You must be logged in to reply to this topic. Login to reply