Returning the Top X row for each group

  • Comments posted to this topic are about the item Returning the Top X row for each group



    Clear Sky SQL
    My Blog[/url]

  • A nice quest! I like the solution for its clever use of the index.

    However, you depend on a table to deliver the lookup-values for the age-column (in this case master..spt_values ) and you need pre-execution knowledge about all the possible ages in your table (in this case 0...100).

    Here's a suggestion:

    select * from

    (select distinct age as AgeGroup from #RunnersBig) S1

    cross apply

    (select top 2 * from #RunnersBig where age=agegroup order by time)S2

    I suspect it is not as fast as your solution, as it scans the index for the "distinct ages". But it would not require any knowledge about the values in the "ages" column.

    I wonder if there is a solution having both, a fast execution with index-seek on the one hand, and a general independence of the value distribution in the age-column.

    Kay

  • More methods are available here

    http://beyondrelational.com/blogs/madhivanan/archive/2008/09/12/return-top-n-rows.aspx


    Madhivanan

    Failing to plan is Planning to fail

  • Kay ,

    You are absolutely correct , any solution that relies upon a tally table has to have at least the required number of rows in the said tally table.

    If you used a tally table to build all the dates for the next 10 years, you need to ensure that you have at least 3655 (ish) rows.

    An alternative to your distinct method would be to grab the max(age) , which will involve reading a single row from the index. Something like this...

    with cteTally

    as

    (

    select number from master..spt_values where type = 'P' and number >0 and number <=(Select max(age) from #RunnersBig)

    )

    select *

    from cteTally

    cross apply

    (

    select top 2 * from #RunnersBig where age=number order by time

    ) as Winners



    Clear Sky SQL
    My Blog[/url]

  • would this exact code work using TOAD? Or will I get errors. Some keywords dont work using TOAD. Also, how would u sum() up the combined selected top row grouping.

  • Great job and this is a reference I'll keep around.

  • Thank you for the article, very useful.

    Is it safe to say that it only applies to SQL Server 2005 & 2008 but not to 2000?

  • I got this error message:

    Msg 4108, Level 15, State 1, Line 3

    Windowed functions can only appear in the SELECT or ORDER BY clauses.

    when running the following codes:

    select * ,row_number() over (partition by Age order by Time ) as RowN

    from #Runners

    where row_number() over (partition by Age order by Time ) <=2

    order by Age,Rown

    From what I can understand, row_number() cannot be used in where clause?

  • @Mihai , Yes this is all 2005 (or greater)

    @rockvilleaustin

    That code was qualified with "In an ideal world we would be able to execute ..... However we cannot, so the currently suggested....."

    So the error is expected.



    Clear Sky SQL
    My Blog[/url]

  • Not directly related to article - but - an alternative to spt_values:

    create table #Number (number int);

    with N4000 as (select 0 as Number union all select Number+1 from N4000 where Number <4000

    )insert into #number select * from N4000 option (MAXRECURSION 4000);

    create index ix_N on #Number (Number);

    with cteTally

    as

    (

    select Number from #Number where number >0 and number <=(Select max(age) from #RunnersBig)

    )

    select *

    from cteTally

    cross apply

    (

    select top 2 * from #RunnersBig where age=number order by time

    ) as Winners

  • Dave,

    I liked this article, thanks!

    One more thing: this optimization relies on the assumptions that only a small percentage are winners. It might be interesting to research which query is faster if everyone is a winner, and what percentage of winners is the tipping point when both queries run for the same time.

    What do you think?

  • Thanks, Dave!! My bad - I didn't read it carefully enough!

  • @alex , Thanks glad you liked it.

    The same thought had occurred to me but there complication would be that it would not entirely reproducible due to random data.

    Maybe a follow up with published data is in order then after thats all proved decided and measured , i could break it all by adding another included column of dummy data 🙂

    @rockvilleaustin , No worries



    Clear Sky SQL
    My Blog[/url]

  • ...what about a tie?

    INSERT INTO #Runners SELECT 9,10,20


    Cursors are useful if you don't know SQL

  • Thanks for the article Dave. It is a well explained example and easy to follow.

    Cheers,

    Nicole Bowman

    Nothing is forever.

Viewing 15 posts - 1 through 15 (of 33 total)

You must be logged in to reply to this topic. Login to reply