RANK/TOP WITHOUT TIES

  • Hopefully someone can set me straight here - I am looking for a way to select the top n items, so long as the n+1th item does NOT have the same value as the nth item.

    Basically, I'm looking for the opposite of:

    SELECT TOP 5 WITH TIES col1

    FROM Mytable

    ORDER BY col1

    If sorting all values of col1 would give these values: 5,4,3,2,1,1,1

    ... then the WITH TIES query above would return 7 rows...

    I'm looking for a query that would return only 4 rows: 5,4,3,2

    Since including all of the 1's would push it over the limit of n rows, I want to exclude ALL rows with this value, not just the one that happens to get sorted first.

    Basically, I want the TOP n, except for when the TOP WITH TIES n will have more than n rows.

    I hope I'm making sense here - as I feel there's got to be a clever way to do this that I'm just not seeing immediately.

    Thanks,

    Brigid

  • Look at the RANK() and DENSE_RANK() window functions.

    http://msdn.microsoft.com/en-us/library/ms176102.aspx

    http://msdn.microsoft.com/en-us/library/ms173825.aspx


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • dwain.c (8/28/2012)


    Look at the RANK() and DENSE_RANK() window functions.

    Not sure how RANK() OR DENSE_RANK() would help in this example. I thought a lot about the functions RANK, DENSE_RANK, and ROW_NUMBER and am hoping someone can see something clever that won't necessitate joining on the result of these functions...

    These functions applied to the "col1" data I gave above would give:

    col1rankdenserow_num

    5111

    4222

    3333

    2444

    1555

    1556

    1557

    So basically, I would like something that says something like:

    select where rank <= 5 and the max row_num associated with that rank is also <= 5

    Is there a way to set the assignment of the rank function? So that instead of setting all the tied values to the next integer, it would set it to the maximum row number for that value?

    My ideal MYRANK function would give

    col1myrank

    51

    42

    33

    24

    17

    17

    17

    Then I could just SELECT col1 WHERE myrank <= 5.

    I feel like there's got to be a simple explanation that I'm just not seeing... all help is greatly appreciated!

  • This is pretty simple using Row_Number. Please notice how I posted sample data to make this a lot easier for anybody else. You should do something similar in your future posts.

    ;with MyRank (col1)

    as

    (

    select 5 union all

    select 4 union all

    select 3 union all

    select 2 union all

    select 1 union all

    select 1 union all

    select 1

    )

    select col1, MAX(RowNum) as MaxRowNum

    from

    (

    select col1, ROW_NUMBER() over (order by col1 desc) as RowNum

    from MyRank

    ) x

    group by col1

    having Max(RowNum) <= 5

    order by col1 desc

    Or for those who would prefer cacading ctes, you can modify it like this.

    ;with MyRank (col1)

    as

    (

    select 5 union all

    select 4 union all

    select 3 union all

    select 2 union all

    select 1 union all

    select 1 union all

    select 1

    )

    , MaxRank as

    (

    select col1, ROW_NUMBER() over (order by col1 desc) as RowNum

    from MyRank

    )

    select col1, MAX(RowNum) as MaxRowNum

    from MaxRank

    group by col1

    having Max(RowNum) <= 5

    order by col1 desc

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Another option:

    ;with MyRank (col1)

    as

    (

    select 5 union all

    select 4 union all

    select 3 union all

    select 2 union all

    select 1 union all

    select 1 union all

    select 1

    )

    SELECT TOP 5 col1

    FROM MyRank

    GROUP BY col1

    HAVING COUNT(*) = 1

    ORDER BY col1

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Luis Cazares (8/29/2012)


    Another option:

    ;with MyRank (col1)

    as

    (

    select 5 union all

    select 4 union all

    select 3 union all

    select 2 union all

    select 1 union all

    select 1 union all

    select 1

    )

    SELECT TOP 5 col1

    FROM MyRank

    GROUP BY col1

    HAVING COUNT(*) = 1

    ORDER BY col1

    That doesn't quite meet the requirements. If for example they wanted to top 7 this code would not work because it would not return a row where col1 = 1.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • I tested my last solution and won't give the exact results.

    Try this instead:

    ;with MyRank (col1)

    as

    (

    select 6 union all

    select 5 union all

    select 4 union all

    select 3 union all

    select 2 union all

    select 2 union all

    select 1

    ), MyRank2

    as

    (

    SELECT TOP 5 col1, COUNT(*) AS RCount

    FROM MyRank

    GROUP BY col1

    )

    SELECT col1

    FROM MyRank2

    WHERE RCount = 1

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Luis Cazares (8/29/2012)


    I tested my last solution and won't give the exact results.

    Try this instead:

    ;with MyRank (col1)

    as

    (

    select 6 union all

    select 5 union all

    select 4 union all

    select 3 union all

    select 2 union all

    select 2 union all

    select 1

    ), MyRank2

    as

    (

    SELECT TOP 5 col1, COUNT(*) AS RCount

    FROM MyRank

    GROUP BY col1

    )

    SELECT col1

    FROM MyRank2

    WHERE RCount = 1

    Still not quite right. You have a TOP but not Order By. Which top 5 will it get?

    Adding the order by seems to get it though.

    ;with MyRank (col1)

    as

    (

    select 6 union all

    select 5 union all

    select 4 union all

    select 3 union all

    select 2 union all

    select 2 union all

    select 1

    ), MyRank2

    as

    (

    SELECT TOP 5 col1, COUNT(*) AS RCount

    FROM MyRank

    GROUP BY col1

    order by col1 desc --Need this order by to ensure which top 5

    )

    SELECT col1

    FROM MyRank2

    WHERE RCount = 1

    This seems to have a slightly simpler execution plan than mine too. 😛 We have certainly proven there is more than 1 way to do this. I am sure somebody will come along with another one within a couple hours.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean Lange (8/29/2012)


    Luis Cazares (8/29/2012)


    I tested my last solution and won't give the exact results.

    Try this instead:

    ;with MyRank (col1)

    as

    (

    select 6 union all

    select 5 union all

    select 4 union all

    select 3 union all

    select 2 union all

    select 2 union all

    select 1

    ), MyRank2

    as

    (

    SELECT TOP 5 col1, COUNT(*) AS RCount

    FROM MyRank

    GROUP BY col1

    )

    SELECT col1

    FROM MyRank2

    WHERE RCount = 1

    Still not quite right. You have a TOP but not Order By. Which top 5 will it get?

    Adding the order by seems to get it though.

    ;with MyRank (col1)

    as

    (

    select 6 union all

    select 5 union all

    select 4 union all

    select 3 union all

    select 2 union all

    select 2 union all

    select 1

    ), MyRank2

    as

    (

    SELECT TOP 5 col1, COUNT(*) AS RCount

    FROM MyRank

    GROUP BY col1

    order by col1 desc --Need this order by to ensure which top 5

    )

    SELECT col1

    FROM MyRank2

    WHERE RCount = 1

    This seems to have a slightly simpler execution plan than mine too. 😛 We have certainly proven there is more than 1 way to do this. I am sure somebody will come along with another one within a couple hours.

    What does the OP want with the following?

    ;with MyRank (col1)

    as

    (

    select 6 union all

    select 5 union all

    select 5 union all

    select 4 union all

    select 3 union all

    select 2 union all

    select 1

    ), MyRank2

    as

    (

    SELECT TOP 5 col1, COUNT(*) AS RCount

    FROM MyRank

    GROUP BY col1

    order by col1 desc --Need this order by to ensure which top 5

    )

    SELECT col1

    FROM MyRank2

    WHERE RCount = 1

  • I had the idea that the group by would do the sort. However, it might not be the safest thing to do it.

    Your query gave me some incorrect results when I change the values.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Actually I don't think your works Luis.

    I added some more data and it doesn't do what the OP asked for originally.

    ;with MyRank (col1)

    as

    (

    select 6 union all

    select 5 union all

    select 5 union all

    select 5 union all

    select 5 union all

    select 5 union all

    select 5 union all

    select 5 union all

    select 4 union all

    select 3 union all

    select 2 union all

    select 1

    ), MyRank2

    as

    (

    SELECT TOP 5 col1, COUNT(*) AS RCount

    FROM MyRank

    GROUP BY col1

    order by col1 desc --Need this order by to ensure which top 5

    )

    SELECT col1

    FROM MyRank2

    WHERE RCount = 1

    If I understand the requirements this should only return 1 row. Trying the same with the Row_Number solution seems to still work as the OP stated in their original post and returns only col1 = 6.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Thanks to both of you for your help and ideas - I thought I would be able to do this without subqueries with separate groups, but that may not be the case.

    The only other additional chance I thought I had was to make a function combining RANK ASC and RANK DESC.

    ;with MyTable (col1)

    as

    (

    select 1 union all

    select 2 union all

    select 2 union all

    select 3 union all

    select 5 union all

    select 5 union all

    select 9 union all

    select 9 union all

    select 10 union all

    select 10

    )

    SELECT col1

    , RANK() OVER(ORDER BY col1 DESC) AS Rank_DESC

    , RANK() OVER(ORDER BY col1 ASC) AS Rank_ASC

    , RANK() OVER(ORDER BY col1 DESC) + RANK() OVER(ORDER BY col1 ASC) AS Rank_Sum

    FROM MyTable

    If you run this, you can see that the sum of RANK ASC + RANK DESC is greater than the number of rows only when that row is not part of a tie.

    But, since you can't have RANK functions within a WHERE clause, this still leaves me as having a subquery of a sort. (As you can guess, my actual data is a good deal larger than this, so I was trying to avoid having to cache subquery results if it was possible.)

    Thanks a bunch - and if my comments inspire any further ideas - I'd love to hear them!

  • You're right Sean, I missed something.

    EDIT: I need more coffee or to read more carefully.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply