Select top 101

  • I have a strange situation with an select. I've noticed that when I select top 100, a record is not returning from the database, but when doing top 101 the record appears on position 41.

    The query is like this:

    select top 100 GroupId, count(HouseId)

    from House h

    group by h.GroupId

    order by max([DateCreated]) desc

    From all discussions about top 100 vs top 101 I've noticed that everybody is saying that top 101 is using another algorithm and we can have a spead problem, but my problem is not about this. With top 100 I'm missing a record that should appear at index 41. Anibody noticed something like this?

  • Any time you're debugging an order problem, put the column that you're ordering by into the select.

    select top 100 GroupId, count(HouseId), max([DateCreated])

    from House h

    group by h.GroupId

    order by max([DateCreated]) desc

    If you have multiple rows with the same max([DateCreated]), there's no guarantee what order they'll be displayed in.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • If I put max([DateCreated]) in the select the result is the same, actually the DateCreated is the same for all records in the same group. The problem is that the sql server is generating a different algorithm for top 100 and top 101 (actually one algorithm for under 100 and another for 101 and up). but why is this algorithms returning different data?

  • mariandalalau (8/24/2015)


    ...

    The query is like this:

    select top 100 GroupId, count(HouseId)

    from House h

    group by h.GroupId

    order by max([DateCreated]) desc

    ...

    The query is like this, or the query is this? If your actual query is different, can you post it up please?

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Gail alluded to the issue TOP 100 without an order by, or an insufficient order by that covers your desired results.

    add more columns to your order by, and change the ASC /DESC; you'll get different values each way.:

    select top 100 GroupId, count(HouseId), max([DateCreated])

    from House h

    group by h.GroupId

    order by max([DateCreated]) desc,GroupId

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • mariandalalau (8/24/2015)


    If I put max([DateCreated]) in the select the result is the same, actually the DateCreated is the same for all records in the same group.

    That's what I suspected. What you've got there is a non-deterministic order, meaning that any 100 records with that max([DateCreated]) will correctly satisfy your query, as you specified no more specific order.

    The problem is that the sql server is generating a different algorithm for top 100 and top 101

    No, it's not. The problem is that you're expecting a guarantee of order where no such guarantee exists. It's essentially the same as saying SELECT TOP(100) FROM SomeTable;, nothing forces SQL to give you the same 100 records every time.

    You're probably seeing parallelism, or hash aggregate or similar, but the cause is that you have a insufficiently specific ORDER BY clause and hence any 100 records are just fine based on the logic of your query. If you want a specific 100, you need to make the ORDER BY clause more specific.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • mariandalalau (8/24/2015)


    If I put max([DateCreated]) in the select the result is the same, actually the DateCreated is the same for all records in the same group. The problem is that the sql server is generating a different algorithm for top 100 and top 101 (actually one algorithm for under 100 and another for 101 and up). but why is this algorithms returning different data?

    Well by using top X records on a result set that returns more than that you are effectively agreeing that you don't care about the specific records returned as long as you get X number of records.

    Is there a problem with the records being returned or are you just wondering why you're getting different results with TOP 100 vs TOP 101?

  • The sorting is on a DateTime field, the values are the same only in the group, the rest of the records are different. So it's not possible to have more than 100 records with same Datetime value (already checked this). I'm sure than this sorting is enough to be deterministic.

    SQL Server always uses the alternative algorithm where the TOP 100 (or fewer) rows are requested.

    One clear difference between the two runs is that the 101-row example spilled to tempdb (as indicated by the sort warning), whereas the TOP (100) version (using the alternative algorithm, in memory) did not.

  • Here is the real problem, we can't see what you see. It's like calling your mechanic and expecting him to fix your car without seeing it or touching it.

    Everything you have been told is valid. We can't do much more than what has been done.

    Can you recreate the problem with sample data that represents your production data? We would need to see the DDL for the table, sample data that represents your problem domain and recreates the issue, what you expect to be returned by the query that currently isn't. Also, the actual execution plan would be helpful as well.

  • if you call top 250, do you get 250 records?

    if you call top 100 or top 101, and you get exactly that many records, then there are more records than fit in your TOP command, and SQL is free to return any records, in any order it feels is the fastest, unless you extend your order by query to eliminate ambiguities.

    if you select top 100, and you get a random single record, that might be a problem/.

    order by max([DateCreated]) might as well be ORDER BY (select 99999); it has no affect on the data returned at all; it's just a value.

    order by GROUPID would certainly have an affect.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • mariandalalau (8/24/2015)


    If I put max([DateCreated]) in the select the result is the same, actually the DateCreated is the same for all records in the same group. The problem is that the sql server is generating a different algorithm for top 100 and top 101 (actually one algorithm for under 100 and another for 101 and up). but why is this algorithms returning different data?

    mariandalalau (8/24/2015)


    The sorting is on a DateTime field, the values are the same only in the group, the rest of the records are different. So it's not possible to have more than 100 records with same Datetime value (already checked this). I'm sure than this sorting is enough to be deterministic.

    Unless I am missing something, which one is true?

    SQL Server always uses the alternative algorithm where the TOP 100 (or fewer) rows are requested.

    One clear difference between the two runs is that the 101-row example spilled to tempdb (as indicated by the sort warning), whereas the TOP (100) version (using the alternative algorithm, in memory) did not.

    Please tell us where you found this out. Is this a guess on your part, or is there some documentation that describes the different algorithms?

    Here is what you are seeing happening.

    Example table:

    Col1 Col2 Col3

    1 Fee 01/01/2015

    2 Fi 01/01/2015

    3 Fo 01/01/2015

    4 Fum 01/01/2015

    SELECT TOP 2 Col1, Col2

    FROM ThisTable

    ORDER BY Col3

    You are expecting rows 1 and 2.

    SQL server is returning rows, 1 and 4.

    Like Gail said, because all 4 rows in this table have the same value for Col3, which is the ordering column, the TOP clause may not return the rows YOU are expecting.

    When you change to TOP 3, you get rows 1, 2, and 4.

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • Hi guys,

    thanks for the help, finally this morning after a good sleep I've seen the problem.

    You were right, the sorting was not deterministic enough, but yesterday I was so tired that I couldn't see it.

  • I didn't guest, I've read a lot of forums.

    you can check this link: http://sqlblog.com/blogs/paul_white/archive/2010/08/27/sorting-row-goals-and-the-top-100-problem.aspx

    Actually the problem on that forum is different, is based on the big difference on speed for top 100 vs top 101.

    I know now that this was not the problem on my case, just that yesterday I was searching all the possibilities.

    thanks for the help.

  • mariandalalau (8/25/2015)


    I didn't guest, I've read a lot of forums.

    you can check this link: http://sqlblog.com/blogs/paul_white/archive/2010/08/27/sorting-row-goals-and-the-top-100-problem.aspx

    Actually the problem on that forum is different, is based on the big difference on speed for top 100 vs top 101.

    I know now that this was not the problem on my case, just that yesterday I was searching all the possibilities.

    thanks for the help.

    Make the link clickable: http://sqlblog.com/blogs/paul_white/archive/2010/08/27/sorting-row-goals-and-the-top-100-problem.aspx

  • Good tip. You are the best at searching non links.

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply