August 24, 2015 at 8:42 am
I have a strange situation with an select. I've noticed that when I select top 100, a record is not returning from the database, but when doing top 101 the record appears on position 41.
The query is like this:
select top 100 GroupId, count(HouseId)
from House h
group by h.GroupId
order by max([DateCreated]) desc
From all discussions about top 100 vs top 101 I've noticed that everybody is saying that top 101 is using another algorithm and we can have a spead problem, but my problem is not about this. With top 100 I'm missing a record that should appear at index 41. Anibody noticed something like this?
August 24, 2015 at 8:53 am
Any time you're debugging an order problem, put the column that you're ordering by into the select.
select top 100 GroupId, count(HouseId), max([DateCreated])
from House h
group by h.GroupId
order by max([DateCreated]) desc
If you have multiple rows with the same max([DateCreated]), there's no guarantee what order they'll be displayed in.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
August 24, 2015 at 9:16 am
If I put max([DateCreated]) in the select the result is the same, actually the DateCreated is the same for all records in the same group. The problem is that the sql server is generating a different algorithm for top 100 and top 101 (actually one algorithm for under 100 and another for 101 and up). but why is this algorithms returning different data?
August 24, 2015 at 9:22 am
mariandalalau (8/24/2015)
...The query is like this:
select top 100 GroupId, count(HouseId)
from House h
group by h.GroupId
order by max([DateCreated]) desc
...
The query is like this, or the query is this? If your actual query is different, can you post it up please?
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
August 24, 2015 at 9:25 am
Gail alluded to the issue TOP 100 without an order by, or an insufficient order by that covers your desired results.
add more columns to your order by, and change the ASC /DESC; you'll get different values each way.:
select top 100 GroupId, count(HouseId), max([DateCreated])
from House h
group by h.GroupId
order by max([DateCreated]) desc,GroupId
Lowell
August 24, 2015 at 9:34 am
mariandalalau (8/24/2015)
If I put max([DateCreated]) in the select the result is the same, actually the DateCreated is the same for all records in the same group.
That's what I suspected. What you've got there is a non-deterministic order, meaning that any 100 records with that max([DateCreated]) will correctly satisfy your query, as you specified no more specific order.
The problem is that the sql server is generating a different algorithm for top 100 and top 101
No, it's not. The problem is that you're expecting a guarantee of order where no such guarantee exists. It's essentially the same as saying SELECT TOP(100) FROM SomeTable;, nothing forces SQL to give you the same 100 records every time.
You're probably seeing parallelism, or hash aggregate or similar, but the cause is that you have a insufficiently specific ORDER BY clause and hence any 100 records are just fine based on the logic of your query. If you want a specific 100, you need to make the ORDER BY clause more specific.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
August 24, 2015 at 9:49 am
mariandalalau (8/24/2015)
If I put max([DateCreated]) in the select the result is the same, actually the DateCreated is the same for all records in the same group. The problem is that the sql server is generating a different algorithm for top 100 and top 101 (actually one algorithm for under 100 and another for 101 and up). but why is this algorithms returning different data?
Well by using top X records on a result set that returns more than that you are effectively agreeing that you don't care about the specific records returned as long as you get X number of records.
Is there a problem with the records being returned or are you just wondering why you're getting different results with TOP 100 vs TOP 101?
August 24, 2015 at 1:15 pm
The sorting is on a DateTime field, the values are the same only in the group, the rest of the records are different. So it's not possible to have more than 100 records with same Datetime value (already checked this). I'm sure than this sorting is enough to be deterministic.
SQL Server always uses the alternative algorithm where the TOP 100 (or fewer) rows are requested.
One clear difference between the two runs is that the 101-row example spilled to tempdb (as indicated by the sort warning), whereas the TOP (100) version (using the alternative algorithm, in memory) did not.
August 24, 2015 at 2:06 pm
Here is the real problem, we can't see what you see. It's like calling your mechanic and expecting him to fix your car without seeing it or touching it.
Everything you have been told is valid. We can't do much more than what has been done.
Can you recreate the problem with sample data that represents your production data? We would need to see the DDL for the table, sample data that represents your problem domain and recreates the issue, what you expect to be returned by the query that currently isn't. Also, the actual execution plan would be helpful as well.
August 24, 2015 at 2:13 pm
if you call top 250, do you get 250 records?
if you call top 100 or top 101, and you get exactly that many records, then there are more records than fit in your TOP command, and SQL is free to return any records, in any order it feels is the fastest, unless you extend your order by query to eliminate ambiguities.
if you select top 100, and you get a random single record, that might be a problem/.
order by max([DateCreated]) might as well be ORDER BY (select 99999); it has no affect on the data returned at all; it's just a value.
order by GROUPID would certainly have an affect.
Lowell
August 24, 2015 at 2:24 pm
mariandalalau (8/24/2015)
If I put max([DateCreated]) in the select the result is the same, actually the DateCreated is the same for all records in the same group. The problem is that the sql server is generating a different algorithm for top 100 and top 101 (actually one algorithm for under 100 and another for 101 and up). but why is this algorithms returning different data?
mariandalalau (8/24/2015)
The sorting is on a DateTime field, the values are the same only in the group, the rest of the records are different. So it's not possible to have more than 100 records with same Datetime value (already checked this). I'm sure than this sorting is enough to be deterministic.
Unless I am missing something, which one is true?
SQL Server always uses the alternative algorithm where the TOP 100 (or fewer) rows are requested.
One clear difference between the two runs is that the 101-row example spilled to tempdb (as indicated by the sort warning), whereas the TOP (100) version (using the alternative algorithm, in memory) did not.
Please tell us where you found this out. Is this a guess on your part, or is there some documentation that describes the different algorithms?
Here is what you are seeing happening.
Example table:
Col1 Col2 Col3
1 Fee 01/01/2015
2 Fi 01/01/2015
3 Fo 01/01/2015
4 Fum 01/01/2015
SELECT TOP 2 Col1, Col2
FROM ThisTable
ORDER BY Col3
You are expecting rows 1 and 2.
SQL server is returning rows, 1 and 4.
Like Gail said, because all 4 rows in this table have the same value for Col3, which is the ordering column, the TOP clause may not return the rows YOU are expecting.
When you change to TOP 3, you get rows 1, 2, and 4.
Michael L John
If you assassinate a DBA, would you pull a trigger?
To properly post on a forum:
http://www.sqlservercentral.com/articles/61537/
August 25, 2015 at 1:46 am
Hi guys,
thanks for the help, finally this morning after a good sleep I've seen the problem.
You were right, the sorting was not deterministic enough, but yesterday I was so tired that I couldn't see it.
August 25, 2015 at 2:10 am
I didn't guest, I've read a lot of forums.
you can check this link: http://sqlblog.com/blogs/paul_white/archive/2010/08/27/sorting-row-goals-and-the-top-100-problem.aspx
Actually the problem on that forum is different, is based on the big difference on speed for top 100 vs top 101.
I know now that this was not the problem on my case, just that yesterday I was searching all the possibilities.
thanks for the help.
August 25, 2015 at 9:54 am
mariandalalau (8/25/2015)
I didn't guest, I've read a lot of forums.you can check this link: http://sqlblog.com/blogs/paul_white/archive/2010/08/27/sorting-row-goals-and-the-top-100-problem.aspx
Actually the problem on that forum is different, is based on the big difference on speed for top 100 vs top 101.
I know now that this was not the problem on my case, just that yesterday I was searching all the possibilities.
thanks for the help.
Make the link clickable: http://sqlblog.com/blogs/paul_white/archive/2010/08/27/sorting-row-goals-and-the-top-100-problem.aspx
August 25, 2015 at 2:44 pm
Good tip. You are the best at searching non links.
Viewing 15 posts - 1 through 15 (of 15 total)
You must be logged in to reply to this topic. Login to reply