TOP

Question

Post reply

TOP

VM-723206

SSCrazy

Points: 2964
More actions
April 22, 2009 at 9:19 pm

#199730

Comments posted to this topic are about the item TOP

Viewing 15 posts - 1 through 15 (of 25 total)

You must be logged in to reply to this topic. Login to reply

Carlo Romagnano SSC-Insane Points: 22854 More actions · Answer 1

Carlo Romagnano

SSC-Insane

Points: 22854

April 23, 2009 at 12:49 am

#982673

I disagree! See the question of yesterday.

SQLWinther SSCertifiable Points: 6294 More actions · Answer 2

You will find it the cause of this if you look at the Execution plan. The SQL optimizer uses a table scan to get "ch" from the first row since the index does not contain the column "ch". If you change your index to include "ch" then the result will be "$", thats because the index can serve your query.

Modified index:

create table #t (id int,ch char)

insert into #t values (2,'A')

insert into #t values (5,'b')

insert into #t values (1,'$')

insert into #t values (3,'*')

insert into #t values (7,'@')

insert into #t values (4,'&')

insert into #t values (6,'!')

create nonclustered index nc_t on #t(id,ch)

select top 1 ch from #t

drop table #t

You can force the query to use the index and return the sort order of the index, but this will cause a performance degradation due to an index scan and a RID lookup. This will give you twice the cost. SQL engine uses the index to get the first row that doesnt contain the column ch and then uses a lookup to find the ch column from the leaf node (data page) in, in this case, the heap. This performance degradation will increas rapidly with many rows.

Index hint:

create table #t (id int,ch char)

insert into #t values (2,'A')

insert into #t values (5,'b')

insert into #t values (1,'$')

insert into #t values (3,'*')

insert into #t values (7,'@')

insert into #t values (4,'&')

insert into #t values (6,'!')

create nonclustered index nc_t on #t(id)

select top 1 ch from #t with(index(nc_t))

drop table #t

/Håkan Winther

Senior SQL Server DBA

/Håkan Winther
MCITP:Database Developer 2008
MCTS: SQL Server 2008, Implementation and Maintenance
MCSE: Data Platform

SQLWinther SSCertifiable Points: 6294 More actions · Answer 3

Carlo Romagnano (4/23/2009)
I disagree! See the question of yesterday.

I do not understand why you disagree? Do you want to be more specific?

/Håkan Winther

Senior SQL Server DBA

/Håkan Winther
MCITP:Database Developer 2008
MCTS: SQL Server 2008, Implementation and Maintenance
MCSE: Data Platform

Hugo Kornelis SSC Guru Points: 64790 More actions · Answer 4

hakan.winther (4/23/2009)
Carlo Romagnano (4/23/2009)
I disagree! See the question of yesterday.
I do not understand why you disagree? Do you want to be more specific?

Hi Håkan,

The QotD of yesterday sparked a huge discussion, because it asked us to protect the output of a query; the answers options included two answers with the same rows in a different output, and the query did not include ann ORDER BY. So even if one of the two was consistently returned on the system of the question author, there is absoolutely no guarantee that the same order will be observed on other people's systems. Or on the author's system tomorrow, when a hotfix is installed or a different overall workload is present when the optimizer starts to work on the query.

The same applies to today's question. The result of a TOP expression without ORDER BY is undefined. This is even explicitly mentioned in the Books Online page that the explanation refers to. Any of the values A, b, $, *, @, &, and ! could be returned by this query, and every one of them would be a correct result.

With my knowledge of SQL Server internals (thanks to reading lots of Kalen Delaney's books), I was able to predict how the rows would be layed out on physical storage and how this would cause the value A to be returned - but only under the assumption that there is no concurrent activity on the system and that there is no service pack or hotfix installed that changes the behaviour of heap allocation or that of the optimizer. (Both undocumented as far as official Microsoft documentation is concerned, and hence free to be changed without notification).

Maybe there is some official Microsoft documentation out there that clearly shows that, even under the most extreme concurrent allocation/deallocation activity in tempdb, the layout of these rows on storage will be the same; and that under all possible circumstances, the optimizer will choose to use an unordered table scan for this query. If there is, I'd appreciate it if someone could post a URL for it. Until that happens (and mind you, I'm not holding my breath), I call shenanigans on this question, as the only correct reply ("any one of the values A, b, $, *, @, &, or ! can be returned") is not present.

Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

SQLWinther SSCertifiable Points: 6294 More actions · Answer 5

I agree that you cannot be 100% sure of what method will be chosen by the SQL optimizer, but in this case the index scan would be much more expensive (twice the cost) due to the fact that you need to do a RID lookup (HEAP) to get your column ch. Thats why a table scan was used and the file allocation order is returned.

But you ( and Microsoft ) mean that there is no guarantee that the file allocation order is the order how the data is inserted? Okey, I got your point, the question is incorrect. There is no guarantee that "A" is returned due to the fact that an order by clause is missing. But if you had used an index hint would that always return the same record or could it be another record under different circumstances? (I know different collation can affect the result, but how about index fragmentation etc)

/Håkan Winther

/Håkan Winther
MCITP:Database Developer 2008
MCTS: SQL Server 2008, Implementation and Maintenance
MCSE: Data Platform

Adi Cohn-120898 SSC-Dedicated Points: 33944 More actions · Answer 6

I don’t like this question and yesterday’s question. The fact that if we run the code it will have the same results as the answer that was selected as the correct one, doesn’t makes it the correct answer. There isn’t any official documentation from Microsoft specifying order the records will be retrieved when there is no order by clause. Since there isn’t such official information, Microsoft is free to modify this behavior and it might change in the future after installing a service pack/cumulative update/hotfix or just in the next version of SQL Server.

I think that the important part of the Question Of The Day is that we can learn something from it, but this time I think that it causes damage as it can lead someone to learn that he doesn’t have to use an order by clause and he can trust the server to return the records using a specific order.

Adi

--------------------------------------------------------------
To know how to ask questions and increase the chances of getting asnwers:
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Hugo Kornelis SSC Guru Points: 64790 More actions · Answer 7

hakan.winther (4/23/2009)
I agree that you cannot be 100% sure of what method will be chosen by the SQL optimizer, but in this case the index scan would be much more expensive (twice the cost) due to the fact that you need to do a RID lookup (HEAP) to get your column ch. Thats why a table scan was used and the file allocation order is returned.

Agreed. For the current version of the optimizer. I'm not claiming the answer is currently wrong, I'm claiming it's subject to change without notice. Maybe the next hotfix will include an optimizer enhancement that makes it more aware of recent activity and will produce plans that favor last accessed data, to increase the chance of cache hits to reduce physical IO. Or some other enhancement, that affects this behaviour. Likely? I'd say no - but not impossible. The only thing one can safely say about the order of rows returned by a query without ORDER BY, or the row(s) selected by a TOP without ORDER BY, is that it is undefined.

But you ( and Microsoft ) mean that there is no guarantee that the file allocation order is the order how the data is inserted?

The order in which rows are inserted is of course exactly as specified in the code, as they are mutliple seperate single-row INSERT statements and T-SQL will always be executed asynchronously. But the order in which the rows are laid out on disk might be different.

For scanning a heap, the storage engine currently has only one strategy: using the IAM (Index Allocation Map) to access the pages allocated to the table. This means that the pages will be accessed in order of page number. On a "quiet" system, page order will normally match the order in which the pages and extents were acquired - but that is not necessarily the case if the system is concurrently being used.

Consider the following example scenario. A different process has to store some temp data (for a temp table, a table variable, or maybe even a spool or sort step in an execution plan) and grabs logical page #37. While this process is still running, you create a temp table and start loading data. You'll probably start at page #38, then get page #39, etc. But if the other process releases it's claim on tempdb, page #7 will be marked as free again - so if you next need a new page, you might get page #37. After that, a table scan will first produce the rows that are present on page #37 (acquired as the third page), then those on pages #38 and #39 (acquired first and second), and then continue from page #40 (acquired fourth).

I tried to create some code to demonstrate this, but I so far was unable to come up with a consistent repro, and I currently lack the time to keep trying. I hope the explanation suffices to see the possible risk.

And again - this is all based on the current version of SQL Server. Who knows what will change next? Or what has already changed in the last service pack, without me knowing?

Okey, I got your point, the question is incorrect. There is no guarantee that "A" is returned due to the fact that an order by clause is missing. But if you had used an index hint would that always return the same record or could it be another record under different circumstances? (I know different collation can affect the result, but how about index fragmentation etc)

Index fragmentation can affect the results if an IAM scan is used, as explained above. But that's not the only risk. Using an index hint forces the optimizer to create a plan that somehow uses the index, but nothing more. It's still free to scan or seek, free to choose an ordered or an unordered scan, and in case of an ordered scan to start at the beginning or the end, free to add other operators to the plan, etc.

Using a plan guide to force an exact execution plan should provide more certainty, but even than I would not vow for a 100% certainty. Even with the same plan, results may differ. For example (not really relevant for this specific situation): if you are running Enterprise Edition and you have to scan a large table that is already being scanned for another process, it can "piggyback" on the first to reuse results as they are being read and then continue to restart from the first page until where it started to piggyback - see http://msdn.microsoft.com/en-us/library/ms191475.aspx (topic Advanced Scanning) for the details.

The problem with a QotD as that of today (or that of yesterday, for that matter), is that they take current, undocumented behaviour, and present it in such a way that people reading this site might think it's something they can depend upon. The reason I take the time to jump on it in these forums is that I hope that at least the fraction of QotD respondents who take the time to read the discussion as well will realise that it's just the behaviour that happens to occur today, with no guarantees whatsoever for the future.

Many people have suffered the pain when they upgraded from SQL Server 6.5 to SQL Server 7.0 and a GROUP BY no longer automatically implied an ORDER BY. Many people have suffered the pain when they upgraded from SQL Server 2000 to SQL Server 2005 and returning data from a view that uses TOP 100 PERCENT ... ORDER BY no longer automatically returned the rows in the specified order. Let's all try not to repeat history a third time...

EDIT: And while I was composing thhis lengthy reply, Adi wrote basically the same, just much more succinctly (sp?). Thanks, Adi!

Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

Rachel Byford Hall of Fame Points: 3437 More actions · Answer 8

You're right about people not reading the discussions - they really miss out.

This is a really informative thread (as was yesterday's).

Thank you.

SQLWinther SSCertifiable Points: 6294 More actions · Answer 9

I always use order by when i use TOP, but now i realize how important it really is. I have not thought of why it is important before, and I thank you for your time to share your knowledge about SQL Server internal processes.

/Håkan Winther

/Håkan Winther
MCITP:Database Developer 2008
MCTS: SQL Server 2008, Implementation and Maintenance
MCSE: Data Platform

GSquared SSC Guru Points: 260824 More actions · Answer 10

I was going to comment about how this isn't a valid test, but it's already been covered more than adequately. Good discussion.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Hugo Kornelis SSC Guru Points: 64790 More actions · Answer 11

Rachel Byford (4/23/2009)
This is a really informative thread (as was yesterday's).

hakan.winther (4/23/2009)
thank you for your time to share your knowledge about SQL Server internal processes.

GSquared (4/23/2009)
Good discussion.

Thanks, all three of you, for the kind words. Comments like these make it worthwhile to continue spending time here! 🙂

Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

Mauve SSChampion Points: 11316 More actions · Answer 12

I agree with Hugo and others that the correct answer is missing from this QoD and yesterday's QoD. As a RDBMS developer for over 30 years (DB2, Oracle, SQL Server) I have seen too often bad SQL code based upon the results of some assumption and/or simple "test data". Later on, the system fails and/or returns incorrect data as the data changes, the RDBMS software is updated, and/or the phase of the moon changes.

I always liked IBM's response (decades ago) when asked what would happen when {fill in scenario here} that was not clearly documented in the manual/documentation. Their pat answer was: Results are unpredictable.

In my opinion, it feels like a developer observed some behavior based upon the situation of the day and decided to submit it as a QoD. They really don't appear to be based upon proper research.

[font="Arial Narrow"](PHB) I think we should build an SQL database. (Dilbert) What color do you want that database? (PHB) I think mauve has the most RAM.[/font]

dunnjoe SSCrazy Points: 2180 More actions · Answer 13

The posts for this message are very informative, but I don't see that they invalidate the answer. There were no deletes, so no fragmentation would be created. The nonclustered index is irrelevant to the query because it does not include ch in the definition, so index coverage is not possible. Without index coverage the result would be in the order of the inserts. The one thing I agree with the posts, if you want to guarantee the order of the result set, than a clustered index needs to be defined or order by clause in the select. A question I have: since it is a temp table, other users can't use it, so multi-user scenarios causing a change in the row ordering don't seem to be an issue, right?

SanjayAttray SSChampion Points: 13157 More actions · Answer 14

select top 1 ch from #t with(index(nc_t))

order by ch desc

Answer : b

select top 1 ch from #t with(index(nc_t))

order by ch

Answer : !

select top 1 ch from #t with(index(nc_t)) -- (without order by)

Answer : $

select top 1 ch from #t (without index hint and without order by)

Answer : A

Basically, It all depends how you want to fetch the data. I think all the answers are correct as mention by Hugo. Again " depends ".

SQL DBA.