Blog Post

Executing SELECT query using TOP without ORDER BY

,

Time and again I see developers querying SQL Server trying to retrieve TOP (N) records but without using ORDER BY, and not realising that the query could end up returning random results. We will look at this in a bit of details with some examples.

Assuming you are able to run the StackOverflow database, if you run the following query:

SELECT TOP 10 Id
FROM Users
WHERE LastAccessDate > '2012-01-01'

This is a very simple query and given that you have no other non-clustered indexes being used, the results looks as follows:

post15_results_1

When you re-run the same query over and over again, you will get the same results every time. The execution plan of the query looks as follows:

post15_execution_plan_1

The results returned are in the order they were entered – in this case the same order as the clustered index on Users.Id

In other words, the results are identical to when you run:

SELECT TOP 10 Id
FROM Users
WHERE LastAccessDate > '2012-01-01'
ORDER BY Id

There are at least three reasons why you shouldn’t expect the results to remain consistent without using the ORDER BY clause.

Introduction of other non-clustered indexes (reason one) could potentially mean SQL decides to use a different index and therefore the results returned are not in the order expected.

If I now introduce the following index:

CREATE NONCLUSTERED INDEX [IX_Users_LastAccessDate_Id] ON [dbo].[Users]
(
[LastAccessDate]
)
INCLUDE ([Id])

Then, run the query without the ORDER BY clause, my results look like this:

post15_results_2

How did that change? Let’s look at the execution plan.

post15_execution_plan_1a

SQL Server decided to use our newly created non-clustered index here, because it’s got all the fields that the query requires and it’s smaller in size. However, since the non-clustered index is ordered by the LastAccessDate, the IDs returned will also be in that order.

Reason two – let’s assume that you didn’t have any non-clustered index and your clustered index was not on a sequential field like the identity column we have in the above examples, but instead was a GUID or another “hot” field like the LastAcessDate, which keeps changing every time. This would mean that your clustered index will also change every time the field was updated or a new record inserted. And therefore, the results returned would change along with it. Quite similar to what we saw in the above example when SQL Server decided to use the non-clustered index.

Your application also runs another slightly complex query which looks like:

SELECT TOP 10 Users.Id, COUNT(Posts.Id) as PostsCount
FROM Users
INNER JOIN Posts ON Posts.OwnerUserId = Users.Id
WHERE Posts.CreationDate > '2012-10-01' AND Posts.Score > 10
GROUP BY Users.Id

You will almost likely see a different set of results every time you execute the query.

So what could be going on here? Let’s look at the execution plan.

post15_execution_plan_2

Now, here’s why.
Parallelism (reason three) – Yes, that’s absolutely normal! As the cost of the query is above the “Cost Threshold for Parallelism”, more than one thread is required to accomplish the query. Because of that, though, there is no guarantee of the order when the multiple streams gather and, therefore, the results returned are different almost every time.

TOP TIP: In all the examples above, you have given the control to SQL Server and it’s almost as good as a roll of dice. To stay in control and get consistent results, always use ORDER BY when using TOP (n) in your queries, no matter how simple they are. If the cost of your query suddenly goes up or the DBA decides to make changes to the indexes, your results will remain correct!

The post Executing SELECT query using TOP without ORDER BY appeared first on SQLYSE.

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating