RE: Data Transfer - data not transfered in order

SSCrazy Eights

Points: 9832

July 22, 2013 at 10:16 am

Phil Parkin (7/22/2013)
I have some questions about your post wolfkillj.
heaps, will logically order rows in the order in which they were inserted
What do you mean by 'logically' in the above?
*However*, clustered indexes ONLY provide a logical order for storing rows
Are you sure? No physical order whatsoever?
...but it is merely coincidence that the rows are in that order when they leave the final operator in the execution plan
Not coincidence. Under some circumstances (often select * from
), rows are returned in the order defined by the clustered PK, just because the SQL query optimiser decided that that was the best way to do it. It's not guaranteed, but it's not coincidence either.
Note that I am not suggesting that ORDER BY should not be used, just picking up on the details in your message.

Hi Phil,

I probably oversimplified my answers you quote to avoid getting *too* technical, as I only wanted to provide a high-level view of why the "rows from a clustered index will always be returned in key order" myth is incorrect.

I did bungle the first bit about how rows are managed in heaps. There is no order, logical or otherwise, to rows in a heap, of course. What I should have said is that new rows will be inserted in the last allocated page of the heap and a "SELECT *" from a heap will return rows in page allocation order. As Paul Randal explains, this will not necessarily be the same order in which the rows were inserted, apparently because of the possibility that an updated row will be moved to a new page and a forwarding record left on the original page.

The reason that a clustered index does not guarantee that rows will be in a "physical" order is similarly esoteric. Each data page includes a "row offset array" that gives the starting location of each row in the page, with each row designated to a particular "slot". While the rows on a page are assigned to slots in key order, the actual locations of the data for those rows on the page may not be in key order. This blog post[/url] demonstrates how this works. So strictly speaking, the data for rows in a clustered index will not always be physically stored in key order, although the data can only be out of order within a page.

I used the word "coincidence" in an effort to drive home the point that a query may return rows in order of the clustered index key in the absence of an "ORDER BY" clause, not because the query syntax demands it, but because the optimizer chooses an execution plan that delivers the results in that order, as you point out. The optimizer is not constrained to return rows in order of the clustered index, it just happens to do so in this case. I suppose it may be more accurate to say that it's just a coincidence (from the user's point of view) that Microsoft designed the optimizer to behave like this. And, Microsoft makes no guarantees that it will maintain this behavior in future versions of SQL Server.

We seem to agree that it would be an abominable practice to rely on this behavior in any code where the order of results actually matters. I was just trying to emphasize to those who may be mistakenly expecting rows to always be returned in order of the clustered index key that they should not rely on this behavior. What responsible developer wants to write queries that will *probably* return the results in the desired/required order?

Thanks for your comments - I enjoy the discussion.

Jason

Jason Wolfkill