SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Data Transfer - data not transfered in order


Data Transfer - data not transfered in order

Author
Message
TSQL Tryer
TSQL Tryer
SSCommitted
SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)

Group: General Forum Members
Points: 1821 Visits: 715
hi there,

I have a table with 371 751 rows.

I have a standard SSIS package that transfers data from table A to table B. There are no fancy data manipulation. Yet when I trnasfer the data it doesn't end up in the database table in the same order?

Example the first entry in the table A is ID "4930". Yet in table B the first ID is "142701". Is there a reason it doesn't transfer in the correct order.

It's not the end of the world as all the data is transferring, it's just I don't understand the logic.

Thanks
s_osborne2
s_osborne2
SSCommitted
SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)

Group: General Forum Members
Points: 1795 Visits: 2292
Hi,

How are you identifying the "different order". There is no default "order by" in SQL Server. If you've got a clustered index on the table then the data is logically ordered by that within the table. Simply doing a select top x from blah with no explicit specification for the order by will select top x random rows.

Thanks,


Simon



MCSE: Data Platform
MCSE: Business Intelligence
Follow me on Twitter: @WazzTheBadger
LinkedIn Profile: Simon Osborne
Koen Verbeeck
Koen Verbeeck
SSC Guru
SSC Guru (63K reputation)SSC Guru (63K reputation)SSC Guru (63K reputation)SSC Guru (63K reputation)SSC Guru (63K reputation)SSC Guru (63K reputation)SSC Guru (63K reputation)SSC Guru (63K reputation)

Group: General Forum Members
Points: 63076 Visits: 13298
Order isn't important, that's something for the visualization layer :-)
The order of the rows can depend on indexes on source and/or destination table and the query written to retrieve the rows.


How to post forum questions.
Need an answer? No, you need a question.
What’s the deal with Excel & SSIS?
My blog at SQLKover.

MCSE Business Intelligence - Microsoft Data Platform MVP
TSQL Tryer
TSQL Tryer
SSCommitted
SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)

Group: General Forum Members
Points: 1821 Visits: 715
Thanks guys. It was more a reason I was after which you have both supplied. I won't concern myself with it, but will play about with some indexes in development.

Thanks again
wolfkillj
wolfkillj
SSCrazy
SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)

Group: General Forum Members
Points: 2720 Visits: 2582
Ryan Keast (7/18/2013)
Thanks guys. It was more a reason I was after which you have both supplied. I won't concern myself with it, but will play about with some indexes in development.

Thanks again


TL/DR: If it is important that the rows in a result set are in a certain order, the query MUST include an ORDER BY clause specifying that order, regardless of what indexes exist on the underlying tables.

Hi Ryan,

I'm concerned that maybe the previous answers to your question didn't flesh out the issue enough to keep you from wandering into problematic territory. The relational model defines a relation as an unordered set of tuples. In SQL terminology, a relation is represented as a table (and tuples as rows). The relation:table equivalency is not perfect, however, especially because a relational database has to store actual tables with actual rows instead of conceptual relations comprised of conceptual tuples, so it has to store the rows in some kind of order. A clustered index on the table will govern the logical order in which rows are stored, while tables with no clustered indexes, called heaps, will logically order rows in the order in which they were inserted.

*However*, clustered indexes ONLY provide a logical order for storing rows and a means to seek for rows when the searching on the clustered index key. The existence of a clustered index does NOT in any way guarantee the order in which rows will be retrieved or returned from the table. In SQL Server, the query optimizer generates an execution plan that will be guaranteed to return all rows that meet the search conditions (and only those rows), but the rows will be returned in the order they leave the last operator in the execution plan. Since some execution plans require sorts of intermediate result sets, the final order of the rows may or may not correspond to the order of rows in a clustered index.

Accordingly, if it is important that the rows in a result set are in a certain order, the query MUST include an ORDER BY clause specifying that order, regardless of what indexes exist on the underlying tables. That is the ONLY way to guarantee that rows in a result set are ordered in a specific manner. Some people are led astray by the fact that SQL Server often returns rows from a table in order of the clustered index key even in the absence of an ORDER BY clause, but it is merely coincidence that the rows are in that order when they leave the final operator in the execution plan. It can't be said too often - the only way to guarantee the order of rows in a result set is to include an ORDER BY clause in the query.

Here endeth the lesson. :-D

Jason

Jason Wolfkill
Blog: SQLSouth
Twitter: @SQLSouth
Phil Parkin
Phil Parkin
SSC Guru
SSC Guru (52K reputation)SSC Guru (52K reputation)SSC Guru (52K reputation)SSC Guru (52K reputation)SSC Guru (52K reputation)SSC Guru (52K reputation)SSC Guru (52K reputation)SSC Guru (52K reputation)

Group: General Forum Members
Points: 52093 Visits: 21173
I have some questions about your post wolfkillj.

heaps, will logically order rows in the order in which they were inserted


What do you mean by 'logically' in the above?

*However*, clustered indexes ONLY provide a logical order for storing rows


Are you sure? No physical order whatsoever?

...but it is merely coincidence that the rows are in that order when they leave the final operator in the execution plan


Not coincidence. Under some circumstances (often select * from [table]), rows are returned in the order defined by the clustered PK, just because the SQL query optimiser decided that that was the best way to do it. It's not guaranteed, but it's not coincidence either.

Note that I am not suggesting that ORDER BY should not be used, just picking up on the details in your message.


Help us to help you. For better, quicker and more-focused answers to your questions, consider following the advice in this link.

If the answer to your question can be found with a brief Google search, please perform the search yourself, rather than expecting one of the SSC members to do it for you.

Please surround any code or links you post with the appropriate IFCode formatting tags. It helps readability a lot.
wolfkillj
wolfkillj
SSCrazy
SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)

Group: General Forum Members
Points: 2720 Visits: 2582
Phil Parkin (7/22/2013)
I have some questions about your post wolfkillj.

heaps, will logically order rows in the order in which they were inserted


What do you mean by 'logically' in the above?

*However*, clustered indexes ONLY provide a logical order for storing rows


Are you sure? No physical order whatsoever?

...but it is merely coincidence that the rows are in that order when they leave the final operator in the execution plan


Not coincidence. Under some circumstances (often select * from [table]), rows are returned in the order defined by the clustered PK, just because the SQL query optimiser decided that that was the best way to do it. It's not guaranteed, but it's not coincidence either.

Note that I am not suggesting that ORDER BY should not be used, just picking up on the details in your message.


Hi Phil,

I probably oversimplified my answers you quote to avoid getting *too* technical, as I only wanted to provide a high-level view of why the "rows from a clustered index will always be returned in key order" myth is incorrect.

I did bungle the first bit about how rows are managed in heaps. There is no order, logical or otherwise, to rows in a heap, of course. What I should have said is that new rows will be inserted in the last allocated page of the heap and a "SELECT *" from a heap will return rows in page allocation order. As Paul Randal explains, this will not necessarily be the same order in which the rows were inserted, apparently because of the possibility that an updated row will be moved to a new page and a forwarding record left on the original page.

The reason that a clustered index does not guarantee that rows will be in a "physical" order is similarly esoteric. Each data page includes a "row offset array" that gives the starting location of each row in the page, with each row designated to a particular "slot". While the rows on a page are assigned to slots in key order, the actual locations of the data for those rows on the page may not be in key order. This blog post demonstrates how this works. So strictly speaking, the data for rows in a clustered index will not always be physically stored in key order, although the data can only be out of order within a page.

I used the word "coincidence" in an effort to drive home the point that a query may return rows in order of the clustered index key in the absence of an "ORDER BY" clause, not because the query syntax demands it, but because the optimizer chooses an execution plan that delivers the results in that order, as you point out. The optimizer is not constrained to return rows in order of the clustered index, it just happens to do so in this case. I suppose it may be more accurate to say that it's just a coincidence (from the user's point of view) that Microsoft designed the optimizer to behave like this. And, Microsoft makes no guarantees that it will maintain this behavior in future versions of SQL Server.

We seem to agree that it would be an abominable practice to rely on this behavior in any code where the order of results actually matters. I was just trying to emphasize to those who may be mistakenly expecting rows to always be returned in order of the clustered index key that they should not rely on this behavior. What responsible developer wants to write queries that will *probably* return the results in the desired/required order?

Thanks for your comments - I enjoy the discussion.

Jason

Jason Wolfkill
Blog: SQLSouth
Twitter: @SQLSouth
Phil Parkin
Phil Parkin
SSC Guru
SSC Guru (52K reputation)SSC Guru (52K reputation)SSC Guru (52K reputation)SSC Guru (52K reputation)SSC Guru (52K reputation)SSC Guru (52K reputation)SSC Guru (52K reputation)SSC Guru (52K reputation)

Group: General Forum Members
Points: 52093 Visits: 21173
That's more like it, thanks for taking the time to clarify Jason.

I understand that you were trying not to get too technical with your first answer & just wanted to make sure that we clarified a couple of the areas which I thought could have been misleading to those less-experienced SQL devs out there.


Help us to help you. For better, quicker and more-focused answers to your questions, consider following the advice in this link.

If the answer to your question can be found with a brief Google search, please perform the search yourself, rather than expecting one of the SSC members to do it for you.

Please surround any code or links you post with the appropriate IFCode formatting tags. It helps readability a lot.
GilaMonster
GilaMonster
SSC Guru
SSC Guru (226K reputation)SSC Guru (226K reputation)SSC Guru (226K reputation)SSC Guru (226K reputation)SSC Guru (226K reputation)SSC Guru (226K reputation)SSC Guru (226K reputation)SSC Guru (226K reputation)

Group: General Forum Members
Points: 226673 Visits: 46333
wolfkillj (7/22/2013)
So strictly speaking, the data for rows in a clustered index will not always be physically stored in key order, although the data can only be out of order within a page.


The pages can be out of physical order too, it's called fragmentation (logical order of pages doesn't match physical order of pages)

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass


wolfkillj
wolfkillj
SSCrazy
SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)

Group: General Forum Members
Points: 2720 Visits: 2582
GilaMonster (7/22/2013)
wolfkillj (7/22/2013)
So strictly speaking, the data for rows in a clustered index will not always be physically stored in key order, although the data can only be out of order within a page.


The pages can be out of physical order too, it's called fragmentation (logical order of pages doesn't match physical order of pages)


Indeed. The leaf node pages of a clustered index are linked in a doubly-linked list according to the clustering key, but the physical location of the pages allocated to the index may not follow that order.

Would it be fair to say that this discussion yields at least two general principles?

1. The organization of rows in pages, leaf node pages in doubly-linked lists, etc., and the allocation and management of pages for such purposes are functions of SQL Server's storage engine.

2. A developer should write queries that return the desired results using the documented features of the T-SQL language and SQL Server's relational engine, not counting on the workings of the storage engine to affect the results in any way.

Jason Wolfkill
Blog: SQLSouth
Twitter: @SQLSouth
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search