Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12»»

Data Transfer - data not transfered in order Expand / Collapse
Author
Message
Posted Thursday, July 18, 2013 4:20 AM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Today @ 7:52 AM
Points: 228, Visits: 464
hi there,

I have a table with 371 751 rows.

I have a standard SSIS package that transfers data from table A to table B. There are no fancy data manipulation. Yet when I trnasfer the data it doesn't end up in the database table in the same order?

Example the first entry in the table A is ID "4930". Yet in table B the first ID is "142701". Is there a reason it doesn't transfer in the correct order.

It's not the end of the world as all the data is transferring, it's just I don't understand the logic.

Thanks
Post #1474960
Posted Thursday, July 18, 2013 4:26 AM
Say Hey Kid

Say Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey Kid

Group: General Forum Members
Last Login: Wednesday, November 19, 2014 5:22 AM
Points: 711, Visits: 2,209
Hi,

How are you identifying the "different order". There is no default "order by" in SQL Server. If you've got a clustered index on the table then the data is logically ordered by that within the table. Simply doing a select top x from blah with no explicit specification for the order by will select top x random rows.

Thanks,


Simon




MCSE: Data Platform
MCSE: Business Intelligence
Follow me on Twitter: @WazzTheBadger
LinkedIn Profile: Simon Osborne
Post #1474961
Posted Thursday, July 18, 2013 4:29 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 11:18 AM
Points: 13,545, Visits: 11,359
Order isn't important, that's something for the visualization layer
The order of the rows can depend on indexes on source and/or destination table and the query written to retrieve the rows.




How to post forum questions.
Need an answer? No, you need a question.
What’s the deal with Excel & SSIS?

Member of LinkedIn. My blog at LessThanDot.

MCSA SQL Server 2012 - MCSE Business Intelligence
Post #1474962
Posted Thursday, July 18, 2013 4:54 AM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Today @ 7:52 AM
Points: 228, Visits: 464
Thanks guys. It was more a reason I was after which you have both supplied. I won't concern myself with it, but will play about with some indexes in development.

Thanks again
Post #1474970
Posted Friday, July 19, 2013 10:06 AM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Tuesday, October 28, 2014 12:50 PM
Points: 1,061, Visits: 2,580
Ryan Keast (7/18/2013)
Thanks guys. It was more a reason I was after which you have both supplied. I won't concern myself with it, but will play about with some indexes in development.

Thanks again


TL/DR: If it is important that the rows in a result set are in a certain order, the query MUST include an ORDER BY clause specifying that order, regardless of what indexes exist on the underlying tables.

Hi Ryan,

I'm concerned that maybe the previous answers to your question didn't flesh out the issue enough to keep you from wandering into problematic territory. The relational model defines a relation as an unordered set of tuples. In SQL terminology, a relation is represented as a table (and tuples as rows). The relation:table equivalency is not perfect, however, especially because a relational database has to store actual tables with actual rows instead of conceptual relations comprised of conceptual tuples, so it has to store the rows in some kind of order. A clustered index on the table will govern the logical order in which rows are stored, while tables with no clustered indexes, called heaps, will logically order rows in the order in which they were inserted.

*However*, clustered indexes ONLY provide a logical order for storing rows and a means to seek for rows when the searching on the clustered index key. The existence of a clustered index does NOT in any way guarantee the order in which rows will be retrieved or returned from the table. In SQL Server, the query optimizer generates an execution plan that will be guaranteed to return all rows that meet the search conditions (and only those rows), but the rows will be returned in the order they leave the last operator in the execution plan. Since some execution plans require sorts of intermediate result sets, the final order of the rows may or may not correspond to the order of rows in a clustered index.

Accordingly, if it is important that the rows in a result set are in a certain order, the query MUST include an ORDER BY clause specifying that order, regardless of what indexes exist on the underlying tables. That is the ONLY way to guarantee that rows in a result set are ordered in a specific manner. Some people are led astray by the fact that SQL Server often returns rows from a table in order of the clustered index key even in the absence of an ORDER BY clause, but it is merely coincidence that the rows are in that order when they leave the final operator in the execution plan. It can't be said too often - the only way to guarantee the order of rows in a result set is to include an ORDER BY clause in the query.

Here endeth the lesson.

Jason


Jason Wolfkill
Blog: SQLSouth
Twitter: @SQLSouth
Post #1475574
Posted Monday, July 22, 2013 12:07 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 11:32 AM
Points: 5,257, Visits: 12,185
I have some questions about your post wolfkillj.

heaps, will logically order rows in the order in which they were inserted


What do you mean by 'logically' in the above?

*However*, clustered indexes ONLY provide a logical order for storing rows


Are you sure? No physical order whatsoever?

...but it is merely coincidence that the rows are in that order when they leave the final operator in the execution plan


Not coincidence. Under some circumstances (often select * from [table]), rows are returned in the order defined by the clustered PK, just because the SQL query optimiser decided that that was the best way to do it. It's not guaranteed, but it's not coincidence either.

Note that I am not suggesting that ORDER BY should not be used, just picking up on the details in your message.



Help us to help you. For better, quicker and more-focused answers to your questions, consider following the advice in this link.

When you ask a question (and please do ask a question: "My T-SQL does not work" just doesn't cut it), please provide enough information for us to understand its context.
Post #1475869
Posted Monday, July 22, 2013 10:16 AM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Tuesday, October 28, 2014 12:50 PM
Points: 1,061, Visits: 2,580
Phil Parkin (7/22/2013)
I have some questions about your post wolfkillj.

heaps, will logically order rows in the order in which they were inserted


What do you mean by 'logically' in the above?

*However*, clustered indexes ONLY provide a logical order for storing rows


Are you sure? No physical order whatsoever?

...but it is merely coincidence that the rows are in that order when they leave the final operator in the execution plan


Not coincidence. Under some circumstances (often select * from [table]), rows are returned in the order defined by the clustered PK, just because the SQL query optimiser decided that that was the best way to do it. It's not guaranteed, but it's not coincidence either.

Note that I am not suggesting that ORDER BY should not be used, just picking up on the details in your message.


Hi Phil,

I probably oversimplified my answers you quote to avoid getting *too* technical, as I only wanted to provide a high-level view of why the "rows from a clustered index will always be returned in key order" myth is incorrect.

I did bungle the first bit about how rows are managed in heaps. There is no order, logical or otherwise, to rows in a heap, of course. What I should have said is that new rows will be inserted in the last allocated page of the heap and a "SELECT *" from a heap will return rows in page allocation order. As Paul Randal explains, this will not necessarily be the same order in which the rows were inserted, apparently because of the possibility that an updated row will be moved to a new page and a forwarding record left on the original page.

The reason that a clustered index does not guarantee that rows will be in a "physical" order is similarly esoteric. Each data page includes a "row offset array" that gives the starting location of each row in the page, with each row designated to a particular "slot". While the rows on a page are assigned to slots in key order, the actual locations of the data for those rows on the page may not be in key order. This blog post demonstrates how this works. So strictly speaking, the data for rows in a clustered index will not always be physically stored in key order, although the data can only be out of order within a page.

I used the word "coincidence" in an effort to drive home the point that a query may return rows in order of the clustered index key in the absence of an "ORDER BY" clause, not because the query syntax demands it, but because the optimizer chooses an execution plan that delivers the results in that order, as you point out. The optimizer is not constrained to return rows in order of the clustered index, it just happens to do so in this case. I suppose it may be more accurate to say that it's just a coincidence (from the user's point of view) that Microsoft designed the optimizer to behave like this. And, Microsoft makes no guarantees that it will maintain this behavior in future versions of SQL Server.

We seem to agree that it would be an abominable practice to rely on this behavior in any code where the order of results actually matters. I was just trying to emphasize to those who may be mistakenly expecting rows to always be returned in order of the clustered index key that they should not rely on this behavior. What responsible developer wants to write queries that will *probably* return the results in the desired/required order?

Thanks for your comments - I enjoy the discussion.

Jason


Jason Wolfkill
Blog: SQLSouth
Twitter: @SQLSouth
Post #1476117
Posted Monday, July 22, 2013 10:25 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 11:32 AM
Points: 5,257, Visits: 12,185
That's more like it, thanks for taking the time to clarify Jason.

I understand that you were trying not to get too technical with your first answer & just wanted to make sure that we clarified a couple of the areas which I thought could have been misleading to those less-experienced SQL devs out there.



Help us to help you. For better, quicker and more-focused answers to your questions, consider following the advice in this link.

When you ask a question (and please do ask a question: "My T-SQL does not work" just doesn't cut it), please provide enough information for us to understand its context.
Post #1476123
Posted Monday, July 22, 2013 10:41 AM


SSC-Forever

SSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-Forever

Group: General Forum Members
Last Login: Today @ 1:13 PM
Points: 40,428, Visits: 36,879
wolfkillj (7/22/2013)
So strictly speaking, the data for rows in a clustered index will not always be physically stored in key order, although the data can only be out of order within a page.


The pages can be out of physical order too, it's called fragmentation (logical order of pages doesn't match physical order of pages)



Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Post #1476138
Posted Monday, July 22, 2013 11:37 AM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Tuesday, October 28, 2014 12:50 PM
Points: 1,061, Visits: 2,580
GilaMonster (7/22/2013)
wolfkillj (7/22/2013)
So strictly speaking, the data for rows in a clustered index will not always be physically stored in key order, although the data can only be out of order within a page.


The pages can be out of physical order too, it's called fragmentation (logical order of pages doesn't match physical order of pages)


Indeed. The leaf node pages of a clustered index are linked in a doubly-linked list according to the clustering key, but the physical location of the pages allocated to the index may not follow that order.

Would it be fair to say that this discussion yields at least two general principles?

1. The organization of rows in pages, leaf node pages in doubly-linked lists, etc., and the allocation and management of pages for such purposes are functions of SQL Server's storage engine.

2. A developer should write queries that return the desired results using the documented features of the T-SQL language and SQL Server's relational engine, not counting on the workings of the storage engine to affect the results in any way.


Jason Wolfkill
Blog: SQLSouth
Twitter: @SQLSouth
Post #1476167
« Prev Topic | Next Topic »

Add to briefcase 12»»

Permissions Expand / Collapse