order of rows during bulk insert

  • ... not to argue with Steve Jimmo's point. In answer to 'how do I make sure that the data in the table is ordered as it is loaded', Steve is 100% right. But to take it a step further, ordering the data in the table does not guarantee that it will always be returned in that order.

    So if you are relying on processing the data in a given order at a later date, you must have a 'order by' key. Which incidentally, Steves suggestion also solves nicely.

  • Thanks Steve and Nick,

    Yes, I understand to retrieve the data I am ordering it by the Id_Column. My concern was that because the database retrieves the data in what the database deems most efficient (unless order is specified) I thought it might be loaded into the table that way too.

    Again, Thank you all! It's been very helpful.

    Tim



    Kind Regards,
    Tim
    :cool:[font="Courier New"]The stone age did not end due to a lack of stones, nor will the information age end due to a lack of data.[/font]:cool:

  • Nick,

    What argument? 🙂

    Tim (and Nick)

    The data is stored in the order that it is loaded. I do not think that your issue is actually how it is stored.

    When you issue a select * from tablename on a table without a PK or clustered index, the data should be returned in the order that it appears in the table using the collation order. If there is a clustered index on the table it will return the data in the order of the PK. Lastly in the order of the order by clause when you use it.

    Now, the next issue will be to match the parent records to the child records. How are you planning to do that?

    Steve Jimmo
    Sr DBA
    “If we ever forget that we are One Nation Under God, then we will be a Nation gone under." - Ronald Reagan

  • In the end, the order the data is loaded into a table is usually irrelevant unless a clustered index already exists on the table and you want to take advantage of the fact that the source data is already in the same order as the clustered index... maybe thats what you're getting at? Lots more info here:

    http://msdn.microsoft.com/en-us/library/aa196737(v=SQL.80).aspx

    The only thing I need to add (and this may be also be irrelevant to the original question) is that you should never rely on rows returning in a specific order without an order by. The rows are returned in the order that the query plan decides to return them, and for now they do exactly as Steve says but who knows what magnificent enhancements will be done to SQL Server in future that might cause the query plan to be rearranged in some way. Adding an order by guarantees the last step of the query plan orders the data.

  • sjimmo (6/23/2010)


    Nick,

    When you issue a select * from tablename on a table without a PK or clustered index, the data should be returned in the order that it appears in the table using the collation order. If there is a clustered index on the table it will return the data in the order of the PK.

    Neither of those statements is true (not even sure what the "order that it appears in the table using the collation order" actually means). If you issue a SELECT statement without ORDER BY then the ordering of rows returned is undefined[/i] and could even be different when you execute the same SELECT again even if the data has not changed. You cannot guarantee to return rows in any specific order unless you use ORDER BY. This is true even when the table has a clustered index - a fact that's often very easy to prove for yourself if you try it out.

    There is no guarantee that an IDENTITY column will be assigned in a particular order for a bulk insert. If you create the IDENTITY column after the insert then there is no guarantee that the order of the IDENTITY values will match the order rows were inserted. In fact the row insertion order isn't preserved or recorded anywhere in the table or index structure.

  • David, I am sorry but I would need to see your references on this.

    When data is bcp'ed into a table, and there are no indexes of any kind, especialy clustered, the data is written to the database pages in the order that it is passed in. There is nothing there that will reorder the load. Even then, with the way that clustered indexes work, the data at the page level is still in the order that it was loaded. The addition of an identity field after the load does not change the order of the records either. It assigns a sequential number from row 1 to row x. These numbers do not jump arond. I do not support this to be the method used to keep relationships between parent/child tables. But then, that wasn't the question. When a clustered index is created on the table, the order returned of the select will be in that order. Especially if that field is in the select and there is not a definitive order by clause in the statement.

    Steve Jimmo
    Sr DBA
    “If we ever forget that we are One Nation Under God, then we will be a Nation gone under." - Ronald Reagan

  • The idea that SELECT without ORDER BY returns the clustered index order has been debunked many, many times at SQLServerCentral and elsewhere. Just Google for those references. It's not difficult to prove it for yourself. For example:

    CREATE TABLE tbl (x INT NOT NULL PRIMARY KEY CLUSTERED, z CHAR(36) NOT NULL UNIQUE);

    INSERT INTO tbl VALUES (1,'0BDB3E52-2937-46DE-90CA-C7FD44A2593C');

    INSERT INTO tbl VALUES (2,'E83561A8-6D58-4F5E-BF1E-1679135682D3');

    INSERT INTO tbl VALUES (3,'D1DC055E-CFA4-423D-8EF0-A22A6ADE6937');

    INSERT INTO tbl VALUES (4,'3AA8901A-2ED6-46A6-AB1A-9A6B4DD6860D');

    SELECT * FROM tbl;

    The result I get is:

    x z

    ----------- ------------------------------------

    1 0BDB3E52-2937-46DE-90CA-C7FD44A2593C

    4 3AA8901A-2ED6-46A6-AB1A-9A6B4DD6860D

    3 D1DC055E-CFA4-423D-8EF0-A22A6ADE6937

    2 E83561A8-6D58-4F5E-BF1E-1679135682D3

    (4 row(s) affected)

    Your mileage may vary because there is no guarantee without ORDER BY but I can post a screenshot if you really need convincing.

    When data is bcp'ed into a table, and there are no indexes of any kind, especialy clustered, the data is written to the database pages in the order that it is passed in.

    It isn't necessarily stored in that order however. Suppose the table is split across several different files for example. If you do it enough times then you will generally get an unexpected sorting of the data. I've certainly had that experience when bulk loading to a table with IDENTITY. Of course there may not be any Microsoft reference that contradicts you but then Microsoft never made any such claim in the first place. Maybe you are just repeating what has been your experience but even if you have tried it 100,000 times and always got the same result that doesn't prove that it will always be so for eveyone in every system.

  • David,

    My next question to you after looking at your results - what is your collation and your collation sort order? Your results returned in your y column alpha-numeric.

    And yes, my answer is based upon many years of experience as well as numerous clases not only from Microsoft but Sybase as well, which if you recall is where SQL Server is actually from.

    Not going to debate as this is not the place, and based upon your many postings in other places on this same topic using the same examples, neither of us are going to easily sway the other.

    Steve Jimmo
    Sr DBA
    “If we ever forget that we are One Nation Under God, then we will be a Nation gone under." - Ronald Reagan

  • My collation is Latin1_General_CI_AS. But if it was ordered by the clustered index then that would be irrelevant because the cluster key in my example is an INTEGER. The ordering in this example is not the same as the cluster key. Did you try it?

  • Dave,

    I just setup a test. Set my collation to yours. Executed your script, recieved the same results.

    Modified your script to what I had said:

    CREATE TABLE tbl (x INT NOT NULL, z CHAR(36) NOT NULL UNIQUE);

    INSERT INTO tbl VALUES (1,'0BDB3E52-2937-46DE-90CA-C7FD44A2593C');

    INSERT INTO tbl VALUES (2,'E83561A8-6D58-4F5E-BF1E-1679135682D3');

    INSERT INTO tbl VALUES (3,'D1DC055E-CFA4-423D-8EF0-A22A6ADE6937');

    INSERT INTO tbl VALUES (4,'3AA8901A-2ED6-46A6-AB1A-9A6B4DD6860D');

    SELECT * FROM tbl;

    Notice the change in the results:

    x z

    ----------- ------------------------------------

    1 0BDB3E52-2937-46DE-90CA-C7FD44A2593C

    2 E83561A8-6D58-4F5E-BF1E-1679135682D3

    3 D1DC055E-CFA4-423D-8EF0-A22A6ADE6937

    4 3AA8901A-2ED6-46A6-AB1A-9A6B4DD6860D

    Then modified the table to add an identity field using the table as it already exists:

    x z id

    ----------- ------------------------------------ -----------

    1 0BDB3E52-2937-46DE-90CA-C7FD44A2593C 1

    2 E83561A8-6D58-4F5E-BF1E-1679135682D3 4

    3 D1DC055E-CFA4-423D-8EF0-A22A6ADE6937 3

    4 3AA8901A-2ED6-46A6-AB1A-9A6B4DD6860D 2

    The identity field (id) placed the unique id on records based upon the collation of the Z field. Why it chose that field s an unknown to me.

    Also note that the rows are returned in the order once again that they were inserted. There is no PK at this time.

    No we make the id field a PK:

    x z id

    ----------- ------------------------------------ -----------

    1 0BDB3E52-2937-46DE-90CA-C7FD44A2593C 1

    4 3AA8901A-2ED6-46A6-AB1A-9A6B4DD6860D 2

    3 D1DC055E-CFA4-423D-8EF0-A22A6ADE6937 3

    2 E83561A8-6D58-4F5E-BF1E-1679135682D3 4

    Note the rows are returned in the order of the PK which is also clustered.

    So again I ask what are your references. I amy be wrong but using your example, I think I have proven what I had been saying.

    Steve Jimmo
    Sr DBA
    “If we ever forget that we are One Nation Under God, then we will be a Nation gone under." - Ronald Reagan

  • one more example:

    Removing the PK from the id field, and placing it on your x field returned the following:

    x z id

    ----------- ------------------------------------ -----------

    1 0BDB3E52-2937-46DE-90CA-C7FD44A2593C 1

    2 E83561A8-6D58-4F5E-BF1E-1679135682D3 4

    3 D1DC055E-CFA4-423D-8EF0-A22A6ADE6937 3

    4 3AA8901A-2ED6-46A6-AB1A-9A6B4DD6860D 2

    which again shows that a select * is returning the results on the PK. In removing the PK and just creating a clusteres index on the x field returned the same results as above.

    Having a nonclustered index on the x field returned the results once again as they were inserted.

    Steve Jimmo
    Sr DBA
    “If we ever forget that we are One Nation Under God, then we will be a Nation gone under." - Ronald Reagan

  • But I've already shown you an example where the order of the SELECT statement without ORDER BY does not match the clustered index order and you say that you get the same result. So what's the point of talking about further examples?

    If you make column x into an IDENTITY it will make no difference. The order returned from the query will still not match the clustered index order.

    All I wanted to demonstrate was that your claim that SELECT without ORDER BY will return rows in the clustered or PK order is not true as a general rule.

  • The original comment was about records being loaded using bcp into a table with no PK or other indexing, and adding it after. (This was my original comment to Oracle-91.)

    Your example does not show that. If you are going to show what I say as wrong, at least try to do it the way it is described.

    Should you remove your key and do a select, what is the output.

    Steve Jimmo
    Sr DBA
    “If we ever forget that we are One Nation Under God, then we will be a Nation gone under." - Ronald Reagan

  • sjimmo (6/25/2010)


    The original comment was about records being loaded using bcp into a table with no PK or other indexing, and adding it after. (This was my original comment to Oracle-91.)

    That's not what you said though. Here is what you said:

    "If there is a clustered index on the table it will return the data in the order of the PK."

    I posted the example I did in order to show that what you said was not generally true. Even if you don't have any other index on the table (other than a clustered PK) then your statement is still not true because a clustered index scan can be an unordered scan (allocation order scan). Or it could be subject to a merry-go-round scan, which is also unordered. So query order will not always match the order of the PK if there is a clustered index on the table and you don't use ORDER BY.

    Should you remove your key and do a select, what is the output.

    Then it happens to be returned in insertion order - at least for this one example. But that's not what I was disagreeing with. However, if you change the UNIQUE constraint as follows then even without the clustered index you will not necessarily see the data returned in insertion order if you don't use ORDER BY - it will be in the nonclustered index order instead.

    CREATE TABLE tbl (x INT NOT NULL, z CHAR(36) NOT NULL UNIQUE (z,x));

    Not necessarily relevant to what you said before but just another interesting example.

    I hope you now understand the point I was making and hopefully all of these examples have helped the OP understand as well.

  • That's not what you said though. Here is what you said:

    "If there is a clustered index on the table it will return the data in the order of the PK."

    I made the mistake of saying clustered index and PK interchangably, which I should not have done.

    I should have said "If there is a clustered index on the table it will return the data in the order of the clustered index"

    Steve Jimmo
    Sr DBA
    “If we ever forget that we are One Nation Under God, then we will be a Nation gone under." - Ronald Reagan

Viewing 15 posts - 16 through 30 (of 42 total)

You must be logged in to reply to this topic. Login to reply