order of rows during bulk insert

  • I should have said "If there is a clustered index on the table it will return the data in the order of the clustered index"

    That's not always true however. Just because a table has a clustered index it does not mean a query without ORDER BY will return rows in that clustered index order - even if there are no other indexes involved. If SQL Server chooses to do an Advanced Scan or Unordered scan then rows may not be returned in the clustered index order.

  • Not on a simple select *

    Unless possibly there is a problem with the statistics on the clustered index.

    Steve Jimmo
    Sr DBA
    “If we ever forget that we are One Nation Under God, then we will be a Nation gone under." - Ronald Reagan

  • There have been countless examples on these forums of cases where OPs expected data to be returned in clustered index order when no 'order by' was specified. It's not guaranteed to happen. Most of the time it does, but you can't EVER rely on it. The optimizer is free to return rows in any oder that miximizes query efficiency when no 'order by' is specified. A new version of SQL Server can break current behavior at any time. If you want something in a particular order, the only way to guarantee it is to use 'order by'. So in the OPs case here, he should add the data he wishes to order by to the rows before he loads them into the server. Updating them once they're in the server won't guarantee he'll get what he expects.


    And then again, I might be wrong ...
    David Webb

  • sjimmo (6/25/2010)


    Not on a simple select *

    Are you saying that the IAM page chain will not be used to do an unordered page traversal if you specify a "simple select *" but it will be if you specify some subset of columns? I never heard that before and it sounds a bit unlikely. Perhaps it is true but I don't think that behaviour is documented. Or have you seen it documented somewhere?

    What is for sure is that even for your SELECT * query the clustered index scan will be identified as unordered in the execution plan ("unordered = True"), therefore in principle I understand the "official" line to be that SQL Server is free to perform an unordered scan. Even if it does not do so today, it could do if there is an engine change in some hotfix or service pack. Therefore to say that the scan will be ordered without giving any warning or qualification is perhaps a bit reckless, especially if we are talking about undocumented behaviour. I just think it's worth qualifying your claim a bit more so that anyone reading this can make up his own mind on whether he wants to take that risk.

  • I may be a bit slow on this, and am not going to be as adament on it as Dave, who has been having this very same arguments for a few years now with many others but:

    Doing a simple select * from <tablename> in which the table has no indexes of any kind will perform a table scan, and return the results in the order they are found on the various pages based upon the where clause unless there is a sorting clause to the sql command. Even then, it will perform a table scan, and do the sorting after it has recieved the result set.

    When there is a clustered index, then the select will do a index scan on the clustered index and return the records in the order that it finds them based upon the clustered index.

    I agree that the way that the engine works can be changed at any time by MS in a SP or a new release, but now you are really stretching into the realm of fantasy. The question had to do with a specific version, and action based upon the current systems and how they work today not in the distant future.

    Dave, I guess we will have to agree to disagree on this, as has happened to you many times in the past. (BTW, you can see everything I have demonstrated and said by running showplans against each query.)

    Steve Jimmo
    Sr DBA
    “If we ever forget that we are One Nation Under God, then we will be a Nation gone under." - Ronald Reagan

  • When there is a clustered index, then the select will do a index scan on the clustered index and return the records in the order that it finds them based upon the clustered index.

    Agreed. But a clustered index scan can be unordered. If you look at the execution plan you will see that in the plan. If the server chooses to do an allocation order scan of the clustered index then the data may not be returned in the index key order.

    Don't believe me. Here's Conor Cunningham's blog on the same topic. Conor is a lead developer on the SQL engine team.

    http://blogs.msdn.com/b/conor_cunningham_msft/archive/2008/08/27/no-seatbelt-expecting-order-without-order-by.aspx

    I recommend you look up the chapters on indexing and execution plans in the Inside SQL Server series of books published by Microsoft which explain about unordered scans. Or just read about unordered scans at Microsoft.com. This topic has been written about extensively by Kalen Delaney, Itzik Ben Gan (who wrote the Inside SQL series) and others. Like me they have also spent a lot of time trying to correct myths about clustered indexes determining the order of a query.

  • sjimmo (6/25/2010)


    BTW, you can see everything I have demonstrated and said by running showplans against each query

    See attached.

  • I just recently had to load data in from a flat file, and had to ensure the order (well, I actually had to be able to get the first row with the column headers).

    Solution:

    BCP into a temp table with an identity column, using a format file to skip the identity column.

    Example:

    IF OBJECT_ID('tempdb..#temp') IS NOT NULL DROP TABLE #temp

    CREATE TABLE #temp (RowData varchar(8000), RowID INT IDENTITY PRIMARY KEY CLUSTERED)

    -- load the data file

    PRINT 'BULK INSERT'

    BULK INSERT #temp

    FROM 'c:\<mypath>\<MyCSVFile>.csv'

    WITH (FORMATFILE='C:\<mypath>\<MyXMLFormatFile.xml')

    --Format file:

    <?xml version="1.0"?>

    <BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    <RECORD>

    <FIELD ID="1" xsi:type="CharTerm" TERMINATOR="\r" MAX_LENGTH="8000"/>

    </RECORD>

    <ROW>

    <COLUMN SOURCE="1" NAME="RowData" xsi:type="SQLVARYCHAR"/>

    </ROW>

    </BCPFORMAT>

    Hope this helps

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Hello Wayne,

    Your solution is perfect !! Thank you very much !!

    You saved my day !

  • Hi Everybody

    I have some problem with the Bulk insert in Sql server 2008r2 i have 1 sample.txt file with 500000 records and all records are in Proper Sequence and insert into the database table like this.

    Create Table sample (serialno varchar(max))

    //----- Table Created--------//

    Bulk insert sample From 'C:\Deno Range\sample.txt' WITH(KEEPIDENTITY, FIELDTERMINATOR = ',')

    //----- Bulk Insert Complited ------//

    view all data of table

    Drop Table sample

    //----- Table Deleted -----//

    Do this process with same file more than 3 or 4 time Continuously

    My Files data is like that

    000001,56589587458585,20,03-18-2014

    to

    500000,56589587458585,20,03-18-2014

    in proper sequence

    its properly work 3 times but at 4 or 5th time when i use bulk insert command all data are successfully inserted into table but it lost the sequence.

    after 1136th row 201545th row added and break all the sequence

    Please Give me Proper Solution for that.

  • awez1411 (3/18/2014)


    its properly work 3 times but at 4 or 5th time when i use bulk insert command all data are successfully inserted into table but it lost the sequence.

    after 1136th row 201545th row added and break all the sequence

    Please Give me Proper Solution for that.

    Hi,

    What do you mean by "lost the sequence"? How are you determining what the sequence is? Tables have no inherent sequence and insertion order is not something you can determine just by looking at the data in a table. The way to order a table is using an ORDER BY clause in your SELECT statement.

    In your case it looks like you have an IDENTITY column and you are using the KEEPIDENTITY option to insert those values directly from the file. Are you saying the IDENTITY column values don't match the data in the file?

  • I had the same problem. I created the SQL table with an identity column and made it the primary key. Then create a view on the empty table. Then make the destination of the bulk insert the view. It worked for me. Hope this helps.

  • I found a method that guarantees the order is retained. No other method I found worked 100%.
    ---------------------------------
    Declare @X xml;
    ---------------------------------
    SELECT @X=Cast('<X>'+Replace([BulkColumn],Char(13)+Char(10),'</X><X>')+'</X>' as XML)
    FROM OPENROWSET (BULK N'\\FileServer\ImportFolder\ImportFile_20170120.csv',SINGLE_CLOB) T
    ---------------------------------
    SELECT [Record].[X].query('.').value('.','varchar(max)') [Record]
      ,ROW_NUMBER() OVER (ORDER BY (SELECT 100)) [ID]
    --Into #TEMP
    FROM @X.nodes('X') [Record](X);
    ---------------------------------

    This imports each row into a single column but adds an ID column incremented in the order each row was read.  A general BULK IMPORT does not retain record order when inserting into the target table.
    This was tested on Sql2008R2.

Viewing 13 posts - 31 through 42 (of 42 total)

You must be logged in to reply to this topic. Login to reply