June 16, 2011 at 3:35 pm
I have a flat file that I importing into my database but the rows are not coming into the table in the same order as in the flat file. I know that the data is not coming into the database in order, due records that should be later on in the file are showing up earlier and in the middle of another section of data where it doesn’t belong.
The flat file is created in a specific order and with only column of data. Therefore I can't order by the column
How can I get the bulk insert task or command to keep the order of the flat file?
June 16, 2011 at 3:45 pm
You really can't. SQL table rows don't have an "order" unless you can use data to assign one.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
June 16, 2011 at 6:15 pm
newbie.with.sql (6/16/2011)
I have a flat file that I importing into my database but the rows are not coming into the table in the same order as in the flat file. I know that the data is not coming into the database in order, due records that should be later on in the file are showing up earlier and in the middle of another section of data where it doesn’t belong.The flat file is created in a specific order and with only column of data. Therefore I can't order by the column
How can I get the bulk insert task or command to keep the order of the flat file?
Step 1:
> Make sure you have an IDENTITY column on your load table.
Step 2:
> Option 1: In theory you could try setting the threads to 1 and setting the batch commit size to 1. I have not tried it but that may guarantee an ordered load. Having a batch commit size of 1 will be slow...slow like issuing a ton of sequential batches each with on INSERT statement, but it may get the job done. How big is the file?
> Option 2: The bcp.exe utility could work for you if Option 1 fails. bcp reads/loads files sequentially and TMK is single threaded so the same theory above would apply. If you set the batch commit size to 1 (-b option) you should get an ordered load.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
June 16, 2011 at 7:16 pm
I'll try with the IDENTITY column, since I am using Option # 2, to see if it helps... Thanks,
June 17, 2011 at 6:31 am
opc.three (6/16/2011)
newbie.with.sql (6/16/2011)
I have a flat file that I importing into my database but the rows are not coming into the table in the same order as in the flat file. I know that the data is not coming into the database in order, due records that should be later on in the file are showing up earlier and in the middle of another section of data where it doesn’t belong.The flat file is created in a specific order and with only column of data. Therefore I can't order by the column
How can I get the bulk insert task or command to keep the order of the flat file?
Step 1:
> Make sure you have an IDENTITY column on your load table.
Step 2:
> Option 1: In theory you could try setting the threads to 1 and setting the batch commit size to 1. I have not tried it but that may guarantee an ordered load. Having a batch commit size of 1 will be slow...slow like issuing a ton of sequential batches each with on INSERT statement, but it may get the job done. How big is the file?
> Option 2: The bcp.exe utility could work for you if Option 1 fails. bcp reads/loads files sequentially and TMK is single threaded so the same theory above would apply. If you set the batch commit size to 1 (-b option) you should get an ordered load.
Unless you add a column that states the order of the rows, in the source file, you really can't count on these working either.
SQL Server doesn't store "the rows are in this order" data unless you tell it to, and tell it what the order is, and define the order either based on the data, or by manually assigning it to each row.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
June 17, 2011 at 6:35 am
That's why the IDENTITY column is added in Step 1.
You're not going to "physical order" are you? I'm pretty sure all that's needed is logical order. Once the data is loaded this will work just fine:
SELECT *
FROM dbo.load_table
ORDER BY id -- id being the IDENTITY column
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
June 17, 2011 at 6:47 am
opc.three (6/17/2011)
That's why the IDENTITY column is added in Step 1.You're not going to "physical order" are you? I'm pretty sure all that's needed is logical order. Once the data is loaded this will work just fine:
SELECT *
FROM dbo.load_table
ORDER BY id -- id being the IDENTITY column
You would need to verify it through some means. All it takes is one other process running in the CPU, and sequentiality can be ruined. You might get what you want, or you might not, and you won't know unless you manually verify it. Like you said in your post, "should". That's not "will". You can't count on it.
If the order matters, and apparently it does, it either needs to be ordered in the source file, or manually ordered/verified in the destination table.
Honestly, ordering in the source file usually isn't that hard. Excel can do that for you in many cases. So can several text editors that can add line numbers to documents.
Outside of that, import through a VB file reader. Those are line-by-line by definition, using a For Next loop, and you can force ordering into the table that way. But don't use an ID column, use a counter in the loop. That makes it certain (adds sequencing in the source, essentially, by adding it inline during the pull).
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
June 17, 2011 at 6:52 am
GSquared (6/17/2011)
opc.three (6/17/2011)
That's why the IDENTITY column is added in Step 1.You're not going to "physical order" are you? I'm pretty sure all that's needed is logical order. Once the data is loaded this will work just fine:
SELECT *
FROM dbo.load_table
ORDER BY id -- id being the IDENTITY column
You would need to verify it through some means. All it takes is one other process running in the CPU, and sequentiality can be ruined. You might get what you want, or you might not, and you won't know unless you manually verify it. Like you said in your post, "should". That's not "will". You can't count on it.
If the order matters, and apparently it does, it either needs to be ordered in the source file, or manually ordered/verified in the destination table.
Honestly, ordering in the source file usually isn't that hard. Excel can do that for you in many cases. So can several text editors that can add line numbers to documents.
Outside of that, import through a VB file reader. Those are line-by-line by definition, using a For Next loop, and you can force ordering into the table that way. But don't use an ID column, use a counter in the loop. That makes it certain (adds sequencing in the source, essentially, by adding it inline during the pull).
I guess I don't see how it could not be ordered. With a single threaded file loader (bcp) reading the file one row at a time and a batch size of 1 (means commit is called every every row is pushed to the table) then ACID being what it is it would (there, I said it) be ordered in the table as it is in the file.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
June 17, 2011 at 4:22 pm
I tried using this line via a stored procedure ,the records were still mixed up..
bulk insert Data from 'C:\SQLData\Import\P11.dat' with (fieldterminator = '', rowterminator ='\r',rows_per_batch =1)
Is this incorrect? It took about the same amount of time to insert the records as it did without the rows_per_batch addition which seems strange.
June 17, 2011 at 4:25 pm
GSquared,
How do I manually assign it to each row?
Thanks
June 17, 2011 at 4:28 pm
The bcp.exe utility is what you want. bcp is much different than BULK INSERT.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
June 18, 2011 at 4:56 pm
I couldn't help myself...it's been a few years since I used BCP regularly so I had to put what I said to the test.
USE tempdb
GO
IF OBJECT_ID(N'dbo.ordered_load_view') > 0
DROP VIEW dbo.ordered_load_view ;
GO
IF OBJECT_ID(N'dbo.ordered_load_table') > 0
DROP TABLE dbo.ordered_load_table ;
GO
CREATE TABLE dbo.ordered_load_table (id INT IDENTITY(1,1) NOT NULL,
some_info VARCHAR(100) NOT NULL ) ;
GO
CREATE VIEW dbo.ordered_load_view
AS
SELECT some_info FROM dbo.ordered_load_table
GO
Step 2) Save the text file in the zip file attached to this post to C:\@\ and then run this bcp command from a Cmd prompt. Note: adjust the database server\instance as needed to populate the table via the view. The text file contains 1 million lines of text, each line having the line number on it so it is easy to compare with the IDENTITY column later.
bcp tempdb.dbo.ordered_load_view in C:\@\1.txt -b 1 -c -T -S server\instance
Step 3) Execute this T-SQL code to verify that the data was inserted in the order of the file:
USE tempdb
GO
SET NOCOUNT OFF ;
GO
RAISERROR('checking for rows loaded out of order',10,1) WITH NOWAIT ;
GO
SELECT *
FROM ordered_load_table
-- this checks if any data file row numbers did not match the id assigned by the DB
WHERE id != CAST(some_info AS INT) ;
GO
On my two-proc machine running SQL 2008 R2 Standard Edition 0 rows were returned from the verification check.* This tells us that the technique could work for you newbie.with.sql.
USE tempdb
GO
SET NOCOUNT ON;
go
IF OBJECT_ID(N'dbo.ordered_load_view') > 0
DROP VIEW dbo.ordered_load_view ;
GO
IF OBJECT_ID(N'dbo.ordered_load_table') > 0
DROP TABLE dbo.ordered_load_table ;
GO
CREATE TABLE dbo.ordered_load_table (id INT IDENTITY(1,1) NOT NULL,
some_info VARCHAR(10) NOT NULL ) ;
GO
CREATE VIEW dbo.ordered_load_view
AS
SELECT some_info FROM dbo.ordered_load_table
go
RAISERROR('starting BULK INSERT',10,1) WITH NOWAIT ;
GO
BULK INSERT tempdb.dbo.ordered_load_view FROM 'C:\@\1.txt'
WITH (ROWTERMINATOR = '',
BATCHSIZE = 1, -- Edit: this is the key to the technique. SET NOCOUNT OFF to see why
ROWS_PER_BATCH = 1,
MAXERRORS = 0) ;
GO
RAISERROR('BULK INSERT complete',10,1) WITH NOWAIT ;
GO
RAISERROR('checking for rows loaded out of order',10,1) WITH NOWAIT ;
GO
SET NOCOUNT OFF;
go
SELECT *
FROM ordered_load_table
-- this checks if any data file row numbers did not match the id assigned by the DB
WHERE id != CAST(some_info AS INT) ;
GO
I wasn't sure because I do not use BULK INSERT much but in fact 0 rows were returned from the verification check.* This means that the BULK INSERT could also work for you newbie.with.sql by using the BATCHSIZE option.
EDIT: re-attached zip file with test file...it was somehow lost along the way
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
Viewing 12 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply
This website stores cookies on your computer.
These cookies are used to improve your website experience and provide more personalized services to you, both on this website and through other media.
To find out more about the cookies we use, see our Privacy Policy