OpenRowset for CSV File

Question

OpenRowset for CSV File

joshua 15769

Default port

Points: 1492
More actions
November 23, 2016 at 1:29 am

#323293

So earlier, I am on the verge of creating a code that would import a CSV file that is machine generated, to our database. I created one in Excel, and I used this code
select *
from openrowset('MSDASQL'
,'Driver={Microsoft Access Text Driver (*.txt, *.csv)}'
,'select * from D:\Test.CSV')
And it worked fine.
But when I am dealing with the actual data. The above code does not work.
So the CSV file contains a leading 18 row data which information could be dropped off (its just the name of the machine), the needed data is at the 19th row.
After searching, I found out a piece of code, and I tried it to the CSV file, which is
SELECT *
FROM OPENROWSET(BULK 'D:\Data\sample\device1_2016-08-03_15-24-58.csv',
FORMATFILE='D:\Data\sample\BCPFormat.xml',
FIRSTROW = 19) AS a
but the data is blank!
is there any other way to get the data of a CSV file?

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply

Chris Harshman SSC-Forever Points: 42192 More actions · Answer 1

If you already have a table you're trying to load data into, I've often found BULK INSERT to be easier to work with than OPENROWSET. Have you tried something like one of these:

BULK INSERT TestCSVImport FROM 'D:\Data\sample\device1_2016-08-03_15-24-58.csv'

WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '', FirstRow=19)

BULK INSERT TestCSVImport FROM 'D:\Data\sample\device1_2016-08-03_15-24-58.csv'

WITH (FORMATFILE = 'D:\Data\sample\BCPFormat.xml', FirstRow=19)

https://msdn.microsoft.com/en-us/library/ms188365(v=sql.110).aspx

joshua 15769 Default port Points: 1492 More actions · Answer 2

I tried using your code sir, first, i uploaded the data as temp table in my database (into #TestCSVImport). I also renamed the column names of the CSV file to know if the columns are to blame.

To Iterate

I used this code to create a blank table

SELECT *

into #TestCSVImport

FROM OPENROWSET(BULK 'D:\Data\sample\device1_2016-08-03_15-24-58.csv',

FORMATFILE='D:\Data\sample\BCPFormat.xml',

FIRSTROW = 20) AS a

Then I used this code to insert the values

BULK INSERT #TestCSVImport FROM 'D:\Data\sample\device1_2016-08-03_15-24-58.csv'

WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '', FirstRow=20)

BULK INSERT #TestCSVImport FROM 'D:\Data\sample\device1_2016-08-03_15-24-58.csv'

WITH (FORMATFILE = 'D:\Data\sample\BCPFormat.xml', FirstRow=20)

the query returned 0 rows.

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 3

joshua 15769 (11/23/2016)
I tried using your code sir, first, i uploaded the data as temp table in my database (into #TestCSVImport). I also renamed the column names of the CSV file to know if the columns are to blame.
To Iterate
I used this code to create a blank table
SELECT *
into #TestCSVImport
FROM OPENROWSET(BULK 'D:\Data\sample\device1_2016-08-03_15-24-58.csv',
FORMATFILE='D:\Data\sample\BCPFormat.xml',
FIRSTROW = 20) AS a
Then I used this code to insert the values
BULK INSERT #TestCSVImport FROM 'D:\Data\sample\device1_2016-08-03_15-24-58.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '', FirstRow=20)
BULK INSERT #TestCSVImport FROM 'D:\Data\sample\device1_2016-08-03_15-24-58.csv'
WITH (FORMATFILE = 'D:\Data\sample\BCPFormat.xml', FirstRow=20)
the query returned 0 rows.

Did you actually create a format file?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

joshua 15769 Default port Points: 1492 More actions · Answer 4

Yes Sir, I created a XML file, here is the screenshot.

Do take note i change my terminator from '","' to ','

and here sir is the textfile

I would like to ask also that, is it possible for TSQL to retrieve data FROM a CSV 1st and 3rd row?

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 5

joshua 15769 (11/24/2016)
Yes Sir, I created a XML file, here is the screenshot.
Do take note i change my terminator from '","' to ','
and here sir is the textfile
I would like to ask also that, is it possible for TSQL to retrieve data FROM a CSV 1st and 3rd row?

As you know, you posted the example CSV file as a graphic. Any chance of you attaching that same example as a text file so I can try to duplicate your situation? Same goes for the format file. Attaching it as a text file so I can use it to test with would be real handy. Thanks.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

joshua 15769 Default port Points: 1492 More actions · Answer 6

Hi Sir Jeff!

Thank you for your time!

Sir, I have uploaded the data files which you wanted. Thanks again!

Going back to the last thread that we have discussed sir, IF ever we found a solution to this problem sir, what if the data file, the CSV file, adds a column? then I should add the column to the XML file right? then I saw one of your post strongly suggesting the use of BULK INSERT, could it, dynamically (through TSQL), add a column?

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 7

joshua 15769 (11/25/2016)
Hi Sir Jeff!
Thank you for your time!
Sir, I have uploaded the data files which you wanted. Thanks again!
Going back to the last thread that we have discussed sir, IF ever we found a solution to this problem sir, what if the data file, the CSV file, adds a column? then I should add the column to the XML file right? then I saw one of your post strongly suggesting the use of BULK INSERT, could it, dynamically (through TSQL), add a column?

You posted in a 2008 forum and so I assume that's what you're using.

One of the MAJOR problems with BCP and BULK INSERT prior to 2012 is that it didn't count row terminators to count the number of rows to skip. According to your data, the first row should be row 21. In actuality, it's row 7 because the version of BULK INSERT for 2008 still counted column delimiters to determine if it should even look for a row terminator.

Also, you don't need a BCP Format file for this bad boy. Just a properly created staging table. I saved your device file to my C:\Temp\ folder and then ran the following code (I don't trust most defaults which is why so many "WITH" options). With that in mind, don't forget to change the FROM clause of the BULK INSERT to wherever your file actually is. Of course, you eventually need to change the BULK INSERT to dynamic SQL to make it so you can pass the full file path.

--===== This is the table that we'll import to.

CREATE TABLE dbo.DeviceStaging

(

[NO.] INT

,[Time] DATETIME

,ms SMALLINT

,degC1 DECIMAL(9,3)

,degC2 DECIMAL(9,3)

,degC3 DECIMAL(9,3)

,degC4 DECIMAL(9,3)

,degC5 DECIMAL(9,3)

,degC6 DECIMAL(9,3)

,A12345678901 VARCHAR(100)

,A12345678902 VARCHAR(100)

,A12345678903 VARCHAR(100)

,A12345678904 VARCHAR(100)

,A1234 VARCHAR(100)

)

;

--====== This just makes sure the staging table is empty for reruns

TRUNCATE TABLE dbo.DeviceStaging

;

--===== This does the import

BULK INSERT dbo.DeviceStaging

FROM 'C:\Temp\device1_2016-08-03_15-24-58.txt'

WITH (

BATCHSIZE = 2000000000

,CODEPAGE = 'RAW'

,DATAFILETYPE = 'CHAR'

,FIELDTERMINATOR = ','

,ROWTERMINATOR = ''

,FIRSTROW = 7

,TABLOCK

)

;

--===== Let's see if it worked (it does)

SELECT * FROM dbo.DeviceStaging

;

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 8

p.s.

The code I posted won't work without error when you do upgrade to 2012 or greater because Bulk Insert will count the row terminators instead of the field delimiters to figure out where the end of a row is and because it's setup to start seemingly at row 7 instead of 21. It won't, however, fail silently. It'll skip the first 6 rows and then start producing errors because the target table is expecting INT values for the first column. Since the default number of errors tolerated by BULK INSERT is 10, it'll generate 10 errors (making it to row 17) and then fail. All that will need to be done is to change FIRSTROW to 21 and you're done.

It COULD be made to be totally self healing even with the addition or reduction of actual columns in the data. In the process, it could also be made to include data from the 20 row header and a bunch of other things. One of the ways to do that is to BULK INSERT the file as single line blobs into a working table and programmatically scan through to find and interrogate the location (row number) of the header. Then re-export the file without the rows prior to the header and re-import to the staging table. In that first import, you could also interrogate the column names of the header and dynamically build the staging table to match the file. If using BCP and xp_CmdShell bugs you (it's NOT a security risk if done properly, though), then someone could write a preprocessor to remove the "super header" prior to import attempts.

Another way would be to simply import everything as VARCHAR and then delete rows that don't pass muster. The problem with that is that there's no guarantee that errors won't be generated due to the varying number of "fields" per row in the "super header". The more complex methods I outlined above would work correctly IF there's some stability in the data as to what the header is actually going to look like or which row the header will always appear on. If it all sounds too complicated, that would be because it's a bit difficult to build forms of artificial intelligence even into managed code, never mind T-SQL. I can't say for sure but even a product like Hadoop may not help any more than the methods I proposed and have the side affect of adding to the proverbial "Tower of Babel" which appears in most folks ETL solutions.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

joshua 15769 Default port Points: 1492 More actions · Answer 9

If I could only know how to check the number of rows in this kind of CSV, that would be great! but anyway, thank you for the insight sir! This would be my foundation to the system which I would be creating, thank you so much!

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 10

joshua 15769 (11/27/2016)
If I could only know how to check the number of rows in this kind of CSV, that would be great!

It's easy. Don't set FIRSTROW and let it fail. It'll tell you how many rows it thinks there are. It said it failed on the first 6 rows when I first ran the code. 🙂

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

joshua 15769 Default port Points: 1492 More actions · Answer 11

Hi Sir Jeff Moden,

Good Day.

Sir, If you dont mid asnwering again, I searched in the internet but to no avail. My current problem is, the data which I gave you, It has its tendency to grow (add columns). But the added column would not appear in the last column; It appears between columns.

For example

Column A |Column B|

This is the original Column.

Column A |Column C| Column B

This is the column that changed.

My question is, using BULK INSERT, I'm having a problem getting the data from the CSV file.

When the columns of the staging table and the CSV are not equal, the data wont upload. Even in a temp table.

is there a way for me to somehow copy the data from the CSV file from the 7th row AS IS, bypassing all rows and errors that would be encountered, using BULK INSERT?

If not, could I use OPENROWSET to select the contents of the CSV file, load it to the database, without a format file but I would have to use FIRSTROW command?

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 12

joshua 15769 (12/1/2016)
Hi Sir Jeff Moden,
Good Day.
Sir, If you dont mid asnwering again, I searched in the internet but to no avail. My current problem is, the data which I gave you, It has its tendency to grow (add columns). But the added column would not appear in the last column; It appears between columns.
For example
Column A |Column B|
This is the original Column.
Column A |Column C| Column B
This is the column that changed.
My question is, using BULK INSERT, I'm having a problem getting the data from the CSV file.
[font="Arial Black"]When the columns of the staging table and the CSV are not equal, the data wont upload. [/font]Even in a temp table.
is there a way for me to somehow copy the data from the CSV file from the 7th row AS IS, bypassing all rows and errors that would be encountered, using BULK INSERT?
If not, could I use OPENROWSET to select the contents of the CSV file, load it to the database, without a format file but I would have to use FIRSTROW command?

You've correctly identified the problem (I emboldened it above). What's so hard about simply changing the definition of the staging table and letting it rip?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

joshua 15769 Default port Points: 1492 More actions · Answer 13

Currently I'm looking at solutions on how to fix the error when the CSV file and our database are not equal in terms of column count. But I might need your help sir regarding the BULK INSERT command. In my posted attachment, the row count of the CSV file is at 400+, but when I tried inserting a 1 column data, SQL gives out a error. Could I use something like MAXERROR = 100? or Is there any command that would "bypass" the errors done which using BULK INSERT? would still be researching this though throughout this day, I thank you for all the support sir Jeff.

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 14

joshua 15769 (12/1/2016)
Currently I'm looking at solutions on how to fix the error when the CSV file and our database are not equal in terms of column count. But I might need your help sir regarding the BULK INSERT command. In my posted attachment, the row count of the CSV file is at 400+, but when I tried inserting a 1 column data, SQL gives out a error. Could I use something like MAXERROR = 100? or Is there any command that would "bypass" the errors done which using BULK INSERT? would still be researching this though throughout this day, I thank you for all the support sir Jeff.

To answer your question, yes. To help you answer such questions without having to rely on someone like me, please do the following Google search...

https://www.google.com/?gws_rd=ssl#q=bulk+insert

The first link that shows up for me in that search is the one that contains the answer to the question you just asked...

https://msdn.microsoft.com/en-us/library/ms188365.aspx

To answer your next question... no I'm not trying to shrug you off and I'm not getting annoyed with your questions. You seem to have the "right stuff" and I'm trying to help you become more self sufficient.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)