OpenRowset for CSV File

  • So earlier, I am on the verge of creating a code that would import a CSV file that is machine generated, to our database. I created one in Excel, and I used this code

    select *

    from openrowset('MSDASQL'

    ,'Driver={Microsoft Access Text Driver (*.txt, *.csv)}'

    ,'select * from D:\Test.CSV')

    And it worked fine.

    But when I am dealing with the actual data. The above code does not work.

    So the CSV file contains a leading 18 row data which information could be dropped off (its just the name of the machine), the needed data is at the 19th row.

    After searching, I found out a piece of code, and I tried it to the CSV file, which is

    SELECT *

    FROM OPENROWSET(BULK 'D:\Data\sample\device1_2016-08-03_15-24-58.csv',

    FORMATFILE='D:\Data\sample\BCPFormat.xml',

    FIRSTROW = 19) AS a

    but the data is blank!

    is there any other way to get the data of a CSV file?

  • If you already have a table you're trying to load data into, I've often found BULK INSERT to be easier to work with than OPENROWSET. Have you tried something like one of these:

    BULK INSERT TestCSVImport FROM 'D:\Data\sample\device1_2016-08-03_15-24-58.csv'

    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '', FirstRow=19)

    BULK INSERT TestCSVImport FROM 'D:\Data\sample\device1_2016-08-03_15-24-58.csv'

    WITH (FORMATFILE = 'D:\Data\sample\BCPFormat.xml', FirstRow=19)

    https://msdn.microsoft.com/en-us/library/ms188365(v=sql.110).aspx

  • I tried using your code sir, first, i uploaded the data as temp table in my database (into #TestCSVImport). I also renamed the column names of the CSV file to know if the columns are to blame.

    To Iterate

    I used this code to create a blank table

    SELECT *

    into #TestCSVImport

    FROM OPENROWSET(BULK 'D:\Data\sample\device1_2016-08-03_15-24-58.csv',

    FORMATFILE='D:\Data\sample\BCPFormat.xml',

    FIRSTROW = 20) AS a

    Then I used this code to insert the values

    BULK INSERT #TestCSVImport FROM 'D:\Data\sample\device1_2016-08-03_15-24-58.csv'

    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '', FirstRow=20)

    BULK INSERT #TestCSVImport FROM 'D:\Data\sample\device1_2016-08-03_15-24-58.csv'

    WITH (FORMATFILE = 'D:\Data\sample\BCPFormat.xml', FirstRow=20)

    the query returned 0 rows.

  • joshua 15769 (11/23/2016)


    I tried using your code sir, first, i uploaded the data as temp table in my database (into #TestCSVImport). I also renamed the column names of the CSV file to know if the columns are to blame.

    To Iterate

    I used this code to create a blank table

    SELECT *

    into #TestCSVImport

    FROM OPENROWSET(BULK 'D:\Data\sample\device1_2016-08-03_15-24-58.csv',

    FORMATFILE='D:\Data\sample\BCPFormat.xml',

    FIRSTROW = 20) AS a

    Then I used this code to insert the values

    BULK INSERT #TestCSVImport FROM 'D:\Data\sample\device1_2016-08-03_15-24-58.csv'

    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '', FirstRow=20)

    BULK INSERT #TestCSVImport FROM 'D:\Data\sample\device1_2016-08-03_15-24-58.csv'

    WITH (FORMATFILE = 'D:\Data\sample\BCPFormat.xml', FirstRow=20)

    the query returned 0 rows.

    Did you actually create a format file?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Yes Sir, I created a XML file, here is the screenshot.

    Do take note i change my terminator from '","' to ','

    and here sir is the textfile

    I would like to ask also that, is it possible for TSQL to retrieve data FROM a CSV 1st and 3rd row?

  • joshua 15769 (11/24/2016)


    Yes Sir, I created a XML file, here is the screenshot.

    Do take note i change my terminator from '","' to ','

    and here sir is the textfile

    I would like to ask also that, is it possible for TSQL to retrieve data FROM a CSV 1st and 3rd row?

    As you know, you posted the example CSV file as a graphic. Any chance of you attaching that same example as a text file so I can try to duplicate your situation? Same goes for the format file. Attaching it as a text file so I can use it to test with would be real handy. Thanks.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Sir Jeff!

    Thank you for your time!

    Sir, I have uploaded the data files which you wanted. Thanks again!

    Going back to the last thread that we have discussed sir, IF ever we found a solution to this problem sir, what if the data file, the CSV file, adds a column? then I should add the column to the XML file right? then I saw one of your post strongly suggesting the use of BULK INSERT, could it, dynamically (through TSQL), add a column?

  • joshua 15769 (11/25/2016)


    Hi Sir Jeff!

    Thank you for your time!

    Sir, I have uploaded the data files which you wanted. Thanks again!

    Going back to the last thread that we have discussed sir, IF ever we found a solution to this problem sir, what if the data file, the CSV file, adds a column? then I should add the column to the XML file right? then I saw one of your post strongly suggesting the use of BULK INSERT, could it, dynamically (through TSQL), add a column?

    You posted in a 2008 forum and so I assume that's what you're using.

    One of the MAJOR problems with BCP and BULK INSERT prior to 2012 is that it didn't count row terminators to count the number of rows to skip. According to your data, the first row should be row 21. In actuality, it's row 7 because the version of BULK INSERT for 2008 still counted column delimiters to determine if it should even look for a row terminator.

    Also, you don't need a BCP Format file for this bad boy. Just a properly created staging table. I saved your device file to my C:\Temp\ folder and then ran the following code (I don't trust most defaults which is why so many "WITH" options). With that in mind, don't forget to change the FROM clause of the BULK INSERT to wherever your file actually is. Of course, you eventually need to change the BULK INSERT to dynamic SQL to make it so you can pass the full file path.

    --===== This is the table that we'll import to.

    CREATE TABLE dbo.DeviceStaging

    (

    [NO.] INT

    ,[Time] DATETIME

    ,ms SMALLINT

    ,degC1 DECIMAL(9,3)

    ,degC2 DECIMAL(9,3)

    ,degC3 DECIMAL(9,3)

    ,degC4 DECIMAL(9,3)

    ,degC5 DECIMAL(9,3)

    ,degC6 DECIMAL(9,3)

    ,A12345678901 VARCHAR(100)

    ,A12345678902 VARCHAR(100)

    ,A12345678903 VARCHAR(100)

    ,A12345678904 VARCHAR(100)

    ,A1234 VARCHAR(100)

    )

    ;

    --====== This just makes sure the staging table is empty for reruns

    TRUNCATE TABLE dbo.DeviceStaging

    ;

    --===== This does the import

    BULK INSERT dbo.DeviceStaging

    FROM 'C:\Temp\device1_2016-08-03_15-24-58.txt'

    WITH (

    BATCHSIZE = 2000000000

    ,CODEPAGE = 'RAW'

    ,DATAFILETYPE = 'CHAR'

    ,FIELDTERMINATOR = ','

    ,ROWTERMINATOR = ''

    ,FIRSTROW = 7

    ,TABLOCK

    )

    ;

    --===== Let's see if it worked (it does)

    SELECT * FROM dbo.DeviceStaging

    ;

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • p.s.

    The code I posted won't work without error when you do upgrade to 2012 or greater because Bulk Insert will count the row terminators instead of the field delimiters to figure out where the end of a row is and because it's setup to start seemingly at row 7 instead of 21. It won't, however, fail silently. It'll skip the first 6 rows and then start producing errors because the target table is expecting INT values for the first column. Since the default number of errors tolerated by BULK INSERT is 10, it'll generate 10 errors (making it to row 17) and then fail. All that will need to be done is to change FIRSTROW to 21 and you're done.

    It COULD be made to be totally self healing even with the addition or reduction of actual columns in the data. In the process, it could also be made to include data from the 20 row header and a bunch of other things. One of the ways to do that is to BULK INSERT the file as single line blobs into a working table and programmatically scan through to find and interrogate the location (row number) of the header. Then re-export the file without the rows prior to the header and re-import to the staging table. In that first import, you could also interrogate the column names of the header and dynamically build the staging table to match the file. If using BCP and xp_CmdShell bugs you (it's NOT a security risk if done properly, though), then someone could write a preprocessor to remove the "super header" prior to import attempts.

    Another way would be to simply import everything as VARCHAR and then delete rows that don't pass muster. The problem with that is that there's no guarantee that errors won't be generated due to the varying number of "fields" per row in the "super header". The more complex methods I outlined above would work correctly IF there's some stability in the data as to what the header is actually going to look like or which row the header will always appear on. If it all sounds too complicated, that would be because it's a bit difficult to build forms of artificial intelligence even into managed code, never mind T-SQL. I can't say for sure but even a product like Hadoop may not help any more than the methods I proposed and have the side affect of adding to the proverbial "Tower of Babel" which appears in most folks ETL solutions.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • If I could only know how to check the number of rows in this kind of CSV, that would be great! but anyway, thank you for the insight sir! This would be my foundation to the system which I would be creating, thank you so much!

  • joshua 15769 (11/27/2016)


    If I could only know how to check the number of rows in this kind of CSV, that would be great!

    It's easy. Don't set FIRSTROW and let it fail. It'll tell you how many rows it thinks there are. It said it failed on the first 6 rows when I first ran the code. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Sir Jeff Moden,

    Good Day.

    Sir, If you dont mid asnwering again, I searched in the internet but to no avail. My current problem is, the data which I gave you, It has its tendency to grow (add columns). But the added column would not appear in the last column; It appears between columns.

    For example

    Column A |Column B|

    This is the original Column.

    Column A |Column C| Column B

    This is the column that changed.

    My question is, using BULK INSERT, I'm having a problem getting the data from the CSV file.

    When the columns of the staging table and the CSV are not equal, the data wont upload. Even in a temp table.

    is there a way for me to somehow copy the data from the CSV file from the 7th row AS IS, bypassing all rows and errors that would be encountered, using BULK INSERT?

    If not, could I use OPENROWSET to select the contents of the CSV file, load it to the database, without a format file but I would have to use FIRSTROW command?

  • joshua 15769 (12/1/2016)


    Hi Sir Jeff Moden,

    Good Day.

    Sir, If you dont mid asnwering again, I searched in the internet but to no avail. My current problem is, the data which I gave you, It has its tendency to grow (add columns). But the added column would not appear in the last column; It appears between columns.

    For example

    Column A |Column B|

    This is the original Column.

    Column A |Column C| Column B

    This is the column that changed.

    My question is, using BULK INSERT, I'm having a problem getting the data from the CSV file.

    [font="Arial Black"]When the columns of the staging table and the CSV are not equal, the data wont upload. [/font]Even in a temp table.

    is there a way for me to somehow copy the data from the CSV file from the 7th row AS IS, bypassing all rows and errors that would be encountered, using BULK INSERT?

    If not, could I use OPENROWSET to select the contents of the CSV file, load it to the database, without a format file but I would have to use FIRSTROW command?

    You've correctly identified the problem (I emboldened it above). What's so hard about simply changing the definition of the staging table and letting it rip?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Currently I'm looking at solutions on how to fix the error when the CSV file and our database are not equal in terms of column count. But I might need your help sir regarding the BULK INSERT command. In my posted attachment, the row count of the CSV file is at 400+, but when I tried inserting a 1 column data, SQL gives out a error. Could I use something like MAXERROR = 100? or Is there any command that would "bypass" the errors done which using BULK INSERT? would still be researching this though throughout this day, I thank you for all the support sir Jeff.

  • joshua 15769 (12/1/2016)


    Currently I'm looking at solutions on how to fix the error when the CSV file and our database are not equal in terms of column count. But I might need your help sir regarding the BULK INSERT command. In my posted attachment, the row count of the CSV file is at 400+, but when I tried inserting a 1 column data, SQL gives out a error. Could I use something like MAXERROR = 100? or Is there any command that would "bypass" the errors done which using BULK INSERT? would still be researching this though throughout this day, I thank you for all the support sir Jeff.

    To answer your question, yes. To help you answer such questions without having to rely on someone like me, please do the following Google search...

    https://www.google.com/?gws_rd=ssl#q=bulk+insert

    The first link that shows up for me in that search is the one that contains the answer to the question you just asked...

    https://msdn.microsoft.com/en-us/library/ms188365.aspx

    To answer your next question... no I'm not trying to shrug you off and I'm not getting annoyed with your questions. You seem to have the "right stuff" and I'm trying to help you become more self sufficient.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply