How to Bulk Insert data from csv which contains double quotes to sql server 2014

  • Input CSV file:

    "Id","Name","Sal"
    "1","vikas,j","10000.5"
    "2","James","5000.2"
    "3","V
    J","4000.3"

    create table dbo.demo
    (
     Id char(1),
     Name varchar(50),
     Sal float
    )

    bcp.fmt file:

    12.0
    4
    1   SQLCHAR   0   0  "\""  0 FIRST_QUOTE   ""
    2   SQLCHAR   0   3  "\","  1  Id     ""
    3   SQLCHAR   0   55  ",\""  2  Name     ""
    4   SQLCHAR   0   15  "\"\r\n" 3  Sal     ""

    BULK INSERT dbo.demo from 'D:\Input.csv' with(DATAFILETYPE='char',fieldterminator= '","' ,Firstrow = 2, Rows_per_batch = 100000 , rowterminator='\n',FORMATFILE = 'D:\bcp.fmt')

    Error: Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 4, column 4 (Sal)

    Can anyone tell me what am I missing here, from what I understand I am doing something wrong in format file while passing Terminator value?

  • The problem is how you're treating the leading quote.  You can't treat it as a delimiter.  It has to be treated as a guaranteed single character (a fixed field length of 1).  Also, the delimiters between the fields are actually "," (quotes included) and so must the delimiters be in the BCP format file.  Since a " must be escaped when used as a part of a delimiter in BCP Format Files and remembering that a delimiter in a BCP Format File marks the END of a field and the whole delimiter must be encapsulated in quotes, the delimiter between the fields must be marked as "\",\"".

    And, with the understanding that I've not tested this specific example, please see the following....

    12.0
    4
    1 SQLCHAR 0 1 "" 0 FIRST_QUOTE ""
    2 SQLCHAR 0 3 "\",\"" 1 Id  ""
    3 SQLCHAR 0 55 "\",\"" 2 Name  ""
    4 SQLCHAR 0 15 "\"\r\n" 3 Sal  ""

    I also recommend that since your data (and the first and last quotes are actually a part of the data) can be variable width, that you NOT specify the exact column width you're looking for.  Let the datatype of the columns in the target table do that for you so that you don't need to maintain the BCP format file for simple expansions of columns.  And, yeah... the oversized 9999 bytes is very intentional so that I get error feedback if the field in the file contain more data than the table columns do.

    This is what I 'd end up with for a BCP format file this this file...
    12.0
    4
    1 SQLCHAR 0 1 "" 0 FIRST_QUOTE ""
    2 SQLCHAR 0 9999 "\",\"" 1 Id  ""
    3 SQLCHAR 0 9999 "\",\"" 2 Name  ""
    4 SQLCHAR 0 9999 "\"\r\n" 3 Sal  ""

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks, Jeff I tried your suggested format file 

    Bulk insert generated the error: Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 4, column 4 (Sal). But in actual it had inserted 2 rows in the table. However 3rd row where name columns value goes to the next line that 1 row was not inserted. Please note this is how I get value in Input CSV and that cannot be changed.

    "3","V
    J","4000.3"

    So Ideally expected output in the table for the above row is :
    Id       Name           Sal
    3    V<space>J     4000.3

    In SQL 2017 this can be achieved without using format file. But can't this be done in SQL 2014 even when I am using format file?

  • the issue is that the last line fails to meet the correct delimiter "\"\r\n" and as such the value contains a quote

    try changing the table to
    create table dbo.demo
    (
    Id char(1),
    Name varchar(50),
    Sal varchar(100)
    )
    and you will see what I mean

    BCP prior to 2017 will not work correctly no matter what is done with the delimiters - there will always be some issues that need to be coded around.

    this will work.

    if object_id('demo_staging') is not null
    drop table demo_staging;
    create table dbo.demo_staging
    (
    Id char(1),
    Name varchar(50),
    Sal varchar(100),
    id2 int identity(1,1)
    )

    BULK INSERT dbo.demo_staging from 'c:\temp\test.csv' with(DATAFILETYPE='char'
    ,Firstrow = 2, Rows_per_batch = 100000
    -- not using these as they are defined within the format file
    -- , rowterminator='\n'
    --,fieldterminator= '","'
    ,FORMATFILE = 'c:\temp\test.fmt')

    select *
    from dbo.demo_staging

    -- as we know the last value is "incorrect" clean it
    update t2
    set t2.sal = left(t2.sal, len(t2.sal) -1)
    from dbo.demo_staging t2
    where t2.id2 = (select max(id2) from dbo.demo_staging)

    select *
    from dbo.demo_staging

    -- now it can be inserted into a table with the correct datatypes

    if object_id('demo') is not null
    drop table demo;
    create table dbo.demo
    (
    Id char(1),
    Name varchar(50),
    Sal float
    )

    insert into dbo.demo
    select id
      , name
      , convert(float, sal)
    from dbo.demo_staging

    select *
    from dbo.demo

    if object_id('demo_staging') is not null
    drop table demo_staging;

    if object_id('demo') is not null
    drop table demo;

    regarding "3 V<space>J 4000.3"
    this will not be correct - the space will either be \n or \r\n - if  you need it to be space you need to replace it yourself

  • vikasjagadale8 - Thursday, January 3, 2019 11:14 PM

    Thanks, Jeff I tried your suggested format file 

    Bulk insert generated the error: Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 4, column 4 (Sal). But in actual it had inserted 2 rows in the table. However 3rd row where name columns value goes to the next line that 1 row was not inserted. Please note this is how I get value in Input CSV and that cannot be changed.

    "3","V
    J","4000.3"

    So Ideally expected output in the table for the above row is :
    Id       Name           Sal
    3    V<space>J     4000.3

    In SQL 2017 this can be achieved without using format file. But can't this be done in SQL 2014 even when I am using format file?

    Can you attach the actual test file (please, no PII or other sensitive information) you're using for this?  I'd like to examine it for control characters.  If YOU made the test file, I think I know what the problem is but would like to confirm it before making a suggestion.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Friday, January 4, 2019 8:48 AM

    vikasjagadale8 - Thursday, January 3, 2019 11:14 PM

    Thanks, Jeff I tried your suggested format file 

    Bulk insert generated the error: Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 4, column 4 (Sal). But in actual it had inserted 2 rows in the table. However 3rd row where name columns value goes to the next line that 1 row was not inserted. Please note this is how I get value in Input CSV and that cannot be changed.

    "3","V
    J","4000.3"

    So Ideally expected output in the table for the above row is :
    Id       Name           Sal
    3    V<space>J     4000.3

    In SQL 2017 this can be achieved without using format file. But can't this be done in SQL 2014 even when I am using format file?

    Can you attach the actual test file (please, no PII or other sensitive information) you're using for this?  I'd like to examine it for control characters.  If YOU made the test file, I think I know what the problem is but would like to confirm it before making a suggestion.

    Sure!  But unfortunately, since it's a CSV file I cannot Upload that file. So I am uploading the images of the CSV file. When opened in Notepad++ so you can see field terminator(Comma)  and End of Line (CRLF) and a second image of actual CSV so that you can see how excel by default wraps that text

  • vikasjagadale8 - Friday, January 4, 2019 9:00 AM

    Jeff Moden - Friday, January 4, 2019 8:48 AM

    vikasjagadale8 - Thursday, January 3, 2019 11:14 PM

    Thanks, Jeff I tried your suggested format file 

    Bulk insert generated the error: Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 4, column 4 (Sal). But in actual it had inserted 2 rows in the table. However 3rd row where name columns value goes to the next line that 1 row was not inserted. Please note this is how I get value in Input CSV and that cannot be changed.

    "3","V
    J","4000.3"

    So Ideally expected output in the table for the above row is :
    Id       Name           Sal
    3    V<space>J     4000.3

    In SQL 2017 this can be achieved without using format file. But can't this be done in SQL 2014 even when I am using format file?

    Can you attach the actual test file (please, no PII or other sensitive information) you're using for this?  I'd like to examine it for control characters.  If YOU made the test file, I think I know what the problem is but would like to confirm it before making a suggestion.

    Sure!  But unfortunately, since it's a CSV file I cannot Upload that file. So I am uploading the images of the CSV file. When opened in Notepad++ so you can see field terminator(Comma)  and End of Line (CRLF) and a second image of actual CSV so that you can see how excel by default wraps that text

    I can't use that.  I need the file for other reasons, as well.  Yep... I could type it out.  Just don't want to do that.   Just change the extension of the CSV file to TXT and that'll work.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Friday, January 4, 2019 9:25 AM

    vikasjagadale8 - Friday, January 4, 2019 9:00 AM

    Jeff Moden - Friday, January 4, 2019 8:48 AM

    vikasjagadale8 - Thursday, January 3, 2019 11:14 PM

    Thanks, Jeff I tried your suggested format file 

    Bulk insert generated the error: Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 4, column 4 (Sal). But in actual it had inserted 2 rows in the table. However 3rd row where name columns value goes to the next line that 1 row was not inserted. Please note this is how I get value in Input CSV and that cannot be changed.

    "3","V
    J","4000.3"

    So Ideally expected output in the table for the above row is :
    Id       Name           Sal
    3    V<space>J     4000.3

    In SQL 2017 this can be achieved without using format file. But can't this be done in SQL 2014 even when I am using format file?

    Can you attach the actual test file (please, no PII or other sensitive information) you're using for this?  I'd like to examine it for control characters.  If YOU made the test file, I think I know what the problem is but would like to confirm it before making a suggestion.

    Sure!  But unfortunately, since it's a CSV file I cannot Upload that file. So I am uploading the images of the CSV file. When opened in Notepad++ so you can see field terminator(Comma)  and End of Line (CRLF) and a second image of actual CSV so that you can see how excel by default wraps that text

    I can't use that.  I need the file for other reasons, as well.  Yep... I could type it out.  Just don't want to do that.   Just change the extension of the CSV file to TXT and that'll work.

  • I was able to get this to return the correct data.  Format File and data file are attached

    Here's the sql

    SELECT *
      FROM OPENROWSET (
       BULK 'c:\work\data\demo1.txt',
            Firstrow = 2,
       FORMATFILE = 'c:\work\data\demofmt.txt'
       ) AS t1;

    For better, quicker answers, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Mike01 - Friday, January 4, 2019 9:50 AM

    I was able to get this to return the correct data.  Format File and data file are attached

    Here's the sql

    SELECT *
      FROM OPENROWSET (
       BULK 'c:\work\data\demo1.txt',
            Firstrow = 2,
       FORMATFILE = 'c:\work\data\demofmt.txt'
       ) AS t1;

    Thanks, Mike, that worked. But still what I couldn't understand is how come just changing the format to .txt worked. And then what's the problem with CSV file ? Because ideally, I am going to get CSV file for importing data. I can change the format to txt but want to do that only as a last option.

    I think I got it how it worked when I checked your demo.txt file after the last row there is an additional line at the end (basically you have pressed enter after the last row). Now it worked even for CSV format. Please correct me If I am wrong.

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply