Import a text file into SQL Server table without BULK insert due to special char

  • We currently have a process to import a text file from a customer that uses BULK insert but it is now crashing due to special characters (Ñ, Ã).

    , in the file.  Is there another way to get the text file into  a table  without  BULK  insert/BCP  since  this  runs  in  a nightly  batch  process?   The text file contains multiple formatted records that now get loaded into a ##TempPerson table then loaded into multiple tables depending on the contents of a specific column.

    FROM ##TempPerson

    where SUBSTRING(##TempPerson.info,49,4) = '0000'

  • you may need to look at adding a codepage to your bulk insert command

    or alternatively use powershell - but again you will need to specify the correct codepage to load the file correctly

  • Thanks for the quick reply.  We get this file from another organization so I don't what format it is saved in other than just text.  I tried using CODEPAGE = 1252 and 860 speculatively since the reading I have done, while informative doesn't really point one in a specific direction.  Both got the process to run but in each case, the row with the special character gets ignored(dropped & not imported) so neither of those satisfy the requirement.  I also tried 65001 which worked but the results seemed like they went through some sort of sort or jumble before entering the import and came into the table all out of sequence.

     

  • you may need to speak with the supplier of the file and see how they are generating the file.

    you could also try HxD (https://mh-nexus.de/en/hxd/) to edit the file and pick up the multiple formats it has until you "know" which codeset - you will need to play around a bit.

    not much we can do without a file (unmodified by yourself) to test - and the bulk command you using

  • I have to agree with Frederico, if your import process has been working without issues and then suddenly starts failing due to special characters, that says something changed at their end.  You need to work with them to figure out what changed so that you can address that change in your process.  Right now you are having to guess to try to find a fix.  Get the people supplying you the file to help figure out what needs to be done.

  • We have sent a request back to the agency to get more information on the file.  What this file contains is personnel information for the supporting agency's and some agency's have input their names using the hyphenated N,A,E,Z characters and the state has decided they will allow this to happen so all other support agencies have to also allow it as the data feeds down to them.

    Until this change, the files never contained any special characters so the bulk insert worked without any issues.  Will post back when I get updated information.

  • The text file was UTF-8 and we used CODEPAGE = 65001 and were able to process successfully.

     

    set @Str1 = 'Select * Into ##TempPerson FROM OPENROWSET (BULK '''

    set @Str2 = ''', CODEPAGE = 65001, FORMATFILE = '''

    Set @Str3 = ''', FIRSTROW = 1) as Person'

    Set @cmdStr1 = @Str1 + @DataFile

    Set @cmdStr2 = @Str2 + @FormatFile + @Str3

    Set @cmdStr = @cmdStr1 + @cmdStr2

    exec sp_executeSql @cmdStr

  • You could try importing it as a blob (use OPENROWSET with SINGLE_BLOB), this should always succeed. But you would then need to parse the string before putting it into your temp table.

  • rip wrote:

    The text file was UTF-8 and we used CODEPAGE = 65001 and were able to process successfully.

    set @Str1 = 'Select * Into ##TempPerson FROM OPENROWSET (BULK '''

    set @Str2 = ''', CODEPAGE = 65001, FORMATFILE = '''

    Set @Str3 = ''', FIRSTROW = 1) as Person'

    Set @cmdStr1 = @Str1 + @DataFile

    Set @cmdStr2 = @Str2 + @FormatFile + @Str3

    Set @cmdStr = @cmdStr1 + @cmdStr2

    exec sp_executeSql @cmdStr

    Great feedback.  Thanks.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply