import flat file through SSIS

  • I have a SSIS package which imports a flat file to DB. The file delimiter is Comma, the text qualifier is set to " (double quote). It was all fine until it hits this column: "toys ""r"" us, inc." supposedly the comma is inside of the text qualifier, but I guess the double double quotes around the R confused the system, so the field was splited into two. And this causes problem. I was wondering if anybody have any idea how to handle this (modify the original import file is not an option.)

    Thanks!

  • I tried with a file having the folllowing content.

    "Jacob","jacob"s" File", "Other File"

    I get this file processed correctly (Jacob, Jacob"s" File, Other File) with the following settings (SSIS Flat File Connection manager)

    format: delimited

    Text Qualifier: "

    Column delimiter: comma {,}

    Row delimiter: {CR}{LF}

    .

  • Thanks for your quick response. I guess I didn't state the problem clearly. In my case, I don't want the data imported as two seperated fields: "toys ""r"" us" and "inc". the whole thing should be one field (column).

  • ah! my mistake.

    .

  • I don't have a solution, but shouldn't it be "toys 'r us" with a single quote before the r? Maybe the data can be corrected before you receive it?

  • ...or find any instances of double qoutes in your text file and replace with single qoutes before executing the import.

    Does anyone ever get really 'Clean' data?

    'nix

  • Thanks for all the inputs! Yea, I am working on reformatting the data. I was hoping that I don't have to do that...

  • i do monthly imports of taxation data from CCH and have had a similar problem

    only solution i ever found was to map out the column lengths

  • By "map out the column lengths" do you mean make it a fixed length file?

  • If you setup a BCP format file correctly and use Bulk Insert, that problem will magically disappear... performance problems will probably disappear, as well. I've used Bulk Insert with a format file to import more than 5 million rows per minute on a 20 column quoted identifier CSV.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • yep, i think that's what it's called

    i had to recreate my import process a few weeks back and i went through each table, looked at the column lengths and created the SSIS flat file imports with the same column lengths

  • I don't believe that would work on something that looks like this...

    "Some Company with a very long name","somedata","some other data"

    "Some Company","someData","some other data"

    "Some Company with a very long name","some very long winded data","some other data"

    "Some Company","some very long winded data","some other data"

    "Some Company with a very long name","","some other data"

    "Some Company","","some other data"

    ... but I could be wrong 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks, Jeff. right now I wrote a console app to reformat the files to tab delimited files and the problem is fixed. but the performance is getting worse. I have more than 20 files I need to import daily. and each file contain about 130 columns and about a million rows. Almost all columns are poorly formated. For example: they use "0001-01-01" for Null dates. So maybe bcp format will help. but I have never done that before, would you please give an example?

    Thanks a lot!

  • Better than that, post the first 100 rows of your problem file as an attachment and we'll take a crack at it tonight. Also, post what the CREATE TABLE statement for the target table, please. The less I have to do, the more time I can spend on the actual problem....

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • i've never done it, but what you can do is set up a transformation to write all bad imports to a table to be worked on manually

    the explanation is in the Wrox SSIS book

Viewing 15 posts - 1 through 15 (of 33 total)

You must be logged in to reply to this topic. Login to reply