Importing CSV Flat File defaults all columns to 50 chars

  • rschaeferhig

    SSCrazy Eights

    Points: 8776

    I have to import a CSV file into a table that has 75 columns. The Flat File connection defaults all the columns to 50 character strings. Some of the fields are much smaller and some are 255. The only solution I've found so far to match the Flat File connection columns to the schema of the table is to manually update each column in the connection and then again in the Flat File Data Source in the Data Flow.

    There has to be a better way. I played with SSIS 2008 and Suggest Types but it tried to convert the numeric strings (i.e. zip codes) into INT which causes more problems since they're varchar in the database.

    Does anyone have a more elegant solution to this problem?

     

  • rschaeferhig

    SSCrazy Eights

    Points: 8776

    Nobody has a solution for this? Really??:crazy:

     

  • vijay-547528

    Mr or Mrs. 500

    Points: 559

    Create a program to load the package and change the properties (column width) similar to that of destination columns and execute it.

  • rschaeferhig

    SSCrazy Eights

    Points: 8776

    I was thinking there should be an option to map the input source schema to the output schema but there doesn't seem to be one.

    I suppose there's a way to modify the column definitions in the source via a script task that reads the schema of the target, but I'm in over my head on this and looking for someone that might have done this before to offer some specific guidance (how do I access the target schema for the column definitions and then how do I change the source columns with that information? Are there ripple effects in the mappings I need to handle?)

     

  • MG-148046

    SSCrazy Eights

    Points: 9945

    DTS and SSIS sample X number of rows to determine column widths. If the columns in the initial sampling are smaller than a column in later rows, the later data will be truncated. You change the sampling size, or if you know what the maximum column sizes will be, you can configure each one, or you can move a row that has the largest column to the first row in the file so the sample will find it. In DTS, it was Disconnected Edit where you could override the default. I don't know where it can be done in SSIS but hopefully this will point you in the right direction or jog someones' memory who has a better grasp of SSIS than I do.

    MG

    "There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies."
    Tony Hoare

    "If you think it's expensive to hire a professional to do the job, wait until you hire an amateur." Red Adair.

  • rschaeferhig

    SSCrazy Eights

    Points: 8776

    There's an option, I think it's new to 2008, to "guess" the datatypes by scanning "n" rows where "n" can be specified. The problem with that is I have numeric data stored as strings that need to remain as strings, but when SSIS "guesses" the datatype from scanning the input data it decides that a field with only numbers in it should be stored as INT, dates as DATETIME, etc. So this doesn't work for what I need.

    Thanks anyway.

     

  • MG-148046

    SSCrazy Eights

    Points: 9945

    For the source columns that are numeric, you could use a Derived Column Transform in between the source and destination. Change the data type there rather than scripting.

    MG

    "There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies."
    Tony Hoare

    "If you think it's expensive to hire a professional to do the job, wait until you hire an amateur." Red Adair.

  • rschaeferhig

    SSCrazy Eights

    Points: 8776

    Problem is I have 75 columns and I'd have to reset each one. None of the target table columns are varchar(50) and if the metadata doesn't match between source and target you get all sorts of warnings. Plus the column length has to be changed in two places in the data source (external columns and output columns) and they both have to match.

    I'm trying to come up with a generic way to script it to set column 0 on the input to the same length as column 0 on the output and iterate through all the columns to set each one. This way I can (hopefully) change both settings for all the columns and reuse this code for other imports where the columns match 1 for 1.

     

  • Duncan Pryde

    SSCertifiable

    Points: 7956

    Hi

    Just wondering if you'd got anywhere with this, as I have a table with nearly 200 columns that was a bit of a drag to import from flat file in SSIS, although very simple in DTS (I know DTS also does sampling to obtain data types, but somehow it always just seems to work, doesn't truncate data or pick the wrong types). I'm going to be upgrading a SQL 2000 installation to SQL 2008 soon and can see this becoming an ongoing issue given that I do a number of imports from wide flat files currently and may be doing many more in the future.

    What would be great is if there was a way to get the column types from the destination table and apply them to the columns on the flat file connection somehow, without having to go through them one by one. Or maybe I should just go back to bcp...?

    Duncan

  • Rich Mechaber

    SSChampion

    Points: 10935

    I've run into a similar problem importing data from Excel. There's a connection string parameter you can add to prevent SQL from "guessing" how to import the data:

    http://microsoftdw.blogspot.com/2005/11/excel-data-not-coming-into-ssis-right.html

    Maybe there's something similar for textfile connection strings?

    Good luck,

    Rich

  • Duncan Pryde

    SSCertifiable

    Points: 7956

    rmechaber (8/25/2010)


    I've run into a similar problem importing data from Excel. There's a connection string parameter you can add to prevent SQL from "guessing" how to import the data:

    http://microsoftdw.blogspot.com/2005/11/excel-data-not-coming-into-ssis-right.html

    Maybe there's something similar for textfile connection strings?

    Good luck,

    Rich

    Thanks for the reply. It's not so much that it guesses and gets it wrong, it's that you seem to have 3 choices - accept the default of all columns in the textfile being varchar(50) (why always 50?); allow it to guess based on the first few thousand rows (useless in a 60m row file); or go through column by column setting the type yourself (tedious with more than 20 or 30 columns and usually requires a few "test runs" before getting the types spot on).

    If you could get it to use varchar(255) or thereabouts, and then have it perform implicit conversions when importing the data, that might work, so I'll look into your suggestion.

    Duncan

  • Duncan A. McRae

    SSC Enthusiast

    Points: 180

    Same name, same issue. Any resolution in the 8 years since posted?

  • Jeff Moden

    SSC Guru

    Points: 996831

    Duncan A. McRae - Saturday, October 13, 2018 9:19 AM

    Same name, same issue. Any resolution in the 8 years since posted?

    IMHO, you correctly answered your own question 8 years ago. 😀

    Or maybe I should just go back to bcp...?

    Depending on where the files live in relationship to where the database lives, BULK INSERT is another ticket that deserves a punch.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "If "pre-optimization" is the root of all evil, then what does the resulting no optimization lead to?"

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • DAJames

    Newbie

    Points: 1

    after you have chosen your datasource as flat file,

    go into the Advanced tab which will be highlighting the 1st field and saying OutputColumnWidth 50.

    hold down SHIFT and click on the last field in the list - so you highlight All fields.

    Change the OutputColumnWidth to 500 or some other value you need...

    This stops most fields throwing data overflow / truncation.

     

  • tripleAxe

    SSCertifiable

    Points: 5605

    Or load the data into a staging table first, and then load the original target table from the staging table.  The staging table could have all columns as varchar(100) or whatever you need.  The other table would have the correct data types and lengths.  A query or stored procedure to populate the table would do the data conversion from varchar(100) to whatever is required.

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply