How to Handle NULL Values Loaded by SSIS from Excel Files

  • yzhang

    SSC Enthusiast

    Points: 154

    Comments posted to this topic are about the item How to Handle NULL Values Loaded by SSIS from Excel Files

  • Anipaul

    SSC-Insane

    Points: 24681

    Useful one. Good for the new bies in SSIS. Also the links provided at the end are useful. 🙂

  • p.wiggins

    Valued Member

    Points: 63

    Hi,

    This is an old problem with importing Excel, we just put in a dummy first row with data in the form rewuired for each column, e.g. TEXT for text, 1234 for number, etc. , then delete the first record imported. tends to solve the problem for us.

    Pete Wiggins

  • Pieter-423357

    SSCommitted

    Points: 1619

    One could just save the Excel as a CSV file, load the file via BULK INSERT or OPENROWSET and be done.

    Why go through hoops on and develop SSIS package? The portability and predictability of SSIS isn't that great anyway.

    my 2 cents...

  • SuperDBA-207096

    SSCrazy Eights

    Points: 8176

    I've had a similar problem - ended up resolving the problem as follows:

    1) save the data as csv

    2) create the table with the correct datatypes before importing.

    That worked for me, and was alot simpler; although I believe your solution does help demonstrate some of the capabilties of SSIS.

    Mark

  • thejackfy

    SSC Enthusiast

    Points: 114

    Much easier way is:

    Load the Excel file with First Row as Header off even it contains the header. This can ensure all the fields are read in text format. And then add a Conditional Split to skip the first row.

  • Jack Corbett

    SSC Guru

    Points: 184380

    I thought the article was well-written and easy to follow. I do agree with others that loading the first row (header) as data in the source would be the simplest solution, yet I understand why you would miss this initially.

    Jack Corbett
    Consultant - Straight Path Solutions
    Check out these links on how to get faster and more accurate answers:
    Forum Etiquette: How to post data/code on a forum to get the best help
    Need an Answer? Actually, No ... You Need a Question

  • marklegosz

    SSCrazy

    Points: 2816

    I may be missing something, but couldn't you just modify the select clause on the spreadsheet and manually cast to the correct datatype?

  • Steve-524674

    SSC Eights!

    Points: 804

    Call me crazy, but I think I would do a global change on the column to eliminate the dash.

  • David Greenberg-358875

    SSC Enthusiast

    Points: 158

    I found an easier way using steps in this MS KB article on the very same subject. I tried it and it works very well.

    Solution : Set ConnectionString property of Excel source file as following

    Provider=Microsoft.Jet.OLEDB.4.0;Data Source=filepath\filename.xls;Extended Properties=”EXCEL 8.0;HDR=YES;IMEX=1“;

    NOTE: Setting IMEX=1 tells the driver to use Import mode. In this state, the registry setting ImportMixedTypes=Text will be noticed. This forces mixed data to be converted to text. For this to work reliably, you may also have to modify the registry setting, TypeGuessRows=8. The ISAM driver by default looks at the first eight rows and from that sampling determines the datatype. If this eight row sampling is all numeric, then setting IMEX=1 will not convert the default datatype to Text; it will remain numeric.

    You must be careful that IMEX=1 not be used indiscriminately. This is IMPORT mode, so the results may be unpredictable if you try to do appends or updates of data in this mode.

    Reference : http://support.microsoft.com/kb/194124/en-us

  • JB-117464

    Ten Centuries

    Points: 1279

    I have heard several 'fixes' for this problem. From what I have read it appears to be a major design flaw in the excel provider which covers all versions including 2007.

    Essentially it is down to these 2 registry keys (not sure which computer(s) these are read from; source, destination or SSIS run host)

    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\3.5\Engines\Excel\TypeGuessRows=8

    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel\TypeGuessRows=8

    Excel reads the first 8 lines and determines maximum length and data type from that. So if you have

    header

    1

    2

    3

    4

    5

    6

    7

    8

    word

    the 9th line containing "word" will be come null because it isnt a float data type. It will also truncate all strings to either 255 or the longest in the first 8 lines if they are longer than 255. Unicode can also present some challenges!

    You can alter these registry keys to 16384 but that doesnt eliminate the problem with most excel files particularly since you can get over 1 million rows in 2007.

    The best way i have found is adding a fake first line which forces the right length and datatype e.g. A12345678....... and then skip this line in your ssis package. This is pretty similar to Ying Zhang suggest of leaving the header in. Unfortunatley the header is not guaranteed to be long enough to prevent truncation of long text fields (over 255).

    The only true way to get round this is to refuse to accept data in excel! If only:)

  • thejackfy

    SSC Enthusiast

    Points: 114

    Simply changing the connection string doesn't work as I've tried before.

    David Greenberg (4/28/2008)


    I found an easier way using steps in this MS KB article on the very same subject. I tried it and it works very well.

    Solution : Set ConnectionString property of Excel source file as following

    Provider=Microsoft.Jet.OLEDB.4.0;Data Source=filepath\filename.xls;Extended Properties=”EXCEL 8.0;HDR=YES;IMEX=1“;

    NOTE: Setting IMEX=1 tells the driver to use Import mode. In this state, the registry setting ImportMixedTypes=Text will be noticed. This forces mixed data to be converted to text. For this to work reliably, you may also have to modify the registry setting, TypeGuessRows=8. The ISAM driver by default looks at the first eight rows and from that sampling determines the datatype. If this eight row sampling is all numeric, then setting IMEX=1 will not convert the default datatype to Text; it will remain numeric.

    You must be careful that IMEX=1 not be used indiscriminately. This is IMPORT mode, so the results may be unpredictable if you try to do appends or updates of data in this mode.

    Reference : http://support.microsoft.com/kb/194124/en-us

  • Holly Kilpatrick

    Ten Centuries

    Points: 1085

    I agree that this situation can be very frustrating. I have always used the IMEX=1 solution and found it very useful.

    Thanks for the script for importing from Excel, I saved that in case it is helpful for something in the future.

    Holly

  • Igor Zaychik

    SSC Veteran

    Points: 245

    I've read article and found that F5 does not work to repopulate fields.

    What is the right key?

    From other side I used different method.

    I open another workbook,

    mark all cells as text

    and copy the original one as text

    through the Pasta special option on Edit menu.

    Even if some columns are numeric SQL will recognize them

    at load.

  • FibRock

    Right there with Babe

    Points: 726

    I have to agree with David, IMEX=1 is a much easier solution and we use this at where I work with excel for a number of different packages. The only thing I have to add is that setting the registry value to 8 is a little low you may want to try 1000.

    Steve
    http://stevefibich.net
    http://utilitydb.codeplex.com/

Viewing 15 posts - 1 through 15 (of 42 total)

You must be logged in to reply to this topic. Login to reply