How to Handle NULL Values Loaded by SSIS from Excel Files

  • Comments posted to this topic are about the item How to Handle NULL Values Loaded by SSIS from Excel Files

  • Useful one. Good for the new bies in SSIS. Also the links provided at the end are useful. 🙂

  • Hi,

    This is an old problem with importing Excel, we just put in a dummy first row with data in the form rewuired for each column, e.g. TEXT for text, 1234 for number, etc. , then delete the first record imported. tends to solve the problem for us.

    Pete Wiggins

  • One could just save the Excel as a CSV file, load the file via BULK INSERT or OPENROWSET and be done.

    Why go through hoops on and develop SSIS package? The portability and predictability of SSIS isn't that great anyway.

    my 2 cents...

  • I've had a similar problem - ended up resolving the problem as follows:

    1) save the data as csv

    2) create the table with the correct datatypes before importing.

    That worked for me, and was alot simpler; although I believe your solution does help demonstrate some of the capabilties of SSIS.

    Mark

  • Much easier way is:

    Load the Excel file with First Row as Header off even it contains the header. This can ensure all the fields are read in text format. And then add a Conditional Split to skip the first row.

  • I thought the article was well-written and easy to follow. I do agree with others that loading the first row (header) as data in the source would be the simplest solution, yet I understand why you would miss this initially.

  • I may be missing something, but couldn't you just modify the select clause on the spreadsheet and manually cast to the correct datatype?

  • Call me crazy, but I think I would do a global change on the column to eliminate the dash.

  • I found an easier way using steps in this MS KB article on the very same subject. I tried it and it works very well.

    Solution : Set ConnectionString property of Excel source file as following

    Provider=Microsoft.Jet.OLEDB.4.0;Data Source=filepath\filename.xls;Extended Properties=”EXCEL 8.0;HDR=YES;IMEX=1“;

    NOTE: Setting IMEX=1 tells the driver to use Import mode. In this state, the registry setting ImportMixedTypes=Text will be noticed. This forces mixed data to be converted to text. For this to work reliably, you may also have to modify the registry setting, TypeGuessRows=8. The ISAM driver by default looks at the first eight rows and from that sampling determines the datatype. If this eight row sampling is all numeric, then setting IMEX=1 will not convert the default datatype to Text; it will remain numeric.

    You must be careful that IMEX=1 not be used indiscriminately. This is IMPORT mode, so the results may be unpredictable if you try to do appends or updates of data in this mode.

    Reference : http://support.microsoft.com/kb/194124/en-us

  • I have heard several 'fixes' for this problem. From what I have read it appears to be a major design flaw in the excel provider which covers all versions including 2007.

    Essentially it is down to these 2 registry keys (not sure which computer(s) these are read from; source, destination or SSIS run host)

    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\3.5\Engines\Excel\TypeGuessRows=8

    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel\TypeGuessRows=8

    Excel reads the first 8 lines and determines maximum length and data type from that. So if you have

    header

    1

    2

    3

    4

    5

    6

    7

    8

    word

    the 9th line containing "word" will be come null because it isnt a float data type. It will also truncate all strings to either 255 or the longest in the first 8 lines if they are longer than 255. Unicode can also present some challenges!

    You can alter these registry keys to 16384 but that doesnt eliminate the problem with most excel files particularly since you can get over 1 million rows in 2007.

    The best way i have found is adding a fake first line which forces the right length and datatype e.g. A12345678....... and then skip this line in your ssis package. This is pretty similar to Ying Zhang suggest of leaving the header in. Unfortunatley the header is not guaranteed to be long enough to prevent truncation of long text fields (over 255).

    The only true way to get round this is to refuse to accept data in excel! If only:)

  • Simply changing the connection string doesn't work as I've tried before.

    David Greenberg (4/28/2008)


    I found an easier way using steps in this MS KB article on the very same subject. I tried it and it works very well.

    Solution : Set ConnectionString property of Excel source file as following

    Provider=Microsoft.Jet.OLEDB.4.0;Data Source=filepath\filename.xls;Extended Properties=”EXCEL 8.0;HDR=YES;IMEX=1“;

    NOTE: Setting IMEX=1 tells the driver to use Import mode. In this state, the registry setting ImportMixedTypes=Text will be noticed. This forces mixed data to be converted to text. For this to work reliably, you may also have to modify the registry setting, TypeGuessRows=8. The ISAM driver by default looks at the first eight rows and from that sampling determines the datatype. If this eight row sampling is all numeric, then setting IMEX=1 will not convert the default datatype to Text; it will remain numeric.

    You must be careful that IMEX=1 not be used indiscriminately. This is IMPORT mode, so the results may be unpredictable if you try to do appends or updates of data in this mode.

    Reference : http://support.microsoft.com/kb/194124/en-us

  • I agree that this situation can be very frustrating. I have always used the IMEX=1 solution and found it very useful.

    Thanks for the script for importing from Excel, I saved that in case it is helpful for something in the future.

    Holly

  • I've read article and found that F5 does not work to repopulate fields.

    What is the right key?

    From other side I used different method.

    I open another workbook,

    mark all cells as text

    and copy the original one as text

    through the Pasta special option on Edit menu.

    Even if some columns are numeric SQL will recognize them

    at load.

  • I have to agree with David, IMEX=1 is a much easier solution and we use this at where I work with excel for a number of different packages. The only thing I have to add is that setting the registry value to 8 is a little low you may want to try 1000.

    Steve
    http://stevefibich.net
    http://utilitydb.codeplex.com/

Viewing 15 posts - 1 through 15 (of 41 total)

You must be logged in to reply to this topic. Login to reply