SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


How to Handle NULL Values Loaded by SSIS from Excel Files


How to Handle NULL Values Loaded by SSIS from Excel Files

Author
Message
JB-117464
JB-117464
SSC-Addicted
SSC-Addicted (425 reputation)SSC-Addicted (425 reputation)SSC-Addicted (425 reputation)SSC-Addicted (425 reputation)SSC-Addicted (425 reputation)SSC-Addicted (425 reputation)SSC-Addicted (425 reputation)SSC-Addicted (425 reputation)

Group: General Forum Members
Points: 425 Visits: 58
I have heard several 'fixes' for this problem. From what I have read it appears to be a major design flaw in the excel provider which covers all versions including 2007.

Essentially it is down to these 2 registry keys (not sure which computer(s) these are read from; source, destination or SSIS run host)

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\3.5\Engines\Excel\TypeGuessRows=8
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel\TypeGuessRows=8

Excel reads the first 8 lines and determines maximum length and data type from that. So if you have

header
1
2
3
4
5
6
7
8
word

the 9th line containing "word" will be come null because it isnt a float data type. It will also truncate all strings to either 255 or the longest in the first 8 lines if they are longer than 255. Unicode can also present some challenges!

You can alter these registry keys to 16384 but that doesnt eliminate the problem with most excel files particularly since you can get over 1 million rows in 2007.

The best way i have found is adding a fake first line which forces the right length and datatype e.g. A12345678....... and then skip this line in your ssis package. This is pretty similar to Ying Zhang suggest of leaving the header in. Unfortunatley the header is not guaranteed to be long enough to prevent truncation of long text fields (over 255).

The only true way to get round this is to refuse to accept data in excel! If onlySmile



thejackfy
thejackfy
SSC Rookie
SSC Rookie (36 reputation)SSC Rookie (36 reputation)SSC Rookie (36 reputation)SSC Rookie (36 reputation)SSC Rookie (36 reputation)SSC Rookie (36 reputation)SSC Rookie (36 reputation)SSC Rookie (36 reputation)

Group: General Forum Members
Points: 36 Visits: 81
Simply changing the connection string doesn't work as I've tried before.

David Greenberg (4/28/2008)
I found an easier way using steps in this MS KB article on the very same subject. I tried it and it works very well.

Solution : Set ConnectionString property of Excel source file as following

Provider=Microsoft.Jet.OLEDB.4.0;Data Source=filepath\filename.xls;Extended Properties=”EXCEL 8.0;HDR=YES;IMEX=1“;

NOTE: Setting IMEX=1 tells the driver to use Import mode. In this state, the registry setting ImportMixedTypes=Text will be noticed. This forces mixed data to be converted to text. For this to work reliably, you may also have to modify the registry setting, TypeGuessRows=8. The ISAM driver by default looks at the first eight rows and from that sampling determines the datatype. If this eight row sampling is all numeric, then setting IMEX=1 will not convert the default datatype to Text; it will remain numeric.

You must be careful that IMEX=1 not be used indiscriminately. This is IMPORT mode, so the results may be unpredictable if you try to do appends or updates of data in this mode.

Reference : http://support.microsoft.com/kb/194124/en-us


Holly Kilpatrick
Holly Kilpatrick
SSC-Enthusiastic
SSC-Enthusiastic (127 reputation)SSC-Enthusiastic (127 reputation)SSC-Enthusiastic (127 reputation)SSC-Enthusiastic (127 reputation)SSC-Enthusiastic (127 reputation)SSC-Enthusiastic (127 reputation)SSC-Enthusiastic (127 reputation)SSC-Enthusiastic (127 reputation)

Group: General Forum Members
Points: 127 Visits: 306
I agree that this situation can be very frustrating. I have always used the IMEX=1 solution and found it very useful.

Thanks for the script for importing from Excel, I saved that in case it is helpful for something in the future.

Holly



Igor Zaychik
Igor Zaychik
SSC Rookie
SSC Rookie (38 reputation)SSC Rookie (38 reputation)SSC Rookie (38 reputation)SSC Rookie (38 reputation)SSC Rookie (38 reputation)SSC Rookie (38 reputation)SSC Rookie (38 reputation)SSC Rookie (38 reputation)

Group: General Forum Members
Points: 38 Visits: 123
I've read article and found that F5 does not work to repopulate fields.
What is the right key?
From other side I used different method.
I open another workbook,
mark all cells as text
and copy the original one as text
through the Pasta special option on Edit menu.
Even if some columns are numeric SQL will recognize them
at load.
FibRock
FibRock
SSC Veteran
SSC Veteran (280 reputation)SSC Veteran (280 reputation)SSC Veteran (280 reputation)SSC Veteran (280 reputation)SSC Veteran (280 reputation)SSC Veteran (280 reputation)SSC Veteran (280 reputation)SSC Veteran (280 reputation)

Group: General Forum Members
Points: 280 Visits: 257
I have to agree with David, IMEX=1 is a much easier solution and we use this at where I work with excel for a number of different packages. The only thing I have to add is that setting the registry value to 8 is a little low you may want to try 1000.

Steve
http://stevefibich.net
http://utilitydb.codeplex.com/
rkolsky
rkolsky
SSC Rookie
SSC Rookie (26 reputation)SSC Rookie (26 reputation)SSC Rookie (26 reputation)SSC Rookie (26 reputation)SSC Rookie (26 reputation)SSC Rookie (26 reputation)SSC Rookie (26 reputation)SSC Rookie (26 reputation)

Group: General Forum Members
Points: 26 Visits: 0
This doesn't always work. From the research I've done excel is "smarter" (dumber) than that. It actually analyzes the first x rows and it's "majority rules" so if the first row (dummy row) is text and the next 6 rows are numbers then numbers it is. Very frustrating.
rkolsky
rkolsky
SSC Rookie
SSC Rookie (26 reputation)SSC Rookie (26 reputation)SSC Rookie (26 reputation)SSC Rookie (26 reputation)SSC Rookie (26 reputation)SSC Rookie (26 reputation)SSC Rookie (26 reputation)SSC Rookie (26 reputation)

Group: General Forum Members
Points: 26 Visits: 0
good post, thanks. I found another way around this similar to the first method you tried. If you save as tab delimited text and then use ssis it works fine (just did this last week). Not automated unfortunately but it's a very easy fix for a quick import.
Peter Schott
Peter Schott
SSCommitted
SSCommitted (1.9K reputation)SSCommitted (1.9K reputation)SSCommitted (1.9K reputation)SSCommitted (1.9K reputation)SSCommitted (1.9K reputation)SSCommitted (1.9K reputation)SSCommitted (1.9K reputation)SSCommitted (1.9K reputation)

Group: General Forum Members
Points: 1856 Visits: 1923
Only problem I've noted with exporting to some form of delimited text and then trying to use SSIS is that Excel tries to save space when exporting. If you have a possible 20 columns, but one row doesn't have the last 3 columns filled in, you get no delimiters for those columns. I usually find this out when I get some "too big" or "can't CAST" message for something like an INT field which now has text in it due to the fields getting wrapped.

There are some CONNECT tickets open about allowing us to specify whether EOL characters can override the column delimiters, but the behavior won't change for SQL 2005. The best we've gotten so far are people suggesting script tasks to correctly read EOL characters. To make it easier on me, I add a "dummy" column as the last column and populate it with integers all the way to the end of the dataset. I can then export with no missing columns. Maybe Excel 2007 is better about that, but I never had good luck with it prior to 2007.

I appreciate the article, though. This is one of the really annoying parts of trying to work with Excel and SSIS. Excel is really common and easy to use so we get it as source data from a lot of our customers. I just wish it played better with SSIS.
weida
weida
SSC Rookie
SSC Rookie (37 reputation)SSC Rookie (37 reputation)SSC Rookie (37 reputation)SSC Rookie (37 reputation)SSC Rookie (37 reputation)SSC Rookie (37 reputation)SSC Rookie (37 reputation)SSC Rookie (37 reputation)

Group: General Forum Members
Points: 37 Visits: 8
The article demonstrates some valuable techniques, but for a situation like this, seems overkill. My lazy method (using DTS): use the DTS import wizard, check off Drop and Recreate, and save the DTS package. Open the package, modify the create table statement to fix the datatype for the offending field (zipcode changes to varchar). Save and re-run the package.



Peter Schott
Peter Schott
SSCommitted
SSCommitted (1.9K reputation)SSCommitted (1.9K reputation)SSCommitted (1.9K reputation)SSCommitted (1.9K reputation)SSCommitted (1.9K reputation)SSCommitted (1.9K reputation)SSCommitted (1.9K reputation)SSCommitted (1.9K reputation)

Group: General Forum Members
Points: 1856 Visits: 1923
That lazy method can work a lot of the time, but I know it won't in some cases. Example - US has a postal code system that has two common components - ZIP and ZIP+4. Zip is a 5 digit code. ZIP+4 has a format of #####-####. In Excel, if this column is just left as General, the underlying JET provider reading the sheet interprets this sheet based on what it sees in the column. In this case, that PostalCode column is treated as a FLOAT for import purposes. All of those ZIP+4 values are lost because they can't convert to a FLOAT type, regardless of your target column.

Admittedly, I have not tried this in SQL2008's import. If that's changed, great, but the way I understand it is that we can't actually override the way JET works in any easy manner. There are some properties you can set at the driver level, but not through the Wizard. You can read through the table, but IIRC, that results in a "majority wins" type of determination for data types.

In any case, Excel just isn't a friendly source for a variety of reasons if you're trying to handle a bunch of mixed data. When you're dealing with purely numeric or character data, it's not too bad. Once you start mixing in data types within a column, you're going to have issues.
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search