Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase ««12345»»»

How to Handle NULL Values Loaded by SSIS from Excel Files Expand / Collapse
Author
Message
Posted Monday, April 28, 2008 7:41 AM
Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Thursday, November 26, 2009 5:45 AM
Points: 357, Visits: 58
I have heard several 'fixes' for this problem. From what I have read it appears to be a major design flaw in the excel provider which covers all versions including 2007.

Essentially it is down to these 2 registry keys (not sure which computer(s) these are read from; source, destination or SSIS run host)

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\3.5\Engines\Excel\TypeGuessRows=8
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel\TypeGuessRows=8

Excel reads the first 8 lines and determines maximum length and data type from that. So if you have

header
1
2
3
4
5
6
7
8
word

the 9th line containing "word" will be come null because it isnt a float data type. It will also truncate all strings to either 255 or the longest in the first 8 lines if they are longer than 255. Unicode can also present some challenges!

You can alter these registry keys to 16384 but that doesnt eliminate the problem with most excel files particularly since you can get over 1 million rows in 2007.

The best way i have found is adding a fake first line which forces the right length and datatype e.g. A12345678....... and then skip this line in your ssis package. This is pretty similar to Ying Zhang suggest of leaving the header in. Unfortunatley the header is not guaranteed to be long enough to prevent truncation of long text fields (over 255).

The only true way to get round this is to refuse to accept data in excel! If only:)



Post #491384
Posted Monday, April 28, 2008 7:42 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Monday, October 20, 2014 2:37 PM
Points: 10, Visits: 75
Simply changing the connection string doesn't work as I've tried before.

David Greenberg (4/28/2008)
I found an easier way using steps in this MS KB article on the very same subject. I tried it and it works very well.

Solution : Set ConnectionString property of Excel source file as following

Provider=Microsoft.Jet.OLEDB.4.0;Data Source=filepath\filename.xls;Extended Properties=”EXCEL 8.0;HDR=YES;IMEX=1“;

NOTE: Setting IMEX=1 tells the driver to use Import mode. In this state, the registry setting ImportMixedTypes=Text will be noticed. This forces mixed data to be converted to text. For this to work reliably, you may also have to modify the registry setting, TypeGuessRows=8. The ISAM driver by default looks at the first eight rows and from that sampling determines the datatype. If this eight row sampling is all numeric, then setting IMEX=1 will not convert the default datatype to Text; it will remain numeric.

You must be careful that IMEX=1 not be used indiscriminately. This is IMPORT mode, so the results may be unpredictable if you try to do appends or updates of data in this mode.

Reference : http://support.microsoft.com/kb/194124/en-us

Post #491385
Posted Monday, April 28, 2008 9:07 AM
Valued Member

Valued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued Member

Group: General Forum Members
Last Login: Friday, October 28, 2011 2:02 PM
Points: 71, Visits: 306
I agree that this situation can be very frustrating. I have always used the IMEX=1 solution and found it very useful.

Thanks for the script for importing from Excel, I saved that in case it is helpful for something in the future.

Holly



Post #491507
Posted Monday, April 28, 2008 1:07 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Wednesday, November 5, 2014 1:21 PM
Points: 12, Visits: 104
I've read article and found that F5 does not work to repopulate fields.
What is the right key?
From other side I used different method.
I open another workbook,
mark all cells as text
and copy the original one as text
through the Pasta special option on Edit menu.
Even if some columns are numeric SQL will recognize them
at load.
Post #491666
Posted Monday, April 28, 2008 1:16 PM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Wednesday, July 16, 2014 12:43 PM
Points: 222, Visits: 242
I have to agree with David, IMEX=1 is a much easier solution and we use this at where I work with excel for a number of different packages. The only thing I have to add is that setting the registry value to 8 is a little low you may want to try 1000.

Steve
http://stevefibich.net
http://utilitydb.codeplex.com/
Post #491674
Posted Monday, April 28, 2008 2:20 PM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Monday, April 28, 2008 2:16 PM
Points: 2, Visits: 0
This doesn't always work. From the research I've done excel is "smarter" (dumber) than that. It actually analyzes the first x rows and it's "majority rules" so if the first row (dummy row) is text and the next 6 rows are numbers then numbers it is. Very frustrating.
Post #491718
Posted Monday, April 28, 2008 2:26 PM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Monday, April 28, 2008 2:16 PM
Points: 2, Visits: 0
good post, thanks. I found another way around this similar to the first method you tried. If you save as tab delimited text and then use ssis it works fine (just did this last week). Not automated unfortunately but it's a very easy fix for a quick import.
Post #491725
Posted Monday, April 28, 2008 2:54 PM
SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Saturday, December 13, 2014 9:58 AM
Points: 858, Visits: 1,353
Only problem I've noted with exporting to some form of delimited text and then trying to use SSIS is that Excel tries to save space when exporting. If you have a possible 20 columns, but one row doesn't have the last 3 columns filled in, you get no delimiters for those columns. I usually find this out when I get some "too big" or "can't CAST" message for something like an INT field which now has text in it due to the fields getting wrapped.

There are some CONNECT tickets open about allowing us to specify whether EOL characters can override the column delimiters, but the behavior won't change for SQL 2005. The best we've gotten so far are people suggesting script tasks to correctly read EOL characters. To make it easier on me, I add a "dummy" column as the last column and populate it with integers all the way to the end of the dataset. I can then export with no missing columns. Maybe Excel 2007 is better about that, but I never had good luck with it prior to 2007.

I appreciate the article, though. This is one of the really annoying parts of trying to work with Excel and SSIS. Excel is really common and easy to use so we get it as source data from a lot of our customers. I just wish it played better with SSIS.



Post #491734
Posted Friday, October 3, 2008 6:43 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Wednesday, November 26, 2008 10:46 AM
Points: 13, Visits: 8
The article demonstrates some valuable techniques, but for a situation like this, seems overkill. My lazy method (using DTS): use the DTS import wizard, check off Drop and Recreate, and save the DTS package. Open the package, modify the create table statement to fix the datatype for the offending field (zipcode changes to varchar). Save and re-run the package.


Post #580230
Posted Friday, October 3, 2008 8:37 AM
SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Saturday, December 13, 2014 9:58 AM
Points: 858, Visits: 1,353
That lazy method can work a lot of the time, but I know it won't in some cases. Example - US has a postal code system that has two common components - ZIP and ZIP+4. Zip is a 5 digit code. ZIP+4 has a format of #####-####. In Excel, if this column is just left as General, the underlying JET provider reading the sheet interprets this sheet based on what it sees in the column. In this case, that PostalCode column is treated as a FLOAT for import purposes. All of those ZIP+4 values are lost because they can't convert to a FLOAT type, regardless of your target column.

Admittedly, I have not tried this in SQL2008's import. If that's changed, great, but the way I understand it is that we can't actually override the way JET works in any easy manner. There are some properties you can set at the driver level, but not through the Wizard. You can read through the table, but IIRC, that results in a "majority wins" type of determination for data types.

In any case, Excel just isn't a friendly source for a variety of reasons if you're trying to handle a bunch of mixed data. When you're dealing with purely numeric or character data, it's not too bad. Once you start mixing in data types within a column, you're going to have issues.



Post #580325
« Prev Topic | Next Topic »

Add to briefcase ««12345»»»

Permissions Expand / Collapse