|
|
|
Old Hand
      
Group: General Forum Members
Last Login: Thursday, November 26, 2009 5:45 AM
Points: 357,
Visits: 58
|
|
I have heard several 'fixes' for this problem. From what I have read it appears to be a major design flaw in the excel provider which covers all versions including 2007.
Essentially it is down to these 2 registry keys (not sure which computer(s) these are read from; source, destination or SSIS run host)
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\3.5\Engines\Excel\TypeGuessRows=8 HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel\TypeGuessRows=8
Excel reads the first 8 lines and determines maximum length and data type from that. So if you have
header 1 2 3 4 5 6 7 8 word
the 9th line containing "word" will be come null because it isnt a float data type. It will also truncate all strings to either 255 or the longest in the first 8 lines if they are longer than 255. Unicode can also present some challenges!
You can alter these registry keys to 16384 but that doesnt eliminate the problem with most excel files particularly since you can get over 1 million rows in 2007.
The best way i have found is adding a fake first line which forces the right length and datatype e.g. A12345678....... and then skip this line in your ssis package. This is pretty similar to Ying Zhang suggest of leaving the header in. Unfortunatley the header is not guaranteed to be long enough to prevent truncation of long text fields (over 255).
The only true way to get round this is to refuse to accept data in excel! If only:)
|
|
|
|
|
Grasshopper
      
Group: General Forum Members
Last Login: Wednesday, November 21, 2012 9:52 AM
Points: 10,
Visits: 61
|
|
Simply changing the connection string doesn't work as I've tried before.
David Greenberg (4/28/2008) I found an easier way using steps in this MS KB article on the very same subject. I tried it and it works very well.
Solution : Set ConnectionString property of Excel source file as following
Provider=Microsoft.Jet.OLEDB.4.0;Data Source=filepath\filename.xls;Extended Properties=”EXCEL 8.0;HDR=YES;IMEX=1“;
NOTE: Setting IMEX=1 tells the driver to use Import mode. In this state, the registry setting ImportMixedTypes=Text will be noticed. This forces mixed data to be converted to text. For this to work reliably, you may also have to modify the registry setting, TypeGuessRows=8. The ISAM driver by default looks at the first eight rows and from that sampling determines the datatype. If this eight row sampling is all numeric, then setting IMEX=1 will not convert the default datatype to Text; it will remain numeric.
You must be careful that IMEX=1 not be used indiscriminately. This is IMPORT mode, so the results may be unpredictable if you try to do appends or updates of data in this mode.
Reference : http://support.microsoft.com/kb/194124/en-us
|
|
|
|
|
Valued Member
      
Group: General Forum Members
Last Login: Friday, October 28, 2011 2:02 PM
Points: 71,
Visits: 306
|
|
I agree that this situation can be very frustrating. I have always used the IMEX=1 solution and found it very useful.
Thanks for the script for importing from Excel, I saved that in case it is helpful for something in the future.
Holly
|
|
|
|
|
Forum Newbie
      
Group: General Forum Members
Last Login: Friday, April 19, 2013 12:19 PM
Points: 9,
Visits: 88
|
|
I've read article and found that F5 does not work to repopulate fields. What is the right key? From other side I used different method. I open another workbook, mark all cells as text and copy the original one as text through the Pasta special option on Edit menu. Even if some columns are numeric SQL will recognize them at load.
|
|
|
|
|
SSC Veteran
      
Group: General Forum Members
Last Login: Thursday, May 16, 2013 12:43 PM
Points: 220,
Visits: 224
|
|
I have to agree with David, IMEX=1 is a much easier solution and we use this at where I work with excel for a number of different packages. The only thing I have to add is that setting the registry value to 8 is a little low you may want to try 1000.
Steve http://stevefibich.net http://utilitydb.codeplex.com/
|
|
|
|
|
Forum Newbie
      
Group: General Forum Members
Last Login: Monday, April 28, 2008 2:16 PM
Points: 2,
Visits: 0
|
|
| This doesn't always work. From the research I've done excel is "smarter" (dumber) than that. It actually analyzes the first x rows and it's "majority rules" so if the first row (dummy row) is text and the next 6 rows are numbers then numbers it is. Very frustrating.
|
|
|
|
|
Forum Newbie
      
Group: General Forum Members
Last Login: Monday, April 28, 2008 2:16 PM
Points: 2,
Visits: 0
|
|
| good post, thanks. I found another way around this similar to the first method you tried. If you save as tab delimited text and then use ssis it works fine (just did this last week). Not automated unfortunately but it's a very easy fix for a quick import.
|
|
|
|
|
Right there with Babe
      
Group: General Forum Members
Last Login: 2 days ago @ 3:27 PM
Points: 739,
Visits: 931
|
|
Only problem I've noted with exporting to some form of delimited text and then trying to use SSIS is that Excel tries to save space when exporting. If you have a possible 20 columns, but one row doesn't have the last 3 columns filled in, you get no delimiters for those columns. I usually find this out when I get some "too big" or "can't CAST" message for something like an INT field which now has text in it due to the fields getting wrapped.
There are some CONNECT tickets open about allowing us to specify whether EOL characters can override the column delimiters, but the behavior won't change for SQL 2005. The best we've gotten so far are people suggesting script tasks to correctly read EOL characters. To make it easier on me, I add a "dummy" column as the last column and populate it with integers all the way to the end of the dataset. I can then export with no missing columns. Maybe Excel 2007 is better about that, but I never had good luck with it prior to 2007.
I appreciate the article, though. This is one of the really annoying parts of trying to work with Excel and SSIS. Excel is really common and easy to use so we get it as source data from a lot of our customers. I just wish it played better with SSIS.
|
|
|
|
|
Grasshopper
      
Group: General Forum Members
Last Login: Wednesday, November 26, 2008 10:46 AM
Points: 13,
Visits: 8
|
|
The article demonstrates some valuable techniques, but for a situation like this, seems overkill. My lazy method (using DTS): use the DTS import wizard, check off Drop and Recreate, and save the DTS package. Open the package, modify the create table statement to fix the datatype for the offending field (zipcode changes to varchar). Save and re-run the package.
|
|
|
|
|
Right there with Babe
      
Group: General Forum Members
Last Login: 2 days ago @ 3:27 PM
Points: 739,
Visits: 931
|
|
That lazy method can work a lot of the time, but I know it won't in some cases. Example - US has a postal code system that has two common components - ZIP and ZIP+4. Zip is a 5 digit code. ZIP+4 has a format of #####-####. In Excel, if this column is just left as General, the underlying JET provider reading the sheet interprets this sheet based on what it sees in the column. In this case, that PostalCode column is treated as a FLOAT for import purposes. All of those ZIP+4 values are lost because they can't convert to a FLOAT type, regardless of your target column.
Admittedly, I have not tried this in SQL2008's import. If that's changed, great, but the way I understand it is that we can't actually override the way JET works in any easy manner. There are some properties you can set at the driver level, but not through the Wizard. You can read through the table, but IIRC, that results in a "majority wins" type of determination for data types.
In any case, Excel just isn't a friendly source for a variety of reasons if you're trying to handle a bunch of mixed data. When you're dealing with purely numeric or character data, it's not too bad. Once you start mixing in data types within a column, you're going to have issues.
|
|
|
|