Disappearing Data

  • I have an import process in SSIS that pulls data out of an Excel file, performs some transformations, and then puts it in an SQL 2005 table.

    I have just one problem with it: when there is mixed data in a column in the Excel file, some of it disappears.

    Very specifically, there is a Time column in the file, and some of the data is typed in as strings, like "11AM", and some is in there as Excel "time" data. When the column is formatted as text, the "time" data is in there as a floating point number.

    The stuff that's stored as numbers disappears on import. Becomes null.

    Interestingly enough, OpenRowSet on the same files also returns null values for those, and strings for the rest of it.

    If the Excel file is first saved as a tab-delimited text file, then imported using the flat-file connection, instead of the Excel connection, it works just fine, and all the data is in there as strings that look like times (not floating point numbers).

    Has anyone else run into this problem and solved it?

    I've attached a sample of a few rows (annonymized) of the Excel file. The openrowset query looks like this:

    select *

    FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',

    'Excel 8.0;Database=c:\TestFile\TimeTest.xls;',

    'SELECT * FROM [Sheet1$]')

    Can't post the SSIS package, but anything that pulls from this file will duplicate the problem.

    Any ideas?

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • I seem to be getting opposite results than you. I get nulls for the strings and a datetime of 1899-12-30 11:00:00.000 for the time. It seems the lack of a space between 1100 and AM is causing you some heartaches as it can't really be parsed as a time. I seem to remember dealing with this type of thing back in DTS a few years ago and the issue was that it only looked at the first x number of rows to try and match a type to the column or some such thing.

    I take it there's no chance you can do any data cleansing vba style on the xls?

    -Luke.

    To help us help you read this[/url]For better help with performance problems please read this[/url]

  • I found a solution. It's based on the way OLE DB works. Turns out that it scans a few rows of the Excel file, figures out what data type the column is, and then makes anything that doesn't conform to that data type return null.

    So, for me, it found that most of the first few rows were text strings, and set that as nvarchar, while for you, it found that most of the first rows were datetime. In my case, the numeric data was then set to null, while in yours, the text data gets set to null.

    The solution is, in the package designer, go to the properties of the Excel connection, and add "IMEX=1" to the extended properties section. If it already has other extended properties, separate it from them with a semicolon.

    That worked for me. Should work for you.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply