How to Handle NULL Values Loaded by SSIS from Excel Files

  • This doesn't always work. From the research I've done, Excel is "smarter" (dumber) than that: it actually analyzes the first x rows and goes with "majority rules", so if the first (dummy) row is text and the next six rows are numbers, then numbers it is. Very frustrating.
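
    For reference, the row-scan behavior can be sidestepped at the connection-string level: setting IMEX=1 in the Jet provider's Extended Properties tells the driver to treat mixed-type columns as text, and the number of rows scanned is governed by the TypeGuessRows registry setting. A minimal sketch of such a connection string (the file path is hypothetical):

```python
# Sketch of a Jet connection string for reading Excel with IMEX=1, which
# tells the driver to treat mixed-type columns as text instead of letting
# the "majority rules" scan pick a numeric type. The number of rows scanned
# is controlled by the TypeGuessRows registry value, not the connection string.
excel_path = r"C:\data\import.xls"  # hypothetical file path
conn_str = (
    "Provider=Microsoft.Jet.OLEDB.4.0;"
    f"Data Source={excel_path};"
    'Extended Properties="Excel 8.0;HDR=YES;IMEX=1"'
)
print(conn_str)
```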

  • Good post, thanks. I found another way around this, similar to the first method you tried. If you save as tab-delimited text and then use SSIS, it works fine (I just did this last week). It's not automated, unfortunately, but it's a very easy fix for a quick import.

  • The only problem I've noted with exporting to some form of delimited text and then trying to use SSIS is that Excel tries to save space when exporting. If you have 20 possible columns but one row doesn't have the last 3 columns filled in, you get no delimiters for those columns. I usually find this out when I get a "too big" or "can't CAST" message for something like an INT field that now has text in it because the fields wrapped.

    There are some CONNECT tickets open about allowing us to specify whether EOL characters can override the column delimiters, but the behavior won't change for SQL 2005. The best we've gotten so far are people suggesting script tasks to correctly read EOL characters. To make it easier on me, I add a "dummy" column as the last column and populate it with integers all the way to the end of the dataset. I can then export with no missing columns. Maybe Excel 2007 is better about that, but I never had good luck with it prior to 2007.
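
    Instead of maintaining a dummy column, the missing trailing delimiters can be restored with a small preprocessing script. A sketch, assuming a known column count; the function names and the width of 20 are mine, not from the original post:

```python
import csv

EXPECTED_COLUMNS = 20  # assumed width of the export; adjust to your layout

def pad_row(row, width=EXPECTED_COLUMNS):
    """Return the row extended with empty fields up to `width` columns."""
    return row + [""] * (width - len(row))

def pad_csv(in_path, out_path, width=EXPECTED_COLUMNS):
    """Re-emit a CSV so every row has exactly `width` fields, restoring the
    trailing delimiters that Excel leaves off for empty columns."""
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            writer.writerow(pad_row(row, width))
```

    Running the exported file through something like this before the SSIS flat file source sees it keeps the delimiter count constant on every line.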

    I appreciate the article, though. This is one of the really annoying parts of trying to work with Excel and SSIS. Excel is really common and easy to use so we get it as source data from a lot of our customers. I just wish it played better with SSIS.

  • The article demonstrates some valuable techniques, but for a situation like this it seems like overkill. My lazy method (using DTS): run the DTS import wizard, check off Drop and Recreate, and save the DTS package. Then open the package, modify the CREATE TABLE statement to fix the data type for the offending field (ZipCode changes to VARCHAR), save, and re-run the package.

  • That lazy method can work a lot of the time, but I know it won't in some cases. Example - the US postal code system has two common formats, ZIP and ZIP+4. ZIP is a five-digit code; ZIP+4 has the format #####-####. In Excel, if this column is just left as General, the underlying JET provider reading the sheet interprets it based on what it sees in the column. In this case, that PostalCode column is treated as a FLOAT for import purposes, and all of the ZIP+4 values are lost because they can't convert to a FLOAT, regardless of your target column.
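
    The data loss is easy to reproduce outside of SSIS. A minimal sketch, under the assumption that the provider returns NULL for any value it can't convert to the column's guessed type:

```python
def read_as_float(value):
    """Mimic a provider that has typed the column as FLOAT: anything that
    won't convert is assumed to come through as NULL (None here)."""
    try:
        return float(value)
    except ValueError:
        return None

print(read_as_float("06511"))       # the leading zero is gone
print(read_as_float("06511-1234"))  # the entire ZIP+4 value is lost
```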

    Admittedly, I have not tried this in SQL2008's import. If that's changed, great, but the way I understand it is that we can't actually override the way JET works in any easy manner. There are some properties you can set at the driver level, but not through the Wizard. You can read through the table, but IIRC, that results in a "majority wins" type of determination for data types.

    In any case, Excel just isn't a friendly source for a variety of reasons if you're trying to handle a bunch of mixed data. When you're dealing with purely numeric or character data, it's not too bad. Once you start mixing in data types within a column, you're going to have issues.

  • Peter: You are absolutely correct! - and very generous of you to let me down in such a face-saving manner. That'll teach me to shoot off a comment without testing.

  • I've just been bitten hard by this. I had no issues with DTS, but with SSIS the game has changed completely. I also wanted you to be well aware of this before you were hit like I was. (A customer called wondering where all their ZIP codes went. 😛 )

    -Pete

  • Hello there,

    I am new to this! I followed the steps and got the error below at the end.

    at System.Data.OleDb.OleDbCommand.ExecuteCommandTextErrorHandling(OleDbHResult hr)
    at System.Data.OleDb.OleDbCommand.ExecuteCommandTextForSingleResult(tagDBPARAMS dbParams, Object& executeResult)
    at System.Data.OleDb.OleDbCommand.ExecuteCommandText(Object& executeResult)
    at System.Data.OleDb.OleDbCommand.ExecuteCommand(CommandBehavior behavior, Object& executeResult)
    at System.Data.OleDb.OleDbCommand.ExecuteReaderInternal(CommandBehavior behavior, String method)
    at System.Data.OleDb.OleDbCommand.ExecuteReader(CommandBehavior behavior)
    at System.Data.OleDb.OleDbCommand.ExecuteReader()
    at ScriptTask_16d19b565ae44ef8858257e2c7d700ba.ScriptMain.Main()

  • These are all rather interesting and, to some degree, advanced solutions. As a newbie a few years back I never caught on to approaches like these, but I did find a quick and easy way to alleviate the issue.

    I myself have run into several issues moving data from Excel into SQL Server. Most of my hangups have to do with dates.

    When using dates derived from formulas (e.g., creating a calendar table), I have found it easier to create a macro that inserts a single quote in front of the date value, forcing the string value of the date into the cell rather than the serial number of the date. This works well for alphanumeric values like SSNs, ZIP codes, phone numbers, and such.

    Here is some code I added to my personal workbook that helps with this. It's crude, but I'm the only one who uses it.

    Private Sub Workbook_Open()
        'Reset the right-click menu
        Application.CommandBars("Cell").Reset
        'Create an object reference to the right-click cell menu in Excel
        With Application.CommandBars("Cell").Controls
            With .Add
                'Caption for the menu entry
                .Caption = "Convert Values"
                'Reference to the macro that will be called
                .OnAction = ThisWorkbook.Name & "!ConvertVals"
                .Tag = "PasteValues" 'Tag reference not needed
                .BeginGroup = True
            End With
        End With
    End Sub

    Sub ConvertVals()
        Dim rng_Selection As Range
        Dim Cell As Range
        Set rng_Selection = Selection
        For Each Cell In rng_Selection
            Cell.Value = "'" & Cell.Value
        Next
    End Sub

    I usually create my insert/update statements inside of Excel, so the single quote never shows up, and I do not know if this works well with SSIS. I only have the Express version and can't really test it. I would be willing to bet that it doesn't cause a problem, since Excel never prints a leading single quote by itself; it primarily uses it to force a string value when there is a single quote at the beginning of a cell.

  • Why is this so difficult?

    1. Save the Excel file as CSV

    2. Create a flat file connection in SSIS

    3. Click on the Advanced tab

    4. Set the data type for the required column to STRING (12)

    5. Use a data flow task to import it

    (If I had access to a web server I could put up some screen captures, but unfortunately, I don't)

    cheers,

    niall

  • By itself, that's not too bad. However, when you have an Excel file with 15 sheets and need to import all of them, preferably into pre-existing/typed columns, it becomes a little more difficult. The ODBC interface to Excel leaves a lot to be desired.

    Also, see some of my other issues with Excel and saving out to CSV. If you have empty columns at the end of your data set (not too uncommon), Excel doesn't populate empty columns into your CSV for you. It leaves off those commas and ends the line. Excel can re-import that without too much trouble. SSIS cannot. It reads past the end of the line because the next delimiter it expects is a comma. Once it reaches the last field it expects as end of line based solely on the number of commas, it reads all the way up to the EOL character. That in turn almost always leads to some sort of error indicating that you're trying to put too much data into a column.

    Solution to that - a "dummy" column for your last column in Excel that is completely filled with data, even if just an integer or an 'x'.

    Sadly, this is all outside of the control of the SSIS/SQL Team as it has to do almost completely with how Excel and ODBC work together (read - not very well). 😛

    -Peter

  • Hi Peter,

    If I understand you correctly, you have an Excel file containing a number of worksheets, and you would like to import each worksheet into one or more tables in a database.

    I have not experienced the problems you mentioned with x number of 'blank' columns not being exported into CSV from Excel, however...

    If you create a number of flat file connections (NOTE - not Excel connections), one per worksheet that has been exported to CSV, and then create a data flow task, you can set the data type of the import to whatever you like. If you need extra columns, you can (inside the data flow task) feed a 'derived column' transform from the original source (CSV) file and supply a NULL or whatever your default value is - or you could talk to your DBA and set a default on the column(s). This should fix the "Excel doesn't export blank columns" problem.

    I'm unsure if I have understood your problem correctly though. If I haven't, perhaps you could PM me and include the table create and a portion of the csv file?

  • We're on different pages. We receive these files from customers exporting their own data. The Excel file then contains multiple sheets, some filled out to the very end with data, some not. As an experiment, try creating a simple three-column Excel sheet. Give each column a header. Fill out four rows, but don't fill in the last cell or two for some of them. Save as CSV and open that in a text editor. Unless things have changed, you will not see extra commas indicating the empty columns. That means SSIS doesn't know what to do until it hits the actual EOL character, or another comma/delimiter if it expects that first.
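
    The experiment is easy to script. A sketch using Python's csv module, which mirrors Excel's behavior of writing only the fields a row actually has:

```python
import csv
import io

# Three columns, but some rows stop early -- like an Excel sheet whose
# trailing cells were left blank before saving as CSV.
rows = [["ColA", "ColB", "ColC"],
        ["1", "2", "3"],
        ["4", "5"],   # last cell empty
        ["6"]]        # last two cells empty

buf = io.StringIO()
csv.writer(buf).writerows(rows)
print(buf.getvalue())  # the short rows have no trailing commas

# A strict reader that counts delimiters (as SSIS does) now sees rows of
# varying width and misaligns fields against the expected layout.
parsed = list(csv.reader(io.StringIO(buf.getvalue())))
for row in parsed:
    print(len(row), row)
```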

    We can also add in some fun with people typing things like " The person said, "I want to go home later." and then left." in one column. 😀

    I'm not really having major issues with this myself as I've learned a lot of workarounds to the fun that is Excel. My main point in responding to this thread is to help people not get caught as easily by the same things I've seen way too many times. DTS handled things differently, from Unicode conversions to mixed data, to how it handled imperfectly formed CSV files. SSIS is more strict and that's generally a good thing, but it can be really frustrating when DTS "just worked" and SSIS seems to require a lot more hand-holding. 🙂

  • Hmmm....I see your problem.

    You could probably create a script task to add the relevant commas if this is an ongoing issue, but that seems a bit like overkill.

  • Another way is to add a "Derived Column Transform" and use the conditional ("? :") expression with a default value that replaces NULL.

    Example: ISNULL([testColumn])?0:[testColumn]
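
    In Python terms, that derived-column expression is just a conditional with a fallback. A minimal sketch of the same replace-NULL-with-a-default logic (the function name is mine, for illustration only):

```python
def replace_null(test_column, default=0):
    """Equivalent of the SSIS expression ISNULL([testColumn]) ? 0 : [testColumn]."""
    return default if test_column is None else test_column

print(replace_null(None))  # the NULL is replaced by the default
print(replace_null(42))    # non-NULL values pass through unchanged
```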

    Raj

Viewing 15 posts - 16 through 30 (of 41 total)
