SSIS skips random rows from an Excel file

  • Greetings -

    I've got an issue to which I've not found any directly applicable information thus far on this forum or from general Googling.

    I am troubleshooting a SSIS 2005 package that I did not create. The package imports data from an Excel file and then executes a few stored procedures against the data.

    The import from Excel is being done using an Excel Source Component. The client has reported two instances where a very small number of rows were, for no apparent reason, skipped and not imported at all. An example is below.

    Desc1Desc2Desc3Desc4Desc5

    344334/3/201236852

    344334/3/201221431

    344334/3/201223681

    344334/3/201224781

    344334/3/201224601

    344334/3/201230582

    344334/3/201236721

    344334/3/201221051

    340384/3/201236651

    337494/3/201236913

    337494/3/201235231

    346304/3/201224781

    346304/3/201231192

    335984/3/201220694

    335984/3/201237416

    335984/3/201232673

    346214/3/201221473

    346294/3/201224781

    346294/3/201234421

    348944/3/201231194

    348944/3/201224781

    348944/3/201234772

    348944/3/201234931

    348944/3/201237771

    349354/3/201236291

    349354/3/201234131

    141554/3/201245484R1598

    282354/3/201242A12045

    349314/3/201242A14307

    This particular Excel file contained 831 rows and only the five underlined rows were NOT imported. I've read a few posts regarding SSIS treating missing delimiters in text files oddly, but Excel does not use delimiters (at least that one can see and/or manipulate).

    The Error Output and the Error Input set up on the source and destination (OLE DB Dest.) are all set to fail on error, and the package is completing, which is why it appears that the rows in question are simply being ignored for some reason.

    Can anyone offer any insight into how to get to the bottom of this? Specifically, I am still trying to determine where the root of the issue is - SSIS or funky stuff in the Excel file that I am not seeing.

    Thanks In Advance.

  • I'll start with the assumption that SSIS is not sending your rows into oblivion without being told to do so. To confirm, are you using the EXCEL connector built into SSIS or are you specifying an Excel provider for your connection? Does the Data Flow have any transformations in between the Excel file and the database table, e.g. a Conditional Split. Does the destination table have any triggers on it? Is the table the Excel-data is loaded into the same as the one where they have been deemed missing? Are there any other steps happening in between the data-load and the verification where the rows are deemed missing?

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Thanks very much for the quick reply.

    One detail that I neglected to mention is that I have not been able to replicate this issue. The first time the client reported it, I wrote it off as fluke after not being able to replicate it. Since it has happened again, however, there appears to be something going on somewhere.

    As for your questions:

    The package is using the Excel Connection Manager.

    There is a Data Transformation step just after the import. The transformation converts the data in each column to a string (DT_STR) of length 255. The Error Output option for each row is set to Fail Component. No failure is occurring and this is what is making me think that the rows in question are simply being skipped.

    The destination table has no triggers on it. It is used as a staging table, to temporarily hold the imported data before it is inserted into its final destination.

    The first table where I am not seeing the data is the staging table, which is specified in the Excel Source Data Flow. This is the first place in the database after the import completes where I should see the data. This table persists between executions of the package, so I can query for the records after execution and see that they are not present.

    All other processing (i.e. the stored procedures) operate on the data once it is in the staging stable.

  • If you rerun the package to an empty table, do all the records appear? You might try debugging the package and put a data viewer before the write to the table. See if the data is flowing through the package.

  • Lynn Pettis (4/10/2012)


    If you rerun the package to an empty table, do all the records appear?

    Not sure, since I've not been able to replicate in testing. Rerunning is not an option live because of the duplicate records that would be created.

    You might try debugging the package and put a data viewer before the write to the table. See if the data is flowing through the package.

    Good thought, thanks.

  • Greetings All -

    Just thought I'd post an update since I still have not found the cause for, and therefore the solution to, this issue.

    I tried using the "IMEX=1" syntax at the end of the Excel connector's connection string to force Import Mode, but this did not seem to have any effect on the issue.

    I am now looking into the possibility of avoiding the issue altogether. I am going to try and see about getting the source data into a different format (e.g. delimited text).

    Thanks for the replies.

  • Have you tryed to save the excel file as a TAB separated textfile

    and use SSIS to import the text file?

    If some rows then are missing the problem is in the content of the file.

    /Gosta

  • charris1119 (4/10/2012)


    Lynn Pettis (4/10/2012)


    If you rerun the package to an empty table, do all the records appear?

    Not sure, since I've not been able to replicate in testing. Rerunning is not an option live because of the duplicate records that would be created.

    You might try debugging the package and put a data viewer before the write to the table. See if the data is flowing through the package.

    Good thought, thanks.

    Rerunning with just the 'missing' records in an Excel file - would that be an option?

    Lynn seems to be suggesting maybe change the destination to an empty table (same definition, different name) to try and see if it is a data problem.

    I kind of question if duplicates might be a problem, does your package check for the existence of a record first?

    Generally you want to design something in where possible to avoid this.

    To be able to pick up the same file and rerun - without having to worry about duplicates - might be worth something.

  • You could also delete the destination temporarily and use a multicast as a dummy destination. Put a dataviewer right in front of this multicast and see if all rows are imported into the SSIS pipeline.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Thanks for the replies Gosta, Greg, and Koren, all are great ideas.

    I'll post an update later today.

    Casey

  • I hate working with Excel as a datasource. Business users love it, because it's easy to enter data and the design is flexible. However it's problematic when used in an ETL process for the same reasons. Examine the Excel sheet to determine if it contains any Hidden rows. I've encountered issues in the past with SSIS and BULK INSERT ignoring Hidden rows.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

Viewing 11 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply