April 10, 2012 at 9:50 am
Greetings -
I've got an issue to which I've not found any directly applicable information thus far on this forum or from general Googling.
I am troubleshooting a SSIS 2005 package that I did not create. The package imports data from an Excel file and then executes a few stored procedures against the data.
The import from Excel is being done using an Excel Source Component. The client has reported two instances where a very small number of rows were, for no apparent reason, skipped and not imported at all. An example is below.
Desc1Desc2Desc3Desc4Desc5
344334/3/201236852
344334/3/201221431
344334/3/201223681
344334/3/201224781
344334/3/201224601
344334/3/201230582
344334/3/201236721
344334/3/201221051
340384/3/201236651
337494/3/201236913
337494/3/201235231
346304/3/201224781
346304/3/201231192
335984/3/201220694
335984/3/201237416
335984/3/201232673
346214/3/201221473
346294/3/201224781
346294/3/201234421
348944/3/201231194
348944/3/201224781
348944/3/201234772
348944/3/201234931
348944/3/201237771
349354/3/201236291
349354/3/201234131
141554/3/201245484R1598
282354/3/201242A12045
349314/3/201242A14307
This particular Excel file contained 831 rows and only the five underlined rows were NOT imported. I've read a few posts regarding SSIS treating missing delimiters in text files oddly, but Excel does not use delimiters (at least that one can see and/or manipulate).
The Error Output and the Error Input set up on the source and destination (OLE DB Dest.) are all set to fail on error, and the package is completing, which is why it appears that the rows in question are simply being ignored for some reason.
Can anyone offer any insight into how to get to the bottom of this? Specifically, I am still trying to determine where the root of the issue is - SSIS or funky stuff in the Excel file that I am not seeing.
Thanks In Advance.
April 10, 2012 at 10:17 am
I'll start with the assumption that SSIS is not sending your rows into oblivion without being told to do so. To confirm, are you using the EXCEL connector built into SSIS or are you specifying an Excel provider for your connection? Does the Data Flow have any transformations in between the Excel file and the database table, e.g. a Conditional Split. Does the destination table have any triggers on it? Is the table the Excel-data is loaded into the same as the one where they have been deemed missing? Are there any other steps happening in between the data-load and the verification where the rows are deemed missing?
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
April 10, 2012 at 10:38 am
Thanks very much for the quick reply.
One detail that I neglected to mention is that I have not been able to replicate this issue. The first time the client reported it, I wrote it off as fluke after not being able to replicate it. Since it has happened again, however, there appears to be something going on somewhere.
As for your questions:
The package is using the Excel Connection Manager.
There is a Data Transformation step just after the import. The transformation converts the data in each column to a string (DT_STR) of length 255. The Error Output option for each row is set to Fail Component. No failure is occurring and this is what is making me think that the rows in question are simply being skipped.
The destination table has no triggers on it. It is used as a staging table, to temporarily hold the imported data before it is inserted into its final destination.
The first table where I am not seeing the data is the staging table, which is specified in the Excel Source Data Flow. This is the first place in the database after the import completes where I should see the data. This table persists between executions of the package, so I can query for the records after execution and see that they are not present.
All other processing (i.e. the stored procedures) operate on the data once it is in the staging stable.
April 10, 2012 at 10:46 am
If you rerun the package to an empty table, do all the records appear? You might try debugging the package and put a data viewer before the write to the table. See if the data is flowing through the package.
April 10, 2012 at 10:53 am
Lynn Pettis (4/10/2012)
If you rerun the package to an empty table, do all the records appear?
Not sure, since I've not been able to replicate in testing. Rerunning is not an option live because of the duplicate records that would be created.
You might try debugging the package and put a data viewer before the write to the table. See if the data is flowing through the package.
Good thought, thanks.
April 16, 2012 at 5:15 am
Greetings All -
Just thought I'd post an update since I still have not found the cause for, and therefore the solution to, this issue.
I tried using the "IMEX=1" syntax at the end of the Excel connector's connection string to force Import Mode, but this did not seem to have any effect on the issue.
I am now looking into the possibility of avoiding the issue altogether. I am going to try and see about getting the source data into a different format (e.g. delimited text).
Thanks for the replies.
April 17, 2012 at 3:38 am
Have you tryed to save the excel file as a TAB separated textfile
and use SSIS to import the text file?
If some rows then are missing the problem is in the content of the file.
/Gosta
April 17, 2012 at 2:26 pm
charris1119 (4/10/2012)
Lynn Pettis (4/10/2012)
If you rerun the package to an empty table, do all the records appear?Not sure, since I've not been able to replicate in testing. Rerunning is not an option live because of the duplicate records that would be created.
You might try debugging the package and put a data viewer before the write to the table. See if the data is flowing through the package.
Good thought, thanks.
Rerunning with just the 'missing' records in an Excel file - would that be an option?
Lynn seems to be suggesting maybe change the destination to an empty table (same definition, different name) to try and see if it is a data problem.
I kind of question if duplicates might be a problem, does your package check for the existence of a record first?
Generally you want to design something in where possible to avoid this.
To be able to pick up the same file and rerun - without having to worry about duplicates - might be worth something.
April 18, 2012 at 12:10 am
You could also delete the destination temporarily and use a multicast as a dummy destination. Put a dataviewer right in front of this multicast and see if all rows are imported into the SSIS pipeline.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
April 18, 2012 at 6:38 am
Thanks for the replies Gosta, Greg, and Koren, all are great ideas.
I'll post an update later today.
Casey
April 18, 2012 at 7:34 am
I hate working with Excel as a datasource. Business users love it, because it's easy to enter data and the design is flexible. However it's problematic when used in an ETL process for the same reasons. Examine the Excel sheet to determine if it contains any Hidden rows. I've encountered issues in the past with SSIS and BULK INSERT ignoring Hidden rows.
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
Viewing 11 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply
This website stores cookies on your computer.
These cookies are used to improve your website experience and provide more personalized services to you, both on this website and through other media.
To find out more about the cookies we use, see our Privacy Policy