An Elegant ETL Solution: Except for Lost Data

  • debrucer1

    SSCertifiable

    Points: 5366

    Comments posted to this topic are about the content posted at http://www.sqlservercentral.com/columnists/drussell/anelegantetlsolutionexceptforlostdata.asp

    Thank-you,
    David Russell
    Any Cloud, Any Database, Oracle since 1982

  • Kirk Haselden

    SSCommitted

    Points: 1683

I've asked for the package and data so that we can isolate this and understand better what's happening. We take this very seriously and will get to the bottom of it as soon as possible. Thanks to David and Jamie Thomson for bringing it to our attention.

    Kirk Haselden

  • Donald Farmer

    Grasshopper

    Points: 11

    It would be good to see the package, as I am very confused by the article.

It does not include details of the command being executed, so it's difficult to know what errors might be anticipated.

    It does not include details of how error-handling was configured on the command component, so we cannot tell how we would expect any anticipated errors to be handled.

    It does not include the error information reported on the Progress tab (or in the logs) which includes the OLEDB errors that would describe why the command failed - especially useful for errors that are either not explicitly handled by error options, or for which the error-handling options are just set to failure.

    Also, the article seems to describe a classic Slowly Changing Dimension scenario - and SSIS includes a wizard that builds those processes very easily including full handling of inserts and updates. The flows built by this wizard can be easily customized too.

    I can understand David's frustration that he is finding errors difficult to handle, and for sure we must improve the ease of use of this very powerful feature in future versions. For now, however, I do believe we have the most flexible row-by-row error handling that I have seen in a data integration tool, and I think that even in this case we include all the information necessary to debug. But I simply cannot tell from the article.

    Donald Farmer

    Group Program Manager

    SQL Server Integration Services

  • John Scarborough

    Ten Centuries

    Points: 1104

Yes, there just wasn't enough information in the article to make good use of it. Though I appreciate the obvious attention that went into writing the article and doing the screenshots, I'm left with little further knowledge. There's just too much basic information about this process left out. I'd frankly like to see a replay of the whole article with more detailed information.

    >> "It would be good to see the package, as I am very confused by the article.

It does not include details of the command being executed, so it's difficult to know what errors might be anticipated.

    It does not include details of how error-handling was configured on the command component, so we cannot tell how we would expect any anticipated errors to be handled.

    It does not include the error information reported on the Progress tab (or in the logs) which includes the OLEDB errors that would describe why the command failed - especially useful for errors that are either not explicitly handled by error options, or for which the error-handling options are just set to failure. "

    John Scarborough
    MCDBA, MCSA

  • david russell-253790

    SSCarpal Tunnel

    Points: 4042

I am most appreciative of the offer to help from Kirk and additional input from Donald.  I will also investigate the alternative method suggested by Donald for handling my workload.


    There is no such thing as “a short schedule” and I am on a deadline.


The purpose of these sites is to share ideas, issues, and problems and seek solutions, and that was all I was trying to do.  What better help could I get than from two of Microsoft’s top people on the product in question?  Could I possibly have reached a better audience?


    I privately messaged Kirk on Friday to let him know that I am glad to supply all of the details.  I have discovered that at least part of the problem was self-inflicted.  There was a schema change in the destination database that was made directly to the data type of one column, and not to the other.


    So with a bit of egg on my face I will provide all of the details.  There were four rows that failed, and three were on the erroneous column, while one was on the correct data type.  The failures were not graceful, and the error messages were not clear (or even defined).  Even if it was “operator error” it did not work right for this user.


    Egg or no egg, I still seek resolution.  It is my intention to write a follow-up article on how it works out.  I like writing, and I hope that the audience likes reading; but my job comes first, of course, and my packages must get written now.


    Stay tuned to this channel!


    Cheers,

    david russell

  • Kirk Haselden

    SSCommitted

    Points: 1683

    David,

    Matt David had a look at your package and gave the following comments:

    Kirk,

I ran this project and as far as I can tell everything worked.  I loaded with patient no dob.  I ran it the first time and all 19 were inserted with no errors.

On his example one of his components went red.  That means that the pipeline shut down prematurely and we make no guarantee that all data was inserted/updated correctly.  If a component fails then the pipeline fails.  Row redirection would mean that the task would not turn red for a redirectable error.  The pipeline never finished, so all results should be intermediary.

The only thing I initially found strange was that some rows were inserted.  I then realized that he was using an OLE DB Command, not an OLE DB destination, to insert the rows; since the data flow itself wasn’t in a transaction, each insert is separate, and that is why some rows were there.

Please note that I could not get this failure to occur, but this could be because I had to make some changes to get the package to run, so perhaps I affected the result as well.  In any event, since he had a red component the pipeline failed, and therefore the data is not lost; it just wasn’t written due to the failure.  This is the expected behavior in this scenario.  If he enabled transactions then all the data should be rolled back.
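Matt's point about partial inserts can be illustrated with a short T-SQL sketch (table and column names here are hypothetical, not taken from the package):

```sql
-- Each OLE DB Command insert runs as its own implicit transaction,
-- so rows inserted before the failing row stay committed:
INSERT INTO dbo.DimPatient (PatientID) VALUES (1);  -- commits immediately
INSERT INTO dbo.DimPatient (PatientID) VALUES (2);  -- if this fails, row 1 remains

-- With TransactionOption = Required on the data flow task, the work is
-- effectively wrapped in one transaction, so a failure undoes everything:
BEGIN TRANSACTION;
INSERT INTO dbo.DimPatient (PatientID) VALUES (1);
INSERT INTO dbo.DimPatient (PatientID) VALUES (2);  -- a failure here...
ROLLBACK TRANSACTION;                               -- ...removes both rows
```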

    Given this input, you might want to check a few things.

Is the data flow task in a transaction? You can check that by selecting the Data Flow task in the control flow view and looking at the TransactionOption property. Does it say Supported or Required?

    Are you using an OLEDB Command destination? That component does an operation on each row as it goes through, so you may be causing a failure in the component itself.

    Check to make sure that failed rows get redirected and don't fail the component on the component that's turning red.

    When you get the error that causes the transform to turn red, go to the output window (on the main menu select View | Windows | Output Window) or in the progress tab. You should get more information about the error. If you can also turn on logging to get the full log, that would help.

It sounds as though there may be some problems with the package. Also, you're struggling to understand the problem, and regardless of the ultimate outcome of the issue, that isn't good. A better error would help, or perhaps making the error easier to find. These are some areas on which we're focusing for future releases.

    Please let us know if any of the above observations or suggestions resolved the problem.

    Thanks,

    K

  • david russell-253790

    SSCarpal Tunnel

    Points: 4042

    What tremendous service!


    Just a few comments before I get back into it and see why that solution I sent did not fail for you.  The first thing will be to unzip the file I sent you and make sure it fails for me.


    We have something running on our network that (apparently) periodically deletes anything named “Backup Files” (and below) and a lot of my old code is gone this morning.  I did not test the components I sent you for failure before sending them today.  I do have copies; but was quite sure that I was sending the right package.


    There are two reasons for using the OLE-DB command vs. destination.  The destination does not work with remote hosts in this release, and there is no success output on a destination to allow the return trip from the destination back to the StageFlag in the source.


    I have already taken Donald Farmer’s comments to heart and will be looking at his suggestions.  And I will add Matt David’s suggestions to the list.  I clearly see part two of this article coming.


    More to follow…  And thank you once again.


    Cheers,

    david russell

  • Kirk Haselden

    SSCommitted

    Points: 1683

"The destination does not work with remote hosts in this release."

    Are you talking about the OLEDB destination? Only the SQL Destination has this limitation.

"There is no success output on a destination to allow the return trip."

    Can you explain this? Is there a reason that this work cannot be accomplished after the dataflow task with a SQL Script? Likely this will be faster and less error prone.

    K

  • david russell-253790

    SSCarpal Tunnel

    Points: 4042

    Okay, I accept that distinction... it is an SQL destination with that limitation; not an OLE-DB destination.  But, "Destinations" from that portion of the toolbox do not have success outputs, so another task after a destination isn't possible (except perhaps by either passing a result set to a new control flow, or implementing an event handler).  It is not possible within the data flow after the destination task.

Yes, an SQL Script component could have been used - which was exactly why I bought Donald Farmer's book, "Scripting SQL Server 2005 Integration Services"; but what I did appeared to work, and I still believe should have (and will) work.  The additional complexity of a script component was not justified - not only not justified, but not identified as necessary.


    Cheers,

    david russell

  • Kirk Haselden

    SSCommitted

    Points: 1683

    "Okay, I accept that distinction... it is an SQL destination with that limitation; not an OLE-DB destination.  But, "Destinations" from that portion of the toolbox do not have success outputs, so another task after a destination isn't possible (except perhaps by either passing a result set to a new control flow, or implementing an event handler).  It is not possible within the data flow after the destination task."

    Right, but you can split the flow with a multicast before the destination.

    "Yes, an SQL Script component could have been used..."

What I meant was with a SQL Script in the SQL Task, or a multicast into a separate flow where you could do the updates. I don't know what it is you're attempting to update, so I'm just visualizing. You might be able to mark the rows flowing into the destination with a flag and then do set-based post-processing with the SQL Task for whatever updates you were planning in the OLEDB Command Destination. Or, flow into a temporary table, do the processing there, and then update the destination table with the staged data. This approach may also be more performant than the row-by-row commands you're doing.
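A minimal sketch of the set-based post-processing suggested here, assuming hypothetical names (dbo.StagePatient as the source with a stageFlag column, dbo.DimPatient as the destination, joined on a PatientID business key):

```sql
-- After the data flow has loaded the destination, one Execute SQL Task
-- statement marks every staged row that made it across, replacing the
-- per-row OLE DB Command update of stageFlag:
UPDATE s
SET    s.stageFlag = 1
FROM   dbo.StagePatient AS s
JOIN   dbo.DimPatient   AS d
  ON   d.PatientID = s.PatientID;
```

Because this runs once over the whole set rather than once per row, it would typically outperform row-by-row commands.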

    Thanks,

    K

  • david russell-253790

    SSCarpal Tunnel

    Points: 4042

Splitting the data flow with a multicast doesn't tell me that the update or insert was successful, and since there is no success output on the destination, there is nothing to merge back in that would let me update the "stageFlag" only on a successful update or insert.

    I will attempt to understand and consider your other suggestion later.  Performance is a factor, of course.

    In the meantime, I have unzipped the files I sent you and run the solution and it works just fine for me, too.  Which means to me that I probably do not have the offending code captured unless I moved it.  I had three versions from Friday that got deleted between 11:42 and 12:47 today (apparently) by something running on our network.



    Cheers,

    david russell

  • david russell-253790

    SSCarpal Tunnel

    Points: 4042

    "Also, the article seems to describe a classic Slowly Changing Dimension scenario - and SSIS includes a wizard that builds those processes very easily including full handling of inserts and updates. The flows built by this wizard can be easily customized too."

    It's nice to learn something new every day.  Perhaps nobody told me that I was coding a "classic"; but I searched BOL to find the Wizard (it's in the Slowly Changing Dimension Task) and ran it to create a solution in about five minutes... another five minutes to tweak it to include the data conversion from Unicode.  I only included 3 columns vs. the 22 in my original creation; and no error handling or updating of the stageFlag; but it was faster than anything I have done so far.

    There is still the issue that the OLE-DB destination does not have a success output flow, so there is nothing immediate to trigger updating the stageFlag on inserts; but the OLE-DB commands both have them.  I do not yet understand the implications of inferred member updates; but that will be ten more minutes of reading.

Perhaps we have part two and part three of this "series" of articles.

    Thanks again for all the help.


    Cheers,

    david russell

Viewing 12 posts - 1 through 12 (of 12 total)
