Row-by-row insert with error handling -- Stored Procedures vs. SSIS

  • Hello,

    I have dynamic inserts and update statements that are created based on a select from sys.columns. My inserts and updates need to change when the table structure is changed. This is all done in stored procedures.
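    To give an idea, the generation looks roughly like this (a simplified sketch -- the table names are placeholders for my real ones, and STRING_AGG assumes SQL Server 2017+; older versions would need the FOR XML PATH trick instead):

        -- Build the column list for a table from sys.columns, skipping
        -- identity and computed columns, then assemble the INSERT.
        DECLARE @cols nvarchar(max),
                @sql  nvarchar(max);

        SELECT @cols = STRING_AGG(CAST(QUOTENAME(c.name) AS nvarchar(max)), N', ')
        FROM sys.columns AS c
        WHERE c.object_id = OBJECT_ID(N'dbo.TargetTable')
          AND c.is_identity = 0
          AND c.is_computed = 0;

        SET @sql = N'INSERT INTO dbo.TargetTable (' + @cols + N') '
                 + N'SELECT ' + @cols + N' FROM staging.SourceTable;';

        EXEC sys.sp_executesql @sql;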

    I also have a new requirement -- to handle errors, row-by-row. I'm familiar with the approach to do this in a stored procedure, as well as in SSIS. Since it needs to be dynamic, I'm not sure if SSIS is the best option -- any time there is a change to table structure, I would need to change the package and re-deploy.
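    The row-by-row pattern I'm familiar with in a stored procedure is essentially a cursor over the staged keys with TRY/CATCH around each insert, logging the failures and moving on (again a sketch -- the key column and the error-log table are just illustrative):

        DECLARE @id  int,
                @sql nvarchar(max) =
                    N'INSERT INTO dbo.TargetTable (Col1, Col2)
                      SELECT Col1, Col2
                      FROM staging.SourceTable
                      WHERE StagingID = @id;';

        DECLARE row_cur CURSOR LOCAL FAST_FORWARD FOR
            SELECT StagingID FROM staging.SourceTable;

        OPEN row_cur;
        FETCH NEXT FROM row_cur INTO @id;

        WHILE @@FETCH_STATUS = 0
        BEGIN
            BEGIN TRY
                EXEC sys.sp_executesql @sql, N'@id int', @id = @id;
            END TRY
            BEGIN CATCH
                -- the failed row is recorded and skipped; the loop moves on
                INSERT INTO dbo.ETLErrorLog (StagingID, ErrorNumber, ErrorMessage)
                VALUES (@id, ERROR_NUMBER(), ERROR_MESSAGE());
            END CATCH;

            FETCH NEXT FROM row_cur INTO @id;
        END;

        CLOSE row_cur;
        DEALLOCATE row_cur;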

    Has anyone done error-handling at the row-level in stored procs, and how effective was it?

    Thank you!

  • Depends rather heavily on table size and the number of rows to be handled, along with whether there are appropriate indexes to support the queries, the overall capacity of the system, the current load on the server at the time, the physical configuration of the server, and ... it wouldn't be too surprising if the price of tea in China somehow made its way into the mix. You have to be particularly careful in adopting dynamic SQL. It can be hell to troubleshoot, and it may well force you into an awful lot of running traces or querying Extended Events just to see what's actually taking place. Any kind of row-by-row work is unlikely to do well at volume, as it becomes entirely too inefficient. And SSIS would have to operate entirely outside of its row-by-row error-handling ability in order to avoid being changed with every table change.

    However, maybe the bigger question is why there are going to be so many table changes. Frequent change tends to lead to short-term thinking about database objects, and that's usually a recipe for bad design creeping in at every opportunity. Doesn't sound like something I'd sign up for. Way too many moving pieces to be manageable...

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • A couple of things, though you'll have to evaluate how they apply to your situation:

http://www.sqlservercentral.com/articles/Data+Mart/67606/ - I'm using the approach Mel Sansone recommends here, and it works well for me: logging all errors to an ETLerrorLog table, with the row data as XML, so I can review failures after the fact (a rough sketch of that pattern follows these links).
    http://www.sqlservercentral.com/articles/Integration+Services+(SSIS)/173055/ - Hiskke Bekkering shows here how to do dynamic mapping, so this may apply to you as well, although in his case it's more of a tool that runs the same type of load across multiple column mappings, so I'm not sure it's the same issue.
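    Conceptually, the XML logging looks something like this (my own shape, not lifted from the article -- the table and column names are illustrative):

        CREATE TABLE dbo.ETLerrorLog
        (
            ErrorLogID   int IDENTITY(1, 1) PRIMARY KEY,
            SourceTable  sysname        NOT NULL,
            ErrorNumber  int            NOT NULL,
            ErrorMessage nvarchar(4000) NOT NULL,
            RowData      xml            NULL,  -- the offending row, serialized
            LoggedAt     datetime2      NOT NULL DEFAULT SYSDATETIME()
        );

        -- Inside the CATCH block of the load, @id being the key of the
        -- row that failed:
        INSERT INTO dbo.ETLerrorLog (SourceTable, ErrorNumber, ErrorMessage, RowData)
        SELECT N'staging.SourceTable',
               ERROR_NUMBER(),
               ERROR_MESSAGE(),
               (SELECT s.*
                FROM staging.SourceTable AS s
                WHERE s.StagingID = @id
                FOR XML PATH('row'), TYPE);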

    All that said, I've never understood why people do data cleansing during the SSIS import step. It seems much cleaner to pull the data in as-is, then cleanse and transform in a stored procedure: it avoids paging all that data down to a single machine during the load, and it's easier to troubleshoot and tune. I know you can build an entire career on doing it, and doing it well, but I'm just not sure *why* someone would want to, from my narrow perspective anyway.

    Hope that helps, good luck to you!

    -------------------------------------------------------------------------------------------------------------------------------------
    Please follow Best Practices For Posting On Forums to receive quicker and higher quality responses

  • sgmunson - Monday, August 20, 2018 6:48 AM


    Steve,

    Thank you for the reply. I don't disagree. I'm building an engine to handle ETL between two instances of a web application. The changes to the tables are due to the changes to the application, and dynamic SQL is a must. That's the whole point of the project I'm working on. I like a challenge, and saying that it won't be manageable isn't an option for me.

    Thanks,
    Aaron

  • jonathan.crawford - Monday, August 20, 2018 7:20 AM


    Jonathan,

    I'm a firm believer in bringing data into staging as-is, then doing the rest in a stored procedure. I have a great dislike for SSIS because it's a very finicky tool, but when you need it, it's very useful. Once the data is loaded, I'm actually transferring it into another database, and it's during that process that I need to check row by row. Not sure if it's what you intended, but I'm now going to give some thought to how I can separate the constraint checks from the load itself, along the lines of the sketch below. Thanks for the idea, and for the links.
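    Something like this is what I have in mind -- quarantine the rows that would violate a constraint, set-based, then load the rest in one pass (a rough sketch; the parent/child and reject table names are stand-ins for my real ones):

        -- Quarantine rows that would break the foreign key
        INSERT INTO staging.SourceTable_Rejects (StagingID, RejectReason)
        SELECT s.StagingID, N'No matching row in dbo.Parent'
        FROM staging.SourceTable AS s
        WHERE NOT EXISTS (SELECT 1 FROM dbo.Parent AS p
                          WHERE p.ParentID = s.ParentID);

        -- Load only the rows that passed the pre-check
        INSERT INTO dbo.TargetTable (ParentID, Col1)
        SELECT s.ParentID, s.Col1
        FROM staging.SourceTable AS s
        WHERE EXISTS (SELECT 1 FROM dbo.Parent AS p
                      WHERE p.ParentID = s.ParentID);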

    Aaron

  • aaron.beer - Tuesday, August 21, 2018 7:51 AM


    I too love a challenge, but I also try to avoid signing up for something I'll have to tweak forever. As scenarios and the business change, I'd be tweaking the thing until it falls apart under the weight of the spackle I'd applied to keep it operating under its original design principles, or until it falls apart because it just can't scale. Row-by-row validation basically says that you want to recreate a wheel that has already been invented and works really well: SSIS. Yes, a lot of folks get bent out of shape because it's "finicky". I'll consider that "their problem". You just need to understand how it works and recognize its limitations.

    If you are constantly undergoing schema changes, then I'd be concerned that maybe you should be spending time on the database design until you can slow down the rate of schema change. A constantly changing schema usually leads to a lot less thinking about how to introduce a particular change, and over time the database design inevitably suffers a really bad decision that becomes extremely difficult, if not impossible, to get out from under. Far too often, people decide to do things the "quick and easy" way instead of being a tad more thoughtful and doing it the right way. Just as in life, you have lots of choices, but few good ones... SQL Server is pretty much the same in that regard: lots of choices, and few of them are actually good, most of the time.

    One of the other things you might end up signing up for with something like this is having it forever be your job to fix and tweak, because no one else will have the skill. You'll own the nightmare forever, effectively limiting your career growth with your current employer to what you are doing today. That's a limitation I am just not willing to sign up for anymore.

    As to the problem itself, if you are already capable of reading the catalog views (e.g. sys.tables and sys.columns), then take a look on the web for code that can script out a table with ALL its properties and details. Having that code could come in very handy with what you are about to tackle, and may well save some serious time. You could use SSIS, but only as a container to hold your dynamic SQL scripts; using it with a data flow would probably be impractical due to the number of changes you would need to make to the package every time the schema changed. Just stay very aware that some kinds of schema changes could totally bollix up your overall design for how things are supposed to work. Those kinds of gotchas are what I would worry about, as you could well be forced to redesign the whole shootin' match under the gun of a deadline. The most important thing you will need is a LOT of advance notice of schema changes. If you don't ensure that right up front, all it might take to crumble the empire is one breaking change that comes at the last minute. That kind of thing can quickly destroy years of a good reputation.
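    For the advance-notice piece, one rough way to catch drift programmatically is to snapshot the column metadata and diff it against the live catalog on a schedule (a sketch only -- dbo.SchemaSnapshot is a name I just made up):

        -- Take the baseline once
        SELECT t.name  AS table_name,
               c.name  AS column_name,
               ty.name AS type_name,
               c.max_length,
               c.is_nullable
        INTO dbo.SchemaSnapshot
        FROM sys.tables  AS t
        JOIN sys.columns AS c  ON c.object_id = t.object_id
        JOIN sys.types   AS ty ON ty.user_type_id = c.user_type_id;

        -- Later: report anything added, dropped, or retyped since then
        SELECT COALESCE(s.table_name,  l.table_name)  AS table_name,
               COALESCE(s.column_name, l.column_name) AS column_name,
               CASE WHEN s.column_name IS NULL THEN 'added'
                    WHEN l.column_name IS NULL THEN 'dropped'
                    ELSE 'changed'
               END AS change_type
        FROM dbo.SchemaSnapshot AS s
        FULL OUTER JOIN
             (SELECT t.name AS table_name, c.name AS column_name,
                     ty.name AS type_name, c.max_length, c.is_nullable
              FROM sys.tables  AS t
              JOIN sys.columns AS c  ON c.object_id = t.object_id
              JOIN sys.types   AS ty ON ty.user_type_id = c.user_type_id) AS l
          ON l.table_name  = s.table_name
         AND l.column_name = s.column_name
        WHERE s.column_name IS NULL
           OR l.column_name IS NULL
           OR s.type_name   <> l.type_name
           OR s.max_length  <> l.max_length
           OR s.is_nullable <> l.is_nullable;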

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • If you're trying to develop a kinda schema-less database, then SQL Server is obviously the wrong choice of database engine.

    _____________
    Code for TallyGenerator
