A Hazard of Using the SQL Update Statement

  • I'm assuming I am doing the update because I need to. So rather than just error the operation, wouldn't it be better to create a solution? You have already given us the benefit of realizing there could be duplicates. Decide what you want to do resolve the duplicate issue, perhaps with a sub-query, and then use that to accomplish your original goal.

  • abatej (4/6/2015)


    I too have come to SQL Server from Oracle and although I don't have near the experience with it that most of you do, I absolutely consider this a bug and not a feature. The reason this is a bug is that it allows the result of the update to be non-determinant. Why would I want to allow the table to be changed in a way that results in unpredictable outcome?

    This is of course, my opinion.

    John

    On the whole with SQL Server I've found that most behaviour of this ilk comes down to Microsoft, to their credit, handing us a DB engine and expecting us to research and understand what we're doing with it thoroughly (please don't read this as an underhanded attack because it isn't). Whether this is or isn't a good thing is a separate discussion as I have no doubt that this is an issue for non-SQL developers who find themselves in the position of having to write SQL queries -- they arguably can't be expected to know every nuance of every language, compiler, engine and process that their work will cross paths with. It's been a long time since I worked with Oracle but I would expect that there are similarly perplexing features in that world too. Out of interest, what's the UPSERT or MERGE behaviour in Oracle in such situations?

    I'm personally okay with the current behaviour; I see it as no different to me doing a [font="Courier New"]SELECT TOP 1[/font] without an [font="Courier New"]ORDER BY[/font] and/or [font="Courier New"]WITH TIES[/font], asking a barman to choose a drink for me, or asking somebody to put some papers in a pile without stating which one should be on the top. I think the job of ensuring determinism should always be at least partially in the developers' remits.

    That said, I agree with others that the [font="Courier New"]MERGE[/font] statement bombing out, for whichever internal or logical reasons that it does and [font="Courier New"]UPDATE[/font] doesn't, appears as inconsistent behaviour. I would rather see MERGE optimise its operations to a single [font="Courier New"]DELETE[/font]/[font="Courier New"]UPDATE[/font] per key and then just do it; and then perhaps have both [font="Courier New"]MERGE[/font] and [font="Courier New"]UPDATE[/font] return an informational error/warning à la [font="Courier New"]Null value is eliminated by an aggregate or other SET operation[/font]. That way your universe doesn't implode in such situations but clients and monitoring applications have the opportunity to identify and eliminate the underlying issues.

    Even better, a pair of [font="Courier New"]SET[/font] options to allow control over the behaviour of each statement would go down swell! 🙂

  • On the whole with SQL Server I've found that most behaviour of this ilk comes down to Microsoft, to their credit, handing us a DB engine and expecting us to research and understand what we're doing with it thoroughly (please don't read this as an underhanded attack because it isn't). Whether this is or isn't a good thing is a separate discussion as I have no doubt that this is an issue for non-SQL developers who find themselves in the position of having to write SQL queries -- they arguably can't be expected to know every nuance of every language, compiler, engine and process that their work will cross paths with. It's been a long time since I worked with Oracle but I would expect that there are similarly perplexing features in that world too. Out of interest, what's the UPSERT or MERGE behaviour in Oracle in such situations?

    I'm personally okay with the current behaviour; I see it as no different to me doing a [font="Courier New"]SELECT TOP 1[/font] without an [font="Courier New"]ORDER BY[/font] and/or [font="Courier New"]WITH TIES[/font], asking a barman to choose a drink for me, or asking somebody to put some papers in a pile without stating which one should be on the top. I think the job of ensuring determinism should always be at least partially in the developers' remits.

    That said, I agree with others that the [font="Courier New"]MERGE[/font] statement bombing out, for whichever internal or logical reasons that it does and [font="Courier New"]UPDATE[/font] doesn't, appears as inconsistent behaviour. I would rather see MERGE optimise its operations to a single [font="Courier New"]DELETE[/font]/[font="Courier New"]UPDATE[/font] per key and then just do it; and then perhaps have both [font="Courier New"]MERGE[/font] and [font="Courier New"]UPDATE[/font] return an informational error/warning à la [font="Courier New"]Null value is eliminated by an aggregate or other SET operation[/font]. That way your universe doesn't implode in such situations but clients and monitoring applications have the opportunity to identify and eliminate the underlying issues.

    Even better, a pair of [font="Courier New"]SET[/font] options to allow control over the behaviour of each statement would go down swell! 🙂

    I actually tried to duplicate the example on one of my oracle instances. I had to rework the update to use a sub-select because it wouldn't just run the statement as it was. It immediately fails the update with a message about the sub-select returning more than 1 row of data. You get the same message if you try to do it as a merge statement. If I had the ability to cause SQL Server to error on this I would immediately set it to do so.

    I work in a small IT department and while I am responsible for the databases, it isn't my full time job. I would sleep better knowing that a badly formed sql statement isn't going to corrupt the data.

  • abatej (4/7/2015)


    On the whole...

    I actually tried to duplicate the example on one of my oracle instances. I had to rework the update to use a sub-select because it wouldn't just run the statement as it was. It immediately fails the update with a message about the sub-select returning more than 1 row of data. You get the same message if you try to do it as a merge statement. If I had the ability to cause SQL Server to error on this I would immediately set it to do so.

    I work in a small IT department and while I am responsible for the databases, it isn't my full time job. I would sleep better knowing that a badly formed sql statement isn't going to corrupt the data.

    So Oracle's got consistent behaviour between the two, by the sound of it. Good to know.

    The "not your full time job" is another facet of what I hinted at regarding not realistically being expected to know every external and internal nuance of every technology that you work with when you're wearing many hats. I'm in the same boat but luckily have the luxury of other people caring more about some of my hats than I do.

    The behaviour in SQL Server is quite interesting, especially when compared to [font="Courier New"]MERGE[/font], because it touches on many larger questions regarding ANSI-standard syntax (as referenced by Paul White in a typically thorough response when the article was originally posted), T-SQL-specific constructs (Paul again), and the varying opinions of SQL and general developers, devoted and accidental DBAs and everybody in between as to what should or shouldn't cause a critical error (cross joined with whether you're doing ETL or OLTP, I guess). On the assumption that Microsoft would not enforce the standard on their [font="Courier New"]UPDATE[/font] (given the scope of the change and the downvotes on the Connect item that Paul mentioned), I think configurable behaviour would be a happy medium and imagine that the issue could be (relatively) easily detected at the Stream Aggregate stage (which I assume is how [font="Courier New"]MERGE[/font] detects it).

    I know it's due to a T-SQL-specific construct ([font="Courier New"]UPDATE FROM[/font]) and so somebody using it for the first time should arguably have read up on it and anticipated the behaviour, but I'd further say that a really important part of it was mentioned by Dwain back in 2013:

    dwain.c (9/10/2013)


    Often times, the duplication is much more subtle than my (noted as) contrived example.

    Many other bits of behaviour like this, that continue without raising a severe error, make themselves very quickly known in some way or other. This one's pretty sly and can barrel its way through an ETL process and manifest itself as incorrect figures in a presentation before you know what's happened.

  • abatej (4/7/2015)


    On the whole with SQL Server I've found that most behaviour of this ilk comes down to Microsoft, to their credit, handing us a DB engine and expecting us to research and understand what we're doing with it thoroughly (please don't read this as an underhanded attack because it isn't). Whether this is or isn't a good thing is a separate discussion as I have no doubt that this is an issue for non-SQL developers who find themselves in the position of having to write SQL queries -- they arguably can't be expected to know every nuance of every language, compiler, engine and process that their work will cross paths with. It's been a long time since I worked with Oracle but I would expect that there are similarly perplexing features in that world too. Out of interest, what's the UPSERT or MERGE behaviour in Oracle in such situations?

    I'm personally okay with the current behaviour; I see it as no different to me doing a [font="Courier New"]SELECT TOP 1[/font] without an [font="Courier New"]ORDER BY[/font] and/or [font="Courier New"]WITH TIES[/font], asking a barman to choose a drink for me, or asking somebody to put some papers in a pile without stating which one should be on the top. I think the job of ensuring determinism should always be at least partially in the developers' remits.

    That said, I agree with others that the [font="Courier New"]MERGE[/font] statement bombing out, for whichever internal or logical reasons that it does and [font="Courier New"]UPDATE[/font] doesn't, appears as inconsistent behaviour. I would rather see MERGE optimise its operations to a single [font="Courier New"]DELETE[/font]/[font="Courier New"]UPDATE[/font] per key and then just do it; and then perhaps have both [font="Courier New"]MERGE[/font] and [font="Courier New"]UPDATE[/font] return an informational error/warning à la [font="Courier New"]Null value is eliminated by an aggregate or other SET operation[/font]. That way your universe doesn't implode in such situations but clients and monitoring applications have the opportunity to identify and eliminate the underlying issues.

    Even better, a pair of [font="Courier New"]SET[/font] options to allow control over the behaviour of each statement would go down swell! 🙂

    I actually tried to duplicate the example on one of my oracle instances. I had to rework the update to use a sub-select because it wouldn't just run the statement as it was. It immediately fails the update with a message about the sub-select returning more than 1 row of data. You get the same message if you try to do it as a merge statement. If I had the ability to cause SQL Server to error on this I would immediately set it to do so.

    I work in a small IT department and while I am responsible for the databases, it isn't my full time job. I would sleep better knowing that a badly formed sql statement isn't going to corrupt the data.

    What you said about having to refactor the query in Oracle to use a subselect to avoid the proprietary syntax gets me to wondering if this behavior might not be caused by SQL Server's optimizer doing exactly the same thing. Whereas with a MERGE it does something entirely different to build a query plan.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Hmmmn, I see that this was republished from about a year and a half ago, and all the other comments are from the original publication date.

    The may sound somewhat simplistic, but no where in your examples did I see a WHERE clause.

    I think it should be pre-supposed - or at least cautioned - that a professional and sound database design should always be in place. Given that, you're never going to update data in an unpredictable way even with the simple and correct use of a WHERE clause.

    Secondly, anyone who runs new queries on live data, without having first tested the query on a copy of the data in a development or test environment is obviously asking for grief to appear eventually. Plus, do a backup immediately before any major bulk update to live data ... please.

  • jbrentbbutler (4/8/2015)


    Hmmmn, I see that this was republished from about a year and a half ago, and all the other comments are from the original publication date.

    The may sound somewhat simplistic, but no where in your examples did I see a WHERE clause.

    Correct and I agree in normal practice a WHERE clause should appear with any UPDATE. I'm sure an example could have been concocted with a WHERE clause that did the same thing, but I didn't want to distract from the point I was trying to make.

    jbrentbbutler (4/8/2015)


    I think it should be pre-supposed - or at least cautioned - that a professional and sound database design should always be in place. Given that, you're never going to update data in an unpredictable way even with the simple and correct use of a WHERE clause.

    Also true but not to the point of the article.

    jbrentbbutler (4/8/2015)


    Secondly, anyone who runs new queries on live data, without having first tested the query on a copy of the data in a development or test environment is obviously asking for grief to appear eventually. Plus, do a backup immediately before any major bulk update to live data ... please.

    I've found that these sorts of things may not come up in Dev or Test environments, most likely depending on how fresh those are kept.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • I wasn't trying to be critical of your article. I do think it's fair to discuss professional practices that might eliminate or reduce the types of errors you were discussing.

    It should go without saying that a test environment should mirror the production environment, including keeping the data fresh. We could go into a long list of ignorant and lazy (or "cost saving") mistakes that are made which lead to live data corruption, but that wasn't really the focus of your article or my comment.

  • jbrentbbutler (4/8/2015)


    I wasn't trying to be critical of your article. I do think it's fair to discuss professional practices that might eliminate or reduce the types of errors you were discussing.

    It should go without saying that a test environment should mirror the production environment, including keeping the data fresh. We could go into a long list of ignorant and lazy (or "cost saving") mistakes that are made which lead to live data corruption, but that wasn't really the focus of your article or my comment.

    I agree with you and didn't take your comments as critical, so no worries.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Timing is everything... I just read this article the other day and made a mental note. Today, while studying for the 70-461 exam using, Itzik Ben-Gan's, Querying Microsoft SQL Server book, he mentions this exact scenario as 'Nondeterministic UPDATE'. It would be nice if a warning explaining the consequences could be generated. At that point the developer would at least be aware, or, it could be flagged at a QA level (hopefully, there is a QA level).

  • AncientCitySQL (4/8/2015)


    Timing is everything... I just read this article the other day and made a mental note. Today, while studying for the 70-461 exam using, Itzik Ben-Gan's, Querying Microsoft SQL Server book, he mentions this exact scenario as 'Nondeterministic UPDATE'. It would be nice if a warning explaining the consequences could be generated. At that point the developer would at least be aware, or, it could be flagged at a QA level (hopefully, there is a QA level).

    It's a funny thing how things like this go. It would take me a week to explain the things I've done with UPDATEs in the past where I've relied on the ability of UPDATE to do such "Nondeterministic UPDATEs" and would have been totally PO'd at the language had not allowed me to do what I wanted it to do. It's one of the reasons why I don't actually care for Oracle. Do it the way it want's it or take the highway. I don't really want to be that protected because not all rules are 100% correct for 100% of the time.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (4/8/2015)


    AncientCitySQL (4/8/2015)


    Timing is everything... I just read this article the other day and made a mental note. Today, while studying for the 70-461 exam using, Itzik Ben-Gan's, Querying Microsoft SQL Server book, he mentions this exact scenario as 'Nondeterministic UPDATE'. It would be nice if a warning explaining the consequences could be generated. At that point the developer would at least be aware, or, it could be flagged at a QA level (hopefully, there is a QA level).

    It's a funny thing how things like this go. It would take me a week to explain the things I've done with UPDATEs in the past where I've relied on the ability of UPDATE to do such "Nondeterministic UPDATEs" and would have been totally PO'd at the language had not allowed me to do what I wanted it to do. It's one of the reasons why I don't actually care for Oracle. Do it the way it want's it or take the highway. I don't really want to be that protected because not all rules are 100% correct for 100% of the time.

    And how thankful I am for that!

    If you followed the rules all the time there'd never be any interesting banter between you and CELKO!

    :-P:w00t::hehe:


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

Viewing 12 posts - 46 through 56 (of 56 total)

You must be logged in to reply to this topic. Login to reply