UPDATE using a self-join

  • Hi all . . .

    I'm trying to perform an UPDATE based partially on the row before it. I want to set a column, given that the column is NULL and the same column for the ID before it is NOT null.

    Here's some sample data:

    create table #example (ID int primary key, Name varchar(10) not null, Status bit null)

    insert into #example values (1, 'Tinker', 1)

    insert into #example values (2, 'Evers', null)

    insert into #example values (3, 'Chance', null)

    I tried doing an UPDATE using a self-join, like this:

    update #example

    set Status = 1

    from #example ex join #example ex1 on ex.ID = ex1.ID + 1

    where ex.Status is null and ex1.Status is not null

    However, I keep getting this message:The table '#example' is ambiguous.

    What am I doing wrong?

    Thanks, guys!

    +--------------------------------------------------------------------------------------+
    Check out my blog at https://pianorayk.wordpress.com/

  • easy fix: once you aliased #example, you have to use that alias everywhere:

    update ex -- the table that will be updated's alias.

    set Status = 1

    from #example ex

    join #example ex1 on ex.ID = ex1.ID + 1

    where ex.Status is null and ex1.Status is not null

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Ah, yes! Thank you very much, sir!

    +--------------------------------------------------------------------------------------+
    Check out my blog at https://pianorayk.wordpress.com/

  • the beautiful part about the "proprietary" update format is the ease in which you can test to see which records are matching/ would be affected.

    simply inserting "--SELECT *", and highlighting from that point and selecting the rest of the command is a huge benefit to a developer.

    lets you test the records really really easily:

    update ex -- the table that will be updated's alias.

    set Status = 1

    --highlight and run from the "SELECT... to preview affected records.

    --SELECT *

    from #example ex

    join #example ex1 on ex.ID = ex1.ID + 1

    where ex.Status is null and ex1.Status is not null

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • CELKO (1/12/2011)


    Using BIT is bad and a NULL bit makes no sense. Using the proprietary UPDATE..FROM.. syntax has lots of problems with cardinality and portability.

    I don't care about portability but I do have to ask, what are the problems with cardinality that you've come across using UPDATE...FROM...? Do you have a runnable code example that proves it? Maybe a white paper or article on the subject?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • No but he has a book you can buy that proves it!

    Joe should just not bother actually making a different post for every thread he posts in and just write "Your code sucks, buy my book to improve it!" for everything. That's pretty much what he does anyway.

  • Celko, I was thinking about responding to your high and mighty horse that you keep riding in on, and then remembered the old adage about "if you don't have anything nice to say . . . "

    Your "help" is NOT WANTED. Knock it off.

    +--------------------------------------------------------------------------------------+
    Check out my blog at https://pianorayk.wordpress.com/

  • Jeff Moden (1/12/2011)


    CELKO (1/12/2011)


    Using BIT is bad and a NULL bit makes no sense. Using the proprietary UPDATE..FROM.. syntax has lots of problems with cardinality and portability.

    I don't care about portability but I do have to ask, what are the problems with cardinality that you've come across using UPDATE...FROM...? Do you have a runnable code example that proves it? Maybe a white paper or article on the subject?

    Update From can have problems if it runs into two or more rows that match the Join/Where criteria and would update one row in the target table. Same problem as assigning a variable value from a table where more than one row may meet the Where criteria. You can't be certain you'll get the row you want, just that you'll get ONE of the rows. Which one is pretty much random though.

    I've read that Merge doesn't have that problem, but haven't tested it myself. Now I'm going to have to try it out!

    In SQL 2008 databases, I've been using Merge instead of Update From, because it's easy to modify into an Upsert if that's needed, and this allows for greater consistency in the coding practices for the database. (Having them all be Merge, instead of some Update From and some Merge.)

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared (1/13/2011)[hrUpdate From can have problems if it runs into two or more rows that match the Join/Where criteria and would update one row in the target table. Same problem as assigning a variable value from a table where more than one row may meet the Where criteria. You can't be certain you'll get the row you want, just that you'll get ONE of the rows. Which one is pretty much random though.

    But does anyone have a demonstrable test setup to show that?

    If I understand what you're saying, I've also had just the opposite experience. If two rows in the target table meet one row's worth of criteria in another table, both rows will still be updated in the target table which, to me, is not only expected operation, but I count on it.

    I have seen the reverse be true where two rows in the criteria table meet the criteria but only one row meets the criteria in the target table, it's only updated once and, that too, is expected operation for me so its never been a problem. I hope they never change that because who knows? In the world of massaging "bad" 3rd party parvided data, you never know what trick you may need to fix things up. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • GSquared (1/13/2011)


    Jeff Moden (1/12/2011)


    CELKO (1/12/2011)


    Using BIT is bad and a NULL bit makes no sense. Using the proprietary UPDATE..FROM.. syntax has lots of problems with cardinality and portability.

    I don't care about portability but I do have to ask, what are the problems with cardinality that you've come across using UPDATE...FROM...? Do you have a runnable code example that proves it? Maybe a white paper or article on the subject?

    Update From can have problems if it runs into two or more rows that match the Join/Where criteria and would update one row in the target table. Same problem as assigning a variable value from a table where more than one row may meet the Where criteria. You can't be certain you'll get the row you want, just that you'll get ONE of the rows. Which one is pretty much random though.

    I've read that Merge doesn't have that problem, but haven't tested it myself. Now I'm going to have to try it out!

    I don't see how Merge can avoid the problem, it seems to be inherent in what is being done. Maybe I'm misunderstanding altogether, but here's how it appears to me:

    Take code like

    Merge Table_A A Using Table_B B

    On A.name = B.name

    When Matched and A.acount is not null

    Set A.acount = B.bcount

    Suppose that Table_A contains a row with 'XYZ' in the name column and something other than null in the acount column, and Table B three rows with 'XYZ' in the name column and values 2,3 and 5 in the bcount column. What will this merge statement do? Will it fail with an error? Will it pick one of the three values 2,3, and 5 to put in the 'XYZ' row of the target table? In the latter case, what degree of control has the programmer over which value is chosen? I believe the answer is that he has exactly no control (the dtabase engine will in effect use the value from the last matching row it processes, and which of the three rows that is depends purely on the order the optimiser chooses to scan Table_A in), which is exactly the situation that arises for

    Update A Set A.acount = B.bcount

    from Table_A A inner join Table_B B

    on A.name = B.name

    where A.acount is not null

    so this problem looks the same for Merge as for Update.

    Tom

  • CELKO (1/13/2011)


    Does the fact that it gives the wrong answers not outweigh your idea of developer ease of reading (which I disagree with).

    Whose idea of developer ease of reading? Neither I, nor anyone else who has contributed to this thread, has mentioned any concept of developer ease of reading.

    The interesting question is how MERGE avoids the problem of dependence on scan order (if it does indeed avoid it) since there has to be some mechanism to resolve which of several candidate values is used. You may find that if there is no clustered index records will be touched in original insertion order, that adding a clustered index changes that order, and that changing properties of the clustered index (eg changing ASC to DESC) changes the order yet again, and that Merge is affected by these index changes. Until you can explain how Merge avoids suffering these effects, you should stop telling us that this is a problem with the proprietary UPDATE syntax. You can't demonstrate that it doesn't suffer these problems by showing that in a couple of trivial cases the database engine gets it right, because we all know from experience that there are often bizarre cases where the optimiser does something quite frightening, so please explain how MERGE avoids being affected.

    Tom

  • Horrible example, Celko; all it proves is the row used to to update the parent table is not guaranteed unless you make sure the joins form a 1:1; That's SQL 101 that every developer is already aware of. it's not pointing out any inherent problem, but really pointing out the lack of definition of what data you are trying to get.

    Jeff Moden already pointed out that this is expected and understandable.

    The scenario you posted would usually be grabbing the min/max/sum/last updated from the details, which probably should not be placed in the record anyway, but calculated as a view, but regardless, that is what you'd typically do.

    yeah, the merge statement won't update if there is multiple rows, so if you KNOW you are shooting for a 1:1 match, that's a nice to have, but it certainly is not the "gold standard" of update statements.

    MERGE INTO Orders

    USING (SELECT order_nbr,sku,item_price

    FROM OrderDetails) AS EX1

    ON Orders.order_nbr = EX1.order_nbr

    WHEN MATCHED

    THEN UPDATE

    SET Orders.some_col = EX1.item_price;

    Msg 8672, Level 16, State 1, Line 2

    The MERGE statement attempted to UPDATE or DELETE the same row more than once. This happens when a target row matches more than one source row. A MERGE statement cannot UPDATE/DELETE the same row of the target table multiple times. Refine the ON clause to ensure a target row matches at most one source row, or use the GROUP BY clause to group the source rows.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Okay, I found the same thing. Merge doesn't "avoid" or "fix" the problem, it just explicitly makes it clear that there's a problem by raising an error.

    Honestly, since it's about data integrity, I'd rather have an error than not in this case.

    Yeah, updating from ONLY one row is SQL 101, but I've had to deal with far too many devs who were in SQL Kindergarden and thought they were in SQL Post-Grad. I'd prefer an error over loss of data integrity, especially when the integrity issue comes from a simple oversight in a Where clause, and isn't necessarily repeatable. Non-repeatability can make it really hard to debug.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • OK, so Merge gives an error message, which is certainly a good thing (it surprised me, because reading the MS documentation I didn't spot that; but with hindsight it doesn't surprise me, becaude MS documentation often buries useful information where you are unlikely to find it). But what does that error message say? It says

    Error message


    Refine the ON clause to ensure a target row matches at most one source row, or use the GROUP BY clause to group the source rows.

    And what does BoL say about modifying the ON clause to do that? It says

    BoL


    Caution It is important to specify only the columns from the target table that are used for matching purposes. That is, specify columns from the target table that are compared to the corresponding column of the source table. Do not attempt to improve query performance by filtering out rows in the target table in the ON clause, such as by specifying AND NOT target_table.column_x = value. Doing so may return unexpected and incorrect results.

    In other words doing what the error message suggests you do to avoid the problem will deliver incorrect results. Not exactly useful, is it?

    If the writer of the query is awake and avoids the multiple match, there's no problem with either syntax - except that the Merge description suggests that modifying the ON condition to avoid the multiple match will cause incorrect results; that strikes me as saying the Merge syntax is inferior to the alternative, where the multiple match issue can be avoided without introducing errors which seems to be beyond the Capability of Merge's ON clause. Again, as I said before, I don't really know what's going on here, it's more than 15 years since I stopped being payed to concern myself with (amongst other things) SQL definitions and getting clear and solid declarative semantics for the language, so I'm nowhere near up to date, I just get the feeling that something doesn't really work (because it's not sufficiently expressive, perhaps).

    Tom

  • CELKO (1/13/2011)


    Lowell (1/13/2011)


    Horrible example, Celko; all it proves is the row used to to update the parent table is not guaranteed unless you make sure the joins form a 1:1; That's SQL 101 that every developer is already aware of. it's not pointing out any inherent problem, but really pointing out the lack of definition of what data you are trying to get.

    I give up! I guess we don't need seat belts since good drivers never have accidents :w00t:

    When I have an ugly VIEW, CTE or derived table in my UPDATE or MERGE, I cannot spot this problem by eye. My brain chokes after five tables. I cannot see any of the base tables that were altered, either. And I have been doing SQL for decades.

    MERGE checks all of this for me; UPDATE.. FROM .. screws me. What choice will a professional make? 😉

    How about the same choice you always give? Fix the bad DDL or fix the bad query once it's discovered in testing. It's what YOU used to do before MERGE came out. 😉

    And the only reason why UPDATE/FROM would screw someone is if they didn't actually know how to use it. A couple of million Sybase, SQL Server, and other "Rushmore Engine" users have absolutely no problem with it.

    That not withstanding, I agree that MERGE is safer. But there're also a whole lot of people that don't have 2K8 yet that are thanking their lucky stars that they have an UPDATE with a FROM clause instead of having to do like you used to have to do in Oracle before they came out with MERGE.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 37 total)

You must be logged in to reply to this topic. Login to reply