Why doesn’t the last record win?

  • create table #tmpTest

    (

    ClientKey int,

    GuarantorKey int

    )

    insert #tmpTest

    select 10,0

    create table #tmpGuarantor

    (

    clientkey int,

    GuarantorKey int

    )

    insert #tmpGuarantor

    select 10,1 union

    select 10,2 union

    select 10,3

    update t1 set t1.GuarantorKey = t2.GuarantorKey

    from #tmpTest t1

    inner join #tmpGuarantor t2 on t1.ClientKey = t2.clientkey

    select * from #tmpTest

    In the above example, I would expect the single record in #tmpTest to be updated three times and that the results of select * on that table would be 10,3 but instead it is 10, 1.

    I obviously don’t understand what is happening here.

    Would someone mind enlightening me?

  • A database is not like an excel spreadsheet; the data is not stored, nor returned in any specific order .

    if you want the data in order, you MUSt supply an ORDER BY SomeColumn clause as part of your query

    You have to understand that the databases first purpose is to return the data in the fastest way possible.

    so depending on the indexes, where the data is stored, whether the query was executed in parallel and the streams of data are merged, statistics, fragmentation, and many other items all affect the order the data *might* get returned in without an ORDER BY Clasue.

    SQL will return the data however it deemed to get the data the fastest...and fastest in not always the same order the data was inserted.

    my signature has a smart rule to remember.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • There is no real concept of "first" and "last" in a table, the preciuse defintion being that a table is an "unordered set of rows". Therefore it could theoretically be any value from the second table that gets used.

    You will find many people suggesting the "update from" syntax should be deprecated by Microisoft and preferably not used, as it does not make it obvious what will happen in this situation. It is certainly non-ansi standard, but if used carefully has some good uses. The critical thing to be aware of with it is to ensure that exactoly ONE row will be matched for the update, otherwise the effects are effectively undetermined.

    You need to find a way of ensuring exactky one row gets matched, maybe have a look at thge merge statement.

    Mike

  • Unfortunately that is not deterministic. There is no guarantee that the "last record" would win, among other things because there is no such thing as physical ordering in SQL. Books online deals with it in this section of the UPDATE documentation:

    Use caution when specifying the FROM clause to provide the criteria for the update operation. The results of an UPDATE statement are undefined if the statement includes a FROM clause that is not specified in such a way that only one value is available for each column occurrence that is updated, that is if the UPDATE statement is not deterministic. For example, in the UPDATE statement in the following script, both rows in Table1 meet the qualifications of the FROM clause in the UPDATE statement; but it is undefined which row from Table1 is used to update the row in Table2.

    http://msdn.microsoft.com/en-us/library/ms177523.aspx

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • I believe you gentlemen covered it and I thank you. Given that you cannot use an ORDER BY in an update, it sounds like I’m going to end up with whatever SQL decides to give me unless I have the update nailed down to only provide a single value.

  • To add to the above information, merely adding an order by to the update from statement will not guarantee the ordering of the internal components. SQL Server does whatever it pleases internal to the statement, THEN orders your results. I'm not even sure you can apply an ORDER BY to an update, actually.

    Either way, they're quite right. When you're working with an UPDATE FROM, you need to be careful and know your data. It has to be a one to one or you can't be sure.

    Typically in these situations I end up running a max-limiter on the connection to make sure I get 'last row'. Adjust to suit your needs with necessary where clauses.

    UPDATE tt

    SET

    tt.GuarantorKey = tg.MaxKey

    FROM

    #tmpTest AS tt

    JOIN

    (SELECT ClientKey, MAX( GuarantorKey) AS MaxKey

    FROM #tmpGuarantor

    GROUP BY ClientKey

    ) AS tg

    ONtt.ClientKey = tg.ClientKey

    SELECT * FROM #tmpTest


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Thanks, Craig. Actually, I wasn’t shooting for the highest value as much as expecting it because of the sequence in which I entered them. I was troubleshooting a stored proc today where a developer was allowing this type of update (one to many) but thought I could predict the result. NOT.:-D

  • jshahan (8/16/2012)


    Thanks, Craig. Actually, I wasn’t shooting for the highest value as much as expecting it because of the sequence in which I entered them.

    Typically, entrings sequences are ruined at the very moment when data are inserted into tables.

    At this stage the order of rows is ruled by indexing, page allocation, space availability within pages, etc.

    If you want to preserve the order of entering add IDENTITY column. Then you'll always know which record was entered 1st and which one was the last.

    _____________
    Code for TallyGenerator

  • .........

    create table #tmpGuarantor

    (

    ID int IDENTITY(1,1) PRIMARY KEY,

    clientkey int,

    GuarantorKey int

    )

    ...........

    update t1 set t1.GuarantorKey = t2.GuarantorKey

    from #tmpTest t1

    INNER JOIN (SELECT clientkey, MAX(id) AS lastID

    FROM #tmpGuarantor

    GROUP BY clientkey ) L ON L.clientkey = t1.ClientKey

    inner join #tmpGuarantor t2 on t1.ClientKey = t2.clientkey AND t2.ID = L.lastID

    .............

    _____________
    Code for TallyGenerator

  • I'm not sure I'm seeing what the difference is between your code and what I'd posted earlier Sergiy... am I missing something there?

    Nevermind, I see it. Why round trip to the data twice, Sergiy, when you've already got the answer from the subquery? That's non-optimal and more work, to boot. 😉


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Evil Kraig F (8/16/2012)


    Why round trip to the data twice, Sergiy, when you've already got the answer from the subquery? That's non-optimal and more work, to boot. 😉

    You don't get GuarantorKey from the subquery, only the latest ID for each clientkey .

    _____________
    Code for TallyGenerator

  • We also may update like this:

    with cte as

    (

    select

    rn = row_number() over (partition by ClientKey order by GuarantorKey desc),

    *

    from

    #tmpGuarantor

    )

    update t

    set t.GuarantorKey = cte.GuarantorKey

    from

    #tmpTest t

    join cte on t.ClientKey = cte.clientkey

    where

    cte.rn = 1

    ;

    select * from #tmpTest;

    I also would like to recomend creating such index:

    create index ix_GK on #tmpGuarantor(clientkey asc, GuarantorKey desc);

    to avoid Sort operator.


    I am really sorry for my poor gramma. And I hope that value of my answers will outweigh the harm for your eyes.
    Blog: http://somewheresomehow.ru[/url]
    Twitter: @SomewereSomehow

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply