Update table from join that generates duplicates - how does this work behind?

  • Hi,

    I'm wondering what does the T-SQL engine does in this case:

    I created two simple tables with data as an example:

    Table FACT:

    ID_SYSTEM ID_ACCOUNT

    0 1

    0 2

    0 1

    Table DIM:

    ID_ACCOUNT ID_SET_SYSTEM

    1 1

    1 2

    the update query is:

    update FACT

    set FACT.id_system=DIM.id_set_system

    from [UPDATE_FACT] as FACT

    inner join [UPDATE_DIM] DIM

    on FACT.id_account=DIM.id_account

    the FACT i get after the update is:

    ID_SYSTEM ID_ACCOUNT

    1 1

    0 2

    1 1

    the result of the inner join that generates duplicates is :

    ID_SYSTEM ID_ACCOUNT ID_SET_SYSTEM

    0 1 1

    0 1 1

    0 1 2

    0 1 2

    the table above is generated with this query:

    select test.*,dim.id_set_system from [UPDATE_FACT] test

    inner join [UPDATE_DIM] dim

    on test.id_account=dim.id_account

    Since there are different possible values for id_set_system I'm wondering how does sql-server generates this result i.e why does it affects the first two values of id_set_system and not the last two ones , the result is random ? or does it does some grouping and applies some aggregation ? ... ?

    Thanx for the info

  • From the official documentation for the UPDATE statement on MSDN:

    "Use caution when specifying the FROM clause to provide the criteria for the update operation. The results of an UPDATE statement are undefined if the statement includes a FROM clause that is not specified in such a way that only one value is available for each column occurrence that is updated, that is if the UPDATE statement is not deterministic."

    https://msdn.microsoft.com/en-us/library/ms177523.aspx

  • Yes, I've seen that described as an "unreported cardinality error". In other words, there is more than one value to choose from for any or each of the rows to be updated, and SQL Server chooses whichever one it likes, but does not report an error. You can get round this by using a correlated subquery for your update instead of the UPDATE...FROM syntax, or you can use the MERGE statement. I'm not saying either of those is without its drawbacks, just that it's something you might consider.

    John

  • Thank you ! i hadn't thought about checking the MSDN page of the update function.

    I'm not trying to do anything with this query. while working on an update i just thought of what would happen if there were duplicates and tried it to check it out.

    thanx again for the answers

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply