Update using Join to multiple Records - which one wins?

Question

Update using Join to multiple Records - which one wins?

trevor.adams

SSCommitted

Points: 1505
More actions
August 27, 2019 at 10:19 am

#3674644

Hello there, it feels like a surprisingly basic question.
In the update statement below we have a few examples where alt_code2 is not unique in table b and description differs (it is also not unique in table a)
My question is could we be certain that we would retrieve the same b record for each alt_code2? (i.e I accept we wouldn't know which matching record (b) would take precedence but would it be consistent?)
UPDATE a
SET --a.DIAGNOSIS_CODE = b.CODE ,
a.DIAGNOSIS_DESC = b.DESCRIPTION
FROM #DimDiagnosisCodes a
INNER JOIN dbo.Ref_ICD_10_edition4 b ON a.ALT_CODE2 = b.ALT_CODE2;
(Of course, in reality the join is incorrect and our fix will be to correct the join - this is about improving my SQL knowledge)
August 28, 2019 at 9:57 am

This was removed by the editor as SPAM

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply

Thom A SSC Guru Points: 99126 More actions · Answer 1

it doesn't sound like the join is incorrect, it seems like the problem is the data, and that you have a many-to-many relationship. Seems like you need to fix the data; seems odd that a code can appear multiple times so seems you need to fix that and have one definition for a single code.

Thom~

Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk

ktflash Default port Points: 1400 More actions · Answer 2

Thom A wrote:

it doesn't sound like the join is incorrect, it seems like the problem is the data, and that you have a many-to-many relationship. Seems like you need to fix the data; seems odd that a code can appear multiple times so seems you need to fix that and have one definition for a single code.

the M:N Part is correct, but that is only because the join is "wrong"

the join column is what makes it M:N in the first place

The question is if the results are consistent in:

what happens on the next time the update is executed, is there a possibility that one of the other possible "options" now replaced a previous one just based on a new exection of the update

I just tested the following:

created a inner join that has M:N aswell and ran an update on it, making a backup of the updated column and reran the update, the updated columns atleast were identical with the backuped column values for all rows

i added more columns to the 2nd table i was joining in, the number of rows became even higher but after the update it was still the same result, back uped column = updated column

do the columns with the same ALT_CODE2 have the same update or can it differ

when looking at the test table i updated, i have no different values for the join column i decided to use for the M:N join

so it seems they all got the same values

atleast it seems like there is some consistency on the updated in that test, cant garuantee though that would someone have to answer that actually knows how execution plans work

EDIT:

BIG UFF ON MY PART

i coincidentally have a issue of this topic, i get different results using differect SSMS Windows (connections), in my tests i always used the same holy sh1t thats tough

This reply was modified 4 years, 8 months ago by ktflash.
This reply was modified 4 years, 8 months ago by ktflash.

I want to be the very best
Like no one ever was

https://www.sqlservercentral.com/articles/forum-etiquette-how-to-post-datacode-on-a-forum-to-get-the-best-help

ScottPletcher SSC Guru Points: 100950 More actions · Answer 3

As I understand it, if multiple matches are possible, SQL can match any row. That is, the results are unknown and you should assume they are effectively random.

Similarly, if you specify "SELECT TOP (1) *" without an ORDER BY, SQL can return any row in the table.

SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

nova SSC Journeyman Points: 80 More actions · Answer 4

As Scott says, the result is highly unreliable and as far as I know, undocumented. This is actually a significant flaw with SQL Server's unconventional and non-standard method of doing joins in an UPDATE statement. The problem is arguably Sybase's fault not Microsoft's because it dates from pre-Microsoft SQL Server times. The fix is to modify your join clause but I would argue that SQL Server ought to fail-safe in this case and return a warning or error message rather than silently updating your data with unpredictable results.

The equivalent MERGE statement does the right thing in this case:

MERGE INTO #DimDiagnosisCodes a
USING dbo.Ref_ICD_10_edition4 b
ON a.ALT_CODE2 = b.ALT_CODE2
WHEN MATCHED THEN UPDATE
SET DIAGNOSIS_CODE = b.CODE,
DIAGNOSIS_DESC = b.DESCRIPTION;

If ALT_CODE2 is not unique in table b then when attempting to use MERGE you should get the following error message: "The MERGE statement attempted to UPDATE or DELETE the same row more than once."

The MERGE version being standard SQL it perhaps also has the advantage of being more likely to be understood by more people.

trevor.adams SSCommitted Points: 1505 More actions · Answer 5

Thanks for all answers - Nova yes a really thought-provoking answer

Jonathan AC Roberts SSCoach Points: 18271 More actions · Answer 6

If you know which row you want to use from table dbo.Ref_ICD_10_edition4 then you can change the update to use a cross apply with select top(1) and order by.

UPDATE a
   SET a.DIAGNOSIS_DESC = b.DESCRIPTION
  FROM #DimDiagnosisCodes a
 CROSS APPLY(SELECT TOP(1) DESCRIPTION
               FROM dbo.Ref_ICD_10_edition4 b 
              WHERE a.ALT_CODE2 = b.ALT_CODE2
              ORDER BY b.ColumnToOrderOn) b;

fahey.jonathan Hall of Fame Points: 3617 More actions · Answer 7

I usually solve the problem by doing a JOIN first to get the set of records with a "SortSeq" column that calculates "1" for my preferred row, the JOIN that back to the update table. In this case, all the necessary data is contained in the second table, so that can be used directly.

UPDATEa
SETa.DIAGNOSIS_DESC = b.DESCRIPTION
FROM#DimDiagnosisCodes a
INNER JOIN (
SELECTALT_CODE2
DESCRIPTION,
SortSeq= ROW_NUMBER() OVER (PARTITION BY ALT_CODE2 ORDER BY {{sorting fields}})
FROMdbo.Ref_ICD_10_edition4
) b
ONa.ALT_CODE2 = b.ALT_CODE2
WHEREb.SortSeq = 1;

Make sure that your sorting fields calculate "1" for the desired row.

If your main table is large and the temp table is small, you may get better performance by joining the temp table to the base table first, then joining the result back to the temp table to do the update.

UPDATEu
SETu.DIAGNOSIS_DESC = x.DESCRIPTION
FROM#DimDiagnosisCodes u
INNER JOIN (
SELECTa.ALT_CODE2,
b.DESCRIPTION,
SortSeq= ROW_NUMBER() OVER (PARTITION BY a.ALT_CODE2 ORDER BY {{sorting fields}})
FROM#DimDiagnosisCodes a
INNER JOIN dbo.Ref_ICD_10_edition4 b
ONa.ALT_CODE2 = b.ALT_CODE2
) x
ONu.ALT_CODE2 = x.ALT_CODE2;
WHEREx.SortSeq = 1

This reply was modified 4 years, 8 months ago by fahey.jonathan.