Merge

  • L' Eomot Inversé (2/14/2013)


    Nice question. However, the explanation would have been better if it had pointed out that filters like this can be placed in the when matche/not matched conditions. In this merge statement the when clause should have been when not matched by target and src.RowNo = 1 if the intention was to get only the three rows with RowNo 1.

    Aaahh. The light bulbs are beginning to come on. Thanks for the added insight, Tom!

    Rob Schripsema
    Propack, Inc.

  • Nice question! Thanks

    Not all gray hairs are Dinosaurs!

  • As Tom said,

    the explanation would have been better if it had pointed out that filters like this can be placed in the when matche/not matched conditions.

    However, no one has called out that the explanation is actually flat out wrong. It is simply not true as claimed in the explanation that

    Statements in the on clause of the merge statement are only evaluated if they include a column in the destination table.

    The on clause is used to define how the source rows should be matched (joined) to any existing destination rows. In this case the statement said the rows should only be matched for the first row (RowNo=1) of each id in the source. When RowNo>1 then the source row is not a match for any destination row. At the point that the merge statement is run in the QOTD there are no rows in the destination for any source rows to join on regardless of the RowNo condition.

    The code below demonstrates that a merge statement does indeed evaluate the expressions (not "statements") in the on clause that do not include a column in the destination table.

    I modified the first merge statement (as Tom suggested) to move the RowNo = 1 condition to the WHEN NOT MATCHED clause so that it only inserts one non-matching row for each id. That way we actually get some rows into table B so we run another merge statement that has rows to join to in the destination table.

    The second merge statement uses the condition [font="Courier New"]src.id = dest.id and src.RowNo = 2[/font]. This statement only updates one row because only one source row matches a destination row on id and has [font="Courier New"]RowNo = 2[/font]. SQL Server does evaluate the [font="Courier New"]RowNo = 2[/font] condition even though it doesn't reference the destination table.

    use tempdb;

    go

    create table A (

    id int,

    val int

    );

    go

    insert into A values

    (1,1),

    (1,2),

    (2,3),

    (3,4);

    go

    create table B (

    id int primary key,

    val int

    );

    -- This merge statement inserts 3 rows.

    with CTE as (

    select

    id,

    val,

    row_number() over (partition by id order by val desc) as RowNo

    from A

    )

    merge into B dest

    using CTE src on

    src.id = dest.id

    when not matched by target and src.RowNo = 1 then

    insert (id, val)

    values (id, val);

    -- Returns 2

    select val from B where id = 1;

    -- This merge statement only updates 1 row.

    with CTE as (

    select

    id,

    val,

    row_number() over (partition by id order by val desc) as RowNo

    from A

    )

    merge into B dest

    using CTE src on src.id = dest.id and src.RowNo = 2

    when matched then

    update set val = src.val;

    -- Returns 1

    select val from B where id = 1;

    drop table A;

    drop table B;

  • Great question, Ole Kristian. Learned something new about the MERGE statement.

    Saludos,

    "El" Jerry.

    "A watt of Ottawa" - Gerardo Galvan

    To better understand your help request, please follow these best practices.[/url]

  • Matt Marston (2/14/2013)


    As Tom said,

    the explanation would have been better if it had pointed out that filters like this can be placed in the when matche/not matched conditions.

    However, no one has called out that the explanation is actually flat out wrong. It is simply not true as claimed in the explanation that

    Statements in the on clause of the merge statement are only evaluated if they include a column in the destination table.

    The on clause is used to define how the source rows should be matched (joined) to any existing destination rows. In this case the statement said the rows should only be matched for the first row (RowNo=1) of each id in the source. When RowNo>1 then the source row is not a match for any destination row. At the point that the merge statement is run in the QOTD there are no rows in the destination for any source rows to join on regardless of the RowNo condition.

    The code below demonstrates that a merge statement does indeed evaluate the expressions (not "statements") in the on clause that do not include a column in the destination table.

    This much is true. However, this is also true:

    It is important to specify only the columns from the target table that are used for matching purposes. That is, specify columns from the target table that are compared to the corresponding column of the source table. Do not attempt to improve query performance by filtering out rows in the target table in the ON clause, such as by specifying AND NOT target_table.column_x = value. Doing so may return unexpected and incorrect results.

    (from the reference in the answer: http://technet.microsoft.com/en-us/library/bb510625.aspx).

    While your code happened to work in this instance , it relies on behavior specifically described by Microsoft as non-deterministic, and so should be considered dangerous.

  • sknox (2/14/2013)


    It is important to specify only the columns from the target table that are used for matching purposes. That is, specify columns from the target table that are compared to the corresponding column of the source table. Do not attempt to improve query performance by filtering out rows in the target table in the ON clause, such as by specifying AND NOT target_table.column_x = value. Doing so may return unexpected and incorrect results.

    (from the reference in the answer: http://technet.microsoft.com/en-us/library/bb510625.aspx).

    While your code happened to work in this instance , it relies on behavior specifically described by Microsoft as non-deterministic, and so should be considered dangerous.

    My explanation might not be 100% accurate, but this is indeed the documentation I found when I ran into the issue demonstrated in the question. That it indeed may work in some cases just shows how difficult it is to create a QoT that is 100% correct in the current and coming versions of SQL Server.



    Ole Kristian Velstadbråten Bangås - Virinco - Facebook - Twitter

    Concatenating Row Values in Transact-SQL[/url]

  • Keep them comin', Ole. Thanks!

  • okbangas

    My explanation might not be 100% accurate, but this is indeed the documentation I found when I ran into the issue demonstrated in the question. That it indeed may work in some cases just shows how difficult it is to create a QoT that is 100% correct in the current and coming versions of SQL Server.

    +1000

    Bolding in the quoted statement (above) added by this poster

    If everything seems to be going well, you have obviously overlooked something.

    Ron

    Please help us, help you -before posting a question please read[/url]
    Before posting a performance problem please read[/url]

  • okbangas (2/14/2013)


    My explanation might not be 100% accurate, but this is indeed the documentation I found when I ran into the issue demonstrated in the question. That it indeed may work in some cases just shows how difficult it is to create a QoT that is 100% correct in the current and coming versions of SQL Server.

    Thank you Ole for creating the QotD. I did not intend for my criticism of inaccuracies in the explanation to come across as a criticism of the question. Whether the answer is right or wrong, I find the value from most QotD is in the discussion.

    It is important to specify only the columns from the target table that are used for matching purposes. That is, specify columns from the target table that are compared to the corresponding column of the source table. Do not attempt to improve query performance by filtering out rows in the target table in the ON clause, such as by specifying AND NOT target_table.column_x = value. Doing so may return unexpected and incorrect results.

    (from the reference in the answer: http://technet.microsoft.com/en-us/library/bb510625.aspx).

    The first sentence specifies which columns of the target table should be used (only those used for matching). It does not say that the ON clause can only use columns from the target table (but it is stated that way in the explanation of the QotD). It then goes on to warn against filtering out rows in the target table in the ON clause and gives an example of what not to do using a comparison that only involves the target table. It says nothing about the use of filtering rows of the source table in the ON clause.

    To pick the note apart even further (I realize this is a nit) a merge statement doesn't actually return any results (unless used with an OUTPUT clause) so the last sentence isn't accurate.

    sknox (2/14/2013)


    While your code happened to work in this instance, it relies on behavior specifically described by Microsoft as non-deterministic, and so should be considered dangerous.

    I guess you took the last sentence of the warning to mean that it is non-deterministic. That is not now I read it, but I guess that is open for interpretation. I think it just means that it doesn't do what many people would expect because it is hard to follow the logic and therefore leads people to coding the incorrect logic.

    Is it dangerous? Yes, it is dangerous for developers or DBAs to write and run code they do not understand.

    Should it generally be avoided? Yes, we should avoid writing code that is not easy for someone else to understand.

    Should we try to understand it? I think the answer is "yes". From this discussion it seems that the consensus is that it is dangerous, non-deterministic, and unexpected and therefore can't be understood. But my point is that it can be understood if you look at the logic and that SQL Server is doing exactly what that logic would prescribe.

  • Matt Marston (2/14/2013)


    okbangas (2/14/2013)


    My explanation might not be 100% accurate, but this is indeed the documentation I found when I ran into the issue demonstrated in the question. That it indeed may work in some cases just shows how difficult it is to create a QoT that is 100% correct in the current and coming versions of SQL Server.

    Thank you Ole for creating the QotD. I did not intend for my criticism of inaccuracies in the explanation to come across as a criticism of the question. Whether the answer is right or wrong, I find the value from most QotD is in the discussion.

    It is important to specify only the columns from the target table that are used for matching purposes. That is, specify columns from the target table that are compared to the corresponding column of the source table. Do not attempt to improve query performance by filtering out rows in the target table in the ON clause, such as by specifying AND NOT target_table.column_x = value. Doing so may return unexpected and incorrect results.

    (from the reference in the answer: http://technet.microsoft.com/en-us/library/bb510625.aspx).

    The first sentence specifies which columns of the target table should be used (only those used for matching). It does not say that the ON clause can only use columns from the target table (but it is stated that way in the explanation of the QotD). It then goes on to warn against filtering out rows in the target table in the ON clause and gives an example of what not to do using a comparison that only involves the target table. It says nothing about the use of filtering rows of the source table in the ON clause.

    To pick the note apart even further (I realize this is a nit) a merge statement doesn't actually return any results (unless used with an OUTPUT clause) so the last sentence isn't accurate.

    sknox (2/14/2013)


    While your code happened to work in this instance, it relies on behavior specifically described by Microsoft as non-deterministic, and so should be considered dangerous.

    I guess you took the last sentence of the warning to mean that it is non-deterministic. That is not now I read it, but I guess that is open for interpretation. I think it just means that it doesn't do what many people would expect because it is hard to follow the logic and therefore leads people to coding the incorrect logic.

    Is it dangerous? Yes, it is dangerous for developers or DBAs to write and run code they do not understand.

    Should it generally be avoided? Yes, we should avoid writing code that is not easy for someone else to understand.

    Should we try to understand it? I think the answer is "yes". From this discussion it seems that the consensus is that it is dangerous, non-deterministic, and unexpected and therefore can't be understood. But my point is that it can be understood if you look at the logic and that SQL Server is doing exactly what that logic would prescribe.

    I think you've proven how dangerous it is.

    The QoTD shows that when using NOT MATCHED that the src.RowNo =1 in the 'on clause' is not evaluated or at least not at the correct time. (Great question, I guessed 3, got it wrong and learned something new.)

    Your example shows when using MATCHED on existing rows that the RowNo is evaluated at the right time. (I thank you for this info, it certainly adds to the discussion.)

    One example does not disprove the other and I see a contradiction, or a cognitive dissonance here because both examples should not be true yet are. It should either work in all cases, or not work in all cases. This is likely why Microsoft are explicitly saying not to use it.

    If both these examples are predictably constant then it becomes a very strange set of rules that you could never infer, only memorise. Of course you always practically test your code to ensure it produces the right results, but surely a coding standard that does as Tom (L' Eomot Inversé) pointed out is much better, i.e. always do the filtering like this

    when not matched by target and src.RowNo = 1

    Because it always works without having to remember every quirk of usage.

  • davoscollective (2/14/2013)


    I think you've proven how dangerous it is.

    The QoTD shows that when using NOT MATCHED that the src.RowNo =1 in the 'on clause' is not evaluated or at least not at the correct time. (Great question, I guessed 3, got it wrong and learned something new.)

    Your example shows when using MATCHED on existing rows that the RowNo is evaluated at the right time. (I thank you for this info, it certainly adds to the discussion.)

    One example does not disprove the other and I see a contradiction, or a cognitive dissonance here because both examples should not be true yet are. It should either work in all cases, or not work in all cases. This is likely why Microsoft are explicitly saying not to use it.

    If both these examples are predictably constant then it becomes a very strange set of rules that you could never infer, only memorise. Of course you always practically test your code to ensure it produces the right results, but surely a coding standard that does as Tom (L' Eomot Inversé) pointed out is much better, i.e. always do the filtering like this

    when not matched by target and src.RowNo = 1

    Because it always works without having to remember every quirk of usage.

    There is no contradiction here. Let me try to explain...

    If you go back to the original QotD and replace the MERGE statement with this FULL OUTER JOIN it should illustrate what happened. It can be helpful to think of the MERGE as doing a FULL OUTER JOIN and then the different clauses (WHEN MATCH, WHEN NOT MATCHED) act on the appropriate subsets of the FULL OUTER JOIN results. Below is a SELECT statement with a FULL OUTER JOIN using the same ON clause as in the MERGE. Can you tell me what will be returned by this query?

    with CTE as (

    select

    id,

    val,

    row_number() over (partition by id order by val desc) as RowNo

    from A

    )

    select src.*, dest.*

    from CTE src

    full outer join B dest on src.id = dest.id and src.RowNo = 2;

    I presume you can correctly determine that this will return all the source rows and the dest columns are all null because the source rows didn't have any matches in the dest. The original merge statement specified to insert every source row that didn't join to a target (dest) row. The join demonstrates all of the source rows are without a matching destination row and therefore all of the source rows will be inserted (or they would if it weren't for the PK constraint violation).

    In my earlier example I modified the first merge statement so it would insert some rows in the dest. If you aren't following along from my previous example you can just insert some rows so there is data in table B.

    insert into b values

    (1,2),

    (2,3),

    (3,4);

    Then I demonstrated a second merge statement:

    -- This merge statement only updates 1 row.

    with CTE as (

    select

    id,

    val,

    row_number() over (partition by id order by val desc) as RowNo

    from A

    )

    merge into B dest

    using CTE src on src.id = dest.id and src.RowNo = 2

    when matched then

    update set val = src.val;

    Now image (or actually run) that as a full outer join.

    with CTE as (

    select

    id,

    val,

    row_number() over (partition by id order by val desc) as RowNo

    from A

    )

    select src.*, dest.*

    from CTE src

    full outer join B dest on src.id = dest.id and src.RowNo = 2;

    Or we can run it as an inner join because in the second MERGE example acts on the rows WHEN MATCHED.

    with CTE as (

    select

    id,

    val,

    row_number() over (partition by id order by val desc) as RowNo

    from A

    )

    select src.*, dest.*

    from CTE src

    inner join B dest on src.id = dest.id and src.RowNo = 2;

    That shows the src and dest for the one row that my second MERGE statement updates. In both of these examples SQL behaves exactly as prescribed by set based, relational logic.

  • OK I see your point, but it would also be bad practice to outer join on a constant. I think it's another reason to keep join conditions separate to filter conditions.

  • davoscollective (2/14/2013)


    The QoTD shows that when using NOT MATCHED that the src.RowNo =1 in the 'on clause' is not evaluated or at least not at the correct time. (Great question, I guessed 3, got it wrong and learned something new.)

    Actually no, it doesn't show that. It shows that you have to apply logic very carefully to see what effect a condition will have. The point is that if you are trying to restrict the effect of the operations to take place when NOT MATCHED by conditions in the ON clause, those conditions have to be such as to cause the rows you don't want to be operated on to match, not to ensure that they don't match. The opposite is of course true if you are trying to restrict the scope of the operations in the WHEN MATCHED part.

    That might mislead you to believe that the two following versions of the ON and WHEN clause combination are equivalent. Of course I hope that all of us agree that it is far more sensible to write the first version than the second, and would be even if they actually were equivalent:

    --version 1

    on src.id = dest.id

    when not matched by target and src.RowNo = 1 then

    --version 2

    on src.id = dest.id or src.RowNo <> 1

    when not matched by target then

    However, if there isn't a row in the target then no source row can be matched by the target (not even if the ON clause reads something crazy like "on 0=0") so the two versions are definitely not equivalent - one does filtering when the target is empty, the other doesn't.

    So except in some special cases the ON clause can only be used for conditions that are evaluated when it is checking whether two rows (one in the source and one in the target) match each other. It makes very good sense to avoid writing any other sort of condition there, as it's usually difficult to describe precisely the cases in which other conditions will work and hence difficult to be sure that the code will never encounter data that doesn't fit neatly into those cases and thus difficult to be sure the code will work.

    Tom

  • sknox (2/14/2013)


    it relies on behavior specifically described by Microsoft as non-deterministic, and so should be considered dangerous.

    I wonder why Microsoft allow such code to parse as valid, given that they recognise that it should never be used?

  • Right way is:

    merge into B dest

    using CTE src on src.id = dest.id

    when not matched by target and src.RowNo = 1

    then insert (id, val) values (id, val);

Viewing 15 posts - 16 through 30 (of 39 total)

You must be logged in to reply to this topic. Login to reply