Joining INSERTED and UPDATED tables when PK column changes

  • Hi SSC,

    First off, as should probably be as common as to need no explanation, I did not build the system in question, so "best practices" aren't my immediate concern.

    Second, I've given a lot of detail about my particular situation, but if it's tl;dr for you, you can skip to the end and still get an idea of what I'm asking.

    I'm updating a trigger on a table with a highly composite primary key (there are 9 columns in it). Naturally, some of these values are changed from time to time. Consider the following two tables. They're both nauseatingly similar, but the PKs are slightly different.


    if object_id('tempdb.dbo.#data') is not null drop table #data

    create table #data
    (
      -- Keys
      customer varchar(16) not null,
      symbol varchar(20) not null,
      item smallint not null,
      item_value float not null,
      delivery int not null,
      SymbolType varchar(13) not null,
      CriteriaId varchar(32) not null,
      DestinationID varchar(64) not null,
      ListId varchar(20) not null,
      -- Other stuff
      -- ...
      primary key clustered (customer, symbol, item, item_value, delivery, symboltype, criteriaid, destinationid, listid)
    )

    if object_id('tempdb.dbo.#Other') is not null drop table #Other
    create table #Other
    (
      -- Keys
      customer varchar(16),
      symbol varchar(20),
      item smallint,
      item_value float,
      delivery int,
      SymbolType varchar(13),
      AlertCriteriaId varchar(64),
      RAID varchar(30),
      -- Other stuff
      -- ...
      primary key clustered (customer, symbol, item, item_value, delivery, symboltype, alertcriteriaid, RAID)
    )

    Now, the situation I have right now is how to handle an update to #data.symbol so it cascades to #Other.symbol. I should also note that without some MAJOR re-architecture, there is no way I can hook these up to a foreign key which cascades.

    Ok, so that's the setup. The problem I have is how do I update #Other.Symbol in the trigger? I can certainly identify the DELETED rows, because all those columns still hold their original values, and should (in theory) match the key columns in #Other. The problem is, how do I then tie in the INSERTED table data to update #Other.Symbol with the newly inserted value in #data.Symbol? I can't join INSERTED to #Other on the full primary key, because the critical field, Symbol, has changed and won't match any row; and without Symbol, the remaining key columns could match rows in #Other with ANY other Symbol value.

    To make a long story short, my question boils down to this: If a primary key column changes, how can you correctly pair up rows from the trigger tables INSERTED and DELETED, since the fields which uniquely identify each are no longer in sync?

    Executive Junior Cowboy Developer, Esq.

  • Xedni - Wednesday, February 15, 2017 4:57 PM


    And therein lies the problem with the existing design... However... it may be ugly, but there is a way to tell which is which, though only if just one field is being updated and only one record has been updated. So if for any reason a single update hits multiple records with changes to multiple fields in the PK, you may well be just plain SOL. What you would need to do is build a query to compare all the PK fields between INSERTED and DELETED. Such a query probably just doesn't belong in a trigger, which is where the ugly comes in. Another alternative is to add a new unique identity field to each of these tables, which in the long run might be a much better choice.
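    As a rough illustration of that single-row case, a trigger could simply refuse multi-row updates and pair the lone inserted/deleted rows directly (a sketch only, untested, with only a few of the nine key columns shown; the real join would list all of them):

    ```sql
    -- Sketch: only safe when exactly one row was updated, because then
    -- inserted and deleted each hold a single row and can be paired
    -- without needing a join key at all.
    CREATE TRIGGER trg_data_single_row ON dbo.data
    AFTER UPDATE
    AS
    BEGIN
      -- Bail out on multi-row updates; the pairing is ambiguous there.
      IF (SELECT COUNT(*) FROM inserted) <> 1
      BEGIN
        RAISERROR('Multi-row PK updates are not handled by this trigger.', 16, 1);
        ROLLBACK TRANSACTION;
        RETURN;
      END;

      -- Match #Other on the OLD key values, then set the NEW symbol.
      UPDATE o
      SET    o.symbol = i.symbol
      FROM   dbo.Other AS o
             INNER JOIN deleted AS d
                     ON o.customer = d.customer
                    AND o.symbol   = d.symbol
                    AND o.item     = d.item   -- ...and the remaining key columns
             CROSS JOIN inserted AS i;        -- safe: inserted has exactly one row
    END;
    ```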

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Disclaimers:
    1. a redesign of your PKs would be a much better choice.
    2. This approach makes a big assumption and may not work consistently in larger volumes, but it was a fun challenge and works in my limited testing so I figured I would post it.
    The big assumption is that rows are added to the inserted and deleted tables in the same order.  Logically it makes sense, but I have no way to confirm it is technically correct.  Your mileage may vary.

    I modified your table & PK definitions slightly since they didn't match up, and made them permanent tables.
    In an update trigger on the Data table, I've assigned row numbers to the records in the inserted and deleted tables, then used that as the join key to match them up.  To make the process easier to see, I inserted the output into an Intermediate table, which has an insert trigger to update the Other table.  I don't think the intermediate table is necessary; I used it just to capture the data.

    I generated data using RedGate and tested a few combinations.  It worked consistently each time and kept the Other table in sync with the Data table, even when all but one column was updated (must stay unique) for multiple rows.
    I'm looking forward to the conversation this will generate. 🙂

    use tempdb;

    CREATE TABLE [dbo].[Intermediate](
     [Icustomer] [VARCHAR](16) NOT NULL,
     [Isymbol] [VARCHAR](20) NOT NULL,
     [Iitem] [SMALLINT] NOT NULL,
     [Iitem_value] [FLOAT] NOT NULL,
     [Idelivery] [INT] NOT NULL,
     [ISymbolType] [VARCHAR](13) NOT NULL,
     [ICriteriaId] [VARCHAR](32) NOT NULL,
     [IDestinationID] [VARCHAR](64) NOT NULL,
     [IListId] [VARCHAR](20) NOT NULL,
     [Dcustomer] [VARCHAR](16) NOT NULL,
     [Dsymbol] [VARCHAR](20) NOT NULL,
     [Ditem] [SMALLINT] NOT NULL,
     [Ditem_value] [FLOAT] NOT NULL,
     [Ddelivery] [INT] NOT NULL,
     [DSymbolType] [VARCHAR](13) NOT NULL,
     [DCriteriaId] [VARCHAR](32) NOT NULL,
     [DDestinationID] [VARCHAR](64) NOT NULL,
     [DListId] [VARCHAR](20) NOT NULL
    ) ON [PRIMARY]
    CREATE TRIGGER trg_data
    ON dbo.data
    AFTER UPDATE
    AS
    BEGIN
      WITH cInserted
      AS (SELECT    *,
                    RowSeq = ROW_NUMBER() OVER (ORDER BY (SELECT    NULL))
          FROM  Inserted
         ),
           cDeleted
      AS (SELECT    *,
                    RowSeq = ROW_NUMBER() OVER (ORDER BY (SELECT    NULL))
          FROM  Deleted
         )
      INSERT INTO   dbo.Intermediate ( Icustomer,
                                       Isymbol,
                                       Iitem,
                                       Iitem_value,
                                       Idelivery,
                                       ISymbolType,
                                       ICriteriaId,
                                       IDestinationID,
                                       IListId,
                                       Dcustomer,
                                       Dsymbol,
                                       Ditem,
                                       Ditem_value,
                                       Ddelivery,
                                       DSymbolType,
                                       DCriteriaId,
                                       DDestinationID,
                                       DListId
                                     )
      SELECT    cInserted.customer,
                cInserted.symbol,
                cInserted.item,
                cInserted.item_value,
                cInserted.delivery,
                cInserted.SymbolType,
                cInserted.CriteriaId,
                cInserted.DestinationID,
                cInserted.ListId,
                cDeleted.customer,
                cDeleted.symbol,
                cDeleted.item,
                cDeleted.item_value,
                cDeleted.delivery,
                cDeleted.SymbolType,
                cDeleted.CriteriaId,
                cDeleted.DestinationID,
                cDeleted.ListId
      FROM  cInserted
        INNER JOIN cDeleted
          ON cInserted.RowSeq = cDeleted.RowSeq;
    END;
    CREATE  TRIGGER trg_intermediate
    ON dbo.Intermediate
    AFTER INSERT
    AS
    BEGIN
      WITH cte
      AS (SELECT    *
          FROM  dbo.Intermediate AS i
            INNER JOIN dbo.Other AS o
     --join on the old values
              ON i.Dcustomer = o.customer
                 AND i.Dsymbol = o.symbol
                 AND i.Ditem = o.item
                 AND i.Ditem_value = o.item_value
                 AND i.Ddelivery = o.delivery
                 AND i.DSymbolType = o.SymbolType
                 AND i.DCriteriaId = o.AlertCriteriaId
                 AND i.DListId = o.RAID
         )
      UPDATE    cte
     --update using the new values
      SET   customer = Icustomer,
            symbol = Isymbol,
            item = Iitem,
            item_value = Iitem_value,
            delivery = Idelivery,
            SymbolType = ISymbolType,
            AlertCriteriaId = ICriteriaId,
            RAID = IListId
    END;
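    A quick way to exercise the two triggers above (a sketch; the customer and symbol values are made up):

    ```sql
    -- Hypothetical smoke test: change symbol for one customer, then look
    -- at the old/new pairs captured in Intermediate and confirm Other
    -- was kept in sync by the second trigger.
    UPDATE dbo.data
    SET    symbol = 'NEWSYM'
    WHERE  customer = 'ACME'
      AND  symbol   = 'OLDSYM';

    SELECT * FROM dbo.Intermediate;                 -- inserted/deleted pairs
    SELECT * FROM dbo.Other WHERE customer = 'ACME'; -- should show NEWSYM
    ```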

    Wes
    (A solid design is always preferable to a creative workaround)

    Huh! That's a really clever approach (of course relying on the implicit ordering of those tables). Very interesting food for thought. But yeah, maybe I should just push for a different key. Thanks!

    Executive Junior Cowboy Developer, Esq.

  • sgmunson - Thursday, February 16, 2017 6:29 AM


    Yeah, and sadly that's the condition I have; I can't guarantee granular updates. The best I could come up with on the ride home was to delete everything out of the table I want to keep in sync (joining on the [deleted] table) and then perform a subsequent insert from the [inserted] table (left outer join where target is null kind of thing). However, that's sort of a colossal pain in the hindquarters.
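    That delete-then-reinsert idea might look something like this (a rough sketch only, untested, with just a few key columns shown; note it would also need to carry along #Other's own columns like RAID, or that data is lost on the reinsert):

    ```sql
    -- Sketch: resync Other by removing rows keyed on the OLD values,
    -- then re-adding rows built from the NEW values. Works no matter
    -- how many rows or key columns changed, at the cost of a full
    -- delete/insert cycle against the synced table.
    CREATE TRIGGER trg_data_resync ON dbo.data
    AFTER UPDATE
    AS
    BEGIN
      -- 1) Remove the rows whose old key values match deleted.
      DELETE o
      FROM   dbo.Other AS o
             INNER JOIN deleted AS d
                     ON o.customer = d.customer
                    AND o.symbol   = d.symbol
                    AND o.item     = d.item;  -- ...and the remaining key columns

      -- 2) Re-add rows from inserted that aren't already present
      --    (the "left outer join where target is null" pattern).
      INSERT INTO dbo.Other (customer, symbol, item /* , ... */)
      SELECT i.customer, i.symbol, i.item /* , ... */
      FROM   inserted AS i
             LEFT JOIN dbo.Other AS o
                    ON o.customer = i.customer
                   AND o.symbol   = i.symbol
                   AND o.item     = i.item    -- ...and the remaining key columns
      WHERE  o.customer IS NULL;
    END;
    ```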

    Thank you for the response. It's good to have another opinion on the matter.

    Executive Junior Cowboy Developer, Esq.

  • Xedni - Thursday, February 16, 2017 8:40 AM


    Yep... can't tell you how often just having a second set of eyes on your problem can, if nothing else, let you see things the "box" you were stuck in was hiding from you.   My gut says it's time to change the primary key to a single identity field and let that be the clustered index, then give this combination of a large number of fields its own unique nonclustered index.   Disk space, in relative terms, is cheap...

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • /nod

    Executive Junior Cowboy Developer, Esq.

  • whenriksen - Thursday, February 16, 2017 8:23 AM


    So I buy the argument that rows are added to inserted/deleted in sequence, but I don't know if I buy that that order can be guaranteed when selected back out. In something like a quirky update, we know those rows are physically ordered on disk like that and so you can, if careful, take advantage of that fact. But my understanding is the inserted/deleted tables are just stored in memory which means where they actually ARE in memory could be a complete mess. It's completely possible I'm missing some element of this, but can you explain why you think order will be maintained in your sample code?

    Executive Junior Cowboy Developer, Esq.

  • Xedni - Thursday, February 16, 2017 12:27 PM


    It depends on the inserted and deleted tables being populated in the same sequence, and uses ROW_NUMBER as a unique identifier for the tables to be joined on, as a pseudo identity key.

    I don't believe SQL Server operates at the physical memory address layer and would not expect the location in memory to affect the order of records in a table.  It may affect the order in which records are touched in parallelism, but as long as the records arrive in inserted & deleted in the same order, the row_number works to join the records together.

    From what I understand, deep under the set-based covers, SQL Server still handles each record individually.  It seemed logical to me that the update to a specific record in a set should cause inserts into both Inserted & Deleted before the next record in the set is processed.

    However, I very easily could be wrong.  I think the underlying structure of the inserted/deleted tables would need a deep dive by one of the resident masters here. 🙂

    The "Can I" part of the question I think is answered with a Yes, but for the "Should I", I would probably lean toward No.  Although this exercise is interesting and may very well work, I would consider it experimental code, possibly fragile, and suspect after patch/hotfix.  In the end, I still think redesigning your PK is the best option.  A solid design is always preferable to a creative workaround.  ( I think I'll use that as my signature!)

    Wes
    (A solid design is always preferable to a creative workaround)

  • whenriksen - Thursday, February 16, 2017 12:59 PM


    Yeah, I fully agree with you on redesigning the PK and believe me, I'm pushing for that. As for your solution, maybe I'll build it on my local instance and just see if I can try to break it 🙂

    Executive Junior Cowboy Developer, Esq.

  • You could add a unique identity column without touching the existing primary key, and that identity value would still allow you to join the deleted and inserted tables accurately.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher - Thursday, February 16, 2017 4:48 PM


    Scott,

    I suggested that early on page 1...

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • sgmunson - Thursday, February 16, 2017 9:25 PM


    I saw that, but I just wanted to make it clear to the OP that the identity did not have to become the PK, it could be in addition to that.  So many people have become so fixated on having identity as the a priori / "default" clustering key that they almost forget you can add identity without clustering on it :-).

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher - Thursday, February 16, 2017 4:48 PM


    If multiple rows are inserted in a single batch and the clustered indexes are not identical between tables, wouldn't the identity keys potentially be different?

    Wes
    (A solid design is always preferable to a creative workaround)

  • whenriksen - Friday, February 17, 2017 8:31 AM


    If I understand the problem correctly,

    1. The identity field does not have to be clustered. It is just a unique number that is an index unto itself.
    2. Only Table A would have the identity attribute. Table B would have just an int column (unique index) that will contain the identity value from Table A.

    Setting it up would go something like this:

    Create an identity int in Table A and an int in Table B.
    Populate the identity int in Table A
    Join the two tables on the primary key and update Table B's int with the Table A's identity value.

    Now when the primary key changes in Table A, join on the identity value and update the primary key values. Done.
    If there is a batch of Inserts into Table A, just insert the data into Table B including the new identity from Table A. Done.

    This would work well with MERGE.
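    For illustration, the setup steps above might go roughly like this (a sketch only; TableA/TableB are hypothetical names, and only a few of the key columns are shown in the backfill join):

    ```sql
    -- Sketch of the identity-bridge setup (hypothetical table names).
    -- 1) Add an identity to Table A and a plain int to Table B.
    ALTER TABLE dbo.TableA ADD RowId int IDENTITY(1,1) NOT NULL;
    ALTER TABLE dbo.TableB ADD ARowId int NULL;

    -- 2) Backfill Table B's int by joining on the current PK values,
    --    then enforce uniqueness so it behaves like a key.
    UPDATE b
    SET    b.ARowId = a.RowId
    FROM   dbo.TableB AS b
           INNER JOIN dbo.TableA AS a
                   ON b.customer = a.customer
                  AND b.symbol   = a.symbol
                  AND b.item     = a.item;  -- ...and the remaining key columns

    CREATE UNIQUE INDEX UX_TableB_ARowId ON dbo.TableB (ARowId);

    -- 3) From now on, an update trigger can pair inserted and deleted
    --    on RowId alone, because RowId never changes when the
    --    business-key columns do.
    ```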

    __________________________________________________________________________________________________________
    How to Post to get the most: http://www.sqlservercentral.com/articles/Best+Practices/61537/
