SCD maintenance with the help of MERGE

  • Hello folks,

    I am trying to maintain SCD type 2 based on merge statement and here are my cases and results:

    1. If source and target contain same records, then no action. Successful.

    2. If source and target match based on matching ID but there is a change in data (i am using binary_checksum), then disable current record in target. Successful.

    3. Finally insert new record with changes (as second part of step 2). Succesful.

    4. If not matched then new record inserted in Target. Successful.

    5. If not matched and no record in source (but exists in target), update target EndDate as today and mark it as disable. Not Successful.

    For point 5, is there a way to implement using the same merge OR i will need to write separate LEFT JOIN ?

    Also is MERGE a good way to maintain SCD type 2 ? Are there any good ways to implement it ? My generic columns are SCDStartDate, SCDEndDate and Is_Latest.

    Your inputs will be highly appreciated.

  • I feel like I've run across this before.  I'll give it some thought, but for now...have you considered Temporal Tables?

    https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables?view=sql-server-2017

     

  • I'm assuming your doing a full load, right?  You couldn't be soft-deleting target rows during an incremental load.  A note of caution: when doing a full load you need to ensure that the source has fully loaded into stage before doing your soft-delete.  If for example, the load was only partial, you would end up deleting too many rows.

    On a full load, I do this before merging:

    if @IsFullLoad = 1 
    begin;
    -- Get rows to be deleted
    -- Ok to re-delete
    select
    e.ProjectKey
    into
    #RowsToDelete
    from
    [dbo].[DimOperationsProjects] e
    where
    e.ProjectSourceSystem = @SourceSystem
    and not exists (
    select 1
    from [Stage].[DimOperationsProjects] s
    where
    s.LoadGUID = @LoadGUID
    and s.ProjectSourceSystemId = e.ProjectSourceSystemId
    and [IsRowError] = 0
    );

    -- Delete facts (if any)

    -- Delete members (exclude special "none" member)
    update [dbo].[DimOperationsProjects] set ProjectWhenDeleted = getdate() where ProjectKey in (select r.ProjectKey from #RowsToDelete r) and ProjectKey <> -1;

    drop table #RowsToDelete;
    end;

     

  • one last point, you could delete during an incremental load IF you had a field from the source indicating the row was deleted.

  • Hi Heb, i dont want to delete anything from target table. My source table is a truncate and load feed. Once source is loaded properly, then only other process will trigger that will run the merge statement. As i said, there will be no deletes from target table. So if a record is present in Target table which is not present in source table, then that records should be disabled and SCDEndDate should be updated as today's date.

  • I think we're on the same page.  You said, "if a record is present in Target table which is not present in source table, then that records should be disabled and SCDEndDate should be updated as today's date."  In my mind, that's a soft-delete.

  • Yes, i am talking about same soft delete. However my following condition is not working:

    When not matched with source and is_latest = 'Y' Then update is_latest = 'N', SCDEndDate = getdate()

  • Did you look at the code I gave you?  I wanted to show you how I handle the same situation you are facing.

  • Yes i checked and basically you are updating the status before the merge statement. Is it possible to do all in one single merge statement ?

  • Gotcha.  I went back to SSMS (rather than pull something I already had)...how about this?  Is this what you need?

    declare @t table (cola int, IsActive bit)
    declare @s table (cola int)

    insert into @t values (1,1),(2,1)
    insert into @s values (1),(3);

    merge @t as target
    using (select * from @s) as source
    on target.cola = source.cola
    when matched then update set target.IsActive = 1
    when not matched then insert (cola,IsActive) values (source.cola,1)
    when not matched by source then update set IsActive = 0;

    select * from @t
  • Exactly that's what i am looking for but "when not matched by source" is not working for the UPDATE. It throws error like "cannot insert the value NULL into column 'ID', table dbo.target; column does not allow nulls. INSERT fails"

  • You're saying that without the "when not matched by source" clause it works?  That would be interesting since the error message is referring to an insert action.  Do you have any triggers on the table?

  • No trigger. Even i was surprised by this error. When i remove this "when not matched by source", it runs smoothly. Just that the record which is present in target but not in source remains there as active. Its this record which i dont want to delete but disable and hence need update.

  • In addition, if i mention "when not matched by source then delete", it works like a charm and deletes the record which is present in target but not in source.

  • wow, that is interesting!

    I would write an update query (similar to what I showed you) before the merge rather than spend a bunch of time trying to solve this all in a merge statement.  The update query prior to the merge would simply update the rows in the target which are not in the source - very simple query.

     

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply