Need a suggestion for my SQL requirement

  • Hello everyone,

    Need some best solution for my below requirement.

    I need to Insert/update/delete some data into my target based as per my staging (meta) data.

     

    For example -

     

    Target Table: dbo.Employee (With some existing data)

     

    EmpID

    FirstName

    MiddleName

    LastName

    Designation

    UpdatedDate

    1

    FN1

    MN1

    LN1

    D1

    2018-01-01

    2

    FN2

    MN2

    LN2

    D2

    2018-01-01

    3

    FN3

    MN3

    LN3

    D3

    2018-01-01

    4

    FN4

    MN4

    LN4

    D4

    2018-01-01

    5

    FN5

    MN5

    LN5

    D5

    2018-01-01

    6

    FN6

    MN6

    LN6

    D6

    2018-01-01

     

    Staging Table: staging.Employee

     

    EmpID

    ColumnName

    ChangeType

    Value

    ChangeDate

    1

    FirstName

    Update

    FN1.1

    2018-01-02

    2

    FirstName

    Update

    FN2.1

    2018-01-02

    2

    MiddleName

    Delete

    -

    2018-01-02

    3

    LastName

    Update

    LN3.1

    2018-01-02

    3

    Designation

    Update

    D3.1

    2018-01-02

     

    After applying the staging changeto my target table, my target data should be like below.

     

    EmpID

    FirstName

    MiddleName

    LastName

    Designation

    UpdatedDate

    1

    FN1.1

    MN1

    LN1

    D1

    2018-01-02

    2

    FN2.1

    -

    LN2

    D2

    2018-01-02

    3

    FN3

    MN3

    LN3.1

    D3.1

    2018-01-02

    4

    FN4

    MN4

    LN4

    D4

    2018-01-01

    5

    FN5

    MN5

    LN5

    D5

    2018-01-01

    6

    FN6

    MN6

    LN6

    D6

    2018-01-01

     

    I thought generating required dynamic SQL DML commands as per the staging(meta) data.

    Could someone please share your thoughts if there is any best approach to achieve this?

     

  • IF OBJECT_ID('tempdb..#dbo_Employees') IS NOT NULL DROP TABLE #dbo_Employees;
    SELECT * INTO #dbo_Employees FROM (VALUES
     (1,CAST('FN1' AS VARCHAR(10)), CAST('MN1' AS VARCHAR(10)), CAST('LN1' AS VARCHAR(10)), CAST('D1' AS VARCHAR(10)), CAST('2018-01-01' AS DATE)),
     (2,'FN2', 'MN2', 'LN2', 'D2', '2018-01-01'),
     (3,'FN3', 'MN3', 'LN3', 'D3', '2018-01-01'),
     (4,'FN4', 'MN4', 'LN4', 'D4', '2018-01-01'),
     (5,'FN5', 'MN5', 'LN5', 'D5', '2018-01-01'),
     (6,'FN6', 'MN6', 'LN6', 'D6', '2018-01-01')
    ) d (EmpID,FirstName,MiddleName,LastName,Designation,UpdatedDate)
    IF OBJECT_ID('tempdb..#staging_Employees') IS NOT NULL DROP TABLE #staging_Employees;
    SELECT * INTO #staging_Employees FROM (VALUES
    (1, 'FirstName', 'Update', 'FN1.1', '2018-01-02'),
    (2, 'FirstName', 'Update', 'FN2.1', '2018-01-02'),
    (2, 'MiddleName', 'Delete', '-', '2018-01-02'),
    (3, 'LastName', 'Update', 'LN3.1', '2018-01-02'),
    (3, 'Designation', 'Update', 'D3.1', '2018-01-02')
    ) d (EmpID,ColumnName,ChangeType,Value,ChangeDate)
    -- Look at what we're going to do
    SELECT *
    FROM #dbo_Employees de
    CROSS APPLY (
     SELECT
      FirstName = MAX(CASE WHEN se.ColumnName = 'FirstName' THEN se.[Value] ELSE NULL END),
      MiddleName = MAX(CASE WHEN se.ColumnName = 'MiddleName' THEN se.[Value] ELSE NULL END),
      LastName = MAX(CASE WHEN se.ColumnName = 'LastName' THEN se.[Value] ELSE NULL END),
      Designation = MAX(CASE WHEN se.ColumnName = 'Designation' THEN se.[Value] ELSE NULL END),
      UpdatedDate = MAX(se.ChangeDate)
     FROM #staging_Employees se
     WHERE se.EmpID = de.EmpID
     GROUP BY se.EmpID
    ) x
    -- Do it
    UPDATE de SET
      FirstName = ISNULL(x.FirstName,de.FirstName),
      MiddleName = ISNULL(x.MiddleName,de.MiddleName),
      LastName = ISNULL(x.LastName,de.LastName),
      Designation = ISNULL(x.Designation,de.Designation),
      UpdatedDate = x.UpdatedDate 
    FROM #dbo_Employees de
    CROSS APPLY (
     SELECT
      FirstName = MAX(CASE WHEN se.ColumnName = 'FirstName' THEN se.[Value] ELSE NULL END),
      MiddleName = MAX(CASE WHEN se.ColumnName = 'MiddleName' THEN se.[Value] ELSE NULL END),
      LastName = MAX(CASE WHEN se.ColumnName = 'LastName' THEN se.[Value] ELSE NULL END),
      Designation = MAX(CASE WHEN se.ColumnName = 'Designation' THEN se.[Value] ELSE NULL END),
      UpdatedDate = MAX(se.ChangeDate)
     FROM #staging_Employees se
     WHERE se.EmpID = de.EmpID
     GROUP BY se.EmpID
    ) x
    -- Check if it has worked
    SELECT * FROM #dbo_Employees
    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM's solution looks good.

  • You could also look at the MERGE statement, which will do inserts, updates, and deletes from a source data set to a target data set in a single statement.  There are clauses for what to do when matching records are found (UPDATE), what to do when the source has records and the target does not (INSERT), and what to do when the source does not have records but the target does (DELETE).  These can be combined so any combination of those can be used.

    Documentation is found at Microsoft's website:  https://docs.microsoft.com/en-us/sql/t-sql/statements/merge-transact-sql?view=sql-server-2017

  • fahey.jonathan - Friday, November 30, 2018 11:10 AM

    You could also look at the MERGE statement, which will do inserts, updates, and deletes from a source data set to a target data set in a single statement.  There are clauses for what to do when matching records are found (UPDATE), what to do when the source has records and the target does not (INSERT), and what to do when the source does not have records but the target does (DELETE).  These can be combined so any combination of those can be used.

    Documentation is found at Microsoft's website:  https://docs.microsoft.com/en-us/sql/t-sql/statements/merge-transact-sql?view=sql-server-2017

    On second look, your source data is not normalized, so it may be difficult to get your data into a usable format for MERGE.  You could do one column at a time, like this:
    WITH    BaseData
    AS    (
        SELECT    EmpID,
            Value        AS FirstName,
            ROW_NUMBER() OVER (PARTITION BY EmpID ORDER BY ChangeDate DESC) AS RowNum
        FROM    #staging_Employees
        WHERE    ColumnName = 'FirstName'
        )
    MERGE    #dbo_Employees u
    USING    (
        SELECT    EmpID,
            FirstName,
            RowNum
        FROM    BaseData
        WHERE    RowNum = 1
        ) x
        ON    x.EmpID = u.EmpID
    WHEN    MATCHED THEN UPDATE
        SET    FirstName = x.FirstName
    WHEN    NOT MATCHED BY TARGET THEN INSERT
        (EmpID, FirstName)
        VALUES
        (x.EmpID, x.FirstName);

    You could create a more complex query to get your staging data into a normalized format.  I think that would be more difficult, because you have to account of multiple changes to the same item and select only the latest.

  • fahey.jonathanChrisM@Work - Thanks for your inputs. I will consider these both option and choose one on their performance as we are having huge data in staging and target.

  • I'm not sure ChrisM's solution works in the general case - the various MAX's operate independently and so unless you can guarantee the values all ascend in time (which seems unlikely) you won't necessarily get the right end values. Try adding the line:

    (1, 'FirstName', 'Update','A2','2018-02-02')

    to the list and you'll see what I mean. You would need to rank the rows by date and take the most recent for any given field first to ensure you're getting the row values you actually require.

  • @rockys - Is it possible that the same EmpID could have multiple updates for the same column, at the same time? If so, how would you determine which of the multiple values gets updated to the dbo.Employee table?
    Looking at the Staging table, it gives the impression that it's wiped and reloaded on daily basis. If the UpdatedDate really does lack a time component, it could be difficult (more likely impossible) to determine which is the most resent.

  • This was removed by the editor as SPAM

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply