Leveraging MERGE for type 2 SCD; logic for most current record

  • I am a new data warehouse developer and trying to leverage the MERGE statement to manage type 2 SCD's. I've been consulting many technical articles regarding MERGE and SCD's and have a base understanding of the statement but notice many variations inits construction.

    Im uncertain if I can leverage the statement for the application that I need an looking for guidance on syntax structure. In short, I have a Job table that marries a person to a position. The unique keys of the source system table and what I'd like to keep the history of are:

    EMPID; the employeeID

    RCD; unique instanceof a job for an employee with multiple concurrent jobs

    EFFDT; when thetransaction became or becomes formally active or operational

    EFFSEQ; indicatessequence of multiple transactions for an employee on a job

    Theres not a traditional primary key on this table, but an index primary key based on the above 4 fields- A unique instance of a record, in this case job, are the culmination of these fields.

    There can be more than one instance of a job for an employee, there are effectively dated records/version of those (unique) instances. What I’d like to do is 1) insert initial records then new records going forward into the Target or Dimension and then 2) mark the most recent, valid record as “Y” (or ISCURRENT = ‘Y’). The most recent record would be the MAX(EFFDT) and where EFFSEQ = 0 for each RCD or job instance - the image above is an example, this person has 2 jobs - there have changes to those job's code, department and position.

    I have a generic MERGE statement, WHEN NOT MATCHED BY TARGET, insert the records, but in the WHEN MATCHED part im a bit at a loss on how/where in syntax to isolate the most recent record (MAX(EFFDT) and where EFFSEQ = 0) OR if I should be nesting the logic in WHEN NOT MATCHED.

    Any guidance on structure or direction is much appreciated.

    _____________________________________________________________________
    As soon as you see something, you already start to intellectualize it. As soon as you intellectualize something, it is no longer what you saw. Suzuki-roshi

  • Regarding RCD >

    unique instanceof a job for an employee with multiple concurrent jobs

    . Can you explain more here what you mean by a unique instance? Why could the first three fields not fulfill the need of uniqueness? I guess I would like to understand the business process better here. 

    The record that is the most recent will have a NULL in the expire date, a column I think you are missing. This will equal the effective date of the next record that should enter the table for that entity whose changes you are tracking. 

    ----------------------------------------------------

  • Hi SSCoach, 

    Sure, and although its been a while I'm still acclimating to this source data set. As this table marries a job to an employee, an employee can have more than one job to compromise of their employment. There is a numeric measure later on in this table, FTE or "full-time equivalent". So this employee may have an FTE of .5 for position number 403 in the BIO department and another position number 404 with an FTE of 0.5 in the CHEM department for a total FTE of 1.0. Most will have a unique record(s) where there is only one RCD value, 0. If they have more than one job they'd have 0, 1, or even more - each distinct RCD record to an employee (or EMPID) would map to different position number, jobs codes or FTE, the first three fields wouldn't capture that requirement. 

    Ok, I can add an EXPDT field, that makes sense, I have been focused on an ISCURRENT field (a field you see often when looking at technical article re:MERGE). So where I am stuck or may be missing something is how (logic wise), and where in the MERGE syntax, would I nest that logic. 

    Does that explanation help?

    "This will equal the effective date of the next record that should enter the table for that entity whose changes you are tracking."
    Im not fully understanding, pondering this statement and such logic.

    Thank you for your review and mentions on this

    _____________________________________________________________________
    As soon as you see something, you already start to intellectualize it. As soon as you intellectualize something, it is no longer what you saw. Suzuki-roshi

  • vega805 - Monday, April 16, 2018 3:05 PM

    Hi SSCoach, 

    Sure, and although its been a while I'm still acclimating to this source data set. As this table marries a job to an employee, an employee can have more than one job to compromise of their employment. There is a numeric measure later on in this table, FTE or "full-time equivalent". So this employee may have an FTE of .5 for position number 403 in the BIO department and another position number 404 with an FTE of 0.5 in the CHEM department for a total FTE of 1.0. Most will have a unique record(s) where there is only one RCD value, 0. If they have more than one job they'd have 0, 1, or even more - each distinct RCD record to an employee (or EMPID) would map to different position number, jobs codes or FTE, the first three fields wouldn't capture that requirement. 

    Ok, I can add an EXPDT field, that makes sense, I have been focused on an ISCURRENT field (a field you see often when looking at technical article re:MERGE). So where I am stuck or may be missing something is how (logic wise), and where in the MERGE syntax, would I nest that logic. 

    Does that explanation help?

    "This will equal the effective date of the next record that should enter the table for that entity whose changes you are tracking."
    Im not fully understanding, pondering this statement and such logic.

    Thank you for your review and mentions on this

    I still see other columns like JobID and JocCode that add a little confusion. Maybe those are irrelevant to what you want. 

    Simply put , all my statement says is something like the following illustration
    EMP   JOB   From         To
    -------------------------------------------
    1         A       12/1/2016  1/31/2017
    1        B        1/31/2017  5/7/2017
    1        A         5/7/2017    NULL
    2        A         1/31/2017   5/7/2017
    2        C         5/7/2017    NULL

    The TO date of the current record is the FROM date of the next row in the group (if there is a next row). 
    In the above employee 2 did employee 1's job while emp 1 was doing job B, just a hypothetical illustration.

    ----------------------------------------------------

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply