Possibly missing some change captures

  • Hey guys,
    I am tracking employee changes daily in a DimPerson dimension table, and filling up my fact table each end-of-month and counting Hires, Exits, and Headcount.
    For this example, let's say I will be populating the fact table end-of-month April 30th. Now here's the problem I am facing:
    I have an employee record on April 17th that's a "Hire" action, so at that point in time my DimPerson table reads like this:
    EmpNo | Firstname | LastName | Action | EffectiveStartDate | isCurrent
    4590        John            Smith         Hire         4/17/2017               Y
    Now 2 days later, I see the same employee but with an action "Manager Change", so now my DimPerson table becomes this:
    EmpNo | Firstname | LastName | Action                   | EffectiveStartDate | isCurrent
    4590     John            Smith           Hire                              4/17/2017               N
    4590     John            Smith           Manager Change         4/19/2017               Y

    So at Month end, when I select all "Current" employees, I will miss the Hire capture for this person since his most recent record is just a manager change and the actual hiring happened "in-month". 
    What you recommend I do to capture the Hire action in this case?

  • Regardless of best practices and methodologies from proven methods, your design at the core is saying that new hire record is not an active record. Therefore, when you select all records that are active, you only get the records that meets the rules of your design. Classic case of, working as intended.

    The easiest approach here is to make both records active with the "Y" flag. This would result in having duplicate records for mister Smith. This of course is not the end of the world, but unless the end user knows that, you could cause a lot more trouble than it's worth with those duplicates in the dim table on top of violating any row-level unique constraints.

    Another easy approach is to actually do nothing. It's not like you don't have a log of the history. The purpose of your Type 2 dimension is that you are tracking the history of change for that employee whether it's the hire date, manager change, becomes a manager or even departs the company. At some point, all of these changes can happen within the same month, logically. Thus, you should design around the fact you do have that history. What that means is you don't complicate your dim design, you instead take a different approach to counting these changes elsewhere.

    I would likely take the last option. I would alter my systems on how I count these changes. Being both of these changes for the same employee happens on different days, you're basically saying that you fine with counting history. The question is whether you're fine with counting history for all changes or just new hires.

    i.e.: something like this:


    SELECT
         MONTH([EffectiveStartDate]) AS [Month]
        ,COUNT(CASE WHEN [isCurrent] = 'Y' AND [Action] <> 'Exit' THEN 1 ELSE NULL END) AS [Headcount] --Should only be one active per employee who is not exited
        ,COUNT(CASE WHEN [Action] = 'Hire' THEN 1 ELSE NULL END) AS [Hires] --This is counting across actives and non-actives
        ,COUNT(CASE WHEN [Action] = 'Exit' THEN 1 ELSE NULL END) AS [Exits] --Same as hires in case someone gets rehired same month.
    FROM DimPerson 
    WHERE [EffectiveStartDate] >= '4/01/2017'
    AND [EffectiveStartDate] < '05/01/2017'
    GROUP BY
         MONTH([EffectiveStartDate])

    This should count across hires and exits regardless of active states. If you have 10 hires and they all quit within the same month, that should mean 10 Hire, 10 Exit. The question is whether or not you count these 10 as part of the headcount. This is when the isCurrent and Action not equal to 'Exit' comes into play. The 10 Hire records will have 'N' for their isCurrent flags and you're ignoring Exit records. Thus, they will not be part of the Headcount. If this is a count of active employees by the end of the month, the headcount will be right. Those 10 employees are not active by the end of the month because they quit to find greener pastures.

    At the end of the day, the morale of the story here is that you shift the isCurrent = 'Y' filter from your WHERE clause to your COUNT(CASE).

  • Thank you so much for the detailed response. One thing though, you are doing a query directly on the DimPerson table. Shouldn't we be joining this query into the PersonSnapshotFact table?

    Something like:


    INSERT INTO PersonSnapShotFact
    (
       PersonKey
     , HireCount
     , ExitCount
     , HeadCount
    )
    SELECT
      (SELECT PersonKey FROM DimPerson p WHERE p.PersonNum = stg.PersonNum AND p.isCurrent = 'Y')
    , (CASE WHEN `isCurrent`='Y' AND `Action` <> "Exit" THEN 1 ELSE 0 as HeadCount
    , (CASE WHEN `Action`="Hire" THEN 1 ELSE 0 as ExitCount)
    , (CASE WHEN `Action`="Exit" THEN 1 ELSE 0 as HireCount)
    FROM
    PersonStg stg
    WHERE EffectiveStartDate >= '4/01/2017'
    AND EffectiveStartDate < '05/01/2017'

    Sorry I am fairly new to Data Warehousing 🙂 I think this is how we would build up the fact table right?

  • Well yeah, was just using it for reference. I don't know how you're currently populating your Fact. I would assume though that you would use INSERT INTO WHERE NOT EXIST or MERGE statements too if you have primary keys on you Fact data in production.

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply