Loading Fact Tables - Step by Step Instructions Challenge

  • The top 10 are:

    1. What training has each staff member done or need to do

    2. Is each staff member compliant

    3. When next do they need to do their training

    4. Have each department, staff group and organsation as whole met the KPI set out by the organisation

    5. Can we monitor the percentage compliance by department and topic

    6. What is the trend monthly, quarterly, yearly etc

    7. Can we compare present data to historical data

    8. Can we make predictions based on the data

    These are the main questions for the moment.

  • Attendence/Course work is usually modeled by a methodology called a "Factless Fact Table". It may be appropriate for what you are trying to do.

    Here is an article/design tip by the Kimball Group on how to model this as such. While it is not EXACTLY the same veritcal industry, it is a similar concept. Read this and see if stimulates some ideas to help you connect the dots.

    http://www.kimballgroup.com/1996/09/02/factless-fact-tables/

    Also, it sounds to be as though there is an opportunity to create an Employee dimension with a hierarchy to enable you to roll up totals by department. Let me know if you'd like the design tips for modeling department hierarchies for employees.

    Additionally, there are tools out there for creating a more robust date dimension. The Kimball Group also has an excel spreadsheet that you can load in order to do this.

  • Very Good point - what are the top questions.

    Starting at question 1:

    1. What training has each staff member done or need to do

    This implies the following objects:

    Some kind of Staff Member dimension

    Some kind of Certification dimension

    As usual, there is a Time dimension involved here

    Some kind of fact to describe the historical and possibly planned events and metrics around Staff and Certification

    What training do they need to do? This would appear to boil down to an actual vs target situation. If they are at the same granularity then it might make sense to put the 'actual' and 'target' (and many other targets otherwise) into the same fact.

    But it appears you are already some way down the design path. and your current issue is loading the fact.

    Do you want to explain your issue in more detail and we can help.

  • I have been able to upload the dimensions that I will be using (most of them covers the suggestions given).

    My process. I have several source data tables.

    I create a table called Delegate which has as the source key of employee number.

    I then create a surrogate key called DelegateID to use in my Delegate dimension

    In all my other tables I create I am injecting the surrogate key DelegateID.

    e.g.

    SELECT

    [Staff Group]

    ,DelegateID

    ,GroupTypeID

    FROM

    (

    SELECT DISTINCT

    [Staff Group]

    ,del.DelegateID

    ,ROW_NUMBER() OVER(PARTITION BY esr.Employee ORDER BY DelegateID) AS RowNum

    ,CASE WHEN [Staff Group] IN ('Administrative and Clerical', 'Estates and Ancillary') THEN 2

    WHEN [Staff Group] IN ('Allied Health Professionals'

    ,'Add Prof Scientific and Technic'

    ,'Nursing and Midwifery Registered'

    ,'Additional Clinical Services'

    ,'Healthcare Scientists') THEN 1

    WHEN [Staff Group] = 'Medical and Dental' THEN 3

    ELSE 0

    END AS GroupTypeID

    FROM ESR esr JOIN Delegate del

    ON esr.Employee = del.Employee) x

    WHERE RowNum = 1

    The surrogate key for Delegate dimension is in the StaffGroup query. When I create the StaffGroup dimension, the StaffGroupID surrogate key will be created by IDENTITY.

    My dimension will look like:

    StaffGroupID

    DelegateID

    GroupTypeID

    StaffGroup

    Hope that is clear.

    Now, in creating the Fact Table, I am using the DelegateID as the anchor for the composite set of surrogate keys for for my fact

    e.g. of what should happen when the fact table is loaded is (made up data) :

    DelegateID, StaffGroupID, ConsentID, SafeguardingChildrenID, DateID, Status (fact)

    1 , 1, 4, 0, 1200, 1

    1, 1, 0, 5, 14, 1

    0 meaning NA.

    Hope this too is understandable.

    The problem I am facing is that when I try to load the fact table by using a SQL join I am getting data that doesis not correct. An example would be:

    Delegate 1 has done Consent and SafeguardingChildren and so on the JOIN, I will get four records

    Consent and the ConsentDate

    Consent and the SafeguardingDate

    Safeguarding and the SafeguardingDate

    Safeguarding and the ConsentDate

    Two of these records are incorrect. Perhaps my JOINS are not correct

    I tried using the LookUp function in SSIS but am running in to problems with the LookUps as it only loads some of the data. I am also not sure what to look up against. I am assuming it is the Delegate Table.

    This is where I come to a standstill.

    I hope this is clear and makes sense to others apart from myself.

    I would like to thank you all again for taking time to help me. I really do appreaciate and I am reading a lot and learning a lot as well.

    Regards,

    Marvin.

  • Delegate 1 has done Consent and SafeguardingChildren and so on the JOIN, I will get four records

    Consent and the ConsentDate

    Consent and the SafeguardingDate

    Safeguarding and the SafeguardingDate

    Safeguarding and the ConsentDate

    Two of these records are incorrect. Perhaps my JOINS are not correct

    For Delegate1, how many rows do you want to insert?

    1 rows with two measure values and two measure dates.

    2 rows each with a measure and a date and a measure type?

    If 1 row, then you need some kind of pivot, perhaps insert the delegateID and update Consent columns, then the Safeguarding columns (in a staging table).

    If 2 rows then maybe you need separate inserts. Join Delegate to Consent and insert. Join Delegate to Safeguarding and insert.

  • Simply put, your measurement events appear to be:

    1. Getting consent (possibly)

    2. Taking a course (definitely)

    Those events give context to the employee (working in a dept that is part of an ogranization) via a status (that can change based on the number and type of courses taken)? Can the status change and move along finite milestones such as "Not Certified", Certification in Process", "Partially Certified", "Fully Certified"?

    The end goal is compliance, which is based on a status Fully Certified in different areas of study such as "Slip and Fall"?

  • This is correct. My compliance can move through finite steps, however it has to be based on a set of courses having a 'Completed/Passed' status.

  • Is it important to go back in time and ask:

    "What was the compliance status of the department back on 1/1/2009?"

    Or, do your end users merely care about there "here and now" state of an employee's status?

  • Yes, it is important to know previous years deparment's compliance for comparison and trending and later on predictions.

  • Would you say the trending is fixed (as in merely year to year)? Or, do the end users need to be able to compare any periods to any periods (i.e. this quarter to last year, this month to last month, etc)?

    Also, is there a percentage compliance or a Key Performance Metric you must meet (say 80% Cardiac Nurses must be compliant by 2014--or "employee X must be 100% compliant by the end of the quarter or they're outta here!")? Are those metrics published at all?

    There are some designers that would put all the employee's compliance buckets into columns in the employee dimension (if those buckets were fixed) and, as part of the ETL process, update those buckets based upon the courses taken (courses stored in the fact table) or not taken (courses required but not existing in the fact table). It's called the "When A Dimension Becomes A Fact" concept.

    Joy Mundy of the Kimball Group wrote a nice concise article. In relating to this article, think of your employees as the entity and the courses taken as the measurement event.

    http://www.kimballgroup.com/2011/11/01/design-tip-140-is-it-a-dimension-a-fact-or-both/

  • tinyint is one byte, smallint is 2 bytes. Review how many values you can have for each (256 and 65K+ respectively) and use the right one for every dimension.

    I support this concept, but at least in SSAS 2005 the tinyint can't be made into a connection in the DSV. So I avoid it even if it would have been a good choice. For those, the smallint isn't much of an increase and allows you to use negative numbers for the Unknown and (if needed) Not Applicable Bucket.

Viewing 11 posts - 16 through 25 (of 25 total)

You must be logged in to reply to this topic. Login to reply