Fact/Dimension design suggestions

  • I'm from Higher Education and I'm working on a warehouse so our recruitment folks can compare application counts of certain statuses between today and a year ago today.

    The data I'm working with from our ERP has a table that holds application records - there is a record for every application and what term they are starting in, among other things. There is a related table that holds the status associated with that application along with the date that status was made active.

    I need to display a nice, neat, easy to read graph showing the counts between the two dates but here's the kicker - for both dates I only want to display start terms that are/were being recruited for on that date.

    Any suggestions on building my DW database?

  • Bear with me, I'm going to try to extract every assumption I can make from this but there's not a lot to work with yet. I'm doing it so I can get you to correct any inaccurate assumptions, and we can go from there.

    wshelton 51985 (1/8/2013)


    I'm from Higher Education and I'm working on a warehouse so our recruitment folks can compare application counts of certain statuses between today and a year ago today.

    So, we're counting daily events by categories (status and/or application)?

    The data I'm working with from our ERP has a table that holds application records - there is a record for every application and what term they are starting in, among other things. There is a related table that holds the status associated with that application along with the date that status was made active.

    So, a table holding a list of applications, and another table that holds a status event for a particular date with a key relationship.

    I need to display a nice, neat, easy to read graph showing the counts between the two dates but here's the kicker - for both dates I only want to display start terms that are/were being recruited for on that date.

    You've combined two rules here.

    First, you need to limit your data. You only want data that the application table shows a start date within your range? The data in the graph will also be limited to status events that fall within the parameters.

    Next, you want a graph of some kind. This is a display component and won't be part of your warehouse directly but some other tool, probably SSRS or Excel. Counts are to be done by total? By month? By day? Need more information here.

    So, some questions:

    You mention trying to turn this into a fact/dimension structure, but it sounds like you already have that. Your status event table is your fact table with your applications table being a dimension of that. Are you trying to push this into SSAS cubes or something like that, or are you looking to build a new structure? Most of what you've described can be leveraged out of the existing structure, from what it sounds like.

    Next, any chance we can see some schema and sample data for your existing tables? If you check out the first link in my signature it'll walk you through what I (and some others) would like to see here to get a better idea of what you're dealing with.

    Finally, where are you looking to display this end product in? I doubt your users will be logging into SSMS so my guess is there's a front end somewhere that you want to show this on.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Sorry for the delay in responding - it's been crazy.

    I need to count not daily events - the number of events that took place on that date. I need to count the number of applications with a certain status on a certain date. The status could have a date of a previous date but it's still considered the 'active' status until it changes to something else.

    All the data i'm working with so far is in our ERP database - i'm simply trying to figure out how to include all this in a warehouse - I have no fact/dimension structure built.

    Once this is in place I need to take it a step further and implement the other item along with this - get the status totals on a certain date for ONLY the terms that were being recruited for on that day.

    It's crazy and my head starts spinning every time I start working on it.

  • OK, here's the tables:

    CREATE TABLE [dbo].[APPLICATIONS](

    [APPLICATIONS_ID] [varchar](10) NOT NULL,

    [APPL_APPLICANT] [varchar](10) NULL,

    [APPL_START_TERM] [varchar](7) NULL,

    CONSTRAINT [PK_APPLICATIONS] PRIMARY KEY CLUSTERED

    (

    [APPLICATIONS_ID] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [IDX]

    ) ON [IDX] TEXTIMAGE_ON [PRIMARY]

    GO

    CREATE TABLE [dbo].[APPL_STATUSES](

    [APPLICATIONS_ID] [varchar](10) NOT NULL,

    [APPL_STATUS] [varchar](5) NULL,

    [APPL_STATUS_DATE] [datetime] NULL,

    CONSTRAINT [PK_APPL_STATUSES] PRIMARY KEY CLUSTERED

    (

    [APPLICATIONS_ID] ASC,

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [IDX]

    ) ON [IDX]

    GO

    The APPLICATIONS table relates to the APPL_STATUSES table by the APPLICATIONS_ID. Whenever a status of an application changes a record is written in APPL_STATUSES along with the status and the date. That is considered the active status until another one comes along with a new date.

    The users want a dashboard to display all this and we are looking at some combination of SSRS/Sharepoint 2010/PerformancePoint.

  • wshelton 51985 (1/14/2013)


    All the data i'm working with so far is in our ERP database - i'm simply trying to figure out how to include all this in a warehouse - I have no fact/dimension structure built.

    If all you need is a report, you can get it directly from the ERP tables.

    On the other hand, if you are planning in building a Data Warehouse, you do not plan a Data Warehouse based on a single report - at least you plan for a whole Datamart e.g. one FACT table describing whatever is the Datamart's domain surrounded by as many DIMensions as needed, a basic star schema.

    Posts suggest a heavy inclination in looking at it from a developer's point of view - that's pretty bad news when planning on a Data Warehouse.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • The APPLICATIONS table relates to the APPL_STATUSES table by the APPLICATIONS_ID. Whenever a status of an application changes a record is written in APPL_STATUSES along with the status and the date. That is considered the active status until another one comes along with a new date.

    Just confirming, what you have in statuses is something like this:

    INSERT INTO APPL_STATUSES

    VALUES ( 1, 'start', '20100101'),

    ( 1, 'other', '20100115'),

    (1, 'ended', '20100119')

    Now, you want to count this application as start on each day for 01/01 through 01/14, then 01/15 through 01/18 is other.

    So, if you wanted this to be easy, it'd really look like this:

    INSERT INTO APPL_STATUSES

    VALUES ( 1, 'start', '20100101'),

    ( 1, 'start', '20100102'),

    ( 1, 'start', '20100103'),

    ( 1, 'start', '20100104'),

    ( 1, 'start', '20100105'),

    ( 1, 'start', '20100106'),

    ( 1, 'start', '20100107'),

    ( 1, 'start', '20100108'),

    ( 1, 'start', '20100109'),

    ( 1, 'start', '20100110'),

    ( 1, 'start', '20100111'),

    ( 1, 'start', '20100112'),

    ( 1, 'start', '20100113'),

    ( 1, 'start', '20100114'),

    ( 1, 'other', '20100115'),

    ( 1, 'other', '20100116'),

    ( 1, 'other', '20100117'),

    ( 1, 'other', '20100118'),

    (1, 'ended', '20100119')

    And then you could simply grab the status on any particular day and sum it up.

    Now, how long does the 'ended' status go for and how long does it need to be considered? I assume something that 'ended' four years ago would really not be part of the report anymore?


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • I believe Paul B answered best but I will reiterate; why does a Data Warehouse need to be built in this instance?

    You could probably create a linked server to the ERP System and run a very simple aggregated OPENQUERY statement such as;

    Select count(*), Application_Status, Application_Date from OPENQUERY(ERP, 'select Application_status, Application_date from library.table')

    group by application_status, Application_Date.

    That may not be 100% what you are after, but after reading the request I believe that is close.

    Data Warehouses store volumes of data that generally span multiple business functions (i.e, financials, sales order, operational data etc)

    At best you have a very limited and localized Data Mart.

    Link to my blog http://notyelf.com/

  • The user wants to compare this year to last. You can create two temp tables representing counts this year and last, grouped on date (no time portion) and status. Join the two recordsets together using somthing like

    ...

    on datepart(d, t.dateLastYear)=datepart(d, v.dateThisYear) and

    datepart(m, t.dateLastYear) =datepart(m, v.dateThisYear)

    You may want to default the counts to zero in the temp tables if you think there will be days where the count is zero ( not likely from what I understand). but if so do a full outer join on the two sets of records.

    I would not go through the trouble of building a dimensional database then a cube just for this ad-hoc query, even though you may understand it is easier in MDX.

    ----------------------------------------------------

  • The Kimball Group usually describes the application process as an event pipeline with predictible milestones. As such, if you were to design a data warehouse like structure for this process then the accumulating snapshot is a nice choice. Here is the section of the website describing accumulating snapshots:

    http://www.kimballgroup.com/2002/06/13/design-tip-37-modeling-a-pipeline-with-an-accumulating-snapshot/

    What's more? He even uses something very close to your business requirement in this article!

  • there is a record for every application and what term they are starting in, among other things

    I wouldn't be surprised if the OP has something similar looking in the relational database already.

    ----------------------------------------------------

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply