Dimensional DB Design Question

  • Dear all,

    I am familiar with Dimensional Modelling but here is a type of question I have not had to address before and I don't quite know how to model/handle this.

    We supply financial products to customers. A customer applies through an "Application".

    Depending on various factors, an Application may require an action from an Operator in order to progress. We call this a "Task".

    Once a "Task" is fullfilled, another "Task" maybe necessary and there may be a delay before that next task starts (the end of one task is not necessarily the beginning of the next).

    Tasks are grouped in Categories.

    We would like to be able to answer questions like: "What were the Applications that moved from a Task of category X to a Task of category Y within a particular period?"

    My guess would be that I will have a Fact table of Tasks with the following columns (foreign key to dimension tables)

    - TaskApplication -> DimApplication (which application does this task applies to)

    - Task_StartDateTime -> DimDate

    - Task_EndDateTime -> DimDate

    - TaskType -> DimTask (Type and Category)

    My difficulty is then: how do I (quickly) identify all applications that moved from one task of type X to another of type Y.

    Am I right in thinking that I should have a column in my fact table indicating the Task Rank for each Application (first task for that application is 1, second 2, etc)?

    Eric :unsure:

  • What if you use a factless table to keep track of the application? (Kimball used this method to keep track of which event happened, you could find it in his book.)

    FromTask - Dim Task

    FromDate - Dim Date

    ToTask - Dim Task

    ToDate - Dim Date

    Application - Dim Application

  • You may be right

    I'll have to go back to his books (which I have read some time ago)

    Thanks

  • While this approach will work, it looks like you're creating a highly sparse cube structure. (This is often a a marker, for using BI technologies for othger purposes.)

    From what I can see, you would be as far ahead simply designing a report from your operational data store, rather than trying to model this into a dimensional form. Some general quidelines that I've used on dimensional modelling:

    - The dimension should be fully specified, without regard to other dimensions (or fact tables for that matter).

    - Dimension values should be reusable via the intersection of the fact tables to other dimensions. For example, it should be a valid construct, where both customer A and customer B point to the same application. (I suspect that's not true in this case.)

    - Ideally, fact values have some aggregation associated with them. BI is used to quickly identify trends, do rollups, etc.

    Based on the comments from your original post, I'd expect the following:

    Customer Dimension - track all chars of the customer

    Time dimension - This is likely the "date sold", or when it was initiated.

    Task Type dimension - type, category, etc.

    Product dimension - details on the various financial products

    The "data dimension" represents the application, and it's various stages:

    Measures:

    Time executing task

    Application ID (rollup a distinct count of the application IDs)

    Start date (rollup the minimum date)

    End Date (rollup the maximum date)

    Operator (rollup a distinct count of the operators)

    Note: if the same task can occur multiple times for the same application, then you'll need to add a second time dimension.

    I hope this helps.

  • Kind of... Makes me think!

    I suggested to use a datamart instead of fullfilling the client's report request one by one and we are not planning to use Analysis Services (cubes) for the time being.

    In essence, I am just trying to create an easy base for reporting.

    Our view of the data is also partial as we are only an "intermediary" (the web front-end to mainframe back ends). Our client uses their own datawarehouse inhouse.

    I'll do a bit of reading

    Thanks.

  • Dear all,

    I am familiar with Dimensional Modelling but here is a type of question I have not had to address before and I don't quite know how to model/handle this.

    We supply financial products to customers. A customer applies through an "Application".

    Depending on various factors, an Application may require an action from an Operator in order to progress. We call this a "Task".

    Once a "Task" is fullfilled, another "Task" maybe necessary and there may be a delay before that next task starts (the end of one task is not necessarily the beginning of the next).

    Tasks are grouped in Categories.

    We would like to be able to answer questions like: "What were the Applications that moved from a Task of category X to a Task of category Y within a particular period?"

    My guess would be that I will have a Fact table of Tasks with the following columns (foreign key to dimension tables)

    - TaskApplication -> DimApplication (which application does this task applies to)

    - Task_StartDateTime -> DimDate

    - Task_EndDateTime -> DimDate

    - TaskType -> DimTask (Type and Category)

    My difficulty is then: how do I (quickly) identify all applications that moved from one task of type X to another of type Y.

    Am I right in thinking that I should have a column in my fact table indicating the Task Rank for each Application (first task for that application is 1, second 2, etc)?

    Eric

    Hi Eric,

    Your Fact Table looks just right to me. You simply need a query to answer your question. I see no reason why you can't have a query that interrogates the Fact table twice - something like:

    SELECT DISTINCT FACT_FROM.TaskApplication

    FROM FACT_TABLE FACT_FROM,

    FACT_TABLE FACT_TO,

    DIMDATE DATE_FROM_START,

    DIMDATE DATE_TO_START,

    DIMDATE DATE_TO_END,

    DIMTASK TASK_FROM,

    DIMTASK TASK_TO

    WHERE FACT_FROM.STARTDATETIME = DATE_FROM_START.DATETIME

    AND FACT_TO.STARTDATETIME = DATE_TO_START.DATETIME

    AND FACT_TO.ENDDATETIME = DATE_TO_END.DATETIME

    AND FACT_FROM.TaskApplication = FACT_TO.TaskApplication

    AND DATE_TO_START.DATETIME >= @DATE_FROM

    AND DATE_TO_END.DATETIME <= @DATE_TO

    AND DATE_FROM_START.DATETIME <= DATE_TO_START.DATETIME -- Make sure X occurs before Y

    AND FACT_FROM.TASK_TYPE_KEY = TASK_FROM.TASK_KEY

    AND FACT_TO.TASK_TYPE_KEY = TASK_TO.TASK_KEY

    AND TASK_FROM.TASK_TYPE = 'X'

    AND TASK_TO.TASK_TYPE = 'Y'

    I know the query above is not perfect and I've not used ANSI joins etc but hopefully it shows how you can get the answer to your question using your chosen fact table structure.

  • I must spend a bit of time looking into the factless fact idea. I am in no rush.

    However, if I went for the initial solution, I suspect I still need to work out the ranking between tasks because the precise moment an application "move" from one task of type X to a task of type Y is when the first task of type Y following a type of type X starts...

    I would then have a query like

    [font="Courier New"]SELECT DISTINCT FactFrom.ApplicationKey

    FROM TaskFact TaskFrom

    INNER JOIN TaskFact TaskTo ON TaskFrom.ApplicationKey = Facto.ApplicationKey

    AND TaskFrom.TaskRank = TaskTo.TaskRank - 1

    AND TaskFrom.TaskType = X

    AND TaskTo.TaskType = Y

    WHERE TaskTo.Start_StartDateTime >= @MinDate

    AND TaskTo.Start_StartDateTime <= @MaxDate

    [/font]

    I suspect the factless fact idea is better because it's all worked out upfront in the ETL and the final query gets straight forward, which is pretty much my goal as we won't use cubes.

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply