Deactivating Irrelevant Data in a Warehouse

  • In a school context.

    I have data that I source from a operational database that contains a link between a student and subject and the level of the subject ... basically indicating whether a student is taking a subject.

    Using ETL I process the data and load this data into a Bridge table in the Data Warehouse using a MERGE statement - new records are added and changes in the level is updated.

    However it often happens that a student is incorrectly linked to a subject and the link later removed. However the link is already loaded into the warehouse. So basicly I am stuck with irrelevant Student Subject links in the Bridge table - which is giving inaccurate Students per Subject counts.

    My options are as follow:
    1. Delete any data in the Bridge table that was not present in the Staging stable.
    Or
    2. Flag the unmatched record as inactive. Record can be activated again if the a source record is again supplied.

    Option 1 seems somewhat drastic, my feelings is more towards Option 2. Would this approach be the best for most data table where your source data is based on ever changing data.

    Would appreciate some opinions.

    [font="Courier New"]-----------------------------------------------------------------------------------------------------------------------
    😀 Arguing with Programmers are like wrestling a pig in mud, soon you realize the pig is enjoying it. 😀
    -----------------------------------------------------------------------------------------------------------------------[/font]

  • What you've described is essentially a slowly changing dimension. Take a look at Kimball's advice and patterns how to slowly changing dimensions.

    http://www.kimballgroup.com/2008/09/slowly-changing-dimensions-part-2/

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Well there is DimStudent and DimSubject which is handled as proper Dimension tables. However a link between those two is captured in BridgeStudentSubject. Applying SCD rules to the bridge table seem logical enough.

    [font="Courier New"]-----------------------------------------------------------------------------------------------------------------------
    😀 Arguing with Programmers are like wrestling a pig in mud, soon you realize the pig is enjoying it. 😀
    -----------------------------------------------------------------------------------------------------------------------[/font]

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply