Effective date updation in Fact records

  • HI Team

    This would be a common problem and I tried to search the net but could not find a good answer. The problem is

    Say the model of the fact table is an insert only model , where the records are always getting inserted by the ETL process. A separate process would search if the inserted record already exist and if it does, will mark the end_date of that record . The new record will have an end_date as null.

    I have achieved it through a self join on the table. We use Informatica , and a target lookup would have also helped, but caching the full table is expensive. Is there any other way to do it ?

    Thanks

  • Output the new facts to a working table. Then use T-SQL to insert the new records and update the old ones.

    Jez

  • While you're at it, I recommend making the new end-date = '9999' (equals 9999-01-01 when inserted into a DATETIME column) so that you're queries aren't plagued by the use of ISNULL or IS NULL and makes the possibility of future dating not only possible but identical in nature to normal date searches.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Yes Jez we have that implementation too . Thanks for suggesting that .

    I like Jeff's idea.Will do it that ways .

  • Bringing an old topic alive , this is what I am thinking .

    We have a fact table that is , say 1000 columns wide and then lets say we have an additional column , an isactive column which , if 1 means that the fact record is active and if 0 means the record is inactive.

    NOw if the same record comes in with the latest ETL , the older version of the record needs to be set to isactive = 0 and the latest one retains as isactive = 1 .

    Lets say , we take the PK , the Natural Key and the isactive column to a separate table and instead of updating the 1000 column wide table on one flag column , we update this new table to switch the flags.

    Downside is , the select queries have to join the 2 tables to get the active records.

    Is there any obvious performance difference between updating 1 column in a 4 column wide table vs 1 columns in a 1000 column wide table  ?

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply