what is the most efficient way to maintain a denormalized table?

  • Hi,

    We currently maintain a denormalized table that is updated from 3 source tables twice a day. The data in those 3 tables changes a lot, and the denormalized table has over 150 million records. We use an SSIS package to gather the new and changed data from the 3 tables, load it into lookup tables, and then apply the changes to the denormalized table every 12 hours using batch inserts and deletes. A normal run (fewer than 100,000 updated records) completes in about 15 minutes, but twice a month we run a different process that increases the volume to about 3 to 4 million records to be denormalized at once, and that run sometimes takes over 15 hours. The denormalized table currently has over 10 indexes because it is hit heavily by other processes, and the delay of 10+ hours that happens occasionally is becoming a major problem. I am wondering whether there are more efficient ways of maintaining a large denormalized table than what we currently have. I read about indexed views, but they are not recommended for tables where the data changes frequently. I would appreciate any recommendations.
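
    Roughly, the apply step works like this (a sketch only -- table, key, and column names here are placeholders, not our actual schema; the real process runs from SSIS):

    ```sql
    -- Delete changed rows from the denormalized table in small batches so
    -- each transaction (and the log growth) stays manageable.
    DECLARE @BatchSize int = 50000;

    WHILE 1 = 1
    BEGIN
        DELETE TOP (@BatchSize) d
        FROM dbo.DenormalizedTable AS d
        JOIN dbo.ChangedKeys AS k
            ON k.BusinessKey = d.BusinessKey;

        IF @@ROWCOUNT < @BatchSize BREAK;
    END;

    -- Then insert the freshly denormalized versions of those rows,
    -- staged earlier by the SSIS package.
    INSERT INTO dbo.DenormalizedTable (BusinessKey, ColA, ColB, ColC)
    SELECT s.BusinessKey, s.ColA, s.ColB, s.ColC
    FROM dbo.StagedDenormalizedRows AS s;
    ```

    The batching keeps the normal 15-minute runs cheap, but on the 3-4 million row runs the 10+ nonclustered indexes have to be maintained row by row through every batch, which is where I suspect most of the time goes.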

    Thanks,

    Sean

  • To improve on your 15-hour process, I think we need to know a bit more about it. Can you give us an explanation of: 1) how you are getting the data from your source tables; 2) how you are staging the data; 3) the transformations that denormalize the data, and when/where they happen; 4) how you are inserting into the destination table.

    Is this all SP driven or are you using SSIS or some other technology? Where is the bulk of the 15 hours spent?

    John Rowan

    Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden

  • 15 hours? It would appear that, in your case, denormalization for reporting purposes is the wrong thing to do. I'd consider a more normalized approach that uses performance tools such as pre-aggregation in sub-queries, well-formed indexes, and a healthy dose of best practices for performance.
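
    As a rough illustration of the pre-aggregation idea (the names below are invented, since the actual tables aren't shown in this thread): aggregate each detail table once in a derived table, then join the small aggregated sets to the header, instead of maintaining a wide denormalized copy of everything.

    ```sql
    -- Sketch only: invented table/column names. Each derived table collapses
    -- a detail table down to one row per key BEFORE the join, so the outer
    -- join works against far fewer rows than the raw detail data.
    SELECT  h.BusinessKey,
            h.SomeAttribute,
            d1.Total1,
            d2.Total2
    FROM dbo.HeaderTable AS h
    JOIN (SELECT BusinessKey, SUM(Amount) AS Total1
          FROM dbo.DetailTable1
          GROUP BY BusinessKey) AS d1
        ON d1.BusinessKey = h.BusinessKey
    JOIN (SELECT BusinessKey, SUM(Qty) AS Total2
          FROM dbo.DetailTable2
          GROUP BY BusinessKey) AS d2
        ON d2.BusinessKey = h.BusinessKey;
    ```

    With the right indexes on the keys and aggregated columns, queries written this way often make the denormalized table (and its 15-hour maintenance window) unnecessary.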

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
