Bulk operations and indexes

  • Hi experts

    I've got a db (50GB) where most tables are truncated and refilled each day from an Oracle db (an ERP system).

    All tables are transferred without any indexes.

    Some of the transferred tables have 10-25 million rows.

    I then use these tables for querying and for making KPIs and measurements/analysis.

    As time goes by, these tables grow, the queries take longer, and timeouts are experienced.

    I would now like to make these queries faster by adding sensible indexes to my db, but what is the best/smartest way to do that when the tables are truncated and refilled each day?

    e.g. should I drop the indexes before refilling and then recreate them after the refill, or...

    [edit] Please note that after the refill there is no writing to the DB. The recovery model is SIMPLE.

    Best regards

    Tommy Bak

  • If you're inserting 10-20 million rows, you probably do not want the indexes in place; the table will load faster without them. However, the index creation afterwards will take some time.

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • I would not drop/recreate the indexes - I would disable the nonclustered indexes before the load and rebuild them after the load.

    To disable: ALTER INDEX {index name} ON {table} DISABLE;

    To enable: ALTER INDEX ALL ON {table} REBUILD;

    The rebuild will rebuild ALL indexes on the table, including the clustered index, and re-enable them. If you see a performance problem with the load because of the clustered index, you then have two choices:

    Drop the clustered index before the load, recreate the index after the load and then rebuild all NC indexes.

    Or, modify the load so it is loaded in clustered index order.
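
    Putting that together, a minimal end-to-end sketch of the daily refill with the disable/rebuild pattern (dbo.SalesFact and IX_SalesFact_OrderDate are made-up names; substitute your own table and indexes):

    TRUNCATE TABLE dbo.SalesFact;

    -- Disable each nonclustered index so the bulk load does not have to maintain it.
    ALTER INDEX IX_SalesFact_OrderDate ON dbo.SalesFact DISABLE;

    -- ... bulk load the rows from Oracle here (SSIS, BULK INSERT, linked server, etc.) ...

    -- Rebuild ALL indexes (clustered and nonclustered); this also re-enables the disabled ones.
    ALTER INDEX ALL ON dbo.SalesFact REBUILD;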

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Jeffrey has the solution that will probably work best, but you might want to test a couple scenarios out. Ultimately the way your system responds best is what you want to use.

  • Thank you very much. I look forward to trying Jeffrey's solution.

    Br

    Tommy Bak

  • If you are loading millions of rows, you should use a clustered index ONLY if the source data can be loaded in that order. Loading data into a table with a clustered index when the data arrives in a different order results in terrible index fragmentation and can take 10 times longer. Building a clustered index on 10 million rows in random order EVERY DAY is just not worth the time or trouble.

    I have worked on similar issues, and it was pretty easy to convince the Oracle DBA to provide the data in the correct order. In fact, you should find out what the clustered index is in Oracle and consider using it.
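
    For illustration, a load in clustered index order could look like this (all object names are assumptions; the minimal-logging note assumes the table was just truncated and the db uses SIMPLE recovery, as described above):

    -- Assumes dbo.SalesFact is clustered on OrderDate and is empty after the truncate.
    INSERT INTO dbo.SalesFact WITH (TABLOCK)  -- TABLOCK allows minimal logging into the empty table
        (OrderDate, CustomerID, Amount)
    SELECT OrderDate, CustomerID, Amount
    FROM stage.SalesFact_Raw
    ORDER BY OrderDate;  -- same order as the clustered key, so pages fill sequentially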

    The other tip I would give is to be sure you create covering indexes.

    For example, if your select statement looks like

    SELECT a, b, c
    FROM XX
    WHERE b > GETDATE() AND e = 'xx'

    then you want to index on b and e and include columns a and c
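
    As a sketch, that covering index could be written like this (the table XX and columns a, b, c, e come from the example above; the index name is made up):

    CREATE NONCLUSTERED INDEX IX_XX_b_e
    ON XX (b, e)        -- key columns from the WHERE clause
    INCLUDE (a, c);     -- non-key columns, so the query is answered from the index alone

    Key column order is worth testing; a common rule of thumb is to lead with the equality column (e here).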

    Search for "covering indexes" in the forum or Google for more info.

    Let us know how it goes.

  • On 2000, I found after-the-fact creation to be better.

    On 2008 R2, I didn't see much difference; slightly different CPU/Read/Write tradeoff, but nothing to write home about.

    +1 on an ORDER BY matching the clustered index on the insert.

    Watch your FILLFACTORs as well.
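
    For a table that is read-only after the refill, a fill factor of 100 avoids wasting space on free page room. A sketch, as part of the post-load rebuild (the table name is an assumption):

    ALTER INDEX ALL ON dbo.SalesFact
    REBUILD WITH (FILLFACTOR = 100, SORT_IN_TEMPDB = ON);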

  • Steve Jones - SSC Editor (1/16/2011)


    Jeffrey has the solution that will probably work best, but you might want to test a couple scenarios out. Ultimately the way your system responds best is what you want to use.

    This is a good thing to remember: everyone can tell you how they have seen things work, but it is your system, and with millions of inserts a lot of things come into play: what platform is doing the inserts, how many cores and what type you have, how much memory you have, and your I/O (including how many spindles and your RAID settings). At large scale, even the NIC and network traffic can come into play.

    I insert about 3-4 million rows a day via SSIS from flat files; unfortunately it is not a truncated table but a compounding one (I only keep a year's worth in one database, split across monthly tables unioned in a view). I do not drop my clustered index, since the data is indexed such that it always appends to the end. But I have found that if I keep the two other indexes on the table, it can take over 12 hours to insert about a million rows, versus 30 minutes without them plus about 3 hours to rebuild the indexes afterwards. Loading without the indexes and rebuilding easily beats inserting with the indexes in place.

    It is best to try different approaches and find what works best for your system.
