Creating Clustered index on a large table

  • I have a fairly large table with roughly 120 million rows. It has a primary key composed of 3 integer fields but, its non-clustered 🙁

    Now, to improve performance on certain queries we would like to cluster the primary key. Is there a quick and efficient way of doing it without causing much downtime? The table is an archive and doesnt get accessed very frequently in the normal day to day operations.

    Thanks in advance for any help provided.

    Regards

  • What is the physical size of the table? Do you have enough free space in the database to create another copy of this table?

    This script will tell you the size of the database files, and the space used by each table.

    Script to analyze table space usage:

    http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=61762

  • Thanks for the response Michael

    The table occupies 48GB of Space (42 data + 6 index)

    And I have enough space on the server to copy the table

    Cheers!

  • Try creating indexes with Online option. This option lets you query and modify data in underlying tables during index creation.

    Online = ON

  • You could just drop the primary key and add a new one. The disadvantage is that it will happen in a single transaction and the transaction log will get large, probably at least as large as your table.

    Another method would be:

    1. Rename the old table and any associated objects, like primary key, defaults, triggers, etc.

    2. Create a new table with a different, temporary name with the type of primary key you want, but without any indexes or unique constraints.

    3. Insert the data from the old table in order by primary key using a DTS datapump or SSIS making sure it is done in fairly small batches, like say 100,000 to 500,000 rows.

    4. Create the remaining indexes, unique constraints, defaults, triggers, and other associated objects.

    5. Rename the table to the correct name.

    6. Drop the old table.

    If the table is referenced by foreign keys, you will have to drop them before and add them again when done.

    If you have a test server, restore a copy of the DB there to try it out there first.

  • Excellent Michael.....Thanks for your help on this!

    I will obviously test this out on a backup copy.....but, this should do the trick 🙂

    Michael Valentine Jones (6/30/2010)


    You could just drop the primary key and add a new one. The disadvantage is that it will happen in a single transaction and the transaction log will get large, probably at least as large as your table.

    Another method would be:

    1. Rename the old table and any associated objects, like primary key, defaults, triggers, etc.

    2. Create a new table with a different, temporary name with the type of primary key you want, but without any indexes or unique constraints.

    3. Insert the data from the old table in order by primary key using a DTS datapump or SSIS making sure it is done in fairly small batches, like say 100,000 to 500,000 rows.

    4. Create the remaining indexes, unique constraints, defaults, triggers, and other associated objects.

    5. Rename the table to the correct name.

    6. Drop the old table.

    If the table is referenced by foreign keys, you will have to drop them before and add them again when done.

    If you have a test server, restore a copy of the DB there to try it out there first.

  • If you have the Enterprise Edition, now would be a really good time to make the choice as to how to partition the table.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I could make a partition based on Fiscal Year and I think that would help the performance a lot. Can you please point me in the right direction to do partitioning on a existing table?

    I could copy this table to another location with partitioning turned on, build the clustered index and then simply swap the new table....

    Does this sound far-fetched?

    Jeff Moden (7/1/2010)


    If you have the Enterprise Edition, now would be a really good time to make the choice as to how to partition the table.

  • anish_ns (7/1/2010)


    I could make a partition based on Fiscal Year and I think that would help the performance a lot. Can you please point me in the right direction to do partitioning on a existing table?

    I could copy this table to another location with partitioning turned on, build the clustered index and then simply swap the new table....

    Does this sound far-fetched?

    Jeff Moden (7/1/2010)


    If you have the Enterprise Edition, now would be a really good time to make the choice as to how to partition the table.

    My recommendation would be to read about table partitioning in Books Online before you even think about making the attempt. Also, you MUST have the Enterprise Edition to do table partitioning. You need to verify THAT first.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Once again we are moving into the world where a professional would be worth his/her weight in gold - or at least silver (metals are SOOOO inflated in price these days)!:-D

    Partitioning is a VERY advance topic with MANY ways you can go astray or perform suboptimally.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (7/2/2010)


    Once again we are moving into the world where a professional would be worth his/her weight in gold - or at least silver (metals are SOOOO inflated in price these days)!:-D

    Partitioning is a VERY advance topic with MANY ways you can go astray or perform suboptimally.

    Actually, see what Kevin wrote above? He's absolutely correct. You can hang yourself pretty good if you do partitioning incorrectly and by hang yourself I mean lose data, kill a server, lose your job for doing it so badly. Hiring someone like Kevin for a couple of days IS worth it's weight in gold. The good ones even show you what they're doing so if you ever run into another problem, you can talk intelligently over the phone to fix it so it won't cost you so much.

    Trust me... it's worth finding someone like Kevin. If you can setup a temporary remote login, someone like Kevin can fix you up same day.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Is your composite index sequential, and the columns never update?

    because it needs to be to work well as a clustered index ...

    you can also use ASC DESC to ensure the physical order is sequential if some of the fields descend in value

  • Back to the original question, it could be fastest to build your clustered index with SORT_IN_TEMPDB option, assuming you have a sizeable tempdb on a fast IO system.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Jeff Moden (7/2/2010)


    TheSQLGuru (7/2/2010)


    Once again we are moving into the world where a professional would be worth his/her weight in gold - or at least silver (metals are SOOOO inflated in price these days)!:-D

    Partitioning is a VERY advance topic with MANY ways you can go astray or perform suboptimally.

    Actually, see what Kevin wrote above? He's absolutely correct. You can hang yourself pretty good if you do partitioning incorrectly and by hang yourself I mean lose data, kill a server, lose your job for doing it so badly. Hiring someone like Kevin for a couple of days IS worth it's weight in gold. The good ones even show you what they're doing so if you ever run into another problem, you can talk intelligently over the phone to fix it so it won't cost you so much.

    Trust me... it's worth finding someone like Kevin. If you can setup a temporary remote login, someone like Kevin can fix you up same day.

    Thanks for the props Jeff! 😎 Oh, BTW, I just picked up 2 new quickie tuning clients in the last week. One had autoshrink on and neither had ANY index maintenance or statistics update jobs. And those were just the tip of some very large icebergs! :w00t:

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (7/7/2010)


    One had autoshrink on and neither had ANY index maintenance or statistics update jobs.

    BWAA-HAAA!!!! C'mon now, Kevin! Everyone knows that having autoshrink turned on is the only maintenance you need, right? :hehe: And what's with all the index maintenance you're talking about??? :blink: Just get rid of all the indexes and you're on easy street. ;-):-P:-D:laugh::Whistling:

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply