Which is faster?

  • Hi there,

    I have a very large table which I have to copy from one database to another.

    I'm using a SSIS-Package to do that. The table as about 160 million rows with a total size of 40 GB.

    The Clustered Index is on the ID column (INT IDENTITY(1,1)) which can be recreated and does not need to be copied.

    Now I was wondering which fastest way:

    1.) Have an empty table with no indexes --- copy the data to the new table -- recreate the clustered index and all other indexes

    OR

    2.) Have an empty table with the clustered index on the ID column (IDENTITY(1,1)) -- copy the data to the new table -- recreate the additional NONCLUSTERED indexes

    Can anybody help?

    Thanks

    Regards,

    MrAkki

  • I would go with option one as heap inserts are a lot quicker,

    however he best answer would be it depends, so I would test them both and see what is the fastest.

  • Does it really matter since the clustered key is an ever-increasing identity value?


    N 56°04'39.16"
    E 12°55'05.25"

  • That's what I'm wondering....

    After the insert with no Clustered Index I have to create it, doesn't it have to completely write the whole table again to the database to create the clustered index? :crying:

    Doesn't that mean the server has to write the 40GB twice?

    Thanks,

    Regards,

    MrAkki

  • I would go with Second option.

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • Hi All,

    Will bet on the first option, why?

    you have to copy the data on both option so there is not gain were.

    But on the cluster index, this is where the difference will be noticed.

    On the first option the SQL Server will only work with the index one time by creating it.

    On the second option the SQL Server will work with the index on every insert, updating the index.

    But as was said, the best thing is to try both option and see.

  • How about option 3?

    1. Use BCP to copy the data out.

    2. Create the new table, without the identity. Leave it as an integer(or a bigint), and do not add the indexes.

    3. BULK INSERT, or BCP, the values back in.

    4. Change the field to an identity.

    5. Add your indexes.

    SSIS has a bit more overhead than BCP, so that should gain a bit of speed.

    Insert operations on a table with an index has more overhead than inserting into a heap.

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • Michael L John (11/5/2010)


    How about option 3?

    1. Use BCP to copy the data out.

    2. Create the new table, without the identity. Leave it as an integer(or a bigint), and do not add the indexes.

    3. BULK INSERT, or BCP, the values back in.

    4. Change the field to an identity.

    5. Add your indexes.

    SSIS has a bit more overhead than BCP, so that should gain a bit of speed.

    Insert operations on a table with an index has more overhead than inserting into a heap.

    You cannot alter a column to add/remove the identity property. You would have to create a new column completely and then drop the old one.

    For an identity field, create a clustered index, insert the data, then create your nonclustered indexes. This should be the fastest overall method.

  • I stand corrected. It's early, and the caffiene hasn't kicked in.

    I still think that leaving the clustered index off will be faster when performing the insert. That being said, the net gain on the insert may be offset by the creation of the index after the table is populated.

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • Michael L John (11/5/2010)


    I stand corrected. It's early, and the caffiene hasn't kicked in.

    I still think that leaving the clustered index off will be faster when performing the insert. That being said, the net gain on the insert may be offset by the creation of the index after the table is populated.

    I only found that out recently when I tried to convert an INT column to identity that the devs had code to increment manually on every insert.....ugh.

    There probably isn't much of a difference with the clustered index before and after, since the data will already be logically ordered anyway. But, definitely create the nonclustered indexes after all the data has been inserted.

  • After I though about your statement:

    You cannot alter a column to add/remove the identity property. You would have to create a new column completely and then drop the old one.

    I questioned the validity of this. I was pretty positive that you can make an int into an identity.

    So, I created a test table, filled it with a few thousand rows, and then changed the data type on the ID field to an identity. It worked with no problem, and the values in the table were preserved.

    I then added a few thousand more records, and everything woked as expected.

    Exactly what behavior will not work?

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • Michael L John (11/5/2010)


    After I though about your statement:

    You cannot alter a column to add/remove the identity property. You would have to create a new column completely and then drop the old one.

    I questioned the validity of this. I was pretty positive that you can make an int into an identity.

    So, I created a test table, filled it with a few thousand rows, and then changed the data type on the ID field to an identity. It worked with no problem, and the values in the table were preserved.

    I then added a few thousand more records, and everything woked as expected.

    Exactly what behavior will not work?

    How did you do this? Did you do it through the GUI? If so, the gui creates an entirely new temporary table with an identity column, inserts the data, and then removes the old table and replaces it with the new one.

  • With further testing, I discovered that we are both correct.

    Yes, I originally changed it through the GUI. In a table this size, that could cause the system to slow to a complete crawl.

    When I tried various ALTER TABLE syntax, all failed.

    So, I stand by your original post!

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • I'd go for a mix of option 1 & 3.

    You can create the table with the identity column defined, but with both SSIS and BCP you can "enbale identity insert". With BCP use the -E option.

    I'd then build all indexes after the insert is completed. As somone pointed out the issue of having the indexes in advance is that every insert causes a hit on the index, with an insert and or update, and if your original data wasn't select with an order by clause, you can't be sure your data is in cluster index order. This means you may do a lot of repeated data schuffles to get it all done.

    It's much faster to do a post load index create, but you should test both options any way.

    In general I've found BCP out and in faster than SSIS up to SQL 2005. (I haven't test SQL 2008, but I've heard that SSIS is faster in 2008.)

    Cheers

    Leo

    Leo
    Nothing in life is ever so complicated that with a little work it can't be made more complicated.

  • Definitely: Create the table with IDENTITY property and (only) the Clustered Index before loading. Ensure the destination database is in BULK_LOGGED or SIMPLE recovery (to enable minimally-logged bulk load).

    Configure the SSIS package to read the data from the source in Clustered Index order, and enable Identity Insert on the destination, specify that the data is arriving in cluster key order, and take the required table lock.

    This way, the data is read and written using sequential IO, on multiple threads, using minimal logging.

    Once the data transfer is complete, re-create the non-clustered indexes.

    See the Data Loading Performance Guide

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply