Which is faster?

Question

Post reply

Which is faster?

MrAkki

SSCrazy

Points: 2574
More actions
October 28, 2010 at 2:14 am

#230956

Hi there,
I have a very large table which I have to copy from one database to another.
I'm using a SSIS-Package to do that. The table as about 160 million rows with a total size of 40 GB.
The Clustered Index is on the ID column (INT IDENTITY(1,1)) which can be recreated and does not need to be copied.
Now I was wondering which fastest way:
1.) Have an empty table with no indexes --- copy the data to the new table -- recreate the clustered index and all other indexes
OR
2.) Have an empty table with the clustered index on the ID column (IDENTITY(1,1)) -- copy the data to the new table -- recreate the additional NONCLUSTERED indexes
Can anybody help?
Thanks
Regards,
MrAkki

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply

steveb. SSC-Forever Points: 46733 More actions · Answer 1

I would go with option one as heap inserts are a lot quicker,

however he best answer would be it depends, so I would test them both and see what is the fastest.

SwePeso SSC-Dedicated Points: 39758 More actions · Answer 2

Does it really matter since the clustered key is an ever-increasing identity value?

N 56°04'39.16"
E 12°55'05.25"

MrAkki SSCrazy Points: 2574 More actions · Answer 3

That's what I'm wondering....

After the insert with no Clustered Index I have to create it, doesn't it have to completely write the whole table again to the database to create the clustered index? :crying:

Doesn't that mean the server has to write the 40GB twice?

Thanks,

Regards,

MrAkki

Bhuvnesh SSC Guru Points: 59351 More actions · Answer 4

I would go with Second option.

-------Bhuvnesh----------
I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

VPombeiro SSC Enthusiast Points: 128 More actions · Answer 5

Hi All,

Will bet on the first option, why?

you have to copy the data on both option so there is not gain were.

But on the cluster index, this is where the difference will be noticed.

On the first option the SQL Server will only work with the index one time by creating it.

On the second option the SQL Server will work with the index on every insert, updating the index.

But as was said, the best thing is to try both option and see.

Michael L John One Orange Chip Points: 27079 More actions · Answer 6

How about option 3?

1. Use BCP to copy the data out.

2. Create the new table, without the identity. Leave it as an integer(or a bigint), and do not add the indexes.

3. BULK INSERT, or BCP, the values back in.

4. Change the field to an identity.

5. Add your indexes.

SSIS has a bit more overhead than BCP, so that should gain a bit of speed.

Insert operations on a table with an index has more overhead than inserting into a heap.

Michael L John
If you assassinate a DBA, would you pull a trigger?
To properly post on a forum:
http://www.sqlservercentral.com/articles/61537/

Derrick Smith SSChampion Points: 10833 More actions · Answer 7

Michael L John (11/5/2010)
How about option 3?
1. Use BCP to copy the data out.
2. Create the new table, without the identity. Leave it as an integer(or a bigint), and do not add the indexes.
3. BULK INSERT, or BCP, the values back in.
4. Change the field to an identity.
5. Add your indexes.
SSIS has a bit more overhead than BCP, so that should gain a bit of speed.
Insert operations on a table with an index has more overhead than inserting into a heap.

You cannot alter a column to add/remove the identity property. You would have to create a new column completely and then drop the old one.

For an identity field, create a clustered index, insert the data, then create your nonclustered indexes. This should be the fastest overall method.

Michael L John One Orange Chip Points: 27079 More actions · Answer 8

I stand corrected. It's early, and the caffiene hasn't kicked in.

I still think that leaving the clustered index off will be faster when performing the insert. That being said, the net gain on the insert may be offset by the creation of the index after the table is populated.

Michael L John
If you assassinate a DBA, would you pull a trigger?
To properly post on a forum:
http://www.sqlservercentral.com/articles/61537/

Derrick Smith SSChampion Points: 10833 More actions · Answer 9

Michael L John (11/5/2010)
I stand corrected. It's early, and the caffiene hasn't kicked in.
I still think that leaving the clustered index off will be faster when performing the insert. That being said, the net gain on the insert may be offset by the creation of the index after the table is populated.

I only found that out recently when I tried to convert an INT column to identity that the devs had code to increment manually on every insert.....ugh.

There probably isn't much of a difference with the clustered index before and after, since the data will already be logically ordered anyway. But, definitely create the nonclustered indexes after all the data has been inserted.

Michael L John One Orange Chip Points: 27079 More actions · Answer 10

After I though about your statement:

You cannot alter a column to add/remove the identity property. You would have to create a new column completely and then drop the old one.

I questioned the validity of this. I was pretty positive that you can make an int into an identity.

So, I created a test table, filled it with a few thousand rows, and then changed the data type on the ID field to an identity. It worked with no problem, and the values in the table were preserved.

I then added a few thousand more records, and everything woked as expected.

Exactly what behavior will not work?

Michael L John
If you assassinate a DBA, would you pull a trigger?
To properly post on a forum:
http://www.sqlservercentral.com/articles/61537/

Derrick Smith SSChampion Points: 10833 More actions · Answer 11

Michael L John (11/5/2010)
After I though about your statement:
You cannot alter a column to add/remove the identity property. You would have to create a new column completely and then drop the old one.
I questioned the validity of this. I was pretty positive that you can make an int into an identity.
So, I created a test table, filled it with a few thousand rows, and then changed the data type on the ID field to an identity. It worked with no problem, and the values in the table were preserved.
I then added a few thousand more records, and everything woked as expected.
Exactly what behavior will not work?

How did you do this? Did you do it through the GUI? If so, the gui creates an entirely new temporary table with an identity column, inserts the data, and then removes the old table and replaces it with the new one.

Michael L John One Orange Chip Points: 27079 More actions · Answer 12

With further testing, I discovered that we are both correct.

Yes, I originally changed it through the GUI. In a table this size, that could cause the system to slow to a complete crawl.

When I tried various ALTER TABLE syntax, all failed.

So, I stand by your original post!

Michael L John
If you assassinate a DBA, would you pull a trigger?
To properly post on a forum:
http://www.sqlservercentral.com/articles/61537/

Leo.Miller SSChampion Points: 13068 More actions · Answer 13

I'd go for a mix of option 1 & 3.

You can create the table with the identity column defined, but with both SSIS and BCP you can "enbale identity insert". With BCP use the -E option.

I'd then build all indexes after the insert is completed. As somone pointed out the issue of having the indexes in advance is that every insert causes a hit on the index, with an insert and or update, and if your original data wasn't select with an order by clause, you can't be sure your data is in cluster index order. This means you may do a lot of repeated data schuffles to get it all done.

It's much faster to do a post load index create, but you should test both options any way.

In general I've found BCP out and in faster than SSIS up to SQL 2005. (I haven't test SQL 2008, but I've heard that SSIS is faster in 2008.)

Cheers

Leo

Leo
Nothing in life is ever so complicated that with a little work it can't be made more complicated.

Paul White SSC Guru Points: 150468 More actions · Answer 14

Definitely: Create the table with IDENTITY property and (only) the Clustered Index before loading. Ensure the destination database is in BULK_LOGGED or SIMPLE recovery (to enable minimally-logged bulk load).

Configure the SSIS package to read the data from the source in Clustered Index order, and enable Identity Insert on the destination, specify that the data is arriving in cluster key order, and take the required table lock.

This way, the data is read and written using sequential IO, on multiple threads, using minimal logging.

Once the data transfer is complete, re-create the non-clustered indexes.

See the Data Loading Performance Guide

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi