Bulk insert performance.

  • We have a nightly job that inserts a couple of hundred thousand rows into a table. There are 4 indexes on the table and the operation is quite slow.

    I know dropping the indexes should speed up the insert. However, isn't the total time to drop the indexes, do the insert, and recreate the indexes likely to be about the same as if I hadn't bothered dropping the indexes in the first place?

    If it is still quicker to drop the indexes first, why so? What's going on under the hood?

    Also, can a table still handle inserts, updates, and deletes while it is being indexed?

    I can test this out on Monday morning, but I'm curious to know what's going on under the covers.

    Thanks in advance!

  • You are right: dropping indexes may speed up the bulk insert operation. It depends on the number of rows already in the target table and the number of rows you are inserting. You have to test it; if the table has 10 million rows, dropping indexes may not be a viable option.

    The table remains updateable if you rebuild the index with the ONLINE option, but this works only on the Enterprise edition of SQL Server.
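    For example (a minimal sketch; the index and table names are hypothetical placeholders):

        -- Enterprise edition only; the table stays available for DML during the rebuild.
        ALTER INDEX IX_MyTable_Col1 ON dbo.MyTable
            REBUILD WITH (ONLINE = ON);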

    Can you tell us more about your specific scenario? What is the size of the table? Are there rows removed from it regularly?

    Regards

    Piotr

    ...and your only reply is slàinte mhath

  • When you insert a row, it has to be placed in the correct position in each index. When you batch-insert umpty-x rows, they all have to be placed in the correct position too, which includes splitting and rearranging pages within the index as pages fill with the new data or as inserts change the key order, and so on. That's not even getting into the fact that you may have more than one index on the table.

    In many cases, though not all, dropping the index, loading the data, and recreating the index is faster, because it avoids all that work happening in the background while you're inserting data.

    SQL Server is fine with inserts and index maintenance occurring at the same time, but it can be costly. So when doing a bulk load, if you can avoid that cost, it's usually better to do so.
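    The usual shape of the pattern is something like this (just a sketch; the table, index, and source names are placeholders, and you would normally leave the clustered index in place):

        -- Drop the nonclustered index before the load...
        DROP INDEX IX_Staging_LoadDate ON dbo.Staging;

        -- ...load the data without per-row index maintenance...
        INSERT INTO dbo.Staging (LoadDate, Amount)
        SELECT LoadDate, Amount
        FROM dbo.NightlySource;

        -- ...then rebuild the index once, in a single sorted pass.
        CREATE NONCLUSTERED INDEX IX_Staging_LoadDate
            ON dbo.Staging (LoadDate);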

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • As Piotr said, *try it and see*. Not to sound too harsh, but there is no way anyone else can know whether, with YOUR disk configuration, YOUR processor configuration, the amount of memory and other workloads on YOUR server, the arrangement of your data, how much data you already have, how much data you are inserting, and where exactly that data falls within your existing data (from the index perspective), it would be faster to drop the indexes, import the data, and recreate the indexes. There are too many variables, and no single answer applies to every configuration.

    I used to be scared to try things like this myself, but once you are sure you have scripted out the "CREATE INDEX" statements from the existing table(s), you shouldn't break anything. You can drop the indexes, import the data, and recreate the indexes. Time how long that takes (a rough timing sketch is below), and then you will know whether it is faster in your environment.
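    For example (just a sketch; dbo.NightlyLoad stands in for however you actually run the load):

        DECLARE @start DATETIME;
        SET @start = GETDATE();

        EXEC dbo.NightlyLoad;  -- run it once with indexes in place, once with drop/recreate

        SELECT DATEDIFF(ss, @start, GETDATE()) AS elapsed_seconds;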

    Hope this helps.

    David Walker

  • "We have a nightly job that inserts a couple of hundred thousand rows into a table. There are 4 indexes on the table and the operation is quite slow."

    What do you mean by slow? A couple of hundred thousand rows should not take that long.

    For example, I have a nightly job that inserts about 50,000 rows into a table with about 16,000,000 rows and six indexes. That job, which also inserts into 26 other tables and does some calculation, takes 11 minutes all together. Everything is logged and sent by e-mail.

    The CPU is a dual-core Xeon at 3.2 GHz with 12 GB of memory, running 64-bit SQL Server 2005.

    As suggested, dropping the indexes first and then recreating them should be the best solution.

    I have a bunch of tables with millions of rows that are extended with more rows using an insert statement, and after testing I found the suggested solution worked well.

    Please let us know what you get.

    /Gosta

  • Unless you have very few rows in the table relative to the amount being added, I would say that dropping the indexes and recreating them would NOT be the best solution.

    Some questions:

    1) What is the fill factor of the indexes?

    2) Do you defragment them? If so, when: before or after the load?

    3) Any triggers on the table?

    4) Any foreign keys on the table? If so, have you verified that all foreign key columns are indexed?

    5) What else is running on the server when the load occurs? Index maintenance, backups, CHECKDB, etc.?

    6) Have you actually monitored the server during the load? File I/O stalls, wait stats analysis, CPU utilization, and so on?

    Those are all very basic questions for a scenario like this. If you weren't already aware of them, may I recommend you get a professional on board for a day or two to mentor you on what to look for when troubleshooting performance problems like this? Amazing ROI to be had there, I bet! A quick catalog/DMV sketch for a couple of those checks follows.
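    For numbers 1 and 6 above, something like this (a minimal sketch; the table name is a placeholder):

        -- 1) Fill factor of each index on the table:
        SELECT name, fill_factor
        FROM sys.indexes
        WHERE object_id = OBJECT_ID('dbo.MyTable');

        -- 6) Top waits accumulated since the last restart:
        SELECT TOP (10) wait_type, wait_time_ms, waiting_tasks_count
        FROM sys.dm_os_wait_stats
        ORDER BY wait_time_ms DESC;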

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • To add to that, do you have persisted computed columns, indexed views, or a wad of constraints? A quick way to check a couple of those is sketched below.
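    Just a sketch (the table name is a placeholder):

        -- Persisted computed columns on the table:
        SELECT name
        FROM sys.computed_columns
        WHERE object_id = OBJECT_ID('dbo.MyTable')
          AND is_persisted = 1;

        -- Check constraints on the table:
        SELECT name
        FROM sys.check_constraints
        WHERE parent_object_id = OBJECT_ID('dbo.MyTable');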

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi,

    I am facing a similar issue while inserting lots of rows into a SQL Server 2005 database.

    My throughput into SQL Server is 400 rows per second, and my target is to insert 2,000 rows per second.

    There is one index on that table; when I took it out, things improved a little, but the gain was negligible.

    What puzzles me is that there is no CPU hit on the SQL Server machine. I have even set the network packet size and worker threads options; nothing worked.

    I would appreciate any help in this matter.

    Thanks

  • mansshah (2/18/2010)



    In Books Online, see here (ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/udb9/html/bd1dac6b-6ef8-4735-ad4e-67bb42dc4f66.htm) and here (ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/udb9/html/f2a68e0f-9190-45c4-abee-1b2ebbb13817.htm) and some of the associated links for the requirements for achieving minimal logging.
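    When the requirements are met, the load itself often looks something like this (a sketch only; the file path and table name are placeholders, the database needs to be in SIMPLE or BULK_LOGGED recovery, and the other conditions in those links still have to hold):

        -- TABLOCK is one of the requirements for a minimally logged bulk import.
        BULK INSERT dbo.MyTable
        FROM 'C:\loads\nightly.dat'
        WITH (TABLOCK, BATCHSIZE = 50000);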

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service
