Performance with Bulk update

  • Hello,

    I have a procedure which runs bulk insert and update statements. I have created some indexes, but whenever I run the procedure it takes a long time, 5-8 hours, which is not expected.

    When the procedure completes, the indexes I created are 60-80% fragmented; I assume that is due to the bulk updates. If so, what is the strategy to defragment the indexes? Do we need to defragment after each bulk update, or is there another strategy to deal with this?

    Thanks in advance for your help!

  • Drop the indexes, do the bulk insert, and check. It should boost performance. After the bulk insert, recreate all the indexes.
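
    A rough sketch of that pattern (table, column, and index names below are placeholders, not from this thread):

    -- Drop the nonclustered indexes before the bulk load
    DROP INDEX IX_Staging_Col1 ON dbo.Staging;
    DROP INDEX IX_Staging_Col2 ON dbo.Staging;

    -- Do the bulk load (TABLOCK helps minimal logging under the right recovery model)
    INSERT INTO dbo.Staging WITH (TABLOCK)
    SELECT *
    FROM dbo.SourceTable;

    -- Recreate the indexes afterwards
    CREATE NONCLUSTERED INDEX IX_Staging_Col1 ON dbo.Staging (Col1);
    CREATE NONCLUSTERED INDEX IX_Staging_Col2 ON dbo.Staging (Col2);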

  • I am doing the same thing: I drop the indexes before the insert and recreate them after the bulk insert. Then I run some bulk updates against the same table, and that is where the problem starts. I created some indexes to boost the performance of the bulk updates, but running the bulk updates fragments those indexes. (I have created 7-8 indexes; the table is very big, with 1 million records and 500 columns.)

    Do we need to run UPDATE STATISTICS in this situation? (I don't think so.)

  • Try using the BULK INSERT statement for the inserts.

    "Keep Trying"

  • Sorry, I am not importing data from a flat file; I am importing from another table, so I am just using INSERT INTO ... SELECT * FROM ...

    I am interested to know whether anyone has come across the same problem.

    We are doing bulk updates rather than looping through each record, and the bulk updates fragment the indexes. We use those same indexes inside the stored procedure, so once they are fragmented, using them slows down performance. Is there any strategy to deal with this?

  • Try with bcp and check.

  • brainy (12/29/2009)


    Sorry, I am not importing data from a flat file; I am importing from another table, so I am just using INSERT INTO ... SELECT * FROM ...

    I am interested to know whether anyone has come across the same problem.

    We are doing bulk updates rather than looping through each record, and the bulk updates fragment the indexes. We use those same indexes inside the stored procedure, so once they are fragmented, using them slows down performance. Is there any strategy to deal with this?

    You can use BULK INSERT for inserting data from one table to another. It's faster and has less impact on your log file.

    Fragmented indexes will slow down performance. To counter this you will need an index rebuild/reorganize strategy that runs periodically, or you can drop the indexes, import the data, and then re-create the indexes, as pointed out in an earlier post.
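
    For example, a periodic job can check fragmentation and rebuild or reorganize accordingly. This is only a hedged sketch; the object names and thresholds below are illustrative, not from this thread:

    -- Check fragmentation for one table's indexes (SQL 2005 and up)
    SELECT i.name, ps.avg_fragmentation_in_percent
    FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.Staging'), NULL, NULL, 'LIMITED') ps
    JOIN sys.indexes i
      ON i.object_id = ps.object_id
     AND i.index_id  = ps.index_id;

    -- Common rule of thumb: rebuild above ~30% fragmentation, reorganize between ~5% and 30%
    ALTER INDEX IX_Staging_Col1 ON dbo.Staging REBUILD;
    ALTER INDEX IX_Staging_Col2 ON dbo.Staging REORGANIZE;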

    Whatever you do, make sure you test things properly.

    "Keep Trying"

  • Check the following:

    Whether the table has constraints or triggers, or both.

    The recovery model used by the database.

    Whether the table into which data is copied is empty.

    Whether the table has indexes.

    Whether TABLOCK is being specified.

    It is best described in BOL.
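
    For what it's worth, here is a minimal table-to-table sketch that plays to those points (names are placeholders; assumes SIMPLE or BULK_LOGGED recovery):

    -- On SQL 2005, SELECT...INTO is the minimally logged table-to-table option
    -- (the target table must not already exist)
    SELECT col1, col2, col3
    INTO dbo.TargetTable
    FROM dbo.SourceTable;

    -- On SQL 2008 and later, INSERT...SELECT into an empty heap with TABLOCK
    -- can also be minimally logged
    INSERT INTO dbo.TargetTable WITH (TABLOCK)
    SELECT col1, col2, col3
    FROM dbo.SourceTable;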

  • I am loading data from a different database into a staging table; from the staging table it is loaded into different tables.

    I have created indexes on the staging table since we load the other tables based on business logic. In between, we make some changes to the staging table according to that logic. Because the staging table is indexed and we are making bulk updates against it, the updates are fragmenting the indexes. Do I need to rebuild the indexes after each bulk update to the staging table?

  • You may be causing massive page splits on the clustered index and massive extent splits on the non-clustered indexes if they are not in the same "order" as the data being inserted. It's one of those things that you just have to deal with. My recommendation for such a thing is that the clustered index should be on an IDENTITY column to keep the tables in a reasonable "physical" order for a large number of inserts and updates. That, of course, won't help the other indexes and you will need to either rebuild or defrag as part of any good maintenance plan.
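
    A minimal sketch of that layout (names are placeholders):

    -- Clustering on an ever-increasing IDENTITY keeps inserts appending at the
    -- logical "end" of the table instead of splitting pages mid-index
    CREATE TABLE dbo.Staging
    (
        StagingID INT IDENTITY(1,1) NOT NULL,
        Col1      VARCHAR(50) NULL,
        CONSTRAINT PK_Staging PRIMARY KEY CLUSTERED (StagingID)
    );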

    If you have the Enterprise version of 2k5, you can rebuild in the "Online" mode which will be slower. If not, you can rebuild the table on the side from the original, rebuild all the indexes at your leisure, start a tran to lock both tables, do a final update on the new table, and then do a "snap rename" to put the new table in service while keeping the old table around for a day or two to make sure nothing went wrong... went wrong... went wrong... {whack!}... everything went right. 😉
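
    A rough outline of that "snap rename" (all names are placeholders, and the real thing needs error handling):

    -- dbo.Staging_New has already been built and fully indexed on the side
    BEGIN TRANSACTION;

        -- Take exclusive locks on both tables for the duration of the swap
        SELECT TOP (1) 1 FROM dbo.Staging     WITH (TABLOCKX, HOLDLOCK);
        SELECT TOP (1) 1 FROM dbo.Staging_New WITH (TABLOCKX, HOLDLOCK);

        -- (apply any final catch-up changes to dbo.Staging_New here)

        EXEC sp_rename 'dbo.Staging',     'Staging_Old';
        EXEC sp_rename 'dbo.Staging_New', 'Staging';

    COMMIT TRANSACTION;
    -- Keep dbo.Staging_Old around for a day or two before dropping it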

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks Jeff...

    I agree, and I have an IDENTITY column as the clustered index on the staging table I am inserting into.

    The problem is that, based on certain business logic, I am updating the staging table and then inserting into some other tables from it. I mean I have a lot of updates in between, and then I insert into the other tables.

    While inserting into the other tables I use the indexes on the staging table (which I created specifically for these inserts).

    I have come to the same conclusion: the updates to the table are likely causing page splits.

    My question is whether I need to rebuild the indexes inside the procedure (all inserts and updates happen inside the SP). Since rebuilding an index takes time, it might still slow down overall performance.

    I am in a dilemma: do I need to rebuild the indexes, or update statistics (auto update stats is on) since I am doing huge updates on the staging table, or should I leave it as it is? Analyzing this will take a lot of time, as it is a huge procedure and I need to load huge data to test it.

    So I am asking for suggestions from the experienced. 🙂

  • Auto update of stats being on is no guarantee that stats will be updated. IIRC, something like 20% of the table needs to be modified or new before an automatic stats update will kick in. For a 10 million row table, that means it'll take a 2 million row update or insert before the stats update. The nasty part is that stats updates take time as well, and having them fire automatically likely means they'll run at the worst possible time.
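
    If the load falls under that threshold, a manual statistics update right after the bulk operation sidesteps the problem (the table name here is a placeholder):

    -- Refresh all statistics on the staging table after the bulk update;
    -- FULLSCAN is slower than a sample but gives the optimizer the best picture
    UPDATE STATISTICS dbo.Staging WITH FULLSCAN;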

    Both indexes and stats should be updated after a large insert. See DBCC SHOWCONTIG in Books Online for some recommendations as to when you should rebuild indexes, etc., and some code that will help you do it.
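
    For reference, that check looks something like this (table and index names are placeholders):

    -- Scan density near 100%, high page density, and few extent switches are what you want to see
    DBCC SHOWCONTIG ('dbo.Staging', 'IX_Staging_Col1');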

    As an interesting anecdote, I added a brand spanking new index to a query that got just a max date from a paltry little 4 million row table. It brought that particular lookup down from 2 minutes to just a second or so... I recognized that wasn't good enough. DBCC SHOWCONTIG was showing a logical scan rate of only 45%, and the number of extent switches was about 20 times the number of actual extents. Page density was only about 55% for most of the indexes. The table is highly transactional, suffering many inserts and a lot of deletes (after archiving old data). The table and its indexes occupied several gig. After the index rebuilds, the table only occupied a couple of hundred meg, and that 1 second query I was telling you about dropped to sub-millisecond. Of course, all the queries against it sped up rather remarkably. You'll really like this next part, and it's no misprint... between adding the correct index and rebuilding the indexes, the number of reads (think I/O here, folks) dropped from more than 300,000 to only 12. The translation on our end was that a full 50% of the I/O workload on the server simply vanished.

    Pay me now or pay me later... if you want a 24/7 system to be 24/7, you need to find a way to do the necessary maintenance to keep it that way. Tricks like the "snap rename" method are essential to such processes on tables that cannot be taken down for maintenance for more than a second or two. Another way to do it is to invest in the Enterprise Edition so that you can use partitioned tables, where things like index rebuilds aren't quite as imposing.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks, Jeff...

    I will consider your suggestions and try that. I am also thinking about setting a fill factor, as I am updating the table several times, which leads to page splits.

    How about setting a fill factor (currently I have not set one)? Any suggestions on this? 😉

  • brainy (12/30/2009)


    Thanks, Jeff...

    I will consider your suggestions and try that. I am also thinking about setting a fill factor, as I am updating the table several times, which leads to page splits.

    How about setting a fill factor (currently I have not set one)? Any suggestions on this? 😉

    It depends... a lot. Most folks have tables that, despite taking an insane number of inserts, still have strong requirements for fast SELECTs. A low fill factor will help with the INSERTs but can really hurt SELECT performance because of all the extra partially full pages that have to be read to do the same job. Of course, that is still better than a bazillion page splits.

    Also, even with an optimal setting for the FILL FACTOR, you will eventually need to do some maintenance on the indexes, including the occasional rebuild. The FILL FACTOR should, at best, only be used as a stopgap between the executions of a decent maintenance plan.

    All that being said, I have no magic formula for what the FILL FACTOR should be. My recommendation would be to keep some records as you make changes and have a couple of "golden test queries" to test with. Start with a 90% FILL FACTOR and adjust from there. I would also suggest that no adjustment be more than a 10% change.
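
    As a starting point only (the index and table names are placeholders):

    -- Rebuild with a 90% fill factor as a first guess, then measure and adjust
    ALTER INDEX IX_Staging_Col1 ON dbo.Staging
    REBUILD WITH (FILLFACTOR = 90);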

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
