42% of 100% taken in clustered index insert

Question

Post reply

42% of 100% taken in clustered index insert

npranj

SSCrazy

Points: 2323
More actions
December 1, 2014 at 4:12 am

#299001

I have a query which is primarily a victim of blocking and a blocker itself for other queries. I studied the plan for this and it shows a 42% cost on CI insert operation. The insert is happening on a table (Table A) that has a PK. This PK is not a running number. It is also a business key (primary key) in another master table (Table B).
My understanding is that the cost is heavy because -
1, this PK is not an incremental number. It could be any number not in a sequence.
2. while inserting into CI, there must be a scan happening to find out the location where the index will be inserted.
How can I reduce the cost?
1. Should I go for partitioning of this table Table A? I am trying to do this but I am not able to find any suitable partition key looking at the JOINS and filter clauses where this table is being used in the applicaiton.
2. Should I introduce a surrogate key (running number) as a primary key so that CI is faster ?
Pls advise.

Viewing 13 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply

Phil Parkin SSC Guru Points: 247202 More actions · Answer 1

You are on the right track, it seems.

But you should note that the PK does not have to be clustered.

So you could easily convert your PK to nonclustered and introduce a clustered index on a column which is better suited (ie one which does not generate unnecessary page splits as a result of inserts).

Grant Fritchey SSC Guru Points: 399363 More actions · Answer 2

Just because it's not an incrementing number doesn't mean that it's not a good choice for the clustered index. Also, the cost of an insert is usually quite high for an insert operation, so where are the other costs coming from? I wouldn't be shocked to see an insert have 99% of the cost in the insert operation itself. Also, a scan is usually unnecessary for an insert because the index structure should know exactly where to place the new data being inserted. If you're seeing that, something may be up. There are other processes, triggers (though they won't be evident in an actual execution plan), something, that's causing additional load. Without seeing the data or the execution plan, I can't tell you for sure where the issue lies.

Be VERY cautious about partitioning data. It is absolutely not a primary means of performance tuning. It's a data management tool that can, under certain circumstances, sometimes help performance. But partitioning can also make performance much worse in cases where the partitioning key(s) can't be used for all queries. Scans across partitions are worse than regular scans.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

npranj SSCrazy Points: 2323 More actions · Answer 3

Thanks Grant.

The 42% cost is Clustered Index insert cost. There is one CI and 4 NCI on this table.

All the other costs are from Index Seeks.

Also, this PK that we are currently using is a bigint. The current maximum value is 141887778. The sequence of insert could be anything starting with 1. Hence, my understanding that there must be page splits happening big time adding to this 42% cost and that we should keep an identity column as PK and CI.

The insert operation is happening from the application on the click of a button. We do not use triggers.

The purpose of this query is to insert data into an Archive table.. so we are inserting about 50k-70k in one shot. Issue also is because there is concurrency. Multiple users are inserting records at the end of the month. Each user is inserting specific set records. They are not trying to mess with each others data - just using the same table for inserting records.

Jeff Moden SSC Guru Points: 1004686 More actions · Answer 4

Grant Fritchey (12/1/2014)
Just because it's not an incrementing number doesn't mean that it's not a good choice for the clustered index.

I have to admit that I'm taken aback by that statement a bit. Most would agree that "ever increasing" (which includes dates, which are really a pair of numbers behind the scenes, so long as you include another column that will make it unique along with being as narrow as possible and unchanging) is one of the prerequisites to a good clustered index. What else would you suggest?

And no... no irony intended here, ol' friend. I just can't think of anything else that would actually make a good choice on a monolithic table that suffers heavy inserts. If you know of some neat trick, I'd love to hear more about it.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Grant Fritchey SSC Guru Points: 399363 More actions · Answer 5

Jeff Moden (12/1/2014)
Grant Fritchey (12/1/2014)
Just because it's not an incrementing number doesn't mean that it's not a good choice for the clustered index.
I have to admit that I'm taken aback by that statement a bit. Most would agree that "ever increasing" (which includes dates, which are really a pair of numbers behind the scenes, so long as you include another column that will make it unique along with being as narrow as possible and unchanging) is one of the prerequisites to a good clustered index. What else would you suggest?
And no... no irony intended here, ol' friend. I just can't think of anything else that would actually make a good choice on a monolithic table that suffers heavy inserts. If you know of some neat trick, I'd love to hear more about it.

No Jeff, no disagreement. That is the "best" possible clustered index. I've just found that you shouldn't let that aspect of the definition define your design. A unique, narrow, ever increasing key is the single best choice for a clustered index. But, what if we're in a situation where such a column just won't be used for the majority of the data access? I've seen the designs where there's a clustered primary key on an identity column, but then that column is never, ever referenced in any of the code (heck, I've done that in my youth). It's wasting the clustered index. Where, if you moved it to, just as an arbitrary example, to a name column, which is unique, but not terribly narrow at varchar(30), but is used in every single query, you suddenly receive all the benefits of the clustered index, even though it's not falling outside that "best" list.

I didn't intend to argue against what's best. I just don't think we should be bound by it as long as we're not doing things that are egregiously stupid (three column compound GUID keys and a clustered index on a bit column are two of my favorites).

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Grant Fritchey SSC Guru Points: 399363 More actions · Answer 6

npranj (12/1/2014)
Thanks Grant.
The 42% cost is Clustered Index insert cost. There is one CI and 4 NCI on this table.
All the other costs are from Index Seeks.
Also, this PK that we are currently using is a bigint. The current maximum value is 141887778. The sequence of insert could be anything starting with 1. Hence, my understanding that there must be page splits happening big time adding to this 42% cost and that we should keep an identity column as PK and CI.
The insert operation is happening from the application on the click of a button. We do not use triggers.
The purpose of this query is to insert data into an Archive table.. so we are inserting about 50k-70k in one shot. Issue also is because there is concurrency. Multiple users are inserting records at the end of the month. Each user is inserting specific set records. They are not trying to mess with each others data - just using the same table for inserting records.

OK, but 42% of the cost of an insert operation still doesn't indicate a performance problem. In fact, the issue that 58% of the cost is elsewhere should be concerning on the insert operations. That's a lot of cost on validating, I assume, referential constraints.

Costs in execution plans are not measures of time. In fact, they're estimates. You may certainly be seeing serious performance problems due to page splits. But, the execution plan will not tell you this. You need to look to the wait statistics of the queries involved to understand what's occurring. Then, maybe modifying the clustered index will eliminate the splits, but, will you use the clustered index for anything other than defining data storage to avoid the page splits? If not, I'd still say that's a waste of the clustered index, since it remains the single best access point to your data. If you don't use it in your queries, I'd say you're throwing away a tool.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 7

npranj (12/1/2014)
Hence, my understanding that there must be page splits happening big time adding to this 42% cost

The optimiser, which is what generates those estimated costs, does not have a clue what a page split is. It doesn't know anything about fragmentation, it does not take the physical storage of the index into account.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

npranj SSCrazy Points: 2323 More actions · Answer 8

Hi Grant - There are intent exclusive locks (LCK_M_IX)

As for the Identity Column CI - we will not be using this column in any query.

So what should be the next option.

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 9

There will always be intent exclusive locks, that's a normal part of locking, nothing unusual there.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Grant Fritchey SSC Guru Points: 399363 More actions · Answer 10

npranj (12/2/2014)
Hi Grant - There are intent exclusive locks (LCK_M_IX)
As for the Identity Column CI - we will not be using this column in any query.
So what should be the next option.

Gail already said, those locks are to be expected. That's how inserts occur. What are your waits during the queries? What's causing things to slow down? Don't guess. Measure it. Capture sys.dm_os_wait_stats before and after the query runs.

If you're not using that CI, then it's nothing other than defining storage, but if it's clustered, unique, narrow and ever increasing, why do you think you're getting page splits again?

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Jeff Moden SSC Guru Points: 1004686 More actions · Answer 11

Grant Fritchey (12/2/2014)
npranj (12/1/2014)
Thanks Grant.
The 42% cost is Clustered Index insert cost. There is one CI and 4 NCI on this table.
All the other costs are from Index Seeks.
Also, this PK that we are currently using is a bigint. The current maximum value is 141887778. The sequence of insert could be anything starting with 1. Hence, my understanding that there must be page splits happening big time adding to this 42% cost and that we should keep an identity column as PK and CI.
The insert operation is happening from the application on the click of a button. We do not use triggers.
The purpose of this query is to insert data into an Archive table.. so we are inserting about 50k-70k in one shot. Issue also is because there is concurrency. Multiple users are inserting records at the end of the month. Each user is inserting specific set records. They are not trying to mess with each others data - just using the same table for inserting records.
OK, but 42% of the cost of an insert operation still doesn't indicate a performance problem. In fact, the issue that 58% of the cost is elsewhere should be concerning on the insert operations. That's a lot of cost on validating, I assume, referential constraints.
Costs in execution plans are not measures of time. In fact, they're estimates. You may certainly be seeing serious performance problems due to page splits. But, the execution plan will not tell you this. You need to look to the wait statistics of the queries involved to understand what's occurring. Then, maybe modifying the clustered index will eliminate the splits, but, will you use the clustered index for anything other than defining data storage to avoid the page splits? If not, I'd still say that's a waste of the clustered index, since it remains the single best access point to your data. If you don't use it in your queries, I'd say you're throwing away a tool.

As you say, no disagreement with what you said. I've been known to have a non-clustered index PK and the Clustered index on something else but... I wouldn't have it on a name or street address (for example) if the table suffered a lot of indexes.

I do very strongly agree with the intent, though. "It Depends" and I couldn't agree more. I was just a little shocked without seeing an additional bit of info. We're good now. 🙂 As the C.O.W. would say to the O.O.D., "All lights are Green. The boat is ready to dive." 🙂

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Grant Fritchey SSC Guru Points: 399363 More actions · Answer 12

Jeff Moden (12/11/2014)
Grant Fritchey (12/2/2014)
npranj (12/1/2014)
Thanks Grant.
The 42% cost is Clustered Index insert cost. There is one CI and 4 NCI on this table.
All the other costs are from Index Seeks.
Also, this PK that we are currently using is a bigint. The current maximum value is 141887778. The sequence of insert could be anything starting with 1. Hence, my understanding that there must be page splits happening big time adding to this 42% cost and that we should keep an identity column as PK and CI.
The insert operation is happening from the application on the click of a button. We do not use triggers.
The purpose of this query is to insert data into an Archive table.. so we are inserting about 50k-70k in one shot. Issue also is because there is concurrency. Multiple users are inserting records at the end of the month. Each user is inserting specific set records. They are not trying to mess with each others data - just using the same table for inserting records.
OK, but 42% of the cost of an insert operation still doesn't indicate a performance problem. In fact, the issue that 58% of the cost is elsewhere should be concerning on the insert operations. That's a lot of cost on validating, I assume, referential constraints.
Costs in execution plans are not measures of time. In fact, they're estimates. You may certainly be seeing serious performance problems due to page splits. But, the execution plan will not tell you this. You need to look to the wait statistics of the queries involved to understand what's occurring. Then, maybe modifying the clustered index will eliminate the splits, but, will you use the clustered index for anything other than defining data storage to avoid the page splits? If not, I'd still say that's a waste of the clustered index, since it remains the single best access point to your data. If you don't use it in your queries, I'd say you're throwing away a tool.
As you say, no disagreement with what you said. I've been known to have a non-clustered index PK and the Clustered index on something else but... I wouldn't have it on a name or street address (for example) if the table suffered a lot of indexes.
I do very strongly agree with the intent, though. "It Depends" and I couldn't agree more. I was just a little shocked without seeing an additional bit of info. We're good now. 🙂 As the C.O.W. would say to the O.O.D., "All lights are Green. The boat is ready to dive." 🙂

Crap. That means I have to go and close all the valves I just opened.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning