SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Create Clustered Index on a Very Large Table (500 GB)


Create Clustered Index on a Very Large Table (500 GB)

Author
Message
-A
-A
Grasshopper
Grasshopper (12 reputation)Grasshopper (12 reputation)Grasshopper (12 reputation)Grasshopper (12 reputation)Grasshopper (12 reputation)Grasshopper (12 reputation)Grasshopper (12 reputation)Grasshopper (12 reputation)

Group: General Forum Members
Points: 12 Visits: 125
Jeff Moden (5/12/2014)
-A (5/7/2014)
Thanks for the replays.

--Here is the answers to the follow up questions:
1) I have to create this Index online since it is huge and I cannot afford such a long down time.

2) MAXDOP was set to 10 at first and showed no significant improvement in speed. I believe at the end the create index still happens on 1 processor anyways.

3) Also SORT_IN_TEMPDB = OFF or ON still didn't matter as SQL always seemed to use up all of the tempdb.

4) This table is already partitioned by date and currently has 570 partitions.


Wait justa minute... you have a partitioned table that has no Clustered Index??? Blink

What seems to be the problem ?


Abdel Ougnou
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (87K reputation)SSC Guru (87K reputation)SSC Guru (87K reputation)SSC Guru (87K reputation)SSC Guru (87K reputation)SSC Guru (87K reputation)SSC Guru (87K reputation)SSC Guru (87K reputation)

Group: General Forum Members
Points: 87386 Visits: 41113
-A (5/12/2014)
Jeff Moden (5/12/2014)
-A (5/7/2014)
Thanks for the replays.

--Here is the answers to the follow up questions:
1) I have to create this Index online since it is huge and I cannot afford such a long down time.

2) MAXDOP was set to 10 at first and showed no significant improvement in speed. I believe at the end the create index still happens on 1 processor anyways.

3) Also SORT_IN_TEMPDB = OFF or ON still didn't matter as SQL always seemed to use up all of the tempdb.

4) This table is already partitioned by date and currently has 570 partitions.


Wait justa minute... you have a partitioned table that has no Clustered Index??? Blink

What seems to be the problem ?


Nothing if you never update variable width columns to contain more than they did at INSERT time and you never do deletes, both of which can waste huge amounts of disk space, memory space, etc. Sure, you can rebuild a heap as of 2008 but that also comes with some hefty hidden costs for the NCIs.

I suppose that for tables like audit tables, which are only inserted into, a partitioned heap might actually do faster inserts than on a partitioned clustered table so that might be an advantage. I would think that you'd need some covering indexes to get any real performance out of such a thing and those might negate the insert advantage that a heap might have.

I've never tried such a thing before and it's an interesting prospect. What's your experience on such a thing?

And thanks for raising the question.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
TheSQLGuru
TheSQLGuru
SSChampion
SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)

Group: General Forum Members
Points: 12639 Visits: 8562
Jeff, if I recall correctly earlier versions of the Fast Track Datawarehouse appliance was spec'd for mostly HEAP tables to ensure the most efficient sequential scans. Not sure of latest models though. I concur with your notion of logging tables likely benefitting from no clustered index as well. Another scenario is tables with a non-sequential GUID as their key. The no-clustered-index scenarios all assume small-to-medium-range-scan performance increases available from CIs are not required.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (87K reputation)SSC Guru (87K reputation)SSC Guru (87K reputation)SSC Guru (87K reputation)SSC Guru (87K reputation)SSC Guru (87K reputation)SSC Guru (87K reputation)SSC Guru (87K reputation)

Group: General Forum Members
Points: 87386 Visits: 41113
Thanks, Kevin. Now THAT's an interesting bit of information. I can see it being a benefit to INSERT performance as well since a heap on a temporal audit table would suffer roughly the same insert order as if a CI were present but without the overhead of updating the BTREE of a CI. Since there isn't much on Yabingooglehoo :-) on the notion of partitioning heaps, I guess I have some experimentation to do.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (87K reputation)SSC Guru (87K reputation)SSC Guru (87K reputation)SSC Guru (87K reputation)SSC Guru (87K reputation)SSC Guru (87K reputation)SSC Guru (87K reputation)SSC Guru (87K reputation)

Group: General Forum Members
Points: 87386 Visits: 41113
Jeff Moden (5/13/2014)
-A (5/12/2014)
Jeff Moden (5/12/2014)
-A (5/7/2014)
Thanks for the replays.

--Here is the answers to the follow up questions:
1) I have to create this Index online since it is huge and I cannot afford such a long down time.

2) MAXDOP was set to 10 at first and showed no significant improvement in speed. I believe at the end the create index still happens on 1 processor anyways.

3) Also SORT_IN_TEMPDB = OFF or ON still didn't matter as SQL always seemed to use up all of the tempdb.

4) This table is already partitioned by date and currently has 570 partitions.


Wait justa minute... you have a partitioned table that has no Clustered Index??? Blink

What seems to be the problem ?


Nothing if you never update variable width columns to contain more than they did at INSERT time and you never do deletes, both of which can waste huge amounts of disk space, memory space, etc. Sure, you can rebuild a heap as of 2008 but that also comes with some hefty hidden costs for the NCIs.

I suppose that for tables like audit tables, which are only inserted into, a partitioned heap might actually do faster inserts than on a partitioned clustered table so that might be an advantage. I would think that you'd need some covering indexes to get any real performance out of such a thing and those might negate the insert advantage that a heap might have.

I've never tried such a thing before and it's an interesting prospect. What's your experience on such a thing?

And thanks for raising the question.


@Abdel,

Looking forward to your answer on the questiong of "What's your experience on such a thing?" above.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
-A
-A
Grasshopper
Grasshopper (12 reputation)Grasshopper (12 reputation)Grasshopper (12 reputation)Grasshopper (12 reputation)Grasshopper (12 reputation)Grasshopper (12 reputation)Grasshopper (12 reputation)Grasshopper (12 reputation)

Group: General Forum Members
Points: 12 Visits: 125
@Jeff,
Sorry for the late response. This table is partitioned by week and we are mostly just inserting into that specific weekly partition and aggregating the data which updates some data. Also since we are almost always inserting a new quarter's data we are basically creating new pages at the end of the table. Because of this special business scenario the inserts are fairly fast for a 4 TB SQL2012 databases. The business operated under NCI index Heap setup for years but some of the larger reports are now suffering some performance and that's why the switch to this compressed CI set up.

This set up is still in development/testing phase and have not been deployed to production yet. So I will not have a real PROD stats on this until we finish implementation.

Thanks again,

-A


Abdel Ougnou
MMartin1
MMartin1
SSCrazy
SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)

Group: General Forum Members
Points: 2801 Visits: 2031
Jeff, I wanted to get your take on something Gail posted once :
http://www.sqlservercentral.com/Forums/FindPost485499.aspx
The way I read it, a clustered index (pk or not) won't necessarily cause a huge slowdown to the insert process. I think it would lead to fragmentation if done often enough, due to new pages everywhere. Is there a balance to reach here?

----------------------------------------------------
How to post forum questions to get the best help
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (87K reputation)SSC Guru (87K reputation)SSC Guru (87K reputation)SSC Guru (87K reputation)SSC Guru (87K reputation)SSC Guru (87K reputation)SSC Guru (87K reputation)SSC Guru (87K reputation)

Group: General Forum Members
Points: 87386 Visits: 41113
MMartin1 (8/7/2014)
Jeff, I wanted to get your take on something Gail posted once :
http://www.sqlservercentral.com/Forums/FindPost485499.aspx
The way I read it, a clustered index (pk or not) won't necessarily cause a huge slowdown to the insert process. I think it would lead to fragmentation if done often enough, due to new pages everywhere. Is there a balance to reach here?


Correct. Adding rows to a clustered index won't cause any slowdown to an insert process if the insert process is in the same order as the CI and it's at the logical end of the table.

If it's not in the same order or not at the logical end of the table, you get page splits and those not only cause fragmentation but they can also cause huge slowdowns because (on average) roughly have the page will need to be copied to a new page. None of that is contrary to what Gail stated in that post.

Non-clustered indexes suffer the same fate.

I have seen it where someone adding an NCI with a column of very low selectivity as the leading column will suddenly cause massive timeouts on a busy app. Heh... I know this to be true because I was that "someone" in my early days.

The same thing can happen with clustered indexes.

The "balance" to be sought is to determine whether the table will suffer more inserts than selects and to write your indexes accordingly.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
TheSQLGuru
TheSQLGuru
SSChampion
SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)

Group: General Forum Members
Points: 12639 Visits: 8562
I think this is slightly mis-stated Jeff. While you may not get fragmentation from end-of-index inserts, you certainly can get page LATCH contention, which can be a significant bottleneck to good insert performance in such a scenario.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
MMartin1
MMartin1
SSCrazy
SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)

Group: General Forum Members
Points: 2801 Visits: 2031
Thanks Jeff for sharing your knowledge here.
Kevin, I'll have to read up on Latching. Thanks guys.

----------------------------------------------------
How to post forum questions to get the best help
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search