Create Clustered Index on a Very Large Table (500 GB)

Question

Post reply

Create Clustered Index on a Very Large Table (500 GB)

-A

SSC-Addicted

Points: 424
More actions
May 7, 2014 at 9:36 am

#396718

Hello,
I have a scenario where I need to create a Clustered Index (CI) on a very large SQL Server 2012 database table. This table has about approximately 10 billion rows, 500 GB in size. The job ran for about 20 hours into it and then fails with error: "Out of disk space in tempdb". My tempDB size is 1.8TB, but yet it's still not enough.
I have exhausted Google search and decided to post it here to get some extra help.
Here is my script:
CREATE CLUSTERED INDEX CI_IndexName
ON TableName(Column1,Column2)
WITH (MAXDOP= 4, ONLINE=ON, SORT_IN_TEMPDB = ON, DATA_COMPRESSION=PAGE)
ON sh_WeekDT(Day_DT)
GO
Thank you in advance for all your help.

Abdel Ougnou

Viewing 15 posts - 1 through 15 (of 26 total)

You must be logged in to reply to this topic. Login to reply

cunningham SSCarpal Tunnel Points: 4595 More actions · Answer 1

Do you have any downtime windows to create the clustered index offline? Could be worth a go, will use less resource.

SQL Guy 1 SSCoach Points: 15815 More actions · Answer 2

1. Change SORT_IN_TEMPDB = OFF

2. Remove MAXDOP or set it to high and run at at weekend.

3. Consider partitioning such a huge table by date.

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 3

SQL Guy 1 (5/7/2014)
1. Change SORT_IN_TEMPDB = OFF
2. Remove MAXDOP or set it to high and run at at weekend.
3. Consider partitioning such a huge table by date.

1) I believe the table is partitioned (...ON sh_WeekDT(Day_DT))

2) It could be that there are other massive tempdb consuming queries running at the same time. Were you monitoring tempdb usage during these 20 hours?? Big necessity when you know you are going to be crushing it with something like you are trying to do.

3) I also wonder if it isn't the ONLINE part that is getting you, or at least having a play in the issue. Keeping track of stuff that is currently getting modified can add up depending on what is happening.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

-A SSC-Addicted Points: 424 More actions · Answer 4

Thanks for the replays.

--Here is the answers to the follow up questions:

1) I have to create this Index online since it is huge and I cannot afford such a long down time.

2) MAXDOP was set to 10 at first and showed no significant improvement in speed. I believe at the end the create index still happens on 1 processor anyways.

3) Also SORT_IN_TEMPDB = OFF or ON still didn't matter as SQL always seemed to use up all of the tempdb.

4) This table is already partitioned by date and currently has 570 partitions.

Abdel Ougnou

-A SSC-Addicted Points: 424 More actions · Answer 5

TheSQLGuru (5/7/2014)
2) It could be that there are other massive tempdb consuming queries running at the same time. Were you monitoring tempdb usage during these 20 hours?? Big necessity when you know you are going to be crushing it with something like you are trying to do.
3) I also wonder if it isn't the ONLINE part that is getting you, or at least having a play in the issue. Keeping track of stuff that is currently getting modified can add up depending on what is happening.

Yes I was monitoring and very little to nothing was happening as I did this on the weekend.

Perhaps worth trying to do the index offline to see if it finishes faster. However this was done on another DB that was not being accessed by anyone.

Abdel Ougnou

-A SSC-Addicted Points: 424 More actions · Answer 6

TheSQLGuru (5/7/2014)
3) I also wonder if it isn't the ONLINE part that is getting you, or at least having a play in the issue. Keeping track of stuff that is currently getting modified can add up depending on what is happening.

Just wanted to loop back and close this issues as successfully resolved.

Thanks to all for the help. TheSQLGuru was right, the online index creation was trying to keep track of too many changes as it tried to create and keep the index online at the same time. It could not complete this task within the given space on tempDB and withing a reasonable time. So changing this index creation setting to offline worked. The Clustered Index creation completed successfully and under 7:45 Hours. This was done over night during the time when users were offline.

The other great thing that is worth mentioning in case someone else has this similar scenario is by setting the index COMPRESSION=PAGE I was able to compress this 500 GB table down to 400 GB and the index size is only 1 GB. This is a savings of a 100 GB.

Thank you again to all.

Abdel Ougnou

cunningham SSCarpal Tunnel Points: 4595 More actions · Answer 7

cunningham (5/7/2014)
Do you have any downtime windows to create the clustered index offline? Could be worth a go, will use less resource.

Where's the love!

-A SSC-Addicted Points: 424 More actions · Answer 8

cunningham (5/12/2014)
cunningham (5/7/2014)
Do you have any downtime windows to create the clustered index offline? Could be worth a go, will use less resource.
Where's the love!

hhh, sorry, I actually didn't see that. Thank you very much.

Abdel Ougnou

Jeff Moden SSC Guru Points: 1003851 More actions · Answer 9

-A (5/7/2014)
Thanks for the replays.
--Here is the answers to the follow up questions:
1) I have to create this Index online since it is huge and I cannot afford such a long down time.
2) MAXDOP was set to 10 at first and showed no significant improvement in speed. I believe at the end the create index still happens on 1 processor anyways.
3) Also SORT_IN_TEMPDB = OFF or ON still didn't matter as SQL always seemed to use up all of the tempdb.
4) This table is already partitioned by date and currently has 570 partitions.

Wait justa minute... you have a partitioned table that has no Clustered Index??? :blink:

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

-A SSC-Addicted Points: 424 More actions · Answer 10

Jeff Moden (5/12/2014)
-A (5/7/2014)
Thanks for the replays.
--Here is the answers to the follow up questions:
1) I have to create this Index online since it is huge and I cannot afford such a long down time.
2) MAXDOP was set to 10 at first and showed no significant improvement in speed. I believe at the end the create index still happens on 1 processor anyways.
3) Also SORT_IN_TEMPDB = OFF or ON still didn't matter as SQL always seemed to use up all of the tempdb.
4) This table is already partitioned by date and currently has 570 partitions.
Wait justa minute... you have a partitioned table that has no Clustered Index??? :blink:

What seems to be the problem ?

Abdel Ougnou

Jeff Moden SSC Guru Points: 1003851 More actions · Answer 11

-A (5/12/2014)
Jeff Moden (5/12/2014)
-A (5/7/2014)
Thanks for the replays.
--Here is the answers to the follow up questions:
1) I have to create this Index online since it is huge and I cannot afford such a long down time.
2) MAXDOP was set to 10 at first and showed no significant improvement in speed. I believe at the end the create index still happens on 1 processor anyways.
3) Also SORT_IN_TEMPDB = OFF or ON still didn't matter as SQL always seemed to use up all of the tempdb.
4) This table is already partitioned by date and currently has 570 partitions.
Wait justa minute... you have a partitioned table that has no Clustered Index??? :blink:
What seems to be the problem ?

Nothing if you never update variable width columns to contain more than they did at INSERT time and you never do deletes, both of which can waste huge amounts of disk space, memory space, etc. Sure, you can rebuild a heap as of 2008 but that also comes with some hefty hidden costs for the NCIs.

I suppose that for tables like audit tables, which are only inserted into, a partitioned heap might actually do faster inserts than on a partitioned clustered table so that might be an advantage. I would think that you'd need some covering indexes to get any real performance out of such a thing and those might negate the insert advantage that a heap might have.

I've never tried such a thing before and it's an interesting prospect. What's your experience on such a thing?

And thanks for raising the question.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 12

Jeff, if I recall correctly earlier versions of the Fast Track Datawarehouse appliance was spec'd for mostly HEAP tables to ensure the most efficient sequential scans. Not sure of latest models though. I concur with your notion of logging tables likely benefitting from no clustered index as well. Another scenario is tables with a non-sequential GUID as their key. The no-clustered-index scenarios all assume small-to-medium-range-scan performance increases available from CIs are not required.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Jeff Moden SSC Guru Points: 1003851 More actions · Answer 13

Thanks, Kevin. Now THAT's an interesting bit of information. I can see it being a benefit to INSERT performance as well since a heap on a temporal audit table would suffer roughly the same insert order as if a CI were present but without the overhead of updating the BTREE of a CI. Since there isn't much on Yabingooglehoo 🙂 on the notion of partitioning heaps, I guess I have some experimentation to do.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1003851 More actions · Answer 14

Jeff Moden (5/13/2014)
-A (5/12/2014)
Jeff Moden (5/12/2014)
-A (5/7/2014)
Thanks for the replays.
--Here is the answers to the follow up questions:
1) I have to create this Index online since it is huge and I cannot afford such a long down time.
2) MAXDOP was set to 10 at first and showed no significant improvement in speed. I believe at the end the create index still happens on 1 processor anyways.
3) Also SORT_IN_TEMPDB = OFF or ON still didn't matter as SQL always seemed to use up all of the tempdb.
4) This table is already partitioned by date and currently has 570 partitions.
Wait justa minute... you have a partitioned table that has no Clustered Index??? :blink:
What seems to be the problem ?
Nothing if you never update variable width columns to contain more than they did at INSERT time and you never do deletes, both of which can waste huge amounts of disk space, memory space, etc. Sure, you can rebuild a heap as of 2008 but that also comes with some hefty hidden costs for the NCIs.
I suppose that for tables like audit tables, which are only inserted into, a partitioned heap might actually do faster inserts than on a partitioned clustered table so that might be an advantage. I would think that you'd need some covering indexes to get any real performance out of such a thing and those might negate the insert advantage that a heap might have.
I've never tried such a thing before and it's an interesting prospect. What's your experience on such a thing?
And thanks for raising the question.

@Abdel,

Looking forward to your answer on the questiong of "What's your experience on such a thing?" above.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)