Space needed for clustered index rebuild

Question

Space needed for clustered index rebuild

Viewing 15 posts - 16 through 30 (of 38 total)

You must be logged in to reply to this topic. Login to reply

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 1

Jeff Moden (5/10/2015)
Perhaps to rebuild a CI that needs it without rebuilding NCIs that don't?

ALTER INDEX ... REBUILD on the cluster would do that.

Heh... WHUT??? A rebuild error would never happen with a Microsoft product. 😛 For example, corruption never occurred when rebuilding indexes in an ONLINE fashion in 2012, right? :hehe:

😀

I was thinking something much more mundane, like Error 9002. The transaction log is full.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Jeff Moden SSC Guru Points: 1004686 More actions · Answer 2

Alright, Ken,

Now that we're done with our diversion, back to you problem. Please see my first post on this thread and answer some of those questions so that we can try to help, if you still need help.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Ken Davis SSCarpal Tunnel Points: 4329 More actions · Answer 3

Thanks to everyone for the notes and clarifications, especially the confirmation that a CI rebuild needs as much free space as the current size of the CI plus more space for sorting.

Jeff Moden, to your questions, this is an audit database where rows are written once and never updated. They are queried for research. I have to keep 6 months of data. I recently broke the 6-month database into a Current DB with the most recent month of data and an Archive DB with months 2-6. Most of the queries are within the current month and it's easy to keep those indexes organized and the queries fast. The big clustered index that needs rebuilding is on the Archive DB. I have archive/purge jobs constantly moving data > 1 month from Current to Archive and deleting data > 6 months from the Archive.

I have it in the back of my mind that this is probably a great candidate for partitioning. Honestly I am dragging my heels on that because I have never partitioned a large exiting MSSQL database. I don't know how much time it would take, etc. Jeff, I know there is a ton of info about that online but if you can share your high level steps for that I would appreciate it.

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 4

Then I'm curious, why do you need an index rebuild?

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Jeff Moden SSC Guru Points: 1004686 More actions · Answer 5

Ken Davis (5/11/2015)
Jeff Moden, to your questions, this is an audit database where rows are written once and never updated. They are queried for research. I have to keep 6 months of data. I recently broke the 6-month database into a Current DB with the most recent month of data and an Archive DB with months 2-6. Most of the queries are within the current month and it's easy to keep those indexes organized and the queries fast. The big clustered index that needs rebuilding is on the Archive DB. I have archive/purge jobs constantly moving data > 1 month from Current to Archive and deleting data > 6 months from the Archive.
I have it in the back of my mind that this is probably a great candidate for partitioning. Honestly I am dragging my heels on that because I have never partitioned a large exiting MSSQL database. I don't know how much time it would take, etc. Jeff, I know there is a ton of info about that online but if you can share your high level steps for that I would appreciate it.

If you're doing all you say you're doing without losing any data, then you need not be afraid of partitioning for this because it sounds like you have a great handle on T-SQL and the engine itself.

It does, in fact, sound like the perfect candidate for partitioning. As to the ton of info about partition online, I absolutely agree that's there is a ton and there aren't any places I've seen (save one almost BTW entry in a Paul Randal blog) that deal with some of the underlying space issues.

Before I can even talk about high level steps, you need to know that there are two types of partitioning depending on which Edition of SQL Server that you have. So, do you have the Standard Edition or the Enterprise Edition?

I also need to know a lot about the two tables you mentioned. The best way for me to find that out is to post the DDL (the CREATE TABLE) for the tables including all constraints and indexes for each of the two tables. This can be done simply by generating the scripts for the two tables making sure that the scripting options include the generation of such things. I don't think it likely but, if you have triggers on these tables, I'd also need to see those as well as any FK's (again, I think that not likely but have to be sure) that may point to the tables.

If there's the possibility of at least temporarily coming up with the space for another copy of the table, we can make this all happen in the background right up to the point of doing the "switch over" to the new table. If not, it'll take longer (mostly because of DELETEs as we go) but it's still doable in a mostly online fashion that won't require but occasional milliseconds of unavailability of a months worth of data.

We WILL absolutely need the extra space for your largest month of data. Based on your description, the 300GB+ that you previously mentioned will do the trick if you can't come up with the space to cover a full copy of the table plus another 10% or so.

Lemme know.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Ken Davis SSCarpal Tunnel Points: 4329 More actions · Answer 6

The CI on the big 5-month archive table is highly fragmented. Background info: this was implemented with a terrible clustering key - a random GUID. About two months ago I changed that to a sequential ever-increasing key. But that only applied to new inserts. I did not change the existing data as it will eventually get deleted. All rows since that change should be well organized but it's very fragmented before that.

Also, the CI is currently fillfactor = 80. I can change that to 100 since I now have a sequential key and rows that never get updated. That is the other reason I want to rebuild the CI.

Ken Davis SSCarpal Tunnel Points: 4329 More actions · Answer 7

Jeff, this is Enterprise edition. I am attaching the DDL. There are two tables involved. Transactions is the parent and makes up 70% of the DB. TransactionDetails is a child and makes up 30%. There are no triggers. It would take some effort to get enough space for another copy of the archive DB. I would probably have to get another LUN temporarily added to the cluster. These are MBR disks so the limit is 2TB and this disk only has 300gb free.

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 8

Ken Davis (5/11/2015)
The CI on the big 5-month archive table is highly fragmented. Background info: this was implemented with a terrible clustering key - a random GUID. About two months ago I changed that to a sequential ever-increasing key. But that only applied to new inserts. I did not change the existing data as it will eventually get deleted. All rows since that change should be well organized but it's very fragmented before that.

Question is, do you care?

Fragmentation results in slower scans from disk, that's about all. Weigh that against the amount of work that will be required to rebuild this index. And also consider alter index .. reorganise. It won't let you change the fill factor, but might otherwise be a good option given how hard the extra space (and probable extra file/filegroup seeing as you can't expand the drive) will be to get.

If you want to partition for easier maintenance/loads/deletes, see Jeff's reply.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Jeff Moden SSC Guru Points: 1004686 More actions · Answer 9

Thanks, Ken. I'll have a deeper look tonight after work.

In the mean time, is it absolutely necessary for all those NVARCHAR columns to be NVARCHAR. Can some of them be changed to INT or BIGINT or just plain VARCHAR? Doing such a thing would cut the size of the tables almost in half.

Also, the NVARCHAR(1) things absolutely need to be changed (this would be done during the move to a partitioned table)... at least to NCHAR(1) if having UNICODE is important. I know it doesn't sound like much but 2 bytes for each entry are being used just to keep track of the size of the entry which can't be anything other than 1 character.

Also, any problem with us changing the nullability on the transaction date column in each table to be NOT NULL (especially since it has a default of GETDATE())? That would end up being the partitioning column and, although it can contain nulls, it's far better if it doesn't so that the first partition doesn't fill up with "junk".

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Ken Davis SSCarpal Tunnel Points: 4329 More actions · Answer 10

Gail, "do I care" is a great question. Knowing this CI is highly fragmented offends my DBA sensibilities 🙂 but I am not getting complaints of slow queries against the Archive DB. Once time goes by and the old inefficent key rolls out of the data I should be in better shape even without a rebuild.

I would like the space savings that going from fillfactor 80 to 100 would bring...

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 11

The only way you're going to change the fill factor is with a rebuild. It's up to you whether that and your sensibilities are worth the work that this is going to take. 😀 :w00t:

If you do decide to rebuild, then take the opportunity to partition at the same time as this probably isn't a task you're going to want to do twice.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Bill Talada SSChampion Points: 11954 More actions · Answer 12

Ken Davis (5/11/2015)
The CI on the big 5-month archive table is highly fragmented. Background info: this was implemented with a terrible clustering key - a random GUID. About two months ago I changed that to a sequential ever-increasing key. But that only applied to new inserts. I did not change the existing data as it will eventually get deleted. All rows since that change should be well organized but it's very fragmented before that.
Also, the CI is currently fillfactor = 80. I can change that to 100 since I now have a sequential key and rows that never get updated. That is the other reason I want to rebuild the CI.

Hmm, inserting newSequentialIDs sounds like a bad idea to me since you give up the flexibility of Apps generating random guids. My strong preference would be a heap table, especially since you are valuing more recent data and want that clustered together. The best argument for clustering is to put rows into a reporting order to reduce sorting needs.

Deleting old rows sounds like it will decrease fragmentation. That might take a very long while. I've noticed deleting half the rows in a table rarely frees up any space. Seems sql server hangs onto empty pages as if they were being used until a clustered reindex is done.

If you defrag once a month, measure the percent change over that time period for each table and set your fillfactor to (100 - change%).

Ken Davis SSCarpal Tunnel Points: 4329 More actions · Answer 13

Thanks again for the advice and thanks Jeff for any time you spend on this.

My next concern with this big DB is the 2TB disk limit. I will likely eventually need more space and I think the answer will be adding two large LUNs to the cluster, add new files to the DB for those disks, then use DBCC SHRINKFILE with EMPTYFILE to migrate the data to the new disks and off of the current disk. Then I can "give back" the current LUN. But I have lots of concerns about that like how long it will take, how available the data will be, etc.

Having said all that I think I will start a new thread on that topic instead of changing the topic of this thread...

Jeff Moden SSC Guru Points: 1004686 More actions · Answer 14

Ken Davis (5/11/2015)
Thanks again for the advice and thanks Jeff for any time you spend on this.
My next concern with this big DB is the 2TB disk limit. I will likely eventually need more space and I think the answer will be adding two large LUNs to the cluster, add new files to the DB for those disks, then use DBCC SHRINKFILE with EMPTYFILE to migrate the data to the new disks and off of the current disk. Then I can "give back" the current LUN. But I have lots of concerns about that like how long it will take, how available the data will be, etc.
Having said all that I think I will start a new thread on that topic instead of changing the topic of this thread...

Nah... keep it all here. It's all related. If you can get the extra space, that'll work a treat and then you can give back the current LUN. And there's no need for the SHRINKFILE and EMPTYFILE thing. That will all be taken care of during the partitioning. With the extra space, this will be a whole lot easier than you might imagine. I've done this twice for various folks now and once for myself. With the extra space on a temporary basis, this will be easy and there are a ton of maintenance benefits that go along with it including greatly reducing backup times and "get back in business" restores.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004686 More actions · Answer 15

Getting back to the subject, can any of the NVARCHAR columns be change to VARCHAR or even INT/BIGINT for things that have numbers in them? It's not a requirement for success here but it would dramatically cut the overall space requirements for these tables and the related indexes. That also affects backups and restores in a highly favorable manner.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)