Space needed for clustered index rebuild

  • I have a large table in a large database. The DB = 1.2TB and the big table is 70% of the DB or 840GB. I would like to rebuild the clustered index on the big table but the disk only has about 300GB free. Is it true that I need enough free space for a full copy of the table (840GB)?

    This is 2008r2 Enterprise and I would set the DB to Bulk Logged and do the Rebuild in offline mode.

    Thanks

  • Ken Davis (5/9/2015)


    I have a large table in a large database. The DB = 1.2TB and the big table is 70% of the DB or 840GB. I would like to rebuild the clustered index on the big table but the disk only has about 300GB free. Is it true that I need enough free space for a full copy of the table (840GB)?

    This is 2008r2 Enterprise and I would set the DB to Bulk Logged and do the Rebuild in offline mode.

    Thanks

    Quick thought, if you use CREATE INDEX with DROP_EXISTING = ON, the size of the database will not increase, only the transaction log.

    😎

    CREATE UNIQUE CLUSTERED INDEX [PK_INDEX_NAME] ON [TABLE_NAME] ([INDEX_COLUMN] ASC) WITH (DROP_EXISTING = ON);

    Edit: wrong advice, see posts below

  • Ken Davis (5/9/2015)


    I have a large table in a large database. The DB = 1.2TB and the big table is 70% of the DB or 840GB. I would like to rebuild the clustered index on the big table but the disk only has about 300GB free. Is it true that I need enough free space for a full copy of the table (840GB)?

    This is 2008r2 Enterprise and I would set the DB to Bulk Logged and do the Rebuild in offline mode.

    Thanks

    Hi, Ken,

    I'm curious. What type of table is it? For example, is it an audit table of some sort where the data is written to once and never modified? Is it a table like an invoice detail table where it is written to and updated only a short time (say, less than 30 days) after the row is written? In other words, is it a table where rows that are 30 days old or older are usually not updated at all?

    I'm asking all these questions because doing a Clustered Index Rebuild would be the perfect time to consider partitioning on a temporal basis so that you can rebuild a month at a time in the future and only those months of data that actually need it. I even have a way to compress any free space that normal index rebuilds might cause in such partitioning schemes.

    As a bit of a side benefit, this can also greatly reduce your backup times/space and the time for a "get back in business" restore if the table can be partitioned temporally and if the rows in the table are usually static after a month or even upon original insert. I did this with our telephone system, which necessarily stores call recordings in the database. Because I was able to make the older months read-only because of the "write once" nature of the data, I was able to drop my full backup times (necessarily to NAS rather than SAN) from nearly 10 hours to just a couple of minutes because I only need to backup the current active month. If the system were to totally burn to the ground (knocking on wood), I could have it back up and running in minutes (restore just the Primary and current month filegroups/partitions) and then do "piece-meal" restores for the rest of the months almost at my leisure. It also allows for an online "piece-meal" restore if a corruption occurs on one of the months.

    Needless to say, my index and stats maintenance times have shrunk to nearly 0, as well.

    There are some caveats in that there should be no FKs that point at the table because of what happens to unique indexes when partitioning (they're actually no longer unique because the partitioning column is added to them if you want to enjoy the benefits of "aligned partitioned indexes") and it can slow performance down a bit for non-temporally based queries, but if it's a mostly "write and no updates" table, partitioning makes a huge amount of sense.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Eirikur Eiriksson (5/10/2015)


    Ken Davis (5/9/2015)


    I have a large table in a large database. The DB = 1.2TB and the big table is 70% of the DB or 840GB. I would like to rebuild the clustered index on the big table but the disk only has about 300GB free. Is it true that I need enough free space for a full copy of the table (840GB)?

    This is 2008r2 Enterprise and I would set the DB to Bulk Logged and do the Rebuild in offline mode.

    Thanks

    Quick thought, if you use CREATE INDEX with DROP_EXISTING = ON, the size of the database will not increase, only the transaction log.

    😎

    CREATE UNIQUE CLUSTERED INDEX [PK_INDEX_NAME] ON [TABLE_NAME] ([INDEX_COLUMN] ASC) WITH (DROP_EXISTING = ON);

    Careful now... The problem with the Internet and publications like Books Online is that only half of it is true and the rest is written in such a fashion that you cannot tell if it's true or not. 😀

    Have you actually ever done a measurement on this subject? I could be wrong in some instances but it would appear the space savings are not true when a Clustered Index is involved. Please see the following code for proof.

    /******************************

    --===== Manual cleanup for after test

    USE [master];

    DROP DATABASE [JBMSizeTest]

    ******************************/

    GO

    --=================================================================================================

    -- Create a new test database with several known conditions

    --=================================================================================================

    --===== Supress the autodisplay of rowcounts

    SET NOCOUNT ON;

    --===== Create the database in the Simple Recovery Mode with small increments of growth to measure

    USE [master];

    CREATE DATABASE [JBMSizeTest];

    ALTER DATABASE [JBMSizeTest] SET RECOVERY SIMPLE WITH NO_WAIT;

    ALTER DATABASE [JBMSizeTest] MODIFY FILE (NAME=N'JBMSizeTest' ,SIZE=5MB,FILEGROWTH= 1MB);

    ALTER DATABASE [JBMSizeTest] MODIFY FILE (NAME=N'JBMSizeTest_log',SIZE=5MB,FILEGROWTH=1MB);

    --===== Show the current size of the MDF/LDF files

    PRINT '========== Original Size ==========';

    SELECT DBFileName = CAST(name AS CHAR(15))

    ,SizeMB=Size/128

    ,type_desc

    FROM sys.master_files

    WHERE name LIKE 'JBMSizeTest%';

    PRINT REPLICATE('=',90);

    GO

    --=================================================================================================

    -- Create a decent size table with a named clustered index and

    --=================================================================================================

    --===== Make the new database current

    USE JBMSizeTest;

    --===== Create a new table with a named Clustered Index so we can use the name later

    CREATE TABLE dbo.RebuildTest

    (

    RowNum INT IDENTITY(1,1)

    ,SomeVarchar VARCHAR(50)

    ,CONSTRAINT PK_RebuildTest PRIMARY KEY CLUSTERED (RowNum)

    )

    ;

    --===== Populate the table with a decent amount of rows

    INSERT INTO dbo.RebuildTest

    (SomeVarchar)

    SELECT TOP 2000000

    SomeVarchar = NEWID()

    FROM sys.all_columns ac1, sys.all_columns ac2

    ;

    --===== Make sure everything is committed to disk

    CHECKPOINT;

    --===== Show the current size of the MDF/LDF files

    PRINT '========== After Table Build ==========';

    SELECT DBFileName = CAST(name AS CHAR(15))

    ,SizeMB=Size/128

    ,type_desc

    FROM sys.master_files

    WHERE name LIKE 'JBMSizeTest%';

    PRINT REPLICATE('=',90);

    GO

    --=================================================================================================

    -- Shrink just the log file to as small as possible so we can remeasure

    --=================================================================================================

    --===== Shrink the logfile to 1MB

    DBCC SHRINKFILE (N'JBMSizeTest_log',1) WITH NO_INFOMSGS

    ;

    --===== Show the current size of the MDF/LDF files

    PRINT '========== After Logfile Shrink ==========';

    SELECT DBFileName = CAST(name AS CHAR(15))

    ,SizeMB=Size/128

    ,type_desc

    FROM sys.master_files

    WHERE name LIKE 'JBMSizeTest%';

    PRINT REPLICATE('=',90);

    GO

    --=================================================================================================

    -- Rebuild the Clustered Index and then show the sizes to prove that

    -- WITH DROP EXISTING doesn't help with the extra space problem for Clustered Indexes.

    -- Remember that we're in the SIMPLE Recovery Model and than index BUILDs/REBUILDs are

    -- minimally logged. Even with SORT_IN_TEMPDB=ON, the single table database still grew

    -- by the size of the original table + 30%. Yes, all but 105MB is free space when it's

    -- all done but the extra space MUST EXIST while inprocess.

    --=================================================================================================

    --===== Rebuild the index using WITH DROP EXISTING to try to save on space.

    CREATE UNIQUE CLUSTERED INDEX PK_RebuildTest ON dbo.RebuildTest(RowNum)

    WITH (DROP_EXISTING=ON,SORT_IN_TEMPDB=ON,ONLINE=OFF)

    ;

    --===== Show the current size of the MDF/LDF files

    PRINT '========== After the Rebuild WITH DROP EXISTING ==========';

    PRINT ' (It didn''t do any good for MDF space savings!)'

    SELECT DBFileName = CAST(name AS CHAR(15))

    ,SizeMB=Size/128

    ,type_desc

    FROM sys.master_files

    WHERE name LIKE 'JBMSizeTest%';

    PRINT REPLICATE('=',90);

    GO

    Run results... the MDF file grew by 130%!!!

    ========== Original Size ==========

    DBFileName SizeMB type_desc

    --------------- ----------- ------------------------------------------------------------

    JBMSizeTest 5 ROWS

    JBMSizeTest_log 5 LOG

    ==========================================================================================

    ========== After Table Build ==========

    DBFileName SizeMB type_desc

    --------------- ----------- ------------------------------------------------------------

    JBMSizeTest 105 ROWS

    JBMSizeTest_log 602 LOG

    ==========================================================================================

    ========== After Logfile Shrink ==========

    DBFileName SizeMB type_desc

    --------------- ----------- ------------------------------------------------------------

    JBMSizeTest 105 ROWS

    JBMSizeTest_log 1 LOG

    ==========================================================================================

    ========== After the Rebuild WITH DROP EXISTING ==========

    (It didn't do any good for MDF space savings!)

    DBFileName SizeMB type_desc

    --------------- ----------- ------------------------------------------------------------

    JBMSizeTest 242 ROWS

    JBMSizeTest_log 3 LOG

    ==========================================================================================

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Eirikur Eiriksson (5/10/2015)


    Ken Davis (5/9/2015)


    I have a large table in a large database. The DB = 1.2TB and the big table is 70% of the DB or 840GB. I would like to rebuild the clustered index on the big table but the disk only has about 300GB free. Is it true that I need enough free space for a full copy of the table (840GB)?

    This is 2008r2 Enterprise and I would set the DB to Bulk Logged and do the Rebuild in offline mode.

    Thanks

    Quick thought, if you use CREATE INDEX with DROP_EXISTING = ON, the size of the database will not increase, only the transaction log.

    False.

    ALL forms of recreating the index (other than an outright DROP followed by a CREATE of a nonclustered index) will create the new index and then drop the old one (they have to for rollback purposes). Hence CREATE with DROP_EXISTING needs just as much free space in the data file as ALTER INDEX ... REBUILD, which is at least the full size of the index plus ~25% for the sort space. The sort space can be shifted to TempDB with the SORT_IN_TEMPDB option but there's no way around needing at least size of the index free in the data file.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • P.S.

    Just to be sure, I did a deep dive double-check in Books Online and, as logical as it might seem, I can't find where it even implies that WITH DROP EXISTING conserves any database file space when used to rebuild indexes. Although that seems logical with NCI's, even those aren't included in such a statement. The only recommendation for space savings is to DISABLE NCIs prior to a rebuild. WITH DROP EXISTING on a CI simply keeps all the NCI's from being rebuilt twice compared to a drop/rebuild of the CI.

    So I can't even blame Books Online for starting this myth.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (5/10/2015)


    but it would appear the space savings are not true when a Clustered Index is involved.

    Nor for a nonclustered index.

    CREATE ... WITH DROP_EXISTING drops the old index once the new one has been created, and it reads the old index to create the new one (meaning it can avoid expensive sorts if the keys aren't changing). Hence there needs to be space for the new index or the CREATE will fail.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Ken Davis (5/9/2015)


    I have a large table in a large database. The DB = 1.2TB and the big table is 70% of the DB or 840GB. I would like to rebuild the clustered index on the big table but the disk only has about 300GB free. Is it true that I need enough free space for a full copy of the table (840GB)?

    Yes, you do. And your next log backup would be ~840GB in size as well because all the modified pages would be included in it.

    Have you considered ALTER INDEX ... REORGANISE instead? Firstly you can stop it at any time and it stops at the point it reached, rather than rolling back which rebuild does. Second it doesn't need much free space in the data file. A single free page to be specific.

    Now, it is a fully logged operation but because it isn't a single transaction, you can run transaction log backups while it's running and they will do the usual job of making the log reusable.

    Also, have you considered archiving or partitioning that monster (partitioning's for maintenance, which sounds like what you need about now)

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Jeff Moden (5/10/2015)


    The only recommendation for space savings is to DISABLE NCIs prior to a rebuild.

    Which itself is wrong, because ALTER INDEX .. REBUILD on the clustered index doesn't touch the nonclustered indexes and hasn't since SQL 2000. CREATE .. WITH DROP_EXISTING of the cluster does however rebuild all nonclustered indexes because the key could change.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (5/10/2015)


    Jeff Moden (5/10/2015)


    The only recommendation for space savings is to DISABLE NCIs prior to a rebuild.

    Which itself is wrong, because ALTER INDEX .. REBUILD on the clustered index doesn't touch the nonclustered indexes and hasn't since SQL 2000. CREATE .. WITH DROP_EXISTING of the cluster does however rebuild all nonclustered indexes because the key could change.

    I was adding more information to my post while you were posting. Thanks for your confirmations.

    On the subject above, I didn't mention the context and didn't mean to associate that with CI rebuilds in any way. This is what I was referencing in BOL (found under "DISABLING INDEXES").

    When a nonclustered index is not disabled, the rebuild operation requires enough temporary disk space to store both the old and new index. However, by disabling and rebuilding a nonclustered index in separate transactions, the disk space made available by disabling the index can be reused by the subsequent rebuild or any other operation. No additional space is required except for temporary disk space for sorting; this is typically 20 percent of the index size.

    If disk space is limited, it may be helpful to disable the nonclustered index before rebuilding it. For example, you have a stored procedure that rebuilds all nonclustered indexes in one or more tables. By disabling these indexes first in a separate transaction from the rebuild operation, you can significantly reduce the amount of temporary disk space required to rebuild them.

    I haven't actually ever tested it (but will in the very near future) but it would appear that disabling NCIs (with the warning about FKs) would be a benefit no matter what rebuilt them.

    On the subject of...

    CREATE .. WITH DROP_EXISTING of the cluster does however rebuild all nonclustered indexes because the key could change.

    I trust you more than anyone in the world on such things but, I have to ask, is BOL incorrect in this case or is this a case of MS switching context from CI to NCI without putting in the clutch and me taking it wrong? It says...

    [font="Arial Black"]DROP EXISTING CLAUSE[/font]

    {snip}

    DROP_EXISTING enhances performance when you re-create a clustered index, with either the same or different set of keys, on a table that also has nonclustered indexes. DROP_EXISTING replaces the execution of a DROP INDEX statement on the old clustered index followed by the execution of a CREATE INDEX statement for the new clustered index. The nonclustered indexes are rebuilt once, [font="Arial Black"]and then only if the index definition has changed.[/font] [font="Arial Black"]The DROP_EXISTING clause does not rebuild the nonclustered indexes when the index definition has the same index name, key and partition columns, uniqueness attribute, and sort order as the original index.[/font]

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • GilaMonster (5/10/2015)


    Eirikur Eiriksson (5/10/2015)


    Ken Davis (5/9/2015)


    I have a large table in a large database. The DB = 1.2TB and the big table is 70% of the DB or 840GB. I would like to rebuild the clustered index on the big table but the disk only has about 300GB free. Is it true that I need enough free space for a full copy of the table (840GB)?

    This is 2008r2 Enterprise and I would set the DB to Bulk Logged and do the Rebuild in offline mode.

    Thanks

    Quick thought, if you use CREATE INDEX with DROP_EXISTING = ON, the size of the database will not increase, only the transaction log.

    False.

    ALL forms of recreating the index (other than an outright DROP followed by a CREATE of a nonclustered index) will create the new index and then drop the old one (they have to for rollback purposes). Hence CREATE with DROP_EXISTING needs just as much free space in the data file as ALTER INDEX ... REBUILD, which is at least the full size of the index plus ~25% for the sort space. The sort space can be shifted to TempDB with the SORT_IN_TEMPDB option but there's no way around needing at least size of the index free in the data file.

    Entirely my bad, cannot even blame BOL for this on, thanks for the correction Gail.

    😎

  • Cool, I thought it always rebuilt the NCs in a drop_existing on the cluster. Glad there's some extra common sense in there.

    That said, I'm not sure why one would CREATE ... WITH DROP_EXISTING a clustered index without intending to change keys, sort order or the like.

    Yes, you can disable the NC then rebuild them, but I'd rather not do that unless absolutely necessary. What happens if the rebuild hits an error? And if they're unique NC indexes that means dups can go in between the disable and rebuild and that's just a mess.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (5/10/2015)


    That said, I'm not sure why one would CREATE ... WITH DROP_EXISTING a clustered index without intending to change keys, sort order or the like.

    Most common use I see is to move tables from one filegroup to another.

    😎

  • Jeff Moden (5/10/2015)


    Have you actually ever done a measurement on this subject? I could be wrong in some instances but it would appear the space savings are not true when a Clustered Index is involved. Please see the following code for proof.

    I did do a quick test but i mixed up the files in the results, that's what I get trying to do 20 things at the same time and none of them being making the required industrial strength espresso.

    😎

    Test code

    select * from sys.master_files where database_id = db_id()

    CREATE UNIQUE CLUSTERED INDEX [PK_INDEX_NAME] ON [TABLE_NAME] ([INDEX_COLUMN] ASC) WITH (DROP_EXISTING = ON);

    select * from sys.master_files where database_id = db_id()

  • GilaMonster (5/10/2015)


    Cool, I thought it always rebuilt the NCs in a drop_existing on the cluster. Glad there's some extra common sense in there.

    That said, I'm not sure why one would CREATE ... WITH DROP_EXISTING a clustered index without intending to change keys, sort order or the like.

    Perhaps to rebuild a CI that needs it without rebuilding NCIs that don't? Heh... I'm with you. I can't imagine even that happening on any thoughtfully indexed table.

    Yes, you can disable the NC then rebuild them, but I'd rather not do that unless absolutely necessary. What happens if the rebuild hits an error? And if they're unique NC indexes that means dups can go in between the disable and rebuild and that's just a mess.

    Heh... WHUT??? A rebuild error would never happen with a Microsoft product. 😛 For example, corruption never occurred when rebuilding indexes in an ONLINE fashion in 2012, right? :hehe:

    Again... I'm with you there on not disabling unique NCIs for the very reasons you speak of. I do, however, disable some of the ridiculously wide covering indexes and NCIs on non-partitioned ridiculously large tables that we have at work prior to rebuilding them. Rebuilding the "normal" NCIs don't normally cause unnecessary file growth so I won't disable those or the Unique NCIs (ever) but I'm quick to disable those overgrown rat's nests in my index maintenance routines.

    Thanks again for all that you know. I don't say it often enough but "SQL in the Wild" is a gold mine that I've referred lots of folks to and frequently refer to myself.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 37 total)

You must be logged in to reply to this topic. Login to reply