SQLServerCentral Article

Reclaiming Space After Column Data Type Change

,

Introduction

There are plenty of cases where changing the data type of a column in a table is a must in order to fulfill a business need.  In certain instances, the data type change is quick and painless, but in other cases, the change can cause the table size to balloon and unnecessarily use precious space. This article will explore what happens when a fixed-length column data type is increased and how to reclaim additional space used after the table alter.  Specifically, I am going to change a SmallDateTime column to a DateTime2 column using AdventureWorks2014.

Setup

For this example, I am using AdventureWorks2014 with about 240mb of storage.  To get started, I am going to create a stripped down version of the Sales.SalesOrderDetail table with the data type of the ModifiedDate column set as SmallDateTime.  

USE [AdventureWorks2014]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [Sales].[SalesOrderDetail_Test](
[SalesOrderID] [int] NOT NULL,
[SalesOrderDetailID] [int] NOT NULL,
[CarrierTrackingNumber] [nvarchar](25) NULL,
[OrderQty] [smallint] NULL,
[ProductID] [int] NULL,
[SpecialOfferID] [int] NULL,
[UnitPrice] [money] NULL,
[UnitPriceDiscount] [money] NULL,
[LineTotal]  AS (isnull(([UnitPrice]*((1.0)-[UnitPriceDiscount]))*[OrderQty],(0.0))),
[rowguid] [uniqueidentifier] ROWGUIDCOL  NULL,
[ModifiedDate] [smalldatetime] NULL, --New SmallDateTime type instead of DateTime
 CONSTRAINT [PK_SalesOrderDetail_Test_SalesOrderID_SalesOrderDetailID] PRIMARY KEY CLUSTERED 
(
[SalesOrderID] ASC,
[SalesOrderDetailID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

I am then going to populate the table with data from the AdventureWorks2014 database.

INSERT INTO [Sales].[SalesOrderDetail_Test] (
SalesOrderID, 
SalesOrderDetailID, 
CarrierTrackingNumber, 
OrderQty, 
ProductID, 
SpecialOfferID, 
UnitPrice, 
UnitPriceDiscount, 
--LineTotal,  --No LineTotal insert due to it being a calculated field
rowguid, 
ModifiedDate)
SELECT [SalesOrderID]
      ,[SalesOrderDetailID]
      ,[CarrierTrackingNumber]
      ,[OrderQty]
      ,[ProductID]
      ,[SpecialOfferID]
      ,[UnitPrice]
      ,[UnitPriceDiscount]
     -- ,[LineTotal]
      ,[rowguid]
      ,[ModifiedDate]
FROM [AdventureWorks2014].[Sales].[SalesOrderDetail]
GO 

Next, I am going to grab a few baseline measurements and benchmarks for comparing once the data type of ModifiedDate gets changed to DateTime2.  The measurements are the database file size, table size, and the physical columns in the table.  The first measurement, the database size and amount of data used at this point, can be queried using the T-SQL below. 

SELECT
  db_name(database_id) as databaseName,
  name AS FixedFileName,
  physical_name AS FixedFilePath,
  [file_id] AS fileID,
  (size * 8.0)/1024 AS FileSize,
  FILEPROPERTY(name, 'spaceused') *8/1024 AS [UsedSpace],
 (FILEPROPERTY(name, 'spaceused') *8/1024)/((size * 8.0)/1024) AS PercentUsed
 FROM sys.master_files 
 WHERE database_id = db_id() 

Figure 2-1. SSMS results of how much space is currently used in the AdventureWorks2014 database file after the creation of the new Sales.SalesOrderDetail_Test and data insert. 

The results show the database size for data is 199MB.

The next measurement, the size of the Sales.SalesOrderDetail_Test table, can be found by navigating to the table properties through the Object Explorer.  The table size is under the Storage tab, and we see this below is 9.156MB.

Figure 2-2. Current size of the Sales.SalesOrderDetail_Test table with SmallDateTime for ModifiedDate

Lastly, the physical columns in the table can be found using a couple of sys tables and a very handy query from Remus Rusanu

SELECT
sip.index_id, 
sip.partition_number, 
sipc.leaf_null_bit,
coalesce(cx.name, c.name) as column_name,
sipc.partition_column_id,
sipc.max_inrow_length,
sipc.max_length,
sipc.key_ordinal,
sipc.leaf_offset,
sipc.is_nullable,
sipc.is_dropped,
sipc.is_uniqueifier,
sipc.is_sparse,
sipc.is_anti_matter
FROM sys.system_internals_partitions sip
INNER JOIN sys.system_internals_partition_columns sipc
    ON sip.partition_id = sipc.partition_id
LEFT JOIN sys.index_columns ic
    ON sip.object_id = ic.object_id
    AND ic.index_id = sip.index_id
    AND ic.index_column_id = sipc.partition_column_id
LEFT JOIN sys.columns c
    ON sip.object_id = c.object_id
    AND ic.column_id = c.column_id 
LEFT JOIN sys.columns cx
    ON sip.object_id = cx.object_id 
    AND sip.index_id in (0,1)
    AND sipc.partition_column_id = cx.column_id
WHERE sip.object_id = OBJECT_ID(N'Adventureworks2014.Sales.SalesOrderDetail_Test')
ORDER BY index_id, partition_number;
The columns in the result set match the columns listed in the Object Explorer for this new table.

Figure 2-3. SSMS results showing the physical columns in the Sales.SalesOrderDetail_Test table. Current max_length of ModifiedDate is 4 since it is SmallDateTime

Now that I have a good idea of the size of my database and table, I can alter the data type of ModifiedDate to DateTime2 and recheck my benchmarks to see what happened.

Altering the Data Type 

I can alter the data type of ModifiedDate on the Sales.SalesOrderDetail_Test table with a very straightforward T-SQL query.

USE [AdventureWorks2014]
GO
ALTER TABLE [Sales].[SalesOrderDetail_Test]
ALTER COLUMN ModifiedDate DATETIME2
GO

Now, I am can review the benchmarks I measured and see what changed.  First, I am going to check my database size.

Figure 2-4. The database file increased in size by about 9mb by altering SmallDateTime to DateTime2

The database size increased from 199mb to 208mb.  That seems pretty excessive for just changing a data type.  That size increase basically represents the size of the entire Sales.SalesOrderDetail_Test table.  I can check the actual table next.

Figure 2-5.  The size of the Sales.SalesOrderDetail_Test table doubled after changing the data type to DateTime2

The table size increased from 9.156mb to 18.305mb when the data type changed even though the change seems somewhat minor and there were no new rows added.  Finally, I am going to take a look at the physical rows after the alter.

Figure 2-6.  There is a dropped column left over after the column alter.  The new column is using a new offset and is not reusing the space from the previous SmallDateTime column

This is where things get really interesting.  We can see a column is now labeled as dropped and there is a new column with a longer data length.  In addition to the longer length, the new column has a new leaf_offset after the previous column so we can assume the new column is not reusing the open space left by the SmallDateTime column. This is wasteful since that open offset range represents an empty portion of the fixed section of a data row on a data page.

To see this in another way, I am going to take a look directly at a data page.  I need to get a data page ID that has data for this table by using the SQL Server system function sys.fn_PhysLocFormatter(%%physloc%%).

SELECT
   *, 
   sys.fn_PhysLocFormatter(%%physloc%%) AS Page_Location
 FROM [Sales].[SalesOrderDetail_Test]

Figure 2-7.  SSMS results from using sys.fn_PhysLocFormatter(%%physloc%%) on the Sales.SalesOrderDetail_Test table.  The format of the Page_Location column is (File_ID, Page_ID, Slot_Number)

The sys.fn_PhysLocFormatter(%%physloc%%) function provides the database file ID, the page ID, and the slot number for the row in the result set.  With this information, I can now take a look at the data pages that holds the data for this table.  The DBCC PAGE command will show what the data looks like at the page level for this row. 

DBCC TRACEON(3604) --Send output of DBCC PAGE to SSMS Client
GO
DBCC PAGE('AdventureWorks2014',1,24184,3) -- (Database, file id, page id, output detail level[0,1,2 or 3])

When I scroll down to Slot 0 on this page, I can see there is a leftover column still present on the data page after the data type alter. 

Figure 2-8.  DBCC PAGE output that shows the DROPPED column from the table alter.  The column ID on the data page also matches the results from Figure 2-6 for the dropped column

After taking a look at all of the benchmarks again and looking at the underlying data pages of the Sales.SalesOrderDetail_Test table, I can clearly see there is space occupied after the alter that is much higher than expected.  The next step is reclaiming the space that is being unnecessarily used.

Reclaiming Storage Space

A reliable method of cleaning up a table after a series of changes is to rebuild it.  In this case, since the table has a clustered index, I can rebuild the clustered index and see how that impacts the database size, table size, and column data on the page and in the sys internal tables. 

USE [AdventureWorks2014]
GO
ALTER INDEX [PK_SalesOrderDetail_Test_SalesOrderID_SalesOrderDetailID]
  ON [Sales].[SalesOrderDetail_Test]
  REBUILD

Again, I can go through my benchmarks and see what happened after the clustered index rebuild.  First, I am going to look at the database file.

Figure 3-1.  SSMS results of checking the database file size.  After the clustered index rebuild, the used space decreased by 9mb. 

The clustered index rebuild decreased the used space of the database file by 9mb and returned the file to the size prior to the data type alter.  Next, I am going to check the table properties.

Figure 3-2.  The Storage properties of the Sales.SalesOrderDetail_Test table show the data space has decreased after the index rebuild from the 18mb it was previously.   

The table size is much lower than the 18mb it was previously.  It is not as low as it was before the data type change since DateTime2 takes more bytes to store than SmallDateTime.  We should also expect to not see the dropped columns that were present in the physical columns and the data pages.  We can rerun the internal sys query and the %%Physloc%%/DBCC PAGE queries to confirm this.  As a note, you'll have to rerun the %%physloc%% to get the new page IDs for the table data pages.  The index rebuild could have shifted the pages around. 

Figure 3-3. SSMS results for the columns of Sales.SalesOrderDetail_Test after rebuilding the clustered index.  There are no dropped tables and the ModifiedDate column shows a longer max length due to the data type change

There are no more columns that are labeled as dropped, the leaf_offsets line up based on max_length, and the max length of ModifiedDate matches what it should be for DateTime2. And lastly, we need to check the data pages.  Use the %%Physloc%% function to get a new page id after the index rebuild.

Figure 3-4.  The data page results from DBCC show the DROPPED column from Fig. 2-8 is gone.  The physical length of ModifiedDate is also changed to 8 to match the bytes required for DateTime2

As expected, the DROPPED columns seen in Figure 2-8 are no longer visible on the data page and the new physical length of the column for ModifiedDate matches the 8 bytes it takes to store a DateTime2 value. 

A straight column alter can cause trouble in SQL Server if you are not careful.  Changing a column in a large table can easily add several GB's of pages to a database file if you are not careful. Hopefully the tools in this piece help monitor the impact of column data type alters.

David Fundakowski

Additional Resources

SQL Server Table Columns Under the Hood - Remus Rusanu

DBCC PAGE - Microsoft MSDN

SQL Server 2008: New (undocumented) Physical Row Locator Function - Paul S. Randal

Reorganize and Rebuild Indexes - Microsoft MSDN

Rate

4.89 (19)

You rated this post out of 5. Change rating

Share

Share

Rate

4.89 (19)

You rated this post out of 5. Change rating