SQL Clone
SQLServerCentral is supported by Redgate
Log in  ::  Register  ::  Not logged in

Adding an Index to Save Space

Sounds weird right, but that is exactly what has happened with a VLDB I inherited.  A third party database that I support contains 23 columns per table and 16 of those columns have a non clustered index.  That’s it, that is all there is to the table? Each month a new table is created and that month’s data is stored in its respective table.  At some point monthly tables are merged into monthly tables and that is where they stay until retention periods are met and the tables are dropped.

Prior to me taking over support of this database all the quarterly and monthly tables were in a single file group.  Since this database is rather large I created file groups based on year and moved the respective tables to their yearly file group.  I started this project in late 2010 so I stopped with the year 2009.

So what is missing with my description of the index structure above.  I stated there are 16 non clustered indexes per table and we have quarterly tables for many years worth of data.  Where is the clustered index?  Yes folks, when I took over this system there was no clustered index on any of the tables.  We are talking billions of rows of data.

In order for you to move the tables into a new file group you have to move the clustered index.  I learned a great deal working on this project regarding indexes.  One of which is that if the table is a heap, SQL Server will assign an 8 byte row identifier for the non clustered index.  In my case I created a new column and chose INT as the data type.  Remember INT is only 4 bytes.

When I first started in development, I created a new table with the same columns as my production tables, created the new column as identity seed, created the clustered index, inserted all the data into the new table and recreated the 16 non clustered indexes.  That is all pretty basic stuff for moving data into a new file group.   Once all this was tested and verified with the vendor I created the new column on all production tables.  This was a necessity since the application uses a view with a UNION ALL across all the tables.  Only the quarterly tables have the clustered index with the identity seed column, the monthly tables contain the new column but all values are NULL.

So what happened?  I created a new table, made the column I created an identity seed, created a clustered index, filled the table full of production data, and then recreated all my non clustered indexes, which should increase the size of my database table right?  Nope.

In the example I show below the data size is the same because my production table also contains the new column but a NULL value.

So I created a clustered index, why did the size of my indexes shrink?  They should have grown right, I went from 16 non clustered indexes to a clustered index and 16 non clustered.  That is 17 indexes on the table.  Remember, before I created the clustered index the table was a heap therefore SQL assigned each row an 8 byte row identifier.  I made the 4 byte column I created late last year a clustered index.  A non clustered index also contains the row identifier.  In the case of the heap, each non clustered index contained the 8 byte row identifier that SQL Server had assigned.  I reduced the size of each non clustered index by 4 bytes.  Multiply that times millions of rows and 16 indexes per row.  It adds up quick doesn’t it?

Take a look at the chart below to see my 2010 data.  In total I reduced the size of my indexes by 8,074 MB for one year of data.  Not bad for a day’s work.

Tim Radney - Database Professional

Tim Radney - Database Professional by Tim Radney I am a Sr DBA for a top 40 US bank. I live in the south eastern US. I have been working with database since 1999 but only full time for the past three years.


Posted by Steve Jones on 18 August 2011

Very nicely done. That's a good reason to make sure you have a clustered index in most situations. You might be adding unnecessary space to a table because SQL Server needs to ensure it can uniquely identify each row.

Posted by Wilfred van Dijk on 22 August 2011

Here's something I did lately: I examined the clustered indexes and selected those with more than 3 columns. I promoted a better candidate (with real static data) to the clustered index. Since all nonclustered indexes contain the clustered columns, this action reduces the nonclustered indexes also. My total index size reduced from 174GB to 160GB

Posted by Sean Smith on 22 August 2011

Excellent little article with some great fundamental observations and points. Thanks for sharing!

Posted by Kenneth Wymore on 22 August 2011

Nicely done! I will be sharing this with my team to drive home the value of clustered indexes.

Posted by RadMan2k6 on 22 August 2011

Great article.  I was wondering why you chose INT for the ID? I have seen INT run out when supporting BILLIONS of rows.

Posted by Tim Radney on 22 August 2011

jparker2 - In my case the way the data is partitioned into quarterly tables they wont exceed a couple of hundred million rows.  2.4 billion for an INT is way more than enough.

Posted by dheeraj on 29 August 2011

Hi Tim,

Can you also share the change in the I/O behaviour?

Would be interesting (to me atleast) since you mention that there are 16 indices on a 23 column table.

Posted by Manish Mittal on 9 October 2011

Nice learning. Thanks to Tim for sharing it.

Leave a Comment

Please register or log in to leave a comment.