Adding an Index to Save Space


Sounds weird right, but that is exactly what has happened with a VLDB I inherited.  A third party database that I support contains 23 columns per table and 16 of those columns have a non clustered index.  That’s it, that is all there is to the table? Each month a new table is created and that month’s data is stored in its respective table.  At some point monthly tables are merged into monthly tables and that is where they stay until retention periods are met and the tables are dropped.

Prior to me taking over support of this database all the quarterly and monthly tables were in a single file group.  Since this database is rather large I created file groups based on year and moved the respective tables to their yearly file group.  I started this project in late 2010 so I stopped with the year 2009.

So what is missing with my description of the index structure above.  I stated there are 16 non clustered indexes per table and we have quarterly tables for many years worth of data.  Where is the clustered index?  Yes folks, when I took over this system there was no clustered index on any of the tables.  We are talking billions of rows of data.

In order for you to move the tables into a new file group you have to move the clustered index.  I learned a great deal working on this project regarding indexes.  One of which is that if the table is a heap, SQL Server will assign an 8 byte row identifier for the non clustered index.  In my case I created a new column and chose INT as the data type.  Remember INT is only 4 bytes.

When I first started in development, I created a new table with the same columns as my production tables, created the new column as identity seed, created the clustered index, inserted all the data into the new table and recreated the 16 non clustered indexes.  That is all pretty basic stuff for moving data into a new file group.   Once all this was tested and verified with the vendor I created the new column on all production tables.  This was a necessity since the application uses a view with a UNION ALL across all the tables.  Only the quarterly tables have the clustered index with the identity seed column, the monthly tables contain the new column but all values are NULL.

So what happened?  I created a new table, made the column I created an identity seed, created a clustered index, filled the table full of production data, and then recreated all my non clustered indexes, which should increase the size of my database table right?  Nope.

In the example I show below the data size is the same because my production table also contains the new column but a NULL value.

So I created a clustered index, why did the size of my indexes shrink?  They should have grown right, I went from 16 non clustered indexes to a clustered index and 16 non clustered.  That is 17 indexes on the table.  Remember, before I created the clustered index the table was a heap therefore SQL assigned each row an 8 byte row identifier.  I made the 4 byte column I created late last year a clustered index.  A non clustered index also contains the row identifier.  In the case of the heap, each non clustered index contained the 8 byte row identifier that SQL Server had assigned.  I reduced the size of each non clustered index by 4 bytes.  Multiply that times millions of rows and 16 indexes per row.  It adds up quick doesn’t it?

Take a look at the chart below to see my 2010 data.  In total I reduced the size of my indexes by 8,074 MB for one year of data.  Not bad for a day’s work.