SQLServerCentral Article

Comparing Space Usage for Different Compression Types

,

Sometimes it is not clear to everyone if and how the different types of indexes influence compression, as well as the impact on the disk space used for the compressed and indexed tables. In the following article, I will perform a small investigation, so that the reader can get an impression on the impact on disk space before and after compression, considering different index types. This comparison will not consider the performance implications of the compression and the corresponding advantages and disadvantages. It is just looking at the space used. On the internet you can find many good articles about the performance implications.

For this test I am using an example Global Master data file with no normalization applied and non-optimal selection of datatypes. The file contains 44 columns with different data types and 279,690 rows. The smallest data type is a varchar(1) and the biggest is a varchar(255). The table has just 1 partition.

This table has a lot more potential for reducing the table size, but in this test, I will just focus on a comparison on Heap tables, Clustered index tables and Clustered Columnstore index tables with compression settings of:

  • Page level
  • Columnstore
  • Archive
  • No compression

It is not possible to create a table with no compression and a Columnstore Index, because Columnstore tables and indexes are always stored with columnstore compression. This will give 6 different scenarios:

Type of Compression

Type of Index

No Compression

Heap

No Compression

Clustered

Columnstore

Clustered Columnstore

Page Level Compression

Heap

Archive

Clustered Columnstore

Page Level Compression

Columnstore

The Indexed key column is the Material number. It is null but doesn’t have any NULLs included. For the Clustered Index I use a FILLFACTOR of 100. To get the used space of the table the stored procedure sp_spaceused will be used. The used output of this SP contains:

Column name

Description

rows

Number of rows existing in the table.

data

Total amount of space used by data.

index_size

Total amount of space used by indexes.

This is the script to get the output:

--Script to create the table, which give information of the used space the tables.
DROP TABLE  #spaceused;
GO
CREATE TABLE #spaceused
(name       NVARCHAR(128),
 rows       CHAR(20),
 reserved   VARCHAR(18),
 data       VARCHAR(18),
 index_size VARCHAR(18),
 unused     VARCHAR(18)
);
INSERT INTO #spaceused
 EXEC sp_spaceused
     N'[core].[NoComp_ClustInd]';
INSERT INTO #spaceused
 EXEC sp_spaceused
     N'[core].[ClustColumnStInd]';
INSERT INTO #spaceused
 EXEC sp_spaceused
     N'[core].[NoComp_Heap]';
INSERT INTO #spaceused
 EXEC sp_spaceused
     N'[core].[PageLevel_ClustInd]';
INSERT INTO #spaceused
 EXEC sp_spaceused
     N'[core].[Archive_ClustColumnStInd]';
INSERT INTO #spaceused
 EXEC sp_spaceused
     N'[core].[PageLevel_Heap]';
SELECT name,
       rows,
       data,
       index_size,
FROM #spaceused;

This is the table overview before the indexes are created and the compression is used. The column Total Size is the sum of the columns data and Index_size. The columns data, index_size and total Size are in using the unit of measure KB.

name

rows

data

index_size

Total Size

NoComp_ClustInd

279690

122024

16

122.040

ClustColumnStInd

279690

122024

16

122.040

NoComp_Heap

279690

122024

16

122.040

PageLevel_ClustInd

279690

122024

16

122.040

Archive_ClustColumnStInd

279690

122024

16

122.040

PageLevel_Heap

279690

122024

16

122.040

This is the table after the indexes are created on the table. The Heap table doesn’t get a new index for sure:

name

rows

data

index_size

Total Size

NoComp_ClustInd

279690

123168

480

123.648

ClustColumnStInd

279690

34016

0

34.016

NoComp_Heap

279690

122024

16

122.040

PageLevel_ClustInd

279690

123160

480

123.640

Archive_ClustColumnStInd

279690

34016

0

34.016

PageLevel_Heap

279690

122024

16

122.040

Now we create Indexes on the analyzed tables.

CREATE CLUSTERED INDEX IX_NoComp_ClustInd ON core.NoComp_ClustInd
       (Material) WITH( FILLFACTOR = 100) ON [PRIMARY];
CREATE CLUSTERED COLUMNSTORE INDEX IX_ClustColumnStInd ON core.ClustColumnStInd
         ON [PRIMARY];
CREATE CLUSTERED INDEX IX_PageLevel_ClustInd ON core.PageLevel_ClustInd
       (Material) WITH( FILLFACTOR = 100) ON [PRIMARY];
CREATE CLUSTERED COLUMNSTORE INDEX IX_Archive_ClustColumnStInd ON core.Archive_ClustColumnStInd
        ON [PRIMARY];

Now let's use Compression on the analysed tables.

ALTER TABLE [core].PageLevel_ClustInd REBUILD PARTITION = ALL
 WITH (DATA_COMPRESSION = PAGE);
ALTER TABLE [core].Archive_ClustColumnStInd REBUILD PARTITION = ALL
 WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE)
ALTER TABLE [core].PageLevel_Heap REBUILD PARTITION = ALL
 WITH (DATA_COMPRESSION = PAGE);

This is how the used space look after using the compression on the table:

name

rows

data

index_size

Total Size

NoComp_ClustInd

279690

123168

480

123.648

ClustColumnStInd

279690

34016

0

34.016

NoComp_Heap

279690

122024

16

122.040

PageLevel_ClustInd

279690

34184

192

34.376

Archive_ClustColumnStInd

279690

22792

0

22.792

PageLevel_Heap

279690

36824

16

36.840

As we can see the overall winner is the Columnstore Index. It uses the least space, even without archive compression. But if you use archive compression, the used space is by far the lowest. This makes it the favorite, if your target is use the lowest amount of used space. The second is the Columnstore Index without compression. In third place we see the clustered index, which is using the Page Level compression, but very close to it is the heap table with Page Level Compression.

Another finding is, that we see differences in the used space between the clustered index table and the heap table. Even without considering the used space for the index we see a difference. Before the compression the heap table is using less space than the table with the clustered index. But after the compression it changes and the table with the clustered index is using slightly less space.

From the results of this example I can see three groups in terms of space usage:

  1. Columnstore Indexes with Archive Compression (very low space usage)
  2. Page Level compression with and without an clustered index and Columnstore index without Archive Compression (low to middle space usage)
  3. Tables without Compression and Indexes (high space usage)

The main finding is that the kind of Index will have influence on the used disk space before and after the compression. The interesting thing is that the table with the column store index is using always the least disk space. 

This is just a test that is based on one specific table. It could give different results on other tables. Maybe even an optimization (normalization, correct datatypes,…) of the table could change the results. This comparison is just to get a feeling for the handling of different compressions and clustered indexes.

Rate

4 (4)

You rated this post out of 5. Change rating

Share

Share

Rate

4 (4)

You rated this post out of 5. Change rating