t.franz - Monday, April 24, 2017 8:21 AM
Yes, tinyint data is also compressed. The average space gain is less than for other data types, but it can still be relevant.
The primary compression method for tinyint will be run-length encoding (RLE). So if the data in rows 1 to 20 reads 1 / 1 / 1 / 1 / 2 / 2 / 2 / 2 / 3 / 3 / 1 / 1 / 1 / 1 / 4 / 1 / 3 / 3 / 3 / 3, then this will be stored as 1 (4) / 2 (4) / 3 (2) / 1 (4) / 4 (1) / 1 (1) / 3 (4). The algorithm that builds the columnstore index includes a step (after dividing the table into rowgroups) where the rows within a rowgroup are sorted to optimize the benefit of RLE. The "optimal" sort order for this column would after RLE result in 1 (9) / 2 (4) / 3 (6) / 4 (1). However, the sort algorithm that is used for columnstore indexes has to look at ALL columns, not just a single one.
A secondary compression method for tinyint tries to reduce the size of each value to less than 1 full byte. Let's for example say that, within a specific rowgroup, all values in your tinyint columns are NULL or between 23 and 48. The algorithm will replace NULLs with a value not in the normal range (probably 22), and then subtract 22 from each value changing the range to 0 (for NULL) and 1 - 26 (for 23-48); this can then be encoded in 5 bits instead of 8 for the value + 1 for the NULL marker. The "magic value" to represent NULL, and the offset of 22, are stored only once, in the rowgroup metadata. (You can find it in the DMV sys.column_store_segments. This DMV is introduced in level 4, though I have not shown all columns; the null value (null_value) and the offset (base_id) are two of the columns I chose not to include).