One of my most favorite technologies is Columnstore. Thankfully, the guys at Microsoft seem to agree with me, because they continue developing this technology and make it faster and faster.
In a nutshell, as oppose to what we’re used to today, a Columnstore index keeps our data at the column-level instead of the row-level. Because the values of the same column have the same data type and tend to repeat themselves, there are many options for optimizations and compressing the data.
For example, if the name “Matan” appears many times in a column, we can store it only once and store the number of times it appears, instead of storing it for each and every row.
Working with Columnstore can be done in two ways:
- A regular table with a Non-Clustered Columnstore Index that sits on top of it (on some or all of its columns)
- A table which sits on a Clustered Columnstore Index, which is the table itself
Speed it achieved because of 3 major factors:
- Because the data is compressed, more data fits in memory, which is much faster than disk, and even if we go to disk, we read a relatively low amount of data
- Since data is stored column-wise, we can fetch only the needed columns, instead of all of the columns as in the regular state
- Batch-Mode Processing optimizes our queries by working in groups of 1000 rows at the CPU level for processing the data, and makes our queries much faster
Lots of new features were added In SQL Server 2016. Here are the main ones:
- It is now possible to define a Non-Clustered Columnstore Index on top of a regular table without making here read-only
- Is it now possible to create regular indexes and keys on top of a Clustered Columnstore table
If you notice, those two features give us lots of flexibility. If my table is more OLTP-style, but I do have analytical queries now and then, I can use a regular table with a Non-Clustered Columnstore index on some of its columns. If my table is intended mainly for reporting and analytical queries, but I do have operational queries that need to filter on specific values and rows, I can use a Clustered Columnstore Index on the table and add Non-Clustered indexes on top of it.
- It is now possible to define the table as Columnstore during its creation
- It is possible to create a Columnstore Index on top of a Memory-Optimized table. We can use this feature, for instance, when we want to run reports on top of our operational system with minimum delay, or, for example, when we want to show quantitative information on streaming data that enters our system
- Alter Index Reorganize can physically delete rows, instead of the state today where a row is only marked as deleted, and physically deleted only during a rebuild of the index. In addition, the Reorganize command can merge different parts of the table (Rowgroups) when they have less than 1 million values, which is the optimal number for SQL Server
- String predicate pushdown – up until SQL Server 2014, when we filtered on a textual column, SQL Server could not perform the filtering when reading the data. It needed to read all the data to memory and only then filter it. Starting SQL Server 2016, SQL Server pushes the filtering to the storage level, and thus needs to read much less data
- Aggregate pushdown – for the functions Min, Max, Avg, Sum and Count, the aggregation will be performed during the scan of the data, instead of the (still pretty fast) situation today where it needs to scan the data and only then aggregate it