Blog Post

Rebuilding and Reorganizing Clustered Columnstore Indexes

,

Clustered Columnstore Indexes, as well as “regular” indexes, support the Rebuild and Reorganize operations. However, the meaning of those operations is different in the case of Columnstore.

Before we describe the differences, let’s remind ourselves how data manipulation works in Columnstore:

Inserts go into an object called Delta Store, an unindexed heap. After 1 Million rows get into the Delta Store, it is marked as closed, compressed and merged into the Columnstore by a process called the tuple-mover. However, if you bulk insert 100k rows or more, they will bypass the Delta Store and go straight into the Columnstore (and if you work with bulks closer to 1 million, it’s even better since this is the optimal size).

Deletes are implemented using an object called Delete Bitmap. Rows are not physically deleted, but only marked as deleted in the bitmap.

Updates are implemented using a delete + insert, meaning the old row is marked as deleted in the Delete Bitmap, and the new row is inserted into the Delta Store.

 

Now, let’s talk about Rebuild and Reorganize.

 

Rebuild:

  • Physically deletes the rows that are marked as deleted in the Delete Bitmap.
  • Merges the data in the Delta Store with the data in the Columnstore.
  • Rebuilds the Columnstore

The rebuild operation cannot be performed online. However, it still allows reading from the table while it’s running. Its main benefits are getting rid of the rows that are in the Delete Bitmap and in the Delta Stores.

If you have a partitioned table, it can be beneficial to rebuild only the partition(s) that get new data, in order to reduce the resource consumption of the Rebuild.

Reorganize:

The Reorganize operation is an online operation, and moves the closed row groups into the Columnstore.

You might ask why you should do this yourself, since this is what the tuple-mover does anyway. The answer is that the tuple-mover processes only one row group at a time and then goes to sleep for a certain amount of seconds. In some heavy load situations, it won’t be able to catch up, and those situations can benefit from the Reorganize operation.

Minimizing the need for Rebuild and Reorganize:

If you follow the next two steps, you’ll be able to reduce the frequency of index Rebuild and Reorganize, and in some cases make them completely unneeded:

  1. Load data using only bulk inserts with 100k rows or more, thus bypassing the Delta Store (but the closer you get to 1 million rows in a bulk, the better)
  2. Delete data only using the Switch Partition operation, thus not using the Delete Bitmap

Keep in mind, though, that this tactic won’t help if you have a lot of updates to the table, since as described above, the update is implemented as delete (using the Delete Bitmap) + insert (using the Delta Store).


Photo credit: Lendingmemo. “Index” is licensed under CC BY 2.0

The post Rebuilding and Reorganizing Clustered Columnstore Indexes appeared first on .

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating