Similar to yesterday’s post, I have a follow-up for another Pragmatic Works Training on the T’s session that I delivered last month. This was is titled Introduction to Columnstore Indexes. In the session, I discussed columnstore indexes and the value that they provide in SQL Server, both in 2012 and 2014.
The abstract for the sessions is:
Columnstore indexes were first introduced in SQL Server 2012. Columnstore indexes provide the ability to flip-over your indexes from column-based to row-based, leading to dramatic changes in the storage and performance of your data. In this session, we’ll review the basic concepts of columnstore indexes and look at the changes coming to them in SQL Server 2014
Webcast Recording & Materials
Also, if you just want to flip through the slide deck again, here you go.
A typical fact table is mostly comprised of keys and measures. So if you cannot include keys in the columnstore for the table would you architect the warehouse without foreign key relationships and handle them through a meta-data layer for ad hoc reporting?
Yes, create the data warehouse without foreign key relationships and enforce these business rules through the ETL processes. Unlike, OLTP databases, data warehouses typically have a limited number of processes that change data in each table. So ensuring these business rules are followed is much more simple.
Are all indexes in 2012 columnstore by default?
No, columnstored indexes are just another option for indexing in SQL Server 2012 and on.
Can we have relationships (like PK FK)
No, foreign keys and primary keys are not supported on columnstore indexes
Can I have clustered index on some partitions and not on others?
No, all partitions must have the same underlying physical storage structure, whether heap, clustered index, or clustered columnstore.
Does it work with mirroring?
Yes, database mirroring will work with columnstore indexes.
Can we not update any data in the table with a columnstore index on it?
In SQL Server 2012, columnstore indexes cannot be modified. In SQL Server 2014, clustered columnstore indexes can be modified.
Can you explain how data is stored between rowstore and columnstore index with some example data?
I would primarily recommend re-watching the first portion of the webcast again. That said, in a rowstore, SQL Server stores all of the columns for a row together. While in a column stored all of the rows for a column are stored together.
So the normal concept of index on a particular column is a key and all rows that have that key in that column. So with columnstore do we create index on row instead of column?
Columnstore is created across all, or most columns, in a table. Each column is stored individually, so the order of the columns makes no difference. Each column can be accessed independently from other columns in the table.
Can you have computed colums in the columnar storage indexes?
No, computed columns are not supported.
Is there any space limitation with decimal datatype with columnar storage indexes in 2014?
There is no limitation with SQL Server 2014.
Can’t you just disable (finagle the data) then re-enable the columnstore index? Without dropping and recreating.
Yes, you can disable, though, this is about the same as dropping and recreating since a rebuild would be required to leverage it again.
Could you please explain the fragmentation for clustered columnstore indexes, ie is a rebuild of the whole index required or can it be rebuilt by partition ?
Clustered columnstore partitions can be rebuilt. The main “fragmentation” concern with clustered columnstore indexes is when small segments are created and when there are many updated/deleted records contained in readonly segments.
Could you please recommend a resource (book) which we could use to practice columnstore indexes?
Currently, the only resource I am aware of is Expert Performance Indexing for SQL Server 2012.
Does this change Star design (Wider Fact Tables?)
No, clustered columnstore indexes are excellent for wide fact tables.
How do columnstores impact performance during batch etl – ie all the background index updates that occurs?
When you use clustered columnstore indexes on tables, they will be the only indexes on the tables. There is no longer a need, or ability, to create other indexes on those tables.
How do pointers work in columnstore? for example, if i want to query two columns at the same time, what structure helps to find the relation?
There are rowids in columnstored indexes that help reassemable the rows.
Are columnstores linked to SQL Server edition somehow? are they available in standard edition?
Yes, columnstore indexes are only available in Enterprise edition. As a side note, you can leverage them in SQL Server Azure VMs since they come pre-installed with Enterprise Edition.
How does filtering work if only measures are stored together? how does SQL Server link between the columns in one row for filtering/joining?
Yes, SQL Server can link rows back together with rowids.
Like clust and non-clust indexes, does NC columnstores have a type in sysindexes table? reason is I need to filter them out while finding unused indexes…
Yes, they have their own type.
Is Columnstore available on all SQL 2012 editions?
Only on Enterprise edition.
When did you say Microsoft introduced the Columnstore index paradigm?
Columnstore indexes were introduced in SQL Server 2012.
There are a lot of restrictions on columnstore. would we use columnstore in OLTP or just in DW?
No, columnstore indexes are not aimed or appropriate for OLTP workloads. They are a data warehouse enhancement.
When do you see benefits from using a columnstore index in terms of the numbers of rows in a table?
An easy cutoff is for tables with more than 1 million rows, since that’s the max size of a columnstore segment. Below that, though, I often see many occassions where they still provide value. I don’t have any specific examples, but I’ll see if I can put something up on that.
When 2014 brings clustered columnstore indexes, which (from the sound of it) will determine physical storage of the data, does that mean using a clustered columnstore index will prevent you from using a normal clustered index? (It sounds like it does, but I figured I’d ask anyway)
Yes, clustered columnstore index will be the primary storage. No other clustered index would be needed, or allowed.
Is it possible to partition on these columnstores?
Yes, columnstore indexes can be partitioned.
You just reiterated on adding all columns for the columnstore indexes. How does that work with the parition-swapping though?
It works the same as any other type of partition swapping. The number of columns in an index has no effect.
You mentioned that columnstore indexes most benefit star join queries. Can you give an example of this, and an example of a query that would benefit more from a rowstore index than a columnstore?
This will primarily be of benefit in data warehouse workloads where you want, or need, to take advantage of batch processing to increase the row throughput for your queries. Also, when you are selecting a subset of columns instead of the entire row, which is very common in data warehouses.
Would you still recommend the Partition Switch strategy for processing updates to a table that has a clustered columnstore index?
Yes, I’ve used and recommended this with clients with a lot of success.
Thanks for Attending
Thanks to those that attended the webcast. If you have any questions on the webcast or suggestions for improvements, please leave a comment on this post.