Columnstore index

Question

Columnstore index

Igor Micev

SSC-Dedicated

Points: 33110
More actions
May 15, 2013 at 8:55 pm

#272690

Comments posted to this topic are about the item Columnstore index
Igor Micev,My blog: www.igormicev.com

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply

Raghavendra Mudugal SSChampion Points: 10658 More actions · Answer 1

Good one, IgorMi, thank you for the posting.

COLUMNSTORE index is very new to me and didn't had a clue on what it does, so have to take the help of BOL, but added this in to my list to go through about this today. 🙂

From BOL - "A table with a columnstore index cannot be updated" - seems likes the table will become as a look up table or possibly one of the way to use this on the archive tables?

(...interesting)

ww; Raghu
--
The first and the hardest SQL statement I have wrote- "select * from customers" - and I was happy and felt smart.

sqlnaive SSCoach Points: 17437 More actions · Answer 2

Great great question. Never used Columnstore index but theoretically knew the only fact that all columns in the clustering key must be present in the nonclustered columnstore index. So picked up the wrong'un. I think while automatically adding the column, it does not give any warning.

Lost two points but nevertheless got some precious piece of knowledge. So happy to loose. :-):-):-)

Hugo Kornelis SSC Guru Points: 64790 More actions · Answer 3

Raghavendra Mudugal (5/16/2013)
From BOL - "A table with a columnstore index cannot be updated" - seems likes the table will become as a look up table or possibly one of the way to use this on the archive tables?

Columnstore indexes are primarily a feature targeted at BI/Datawarehouding workloads. They are intended to be used on large (millions or billions of rows) fact tables that are frequently used in BI/DW type queries. (Those are queries that typically use a star schema join to filter, that need to process all or large amounts of the rows in the tables, and that return aggregated data). In those environments, the read-only restriction is not as bad as it sounds, since databases like that are often read-only during the day anyway, only updated once per day during the load/refresh process. For that process, you can drop or disable and later recreate the columnstore index. And with table partitioning, it is also possible to just swap in a new partition without ever having to drop and recreate the columnstore index.

It *is* also possible to use columnstore indexes on large (*) tables in an OLTP database, but it takes a lot of extra work. Not only to work around the read-only limitation, but also to ensure that the queries are written in such a way that they work around the limitations of "batch mode processing" and gain the expected performance boost.

During the 2012 PASS Community Summit, Microsoft has announced that the next version of SQL Server will also support clustered columnstore indexes, for which the read-only restriction has been lifted. (And most of the limitations of batch mode processing have been lifted too). This is already available in the next version of PDW (Parallel DataWarehouse), which has been or will be released somewhere in 2013.

(*) Technically, there is no reason to limit columnstore indexes to large tables. But you won't get any performance benefit by using columnstore indexes on small tables. If the table does not have at least twenty million rows, I would not even consider it. It only really gets interesting when you have at least a couple hundred million rows. And even then, you would still only get a performance benefit if you frequently run reporting-type queries, that process all or most of the rows, on that table.

Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

Igor Micev SSC-Dedicated Points: 33110 More actions · Answer 4

Hugo Kornelis (5/16/2013)
Raghavendra Mudugal (5/16/2013)
From BOL - "A table with a columnstore index cannot be updated" - seems likes the table will become as a look up table or possibly one of the way to use this on the archive tables?
Columnstore indexes are primarily a feature targeted at BI/Datawarehouding workloads. They are intended to be used on large (millions or billions of rows) fact tables that are frequently used in BI/DW type queries. (Those are queries that typically use a star schema join to filter, that need to process all or large amounts of the rows in the tables, and that return aggregated data). In those environments, the read-only restriction is not as bad as it sounds, since databases like that are often read-only during the day anyway, only updated once per day during the load/refresh process. For that process, you can drop or disable and later recreate the columnstore index. And with table partitioning, it is also possible to just swap in a new partition without ever having to drop and recreate the columnstore index.
It *is* also possible to use columnstore indexes on large (*) tables in an OLTP database, but it takes a lot of extra work. Not only to work around the read-only limitation, but also to ensure that the queries are written in such a way that they work around the limitations of "batch mode processing" and gain the expected performance boost.
During the 2012 PASS Community Summit, Microsoft has announced that the next version of SQL Server will also support clustered columnstore indexes, for which the read-only restriction has been lifted. (And most of the limitations of batch mode processing have been lifted too). This is already available in the next version of PDW (Parallel DataWarehouse), which has been or will be released somewhere in 2013.
(*) Technically, there is no reason to limit columnstore indexes to large tables. But you won't get any performance benefit by using columnstore indexes on small tables. If the table does not have at least twenty million rows, I would not even consider it. It only really gets interesting when you have at least a couple hundred million rows. And even then, you would still only get a performance benefit if you frequently run reporting-type queries, that process all or most of the rows, on that table.

Hi Hugo

Thank you for your extra explanation!

Best

IgorMi

Igor Micev,My blog: www.igormicev.com

call.copse SSCoach Points: 18295 More actions · Answer 5

IgorMi (5/16/2013)
Hugo Kornelis (5/16/2013)
Raghavendra Mudugal (5/16/2013)
From BOL - "A table with a columnstore index cannot be updated" - seems likes the table will become as a look up table or possibly one of the way to use this on the archive tables?
Columnstore indexes are primarily a feature targeted at BI/Datawarehouding workloads. They are intended to be used on large (millions or billions of rows) fact tables that are frequently used in BI/DW type queries. (Those are queries that typically use a star schema join to filter, that need to process all or large amounts of the rows in the tables, and that return aggregated data). In those environments, the read-only restriction is not as bad as it sounds, since databases like that are often read-only during the day anyway, only updated once per day during the load/refresh process. For that process, you can drop or disable and later recreate the columnstore index. And with table partitioning, it is also possible to just swap in a new partition without ever having to drop and recreate the columnstore index.
It *is* also possible to use columnstore indexes on large (*) tables in an OLTP database, but it takes a lot of extra work. Not only to work around the read-only limitation, but also to ensure that the queries are written in such a way that they work around the limitations of "batch mode processing" and gain the expected performance boost.
During the 2012 PASS Community Summit, Microsoft has announced that the next version of SQL Server will also support clustered columnstore indexes, for which the read-only restriction has been lifted. (And most of the limitations of batch mode processing have been lifted too). This is already available in the next version of PDW (Parallel DataWarehouse), which has been or will be released somewhere in 2013.
(*) Technically, there is no reason to limit columnstore indexes to large tables. But you won't get any performance benefit by using columnstore indexes on small tables. If the table does not have at least twenty million rows, I would not even consider it. It only really gets interesting when you have at least a couple hundred million rows. And even then, you would still only get a performance benefit if you frequently run reporting-type queries, that process all or most of the rows, on that table.
Hi Hugo
Thank you for your extra explanation!
Best
IgorMi

+1

Even if I'm in a hurry and not generally checking the explanations I'll read anything you have added - cheers, appreciate the summary on this one.

Raghavendra Mudugal SSChampion Points: 10658 More actions · Answer 6

call.copse (5/16/2013)
IgorMi (5/16/2013)
Hugo Kornelis (5/16/2013)
Raghavendra Mudugal (5/16/2013)
From BOL - "A table with a columnstore index cannot be updated" - seems likes the table will become as a look up table or possibly one of the way to use this on the archive tables?
Columnstore indexes are primarily a feature targeted at BI/Datawarehouding workloads. They are intended to be used on large (millions or billions of rows) fact tables that are frequently used in BI/DW type queries. (Those are queries that typically use a star schema join to filter, that need to process all or large amounts of the rows in the tables, and that return aggregated data). In those environments, the read-only restriction is not as bad as it sounds, since databases like that are often read-only during the day anyway, only updated once per day during the load/refresh process. For that process, you can drop or disable and later recreate the columnstore index. And with table partitioning, it is also possible to just swap in a new partition without ever having to drop and recreate the columnstore index.
It *is* also possible to use columnstore indexes on large (*) tables in an OLTP database, but it takes a lot of extra work. Not only to work around the read-only limitation, but also to ensure that the queries are written in such a way that they work around the limitations of "batch mode processing" and gain the expected performance boost.
During the 2012 PASS Community Summit, Microsoft has announced that the next version of SQL Server will also support clustered columnstore indexes, for which the read-only restriction has been lifted. (And most of the limitations of batch mode processing have been lifted too). This is already available in the next version of PDW (Parallel DataWarehouse), which has been or will be released somewhere in 2013.
(*) Technically, there is no reason to limit columnstore indexes to large tables. But you won't get any performance benefit by using columnstore indexes on small tables. If the table does not have at least twenty million rows, I would not even consider it. It only really gets interesting when you have at least a couple hundred million rows. And even then, you would still only get a performance benefit if you frequently run reporting-type queries, that process all or most of the rows, on that table.
Hi Hugo
Thank you for your extra explanation!
Best
IgorMi
+1
Even if I'm in a hurry and not generally checking the explanations I'll read anything you have added - cheers, appreciate the summary on this one.

+1

thank you Hugo for the awesome details.

ww; Raghu
--
The first and the hardest SQL statement I have wrote- "select * from customers" - and I was happy and felt smart.

sestell1 SSChampion Points: 10230 More actions · Answer 7

I almost got caught up on the primary key, as I knew clustered index columns needed to be included but didn't know the columns would be added automatically when they aren't included in the create statement. Interesting.

Thanks for the additional insight into columstore use Hugo!

mtassin SSC-Insane Points: 23099 More actions · Answer 8

Raghavendra Mudugal (5/16/2013)
Good one, IgorMi, thank you for the posting.
COLUMNSTORE index is very new to me and didn't had a clue on what it does, so have to take the help of BOL, but added this in to my list to go through about this today. 🙂
From BOL - "A table with a columnstore index cannot be updated" - seems likes the table will become as a look up table or possibly one of the way to use this on the archive tables?
(...interesting)

COLUMNSTORE indexes are used primarily in Data Warehousing. Because of the nature of DW queries the columnstore index tends to provide superior performance. I was familiar with it.... but not to the degree that I'd have even looked to see if you could use a decimal/numeric data type with a precision greater than 18....

weird... learned something.... but weird.

--Mark Tassin
MCITP - SQL Server DBA
Proud member of the Anti-RBAR alliance.
For help with Performance click this link[/url]
For tips on how to post your problems[/url]

Hugo Kornelis SSC Guru Points: 64790 More actions · Answer 9

mtassin (5/16/2013)
I was familiar with it.... but not to the degree that I'd have even looked to see if you could use a decimal/numeric data type with a precision greater than 18....
weird... learned something.... but weird.

I've been told that the reason for this limitation (as well as a similar surprising limitation on precision for datetime2 columns) is that SQL Server 2012 nonclustered columnstore indexes are engineered to ensure that every single value always fits in a single 64-bit processor word. Numeric with precision of 19 or more would not fit in 64 bits, so it had to be cut from this version of the product.

I *think* Microsoft announced that the next version of SQL Server would support far more data types in columnstore indexes (but I am not 100% sure if that's a real memory or just wishful thinking).

Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

Igor Micev SSC-Dedicated Points: 33110 More actions · Answer 10

mtassin (5/16/2013)
Raghavendra Mudugal (5/16/2013)
Good one, IgorMi, thank you for the posting.
COLUMNSTORE index is very new to me and didn't had a clue on what it does, so have to take the help of BOL, but added this in to my list to go through about this today. 🙂
From BOL - "A table with a columnstore index cannot be updated" - seems likes the table will become as a look up table or possibly one of the way to use this on the archive tables?
(...interesting)
COLUMNSTORE indexes are used primarily in Data Warehousing. Because of the nature of DW queries the columnstore index tends to provide superior performance. I was familiar with it.... but not to the degree that I'd have even looked to see if you could use a decimal/numeric data type with a precision greater than 18....
weird... learned something.... but weird.

Yes, that's why I posted it. Actually I wanted to create columnstore index on a table having column with precision greater than 18 and I got that error. In such cases I usually share it here.

Regards

IgorMi

Igor Micev,My blog: www.igormicev.com

PChiragS SSCarpal Tunnel Points: 4965 More actions · Answer 11

I used it before, but never occurred this in my mind..

different and new.. learned something ..

Thank You.

Igor Micev SSC-Dedicated Points: 33110 More actions · Answer 12

crussell-931424 (5/16/2013)
I couldn't find much out about it either and don't quite understand how the index data is really stored. The part about the table can't be updated. Do they mean you can't add data to the table or the structure of the table can't be changed? It also says you can create a columnstore index on an empty table. If you can't add data to the table once the index is created and you create the index before you add any data then what good is it?

ColumnStore indexes are intended mainly for OLAP systems. At them data is inserted usually during nightly jobs. At that time indexes are dropped and after the insertion they are recreated. Columnstore indexes show very good performance.

Regards

IgorMi

Igor Micev,My blog: www.igormicev.com

Ken SSCoach Points: 17191 More actions · Answer 13

Nice question. Thanks Igor.

Also, thanks for the extra information Hugo! I've been considering the uses of columnstore indexes but I don't know that I would get much benefit out of them other than with a few tables.

Hugo Kornelis SSC Guru Points: 64790 More actions · Answer 14

KWymore (5/16/2013)
Nice question. Thanks Igor.
Also, thanks for the extra information Hugo! I've been considering the uses of columnstore indexes but I don't know that I would get much benefit out of them other than with a few tables.

Columnstore indexes will never be used on all tables. "other than with a few tables" is exactly what this feature is intended for.

To assess the benefit: What kind of workload are you running? How big are your tables? Are they constantly updated, and if so, could you split them in a large "read only" and a small "read/write" part? What kind of queries are typically run against the large tables, and how are they currently performing?

And (maybe obvious, but relevant nonetheless!) are you running SQL Server 2012 Enterprise Edition?

Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/