Columnstore index

  • Comments posted to this topic are about the item Columnstore index

    Igor Micev,My blog: www.igormicev.com

  • Good one, IgorMi, thank you for the posting.

    COLUMNSTORE index is very new to me and didn't had a clue on what it does, so have to take the help of BOL, but added this in to my list to go through about this today. 🙂

    From BOL - "A table with a columnstore index cannot be updated" - seems likes the table will become as a look up table or possibly one of the way to use this on the archive tables?

    (...interesting)

    ww; Raghu
    --
    The first and the hardest SQL statement I have wrote- "select * from customers" - and I was happy and felt smart.

  • Great great question. Never used Columnstore index but theoretically knew the only fact that all columns in the clustering key must be present in the nonclustered columnstore index. So picked up the wrong'un. I think while automatically adding the column, it does not give any warning.

    Lost two points but nevertheless got some precious piece of knowledge. So happy to loose. :-):-):-)

  • Raghavendra Mudugal (5/16/2013)


    From BOL - "A table with a columnstore index cannot be updated" - seems likes the table will become as a look up table or possibly one of the way to use this on the archive tables?

    Columnstore indexes are primarily a feature targeted at BI/Datawarehouding workloads. They are intended to be used on large (millions or billions of rows) fact tables that are frequently used in BI/DW type queries. (Those are queries that typically use a star schema join to filter, that need to process all or large amounts of the rows in the tables, and that return aggregated data). In those environments, the read-only restriction is not as bad as it sounds, since databases like that are often read-only during the day anyway, only updated once per day during the load/refresh process. For that process, you can drop or disable and later recreate the columnstore index. And with table partitioning, it is also possible to just swap in a new partition without ever having to drop and recreate the columnstore index.

    It *is* also possible to use columnstore indexes on large (*) tables in an OLTP database, but it takes a lot of extra work. Not only to work around the read-only limitation, but also to ensure that the queries are written in such a way that they work around the limitations of "batch mode processing" and gain the expected performance boost.

    During the 2012 PASS Community Summit, Microsoft has announced that the next version of SQL Server will also support clustered columnstore indexes, for which the read-only restriction has been lifted. (And most of the limitations of batch mode processing have been lifted too). This is already available in the next version of PDW (Parallel DataWarehouse), which has been or will be released somewhere in 2013.

    (*) Technically, there is no reason to limit columnstore indexes to large tables. But you won't get any performance benefit by using columnstore indexes on small tables. If the table does not have at least twenty million rows, I would not even consider it. It only really gets interesting when you have at least a couple hundred million rows. And even then, you would still only get a performance benefit if you frequently run reporting-type queries, that process all or most of the rows, on that table.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Hugo Kornelis (5/16/2013)


    Raghavendra Mudugal (5/16/2013)


    From BOL - "A table with a columnstore index cannot be updated" - seems likes the table will become as a look up table or possibly one of the way to use this on the archive tables?

    Columnstore indexes are primarily a feature targeted at BI/Datawarehouding workloads. They are intended to be used on large (millions or billions of rows) fact tables that are frequently used in BI/DW type queries. (Those are queries that typically use a star schema join to filter, that need to process all or large amounts of the rows in the tables, and that return aggregated data). In those environments, the read-only restriction is not as bad as it sounds, since databases like that are often read-only during the day anyway, only updated once per day during the load/refresh process. For that process, you can drop or disable and later recreate the columnstore index. And with table partitioning, it is also possible to just swap in a new partition without ever having to drop and recreate the columnstore index.

    It *is* also possible to use columnstore indexes on large (*) tables in an OLTP database, but it takes a lot of extra work. Not only to work around the read-only limitation, but also to ensure that the queries are written in such a way that they work around the limitations of "batch mode processing" and gain the expected performance boost.

    During the 2012 PASS Community Summit, Microsoft has announced that the next version of SQL Server will also support clustered columnstore indexes, for which the read-only restriction has been lifted. (And most of the limitations of batch mode processing have been lifted too). This is already available in the next version of PDW (Parallel DataWarehouse), which has been or will be released somewhere in 2013.

    (*) Technically, there is no reason to limit columnstore indexes to large tables. But you won't get any performance benefit by using columnstore indexes on small tables. If the table does not have at least twenty million rows, I would not even consider it. It only really gets interesting when you have at least a couple hundred million rows. And even then, you would still only get a performance benefit if you frequently run reporting-type queries, that process all or most of the rows, on that table.

    Hi Hugo

    Thank you for your extra explanation!

    Best

    IgorMi

    Igor Micev,My blog: www.igormicev.com

  • IgorMi (5/16/2013)


    Hugo Kornelis (5/16/2013)


    Raghavendra Mudugal (5/16/2013)


    From BOL - "A table with a columnstore index cannot be updated" - seems likes the table will become as a look up table or possibly one of the way to use this on the archive tables?

    Columnstore indexes are primarily a feature targeted at BI/Datawarehouding workloads. They are intended to be used on large (millions or billions of rows) fact tables that are frequently used in BI/DW type queries. (Those are queries that typically use a star schema join to filter, that need to process all or large amounts of the rows in the tables, and that return aggregated data). In those environments, the read-only restriction is not as bad as it sounds, since databases like that are often read-only during the day anyway, only updated once per day during the load/refresh process. For that process, you can drop or disable and later recreate the columnstore index. And with table partitioning, it is also possible to just swap in a new partition without ever having to drop and recreate the columnstore index.

    It *is* also possible to use columnstore indexes on large (*) tables in an OLTP database, but it takes a lot of extra work. Not only to work around the read-only limitation, but also to ensure that the queries are written in such a way that they work around the limitations of "batch mode processing" and gain the expected performance boost.

    During the 2012 PASS Community Summit, Microsoft has announced that the next version of SQL Server will also support clustered columnstore indexes, for which the read-only restriction has been lifted. (And most of the limitations of batch mode processing have been lifted too). This is already available in the next version of PDW (Parallel DataWarehouse), which has been or will be released somewhere in 2013.

    (*) Technically, there is no reason to limit columnstore indexes to large tables. But you won't get any performance benefit by using columnstore indexes on small tables. If the table does not have at least twenty million rows, I would not even consider it. It only really gets interesting when you have at least a couple hundred million rows. And even then, you would still only get a performance benefit if you frequently run reporting-type queries, that process all or most of the rows, on that table.

    Hi Hugo

    Thank you for your extra explanation!

    Best

    IgorMi

    +1

    Even if I'm in a hurry and not generally checking the explanations I'll read anything you have added - cheers, appreciate the summary on this one.

  • call.copse (5/16/2013)


    IgorMi (5/16/2013)


    Hugo Kornelis (5/16/2013)


    Raghavendra Mudugal (5/16/2013)


    From BOL - "A table with a columnstore index cannot be updated" - seems likes the table will become as a look up table or possibly one of the way to use this on the archive tables?

    Columnstore indexes are primarily a feature targeted at BI/Datawarehouding workloads. They are intended to be used on large (millions or billions of rows) fact tables that are frequently used in BI/DW type queries. (Those are queries that typically use a star schema join to filter, that need to process all or large amounts of the rows in the tables, and that return aggregated data). In those environments, the read-only restriction is not as bad as it sounds, since databases like that are often read-only during the day anyway, only updated once per day during the load/refresh process. For that process, you can drop or disable and later recreate the columnstore index. And with table partitioning, it is also possible to just swap in a new partition without ever having to drop and recreate the columnstore index.

    It *is* also possible to use columnstore indexes on large (*) tables in an OLTP database, but it takes a lot of extra work. Not only to work around the read-only limitation, but also to ensure that the queries are written in such a way that they work around the limitations of "batch mode processing" and gain the expected performance boost.

    During the 2012 PASS Community Summit, Microsoft has announced that the next version of SQL Server will also support clustered columnstore indexes, for which the read-only restriction has been lifted. (And most of the limitations of batch mode processing have been lifted too). This is already available in the next version of PDW (Parallel DataWarehouse), which has been or will be released somewhere in 2013.

    (*) Technically, there is no reason to limit columnstore indexes to large tables. But you won't get any performance benefit by using columnstore indexes on small tables. If the table does not have at least twenty million rows, I would not even consider it. It only really gets interesting when you have at least a couple hundred million rows. And even then, you would still only get a performance benefit if you frequently run reporting-type queries, that process all or most of the rows, on that table.

    Hi Hugo

    Thank you for your extra explanation!

    Best

    IgorMi

    +1

    Even if I'm in a hurry and not generally checking the explanations I'll read anything you have added - cheers, appreciate the summary on this one.

    +1

    thank you Hugo for the awesome details.

    ww; Raghu
    --
    The first and the hardest SQL statement I have wrote- "select * from customers" - and I was happy and felt smart.

  • I almost got caught up on the primary key, as I knew clustered index columns needed to be included but didn't know the columns would be added automatically when they aren't included in the create statement. Interesting.

    Thanks for the additional insight into columstore use Hugo!

  • Raghavendra Mudugal (5/16/2013)


    Good one, IgorMi, thank you for the posting.

    COLUMNSTORE index is very new to me and didn't had a clue on what it does, so have to take the help of BOL, but added this in to my list to go through about this today. 🙂

    From BOL - "A table with a columnstore index cannot be updated" - seems likes the table will become as a look up table or possibly one of the way to use this on the archive tables?

    (...interesting)

    COLUMNSTORE indexes are used primarily in Data Warehousing. Because of the nature of DW queries the columnstore index tends to provide superior performance. I was familiar with it.... but not to the degree that I'd have even looked to see if you could use a decimal/numeric data type with a precision greater than 18....

    weird... learned something.... but weird.



    --Mark Tassin
    MCITP - SQL Server DBA
    Proud member of the Anti-RBAR alliance.
    For help with Performance click this link[/url]
    For tips on how to post your problems[/url]

  • This was removed by the editor as SPAM

  • mtassin (5/16/2013)


    I was familiar with it.... but not to the degree that I'd have even looked to see if you could use a decimal/numeric data type with a precision greater than 18....

    weird... learned something.... but weird.

    I've been told that the reason for this limitation (as well as a similar surprising limitation on precision for datetime2 columns) is that SQL Server 2012 nonclustered columnstore indexes are engineered to ensure that every single value always fits in a single 64-bit processor word. Numeric with precision of 19 or more would not fit in 64 bits, so it had to be cut from this version of the product.

    I *think* Microsoft announced that the next version of SQL Server would support far more data types in columnstore indexes (but I am not 100% sure if that's a real memory or just wishful thinking).


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • mtassin (5/16/2013)


    Raghavendra Mudugal (5/16/2013)


    Good one, IgorMi, thank you for the posting.

    COLUMNSTORE index is very new to me and didn't had a clue on what it does, so have to take the help of BOL, but added this in to my list to go through about this today. 🙂

    From BOL - "A table with a columnstore index cannot be updated" - seems likes the table will become as a look up table or possibly one of the way to use this on the archive tables?

    (...interesting)

    COLUMNSTORE indexes are used primarily in Data Warehousing. Because of the nature of DW queries the columnstore index tends to provide superior performance. I was familiar with it.... but not to the degree that I'd have even looked to see if you could use a decimal/numeric data type with a precision greater than 18....

    weird... learned something.... but weird.

    Yes, that's why I posted it. Actually I wanted to create columnstore index on a table having column with precision greater than 18 and I got that error. In such cases I usually share it here.

    Regards

    IgorMi

    Igor Micev,My blog: www.igormicev.com

  • I used it before, but never occurred this in my mind..

    different and new.. learned something ..

    Thank You.

  • crussell-931424 (5/16/2013)


    I couldn't find much out about it either and don't quite understand how the index data is really stored. The part about the table can't be updated. Do they mean you can't add data to the table or the structure of the table can't be changed? It also says you can create a columnstore index on an empty table. If you can't add data to the table once the index is created and you create the index before you add any data then what good is it?

    ColumnStore indexes are intended mainly for OLAP systems. At them data is inserted usually during nightly jobs. At that time indexes are dropped and after the insertion they are recreated. Columnstore indexes show very good performance.

    Regards

    IgorMi

    Igor Micev,My blog: www.igormicev.com

  • Nice question. Thanks Igor.

    Also, thanks for the extra information Hugo! I've been considering the uses of columnstore indexes but I don't know that I would get much benefit out of them other than with a few tables.

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply