ColumnStore Index

  • We have a data warehouse with large fact tables. Some of the tables have about 30 to 60 columns, and we also have some denormalized tables that are nearing 300 columns. We are using SQL Server 2014.

    We would like to test and see if ColumnStore indexing would make sense for us.

    We load several million rows and delete several million rows a day. We keep all history.

    I am wondering if anyone has experience with ColumnStore, especially when it comes to deleting large amounts of data daily.

    Thank you in advance

  • mishka-723908 (10/21/2016)


    We have a data warehouse with large fact tables. Some of the tables have about 30 to 60 columns, and we also have some denormalized tables that are nearing 300 columns. We are using SQL Server 2014.

    We would like to test and see if ColumnStore indexing would make sense for us.

    We load several million rows and delete several million rows a day. We keep all history.

    I am wondering if anyone has experience with ColumnStore, especially when it comes to deleting large amounts of data daily.

    Thank you in advance

    A couple of questions:

    1. Clustered or non-clustered?

    2. When/how are you deleting millions of rows a day? As part of a regularly scheduled ETL job?

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Thank you for your response

    We are not sure yet whether to create it as clustered or not. This is something we will need to test.

    The deletes will happen as part of a daily ETL data load. In some cases we may have more than one load per day, and there are multiple fact tables we delete from.

  • mishka-723908 (10/21/2016)


    Thank you for your response

    We are not sure yet whether to create it as clustered or not. This is something we will need to test.

    The deletes will happen as part of a daily ETL data load. In some cases we may have more than one load per day, and there are multiple fact tables we delete from.

    You'll certainly have to do a lot of testing, specifically to see which impacts performance more: dropping and rebuilding the index as part of your ETL, or having a columnstore index present during updates.

    Then you'll need to consider the pros and cons of clustered vs. nonclustered. For example, you can update a clustered columnstore index, but you can't have any other indexes on that table; you can't update a nonclustered columnstore index, but you don't have to include all the columns (possibly a huge benefit considering how many columns you're working with).
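
    For reference, the DDL for the two options in SQL Server 2014 looks roughly like this (FactSales and the column list are just placeholders standing in for one of your fact tables):

        -- Clustered columnstore: updatable in 2014, but it replaces the rowstore
        -- and the table cannot have any other indexes.
        CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactSales
            ON dbo.FactSales;

        -- Nonclustered columnstore: covers only the columns you list, but in
        -- 2014 the table becomes read-only while the index exists.
        CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_FactSales
            ON dbo.FactSales (DateKey, ProductKey, CustomerKey, Quantity, SalesAmount);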

    It will be interesting to see how this works out for you. Cheers.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Deletes and updates should be avoided.

    Use partitions for your fact tables and apply partition switching instead of deleting data.

    Then you can consider creating a clustered columnstore index, because I think with the number of rows added and deleted you will end up with a large deltastore if you run the deletes directly against the table.

    If you are partition switching, you won't need to rebuild the entire table every time you have a data load.
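
    Roughly, the switching pattern looks like this (table and partition names are made up; it assumes the staging table sits on the same partition scheme with the same structure and columnstore index as the fact table):

        -- Move the partition you would otherwise delete into an empty staging
        -- table with an identical definition, then empty the staging table.
        ALTER TABLE dbo.FactSales
            SWITCH PARTITION 42 TO dbo.FactSales_Staging PARTITION 42;

        TRUNCATE TABLE dbo.FactSales_Staging;

    The switch is a metadata-only operation, so none of those rows ever touch the deltastore.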

    Catch-all queries done right
    Gail Shaw's Performance Blog

  • MadAdmin (10/26/2016)


    Deletes and updates should be avoided.

    Use partitions for your fact tables and apply partition switching instead of deleting data.

    Agree completely.

    Don't bother issuing a delete/update on a columnstore. It may perform great to start with, but it will quickly degrade. And it will degrade all future queries on the table until it is rebuilt.

    I had to update a column for about 1% of a columnstore table with 100 billion rows. The standard approach of updating chunks in a While loop started out great, but very quickly slowed down as the number of delta stores increased.

    Fortunately, our table was partitioned by day (about 100 million rows per day). What I did was switch the day to be updated out to an empty partition/table. Then I insert-selected the rows back into the target table, transforming the data in the SELECT statement. After truncating the swap table, I was ready to process the next day. Since the process was repeatable, I was able to script it out and got through it within a couple of hours.
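
    In outline, each day's pass looked something like this (the object names, partition number, and corrected value here are just placeholders, not the real ones):

        -- 1. Switch the day's rows out of the columnstore table into an empty
        --    swap table built on the same partition scheme with the same index.
        ALTER TABLE dbo.BigFact
            SWITCH PARTITION 123 TO dbo.BigFact_Swap PARTITION 123;

        -- 2. Insert the rows back, applying the fix in the SELECT instead of
        --    running an UPDATE against the columnstore.
        INSERT INTO dbo.BigFact (DateKey, ProductKey, Amount, StatusCode)
        SELECT DateKey, ProductKey, Amount, 'CORRECTED'
        FROM dbo.BigFact_Swap;

        -- 3. Empty the swap table so the next day's partition can be processed.
        TRUNCATE TABLE dbo.BigFact_Swap;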

    Wes
    (A solid design is always preferable to a creative workaround)

  • Thank you everyone for the input. It is very helpful.
