Best way to ETL on a large table with a columnstore index

  • I have a table with 489M rows and growing. We had to add a columnstore index because pulling rows from this table into Power BI was taking a long time, and the CCI seemed to help. The challenge now is updating/deleting this table nightly. Since an UPDATE would be painful, I started deleting rows and then inserting, but even the DELETE itself takes about an hour. Any way to expedite this?

    What would be the best strategy for ETL on a large table with a columnstore index?

    Will a non-clustered index help the DELETE if I add the columns used in the WHERE condition to a non-clustered index? I'm not sure how a non-clustered index works on top of a columnstore index.

    Thank you in advance.

  • Add a supporting non-clustered index for your DELETE operations, and do the same for UPDATE. These indexes will be large given the size of the data, but they help the engine avoid full table scans - see the sketch below.
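
    A minimal sketch, assuming hypothetical names (a fact table dbo.FactSales whose nightly delete is driven by a LoadDate column) and SQL Server 2016 or later, where a rowstore non-clustered index can sit on top of a clustered columnstore index:

        -- B-tree on the DELETE's predicate column, so the engine can seek to
        -- the affected rows instead of scanning the whole columnstore.
        CREATE NONCLUSTERED INDEX IX_FactSales_LoadDate
            ON dbo.FactSales (LoadDate)
            WITH (DATA_COMPRESSION = PAGE); -- keeps the supporting index smaller

        -- The nightly delete can then seek on LoadDate.
        DECLARE @BatchDate date = '20240101';
        DELETE FROM dbo.FactSales
        WHERE LoadDate = @BatchDate;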

  • Will a DELETE on a columnstore index take longer than an UPDATE? Or will both take long?

  • update will take longer, but neither is fast - see https://www.sqlservercentral.com/steps/stairway-to-columnstore-indexes-level-6-updating-and-deleting-data-in-a-columnstore-index

    both will require an index reorganize to get rid of the "deleted rows" left behind in the table - see the cleanup sketch below.
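
    a minimal sketch of the cleanup, with hypothetical names (dbo.FactSales carrying a clustered columnstore index called CCI_FactSales). on a columnstore, DELETE only marks rows as deleted; a REORGANIZE lets the engine physically remove them and compress any open rowgroups:

        -- See how many logically deleted rows each rowgroup is still carrying.
        SELECT OBJECT_NAME(object_id) AS table_name,
               row_group_id, state_desc, total_rows, deleted_rows
        FROM sys.dm_db_column_store_row_group_physical_stats
        WHERE object_id = OBJECT_ID('dbo.FactSales');

        -- Merge rowgroups, clean up deleted rows, and compress open rowgroups.
        ALTER INDEX CCI_FactSales ON dbo.FactSales
            REORGANIZE WITH (COMPRESS_ALL_ROW_GROUPS = ON);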

    for a large etl like this, one needs to consider what is updated/deleted - e.g. is it just the most recent set of data, or can changes touch the whole table?

    if it is only the most recent set, then partitioning almost becomes a must, so that index rebuilds/reorgs target only the partitions that need it - otherwise the space needed to rebuild/reorg can be tens of times bigger than the underlying data. a sketch of partition-targeted maintenance follows below.
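
    a minimal sketch, again with hypothetical names (partition function pf_LoadDate on the LoadDate column, one partition per load date, and an empty staging table dbo.FactSales_Stage):

        -- Find the partition that holds the batch just reloaded.
        DECLARE @p int = $PARTITION.pf_LoadDate('20240101');

        -- Reorganize only that partition instead of the whole 489M-row index.
        ALTER INDEX CCI_FactSales ON dbo.FactSales
            REORGANIZE PARTITION = @p WITH (COMPRESS_ALL_ROW_GROUPS = ON);

        -- Partition switching can go further and replace the slow DELETE with a
        -- metadata-only operation: switch the stale partition out to an empty
        -- staging table with identical structure, reload it, and switch back in.
        ALTER TABLE dbo.FactSales
            SWITCH PARTITION @p TO dbo.FactSales_Stage;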
