Best way to ETL on a large table with a columnstore index

  • I have a table with 489M rows and growing. We had to add a columnstore index because pulling rows from this table into Power BI was taking a long time, and the CCI seemed to help. The challenge now is updating/deleting this table nightly. Since an UPDATE would be painful, I started deleting rows and then inserting, but even the DELETE itself takes about an hour. Any way to expedite this?

    What would be the best strategy for ETL on a large table with a columnstore index?

    Will a non-clustered index help the DELETE if I add the columns used in the WHERE condition to a non-clustered index? I'm not sure how a non-clustered index works on top of a columnstore index.

    Thank you in advance.

  • Add a supporting non-clustered index for your DELETE operations, and do the same for UPDATE. These indexes will be large given the size of the data, but they help the engine avoid full table scans - see the sketch below.
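
    A minimal sketch, assuming hypothetical names (a fact table dbo.FactSales whose nightly delete is driven by a LoadDate column) and SQL Server 2016 or later, where a rowstore non-clustered index can sit on top of a clustered columnstore index:

        -- B-tree on the DELETE's predicate column, so the engine can seek to
        -- the affected rows instead of scanning the whole columnstore.
        CREATE NONCLUSTERED INDEX IX_FactSales_LoadDate
            ON dbo.FactSales (LoadDate)
            WITH (DATA_COMPRESSION = PAGE); -- keeps the supporting index smaller

        -- The nightly delete can then seek on LoadDate.
        DECLARE @BatchDate date = '20240101';
        DELETE FROM dbo.FactSales
        WHERE LoadDate = @BatchDate;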

  • Will a DELETE on a columnstore index take longer than an UPDATE? Or will both take long?

  • update will take longer, but neither is fast - see https://www.sqlservercentral.com/steps/stairway-to-columnstore-indexes-level-6-updating-and-deleting-data-in-a-columnstore-index

    both will require an index reorganize to get rid of the "deleted rows" left behind in the table - see the cleanup sketch below.
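
    a minimal sketch of the cleanup, with hypothetical names (dbo.FactSales carrying a clustered columnstore index called CCI_FactSales). on a columnstore, DELETE only marks rows as deleted; a REORGANIZE lets the engine physically remove them and compress any open rowgroups:

        -- See how many logically deleted rows each rowgroup is still carrying.
        SELECT OBJECT_NAME(object_id) AS table_name,
               row_group_id, state_desc, total_rows, deleted_rows
        FROM sys.dm_db_column_store_row_group_physical_stats
        WHERE object_id = OBJECT_ID('dbo.FactSales');

        -- Merge rowgroups, clean up deleted rows, and compress open rowgroups.
        ALTER INDEX CCI_FactSales ON dbo.FactSales
            REORGANIZE WITH (COMPRESS_ALL_ROW_GROUPS = ON);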

    for a large etl like this, one needs to consider what is updated/deleted - e.g. is it just the most recent set of data, or can changes touch the whole table?

    if it is only the most recent set, then partitioning almost becomes a must, so that index rebuilds/reorgs target only the partitions that need it - otherwise the space needed to rebuild/reorg can be tens of times bigger than the underlying data. a sketch of partition-targeted maintenance follows below.
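
    a minimal sketch, again with hypothetical names (partition function pf_LoadDate on the LoadDate column, one partition per load date, and an empty staging table dbo.FactSales_Stage):

        -- Find the partition that holds the batch just reloaded.
        DECLARE @p int = $PARTITION.pf_LoadDate('20240101');

        -- Reorganize only that partition instead of the whole 489M-row index.
        ALTER INDEX CCI_FactSales ON dbo.FactSales
            REORGANIZE PARTITION = @p WITH (COMPRESS_ALL_ROW_GROUPS = ON);

        -- Partition switching can go further and replace the slow DELETE with a
        -- metadata-only operation: switch the stale partition out to an empty
        -- staging table with identical structure, reload it, and switch back in.
        ALTER TABLE dbo.FactSales
            SWITCH PARTITION @p TO dbo.FactSales_Stage;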
