Best way to build a partitioned clustered columnstore

  • I am building three partitioned, clustered columnstore tables.

    I was researching whether it is faster to populate a staging table and switch it into the partitioned table, or to insert directly into the partitioned table. I expected the staging approach to load faster if I could get minimal logging, etc.
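
    The staging-and-switch pattern I was testing looks roughly like this for Table F (placeholder table names and boundary values, not my real schema):

        -- Sketch of the staging-then-switch load. The staging table must match
        -- the target exactly (columns, types, clustered columnstore index), sit
        -- on the same filegroup as the target partition, and carry a check
        -- constraint that fits inside the target partition's boundary values.
        CREATE TABLE dbo.FactF_Stage
        (
            LoadDate date   NOT NULL,
            Key1     bigint NOT NULL,
            Key2     bigint NOT NULL,
            Attr1    varchar(50) NULL,
            Attr2    varchar(50) NULL,
            CONSTRAINT CK_FactF_Stage_LoadDate
                CHECK (LoadDate >= '20240101' AND LoadDate < '20240201')
        );

        -- ... bulk load dbo.FactF_Stage here ...

        -- Compress the staging table, then swap it into the (empty) partition 1.
        CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactF_Stage ON dbo.FactF_Stage;

        ALTER TABLE dbo.FactF_Stage SWITCH TO dbo.FactF PARTITION 1;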

    The first partition for the three tables will have:

    Table F: 50M rows, 6 columns wide, partitioned on a date column (1 date, 2 bigint keys, and two varchar columns)

    Table D1: 50M rows, 150 columns wide, partitioned on a bigint

    Table D2: 19M rows, 300 columns wide, partitioned on a bigint

    If I build the data that would go into partition 1 in a non-partitioned columnstore, I get these table sizes:

    Table F: 476 MB

    Table D1: 6,800 MB

    Table D2: 5,496 MB

    If I build the same data directly in the partitioned columnstore, the table sizes end up being:

    Table F: 579 MB

    Table D1: 6,800 MB

    Table D2: 5,364 MB

    That's a 20% difference on Table F, the narrow table.

    Looking at the row groups, both partition 1 and the unpartitioned table contain the same 47 row groups, but the average size_in_bytes is consistently about 20% smaller in the unpartitioned table.
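
    For reference, this is the sort of query I used for the comparison, against the sys.column_store_row_groups catalog view (placeholder table names):

        -- Compare row group state, row counts, and compressed size between
        -- the partitioned table and the unpartitioned staging copy.
        SELECT  OBJECT_NAME(object_id) AS table_name,
                partition_number,
                row_group_id,
                state_description,
                total_rows,
                size_in_bytes
        FROM    sys.column_store_row_groups
        WHERE   object_id IN (OBJECT_ID('dbo.FactF'),
                              OBJECT_ID('dbo.FactF_Stage'))
        ORDER BY table_name, partition_number, row_group_id;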

    Any idea why I'm getting better compression on the unpartitioned table?

  • I would guess that having at least one deltastore (with rows) per partition after loading the data could yield lower compression for the partitioned columnstore index.

    Try rebuilding the columnstore index for the partition you are switching into, to be sure that all of the data has been compressed into the columnstore.
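
    Something along these lines will show whether any deltastore row groups are left, and then rebuild just the partition in question (substitute your own table and index names):

        -- Any OPEN or CLOSED row groups are deltastore data that has not yet
        -- been compressed into columnstore segments.
        SELECT  partition_number,
                state_description,
                COUNT(*)        AS row_group_count,
                SUM(total_rows) AS row_count
        FROM    sys.column_store_row_groups
        WHERE   object_id = OBJECT_ID('dbo.FactF')
        GROUP BY partition_number, state_description;

        -- Rebuild only the partition that was just loaded.
        ALTER INDEX CCI_FactF ON dbo.FactF REBUILD PARTITION = 1;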

  • It wasn't open row groups or a need to rebuild; it was simple operator error.

    It became very clear what the problem was when I tried to switch my staging table into the partitioned table.

    My non-partitioned staging table had one fewer column.
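
    For what it's worth, a quick catalog comparison would have caught the mismatch before the switch failed (placeholder names again):

        -- Rows returned here are staging-table columns with no exact match in
        -- the target; swap the two halves to check the other direction. Column
        -- order (column_id) matters too, since SWITCH requires identical schemas.
        SELECT  name, column_id, TYPE_NAME(user_type_id) AS type_name,
                max_length, is_nullable
        FROM    sys.columns
        WHERE   object_id = OBJECT_ID('dbo.FactF_Stage')
        EXCEPT
        SELECT  name, column_id, TYPE_NAME(user_type_id),
                max_length, is_nullable
        FROM    sys.columns
        WHERE   object_id = OBJECT_ID('dbo.FactF');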
