April 11, 2017 at 6:17 am
Hi,
I have an event based fact table, it will hold approx 2 billion records.
EventKey        bigint
CustomerKey  int
EventDate      date
TypeKey         int
CategoryKey   int
Description     varchar(255)
Before adding a column store index I'm thinking of moving the Description column to separate table and replacing the field with a surrogate key.
Description has a very high cardinality hence I'm guessing it will have an adverse effect on column store.
It's usually best practice to remove text from fact tables.   Is this still the case with column store?   Even though the table will later be joined to dimensions later on.
April 11, 2017 at 1:59 pm
I don't see any need to move description into a separate table just because of a columnstore index.
I agree that the description should not be in the fact; it would be better served in a degenerate dimension table containing the transaction ID that it describes. But that is a separate issue from a text field in a columnstore.
If you have good reasons for keeping it in the fact ( I can't think of any),then it will work just fine as part of columnstore index. (Make sure you're familiar with the costly overhead of updates and deletes when using clustered columnstore)
Wes
(A solid design is always preferable to a creative workaround)
April 11, 2017 at 3:01 pm
I'm assuming we're talking about an OLAP or data warehouse type database here. Regarding the issue of where to contain the Event Description column, I can't say for certain unless I were entirely in your shoes, but consider these alternate points of view:
If your goal is to follow the star-schema modeling standard (as per Kimball), then move the description to a separate dimension table and add EventDescriptionKey column to the fact table. It's just another dimension. If the typical case usage of your database is to support singleton queries (ie: query for one specific customer), then star-schema will be more optimal. You may want to reconsider if implementing ColumnStore for a fact table containing only key columns even adds any value over a traditional RowStore table.
However, if your goal is to leverage ColumnStore to it's fullest performance optimization potential, then keep the Event Description column and all the other description columns in this one single table, because ColumnStore is really intended for flat 0-normal form tables. ColumnStore will compress a typical description or alpha-numeric code column by 90% or more, even if it's practically unique. The memory footprint of a flat ColumnStore table will typically be less than if the same data were contained in a star-schema data model, and it will typically facilitate aggregate analytical style queries faster than star-schema.
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
Viewing 3 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply