SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Snowflaking Dimensions


Snowflaking Dimensions

Author
Message
Sideout1972
Sideout1972
SSC Rookie
SSC Rookie (35 reputation)SSC Rookie (35 reputation)SSC Rookie (35 reputation)SSC Rookie (35 reputation)SSC Rookie (35 reputation)SSC Rookie (35 reputation)SSC Rookie (35 reputation)SSC Rookie (35 reputation)

Group: General Forum Members
Points: 35 Visits: 116
I've got two related questions.

Question 1: Building an insurance data warehouse, you have a DimPolicy table (for the insurance policy) and a FactPolicyTransaction table (for the transactions on that policy). There is an insurance agent that is responsible for the policy as a whole, and an insurance agent (which may be a different person) that is responsible for the transaction on that policy. There are a large number of attributes about the agent, which leads me to believe that putting all of the attributes in DimPolicy for the agent (where the agent is responsible for thousands of policies) doesn't make sense - therefore, you snowflake the agent, and create a DimAgent table which has it's PK as a FK within DimPolicy. Now you can tell which agent is responsible for the policy. Now, you also need a relationship between DimAgent and FactPolicyTransaction to denote which agent actually made the policy transaction. This creates a "triangle", since DimAgency has its key in both DimPolicy and FactPolicyTransaction, and DimPolicy also has its key in FactPolicyTransction. The question - is this "triangle" ok? It seems awkward to me, and against the philosophies of dimensional modeling.

Question 2: You have a company (brokerage), and an agent works for a company. There are 30+ attributes for the brokerage, and 20+ attributes for an agent. It seems funny to create a DimAgent dimension which includes all 30 of the brokerage's attributes repeated over and over again. By doing that, you will constantly need to run a "SELECT DISTINCT" on the DimAgent table to get a valid list of brokerages (since they are repeated over and over again). Is this a prime time to snowflake and create a DimBrokerage dimension as well?

Thanks in advance for your help.
PaulB-TheOneAndOnly
PaulB-TheOneAndOnly
SSChampion
SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)

Group: General Forum Members
Points: 14069 Visits: 4639
Sideout1972 (2/9/2011)
Question 1: Building an insurance data warehouse, you have a DimPolicy table (for the insurance policy) and a FactPolicyTransaction table (for the transactions on that policy). There is an insurance agent that is responsible for the policy as a whole, and an insurance agent (which may be a different person) that is responsible for the transaction on that policy. There are a large number of attributes about the agent, which leads me to believe that putting all of the attributes in DimPolicy for the agent (where the agent is responsible for thousands of policies) doesn't make sense - therefore, you snowflake the agent, and create a DimAgent table which has it's PK as a FK within DimPolicy. Now you can tell which agent is responsible for the policy. Now, you also need a relationship between DimAgent and FactPolicyTransaction to denote which agent actually made the policy transaction. This creates a "triangle", since DimAgency has its key in both DimPolicy and FactPolicyTransaction, and DimPolicy also has its key in FactPolicyTransction. The question - is this "triangle" ok? It seems awkward to me, and against the philosophies of dimensional modeling.

Question 2: You have a company (brokerage), and an agent works for a company. There are 30+ attributes for the brokerage, and 20+ attributes for an agent. It seems funny to create a DimAgent dimension which includes all 30 of the brokerage's attributes repeated over and over again. By doing that, you will constantly need to run a "SELECT DISTINCT" on the DimAgent table to get a valid list of brokerages (since they are repeated over and over again). Is this a prime time to snowflake and create a DimBrokerage dimension as well?


1.a - I do agree a DimAgent dimension table is needed.

1.b - How about having two FactPolicyTransaction columns named AgentPolicy and AgentTransaction.
I assume FactPolicyTransaction table will be sliced and diced by a handful of dimensions including AgentPolicy and AgentTransaction so it seams like a reasonable solution.

2 - You need two tables, one describing Brokerage companies and one describing Agents. I'm pretty sure your ER model shows a "belong" relationship in between BrokerageCompanies and Agents.

Hope this helps.

_____________________________________
Pablo (Paul) Berzukov

Author of Understanding Database Administration available at Amazon and other bookstores.

Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
Sideout1972
Sideout1972
SSC Rookie
SSC Rookie (35 reputation)SSC Rookie (35 reputation)SSC Rookie (35 reputation)SSC Rookie (35 reputation)SSC Rookie (35 reputation)SSC Rookie (35 reputation)SSC Rookie (35 reputation)SSC Rookie (35 reputation)

Group: General Forum Members
Points: 35 Visits: 116
PaulB-TheOneAndOnly (2/9/2011)

1.a - I do agree a DimAgent dimension table is needed.

1.b - How about having two FactPolicyTransaction columns named AgentPolicy and AgentTransaction.
I assume FactPolicyTransaction table will be sliced and diced by a handful of dimensions including AgentPolicy and AgentTransaction so it seams like a reasonable solution.

2 - You need two tables, one describing Brokers and one describing Agents.


1a: I think you're saying DimPolicy should be snowflaked into DimAgent and DimPolicy, where DimPolicy contains the FK of DimAgent. I'm thinking the same thing. However, on to 1b.

1b: Makes sense, as long as all queries would go through the Fact table. In many cases, users will need a list of DimAgent(s) and the DimPolicy(ies) for those agents. Using your proposed solution, you're basically have DimAgency have a FK in both DimPolicy as well as FactPolicyTransaction. As long as I'm not breaking any bad rules by having this triangle between DimAgency, DimPolicy, and FactPolicyTransaction, that works for me.

2: Based on your suggestion, that would mean the new DimBroker table wouldn't have its key stored anywhere except for DimAgent, and DimAgent would have its key used in any appropriate Fact tables. Is that correct?

Thanks!
PaulB-TheOneAndOnly
PaulB-TheOneAndOnly
SSChampion
SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)

Group: General Forum Members
Points: 14069 Visits: 4639
Sideout1972 (2/9/2011)
PaulB-TheOneAndOnly (2/9/2011)

1.a - I do agree a DimAgent dimension table is needed.

1.b - How about having two FactPolicyTransaction columns named AgentPolicy and AgentTransaction.
I assume FactPolicyTransaction table will be sliced and diced by a handful of dimensions including AgentPolicy and AgentTransaction so it seams like a reasonable solution.

2 - You need two tables, one describing Brokers and one describing Agents.


1a: I think you're saying DimPolicy should be snowflaked into DimAgent and DimPolicy, where DimPolicy contains the FK of DimAgent. I'm thinking the same thing. However, on to 1b.

1b: Makes sense, as long as all queries would go through the Fact table. In many cases, users will need a list of DimAgent(s) and the DimPolicy(ies) for those agents. Using your proposed solution, you're basically have DimAgency have a FK in both DimPolicy as well as FactPolicyTransaction. As long as I'm not breaking any bad rules by having this triangle between DimAgency, DimPolicy, and FactPolicyTransaction, that works for me.

2: Based on your suggestion, that would mean the new DimBroker table wouldn't have its key stored anywhere except for DimAgent, and DimAgent would have its key used in any appropriate Fact tables. Is that correct?


I'm not convinced a snowflake is needed. Seeing ER model wouldn't hurt.

In a Data Warehouse environment the data users want to see is the data stored in FACT tables, that data will be sliced and diced using the DIMension tables of the proper datamart.

FACT table in this case will contain both an AgentPolicy and an AgentTransaction columns, each one pointing to the rigth DIMension table for further description.

The use of Data Warehouse to get a "list of brokers" is not what Data Warehouse is meant for, that information should be provided by the OLTP system, nobody analyzes a list of brokers. Having said that, proposed structure even provides that.

Last but not least, DimBroker and DimAgent look to me like slowly changing dimensions - how are you planning to address them? would they be treated as Type 2 SCD?

_____________________________________
Pablo (Paul) Berzukov

Author of Understanding Database Administration available at Amazon and other bookstores.

Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
Sideout1972
Sideout1972
SSC Rookie
SSC Rookie (35 reputation)SSC Rookie (35 reputation)SSC Rookie (35 reputation)SSC Rookie (35 reputation)SSC Rookie (35 reputation)SSC Rookie (35 reputation)SSC Rookie (35 reputation)SSC Rookie (35 reputation)

Group: General Forum Members
Points: 35 Visits: 116

The use of Data Warehouse to get a "list of brokers" is not what Data Warehouse is meant for, that information should be provided by the OLTP system, nobody analyzes a list of brokers. Having said that, proposed structure even provides that.

Last but not least, DimBroker and DimAgent look to me like slowly changing dimensions - how are you planning to address them? would they be treated as Type 2 SCD?


Interesting. Why shouldn't the DW be used to get the list of brokers? Why should the purpose of the data warehouse is to bring data together from all of the source systems into a conformed data model. Asking the OLTP system to do that, when there are multiple OLTP systems coming together to create the discrete "list of brokers", doesn't make a lot of sense to me. Especially if you are trying to use "Kimball Methodology". I'm looking to avoid the need for a relational model of all of the source systems with conformed data in it.

As far as the SCDs - only a few fields in DimBroker will be slowly changing - the association of a broker into a broker group, which can change over time (kind of like a broker can change regions), so it will be type 2. DimAgent I believe will all by type 1.
ssills
ssills
Valued Member
Valued Member (69 reputation)Valued Member (69 reputation)Valued Member (69 reputation)Valued Member (69 reputation)Valued Member (69 reputation)Valued Member (69 reputation)Valued Member (69 reputation)Valued Member (69 reputation)

Group: General Forum Members
Points: 69 Visits: 117
Looks to me as if there are two fact tables: policies and transactions, both have agent relationships. The former is SCD. The latter probably isn't. Whether broker may or may not be useful probably depends on whether that dimension has information unique from the agent dimension, or if there is a direct relationship between a fact and the broker in the source system. Broker, if it is tied to an agent, could also be SCD (hopefully, very slowly changing).

An ERD would help answer some of these questions.
Sideout1972
Sideout1972
SSC Rookie
SSC Rookie (35 reputation)SSC Rookie (35 reputation)SSC Rookie (35 reputation)SSC Rookie (35 reputation)SSC Rookie (35 reputation)SSC Rookie (35 reputation)SSC Rookie (35 reputation)SSC Rookie (35 reputation)

Group: General Forum Members
Points: 35 Visits: 116
ssills (2/9/2011)
Looks to me as if there are two fact tables: policies and transactions, both have agent relationships. The former is SCD. The latter probably isn't. Whether broker may or may not be useful probably depends on whether that dimension has information unique from the agent dimension, or if there is a direct relationship between a fact and the broker in the source system. Broker, if it is tied to an agent, could also be SCD (hopefully, very slowly changing).

An ERD would help answer some of these questions.


Policy wouldn't be a fact table - it would be a dimension. With that said, an option COULD be to have a factless fact table, which will join all of the dimensions together so there wouldn't be any triangular relationships between the dimensions, and that could solve the "list" type report queries.

As far as the ERD goes - are you talking about seeing the star schema or the OLTP data model? There are multiple source systems that are being pulled into staging tables, so there isn't an ERD to go by. It doesn't really matter how the source system databases are designed, since they do not reflect how we want the data to be reported.
PaulB-TheOneAndOnly
PaulB-TheOneAndOnly
SSChampion
SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)

Group: General Forum Members
Points: 14069 Visits: 4639
Sideout1972 (2/9/2011)
ssills (2/9/2011)
Looks to me as if there are two fact tables: policies and transactions, both have agent relationships. The former is SCD. The latter probably isn't. Whether broker may or may not be useful probably depends on whether that dimension has information unique from the agent dimension, or if there is a direct relationship between a fact and the broker in the source system. Broker, if it is tied to an agent, could also be SCD (hopefully, very slowly changing).

An ERD would help answer some of these questions.


Policy wouldn't be a fact table - it would be a dimension. With that said, an option COULD be to have a factless fact table, which will join all of the dimensions together so there wouldn't be any triangular relationships between the dimensions, and that could solve the "list" type report queries.

As far as the ERD goes - are you talking about seeing the star schema or the OLTP data model? There are multiple source systems that are being pulled into staging tables, so there isn't an ERD to go by. It doesn't really matter how the source system databases are designed, since they do not reflect how we want the data to be reported.


ERD of your Data Warehouse - Star schema has to be based in something.

_____________________________________
Pablo (Paul) Berzukov

Author of Understanding Database Administration available at Amazon and other bookstores.

Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
Sideout1972
Sideout1972
SSC Rookie
SSC Rookie (35 reputation)SSC Rookie (35 reputation)SSC Rookie (35 reputation)SSC Rookie (35 reputation)SSC Rookie (35 reputation)SSC Rookie (35 reputation)SSC Rookie (35 reputation)SSC Rookie (35 reputation)

Group: General Forum Members
Points: 35 Visits: 116
ERD of your Data Warehouse - Star schema has to be based in something.


I'm definitely missing something (which, of course, is off topic). The data warehouse IS the star schema(s). Other than staging/landing and interim tables, I don't plan to have any relationally-modeled tables that store source system data - that seems like a waste to me. The star schema should be modeled how the business itself understands the data.

Back to being on-topic - I posted the same question on the Kimball Forum, if anyone is interested that group's responses.

http://forum.kimballgroup.com/t957-snowflaking-dimensions
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search