How to Relate a Snowflake Dimension to the primary Dimension

Question

How to Relate a Snowflake Dimension to the primary Dimension

geoff5

Hall of Fame

Points: 3560
More actions
November 6, 2012 at 9:46 am

#284529

I have a dimension table with names of individuals and several demographics associated with the names. Each name can have multiple physical addresses types and multiple telephone number types. The number of address types or phone types for any given person is variable. I currently plan to add an dimAddresses table and a dimPhones table as "snowflake" dimensions of the dimNames table in order to accommodate this one-to-many relationship of names-to-addresses and names-to-phones.
My question is this: What should be the foreign key in the snowflake dimensions back to the slowly changing dimNames dimension? If I use the surrogate key, then every time a change to dimNames requires a new row for the same person, the dimAddresses and dimPhones tables will need to be updated too with the new surrogate key. But if I use the business key from the OLTP data, then the dimAddresses and dimPhones tables would have to be updated much more rarely.
Any insights from experienced data warehouse designers are welcome.
Thanks in advance!
Geoff

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

geoff5 Hall of Fame Points: 3560 More actions · Answer 1

I have continued to research this question, and I have come to the conclusion that the term "snowflake dimension" does not really describe what my intent is for the "addresses" and "phones" dimensions. In fact it seems that open-ended one-to-many relationships between one dimension and another are not really part of the picture in a standard data warehouse design.

Unless I run across a solution that fits my needs, I am going to incorporate the "main" address and phone number data into the dimNames table and then simply ignore any secondary address and phone types. This will mean that the data warehouse will not be suitable for some types of reporting (such as sending out AR invoices, a function that needs to take into account the secondary AR address type along with the main address type). But it will simplify the design and bring it into conformity with a traditional data warehouse star schema.

Perhaps for reports needed in business processes (such as monthly statements, invoices, etc.), I should simply use the OLTP database anyway. I was under the impression that a data warehouse could be the sole source of data for all reporting needs, but that may be asking too much. My current opinion is that the data warehouse is the best source of data for analytical reporting, but not necessarily for reports of other kinds.

I am still interested in additional comment if you have some insight into this scenario.

Jason-299789 SSC-Insane Points: 21601 More actions · Answer 2

Could you not use a Bridge/Factless Fact table to accomplish this rather than a straightforward snowflake type join?

Then with a Many to Many relationship through the Bridge/factless fact you should be able to accomlish all your reporting needs.

_________________________________________________________________________
SSC Guide to Posting and Best Practices

geoff5 Hall of Fame Points: 3560 More actions · Answer 3

I'm not sure how a factless fact table would differ from a snowflake dimension in terms of the actual table structure. But even if I pursue this approach, I am left with the question: Do I relate the address to the name using the business key or with the surrogate key? And If I choose to relate with the surrogate key, what is the best way to handle the update when a new surrogate key is assigned to the same name?

Jason-299789 SSC-Insane Points: 21601 More actions · Answer 4

but if you built a bridge table between the Address and Name that just contained the Surrogate keys from either side, effectively creating a Tuple wouldnt that work?

EG

CREATE TABLE Dim.Name (

NameSurrogateKey int

,NameBusinessKey

,Name varchar(100)

)

CREATE TABLE Dim.Address(

AddressSurrogateKey int

,AddressBusinessKey

,Address1 varchar(1000)

,Phone1 varchar(100)

)

CREATE TABLE Bridge.AddressName(

NameSurrogateKey int

,AddressSurrogateKey Int

)

That way you've created a Tuple between the two so part of the ETL simply looks up the new key combinations and inserts them as you must have some way to link them already.

In SQL you would do something like

SELECT

*

From

Dim.Name n

JOIN Bridge.AddressName an on n.NameSurrogateKey=an.NameSurrogateKey

JOIN Address a on n.AddressSurrogateKey=a.AddressSurrogateKey

Where

Address1='somewhere'

AND/OR Name='Someone'

what are you using for your reporting tool?

_________________________________________________________________________
SSC Guide to Posting and Best Practices

geoff5 Hall of Fame Points: 3560 More actions · Answer 5

That is not unlike how the many-to-many relationship is handled in the OLTP database from which I am pulling the data. In this approach every time a new Type 2 change row is added to Dim.Name, a corresponding row would be added to Dim.AddressName, with a query or expression to associate the existing AddressSurrogateKey with the new NameSurrogateKey. Is that right?

And the reporting tool might vary, but mostly would be Reporting Services.

geoff5 Hall of Fame Points: 3560 More actions · Answer 6

I think I finally figured out why it is acceptible to use the surrogate key for the many-to-many relationship between dimensions like my dimNames and dimAddresses dimensions. It is because the assumption is that the query or report will group the results by the business key and therefore some combination of an aggregate function (MIN or MAX most likely) and a selective function (such as CASE) will allow the address to be associated with the name regardless of which specific row in dimNames the address is attached to.

In my mind I was thinking that the address always needed to be associated with the "current row," but this clearly wouldn't always be the case with the surrogate key as the link.

However, I think the general consensus for data warehousing purposes is that as long as the address is associated with the person via any row, not necessarily the "current" one, the address should be retrievable.