Multiple Sources and Dimensions

  • I have been trying to get opinions on how to handle this issue and have gotten various opinions.

    I have multiple databases (5) that have similar information in them. They all have vendors (sony), retailers (walmar) and stores (Sears store#100). I am setting up a data warehouse (dimensions and fact tables) that will be using SSAS, SSIS and SSRS to build, maintain and report from.

    I have been researching this and have gotten good information including Vincent Rinardis book on building data warehouses with SQL Server.

    My issue is in how to best handle entities (vendors, retails, stores) that span the multiple databases and each have their own primary keys.

    In Vincents book, he talks about setting up a SourceID that tells where the data for each record came from. This seems fine for the Fact table but I am not sure how to handle this in the dimensions. If I were to use the sourceID for the Sears Retailer, I could have 5 different records in the DimRetailer table - one from each database where the sourceID would be something like: 1 for DB1, 2 for DB2 etc.

    I want to report on a retailer (Sears for example) or for a particular Vendor (i.e. Sony). Wouldn't having multiple records for the same Vendor or Retailer be an issue with any cubes I set up or reports I generate from the cubes?

    I would assume that we would want only one record per vendor in my DimVendor and one record per retailer in my DimRetailer tables. If I had multiple records for each retailer or vendor, wouldn't that screw up the cubes?

    I had thought about having cross reference tables for them with the PK as the surrogate key.

    For example:

    RetailerXref table

    1) PK (identity) - This would then be the Surrogate Key for the DimRetailer table

    2) RetailerID (from a Source and would be your TransactionalID)

    3) Source ID (1 for DB1, 2 for DB2, etc)

    VendorXref table

    1) PK (identity) - This would then be the Surrogate Key for the DimVendor table

    2) VendorID (from a Source and would be your TransactionalID)

    3) Source ID (1 for DB1, 2 for DB2, etc)

    In my ETL, I would use the RetailerXref table with my RetailerID that I get from my DB to get the Surrogate key which I would use in the Fact tables. This Surrogate key would also be the key for the DimRetailer and DimVendor tables.

    So in this case, I assume a sourceID would not be used in the Dimension table as it would always come from the XRef tables.

    I want the reports to reports by Vendor or by Retailer combined not segment the report by database.

    Is this the best way to handle this?

    Thanks,

    Tom

  • That sounds like it would work. Another possibility is to add each of the 5 keys to the dimension.

    Are you conforming the information from the different databatases that are for the same vendor, etc? For instance, if they provide different values for the same field (address, for example), are you resolving these discrepancies?

Viewing 2 posts - 1 through 1 (of 1 total)

You must be logged in to reply to this topic. Login to reply