Dimension modelling question

  • Hi

    I'm not very experienced in BI since now. I'm trying to transfer data from a OLTP database into a dimension/facts structure.

    One of the structures to be denormalized are routes and their centers for shipments (=facts). A shipment has a route which has a shipping- and a distribution-center. Between those two centers, there can be up to nine transit-centers which may be passed.

    Here a little sample of the existing OLTP data structure:

    USE tempdb;

    GO

    ---==================================================================

    -- DDL OLTP

    -- the center which can be passed through by a route

    CREATE TABLE Centers (

    CenterId INT NOT NULL IDENTITY(1,1)

    PRIMARY KEY CLUSTERED

    ,Code VARCHAR(20) UNIQUE

    );

    -- the routes which are related to a shipment

    CREATE TABLE Routes (

    RouteId INT NOT NULL IDENTITY(1,1)

    PRIMARY KEY CLUSTERED

    ,Code VARCHAR(30) UNIQUE

    );

    -- relations between routes and centers

    CREATE TABLE RouteCenters (

    RouteId INT NOT NULL

    REFERENCES Routes (RouteId)

    ,Sequence TINYINT NOT NULL

    ,CenterId INT NOT NULL

    REFERENCES Centers (CenterId)

    ,IsShipping BIT NOT NULL

    ,IsTransit BIT NOT NULL

    ,IsDistribution BIT NOT NULL

    ,PRIMARY KEY CLUSTERED (RouteId, Sequence)

    );

    -- shipments which have a route

    CREATE TABLE Shipments (

    Id INT NOT NULL IDENTITY(1,1)

    PRIMARY KEY CLUSTERED

    ,Reference VARCHAR(30)

    ,RouteId INT NOT NULL

    REFERENCES Routes (RouteId)

    );

    GO

    ---==================================================================

    -- some sample data

    INSERT INTO Centers

    SELECT 'CENTER1'

    UNION ALL SELECT 'CENTER2'

    UNION ALL SELECT 'CENTER3'

    UNION ALL SELECT 'CENTER4';

    INSERT INTO Routes

    SELECT 'ROUTE1';

    INSERT INTO RouteCenters (RouteId, Sequence, CenterId, IsShipping, IsTransit, IsDistribution)

    SELECT 1, 1, 1, 1, 0, 0

    UNION ALL SELECT 1, 2, 3, 0, 1, 0

    UNION ALL SELECT 1, 3, 2, 0, 1, 0

    UNION ALL SELECT 1, 4, 4, 0, 0, 1

    INSERT INTO Shipments

    SELECT 'Shipment1', 1;

    GO

    -- clean up

    DROP TABLE Shipments;

    DROP TABLE RouteCenters;

    DROP TABLE Centers;

    DROP TABLE Routes;

    GO

    I have no clue how to denormalize correct to get best performance and best usability. Here some possibilities (and a description of the problems I see):

    USE tempdb;

    GO

    ---==================================================================

    -- DDL OLAP dimensions

    -- DimRoutes1 would be a 1:1 copy of source tables

    -- PROBLEM: causes a 3-level snowflake structure

    CREATE TABLE DimRoutes2 (

    Id INT NOT NULL IDENTITY(1,1)

    PRIMARY KEY CLUSTERED

    ,SurrogateId INT NOT NULL

    UNIQUE

    ,RouteName VARCHAR(30)

    ,CenterName VARCHAR(20)

    ,IsShipping BIT NOT NULL

    ,IsTransit BIT NOT NULL

    ,IsDistribution BIT NOT NULL

    );

    -- PROBLEM: causes a many to many relation between FACT and DIM

    CREATE TABLE DimRoutes3 (

    Id INT NOT NULL IDENTITY(1,1)

    PRIMARY KEY CLUSTERED

    ,SurrogateId INT NOT NULL

    UNIQUE

    ,RouteName VARCHAR(30)

    ,ShippingCenterName VARCHAR(20)

    ,DistributionCenterName VARCHAR(20)

    ,TransitCenter1Name VARCHAR(20)

    ,TransitCenter2Name VARCHAR(20)

    ,TransitCenter3Name VARCHAR(20)

    ,TransitCenter4Name VARCHAR(20)

    ,TransitCenter5Name VARCHAR(20)

    ,TransitCenter6Name VARCHAR(20)

    ,TransitCenter7Name VARCHAR(20)

    ,TransitCenter8Name VARCHAR(20)

    ,TransitCenter9Name VARCHAR(20)

    );

    -- PROBLEM: administration horror and ugly to utilize by users

    GO

    -- clean up

    DROP TABLE DimRoutes2;

    DROP TABLE DimRoutes3;

    Any suggestions for those approaches (or any other) would be helpful!

    Thanks

    Flo

  • A Data Warehouse should generally be designed to at least Boyce-Codd / 5th Normal Form. That is usually the best way to balance the requirements of flexibility, integrity, data loading and performance.

    If you need to consolidate and simplify it for presentational purposes then create Data Marts to do that, or do it when you load your cube / presentation tier model.

    The question to ask yourself about your original normalized design is whether you need new attributes for tables that must preserve history. Typically you'll probably want to add Valid From and To dates to some of those tables.

  • Hi David

    Thanks for your feedback.

    So, I should goon with my source data structure? Independent of the snowflake in this case? There are not too many rows in this complete structure - some thousands.

    I try to setup a star-schema to load a cube.

    The question to ask yourself about your original normalized design is whether you need new attributes for tables that must preserve history. Typically you'll probably want to add Valid From and To dates to some of those tables.

    Versioning and history tables are available. My sample was just a simplified version of the full blown structure. However, thanks for the hint. 🙂

    Greets

    Flo

  • Hi

    Read this http://en.wikipedia.org/wiki/Data_warehouse#Normalized_versus_dimensional_approach_for_storage_of_data

    Before deciding to Normalize to Sixth Normal Form:-)

    Regards Kees

Viewing 4 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply