December 23, 2009 at 6:14 am
Hi
I'm not very experienced in BI since now. I'm trying to transfer data from a OLTP database into a dimension/facts structure.
One of the structures to be denormalized are routes and their centers for shipments (=facts). A shipment has a route which has a shipping- and a distribution-center. Between those two centers, there can be up to nine transit-centers which may be passed.
Here a little sample of the existing OLTP data structure:
USE tempdb;
GO
---==================================================================
-- DDL OLTP
-- the center which can be passed through by a route
CREATE TABLE Centers (
CenterId INT NOT NULL IDENTITY(1,1)
PRIMARY KEY CLUSTERED
,Code VARCHAR(20) UNIQUE
);
-- the routes which are related to a shipment
CREATE TABLE Routes (
RouteId INT NOT NULL IDENTITY(1,1)
PRIMARY KEY CLUSTERED
,Code VARCHAR(30) UNIQUE
);
-- relations between routes and centers
CREATE TABLE RouteCenters (
RouteId INT NOT NULL
REFERENCES Routes (RouteId)
,Sequence TINYINT NOT NULL
,CenterId INT NOT NULL
REFERENCES Centers (CenterId)
,IsShipping BIT NOT NULL
,IsTransit BIT NOT NULL
,IsDistribution BIT NOT NULL
,PRIMARY KEY CLUSTERED (RouteId, Sequence)
);
-- shipments which have a route
CREATE TABLE Shipments (
Id INT NOT NULL IDENTITY(1,1)
PRIMARY KEY CLUSTERED
,Reference VARCHAR(30)
,RouteId INT NOT NULL
REFERENCES Routes (RouteId)
);
GO
---==================================================================
-- some sample data
INSERT INTO Centers
SELECT 'CENTER1'
UNION ALL SELECT 'CENTER2'
UNION ALL SELECT 'CENTER3'
UNION ALL SELECT 'CENTER4';
INSERT INTO Routes
SELECT 'ROUTE1';
INSERT INTO RouteCenters (RouteId, Sequence, CenterId, IsShipping, IsTransit, IsDistribution)
SELECT 1, 1, 1, 1, 0, 0
UNION ALL SELECT 1, 2, 3, 0, 1, 0
UNION ALL SELECT 1, 3, 2, 0, 1, 0
UNION ALL SELECT 1, 4, 4, 0, 0, 1
INSERT INTO Shipments
SELECT 'Shipment1', 1;
GO
-- clean up
DROP TABLE Shipments;
DROP TABLE RouteCenters;
DROP TABLE Centers;
DROP TABLE Routes;
GO
I have no clue how to denormalize correct to get best performance and best usability. Here some possibilities (and a description of the problems I see):
USE tempdb;
GO
---==================================================================
-- DDL OLAP dimensions
-- DimRoutes1 would be a 1:1 copy of source tables
-- PROBLEM: causes a 3-level snowflake structure
CREATE TABLE DimRoutes2 (
Id INT NOT NULL IDENTITY(1,1)
PRIMARY KEY CLUSTERED
,SurrogateId INT NOT NULL
UNIQUE
,RouteName VARCHAR(30)
,CenterName VARCHAR(20)
,IsShipping BIT NOT NULL
,IsTransit BIT NOT NULL
,IsDistribution BIT NOT NULL
);
-- PROBLEM: causes a many to many relation between FACT and DIM
CREATE TABLE DimRoutes3 (
Id INT NOT NULL IDENTITY(1,1)
PRIMARY KEY CLUSTERED
,SurrogateId INT NOT NULL
UNIQUE
,RouteName VARCHAR(30)
,ShippingCenterName VARCHAR(20)
,DistributionCenterName VARCHAR(20)
,TransitCenter1Name VARCHAR(20)
,TransitCenter2Name VARCHAR(20)
,TransitCenter3Name VARCHAR(20)
,TransitCenter4Name VARCHAR(20)
,TransitCenter5Name VARCHAR(20)
,TransitCenter6Name VARCHAR(20)
,TransitCenter7Name VARCHAR(20)
,TransitCenter8Name VARCHAR(20)
,TransitCenter9Name VARCHAR(20)
);
-- PROBLEM: administration horror and ugly to utilize by users
GO
-- clean up
DROP TABLE DimRoutes2;
DROP TABLE DimRoutes3;
Any suggestions for those approaches (or any other) would be helpful!
Thanks
Flo
December 23, 2009 at 7:29 am
A Data Warehouse should generally be designed to at least Boyce-Codd / 5th Normal Form. That is usually the best way to balance the requirements of flexibility, integrity, data loading and performance.
If you need to consolidate and simplify it for presentational purposes then create Data Marts to do that, or do it when you load your cube / presentation tier model.
The question to ask yourself about your original normalized design is whether you need new attributes for tables that must preserve history. Typically you'll probably want to add Valid From and To dates to some of those tables.
December 23, 2009 at 7:51 am
Hi David
Thanks for your feedback.
So, I should goon with my source data structure? Independent of the snowflake in this case? There are not too many rows in this complete structure - some thousands.
I try to setup a star-schema to load a cube.
The question to ask yourself about your original normalized design is whether you need new attributes for tables that must preserve history. Typically you'll probably want to add Valid From and To dates to some of those tables.
Versioning and history tables are available. My sample was just a simplified version of the full blown structure. However, thanks for the hint. 🙂
Greets
Flo
December 24, 2009 at 6:14 am
Hi
Before deciding to Normalize to Sixth Normal Form:-)
Regards Kees
Viewing 4 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply