First DW design - advice appreciated

  • All,

    Up to this point I've only worked with relational databases. I'm trying to teach myself data warehousing (currently with more 'non-work' time than I would like so taking the chance to add to my knowledge and my CV). Specifically the Snowflake model as it seemed more aligned to relational databases and easier to learn first?

    With this post I am not looking for someone to do my research for me. I would appreciate an opinion on whether I'm correct in my basic understanding or whether I've completely misunderstood the basics and need to start again with my research.

    My relational tables are as follows:

    CREATE TABLE [dbo].[Posts](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [AcceptedAnswerId] [int] NULL,
    [AnswerCount] [int] NULL,
    [Body] [nvarchar](max) NOT NULL,
    [ClosedDate] [datetime] NULL,
    [CommentCount] [int] NULL,
    [CommunityOwnedDate] [datetime] NULL,
    [CreationDate] [datetime] NOT NULL,
    [FavoriteCount] [int] NULL,
    [LastActivityDate] [datetime] NOT NULL,
    [LastEditDate] [datetime] NULL,
    [LastEditorDisplayName] [nvarchar](40) NULL,
    [LastEditorUserId] [int] NULL,
    [OwnerUserId] [int] NULL,
    [ParentId] [int] NULL,
    [PostTypeId] [int] NOT NULL,
    [Score] [int] NOT NULL,
    [Tags] [nvarchar](150) NULL,
    [Title] [nvarchar](250) NULL,
    [ViewCount] [int] NOT NULL
    ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
    GO

    CREATE TABLE [dbo].[PostHistory](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [PostHistoryTypeId] [int] NOT NULL,
    [PostId] [int] NOT NULL,
    [RevisionGUID] [uniqueidentifier] NULL,
    [CreationDate] [datetime] NOT NULL,
    [UserId] [int] NULL,
    [UserDisplayName] [nvarchar](40) NULL,
    [Comment] [nvarchar](max) NULL,
    [Text] [nvarchar](max) NULL,
    CONSTRAINT [PK_PostHistory__Id] PRIMARY KEY CLUSTERED
    (
    [Id] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
    GO


    CREATE TABLE [dbo].[PostTypes](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [Type] [nvarchar](50) NOT NULL,
    CONSTRAINT [PK_PostTypes__Id] PRIMARY KEY CLUSTERED
    (
    [Id] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]
    GO

    Posts - This is the main transaction table. I think this would be the 'fact' table in a Datawarehouse Snowflake design?

    PostTypes - Each post has one post type (many posts to 1 type). As a user might want to analyse the data by this table it would be a dimension in the DataWarehouse?

    PostHistory - As the name suggests this is the change table for the posts table. It's probably not a table someone would want to analyse by so it wouldn't be a Dimension table? I don't think you can have two levels of Fact table within a SnowFlake model so the data would be included in the Post Fact table in a denormalised structure?

    Apologises if the above is complete rubbish and a waste of time reading. Also apologies if it's off topic as it's related to learning rather than a business issue.

    If the above makes some sense then my intended next step is to setup a Data Warehouse and use SSIS to import the data. The idea being to learn more about SSIS, with a practical example, and create a data warehouse to run some DAX queries again. Any feedback on whether this is a good learning plan would be appreciated.

     

    • This topic was modified 4 years, 3 months ago by  as_1234. Reason: Typing error
  • Instead of thinking of it as "what is this table in a data warehouse" think of it more like "what is the business process that I want to model".  Whatever table in your OLTP system has all the line by line transactions will be what your fact table is based on, but some of the columns you have in this Posts table will be broken out into dimension tables.

    A typical DW will have some sort of date dimension, with one row per calendar day, extending back through time for the business process you're modeling and forward as well.  Other dimensions I see would be a User dimension, and some sort of Classification dimension to cover things like the PostType you mention.

     

  • Thank you for your help.

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply