How to Define a schema for the fact table, and the dimensional tables in SQL from a relational schema?

  • I am very new to Data warehouse and OLAP, and I dont know how to create OLAP database from a relational schema.

    For example, I have a schema like this:

    Student (Id, Name, Addr, Status)

    Professor (Id, Name, DeptId, Rank)

    Course (DeptId, CrsCode, CrsName, Descr)

    Transcript (StudId, CrsCode, Semester, Grade)

    Teaching (ProfId, CrsCode, Semester)

    Department (DeptId, Name, FacultyName)

    And I want to create a fact table and its dimensional table from that. I think my fact table will include:

    Fact_Table(DeptId, CrsCode, ProfessorId, Semester, enrollment)

    with 4 dimensional tables:

    Department, Course, Professor and Semester

    the attribute enrollment is the number of students in a class. But that is the design, I dont know how to implement it in SQL. is that like:

    Create table Fact_table(

    column_name1 data_type(size),

    column_name2 data_type(size),

    column_name3 data_type(size),

    ....

    );

    or I have to do in another way. please help me! Thank you.

  • First of all you have to think about what it is that you want to measure.

    Is it the number of students enrolled in a course? The number of courses a professor teaches?

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • that's right. I want to measure the number of students in class.

  • OK, but now your grain of the fact table (meaning the lowest level at which you can analyze) is at the course level. Typically you would choose the lowest level possible.

    For example, for a store selling bikes, you would want to store the sales per day, not per month. If you would store them by month, you would never be able to analyze data at the day or week level.

    In your case, you would not be able to analyze your data at the student level. Your fact table would not be able to answer your questions like these:

    * How many male students are enrolled for that course?

    * How many students of the age group between 20-25 are enrolled for that course?

    * Show me the different cities where the students come from.

    * ...

    So I would add a student dimension and a studentID in your fact table. Now you would have to aggregate data to get the number of students for a course, but that's OK. That's what we have cubes for 🙂

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • you are great. honestly, this is one of my assignment, I just have to define a schema for the fact table, and the dimensional tables in SQL with the information above and make a bulk insertion into the fact table. So, just ignore the correctness of the design. However, your suggestion is awesome.

    After designing the database, what do I have to do next?

  • Probably implement the database? Create the tables, create indexes to speed up performance, if necessary create foreign keys.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • that is what I am asking for. to create the database, can I use the normal way as relational database? and which table do I have to create first, dimensional tables or fact table?

  • phamkhanhtung1989 (7/30/2013)


    that is what I am asking for. to create the database, can I use the normal way as relational database? and which table do I have to create first, dimensional tables or fact table?

    A data warehouse in SQL Server is just the same as a regulare database, it is just modelled using the dimensional methodology.

    An OLAP database however, is created using SSAS.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Oh, thank you for helping me.

  • No problem, glad to help.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Hi, there are what I have done. can you check over for me if it's wrong?

    //Create dimensional tables

    create table [dbo].[Dim_Course](

    [CrsCode] [char](8) not null,

    [CrsName] [nvarchar](50) not null,

    [Descr] [nvarchar](200)

    Constraint [PK_Dim_Course] primary key

    (

    [CrsCode]

    )

    )

    create table [dbo].[Dim_Dept](

    [DeptId] [char](4) not null,

    [Name] [nvarchar](50) not null,

    [FacultyName] [nvarchar](50) not null

    Constraint [PK_Dim_Dept] primary key clustered

    (

    [DeptId]

    )

    )

    Create table [dbo].[Dim_Professor](

    [Id] [char](6) not null,

    [Name][nvarchar](50) not null,

    [Address] [nvarchar](50) not null,

    [Status] [bit] not null

    Constraint [PK_Dim_Professor] primary key clustered

    (

    [Id]

    )

    )

    Create table [dbo].[Dim_Semester](

    [Data_key] [int] not null,

    [Semester] [nvarchar](20) not null,

    [Year][int] not null

    Constraint [PK_Dim_Semester] primary key

    (

    [Data_key]

    )

    )

    //Create Fact_Table

    Create table [dbo].[Fact_Table](

    [DeptId] [char](4) not null,

    [CrsCode] [char](8) not null,

    [Id] [char](6) not null,

    [Data_key] [int] not null,

    [Enrollment][int] not null

    Constraint [PK_Fact_Table] primary key

    (

    [DeptId],

    [CrsCode],

    [Id],

    [Data_key]

    )

    foreign key ([DeptId]) references Dim_Dept([DeptId]),

    foreign key (CrsCode) references Dim_Course(CrsCode),

    foreign key (Id) references Dim_Professor(Id),

    foreign key (Data_key) references Dim_Semester(Data_key)

    )

    I'm a little bit confused about the address attribute in the Dim_Professor. Normally, for dimensional table, every attribute has to be organized as a hierarchy. So, it would be:

    Number > Street > City> Province (territory) > Country. However, if I organize it like that, how can I insert data into that table from a relational database in which the address attribute is just address?

  • Regarding the address: typically you would store the street and number in one field, but zipcode, town and country in other fields. Maybe in a different dimension altogether: the geography dimension.

    Regarding your dimension tables: in correct dimensional modelling, you add a surrogate key, which is a meaningless integer key. The fact table will hold the surrogate keys because they are smaller than your business keys. It also makes your joins quicker.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Is it necessary to define the address as a hierarchy?

    Also, do you mean these primary keys in those tables are redundant?

  • It is not necessary, it depends on the requirements. If you never going to analyze on location, it is not necessary.

    However, if you would like to answer questions like "which regions give us the most students for this course", I would implement the hierarchy.

    The surrogate key is not redundant, it has its purpose. It is independent from application data, so it protects your data model from changes.

    For example, what if they later on decide to change the department IDs and they are not unique no more? Your surrogate key is still unique, so you don't have any problems in your department dimension and your fact table.

    I would suggest you read the Data Warehouse Toolkit by Ralph Kimball, it explains the dimensional theory pretty well.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • so overall, is it ok with my schema?

    I also wrote a query to insert huge amount of data from the relational schema into fact table.

    insert into Fact_Table(DeptId, CrsCode, Id, SemesterId, Enrollment)

    select from Department.DeptId, Course.CrsCode, Professor.Id, Transcript.Semester,

    (Select count(*) from Transcript t where t.CrsCode=Course.CrsCode and t.Semester=Transcript.Semester)

    from Department inner join Course on Department.DeptId=Course.DeptId

    inner join Transcript on Course.CrsCode=Transcript.CrsCode

    inner join Teaching on Course.CrsCode=Teaching.CrsCode

    inner join Professor on Teaching.ProfId=Professor.Id

    group by Department.DeptId, Course.CrsCode, Professor.Id, Transcript.Semester

    could you take a quick look at it and tell me it's ok or not?

    I cannot check it as my sql server 2008 has an error with the sql server analysis services.

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply