Dates in Fact or Dimention

  • Hi

    Should my dates be in my fact tables or dimension tables?

    I have information about an application base on dates

    - application start date

    - application submitted date

    - application completed date

    - application closed date

    there is also an amount field

    currently the date fields are in the dimension table and the amount field in the fact.

    Once populated these date fields will be constant.

    would you advise this to be the best way to organise the data?

    I appreciate the question might be vague.

    Thanks in advance

  • You need one generic calendar/dates dimension which will be related to different columns in your fact table.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • thanks very much for the reply, do you mind explaining a bit please

    thansk

  • You just need 1 Time dimension. That dimension can have several hierarchies to represent natural year, fiscal year, etc. The key for that dimension is a date.

    In your fact tables, you can have different dates which are different attributes acting as foreign keys. All those columns will make reference to the same dimension

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • should the dates be on the fact table or the dimension table?

  • Maybe this simplified example from AdventureWorks can make it look simpler. The dates are both in the fact and the dimension.

    CREATE TABLE [dbo].[DimDate](

    [DateKey] [date] NOT NULL,

    [DayNumberOfWeek] [tinyint] NOT NULL,

    [DayNameOfWeek] [nvarchar](10) NOT NULL,

    [DayNumberOfMonth] [tinyint] NOT NULL,

    [DayNumberOfYear] [smallint] NOT NULL,

    [WeekNumberOfYear] [tinyint] NOT NULL,

    [MonthName] [nvarchar](10) NOT NULL,

    [MonthNumberOfYear] [tinyint] NOT NULL,

    [CalendarQuarter] [tinyint] NOT NULL,

    [CalendarYear] [smallint] NOT NULL,

    [CalendarSemester] [tinyint] NOT NULL,

    [FiscalQuarter] [tinyint] NOT NULL,

    [FiscalYear] [smallint] NOT NULL,

    [FiscalSemester] [tinyint] NOT NULL,

    CONSTRAINT [PK_DimDate_DateKey] PRIMARY KEY CLUSTERED ([DateKey] ASC),

    );

    CREATE TABLE [dbo].[FactInternetSales](

    [ProductKey] [int] NOT NULL,

    [OrderDate] [date] NULL,

    [DueDate] [date] NULL,

    [ShipDate] [date] NULL,

    [CustomerKey] [int] NOT NULL,

    [SalesTerritoryKey] [int] NOT NULL,

    [OrderQuantity] [smallint] NOT NULL,

    [UnitPrice] [money] NOT NULL,

    [TotalProductCost] [money] NOT NULL,

    [SalesAmount] [money] NOT NULL,

    [TaxAmt] [money] NOT NULL,

    );

    ALTER TABLE [dbo].[FactInternetSales] WITH CHECK ADD CONSTRAINT [FK_FactInternetSales_DimDate] FOREIGN KEY([OrderDate])

    REFERENCES [dbo].[DimDate] ([DateKey]);

    ALTER TABLE [dbo].[FactInternetSales] WITH CHECK ADD CONSTRAINT [FK_FactInternetSales_DimDate1] FOREIGN KEY([DueDate])

    REFERENCES [dbo].[DimDate] ([DateKey]);

    ALTER TABLE [dbo].[FactInternetSales] WITH CHECK ADD CONSTRAINT [FK_FactInternetSales_DimDate2] FOREIGN KEY([ShipDate])

    REFERENCES [dbo].[DimDate] ([DateKey]);

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • fantastic mate thanks

  • I would suggest that your date key be an small integer rather than a date data type. That would mean that there will be a day that is 1, and 2, and so on. It's smaller, and over millions of rows this isn't inconsequential. This will allow you to more easily manage an Unknown and a N/A member. Also, the fact table you've described looks like an accumulating snapshot, date differences can be handled with straight math.

    There are multiple opinions on this and things would certainly work the way described, or with a "smart key" such as 20160701 as an integer. Each has advantages and disadvantages but overall I find my solution works well, and have been using it for more than a decade.

  • I don't like that idea because that way you need to join tables for simple queries restricted by dates. Even if you won't have that problem once the cube is processed, the development is a bit harder.

    That's a personal opinion and wouldn't say either option is bad. They both have pros and cons.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • I don't like that idea because that way you need to join tables for simple queries restricted by dates.

    To that I always reply that no user should be looking at the information in that manner. Only a developer would have to do this, and they should have no trouble making the join. It's just a simple join.

  • RonKyle (7/1/2016)


    I don't like that idea because that way you need to join tables for simple queries restricted by dates.

    To that I always reply that no user should be looking at the information in that manner. Only a developer would have to do this, and they should have no trouble making the join. It's just a simple join.

    It's a simple join, but it's still additional work for the developer and the server.

    I agree that users shouldn't be querying the tables.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • It's a simple join, but it's still additional work for the developer and the server

    Doing the date the other way takes constant work for the servers. It's not as if they are equal alternatives but one takes a join. The one that doesn't require the join has costs that in my view outweigh this one very small advantage.

  • Luis Cazares (7/1/2016)


    RonKyle (7/1/2016)


    I don't like that idea because that way you need to join tables for simple queries restricted by dates.

    To that I always reply that no user should be looking at the information in that manner. Only a developer would have to do this, and they should have no trouble making the join. It's just a simple join.

    It's a simple join, but it's still additional work for the developer and the server.

    I agree that users shouldn't be querying the tables.

    Day dimensions are a pretty standard concept in datawarehousing, a developer working in a datawarehouse should be comfortable with the concept. They are also a very powerful tool for users who want to easily dissect data by more than just a date and at that point end up saving either the server a lot of work doing a lot of on the fly calculations or the developers a lot of work explaining to users why they have to manually enter every date range.

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply