NULL dates in fact table

  • Hi

    When I'm populating my FactOrders table i have four date fields OrderDate, ConfirmedDate, DispatchDate and EventDate depending on the stage of the order I may not have values for some of these dates i.e. if an order hasn't been dispatched yet. From my reading, these nulls should be replaced with the NoMatch key in the DimDate during my ETL. Problem is i want to calculate lead times between each stage in the order but having, in my case, 19000101 as a date key for missing dates returns an inaccurate lead time. We have an outside contractor building a cube and he thinks if there is no date for the order stage then the fact date should be null?

  • I do tend to agree. Logically, if an order has been placed, but we do not yet have a fulfillment date, then it is unknown, which is what null means, the date is not 1900-01-01. I also don't see which calculations are going to be made better using a fake date rather than null. If you want an estimated date for those where we as yet have no fulfillment date DATEADD(day, <calculation>, OrderDate) WHERE <otherDate> is null would seem to work perfectly well

    I'm a DBA.
    I'm not paid to solve problems. I'm paid to prevent them.

  • I would agree yes. NULL and 0/blank are completely different in terms of data.

    A NULL value represents an COMPLETE ABSENCE of data, where as a 0, or Blank String ("") may represent there is data, but doesn't represent the absence of it.

    When using a NULL value for calculations, it will be treated differently to a blank cell. For example, Counting a column [Name] within dataset which contains 5 rows. 3 of these rows have value in it, one has a blank string (""), and another has the value NULL. Performing COUNT([NAME]) will result in a value of 4, NOT 3, as "" is still a value.

    The same would be true for your Dates. If you have a calculate the difference in days for today (18-Oct-2016) and 0 (01-Jan-1900), your answer is going to be -42659, which doesn't make any sense, and could well cause problems with your calculations. a difference in days for today and NULL is NULL. This means any aggregations, etc, will ignore the value.

    I've seen people often think that "" and NULL are the same (our old web developer used to populate every column with "" when he didn't have a value, and it drove me close to insanity), but in truth, that are actually completely different, and represent different things.

    On a side note, that doesn't mean that NULL can't also represent a value. For example, if you have a Yes/No field, a NULL value could be assumed to mean "No", depending on the question. This is far from always true, as it is completely dependant on the data being queried.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Piling on, using either 19000101 or 99991231 as a date key for missing dates is generally not a good idea and can produce very hard to unravel mess in the calculations whilst the NULL/EMPTY are ignored.

    😎

    Some of the interesting side effects of using min or max date for missing date values are the likes of time travel, negative duration and other peculiarities, something that one can normally do without.

  • Yes starting to understand that. I guess I'm miss-interpreting Kimballs explanation

    "“Null-valued measurements behave gracefully in fact tables. The aggregate functions (SUM, COUNT, MIN, MAX, and AVG) all do the “right thing” with null facts. However, nulls must be avoided in the fact table’s foreign keys because these nulls would automatically cause a referential integrity violation. Rather than a null foreign key, the associated dimension table must have a default row (and surrogate key) representing the unknown or not applicable condition.”"

  • ps_vbdev (10/18/2016)


    Rather than a null foreign key, the associated dimension table must have a default row (and surrogate key) representing the unknown or not applicable condition.”"

    Exactly, so you might have a Date Foreign key in your Fact table of "999999", however, the Date Value in your DateDimension has a value of NULL for that Key.

    This retains the data integrity between FACT and DIMENSION, but when counting your Date value, returns the expected NULL value.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Ahhhhh how stupid am i, off course? taking in too much information at the one time. that makes sense. thanks, guys 🙂

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply