Geography Dimension

  • I am currently working on a warehouse project that includes a dimension for geography. It includes Zip Code, City and State currently. Latitude and Longitude are available as well in the feed that populates the dimension. My question is should the lat/long be placed on the dimension or fact? I am leaning towards the dimension but am getting pushback as others are saying it is too narrow for the dimension (will cause too many rows on the dimension) and it should be placed on the fact.

    Anyone else have this debate before?

  • Seriously guessing this depends on what your fact is actually recording..... But assuming sales (when in reality, it's just as likely mining equipment moving round an opencut mine), then the count/volume of the Long/Lat wouldn't be any higher being in the dim than in the fact. Personaly, I'd go with the dim. To me, it's like saying we have 1MM clients, so thats a lot of phone numbers, so put them in the fact. They still belong in the dim 🙂

    Steve.

  • That is exactly my thought and funny you should bring it up. We are tracking phone calls made by customers to a business and where the call was made from.

    The consulting company we are working with wants to put the phone number and lat/long in the fact but my argument is it belongs in the dimension as it's not a measure. Should the number be called more than once, storage is less on the dimension than on the fact. Chances are though , either phone or lat/long will most likely appear only once in the data but it's still not a fact and belongs in a dimension.

  • Wondering if they also lose some reporting/analytical functionality by separating it from the rest of the Customer record? If you/they want to attribute this in *any* way then it's much better suited being in the/a dimension. Although you can definitely get exceptions, my rule of thumb on this would be that it's an attribute of the grain of the transaction - ie the phone call. I could theoretically want to build a hierarchy up over it. It's the equivalent of wanting to store the 'call reason' in the fact, which I'm guessing they're not asking for, because, it's an attribute of the call and again I may want to build a hierarchy up over that.

    Steve.

  • One option is to include a field of spatial type Geography and insert into it a polygon of the geographic area. Review the Spatial data types in BOL for more information.

  • Are you going to be capturing any further location detail in the Geography Dimension like street? Is that level of detail needed? If no, then I would not capture the Lat/Lon in this table. Based on what you have right now, the granularity of the dimension is at zip/city level. Capturing the Lat/Lon would make the dimension exponentially larger since each record would essentially have a different Lat/Lon. Assuming a customer may make the call with a cell phone, a single customer could make a call from multiple Lat/Lon even if the customer is always calling from within the same area. If the Lat/Lon is important for auditing/analysis but not used for aggregate analysis, I would just store the detail in the fact since the granularity could potentially be equivalent to the fact granularity.

    In situations where I have a detail where the granularity is very close to the granularity of the Fact, I usually just add it to the fact table. I believe details like these are referred to as degenerate dimensions. I would also add the customer number to fact as a degenerate dimension in your situation.

    If you create a dimension that is essentially 1 to 1 with the fact granularity, there is no benefit.

  • I would choose dimension because it is a attribute and not a measure.

  • I would argue that the central lat/long of a particular ZIP code region is fine enough granularity. Unless you are a Telco or a Utility, mapping the exact coordinates of each customer might be overkill when you can just map by ZIP.

    Since you already have ZIP code and possibly the suburb/county/council etc that names the ZIP code then a single lat and long record defining the centre of that ZIP region on the geography dimension will not create any more records than already exist in the dimension.

    On the other hand, I've previously used coordinates of client offices in a 'customer' dimension, and this was useful for mapping sales on a fairly zoomed in view, the map was about 50km^2 (about 20Mi^2). I didn't have a geography dimension at all, but that design decision depends on the situation.

  • In my mind, this is a question of granularity. There are only 181 latitudes (90 degrees north to 90 degrees south including 0) and 360 longitudes (plus or minus 180 degrees) and they aren't going to change. Two dimension tables could be created since they are independent of one another holding only the degree value for each one which are linked to whatever fact tables require them. Then the precise latitude and longitude can be added as fields for the fact tables (which is probably the only way it could be done for decimal geolocation values). The facts can then be rolled up by the overall latitude and longitude without involving huge tables and still preserve the precise values for a given entry. Even giving the latitude/longitude tables a granularity of one minute (60 minutes per degree), would only involve a table of 21,600 records for longitude and 10,860 for latitude. Again, if the precise location is needed, the full latitude and longitude can be included in the fact tables. It is trivial for a computer to convert decimal to degree/minute/second and back if needed.

    This is similar to the Kimball design tip 51 for handling time of day as a dimension in addition to a calendar dimension.

    ____________
    Just my $0.02 from over here in the cheap seats of the peanut gallery - please adjust for inflation and/or your local currency.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply