Partition or Multiple Tables by Region - Religion or Cult?

  • We are evaluating which would be the best option for a database that have records for properties across the US. There will be something like 108 million records with comprehensive information about each.

    The first choice we're discussing is to split the Properties table by counties or use partitions (my favorite choice). I'd love to have your voice to bring to discuss with my team.

    The conflicting point is having "VERY FAT" fact tables, completely de-normalized and relying only in BRIDGE tables that cross-reference the fact tables.

    Maybe I am outdated, but I can't find a reason to not have a partitioned, normalized database, with correct datatypes which will shrink disk usage in more than 60%. The bridge table the current proposal has, looks to me like a "pseudo-index" which I would trust SQL 2008 R2 to do a better job.

    The database is updated daily with the changes from our sources.

    Thanks for your time and opinion!

    Adriano

  • Normalize it. I've managed a database like that (REO properties industry), and the "never been normalized" version (can't call it "denormalized" if it was never normalized in the first place) was huge, unwieldy, slow, a pain to update or query, and made backups and off-site DR plans nightmarish. The normalized version I designed was leaner, faster, easier to query, easy to update, and (because of partitioning and files) easy to backup, replicate, et al.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • email-511704 (7/1/2011)


    We are evaluating which would be the best option for a database that have records for properties across the US. There will be something like 108 million records with comprehensive information about each.

    The first choice we're discussing is to split the Properties table by counties or use partitions (my favorite choice). I'd love to have your voice to bring to discuss with my team.

    The conflicting point is having "VERY FAT" fact tables, completely de-normalized and relying only in BRIDGE tables that cross-reference the fact tables.

    Maybe I am outdated, but I can't find a reason to not have a partitioned, normalized database, with correct datatypes which will shrink disk usage in more than 60%. The bridge table the current proposal has, looks to me like a "pseudo-index" which I would trust SQL 2008 R2 to do a better job.

    The database is updated daily with the changes from our sources.

    This looks like a Data Warehouse text-book case where you have a Datamart centered on the 108M rows Property FACT table.

    No reason at all not to rely in partitioning.

    In regards to data modeling, being a Data Warehouse database I would go with Star Schema modeling rather than with 3NF normalization.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • Actually, converting it to a star/snowflake schema makes all kinds of sense to me, if it's not primarily OLTP in nature. That's a good idea.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Thanks for the replies!

Viewing 5 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply