July 1, 2011 at 10:26 am
We are evaluating the best option for a database that holds records for properties across the US. There will be something like 108 million records, with comprehensive information about each.
The first choice we're discussing is whether to split the Properties table by county or to use partitioning (my favorite choice). I'd love to have your voices to bring to the discussion with my team.
The point of conflict is having "VERY FAT" fact tables, completely de-normalized, relying only on BRIDGE tables that cross-reference the fact tables.
Maybe I am outdated, but I can't find a reason not to have a partitioned, normalized database with correct data types, which would shrink disk usage by more than 60%. The bridge table in the current proposal looks to me like a "pseudo-index", and I would trust SQL 2008 R2's own indexes to do a better job.
The database is updated daily with the changes from our sources.
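To make the partitioning idea concrete, here is a rough sketch of what I have in mind on SQL 2008 R2 (table and column names are invented for illustration, and the real boundary list would have one entry per state):

-- Sketch only: partition the one Properties table by state FIPS code
-- instead of physically splitting it into per-county tables.
CREATE PARTITION FUNCTION pfStateFips (char(2))
AS RANGE RIGHT FOR VALUES ('01', '02', '04', '05', '06');  -- abbreviated list

CREATE PARTITION SCHEME psStateFips
AS PARTITION pfStateFips ALL TO ([PRIMARY]);  -- or map to dedicated filegroups

CREATE TABLE dbo.Properties
(
    PropertyID   bigint      NOT NULL,
    StateFips    char(2)     NOT NULL,
    CountyFips   char(3)     NOT NULL,
    ParcelNumber varchar(30) NOT NULL,
    CONSTRAINT PK_Properties
        PRIMARY KEY CLUSTERED (StateFips, PropertyID)
)
ON psStateFips (StateFips);

Partition elimination would then do the per-state pruning that the per-county table split is meant to achieve, without thousands of physical tables.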
Thanks for your time and opinion!
Adriano
July 1, 2011 at 10:43 am
Normalize it. I've managed a database like that (REO properties industry), and the "never been normalized" version (you can't call it "denormalized" if it was never normalized in the first place) was huge, unwieldy, slow, a pain to update or query, and made backups and off-site DR plans nightmarish. The normalized version I designed was leaner, faster, easier to query, easy to update, and (because of partitioning and files) easy to back up, replicate, and so on.
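Purely as an illustration (all names invented), the split I mean is pulling each repeating group out of the one fat row into its own keyed table:

-- Invented names, sketch only: one fat row becomes a few narrow tables.
CREATE TABLE dbo.County
(
    CountyID   int         NOT NULL PRIMARY KEY,
    StateCode  char(2)     NOT NULL,
    CountyName varchar(50) NOT NULL
);

CREATE TABLE dbo.Property
(
    PropertyID   bigint      NOT NULL PRIMARY KEY,
    CountyID     int         NOT NULL REFERENCES dbo.County (CountyID),
    ParcelNumber varchar(30) NOT NULL,
    SitusAddress varchar(60) NOT NULL
);

CREATE TABLE dbo.PropertyValuation  -- one row per property per tax year
(
    PropertyID    bigint   NOT NULL REFERENCES dbo.Property (PropertyID),
    TaxYear       smallint NOT NULL,
    AssessedValue money    NOT NULL,
    PRIMARY KEY (PropertyID, TaxYear)
);

Each table stays narrow, the daily updates touch only the rows that actually changed, and the optimizer gets real keys to work with instead of a hand-rolled bridge.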
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
July 1, 2011 at 1:14 pm
This looks like a textbook data warehouse case, where you have a data mart centered on the 108M-row Property fact table.
There is no reason at all not to rely on partitioning.
As for data modeling, since this is a data warehouse database I would go with star schema modeling rather than 3NF normalization.
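Something along these lines, as a bare-bones sketch (all names hypothetical):

-- Hypothetical star schema sketch: one fact table, conformed dimensions.
CREATE TABLE dbo.DimCounty
(
    CountyKey  int         NOT NULL PRIMARY KEY,
    StateCode  char(2)     NOT NULL,
    CountyName varchar(50) NOT NULL
);

CREATE TABLE dbo.DimPropertyType
(
    PropertyTypeKey  int         NOT NULL PRIMARY KEY,
    PropertyTypeDesc varchar(40) NOT NULL
);

CREATE TABLE dbo.FactProperty
(
    PropertyKey     bigint NOT NULL,
    CountyKey       int    NOT NULL REFERENCES dbo.DimCounty (CountyKey),
    PropertyTypeKey int    NOT NULL REFERENCES dbo.DimPropertyType (PropertyTypeKey),
    LivingAreaSqFt  int    NULL,
    AssessedValue   money  NULL,
    CONSTRAINT PK_FactProperty PRIMARY KEY CLUSTERED (CountyKey, PropertyKey)
);

Queries then join the fact table to a couple of small dimensions instead of crawling bridge tables, and the fact table is the natural candidate for the partitioning already discussed.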
_____________________________________
Pablo (Paul) Berzukov
Author of Understanding Database Administration available at Amazon and other bookstores.
Disclaimer: Advice is provided to the best of my knowledge, but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions in a non-production test environment, the advisor should not be held liable or responsible for any actions taken based on the given advice.
July 6, 2011 at 6:46 am
Actually, converting it to a star/snowflake schema makes all kinds of sense to me, if it's not primarily OLTP in nature. That's a good idea.
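For the typical read pattern, a query like this one (written against the sketch above, so the same hypothetical names) is where the star shape pays off:

-- Aggregate across the fact table via one small dimension join.
SELECT c.StateCode,
       c.CountyName,
       COUNT(*)             AS Properties,
       AVG(f.AssessedValue) AS AvgAssessedValue
FROM dbo.FactProperty AS f
JOIN dbo.DimCounty    AS c ON c.CountyKey = f.CountyKey
GROUP BY c.StateCode, c.CountyName;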
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
July 6, 2011 at 9:25 am
Thanks for the replies!