How much database normalization is enough?


bkubicek
SSCrazy Eights

Group: General Forum Members
Points: 9725 Visits: 1092
Comments posted to this topic are about the item How much database normalization is enough?
xsevensinzx
SSC-Insane

Group: General Forum Members
Points: 23032 Visits: 6103
It depends.

I work with data that is highly denormalized. It works great because of columnstores: the simple structure allows for easy queries and suits analytics use cases. I can't easily normalize because I work in an MPP database with extremely large datasets, and the more joins there are, the worse it gets. So, oddly enough, I introduce redundancy for performance gains, especially since I only get one hashing key per table; I replicate data on different hashing keys for that performance increase.
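
A minimal sketch of the pattern described above, assuming an Azure Synapse-style dedicated SQL pool as the MPP platform (the post does not name one); the table and column names are invented for illustration:

-- Hypothetical wide, denormalized fact table, hash-distributed and stored as a
-- clustered columnstore. Dimension attributes are repeated on each row so
-- analytic queries avoid joins.
CREATE TABLE dbo.FactWebTraffic
(
    EventDate      date          NOT NULL,
    CampaignId     int           NOT NULL,
    CampaignName   nvarchar(200) NOT NULL,   -- redundant copy of dimension data
    AdvertiserName nvarchar(200) NOT NULL,   -- redundant copy of dimension data
    Impressions    bigint        NOT NULL,
    Clicks         bigint        NOT NULL,
    Spend          decimal(18,4) NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(CampaignId),          -- only one hash column per table
    CLUSTERED COLUMNSTORE INDEX
);

-- A second copy of the same data, distributed on a different key, so queries
-- that aggregate by advertiser also avoid cross-node data movement.
CREATE TABLE dbo.FactWebTraffic_ByAdvertiser
WITH
(
    DISTRIBUTION = HASH(AdvertiserName),
    CLUSTERED COLUMNSTORE INDEX
)
AS SELECT * FROM dbo.FactWebTraffic;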
Dave Poole
SSC Guru

Group: General Forum Members
Points: 64126 Visits: 4053
It depends entirely on the use case.
If you have a dimensional data mart then it's probably heavily denormalised.
Systems such as SAS analytics thrive on heavily denormalised data.
In an OLTP system you probably want it normalised because it deals with atomic objects: a product, a sale, an order, a line item, etc.
In a system that brings together disparate data sources you probably want it highly normalised to ease the data integration.
In a data lake you probably want someone with undauntable optimism.

LinkedIn Profile
www.simple-talk.com
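
A minimal sketch of the normalised OLTP shape described in the post above (product, order, line item); all table and column names are invented for illustration:

-- Each atomic object gets its own table, with keys relating them.
CREATE TABLE dbo.Product
(
    ProductId   int           IDENTITY(1,1) PRIMARY KEY,
    ProductName nvarchar(100) NOT NULL,
    UnitPrice   decimal(10,2) NOT NULL
);

CREATE TABLE dbo.SalesOrder
(
    OrderId    int       IDENTITY(1,1) PRIMARY KEY,
    CustomerId int       NOT NULL,
    OrderDate  datetime2 NOT NULL DEFAULT SYSUTCDATETIME()
);

CREATE TABLE dbo.OrderLine
(
    OrderId    int NOT NULL REFERENCES dbo.SalesOrder(OrderId),
    LineNumber int NOT NULL,
    ProductId  int NOT NULL REFERENCES dbo.Product(ProductId),
    Quantity   int NOT NULL CHECK (Quantity > 0),
    PRIMARY KEY (OrderId, LineNumber)
);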
Japie Botma
SSCrazy

Group: General Forum Members
Points: 2539 Visits: 451
I agree, Ben. Small systems usually have limited hardware resources. Joining 6 tables instead of 10 gives you quite a performance increase. Also, and I know many people will not agree, foreign keys slow a system down. Doing everything via stored procedures still gives you control over when and what happens AND THE SPEED.

5ilverFox
Consulting DBA / Developer
South Africa
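
A minimal sketch of the stored-procedure-only write path described above, reusing the hypothetical order tables from the earlier sketch; the procedure name and validation logic are invented, not anything from the post:

-- All writes go through a procedure, so the procedure (rather than a declared
-- foreign key) performs the referential check.
CREATE PROCEDURE dbo.AddOrderLine
    @OrderId   int,
    @ProductId int,
    @Quantity  int
AS
BEGIN
    SET NOCOUNT ON;

    -- Validation that a foreign key constraint would otherwise perform.
    IF NOT EXISTS (SELECT 1 FROM dbo.SalesOrder WHERE OrderId = @OrderId)
    BEGIN
        RAISERROR('Unknown order id.', 16, 1);
        RETURN;
    END;

    -- Append the next line number for this order.
    INSERT INTO dbo.OrderLine (OrderId, LineNumber, ProductId, Quantity)
    SELECT @OrderId,
           ISNULL(MAX(LineNumber), 0) + 1,
           @ProductId,
           @Quantity
    FROM dbo.OrderLine
    WHERE OrderId = @OrderId;
END;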
steveo250k
SSC-Enthusiastic

Group: General Forum Members
Points: 196 Visits: 64
Most definitely depends on use cases AND common sense. Take 1NF. It seems we can always divide something into even smaller units. A street address has a number, street name, and suffix such as RD, AVE, BLVD, etc. There can also be a compass direction. E.g., 3704 E. Main St. But, why would we ever want to break the street address up into these smaller parts? We almost never want to query on these parts (Give me a count of addresses grouped by suffix?). Even a simple string can be broken up into words, then letters, and even vowels, consonants, etc. But it would be absurd to design a database this way.
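
A small sketch contrasting the two address designs discussed above, one over-decomposed and one that treats the street address as a single atomic value; the column names are invented:

-- Over-decomposed version: every address fragment in its own column.
CREATE TABLE dbo.CustomerAddress_Split
(
    CustomerId   int         NOT NULL PRIMARY KEY,
    StreetNumber varchar(10) NOT NULL,   -- '3704'
    Direction    varchar(2)  NULL,       -- 'E'
    StreetName   varchar(60) NOT NULL,   -- 'Main'
    Suffix       varchar(10) NULL        -- 'ST', 'RD', 'AVE', 'BLVD'
);

-- Practical version: the street address is one atomic value for this system,
-- because no query ever needs the individual parts.
CREATE TABLE dbo.CustomerAddress
(
    CustomerId    int          NOT NULL PRIMARY KEY,
    StreetAddress varchar(100) NOT NULL   -- '3704 E. Main St.'
);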

Conversely, we can go too far the other way, as this post pointed out. I'm currently working for a company that recently moved off a mainframe. Everything is now in SQL Server. For the younger folks, mainframe architecture was largely file based. Step one was to move the architecture to the new technology: JCL became Windows .bat files, COBOL became machine-generated Java, and the Adabas/file system was translated, as best they could, into a relational database. But relational in name only. As these projects go, we never got to step 2. Step 2 would have been reworking the architecture into a relational model. Of course, the original mainframe/COBOL developers remained and continued maintaining the system with little knowledge of relational modeling principles. We have many tables with 100-200+ columns. It's truly genius how they got everything to work, but very inefficient. I work in BI building data warehouses and creating reports. I can often, with a few lines of SQL, recreate a legacy report that took a dozen flat files, even more JCL batch files, and 5-10 COBOL "programs". I just finished one that can be run, on demand, in less than 15 seconds. The previous version was run weekly because it took almost three and a half hours. This is not an isolated horror story; many companies are in the same boat. It is a testament to the power of a properly designed relational model.
latkinson
SSC Eights!

Group: General Forum Members
Points: 840 Visits: 128
Definitely an issue of use case, I will agree.
My brain boggles at how I would even find a data situation that requires much past 4th normal form, and even that tends to cause mental explosions. Experience levels of the DBA and Architect also affect the final solution. I've met a few folks fresh out of training hell-bent on maximum normalization when, truthfully, a completely de-normalized table is the fastest and most optimal solution. As for foreign keys: nice, handy, and very useful, and they have saved my butt several times, but sometimes, based on business logic, you can find a situation where enforcing integrity at the application level is a far better control than relying on the underlying database architecture.

This is where the DBA, the Data Architect (if available), the development team, and the stakeholders need to work together to arrive at the optimal solution.

As always, in this industry, the answer comes back to "It depends."

Luther
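
For readers wondering what 4th normal form actually addresses, here is the textbook multivalued-dependency example (employee skills and languages); it is a standard illustration, not anything taken from this thread:

-- Skills and languages are independent facts about an employee, so storing
-- them in one table creates a multivalued dependency and row-combination bloat.
CREATE TABLE dbo.EmployeeSkillLanguage_Unnormalized
(
    EmployeeId int         NOT NULL,
    Skill      varchar(40) NOT NULL,
    [Language] varchar(40) NOT NULL,
    PRIMARY KEY (EmployeeId, Skill, [Language])
);

-- 4NF decomposition: each independent multivalued fact gets its own table.
CREATE TABLE dbo.EmployeeSkill
(
    EmployeeId int         NOT NULL,
    Skill      varchar(40) NOT NULL,
    PRIMARY KEY (EmployeeId, Skill)
);

CREATE TABLE dbo.EmployeeLanguage
(
    EmployeeId int         NOT NULL,
    [Language] varchar(40) NOT NULL,
    PRIMARY KEY (EmployeeId, [Language])
);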



RonKyle
One Orange Chip

Group: General Forum Members
Points: 28324 Visits: 4568
This is a scary editorial, and the preceding comments are also scary. Normalization is not some wonderful thought experiment that is nice to have but not, strictly speaking, necessary. It applies mathematical rigor to the art and science of relational database development.

Normalization through the third normal form is sufficient for most databases, and most sections of any database. It is derelict of any database designer to suggest that normalization to this point is "optional." The normal forms above that level address specific cases that the first three normal forms do not adequately handle. The editorial's suggestion that sixth normal form is something to strive for is silly. It exists to handle the use of time in a manner that was not foreseen at the time Codd laid out the original ground rules. For all the database work that I've done, I have not had to deal with this. So when I start creating a database, I don't think I've got to normalize it through six normal forms. Three almost always suffices. And frankly the rules are common sense: if you have each fact in one place, you're probably there, or pretty close.
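
A small sketch of the "each fact in one place" rule, using hypothetical customer and order tables:

-- The customer's city is a fact about the customer, so it lives only in
-- dbo.Customer; orders reference the customer instead of repeating the city
-- on every order row, where the copies could drift apart.
CREATE TABLE dbo.Customer
(
    CustomerId   int           NOT NULL PRIMARY KEY,
    CustomerName nvarchar(100) NOT NULL,
    City         nvarchar(60)  NOT NULL
);

CREATE TABLE dbo.CustomerOrder
(
    OrderId    int  NOT NULL PRIMARY KEY,
    CustomerId int  NOT NULL REFERENCES dbo.Customer(CustomerId),
    OrderDate  date NOT NULL
    -- no City column here: storing it again would let the two copies disagree
);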

A database administrator's first duty is to protect the integrity of the data. It is not to have the most performant database. Six joins may be faster than ten, but if the design makes data errors possible, what's the point of going fast? Is it better to return bad data fast, or good data a little slower? The rules governing normalization provide the guide for carrying out this first duty. They are ignored at your peril, and the peril of all those who will have to live with the design that you created.

My primary concern in expressing the counter view is that all too often, when I've dealt with actual designers and designs, the resistance stems from an unwillingness to take on the rigor required. And unfortunately I have had to live with the results. Denormalization has its place. I have occasionally--and I stress occasionally--denormalized a portion of a design for performance reasons. But that was only after no other solution would work.

As for OLAP designs, these normalization rules don't apply. A different set of rules comes into play, such as the granularity of fact tables. Those rules are also ignored at your own peril. I got into BI because I was tasked with determining why the warehouse a consultant had created for us contained bad data. After reading Ralph Kimball's book to give me some theoretical knowledge, I determined that he had not factored in this very important aspect of design.



brian.fine 41497
Valued Member

Group: General Forum Members
Points: 55 Visits: 101
Thumbnail view of a long story...
Ten years ago I was designing reports for an 'improved' version of a proprietary app. The developers said they were using 'sixth and seventh level normalization'. After a lot of searching I found only one other system using the same approach, the Royal Mail Service in the UK.
The person I talked with had retired from there a couple of years before. He described it as the 'worst bloody system ever'.
The system I was working on had to be abandoned (2 years and $4M US) because the database could only deliver meaningful data in columns and our customers wanted data in rows.
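
A hedged sketch of why an extreme decomposition can leave data usable only "in columns": each non-key attribute lives in its own narrow table, so rebuilding an ordinary row costs a join per attribute. The parcel tables below are invented for illustration and are only loosely 6NF-style, not the actual Royal Mail design:

-- Every non-key attribute in its own table: the data is effectively stored
-- "in columns".
CREATE TABLE dbo.ParcelWeight  (ParcelId int PRIMARY KEY, WeightKg decimal(9,3) NOT NULL);
CREATE TABLE dbo.ParcelService (ParcelId int PRIMARY KEY, Service  varchar(20)  NOT NULL);
CREATE TABLE dbo.ParcelStatus  (ParcelId int PRIMARY KEY, [Status] varchar(20)  NOT NULL);

-- Rebuilding one ordinary "row" for reporting means a join per attribute;
-- with dozens of attributes the query balloons accordingly.
SELECT w.ParcelId, w.WeightKg, s.Service, st.[Status]
FROM dbo.ParcelWeight  AS w
JOIN dbo.ParcelService AS s  ON s.ParcelId  = w.ParcelId
JOIN dbo.ParcelStatus  AS st ON st.ParcelId = w.ParcelId;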
xsevensinzx
SSC-Insane

Group: General Forum Members
Points: 23032 Visits: 6103
RonKyle - Monday, July 23, 2018 7:03 AM
This is a scary editorial, and the preceding comments are also scary. Normalization is not some wonderful thought experiment that is nice to have but not, strictly speaking, necessary. It applies mathematical rigor to the art and science of relational database development.

Normalization through the third normal form is sufficient for most databases, and most sections of any database. It is derelict of any database designer to suggest that normalization to this point is "optional." The normal forms above that level address specific cases that the first three normal forms do not adequately handle.

Unfortunately, it is an option, and it's not a poor suggestion to call it optional. In cases like mine, normalization dramatically decreases read performance. If you were to come in as a database designer and tell me that you were going to normalize the 10 TB of data in our data warehouse just because that's what you have always done with traditional SMP databases, then you're going to be shown the door.

When it comes to normalization and even denormalization, it just depends...

roger.plowman
SSCrazy Eights

Group: General Forum Members
Points: 9174 Visits: 2026
I can't speak for data lakes, data warehouses, and so on, but for OLTP, 3NF has always been sufficient for me. There are *special* cases where denormalizing can be required, but it's a niche thing. Personally, I use it for denomination sub-totals and nothing else.

The database is where data integrity rules belong, period. Otherwise an application can get around them, either through ignorance or malice, and neither case is good.

Now, in a read-only environment I can see where denormalization might be helpful and application logic supplies some business rules, but absolutely not where users can change the data. Users are endlessly inventive, and the old saw "make something idiot proof and the universe will provide better idiots" is so true it isn't even funny.
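
A minimal sketch of keeping integrity rules in the database itself, so every application hits the same constraints; the table and constraint names are illustrative:

-- Integrity rules declared in the database, so no application can bypass them
-- through ignorance or malice.
CREATE TABLE dbo.Account
(
    AccountId int           NOT NULL PRIMARY KEY,
    Balance   decimal(18,2) NOT NULL
        CONSTRAINT CK_Account_Balance CHECK (Balance >= 0)   -- business rule lives here
);

CREATE TABLE dbo.AccountTransaction
(
    TransactionId int           IDENTITY(1,1) PRIMARY KEY,
    AccountId     int           NOT NULL
        CONSTRAINT FK_Txn_Account REFERENCES dbo.Account(AccountId),  -- no orphan rows
    Amount        decimal(18,2) NOT NULL
        CONSTRAINT CK_Txn_Amount CHECK (Amount <> 0),
    PostedAt      datetime2     NOT NULL DEFAULT SYSUTCDATETIME()
);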