Does denormalizing the data enhance performance?

  • We have a sales tracker report which uses a view, and that view was created from 15+ tables. I have read that denormalizing the data can be helpful when it comes to performance enhancement. I am going to share the execution plan as well so you can see what happens when I run SELECT TOP 10 from that view. I was thinking about getting that data into a table and selecting the top 10 rows to see how it performs. I also had STATISTICS TIME and IO on, and I see scans, a massive amount of logical reads, etc. Am I somewhat on the right track here, or does what I've mentioned so far not make sense?

  • I'll answer that a little differently: there can be performance value in pre-generating a report, especially if it's a report a lot of people pull up often.  Sometimes that might include denormalizing data.

  • NewBornDBA2017 - Wednesday, January 31, 2018 11:00 AM

    We have a sales tracker report which uses a view, and that view was created from 15+ tables. I have read that denormalizing the data can be helpful when it comes to performance enhancement. I am going to share the execution plan as well so you can see what happens when I run SELECT TOP 10 from that view. I was thinking about getting that data into a table and selecting the top 10 rows to see how it performs. I also had STATISTICS TIME and IO on, and I see scans, a massive amount of logical reads, etc. Am I somewhat on the right track here, or does what I've mentioned so far not make sense?

    I would look at the definition of the view and work on tuning that before considering any denormalization. 
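    For example, something along these lines (the view name is just a placeholder for your own) will give you the view definition to review plus an actual plan and I/O numbers to work from:

      -- Hypothetical view name, used for illustration only
      EXEC sp_helptext 'dbo.vSalesTracker';   -- view definition to review and tune

      SET STATISTICS IO ON;
      SET STATISTICS TIME ON;
      SET STATISTICS XML ON;                  -- returns the actual execution plan

      SELECT *                                -- test the real query shape, not TOP (10)
      FROM dbo.vSalesTracker;

      SET STATISTICS XML OFF;
      SET STATISTICS TIME OFF;
      SET STATISTICS IO OFF;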

    Sue

  • I would start by defining adequate clustered indexes on your tables. Then add any nonclustered indexes that might be needed.
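    For instance, something like this (table, column and index names are all made up) turns a heap into a clustered table and then supports the view's joins:

      -- Hypothetical names; pick keys that match how the table is actually
      -- joined and filtered by the view.
      CREATE CLUSTERED INDEX CIX_Sales_SaleID
          ON dbo.Sales (SaleID);

      -- A nonclustered index to support a common join/filter column.
      CREATE NONCLUSTERED INDEX IX_Sales_CustomerID
          ON dbo.Sales (CustomerID)
          INCLUDE (SaleDate, SaleAmount);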

    Check the following for more help:
    How to Post Performance Problems - SQLServerCentral

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • A few things:
    1. The plan you attached is the estimated plan, which will not be helpful here.
    2. TOP (10) affects the execution plan, so what you posted might not match the actual plan when you execute the full query.

    All that aside, the answer to your question is yes, it sure does... for data retrieval; that's why denormalized tables are common for reporting and data viz.

    For data modification (inserts, updates, deletes) you want to have your data normalized. For reporting, I often denormalize my data. Sometimes I'll create indexed views to pre-join or pre-aggregate my data; sometimes I'll have a job or SSIS package that populates a denormalized reporting table from normalized tables in my transactional environment.
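    As a rough sketch (every object name here is invented), an indexed view that pre-aggregates might look like this:

      -- Indexed views require SCHEMABINDING, two-part names and COUNT_BIG(*)
      -- when there is a GROUP BY. SaleAmount is assumed NOT NULL, which SUM
      -- requires in an indexed view.
      CREATE VIEW dbo.vSalesSummary
      WITH SCHEMABINDING
      AS
      SELECT s.ProductID,
             s.SaleDate,
             SUM(s.SaleAmount) AS TotalAmount,
             COUNT_BIG(*)      AS SaleCount
      FROM dbo.Sales AS s
      GROUP BY s.ProductID, s.SaleDate;
      GO

      -- The unique clustered index is what actually materializes the view.
      CREATE UNIQUE CLUSTERED INDEX IX_vSalesSummary
          ON dbo.vSalesSummary (ProductID, SaleDate);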

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • I'd agree with Luis on this.  Looking at the execution plan, everything is a table scan, which tells me there are no clustered indexes; everything is a heap, with all the problems and issues that go with that.  Just adding nonclustered indexes to the tables would only partially solve these problems, and denormalizing at this point would only complicate data maintenance.

  • Denormalising data is not a performance enhancement.
    Denormalising data is a report optimisation concept.

    IMO you are asking the wrong question. If the report is run multiple times per day on current (changing) data, then materialising the view into a table for report optimisation and response times might itself become a major stumbling block for performance on the application / database side.
    Option 1: Report is run on daily data only
    Set up an overnight job that pulls the view data into a table, and run the report from this table instead of the view (see the sketch after this post). With more sophistication, make the job run both on an overnight schedule and ad hoc on request. Test the impact on database performance and on the application.
    Option 2: Report is run multiple times per day on current data
    It depends on how much control you need over the performance impact on the application. Run a scheduled job at set times to refresh the table from the view.
    Alternatively, use a materialised view - after careful testing! The benefit is that you do not have to worry about the refresh, as the view handles that itself. However, you have no control over when the performance impact hits the database.

    And when testing and looking at the performance of a big data collection, never ever use TOP(n) - it changes the query plan unless there is a sort within the view, which in itself is a bad idea.
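    A minimal sketch of the Option 1 overnight job (object names are placeholders for your own view and reporting table):

      -- Nightly job step: rebuild the reporting table from the existing view.
      -- rpt.SalesTracker and dbo.vSalesTracker are hypothetical names.
      TRUNCATE TABLE rpt.SalesTracker;

      INSERT INTO rpt.SalesTracker WITH (TABLOCK)  -- allows minimal logging where the recovery model permits
      SELECT *
      FROM dbo.vSalesTracker;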

  • Knut Boehnert - Thursday, February 1, 2018 2:38 AM

    Denormalising data is not a performance enhancement.
    Denormalising data is a report optimisation concept.

    IMO you are asking the wrong question. If the report is run multiple times per day on current (changing) data, then materialising the view into a table for report optimisation and response times might itself become a major stumbling block for performance on the application / database side.
    Option 1: Report is run on daily data only
    Set up an overnight job that pulls the view data into a table, and run the report from this table instead of the view. With more sophistication, make the job run both on an overnight schedule and ad hoc on request. Test the impact on database performance and on the application.
    Option 2: Report is run multiple times per day on current data
    It depends on how much control you need over the performance impact on the application. Run a scheduled job at set times to refresh the table from the view.
    Alternatively, use a materialised view - after careful testing! The benefit is that you do not have to worry about the refresh, as the view handles that itself. However, you have no control over when the performance impact hits the database.

    And when testing and looking at the performance of a big data collection, never ever use TOP(n) - it changes the query plan unless there is a sort within the view, which in itself is a bad idea.

    As a BI/DW guy, I'd lean towards this approach.
    Bottom line, you need more context.
    Is it a real-time report? A low-latency (intra-day) report? Daily or longer?
    Is the volume in the underlying tables "high" (relative to your performance needs)?
    Does the report demand access to a large or small proportion of the data in the underlying tables? (Maybe look into filtered indexes - see the sketch below - or denormalizing a small dataset in a truncate-reload pattern might work.)
    Query shape and indexing are always the first place to start (most OLTP admins would tell you that); columnstore indexes might also be worth investigating.
    Denormalization, automation AND indexing are probably worth investigating when tackling "complex" query reporting with "high" volume, where you can run automation to support the latency.  I've successfully done batch processing (SSIS/SQL procedures and reporting models in a separate schema) for OLTP reporting/analytics requirements with 30-minute refresh windows, driven partly by prioritizing data management activities over the reporting activities against the data.
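    For example, a filtered index sketch (names and the date cut-off are purely hypothetical) that covers only the slice the report actually touches:

      -- Hypothetical: the report only ever reads the current year's rows.
      CREATE NONCLUSTERED INDEX IX_Sales_CurrentYear
          ON dbo.Sales (SaleDate, CustomerID)
          INCLUDE (SaleAmount)
          WHERE SaleDate >= '20180101';   -- the filter keeps the index small and cheap to maintain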

  • NewBornDBA2017 - Wednesday, January 31, 2018 11:00 AM

    We have a sales tracker report which uses a view and that view was created from 15+ tables. I have read that denormalizing the data can be helpful when it comes to performance enhancement. I am going to share to execution plan as well to see what happens when I run select top 10 from that view. I was thinking about getting that data into a table and select top 10 rows to see how its working. I also had statistics time and io on and I see scans, massive amount of logical reads etc. Am I somewhat on a right track here or whatever I mentioned until this point doesn't make sense?

    I have a pretty firm safety rule about not opening attachments, for the same reason you shouldn't open them anywhere on the Internet. As a generalization, denormalization can improve performance. But it always destroys data integrity. If the right answers don't matter to you, then you can do this. If you really don't care about the right answers, just say everything is 42 and give a nod of the head to Douglas Adams 🙂

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • jcelko212 32090 - Thursday, February 1, 2018 11:12 AM

    NewBornDBA2017 - Wednesday, January 31, 2018 11:00 AM

    We have a sales tracker report which uses a view and that view was created from 15+ tables. I have read that denormalizing the data can be helpful when it comes to performance enhancement. I am going to share to execution plan as well to see what happens when I run select top 10 from that view. I was thinking about getting that data into a table and select top 10 rows to see how its working. I also had statistics time and io on and I see scans, massive amount of logical reads etc. Am I somewhat on a right track here or whatever I mentioned until this point doesn't make sense?

    I have a pretty firm safety rule about not opening attachments, for the same reason you shouldn't open them anywhere on the Internet. As a generalization, denormalization can improve performance. But it always destroys data integrity. If the right answers don't matter to you, then you can do this. If you really don't care about the right answers, just say everything is 42 and give a nod of the head to Douglas Adams 🙂

    First, .sqlplan files are very helpful for troubleshooting performance problems. DDL is helpful too but not necessary as the question was about normalization/denormalization. 

    You can denormalize data and be confident you are getting the correct answer; we do it in the data warehouse/BI world using basic ETL methods, SSIS or even indexed views. If your source data is correct and your ETL or views are correct, then your reporting data will be too. I promise.

    Nobody should be reporting against a normalized OLTP environment. Joining a bunch of tables and performing the same aggregations over and over should be reserved for people who don't know any better.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • The answer to your question is yes, denormalization can definitely help performance.  Unfortunately, it can also lead to inconsistent data.  I'll state plainly that I have sometimes lost design battles in the interest of performance, only to have data issues come back and bite us.  Ditching transactions is another performance "enhancement" that has been pushed through (over my objections), and again we were bitten by inconsistent updates.

    A DBA's first duty is to protect the integrity of the data.  Performance is also important, but there are ways to improve it without denormalization.  I have denormalized in rare instances, but only when it was clear that there was no other way to do what was needed.

    If you're dealing with OLAP tables, they are by nature denormalized, and those rules don't apply in that case.

  • RonKyle - Friday, February 2, 2018 8:34 AM

    The answer to your question is yes, denormalization can definitely help performance.  Unfortunately, it can also lead to inconsistent data.  I'll state plainly that I have sometimes lost design battles in the interest of performance, only to have data issues come back and bite us.  Ditching transactions is another performance "enhancement" that has been pushed through (over my objections), and again we were bitten by inconsistent updates.

    A DBA's first duty is to protect the integrity of the data.  Performance is also important, but there are ways to improve it without denormalization.  I have denormalized in rare instances, but only when it was clear that there was no other way to do what was needed.

    If you're dealing with OLAP tables, they are by nature denormalized, and those rules don't apply in that case.

    I completely agree with this. The only thing I would add is that OLAP tables still need constraints, but they are not the same sort of constraints you would have on transactional data; they are of a different nature than transaction-processing constraints. You tend to look at aggregates rather than individual rows.
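    To illustrate (table names invented), the kind of check I mean compares aggregates between the normalized source and the denormalized copy rather than individual rows:

      -- Hypothetical reconciliation query: totals and row counts in the
      -- reporting copy should match the normalized source.
      SELECT CASE WHEN src.TotalAmount = rpt.TotalAmount
                   AND src.RowCnt      = rpt.RowCnt
                  THEN 'OK'
                  ELSE 'Mismatch - investigate the load'
             END AS CheckResult
      FROM (SELECT SUM(SaleAmount) AS TotalAmount, COUNT(*) AS RowCnt
            FROM dbo.Sales) AS src
      CROSS JOIN
           (SELECT SUM(SaleAmount) AS TotalAmount, COUNT(*) AS RowCnt
            FROM rpt.SalesTracker) AS rpt;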

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • RonKyle - Friday, February 2, 2018 8:34 AM

    The answer to your question is yes, denormalization can definitely help performance.  Unfortunately, it can also lead to inconsistent data.  I'll state plainly that I have sometimes lost design battles in the interest of performance, only to have data issues come back and bite us.  Ditching transactions is another performance "enhancement" that has been pushed through (over my objections), and again we were bitten by inconsistent updates.

    A DBA's first duty is to protect the integrity of the data.  Performance is also important, but there are ways to improve it without denormalization.  I have denormalized in rare instances, but only when it was clear that there was no other way to do what was needed.

    If you're dealing with OLAP tables, they are by nature denormalized, and those rules don't apply in that case.

    Absolutely agreed.  It wouldn't be a complete answer if someone didn't mention that denormalizing data can also cause huge performance problems.  Too-wide tables to accommodate reduced joins, too-long tables to accommodate the duplication of data that's frequently necessary to accomplish that same goal, and duplication of data to accommodate finer temporal resolutions than the original data was naturally capable of can all lead to major performance and concurrency issues.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Sunday, February 4, 2018 4:34 PM

    RonKyle - Friday, February 2, 2018 8:34 AM

    The answer to your question is yes, denormalization can definitely help performance.  Unfortunately, it can also lead to inconsistent data.  I'll state plainly that I have sometimes lost design battles in the interest of performance, only to have data issues come back and bite us.  Ditching transactions is another performance "enhancement" that has been pushed through (over my objections), and again we were bitten by inconsistent updates.

    A DBA's first duty is to protect the integrity of the data.  Performance is also important, but there are ways to improve it without denormalization.  I have denormalized in rare instances, but only when it was clear that there was no other way to do what was needed.

    If you're dealing with OLAP tables, they are by nature denormalized, and those rules don't apply in that case.

    Absolutely agreed.  It wouldn't be a complete answer if someone didn't mention that denormalizing data can also cause huge performance problems.  Too-wide tables to accommodate reduced joins, too-long tables to accommodate the duplication of data that's frequently necessary to accomplish that same goal, and duplication of data to accommodate finer temporal resolutions than the original data was naturally capable of can all lead to major performance and concurrency issues.

    Columnstore indexes are where it's at for denormalized reporting data where tables are wide and long.
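    A minimal sketch, assuming a denormalized reporting table with made-up names:

      -- Nonclustered columnstore index on a hypothetical wide reporting table;
      -- gives batch-mode scans and heavy compression for analytic queries.
      CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_SalesTracker
          ON rpt.SalesTracker (SaleDate, ProductID, CustomerID, SaleAmount);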
