Does denormalizing the data enhance performance?

  • We have a sales tracker report which uses a view, and that view was created from 15+ tables. I have read that denormalizing the data can be helpful when it comes to performance enhancement. I am going to share the execution plan as well so you can see what happens when I run SELECT TOP 10 from that view. I was thinking about getting that data into a table and selecting the top 10 rows to see how it performs. I also had STATISTICS TIME and IO on, and I see scans, a massive amount of logical reads, etc. Am I somewhat on the right track here, or does what I've mentioned so far not make sense?

  • I'll answer that a little differently: there can be performance value in pre-generating a report, especially if it's a report a lot of people pull up often.  Sometimes that might include denormalizing data.

  • NewBornDBA2017 - Wednesday, January 31, 2018 11:00 AM

    We have a sales tracker report which uses a view, and that view was created from 15+ tables. I have read that denormalizing the data can be helpful when it comes to performance enhancement. I am going to share the execution plan as well so you can see what happens when I run SELECT TOP 10 from that view. I was thinking about getting that data into a table and selecting the top 10 rows to see how it performs. I also had STATISTICS TIME and IO on, and I see scans, a massive amount of logical reads, etc. Am I somewhat on the right track here, or does what I've mentioned so far not make sense?

    I would look at the definition of the view and work on tuning that before considering any denormalization. 
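    For example, something along these lines (the view name is just a placeholder for your own) will give you the view definition to review plus an actual plan and I/O numbers to work from:

      -- Hypothetical view name, used for illustration only
      EXEC sp_helptext 'dbo.vSalesTracker';   -- view definition to review and tune

      SET STATISTICS IO ON;
      SET STATISTICS TIME ON;
      SET STATISTICS XML ON;                  -- returns the actual execution plan

      SELECT *                                -- test the real query shape, not TOP (10)
      FROM dbo.vSalesTracker;

      SET STATISTICS XML OFF;
      SET STATISTICS TIME OFF;
      SET STATISTICS IO OFF;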

    Sue

  • I would start by defining adequate clustered indexes on your tables. Then add any nonclustered indexes that might be needed.
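    For instance, something like this (table, column and index names are all made up) turns a heap into a clustered table and then supports the view's joins:

      -- Hypothetical names; pick keys that match how the table is actually
      -- joined and filtered by the view.
      CREATE CLUSTERED INDEX CIX_Sales_SaleID
          ON dbo.Sales (SaleID);

      -- A nonclustered index to support a common join/filter column.
      CREATE NONCLUSTERED INDEX IX_Sales_CustomerID
          ON dbo.Sales (CustomerID)
          INCLUDE (SaleDate, SaleAmount);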

    Check the following for more help:
    How to Post Performance Problems - SQLServerCentral

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • A few things:
    1. The plan you attached is the estimated plan, which will not be helpful here.
    2. TOP (10) affects the execution plan, so what you posted might not match the actual plan when you execute the full query.

    All that aside, the answer to your question is yes, it sure does... for data retrieval; that's why denormalized tables are common for reporting and data viz.

    For data modification (inserts, updates, deletes) you want to have your data normalized. For reporting, I often denormalize my data. Sometimes I'll create indexed views to pre-join or pre-aggregate my data; sometimes I'll have a job or SSIS package that populates a denormalized reporting table from normalized tables in my transactional environment.
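    As a rough sketch (every object name here is invented), an indexed view that pre-aggregates might look like this:

      -- Indexed views require SCHEMABINDING, two-part names and COUNT_BIG(*)
      -- when there is a GROUP BY. SaleAmount is assumed NOT NULL, which SUM
      -- requires in an indexed view.
      CREATE VIEW dbo.vSalesSummary
      WITH SCHEMABINDING
      AS
      SELECT s.ProductID,
             s.SaleDate,
             SUM(s.SaleAmount) AS TotalAmount,
             COUNT_BIG(*)      AS SaleCount
      FROM dbo.Sales AS s
      GROUP BY s.ProductID, s.SaleDate;
      GO

      -- The unique clustered index is what actually materializes the view.
      CREATE UNIQUE CLUSTERED INDEX IX_vSalesSummary
          ON dbo.vSalesSummary (ProductID, SaleDate);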

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • I'd agree with Luis on this.  Looking at the execution plan, everything is a table scan, which tells me there are no clustered indexes; everything is a heap, with all the problems and issues that go with that.  Just adding nonclustered indexes to the tables would only partially solve these problems, and denormalizing at this point would only complicate data maintenance.

  • Denormalising data is not a performance enhancement.
    Denormalising data is a report optimisation concept.

    IMO you are asking the wrong question. If the report is run multiple times per day on current (changing) data, then materialising the view into a table for report optimisation and response times might itself become a major stumbling block for performance on the application / database side.
    Option 1: Report is run on daily data only
    Set up an overnight job that pulls the view data into a table, and run the report from this table instead of the view (see the sketch after this post). With more sophistication, make the job run both on an overnight schedule and ad hoc on request. Test the impact on database performance and on the application.
    Option 2: Report is run multiple times per day on current data
    It depends on how much control you need over the performance impact on the application. Run a scheduled job at set times to refresh the table from the view.
    Alternatively, use a materialised view - after careful testing! The benefit is that you do not have to worry about the refresh, as the view handles that itself. However, you have no control over when the performance impact hits the database.

    And when testing and looking at the performance of a big data collection, never ever use TOP(n) - it changes the query plan unless there is a sort within the view, which in itself is a bad idea.
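    A minimal sketch of the Option 1 overnight job (object names are placeholders for your own view and reporting table):

      -- Nightly job step: rebuild the reporting table from the existing view.
      -- rpt.SalesTracker and dbo.vSalesTracker are hypothetical names.
      TRUNCATE TABLE rpt.SalesTracker;

      INSERT INTO rpt.SalesTracker WITH (TABLOCK)  -- allows minimal logging where the recovery model permits
      SELECT *
      FROM dbo.vSalesTracker;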

  • Knut Boehnert - Thursday, February 1, 2018 2:38 AM

    Denormalising data is not a performance enhancement.
    Denormalising data is a report optimisation concept.

    IMO you are asking the wrong question. If the report is run multiple times per day on current (changing) data, then materialising the view into a table for report optimisation and response times might itself become a major stumbling block for performance on the application / database side.
    Option 1: Report is run on daily data only
    Set up an overnight job that pulls the view data into a table, and run the report from this table instead of the view. With more sophistication, make the job run both on an overnight schedule and ad hoc on request. Test the impact on database performance and on the application.
    Option 2: Report is run multiple times per day on current data
    It depends on how much control you need over the performance impact on the application. Run a scheduled job at set times to refresh the table from the view.
    Alternatively, use a materialised view - after careful testing! The benefit is that you do not have to worry about the refresh, as the view handles that itself. However, you have no control over when the performance impact hits the database.

    And when testing and looking at the performance of a big data collection, never ever use TOP(n) - it changes the query plan unless there is a sort within the view, which in itself is a bad idea.

    As a BI/DW guy, I'd lean towards this approach.
    Bottom line, you need more context.
    Is it a real-time report? A low-latency (intra-day) report? Daily or longer?
    Is the volume in the underlying tables "high" (relative to your performance needs)?
    Does the report demand access to a large or small proportion of the data in the underlying tables? (Maybe look into filtered indexes - see the sketch below - or denormalizing a small dataset in a truncate-reload pattern might work.)
    Query shape and indexing are always the first place to start (most OLTP admins would tell you that); columnstore indexes might also be worth investigating.
    Denormalization, automation AND indexing are probably worth investigating when tackling "complex" query reporting with "high" volume, where you can run automation to support the latency.  I've successfully done batch processing (SSIS/SQL procedures and reporting models in a separate schema) for OLTP reporting/analytics requirements with 30-minute refresh windows, driven partly by prioritizing data management activities over the reporting activities against the data.
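    For example, a filtered index sketch (names and the date cut-off are purely hypothetical) that covers only the slice the report actually touches:

      -- Hypothetical: the report only ever reads the current year's rows.
      CREATE NONCLUSTERED INDEX IX_Sales_CurrentYear
          ON dbo.Sales (SaleDate, CustomerID)
          INCLUDE (SaleAmount)
          WHERE SaleDate >= '20180101';   -- the filter keeps the index small and cheap to maintain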

  • NewBornDBA2017 - Wednesday, January 31, 2018 11:00 AM

    We have a sales tracker report which uses a view and that view was created from 15+ tables. I have read that denormalizing the data can be helpful when it comes to performance enhancement. I am going to share to execution plan as well to see what happens when I run select top 10 from that view. I was thinking about getting that data into a table and select top 10 rows to see how its working. I also had statistics time and io on and I see scans, massive amount of logical reads etc. Am I somewhat on a right track here or whatever I mentioned until this point doesn't make sense?

    I have a pretty firm safety rule about not opening attachments, for the same reason you shouldn't open them anywhere on the Internet. As a generalization, denormalization can improve performance. But it always destroys data integrity. If the right answers don't matter to you, then you can do this. If you really don't care about the right answers, just say everything is 42 and give a nod of the head to Douglas Adams 🙂

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • jcelko212 32090 - Thursday, February 1, 2018 11:12 AM

    NewBornDBA2017 - Wednesday, January 31, 2018 11:00 AM

    We have a sales tracker report which uses a view and that view was created from 15+ tables. I have read that denormalizing the data can be helpful when it comes to performance enhancement. I am going to share to execution plan as well to see what happens when I run select top 10 from that view. I was thinking about getting that data into a table and select top 10 rows to see how its working. I also had statistics time and io on and I see scans, massive amount of logical reads etc. Am I somewhat on a right track here or whatever I mentioned until this point doesn't make sense?

    I have a pretty firm safety rule about not opening attachments, for the same reason you shouldn't open them anywhere on the Internet. As a generalization, denormalization can improve performance. But it always destroys data integrity. If the right answers don't matter to you, then you can do this. If you really don't care about the right answers, just say everything is 42 and give a nod of the head to Douglas Adams 🙂

    First, .sqlplan files are very helpful for troubleshooting performance problems. DDL is helpful too but not necessary as the question was about normalization/denormalization. 

    You can denormalize data and be confident you are getting the correct answer; we do it in the data warehouse/BI world using basic ETL methods, SSIS or even indexed views. If your source data is correct and your ETL or views are correct, then your reporting data will be too. I promise.

    Nobody should be reporting against a normalized OLTP environment. Joining a bunch of tables and performing the same aggregations over and over should be reserved for people who don't know any better.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • The answer to your question is yes, denormalization can definitely help performance.  Unfortunately, it can also lead to inconsistent data.  I'll state plainly that I have sometimes lost design battles in the interest of performance, only to have data issues come back and bite us.  Ditching transactions is another performance "enhancement" that has been pushed through (over my objections), and again we were bitten by inconsistent updates.

    A DBA's first duty is to protect the integrity of the data.  Performance is also important, but there are ways to improve it without denormalization.  I have denormalized in rare instances, but only when it was clear that there was no other way to do what was needed.

    If you're dealing with OLAP tables, they are by nature denormalized, and those rules don't apply in that case.

  • RonKyle - Friday, February 2, 2018 8:34 AM

    The answer to your question is yes, denormalization can definitely help performance.  Unfortunately, it can also lead to inconsistent data.  I'll state plainly that I have sometimes lost design battles in the interest of performance, only to have data issues come back and bite us.  Ditching transactions is another performance "enhancement" that has been pushed through (over my objections), and again we were bitten by inconsistent updates.

    A DBA's first duty is to protect the integrity of the data.  Performance is also important, but there are ways to improve it without denormalization.  I have denormalized in rare instances, but only when it was clear that there was no other way to do what was needed.

    If you're dealing with OLAP tables, they are by nature denormalized, and those rules don't apply in that case.

    I completely agree with this. The only thing I would add is that OLAP tables still need constraints, but they are not the same sort of constraints you would have on transactional data; they are of a different nature than transaction-processing constraints. You tend to look at aggregates rather than individual rows.
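    To illustrate (table names invented), the kind of check I mean compares aggregates between the normalized source and the denormalized copy rather than individual rows:

      -- Hypothetical reconciliation query: totals and row counts in the
      -- reporting copy should match the normalized source.
      SELECT CASE WHEN src.TotalAmount = rpt.TotalAmount
                   AND src.RowCnt      = rpt.RowCnt
                  THEN 'OK'
                  ELSE 'Mismatch - investigate the load'
             END AS CheckResult
      FROM (SELECT SUM(SaleAmount) AS TotalAmount, COUNT(*) AS RowCnt
            FROM dbo.Sales) AS src
      CROSS JOIN
           (SELECT SUM(SaleAmount) AS TotalAmount, COUNT(*) AS RowCnt
            FROM rpt.SalesTracker) AS rpt;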

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • RonKyle - Friday, February 2, 2018 8:34 AM

    The answer to your question is yes, denormalization can definitely help performance.  Unfortunately, it can also lead to inconsistent data.  I'll state plainly that I have sometimes lost design battles in the interest of performance, only to have data issues come back and bite us.  Ditching transactions is another performance "enhancement" that has been pushed through (over my objections), and again we were bitten by inconsistent updates.

    A DBA's first duty is to protect the integrity of the data.  Performance is also important, but there are ways to improve it without denormalization.  I have denormalized in rare instances, but only when it was clear that there was no other way to do what was needed.

    If you're dealing with OLAP tables, they are by nature denormalized, and those rules don't apply in that case.

    Absolutely agreed.  It wouldn't be a complete answer if someone didn't mention that denormalizing data can also cause huge performance problems.  Too-wide tables to accommodate reduced joins, too-long tables to accommodate the duplication of data that's frequently necessary to accomplish that same goal, and duplication of data to accommodate finer temporal resolutions than the original data was naturally capable of can all lead to major performance and concurrency issues.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Sunday, February 4, 2018 4:34 PM

    RonKyle - Friday, February 2, 2018 8:34 AM

    The answer to your question is yes, denormalization can definitely help performance.  Unfortunately, it can also lead to inconsistent data.  I'll state plainly that I have sometimes lost design battles in the interest of performance, only to have data issues come back and bite us.  Ditching transactions is another performance "enhancement" that has been pushed through (over my objections), and again we were bitten by inconsistent updates.

    A DBA's first duty is to protect the integrity of the data.  Performance is also important, but there are ways to improve it without denormalization.  I have denormalized in rare instances, but only when it was clear that there was no other way to do what was needed.

    If you're dealing with OLAP tables, they are by nature denormalized, and those rules don't apply in that case.

    Absolutely agreed.  It wouldn't be a complete answer if someone didn't mention that denormalizing data can also cause huge performance problems.  Too-wide tables to accommodate reduced joins, too-long tables to accommodate the duplication of data that's frequently necessary to accomplish that same goal, and duplication of data to accommodate finer temporal resolutions than the original data was naturally capable of can all lead to major performance and concurrency issues.

    Columnstore indexes are where it's at for denormalized reporting data where tables are wide and long.
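    A minimal sketch, assuming a denormalized reporting table with made-up names:

      -- Nonclustered columnstore index on a hypothetical wide reporting table;
      -- gives batch-mode scans and heavy compression for analytic queries.
      CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_SalesTracker
          ON rpt.SalesTracker (SaleDate, ProductID, CustomerID, SaleAmount);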
