Denormalization Strategies

  • Paul White (3/15/2010)


    Alvin Ramard (3/15/2010)


    (Seriously, there was no pun intended.)

    Given your track record for bad puns, Alvin, I have my doubts :laugh:

    Benefit of the doubt. 😛

    If a pun had been intended, I would have included a smiley.

    I can understand your comment. I do have a reputation for trying to add a bit of humor to many situations. 🙂



    Alvin Ramard
    Memphis PASS Chapter

    All my SSC forum answers come with a money back guarantee. If you didn't like the answer then I'll gladly refund what you paid for it.

    For best practices on asking questions, please read the following article: Forum Etiquette: How to post data/code on a forum to get the best help

  • In spite of the criticism, it was still a simple example of minimal denormalization to achieve an end result rather than a full-on explosion of wide rows to reduce the NF to 0 :pinch:

    Interestingly enough, the biggest cost to the initial query, which probably exceeded benefits of denormalization, was using a 'function' in a WHERE predicate...

    WHERE P_MS.DateReceived > getdate() - 365

    ...would have been better expressed declaring a scalar variable:

    DECLARE @selectDate = getdate()-365

    ...

    WHERE P_MS.DateReceived > @selectDate

    ...

    ...which would allow the optimizer to use an index on DateReceived.

    Unfortunately, denormalization for immutable datasets as we've used it in the past just doesn't scale, especially on large datasets...not to mention the escalating complexity (and headaches) it entails. Ironically, not even for data-warehouses that go the same route (MOLAP) vs. a Multi-dimensional ROLAP Snowflake Schema. The statistical implications are the subject of current research, though it's proving a bit of a challenge to account for all the complexities it can entail (code proliferation, data-correctness, increased complexity of refactoring to accommodate changing business rules, data-explosion, etc.). The data-tsunami is upon us :w00t:

    Dr. Les Cardwell, DCS-DSS
    Enterprise Data Architect
    Central Lincoln PUD

  • Both of these examples depict DSS-type operations. Rather than denormalize a live database, I would prefer to create a data warehouse, where denormalization is the norm. I believe you denormalize during load testing, and then only IF you have a significant performance issue. Over time, a normalized database is easier to modify than a denormalized one.

    Another alternative would be to use all natural keys, so that the UID of the parent table is carried down into the UIDs of the children, grandchildren, etc. Of course, the big disadvantage to this approach is that with many generations there would be more key columns than data columns in the lowest generation.
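
    As an illustrative sketch of the natural-key approach just described (all table and column names here are hypothetical):

    ```sql
    -- Hypothetical three-generation hierarchy keyed entirely by natural keys.
    -- Each child's primary key carries every ancestor's key column, so the
    -- lowest generation can be filtered by customer without joining upward.
    CREATE TABLE Customer (
        CustomerCode varchar(10) NOT NULL,
        CONSTRAINT PK_Customer PRIMARY KEY (CustomerCode)
    );

    CREATE TABLE CustomerOrder (
        CustomerCode varchar(10) NOT NULL,
        OrderNo      int         NOT NULL,
        CONSTRAINT PK_CustomerOrder PRIMARY KEY (CustomerCode, OrderNo),
        CONSTRAINT FK_CustomerOrder_Customer FOREIGN KEY (CustomerCode)
            REFERENCES Customer (CustomerCode)
    );

    CREATE TABLE OrderLine (
        CustomerCode varchar(10) NOT NULL,
        OrderNo      int         NOT NULL,
        LineNum      int         NOT NULL,
        Quantity     int         NOT NULL,
        CONSTRAINT PK_OrderLine PRIMARY KEY (CustomerCode, OrderNo, LineNum),
        CONSTRAINT FK_OrderLine_CustomerOrder FOREIGN KEY (CustomerCode, OrderNo)
            REFERENCES CustomerOrder (CustomerCode, OrderNo)
    );
    ```

    Note that OrderLine already has three key columns against a single data column, which is exactly the drawback described above.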

  • Les Cardwell (3/15/2010)


    Interestingly enough, the biggest cost to the initial query, which probably exceeded benefits of denormalization, was using a 'function' in a WHERE predicate...

    WHERE P_MS.DateReceived > getdate() - 365

    ...would have been better expressed declaring a scalar variable:

    DECLARE @selectDate = getdate()-365

    ...

    WHERE P_MS.DateReceived > @selectDate

    ...

    ...which would allow the optimizer to use an index on DateReceived.

    Nicely said Grasshopper 🙂

    I like the idea of denormalization, but many people look for these types of articles to re-establish their nonexistent point of designing a sloppy, good-for-nothing database. They just totally ignore the last paragraph! :rolleyes:

    It usually takes a lot of time and effort to denormalize a database. But shouldn't you FIRST NORMALIZE, then DENORMALIZE only if a benefit can be measured???? Right??? 😀

  • Les Cardwell (3/15/2010)


    In spite of the criticism, it was still a simple example of minimal denormalization to achieve an end result rather than a full-on explosion of wide rows to reduce the NF to 0 :pinch:

    Interestingly enough, the biggest cost to the initial query, which probably exceeded benefits of denormalization, was using a 'function' in a WHERE predicate...

    WHERE P_MS.DateReceived > getdate() - 365

    ...would have been better expressed declaring a scalar variable:

    DECLARE @selectDate = getdate()-365

    ...

    WHERE P_MS.DateReceived > @selectDate

    ...

    ...which would allow the optimizer to use an index on DateReceived.

    Unfortunately, denormalization for immutable datasets as we've used it in the past just doesn't scale, especially on large datasets...not to mention the escalating complexity (and headaches) it entails. Ironically, not even for data-warehouses that go the same route (MOLAP) vs. a Multi-dimensional ROLAP Snowflake Schema. The statistical implications are the subject of current research, though it's proving a bit of a challenge to account for all the complexities it can entail (code proliferation, data-correctness, increased complexity of refactoring to accommodate changing business rules, data-explosion, etc.). The data-tsunami is upon us :w00t:

    Actually, this:

    WHERE P_MS.DateReceived > getdate() - 365

    can use an index on DateReceived. The function call is on the right of the conditional and will only be calculated once.

  • Les Cardwell (3/15/2010)


    In spite of the criticism, it was still a simple example of minimal denormalization to achieve an end result rather than a full-on explosion of wide rows to reduce the NF to 0 :pinch:

    Interestingly enough, the biggest cost to the initial query, which probably exceeded benefits of denormalization, was using a 'function' in a WHERE predicate...

    WHERE P_MS.DateReceived > getdate() - 365

    ...would have been better expressed declaring a scalar variable:

    DECLARE @selectDate = getdate()-365

    ...

    WHERE P_MS.DateReceived > @selectDate

    ...

    ...which would allow the optimizer to use an index on DateReceived.

    Unfortunately, denormalization for immutable datasets as we've used it in the past just doesn't scale, especially on large datasets...not to mention the escalating complexity (and headaches) it entails. Ironically, not even for data-warehouses that go the same route (MOLAP) vs. a Multi-dimensional ROLAP Snowflake Schema. The statistical implications are the subject of current research, though it's proving a bit of a challenge to account for all the complexities it can entail (code proliferation, data-correctness, increased complexity of refactoring to accommodate changing business rules, data-explosion, etc.). The data-tsunami is upon us :w00t:

    Also, this:

    DECLARE @selectDate = getdate()-365

    won't work. In SQL Server 2008 it needs to be like this:

    DECLARE @selectDate datetime = getdate()-365

  • Paul White (3/15/2010)


    Normalize 'til it hurts...de-normalize* 'til it works!

    Agreed.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw
    Learn Extended Events

  • Alvin Ramard (3/15/2010)


    Paul White (3/15/2010)


    Jim,

    Yes. Data warehouses are a totally different kettle.

    It's normal for denormalization to be present in a data warehouse.

    (Seriously, there was no pun intended.)

    Absolutely. There should not be a lot of transactions occurring there and flatter structures can be much more beneficial.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw
    Learn Extended Events

  • Lynn Pettis (3/15/2010)


    Les Cardwell (3/15/2010)


    In spite of the criticism, it was still a simple example of minimal denormalization to achieve an end result rather than a full-on explosion of wide rows to reduce the NF to 0 :pinch:

    Interestingly enough, the biggest cost to the initial query, which probably exceeded benefits of denormalization, was using a 'function' in a WHERE predicate...

    WHERE P_MS.DateReceived > getdate() - 365

    ...would have been better expressed declaring a scalar variable:

    DECLARE @selectDate = getdate()-365

    ...

    WHERE P_MS.DateReceived > @selectDate

    ...

    ...which would allow the optimizer to use an index on DateReceived.

    Unfortunately, denormalization for immutable datasets as we've used it in the past just doesn't scale, especially on large datasets...not to mention the escalating complexity (and headaches) it entails. Ironically, not even for data-warehouses that go the same route (MOLAP) vs. a Multi-dimensional ROLAP Snowflake Schema. The statistical implications are the subject of current research, though it's proving a bit of a challenge to account for all the complexities it can entail (code proliferation, data-correctness, increased complexity of refactoring to accommodate changing business rules, data-explosion, etc.). The data-tsunami is upon us :w00t:

    Also, this:

    DECLARE @selectDate = getdate()-365

    won't work. In SQL Server 2008 it needs to be like this:

    DECLARE @selectDate datetime = getdate()-365

    For what it's worth, it doesn't work in 2005 either.

    Cannot assign a default value to a local variable.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw
    Learn Extended Events

  • CirquedeSQLeil (3/15/2010)


    Lynn Pettis (3/15/2010)


    Les Cardwell (3/15/2010)


    In spite of the criticism, it was still a simple example of minimal denormalization to achieve an end result rather than a full-on explosion of wide rows to reduce the NF to 0 :pinch:

    Interestingly enough, the biggest cost to the initial query, which probably exceeded benefits of denormalization, was using a 'function' in a WHERE predicate...

    WHERE P_MS.DateReceived > getdate() - 365

    ...would have been better expressed declaring a scalar variable:

    DECLARE @selectDate = getdate()-365

    ...

    WHERE P_MS.DateReceived > @selectDate

    ...

    ...which would allow the optimizer to use an index on DateReceived.

    Unfortunately, denormalization for immutable datasets as we've used it in the past just doesn't scale, especially on large datasets...not to mention the escalating complexity (and headaches) it entails. Ironically, not even for data-warehouses that go the same route (MOLAP) vs. a Multi-dimensional ROLAP Snowflake Schema. The statistical implications are the subject of current research, though it's proving a bit of a challenge to account for all the complexities it can entail (code proliferation, data-correctness, increased complexity of refactoring to accommodate changing business rules, data-explosion, etc.). The data-tsunami is upon us :w00t:

    Also, this:

    DECLARE @selectDate = getdate()-365

    won't work. In SQL Server 2008 it needs to be like this:

    DECLARE @selectDate datetime = getdate()-365

    For what it's worth, it doesn't work in 2005 either.

    Cannot assign a default value to a local variable.

    Nope, it doesn't. Being able to assign a value to a variable when it is declared is new to SQL Server 2008. Guess what, we upgraded our PeopleSoft systems to SQL Server 2008 EE. Now, we just need to start upgrading our other systems.

  • Also, this:

    DECLARE @selectDate = getdate()-365

    won't work. In SQL Server 2008 it needs to be like this:

    DECLARE @selectDate datetime = getdate()-365

    For what it's worth, it doesn't work in 2005 either.

    Cannot assign a default value to a local variable.

    Nope, it doesn't. Being able to assign a value to a variable when it is declared is new to SQL Server 2008. Guess what, we upgraded our PeopleSoft systems to SQL Server 2008 EE. Now, we just need to start upgrading our other systems.

    Good catch on the 'type' 🙂

    Actually, in 2005 it needs to be...

    DECLARE @selectDate DATETIME

    SET @selectDate = getdate() - 365

    ;

    I'm jumping around between SQL2000, SQL2005, SQL2008, Oracle10g, and DB2... nutz.

    Dr. Les Cardwell, DCS-DSS
    Enterprise Data Architect
    Central Lincoln PUD

  • Actually, this:

    WHERE P_MS.DateReceived > getdate() - 365

    can use an index on DateReceived. The function call is on the right of the conditional and will only be calculated once.

    Hmmm... positive? Since 'getdate()' is a non-deterministic function, we've always assigned it (like all non-deterministic functions) to a scalar variable to ensure the DBMS won't perform a table scan...although admittedly, these days the semantics seem to be more implementation-dependent.

    From SQL Server Help...

    For example, the function GETDATE() is nondeterministic. SQL Server puts restrictions on various classes of nondeterminism. Therefore, nondeterministic functions should be used carefully. The lack of strict determinism of a function can block valuable performance optimizations. Certain plan reordering steps are skipped to conservatively preserve correctness. Additionally, the number, order, and timing of calls to user-defined functions is implementation-dependent. Do not rely on these invocation semantics.

    JFWIW...

    Dr. Les Cardwell, DCS-DSS
    Enterprise Data Architect
    Central Lincoln PUD

  • Good points made. I have never found using getdate() inside a SQL query to be problematic in my execution plans. However, if it's "best practice" not to do it, then I'll probably stop. I had never thought about it before now.
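
    For anyone who wants to check this on their own system, a minimal comparison along these lines (reusing the JBMTest table from the repro posted later in this thread) will show whether the plans actually differ:

    ```sql
    -- Both predicate styles below are sargable: getdate() - 365 is evaluated
    -- once as a runtime constant, so either form can seek on the [Date] index.
    SELECT COUNT(*) FROM dbo.JBMTest WHERE [Date] > getdate() - 365;

    DECLARE @selectDate datetime;
    SET @selectDate = getdate() - 365;
    SELECT COUNT(*) FROM dbo.JBMTest WHERE [Date] > @selectDate;

    -- Contrast with a predicate that wraps the COLUMN in a function, which
    -- genuinely defeats the index and forces a scan:
    SELECT COUNT(*) FROM dbo.JBMTest WHERE dateadd(day, 365, [Date]) > getdate();
    ```

    Comparing the actual execution plans of these three queries is the quickest way to see that it is wrapping the column, not calling getdate() on the other side of the comparison, that blocks the index.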

  • Les Cardwell (3/15/2010)


    Also, this:

    DECLARE @selectDate = getdate()-365

    won't work. In SQL Server 2008 it needs to be like this:

    DECLARE @selectDate datetime = getdate()-365

    For what it's worth, it doesn't work in 2005 either.

    Cannot assign a default value to a local variable.

    Nope, it doesn't. Being able to assign a value to a variable when it is declared is new to SQL Server 2008. Guess what, we upgraded our PeopleSoft systems to SQL Server 2008 EE. Now, we just need to start upgrading our other systems.

    Good catch on the 'type' 🙂

    Actually, in 2005 it needs to be...

    DECLARE @selectDate DATETIME

    SET @selectDate = getdate() - 365

    ;

    I'm jumping around between SQL2000, SQL2005, SQL2008, Oracle10g, and DB2... nutz.

    Pretty sure.

    Table/Index defs

    USE [SandBox]

    GO

    /****** Object: Table [dbo].[JBMTest] Script Date: 03/15/2010 12:49:16 ******/

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    CREATE TABLE [dbo].[JBMTest](

    [RowNum] [int] IDENTITY(1,1) NOT NULL,

    [AccountID] [int] NOT NULL,

    [Amount] [money] NOT NULL,

    [Date] [datetime] NOT NULL

    ) ON [PRIMARY]

    GO

    /****** Object: Index [IX_JBMTest] Script Date: 03/15/2010 12:49:16 ******/

    CREATE CLUSTERED INDEX [IX_JBMTest] ON [dbo].[JBMTest]

    (

    [Date] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    GO

    /****** Object: Index [IX_JBMTest_AccountID_Date] Script Date: 03/15/2010 12:49:16 ******/

    CREATE NONCLUSTERED INDEX [IX_JBMTest_AccountID_Date] ON [dbo].[JBMTest]

    (

    [AccountID] ASC,

    [Date] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    Simple query:

    select * from dbo.JBMTest where Date > getdate() - 365

    Actual execution plan attached.

    There are 1,000,000 records in the test table.

  • Lynn Pettis (3/15/2010)


    USE [SandBox]

    GO

    /****** Object: Table [dbo].[JBMTest] Script Date: 03/15/2010 12:49:16 ******/

    Looks like a familiar setup 😉

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw
    Learn Extended Events
