Normalizing-Denormalized Tables

  • I thought you had an interesting idea and it's worth considering for other people. I'm not sure how performance has changed. That would be an interesting follow-up.

    Actually it would be good to show a follow-up article later that removes the RBAR and then compares the approaches. We can link the articles together.

  • I am very interested in follow-on article about the "Insert Instead of" trigger on the view. We are just beginning a massive project to normalize our databases while creating views to keep our legacy apps running. However, its this view trigger that will be the real key to our solution. Hope to see it soon.

    Ron K.

    "Any fool can write code that a computer can understand. Good programmers write code that humans can understand." -- Martin Fowler

  • Mohit (11/9/2008)


    Hi Jeff,

    Thank-you for the comments, I didn't think RBAR would be big issue in my case because the table had limited number of rows and wasn't going to grow. But I forgot to take into account when posting article here that other people reading this article it can be an issue. I am always telling the developers to stop working on the data in SQL Server on row-per-row basis; and I am doing it here myself. Thanks again for the *swift kick*; I will do better next time.

    And your comment on Vendor, I have fought more then one battle against vendors. And lost them all, because I always get brought in at the last moment of when a system is going live; or when its been down for a while. So when working on it I find something that should be changed or altered. I get two responses, 1) it costs money to fix it and 2) we don't have time for it right now; we'll look at it later.

    - Mohit.

    Heh... Understood and appreciated... I've been there with non-compliant vendors myself. Same goes for "out sourcing"... lot's of them don't care about performance or RBAR because they can wash their hands of it once they have your money. I keep trying to invite them to one of my famous porkchop dinners, but I guess word has gotten out on that. 😛

    In SQL Server 2005, there are some tools to do this quite easily without any Cursors or While loops as you'll see in the code below. The view still has a bit of RBAR but it's not too bad... it doesn't use any REPLACE's... just one STUFF.

    The entire "package" below is self standing including the creation of the original data. The whole thing runs in TempDB as a nice safe place to "play". Take a look...

    --===== Do this in a safe place for testing

    USE TempDB

    --=======================================================================================

    -- Make sure that none of the tables used in the demo below already exist.

    -- This just ensures that everyone is on the same page for the demo.

    -- Note that ALL these demo tables are in TempDB... a safe place for this demo.

    --=======================================================================================

    IF OBJECT_ID('TempDB.dbo.Tally','U') IS NOT NULL

    DROP TABLE TempDB.dbo.Tally

    IF OBJECT_ID('TempDB.dbo.OriginalTable','U') IS NOT NULL

    DROP TABLE TempDB.dbo.OriginalTable

    IF OBJECT_ID('TempDB.dbo.ResourceSecurityGroup','U') IS NOT NULL

    DROP TABLE TempDB.dbo.ResourceSecurityGroup

    IF OBJECT_ID('TempDB.dbo.Resource','U') IS NOT NULL

    DROP TABLE TempDB.dbo.Resource

    IF OBJECT_ID('TempDB.dbo.SecurityGroup','U') IS NOT NULL

    DROP TABLE TempDB.dbo.SecurityGroup

    IF OBJECT_ID('TempDB..#Work','U') IS NOT NULL

    DROP TABLE #Work

    IF OBJECT_ID('TempDB.dbo.ApplicationResources','V') IS NOT NULL

    DROP VIEW dbo.ApplicationResources

    --=======================================================================================

    -- Create and populate a Tally table. This would normally be a permanent table and

    -- NOT need to be regenerated everytime.

    --=======================================================================================

    --===== Create and populate the Tally table on the fly

    SELECT TOP 11000 --equates to more than 30 years of dates

    IDENTITY(INT,1,1) AS N

    INTO dbo.Tally

    FROM Master.dbo.SysColumns sc1,

    Master.dbo.SysColumns sc2

    --===== Add a Primary Key to maximize performance

    ALTER TABLE dbo.Tally

    ADD CONSTRAINT PK_Tally_N

    PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100

    --===== Allow the general public to use it

    GRANT SELECT ON dbo.Tally TO PUBLIC

    --=======================================================================================

    -- Setup the original denormalized table so we can demonstrate how to normalize it

    -- using set based methods instead of Cursors and other RBAR.

    --=======================================================================================

    --===== Create a fasimile of the original table

    CREATE TABLE OriginalTable

    (

    ResourceName VARCHAR(50),

    IsEnabled CHAR(1),

    Groups VARCHAR(1024),

    CONSTRAINT PK_OriginalTable PRIMARY KEY CLUSTERED (ResourceName)

    )

    --===== Populate it with data from the article

    INSERT INTO OriginalTable

    (ResourceName, IsEnabled, Groups)

    SELECT 'Resource1','Y','Grp1|Grp2|Grp3|Grp4' UNION ALL

    SELECT 'Resource2','Y','Grp1|Grp3|Grp4' UNION ALL

    SELECT 'Resource3','Y','Grp1|Grp2|Grp4' UNION ALL

    SELECT 'Resource4','Y','Grp1|Grp2|Grp3' UNION ALL

    SELECT 'Resource5','Y','Grp2|Grp3' UNION ALL

    SELECT 'Resource6','Y','Grp1|Grp4'

    --=======================================================================================

    -- Create the new tables as was done in the article including FK constraints

    --=======================================================================================

    --===== New Resource table to contain only Resource information

    CREATE TABLE dbo.Resource

    (

    ResourceID INT IDENTITY(1,1) NOT NULL,

    ResourceName VARCHAR(50) NOT NULL,

    IsEnabled CHAR(1) NOT NULL CONSTRAINT DF_Resource_IsEnabled DEFAULT ('Y'),

    CONSTRAINT PK_Resource PRIMARY KEY CLUSTERED (ResourceID ASC)

    )

    --===== New SecurityGroup table to contain only Group information

    CREATE TABLE dbo.SecurityGroup

    (

    SecurityGroupID INT IDENTITY(1,1) NOT NULL,

    GroupName VARCHAR(255) NOT NULL,

    CONSTRAINT PK_SecurityGroup PRIMARY KEY CLUSTERED (SecurityGroupID ASC)

    )

    --===== New ResourceSecurityGroup table provides the link between Resources and Groups

    CREATE TABLE dbo.ResourceSecurityGroup

    (

    ResourceID INT NOT NULL,

    SecurityGroupID INT NOT NULL,

    CONSTRAINT PK_ResourceSecurityGroup PRIMARY KEY CLUSTERED

    (ResourceID ASC, SecurityGroupID ASC)

    )

    --===== Create the required FK's

    ALTER TABLE dbo.ResourceSecurityGroup WITH CHECK

    ADD CONSTRAINT FK_ResourceSecurityGroup_Resource_ResourceID

    FOREIGN KEY (ResourceID)

    REFERENCES dbo.Resource (ResourceID)

    ALTER TABLE dbo.ResourceSecurityGroup WITH CHECK

    ADD CONSTRAINT FK_ResourceSecurityGroup_SecurityGroup_SecurityGroupID

    FOREIGN KEY (SecurityGroupID)

    REFERENCES dbo.SecurityGroup (SecurityGroupID)

    --=======================================================================================

    -- Populate the new tables without the use of RBAR

    --=======================================================================================

    --===== Split the groups and assign ResourceID's and SecurityGroupID's for all rows.

    -- Data is stored in a temp table and all relationships are established by this

    -- single non-RBAR step. Even the cross-table links are established for population

    -- of the ResourceSecurityGroup table without any RBAR.

    ;WITH

    cteSplitData AS

    (

    SELECT ResourceName,

    IsEnabled,

    SUBSTRING('|'+orig.Groups, t.N+1, CHARINDEX('|', orig.Groups+'|', t.N)-t.N) AS GroupName

    FROM dbo.Tally t

    RIGHT OUTER JOIN --Necessary in case orig.Groups is NULL

    dbo.OriginalTable orig

    ON SUBSTRING('|'+orig.Groups, t.N, 1) = '|'

    AND t.N < LEN('|'+orig.Groups)

    )

    SELECT DENSE_RANK() OVER (ORDER BY ResourceName) AS ResourceID,

    ResourceName,

    IsEnabled,

    DENSE_RANK() OVER (ORDER BY GroupName) AS SecurityGroupID,

    GroupName

    INTO #Work

    FROM cteSplitData

    --===== Populate the Resource table with normalized data

    SET IDENTITY_INSERT dbo.Resource ON

    INSERT INTO dbo.Resource

    (ResourceID, ResourceName, IsEnabled)

    SELECT DISTINCT

    ResourceID, ResourceName, IsEnabled

    FROM #Work

    ORDER BY ResourceID

    SET IDENTITY_INSERT dbo.Resource OFF

    --===== Populate the SecurityGroup table with normalized data

    SET IDENTITY_INSERT dbo.SecurityGroup ON

    INSERT INTO dbo.SecurityGroup

    (SecurityGroupID, GroupName)

    SELECT DISTINCT

    SecurityGroupID, GroupName

    FROM #Work

    ORDER BY SecurityGroupID

    SET IDENTITY_INSERT dbo.SecurityGroup OFF

    --===== Populate the ResourceSecurityGroup link table with normalized data

    INSERT INTO dbo.ResourceSecurityGroup

    (ResourceID, SecurityGroupID)

    SELECT DISTINCT

    ResourceID, SecurityGroupID

    FROM #Work

    ORDER BY ResourceID, SecurityGroupID

    --=======================================================================================

    -- As a sanity check, display the contents of the work table and the 3 new tables

    --=======================================================================================

    SELECT * FROM #Work ORDER BY ResourceID, SecurityGroupID

    SELECT * FROM dbo.Resource ORDER BY ResourceID

    SELECT * FROM dbo.SecurityGroup ORDER BY SecurityGroupID

    SELECT * FROM dbo.ResourceSecurityGroup ORDER BY ResourceID, SecurityGroupID

    GO

    --=======================================================================================

    -- Create the view that puts this all back together for the existing app.

    --=======================================================================================

    CREATE VIEW dbo.ApplicationResources

    AS

    SELECT r.ResourceName,

    r.IsEnabled,

    STUFF((SELECT '|'+ sg.GroupName

    FROM dbo.ResourceSecurityGroup rsg

    INNER JOIN SecurityGroup sg

    ON rsg.SecurityGroupID = sg.SecurityGroupID

    AND rsg.ResourceID = r.ResourceID

    ORDER BY sg.GroupName

    FOR XML PATH('')),1,1,'') AS 'Groups'

    FROM dbo.Resource r

    GO

    --===== Sanity check the view

    SELECT * FROM dbo.ApplicationResources

    Don't forget... if you want to copy the code as formatted above, click and hold on the line just above the code box and drag to the line just below the code box. Then, copy and paste and the code should come in nicely formatted.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • WayneS (11/10/2008)


    What I would recommend doing for it though is to instead of using all of the replace statements, is to:

    Yes. The REPLACEs were used because the case where I lifted the code from actually needed to put a prefix on as well as generating the separator, i.e. replace(replace(replace(...,'][',','),'[],'Entries are: '),']',''). You can still dispence with the UDF...ALTER View [dbo].[ApplicationResources]

    AS

    SELECT R.[ResourceName]

    ,R.[IsEnabled]

    ,substring((SELECT '|'+sg.GroupName

    FROM ResourceSecurityGroup rsg

    JOIN SecurityGroup sg ON rsg.GroupID=sg.GroupID

    WHERE rsg.ResourceID=R.ResourceID

    ORDER BY GroupName

    FOR XML PATH('')),2,1000) as 'Groups'

    FROM dbo.Resource R

    go

    Derek

  • Hi Jeef,

    I just learned a function ... STUFF ... which solves the problem I was aware of relating to value chosen for the 3rd argument to SUBSTRING ... There isn't one! 🙂

    Also, is merging the subquery WHERE clause into an AND on the JOIN merely personal preference or is it more efficient?

    Derek

  • Jeff, Thanks for doing all that work and also for the copy tip. Both came in helpful for me!

    - JJ

  • Jeff Moden (11/10/2008)


    Heh... Understood and appreciated... I've been there with non-compliant vendors myself. Same goes for "out sourcing"... lot's of them don't care about performance or RBAR because they can wash their hands of it once they have your money. I keep trying to invite them to one of my famous porkchop dinners, but I guess word has gotten out on that. 😛

    In SQL Server 2005, there are some tools to do this quite easily without any Cursors or While loops as you'll see in the code below. The view still has a bit of RBAR but it's not too bad... it doesn't use any REPLACE's... just one STUFF.

    The entire "package" below is self standing including the creation of the original data. The whole thing runs in TempDB as a nice safe place to "play".

    ....

    Thanks Jeff. I will play with this later today; if I may can I reference this work in the followup article? I plan to use this and what I did to also show the performance differences. Thanks.

    [font="Arial"]---

    Mohit K. Gupta, MCITP: Database Administrator (2005), My Blog, Twitter: @SQLCAN[/url].
    Microsoft FTE - SQL Server PFE

    * Some time its the search that counts, not the finding...
    * I didn't think so, but if I was wrong, I was wrong. I'd rather do something, and make a mistake than be frightened and be doing nothing. :smooooth:[/font]

    How to ask for help .. Read Best Practices here[/url].

  • I also have had to deal with bad design from 3rd party vendor. I made changes to supplied stored procedures to improve performanace / remove bugs.

    But there was one problem - every-time there was an upgrade the changes were lost.

    I tried to get them to look at their own code - but was unsuccessful.

  • Regardless of the practicality of the particular example used in this article, it is a good introduction to many of the considerations involved in changing a db design and maintaining backward-compatibility.

    One question I had was whether or not the GroupName and ResourceName columns should be allowed to contain duplicates within their respective tables. If not, it would be nice to see the uniqueness constraints declared for them.

    TroyK

  • This is a little off the immediate article and a little theoretical, but it applies to the issue of RBAR in the aggregating function.

    Why doesn't SQL provide a simple text aggregation function that works with text in a GROUP BY the same way SUM accumulates numbers.

    A function TextAccum(var,sep) where var is a text column and sep is a seperator to be inserted before all but the first value would eliminate the need for the CombineGroupNames function. If it overflows varchar(max), treat it the same as a numeric overflow.

    The whole view then becomes

    SELECT R.[ResourceName]

    ,R.[IsEnabled]

    ,TextAccum(R.ResourceID,'|') AS Groups

    FROM Resource R

    INNER JOIN ResourceSecurityGroup RG

    ON R.ResourceID = RG.ResourceID

    INNER JOIN SecurityGroup G

    ON G.GroupID = RG.GroupID

    WHERE R.ResourceID = @ResourceID

    GROUP BY R.[ResourceName]

    ,R.[IsEnabled]

    No need for function definitions, or RBAR discussions.

    Why write a single use function each time you need to accumulate text.

    I know I'm not a sql guru, but this seems so obvious a need, I guess I'm looking for someone to tell me what I'm missing ???

  • Heh you are not the only one in that thought I believe :).

    I just found this on Microsoft Connect:

    https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=125819&wa=wsignin1.0

    But I think because something similar is possible to do by means of XPath as listed in the article above; or by the examples given by many other reviewers (thanks!!!).

    - Mohit.

    [font="Arial"]---

    Mohit K. Gupta, MCITP: Database Administrator (2005), My Blog, Twitter: @SQLCAN[/url].
    Microsoft FTE - SQL Server PFE

    * Some time its the search that counts, not the finding...
    * I didn't think so, but if I was wrong, I was wrong. I'd rather do something, and make a mistake than be frightened and be doing nothing. :smooooth:[/font]

    How to ask for help .. Read Best Practices here[/url].

  • cs_troyk (11/11/2008)


    Regardless of the practicality of the particular example used in this article, it is a good introduction to many of the considerations involved in changing a db design and maintaining backward-compatibility.

    One question I had was whether or not the GroupName and ResourceName columns should be allowed to contain duplicates within their respective tables. If not, it would be nice to see the uniqueness constraints declared for them.

    TroyK

    Thank-you,

    I'll make sure to add that in the follow up article :).

    - Mohit.

    [font="Arial"]---

    Mohit K. Gupta, MCITP: Database Administrator (2005), My Blog, Twitter: @SQLCAN[/url].
    Microsoft FTE - SQL Server PFE

    * Some time its the search that counts, not the finding...
    * I didn't think so, but if I was wrong, I was wrong. I'd rather do something, and make a mistake than be frightened and be doing nothing. :smooooth:[/font]

    How to ask for help .. Read Best Practices here[/url].

  • Derek Dongray (11/10/2008)


    Hi Jeef,

    I just learned a function ... STUFF ... which solves the problem I was aware of relating to value chosen for the 3rd argument to SUBSTRING ... There isn't one! 🙂

    Also, is merging the subquery WHERE clause into an AND on the JOIN merely personal preference or is it more efficient?

    Some say so... to be honest, I haven't tested it. I normally don't mix the two methods of joining.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • JJ B (11/10/2008)


    Jeff, Thanks for doing all that work and also for the copy tip. Both came in helpful for me!

    - JJ

    Glad it helped. Thanks for the feedback, JJ.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ed Klein (11/11/2008)


    This is a little off the immediate article and a little theoretical, but it applies to the issue of RBAR in the aggregating function.

    Why doesn't SQL provide a simple text aggregation function that works with text in a GROUP BY the same way SUM accumulates numbers.

    A function TextAccum(var,sep) where var is a text column and sep is a seperator to be inserted before all but the first value would eliminate the need for the CombineGroupNames function. If it overflows varchar(max), treat it the same as a numeric overflow.

    The whole view then becomes

    SELECT R.[ResourceName]

    ,R.[IsEnabled]

    ,TextAccum(R.ResourceID,'|') AS Groups

    FROM Resource R

    INNER JOIN ResourceSecurityGroup RG

    ON R.ResourceID = RG.ResourceID

    INNER JOIN SecurityGroup G

    ON G.GroupID = RG.GroupID

    WHERE R.ResourceID = @ResourceID

    GROUP BY R.[ResourceName]

    ,R.[IsEnabled]

    No need for function definitions, or RBAR discussions.

    Why write a single use function each time you need to accumulate text.

    I know I'm not a sql guru, but this seems so obvious a need, I guess I'm looking for someone to tell me what I'm missing ???

    Can we see the code for the function, Ed, or are you just saying that it would be nice to have one?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 16 through 30 (of 62 total)

You must be logged in to reply to this topic. Login to reply