Do I (sometimes) have to sacrifice normalization for data integrity?

  • I often come up against the following challenge, and end up de-normalizing my table design in order to enforce data integrity. I would be interested to hear if there is a different way of modelling the data that is both normalized and enforces integrity.

    Here is a typical (simplified) example:

    create table [ProductType]

    (

    [ProductTypeId] INT identity(1,1) not null,

    [ProductTypeName] nvarchar(100) not null,

    constraint [PK_ProductType] primary key ([ProductTypeId])

    )

    create table [Product]

    (

    [ProductId] INT identity(1,1) not null,

    [ProductTypeId] int not null,

    [ProductName] nvarchar(100) not null,

    constraint [PK_Product] primary key ([ProductId]),

    constraint [AK_Product] unique ([ProductId], [ProductTypeId]),

    constraint [FK_Product_ProductType] foreign key ([ProductTypeId]) references [ProductType]([ProductTypeId])

    )

    create table [ProductTypeProperty]

    (

    [PropertyId] INT identity(1,1) not null,

    [ProductTypeId] int not null,

    [PropertyName] nvarchar(100) not null,

    constraint [PK_ProductTypeProperty] primary key ([PropertyId]),

    constraint [AK_ProductTypeProperty] unique ([PropertyId], [ProductTypeId]),

    constraint [FK_ProductTypeProperty_ProductType] foreign key ([ProductTypeId]) references [ProductType]([ProductTypeId])

    )

    create table [ProductPropertyValue]

    (

    [ProductId] INT not null,

    [PropertyId] INT not null,

    [ProductTypeId] int not null,

    [PropertyValue] nvarchar(100) not null,

    constraint [PK_ProductPropertyValue] primary key ([ProductId], [PropertyId]),

    constraint [FK_ProductPropertyValue_ProductTypeProperty] foreign key ([PropertyId], [ProductTypeId]) references [ProductTypeProperty]([PropertyId], [ProductTypeId]),

    constraint [FK_ProductPropertyValue_Product] foreign key ([ProductId], [ProductTypeId]) references [Product]([ProductId], [ProductTypeId])

    )

    SET IDENTITY_INSERT [dbo].[ProductType] ON

    INSERT [dbo].[ProductType] ([ProductTypeId], [ProductTypeName]) VALUES (1, N'Clothing')

    INSERT [dbo].[ProductType] ([ProductTypeId], [ProductTypeName]) VALUES (2, N'Drink')

    SET IDENTITY_INSERT [dbo].[ProductType] OFF

    SET IDENTITY_INSERT [dbo].[ProductTypeProperty] ON

    INSERT [dbo].[ProductTypeProperty] ([PropertyId], [ProductTypeId], [PropertyName]) VALUES (1, 1, N'Colour')

    INSERT [dbo].[ProductTypeProperty] ([PropertyId], [ProductTypeId], [PropertyName]) VALUES (2, 1, N'Size')

    INSERT [dbo].[ProductTypeProperty] ([PropertyId], [ProductTypeId], [PropertyName]) VALUES (3, 2, N'Volume')

    SET IDENTITY_INSERT [dbo].[ProductTypeProperty] OFF

    SET IDENTITY_INSERT [dbo].[Product] ON

    INSERT [dbo].[Product] ([ProductId], [ProductTypeId], [ProductName]) VALUES (1, 1, N'T-shirt')

    INSERT [dbo].[Product] ([ProductId], [ProductTypeId], [ProductName]) VALUES (2, 2, N'Milk')

    SET IDENTITY_INSERT [dbo].[Product] OFF

    INSERT [dbo].[ProductPropertyValue] ([ProductId], [PropertyId], [ProductTypeId], [PropertyValue]) VALUES (1, 1, 1, N'Red')

    INSERT [dbo].[ProductPropertyValue] ([ProductId], [PropertyId], [ProductTypeId], [PropertyValue]) VALUES (1, 2, 1, N'XL')

    INSERT [dbo].[ProductPropertyValue] ([ProductId], [PropertyId], [ProductTypeId], [PropertyValue]) VALUES (2, 3, 2, N'1 pint')

    -- NOTE THAT THE FKS ON [PRODUCTPROPERTYVALUE] MEAN YOU CANNOT RUN EITHER OF THESE

    -- WHICH TRY TO ASSIGN A PROPERTY TO A PRODUCT THAT DOESN'T BELONG TO ITS TYPE

    INSERT [dbo].[ProductPropertyValue] ([ProductId], [PropertyId], [ProductTypeId], [PropertyValue]) VALUES (1, 3, 1, N'Red')

    INSERT [dbo].[ProductPropertyValue] ([ProductId], [PropertyId], [ProductTypeId], [PropertyValue]) VALUES (2, 2, 1, N'XL')

    Here's what I am trying to model:

    1. A product has a product type (e.g. "clothing", "drink" etc)

    2. A product type has a number of properties (e.g. "clothing" has "colour" and "size")

    3. A product has property values for any/all of the properties belonging to its type

    4. A product cannot have property values for properties not belonging to its type

    Point #4 is the one that causes the challenge. In order to create foreign keys to achieve this ([FK_ProductProperty_ProductTypeProperty] and [FK_ProductPropertyValue_Product]) I have de-normalized and added "unnecessary" unique constraints:

    1. Added [ProductTypeId] to [ProductPropertyValue]

    2. Added a unique constraint on [PropertyId], [ProductTypeId] to [ProductPropertyValue]

    3. Added a unique constraint on [ProductId], [ProductTypeId] to [Product]

    However it seems impossible to achieve all 4 of the above points without these ... is it?

    This challenge seems to come up when my design departs from a simple branching "snowflake" and there needs to be "loops" in the relationships:

    P ---------> PT

    ^ ^

    | |

    | |

    PPV ------> PTP

    Note: this is a simple example with a 4 tables. With a real-world design with many levels of related tables, the issue becomes magnified (unique constraints on multiple columns, more than one "unnecessary" unique constraint per table etc).

  • If you could add some sample data to your tables above that would help.

  • Done

  • I think I'd add another table in there. Instead of having a direct link between ProductPropertyType and ProductPropertyValue, I would define the values, the properties, and then have a table that maps between them. Then, only the ProductPropertyValue that are in the mapping table for each ProductPropertyType will be available to the Product of that ProductPropertyType. That should completely satisfy the business requirement.

    Denormalization just isn't a mechanism I would normally use for data integrity. It's kind of the opposite really.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Laurence Neville (9/27/2016)


    I often come up against the following challenge, and end up de-normalizing my table design in order to enforce data integrity. I would be interested to hear if there is a different way of modelling the data that is both normalized and enforces integrity.

    I think you are probably not doing sufficient normalisation: If you don't top at 3NF but intead go on to EKNF and then to 4NF and 5NF wile avoiding the BCNF nonsense (BCNF was a mistake, but descriptions of 4NF and 5NF that incorporate it are excessively common) you may find that you are enforcing data integrity quite effectively without any denormalisation. Of course if teh denormalisation you are referring tois reverting from BCNF to EKNF, that is not denormalisation if the two forms are different (because in that case BCNF is not a normal form for your data model, and reverting to EKNF is normalisation).

    This challenge seems to come up when my design departs from a simple branching "snowflake" and there needs to be "loops" in the relationships:

    P ---------> PT

    ^ ^

    | |

    | |

    PPV ------> PTP

    Note: this is a simple example with a 4 tables. With a real-world design with many levels of related tables, the issue becomes magnified (unique constraints on multiple columns, more than one "unnecessary" unique constraint per table etc).

    Well, your diagram doesn't have a loop in it - it is acyclic - so the loop idea seems to be something of a red herring. Real loops in the network of foreign key constraints would be a problem, but if you had such a loop I am pretty sure that the problem would be that you had chosen a bizarre and illogical bunch of data that didn't represent reality. If the network is acyclic and not a tree, like the network in the diagram, cascades are a pain to handle (which is why SQL Server refuses to handle them and you have to do it yourself) but that shouldn't lead to denormalisation.

    Tom

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply