Breaking 3NF to save a join - should I do it?

  • Guys,

    I have the following situation for my new Data Warehouse:

    A table called ENTITY

    CREATE TABLE Entity

    (

    ID int identity(1,1),

    EntityType int not null,

    NaturalKey nvarchar(100) not null

    )

    (The NaturalKey is the value we will want to report on). Once set the Entity.NaturalKey should not change and is therefore a candiate for PK on Entity.

    and I have a table called SOURCE

    CREATE TABLE Source

    (

    ID int identity(1,1),

    SourceName nvarchar(100) not null

    )

    Now the challenge is that the same Entity may have different keys in different source systems (don't ask!)

    so there is an interim table

    CREATE TABLE Source_X_Entity

    (

    ID int identity(1,1),

    SourceID int not null, -- FK to source

    EntityID int not null, -- FK to Entity

    SourceRef nvarchar(100) not null,

    -- some other fields not relevant to the discussion

    )

    The issue is that most of the time we will be searching, grouping and filtering by the Entity.NaturalKey but going through the Source_X_Entity table. Should I break 3rd Normal Form and put the NaturalKey into the Source_X_Entity table as well and risk having to do a mass update if the natural key is ever changed. It will save me a join on almost every query.

    FYI we are using SQL2012 for this project but I would be interested to know if the answer changes for SQL2008 ro SQL2005

  • Is this "extra join" causing a performance issue? If not, I would recommend that you keep the table structure you have. If this is a new system load it with data and test the performance.



    Microsoft Certified Master - SQL Server 2008
    Follow me on twitter: @keith_tate

    Forum Etiquette: How to post data/code on a forum to get the best help[/url]

  • assuming you would join Source_X_Entity to the other two tables ,

    i would think that if you create a non clustered index on Entity(NaturalKey), or maybe a non cluster on(Entity(EntityID) INCLUDE(NaturalKey) you would see a good query plan when hitting the three tables together, na dwould not need to modify the structure.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Thanks guys,

    The Entity.NaturalKey will definately be indexed in some form as it will be displayed on most granular level reports. As to performance, I am not sure because I havn't built it yet. I am a contractor and have 5 weeks to develop a generic DW which will then need to perform admirably with an unknown quantity of data. most of the time the X table will be a 1:1:1 relationship so I doubt that performance will be an issue. The question was whether to save effort when creating queries by including the remote field in the primary driving table - The major con of this would be that if you do join the two tables you would need to specify the table name in the field definitions.

    I think I will stick to the 3NF for now and say you need to join to the table if you need the field. I might create a view with the two tables already joined so save code duplication.

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply