SQLServerCentral Article

Using Super Keys to Enforce Database Constraints

,

It is common knowledge ( and rightly so!) that normalizing a database, then de-normalizing it is bad thing to do. We would be sacrificing data integrity for the sake of questionable efficiency gains in some queries. I want to show that de-normalization does not need to decrease level of data integrity. On the contrary, it may help increase data integrity.

In this text, I will demonstrate how de-normalization can be used to enforce constraints that we normally would use procedural code for. There could be even some efficiency gain, as a by-product of this integrity increasing de-normalization. However, the efficiency of query execution is not the goal, the goal is to enforce certain constraints in pure declarative way.

A Scenario

Sport teams from all over the world are coming to a world championship tournament. Naturally, the teams speak many languages, so the host wants to provide guides for the teams. We need a database to take care of guide assignments.

On any given day, each guide can be assigned to no more than one team. Each team may be with no more than one guide on any given day.

The (often unspoken) rule to enforce: teams and guides must speak the same language. This article will deal with all the rules, including the unspoken one.

We can start with this code. A couple tables and data:

IF Object_ID ('Guides') IS NOT NULL
DROP TABLE Guides
;
GO
CREATE TABLE Guides
(
  Guide varchar (15) NOT NULL UNIQUE
, LangSpoken varchar (3) NOT NULL
)
;
GO
INSERT INTO Guides ( Guide,         LangSpoken)
VALUES             (    'Julles'   , 'FRE')
,                  (    'William'  , 'ENG')
,                  (    'Regine'   , 'FRE')
,                  (    'Donald'   , 'ENG')
,                  (    'Sergey'   , 'RUS')
,                  (    'Heinz'    , 'GER')
;
GO
SELECT *
FROM Guides
ORDER BY LangSpoken
;
/*--
Guide           LangSpoken
--------------- ----------
William         ENG
Donald          ENG
Regine          FRE
Julles          FRE
Heinz           GER
Sergey          RUS
(6 row(s) affected)
--*/IF Object_ID ('Teams') IS NOT NULL
DROP TABLE Teams
;
GO
CREATE TABLE Teams
(
  Team varchar (15) NOT NULL UNIQUE
, LangSpoken varchar (3) NOT NULL
)
;
GO
INSERT INTO Teams  ( Team, LangSpoken )
VALUES             (    'France'   , 'FRE'    )
,                  (    'England'  , 'ENG'    )
,                  (    'Morocco'  , 'FRE'    )
,                  (    'Germany'  , 'GER'    )
,                  (    'Russia'   , 'RUS'    )
,                  (    'USA'      , 'ENG'    )
;
SELECT *
FROM Teams
ORDER BY LangSpoken
;
/*--
Team            LangSpoken
--------------- ----------
England         ENG
USA             ENG
France          FRE
Morocco         FRE
Germany         GER
Russia          RUS
(6 row(s) affected)
--*/

Both tables are in the highest possible normal form. Both have two attributes, one of which is the unique key. Another attribute, LangSpoken, is common for both tables and is not part of the unique key. NULLS are not allowed, so even mathematical purists wold agree that we have achieved 5th NF, even 6th, according to definitions in “Database Design and Relational Theory, Normal Forms & All That Jazz” by C.J. Date.

We will create another table, TeamGuides, to connect Guides with Teams on given tournament day:

IF Object_ID('TeamGuides') IS NOT NULL
DROP TABLE TeamGuides
;
GO
CREATE TABLE TeamGuides
(
  Guide varchar(15) NOT NULL
, Team varchar(15)
, TourDay int NOT NULL     -- Tournament days labeled as integers
, CONSTRAINT TeamGuides_UNQ_TeamDay UNIQUE (Team, TourDay)
, CONSTRAINT TeamGuides_UNQ_GuideDay UNIQUE (Guide, TourDay)
, CONSTRAINT TeamGuides_FK_Teams
     FOREIGN KEY (Team) REFERENCES Teams (Team)
, CONSTRAINT TeamGuides_FK_Guides
     FOREIGN KEY (Guide) REFERENCES Guides (Guide)
)
;
GO

Again we have the highest level of normalization, 6th NF. The table has a composite Primary Key (PK), (Guide, Team), and just one non-key attribute, hence 6th NF.

The two UNIQUE constraints take care of the requirement that each team may participate in any tournament day no more than once, and each guide can be assigned to a team no more than once each tournament day. We are done with the stated requests. None of the UNIQUE constraints seems more important than the other, so none of them is selected as the Primary Key.

What, a table without a Primary Key? Why not? a PK is just one of all possible keys, this key being a unique set of attributes. Anyways, we took care of the stated requirements. The often unstated constraint, “Guides and Teams must speak the same language”, is a whole different game.

Before we proceed with the unspoken requirement, we will add some sample data to show how perfect normalization does not do a thing for “Guides and Teams must speak the same language”. Common knowledge tells us that we might have to resort to triggers, stored procedures, meterialized views, etc. Those methods do work, but all have drawbacks and are not so difficult to circumvent. Database constraints would be something, but how? Read on.

Let’s add some sample data, for tournament day 1 (the rule about the same languages not enforced):

INSERT INTO TeamGuides (Guide, Team, TourDay)
VALUES    
           ('Donald', 'England', 1)   ,   
           ('Regine', 'Morocco', 1)   ,   
           ('Sergey', 'Russia',  1)
;
-- (3 row(s) affected)

It is not so obvious that teams and guides really speak the same language, so we’ll check it by following query (a confirmation query):

SELECT TG.Guide, TG.Team, TG.TourDay
, TeamLang = T.LangSpoken
, GuideLang = G.LangSpoken
FROM TeamGuides AS TG
INNER JOIN Teams AS T ON T.Team = Tg.Team
INNER JOIN Guides AS G ON G.Guide = TG.Guide
;
/*--
Guide           Team                TourDay TeamLang GuideLang
--------------- --------------- ----------- -------- ---------
Donald          England                   1 ENG      ENG
Regine          Morocco                   1 FRE      FRE
Sergey          Russia                    1 RUS      RUS
(3 row(s) affected)
--*/

So far, so good. In each case the TeamLang matches the GuideLang. This means we were careful enough with our INSERTs. What if we are not careful and provide inconsistent data?

Let’s continue with Guide-Team assignments, for tournament day 2, with some inconsistent data:

INSERT INTO TeamGuides (Guide, Team, TourDay)
VALUES     ('Heinz',  'England', 2)
     ,    ('Julles', 'France',  2)
     ,    ('Sergey', 'Germany', 2)
;
-- (3 row(s) affected),

All looks well again, the INSERT did not fail. However, if we run our confirmation query, the result is less than perfect:

SELECT TG.Guide, TG.Team, TG.TourDay
, TeamLang = T.LangSpoken
, GuideLang = G.LangSpoken
FROM TeamGuides AS TG
INNER JOIN Teams AS T ON T.Team = TG.Team
INNER JOIN Guides AS G ON G.Guide = TG.Guide
ORDER BY TourDay, Guide
;
/*--
    TourDay Guide           Team            TeamLang GuideLang
----------- --------------- --------------- -------- ---------
          1 Donald          England         ENG      ENG
          1 Regine          Morocco         FRE      FRE
          1 Sergey          Russia          RUS      RUS
          2 Heinz           England         ENG      GER
          2 Julles          France          FRE      FRE
          2 Sergey          Germany         GER      RUS
(6 row(s) affected)
--*/

On day 2, teams of England and Germany were assigned the wrong guides. England has a German speaking guide, and Germany ended up with Russian speaking guide. Oops. The mix-up is worth a diplomatic scandal. CNN will have a field day - Russian hackers anyone? Not good. Oh boy, do I wish I had written those triggers...

It all looked well - Primary Key, uniqueness, foreign keys - it is all there, 6th NF. So what went wrong?

For starters, we haven’t even tried to enforce the unspoken rule, “teams and guides must speak the same language”. Everything else we did was perfect. So, how do we make sure that guides are assigned to teams that speak the same language? Triggers and more come to mind, but that is too much work and too complicated. We would need one trigger for each of these requirements:

  1. Both the given guide and the team must speak the same language all the time. This requires a trigger on TeamGuides for INSERT/UPDATE.
  2. Once a guide is assigned to the team, we should not be able to change or delete the guide's language. This requires a trigger on Guides for UPDATE, DELETE.
  3. Once a team is assigned a guide, the team's language cannot be changed or deleted. This requires trigger on Teams for UPDATE/DELETE.

This means one trigger per each table, and we cannot copy/paste the code. I don't feel like writing code, so something else will have to do.

Luckily, there is another solution. First, we will delete all the rows from TeamGuides. If we don’t do this, (Heinz, England) and (Sergey, Germany) will prevent us from declaring the constraints we want to use in our demonstration.

DELETE TeamGuides
;
GO

We will start fresh, by adding a new column to TeamGuides:

ALTER TABLE TeamGuides
ADD LangSpoken varchar (3) NOT NULL DEFAULT ('ENG')
;
GO

The purpose of the DEFAULT ('ENG') is to allow the NOT NULL setting in the ALTER TABLE.

I just introduced a redundancy. LangSpoken is now stored in three places, instead of two. According to the rules of normalization, it should not be in the table, TeamGuides. For the mathematically minded, LangSpoken in the table, TeamGuides, does not depend on the entire PK (Guide, Team, TourDay). It is enough to know either Guide or Team, and we will know the language. It is not necessary to know entire PK (Guide, team, TourDay), hence a violation of 2nd NF.

So here we are in 1st NF, from 6th NF in one step. No worries, the alarm is false. By introducing redundant column in table TeamGuides, we did not relax any of PK-FK rules established earlier. The constraints on declared on TeamGuides are still active:

   FOREIGN KEY (Team) REFERENCES Teams (Team)
   FOREIGN KEY (Guide) REFERENCES Guides (Guide)

The meaning of the existing FK is “TeamGuides cannot contain any value in the attribute Team if such a value does not exist in the table, Teams. Ditto for Guides. Yet, it is still possible to enter a (Guide, Team) combination with different languages.

Now, the key thing (pun intended) - let’s add two superkeys. Remember, a superkey is a set of attributes which uniquely identifies a row in the table (a tuple). The set of attributes that make a superkey may be minimal. Not all attributes are really needed to identify the row. If we take away all unnecessary attributes, the super key becomes - a key. Since we naturally know what a key is, we can also say that a super-key is a key with some extra atributes, unnecessary for determining uniqueness of given row. Since super keys behave the same as keys, we can use them instead of keys, which is convenient. In our case, we will create two superkeys, one for each of tables, Teams and Guides.

ALTER TABLE Teams
ADD CONSTRAINT Teams_SK_Lang UNIQUE (Team, LangSpoken)
;
ALTER TABLE Guides
ADD CONSTRAINT Guides_SK_Lang UNIQUE (Guide, LangSpoken)
;

The same way we referenced keys from TeamGuides, we can now reference our newly minted superkeys.

ALTER TABLE TeamGuides
ADD CONSTRAINT TeamGuides_FK_TeamLang
FOREIGN KEY (Team, LangSpoken)
REFERENCES Teams (Team, LangSpoken)
;
ALTER TABLE TeamGuides
ADD CONSTRAINT TeamGuides_FK_GuideLang
FOREIGN KEY (Guide, LangSpoken)
REFERENCES Guides (Guide, LangSpoken)
;

There, we generated additional FK constraints from TeamGuides, each referencing superkeys in tables Guides and Teams. This prevents a mismatch in language. If we try to assign guide ‘Donald’ (English speaking) to team ‘Russia’ (Russian speaking, eh), we have room to enter only one value for TeamGuides.LangSpoken. If we enter ‘RUS’ then the constraint, TeamGuides_FK_GuideLang, will fail. If we enter ‘ENG’, then the constraint, TeamGuides_FK_TeamLang, will fail. Therefore, the combination (Guide, Team) must be such that the languages always match.

It is safe now to drop the constraints we set up initially, referencing keys (not super-keys)

ALTER TABLE TeamGuides
DROP  CONSTRAINT TeamGuides_FK_Teams
;
ALTER TABLE TeamGuides
DROP CONSTRAINT TeamGuides_FK_Guides
;

Let’s see if our new super-key based foreign keys are working. The following INSERT is expected to work, because guides and teams were carefully chosen:

INSERT INTO TeamGuides (Guide, Team, TourDay, LangSpoken)
VALUES    
           ('Donald', 'England', 1, 'ENG')  ,   
           ('Regine', 'Morocco', 1, 'FRE')  ,   
           ('Sergey', 'Russia',  1, 'RUS')
;    -- (3 row(s) affected)

The next one is expected to fail, because we are assigning German speaking guide to an English speaking team:

INSERT INTO TeamGuides (Guide, Team, TourDay, LangSpoken)
VALUES     ('Heinz',  'England', 2,   'GER')
;

Sure enough, it did fail:

Msg 547, Level 16, State 0, Line 1. The INSERT statement conflicted with the FOREIGN KEY constraint "TeamGuides_FK_TeamLang". The conflict occurred in database "DMA", table "dbo.Teams".
The statement has been terminated.

This should also fail:

INSERT INTO TeamGuides (Guide, Team, TourDay, LangSpoken)
VALUES          ('Sergey', 'Germany', 2,   'RUS')
;

Just as expected:

Msg 547, Level 16, State 0, Line 1. The INSERT statement conflicted with the FOREIGN KEY constraint "TeamGuides_FK_TeamLang". The conflict occurred in database "DMA", table "dbo.Teams".
The statement has been terminated.

If we are careful, we will eventually succeed:

INSERT INTO TeamGuides (Guide, Team, TourDay, LangSpoken)
VALUES     ('Julles', 'France',  2,   'FRE')
;    -- (1 row(s) affected)

Time to see what we have now in TeamGuides. We'll run the confirmation query again:

SELECT TG.TourDay, TG.Guide, TG.Team
, TeamLang = T.LangSpoken
, GuideLang = G.LangSpoken
FROM TeamGuides AS TG
INNER JOIN Teams AS T ON T.Team = Tg.Team
INNER JOIN Guides AS G ON G.Guide = TG.Guide
ORDER BY TourDay, Guide
;
/*--
    TourDay Guide           Team            TeamLang GuideLang
----------- --------------- --------------- -------- ---------
          1 Donald          England         ENG      ENG
          1 Regine          Morocco         FRE      FRE
          1 Sergey          Russia          RUS      RUS
          2 Julles          France          FRE      FRE
(4 row(s) affected)
--*/

The new constraints worked as expected, in one direction. The data in TeamGuides must match the data in Guides and Teams. There is another direction that foreign keys enforce. Once a key value from the referenced (parent) table appears in the referencing (child) table, then changes of the key in parent table are forbidden, unless we declare some CASCADE UPDATE/DELETE options. In this case, we assume no cascading.

What happens if we try to change language for Donald to FRE, for example?

UPDATE Guides
SET LangSpoken = 'FRE'
WHERE Guide = 'Donald'
;

We get an error.

Msg 547, Level 16, State 0, Line 1. The UPDATE statement conflicted with the REFERENCE constraint "TeamGuides_FK_GuideLang". The conflict occurred in database "DMA", table "dbo.TeamGuides".The statement has been terminated.

The new FK constraint fired again, and the inconsistent UPDATE was rejected.

It is safe to say that we have achieved our goal – enforcing the rule that Guides must speak the same language as the Teams they are associated with, in a declarative way. The price is an extra column in table, TeamGuides, and two super-keys, one in each of table, Teams and Guides.

Is this going to cost a lot in disk space? I don’t know, and don’t want to think about that, as space on disk is cheap nowadays. Did we jeopardize speed of query execution? Maybe, but that is not the concern here. We wanted declarative constraints at some reasonable price. By the way, which query would be more efficient?

A) Based on initial solution, no redundant column LangSpoken in table TeamGuides:


SELECT TG.TourDay, TG.Guide, TG.Team
, TeamLang = T.LangSpoken
, GuideLang = G.LangSpoken
FROM TeamGuides AS TG
INNER JOIN Teams AS T ON T.Team = Tg.Team
INNER JOIN Guides AS G ON G.Guide = TG.Guide

or this one:

B) Based on solution with redundant column LangSpoken in TeamGuides:

SELECT TG.TourDay, TG.Guide, TG.Team
, TeamLang = TG.LangSpoken
, GuideLang = TG.LangSpoken
FROM TeamGuides AS TG

I am not expert in query plans, so my feeling might be wrong, but a query without JOINs should be faster than one with two JOINs.

To recapitulate: we can use superkeys in place of keys and reference them by foreign keys, when we need to. And we need to do that when the parent and child tables have more attributes in common than what parent key provides. This assumes adding the common attribute(s) to the child table, hence apparent denormalization. However, denormalization is not real, it is apparent. In the process we did not relax any of referential integrity rules, in spite of having additional attribute(s) in the child table.

There are many situations where we can apply this method, we might explore them in the near future if you find this interesting. A few are as simple as the example used in this article, but many are more complicated and require more redundancy and additional CHECK constraints. These are purely declarative and always more efficient and reliable than any kind of procedural code. Just as ancient wisdom goes: constraints are safer than triggers, triggers are safer than stored procedures, stored procedures are safer than front-end application code, which is safer than ignoring the rule altogether.

We can denormalize, not for speed but for data integrity and consistency without loosing any benefits from initial normalization. By no means we should do any of this before full normalization. First normalize the best you can, then see if there is need for introducing and referencing superkeys and all will be well.

Rate

3.42 (24)

You rated this post out of 5. Change rating

Share

Share

Rate

3.42 (24)

You rated this post out of 5. Change rating