SQLServerCentral Article

Expanding The Scope of Bridge Tables

,

Editor's Note: Please read the discussion that goes with this article. There is valuable information you ought to know before following this advice.

In my article "Decoupling in Relational Databases," I talked about bridge tables, or tables that allowed for two concepts in a database to be related in terms of the bridge table. In the article, I used the example of a User, a Group, and a bridge table that allows for any user to belong to any number of groups, called USER_GROUP.

I'd like to expand on that concept of a bridge table, and show how, in relational databases, bridge tables can have a broader scope than the example demonstrated in "Decoupling in Relational Databases."

Consider a similar example as mentioned above: a database that has employees and functional areas, and a scenario in which employees can serve in multiple functional areas. To represent these concepts, I would build three tables: Employee, FunctionalArea, and Employee_FunctionalArea

Table Name: Employee

Column NameData Type
IDint (auto-increment, Primary Key)
EmployeeIDvarchar(25)
FirstNamevarchar(75)
LastNamevarchar(75)

Table Name: FunctionalArea

Column NameData Type
IDint (auto-increment, Primary Key)
Namevarchar(50)

Table Name: Employee_FunctionalArea

Column NameData Type
IDint (auto-increment, Primary Key)
FK_Employeeint (relates to the Employee Table)
FK_FunctionalAreaint (relates to the FunctionalArea Table)

The above tables represent a bridge relationship; however, since the reason we build databases and applications is because we need ways to represent reality on a computer, there are many times where the database would need to facilitate more functionality. Consider the possibility that the company that owns this database needs to provide specific training for employees , and the training item that the employee needs to be trained on is specific to the functional area in which they work.

The first thing that jumps out at me about this scenario is that we need more tables to facilitate these relationships. The first table we would need to add is a TrainingCourse table, and it might look something like this:

Table Name: TrainingCourse

Column NameData Type
IDint (auto-increment, Primary Key)
CourseNamevarchar(75)
CourseDescriptionvarchar(255)

The best way to decide how to represent an Employee's training history depends on how granular the company wants to report on the data (to what level of specification do we want the database to represent reality). Many database architects may be inclined to bridge the Employee table and the TrainingCourse table with a table that might be named Employee_TrainingCourse, and thus creating another bridge table; however, I would probably be inclined to bridge the Employee_FunctionalArea table with the TrainingCourse table, because it would offer me that extra granularity of knowing what the intent was of providing the training course to the employee. The table I envision would look as follows:

Table Name: Employee_FunctionalArea_TrainingCourse

Column NameData Type
IDint (auto-increment, Primary Key)
FK_Employee_FunctionalAreaint (relates to the Employee_FunctionalArea table)
FK_TrainingCourseint (relates to the TrainingCourse table)
TrainingDateDateTime

The above table gives us the opportunity to better manage employee training, and prevent duplicate training. For instance, if I wanted to see all the training courses an employee has received, I might build a query as follows:

SELECT E.Firstname, E.LastName, 
FA.Name,
TC.CourseName,
E_FA_TC.TrainingDate
FROM Employee E
JOIN Employee_FunctionalArea E_FA ON E.ID=E_FA.FK_Employee
JOIN FunctionalArea FA ON E_FA.FK_FunctionalArea=FA.ID
JOIN Employee_FunctionalArea_TrainingCourse E_FA_TC ON E_FA.ID=E_FA_TC.FK_Employee_FunctionalArea
JOIN TrainingCourse TC ON E_FA_TC.FK_TrainingCourse=TC.ID
WHERE E.EmployeeID='12345'

Likewise, if I wanted to make sure that an employee has not already received the same training that may be offered by multiple functional areas, I could modify the above query, and build some business logic to prevent duplication (also accomplishable by adding constraints to the table):

DECLARE @receivedTraining Bit
SET @receivedTraining =
( SELECT COUNT(*)
FROM Employee E
JOIN Employee_FunctionalArea E_FA ON E.ID=E_FA.FK_Employee
JOIN FunctionalArea FA ON E_FA.FK_FunctionalArea=FA.ID
JOIN Employee_FunctionalArea_TrainingCourse E_FA_TC ON E_FA.ID=E_FA_TC.FK_Employee_FunctionalArea
JOIN TrainingCourse TC ON E_FA_TC.FK_TrainingCourse=TC.ID
WHERE E.EmployeeID='12345'
AND TC.Name='Introductory Employee Training' ) IF @receivedTraining = 1
PRINT 'Employee 12345 has already received the Introductory Employee Training course'
ELSE
INSERT INTO Employee_FunctionalArea_TrainingCourse(...

This concept of building on top of bridge tables comes up a lot (at least in my database developments). As the complexity of bridge table relationships increases, so does the query complexity. Aside from this complexity, there are also index considerations; in other words, because you are linking through so many tables to get the desired data, there must indexes to facilitate that (my rule of thumb is that an index gets created on each foreign key column in the table. This rule can get altered based on how many records I expect to be going into the table, though).

Solving some of the problems mentioned above can be quite time consuming, and can really test your skills as a database administrator. There are solutions though. For instance, views can come in quite handy to limit the need to build very complex queries.

Another tool that one may use in dealing with all this normalization is some degree of denormalization. For instance, in the above example, I may put a FK_Employee column in the Employee_FunctionalArea_TrainingCourse table, so it would look as follows:

Table Name: Employee_FunctionalArea_TrainingCourse

Column NameData Type
IDint (auto-increment, Primary Key)
FK_Employeeint (relates to the Employee table)
FK_Employee_FunctionalAreaint (relates to the Employee_FunctionalArea table)
FK_TrainingCourseint (relates to the TrainingCourse table)
TrainingDateDateTime

The above denormalization would allow me to reduce the number of tables I had to go through to get the data I need. Though introducing denormalization (duplication of data) can be risky, it reduces risk in other ways, particularly in the amount of time it takes to build a query, as well as how SQL Server assembles the execution plan. I can state quite emphatically that the more tables you have to go through, in one query, to get required data, the more risk you assume in getting a bad performance plan, particularly if a specific index or indexes are fragmented or out-of-date.

Regardless of how one decides to solve some of these architecting and maintenance problems, being armed with the right tools can save a great deal of time and frustration, and the above concepts are valuable tools in the DBA's toolbox.

Editor's Note: Please read the discussion that goes with this article. There is valuable information you ought to know before following this advice.

Rate

2.34 (47)

You rated this post out of 5. Change rating

Share

Share

Rate

2.34 (47)

You rated this post out of 5. Change rating