many to many relationship

  • Hi, why do we need a junction table in a many to many relationship? Why can't everything be in just 2 tables?

  • I've never seen any way to implement it in a relational database without a junction table.

    Even the Wikipedia definition includes the statement "...it is necessary to implement such relationships physically via a third junction table..." http://en.wikipedia.org/wiki/Many-to-many_(data_model)

    The reason why you don't want to use just two tables is because if you did then you'd be duplicating lots of data.

  • But if you set multiple columns as primary keys so if you are entering something new the server will tell you if its duplicated, wouldn't that be ok?

    I'm sorry, I'm new to SQL.

  • Why not just use a junction table? Normally, the junction table holds the foreign keys from the two parent tables and then any information that is specific to the relationship (not the other two tables).

    In a nutshell, it would be a LOT easier to query a properly designed database. (Been there, seen the mess that makes!) What's your argument against using a junction table? Just that it requires some space?

  • For you to have a lot of one thing that relates to one or more other things, you have to have a way to map between in them that allows for multiple entries on both sides.

    If we had a situation where you have a primary key in table A and a related column in table B, you can then ensure that there will only ever be one value in Table A. But, in Table B, you can have lots and lots of values for Table A, but only ONE for each value of Table B. You can reverse that relationship too. One value in Table B with lots of values in Table B. But, to make lots of values in Table A relate to lots of values in Table B, more than one relationship, right? You have to have a way to match a primary key from each table. That mechanism is the interim table.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Here's an example of a common many-to-many relationship.

    Assume we have many products, each which can be categorized in more than one category. Our primary keys are numeric IDs (ProductID and CategoryID respectively).

    dbo.Product (ProductID INT PRIMARY KEY, Name VARCHAR(50), ... )

    ProductID Name

    1 Pizza

    2 Tacos

    3 Curry

    dbo.Category (CategoryID INT PRIMARY KEY, Name VARCHAR(50), ... )

    CategoryID Name

    1 Dinners

    2 Frozen Foods

    3 Ethnic Foods

    There is no place you can put a single foreign key on either of the two tables that does not constrain you to a one-to-many relationship. If you create a category foreign key on the product, then that given product record can only be associated with one category. If you create a product foreign key on category, then only one product can ever appear in that category!

    That's why we create a junction table (informally known as a join table or a cross-reference table). We can create the primary key as a composite of the two primary keys from the tables we are relating. This allows us to assign products to as many categories as we'd like without allowing duplication (no need or reason to assign the same product to the same category more than once).

    CREATE TABLE dbo.ProductCategory

    (

    ProductID INT NOT NULL CONSTRAINT FK_ProductCategory_Product FOREIGN KEY REFERENCES dbo.Product(Product_ID)

    , CategoryID INT NOT NULL CONSTRAINT FK_ProductCategory_Category FOREIGN KEY REFERENCES dbo.Category(Category_ID)

    , CONSTRAINT PK_ProductCategory PRIMARY KEY CLUSTERED (ProductID, CategoryID)

    )

    We can now relate our 3 sample products to each of our sample categories.

    ProductID CategoryID

    1 1

    1 2

    1 3

    2 1

    2 2

    2 3

    3 1

    3 2

    3 3

    Hope this explains things in a clear manner.

  • This helps a lot, thank you everyone!

  • aajavaherian (5/5/2014)


    Hi, why do we need a junction table in a many to many relationship? Why can't everything be in just 2 tables?

    So we know what you're suggesting, show us an example schema of a many-to-many relationship between 2 tables using only 2 tables.

    With a normalized relational schema, you can only implement many-to-many relationship using 2 tables, if it's also a self-referencing relationship. For example, a Person table can have a Person_Relationship table linking employees that report to managers or children related to parent.

    However, if you have a Person table and an Address table, and a person can have multiple addresses, and an address can be shared by multiple people, then you need a 3rd Person_Address table.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • if i was describing lets say classes and students where each student can be in more than one class and each class can have multiple students. Or how about just using Table A by itself. I'm sorry, I'm new to SQL.

    Table A

    ID Student Class

    1 Bob math

    2 Tim math

    3 Linda english

    4 Linda science

  • aajavaherian (5/5/2014)


    if i was describing lets say classes and students where each student can be in more than one class and each class can have multiple students. Or how about just using Table A by itself. I'm sorry, I'm new to SQL.

    Table A

    ID Student Class

    1 Bob math

    2 Tim math

    3 Linda english

    4 Linda science

    You have just created the same table as posted in the example above, except you added an ID column. The biggest difference is you have created it without normalization. What happens when Linda changes her name? How would you validate the text entries for this type of thing? You want to store the primary key of the other tables, not copies of the values.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • If Linda changes her name, couldn't you just query her name and update the table with an update statement?

  • In that scenario an update would work yes, but you need to consider the data volumes involved and the impact on overhead.

    In the example of Student and Classes, yes you could have a compound key that links 2 tables however its highly inefficient when you consider 1000 students each taking 7 classes.

    In the Class table with a compound key (Class & StudentKey) would have 7000 rows with repeating data for class overlaps, the Class name is likely to be a text field of some sort, say 100 bytes, with integer keys that means you have a max Class table size of 7000*(104) = 730KB.

    However if you had an intersect table the Class Table would be 104*20=2080, the intersect table would be 7000*8 (2 integer keys) = 56K.

    Thus with an intersect structure the Class & intersect max table size is less than 64K, compared to the max possible size 730Kb.

    The scenario you propose doesn't take into account other factors like Teacher, Years, Classrooms, all of which could be a Many to many relationship, as teachers will teach multiple classes, across multiple years and class can be held in multiple locations.

    _________________________________________________________________________
    SSC Guide to Posting and Best Practices

  • aajavaherian (5/5/2014)


    If Linda changes her name, couldn't you just query her name and update the table with an update statement?

    That's one the biggest database design mistakes. First, there are a lot of students named 'Linda' or even 'Linda Lynn Schlossberg', so it's not unique. In a population of more than one million persons (across the entire public university system), there will be many collisions on FirstName + MiddleName + LastName + BirthDate. Demographic data makes a poor choice for a key.

    Another problem is that the keyword 'Linda' is contained throughout multiple tables and even exported into other external databases for which you don't have any update access. You need a key like SSN or StudentID that is static. Preferably you need to use a StudentID originated within the university so that they key can be shared with external database systems without issue of exposing SSN.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • aajavaherian (5/5/2014)


    If Linda changes her name, couldn't you just query her name and update the table with an update statement?

    You could...but that defeats the purpose of relational data. Consider what happens when you have 100 tables with the name Linda. You would have to update every single table to hold the new value. What about the new guy on the team who doesn't know he needs to update table 100 and only updates 99 of them.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • aajavaherian (5/5/2014)


    If Linda changes her name, couldn't you just query her name and update the table with an update statement?

    As others have said, that involves updating many rows when, with a junction table, you'd have to update only one.

    But the bigger issue probably comes in situations where it is not a matter of performance or taste. Check out deletion anomalies, where unnormalized data might simply be wiped out from your tables as a result of updates or deletes.

    See for example this page:

    http://www.dariopardo.com/data-modeling/paradigms/updateanomalies/[/url]

    Good luck. I became a true believer in normalization early on in my database education after I found out that it solved a books and authors problem I ran into at a job.

    - webrunner

    -------------------
    A SQL query walks into a bar and sees two tables. He walks up to them and asks, "Can I join you?"
    Ref.: http://tkyte.blogspot.com/2009/02/sql-joke.html

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply