In general how do you know which fields to join on between 2 tables?

  • I know this is a broad question.

    I have 2 tables (no primary/foreign keys).  In general, with 2 tables, how do you know which fields to join on and who many fields do you use?

    From what I understand, the more fields you join on the more narrow you are making your data. But do you typically join fields that are identical with each other.  Say table 1 has a column STATES and table 2 has a column STATES. Would it be a good idea to join using STATES?

  • It depends...

    Why are you joining the 2 tables and what data does each table contain?  For example, if table1 contains sales data - and has a column named StateID, and table2 is the State column and contains StateID, StateMnemonic, StateName and you need to return the Mnemonic (e.g. TX, CA, WA) instead of the ID then you would join to the State table using the StateID.

    On the other hand - if the StateID were actually the mnemonic and you only needed to return that value then you would not need the join.

    Further - if table1 is for Claim data and table2 is ClaimDetail and you have the columns OrgID, GroupID, ClaimID, (other columns) in the claim table and the ClaimDetail table has OrgID, GroupID, ClaimID, DetailID, (other columns) then you would need to join on the 3 columns from the Claim table to get correct and accurate results.

    If your systems does not have any PK/FK definitions - hopefully there are indexes available, including a clustered index.  You might be able to figure out the PK based on any unique key constraints or indexes.

    One final note - do not trust the ID column is the *same* domain in every table.  In many systems this column is added to every table as an identity and has the same name in other tables - but is NOT the same value in those tables.  Hopefully you don't have a system that was designed that way.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • If, and that's a big if, the database is properly defined, you should be able to look at the foreign keys to determine what fields to link in most cases.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • I have the same issue. Due to merge replication, all foreign key relations were removed so that publications can be setup and scheduled in any order without issues. Most tables do have primary key. It's a daunting task for any SQL developer to write a query in the very beginning. Once getting used to the schema, it worked out pretty well.

  • As Professor Hill stated, "You gotta know the territory!"

    Sure, the business should need to tell us their rules.  But sometimes you have to drag that out of them, so you can figure out what the database is doing, and even more importantly what it should be doing.  Perhaps start by looking at joins in stored procedures to see if there are common patterns.

     

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply