Removing Duplicates from a LEFT JOIN

  • I'm very new to Access and am simply trying to query all records from LEFT TABLE whether they have linked records in RIGHT TABLE or not. The LEFT JOIN I'm using is displaying duplicates of the records in A (if a record in A has 5 related/linked records in B, record A is showing up 5 times). I only want to display the records in A one time.

    Current query:

    SELECT A.ID, A.variable1, A.variable2, B.ID, B.variable1, B.variable2

    FROM A LEFT JOIN B ON A.ID = B.ID

    ORDER BY A.variable1;

    Any help is very much appreciated! Thanks!

  • SELECT DISTINCT will group by every field, so if all fields you are returning are the same on the 5 duplicate records, it will do what you want.

  • The problem is that only the fields in table A are the same for each record. Fields in table B are different for each record, so running DISTINCT still gives duplicate records from column A.

  • Then the question is WHICH record do you want from table B? Do you want the first record, the last record, the maximum record?

  • Like michael mentioned, it sounds like you're going to need to use some aggregate function to get rid of your "dupes". Since you ARE in access - perhaps consider the FIRST, MIN or MAX functions depending on what is appropriate. SQL Server doesn't allow for FIRST, so you'd have to use one of the other SQL aggregates

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Assuming I could use the aggregate of any variable with numbers, say A.variable1, what would be the modification to the code below? I've tried putting a MAX in for one of the numeric variables, but when I then use a GROUP BY (assuming I would add this to aggregate), I get an error saying "You tried to execute a query that does not include the specified expression 'Variable1' as part of an aggregate function". If I include all SELECT variables in the GROUP BY section (excluding the variable that gets MAX'd), the query works, but returns the same output with duplicates.

    SELECT A.ID, A.variable1, A.variable2, B.ID, B.variable1, B.variable2

    FROM A LEFT JOIN B ON A.ID = B.ID

    ORDER BY A.variable1;

    Thanks again!

  • Keep in mind that "max" works against non-numeric values too.

    The right "group by", would definitely handle it.

    In 2005, you could use another approach as well.

    SELECT A.ID,

    A.variable1,

    A.variable2,

    B.ID,

    B.variable1,

    B.variable2

    FROM

    A

    LEFT JOIN (

    select B .*,

    RowNumber() over (partition by ID order by variable1,variable2) RN

    from B) Bnew

    ON A.ID = BNew.ID

    WHERE

    BNew.RN=1 or

    Bnew.RN is null

    ORDER BY A.variable1;

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Viewing 7 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply