Get double records returned from SELECT because I failed to save the PK.

  • I have two tables. A lookup table that has a PK and various columns. Looks something like this:

    tblOperator

    PK OperatorID, OperatorName, BadgeID

     

    My main table saves the BadgeID value from the tblOperator instead of the PK OperatorID.

    So when I do an INNER JOIN between tblMain and tblOperator, I get duplicate records returned.

     

    Is there a way to not get the duplicate records, other than to go back to myMain Table and save the FK OperatorID instead of the value of BadgeID.

     

    Thanks,

    Tim

     

  • In my opinion, this is something which you should definitely fix.

    But if you use

    SELECT DISTINCT cols

    rather than

    SELECT cols

    any duplicate rows will be removed.

    If the answer to your question can be found with a brief Google search, please perform the search yourself, rather than expecting one of the SSC members to do it for you.
    See https://www.sqlservercentral.com/articles/forum-etiquette-how-to-post-datacode-on-a-forum-to-get-the-best-help/ for details of how to post T-SQL code-related questions.

  • I was thinking about this and is the data accurate/valid as it is currently?

    What I mean is since you are getting duplicates, that makes me think that a single badgeID can have multiple OperatorID's.  So the badge ID is shared between different operators.  Due to this, how do you map up data from the Main table to an operator or does that data not really matter?

    What I mean is I'm going to assume that Main is a time tracking table.  Tracks when the operator starts and ends their work day.  So when you select and do that join, you see badge ID 123 starts at 8:02 AM and 8:05 AM and ends at 4:15 Pm and 4:22 PM.  badge ID 123 corresponds to Steve and Jason.  Which time corresponds to Steve?

  • Brian,

    OperatorBadgeID are only used by one Operator, not shared. Some do work in more than one area of the plant, so, yes, there can be duplicate entries, one for each area they work. I have tried to use DISTINCT like Phil suggested above, but I get a syntax error.

     

    "ORDER BY items must appear in the select list if SELECT DISTINCT is specified."

     

    Can't seem to get around it, no matter what I try. Anyway, in my SELECT, I don't really care which of the duplicates I get since I am looking for their name, which is the same for both entries.

  • If operator ID and badge ID are unique in the Operator table (ie it is a 1:1 mapping of operator ID to badge ID), then that isn't going to be the problem with duplicates.  Your problem then is on the MAIN table which can be confirmed by finding a duplicate value and doing a SELECT COUNT(1) FROM tblMain WHERE badgeID = <duplicate badgeID> (replacing <duplicate badgeID> with the duplicate you are seeing).

    So switching it over to using the PK that you have on the Operator table is just going to make better use of the index (so I DO recommend like Phil did that you correct that)..

    As for the error you are getting, your ORDER BY column must appear in the SELECT list if you are using a SELECT DISTINCT.  There are ways around this, such as nested SELECTs or CTE's.  It really depends on your use case though.  Do you need your data sorted by a column that isn't being presented to the end user?  If not, leave out the ORDER BY.  If you DO need it sorted by that column AND you are not planning on presenting it to the end user, then rewrite the query to something like:

    SELECT DISTINCT *
    FROM (
    <your current query here without the DISTINCT>
    ) AS MainOperator
  • Brian,

    OperatorID is the PK and it is unique, but the same Operator can work in multiple areas, which is why I need DISTINCT somehow. For this select, I don't care which OperatorID I get, the corresponding OperatorName is the same.

  • So since you are ordering by a column you are not using in the SELECT list, you have 3 options if you are using the DISTINCT keyword:

    1 - add the column(s) from the ORDER BY to the SELECT list

    2 - remove the ORDER BY

    3 - use a CTE or nested SELECT and put the DISTINCT on the final data set as in my example

  • tim8w wrote:

    OperatorBadgeID are only used by one Operator, not shared. Some do work in more than one area of the plant, so, yes, there can be duplicate entries, one for each area they work. I have tried to use DISTINCT like Phil suggested above, but I get a syntax error.

    What are you using to determine that these entries are duplicates?  I would assume that there is something different on each row in the main table - for example, the location - the badge in time - the badge out time - etc...

    If that is the case, you can solve this without using DISTINCT.  You could use ROW_NUMBER() - and use a TOP 1 WITH TIES, but it depends on the actual query.

    Can you post the query?

     

    Jeffrey Williams
    Problems are opportunities brilliantly disguised as insurmountable obstacles.

    How to post questions to get better answers faster
    Managing Transaction Logs

Viewing 8 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply