Get double records returned from SELECT because I failed to save the PK.

Question

Post reply

Get double records returned from SELECT because I failed to save the PK.

tim8w

Old Hand

Points: 388
More actions
June 10, 2021 at 1:38 pm

#3894428

I have two tables. A lookup table that has a PK and various columns. Looks something like this:
tblOperator
PK OperatorID, OperatorName, BadgeID

My main table saves the BadgeID value from the tblOperator instead of the PK OperatorID.
So when I do an INNER JOIN between tblMain and tblOperator, I get duplicate records returned.

Is there a way to not get the duplicate records, other than to go back to myMain Table and save the FK OperatorID instead of the value of BadgeID.

Thanks,
Tim

Viewing 8 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply

Phil Parkin SSC Guru Points: 247225 More actions · Answer 1

In my opinion, this is something which you should definitely fix.

But if you use

SELECT DISTINCT cols

rather than

SELECT cols

any duplicate rows will be removed.

Mr. Brian Gale One Orange Chip Points: 25026 More actions · Answer 2

I was thinking about this and is the data accurate/valid as it is currently?

What I mean is since you are getting duplicates, that makes me think that a single badgeID can have multiple OperatorID's. So the badge ID is shared between different operators. Due to this, how do you map up data from the Main table to an operator or does that data not really matter?

What I mean is I'm going to assume that Main is a time tracking table. Tracks when the operator starts and ends their work day. So when you select and do that join, you see badge ID 123 starts at 8:02 AM and 8:05 AM and ends at 4:15 Pm and 4:22 PM. badge ID 123 corresponds to Steve and Jason. Which time corresponds to Steve?

The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

tim8w Old Hand Points: 388 More actions · Answer 3

Brian,

OperatorBadgeID are only used by one Operator, not shared. Some do work in more than one area of the plant, so, yes, there can be duplicate entries, one for each area they work. I have tried to use DISTINCT like Phil suggested above, but I get a syntax error.

"ORDER BY items must appear in the select list if SELECT DISTINCT is specified."

Can't seem to get around it, no matter what I try. Anyway, in my SELECT, I don't really care which of the duplicates I get since I am looking for their name, which is the same for both entries.

Mr. Brian Gale One Orange Chip Points: 25026 More actions · Answer 4

If operator ID and badge ID are unique in the Operator table (ie it is a 1:1 mapping of operator ID to badge ID), then that isn't going to be the problem with duplicates. Your problem then is on the MAIN table which can be confirmed by finding a duplicate value and doing a SELECT COUNT(1) FROM tblMain WHERE badgeID = <duplicate badgeID> (replacing <duplicate badgeID> with the duplicate you are seeing).

So switching it over to using the PK that you have on the Operator table is just going to make better use of the index (so I DO recommend like Phil did that you correct that)..

As for the error you are getting, your ORDER BY column must appear in the SELECT list if you are using a SELECT DISTINCT. There are ways around this, such as nested SELECTs or CTE's. It really depends on your use case though. Do you need your data sorted by a column that isn't being presented to the end user? If not, leave out the ORDER BY. If you DO need it sorted by that column AND you are not planning on presenting it to the end user, then rewrite the query to something like:

SELECT DISTINCT *
FROM (
<your current query here without the DISTINCT>
) AS MainOperator

The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

tim8w Old Hand Points: 388 More actions · Answer 5

Brian,

OperatorID is the PK and it is unique, but the same Operator can work in multiple areas, which is why I need DISTINCT somehow. For this select, I don't care which OperatorID I get, the corresponding OperatorName is the same.

Mr. Brian Gale One Orange Chip Points: 25026 More actions · Answer 6

So since you are ordering by a column you are not using in the SELECT list, you have 3 options if you are using the DISTINCT keyword:

1 - add the column(s) from the ORDER BY to the SELECT list

2 - remove the ORDER BY

3 - use a CTE or nested SELECT and put the DISTINCT on the final data set as in my example

The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

Jeffrey Williams SSC Guru Points: 90347 More actions · Answer 7

tim8w wrote:

OperatorBadgeID are only used by one Operator, not shared. Some do work in more than one area of the plant, so, yes, there can be duplicate entries, one for each area they work. I have tried to use DISTINCT like Phil suggested above, but I get a syntax error.

What are you using to determine that these entries are duplicates? I would assume that there is something different on each row in the main table - for example, the location - the badge in time - the badge out time - etc...

If that is the case, you can solve this without using DISTINCT. You could use ROW_NUMBER() - and use a TOP 1 WITH TIES, but it depends on the actual query.

Can you post the query?

Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

― Charles R. Swindoll

How to post questions to get better answers faster
Managing Transaction Logs