Finding Duplicated across several servers

Question

Finding Duplicated across several servers

Michael Lee-158245

SSC Eights!

Points: 980
More actions
June 19, 2012 at 12:09 pm

#257718

Hi,
I have several db servers that service several of our clinics (each clinic using their own server,as of now). NOTE: The patientid is the key on the table I lloking up, so there will be no duplicates within the same database. I'm more concerned that another clinic may have used a patientid that another clinic may have used. I will soon move all data over to use One server/DB so I wanted to make sure I don't run into a problem when I do move over the data. In preparation for that, I wanted to do some research into the data we have now and see if we have duplicate clientid's within our servers. I have the following start off query which works for :
SELECT PATIENTS.[PatientID], PATIENTS.[LastName], PATIENTS.[FirstName]
FROM Independence_822_APG.dbo.PATIENTS
WHERE (((PATIENTS.[PatientID])
In (SELECT [PatientID] FROM Independence_822_APG.dbo.[PATIENTS] As Tmp GROUP BY [PatientID] HAVING Count(*)>1 )))
ORDER BY PATIENTS.[PatientID]
What I'd like to do is use a join all on the above query along with the queries for the other servers. For example (duplicate look up not used in example):
Select patientid from Clinic1_822_APG.dbo.PATIENTS
union all
Select patientid from Clinic2_822_APG.dbo.PATIENTS
Using the first query above, I was unsure how to code the query to get the duplicates across the server, as the one above will only get dupes within it's own db. Thanks for any advice.
Michael

Viewing 4 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply

GSquared SSC Guru Points: 260824 More actions · Answer 1

The easiest way to do this would be to consolidate the data into a single database, from all servers. Create a single table with a GUID column in it, then import all the data into that. The GUIDs will keep each row unique, regardless of patient ID.

Once you have it all in one table, then you can collect all the rows with duplicate patient IDs simply by joining that table to itself on patient ID, using GUID to differentiate rows.

Something like:

select *

from dbo.MyConsolidatedTable as T1

inner join dbo.MyConsolidatedTable as T2

on T1.PatientID = T2.PatientID

and T1.GUID != T2.GUID ;

Then, you can use criteria other than the patient ID to test for duplicates. Things like name, address, social security number (or local equivalent if not US and some comparable value is available), and so on.

Don't just check for exact duplicates. The same person might be in on server as "John Smith" and another as "Johnny Smith", and an exact match on first name will miss that one.

Does that help?

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Michael Lee-158245 SSC Eights! Points: 980 More actions · Answer 2

Hi GSquared. thanks for the reply. It did help. I thought about doing something similar, but wanted to see if there was an easier way (not to say that this is terribly difficult or anything). Thanks again for the help.

GSquared SSC Guru Points: 260824 More actions · Answer 3

You're welcome.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon