Finding Duplicated across several servers

  • The easiest way to do this would be to consolidate the data into a single database, from all servers. Create a single table with a GUID column in it, then import all the data into that. The GUIDs will keep each row unique, regardless of patient ID.

    Once you have it all in one table, then you can collect all the rows with duplicate patient IDs simply by joining that table to itself on patient ID, using GUID to differentiate rows.

    Something like:

    select *

    from dbo.MyConsolidatedTable as T1

    inner join dbo.MyConsolidatedTable as T2

    on T1.PatientID = T2.PatientID

    and T1.GUID != T2.GUID ;

    Then, you can use criteria other than the patient ID to test for duplicates. Things like name, address, social security number (or local equivalent if not US and some comparable value is available), and so on.

    Don't just check for exact duplicates. The same person might be in on server as "John Smith" and another as "Johnny Smith", and an exact match on first name will miss that one.

    Does that help?

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Hi GSquared. thanks for the reply. It did help. I thought about doing something similar, but wanted to see if there was an easier way (not to say that this is terribly difficult or anything). Thanks again for the help.

  • You're welcome.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Viewing 3 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply