April 28, 2005 at 10:33 am
HI,
I have a table with 20,000 names and addresses and I need to ensure that there are no duplicates on it. I have been querying previously on firstname, surname, address1 and postcode. I have been trying to think of what the syntax could be to write this query - any ideas?
thanks,
Paul
April 28, 2005 at 12:49 pm
If you have a unique id in the table you could try this:
DELETE FROM namesaddresses
WHERE id NOT IN (
SELECT id FROM (
(SELECT DISTINCT id, firstname, surname, address1, postcode
FROM namesaddresses)))
Or if there is no RI on the table you could even Try something like this.
SELECT DISTINCT firstname, surname, address1, postcode
INTO ##temp
FROM namesaddresses
TRUNCATE TABLE namesaddresses
INSERT INTO namesaddresses (firstname, surname, address1,postcode)
SELECT firstname, surname, address1, postcode
FROM ##temp
April 29, 2005 at 6:09 am
I would use the REPLACE function to remove blanks from the address and postcode:
REPLACE(address1, ' ', ''), REPLACE(postcode, ' ','')
but I can guarantee there will still be loads of duplicates due to typos, ambiguous addresses etc. These, unfortunately, are difficult to deal with in SQL. Some address dedupe programs, for example, remove vowels from words but you cannot use 'a','e','i','o','u' as a matching pattern in SQL string functions.
Maybe your best approach, time permitting, is to have several attempts at it, examining the results after each attempt and refiniing/adjusting the query accordingly. Personally, I would send the names and addresses to a commercial deduplication agency.
Viewing 3 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply
This website stores cookies on your computer.
These cookies are used to improve your website experience and provide more personalized services to you, both on this website and through other media.
To find out more about the cookies we use, see our Privacy Policy