How to loop through dataset to identify and flag duplicates

Question

How to loop through dataset to identify and flag duplicates

iain.nelson

SSC Enthusiast

Points: 161
More actions
September 20, 2012 at 2:52 am

#268009

Hi,
I have a job on to identify duplicate leads in a single dataset based on whether or not the creationdatetime of a record is within 30 seconds of the previous record.
I have attempted to split out into grouped sub sets but can't get it work. I think I am left with either a cursor or building a loop based ssis package. The criteria I have to follow is simply:
If the next ID's media code and phone number match the previous and the creationdatetime is less than 30 seconds, then flag as duplicate.
Any help greatly appreciated....

Viewing 5 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

Ant-Green SSC Guru Points: 113567 More actions · Answer 1

Hi, welcome to SSC, if you would take a moment to read the second link in my signature block on posting questions it would help you get an answer quicker, but as this seems like a simple data problem it isn't to hard to recreate some play data.

Based on that I believe the problem is, the below should satisfy your problem.

DECLARE @Table TABLE (MediaCode INT, PhoneNumber CHAR(11), CreationDateTime DATETIME)

INSERT INTO @Table VALUES

(1,'01234567890','2012-01-01 00:00:00'),

(1,'01234567890','2012-01-01 00:00:30'),--Flag as duplicate

(1,'01234567890','2012-01-01 00:10:00'),

(1,'01234567890','2012-01-01 00:10:30'),--Flag as duplicate

(1,'01234567890','2012-01-01 00:20:00'),

(1,'01234567890','2012-01-01 00:20:30')--Flag as duplicate

select distinct

t2.*,

CASE WHEN DATEDIFF(SECOND,t1.CreationDateTime,t2.CreationDateTime) <= 30 THEN 'Duplicate' ELSE 'NotDuplicate' END AS Duplicate

from

@Table t1

inner join

@Table t2

on

t1.MediaCode = t2.MediaCode

and

t1.PhoneNumber = t2.PhoneNumber

and

t1.CreationDateTime < t2.CreationDateTime

where

DATEDIFF(SECOND,t1.CreationDateTime,t2.CreationDateTime) <= 30

iain.nelson SSC Enthusiast Points: 161 More actions · Answer 2

Thanks for the quick response!

Point taken on getting quick answers through provision of useful data, below is a script to provide an actual example of the issue I am having I have also revised my original criteria:

If the next ID's media code and phone number and createddate match the previous and the CREATIONDATETIME is less than 30 seconds, then flag as duplicate.

CREATE TABLE #Deduplicate(

ID int,

CreatedDateTime datetime,

CreatedDate datetime,

Media varchar(25),

ContactPhone varchar(25)

)

Insert into #Deduplicate

(ID,CreatedDateTime,CreatedDate,Media,ContactPhone)

Select 1,'Jan 01 2012 07:00:05AM','Jan 01 2012 00:00AM','TV','Anonymous' Union ALL

Select 2,'Jan 01 2012 07:00:10AM','Jan 01 2012 00:00AM','TV','Anonymous' Union ALL

Select 3,'Jan 01 2012 07:00:15AM','Jan 01 2012 00:00AM','TV','Anonymous' Union ALL

Select 4,'Jan 01 2012 07:00:20AM','Jan 01 2012 00:00AM','TV','Anonymous' Union ALL

Select 5,'Jan 01 2012 07:00:25AM','Jan 01 2012 00:00AM','TV','Anonymous' Union ALL

I think your solution still works using this data as a guide? Can you confirm?

Thanks again!

Ant-Green SSC Guru Points: 113567 More actions · Answer 3

yes my original solution will still work, my solution will return ID's 2,3,4,5 as duplicates.

iain.nelson SSC Enthusiast Points: 161 More actions · Answer 4

That's great,

Just tested and it works exactly as you explained.

Many thanks!