September 20, 2012 at 2:52 am
Hi,
I have a job on to identify duplicate leads in a single dataset based on whether or not the creationdatetime of a record is within 30 seconds of the previous record.
I have attempted to split out into grouped sub sets but can't get it work. I think I am left with either a cursor or building a loop based ssis package. The criteria I have to follow is simply:
If the next ID's media code and phone number match the previous and the creationdatetime is less than 30 seconds, then flag as duplicate.
Any help greatly appreciated....
September 20, 2012 at 3:08 am
Hi, welcome to SSC, if you would take a moment to read the second link in my signature block on posting questions it would help you get an answer quicker, but as this seems like a simple data problem it isn't to hard to recreate some play data.
Based on that I believe the problem is, the below should satisfy your problem.
DECLARE @Table TABLE (MediaCode INT, PhoneNumber CHAR(11), CreationDateTime DATETIME)
INSERT INTO @Table VALUES
(1,'01234567890','2012-01-01 00:00:00'),
(1,'01234567890','2012-01-01 00:00:30'),--Flag as duplicate
(1,'01234567890','2012-01-01 00:10:00'),
(1,'01234567890','2012-01-01 00:10:30'),--Flag as duplicate
(1,'01234567890','2012-01-01 00:20:00'),
(1,'01234567890','2012-01-01 00:20:30')--Flag as duplicate
select distinct
t2.*,
CASE WHEN DATEDIFF(SECOND,t1.CreationDateTime,t2.CreationDateTime) <= 30 THEN 'Duplicate' ELSE 'NotDuplicate' END AS Duplicate
from
@Table t1
inner join
@Table t2
on
t1.MediaCode = t2.MediaCode
and
t1.PhoneNumber = t2.PhoneNumber
and
t1.CreationDateTime < t2.CreationDateTime
where
DATEDIFF(SECOND,t1.CreationDateTime,t2.CreationDateTime) <= 30
September 20, 2012 at 3:51 am
Thanks for the quick response!
Point taken on getting quick answers through provision of useful data, below is a script to provide an actual example of the issue I am having I have also revised my original criteria:
If the next ID's media code and phone number and createddate match the previous and the CREATIONDATETIME is less than 30 seconds, then flag as duplicate.
CREATE TABLE #Deduplicate(
ID int,
CreatedDateTime datetime,
CreatedDate datetime,
Media varchar(25),
ContactPhone varchar(25)
)
Insert into #Deduplicate
(ID,CreatedDateTime,CreatedDate,Media,ContactPhone)
Select 1,'Jan 01 2012 07:00:05AM','Jan 01 2012 00:00AM','TV','Anonymous' Union ALL
Select 2,'Jan 01 2012 07:00:10AM','Jan 01 2012 00:00AM','TV','Anonymous' Union ALL
Select 3,'Jan 01 2012 07:00:15AM','Jan 01 2012 00:00AM','TV','Anonymous' Union ALL
Select 4,'Jan 01 2012 07:00:20AM','Jan 01 2012 00:00AM','TV','Anonymous' Union ALL
Select 5,'Jan 01 2012 07:00:25AM','Jan 01 2012 00:00AM','TV','Anonymous' Union ALL
I think your solution still works using this data as a guide? Can you confirm?
Thanks again!
September 20, 2012 at 4:00 am
yes my original solution will still work, my solution will return ID's 2,3,4,5 as duplicates.
September 20, 2012 at 4:07 am
That's great,
Just tested and it works exactly as you explained.
Many thanks!
Viewing 5 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply