
Convoluted DeDuplicate Related Records


homebrew01
I'm not sure of the right terms here, but I have a puzzle that is easy to see yet I'm stuck on how to code for it. We have duplicate member records that have already been identified, but the problem is that they are sort of many-to-many: duplicate #7 is a duplicate of #6, and #6 is a duplicate of #4, so I don't see how to make the link between #7 and #4.

Here's some EDITED sample data:

CREATE TABLE dbo.Duplicates_GD
(Ident int NOT NULL IDENTITY (1, 1),
MemIdNew int NOT NULL,
MemIdOld int NOT NULL ) ON [PRIMARY]

CREATE TABLE dbo.Members_GD
(MemID int NOT NULL,
JoinDate datetime NULL,
PaidDate datetime NULL) ON [PRIMARY]

insert into Members_GD (MemID, JoinDate, PaidDate) values(1, '01-01-2013', '01-15-2013')
insert into Members_GD (MemID, JoinDate, PaidDate) values(2, '02-01-2013', '02-15-2013')
insert into Members_GD (MemID, JoinDate, PaidDate) values(3, '03-01-2013', '03-15-2013')
insert into Members_GD (MemID, JoinDate, PaidDate) values(4, '04-01-2013', '04-15-2013')
insert into Members_GD (MemID, JoinDate, PaidDate) values(5, '05-01-2013', '05-15-2013')
insert into Members_GD (MemID, JoinDate, PaidDate) values(6, '06-01-2013', '06-15-2013')
insert into Members_GD (MemID, JoinDate, PaidDate) values(7, '07-01-2013', '07-15-2013')
insert into Members_GD (MemID, JoinDate, PaidDate) values(8, '08-01-2013', '08-15-2013')
insert into Members_GD (MemID, JoinDate, PaidDate) values(9, '09-01-2013', '09-15-2013')
insert into Members_GD (MemID, JoinDate, PaidDate) values(10, '10-01-2013', '10-15-2013')
insert into Members_GD (MemID, JoinDate, PaidDate) values(11, '11-01-2013', '11-15-2013')
insert into Members_GD (MemID, JoinDate, PaidDate) values(12, '12-01-2013', '12-15-2013')

insert into Duplicates_GD (MemIdNew, MemIdOld) values (11,5)
insert into Duplicates_GD (MemIdNew, MemIdOld) values(7,3)
insert into Duplicates_GD (MemIdNew, MemIdOld) values(9,7)
insert into Duplicates_GD (MemIdNew, MemIdOld) values(11,9)
insert into Duplicates_GD (MemIdNew, MemIdOld) values(5,1)

insert into Duplicates_GD (MemIdNew, MemIdOld) values(6,4)
insert into Duplicates_GD (MemIdNew, MemIdOld) values(10,8)
insert into Duplicates_GD (MemIdNew, MemIdOld) values(10,6)
insert into Duplicates_GD (MemIdNew, MemIdOld) values(8,2)
insert into Duplicates_GD (MemIdNew, MemIdOld) values(12,10)

select * from Duplicates_GD order by MemIdNew, MemIdOld
select * from Members_GD order by MemID




The even MemIDs (2, 4, 6, 8, 10, 12) are related as a chain of records in the same "family", i.e. group 1,
and the odd MemIDs (1 through 11) are related as a chain of records in a second "family", i.e. group 2.

Goal: Set the oldest record's PaidDate to the most recent PaidDate in its "family",
then delete all the records in each "family" except for the oldest, leaving the result below:

Select * from Members_GD

MemID  JoinDate    PaidDate
1      2013-01-01  2013-11-15
2      2013-02-01  2013-12-15



It is a one-time cleanup that needs to be done, so I'm not looking for the most efficient or elegant method, just something I can understand and possibly repeat if a similar problem is discovered later. I've been trying to populate temp tables and get distinct results, but not quite getting there.

Thoughts?



Phil Parkin
Here is one way, using running totals & an additional column.

1) Create a FamilyId column on Members_GD

alter table dbo.Members_GD
add FamilyId int



2) Populate the FamilyId column using grotty double quirky update thingy:

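-- Treats a row with a non-NULL PaidDate as the last row of a "family".
-- Caution: this "quirky update" depends on rows being processed in
-- MemID order, which SQL Server does not guarantee.
declare @FamilyId int = 1   -- family number given to the current row
declare @Changed int = 0    -- 1 when the previous row closed a family

update Members_GD
set @FamilyId = FamilyId = @FamilyId + (
        case
            when @Changed = 0
                then 0      -- same family as the previous row
            else 1          -- previous row closed a family: start the next one
        end
    )
    ,@Changed = (
        case
            when PaidDate is null
                then 0
            else 1          -- this row has a PaidDate, so it closes its family
        end
    )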



3) Now that's in place, all that is required is a simple GROUP BY:

select MemId = min(MemId)
,JoinDate = min(JoinDate)
,PaidDate = max(PaidDate)
from Members_GD
group by FamilyId
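
The GROUP BY in step 3 only reports the merged rows; the stated goal also needs an UPDATE and a DELETE. A minimal sketch of that last step follows, assuming FamilyId has already been populated on Members_GD and that the "oldest" record in a family is the one with the lowest MemID. The aliases KeepMemID and MaxPaidDate are made up for this example. It ends with a ROLLBACK so it can be tried safely.

```sql
-- Sketch only: assumes dbo.Members_GD.FamilyId is already populated and
-- that the lowest MemID in a family is the record to keep.
BEGIN TRANSACTION;

-- Copy the most recent PaidDate in each family onto the record being kept.
UPDATE m
SET    PaidDate = g.MaxPaidDate
FROM   dbo.Members_GD AS m
JOIN  (SELECT FamilyId,
              MIN(MemID)    AS KeepMemID,    -- hypothetical alias
              MAX(PaidDate) AS MaxPaidDate   -- hypothetical alias
       FROM   dbo.Members_GD
       WHERE  FamilyId IS NOT NULL
       GROUP BY FamilyId) AS g
       ON g.KeepMemID = m.MemID;

-- Delete every record that has a lower-numbered sibling in the same family.
DELETE m
FROM   dbo.Members_GD AS m
WHERE  EXISTS (SELECT 1
               FROM   dbo.Members_GD AS k
               WHERE  k.FamilyId = m.FamilyId
                 AND  k.MemID    < m.MemID);

-- Inspect the result before making it permanent.
SELECT MemID, JoinDate, PaidDate
FROM   dbo.Members_GD
ORDER BY MemID;

ROLLBACK TRANSACTION;   -- change to COMMIT TRANSACTION once the output looks right
```

Running the UPDATE before the DELETE matters here: the most recent PaidDate usually sits on one of the rows that is about to be removed.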




homebrew01
Garbage In, Garbage Out.

I must apologize for wasting your time by not supplying representative data originally. Your solution worked for what I posted earlier, but I now realize that many "families" of data do not have NULL dates.

Also, the MemID values within a "family" are not sequential. They are intermingled, so in the new sample data I made one group the odd numbers and the other the even numbers.

So I've updated my original post.
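
For the edited sample data, one possible way to link the chained duplicates is to treat every row in Duplicates_GD as a two-way link and keep pushing the smallest MemID along those links until nothing changes. This is a sketch, not a tested production script: the temp table #Family and the variable @Rows are names invented for this example, and it assumes the family should be labelled by its lowest MemID. Simply following MemIdOld backwards is not enough here, because 11 points at both 5 and 9, and those chains end at different records (1 and 3).

```sql
-- Sketch only: #Family and @Rows are hypothetical names for this example.
CREATE TABLE #Family
(MemID    int NOT NULL PRIMARY KEY,
 FamilyId int NOT NULL);

-- Start with every member in a family of its own.
INSERT INTO #Family (MemID, FamilyId)
SELECT MemID, MemID
FROM   dbo.Members_GD;

DECLARE @Rows int = 1;

-- Repeat until a pass changes nothing.
WHILE @Rows > 0
BEGIN
    -- For each member, look at the FamilyId of every member it is linked to
    -- (in either direction) and adopt the smallest one if it is lower.
    UPDATE f
    SET    FamilyId = n.MinFamilyId
    FROM   #Family AS f
    JOIN  (SELECT e.MemID, MIN(f2.FamilyId) AS MinFamilyId
           FROM  (SELECT MemIdNew AS MemID, MemIdOld AS OtherID
                  FROM   dbo.Duplicates_GD
                  UNION ALL
                  SELECT MemIdOld, MemIdNew
                  FROM   dbo.Duplicates_GD) AS e
           JOIN   #Family AS f2
                  ON f2.MemID = e.OtherID
           GROUP BY e.MemID) AS n
           ON n.MemID = f.MemID
    WHERE  n.MinFamilyId < f.FamilyId;

    SET @Rows = @@ROWCOUNT;
END;

-- One row per member, labelled with the lowest MemID in its family.
SELECT MemID, FamilyId
FROM   #Family
ORDER BY FamilyId, MemID;
```

With the sample rows above, the odd MemIDs should all end up with FamilyId 1 and the even MemIDs with FamilyId 2. If the FamilyId column from the earlier reply has been added to Members_GD, it could be filled from #Family with an ordinary UPDATE join, after which a GROUP BY on FamilyId applies to this data as well.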


