﻿<?xml version='1.0' encoding='UTF-8'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/"><channel><title>SQLServerCentral / SQL Server 2005 / T-SQL (SS2K5)  / Convoluted DeDuplicate Related Records / Latest Posts</title><generator>InstantForum.NET v2.9.0</generator><description>SQLServerCentral</description><link>http://www.sqlservercentral.com/Forums/</link><webMaster>notifications@sqlservercentral.com</webMaster><lastBuildDate>Wed, 22 May 2013 04:18:13 GMT</lastBuildDate><ttl>20</ttl><item><title>RE: Convoluted DeDuplicate Related Records</title><link>http://www.sqlservercentral.com/Forums/Topic1420296-338-1.aspx</link><description>Garbage In, Garbage Out.

I must apologize for wasting your time &amp; not supplying representative data originally. Your solution worked based on what I posted earlier, but now I realize that many "families" of data do not have NULL dates. :blush:

Also, the MemID records are not sequential for a "family" of records. They are intermingled, so I made one group odd numbers and one group even numbers.

So I've updated my original post.</description><pubDate>Thu, 14 Feb 2013 18:51:58 GMT</pubDate><dc:creator>homebrew01</dc:creator></item><item><title>RE: Convoluted DeDuplicate Related Records</title><link>http://www.sqlservercentral.com/Forums/Topic1420296-338-1.aspx</link><description>Here is one way, using running totals &amp; an additional column.

1) Create a FamilyId column on Members_GD
[code="sql"]
alter table dbo.Members_GD
add FamilyId int
[/code]
2) Populate the FamilyId column using grotty double quirky update thingy:
[code="sql"]
declare @FamilyId int = 1
declare @Changed int = 0

-- @Changed carries over from the previously updated row, so FamilyId moves
-- to the next number on the row after one with a non-NULL PaidDate.
-- This relies on rows being updated in the intended order, which is not guaranteed.
update Members_GD
set @FamilyId = FamilyId = @FamilyId + (
		case 
			when @Changed = 0
				then 0
			else 1
			end
		)
	,@Changed = (
		case 
			when PaidDate is null
				then 0
			else 1
			end
		)
[/code]
3) Now that's in place, all that is required is a simple GROUP BY:
[code="sql"]
select MemId = min(MemId)
	,JoinDate = min(JoinDate)
	,PaidDate = max(PaidDate)
from
Members_GD
group by FamilyId
[/code]</description><pubDate>Thu, 14 Feb 2013 15:30:03 GMT</pubDate><dc:creator>Phil Parkin</dc:creator></item><item><title>Convoluted DeDuplicate Related Records</title><link>http://www.sqlservercentral.com/Forums/Topic1420296-338-1.aspx</link><description>I'm not sure of the right terms here, but I have a puzzle that is easy to see, yet I'm stuck on how to actually code for it. We have duplicate member records that have been identified, but the problem is that they are sort of many-to-many: duplicate #7 is a duplicate of #6, and #6 is a duplicate of #4, so I don't see how to make the link between #7 and #4.

Here's some EDITED sample data:
[code="sql"]
CREATE TABLE dbo.Duplicates_GD
	(Ident int NOT NULL IDENTITY (1, 1),
	MemIdNew int NOT NULL,
	MemIdOld int NOT NULL )  ON [PRIMARY]

CREATE TABLE dbo.Members_GD
	(MemID int NOT NULL,
	JoinDate datetime NULL,
	PaidDate datetime NULL)  ON [PRIMARY]

insert into Members_GD (MemID, JoinDate, PaidDate) values(1, '01-01-2013', '01-15-2013')
insert into Members_GD (MemID, JoinDate, PaidDate) values(2, '02-01-2013', '02-15-2013')
insert into Members_GD (MemID, JoinDate, PaidDate) values(3, '03-01-2013', '03-15-2013')
insert into Members_GD (MemID, JoinDate, PaidDate) values(4, '04-01-2013', '04-15-2013')
insert into Members_GD (MemID, JoinDate, PaidDate) values(5, '05-01-2013', '05-15-2013')
insert into Members_GD (MemID, JoinDate, PaidDate) values(6, '06-01-2013', '06-15-2013')
insert into Members_GD (MemID, JoinDate, PaidDate) values(7, '07-01-2013', '07-15-2013')
insert into Members_GD (MemID, JoinDate, PaidDate) values(8, '08-01-2013', '08-15-2013')
insert into Members_GD (MemID, JoinDate, PaidDate) values(9, '09-01-2013', '09-15-2013')
insert into Members_GD (MemID, JoinDate, PaidDate) values(10, '10-01-2013', '10-15-2013')
insert into Members_GD (MemID, JoinDate, PaidDate) values(11, '11-01-2013', '11-15-2013')
insert into Members_GD (MemID, JoinDate, PaidDate) values(12, '12-01-2013', '12-15-2013')

insert
into Duplicates_GD (MemIdNew, MemIdOld) values (11,5)
insert into Duplicates_GD (MemIdNew, MemIdOld) values(7,3)
insert into Duplicates_GD (MemIdNew, MemIdOld) values(9,7)
insert into Duplicates_GD (MemIdNew, MemIdOld) values(11,9)
insert into Duplicates_GD (MemIdNew, MemIdOld) values(5,1)
insert into Duplicates_GD (MemIdNew, MemIdOld) values(6,4)
insert into Duplicates_GD (MemIdNew, MemIdOld) values(10,8)
insert into Duplicates_GD (MemIdNew, MemIdOld) values(10,6)
insert into Duplicates_GD (MemIdNew, MemIdOld) values(8,2)
insert into Duplicates_GD (MemIdNew, MemIdOld) values(12,10)

select * from Duplicates_GD order by MemIdNew, MemIdOld
select * from Members_GD order by MemID
[/code]
The even MemIDs (2, 4, 6, 8, 10, 12) are related as a chain of records in the same "family", i.e. group 1, and the odd MemIDs 1 through 11 are related as a chain of records in the same "family", i.e. group 2.

Goal: Update the oldest record's PaidDate from the most recent record in each "family", then delete all the records in each "family" except for the oldest, as below:
[code="sql"]
Select * from Members_GD

MemID	JoinDate	PaidDate
1	2013-01-01 	2013-11-15
2	2013-01-08	2013-12-15
[/code]
It is a one-time cleanup, so I'm not looking for the most efficient or elegant method, just something I can understand and possibly repeat if a similar problem is discovered later. I've been trying to populate temp tables and get distinct results, but I'm not quite getting there.

Thoughts?</description><pubDate>Thu, 14 Feb 2013 14:10:41 GMT</pubDate><dc:creator>homebrew01</dc:creator></item></channel></rss>