• I've continued to scratch my head over this one, and I think I have a better solution now. What was troubling me is the fact that there is no good way to identify the "original" row from a "duplicate" row, especially with no row ID value, such as a primary key, for the set of data. In the end I decided the only way to approach this was to compare every row with every other row for the same Student name to see if any of the pertinent columns match, and if there is a match on any of them, flag it as a possible duplicate row. This still requires some way to identify the rows uniquely, so I added a new CTE to supply a unique row number for each row, and then in the main query I use a series of subqueries and a FOR XML trick I learned awhile back to check for likely duplicate rows and list their row numbers.

    I feel pretty confident that this will yield meaningful data, though it might take awhile to run against "real world" data because of all the subqueries that are essentially partial cross-joins for the complete set of data.

    In the example below I added a few rows for the student "Jack" to represent most of the variations in populated vs. null data points.

    with

    MyData as

    (SELECT 'Alison' as Student, NULL as IDNo, '1111' as Tel3, '4444' as Tel1 UNION ALL

    SELECT 'Alison' as Student, '80405088' as IDNo, '1111' as Tel3, '4444' as Tel1 UNION ALL

    SELECT 'Amber' as Student, NULL as IDNo, '5555' as Tel3, '6666' as Tel1 UNION ALL

    SELECT 'Amber' as Student, NULL as IDNo, '7777' as Tel3, '7777' as Tel1 UNION ALL

    SELECT 'Carmen' as Student, NULL as IDNo, '8888' as Tel3, '8888' as Tel1 UNION ALL

    SELECT 'Carmen' as Student, '20083085' as IDNo, '9999' as Tel3, '8888' as Tel1

    union all select 'Jack', '20001111', '1212', '3232'

    union all select 'Jack', '20001111', '1212', '3232'

    union all select 'Jack', null, '1212', '3232'

    union all select 'Jack', null, null, '3232'

    union all select 'Jack', null, null, null

    union all select 'Jack', '20001111', null, null

    union all select 'Jack', null, '3232', '1212'),

    MyDataWithRowID as

    (select RowID = row_number() over (order by (select null)), Student, IDNo, Tel3, Tel1 from MyData)

    select

    RowID,

    Student,

    IDNo,

    Tel3,

    Tel1,

    IDNo_MatchingRows = stuff((select ', ' + cast(RowID as varchar) from MyDataWithRowID m1 where m.Student = m1.Student and m.RowID <> m1.RowID and m.IDNo = m1.IDNo order by m1.RowID for xml path(''), type).value('.', 'varchar(max)'), 1, 2, ''),

    Tel3_MatchingRows = stuff((select ', ' + cast(RowID as varchar) from MyDataWithRowID m1 where m.Student = m1.Student and m.RowID <> m1.RowID and m.Tel3 = m1.Tel3 order by m1.RowID for xml path(''), type).value('.', 'varchar(max)'), 1, 2, ''),

    Tel1_MatchingRows = stuff((select ', ' + cast(RowID as varchar) from MyDataWithRowID m1 where m.Student = m1.Student and m.RowID <> m1.RowID and m.Tel1 = m1.Tel1 order by m1.RowID for xml path(''), type).value('.', 'varchar(max)'), 1, 2, ''),

    Tel1ToTel3_MatchingRows = stuff((select ', ' + cast(RowID as varchar) from MyDataWithRowID m1 where m.Student = m1.Student and m.RowID <> m1.RowID and (m.Tel1 = m1.Tel3 or m.Tel3 = m1.Tel1) order by m1.RowID for xml path(''), type).value('.', 'varchar(max)'), 1, 2, ''),

    All_Identified_Duplicate_Rows = stuff((select ', ' + cast(RowID as varchar) from MyDataWithRowID m1 where m.Student = m1.Student and m.RowID <> m1.RowID and (m.IDNo = m1.IDNo or m.Tel3 = m1.Tel3 or m.Tel1 = m1.Tel1 or m.Tel1 = m1.Tel3 or m.Tel3 = m1.Tel1) order by m1.RowID for xml path(''), type).value('.', 'varchar(max)'), 1, 2, ''),

    Tel1_Equals_Tel3 = case when m.Tel1 = m.Tel3 then 'Yes' else 'No' end

    from

    MyDataWithRowID m