How to collapse Matched Pairs into Groups?

  • I'm sure I'm missing something because this feels like it should be easy but I just can't get my head around it.

    I've got a whole bunch of records I need to dedupe based on a series of potential criteria. I've built a cte (not included here because I don't think it's relevant but am happy to add it if it helps) that pulls out all the paired ids from that data. Now I need to collapse those pairs into a series of groups.

    So given the following sample data:-

    Create Table #DummyData

    (P1 Int,

    P2 Int)

    Insert into #DummyData values (1,1)

    Insert into #DummyData values (2,2)

    Insert into #DummyData values (3,3)

    Insert into #DummyData values (4,3)

    Insert into #DummyData values (3,4)

    Insert into #DummyData values (4,4)

    Insert into #DummyData values (5,5)

    Insert into #DummyData values (6,6)

    Insert into #DummyData values (6,7)

    Insert into #DummyData values (6,8)

    Insert into #DummyData values (7,6)

    Insert into #DummyData values (7,7)

    Insert into #DummyData values (7,8)

    Insert into #DummyData values (8,6)

    Insert into #DummyData values (8,7)

    Insert into #DummyData values (8,8)

    Insert into #DummyData values (3,8)

    Insert into #DummyData values (8,3)

    I would want to return the following:-

    ID Group

    1 1

    2 2

    3 3

    4 3

    5 4

    6 5

    7 5

    8 5

    3 6

    8 6

    A few rules which might not be obvious from the data:-

    1. An arbitrary sequence will be fine for the Group Identifier, as long as it uniquely identifies the group. I envisaged using a Dense_Rank but I'm open to any suggestion.

    2. A pairing on one direction will always have an associated pairing in the other direction. I.e. if there's a pairing of 3,4 then there must be a pairing of 4,3. I'm happy that the data I'm using conforms to this rule.

    3. An ID can belong to more than one group. E.g. in the example, ID 3 belongs to group 3 (alongside ID 4) and Group 6 (alongside ID 8). Similarly ID 8 belongs to Group 5 (alongside IDs 6 and 7) and Group 6 (alongside ID 3).

    This feels like it should be easy but I've been mucking around with it all morning and can't quite seem to crack it. I could fall back on a cursor but I suspect that will be a performance hog.

    any suggestion? Am I missing anything obvious?

  • What defines a pair? This just seems to be totally random what values get paired. There does not appear to be any solid business rule here.

    Why is 8,5 a pair? What about 6,5? 7,5? These don't exist in your data but are somehow in the output. The biggest problem here is you need to provide clear business logic.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Hi Sean. Thanks for the response and sorry if I was unclear. Let me see if I can clarify that a bit.

    The values in the second column on the output are entirely arbitrary. They're there to identify the group but aren't generated from the input values in any way. So, for example, IDs 3 and 4 both ended up in group 3 because of the existence of these records:-

    Insert into #DummyData values (4,3)

    Insert into #DummyData values (3,4)It didn't need to be called Group 3, it could have been called Group 99 or Group PinkElephant for all I care. I just want to know that there were pairings between IDs 3 and 4.

    Similarly IDs 6, 7 and 8 end up in Group 5 because of the existence of these records:-

    Insert into #DummyData values (6,7)

    Insert into #DummyData values (6,8)

    Insert into #DummyData values (7,6)

    Insert into #DummyData values (7,8)

    Insert into #DummyData values (8,6)

    Insert into #DummyData values (8,7)The fact that I've called the group "5" is incidental.

    Finally, I did make an edit to the original post but I think it got lost somewhere and that may have added to the confusion. I've made it again so if you look back you may notice I added a couple more inserts (sorry about that). As the post reads now you can also see that IDs 3 and 8 belong to group 6 because of the existence of these records

    Insert into #DummyData values (8,8)

    Insert into #DummyData values (3,8)

    Finally, I only want a group for a single ID is that ID cannot be grouped with another. So ID 1 gets a group of it's own because there's no other record it pairs with. ID3 doesn't get a group of it's own because it will end up in groups 3 and 6.

    I hope that helps clarify it a bit

    Edit> It might also help clarify things if I say the Pairings are the input data. The output represents the groups, not the pairings.

  • FunkyDexter (4/13/2016)


    Hi Sean. Thanks for the response and sorry if I was unclear. Let me see if I can clarify that a bit.

    The values in the second column on the output are entirely arbitrary. They're there to identify the group but aren't generated from the input values in any way. So, for example, IDs 3 and 4 both ended up in group 3 because of the existence of these records:-

    Insert into #DummyData values (4,3)

    Insert into #DummyData values (3,4)It didn't need to be called Group 3, it could have been called Group 99 or Group PinkElephant for all I care. I just want to know that there were pairings between IDs 3 and 4.

    Similarly IDs 6, 7 and 8 end up in Group 5 because of the existence of these records:-

    Insert into #DummyData values (6,7)

    Insert into #DummyData values (6,8)

    Insert into #DummyData values (7,6)

    Insert into #DummyData values (7,8)

    Insert into #DummyData values (8,6)

    Insert into #DummyData values (8,7)The fact that I've called the group "5" is incidental.

    Finally, I did make an edit to the original post but I think it got lost somewhere and that may have added to the confusion. I've made it again so if you look back you may notice I added a couple more inserts (sorry about that). As the post reads now you can also see that IDs 3 and 8 belong to group 6 because of the existence of these records

    Insert into #DummyData values (8,8)

    Insert into #DummyData values (3,8)

    Finally, I only want a group for a single ID is that ID cannot be grouped with another. So ID 1 gets a group of it's own because there's no other record it pairs with. ID3 doesn't get a group of it's own because it will end up in groups 3 and 6.

    I hope that helps clarify it a bit

    Edit> It might also help clarify things if I say the Pairings are the input data. The output represents the groups, not the pairings.

    There is still some elusive rules here that don't make any sense to me at all. Why do we know that 8,8 and 3,8 belong to group 6? Why not 5? Or why not 11? It just seems to randomly somehow belong to a group.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • If it helps clarify further, I'm fairly convinced I need some sort of recursion here.

    E.g. ID 6 associates with IDs 7 and 8. Based purely on that it would create 2 groups, one containing 6 and 7, one containing 7 and 8. However, those two groups then collapse to one because ID 7 associates with 8.

    This is not true of ID 3 which associates with both 4 and 8. That creates 2 groups. However these do not collapse to one because records 4 and 8 are not associated.

    Edit> crossed over with your last response.

    Why do we know that 8,8 and 3,8 belong to group 6? Why not 5?

    To belong to group 5, ID 3 would need to associate with IDs 6, 7, and 8. So to form a group, all the IDs within it must associate with every other ID in it. Anything else results in a separate group. Actually, as I type that it sounds wrong. Let me ponder that a bit.

    Edit> nope, I was right first time. All the elements in a group must relate to each other. If there's any two elements that don't relate it would result in two groups, albeit with a lot of the same members.

  • If I understand correctly, he is looking for complete subgraphs (or Cliques).

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • I'll admit that I didn't fully understand that page :ermm: but I believe you are correct. It sounds exactly what I'm after.

  • Actually, thanks hugely for that because it gave me something I could actually google for... and guess where it led me[/url]. That's going to take a bit of digesting but I think it might take me to where I need to be.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply