number the Group on data from two columns duplicated

  • Hi All,

    i have a table LinkTable which contains two intger columns ID1 and ID2 which may contain duplicate data interchanged in different rows

    Sample data of the table,

    SID(primary key)| ID1 | ID2

    =================================

    1 | 101 | 102

    2 | 102 | 101

    3 | 103 | 104

    4 | 103 | 105

    5 | 104 | 105

    6 | 106 | 104

    7 | 107 | 108

    I need a sql query which will form groups based on values of ID1 and ID2 and need to number the group starting from 1

    ID1 | ID2

    ================================================

    101 | 102 => 1 and 2 will go to group 1

    102 | 101 => Already 1 and 2 is added so ignore

    103 | 104 => 3 and 4 will make group 2

    103 | 105 => 5 will go to group 2 as its ID1 is 3, which in turn is part of group 1

    104 | 105 => ignore since 4 and 5 already added

    106 | 104 => 6 will go to group 2 as its ID2 4 is of group 2

    107 | 108 => 7 and 8 will make group 3

    so the result will look like this :

    Group | Member

    =======|============

    1 | 101

    1 | 102

    2 | 103

    2 | 104

    2 | 105

    2 | 106

    3 | 107

    3 | 108

    Please help me to solve this problem,

    Thanks

    Sh.

  • Would you mind sharing the business case that would use such a strange data structure?

    Regarding your requirement:

    Am I right that you'd need to start a new group as soon as neither one of the numbers in one row has been previously assigned to a group?

    If so, how would you handle row 6 to be "6 | 101 | 104" instead of "6 | 106 | 104" (both values in a row are preassigned, but to different groups)?

    Are the groups supposed to be consecutive or can you "jump back" to let's say group 1 if row 8 would be 8 | 101 | 109?

    Again: really strange requirement...



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Hi Lutz,

    Thanks for the quick reply,

    Regarding the requirement ,

    In LinkTable , ID1 contains the uniqueid of a transaction and ID2 contans the uniqueid of the transaction related to it,

    One transaction can be related to 0 or many transactions,

    We are in the process of creating specific groups for all related transaction.

    Quote: Am I right that you'd need to start a new group as soon as neither one of the numbers in one row has been previously assigned to a group?

    Ans: Yes, you are correct ,

    Quote:If so, how would you handle row 6 to be "6 | 101 | 104" instead of "6 | 106 | 104" (both values in a row are preassigned, but to different groups)?

    Are the groups supposed to be consecutive or can you "jump back" to let's say group 1 if row 8 would be 8 | 101 | 109?

    Ans :

    If the linktable contains data :

    SID(primary key)|ID1 |ID2

    ====================

    1| 101 | 102

    2| 102 | 101

    3| 103 | 104

    4| 103 | 105

    5| 104 | 105

    6| 101 | 104

    7| 107 | 108

    8| 101 | 109

    The expected output is:

    GroupNum| ID

    ==========

    1| 101

    1| 102

    1| 103

    1| 104

    1| 105

    2| 107

    2| 108

    1| 109

    Thanks & Regards

    Sh.

  • I thought about applying the concept of a hierarchical CTE (aka recursive CTE, see BOL for details). Basically, I would have used the smaller ID per row as "manager" and the other number as "employee" (this would actually take care of (101,102) and (102,101) not to cause an infinite loop.

    But after spending quite some time with it, I have to give up.

    The sample data you provided do not show any consistent logical pattern to apply a hierarchy structure to.

    Examples:

    In your second set of sample "employee" 105 would have to report to "manager" 104 and both have to report to 103 at the same time.

    As far as I can see, the only way to solve that puzzle would be using a *cough* cursor *cough**cough* since it would handle one row at a time, ignoring inconsistent data. (Un)fortunately, my SQL Server version is not capable of programming a cursor (or maybe I'm not. Can't tell.) 😉

    But the much more recommended way would be to get normalized data. Those could be transformed using hierarchy tools like recursive CTEs.

    Btw: I can't think of a business case using transaction IDs that are related in a many to many relationship including circular reference across several levels...



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • shaini.binu (1/14/2010)


    Hi Lutz,

    Thanks for the quick reply,

    Regarding the requirement ,

    In LinkTable , ID1 contains the uniqueid of a transaction and ID2 contans the uniqueid of the transaction related to it,

    One transaction can be related to 0 or many transactions,

    We are in the process of creating specific groups for all related transaction.

    [font="Arial Black"]Quote: Am I right that you'd need to start a new group as soon as neither one of the numbers in one row has been previously assigned to a group?

    Ans: Yes, you are correct ,[/font]

    Quote:If so, how would you handle row 6 to be "6 | 101 | 104" instead of "6 | 106 | 104" (both values in a row are preassigned, but to different groups)?

    Are the groups supposed to be consecutive or can you "jump back" to let's say group 1 if row 8 would be 8 | 101 | 109?

    Ans :

    If the linktable contains data :

    SID(primary key)|ID1 |ID2

    ====================

    1| 101 | 102

    2| 102 | 101

    3| 103 | 104

    4| 103 | 105

    5| 104 | 105

    6| 101 | 104

    7| 107 | 108

    8| 101 | 109

    The expected output is:

    GroupNum| ID

    ==========

    1| 101

    1| 102

    1| 103

    1| 104

    1| 105

    2| 107

    2| 108

    1| 109

    Thanks & Regards

    Sh.

    Then why doesn't 103, 104, and 105 live in a group 2 by themselves?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (1/14/2010)


    shaini.binu (1/14/2010)


    [font="Arial Black"]Quote: Am I right that you'd need to start a new group as soon as neither one of the numbers in one row has been previously assigned to a group?

    Ans: Yes, you are correct ,[/font]

    Ans :

    If the linktable contains data :

    SID(primary key)|ID1 |ID2

    ====================

    1| 101 | 102

    2| 102 | 101

    3| 103 | 104

    4| 103 | 105

    5| 104 | 105

    6| 101 | 104

    7| 107 | 108

    8| 101 | 109

    The expected output is:

    GroupNum| ID

    ==========

    1| 101

    1| 102

    1| 103

    1| 104

    1| 105

    2| 107

    2| 108

    1| 109

    Thanks & Regards

    Sh.

    Then why doesn't 103, 104, and 105 live in a group 2 by themselves?

    As far as I would read it, row #6 makes 103 to 105 belonging to group 1.

    Instead of "previously assigned" the term probably should be "assigned anywhere in the table".

    I think to solve this puzzle you'd need to order the rows asc by the value of either id1 or id2 (per row), depending on which one is less than the other...

    Using the example from above you'd end up with

    SID(primary key)|ID1 |ID2

    ====================

    1| 101 | 102

    2| 101 | 102 (ID1 and ID2 switched)

    6| 101 | 104

    8| 101 | 109

    3| 103 | 104

    4| 103 | 105

    5| 104 | 105

    7| 107 | 108

    ... at least that's what I guess should be done to get the expected result...



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Hi,

    I have solved the puzzle with the help of cursurs, Thanks for all the help,

    Thanks

    Sh.

  • shaini.binu (1/15/2010)


    Hi,

    I have solved the puzzle with the help of cursurs, Thanks for all the help,

    Thanks

    Sh.

    Two way street here, shaini... please post the code for your solution.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Would I be correct in thinking that these are reconciliation groups?

    If you have 3 receipts and 10 invoices with cross associations in the reconciliation they all need to be in the same reconciliation group....

    not sure exactly how to solve the problem yet, but cursors is NOT a nice solution.

    Which version on MSSQL are you on.

  • Jeff Moden (1/15/2010)


    shaini.binu (1/15/2010)


    Hi,

    I have solved the puzzle with the help of cursurs, Thanks for all the help,

    Thanks

    Sh.

    Two way street here, shaini... please post the code for your solution.

    Heh... so much for Shaini... 😉 Stuck with cursors and doesn't even know he has a problem... yet.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • This kept me awake half the night - I knew it shouldn't be too difficult. (would have been a nightmare in SQL2000)

    If you were to put this in a production environment or an sp, you could wrap the repeat block in a do while...loop

    I think the current data has to go through 4 iterations to get the required results

    begin try

    drop table #x1

    end try

    begin catch

    -- no table to drop

    end catch

    begin try

    drop table #x2

    end try

    begin catch

    -- no table to drop

    end catch

    -- bulid the base data

    create table #x1

    (

    IDint,

    value1 int,

    value2 int

    )

    insert into #x1 (ID,value1,value2) values (1, 101 , 102)

    insert into #x1 (ID,value1,value2) values (2, 102 , 101 )

    insert into #x1 (ID,value1,value2) values (3, 103 , 104 )

    insert into #x1 (ID,value1,value2) values (4, 103 , 105 )

    insert into #x1 (ID,value1,value2) values (5, 104 , 105 )

    insert into #x1 (ID,value1,value2) values (6, 101 , 104 )

    insert into #x1 (ID,value1,value2) values (7, 107 , 108 )

    insert into #x1 (ID,value1,value2) values (8, 101 , 109)

    -- normalise the data

    Create table #x2

    (

    ID int identity(1,1),

    PairID int,

    Value int,

    GroupID int

    )

    insert into #x2 select ID,value1,ID from #X1

    insert into #x2 select ID,value2,ID from #X1

    /* repeat the bit below until you get 0 rows affected */

    -- get the lowest grouping for each value

    with mingroup as

    (select value,min(groupID) as 'mingroupID' from #x2 group by value

    )

    ,

    x3 as

    (

    select

    x.PairID,

    x.value,

    x.GroupID,

    mg.mingroupID

    from

    #x2 x

    join

    mingroup mg on x.value = mg.value

    )

    ,

    x4 as(

    select pairID,min(mingroupID) as mingroupID from x3 group by PairID

    )

    update #x2

    set #x2.GroupID = x4.mingroupID

    from #x2 inner join x4 on #x2.pairID = x4.pairID

    where #x2.GroupID <> x4.mingroupID

    /* End Repeat */

    -- show the full results

    select distinct groupID,Value from #X2 order by GroupID,Value

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply