identifying many to many relationship between columns

  • Hello All,

    I am struggling with something that is conceptually simple but I'm at a loss at how to implement it. I have two columns with a many to many relationship between the columns;

    ITEM1 ITEM2

    0224180 0224181

    0224180 0224190

    0224181 0224180

    0224181 0224190

    0224190 0224180

    0224190 0224181

    0202294 0202295

    0202295 0202294

    0209250 0209251

    0209251 0209250

    I need to add a single identifier that will allow these cross referenced parts to be queried. The desired outcome would be something like;

    ITEM1 ITEM2 ID

    0224180 0224181 1

    0224180 0224190 1

    0224181 0224180 1

    0224181 0224190 1

    0224190 0224180 1

    0224190 0224181 1

    0202294 0202295 2

    0202295 0202294 2

    0209250 0209251 3

    0209251 0209250 3

    I hope this is clear and can attach a script or data if necessary.

    Thanks in Advance,

    Lonnie

  • Not sure you can show a many-to-many relationship without a child table. That's just the standard way to do it... the standard example is

    Invoice--(1,M)--LineItem---(M,1)--Product

    where the LineItem table breaks the M-M relationship between Invoice and Product into two 1-M relationships.

  • PL is probably correct in the way this should be handled from a theoretical perspective.

    On the other hand, this may work but also may not be particularly performant.

    DECLARE @T1 TABLE (ITEM1 INT, ITEM2 INT);

    INSERT INTO @T1

    SELECT 0224180,0224181

    UNION ALL SELECT 0224180,0224190

    UNION ALL SELECT 0224181,0224180

    UNION ALL SELECT 0224181,0224190

    UNION ALL SELECT 0224190,0224180

    UNION ALL SELECT 0224190,0224181

    UNION ALL SELECT 0202294,0202295

    UNION ALL SELECT 0202295,0202294

    UNION ALL SELECT 0209250,0209251

    UNION ALL SELECT 0209251,0209250

    UNION ALL SELECT 1, 2

    UNION ALL SELECT 1, 3

    UNION ALL SELECT 1, 4

    UNION ALL SELECT 2, 3

    UNION ALL SELECT 2, 4

    UNION ALL SELECT 3, 4

    UNION ALL SELECT 2, 1

    UNION ALL SELECT 3, 1

    UNION ALL SELECT 4, 1

    UNION ALL SELECT 3, 2

    UNION ALL SELECT 4, 2

    UNION ALL SELECT 4, 3

    UNION ALL SELECT 7, 8

    UNION ALL SELECT 7, 9

    UNION ALL SELECT 8, 9

    UNION ALL SELECT 8, 7

    UNION ALL SELECT 9, 7

    UNION ALL SELECT 9, 8

    ;

    WITH rCTE AS

    (

    SELECT n=0, ITEM

    ,rn=ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    ,ITEM1=ITEM

    ,ITEM2=NULL

    FROM

    (

    SELECT DISTINCT ITEM=MIN(b.ITEM1)

    FROM

    (

    SELECT ITEM1, ITEM2, rn=ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM @T1

    ) a

    CROSS APPLY

    (

    SELECT ITEM1=CASE WHEN ITEM1 < ITEM2 THEN ITEM1 ELSE ITEM2 END

    ,ITEM2=CASE WHEN ITEM1 < ITEM2 THEN ITEM2 ELSE ITEM1 END

    FROM @T1 b

    WHERE a.ITEM1 IN (b.ITEM1, b.ITEM2)

    ) b

    GROUP BY rn

    ) a

    UNION ALL

    SELECT n+1, b.ITEM1, rn, b.ITEM1, b.ITEM2

    FROM rCTE a

    JOIN @T1 b ON a.ITEM = b.ITEM1

    WHERE n <= 1

    UNION ALL

    SELECT n+1, b.ITEM1, rn, b.ITEM1, b.ITEM2

    FROM rCTE a

    JOIN @T1 b ON a.ITEM = b.ITEM2

    WHERE n <= 1

    )

    SELECT DISTINCT ITEM1, ITEM2, ID=rn

    FROM rCTE

    WHERE n = 2

    ORDER BY rn, ITEM1, ITEM2

    Note that I added some additional sample data because I wanted to see how it would resolve out with an additional set of triplet product IDs and also a 4-tuple of product IDs.

    I'm assuming you don't care which set of grouped products gets which ID. Just that each group is numbered independently.

    Edit: Slightly simplified the query. Note that this may only work if all permutations appear in the adjacency list.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Thank you Dwain. I'll read through the code and give it a try this morning!

  • On a quick read through I noticed your data type is int. This would drop the leading zeros on many of my skus. When I run this with the varchar type I get:

    Msg 240, Level 16, State 1, Line 1

    Types don't match between the anchor and the recursive part in column "STD SKU" of recursive query "rCTE".

    I'll keep working through this but thought I'd throw that out there if you have a quick reply.

    Thanks,

  • I had another suggestion (from a different forum) that provided a solution. It may not be the most eloquent but it ran on 98K rows in 8 minutes.

    This is from "Lamprey" on the SQLTeam .com forum:

    DECLARE @Foo TABLE (ITEM1 INT, ITEM2 INT, ID INT)

    INSERT @Foo (ITEM1, ITEM2) VALUES

    (0224180, 0224181),

    (0224180, 0224190),

    (0224181, 0224180),

    (0224181, 0224190),

    (0224190, 0224180),

    (0224190, 0224181),

    (0202294, 0202295),

    (0202295, 0202294),

    (0209250, 0209251),

    (0209251, 0209250)

    DECLARE @Val1 INT;

    DECLARE @Val2 INT;

    DECLARE @Iterator INT = 1;

    -- Prime Loop

    SELECT TOP 1

    @Val1 = Item1,

    @Val2 = Item2

    FROM

    @Foo

    ORDER BY

    Item1

    WHILE @Val1 IS NOT NULL

    BEGIN

    -- Perform Update

    UPDATE

    @Foo

    SET

    ID = @Iterator

    WHERE

    Item1 IN (@Val1, @Val2)

    OR Item2 IN (@Val1, @Val2);

    -- Get next item/group

    SELECT TOP 1

    @Val1 = Item1,

    @Val2 = Item2

    FROM

    @Foo

    WHERE

    Item1 NOT IN (@Val1, @Val2)

    AND Item2 NOT IN (@Val1, @Val2)

    AND ID IS NULL

    ORDER BY

    Item1

    IF @@ROWCOUNT = 0

    BEGIN

    SET @Val1 = NULL;

    END

    SET @Iterator = @Iterator + 1;

    END

    SELECT *

    FROM @Foo

  • lmeinke (11/12/2013)


    On a quick read through I noticed your data type is int. This would drop the leading zeros on many of my skus. When I run this with the varchar type I get:

    Msg 240, Level 16, State 1, Line 1

    Types don't match between the anchor and the recursive part in column "STD SKU" of recursive query "rCTE".

    I'll keep working through this but thought I'd throw that out there if you have a quick reply.

    Thanks,

    That error is pretty simple to resolve by CASTing the NULL assignment to ITEM2 in the rCTE anchor to a VARCHAR type, the same as your item codes.

    DECLARE @T1 TABLE (ITEM1 VARCHAR(20), ITEM2 VARCHAR(20));

    INSERT INTO @T1

    SELECT '0224180','0224181'

    UNION ALL SELECT '0224180','0224190'

    UNION ALL SELECT '0224181','0224180'

    UNION ALL SELECT '0224181','0224190'

    UNION ALL SELECT '0224190','0224180'

    UNION ALL SELECT '0224190','0224181'

    UNION ALL SELECT '0202294','0202295'

    UNION ALL SELECT '0202295','0202294'

    UNION ALL SELECT '0209250','0209251'

    UNION ALL SELECT '0209251','0209250';

    WITH rCTE AS

    (

    SELECT n=0, ITEM

    ,rn=ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    ,ITEM1=ITEM

    ,ITEM2=CAST(NULL AS VARCHAR(20))

    FROM

    (

    SELECT DISTINCT ITEM=MIN(b.ITEM1)

    FROM

    (

    SELECT ITEM1, ITEM2, rn=ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM @T1

    ) a

    CROSS APPLY

    (

    SELECT ITEM1=CASE WHEN ITEM1 < ITEM2 THEN ITEM1 ELSE ITEM2 END

    ,ITEM2=CASE WHEN ITEM1 < ITEM2 THEN ITEM2 ELSE ITEM1 END

    FROM @T1 b

    WHERE a.ITEM1 IN (b.ITEM1, b.ITEM2)

    ) b

    GROUP BY rn

    ) a

    UNION ALL

    SELECT n+1, b.ITEM1, rn, b.ITEM1, b.ITEM2

    FROM rCTE a

    JOIN @T1 b ON a.ITEM = b.ITEM1

    WHERE n <= 1

    UNION ALL

    SELECT n+1, b.ITEM1, rn, b.ITEM1, b.ITEM2

    FROM rCTE a

    JOIN @T1 b ON a.ITEM = b.ITEM2

    WHERE n <= 1

    )

    SELECT DISTINCT ITEM1, ITEM2, ID=rn

    FROM rCTE

    WHERE n = 2

    ORDER BY rn, ITEM1, ITEM2;

    I would love to hear how this performs against your 98K rows.

    Edit: Minor edit to the sample data.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • BTW. I thought this was a really fun problem. Best I've seen on the forums in weeks.

    I have an idea for another way to solve it that might be just a tad faster than my rCTE, which I'm betting will beat the pants off the RBAR UPDATE that other forum suggested. I'm talking about the rCTE I already posted that is.

    Setting up a performance test harness might be a bit of a challenge. Since you have piqued my interest, I may continue on it for a bit.

    Also, about the note where I indicated the product combinations present must include all permutations. I have a way around that now also, in case you need it.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Dwain,

    I appreciate your interest and effort on this. I would like to find a more efficient way to handle this but after changing the ITEM1 and ITEM2 to the appropriate column names and changing @T1 to my actual table name I get these errors;

    Msg 102, Level 15, State 1, Line 18

    Incorrect syntax near '='.

    Msg 156, Level 15, State 1, Line 29

    Incorrect syntax near the keyword 'WHERE'.

    Msg 156, Level 15, State 1, Line 34

    Incorrect syntax near the keyword 'WHERE'.

    I will try to get back to this this afternoon but thought I owed you a response.

  • Correction....those errors were due to my sloppy typing....but back to the original error:

    Msg 240, Level 16, State 1, Line 1

    Types don't match between the anchor and the recursive part in column "STD SKU" of recursive query "rCTE".

  • Can someone please help me to catch up? I feel I'm not understanding the issue. Using the OP's initial data:

    ITEM1 ITEM2 ID

    0224180 0224181 1

    0224180 0224190 1

    0224181 0224180 1

    0224181 0224190 1

    0224190 0224180 1

    0224190 0224181 1

    0202294 0202295 2

    0202295 0202294 2

    0209250 0209251 3

    0209251 0209250 3

    Why, for example, does the last value pair "0209251, 0209250" get an ID of 3? (This value pair appears only 2 times in the data.) Or why does the value pair "0224190, 0224181" get an ID of 1? (This value pair appears twice in the data.)

    Thanks, am hoping to grasp what's going on here so I can learn from it too! 🙂


    "If I had been drinking out of that toilet, I might have been killed." -Ace Ventura

  • lmeinke (11/13/2013)


    Correction....those errors were due to my sloppy typing....but back to the original error:

    Msg 240, Level 16, State 1, Line 1

    Types don't match between the anchor and the recursive part in column "STD SKU" of recursive query "rCTE".

    In my original post, the rCTE started like this:

    WITH rCTE AS

    (

    SELECT n=0, ITEM

    ,rn=ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    ,ITEM1=ITEM

    ,ITEM2=NULL

    FROM

    In my suggested correction for this error, it looks like this.

    WITH rCTE AS

    (

    SELECT n=0, ITEM

    ,rn=ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    ,ITEM1=ITEM

    ,ITEM2=CAST(NULL AS VARCHAR(20))

    FROM

    So did you remember to add the CAST around the NULL for ITEM2? The length of the VARCHAR should match whatever the length of your product code is in the table.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • autoexcrement (11/13/2013)


    Can someone please help me to catch up? I feel I'm not understanding the issue. Using the OP's initial data:

    ITEM1 ITEM2 ID

    0224180 0224181 1

    0224180 0224190 1

    0224181 0224180 1

    0224181 0224190 1

    0224190 0224180 1

    0224190 0224181 1

    0202294 0202295 2

    0202295 0202294 2

    0209250 0209251 3

    0209251 0209250 3

    Why, for example, does the last value pair "0209251, 0209250" get an ID of 3? (This value pair appears only 2 times in the data.) Or why does the value pair "0224190, 0224181" get an ID of 1? (This value pair appears twice in the data.)

    Thanks, am hoping to grasp what's going on here so I can learn from it too! 🙂

    I'll try to explain. Look at the last 2 rows containing 0209250 and 0209251. These share a relationship in that either product can be substituted for the other. The same is true for 0202295 and 0202294 (in the rows with ID = 2).

    The first 6 rows all contain a pair chosen from any of these 3 products: 0224180, 0224181 and 0224190. Hence they share a relationship that any can be substituted for any other.

    Therefore ID = Product Substitution Category (or something like that)

    This is an excellent SQL problem to learn from as it is fiendishly diabolical to solve with high performance.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Wow, I never in a million years would have been able to interpret the data in that way. I still don't exactly understand how you were able to see this pattern there, or how you determined the parameters of it. Hats off, and thanks for the explanation. I can't even begin to fathom the posted SQL solution. 🙁


    "If I had been drinking out of that toilet, I might have been killed." -Ace Ventura

  • autoexcrement (11/13/2013)


    Wow, I never in a million years would have been able to interpret the data in that way. I still don't exactly understand how you were able to see this pattern there, or how you determined the parameters of it. Hats off, and thanks for the explanation. I can't even begin to fathom the posted SQL solution. 🙁

    The solution is even more fiendish if only limited substitutions are allowed.

    Think about it another way. Each product in the two column table represents a node on a graph. Each row of the table represents an edge that connects two nodes in a specified direction. For the case of IDs 2 and 3, the 2 rows for each represent a single bi-directional edge on the graph. In a graph where all substitutions are allowed, there will exist exactly n (number of products) * (n - 1) edges. Limited substitution means there are less than that.

    Having a little background in graph theory helps.

    Edit: Added parentheses around (n - 1) to override the standard operator precedence for clarity.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply