Recursive cross join to get all available combinaisons

  • I think the title says it all. This little step is part of a much larger procedure.

    Bascially I have 3 documents : orders, shippings and invoices. All with headers and lines obviously.

    The fun starts with the fact that shippings and invoices have the column orderNo, but no link between the shipping and invoice. From that little misdesign, I need to match all 3 document # in a list of invoices...

    The challenge: each order can have n shippings and the major PITA is that any # of shippings (for a single order) can then be merged into a single invoice... or not.

    I have cases where a single order has 5 shippings and only 3 invoices which is where I'm hitting a wall.

    In the final query I need to have only 1 line per invoice, so when I have more than 1 shipping, I need to display it like so : "order #" "invoice #" "VBL-001234, VBL-001236"

    The only way I can safely match all documents is to go at the details level of shipments and invoices and check that I have 100% match on all rows for sku and Qty which proves I have the correct document links.

    This method works excellent if I have only 1 invoice per order even if I have n shippings.

    I have adapted my query to work with a list of possible combinaisons (ship #1 + ship #3).

    Now what I need to do is generate that list of combos. I need to generate a list of all distinct combinaisons of two or more shipping # having without containing all possible documents. I have the feeling a recursive cte could do it but I've never worked with that and I'm a little lost on how to make it work.

    So far, the worst case scenario I have is 16 shippings for a single order hence the reason I'm not building this with a series of self joins.

    Here's some sample data and required output :

    DECLARE @Base TABLE (OrderNo VARCHAR(20) NOT NULL, ShippingNo VARCHAR(20) NOT NULL, GroupRowID INT NULL, PRIMARY KEY CLUSTERED (OrderNo, ShippingNO))

    INSERT INTO @Base (OrderNo, ShippingNO)

    VALUES

    ('VCB-000070', '1'),

    ('VCB-000070', '2'),

    ('VCB-000070', '3'),

    ('VCB-000071', '1'),

    ('VCB-000071', '2'),

    ('VCB-000071', '3'),

    ('VCB-000071', '4')

    for 'VCB-000070' :

    --1 --exclude

    --2 --exclude

    --3 --exclude

    1,2

    1,3

    2,3

    --1,2,3 --exclude

    for 'VCB-000071'

    --1 --exclude

    --2 --exclude

    --3 --exclude

    --4 --exclude

    1,2

    1,3

    1,4

    2,3

    2,4

    3,4

    1,2,3

    1,2,4

    2,3,4

    --1,2,3,4 --exclude

  • It may help to see the whole problem not just a piece of it. Can you put together a test suite including the expected output? I know I work better when I know the whole problem, and can see what the end result should be.

  • I'm not sure if I fully understood the requirement...

    But I would consider building a permanent table holding all variations and query that one based on the number of shippings.

    Maybe the following code helps to understand what I'd do (@lvl is used to define the number of shipments):

    DECLARE @lvl INT

    SET @lvl=4

    DECLARE @tbl TABLE

    (

    id INT

    )

    INSERT INTO @tbl

    SELECT 1 UNION ALL

    SELECT 2 UNION ALL

    SELECT 3 UNION ALL

    SELECT 4 UNION ALL

    SELECT 5

    ;WITH cte AS

    (

    SELECT

    t1.id AS id1,

    t2.id AS id2,

    t3.id AS id3,

    t4.id AS id4,

    CASE

    WHEN t2.id=t3.id THEN 3

    WHEN t2.id<t3.id AND t3.id=t4.id THEN 4

    ELSE 5 END AS lvl

    FROM @tbl t1

    CROSS JOIN @tbl t2

    CROSS JOIN @tbl t3

    CROSS JOIN @tbl t4

    WHERE t1.id<=t2.id

    AND t2.id<=t3.id

    AND t3.id<=t4.id

    AND t1.id<>t2.id AND t1.id<>t3.id AND t1.id<>t4.id

    ), cte2 AS

    (

    SELECT

    id1,

    id2,

    CASE WHEN @lvl>3 AND id2<>id3 AND id2<>id4 THEN id3 ELSE 8 END AS id3,

    CASE WHEN @lvl>4 AND id3<>id4 AND id2<> id4 AND id3<>id4 THEN id4 ELSE 9 END AS id4

    FROM cte

    WHERE lvl<= @lvl

    AND id2<= @lvl

    AND id3<= @lvl

    AND id4<= @lvl

    )

    SELECT

    id1,

    id2,

    CASE WHEN id3<8 THEN id3 ELSE 0 END AS id3,

    CASE WHEN id4<9 THEN id4 ELSE 0 END AS id4

    FROM cte2

    WHERE id2>id1 AND id3>id2 AND id4>id3

    GROUP BY id1,id2,id3,id4



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Lynn Pettis (4/7/2010)


    It may help to see the whole problem not just a piece of it. Can you put together a test suite including the expected output? I know I work better when I know the whole problem, and can see what the end result should be.

    That post would contain about 500 lines of code and another 2000 to build the relevant objects and test data which I'm sure I can't provide because it contains personal data. And of course there's even more to it than what I said but those missing pieces are basic joins to other tables, nothing relevant to my little problem.

    I have everything working except the piece to build all possible combinaisons which is where I need help. Other than just going with a loop I don't see any easy way out of this one.

  • lmu92 (4/7/2010)


    I'm not sure if I fully understood the requirement...

    But I would consider building a permanent table holding all variations and query that one based on the number of shippings.

    Maybe the following code helps to understand what I'd do (@lvl is used to define the number of shipments):

    DECLARE @lvl INT

    SET @lvl=4

    DECLARE @tbl TABLE

    (

    id INT

    )

    INSERT INTO @tbl

    SELECT 1 UNION ALL

    SELECT 2 UNION ALL

    SELECT 3 UNION ALL

    SELECT 4 UNION ALL

    SELECT 5

    ;WITH cte AS

    (

    SELECT

    t1.id AS id1,

    t2.id AS id2,

    t3.id AS id3,

    t4.id AS id4,

    CASE

    WHEN t2.id=t3.id THEN 3

    WHEN t2.id<t3.id AND t3.id=t4.id THEN 4

    ELSE 5 END AS lvl

    FROM @tbl t1

    CROSS JOIN @tbl t2

    CROSS JOIN @tbl t3

    CROSS JOIN @tbl t4

    WHERE t1.id<=t2.id

    AND t2.id<=t3.id

    AND t3.id<=t4.id

    AND t1.id<>t2.id AND t1.id<>t3.id AND t1.id<>t4.id

    ), cte2 AS

    (

    SELECT

    id1,

    id2,

    CASE WHEN @lvl>3 AND id2<>id3 AND id2<>id4 THEN id3 ELSE 8 END AS id3,

    CASE WHEN @lvl>4 AND id3<>id4 AND id2<> id4 AND id3<>id4 THEN id4 ELSE 9 END AS id4

    FROM cte

    WHERE lvl<= @lvl

    AND id2<= @lvl

    AND id3<= @lvl

    AND id4<= @lvl

    )

    SELECT

    id1,

    id2,

    CASE WHEN id3<8 THEN id3 ELSE 0 END AS id3,

    CASE WHEN id4<9 THEN id4 ELSE 0 END AS id4

    FROM cte2

    WHERE id2>id1 AND id3>id2 AND id4>id3

    GROUP BY id1,id2,id3,id4

    No you're missing a small part of it. I don't know ahead of time how many levels of cross joins I'll need and it's going to be different for every order # (which a single client can have as much as 300 in a single day... and this report I'm building can display all the client's order since day 1 of the ERP).

    So in essence I need to build all the possible combinaisons for all orders at run time. So that leaves me with 2 obvious answers

    1 - Dynamic SQL which is really hard to implement

    2 - Looping 1 level at the time until everything is resolved... which would be my next try assuming a recursing cross join is not possible is SQL.

  • Think of it this way...

    You have 4 persons in a room, how many different "couples" can do a handshake.

    p1 can shake hands with p2,p3,p4

    p2 can't shake hands again with p1 so only p2 with p3 and p4 remain

    then only p3 can shake hands with p4.

    Now that you have that done, how many 3 way hand shaking can you have (this is where recursion kicks in)?

    p1,p2,p3

    p1,p2,p4

    p2,p3,p4

    In my current solution I'm already adressing the possibilities that a single hand shake or everybody at the same time will resolve the join. Now I need everything in between.

    Again I have cases with as much as 16 peoples in the room (16 shippings for 1 single order). So that's 14 levels of recursions to build all possible groups of handshakes. I wish I could give you a formula but I just can't figure that one out atm!!!

  • Ninja's_RGR'us (4/7/2010)


    Lynn Pettis (4/7/2010)


    It may help to see the whole problem not just a piece of it. Can you put together a test suite including the expected output? I know I work better when I know the whole problem, and can see what the end result should be.

    That post would contain about 500 lines of code and another 2000 to build the relevant objects and test data which I'm sure I can't provide because it contains personal data. And of course there's even more to it than what I said but those missing pieces are basic joins to other tables, nothing relevant to my little problem.

    I have everything working except the piece to build all possible combinaisons which is where I need help. Other than just going with a loop I don't see any easy way out of this one.

    Not asking for actual data, but sample data; something that at least represents the problem domain. From what you have posted I'll have to leave it to others who are more imaginative. I need to see the problem and what the ultimate result is going to be to be of any help.

  • Do you know the max number of cross joins ahead of time?

    What I'd probably do in this case is to build a static table like the one I posted covering a "safe" level. When processing new data I would check if the nesting level would be sufficient and, if not, call as separate proc that would rebuild that table with expanded levels (most probably max_level + 1).

    To query the data I would use an iTVF with the number of shippings as an input parameter to return the reduced matrix.

    Basically, the "matrix" table would either be created based on a dynamic SQL statement to cover the unknown levels (including re-creation of the iTVF) or I would throw an exception and redo it script based (depending on the business requirement).

    I might even add a pre-warning alert to send me an email if I'd have a level of 20 covered and there will be data 18 levels deep... It depends. πŸ˜‰

    But I agree trying to avoid looping for each shipment.



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • This is a really interesting problem. I cannot think of a way to do it completely dynamic in terms of the maximum size of your combination collections and I'm hoping someone will come along with an amazing method I never would have considered.

    However, if you can limit the maximum element grouping to a certain number, I have a possible suggestion for you. Basically, it's a tally table ... the numbers would join back to a ROW_NUMBER or identity or however you want to join back to your actual data. The 0's would be non-joins (so as to account for groupings smaller than the maximum.

    My tiny tally table to avoid getting too many results back:

    declare @test-2 table (myInt int)

    insert into @test-2(myInt)

    select 0

    union select 1

    union select 2

    union select 3

    union select 4

    union select 5

    union select 6

    union select 7

    union select 8

    union select 9

    union select 10

    And my logic to build my complex-er tally table:

    select t1.myInt col1,

    t2.myInt col2,

    t3.myInt col3,

    t4.myInt col4,

    t5.myInt col5,

    t6.myInt as col6

    from @test-2 t1

    join @test-2 t2

    on t2.myInt > t1.myInt

    join @test-2 t3

    on t3.myInt > t2.myInt or t3.myInt = 0

    join @test-2 t4

    on (t4.myInt > t3.myInt and t3.myInt > 0) or t4.myInt = 0

    join @test-2 t5

    on (t5.myInt > t4.myInt and t4.myInt > 0) or t5.myInt = 0

    join @test-2 t6

    on (t6.myInt > t5.myInt and t5.myInt > 0) or t6.myInt = 0

    where t1.myInt > 0

    order by t1.myInt, t2.myInt

    With only the 10 initial values, it should return 837 possible combination of between 2 and 6 elements.

    I would be interested to hear feedback.

    Thanks,

    -Ben

    β””> bt



    Forum Etiquette: How to post data/code on a forum to get the best help[/url]

  • bteraberry (4/7/2010)


    This is a really interesting problem. I cannot think of a way to do it completely dynamic in terms of the maximum size of your combination collections and I'm hoping someone will come along with an amazing method I never would have considered.

    However, if you can limit the maximum element grouping to a certain number, I have a possible suggestion for you. Basically, it's a tally table ... the numbers would join back to a ROW_NUMBER or identity or however you want to join back to your actual data. The 0's would be non-joins (so as to account for groupings smaller than the maximum.

    My tiny tally table to avoid getting too many results back:

    declare @test-2 table (myInt int)

    insert into @test-2(myInt)

    select 0

    union select 1

    union select 2

    union select 3

    union select 4

    union select 5

    union select 6

    union select 7

    union select 8

    union select 9

    union select 10

    And my logic to build my complex-er tally table:

    select t1.myInt col1,

    t2.myInt col2,

    t3.myInt col3,

    t4.myInt col4,

    t5.myInt col5,

    t6.myInt as col6

    from @test-2 t1

    join @test-2 t2

    on t2.myInt > t1.myInt

    join @test-2 t3

    on t3.myInt > t2.myInt or t3.myInt = 0

    join @test-2 t4

    on (t4.myInt > t3.myInt and t3.myInt > 0) or t4.myInt = 0

    join @test-2 t5

    on (t5.myInt > t4.myInt and t4.myInt > 0) or t5.myInt = 0

    join @test-2 t6

    on (t6.myInt > t5.myInt and t5.myInt > 0) or t6.myInt = 0

    where t1.myInt > 0

    order by t1.myInt, t2.myInt

    With only the 10 initial values, it should return 837 possible combination of between 2 and 6 elements.

    I would be interested to hear feedback.

    Thanks,

    -Ben

    The problem with this approach is that it assumes a max # of recursions... I could of course go on a limb and go all the way to 25 levels up front but I can't even estimate the # of rows this dataset would include. I adjusted your code to have 25 values instead of 10 and still do only 6 joins and the result had almost 250 000 rows... I can't imagine going even only to 16 with this.

  • Lynn Pettis (4/7/2010)


    Ninja's_RGR'us (4/7/2010)


    Lynn Pettis (4/7/2010)


    It may help to see the whole problem not just a piece of it. Can you put together a test suite including the expected output? I know I work better when I know the whole problem, and can see what the end result should be.

    That post would contain about 500 lines of code and another 2000 to build the relevant objects and test data which I'm sure I can't provide because it contains personal data. And of course there's even more to it than what I said but those missing pieces are basic joins to other tables, nothing relevant to my little problem.

    I have everything working except the piece to build all possible combinaisons which is where I need help. Other than just going with a loop I don't see any easy way out of this one.

    Not asking for actual data, but sample data; something that at least represents the problem domain. From what you have posted I'll have to leave it to others who are more imaginative. I need to see the problem and what the ultimate result is going to be to be of any help.

    I'll see what I can do but I can't promise anything... I'd need to built scripts for at least 8 tables, adjust 500 lines of code to match the new tables then do about 2000 insert statements to have anything close to reality for 1 month of data for only 1 client... that's a lot of work for something I already know how to do... but wishing there was another way.

  • This would make a great TSQL Challenge. I've been going around in circles for a couple hours now trying to figure out a way to do this in a dynamic set based manner. Haven't quite made it all come together yet.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. πŸ˜‰

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • ^__^ Hi there,

    uhmmm, i'm not usre if i understood the problem right...

    i turn Ben's code to dynamic sql (which i always avoid using)

    from Ben's code, are you saying you want to specify the number of fields/joins?

    NOTES:

    1. I have included the minimum number of combinations as an input parameter but it doesn't work yet...

    2. I kinda hate how my code looks...

    3. You can combine the two loops but will have additional conditional statements...

    4. I tried 25 records with 16 joins... it took 3 mins and row 1 still has the value 1 (i stoped the execution)... that sample data would be kinda slow...

    5. Just change the value of @MaxRows to the desired number of joins. (enter 3 or more, dont have proper validations yet)

    hmmmm, is this what your looking for?

    I hope it helps ^__^

    SAMPLE DATA

    CREATE TABLE #Test (myInt int)

    insert into #Test(myInt)

    select 0

    union select 1

    union select 2

    union select 3

    union select 4

    union select 5

    union select 6

    union select 7

    union select 8

    union select 9

    union select 10

    MY CODE

    DECLARE @MinRows INT, @MaxRows INT, @SqlString NVARCHAR(MAX),

    @RowString NVARCHAR(MAX),@JoinString NVARCHAR(MAX),

    @i INT,@iStr NVARCHAR(10),@iStrBefore NVARCHAR(10)

    SELECT@MinRows='2'-- NO USE YET

    ,@MaxRows='8'

    SET @i=1

    SET @RowString='SELECT'

    --// ROWS

    WHILE @i<=@MaxRows

    BEGIN

    SET @iStr=CAST(@i AS NVARCHAR(MAX))

    SELECT @RowString = @RowString + '

    t' + @iStr + '.myInt col' + @iStr + ','

    SET @i=@i+1

    END

    SELECT @RowString=LEFT(@RowString,LEN(@RowString)-1)

    --//JOINS

    SET @JoinString='

    from #Test t1

    inner join #Test t2

    on t2.myInt > t1.myInt'

    IF @MaxRows>=3

    BEGIN

    SELECT @JoinString=@JoinString+'

    inner join #Test t3

    on t3.myInt > t2.myInt or t3.myInt = 0'

    IF @MaxRows>=4

    BEGIN

    SET @i=4

    WHILE @i<=@MaxRows

    BEGIN

    SET @iStr=CAST(@i AS NVARCHAR(MAX))

    SET @iStrBefore=CAST(@i-1 AS NVARCHAR(MAX))

    SELECT @JoinString=@JoinString+'

    inner join #Test t' + @iStr + '

    on (t' + @iStr + '.myInt > t' + @iStrBefore + '.myInt and t' + @iStrBefore + '.myInt > 0) or t' + @iStr + '.myInt = 0'

    SET @i=@i+1

    END

    END

    END

    SELECT @SqlString = @RowString + @JoinString +

    'where t1.myInt > 0

    order by t1.myInt, t2.myInt'

    SELECT@SqlString AS 'My_SQL_Query_String'

    ,CAST(@SqlString AS XML) AS 'CLICK_ME_TO_VIEW'

    EXEC (@SqlString)

    DROP TABLE #Test

    Is this what your looking for?

    _____________________________________________
    [font="Comic Sans MS"]Quatrei Quorizawa[/font]
    :):D:P;):w00t::cool::hehe:
    MABUHAY PHILIPPINES!

    "Press any key...
    Where the heck is the any key?
    hmmm... Let's see... there's ESC, CTRL, Page Up...
    but no any key"
    - Homer Simpson
  • Backtracking a little...

    Orders (OrderID)

    OrderLines (OrderLineID)

    Shipping (ShippingID)

    ShippingLines (ShippingLineID)

    Invoices (InvoiceID)

    InvoiceLines (InvoiceLineID)

    Which tables hold fk's of which other tables?

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Ninja's_RGR'us (4/7/2010)


    Think of it this way...

    You have 4 persons in a room, how many different "couples" can do a handshake.

    N-way handshaking combinations...

    USE tempdb;

    GO

    SET ANSI_NULLS ON

    SET QUOTED_IDENTIFIER ON

    GO

    DROP FUNCTION dbo.Numbers;

    GO

    CREATE FUNCTION dbo.Numbers

    (@N BIGINT)

    RETURNS TABLE

    WITH SCHEMABINDING

    AS RETURN

    WITH

    N1 AS (SELECT N = 1 UNION ALL SELECT 1),

    N2 AS (SELECT N = 1 FROM N1 T, N1),

    N3 AS (SELECT N = 1 FROM N2 T, N2),

    N4 AS (SELECT N = 1 FROM N3 T, N3),

    N5 AS (SELECT N = 1 FROM N4 T, N4),

    N6 AS (SELECT N = 1 FROM N5 T, N5),

    NM AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) AS N FROM N6)

    SELECT TOP (@N)

    N

    FROM NM

    WHERE N <= @N

    ORDER BY

    N ASC;

    GO

    DECLARE @People

    TABLE (

    person_id INTEGER IDENTITY(0,1) PRIMARY KEY,

    name VARCHAR(10) NOT NULL

    );

    INSERT @People (name) VALUES ('Alan');

    INSERT @People (name) VALUES ('Bob');

    INSERT @People (name) VALUES ('Carol');

    INSERT @People (name) VALUES ('Debra');

    INSERT @People (name) VALUES ('Eric');

    INSERT @People (name) VALUES ('Frank');

    INSERT @People (name) VALUES ('Gina');

    INSERT @People (name) VALUES ('Harry');

    INSERT @People (name) VALUES ('Ian');

    INSERT @People (name) VALUES ('Julie');

    DECLARE @Bits

    TABLE (

    bit_id BIGINT NOT NULL PRIMARY KEY,

    value BIGINT NOT NULL

    );

    INSERT @Bits (bit_id, value)

    SELECT P.person_id - 1, POWER(2, (P.person_id - 1))

    FROM @People P;

    DECLARE @Grouped

    TABLE (

    N BIGINT NOT NULL,

    name VARCHAR(10) NOT NULL,

    group_size BIGINT NOT NULL,

    group_order BIGINT NOT NULL,

    group_row BIGINT NOT NULL,

    UNIQUE (N, name),

    UNIQUE (group_row, group_size, group_order, N)

    );

    INSERT @Grouped

    (N, name, group_size, group_order, group_row)

    SELECT Q.N,

    Q.name,

    Q.group_size,

    group_order = DENSE_RANK() OVER (PARTITION BY Q.group_size ORDER BY Q.N),

    group_row = DENSE_RANK() OVER (PARTITION BY Q.group_size ORDER BY Q.name)

    FROM (

    SELECT N.N,

    P.name,

    group_size = COUNT(*) OVER (PARTITION BY N.N)

    FROM dbo.Numbers (POWER(2, (SELECT COUNT(*) FROM @People))) N

    JOIN @Bits B

    ON N.N & B.value = B.value

    JOIN @People P

    ON P.person_id = B.bit_id

    ) Q

    SELECT names =

    STUFF

    (

    (

    SELECT ',' + name

    FROM @Grouped T2

    WHERE T2.N = T1.N

    ORDER BY

    name ASC

    FOR XML PATH('')

    )

    , 1, 1, SPACE(0))

    FROM @Grouped T1

    WHERE group_row = 1

    ORDER BY

    T1.group_size,

    T1.group_order;

    GO

Viewing 15 posts - 1 through 15 (of 64 total)

You must be logged in to reply to this topic. Login to reply