Recursive cross join to get all available combinaisons

  • Chris Morris-439714 (4/8/2010)


    Backtracking a little...

    Orders (OrderID)

    OrderLines (OrderLineID)

    Shipping (ShippingID)

    ShippingLines (ShippingLineID)

    Invoices (InvoiceID)

    InvoiceLines (InvoiceLineID)

    Which tables hold fk's of which other tables?

    ShippingHeader and InvoiceHeader has FK to OrderID

    1 order can have n shippings

    You cannot have more than 1 order in a single shipping

    1 shipping can have 1 or less invoice... 1 shipping, 2 shipping or more or even all shippings can be merged into a single invoice.

    I already filtered what I can from document dates before I get to that stage (an invoice of may 1st cannot be linked to a shipment of may 15th. The invoice needs to be same day of the shipping or later).

    Nothing to link Shipping to Invoice except matching row for row using document + rowid (where 100% of the rows match in the details of 1 to many documents... 1 invoice to n shippings), sku, and Sum(qty) of all shippings to the invoice lines

  • Paul White NZ (4/8/2010)


    Ninja's_RGR'us (4/7/2010)


    Think of it this way...

    You have 4 persons in a room, how many different "couples" can do a handshake.

    N-way handshaking combinations...

    USE tempdb;

    GO

    SET ANSI_NULLS ON

    SET QUOTED_IDENTIFIER ON

    GO

    DROP FUNCTION dbo.Numbers;

    GO

    CREATE FUNCTION dbo.Numbers

    (@N BIGINT)

    RETURNS TABLE

    WITH SCHEMABINDING

    AS RETURN

    WITH

    N1 AS (SELECT N = 1 UNION ALL SELECT 1),

    N2 AS (SELECT N = 1 FROM N1 T, N1),

    N3 AS (SELECT N = 1 FROM N2 T, N2),

    N4 AS (SELECT N = 1 FROM N3 T, N3),

    N5 AS (SELECT N = 1 FROM N4 T, N4),

    N6 AS (SELECT N = 1 FROM N5 T, N5),

    NM AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) AS N FROM N6)

    SELECT TOP (@N)

    N

    FROM NM

    WHERE N <= @N

    ORDER BY

    N ASC;

    GO

    DECLARE @People

    TABLE (

    person_id INTEGER IDENTITY(0,1) PRIMARY KEY,

    name VARCHAR(10) NOT NULL

    );

    INSERT @People (name) VALUES ('Alan');

    INSERT @People (name) VALUES ('Bob');

    INSERT @People (name) VALUES ('Carol');

    INSERT @People (name) VALUES ('Debra');

    INSERT @People (name) VALUES ('Eric');

    INSERT @People (name) VALUES ('Frank');

    INSERT @People (name) VALUES ('Gina');

    INSERT @People (name) VALUES ('Harry');

    INSERT @People (name) VALUES ('Ian');

    INSERT @People (name) VALUES ('Julie');

    DECLARE @Bits

    TABLE (

    bit_id BIGINT NOT NULL PRIMARY KEY,

    value BIGINT NOT NULL

    );

    INSERT @Bits (bit_id, value)

    SELECT P.person_id - 1, POWER(2, (P.person_id - 1))

    FROM @People P;

    DECLARE @Grouped

    TABLE (

    N BIGINT NOT NULL,

    name VARCHAR(10) NOT NULL,

    group_size BIGINT NOT NULL,

    group_order BIGINT NOT NULL,

    group_row BIGINT NOT NULL,

    UNIQUE (N, name),

    UNIQUE (group_row, group_size, group_order, N)

    );

    INSERT @Grouped

    (N, name, group_size, group_order, group_row)

    SELECT Q.N,

    Q.name,

    Q.group_size,

    group_order = DENSE_RANK() OVER (PARTITION BY Q.group_size ORDER BY Q.N),

    group_row = DENSE_RANK() OVER (PARTITION BY Q.group_size ORDER BY Q.name)

    FROM (

    SELECT N.N,

    P.name,

    group_size = COUNT(*) OVER (PARTITION BY N.N)

    FROM dbo.Numbers (POWER(2, (SELECT COUNT(*) FROM @People))) N

    JOIN @Bits B

    ON N.N & B.value = B.value

    JOIN @People P

    ON P.person_id = B.bit_id

    ) Q

    SELECT names =

    STUFF

    (

    (

    SELECT ',' + name

    FROM @Grouped T2

    WHERE T2.N = T1.N

    ORDER BY

    name ASC

    FOR XML PATH('')

    )

    , 1, 1, SPACE(0))

    FROM @Grouped T1

    WHERE group_row = 1

    ORDER BY

    T1.group_size,

    T1.group_order;

    GO

    Your version seems to be the most promising so far. I ran it with 16 people and it returned 32k rows in 12 seconds.

    Now assuming I need to do all 32K cross checks before finding the correct match, there's no way I can run this as the user calls for the report... I'll just have to precompile the data and save it to the DB (no point in recalculating after I've found the correct data).

    Also one more piece to consider. A single client (and we have a couple 1000s), can have 100s invoices PER DAY. In our peak months we have ± 2000 orders per day from ±250 clients. So I'd need to adjust this script to work with multiple "offending" orders at the same time from the same client... and even all clients at once if I rebuild the whole history for the day at once.

    P.S. I ran your script with 20 people and here are the results 524288 rows in 6 M 26 S.

  • Quatrei.X (4/8/2010)


    ^__^ Hi there,

    uhmmm, i'm not usre if i understood the problem right...

    i turn Ben's code to dynamic sql (which i always avoid using)

    from Ben's code, are you saying you want to specify the number of fields/joins?

    NOTES:

    1. I have included the minimum number of combinations as an input parameter but it doesn't work yet...

    2. I kinda hate how my code looks...

    3. You can combine the two loops but will have additional conditional statements...

    4. I tried 25 records with 16 joins... it took 3 mins and row 1 still has the value 1 (i stoped the execution)... that sample data would be kinda slow...

    5. Just change the value of @MaxRows to the desired number of joins. (enter 3 or more, dont have proper validations yet)

    hmmmm, is this what your looking for?

    I hope it helps ^__^

    SAMPLE DATA

    CREATE TABLE #Test (myInt int)

    insert into #Test(myInt)

    select 0

    union select 1

    union select 2

    union select 3

    union select 4

    union select 5

    union select 6

    union select 7

    union select 8

    union select 9

    union select 10

    MY CODE

    DECLARE @MinRows INT, @MaxRows INT, @SqlString NVARCHAR(MAX),

    @RowString NVARCHAR(MAX),@JoinString NVARCHAR(MAX),

    @i INT,@iStr NVARCHAR(10),@iStrBefore NVARCHAR(10)

    SELECT@MinRows='2'-- NO USE YET

    ,@MaxRows='8'

    SET @i=1

    SET @RowString='SELECT'

    --// ROWS

    WHILE @i<=@MaxRows

    BEGIN

    SET @iStr=CAST(@i AS NVARCHAR(MAX))

    SELECT @RowString = @RowString + '

    t' + @iStr + '.myInt col' + @iStr + ','

    SET @i=@i+1

    END

    SELECT @RowString=LEFT(@RowString,LEN(@RowString)-1)

    --//JOINS

    SET @JoinString='

    from #Test t1

    inner join #Test t2

    on t2.myInt > t1.myInt'

    IF @MaxRows>=3

    BEGIN

    SELECT @JoinString=@JoinString+'

    inner join #Test t3

    on t3.myInt > t2.myInt or t3.myInt = 0'

    IF @MaxRows>=4

    BEGIN

    SET @i=4

    WHILE @i<=@MaxRows

    BEGIN

    SET @iStr=CAST(@i AS NVARCHAR(MAX))

    SET @iStrBefore=CAST(@i-1 AS NVARCHAR(MAX))

    SELECT @JoinString=@JoinString+'

    inner join #Test t' + @iStr + '

    on (t' + @iStr + '.myInt > t' + @iStrBefore + '.myInt and t' + @iStrBefore + '.myInt > 0) or t' + @iStr + '.myInt = 0'

    SET @i=@i+1

    END

    END

    END

    SELECT @SqlString = @RowString + @JoinString +

    'where t1.myInt > 0

    order by t1.myInt, t2.myInt'

    SELECT@SqlString AS 'My_SQL_Query_String'

    ,CAST(@SqlString AS XML) AS 'CLICK_ME_TO_VIEW'

    EXEC (@SqlString)

    DROP TABLE #Test

    Is this what your looking for?

    Thanks for the effort. However I've come to realize that this is just impossible to run a script like this at run time. I'll be looking to see if it's worth spending a couple more days finding a solution for this when it doesn't have a huge impact for us at the moment since I have way less than 1% offending orders with this criteria.

  • I'm sure you will find a way to do this - I had a crack at the handshaking thing because it was in terms I could understand (!) and seemed interesting. I hope it was helpful in some way or another - good luck with the wider problem 😎

  • Thanks Paul... getting it done is not really the problem. Getting it done on the fly and / or without wasting a week to do it fast is really the challenge. Obviously pre-compiling the data seems like the sensible thing to do.

    The db is only weeks hold and most transaction tables like ledgers are hitting the 1M rows mark... and this is a low activity period for us. :w00t:

    However I think I'll go and have a crack with the vendor to see if they can fix this little oversight. And then we could only worry about the couple 100 cases that require that massive list of possible matches manually once and be done with it.

    Maybe this is worthy of being turned into one of those t-sql challenges simple-talk is holding...

    Thanks for everyone who contributed on this one.

  • Ninja's_RGR'us (4/8/2010)


    Chris Morris-439714 (4/8/2010)


    Backtracking a little...

    Orders (OrderID)

    OrderLines (OrderLineID)

    Shipping (ShippingID)

    ShippingLines (ShippingLineID)

    Invoices (InvoiceID)

    InvoiceLines (InvoiceLineID)

    Which tables hold fk's of which other tables?

    ShippingHeader and InvoiceHeader has FK to OrderID

    1 order can have n shippings

    You cannot have more than 1 order in a single shipping

    1 shipping can have 1 or less invoice... 1 shipping, 2 shipping or more or even all shippings can be merged into a single invoice.

    I already filtered what I can from document dates before I get to that stage (an invoice of may 1st cannot be linked to a shipment of may 15th. The invoice needs to be same day of the shipping or later).

    Nothing to link Shipping to Invoice except matching row for row using document + rowid (where 100% of the rows match in the details of 1 to many documents... 1 invoice to n shippings), sku, and Sum(qty) of all shippings to the invoice lines

    Thanks. It's still horrible!

    FYI the number of combinations is 2(number of rows)-1, i.e. for 6 rows, that's 63 combinations.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Chris Morris-439714 (4/8/2010)


    Ninja's_RGR'us (4/8/2010)


    Chris Morris-439714 (4/8/2010)


    Backtracking a little...

    Orders (OrderID)

    OrderLines (OrderLineID)

    Shipping (ShippingID)

    ShippingLines (ShippingLineID)

    Invoices (InvoiceID)

    InvoiceLines (InvoiceLineID)

    Which tables hold fk's of which other tables?

    ShippingHeader and InvoiceHeader has FK to OrderID

    1 order can have n shippings

    You cannot have more than 1 order in a single shipping

    1 shipping can have 1 or less invoice... 1 shipping, 2 shipping or more or even all shippings can be merged into a single invoice.

    I already filtered what I can from document dates before I get to that stage (an invoice of may 1st cannot be linked to a shipment of may 15th. The invoice needs to be same day of the shipping or later).

    Nothing to link Shipping to Invoice except matching row for row using document + rowid (where 100% of the rows match in the details of 1 to many documents... 1 invoice to n shippings), sku, and Sum(qty) of all shippings to the invoice lines

    Thanks. It's still horrible!

    FYI the number of combinations is 2(number of rows)-1, i.e. for 6 rows, that's 63 combinations.

    Thanks for the words of compassion. The really horrible part is that in all those documents the prices don't even match (0.01$ errors on a few details lines). So we tell the client price A on the order. Then price B on the shipping slip (which is our invoice at the moment) and when he comes back on the web site to get the real invoice, he possibly gets a 3rd different price.

    A big no-no when the bigger clients refuse to pay for even 0.01$ difference for the whole invoice.

    And this is also where I need to come in to fix the problem because on the shipping "invoice" we don't have the real invoice # which is created after the delivery guys come back with the adjusted shipping slip which accounts for returns and refused deliveries.

    Which means that the client can only track the invoice with the shipping #... which is even more compunded by the fact that the dates of documents and total amounts DON'T MATCH... and that you can have n shippings on 1 invoice!

    A real pleasure to fix!

    Oh ya and I don't have access to the source code to correct that little misdesign :crazy:... would be so easy to add 1 column and save the value of the invoice on creation and just fix the issue once for the missing data.

    Then the problem would only be to concatenate the shipping # if there's more than 1 on the invoice.

    Anyhow, 50 hours later and I should be done right about today with a 99.9% correct solution approved by direction.

    Anyhow thanks again for all the help.

  • Ninja (Remi?)

    Sympathies mate. How much leeway do you have with the db? Can you for instance sneak in a trigger here and there, and maybe a link table?

    It's an interesting challenge as you and others have pointed out, so I'm fiddling with a recursive CTE to see if it can do anything useful.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Chris Morris-439714 (4/8/2010)


    Ninja (Remi?)

    Sympathies mate. How much leeway do you have with the db? Can you for instance sneak in a trigger here and there, and maybe a link table?

    It's an interesting challenge as you and others have pointed out, so I'm fiddling with a recursive CTE to see if it can do anything useful.

    I have None. I can either call the vendors to change it or create a db besides it and maintain the new field in the new db manually. Anything else voids the contract.

    Now the fun part with a new db is that we have 30 incorporations. Currently 3 of them are in that ERP system whereas the rest of them will gradually make the transition.

    The really fun part here is that there is only 1 single database. Everytime you add a new company to the system a new set of ±1200 tables is created like so :

    dbo.[<name for the cie here>$Invoice Header]

    So any script, index, mod or whatever needs to be applied to n number of companies, ideally without changing the scripts I wrote. I've already gotten around that little PITA for indexes, but for keeping dbs in sync that'd be another little project all in itself. I know I can use 2k8 change management to accomplish this but I've never played with it and I'm not sure it's allowed either.

    Anyhow if I just run a job once a day I can easily go around that one at that time.

  • Ninja's_RGR'us (4/8/2010)


    Chris Morris-439714 (4/8/2010)


    Ninja (Remi?)

    Sympathies mate. How much leeway do you have with the db? Can you for instance sneak in a trigger here and there, and maybe a link table?

    It's an interesting challenge as you and others have pointed out, so I'm fiddling with a recursive CTE to see if it can do anything useful.

    I have None. I can either call the vendors to change it or create a db besides it and maintain the new field in the new db manually. Anything else voids the contract.

    Now the fun part with a new db is that we have 30 incorporations. Currently 3 of them are in that ERP system whereas the rest of them will gradually make the transition.

    The really fun part here is that there is only 1 single database. Everytime you add a new company to the system a new set of ±1200 tables is created like so :

    dbo.[<name for the cie here>$Invoice Header]

    So any script, index, mod or whatever needs to be applied to n number of companies, ideally without changing the scripts I wrote. I've already gotten around that little PITA for indexes, but for keeping dbs in sync that'd be another little project all in itself. I know I can use 2k8 change management to accomplish this but I've never played with it and I'm not sure it's allowed either.

    Anyhow if I just run a job once a day I can easily go around that one at that time.

    Wait... you have 30x1200 tables in that database right now? That's sick.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • Chris Morris-439714 (4/8/2010)


    Ninja (Remi?)

    Sympathies mate. How much leeway do you have with the db? Can you for instance sneak in a trigger here and there, and maybe a link table?

    It's an interesting challenge as you and others have pointed out, so I'm fiddling with a recursive CTE to see if it can do anything useful.

    The problem I kept running into with the recursive CTE is the 'missing in the middle' number. Sequences for combos up to 4 combos aren't a problem, once you hit 5 and can have to generate stuff like '02,05' '02,03,05' etc, it falls apart.

    Although, in typing that I might have just thought of way to fix that problem. Keyword being "might" =).

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • Not to take all of the fun out of tihs - but is there a need to match up the two children? As in - the order can have n shipments and m invoices - under what circumstance do you need to cross-reference a specific shipment to a specific invoice? As long as the shipment details add up to the total on the order and so do the invoices - what else is needed?

    I just have a sneaking suspicion that the details might not exactly match, even if the totals tie in.

    knowing the "why" might help with the "how".

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Ninja's_RGR'us (4/8/2010)


    Chris Morris-439714 (4/8/2010)


    Ninja (Remi?)

    Sympathies mate. How much leeway do you have with the db? Can you for instance sneak in a trigger here and there, and maybe a link table?

    It's an interesting challenge as you and others have pointed out, so I'm fiddling with a recursive CTE to see if it can do anything useful.

    I have None. I can either call the vendors to change it or create a db besides it and maintain the new field in the new db manually. Anything else voids the contract.

    Now the fun part with a new db is that we have 30 incorporations. Currently 3 of them are in that ERP system whereas the rest of them will gradually make the transition.

    The really fun part here is that there is only 1 single database. Everytime you add a new company to the system a new set of ±1200 tables is created like so :

    dbo.[<name for the cie here>$Invoice Header]

    So any script, index, mod or whatever needs to be applied to n number of companies, ideally without changing the scripts I wrote. I've already gotten around that little PITA for indexes, but for keeping dbs in sync that'd be another little project all in itself. I know I can use 2k8 change management to accomplish this but I've never played with it and I'm not sure it's allowed either.

    Anyhow if I just run a job once a day I can easily go around that one at that time.

    Well not at the moment, there are only 4 cies, 3 of them really the transactional ones. The 4th one is just there to increase the buying power and discounts. You also need to account for 1 test and 1 model cie in there too... which makes it already to around 7200 tables. The final total will be 5 times that in a couple years assuming the plans don't change.

    When I hit the + in SSMS to expand all the tables I have plenty of time for a bathroom break!

    In VS 2008 to build reports for asp net (not the bids version, but the full pro version), the wizard drills down to table defs, keys and column types for all tables right off the bat. That one takes a couple hours to load at the moment and generates a trace file of a couple 100 MBs with tracing only batch ending and rpc calls and nothing else... loads of fun.

  • Garadin (4/8/2010)


    Chris Morris-439714 (4/8/2010)


    Ninja (Remi?)

    Sympathies mate. How much leeway do you have with the db? Can you for instance sneak in a trigger here and there, and maybe a link table?

    It's an interesting challenge as you and others have pointed out, so I'm fiddling with a recursive CTE to see if it can do anything useful.

    The problem I kept running into with the recursive CTE is the 'missing in the middle' number. Sequences for combos up to 4 combos aren't a problem, once you hit 5 and can have to generate stuff like '02,05' '02,03,05' etc, it falls apart.

    Although, in typing that I might have just thought of way to fix that problem. Keyword being "might" =).

    Be sure to let us know!

  • Ninja's_RGR'us (4/8/2010)


    Garadin (4/8/2010)


    Chris Morris-439714 (4/8/2010)


    Ninja (Remi?)

    Sympathies mate. How much leeway do you have with the db? Can you for instance sneak in a trigger here and there, and maybe a link table?

    It's an interesting challenge as you and others have pointed out, so I'm fiddling with a recursive CTE to see if it can do anything useful.

    The problem I kept running into with the recursive CTE is the 'missing in the middle' number. Sequences for combos up to 4 combos aren't a problem, once you hit 5 and can have to generate stuff like '02,05' '02,03,05' etc, it falls apart.

    Although, in typing that I might have just thought of way to fix that problem. Keyword being "might" =).

    Be sure to let us know!

    Probably still a few hours work making it all come together. If I get a chance to mess with it again and get anything to work, I'll post it here. Let me know what you decide on compiling this into a T-SQL Challenge (for either simple talk or beyondrelational), as I'll probably submit something similar if you decide not to.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

Viewing 15 posts - 16 through 30 (of 64 total)

You must be logged in to reply to this topic. Login to reply