• LutzM (11/17/2013)


    Isn't that more a scenario for CROSS JOINs than a recursive CTE?

    Yes, it needs cross joins if the conditions that allow you to avoid them aren't met. But how do you do the right number of joins when, at the time of writing the query, you don't know how many joins there will be? That seems to call for either looping or recursion. It's also necessary to do the joins in a way that avoids the combinatorial explosion.

    The reason for picking 5000 when I started looking at this was to force me to think about how to avoid that explosion, so I probably should have used a much lower number for my question here; the explosion isn't the issue I wanted help with. My problem was how to wrap suitable algorithms in a recursive or iterative framework in decent T-SQL. For the cases where joins are the right way to go, demand-driven dataflow seems sensible, and that can only happen within a single query, so the decent framework is probably recursive rather than iterative.
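    Roughly the shape I have in mind for the recursive framework is sketched below - just a sketch, and the Denominations table (DenomID numbered 1..n in join order, plus CoinValue and CoinCount) and a dbo.Tally table that includes 0 are my own assumed names:

    -- One recursion level plays the role of one join, so the query text
    -- stays the same however many denominations the table holds.
    WITH Sums AS (
        SELECT 0 AS DenomID, 0 AS Total          -- anchor: the empty selection
        UNION ALL
        SELECT d.DenomID, s.Total + v.N * d.CoinValue
        FROM Sums AS s
        JOIN Denominations AS d ON d.DenomID = s.DenomID + 1
        JOIN dbo.Tally AS v ON v.N BETWEEN 0 AND d.CoinCount
    )
    SELECT DISTINCT s.Total
    FROM Sums AS s
    WHERE s.DenomID = (SELECT MAX(DenomID) FROM Denominations);

    The catch is that T-SQL forbids DISTINCT (and GROUP BY) inside the recursive member, so in this form duplicates can only be stripped at the very end - which runs straight into the explosion discussed below; the once-per-join elimination has to happen outside a single recursive CTE.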

    I would use a Tally table to expand the number of coins per denomination and the CROSS JOIN to get the total amount.

    But with the 5000 coins per denomination and 8 denominations this would require "quite some data space", since we're talking about

    390,625,000,000,000,000,000,000,000,000 possible solutions (including duplicates).

    Just with 5 coins per denomination we already have 390,625 possible solutions (5*5*5*5*5*5*5*5 = 5^8). Or am I missing something?
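    For concreteness, the brute-force Tally-and-CROSS-JOIN version Lutz describes might look like the sketch below for just three denominations (dbo.Tally and the coin values are illustrative; with 8 denominations you'd have to write out 8 cross-joined copies by hand):

    -- Every combination of 0..5 coins of each of three denominations;
    -- duplicates are removed only after all combinations are generated.
    SELECT DISTINCT a.N * 1 + b.N * 2 + c.N * 5 AS Total
    FROM       (SELECT N FROM dbo.Tally WHERE N BETWEEN 0 AND 5) AS a
    CROSS JOIN (SELECT N FROM dbo.Tally WHERE N BETWEEN 0 AND 5) AS b
    CROSS JOIN (SELECT N FROM dbo.Tally WHERE N BETWEEN 0 AND 5) AS c;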

    You've spotted the big combinatorial explosion issue. Early elimination of duplicates, rather than doing an 8-way (actually 16-way) join and then eliminating duplicates at the end, makes a big difference. With 5000 each of 1, 2, 5, 10, 20, 50, 100 and 200 unit coins one can very easily reduce the number of rows generated to 8,801,725,001. That's a reduction of 19 or 20 decimal orders of magnitude in the number of rows (with 5 of each it would be 10,036 rows - not as spectacular an improvement, but still quite a bit better than 390,625), achieved just by the trivial trick of eliminating duplicates once per join and throwing away the 0 value from the first generated set; the largest number of rows generated by any individual join in that process would be 4,900,940,000. A sketch of that once-per-join elimination follows below.

    That's without doing any clever algorithm design like detecting special cases and using different algorithms to fit different cases. For example, in the case where each single-coin value except the smallest can be made up from the available smaller coins, one can avoid doing any joins other than the ones needed to check that the set of coins has that property - and those don't generate many rows. That's going to be the most common case for large numbers of coins. There are 1,940,000 values other than 0 that can be generated from that set of 40,000 coins, and I wouldn't generate any rows at all to calculate that: there's enough of the smallest coin to make up the single-coin value of any of the larger coins, so I didn't need to do any joins to see that this coin set has the property that lets me do arithmetic instead of generating rows and eliminating duplicates.

    For coin sets with large numbers of coins that don't have this property, there are often other algorithms based on weaker properties that shortcut the process, though not quite as much; for small numbers of coins one has to follow the join-and-eliminate-duplicates track.
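    Here is the once-per-join elimination in iterative form - a sketch only, with Denominations, dbo.Tally and the temp table names all illustrative:

    -- #Sums holds every distinct total reachable from the denominations
    -- processed so far; it is seeded with 0, and 0 is discarded at the end.
    CREATE TABLE #Sums (Total bigint NOT NULL PRIMARY KEY);
    INSERT INTO #Sums (Total) VALUES (0);

    DECLARE @DenomID int = 1, @MaxDenomID int, @CoinValue int, @CoinCount int;
    SELECT @MaxDenomID = MAX(DenomID) FROM Denominations;

    WHILE @DenomID <= @MaxDenomID
    BEGIN
        SELECT @CoinValue = CoinValue, @CoinCount = CoinCount
        FROM Denominations
        WHERE DenomID = @DenomID;

        -- Add 1..@CoinCount coins of the current denomination to every
        -- existing total, keeping only totals not already present.
        INSERT INTO #Sums (Total)
        SELECT DISTINCT s.Total + CAST(n.N AS bigint) * @CoinValue
        FROM #Sums AS s
        JOIN dbo.Tally AS n ON n.N BETWEEN 1 AND @CoinCount
        WHERE NOT EXISTS (SELECT 1 FROM #Sums AS x
                          WHERE x.Total = s.Total + CAST(n.N AS bigint) * @CoinValue);

        SET @DenomID += 1;
    END;

    SELECT Total FROM #Sums WHERE Total > 0 ORDER BY Total;

    Because the loop is driven entirely by the contents of the Denominations table, the same code copes with any number of coin values.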

    So my main interest in asking about this was how to build the thing so that the stages could be set up to handle properly the cases where I do have to generate big sets and eliminate duplicates, without having to alter the code when the number of possible coin values changes.
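    The no-join arithmetic for the easy case described above can likewise be expressed without special-casing the code - again a sketch against the same assumed Denominations table, covering only the simplest sufficient condition (a 1-unit coin, with at least as many of them as the largest coin value):

    -- If every total from 1 to the grand total is reachable, the count of
    -- distinct non-zero values is just the grand total itself.
    DECLARE @MinValue int, @MinCount int, @MaxValue int, @GrandTotal bigint;

    SELECT @MinValue   = MIN(CoinValue),
           @MaxValue   = MAX(CoinValue),
           @GrandTotal = SUM(CAST(CoinValue AS bigint) * CoinCount)
    FROM Denominations;

    SELECT @MinCount = CoinCount FROM Denominations WHERE CoinValue = @MinValue;

    IF @MinValue = 1 AND @MinCount >= @MaxValue
        SELECT @GrandTotal AS DistinctNonZeroTotals;  -- 1,940,000 for the coin set above
    -- ELSE fall back to the join-and-deduplicate loop.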

    Tom