Generating n-Tuples with SQL

  • SQL Kiwi (5/30/2012)


    ChrisM@Work (5/30/2012)


    This quaint snippet runs in (only) about 40% more time than the usual inline CTE tally table - which is itself no longer the fastest kid on the block.

    Hey so what is the fastest kid on the block these days?

    Hi Paul

    I wouldn't know, haven't had much opportunity to lurk here or elsewhere recently - but here are two alternatives which are measurably faster if you run them enough times:

    /*

    each of the queries was executed 1200 times as follows:

    DBCC FREEPROCCACHE

    DBCC DROPCLEANBUFFERS

    SET @i = 0; WHILE @i < 20 BEGIN

    SET @Starttime = GETDATE()

    run the query

    INSERT INTO #RunDurations (Method, msDuration) SELECT 'Alternative B 6', DATEDIFF(ms,@Starttime, GETDATE())

    SET @i = @i+1; END

    repeat for the other six queries

    change the order of the queries in the script, last becomes first

    repeat for all seven queries

    repeat this cycle

    */

    -- Standard inline CTE tally (baseline):

    ;WITH

    E1(N) AS (

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

    ), --10E+1 or 10 rows

    E2(N) AS (SELECT 1 FROM E1 a CROSS JOIN E1 b), --10E+2 or 100 rows

    E4(N) AS (SELECT 1 FROM E2 a CROSS JOIN E2 b), --10E+4 or 10,000 rows max

    E5(N) AS (SELECT 1 FROM E1 a CROSS JOIN E4 b)

    SELECT ROW_NUMBER() OVER(ORDER BY (SELECT PI()))

    FROM E5

    -- Alternative A (5% faster)

    ;WITH

    t1 AS (SELECT n FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) seeds(n)),

    t2 AS (SELECT n=n*10 FROM t1),

    t3 AS (SELECT n=n*10 FROM t2),

    t4 AS (SELECT n=n*10 FROM t3),

    t5 AS (SELECT n=n*10 FROM t4)

    SELECT n = (t1.n + t2.n + t3.n + t4.n + t5.n)

    FROM t4 CROSS JOIN t1 CROSS JOIN t2 CROSS JOIN t3 CROSS JOIN t5

    -- Alternative B (6% faster)

    ;WITH t1 AS (SELECT n1 FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) seeds(n1))

    SELECT n = (n1 + n2 + n3 + n4 + n5)

    FROM (SELECT n1*1000 FROM t1) t4 (n4)

    CROSS JOIN t1

    CROSS JOIN (SELECT n1*10 FROM t1) t2 (n2)

    CROSS JOIN (SELECT n1*100 FROM t1) t3 (n3)

    CROSS JOIN (SELECT n1*10000 FROM t1) t5 (n5)

    -- Alternative C (4% faster)

    ;WITH

    t1 AS (SELECT n1=n FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) seeds(n)),

    t2 AS (SELECT n2=n1*10 FROM t1),

    t3 AS (SELECT n3=n2*10 FROM t2),

    t4 AS (SELECT n4=n3*10 FROM t3),

    t5 AS (SELECT n5=n4*10 FROM t4)

    SELECT n = (n1 + n2 + n3 + n4 + n5)

    FROM t4 CROSS JOIN t1 CROSS JOIN t2 CROSS JOIN t3 CROSS JOIN t5

    -- Alternative C: Row-constructor tally (6% faster)

    -- http://www.sqlservercentral.com/Forums/FindPost1189755.aspx

    SELECT n = (n1 + n2 + n3 + n4 + n5)

    FROM (VALUES (0),(1000),(2000),(3000),(4000),(5000),(6000),(7000),(8000),(9000)) t4 (n4)

    CROSS JOIN (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) t1 (n1)

    CROSS JOIN (VALUES (0),(10),(20),(30),(40),(50),(60),(70),(80),(90)) t2 (n2)

    CROSS JOIN (VALUES (0),(100),(200),(300),(400),(500),(600),(700),(800),(900)) t3 (n3)

    CROSS JOIN (VALUES (0),(10000),(20000),(30000),(40000),(50000),(60000),(70000),(80000),(90000)) t5 (n5)

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work (5/31/2012)


    I wouldn't know, haven't had much opportunity to lurk here or elsewhere recently - but here are two alternatives which are measurably faster if you run them enough times:

    Thanks, Chris.

  • Thanks Jeff, Paul and Chris for stopping by again.

    I'm amazed this discussion thread is still active!


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • chris.stuart (5/17/2012)


    I always enjoy reading the articles, but this one got me a bit stumped. Where would you actually use this?:unsure:

    To Chris and all the folks out there that took enough of an interest in this article to read and discuss it, I'd like to provide an additional example to Chris' question.

    Recently this thread appeared: http://www.sqlservercentral.com/Forums/Topic1318149-392-1.aspx and the solution to it involved the unique, ordered n-tuples (combinations) script described in this article. Thanks to ChrisM@Work, we were able to improve the speed of the solution and the thread provides a very interesting example of where it can be used.

    I hope this additional information will generate some new ideas on how these scripts can be used, and I fervently hope I will be able to participate in those when they arise.

    Thanks for listening.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • For anyone that's interested in these rCTEs and because there's been a recent flurry of possible uses appear in the forums, I took another look trying to improve the speed.

    I tried various combinations but the one that seemed to be the best was this one (UNIQUEnTuples).

    ;WITH UNIQUEnTuples (n, Tuples, ID) AS (

    SELECT DISTINCT 1, CAST(strcol AS VARCHAR(8000)), strcol

    FROM @t

    UNION ALL

    SELECT 1 + n.n, t.strcol + ',' + n.Tuples, strcol

    FROM UNIQUEnTuples n

    CROSS APPLY (

    SELECT strcol

    FROM @t t

    WHERE t.strcol < n.ID) t

    )

    SELECT *

    FROM UNIQUEnTuples

    ORDER BY n, Tuples

    I used 18 unique n-Tuples which generates 262,143 rows (combinations).

    Using a temp table instead of a table variable and dumping the output into holding variables, time improvements (averaged over 5 runs) were:

    - CPU: 44%

    - Elapsed: 45% Edit: Corrected

    Removing DISTINCT of course (if you can get away with it) measurably improved the speed as well.

    Changes:

    - The Tuples column was changed to VARCHAR(8000) on the assumption that you probably aren't working with so many combinations (and your IDs are short enough) that you need 2GB of storage for them.

    - Switched the order of tables accessed in the recursive leg and used a CROSS APPLY instead of a JOIN.

    - Added the ID column to the rCTE, to allow checking against a much shorter length string (or if your ID is an INT it would then switch to an INT comparison).

    - Removed the CHARINDEX check because it was not needed.

    I verified each of these improvements incrementally so each had some effect. Presumably, similar changes to the nTuples rCTE would have similar performance impacts.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • It's been fun to watch the permutations in this discussion. Thanks for the article and the followups, Dwain.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (10/3/2012)


    It's been fun to watch the permutations in this discussion. Thanks for the article and the followups, Dwain.

    You might be interested to know I'm working on a follow up article that applies the UNIQUEnTuples approach to a specific problem. I plan to do some performance comparisons, including against the set-based WHILE loop you were attempting here.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

Viewing 7 posts - 46 through 51 (of 51 total)

You must be logged in to reply to this topic. Login to reply