Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    Heh... wonder if that table is actually spt_values?

    My gut says the two are unrelated. As far as I can tell, spt_values is just an old EAV type lookup table/view dating back to the Sybase era. The only time I've seen it used outside of a system procedure is in an old Adam Machanic article dealing with large bitmasks. Dealing with very large bitmasks

    Plus, I'm fairly certain that the execution plan nodes would show table scans of spt_values rather that the more cryptic constant scans.

    Of course, I've been wrong before and it's entirely possible that I'm wrong this time too...

  • I find it humorous that, for someone who argued so strongly against Hungarian notation on another thread, this object names starts with "fn".  🙂

    Question: rather than have one function that does it all, would it make economic sense to have different functions for SMALLINT, INT, etc, and for starting at 0 rather than 1?  Would separate functions perform better enough to warrant multiple functions, or does the benefit of stopping at @MaxN perform faster than always calculating enough values cover INT or BIGINT?

    In such a case, the code would change from "FROM dbo.fnTally(1,@MaxN)" to "FROM dbo.fnTallyInt1" or "FROM dbo.fnTallyBig0"

  • I think it comes down to a balance between performance and management. Sure, your proposal to have a slew of functions to fit various criteria would work and would likely perform quite well, but it would be more difficult to manage. Not only from a consumer point of view (those who would use the functions), but also from a developer point of view (the maintainer of the functions).

  • Aaron N. Cutshall wrote:

    No worries, Jeff. I also use the first one most primarily (99+%) with only a few use cases for the offset one. You got me to thinking about the UNION ALL issue. I wonder if this version would perform any better?

    CREATE FUNCTION dbo.fnTally(@RowCnt bigint, @ZeroOrOne bit = 1)
    RETURNS TABLE WITH SCHEMABINDING
    AS RETURN
    WITH e1(n) AS (SELECT n FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS x(n)) -- 16 records
    ,e2(n) AS (SELECT 1 FROM e1 a CROSS JOIN e1 b) -- 16^2 or 256 records (16*16)
    ,e4(n) AS (SELECT 1 FROM e2 a CROSS JOIN e2 b) -- 16^4 or 65,536 records (256*256)
    ,e8(n) AS (SELECT 1 FROM e4 a CROSS JOIN e4 b) -- 16^8 or 4,294,967,296 records (65,536*65,536)
    SELECT TOP (@RowCnt-CAST(@ZeroOrOne as bigint)) ROW_NUMBER() OVER (ORDER BY n) - (CAST(~@ZeroOrOne as bigint)) AS n
    FROM e8;
    GO

    I am also amazed at the volume of folks who overuse recursive CTEs. It almost seems that they go out of their way to use them!

    I've got a function I sometimes use too. I've edited it to have the same signature as Jeff's code:

    CREATE FUNCTION dbo.fnTally1(@ZeroOrOne bit,@MaxN bigint)
    RETURNS TABLE WITH SCHEMABINDING AS
    RETURN WITH A(A) AS (SELECT '' FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) T(C))
    SELECT TOP(@MaxN + CASE WHEN @ZeroOrOne=0 THEN 1 ELSE 0 END) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 + @ZeroOrOne N
    FROM A A,A B,A C,A D,A E,A F,A G,A H -- 16^8
    GO

    It seems to make very little difference to performance whichever way you do it.

     

  • It's true that the result is the same either way, however I prefer not to use the old method you have in your FROM statement of comma separated sources instead of explicit CROSS JOIN statements.

  • fahey.jonathan wrote:

    I find it humorous that, for someone who argued so strongly against Hungarian notation on another thread, this object names starts with "fn".  🙂

    Heh... I did explain that in the text.  I would have called it something else but couldn't resist the HR compliant usage since I'm actually a walking HR violation. 😀

    fahey.jonathan wrote:

    Question: rather than have one function that does it all, would it make economic sense to have different functions for SMALLINT, INT, etc, and for starting at 0 rather than 1?  Would separate functions perform better enough to warrant multiple functions, or does the benefit of stopping at @MaxN perform faster than always calculating enough values cover INT or BIGINT?

    GREAT questions!!!

    1. Yes... you can get a pretty decent (by %) performance improvement on the things that start at 1 by having a dedicated function that always starts at 1.  There is a cost to the test of whether to do the 0-UNION ALL thing even if you're not using 0 but the tradeoff was small enough for me to go through the inconvenience of having separately named functions.As for having different functions based on the datatype of the usage, I don't know because I've not tested it.  I can only tell you that decreasing the number of Cross Joins, as in the posted function, did help a good bit and it makes looking at execution plans that use the function a whole lot less complicated.  I also haven't tested just how expensive a datatype conversion within the function might be.  It might even be a slight performance improvement to do so in that fewer bytes have to make a trip through the system (ex. Compare the size of BIGINT to INT... 2:1 ratio).
    2. As for the @MaxN thing... that is used in a TOP().  There is no performance advantage to not having it and there is a performance disadvantage to not having it even for TINYINT applications because if you only need to generate 10 rows in a high volume per-row usage scenario, there's a relatively large performance advantage compared to generating 256 rows.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    Thanks for the very kind feedback, Jason.

    Interesting code, Jason.  While I've seen such a thing before, I've not seen many folks that provide a starting value and number of rows to return.

    Interesting.  I actually did this back in 2009 when I wrote this article: https://www.sqlservercentral.com/articles/the-dynamic-tally-or-numbers-table

     

  • Lynn Pettis wrote:

    Jeff Moden wrote:

    Thanks for the very kind feedback, Jason.

    Interesting code, Jason.  While I've seen such a thing before, I've not seen many folks that provide a starting value and number of rows to return.

    Interesting.  I actually did this back in 2009 when I wrote this article: https://www.sqlservercentral.com/articles/the-dynamic-tally-or-numbers-table

    Lordy... I totally missed this comment, Lynn.  My apologies.

    That was a good article and a lot of people made comments.  You put a good amount of work into it and it showed.  I said so in the discussion for that article but I'll say it again here, thank you, good Sir!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I use this function with recursion and @top of 1000000 is enough for me, but I don't n know if it performs better:

    -- by Carlo.Romagnano

    CREATE FUNCTION dbo.fn_tally(@top INT)

    RETURNS TABLE

    AS

    RETURN WITH tally

    AS

    (

    SELECT 1 AS idx

    UNION ALL

    SELECT 1 + tally.idx AS idx

    FROM tally

    WHERE tally.idx < 100

    )

    SELECT TOP(@top) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS idx

    FROM tally T100

    ,tally T10000

    ,tally T1000000

    ,tally T100000000 --if you need less rows, comment this line and get more speed

    ,tally T10000000000 --if you need less rows, comment this line and get more speed

  • You might be interested in an interesting series of articles by Itzik Ben-Gan:

    https://sqlperformance.com/2021/04/t-sql-queries/number-series-solutions-4

  • Ken McKelvey wrote:

    You might be interested in an interesting series of articles by Itzik Ben-Gan:

    https://sqlperformance.com/2021/04/t-sql-queries/number-series-solutions-4

    Ken, Thank you for posting this link.

  • Note that Microsoft is preparing to switch, yet again, to a different feedback system.

    Microsoft will be moving away from UserVoice sites on a product-by-product basis throughout the 2021 calendar year. We will leverage 1st party solutions for customer feedback. Learn more here.

    I hope feedback isn't lost or discounted in the transition.

  • Carlo Romagnano wrote:

    I use this function with recursion and @top of 1000000 is enough for me, but I don't n know if it performs better:

    -- by Carlo.Romagnano CREATE FUNCTION dbo.fn_tally(@top INT) RETURNS TABLE AS RETURN WITH tally AS ( SELECT 1 AS idx UNION ALL SELECT 1 + tally.idx AS idx FROM tally WHERE tally.idx < 100 ) SELECT TOP(@top) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS idx FROM tally T100 ,tally T10000 ,tally T1000000 ,tally T100000000 --if you need less rows, comment this line and get more speed ,tally T10000000000 --if you need less rows, comment this line and get more speed

    Thanks for posting your code, Carlo.  Please see the following article for why you should generally avoid such rCTEs even with small rowcounts.

    https://www.sqlservercentral.com/articles/hidden-rbar-counting-with-recursive-ctes

    You also shouldn't have to modify the lines of code in a function to "get more speed" depending on row sizes.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • ratbak wrote:

    Note that Microsoft is preparing to switch, yet again, to a different feedback system.

    Microsoft will be moving away from UserVoice sites on a product-by-product basis throughout the 2021 calendar year. We will leverage 1st party solutions for customer feedback. Learn more here.

    I hope feedback isn't lost or discounted in the transition.

    You mean like the last time... I wouldn't count on that.

    I also consider that to be mostly a "marketing for Microsoft" site.  Consider the number of hits the bloody community put on the "dark theme" request in SSMS and how few hits there are on the request to fix the broken STRING_SPLIT() function or the 12 years it's been for a high performance machine language level fnTally() function to be built into T-SQL like it has been in much cheaper versions of RDBMS software almost since their inception 1 to 2 decades ago.

    And then there are "joys" like the bloody performance issues built into the FORMAT() function and the fact that the newer temporal datatypes like DATE, TIME, DATETIME2 are no longer ANSI compliant when it comes to date math.

    None of that will ever be fixed because it's obviously much more important to come of with things like the "dark theme" than it is to fix partially broken or limited usage stuff.  Heh... and when they do, it's not actually a fix but some bloody workaround like DATEDIFF_BIG or the current rendition of that damned PIVOT operator (which is MUCH better in MS Access!).

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jason A. Long wrote:

    Ken McKelvey wrote:

    You might be interested in an interesting series of articles by Itzik Ben-Gan:

    https://sqlperformance.com/2021/04/t-sql-queries/number-series-solutions-4

    Ken, Thank you for posting this link.

    I was involved in that series for a while.  I still don't understand the need to punish a function on a permanent basis by adding things to it that are easy to add outside of the function only when needed.  I stopped participating when it seemed like the overwhelming majority seemed to think it was a great idea to add all that stuff, which I actually never have the need for nor have I seen folks that need it.

    The good part about it is that it the series is that it does show a cool performance gain by using a special "dummy" table with no rows to get the function to run in the batch mode instead of the rows mode in 2019.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 16 through 30 (of 41 total)

You must be logged in to reply to this topic. Login to reply