Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    Heh... wonder if that table is actually spt_values?

    My gut says the two are unrelated. As far as I can tell, spt_values is just an old EAV type lookup table/view dating back to the Sybase era. The only time I've seen it used outside of a system procedure is in an old Adam Machanic article dealing with large bitmasks. Dealing with very large bitmasks

    Plus, I'm fairly certain that the execution plan nodes would show table scans of spt_values rather that the more cryptic constant scans.

    Of course, I've been wrong before and it's entirely possible that I'm wrong this time too...

  • I find it humorous that, for someone who argued so strongly against Hungarian notation on another thread, this object names starts with "fn".  🙂

    Question: rather than have one function that does it all, would it make economic sense to have different functions for SMALLINT, INT, etc, and for starting at 0 rather than 1?  Would separate functions perform better enough to warrant multiple functions, or does the benefit of stopping at @MaxN perform faster than always calculating enough values cover INT or BIGINT?

    In such a case, the code would change from "FROM dbo.fnTally(1,@MaxN)" to "FROM dbo.fnTallyInt1" or "FROM dbo.fnTallyBig0"

  • I think it comes down to a balance between performance and management. Sure, your proposal to have a slew of functions to fit various criteria would work and would likely perform quite well, but it would be more difficult to manage. Not only from a consumer point of view (those who would use the functions), but also from a developer point of view (the maintainer of the functions).

  • Aaron N. Cutshall wrote:

    No worries, Jeff. I also use the first one most primarily (99+%) with only a few use cases for the offset one. You got me to thinking about the UNION ALL issue. I wonder if this version would perform any better?

    CREATE FUNCTION dbo.fnTally(@RowCnt bigint, @ZeroOrOne bit = 1)
    RETURNS TABLE WITH SCHEMABINDING
    AS RETURN
    WITH e1(n) AS (SELECT n FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS x(n)) -- 16 records
    ,e2(n) AS (SELECT 1 FROM e1 a CROSS JOIN e1 b) -- 16^2 or 256 records (16*16)
    ,e4(n) AS (SELECT 1 FROM e2 a CROSS JOIN e2 b) -- 16^4 or 65,536 records (256*256)
    ,e8(n) AS (SELECT 1 FROM e4 a CROSS JOIN e4 b) -- 16^8 or 4,294,967,296 records (65,536*65,536)
    SELECT TOP (@RowCnt-CAST(@ZeroOrOne as bigint)) ROW_NUMBER() OVER (ORDER BY n) - (CAST(~@ZeroOrOne as bigint)) AS n
    FROM e8;
    GO

    I am also amazed at the volume of folks who overuse recursive CTEs. It almost seems that they go out of their way to use them!

    I've got a function I sometimes use too. I've edited it to have the same signature as Jeff's code:

    CREATE FUNCTION dbo.fnTally1(@ZeroOrOne bit,@MaxN bigint)
    RETURNS TABLE WITH SCHEMABINDING AS
    RETURN WITH A(A) AS (SELECT '' FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) T(C))
    SELECT TOP(@MaxN + CASE WHEN @ZeroOrOne=0 THEN 1 ELSE 0 END) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 + @ZeroOrOne N
    FROM A A,A B,A C,A D,A E,A F,A G,A H -- 16^8
    GO

    It seems to make very little difference to performance whichever way you do it.

     

  • It's true that the result is the same either way, however I prefer not to use the old method you have in your FROM statement of comma separated sources instead of explicit CROSS JOIN statements.

  • fahey.jonathan wrote:

    I find it humorous that, for someone who argued so strongly against Hungarian notation on another thread, this object names starts with "fn".  🙂

    Heh... I did explain that in the text.  I would have called it something else but couldn't resist the HR compliant usage since I'm actually a walking HR violation. 😀

    fahey.jonathan wrote:

    Question: rather than have one function that does it all, would it make economic sense to have different functions for SMALLINT, INT, etc, and for starting at 0 rather than 1?  Would separate functions perform better enough to warrant multiple functions, or does the benefit of stopping at @MaxN perform faster than always calculating enough values cover INT or BIGINT?

    GREAT questions!!!

    1. Yes... you can get a pretty decent (by %) performance improvement on the things that start at 1 by having a dedicated function that always starts at 1.  There is a cost to the test of whether to do the 0-UNION ALL thing even if you're not using 0 but the tradeoff was small enough for me to go through the inconvenience of having separately named functions.As for having different functions based on the datatype of the usage, I don't know because I've not tested it.  I can only tell you that decreasing the number of Cross Joins, as in the posted function, did help a good bit and it makes looking at execution plans that use the function a whole lot less complicated.  I also haven't tested just how expensive a datatype conversion within the function might be.  It might even be a slight performance improvement to do so in that fewer bytes have to make a trip through the system (ex. Compare the size of BIGINT to INT... 2:1 ratio).
    2. As for the @MaxN thing... that is used in a TOP().  There is no performance advantage to not having it and there is a performance disadvantage to not having it even for TINYINT applications because if you only need to generate 10 rows in a high volume per-row usage scenario, there's a relatively large performance advantage compared to generating 256 rows.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "Dear Lord... I'm a DBA so please give me patience because, if you give me strength, I'm going to need bail money too!"

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    Thanks for the very kind feedback, Jason.

    Interesting code, Jason.  While I've seen such a thing before, I've not seen many folks that provide a starting value and number of rows to return.

    Interesting.  I actually did this back in 2009 when I wrote this article: https://www.sqlservercentral.com/articles/the-dynamic-tally-or-numbers-table

     

  • Lynn Pettis wrote:

    Jeff Moden wrote:

    Thanks for the very kind feedback, Jason.

    Interesting code, Jason.  While I've seen such a thing before, I've not seen many folks that provide a starting value and number of rows to return.

    Interesting.  I actually did this back in 2009 when I wrote this article: https://www.sqlservercentral.com/articles/the-dynamic-tally-or-numbers-table

    Lordy... I totally missed this comment, Lynn.  My apologies.

    That was a good article and a lot of people made comments.  You put a good amount of work into it and it showed.  I said so in the discussion for that article but I'll say it again here, thank you, good Sir!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "Dear Lord... I'm a DBA so please give me patience because, if you give me strength, I'm going to need bail money too!"

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 8 posts - 16 through 23 (of 23 total)

You must be logged in to reply to this topic. Login to reply