Inline table-valued function to list numbers

  • Jeff Moden (11/13/2016)


    TheTrojan (11/13/2016)


    @mister.magoo You will be happy to know that your solution is slightly faster than mine! Have a look at the times:

    -- if the parameter is 1000: 76 ms (instead of 280 ms)

    -- if the parameter is 10000: 233 ms (instead of 400 ms)

    -- if the parameter is 100000: 1920 ms (instead of 2183 ms)

    Incredible. You've gone from not knowing what a Tally Table/Tally function is to building and testing them all in just a day or two AND you appear to understand how they're used. You also understand the principle of "pseudo-cursors" and why rCTEs are all they're cracked up to be. And, you solved the "Fizz Buzz" problem that you originally posted about.

    You're going to do VERY well in the world of databases. Very glad to have met you.

    Now that you know all that, here's the Tally Function that I use. You can tell it to start at 1 or 0, both of which are common uses. The reason why I still use SELECT/UNION ALL instead of VALUES is because I still need to support a lot of folks that use 2005. There's no appreciable difference in performance between the two. SELECT/UNION ALL simply makes the code a little longer.

    As a bit of a sidebar, this is also how I document my code in real life and, yes, this is a function that I use in production.

    CREATE FUNCTION [dbo].[fnTally]

    /**********************************************************************************************************************

    Purpose:

    Return a column of BIGINTs from @ZeroOrOne up to and including @MaxN with a max value of 1 Trillion.

    As a performance note, it takes about 00:02:10 (hh:mm:ss) to generate 1 Billion numbers to a throw-away variable.

    Usage:

    --===== Syntax example (Returns BIGINT)

    SELECT t.N

    FROM dbo.fnTally(@ZeroOrOne,@MaxN) t

    ;

    Notes:

    1. Based on Itzik Ben-Gan's cascading CTE (cCTE) method for creating a "readless" Tally Table source of BIGINTs.

    Refer to the following URLs for how it works and introduction for how it replaces certain loops.

    http://www.sqlservercentral.com/articles/T-SQL/62867/

    http://sqlmag.com/sql-server/virtual-auxiliary-table-numbers

    2. To start a sequence at 0, @ZeroOrOne must be 0 or NULL. Any other value that's convertable to the BIT data-type

    will cause the sequence to start at 1.

    3. If @ZeroOrOne = 1 and @MaxN = 0, no rows will be returned.

    5. If @MaxN is negative or NULL, a "TOP" error will be returned.

    6. @MaxN must be a positive number from >= the value of @ZeroOrOne up to and including 1 Billion. If a larger

    number is used, the function will silently truncate after 1 Billion. If you actually need a sequence with

    that many values, you should consider using a different tool. ;-)

    7. There will be a substantial reduction in performance if "N" is sorted in descending order. If a descending

    sort is required, use code similar to the following. Performance will decrease by about 27% but it's still

    very fast especially compared with just doing a simple descending sort on "N", which is about 20 times slower.

    If @ZeroOrOne is a 0, in this case, remove the "+1" from the code.

    DECLARE @MaxN BIGINT;

    SELECT @MaxN = 1000;

    SELECT DescendingN = @MaxN-N+1

    FROM dbo.fnTally(1,@MaxN);

    8. There is no performance penalty for sorting "N" in ascending order because the output is explicity sorted by

    ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    Revision History:

    Rev 00 - Unknown - Jeff Moden

    - Initial creation with error handling for @MaxN.

    Rev 01 - 09 Feb 2013 - Jeff Moden

    - Modified to start at 0 or 1.

    Rev 02 - 16 May 2013 - Jeff Moden

    - Removed error handling for @MaxN because of exceptional cases.

    Rev 03 - 22 Apr 2015 - Jeff Moden

    - Modify to handle 1 Trillion rows for experimental purposes.

    **********************************************************************************************************************/

    (@ZeroOrOne BIT, @MaxN BIGINT)

    RETURNS TABLE WITH SCHEMABINDING AS

    RETURN WITH

    E1(N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1) --10E1 or 10 rows

    , E4(N) AS (SELECT 1 FROM E1 a, E1 b, E1 c, E1 d) --10E4 or 10 Thousand rows

    ,E12(N) AS (SELECT 1 FROM E4 a, E4 b, E4 c) --10E12 or 1 Trillion rows

    SELECT N = 0 WHERE ISNULL(@ZeroOrOne,0)= 0 --Conditionally start at 0.

    UNION ALL

    SELECT TOP(@MaxN) N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E12 -- Values from 1 to @MaxN

    ;

    Just throwing a small spanner (1/4 inch) in the works, the pre "populating" method where the CTEs accumulate the full cardinality is roughly 10% slower than using lower a cardinality CTE with a cross join.

    😎

    Test set

    USE TEEST;

    GO

    SET NOCOUNT ON;

    GO

    DECLARE @NUM INT = 10000000;

    DECLARE @INT_BUCKET INT = 0;

    -- IBG UNION

    ;WITH

    C1(N) AS

    (

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1

    ), -- 5 rows

    C2(N) AS (SELECT 1 FROM C1 a, C1 b), -- 25 rows

    C3(N) AS (SELECT 1 FROM C2 a, C2 b), -- 625 rows

    C4(N) AS (SELECT 1 FROM C3 a, C3 b), -- 390,625 rows

    C5(N) AS (SELECT 1 FROM C4 a, C4 b) -- 390,625 rows

    SELECT

    TOP(@Num)

    @INT_BUCKET = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM C5;

    GO

    -- EE T(N)

    DECLARE @NUM INT = 10000000;

    DECLARE @INT_BUCKET INT = 0;

    ;WITH T(N) AS (SELECT X.N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) X(N))

    , NUMS(N) AS (SELECT TOP(@NUM) ROW_NUMBER() OVER (ORDER BY @@VERSION) AS N

    FROM T T1,T T2,T T3,T T4,T T5,T T6,T T7,T T8,T T9)

    SELECT

    @INT_BUCKET = NM.N

    FROM NUMS NM

    GO

    Extended Event Timing

    CREATE EVENT SESSION [TEEST_TSQL_Timing_001] ON SERVER

    ADD EVENT sqlserver.sql_statement_completed(SET collect_statement=(1)

    ACTION(package0.collect_cpu_cycle_time,package0.collect_system_time,sqlserver.context_info,sqlserver.database_name)

    WHERE ([sqlserver].[database_name]=N'TEEST'))

    ADD TARGET package0.event_file(SET filename=N'C:\EE_DATA2\SQLXE\TEEST_TSQL_Timings_002')

    WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=OFF)

    GO

    Output (Win10 4Core Atom tablet)

    name duration cpu_time row_count

    ---------- --------- --------- ---------

    EE T(N) 5963373 5953000 10000000

    IBG UNION 6461994 6438000 10000000

  • Eirikur Eiriksson (11/15/2016)


    Just throwing a small spanner (1/4 inch) in the works, the pre "populating" method where the CTEs accumulate the full cardinality is roughly 10% slower than using lower a cardinality CTE with a cross join.

    Output (Win10 4Core Atom tablet)

    name duration cpu_time row_count

    ---------- --------- --------- ---------

    EE T(N) 5963373 5953000 10000000

    IBG UNION 6461994 6438000 10000000

    You didn't label the code. I'm assuming that the "IBG union" that you speak of is the first snippet of code in your test? If so, I agree that the smaller IBG UNIONs are a bit slower and that's the reason why I made my fnTally use (10^4)^3. Not sure how that compares with what your EE T(N) method does because I haven't tested it. I do know that the "zero detection/creation" does slow things down a tiny amount and took it as a small tradeoff for flexibility and simplicity.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (11/15/2016)


    Eirikur Eiriksson (11/15/2016)


    Just throwing a small spanner (1/4 inch) in the works, the pre "populating" method where the CTEs accumulate the full cardinality is roughly 10% slower than using lower a cardinality CTE with a cross join.

    Output (Win10 4Core Atom tablet)

    name duration cpu_time row_count

    ---------- --------- --------- ---------

    EE T(N) 5963373 5953000 10000000

    IBG UNION 6461994 6438000 10000000

    You didn't label the code. I'm assuming that the "IBG union" that you speak of is the first snippet of code in your test? If so, I agree that the smaller IBG UNIONs are a bit slower and that's the reason why I made my fnTally use (10^4)^3. Not sure how that compares with what your EE T(N) method does because I haven't tested it. I do know that the "zero detection/creation" does slow things down a tiny amount and took it as a small tradeoff for flexibility and simplicity.

    Sorry, my bad, corrected now.

    😎

    I'll see if I find time to set up a proper harness.

  • Several years ago I got "Fizz Buzz" as an interview question. It is not that tricky, but people tend to write something like this:

    CASE WHEN x % 5 = 0 THEN 'buzz'

    WHEN x % 3 = 0 THEN 'fizz'

    WHEN (x % 3) = 0 AND (x % 5) = 0 THEN 'fizz buzz'

    ELSE NULL END

    Oops this does not work! They simply read the specs and wrote when clauses in the order they heard it. But furthermore, we know that the when clauses in the case expression are evaluated left or right. We need to check the strongest condition first, divisibility by both five and three.

    But even people who got this far tend not to see it being divisible by both three and five means that you are divisible by (3 * 5).

    CASE WHEN x % 15 = 0 THEN 'fizz buzz'

    WHEN x % 3 = 0 THEN 'fizz' -- denser than the next condition.

    WHEN x % 5 = 0 THEN 'buzz'

    ELSE NULL END

    Decades ago, one of the standard interview problems was to write a simple binary search on a one-dimensional array. In fact, the Journal of the ACM had an article on this, which I am sure I have somewhere in my vertical filing cabinet of 40 years of clippings.

    Books in Celko Series for Morgan-Kaufmann Publishing
    Analytics and OLAP in SQL
    Data and Databases: Concepts in Practice
    Data, Measurements and Standards in SQL
    SQL for Smarties
    SQL Programming Style
    SQL Puzzles and Answers
    Thinking in Sets
    Trees and Hierarchies in SQL

  • @celko As a newbie, I must admit that I was aware that if a number is divisible by 3 AND by 5, it means that it is divisible by 3*5 (and I used it in my code). Yet I could not get that it's more correct to put "WHEN x % 3" before "WHEN x % 5" as you and Jeff said. I though it was exactly the same. Now I know! Thanks for pointing this out.

  • CELKO (11/15/2016)


    Decades ago, one of the standard interview problems was to write a simple binary search on a one-dimensional array.

    That's still a fun little exercise... especially if you also have them write the sort necessary to drive the binary search.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 6 posts - 16 through 20 (of 20 total)

You must be logged in to reply to this topic. Login to reply