The "Numbers" or "Tally" Table: What it is and how it replaces a loop

  • Carl Federl (5/7/2008)


    From the article, you wrote " Tally table .. starting at 0 or 1 (mine start at 1)"

    There are advantages for having the Tally table include 0 and the "Freight by Date" problem becomes easier if 0 is included.

    Yep... I absolutely agree. It depends on what most of your work with the Tally table will be. If most of that work is with dates, then the zero based Tally table is the better way to go. Most of my work doesn't involve dates so I use a Unit based Tally table... keeps me from having to write AND t.N BETWEEN 1 AND...

    Thanks for posting all the code, Carl. Lots of folks talk about alternatives... when folks like you, GSquared, and all the others offer up code, it really adds to the discussion. Even when some folks offer up code that doesn't quite sit right, it shows that they're thinking and others jump in and think with them... and it helps everyone.

    Thanks again! 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • GSquared (5/7/2008)


    Hugo:

    Just SP is fine (though I personally dislike the habit - as if anything other than a stored procedure might follow an EXEC keyword). It's SP_ (with an underscore directly following the SP) that turns it into the special prefix reserved for system stored procedures.

    Actually, you can execute scalar UDFs.

    create function ExecTest()

    returns int

    as

    begin

    return (1)

    end

    go

    declare @a int

    exec @a = dbo.exectest

    select @a

    Will select 1

    Interesting sidebar, there... you beat me to it, too! Thanks Gus!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • GSquared (5/7/2008)


    I ran some speed and load tests on this code.

    Summary: While the CTE works, and is fast by any normal standard, the Numbers version is even faster, and requires less IO.

    Note: All tests run 5 or more times on an isolated box running no concurrent queries.

    Awesome as usual, Gus. Do me a favor... run the following on that same box, please... let us know how it turns out... Thanks.

    --===== Gus' original test parameter

    DECLARE @params varchar(8000), @Res varchar(10)

    --SET @params = '1,2,3,4,5,6,7,8,9,10';

    select @params = coalesce(@params + ',' + cast(number as varchar(10)),

    cast(number as varchar(10)))

    from dbo.numbers

    where number between 1 and 1820

    DECLARE @Top INT

    SET @Top = LEN(@Params)-1

    ;WITH

    cteTally AS

    (--==== "Modenized" CTE Tally table

    SELECT TOP(@Top) ROW_NUMBER() OVER(ORDER BY sc1.Object_ID) AS N

    FROM Master.sys.All_Columns sc1,

    Master.sys.All_Columns sc2

    )

    --Gus' orignal code with a tweek

    SELECT @res =

    SUBSTRING(@params+',', N,

    CHARINDEX(',', @params+',', N) - N) --as Parsed

    FROM cteTally

    WHERE SUBSTRING(',' + @params, N, 1) = ','

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • JohnG (5/7/2008)


    As always, a great article from Jeff.

    As for performance testing of various techniques by anyone, I am continually amazed as to the incorrect methods for generating sample data.

    Example (taken from this thread)

    SET @params =

    '1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19

    For the use of a tally table or other mechanism, populating a variable with sample data like the above is completely wrong! Those delimited values would never be passed by an application. The proper approach for generating sample data is to use RANDOM values.

    Thanks for the compliment John...

    Setting up the @params the way they did is good for "discovery" testing because each parameter element IS numbered sequentially and very easy to troubleshoot. But, I agree, you have to do some random testing and it should be a stress test, as well. Here's some code that'll do both...;)

    DECLARE @params VARCHAR(MAX)

    SELECT TOP 10000

    @params = COALESCE(@Params+',','') + CAST(NEWID() AS VARCHAR(MAX))

    FROM Master.sys.All_Columns sc1,

    Master.sys.All_Columns sc2

    SELECT LEN(@Params), @params

    Heh... 369,999 characters across 10,000 totally random 36 character parameters should be just what the doctor ordered. :hehe:

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • GSquared (5/7/2008)


    Excellent article, Jeff. Good introduction.

    I've been using Numbers tables for a couple of years now, and I just keep finding more uses for them all the time.

    Examples include: String and Column parsing, Dates/Times lists, generating test data, cleaning up strings, random number generation, finding missing rows in ID columns

    Thanks for the compliment, Gus. I gotta say it again, I thank folks like you a Carl for taking the time to get involved and post code in these disussions. Really get's people thinking.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hugo Kornelis (5/7/2008)


    buktseat (5/7/2008)


    This is probably a rediculous question, but how can any of your "using the tally table" examples work if you don't declare N? I guess I just don't see how N is getting created/assigned? Thanks!

    N is a column (or rather: the column) in the Tally table.

    Hugo... thanks for answering so many questions some folks had on this discussion. I really appreciate it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • RyanRandall (5/7/2008)


    Mark (5/7/2008)


    Jeff Marcus (5/7/2008)


    If you are using SQL Server 2005 you can use ROW_NUMBER instead.

    If you're not using SQL 2005, I guess you could simulate ROW_NUMBER like this...

    declare @param varchar(8000)

    set @param = '101,2008.04.02,1.1,102,2008.04.03,2.2,103,2008.04.04,3.3'

    select

    max(case when RowNumber % 3 = 0 then v end) as val,

    max(case when RowNumber % 3 = 1 then v end) as ord,

    max(case when RowNumber % 3 = 2 then v end) as fval

    from (

    select

    len(left(@param, N)) - len(replace(left(@param, N), ',', '')) as RowNumber,

    substring(@param, N, charindex(',',','+@param+',',N+1)-N-1) v

    from dbo.Tally where N <= len(@param) and substring(','+@param+',',N,1)=',') a

    group by RowNumber / 3

    /* Results

    val ord fval

    --------------- ----------------- -----------------

    101 2008.04.02 1.1

    102 2008.04.03 2.2

    103 2008.04.04 3.3

    */

    ... but at a terrible cost in performance... the use of a Temp table makes it more than 100 times faster...

    SET NOCOUNT ON

    --===== Make an expanded parameter based on Ryan's

    declare @param varchar(8000)

    SELECT @param = COALESCE(@param+',','')+'101,2008.04.02,1.1,102,2008.04.03,2.2,103,2008.04.04,3.3'

    FROM Tally WHERE N <= 140 --Makes 140 copies of the above in a single parameter

    --===== Run Ryan's "tableless" code with a timer

    SET STATISTICS TIME ON

    select

    max(case when RowNumber % 3 = 0 then v end) as val,

    max(case when RowNumber % 3 = 1 then v end) as ord,

    max(case when RowNumber % 3 = 2 then v end) as fval

    from (

    select

    len(left(@param, N)) - len(replace(left(@param, N), ',', '')) as RowNumber,

    substring(@param, N, charindex(',',','+@param+',',N+1)-N-1) v

    from dbo.Tally where N <= len(@param) and substring(','+@param+',',N,1)=',') a

    group by RowNumber / 3

    SET STATISTICS TIME OFF

    PRINT REPLICATE('=',100)

    --===== Run the temp table code with a timer (uses lots of Ryan's code)

    SET STATISTICS TIME ON

    SELECT RowNumber = IDENTITY(INT,0,1),

    substring(@param, N, charindex(',',','+@param+',',N+1)-N-1) v

    INTO #MyHead

    FROM dbo.Tally WHERE N <= len(@param) AND substring(','+@param+',',N,1)=','

    SELECT max(case when RowNumber % 3 = 0 then v end) as val,

    max(case when RowNumber % 3 = 1 then v end) as ord,

    max(case when RowNumber % 3 = 2 then v end) as fval

    FROM #MyHead

    GROUP BY RowNumber / 3

    SET STATISTICS TIME OFF

    DROP TABLE #MyHead

    No... not picking on Ryan... he was just showing a way around using the Temp table... I'm not sure why folks try to avoid them so much. Look at all the Microsoft procs... they use Temp tables just like this...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Steve Rosenbach (5/7/2008)


    Jeff, once again, you are my hero!!

    Beautifully-written article about such a useful topic.

    I've used a "dates" table for quite a while, and actually have a tally table in the current database that I'm working with, but really didn't understand how to use it until your article.

    The "dates" table has been a great help. It's amazing how such seemingly simple things, like these auxilliary tables, can be so useful.

    Best regards,

    SteveR

    Steve, thanks for the great compliment. I really had some hesitation in writing the article because I thought some of the "old dogs" would get after me for writing about something they already knew. Thought they'd hit me with the "why did you state the obvious" bat. 😛 Now that I see the posts in this discussion, I'm tickled to have been able to help those that didn't know. And, the "old dogs" jumped in and helped a lot with fielding questions!!! Lordy, I love this forum!

    Glad to have helped and thanks again, Steve. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Wayne West (5/7/2008)


    Excellent article, Jeff! I came across an interesting date table article on Tech Republic that was linked here yesterday after it re-surfaced at BuilderAU.Com.au, I updated the loader to run from a tally table and the speed diff was just amazing.

    I have not yet gotten in to string parsing with a tally table, but I'll get there eventually. I did have an amusing little time re-writing the Bizz/Buzz test using it:

    [font="Courier New"]select case when (Number % 15 = 0) then 'BizzBuzz'

    when (Number % 3 = 0) then 'Bizz'

    when (Number % 5 = 0) then 'Buzz'

    else cast(Number as char)

    end

    from zNumbers

    where Number < 100[/font]

    Keep up the good work, Jeff!

    Heh! Oh No! Not the "SQL Test", again! 😛 I started a heck of a fight when Grant first posted the test... someone had written a loop to solve it and they didn't have any comments, etc, and I started ranting about how all code, even the simple stuff, should be documented and always written with performance and scalability in mind. I even wrote and example using the Tally table similar to the one you posted above. It turned out to be one huge thread! It's still going, too! I get a new email about someone posting to it every once in a while.

    Anyway... yeah, Tally tables make a huge difference in performance compared to most any form of loop (I did recently build one exception for Proper Casing, it's a fluke) and will blow the doors off of most forms of recursive CTE's.

    Thanks for the feedback and the great compliment! 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Eric Wilson (5/7/2008)


    Great article.

    I used an "Integers" table for years, but starting with 2005 found that CLR functions to stream out integers is better (faster and takes virtually no space in the DB).

    Or here's a pure T-SQL solution instead of having a table:

    http://www.sqlservercentral.com/articles/Datetime+Manipulation/61822/

    Cheers.

    Thanks for the compliment and the feedback, Eric!

    In regards to the article, check this out...

    --===== Common presets

    SET NOCOUNT ON

    DECLARE @Bitbucket INT

    DECLARE @MyCount INT

    SET @MyCount = 1000000

    SET STATISTICS TIME ON

    --===== The "other" method

    ;WITH

    L0 AS (SELECT 1 AS C UNION ALL SELECT 1), --2 rows

    L1 AS (SELECT 1 AS C FROM L0 AS A, L0 AS B),--4 rows

    L2 AS (SELECT 1 AS C FROM L1 AS A, L1 AS B),--16 rows

    L3 AS (SELECT 1 AS C FROM L2 AS A, L2 AS B),--256 rows

    L4 AS (SELECT 1 AS C FROM L3 AS A, L3 AS B),--65536 rows

    L5 AS (SELECT 1 AS C FROM L4 AS A, L4 AS B),--4294967296 rows

    num AS (SELECT ROW_NUMBER() OVER(ORDER BY C) AS N FROM L5)

    SELECT @Bitbucket = N FROM NUM WHERE N <= @MyCount

    --===== Simple available cross-join method

    ;WITH

    cteTally AS

    (

    SELECT TOP (@MyCount)

    N = ROW_NUMBER() OVER(ORDER BY sc1.Object_ID)

    FROM sys.all_columns sc1,

    sys.all_columns sc2

    )

    SELECT @Bitbucket = N FROM cteTally

    SET STATISTICS TIME OFF

    The other thing is, most things only up to 11k or so and an 11k Tally table only takes 200K bytes including the Clustered index.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Kevin Kennedy (5/7/2008)


    Really enjoyed the article.

    However... on the parameter split if parameter list happens to be exactually 8000 characters the substring in the select will fail. For example:

    DECLARE @Parameter VARCHAR(8000)

    SET @Parameter = REPLICATE('1,2,3,4,5,6,7,8,9,10',1000)

    will produce @Parameter with a length of 8000 and cause the error. I got around this buy checking the length and if it was 8000 using a select with a case statement looking for zero to be returned by charindex.

    Correct... I didn't include any error checking code in my examples... didn't want to detract from what I was trying to show and I'd prefer it to "blow up" rather than returning something like a null. In fact, it'll blow up at 7999 characters because 2 commas are added (you can actually write a split to only add 1 comma or even to handle all 8000 characters, but again, would have detracted from what I was trying to show).

    That, not withstanding, thanks for the compliment and the feedback, both! 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Matt Miller (5/7/2008)


    What - only 12 useful tips? in a single article?:hehe:

    Stop it now - you're setting the bar too high for the rest of us....

    (superlative as usual).....

    Ah, my ol' testing buddy! Glad you could make it! Yeah, I know... I'm slippin' in my old age 😛

    Thanks for the compliment, Matt.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Well, folks... I guess that's just about it for tonight. I'd also like to say thanks to...

    Christopher Ford

    David McKinney

    humbleDBA

    Cory Ellingson

    Manie Verster

    curbina

    tbeadle

    JJ B

    ... for the wonderful comments. All of you have made it an absolute joy to write and be a member of this great forum! Thanks again! :):):):)

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff,

    My preference is actually the CLR function I coded up a while back. It's range is the (-) to (+) limit of the bigint type and it takes no room.

    I have used the Integer Domain table/function (which is what it really is) for many uses including ones that span hundreds of thousands of numbers. And an advantage of the CLR wide range is that I never have to do funky offset math, which is a large advantage for simpler code.

    (My funct. has one other tiny advantage: pass it (10,5) and it gives a descending list instead of ascending. Sometimes helpful. Although in a truly relational system order shouldn't matter, sometimes knowing it helps.)

    Cheers,

    --Eric

  • Cool... thanks for the feedback, Eric. Sounds like a great CLR! Any chance of twisting your arm into posting the source code for it?

    Also... what did you use the hundreds of thousands of integers for?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 61 through 75 (of 511 total)

You must be logged in to reply to this topic. Login to reply