The "Numbers" or "Tally" Table: What it is and how it replaces a loop

  • Jeff Moden (5/7/2008)


    Carl Federl (5/7/2008)


    Another great article ! Thanks

    The first reference for a tally table that I can recall is in "Guide to Sybase and SQL Server" by C. J. Date and D.McGoveran published in June of 1992.

    Does anyone know of an earlier reference ?

    ...

    Hugh Darwin (oft co-author with Chris Date) wrote about it I believe in the '80s. It's in one of the articles published in the "Relational Database Writings" series of books, but I couldn't find the exact reference quickly.

    It's really just the mathematical concept of a domain. *Relational* systems should be phenomenally good at managing domains, after all it's what they are (supposedly) built atop. The set of all Integers is a simple and useful domain. Darwin argues that an RDBMS should provide such "built-in" domains in useful ways that we can use.

    A "Tally" table (sorry Jeff) is a bad name though. It really should be just an "Integer" domain table. Ideally the DBMS should provide such domain support out of the box. Like Jeff, I've added one into about every system I've built in the past few years as it's incredibly useful.

    You can imagine all sorts of other domains that would be useful as well. For example, I work in scientific applications where often a process skips by some repeatable interval. So I have a "Floats(min,max,step)" function which returns a "table" of floating point numbers offset by some "step" value. It's just like "selecting" numbers from the "floating point numbers" domain.

    But Integers is the most useful I've found. You can emulate the "ASCII-character domain", or the "days of the month of July" set, or just about anything else you need to do set-based operations. Truly a "set-eye for the procedural guy" sort of tool anyone using Relational DBs should know and love.

    I used it a lot for doing semiconductor wafer-map combinatorial explosions to come up with Row/Column numbering based on min/max index numbers. You know, whatever you need...

    --Eric

  • Jeff

    We had a discussion regarding this article couple of days back in another forum. I was waiting for this article. I have no words to praise this one. I just wants to thank you for the article. You are rally taking this forum to a new height.

    πŸ™‚

    Anirban

  • Amazing ... you really open my eyes, what a good article.

    Best wishes from Peru.

    JosΓ© Miguel GutiΓ©rrez Hilaccama
    Database Developer - MCTS SQL Server 2005
    No way Jose? ... bah

  • Eric Wilson (5/8/2008)


    A "Tally" table (sorry Jeff) is a bad name though. It really should be just an "Integer" domain table.

    Not a problem... a rose is a rose and I agree... it's darned useful. I just can't bring myself to use a plural name like "Numbers" or "Integers", "Integer" is a reserved word in SQL Server, and "Number" is a reserved word in Oracle. Not likely that "Tally" will ever become a reserved word, but we'll think of something else if it ever does. πŸ™‚

    As a side bar, it's easy to "warp" a Tally table into an "Stepped Value" table with either division or multiplication. That's what happens in the "shift" example I gave.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Anirban Paul (5/8/2008)


    Jeff

    We had a discussion regarding this article couple of days back in another forum. I was waiting for this article. I have no words to praise this one. I just wants to thank you for the article. You are rally taking this forum to a new height.

    πŸ™‚

    Anirban

    Awesome compliment, Anirban... thanks! Yep, I remember the "conversation"... I think it was on Karthik's "Tally Table" post.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (5/7/2008)


    [font="Courier New"]WHILE....[/font]

    ... and that blows me away. :hehe:

    They also used a WHILE loop to create some of their test data...

    Hi Jeff,

    Though I understand the sentiment from a purist point of view, I must admit being "guilty" of that myself.

    In my defense - there is, of course, a vast difference between code that runs (or is intended to run one day) in production, on multi-million row tables and with a need for fast performance (either because it's part of an online process or because it has to fit in a small maintenance window) on the one hand, and code to create test data on the other hand. The latter never runs on a production system (I hope), and most developers have plenty of other useful things to do (like getting their co-workers some coffee ;)) while the test data is being generated.

    So yes. I am guilty as charged. And will probably continue to be, as I prefer ease of coding over better performance in cases like this, where performance doesn't actually matter.

    (And if you don't believe the "ease of coding" part, I challenge you to visit my blog, find the first part of the -still incomplete- series about bin packing, and rewrite the stored procedure that generates the test data in a set-based fashion)


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • jose (5/8/2008)


    Amazing ... you really open my eyes, what a good article.

    Best wishes from Peru.

    Thanks, Jose. Best wishes from the "U S of A"!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hugo Kornelis (5/8/2008)


    So yes. I am guilty as charged. And will probably continue to be, as I prefer ease of coding over better performance in cases like this, where performance doesn't actually matter.

    (And if you don't believe the "ease of coding" part, I challenge you to visit my blog, find the first part of the -still incomplete- series about bin packing, and rewrite the stored procedure that generates the test data in a set-based fashion)

    Heh... I really appreciate the feedback, Hugo. Hmmmm... ease of coding... you're saying that writing a WHILE loop is easier to write than a simple cross-join? I don't think so.

    Performance does matter to me... always. Even in simple supposedly 1 time use code because you just never know when it's going to be used again. πŸ˜› I just can't bring myself to spending the extra time to write a While loop when the cross tab code, whether it be for a Tally table or for test code, works so well and takes so little time to write.

    Thanks for the invite to take a peek at your blog...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (5/7/2008)


    I really had some hesitation in writing the article because I thought some of the "old dogs" would get after me for writing about something they already knew.

    Hi Jeff,

    I'll admit that I was surprised to see the subject you chose. I consider you to be one of the few contributor to this site that deliver a constantly good quality - and now I thought you were wasting your and our time by, to quote you, stating the obvious.

    I'm glad I held back that comment πŸ™‚ After reading this discussion, I am really amazed how many people apparently were still unaware of this technique. Apparently, the "old dogs" have never been able to do a good job of reaching out and communicating this technique as well as they (we?) thought.

    For yes!, there has already been a lot of publicity about this technique (though usually under the name "Numbers table" instead of "Tallly table" - I agree with Eric that the "Tally" name is not correct, though I disagree with his alternative suggestion as well for most Numbers table will typpically NOT represent the whole domain).

    * When answering question in newsgroups, I constantly refer to this article, written by Aaron Bertrand: http://sqlserver2000.databases.aspfaq.com/why-should-i-consider-using-an-auxiliary-numbers-table.html. There are many similar articles, but I long ago decided that memorizing one of them suffices. πŸ™‚

    * Joe Celko has covered the technique in at least one of his books: http://www.amazon.com/Joe-Celkos-SQL-Smarties-Programming/dp/1558605762

    * Huge numbers of blog posts, forum and newsgroup answers, and other resources advertise the use of a Numbers table to solve a particular problem, often with a mention of either Aaron Bertrand's article or one of the countless others.

    Up until now, I thought that the world of SQL Server developers and DBAs constituted of two parts: those who consult online resources to expand their knowledge, and those who don't; I expected the former group to already be awarre of this technique and the latter group not to read your article either. I am now forced to revisit this picture - apparently there is a third group of SQL Server professionals who do read this site, but don't go beyonnd that. For that, apparently quite large group, your article has served a great purpose - so I'd like to thank you for that.

    And to all the SQL Server professionals out there who read this site but nothing else, I'd like to say: Get out! Explore! There is so much information about SQL Server te be found on the internet beyond this site. Some of it might be less useful or less organised than here, but there's also huge loads of information that is better than most of what you'll find here - don't limit yourself to this site only!

    (Edit - fixed the hyperlinks)


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Eric Wilson (5/8/2008)


    It's really just the mathematical concept of a domain. *Relational* systems should be phenomenally good at managing domains, after all it's what they are (supposedly) built atop. The set of all Integers is a simple and useful domain. Darwin argues that an RDBMS should provide such "built-in" domains in useful ways that we can use.

    A "Tally" table (sorry Jeff) is a bad name though. It really should be just an "Integer" domain table.

    Hi Eric,

    Except that the domain of all integers is of infinite size, and no table can match that. That why I prefer the (more common) name "Numbers" for this table - as it describes exactly what it contains: a set of numbers.

    You can imagine all sorts of other domains that would be useful as well. (...)

    But Integers is the most useful I've found.

    The second best (in my opinion) is a calendar table. See http://sqlserver2000.databases.aspfaq.com/why-should-i-consider-using-an-auxiliary-calendar-table.html for more information about this related concept. I believe that every database should have both a Numbers and a Calendar table.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Jeff Moden (5/8/2008)


    I just can't bring myself to use a plural name like "Numbers" or "Integers", "Integer" is a reserved word in SQL Server, and "Number" is a reserved word in Oracle.

    Hi Jeff,

    Off topic, but still - I really don't understand your problem with plural names for tables.

    The name of a table should describe what's in it. So if a table holds exactly one row, a singular name ("Employee", "Number", "Day") would be a good name. If, however, more rows might be in the table, then the name should either be plural ("Numbers"), or a collective noun ("Personnel", "Calendar").

    Besides this logical argument, there are also pragmatic arguments:

    1) All reserved words are singular. By using plural table names, you have almost no chance of a clash with a reserved word, now or after upgrading to a future version.

    2) Consistency with system tables and views (SQL 2000: sysobjects, sysindexes, etc; SQL 2005: sys.objects, sys.indexes, sys.dm_db_partition_stats, etc; ANSI-defined: INFORMATION_SCHEMA.VIEWS, INFORMATION_SCHEMA.TABLES, etc).

    But on the other hand - every naming convention is a good one, as long as it's applied consitently πŸ˜€


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Jeff Moden (5/8/2008)


    Heh... I really appreciate the feedback, Hugo. Hmmmm... ease of coding... you're saying that writing a WHILE loop is easier to write than a simple cross-join? I don't think so.

    Hi Jeff,

    Wow, you're fast mate! Isn't it somewhere deep in the night in your part of the world? Shouldn't you be sleeping? πŸ˜›

    Of course, a simple cross join is always better than a WHILE loop. I was talking about the, let's call it, "less simple" situations.

    Another advantage of using WHILE for generating test data is that you can use RAND() instead of NEWID() to generate the random values. RAND() in a setbased query will be evaluated once and use the same value for each row - obviously not good for test data. I'm not favoring RAND() over NEWID() for strength of the randomization algorithm (though RAND() is better, but I don't think that matters for test data), but because you can optionally seed RAND() so that you have the ability to generate the same set of test data every time.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Oh no... Hugo! You're one of those folks that actually believes in the myth of code portability! :hehe:

    You wrote the following in your blog...

    SQL Server will happily update the same row over and over again if it matches more than one row in the joined table, with only the result of the last of those updates sticking.

    ... Do you have any code that proves that?

    Shifting gears a bit... the "bin packing by weight" problem sounds like a very interesting challenge for set based processing. I'll give it a whirl. I guarantee that it won't be "portable" code because the non-Ansi extensions of SQL Server are just too valuable in solving this particular problem. I can just about guarantee that very few, if any, other RDBMS's have the particular extension that I'm thinking about. Either way, it sounds like a wad of fun trying.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hugo Kornelis (5/8/2008)


    Jeff Moden (5/8/2008)


    Heh... I really appreciate the feedback, Hugo. Hmmmm... ease of coding... you're saying that writing a WHILE loop is easier to write than a simple cross-join? I don't think so.

    Hi Jeff,

    Wow, you're fast mate! Isn't it somewhere deep in the night in your part of the world? Shouldn't you be sleeping? πŸ˜›

    Of course, a simple cross join is always better than a WHILE loop. I was talking about the, let's call it, "less simple" situations.

    Another advantage of using WHILE for generating test data is that you can use RAND() instead of NEWID() to generate the random values. RAND() in a setbased query will be evaluated once and use the same value for each row - obviously not good for test data. I'm not favoring RAND() over NEWID() for strength of the randomization algorithm (though RAND() is better, but I don't think that matters for test data), but because you can optionally seed RAND() so that you have the ability to generate the same set of test data every time.

    Heh... I AM sleeping! You should see what I do when I'm awake πŸ˜€

    You certainly don't need a While loop to generate random test data...

    --drop table jbmtest

    --===== Create and populate a 1,000,000 row test table.

    -- Column "RowNum" has a range of 1 to 1,000,000 unique numbers

    -- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers

    -- Column "SomeLetters2" has a range of "AA" to "ZZ" non-unique 2 character strings

    -- Column "SomeMoney has a range of 0.0000 to 99.9999 non-unique numbers

    -- Column "SomeDate" has a range of >=01/01/2000 and <01/01/2010 non-unique date/times

    -- Column "SomeCSV" contains 'Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10'

    -- for all rows.

    -- Column "SomeHex12" contains 12 random hex characters (ie, 0-9,A-F)

    -- Jeff Moden

    SELECT TOP 1000000

    RowNum = IDENTITY(INT,1,1),

    SomeInt = ABS(CHECKSUM(NEWID()))%50000+1,

    SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)

    + CHAR(ABS(CHECKSUM(NEWID()))%26+65),

    SomeCSV = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),

    SomeMoney = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY),

    SomeDate = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),

    SomeHex12 = RIGHT(NEWID(),12)

    INTO dbo.JBMTest

    FROM Master.dbo.SysColumns t1,

    Master.dbo.SysColumns t2 --Lack of join criteria makes this a CROSS-JOIN

    --===== A table is not properly formed unless a Primary Key has been assigned

    ALTER TABLE dbo.JBMTest

    ADD CONSTRAINT PK_JBMTest_RowNum PRIMARY KEY CLUSTERED (RowNum)

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • p.s. Understood about the random seed... but if you want to generate the exact same test data every time, make a copy of the table.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 76 through 90 (of 511 total)

You must be logged in to reply to this topic. Login to reply