The "Numbers" or "Tally" Table: What it is and how it replaces a loop

  • Ed Wagner - Tuesday, April 11, 2017 9:24 AM

    Jeff Moden - Tuesday, April 11, 2017 9:12 AM

    jetboy2k - Tuesday, April 11, 2017 7:58 AM

    Jeff, thanks for the explanation regarding why the query to generate the Tally table used a CROSS JOIN on two instances of the Syscolumns table. In my SQL instance, the COUNT(*) for the Master.dbo.Syscolumns table was 15645, so I couldn't figure out why there was a need for the CROSS JOIN. Now it makes sense.

    Ed, sorry, how was this a duplicate post? Had someone else already asked this question?

    Ed's a good man and he knows a whole lot.  He was probably in a bit of a hurry, didn't read my post, and posted a nearly identical post to mine and then realized all that and deleted his.

    Thanks, Jeff.  No, I didn't see your post before I created mine, even though you posted it last night. :blush:

    JetBoy2K, I described the technique of using the CROSS JOIN and how it used the presence of rows to create a list of numbers.  I also explained what Jeff explained, but didn't include the math like he did.  After posting it, I saw Jeff's and deleted mine.

    Sorry for the confusion.  I wasn't awake enough yet.

    Hmmm..... Now I'm very curious as to just how short of a SQL statement could be written to populate the table without referencing any actual tables. Hmmm...... The game is afoot.

  • david.holley - Tuesday, April 11, 2017 11:27 AM

    Ed Wagner - Tuesday, April 11, 2017 9:24 AM

    Jeff Moden - Tuesday, April 11, 2017 9:12 AM

    jetboy2k - Tuesday, April 11, 2017 7:58 AM

    Jeff, thanks for the explanation regarding why the query to generate the Tally table used a CROSS JOIN on two instances of the Syscolumns table. In my SQL instance, the COUNT(*) for the Master.dbo.Syscolumns table was 15645, so I couldn't figure out why there was a need for the CROSS JOIN. Now it makes sense.

    Ed, sorry, how was this a duplicate post? Had someone else already asked this question?

    Ed's a good man and he knows a whole lot.  He was probably in a bit of a hurry, didn't read my post, and posted a nearly identical post to mine and then realized all that and deleted his.

    Thanks, Jeff.  No, I didn't see your post before I created mine, even though you posted it last night. :blush:

    JetBoy2K, I described the technique of using the CROSS JOIN and how it used the presence of rows to create a list of numbers.  I also explained what Jeff explained, but didn't include the math like he did.  After posting it, I saw Jeff's and deleted mine.

    Sorry for the confusion.  I wasn't awake enough yet.

    Hmmm..... Now I'm very curious as to just how short of a SQL statement could be written to populate the table without referencing any actual tables. Hmmm...... The game is afoot.

    Got it, thanks. And that is thanks to both of you - your assistance, and your patience with a newbie, is greatly appreciated.

  • david.holley - Tuesday, April 11, 2017 11:27 AM

    Ed Wagner - Tuesday, April 11, 2017 9:24 AM

    Jeff Moden - Tuesday, April 11, 2017 9:12 AM

    jetboy2k - Tuesday, April 11, 2017 7:58 AM

    Jeff, thanks for the explanation regarding why the query to generate the Tally table used a CROSS JOIN on two instances of the Syscolumns table. In my SQL instance, the COUNT(*) for the Master.dbo.Syscolumns table was 15645, so I couldn't figure out why there was a need for the CROSS JOIN. Now it makes sense.

    Ed, sorry, how was this a duplicate post? Had someone else already asked this question?

    Ed's a good man and he knows a whole lot.  He was probably in a bit of a hurry, didn't read my post, and posted a nearly identical post to mine and then realized all that and deleted his.

    Thanks, Jeff.  No, I didn't see your post before I created mine, even though you posted it last night. :blush:

    JetBoy2K, I described the technique of using the CROSS JOIN and how it used the presence of rows to create a list of numbers.  I also explained what Jeff explained, but didn't include the math like he did.  After posting it, I saw Jeff's and deleted mine.

    Sorry for the confusion.  I wasn't awake enough yet.

    Hmmm..... Now I'm very curious as to just how short of a SQL statement could be written to populate the table without referencing any actual tables. Hmmm...... The game is afoot.

    My gut feeling is that the shortest (mind you, not fastest - then again, it's a one-time task only) would be a simple recursive CTE. Untested and unchecked for syntax, something like "WITH c AS (SELECT 1 AS n UNION ALL SELECT n + 1 FROM c WHERE n < 100000) OPTION (MAXRECURSION 0);"


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Hugo Kornelis - Tuesday, April 11, 2017 2:59 PM

    david.holley - Tuesday, April 11, 2017 11:27 AM

    Ed Wagner - Tuesday, April 11, 2017 9:24 AM

    Jeff Moden - Tuesday, April 11, 2017 9:12 AM

    jetboy2k - Tuesday, April 11, 2017 7:58 AM

    Jeff, thanks for the explanation regarding why the query to generate the Tally table used a CROSS JOIN on two instances of the Syscolumns table. In my SQL instance, the COUNT(*) for the Master.dbo.Syscolumns table was 15645, so I couldn't figure out why there was a need for the CROSS JOIN. Now it makes sense.

    Ed, sorry, how was this a duplicate post? Had someone else already asked this question?

    Ed's a good man and he knows a whole lot.  He was probably in a bit of a hurry, didn't read my post, and posted a nearly identical post to mine and then realized all that and deleted his.

    Thanks, Jeff.  No, I didn't see your post before I created mine, even though you posted it last night. :blush:

    JetBoy2K, I described the technique of using the CROSS JOIN and how it used the presence of rows to create a list of numbers.  I also explained what Jeff explained, but didn't include the math like he did.  After posting it, I saw Jeff's and deleted mine.

    Sorry for the confusion.  I wasn't awake enough yet.

    Hmmm..... Now I'm very curious as to just how short of a SQL statement could be written to populate the table without referencing any actual tables. Hmmm...... The game is afoot.

    My gut feeling is that the shortest (mind you, not fastest - then again, it's a one-time task only) would be a simple recursive CTE. Untested and unchecked for syntax, something like "WITH c AS (SELECT 1 AS n UNION ALL SELECT n + 1 FROM c WHERE n < 100000) OPTION (MAXRECURSION 0);"

    Even shorter:
    WITH c AS(SELECT 1 AS n UNION ALL SELECT n+1 FROM c WHERE n<100000) SELECT * FROM c OPTION(MAXRECURSION 0);

    The second shortest but most efficient way...
    WITH E AS(SELECT v FROM(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1))t(v))
    SELECT N=ROW_NUMBER() OVER (ORDER BY (SELECT 1))
    FROM E a,E b,E c,E d,E f

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Hugo Kornelis - Tuesday, April 11, 2017 2:59 PM

    david.holley - Tuesday, April 11, 2017 11:27 AM

    Ed Wagner - Tuesday, April 11, 2017 9:24 AM

    Jeff Moden - Tuesday, April 11, 2017 9:12 AM

    jetboy2k - Tuesday, April 11, 2017 7:58 AM

    Jeff, thanks for the explanation regarding why the query to generate the Tally table used a CROSS JOIN on two instances of the Syscolumns table. In my SQL instance, the COUNT(*) for the Master.dbo.Syscolumns table was 15645, so I couldn't figure out why there was a need for the CROSS JOIN. Now it makes sense.

    Ed, sorry, how was this a duplicate post? Had someone else already asked this question?

    Ed's a good man and he knows a whole lot.  He was probably in a bit of a hurry, didn't read my post, and posted a nearly identical post to mine and then realized all that and deleted his.

    Thanks, Jeff.  No, I didn't see your post before I created mine, even though you posted it last night. :blush:

    JetBoy2K, I described the technique of using the CROSS JOIN and how it used the presence of rows to create a list of numbers.  I also explained what Jeff explained, but didn't include the math like he did.  After posting it, I saw Jeff's and deleted mine.

    Sorry for the confusion.  I wasn't awake enough yet.

    Hmmm..... Now I'm very curious as to just how short of a SQL statement could be written to populate the table without referencing any actual tables. Hmmm...... The game is afoot.

    My gut feeling is that the shortest (mind you, not fastest - then again, it's a one-time task only) would be a simple recursive CTE. Untested and unchecked for syntax, something like "WITH c AS (SELECT 1 AS n UNION ALL SELECT n + 1 FROM c WHERE n < 100000) OPTION (MAXRECURSION 0);"

    Agreed. That "shortest" method also takes the longest, uses a large amount of resources, and can be beaten by a well written WHILE loop.  For those interested, please see the following article for why you shouldn't use recursive CTEs (rCTEs) to count even for one-off tasks. 😉
    http://www.sqlservercentral.com/articles/T-SQL/74118/

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • david.holley - Tuesday, April 11, 2017 11:27 AM

    Ed Wagner - Tuesday, April 11, 2017 9:24 AM

    Jeff Moden - Tuesday, April 11, 2017 9:12 AM

    jetboy2k - Tuesday, April 11, 2017 7:58 AM

    Jeff, thanks for the explanation regarding why the query to generate the Tally table used a CROSS JOIN on two instances of the Syscolumns table. In my SQL instance, the COUNT(*) for the Master.dbo.Syscolumns table was 15645, so I couldn't figure out why there was a need for the CROSS JOIN. Now it makes sense.

    Ed, sorry, how was this a duplicate post? Had someone else already asked this question?

    Ed's a good man and he knows a whole lot.  He was probably in a bit of a hurry, didn't read my post, and posted a nearly identical post to mine and then realized all that and deleted his.

    Thanks, Jeff.  No, I didn't see your post before I created mine, even though you posted it last night. :blush:

    JetBoy2K, I described the technique of using the CROSS JOIN and how it used the presence of rows to create a list of numbers.  I also explained what Jeff explained, but didn't include the math like he did.  After posting it, I saw Jeff's and deleted mine.

    Sorry for the confusion.  I wasn't awake enough yet.

    Hmmm..... Now I'm very curious as to just how short of a SQL statement could be written to populate the table without referencing any actual tables. Hmmm...... The game is afoot.

    The answer is...

     SELECT N FROM fnTally(1,11000);

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Tuesday, April 11, 2017 8:42 PM

    The answer is...

     SELECT N FROM fnTally(1,11000);

    Jeff, from a performance perspective, would you prefer to use the dbo.fnTally() method or join to the dbo.Tally table?

    Looking at the execution plans I'm kind of in favour of the dbo.Tally table as:

    • In the best case using dbo.Tally is doing a Clustered Index Seek, in the worst a Clustered Index Scan.
    • Using dbo.fnTally() adds a bunch of Nested Loops (Inner Join) nodes and a Sequence Project (for the row_number())

    To me dbo.fnTally() seems to be burning CPU unnecessarily when B-Tree searches and traversals would be cheaper. Using dbo.Tally also gives you the ability to specify a range of values for N.

  • AnthonyR - Tuesday, April 11, 2017 11:44 PM

    Jeff Moden - Tuesday, April 11, 2017 8:42 PM

    The answer is...

     SELECT N FROM fnTally(1,11000);

    Jeff, from a performance perspective, would you prefer to use the dbo.fnTally() method or join to the dbo.Tally table?

    • In the best case using dbo.Tally is doing a Clustered Index Seek, in the worst a Clustered Index Scan.
    • Using dbo.fnTally() adds a bunch of Nested Loops (Inner Join) nodes and a Sequence Project (for the row_number())

    To me dbo.fnTally() seems to be burning CPU unnecessarily when B-Tree searches and traversals would be cheaper. Using dbo.Tally also gives you the ability to specify a range of values for N.

    The nested loops have no join predicate and add virtually no overhead.

    The only way to know which is faster for a given requirement is to test them both. I have found that, in most situations, dbo.tally is a wee-bit slower. The CTE-tally method  (used in fnTally) is readless whereas dbo.tally will generate a small number of reads. I always put a unique nonclustered index on dbo.tally which reduces the reads. 

    Another kind of tally table (provided you have SQL Server 2014+) is what I call an electric tally table: AKA an in-memory tally table. I have seen this outperform the CTE-tally method in a lot of cases. The downside is that in-memory objects will prevent parallelism in your execution plan. 

    I have all three at my disposal when writing code and will test them all.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • AnthonyR - Tuesday, April 11, 2017 11:44 PM

    Jeff Moden - Tuesday, April 11, 2017 8:42 PM

    The answer is...

     SELECT N FROM fnTally(1,11000);

    Jeff, from a performance perspective, would you prefer to use the dbo.fnTally() method or join to the dbo.Tally table?

    • In the best case using dbo.Tally is doing a Clustered Index Seek, in the worst a Clustered Index Scan.
    • Using dbo.fnTally() adds a bunch of Nested Loops (Inner Join) nodes and a Sequence Project (for the row_number())

    To me dbo.fnTally() seems to be burning CPU unnecessarily when B-Tree searches and traversals would be cheaper. Using dbo.Tally also gives you the ability to specify a range of values for N.

    "It Depends".  The advantage of using a function based on Itzek Ben-Gans wonderful cascading CTE method is that it produces no reads.  It IS, however, very slightly slower than reading from a physical Tally table that has been cached in memory.  Since a lot of people look for excessive logical reads (including myself) to find poorly written code that causes performance problems, I tend to use the function rather than the Table.

    There are also certain DBAs in this world that won't let anyone add such a "helper" table as a Tally Table.  Those same DBAs sometimes also take grand exception to any kind of function either because they're a bit uninformed or don't have the time to review code and have found it simpler to forbid the use of any function rather that checking as to whether or not it's an iTVF or something slower.  That's when you have to embed the cascading CTE into the code.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Wednesday, April 12, 2017 8:47 AM

    AnthonyR - Tuesday, April 11, 2017 11:44 PM

    Jeff Moden - Tuesday, April 11, 2017 8:42 PM

    The answer is...

     SELECT N FROM fnTally(1,11000);

    Jeff, from a performance perspective, would you prefer to use the dbo.fnTally() method or join to the dbo.Tally table?

    • In the best case using dbo.Tally is doing a Clustered Index Seek, in the worst a Clustered Index Scan.
    • Using dbo.fnTally() adds a bunch of Nested Loops (Inner Join) nodes and a Sequence Project (for the row_number())

    To me dbo.fnTally() seems to be burning CPU unnecessarily when B-Tree searches and traversals would be cheaper. Using dbo.Tally also gives you the ability to specify a range of values for N.

    "It Depends".  The advantage of using a function based on Itzek Ben-Gans wonderful cascading CTE method is that it produces no reads.  It IS, however, very slightly slower than reading from a physical Tally table that has been cached in memory.  Since a lot of people look for excessive logical reads (including myself) to find poorly written code that causes performance problems, I tend to use the function rather than the Table.

    There are also certain DBAs in this world that won't let anyone add such a "helper" table as a Tally Table.  Those same DBAs sometimes also take grand exception to any kind of function either because they're a bit uninformed or don't have the time to review code and have found it simpler to forbid the use of any function rather that checking as to whether or not it's an iTVF or something slower.  That's when you have to embed the cascading CTE into the code.

    Chipping in, it is also a matter of code portability as using a CTE in the code does require neither the function nor the numbers table being present. There is a very little performance difference between embedding a tally CTE in the code and the other methods.
    😎

  • Eirikur Eiriksson - Wednesday, April 12, 2017 9:35 AM

    Chipping in, it is also a matter of code portability as using a CTE in the code does require neither the function nor the numbers table being present. There is a very little performance difference between embedding a tally CTE in the code and the other methods.
    😎

    Agreed.  Of course, you know the following but I have to say it out loud...  I just hate duplicating such black box utility code especially since improvements have been made over time.  A good example is that Itzik Ben-Gan did come out with an improvement to his code by the inclusion of TOP not too long after he came out with his first rendition.  If it's in an iTVF, you only have to fix it in one place.  If it's "portable" because the basic code was embedded somewhere, then you have to find it all and fix it.  If MS ever sees the light and comes out with a high performance built in version, it would be really nice to only have to make the change in one place.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Microsoft should simply add a system tally table to SQL Server.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell - Wednesday, April 12, 2017 11:30 AM

    Microsoft should simply add a system tally table to SQL Server.

    And of course with data compression (page) as a default
    😎

  • Jeff Moden - Wednesday, April 12, 2017 11:18 AM

    Eirikur Eiriksson - Wednesday, April 12, 2017 9:35 AM

    Chipping in, it is also a matter of code portability as using a CTE in the code does require neither the function nor the numbers table being present. There is a very little performance difference between embedding a tally CTE in the code and the other methods.
    😎

    Agreed.  Of course, you know the following but I have to say it out loud...  I just hate duplicating such black box utility code especially since improvements have been made over time.  A good example is that Itzik Ben-Gan did come out with an improvement to his code by the inclusion of TOP not too long after he came out with his first rendition.  If it's in an iTVF, you only have to fix it in one place.  If it's "portable" because the basic code was embedded somewhere, then you have to find it all and fix it.  If MS ever sees the light and comes out with a high performance built in version, it would be really nice to only have to make the change in one place.

    Need to wrap up the tally table CTE article I've been working on (for too long)
    😎

  • Eirikur Eiriksson - Wednesday, April 12, 2017 11:36 AM

    Eric M Russell - Wednesday, April 12, 2017 11:30 AM

    Microsoft should simply add a system tally table to SQL Server.

    And of course with data compression (page) as a default
    😎

    You guys promote you own proprietary tally table solution like it's the best thing since sliced bread. It's just a table stubbed with monotonically increasing numbers.
    Try inventing something like this:
    https://www.diet-blog.com/07/diet_water.php  :rolleyes:

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

Viewing 15 posts - 436 through 450 (of 511 total)

You must be logged in to reply to this topic. Login to reply