The Dynamic Tally or Numbers Table

Question

Post reply

The Dynamic Tally or Numbers Table

Viewing 15 posts - 16 through 30 (of 159 total)

You must be logged in to reply to this topic. Login to reply

Mohit Nayyar Ten Centuries Points: 1008 More actions · Answer 1

Great Article Lynn

I also wrote something similar long time back on my blog....

http://mohitnayyar.blogspot.com/2008/09/data-extrapolation-on-fly.html

Thanks

Mohit

Thanks
Mohit Nayyar
http://mohitnayyar.blogspot.com/
"If I am destined to fail, then I do have a purpose in my life, To fail my destiny"

peter-757102 SSCertifiable Points: 6877 More actions · Answer 2

I wonder why a SQL magazine published code that used a multiline table valued function given their reputation for scaling bad. It will not help to educate readers on how to make the most of their platform. Lynn Pettis version is for that reason alone a good contribution, now have it published too 😉

After copying I cleaned up the code on my system as the article has many keywords merged, like “innerjoin” instead of “inner join” and “casewhen” instead of “case when”. I understand all that is happening, except for one thing…the ordered part of the output and more especially…why.

I understand how the numbers come to be, but am uncertain what good the order is in an in-line function. To my knowledge no order assumption can be made by the consuming code, unless something is at work here I don’t know or realize yet.

As for performance I would have to run many tests, especially how it interacts in more complex operations that usually involve a traditional tally table for speedup.

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 3

peter-757102 (9/22/2009)
I wonder why a SQL magazine published code that used a multiline table valued function given their reputation for scaling bad. It will not help to educate readers on how to make the most of their platform. Lynn Pettis version is for that reason alone a good contribution, now have it published too 😉
After copying I cleaned up the code on my system as the article has many keywords merged, like “innerjoin” instead of “inner join” and “casewhen” instead of “case when”. I understand all that is happening, except for one thing…the ordered part of the output and more especially…why.
I understand how the numbers come to be, but am uncertain what good the order is in an in-line function. To my knowledge no order assumption can be made by the consuming code, unless something is at work here I don’t know or realize yet.
As for performance I would have to run many tests, especially how it interacts in more complex operations that usually involve a traditional tally table for speedup.

I welcome the testing you will do and would be interested in the results. As to the ordered part of the output, I think it has to do with the following code fragment:

row_number() over (order by a1.N)

Something I have noticed when working with the windowing functions in queries using them, if the query itself does not have an ORDER BY, the ORDER BY in the windowing function orders the output.

I'd be interested if anyone else has seen this behaviour as well.

MrAkki SSCrazy Points: 2574 More actions · Answer 4

Nice article. 🙂

Another way would be:

DECLARE @Start BIGINT, @Step BIGINT, @max-2 BIGINT

SELECT @Start = 1, @Step = 10 , @max-2 = 10000000

SELECT (Number * @Step + @Start)

FROM (

SELECT TOP (@Max/@Step)

ROW_NUMBER() OVER (ORDER BY c.[object_id]) - 1 AS Number

FROM sys.columns AS c WITH (NOLOCK)

CROSS JOIN sys.columns AS c2 WITH (NOLOCK)

CROSS JOIN sys.columns AS c3 WITH (NOLOCK)

CROSS JOIN sys.columns AS c4 WITH (NOLOCK)

CROSS JOIN sys.columns AS c5 WITH (NOLOCK)

) a

WHERE (Number * @Step + @Start) <= @max-2

It comes close to the ufn_Tally2 Function just a little more reads...

Regards,

Thorsten

peter-757102 SSCertifiable Points: 6877 More actions · Answer 5

We really need a mode of SQL server where al output order is randomized if no ordering is in effect (explicit or otherwise). This would take away the guessing out of some SQL constructs. So if it can go wrong, it will go wrong and right on your first test run.

Personally I would never rely on implicit ordering, we seen how that effect can change as the LOP changes due to available cores (remember the partitioned version of running totals?).

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 6

MrAkki (9/22/2009)
Nice article. 🙂
Another way would be:
DECLARE @Start BIGINT, @Step BIGINT, @max-2 BIGINT
SELECT @Start = 1, @Step = 10 , @max-2 = 10000000
SELECT (Number * @Step + @Start)
FROM (
SELECT TOP (@Max/@Step)
ROW_NUMBER() OVER (ORDER BY c.[object_id]) - 1 AS Number
FROM sys.columns AS c WITH (NOLOCK)
CROSS JOIN sys.columns AS c2 WITH (NOLOCK)
CROSS JOIN sys.columns AS c3 WITH (NOLOCK)
CROSS JOIN sys.columns AS c4 WITH (NOLOCK)
CROSS JOIN sys.columns AS c5 WITH (NOLOCK)
) a
WHERE (Number * @Step + @Start) <= @max-2
It comes close to the ufn_Tally2 Function just a little more reads...
Regards,
Thorsten

~~Yes, it does.~~ The biggest difference, however, is dbo.ufn_Tally2 isn't dependent on any system tables. Here is a what if; what if you aren't allowed to access system tables in your code for something like this?

Edit: Actually, it turns out it doesn't.

Jeff Moden SSC Guru Points: 1004224 More actions · Answer 7

MrAkki (9/22/2009)
Nice article. 🙂
Another way would be:
DECLARE @Start BIGINT, @Step BIGINT, @max-2 BIGINT
SELECT @Start = 1, @Step = 10 , @max-2 = 10000000
SELECT (Number * @Step + @Start)
FROM (
SELECT TOP (@Max/@Step)
ROW_NUMBER() OVER (ORDER BY c.[object_id]) - 1 AS Number
FROM sys.columns AS c WITH (NOLOCK)
CROSS JOIN sys.columns AS c2 WITH (NOLOCK)
CROSS JOIN sys.columns AS c3 WITH (NOLOCK)
CROSS JOIN sys.columns AS c4 WITH (NOLOCK)
CROSS JOIN sys.columns AS c5 WITH (NOLOCK)
) a
WHERE (Number * @Step + @Start) <= @max-2
It comes close to the ufn_Tally2 Function just a little more reads...
Regards,
Thorsten

Actually, it doesn't come close. To generate 1 to 10,000,000 numbers, Lynn's method pretty much blows the doors off that type of cross join (ie: more than a pair of tables cross joined) when inserting into a temp table...

Lynn's Method Akki's Method Akki/Lynn Ratio

------------- ------------- ---------------

Duration 6.880 14.520 111.0%

CPU 5.172 12.570 143.0%

Reads 10,676 129,596 1,113.9% YOWCH!!

Writes 21,005 18,911 -10.0%

Keep in mind that the display is the "great equalizer"... you cannot judge real performance when returning result sets to the screen.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004224 More actions · Answer 8

peter-757102 (9/22/2009)
We really need a mode of SQL server where al output order is randomized if no ordering is in effect (explicit or otherwise). This would take away the guessing out of some SQL constructs. So if it can go wrong, it will go wrong and right on your first test run.
Personally I would never rely on implicit ordering, we seen how that effect can change as the LOP changes due to available cores (remember the partitioned version of running totals?).

Heh... add OPTION(MAXDOP 1) so you don't have to worry about the available cores. 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

peter-757102 SSCertifiable Points: 6877 More actions · Answer 9

Jeff Moden (9/22/2009)
peter-757102 (9/22/2009)
We really need a mode of SQL server where al output order is randomized if no ordering is in effect (explicit or otherwise). This would take away the guessing out of some SQL constructs. So if it can go wrong, it will go wrong and right on your first test run.
Personally I would never rely on implicit ordering, we seen how that effect can change as the LOP changes due to available cores (remember the partitioned version of running totals?).
Heh... add OPTION(MAXDOP 1) and so you don't have to worry about the available cores. 😉

I know Jeff, I read the article well, and there was even more to it, like forcing clustered index usage. But here we are talking functions that are to be incorporated in more complex constructs. Just turning off parallelism can be bad for performance. And the user of the function might not be aware of it.

It always pays to be cautious and not to raise expectations that are not necessarily true.

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 10

Jeff Moden (9/22/2009)
MrAkki (9/22/2009)
Nice article. 🙂
Another way would be:
DECLARE @Start BIGINT, @Step BIGINT, @max-2 BIGINT
SELECT @Start = 1, @Step = 10 , @max-2 = 10000000
SELECT (Number * @Step + @Start)
FROM (
SELECT TOP (@Max/@Step)
ROW_NUMBER() OVER (ORDER BY c.[object_id]) - 1 AS Number
FROM sys.columns AS c WITH (NOLOCK)
CROSS JOIN sys.columns AS c2 WITH (NOLOCK)
CROSS JOIN sys.columns AS c3 WITH (NOLOCK)
CROSS JOIN sys.columns AS c4 WITH (NOLOCK)
CROSS JOIN sys.columns AS c5 WITH (NOLOCK)
) a
WHERE (Number * @Step + @Start) <= @max-2
It comes close to the ufn_Tally2 Function just a little more reads...
Regards,
Thorsten
Actually, it doesn't come close. To generate 1 to 10,000,000 numbers, Lynn's method pretty much blows the doors off that type of cross join (ie: more than a pair of tables cross joined) when inserting into a temp table...
Lynn's Method Akki's Method Akki/Lynn Ratio
------------- ------------- ---------------
Duration 6.880 14.520 111.0%
CPU 5.172 12.570 143.0%
Reads 10,676 129,596 1,113.9% YOWCH!!
Writes 21,005 18,911 -10.0%
Keep in mind that the display is the "great equalizer"... you cannot judge real performance when returning result sets to the screen.

Awesome. Thanks for the comparision between methods.

Jeff Moden SSC Guru Points: 1004224 More actions · Answer 11

peter-757102 (9/22/2009)
Jeff Moden (9/22/2009)
peter-757102 (9/22/2009)
We really need a mode of SQL server where al output order is randomized if no ordering is in effect (explicit or otherwise). This would take away the guessing out of some SQL constructs. So if it can go wrong, it will go wrong and right on your first test run.
Personally I would never rely on implicit ordering, we seen how that effect can change as the LOP changes due to available cores (remember the partitioned version of running totals?).
Heh... add OPTION(MAXDOP 1) and so you don't have to worry about the available cores. 😉
I know Jeff, I read the article well, but here we are talking functions that are to be incorporated in more complex constructs. Just turning off parallelism can be bad for performance. And the user of the function might not be aware of it.
It pays to be cautious and not to raise expectations that are not necessarily true.

I agree. You do have to be careful especially if order is important. If a DBA refuses to allow a permanent Tally table, I'll try to slip a temporary one in where I can use an ORDER BY on the Tally table to ensure the correct processing order. The ORDER BY costs nothing because if the clustered index is on "N" because a SORT won't even show up in the execution plan, but you have the comfort of knowing that the order is guaranteed because of the ORDER BY.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004224 More actions · Answer 12

Lynn Pettis (9/22/2009)
Jeff Moden (9/22/2009)
MrAkki (9/22/2009)
Nice article. 🙂
Another way would be:
DECLARE @Start BIGINT, @Step BIGINT, @max-2 BIGINT
SELECT @Start = 1, @Step = 10 , @max-2 = 10000000
SELECT (Number * @Step + @Start)
FROM (
SELECT TOP (@Max/@Step)
ROW_NUMBER() OVER (ORDER BY c.[object_id]) - 1 AS Number
FROM sys.columns AS c WITH (NOLOCK)
CROSS JOIN sys.columns AS c2 WITH (NOLOCK)
CROSS JOIN sys.columns AS c3 WITH (NOLOCK)
CROSS JOIN sys.columns AS c4 WITH (NOLOCK)
CROSS JOIN sys.columns AS c5 WITH (NOLOCK)
) a
WHERE (Number * @Step + @Start) <= @max-2
It comes close to the ufn_Tally2 Function just a little more reads...
Regards,
Thorsten
Actually, it doesn't come close. To generate 1 to 10,000,000 numbers, Lynn's method pretty much blows the doors off that type of cross join (ie: more than a pair of tables cross joined) when inserting into a temp table...
Lynn's Method Akki's Method Akki/Lynn Ratio
------------- ------------- ---------------
Duration 6.880 14.520 111.0%
CPU 5.172 12.570 143.0%
Reads 10,676 129,596 1,113.9% YOWCH!!
Writes 21,005 18,911 -10.0%
Keep in mind that the display is the "great equalizer"... you cannot judge real performance when returning result sets to the screen.
Awesome. Thanks for the comparision between methods.

Let me add a caveat to that comparison. Akki didn't refer to the MASTER database where a goodly row count will be in the sysColumns table. If the database has realtively few column counts (I did the test from TempDB where there are relatively few), then the multiple cross joins may get up to 3 or 4 that are actually activated before the desired rowcount is realized. Had Akki referred the cross joins to the Master database, sysColumns will always have at least 4K rows in 2k and 11k rows in 2k5. Both are sufficient to satisfy an X² count up to 10 million where only one cross join would be activated and his method would, in fact, be a little faster. It would only take 4 seconds instead of 6 to create the 10,000,000 rows. But, move above 16M in 2k or 121M in 2k5 and performance on Akii's method will tank again.

The bottom line is as we always say... "It Depends". 😀

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 13

Jeff Moden (9/22/2009)
Lynn Pettis (9/22/2009)
Jeff Moden (9/22/2009)
MrAkki (9/22/2009)
Nice article. 🙂
Another way would be:
DECLARE @Start BIGINT, @Step BIGINT, @max-2 BIGINT
SELECT @Start = 1, @Step = 10 , @max-2 = 10000000
SELECT (Number * @Step + @Start)
FROM (
SELECT TOP (@Max/@Step)
ROW_NUMBER() OVER (ORDER BY c.[object_id]) - 1 AS Number
FROM sys.columns AS c WITH (NOLOCK)
CROSS JOIN sys.columns AS c2 WITH (NOLOCK)
CROSS JOIN sys.columns AS c3 WITH (NOLOCK)
CROSS JOIN sys.columns AS c4 WITH (NOLOCK)
CROSS JOIN sys.columns AS c5 WITH (NOLOCK)
) a
WHERE (Number * @Step + @Start) <= @max-2
It comes close to the ufn_Tally2 Function just a little more reads...
Regards,
Thorsten
Actually, it doesn't come close. To generate 1 to 10,000,000 numbers, Lynn's method pretty much blows the doors off that type of cross join (ie: more than a pair of tables cross joined) when inserting into a temp table...
Lynn's Method Akki's Method Akki/Lynn Ratio
------------- ------------- ---------------
Duration 6.880 14.520 111.0%
CPU 5.172 12.570 143.0%
Reads 10,676 129,596 1,113.9% YOWCH!!
Writes 21,005 18,911 -10.0%
Keep in mind that the display is the "great equalizer"... you cannot judge real performance when returning result sets to the screen.
Awesome. Thanks for the comparision between methods.
Let me add a caveat to that comparison. Akki didn't refer to the MASTER database where a goodly row count will be in the sysColumns table. If the database has realtively few column counts (I did the test from TempDB where there are relatively few), then the multiple cross joins may get up to 3 or 4 that are actually activated before the desired rowcount is realized. Had Akki referred the cross joins to the Master database, sysColumns will always have at least 4K rows in 2k and 11k rows in 2k5. Both are sufficient to satisfy an X² count up to 10 million where only one cross join would be activated and his method would, in fact, be a little faster. It would only take 4 seconds instead of 6 to create the 10,000,000 rows. But, move above 16M in 2k or 121M in 2k5 and performance on Akii's method will tank again.
The bottom line is as we always say... "It Depends". 😀

Also remember that syscolumns is depreciated. It is there for backward compatibility.

Jeff Moden SSC Guru Points: 1004224 More actions · Answer 14

Excellent article, Lynn. Very well laid out and nicely explained. It's too bad that the publishing process ate some of the spaces in the code or it would have been perfect.

There is one optimization you can make that knocks about a third of the run time off your good code. You have the following code to determine the number of values to return...

L3 (

N

) as (

select top ((abs(casewhen @pStartValue < @pEndValue

then @pEndValue

else @pStartValue

end -

case when @pStartValue < @pEndValue

then @pStartValue

else @pEndValue

end))/abs(@pIncrement)+ 1)

a1.N

from

L2 a1

crossjoin L2 a2

That can be replaced with the following:

L3 (N) AS (SELECT TOP (ABS(@pEndValue - @pStartValue) / ABS(@pIncrement)+ 1) --Limit N per parameters

a1.N FROM L2 a1 CROSS JOIN L2 a2), --up to 100,000,000

That knocks the duration from 6.880 down to 4.664 and the CPU usage from 5.172 down to 3.782. That optimization makes your good code run just as fast as an optimized cross-join of SysColumns or All_Columns but your code has the extra capacity of being programmable.

I also see that folks are complaining about the complexity of the code. I think a lot of that comes simply from the way you have it formatted. Yep... I agee... to each their own when it comes to formatting. But, I think you may have gotten fewer complaints about complexity if you formatted it something like this (includes the optimization I spoke of)...

CREATE FUNCTION dbo.ufn_Tally2

(

@pStartValue BIGINT = 1,

@pEndValue BIGINT = 1000000,

@pIncrement BIGINT = 1

)

RETURNS TABLE

AS

RETURN (

WITH

BaseNum (N) AS (

SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

SELECT 1

), --10

L1 (N) AS (SELECT bn1.N FROM BaseNum bn1 CROSS JOIN BaseNum bn2), --100

L2 (N) AS (SELECT a1.N FROM L1 a1 CROSS JOIN L1 a2), --10,000

L3 (N) AS (SELECT TOP (ABS(@pEndValue - @pStartValue) / ABS(@pIncrement)+ 1) --Limit N per parameters

a1.N FROM L2 a1 CROSS JOIN L2 a2), --up to 100,000,000

Tally (N) AS (SELECT ROW_NUMBER()OVER (ORDER BY a1.N) FROM L3 a1) --Add row numbers

SELECT ((N - 1) * @pIncrement) + @pStartValue AS N FROM Tally --Modify N per parameters

);

So far as any complaints of complexity go, though, my answer would be "What, you type this stuff everytime you use it?" 😛

I will admit, however, that I have had to work on sites where they do everything possible, short of a cavity search, to ensure that you're not carrying any type of portable media. That means nothing can go in or come out... period. That's when I will resort to the sysColumns method because I normally don't need more than 16 million rows and its very easy to remember.

Just a couple of more things and then I'll get off the soap box. This is for other folks that may read this....

The reason why the method Lynn wrote about is so very important is because of some things that most folks don't even know about. Lynn's method will generate a Billion rows (if you ever need it) without any expansion of the LDF files whatsoever... the multiple cross join of sysColumns to do the same thing will blow out an LDF file by adding more than 40 Gig to it. (Heh... now just take a guess on how I know that 😉 ) They'll both take about the same time, but which would you rather use?

Of course, there's aways the "I can't add to the schema" problem that many folks have to suffer through that this type of code easily resolves. Adding anything to certain servers can void maintenance agreements, etc, etc.

Others may also be of the ilk that Itzik already wrote about this particular method... why do we need yet another article about it? The answer is that Lynn did a pot wad of testing and I've verified his testing. Having 10 values in the first CTE is a "sweet spot". Going either above or below that (Itzick used 2 initial values) decreases performance. For this method, 10 values in the first CTE seems to be the magic number for the max in performance.

Lynn... I'll say it again... nice job.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004224 More actions · Answer 15

Lynn Pettis (9/22/2009)
Also remember that syscolumns is depreciated. It is there for backward compatibility.

Agreed. For existing 2k and 2k5 apps that aren't likely to migrate, that's probably ok (heh... ok... update everything). For new 2k5 and 2k8 apps, folks should use Master.sys.All_Columns or something else that will be around for a rev or two.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)