The Dynamic Tally or Numbers Table

  • the sqlist (9/22/2009)


    Jeff Moden (9/22/2009)

    I'm using SQL SERVER Profiler with the data produced by the queries routed to a temp table. Routing the output to the screen gives false readings because it takes so long to display the data that it looks like all methods are the same. That's why I called outputting to the screen the "great equalizer".

    I guess you're right, here is a much simpler way to test performance though:

    set statistics time on

    select number into #tmp from [dbo].[ufn_Tally_zb] (1,100000,1) order by number

    drop table #tmp

    Of course, you have to execute both functions using the example template above in different windows and not at the same time. Mine it is much slower indeed becasue creates from start all 1Meg of rows and filters them later. For big numbers though the difference drops drastically.

    However, in order to make the function run on SQL 2000 and earlier you have to use my approach. Actually performance wasn't really my goal here and I asume I could tweak it a little bit. Not today though. 😛

    Heh.... now I'm not sure how you are testing. Like I said before... Lynn's runs in under 6 seconds for a 10M gen... yours (after adding 1 extra cross join to get there) took 65 seconds.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Lynn Pettis (9/22/2009)


    row_number() over (order by a1.N)

    Something I have noticed when working with the windowing functions in queries using them, if the query itself does not have an ORDER BY, the ORDER BY in the windowing function orders the output.

    I'd be interested if anyone else has seen this behaviour as well.

    I have. And it makes sense. The engine has to sort the data to assign row numbers to it (or rank/dense rank), and changing it would actually add work.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Lynn Pettis (9/22/2009)


    First, leave it to you squeeze another 33% out of a routine, awesome. I'll have to incorporate that change into my code. I think Steve needs to add a disclaimer to the top and bottom of this articel:

    Heh... I just get lucky. And disclaimers are for bad articles... not this one.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (9/22/2009)

    Heh.... now I'm not sure how you are testing. Like I said before... Lynn's runs in under 6 seconds for a 10M gen... yours (after adding 1 extra cross join to get there) took 65 seconds.

    You're rigth, here are my results,

    test:

    set statistics time on

    select * into #tmp from [dbo].[ufn_Tally2] (1,10000000,1) --order by number

    drop table #tmp

    results:

    SQL Server Execution Times:

    CPU time = 4890 ms, elapsed time = 4933 ms.

    (10000000 row(s) affected)

    under 5 secs.

    Don't just give the hungry man a fish, teach him how to catch it as well.

    the sqlist

  • Jeff Moden (9/22/2009)


    timothyawiseman (9/22/2009)


    Great article Lynn. It seems that even if you aren't permitted to change the schema that if you are going to use it multiple times you might get better results by putting it into a temp table instead of generating it dynamically over and over.

    Still, if it must be dynamically generated, this is a great technique.

    True enough. But I've found that the same sites that don't allow a permanent Tally table also have an "angry" DBA the will frequently not allow any TempDB usage nor any access to things like sysColumns in any code... that's where dynamic generation can really save the day especially if it's nasty fast like this one is.

    Heh... here's to "angry" DBA's... I get to charge for the time I take to work around their restrictions. 😛

    And most of them won't realize that "you can't do anything in tempdb" really means "you can't do anything at all in the whole database except for simple CRUD, and don't you dare use Order By in any of your selects". SQL Server has a real tendency to do a lot of work in tempdb behind the scenes, even if you don't explicitly tell it to.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • BWAA-HAA!!! Correct... guess it really matters how "angry" the DBA really is. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (9/22/2009)


    Lynn Pettis (9/22/2009)


    First, leave it to you squeeze another 33% out of a routine, awesome. I'll have to incorporate that change into my code. I think Steve needs to add a disclaimer to the top and bottom of this articel:

    Heh... I just get lucky. And disclaimers are for bad articles... not this one.

    I don't know, Jeff, even good articles have great discussions attached just like this one. You call it luck squeezing more performace, I call applied mathematics. Had I taken the time to look at the math, I would have seen what you did as well. I was so busy looking at the routine itself and comparing it to that other one, I didn't even consider tuning mine more.

  • Jeff beat me to the punch on limiting the total number of rows processed as an added speed boost.

    Lynn, try this with the function:

    select *

    from dbo.ufn_tally2(1,-1,1);

    You can correct for that one by changing the final select:

    select

    ((N - 1) * case when @pEndValue 0 then @pIncrement * -1 else @pIncrement end) + @pStartValue as N

    from

    Tally

    The added case statement handles having a non-negative increment when the top number is less than the bottom number. My assumption on this one is that the increment should have been negative. Another valid assumption would be that it should just return the starting number, since a single increment exceeds the value of the end number. That could also be coded, but would require a case statement in the Top value calculation instead of the final select.

    Also:

    select *

    from dbo.ufn_tally2(1,20,0);

    If the increment is 0, you'll get a div-0 error. If you want to avoid that, I assume you just want the min number and nothing else (increment 0 = add 0 as many times as you want, and you still have the base number). You can correct for that by modifying this part:

    select top (isnull((abs(case when @pStartValue < @pEndValue

    then @pEndValue

    else @pStartValue

    end -

    case when @pStartValue < @pEndValue

    then @pStartValue

    else @pEndValue

    end))/abs(nullif(@pIncrement, 0))+ 1, 1))

    I added a NullIf around the increment value, and an IsNull around the whole thing, with a 1 as the alternate return value. That gives you just the base number, nothing else.

    Not that anyone is likely to deliberately feed the function nonsense parameters like these, but in a dynamic environment, you never know what you're going to end up with if you use any sort of math to generate the parameters, or if you use user-entered values. If any part of the code calling the function ends up with a floating point calculation that rounds in a funky way, these minor changes will help keep it from crashing.

    I haven't tested the performance on these changes. They probably add to the cost. (Edit: Of course, the competing function also doesn't correct for these errors, and adding correction to it would add to its cost as well.)

    Other than those couple of suggestions, it looks like reasonably good code for what it's meant to do.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Lynn Pettis (9/22/2009)


    Jeff Moden (9/22/2009)


    Lynn Pettis (9/22/2009)


    First, leave it to you squeeze another 33% out of a routine, awesome. I'll have to incorporate that change into my code. I think Steve needs to add a disclaimer to the top and bottom of this articel:

    Heh... I just get lucky. And disclaimers are for bad articles... not this one.

    I don't know, Jeff, even good articles have great discussions attached just like this one. You call it luck squeezing more performace, I call applied mathematics. Had I taken the time to look at the math, I would have seen what you did as well. I was so busy looking at the routine itself and comparing it to that other one, I didn't even consider tuning mine more.

    I agree. Even the best articles benefit from the discussion. That's what makes this site better than a print magazine.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Hmmmm.... hadn't thought about that. A note about the discussions would probably be appropriate here.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • See, and now we are finding some very interesting errors in the code that I missed. Thanks, I do appreciate it. 🙂

  • I incorporated the changes that both Gus and Jeff suggested and using the simple SET STATISTICS TIME ON method of testing, the routine generates 1,000,000 rows in approximately 900ms.

    Here is the code as an alter function:

    USE [SandBox]

    GO

    /****** Object: UserDefinedFunction [dbo].[ufn_Tally2] Script Date: 09/22/2009 09:06:54 ******/

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    alter function [dbo].[ufn_Tally2](

    @pStartValue bigint = 1,

    @pEndValue bigint = 1000000,

    @pIncrement bigint = 1

    )

    returns table

    as

    return(

    with BaseNum (

    N

    ) as (

    select 1 union all

    select 1 union all

    select 1 union all

    select 1 union all

    select 1 union all

    select 1 union all

    select 1 union all

    select 1 union all

    select 1 union all

    select 1

    ),

    L1 (

    N

    ) as (

    select

    bn1.N

    from

    BaseNum bn1

    cross join BaseNum bn2

    cross join BaseNum bn3

    ),

    L2 (

    N

    ) as (

    select

    a1.N

    from

    L1 a1

    cross join L1 a2

    ),

    L3 (

    N

    ) as (

    select top (isnull(abs(@pEndValue - @pStartValue) / abs(nullif(@pIncrement,0))+ 1,1)) --Limit N per parameters

    a1.N

    from

    L2 a1

    cross join L2 a2

    ),

    Tally (

    N

    ) as (

    select

    row_number() over (order by a1.N)

    from

    L3 a1

    )

    select

    ((N - 1) * case when @pEndValue 0 then @pIncrement * -1 else @pIncrement end) + @pStartValue as N

    from

    Tally

    );

  • Just to keep from doing the apples'n'oranges thing, I ran the new code with both changes. The optimization came pretty close to being negated by the extra checking. It took 5.670 seconds to gen the 10M rows into a temp table... but now it's just about "Gumby Proof".

    As a side bar, no matter what the form has taken on this thread so far, they all still blow the doors off of any form of recursion. I'm really surprised that no one caught that on SQL Mag.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff,

    Test this one out:

    USE [SandBox]

    GO

    /****** Object: UserDefinedFunction [dbo].[ufn_Tally2] Script Date: 09/22/2009 09:06:54 ******/

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    alter function [dbo].[ufn_Tally2](

    @pStartValue bigint = 1,

    @pEndValue bigint = 1000000,

    @pIncrement bigint = 1

    )

    returns table

    as

    return(

    with BaseNum (

    N

    ) as (

    select 1 union all

    select 1 union all

    select 1 union all

    select 1 union all

    select 1 union all

    select 1 union all

    select 1 union all

    select 1 union all

    select 1 union all

    select 1

    ),

    L1 (

    N

    ) as (

    select

    bn1.N

    from

    BaseNum bn1

    cross join BaseNum bn2

    cross join BaseNum bn3

    ),

    L2 (

    N

    ) as (

    select

    a1.N

    from

    L1 a1

    cross join L1 a2

    ),

    L3 (

    N

    ) as (

    select top (isnull(abs(@pEndValue - @pStartValue) / abs(nullif(@pIncrement,0))+ 1,1)) --Limit N per parameters

    a1.N

    from

    L2 a1

    cross join L2 a2

    ),

    Tally (

    N

    ) as (

    select

    row_number() over (order by a1.N)

    from

    L3 a1

    )

    select

    ((N - 1) * (sign(@pEndValue - @pStartValue) * @pIncrement)) + @pStartValue as N

    from

    Tally

    );

  • I'll have to test it tomorrow. I'm on a different machine for the night.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 46 through 60 (of 159 total)

You must be logged in to reply to this topic. Login to reply