The Set-Based Limit

  • peter (12/23/2008)


    Jeff is quiet about my posts - too quiet....is a storm coming?

    LOL

    Nah... not ignoring you, Peter. Functions do normally contain some unneccessary overhead. Normally... there are exceptions. It was a very long time ago that we did some testing on this very forum and I can't even remember who the players were. I can't find the tests but... on multi-processor boxes using certain functions, I remember the functions actually winning out over the inline code. They were a bit of an oddity and I can't remember what the heck they did, but they did beat inline code. My silence was because I was trying to find those tests and haven't been able to put my finger on them.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Steve Jones - Editor (12/22/2008)


    Indeed, my guess is Jeff if handcrafting something right now in the winter snows of MI.

    Heh... just waiting for the specs on the 23 character password/pin. I also want a copy of the original code to test against.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ninja's_RGR'us (12/23/2008)


    The only problem I see is to be able to create a dynamic amount of keys without looping and without sacrificing performance (how do you handle making 10 keys and 10 000 000 keys from the same query with same linear performance). I have a couple theories, but no time to test... maybe next year.

    Actually, I suspect that the real challenge is to efficiently exclude any duplicates from the result set.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Jeff Moden (12/23/2008)[hrIt was a very long time ago that we did some testing on this very forum and I can't even remember who the players were. I can't find the tests but...

    Jeff: Do you remember the approximate time frame?

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • This would make a great article. We don't have enough of these comparing RBAR with set-based solutions.

  • RBarryYoung (12/23/2008)


    Jeff Moden (12/23/2008)[hrIt was a very long time ago that we did some testing on this very forum and I can't even remember who the players were. I can't find the tests but...

    Jeff: Do you remember the approximate time frame?

    Gosh... I wanna say about 4 years ago... it started off with someone saying that, except for a certain function, all functions were bad. What the heck was his name? I'll keep looking for it now and again...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Are we talking about UDF's or builtin functions? as far as I know, the only overhead that the builtin functions have is the CPU cost of their actual calculations.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • RBarryYoung (12/23/2008)


    Are we talking about UDF's or builtin functions? as far as I know, the only overhead that the builtin functions have is the CPU cost of their actual calculations.

    Oh, it was for UDF's alright. And, like I said, at one point I proved that a certain type of UDF in the presence of a multi-processor box was actually faster than inline code. Come to think of it... I may have actually saved the argument on my hard drive somewhere... lemme take a look.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Kit Brandner (12/22/2008)


    Try a 23-character long alphanumeric PIN code with certain characters excluded for legibility that can't be repeated in the database. So Jeff's solution wouldn't work. I suppose I should have just put that in the original message, but proprietary sounded better.

    One of the proposed set-based methods I was trying was having a table that contained all the valid characters, and a tally table. But this method didn't render desirable execution times, and often would hang in the case of generating large sets of PIN codes. I also tried working with CROSS JOINs, but that didn't pan out much further. Any other ideas?

    Consider this: the term "PIN code" doesn't necessarily apply to the bank machine, just like the term "algorithm" is a fancy word for a set of rules. And banks often allow for a PIN code longer than 4 digits.

    Ok, Kit... Merry Christmas... in return, I sure would like to see the script you have for doing this... I dug up a fairly recent bit of code I made in response to a script posting and was actually going to make an article about it... I can probably still do that, but here's the code...

    First, we need to be able to use the NEWID() function within a User Defined Function... since that's not allowed, we gotta get sneaky... we need this view to do it for us...

    CREATE VIEW dbo.RandomIntegerGenerator AS

    /**********************************************************************************************************************

    Purpose:

    Return a single random Integer for use in User Defined Functions. May return negative, 0, or positive whole numbers.

    Range of numbers is approximately +- 2 Billion

    -----------------------------------------------------------------------------------------------------------------------

    Revision History:

    Rev 00 - 13 Nov 2008 -- Jeff Moden

    Intial creation in response to the script titled "Generate a random password" posted at

    http://www.sqlservercentral.com/scripts/Random+number/64471/

    **********************************************************************************************************************/

    SELECT CHECKSUM(NEWID()) AS RandomInteger --CheckSum's return type is INT

    GO

    ... and here's the code for the function that will generate just about any kind of password you want. The only thing I have left to do is to force it to use at least 1 character from each requested character set, on demand. Do notice that things like a lower case "L" and uppercase "O" have been omitted. As usual, my explanations for what's happening in the code and how to use it, are in the code. Please read the header for all the different things you can do with this function...

    CREATE FUNCTION dbo.CreateRandomPassword

    /**********************************************************************************************************************

    Purpose:

    Generate random passwords of a given length using up to 3 types of characters.

    -----------------------------------------------------------------------------------------------------------------------

    Usage:

    SELECT dbo.CreateRandomPassword (@pCharacterSet, @pPasswordLength)

    -----------------------------------------------------------------------------------------------------------------------

    Parameters:

    ==== @pCharacterSet: (See "Programmer's Notes #2)

    When @pCharacterSet contains 'LOWER' anywhere, Then LOWER CASE ALPHA characters are included

    When @pCharacterSet contains 'UPPER' anywhere, Then UPPER CASE ALPHA characters are included

    When @pCharacterSet contains 'ALPHA' anywhere, Then both LOWER and UPPER CASE ALPHA characters are included

    When @pCharacterSet contains 'NUMERIC' anywhere, Then numeric digits 2 thru 9 are included

    When @pCharacterSet contains 'DIGITS' anywhere, Then numeric digits 0 thru 9 are included

    When @pCharacterSet contains 'SYMBOL' anywhere, Then special characters are included

    When @pCharacterSet contains 'ALL' anywhere, Then all character types except 'DIGITS' are included

    When @pCharacterSet IS NULL or BLANK, Then all character types are included

    ==== @pPasswordLength

    When @pPasswordLength < 1 Then will return 10 character password

    When @pPasswordLength > 8000 Then will return 8000 character password

    When @pPasswordLength IS NULL Then will return 10 character password

    Otherwise, will return desired length

    -----------------------------------------------------------------------------------------------------------------------

    Programmer's Notes:

    1. Note that following characters have been eliminated to avoid confusion for the user. The exception is that when

    'DIGITS' is included in the @pCharacterSet, both 0 and 1 will be included for things like PINs.

    0 (zero)

    O (capital "O")

    1 (one)

    l (lower case "L")

    I (capital "I")

    2. There is no guarantee that at least 1 character from each of the "character sets" will actually be used.

    3. Because NEWID(), the only "real" source of random numbers in SQL Server, is not allowed in user defined functions

    in either SQL Server 2000 or 2005, this function calls a the RandomIntegerGenerator view which generates random

    whole numbers (positive, zero, or negative) as Integers (INT datatype) with a range of about +- 2 Billion.

    -----------------------------------------------------------------------------------------------------------------------

    Technical References:

    1. Title: The "Numbers" or "Tally" Table: What it is and how it replaces a loop.

    Author: Jeff Moden - 07 May 2008

    Link: http://www.sqlservercentral.com/articles/TSQL/62867/

    2. Title: Performance Tuning: Concatenation Functions and Some Tuning Myths

    Author: Jeff Moden - 01 Jan 2008

    Link: http://www.sqlservercentral.com/articles/Test+Data/61572/

    -----------------------------------------------------------------------------------------------------------------------

    Revision History:

    Rev 00 - 13 Nov 2008 -- Jeff Moden

    Intial creation in response to the script titled "Generate a random password" posted at

    http://www.sqlservercentral.com/scripts/Random+number/64471/

    **********************************************************************************************************************/

    --===== Declare the input/output parameters

    (@pCharacterSet VARCHAR(256), @pPasswordLength INT)

    RETURNS VARCHAR(8000)

    AS

    BEGIN

    --===== Declare local variables

    DECLARE @rPassword VARCHAR(8000), --Return variable

    @CharacterType INT, --Bitmap to determine character types

    @Characters VARCHAR(256) --Characters to build the password from

    SELECT

    --===== Refactor input values to bitmap values using bit OR's.

    @pCharacterSet = UPPER(@pCharacterSet), --For case sensitive installations

    @CharacterType = CASE WHEN @pCharacterSet LIKE '%LOWER%' THEN 1 ELSE 0 END

    | CASE WHEN @pCharacterSet LIKE '%UPPER%' THEN 2 ELSE 0 END

    | CASE WHEN @pCharacterSet LIKE '%ALPHA%' THEN 3 ELSE 0 END --Combo 1 and 2

    | CASE WHEN @pCharacterSet LIKE '%NUMERIC%' THEN 4 ELSE 0 END

    | CASE WHEN @pCharacterSet LIKE '%DIGITS%' THEN 8 ELSE 0 END --For PINs

    | CASE WHEN @pCharacterSet LIKE '%SYMBOLS%' THEN 16 ELSE 0 END

    | CASE WHEN @pCharacterSet LIKE '%SPECIAL%' THEN 16 ELSE 0 END

    | CASE WHEN @pCharacterSet LIKE '%ALL%' THEN 23 ELSE 0 END, --Combo 1, 2, 4, and 16

    @CharacterType = ISNULL(NULLIF(@CharacterType,0),23), --Default to lower case if null, 0, blank, or not found

    @pPasswordLength = CASE --Limit the "psuedo-cursor loop" to the desired length with "limits"

    WHEN ISNULL(@pPasswordLength,0) < 1 THEN 10

    WHEN @pPasswordLength > 8000 THEN 8000

    ELSE @pPasswordLength

    END,

    --===== Initialize the password

    @rPassword = '',

    --===== Build the character set to build the password from.

    @Characters = ''

    + CASE WHEN @CharacterType & 1 = 1 THEN 'abcdefghijkmnopqrstuvwxyz' ELSE '' END --LOWER/ALPHA/ALL

    + CASE WHEN @CharacterType & 2 = 2 THEN 'ABCDEFGHJKLMNPQRSTUVWXYZ' ELSE '' END --UPPER/ALPHA/ALL

    + CASE WHEN @CharacterType & 4 = 4 THEN '23456789' ELSE '' END --NUMERIC/ALL

    + CASE WHEN @CharacterType & 8 = 8 THEN '0123456789' ELSE '' END --DIGITS

    + CASE WHEN @CharacterType & 16 = 16 THEN '">_!@#$%&=?<>' ELSE '' END --SYMBOL/SPECIAL/ALL

    --===== Using concatenation and a psuedo-loop provided by a Tally table, create a password using the desired

    -- mix of characters to the desired length. See "Programmer's Note 3" in the header for why the call to

    -- the RandomIntegerGenerator view is necessary.

    SELECT @rPassword = @rPassword

    + SUBSTRING(@Characters,(SELECT ABS(RandomInteger)%LEN(@Characters)+1 FROM dbo.RandomIntegerGenerator),1)

    FROM dbo.Tally t WITH(NOLOCK)

    WHERE t.N <= @pPasswordLength --Limit the "psuedo-cursor loop" to the desired length with "limits"

    --===== Return the assemble password and exit

    RETURN @rPassword

    END

    GO

    And, here's some test code... on my box, it gen's 10,000 23 character alpha-numeric passwords in about 1200 milli-seconds... that should be plenty fast enough for anyone...

    DECLARE @StartTime DATETIME

    SET @StartTime = GETDATE()

    SELECT dbo.CreateRandomPassword('AlphaNumeric',23)

    FROM dbo.Tally

    WHERE N <= 10000

    SELECT DATEDIFF(ms,@StartTime,GETDATE())

    I suppose you could reject passwords that have like adjacent letters, but then that would make the password pseudo-random and would cut down on the number of possibilities that someone would have to try to break a password.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Oh yeah... almost forgot... you need a Tally table for that code to work... please see the following...

    [font="Arial Black"]

    The "Numbers" or "Tally" Table: What it is and how it replaces a loop.

    [/font][/url]

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • So, Kit... can I see your code like I asked, please?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff, any luck in finding the saved "function argument" you mentioned? I would love to have a look at that earlyer discussion. BTW. This seems to be part of the discussion where everyone asks code from everyone else, lol...so to change the tune:

    Until shown otherwise I stick to my conclusion that the code example I provided is strong indictation, if not proof of, that the slowdown with user definined functions has nothing to do with them not being set based at all. And that instead the usual experienced massive slowdown is caused by an unreal call overhead of the UDF functions themselfs (compared next to none for build-in functions). This is a direct result of the way Microsoft implemented them and not of a fundamental property of functions or procedural code as is often perceived by the pure set based line of thinking camp. Time to stop bashing on even sound functions/procedural code and instead put the blame where it belongs....at the desk of some Micrsoft SQL Server team member.

    Get me right! This is not an argument against the set based definition (such as Jeff posted). I fully subscribe to the core of that definition. Instead I augment it by stating that what it expresses it is just as valid for a well writen set based solution as for a well written procedural one. Touch a minimal set of data, a minimal number of times, and in a minimal amount of set based steps as the optimizer or query complexity allows you to get away with. It is all about getting your results fast and efficient (and preferably in a non-blocking way) and not about a holy war of set vs procedural.

    Use set based constructs where you can as it exploits indexes and such and break it up in several set based stages when it gets to complex or inefficent due to optimizer/information limitations. This later is perceived as procedural by some. I would like to counter that this observation it is irrelevant and that set based solutons usually manage to achieve the primary goal in one step...doing the least amount of work...but sometimes you nead to break things up to achive this goal.

    And remember that under the hood ALL software is procedural, the set based aproach is just a mathematical model on top of it that allows for automatic optimizations to happen...and as we all know, no single model is perfect.

  • All software does get procedural, though parallelism might get through some of that, and the interrupt/jumps to new parts as we wait for IO can also help things appear to happen a little more "set based"

    However, the optimizations in software, specifically SQL Server in this case, do make thinking in sets more efficient than procedurally dealing with data.

    Does this matter with UDFs? No idea, but if you'd like to write up some articles on tests (or anyone else, for or against), I think it would be great to get that information out there. I don't know I have enough experience to devise good tests, but it would be great to see what you can come up with, as well as some result from Jeff and others.

  • Steve Jones - Editor (1/22/2009)


    All software does get procedural, though parallelism might get through some of that, and the interrupt/jumps to new parts as we wait for IO can also help things appear to happen a little more "set based"

    However, the optimizations in software, specifically SQL Server in this case, do make thinking in sets more efficient than procedurally dealing with data.

    Does this matter with UDFs? No idea, but if you'd like to write up some articles on tests (or anyone else, for or against), I think it would be great to get that information out there. I don't know I have enough experience to devise good tests, but it would be great to see what you can come up with, as well as some result from Jeff and others.

    Just a quick heads up for the function question. I remember doing a test way back when on 2000 and even a schema bound function had an impact (nothing huge but it seemed consistant). Now when you start querying base tables, you're just screwed. I just tuned a query yesterday. The whole thing ran in 0.5 secs on average (15K records returned). Once I added a function to calculate the # of business days between the rows' date and today, the query jumped to 3 minutes (server class machine 64 bit, 16gb of ram, quad procs, the db I was using is only 8 GB in size and on a super fast SAN... the table the function was using? 10 000 rows, correctly indexed, and less than 1 MB of data).

    I tuned by using a temp table + quirky update to figure out all the possible values I'd need which runs in under 13 MS then use that in an inner join.

  • peter (1/22/2009)


    Jeff, any luck in finding the saved "function argument" you mentioned?

    No, sorry... I can't even remember who the "argument" was with...

    Touch a minimal set of data, a minimal number of times, and in a minimal amount of set based steps as the optimizer or query complexity allows you to get away with. It is all about getting your results fast and efficient (and preferably in a non-blocking way) and not about a holy war of set vs procedural.

    Use set based constructs where you can as it exploits indexes and such and break it up in several set based stages when it gets to complex or inefficent due to optimizer/information limitations. This later is perceived as procedural by some. I would like to counter that this observation it is irrelevant and that set based solutons usually manage to achieve the primary goal in one step...doing the least amount of work...but sometimes you nead to break things up to achive this goal.

    And remember that under the hood ALL software is procedural, the set based aproach is just a mathematical model on top of it that allows for automatic optimizations to happen...and as we all know, no single model is perfect.

    That's mostly right (especially about touching the data the fewest number of times possible) except for one thing that I argue over and over... people simply give up on "set based" solutions too early because they simply haven't studied their trade well enough to "just know" the proper set based solution (I emphasize "proper" because there's a lot of wrong ways to do it and it sometimes picks up a bad name from those wrong ways). For me, it is a bit of a "holy war" because I've seen so much very slow, resource demanding code come from it along with some of the really lame excuses people use to justify RBAR.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 46 through 60 (of 74 total)

You must be logged in to reply to this topic. Login to reply