The Set-Based Limit

Question

The Set-Based Limit

Viewing 15 posts - 46 through 60 (of 74 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1003896 More actions · Answer 1

peter (12/23/2008)
Jeff is quiet about my posts - too quiet....is a storm coming?
LOL

Nah... not ignoring you, Peter. Functions do normally contain some unneccessary overhead. Normally... there are exceptions. It was a very long time ago that we did some testing on this very forum and I can't even remember who the players were. I can't find the tests but... on multi-processor boxes using certain functions, I remember the functions actually winning out over the inline code. They were a bit of an oddity and I can't remember what the heck they did, but they did beat inline code. My silence was because I was trying to find those tests and haven't been able to put my finger on them.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1003896 More actions · Answer 2

Steve Jones - Editor (12/22/2008)
Indeed, my guess is Jeff if handcrafting something right now in the winter snows of MI.

Heh... just waiting for the specs on the 23 character password/pin. I also want a copy of the original code to test against.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

RBarryYoung SSC Guru Points: 143329 More actions · Answer 3

Ninja's_RGR'us (12/23/2008)
The only problem I see is to be able to create a dynamic amount of keys without looping and without sacrificing performance (how do you handle making 10 keys and 10 000 000 keys from the same query with same linear performance). I have a couple theories, but no time to test... maybe next year.

Actually, I suspect that the real challenge is to efficiently exclude any duplicates from the result set.

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

RBarryYoung SSC Guru Points: 143329 More actions · Answer 4

Jeff Moden (12/23/2008)[hrIt was a very long time ago that we did some testing on this very forum and I can't even remember who the players were. I can't find the tests but...

Jeff: Do you remember the approximate time frame?

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

Steve Jones - SSC Editor SSC Guru Points: 734564 More actions · Answer 5

This would make a great article. We don't have enough of these comparing RBAR with set-based solutions.

Jeff Moden SSC Guru Points: 1003896 More actions · Answer 6

RBarryYoung (12/23/2008)
Jeff Moden (12/23/2008)[hrIt was a very long time ago that we did some testing on this very forum and I can't even remember who the players were. I can't find the tests but...
Jeff: Do you remember the approximate time frame?

Gosh... I wanna say about 4 years ago... it started off with someone saying that, except for a certain function, all functions were bad. What the heck was his name? I'll keep looking for it now and again...

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

RBarryYoung SSC Guru Points: 143329 More actions · Answer 7

Are we talking about UDF's or builtin functions? as far as I know, the only overhead that the builtin functions have is the CPU cost of their actual calculations.

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

Jeff Moden SSC Guru Points: 1003896 More actions · Answer 8

RBarryYoung (12/23/2008)
Are we talking about UDF's or builtin functions? as far as I know, the only overhead that the builtin functions have is the CPU cost of their actual calculations.

Oh, it was for UDF's alright. And, like I said, at one point I proved that a certain type of UDF in the presence of a multi-processor box was actually faster than inline code. Come to think of it... I may have actually saved the argument on my hard drive somewhere... lemme take a look.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1003896 More actions · Answer 9

Kit Brandner (12/22/2008)
Try a 23-character long alphanumeric PIN code with certain characters excluded for legibility that can't be repeated in the database. So Jeff's solution wouldn't work. I suppose I should have just put that in the original message, but proprietary sounded better.
One of the proposed set-based methods I was trying was having a table that contained all the valid characters, and a tally table. But this method didn't render desirable execution times, and often would hang in the case of generating large sets of PIN codes. I also tried working with CROSS JOINs, but that didn't pan out much further. Any other ideas?
Consider this: the term "PIN code" doesn't necessarily apply to the bank machine, just like the term "algorithm" is a fancy word for a set of rules. And banks often allow for a PIN code longer than 4 digits.

Ok, Kit... Merry Christmas... in return, I sure would like to see the script you have for doing this... I dug up a fairly recent bit of code I made in response to a script posting and was actually going to make an article about it... I can probably still do that, but here's the code...

First, we need to be able to use the NEWID() function within a User Defined Function... since that's not allowed, we gotta get sneaky... we need this view to do it for us...

CREATE VIEW dbo.RandomIntegerGenerator AS

/**********************************************************************************************************************

Purpose:

Return a single random Integer for use in User Defined Functions. May return negative, 0, or positive whole numbers.

Range of numbers is approximately +- 2 Billion

-----------------------------------------------------------------------------------------------------------------------

Revision History:

Rev 00 - 13 Nov 2008 -- Jeff Moden

Intial creation in response to the script titled "Generate a random password" posted at

http://www.sqlservercentral.com/scripts/Random+number/64471/

**********************************************************************************************************************/

SELECT CHECKSUM(NEWID()) AS RandomInteger --CheckSum's return type is INT

GO

... and here's the code for the function that will generate just about any kind of password you want. The only thing I have left to do is to force it to use at least 1 character from each requested character set, on demand. Do notice that things like a lower case "L" and uppercase "O" have been omitted. As usual, my explanations for what's happening in the code and how to use it, are in the code. Please read the header for all the different things you can do with this function...

CREATE FUNCTION dbo.CreateRandomPassword

/**********************************************************************************************************************

Purpose:

Generate random passwords of a given length using up to 3 types of characters.

-----------------------------------------------------------------------------------------------------------------------

Usage:

SELECT dbo.CreateRandomPassword (@pCharacterSet, @pPasswordLength)

-----------------------------------------------------------------------------------------------------------------------

Parameters:

==== @pCharacterSet: (See "Programmer's Notes #2)

When @pCharacterSet contains 'LOWER' anywhere, Then LOWER CASE ALPHA characters are included

When @pCharacterSet contains 'UPPER' anywhere, Then UPPER CASE ALPHA characters are included

When @pCharacterSet contains 'ALPHA' anywhere, Then both LOWER and UPPER CASE ALPHA characters are included

When @pCharacterSet contains 'NUMERIC' anywhere, Then numeric digits 2 thru 9 are included

When @pCharacterSet contains 'DIGITS' anywhere, Then numeric digits 0 thru 9 are included

When @pCharacterSet contains 'SYMBOL' anywhere, Then special characters are included

When @pCharacterSet contains 'ALL' anywhere, Then all character types except 'DIGITS' are included

When @pCharacterSet IS NULL or BLANK, Then all character types are included

==== @pPasswordLength

When @pPasswordLength < 1 Then will return 10 character password

When @pPasswordLength > 8000 Then will return 8000 character password

When @pPasswordLength IS NULL Then will return 10 character password

Otherwise, will return desired length

-----------------------------------------------------------------------------------------------------------------------

Programmer's Notes:

1. Note that following characters have been eliminated to avoid confusion for the user. The exception is that when

'DIGITS' is included in the @pCharacterSet, both 0 and 1 will be included for things like PINs.

0 (zero)

O (capital "O")

1 (one)

l (lower case "L")

I (capital "I")

2. There is no guarantee that at least 1 character from each of the "character sets" will actually be used.

3. Because NEWID(), the only "real" source of random numbers in SQL Server, is not allowed in user defined functions

in either SQL Server 2000 or 2005, this function calls a the RandomIntegerGenerator view which generates random

whole numbers (positive, zero, or negative) as Integers (INT datatype) with a range of about +- 2 Billion.

-----------------------------------------------------------------------------------------------------------------------

Technical References:

1. Title: The "Numbers" or "Tally" Table: What it is and how it replaces a loop.

Author: Jeff Moden - 07 May 2008

Link: http://www.sqlservercentral.com/articles/TSQL/62867/

2. Title: Performance Tuning: Concatenation Functions and Some Tuning Myths

Author: Jeff Moden - 01 Jan 2008

Link: http://www.sqlservercentral.com/articles/Test+Data/61572/

-----------------------------------------------------------------------------------------------------------------------

Revision History:

Rev 00 - 13 Nov 2008 -- Jeff Moden

Intial creation in response to the script titled "Generate a random password" posted at

http://www.sqlservercentral.com/scripts/Random+number/64471/

**********************************************************************************************************************/

--===== Declare the input/output parameters

(@pCharacterSet VARCHAR(256), @pPasswordLength INT)

RETURNS VARCHAR(8000)

AS

BEGIN

--===== Declare local variables

DECLARE @rPassword VARCHAR(8000), --Return variable

@CharacterType INT, --Bitmap to determine character types

@Characters VARCHAR(256) --Characters to build the password from

SELECT

--===== Refactor input values to bitmap values using bit OR's.

@pCharacterSet = UPPER(@pCharacterSet), --For case sensitive installations

@CharacterType = CASE WHEN @pCharacterSet LIKE '%LOWER%' THEN 1 ELSE 0 END

| CASE WHEN @pCharacterSet LIKE '%UPPER%' THEN 2 ELSE 0 END

| CASE WHEN @pCharacterSet LIKE '%ALPHA%' THEN 3 ELSE 0 END --Combo 1 and 2

| CASE WHEN @pCharacterSet LIKE '%NUMERIC%' THEN 4 ELSE 0 END

| CASE WHEN @pCharacterSet LIKE '%DIGITS%' THEN 8 ELSE 0 END --For PINs

| CASE WHEN @pCharacterSet LIKE '%SYMBOLS%' THEN 16 ELSE 0 END

| CASE WHEN @pCharacterSet LIKE '%SPECIAL%' THEN 16 ELSE 0 END

| CASE WHEN @pCharacterSet LIKE '%ALL%' THEN 23 ELSE 0 END, --Combo 1, 2, 4, and 16

@CharacterType = ISNULL(NULLIF(@CharacterType,0),23), --Default to lower case if null, 0, blank, or not found

@pPasswordLength = CASE --Limit the "psuedo-cursor loop" to the desired length with "limits"

WHEN ISNULL(@pPasswordLength,0) < 1 THEN 10

WHEN @pPasswordLength > 8000 THEN 8000

ELSE @pPasswordLength

END,

--===== Initialize the password

@rPassword = '',

--===== Build the character set to build the password from.

@Characters = ''

+ CASE WHEN @CharacterType & 1 = 1 THEN 'abcdefghijkmnopqrstuvwxyz' ELSE '' END --LOWER/ALPHA/ALL

+ CASE WHEN @CharacterType & 2 = 2 THEN 'ABCDEFGHJKLMNPQRSTUVWXYZ' ELSE '' END --UPPER/ALPHA/ALL

+ CASE WHEN @CharacterType & 4 = 4 THEN '23456789' ELSE '' END --NUMERIC/ALL

+ CASE WHEN @CharacterType & 8 = 8 THEN '0123456789' ELSE '' END --DIGITS

+ CASE WHEN @CharacterType & 16 = 16 THEN '">_!@#$%&=?<>' ELSE '' END --SYMBOL/SPECIAL/ALL

--===== Using concatenation and a psuedo-loop provided by a Tally table, create a password using the desired

-- mix of characters to the desired length. See "Programmer's Note 3" in the header for why the call to

-- the RandomIntegerGenerator view is necessary.

SELECT @rPassword = @rPassword

+ SUBSTRING(@Characters,(SELECT ABS(RandomInteger)%LEN(@Characters)+1 FROM dbo.RandomIntegerGenerator),1)

FROM dbo.Tally t WITH(NOLOCK)

WHERE t.N <= @pPasswordLength --Limit the "psuedo-cursor loop" to the desired length with "limits"

--===== Return the assemble password and exit

RETURN @rPassword

END

GO

And, here's some test code... on my box, it gen's 10,000 23 character alpha-numeric passwords in about 1200 milli-seconds... that should be plenty fast enough for anyone...

DECLARE @StartTime DATETIME

SET @StartTime = GETDATE()

SELECT dbo.CreateRandomPassword('AlphaNumeric',23)

FROM dbo.Tally

WHERE N <= 10000

SELECT DATEDIFF(ms,@StartTime,GETDATE())

I suppose you could reject passwords that have like adjacent letters, but then that would make the password pseudo-random and would cut down on the number of possibilities that someone would have to try to break a password.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1003896 More actions · Answer 10

Oh yeah... almost forgot... you need a Tally table for that code to work... please see the following...

[font="Arial Black"]

The "Numbers" or "Tally" Table: What it is and how it replaces a loop.

[/font][/url]

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1003896 More actions · Answer 11

So, Kit... can I see your code like I asked, please?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

peter-757102 SSCertifiable Points: 6877 More actions · Answer 12

Jeff, any luck in finding the saved "function argument" you mentioned? I would love to have a look at that earlyer discussion. BTW. This seems to be part of the discussion where everyone asks code from everyone else, lol...so to change the tune:

Until shown otherwise I stick to my conclusion that the code example I provided is strong indictation, if not proof of, that the slowdown with user definined functions has nothing to do with them not being set based at all. And that instead the usual experienced massive slowdown is caused by an unreal call overhead of the UDF functions themselfs (compared next to none for build-in functions). This is a direct result of the way Microsoft implemented them and not of a fundamental property of functions or procedural code as is often perceived by the pure set based line of thinking camp. Time to stop bashing on even sound functions/procedural code and instead put the blame where it belongs....at the desk of some Micrsoft SQL Server team member.

Get me right! This is not an argument against the set based definition (such as Jeff posted). I fully subscribe to the core of that definition. Instead I augment it by stating that what it expresses it is just as valid for a well writen set based solution as for a well written procedural one. Touch a minimal set of data, a minimal number of times, and in a minimal amount of set based steps as the optimizer or query complexity allows you to get away with. It is all about getting your results fast and efficient (and preferably in a non-blocking way) and not about a holy war of set vs procedural.

Use set based constructs where you can as it exploits indexes and such and break it up in several set based stages when it gets to complex or inefficent due to optimizer/information limitations. This later is perceived as procedural by some. I would like to counter that this observation it is irrelevant and that set based solutons usually manage to achieve the primary goal in one step...doing the least amount of work...but sometimes you nead to break things up to achive this goal.

And remember that under the hood ALL software is procedural, the set based aproach is just a mathematical model on top of it that allows for automatic optimizations to happen...and as we all know, no single model is perfect.

Steve Jones - SSC Editor SSC Guru Points: 734564 More actions · Answer 13

All software does get procedural, though parallelism might get through some of that, and the interrupt/jumps to new parts as we wait for IO can also help things appear to happen a little more "set based"

However, the optimizations in software, specifically SQL Server in this case, do make thinking in sets more efficient than procedurally dealing with data.

Does this matter with UDFs? No idea, but if you'd like to write up some articles on tests (or anyone else, for or against), I think it would be great to get that information out there. I don't know I have enough experience to devise good tests, but it would be great to see what you can come up with, as well as some result from Jeff and others.

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 14

Steve Jones - Editor (1/22/2009)
All software does get procedural, though parallelism might get through some of that, and the interrupt/jumps to new parts as we wait for IO can also help things appear to happen a little more "set based"
However, the optimizations in software, specifically SQL Server in this case, do make thinking in sets more efficient than procedurally dealing with data.
Does this matter with UDFs? No idea, but if you'd like to write up some articles on tests (or anyone else, for or against), I think it would be great to get that information out there. I don't know I have enough experience to devise good tests, but it would be great to see what you can come up with, as well as some result from Jeff and others.

Just a quick heads up for the function question. I remember doing a test way back when on 2000 and even a schema bound function had an impact (nothing huge but it seemed consistant). Now when you start querying base tables, you're just screwed. I just tuned a query yesterday. The whole thing ran in 0.5 secs on average (15K records returned). Once I added a function to calculate the # of business days between the rows' date and today, the query jumped to 3 minutes (server class machine 64 bit, 16gb of ram, quad procs, the db I was using is only 8 GB in size and on a super fast SAN... the table the function was using? 10 000 rows, correctly indexed, and less than 1 MB of data).

I tuned by using a temp table + quirky update to figure out all the possible values I'd need which runs in under 13 MS then use that in an inner join.

Jeff Moden SSC Guru Points: 1003896 More actions · Answer 15

peter (1/22/2009)
Jeff, any luck in finding the saved "function argument" you mentioned?

No, sorry... I can't even remember who the "argument" was with...

Touch a minimal set of data, a minimal number of times, and in a minimal amount of set based steps as the optimizer or query complexity allows you to get away with. It is all about getting your results fast and efficient (and preferably in a non-blocking way) and not about a holy war of set vs procedural.

Use set based constructs where you can as it exploits indexes and such and break it up in several set based stages when it gets to complex or inefficent due to optimizer/information limitations. This later is perceived as procedural by some. I would like to counter that this observation it is irrelevant and that set based solutons usually manage to achieve the primary goal in one step...doing the least amount of work...but sometimes you nead to break things up to achive this goal.
And remember that under the hood ALL software is procedural, the set based aproach is just a mathematical model on top of it that allows for automatic optimizations to happen...and as we all know, no single model is perfect.

That's mostly right (especially about touching the data the fewest number of times possible) except for one thing that I argue over and over... people simply give up on "set based" solutions too early because they simply haven't studied their trade well enough to "just know" the proper set based solution (I emphasize "proper" because there's a lot of wrong ways to do it and it sometimes picks up a bad name from those wrong ways). For me, it is a bit of a "holy war" because I've seen so much very slow, resource demanding code come from it along with some of the really lame excuses people use to justify RBAR.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)