Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase ««12345»»»

The Set-Based Limit Expand / Collapse
Author
Message
Posted Sunday, December 21, 2008 7:19 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: Administrators
Last Login: Today @ 11:24 AM
Points: 32,781, Visits: 14,942
And once again we're off and running in a different direction.

I doubt this will eclipse the "Are the Posted Questions", but you never know.







Follow me on Twitter: @way0utwest

Forum Etiquette: How to post data/code on a forum to get the best help
Post #623689
Posted Sunday, December 21, 2008 7:26 PM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: Friday, March 28, 2014 2:25 PM
Points: 9,902, Visits: 9,479
Matt Miller (12/21/2008)
RBarryYoung (12/21/2008)
Ninja's_RGR'us (12/21/2008)
... so how about your mother!!!! :D

Uh, she died on a remote planet after being beaten and starved to death by Sandpeople, so out of anger I killed every man woman and child in their village. Is that a problem?


I am betting she makes you take that helmet off at the dinner table...:)

Hmmm. Where are those transmissions you intercepted?


-- RBarryYoung, (302)375-0451 blog: MovingSQL.com, Twitter: @RBarryYoung
Proactive Performance Solutions, Inc.
"Performance is our middle name."
Post #623690
Posted Sunday, December 21, 2008 7:29 PM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: Friday, March 28, 2014 2:25 PM
Points: 9,902, Visits: 9,479
Yeah, Attention Deficit Disorder is such a, ... uh, ... um, what were we talking about again?

-- RBarryYoung, (302)375-0451 blog: MovingSQL.com, Twitter: @RBarryYoung
Proactive Performance Solutions, Inc.
"Performance is our middle name."
Post #623691
Posted Sunday, December 21, 2008 8:57 PM


Right there with Babe

Right there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with Babe

Group: General Forum Members
Last Login: Friday, April 04, 2014 4:40 PM
Points: 751, Visits: 917
Jeff Moden (12/21/2008)
timothyawiseman (12/21/2008)
Two things of note though are that it is sometimes faster to code the procedural solution than it is to code the set based one, largely due to habit. It is sometimes the case that that savings in programmer time is far more valuable than the processing speed given up, especially if it is a script being written for one time use.


I'm not sure why people think that way... if you practice piano, you don't practice hitting the wrong notes. Once someone knows it, writing set based code is typically faster than writting procedural code if for no other reason than it's usually shorter.


You have a good point, and one I normally agree with, but for those of us that did procedural programming long before we picked up SQL and still do procedural (and OO, haven't tried functional yet, at least not strictly functional like Lisp or Haskell...) in addition to SQL the procedural answer often comes more easily than the set based one.

So when I have the time, I always think through the set based solution for SQL, but I have no hesitation to just jot out the procedural version one time and be done with it if deadlines loom nigh.

And the piano analogy is not perfect there. If you are playing the piano and hit the wrong note it sounds bad and the final product (the music) is markedly flawed. If you are writing code and you write a suboptimal piece of code, it is suboptimal not wrong. The end user may never know, especially if the final product is a spreadsheet with analysis and they never see either the code or its running process.


---
Timothy A Wiseman
SQL Blog: http://timothyawiseman.wordpress.com/
Post #623703
Posted Monday, December 22, 2008 4:42 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 4:00 PM
Points: 35,967, Visits: 30,258
timothyawiseman (12/21/2008)
So when I have the time, I always think through the set based solution for SQL, but I have no hesitation to just jot out the procedural version one time and be done with it if deadlines loom nigh.


Exactly what I've talked about on various other threads... managment is to blame for that because they don't understand that a database is much more than just a place to store data.

And the piano analogy is not perfect there. If you are playing the piano and hit the wrong note it sounds bad and the final product (the music) is markedly flawed. If you are writing code and you write a suboptimal piece of code, it is suboptimal not wrong. The end user may never know, especially if the final product is a spreadsheet with analysis and they never see either the code or its running process.


You're right... with a piano, you know you've hit a wrong note and can correct it immediately. With bad/suboptimal code, it may take a while... and then it'll be harder to find what the problem is and correct it.



--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

"Change is inevitable. Change for the better is not." -- 04 August 2013
(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #623837
Posted Monday, December 22, 2008 5:19 AM


SSC-Insane

SSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-Insane

Group: General Forum Members
Last Login: Thursday, January 16, 2014 12:03 PM
Points: 21,376, Visits: 9,594
I've played in concerts back in high school... and if and when you hit the wrong note, you can't correct it without starting the whole thing over... quite similar to changing the code :).
Post #623851
Posted Monday, December 22, 2008 5:59 AM
Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Wednesday, April 16, 2014 3:35 AM
Points: 319, Visits: 2,151
Jeff,


You be happy to know that your understanding of set based code and mine are 100% the same, going by what you posted in this threat. But still I cannot subscribe to your conclusions and no doubt you will be interested to know why as that might highlight something not often brought up! I see myself a little as a bridge person here as I am a thoroughbred developer with over 20 years under my belt and am quite deep into SQL as well. Maybe not to the level of some of the DBAs here, but I do like SQL server a lot and it offers amazing performance most of the time. That said, its implementation has a serious flaw and just ignoring it because I am a "fan" and like SQL will not make it go away!

So here it goes...I will try to articulate the problem as best I can....I owe you that much (we all do)!

The problem as I see it is SQL Servers ** pardon me ** horrible function support. As soon as you use functions (even deterministic scalar ones), pure set based solutions seems to break down performance wise. There is an unreasonable cost there to keep set based solutions readable when they involve even moderate amounts of "just beyond simple" logic that is best broken down in multiple lines of code to ease understanding and improve expressiveness.

Lack of good function support is a problem because is prevents formulating repeating logic into human readable deterministic scalar functions that can be used in queries like any other logic encapsulated in build-in functions. The current way to keep performance up is to write the logic as complex expressions, much like a formula, that simulate the logic and can be used in set based operations without the performance hit functions take for no good reason.

It would be preferable to rewrite code such as this

select
case
when st.a > st.b then st.a
else st.b
end as BiggestNumberInRow
from
dbo.SomeTable as st
where
case
when st.a > st.b then st.a
else st.b
end < 100


into code such as

select
dbo.maxNum( st.a, st.b ) as BiggestNumberInRow
from
dbo.SomeTable as st
where
dbo.maxNum( st.a, st.b ) < 100

Without having to incur a performance hit as the 2nd snippet of code is much easier to understand. It does demonstrate clearly how much more readable SQL code becomes when you use functions in SQL to hide details that derail you from understanding what the query as a whole tries to accomplish. And as soon as you need something even a little beyond what the build in functions offer you will run into situations like this one.

Here is an example of formula like notations I recently used in a project.

select 
count(1) as TotalConsumers
, sum( sign( isnull( l.IdConsumer, 0 ) ) ) as ActivatedConsumers
, sum( 1 - sign( isnull( l.IdConsumer, 0 ) )) as PendingConsumers
-- l.IdConsumer is null
, sum( ( 1 - sign( isnull( l.IdConsumer, 0 ) ) )
* sign( isnull( l.EstateAgentMailSend, 0 ) ) ) as IncompleteConsumers
-- l.IdConsumer is null and l.EstateAgentMailSend is not null
, sum( sign( isnull( l.IdConsumer, 0 ) )
* sign( isnull( l.EstateAgentMailSend, 0 ) ) ) as CompletedConsumers
-- l.IdConsumer is not null and l.EstateAgentMailSend is not null
from
import.feedlog as l
where
l.idestateagent = 20

The code touches each record only once like proper set based code does. It uses formula like expressions to prevent 'case when ... then ... else ... end' code. The intention of the logic I wrote as comment at the end of each line is totally lost if you just read the formulas. Making a modification in such code is a pain if you do not fully understand what is going on or have moderate SQL knowledge.

BTW this actual code I wrote for a colleague that first used a query with a sub selects for each aggregate...obviously touching every record multiple times. Seeing this I think it is hard to uphold that set based code is by definition more elegant and easier to write and understand.

Here a older production case where I decided to use a deterministic scalar functions to encapsulate some logic, which written as formulas would be incomprehensible.

-- Returns the day of week, with monday being day 0, sunday being day 6!
--
create function dbo.dayOfWeek( @day as datetime ) returns int
with schemabinding as
begin
declare @delta as int set @delta = datediff( d, {d '2007-01-01'}, @day ) % 7
return case when @delta < 0 then 7 + @delta else @delta end
end
go


-- Returns the number of week days (mo..fr) in a date range!
--
create function dbo.fnWeekDaysInDateRange( @firstday as datetime, @lastday as datetime ) returns int
with schemabinding as
begin
-- Widen date range to encompas full calendar weeks
declare @firstMonday as datetime
set @firstMonday = dateadd( d, 0 - dbo.dayOfWeek( @firstday ), @firstday )
declare @lastMonday as datetime
set @lastMonday = dateadd( d, 7 - dbo.dayOfWeek( @lastday ), @lastday )
declare @excessdays as int
declare @daysdiff as int

-- Compensate for extended days at start of daterange
set @daysdiff = datediff( d, @firstMonday, @firstday )
set @excessdays = case when @daysdiff > 5 then 5 else @daysdiff end

-- Compensate for extended days at end of daterange
set @daysdiff = datediff( d, @lastday, @lastMonday )
set @excessdays = @excessdays + case
when @daysdiff >= 3
then @daysdiff - 2
when @daysdiff != 0
then 1
else 0 end

-- Calculate total number of workdays!
return (datediff( ww, @firstMonday, @lastMonday ) * 5) - @excessdays + 1
end
go

These functions are then used in some views and many queries. I show this code as the encapsulated logic cannot realistically be deployed without functions. These particular functions are deterministic, meaning that for any specific input the outcome will always be the same and these do no IO at all. As such they should be evaluated at close the same speed as build in functions in an optimized system.

Unfortunately it performs not as good as it could due to the way functions are implemented in SQL Server...but they demonstrate a realistic need for better function support. You do not want to rewrite these functions into one big formula to evade any SQL Server ** implementation ** performance limitation. It can be done, but I lack the time now to do so, the result however will be much faster and you wouldn't understand how it does its work at all no matter how many detailed comments I would add to the code! Let alone you would be able to maintain it or spot a problem!


I think I made enough of a point to have a meaningful discussion about the role of functions in all this. The common perception is that functions are slow due to them being procedural, but I tell you...build in functions are also procedural. And many SQL tricks deployed to make fast set based code rely heavily on inefficient use of build in relatively fast functions. This IMHO makes in many cases where set based code is victorious over other coding the claim that this is due to it not containing procedural parts.....a shaky claim at best.

There is more to this then meets the eye...it is benchmarking after all. I seen many arguments where it is claimed that on a old server this solution beats that other solution. Seldom I see the counter argument that benchmarking on older hardware is a bad way to test how it will run on newer hardware. Characteristics of servers change over generations, memory is dirt cheap and plenty full these days. Addressing large amounts of it is easy with 64 bit systems too. Disks are about to undergo a revolution with the advance of solid state disks...current server cpu's are incredible fast and their power often still goes underutilized. Testing on hardware from last generation is shaky, testing on 2 or more generations old hardware or on a laptop is just fooling oneself.

Someone on this forum made a sharpish comment a while back that I believe deserves more attention. It was about that servers under load respond differently then when testing on a nearly dedicated sever such as a a development sever. Proper set based code following the definition Jeff mentioned will work best across different server loads. This is because each set based operation can be sort of 'streamed' and resource use in general is low and can be throttled by the server. The worst that can happen when the query plan is good is that temporary results need to be stored in the tempdb.

Which brings us to the optimizer....complex set based code relies heavily on the optimizer to perform well. If it goes tits, prepare for performance far worse then you can imagine based on normal experience. You need to do serious testing to make sure it works as efficient as you expect...check the query plan, IO statistics and time statistics when evaluating several solutions. Really complex operations are often best separated in several set based steps. The optimizer is pretty good but not almighty and it will let you down at some point if you rely on it blindly. As an added advantage you can put in knowledge about the job/data that an optimizer can simply not extract form the statement or from statistics (or it simply gives up trying too soon).

It would be nice is Microsoft would give us more tools to help us optimize. Like giving some control to simulate conditions a query runs under (memory, locks, optimizer aggressiveness). It would make tuning solutions a lot easier and also let us explore how far the optimizer can be pushed to extract the most out of it.

I written to much already I believe.....shoot!
Post #623876
Posted Monday, December 22, 2008 7:04 AM
Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Wednesday, April 16, 2014 3:35 AM
Points: 319, Visits: 2,151
To Jeff mostly,


To bolster my claim about function overhead being the great killer in this debate.....I read your article about the function that you wrote for determining the number of workdays in a week. I read it AFTER my previous post and did some quick benchmarks.

I used two functions:

1. One for calculating the day of the week in a deterministic way (no side effects)
2. One doing the actual work.

Your solution used one function that did everything on its own. Guess what....your solution was faster on my system (roughly 9 vs 15 seconds on 100.000 date ranges).

Then I switched to one function as well, by duplicating the work from my first function to the second calling function instead of doing the two calls. This function then became faster still (roughly 7 vs 9 seconds on 100.000 date ranges).

Thus for performing the exact same work my original solution wend from 15 to 7 seconds!!! The time difference totally attributable to the function call overhead! This IMHO proves the whole point about functions being implemented crap by SQL Server. And showcases exactly why in many instances set based code is unreasonable much faster then procedural.

As I said before I can rewrite the code to fit on one line and do it all and it would be even faster (maybe I will, just to stun you all) .... but would anyone still comprehend it?


New code:

create function dbo.fnWeekDaysInDateRange_solo( @firstday as datetime, @lastday as datetime ) returns int
with schemabinding as
begin

declare @delta as int
declare @excessdays as int
declare @daysdiff as int
declare @firstMonday as datetime
declare @lastMonday as datetime

-- Widen date range to encompas full calendar weeks
set @delta = datediff( d, {d '2007-01-01'}, @firstday ) % 7
set @firstMonday = dateadd( d, 0 - case when @delta < 0 then 7 + @delta else @delta end, @firstday )

set @delta = datediff( d, {d '2007-01-01'}, @lastday ) % 7
set @lastMonday = dateadd( d, 7 - case when @delta < 0 then 7 + @delta else @delta end, @lastday )

-- Compensate for extended days at start of daterange
set @daysdiff = datediff( d, @firstMonday, @firstday )
set @excessdays = case when @daysdiff > 5 then 5 else @daysdiff end

-- Compensate for extended days at end of daterange
set @daysdiff = datediff( d, @lastday, @lastMonday )
set @excessdays = @excessdays + case when @daysdiff >= 3 then @daysdiff - 2 when @daysdiff != 0 then 1 else 0 end

-- Calculate total number of workdays!
return (datediff( ww, @firstMonday, @lastMonday ) * 5) - @excessdays + 1
end
go


Inline table valued functions fair better until they overwhelm the optimizers capacity of course...this version does the same job in 3-4 seconds and needs to be used in conjunction with cross apply.

create function dbo.fnWeekDaysInDateRange_solo_inline( @firstday as datetime, @lastday as datetime ) returns table
as
return
(
with
-- Widen date range to encompas full calendar weeks - to first monday
firstDeltaQ (N) as ( select datediff( d, {d '2007-01-01'}, @firstday ) % 7 )
, firstMondayQ (N) as
(
select
dateadd( d, 0 - case when delta.N < 0 then 7 + delta.N else delta.N end, @firstday )
from
firstDeltaQ as delta
)

-- Widen date range to encompas full calendar weeks - to next monday
, lastDeltaQ (N) as ( select datediff( d, {d '2007-01-01'}, @lastday ) % 7 )
, lastMondayQ (N) as
(
select
dateadd( d, 7 - case when delta.N < 0 then 7 + delta.N else delta.N end, @lastday )
from
lastDeltaQ as delta
)

, excessdaysQ ( N ) as
(
select
-- Compensate for extended days at start of daterange
case
when datediff( d, firstMonday.N, @firstday ) > 5 then 5
else datediff( d, firstMonday.N, @firstday )
end
-- Compensate for extended days at end of daterange
+ case
when datediff( d, @lastday, lastMonday.N ) >= 3 then datediff( d, @lastday, lastMonday.N ) - 2
when datediff( d, @lastday, lastMonday.N ) != 0 then 1
else 0
end
from
firstMondayQ as firstMonday
cross join lastMondayQ as lastMonday
)
select
-- Calculate total number of workdays!
(datediff( ww, firstMonday.N, lastMonday.N ) * 5) - excessdays.N + 1 as n
from
firstMondayQ as firstMonday
cross join lastMondayQ as lastMonday
cross join excessdaysQ as excessdays
)
;
go

Annoying smileys all over my code, why?

BTW. I was mentioning seconds, but the last function does the job in like 0.8 seconds the rest is moving the results over to the client (grid mode). Corrected for this time (a little over 3 seconds in my test cases), the inline function is like 6 times faster then your original scalar function and 12 times faster then my own scalar original.

The overhead of scalar functions is as you can see monster like!
Post #623924
Posted Monday, December 22, 2008 7:04 AM


SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: 2 days ago @ 8:12 AM
Points: 2,627, Visits: 19,093
Two comments -

Steve - interesting model of the solar system you have posted in the article, haven't seen that one before :D

Everyone else (except Darth & Ninja, you guys are lost already) - Jeff's right, if you hit a wrong note on the piano you have to just play it off like it's a right one. Not learning a more effective way to do something because it's harder or takes longer just means that you're never going to be as good as you could be. That said, we all have to play the instruments that we have in our repertoire, so if you have to get something done now and don't have the luxury of learning something new, get it done.

Then rebuild it using the new way, and learn what the differences are.


---------------------------------------------------------
How best to post your question
How to post performance problems
Tally Table:What it is and how it replaces a loop

"stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."
Post #623925
Posted Monday, December 22, 2008 7:46 AM


SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Wednesday, April 09, 2014 12:22 PM
Points: 806, Visits: 2,096
I guess I object to cursors because of their awkward looking syntax more than anything, although their performance compared side by side with equivalent rewritten set-based or even conventional procedural loops (e.g. while @rownum end), is also bad.

Databases quickly become complex beasts, no matter how simple the requirements or program used (SQL Server, Oracle, MySQL, etc.) It helps that a solid design at the beginning be a worthy goal, because later on, god help us when we have to do a join on a billion rows and the query optimizer decides to do a full table scan.

Gaby


Gaby
________________________________________________________________
"In theory, theory and practice are the same. In practice, they are not."
- Albert Einstein
Post #623946
« Prev Topic | Next Topic »

Add to briefcase ««12345»»»

Permissions Expand / Collapse