|
|
|
SSCoach
         
Group: General Forum Members
Last Login: Yesterday @ 1:55 PM
Points: 15,442,
Visits: 9,571
|
|
I came up with a different plan from yours for "best solution":
set nocount on go CREATE TABLE CaseDetails (CaseId INT IDENTITY(1,1) primary key, QTY NUMERIC(14,4), weightedQty NUMERIC(14,4), ShipmentId INT, Total as Qty + WeightedQty) go raiserror('First Insert', 10, 1) with nowait go insert into dbo.CaseDetails(qty, weightedqty, shipmentid) select 200, 300, 1 from common.dbo.bignumbers where number between 1 and 15000 go raiserror('Second Insert', 10, 1) with nowait go insert into dbo.CaseDetails(qty, weightedqty, shipmentid) select 200, 300, numbers.number from common.dbo.numbers cross join common.dbo.numbers n2 where numbers.number between 2 and 10000 go raiserror('Index', 10, 1) with nowait go create index IDX_CaseDetails_TotalQty on dbo.casedetails(shipmentid) include (qty, weightedqty, total) go set statistics io on set statistics time on
print 'First Method'
select qty, weightedqty, qty + weightedqty from dbo.casedetails where shipmentid = 1 /* SQL Server parse and compile time: CPU time = 0 ms, elapsed time = 1 ms. SQL Server parse and compile time: CPU time = 0 ms, elapsed time = 1 ms. Table 'CaseDetails'. Scan count 1, logical reads 82, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times: CPU time = 16 ms, elapsed time = 231 ms. */ print 'Second Method'
declare @Total int
select @total = qty + weightedqty from dbo.casedetails where shipmentid = 1
select qty, weightedqty, @total from dbo.casedetails where shipmentid = 1 /* SQL Server parse and compile time: CPU time = 0 ms, elapsed time = 1 ms. Table 'CaseDetails'. Scan count 1, logical reads 82, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times: CPU time = 0 ms, elapsed time = 8 ms. Table 'CaseDetails'. Scan count 1, logical reads 82, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times: CPU time = 31 ms, elapsed time = 230 ms. */ print 'Third Method'
select qty, weightedqty, total from dbo.casedetails where shipmentid = 1 /* SQL Server parse and compile time: CPU time = 0 ms, elapsed time = 1 ms. SQL Server parse and compile time: CPU time = 0 ms, elapsed time = 1 ms. Table 'CaseDetails'. Scan count 1, logical reads 82, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times: CPU time = 0 ms, elapsed time = 226 ms. */
As you'll see, I added a calculated column to the test table. Then I inserted 15,000 rows for shipping ID 1, and 100-million rows of data for 10,000 other shipping IDs. (I prefer tests in bigger tables.) Then I indexed the table.
On my computer, populating the tables with over 100-million rows of data took quite a while, but by using a Numbers table solution, it wasn't too bad (under an hour). That might have been overkill, I have to admit. If you decide to test this, I recommend about 10-million rows. Will be easier to deal with and perfectly valid for testing.
I didn't test the inline function, since I already know that will fail horribly.
With the calculated column and the right index, the CPU time on these queries goes WAY down.
The first test was a straight-up calculation, without the calculated column. It used the index, and took 16 milliseconds of CPU time and 1 table scan.
Second test used the "assign the number to a variable" method, as outlined in the article. That used the index, took 31 milliseconds, and took 2 table scans (1 to assign the variable, 1 to select the data).
The final test used the calculated column. That took under 1 millisecond of CPU time (too fast to measure) and 1 table scan.
(All of these "table scans" are just the IO stats. They're actually index seeks in the execution plan.)
Rather than using an inline function for this kind of calculation, I highly recommend non-persisted, indexed, calculated columns. I have used them for many things, and they are very fast, don't slow down insert/update/delete by too much (I guess if the calculation is complex enough and the insert/update is big enough, it might slow it down measurably, but I haven't seen that yet). Obviously, they do have some cost for those operations, and do add to the size of the index, but the cost is quite good compared to the benefit.
This way, you don't have to count on the data being the same in all rows, you cut IO in half, and you get your select done so fast that "statistics time" can't really measure it. And that's in a table with 100-million rows.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
|
|
|
|
|
SSC Rookie
      
Group: General Forum Members
Last Login: Thursday, January 31, 2013 3:06 AM
Points: 36,
Visits: 225
|
|
There is a big difference in writing : >> Unoptimsed SP method for lookup using local var create procedure mysum (@myid int) as declare @myint int set @myint= @myid select x,y,z from mytable where mylookupfilter = @myint
and: >>Optimised using input variable allows select statement >> to be properly cached create procedure mysum (@myid int) as select x,y,z from mytable where mylookupfilter = @myid
If to test the true performance, would need to correctly structure the sp to use the input parameter on the where clause filter, otherwise the query is not cached the same. I suspect you will find performance of the set-based sql procedure will improve even further My mantras always been : "functions like cursors should be avoided if at all possible".
I have certainly seen a good benefit of removing functions and converting to pure optimsed set based sql solutions.
|
|
|
|
|
SSCoach
         
Group: General Forum Members
Last Login: Yesterday @ 1:55 PM
Points: 15,442,
Visits: 9,571
|
|
d_sysuk (4/29/2008) There is a big difference in writing : >> Unoptimsed SP method for lookup using local var create procedure mysum (@myid int) as declare @myint int set @myint= @myid select x,y,z from mytable where mylookupfilter = @myint
and: >>Optimised using input variable allows select statement >> to be properly cached create procedure mysum (@myid int) as select x,y,z from mytable where mylookupfilter = @myid
The query plan on those two will be exactly the same. The first one just adds a fraction of a microsecond of allocating some memory for a variable and assigning a value to it, that the second one doesn't have to do. The actual query is the same.
Maybe I'm misunderstanding what you're writing.
Edit: Just to be sure, I just tested both of these concepts on my 100-million row table. Two procs, exact same execution plan, exact same execution time, IO, etc. First run on each took a few seconds, second and subsequent runs on each took too little time to measure.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
|
|
|
|
|
SSC Rookie
      
Group: General Forum Members
Last Login: Thursday, January 31, 2013 3:06 AM
Points: 36,
Visits: 225
|
|
There is a more subtle point, Im making in the query plan optimisation that is saved.
A greater man, than I has quite nicely explained this , have a quick look at ( a google search should pull up the resource info): MSDN I smell a parameter. Tips, Tricks, and Advice from the SQL Server Query Optimization Team MSDN Ken Henderson plan stability in SQL Server 2000
It goes into a lot more detail than I can in this forum.
The example posted was quite cut-down, within the given example a name lookup is used to get the PK, which is then used to get the result set back. (2 Querys onto the source table - when this should really be a single set-based query utilising the name as in posters article to get the desired result-set, without the need to lookup the ID value then assign to a local var then use the local var value as a lookup )
|
|
|
|
|
SSCommitted
      
Group: General Forum Members
Last Login: Tuesday, January 15, 2013 11:11 AM
Points: 1,945,
Visits: 2,782
|
|
>> What I mean is you have a bunch (about a dozen) of tables with charts of tax rates, freight rates, etc. Each row will have different values from each table and then they get computed for the final result. We have functions for various lookups and calculations, some will be repeated, some won't How can you improve this type of process? <<
You might want to get a copy of THINKING IN SETS, which deals with auxiliary tables for such things. Instead of trying to compute business days (which is impossible), you use a Calendar table and join to it. Instead of trying to compute shipping zones rates, look them up based on ZIP codes. Etc.
SQL is good at JOINs and bad at math. Functions cannot be parallelized or optimized; JOINs can.
Books in Celko Series for Morgan-Kaufmann Publishing Analytics and OLAP in SQL Data and Databases: Concepts in Practice Data, Measurements and Standards in SQL SQL for Smarties SQL Programming Style SQL Puzzles and Answers Thinking in Sets Trees and Hierarchies in SQL
|
|
|
|
|
SSCoach
         
Group: General Forum Members
Last Login: Yesterday @ 1:55 PM
Points: 15,442,
Visits: 9,571
|
|
d_sysuk (4/29/2008) There is a more subtle point, Im making in the query plan optimisation that is saved.
A greater man, than I has quite nicely explained this , have a quick look at ( a google search should pull up the resource info): MSDN I smell a parameter. Tips, Tricks, and Advice from the SQL Server Query Optimization Team MSDN Ken Henderson plan stability in SQL Server 2000
It goes into a lot more detail than I can in this forum.
The example posted was quite cut-down, within the given example a name lookup is used to get the PK, which is then used to get the result set back. (2 Querys onto the source table - when this should really be a single set-based query utilising the name as in posters article to get the desired result-set, without the need to lookup the ID value then assign to a local var then use the local var value as a lookup )
I think you may have partially misinterpreted the articles you cite. I just read them, and what you outlined in your example isn't actually what's gone over in the articles.
Yes, setting a default can override certain aspects of query optimization. My prefered method for doing this is creating separate procs called by a "meta-proc".
What you have in your example doesn't actually accomplish this. I tested it, and the execution plans are identical, except for the name of the variable in the scalar operator in the seek predicates.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
|
|
|
|
|
SSCoach
         
Group: General Forum Members
Last Login: Yesterday @ 1:55 PM
Points: 15,442,
Visits: 9,571
|
|
Matt Minnis (4/29/2008) What about matrix type calculations? How can they be accelerated? What I mean is you have a bunch (about a dozen) of tables with charts of tax rates,freight rates, etc. Each row will have different values from each table and then they get computed for the final result. We have functions for various lookups and calculations, some will be repeated, some won't How can you improve this type of process? I can see your technique for the 1 function, but I'm not sure how it will scale in a large stored proc with lots of function calls and computations.
Pre-compute the values as much as possible, put them in tables, and join to them. Using inline scalar functions makes the proc operate row-by-row, which is inherently slow. Joining to pre-computed values is inherently fast.
If the calculation is too complex for that, then move it out of the database into something that's more efficient at it, or move it to a CLR UDF, which will (if built correctly), be better at it than SQL is.
But it's very, very rare that building tables for these things can't get you the right results, and do it fast.
Let's take tax rates for example. Maybe sales tax.
You could build a function that would calculate sales tax based on state, county, city of merchant, and state, county, city of customer, and then use a function to look this up. Or you could build a table of state-to-state rates, including counties and cities, and then just join to that. Do you end up with a larger table to join to? Yep. Does your update code have to do more work when tax rates are changed or tax laws are modified? Yep. But the cost of those things is tiny compared to the fact that querying by joining to that table will be potentially thousands of times faster than an inline UDF. You have to build the table once. You have to update it every few years (in most cases). You might have to query it a few thousand times per hour. Put your speed on the query, in that case.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
|
|
|
|
|
Ten Centuries
      
Group: General Forum Members
Last Login: Wednesday, March 27, 2013 3:22 PM
Points: 1,276,
Visits: 1,112
|
|
Confucius247 (4/29/2008) My attitude with functions is avoid like the plague, as they break the rule of thinking in sets while programming SQL.
Table-Valued Functions.
|
|
|
|
|
Right there with Babe
      
Group: General Forum Members
Last Login: Thursday, February 14, 2013 12:01 PM
Points: 743,
Visits: 900
|
|
There are definite places where functions are of value, but it is important that they be written carefully to avoid replicating work. In one of the examples listed, the function runs a select statement against the table and then is used inside of a select statement against the same table. It is doing vastly more reads than it needs to. On the other hand, there are places like complex manipulation of a single string that will be done the same way regardless of the strings source can usefully be put into a function with minimal performance hits and increased testing possibilities and code reuse.
--- Timothy A Wiseman SQL Blog: http://timothyawiseman.wordpress.com/
|
|
|
|
|
Grasshopper
      
Group: General Forum Members
Last Login: Friday, March 27, 2009 4:01 AM
Points: 14,
Visits: 40
|
|
Mike C (4/29/2008)
Table-Valued Functions.
And if that were what we were talking about...
|
|
|
|