Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Speeding up updates that use STDEV (standard deviation)


Speeding up updates that use STDEV (standard deviation)

Author
Message
bkmooney
bkmooney
SSC Journeyman
SSC Journeyman (83 reputation)SSC Journeyman (83 reputation)SSC Journeyman (83 reputation)SSC Journeyman (83 reputation)SSC Journeyman (83 reputation)SSC Journeyman (83 reputation)SSC Journeyman (83 reputation)SSC Journeyman (83 reputation)

Group: General Forum Members
Points: 83 Visits: 251
I am adding a column to a table in my database that tracks the rolling standard deviation over a specified period of time.

So far, I have been able to implement this, but it is really slow! I'm hoping someone might have a hint as to how I might be able to speed this procedure up a bit...

Or maybe I'm heading in the complete wrong direction and SQL is not the right tool to be using for this. I know it's not meant to be a statistics workhorse, but I didn't think STDEV was really that intense of a statistical computation...

Here's information on the tables and the code I'm using so far - this is for a rolling stdev calculation over the 30 days prior.

At the start of the procedure, myTable contains rows like:
IDNumber INT
Date DATETIME
OHdiff FLOAT
OLdiff FLOAT
StDev_OH_30Day FLOAT
StDev_OL_30Day FLOAT

The StDev columns are NULL at the beginning, these are what I will be filling in with the procedure below:
NOTE: @lastUpdate, @updateThrough, @minIDNumber and @maxIDNumber are set by another part of my query, and are just making sure that I'm not duplicating work that's already been done...

   UPDATE a
SET StDev_OH_30Day=(SELECT STDEV(b.OHdiff)
FROM myTable b
WHERE b.IDNumber= a.IDNumber
AND b.MarketDate < a.MarketDate
AND b.MarketDate > a.MarketDate-31 )
, StDev_OL_30Day=(SELECT STDEV(b.OLdiff)
FROM myTable b
WHERE b.IDNumber = a.IDNumber
AND b.MarketDate < a.MarketDate
AND b.MarketDate > a.MarketDate-31 )
FROM myTable a
WHERE
a.MarketDate BETWEEN @lastUpdate+.1 AND @updateThrough
AND a.IDNumber BETWEEN @minIDNumber AND @maxIDNumber



It takes about 10 minutes to run this query on a little less than 90K rows. Increasing the time period for all the stdev calcs from 30 days to 90 days increases the processing time to 15 minutes.

This is meant to be part of a sequence of other updates, so the snappier I can make it, the better =)

Any ideas are always appreciated - Thanks!
ab5sr
ab5sr
SSC Veteran
SSC Veteran (216 reputation)SSC Veteran (216 reputation)SSC Veteran (216 reputation)SSC Veteran (216 reputation)SSC Veteran (216 reputation)SSC Veteran (216 reputation)SSC Veteran (216 reputation)SSC Veteran (216 reputation)

Group: General Forum Members
Points: 216 Visits: 699
Might be a cool time to try something built in the SQL CLR to see if you can get any perf gains.

Lee Everest


bkmooney
bkmooney
SSC Journeyman
SSC Journeyman (83 reputation)SSC Journeyman (83 reputation)SSC Journeyman (83 reputation)SSC Journeyman (83 reputation)SSC Journeyman (83 reputation)SSC Journeyman (83 reputation)SSC Journeyman (83 reputation)SSC Journeyman (83 reputation)

Group: General Forum Members
Points: 83 Visits: 251
Sorry, but I'm not sure I really understand what you mean. I'm still really new to SQL.

Do you mean defining a user-defined function with C or some other language like in the examples on this page?http://msdn.microsoft.com/en-us/library/w2kae45k(VS.80).aspx
Lowell
Lowell
SSChampion
SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)

Group: General Forum Members
Points: 14955 Visits: 38948
can you show us the execution plan when you run the update? that will tell us exactly what the update is doing, and allow us to offer suggestions like indexes on the date column, etc that might speed up the update

Lowell

--
help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

ab5sr
ab5sr
SSC Veteran
SSC Veteran (216 reputation)SSC Veteran (216 reputation)SSC Veteran (216 reputation)SSC Veteran (216 reputation)SSC Veteran (216 reputation)SSC Veteran (216 reputation)SSC Veteran (216 reputation)SSC Veteran (216 reputation)

Group: General Forum Members
Points: 216 Visits: 699
Exactly. And it doesn't need to be just limited to one clr function...just an idea.

Lee Everest


bkmooney
bkmooney
SSC Journeyman
SSC Journeyman (83 reputation)SSC Journeyman (83 reputation)SSC Journeyman (83 reputation)SSC Journeyman (83 reputation)SSC Journeyman (83 reputation)SSC Journeyman (83 reputation)SSC Journeyman (83 reputation)SSC Journeyman (83 reputation)

Group: General Forum Members
Points: 83 Visits: 251
I would love suggestions on how to improve my indexing!

Right now I have 3 indexes on my table:
1) A clustered PK index on a RowID column
2) A non-clustered index on Date, then IDNumber which includes the StDev cols (and a few other cols not involved in this update)
3) A non-clustered index on IDNumber then Date which includes the StDev cols (and a few other cols not involved in this update)

Was that what you were looking for?

Thanks!
KarenD
KarenD
SSC-Enthusiastic
SSC-Enthusiastic (182 reputation)SSC-Enthusiastic (182 reputation)SSC-Enthusiastic (182 reputation)SSC-Enthusiastic (182 reputation)SSC-Enthusiastic (182 reputation)SSC-Enthusiastic (182 reputation)SSC-Enthusiastic (182 reputation)SSC-Enthusiastic (182 reputation)

Group: General Forum Members
Points: 182 Visits: 770
I'm not sure why you have two separate subqueries to calculate the two standard deviations. It looks to me like it could be one.

I would also change the whole thing to a CTE. It should be much faster that way.
bkmooney
bkmooney
SSC Journeyman
SSC Journeyman (83 reputation)SSC Journeyman (83 reputation)SSC Journeyman (83 reputation)SSC Journeyman (83 reputation)SSC Journeyman (83 reputation)SSC Journeyman (83 reputation)SSC Journeyman (83 reputation)SSC Journeyman (83 reputation)

Group: General Forum Members
Points: 83 Visits: 251
Sorry, but I'm not sure I follow... I'm learning a lot just from trying to understand what everyone here is suggesting.

Would you mind explaining a bit further how I might be able to put the two separate subqueries into one as part of an update statement?

Thanks!
bkmooney
bkmooney
SSC Journeyman
SSC Journeyman (83 reputation)SSC Journeyman (83 reputation)SSC Journeyman (83 reputation)SSC Journeyman (83 reputation)SSC Journeyman (83 reputation)SSC Journeyman (83 reputation)SSC Journeyman (83 reputation)SSC Journeyman (83 reputation)

Group: General Forum Members
Points: 83 Visits: 251
In trying to combine the two subqueries into one, I got this far, but have encountered the error messages:
The multi-part identifier "a.IDNumber" could not be bound.
The multi-part identifier "a.MarketDate" could not be bound.


UPDATE a
SET StDev_OH_30Day=T.StDev_OHdiff
, StDev_OL_30Day=T.StDev_OLdiff
FROM myTable,
(SELECT STDEV(b.OHdiff) AS StDev_OHdiff
, STDEV(b.OLdiff) AS StDev_OLdiff
FROM myTable b
WHERE b.IDNumber = a.IDNumber
AND b.MarketDate < a.MarketDate
AND b.MarketDate > a.MarketDate-31 ) AS T
WHERE
a.MarketDate BETWEEN @lastUpdate+.1 AND @updateThrough
AND a.IDNumber BETWEEN @minIDNumber AND @maxIDNumber





If anyone's got any ideas or suggestions, I would really appreciate it.

Thanks!
KarenD
KarenD
SSC-Enthusiastic
SSC-Enthusiastic (182 reputation)SSC-Enthusiastic (182 reputation)SSC-Enthusiastic (182 reputation)SSC-Enthusiastic (182 reputation)SSC-Enthusiastic (182 reputation)SSC-Enthusiastic (182 reputation)SSC-Enthusiastic (182 reputation)SSC-Enthusiastic (182 reputation)

Group: General Forum Members
Points: 182 Visits: 770
Here's a sample of it as a CTE - note the ; before it. You will need that if this is part of a procedure that does other things.

;
WITH get_stddev AS
( SELECT a.idnumber ,
a.marketdate ,
STDEV(b.OHdiff) stddev_oh ,
STDEV(b.OLdiff) stddev_ol

FROM myTable b
JOIN myTable a
ON
(
b.IDNumber = a.IDNumber
AND b.MarketDate < a.MarketDate
AND b.MarketDate > a.MarketDate-31
)
WHERE a.MarketDate BETWEEN @lastUpdate + .1 AND @updateThrough
AND a.IDNumber BETWEEN @minIDNumber AND @maxIDNumber
GROUP BY a.idnumber,
a.marketdate
) ----- end cte
UPDATE c
SET StDev_OH_30Day = stddev_oh ,
StDev_OL_30Day = stddev_ol
FROM mytable c
JOIN get_stddev gs
ON
(
gs.IDNumber = c.IDNumber
AND c.marketdate = gs.marketdate )



I urge you to test this thoroughly. I wasn't completely sure of your data set-up. It's possible this needs some changes for your tables.
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search