|
|
|
Valued Member
      
Group: General Forum Members
Last Login: Wednesday, August 29, 2012 2:08 PM
Points: 74,
Visits: 237
|
|
I am adding a column to a table in my database that tracks the rolling standard deviation over a specified period of time.
So far, I have been able to implement this, but it is really slow! I'm hoping someone might have a hint as to how I might be able to speed this procedure up a bit...
Or maybe I'm heading in the complete wrong direction and SQL is not the right tool to be using for this. I know it's not meant to be a statistics workhorse, but I didn't think STDEV was really that intense of a statistical computation...
Here's information on the tables and the code I'm using so far - this is for a rolling stdev calculation over the 30 days prior.
At the start of the procedure, myTable contains rows like: IDNumber INT Date DATETIME OHdiff FLOAT OLdiff FLOAT StDev_OH_30Day FLOAT StDev_OL_30Day FLOAT
The StDev columns are NULL at the beginning, these are what I will be filling in with the procedure below: NOTE: @lastUpdate, @updateThrough, @minIDNumber and @maxIDNumber are set by another part of my query, and are just making sure that I'm not duplicating work that's already been done...
UPDATE a SET StDev_OH_30Day=(SELECT STDEV(b.OHdiff) FROM myTable b WHERE b.IDNumber= a.IDNumber AND b.MarketDate < a.MarketDate AND b.MarketDate > a.MarketDate-31 ) , StDev_OL_30Day=(SELECT STDEV(b.OLdiff) FROM myTable b WHERE b.IDNumber = a.IDNumber AND b.MarketDate < a.MarketDate AND b.MarketDate > a.MarketDate-31 ) FROM myTable a WHERE a.MarketDate BETWEEN @lastUpdate+.1 AND @updateThrough AND a.IDNumber BETWEEN @minIDNumber AND @maxIDNumber
It takes about 10 minutes to run this query on a little less than 90K rows. Increasing the time period for all the stdev calcs from 30 days to 90 days increases the processing time to 15 minutes.
This is meant to be part of a sequence of other updates, so the snappier I can make it, the better =)
Any ideas are always appreciated - Thanks!
|
|
|
|
|
SSC Veteran
      
Group: General Forum Members
Last Login: Thursday, February 21, 2013 1:07 PM
Points: 214,
Visits: 609
|
|
Might be a cool time to try something built in the SQL CLR to see if you can get any perf gains.
Lee Everest
|
|
|
|
|
Valued Member
      
Group: General Forum Members
Last Login: Wednesday, August 29, 2012 2:08 PM
Points: 74,
Visits: 237
|
|
Sorry, but I'm not sure I really understand what you mean. I'm still really new to SQL.
Do you mean defining a user-defined function with C or some other language like in the examples on this page?http://msdn.microsoft.com/en-us/library/w2kae45k(VS.80).aspx
|
|
|
|
|
SSChampion
        
Group: General Forum Members
Last Login: Today @ 10:50 AM
Points: 11,605,
Visits: 27,647
|
|
can you show us the execution plan when you run the update? that will tell us exactly what the update is doing, and allow us to offer suggestions like indexes on the date column, etc that might speed up the update
Lowell
--There is no spoon, and there's no default ORDER BY in sql server either. Actually, Common Sense is so rare, it should be considered a Superpower. --my son
|
|
|
|
|
SSC Veteran
      
Group: General Forum Members
Last Login: Thursday, February 21, 2013 1:07 PM
Points: 214,
Visits: 609
|
|
Exactly. And it doesn't need to be just limited to one clr function...just an idea.
Lee Everest
|
|
|
|
|
Valued Member
      
Group: General Forum Members
Last Login: Wednesday, August 29, 2012 2:08 PM
Points: 74,
Visits: 237
|
|
I would love suggestions on how to improve my indexing!
Right now I have 3 indexes on my table: 1) A clustered PK index on a RowID column 2) A non-clustered index on Date, then IDNumber which includes the StDev cols (and a few other cols not involved in this update) 3) A non-clustered index on IDNumber then Date which includes the StDev cols (and a few other cols not involved in this update)
Was that what you were looking for?
Thanks!
|
|
|
|
|
SSC-Enthusiastic
      
Group: General Forum Members
Last Login: Thursday, May 09, 2013 10:02 AM
Points: 170,
Visits: 664
|
|
I'm not sure why you have two separate subqueries to calculate the two standard deviations. It looks to me like it could be one.
I would also change the whole thing to a CTE. It should be much faster that way.
|
|
|
|
|
Valued Member
      
Group: General Forum Members
Last Login: Wednesday, August 29, 2012 2:08 PM
Points: 74,
Visits: 237
|
|
Sorry, but I'm not sure I follow... I'm learning a lot just from trying to understand what everyone here is suggesting.
Would you mind explaining a bit further how I might be able to put the two separate subqueries into one as part of an update statement?
Thanks!
|
|
|
|
|
Valued Member
      
Group: General Forum Members
Last Login: Wednesday, August 29, 2012 2:08 PM
Points: 74,
Visits: 237
|
|
In trying to combine the two subqueries into one, I got this far, but have encountered the error messages: The multi-part identifier "a.IDNumber" could not be bound. The multi-part identifier "a.MarketDate" could not be bound.
UPDATE a SET StDev_OH_30Day=T.StDev_OHdiff , StDev_OL_30Day=T.StDev_OLdiff FROM myTable, (SELECT STDEV(b.OHdiff) AS StDev_OHdiff , STDEV(b.OLdiff) AS StDev_OLdiff FROM myTable b WHERE b.IDNumber = a.IDNumber AND b.MarketDate < a.MarketDate AND b.MarketDate > a.MarketDate-31 ) AS T WHERE a.MarketDate BETWEEN @lastUpdate+.1 AND @updateThrough AND a.IDNumber BETWEEN @minIDNumber AND @maxIDNumber
If anyone's got any ideas or suggestions, I would really appreciate it.
Thanks!
|
|
|
|
|
SSC-Enthusiastic
      
Group: General Forum Members
Last Login: Thursday, May 09, 2013 10:02 AM
Points: 170,
Visits: 664
|
|
Here's a sample of it as a CTE - note the ; before it. You will need that if this is part of a procedure that does other things.
; WITH get_stddev AS ( SELECT a.idnumber , a.marketdate , STDEV(b.OHdiff) stddev_oh , STDEV(b.OLdiff) stddev_ol FROM myTable b JOIN myTable a ON ( b.IDNumber = a.IDNumber AND b.MarketDate < a.MarketDate AND b.MarketDate > a.MarketDate-31 ) WHERE a.MarketDate BETWEEN @lastUpdate + .1 AND @updateThrough AND a.IDNumber BETWEEN @minIDNumber AND @maxIDNumber GROUP BY a.idnumber, a.marketdate ) ----- end cte UPDATE c SET StDev_OH_30Day = stddev_oh , StDev_OL_30Day = stddev_ol FROM mytable c JOIN get_stddev gs ON ( gs.IDNumber = c.IDNumber AND c.marketdate = gs.marketdate )
I urge you to test this thoroughly. I wasn't completely sure of your data set-up. It's possible this needs some changes for your tables.
|
|
|
|