December 15, 2017 at 10:54 am
Hi All,
So in an effort to optimize a larger query (which I've posted about here before), I'm trying to utilize computed columns, as I see chunks of code that are repeated throughout the query on certain columns. Amongst other things I've tried/will try, I'm hoping to be able to improve performance/run time, by implementing indexing on the computed column, but it turns out upon researching computed columns, it must be deterministic. So in order for this to happen, I'm trying to use schemabinding with the computed column, but am unsure how to go about doing so. This chunk of code is repeated about 7 times in the overall query:
,DC.OriginalEnrollmentTermNumber-ROUND((DATEDIFF(DAY,CAST(LEFT(RIGHT(FCLD.[CoverageEffectiveDateKey],4),2) + '/'
+ RIGHT(FCLD.[CoverageEffectiveDateKey],2) + '/' + LEFT(FCLD.[CoverageEffectiveDateKey],4) AS DATE), CAST(LEFT(RIGHT(FCLD.[CoverageEndDateKey],4),2) + '/'
+ RIGHT(FCLD.[CoverageEndDateKey],2) + '/' + LEFT(FCLD.[CoverageEndDateKey],4) AS DATE)))/(365/12),0) as MonthsCancelled2
and I've created a computed column for this using the following:
Alter table FctCoverageLengthDetail
Add [CalculatedMonths] as ROUND((DATEDIFF(DAY,CAST(LEFT(RIGHT([CoverageEffectiveDateKey],4),2) + '/'
+ RIGHT([CoverageEffectiveDateKey],2) + '/' + LEFT([CoverageEffectiveDateKey],4) AS DATE), CAST(LEFT(RIGHT([CoverageEndDateKey],4),2) + '/'
+ RIGHT([CoverageEndDateKey],2) + '/' + LEFT([CoverageEndDateKey],4) AS DATE)))/(365/12),0) --Persisted
, but it wouldn't let me specify "Persisted" (which would've helped towards the indexing cause), as it's saying "Computed column 'CalculatedMonths' in table 'FctCoverageLengthDetail' cannot be persisted because the column is non-deterministic.", which I'm thinking is the case because the data would change, being that it's date-based, and also depending on what's in the Where clause. In any case, I'm just having a hard time figuring out how to be able to index the computed column, and would appreciate your much valued feedback.
Thank you!
December 15, 2017 at 11:02 am
daniness - Friday, December 15, 2017 10:54 AMHi All,So in an effort to optimize a larger query (which I've posted about here before), I'm trying to utilize computed columns, as I see chunks of code that are repeated throughout the query on certain columns. Amongst other things I've tried/will try, I'm hoping to be able to improve performance/run time, by implementing indexing on the computed column, but it turns out upon researching computed columns, it must be deterministic. So in order for this to happen, I'm trying to use schemabinding with the computed column, but am unsure how to go about doing so. This chunk of code is repeated about 7 times in the overall query:
,DC.OriginalEnrollmentTermNumber-ROUND((DATEDIFF(DAY,CAST(LEFT(RIGHT(FCLD.[CoverageEffectiveDateKey],4),2) + '/'+ RIGHT(FCLD.[CoverageEffectiveDateKey],2) + '/' + LEFT(FCLD.[CoverageEffectiveDateKey],4) AS DATE), CAST(LEFT(RIGHT(FCLD.[CoverageEndDateKey],4),2) + '/'
+ RIGHT(FCLD.[CoverageEndDateKey],2) + '/' + LEFT(FCLD.[CoverageEndDateKey],4) AS DATE)))/(365/12),0) as MonthsCancelled2
and I've created a computed column for this using the following:Alter table FctCoverageLengthDetail
Add [CalculatedMonths] as ROUND((DATEDIFF(DAY,CAST(LEFT(RIGHT([CoverageEffectiveDateKey],4),2) + '/'
+ RIGHT([CoverageEffectiveDateKey],2) + '/' + LEFT([CoverageEffectiveDateKey],4) AS DATE), CAST(LEFT(RIGHT([CoverageEndDateKey],4),2) + '/'
+ RIGHT([CoverageEndDateKey],2) + '/' + LEFT([CoverageEndDateKey],4) AS DATE)))/(365/12),0) --Persisted
, but it wouldn't let me specify "Persisted" (which would've helped towards the indexing cause), as it's saying "Computed column 'CalculatedMonths' in table 'FctCoverageLengthDetail' cannot be persisted because the column is non-deterministic.", which I'm thinking is the case because the data would change, being that it's date-based, and also depending on what's in the Where clause. In any case, I'm just having a hard time figuring out how to be able to index the computed column, and would appreciate your much valued feedback.Thank you!
what datatype is "CoverageEffectiveDateKey"?
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
December 15, 2017 at 11:30 am
Hi J Livingston SQL,
"CoverageEffectiveDateKey" is of the int datatype...it displays dates in the "yyyymmdd" format...it's actually a foreign key being used by several tables in the datamart here. Please let me know if I can further clarify anything else. Thanks!
December 15, 2017 at 11:34 am
It looks like you are using string manipulation to convert your CoverageEffectiveDateKey and CoverageEndDateKey from yyyymmdd format to mm/dd/yyyy format. DATEDIFF is perfectly capable of handling strings in yyyymmdd format, so don't bother doing the string manipulation. (You may still need to convert to a string if your fields are INT keys.)
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
December 15, 2017 at 11:45 am
fyi
DECLARE @start AS INT= 20171215;
DECLARE @end AS INT= 20180119;
SELECT CONVERT( DATETIME, CONVERT(VARCHAR(8), @start));
SELECT CONVERT( DATETIME, CONVERT(VARCHAR(8), @end));
SELECT DATEDIFF(DAY, CONVERT( DATETIME, CONVERT(VARCHAR(8), @start)),CONVERT( DATETIME, CONVERT(VARCHAR(8), @end)))
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
December 15, 2017 at 7:37 pm
daniness - Friday, December 15, 2017 11:30 AMHi J Livingston SQL,
"CoverageEffectiveDateKey" is of the int datatype...it displays dates in the "yyyymmdd" format...it's actually a foreign key being used by several tables in the datamart here. Please let me know if I can further clarify anything else. Thanks!
I'm just curious... why did they select the INT datatype to store dates when the DATE datatype is a byte less, does checking to ensure it's a real date, and easily allows for some date math. INT doesn't allow for any of that directly.
Also, your calculation is a bit confusing... you're using all Integer Math and yet you use the ROUND function. What do you really want to use for the number of days in a month? 30 or 30.416666 or a more proper calculation based on the day of the month?
--Jeff Moden
Change is inevitable... Change for the better is not.
December 18, 2017 at 9:45 am
Hi Jeff,
Thanks for your reply. So upon looking into it further, the CoverageEffectiveDateKey is actually a foreign key that has been set to the INT datatype, which I traced from table to table of different SSIS packages and ultimately, it appears to pull from a datetime field from another table. Regarding the calculation, I believe an integer value for months is intended to be used. Please let me know if I can further clarify. Thanks.
December 19, 2017 at 8:20 pm
The "Deterministic" rules are pretty nasty when it comes to CONVERT and implicit converts. That's one of the many reasons why dates represented as integers are so frowned upon. Here are the rules according to Books Online.
The following functions are not always deterministic, but can be used in indexed views or indexes on computed columns when they are specified in a deterministic manner.
Function | Comments |
---|---|
CAST | Deterministic unless used with datetime, smalldatetime, or sql_variant. |
CONVERT | Deterministic unless one of these conditions exists:
|
Here's your PERSISTED computed column using the intended formula (365/12 was incorrect... it needed to be 365/12.0 or 30.416666 to meet the requirements according to the ROUND used in the original formula). I'm not sure that I agree with that formula but that's up to you. The requirement to return the number of months as an INT required yet another CONVERT.
CREATE TABLE dbo.FctCoverageLengthDetail
(
CoverageEffectiveDateKey INT
,CoverageEndDateKey INT
,CalculatedMonths AS CONVERT(INT,ROUND(
DATEDIFF(dd
,CONVERT(DATETIME,CONVERT(CHAR(8),CoverageEffectiveDateKey),112)
,CONVERT(DATETIME,CONVERT(CHAR(8),CoverageEndDateKey) ,112)
)/30.416666 --(365/12.0)
,0)) PERSISTED
)
;
--Jeff Moden
Change is inevitable... Change for the better is not.
December 20, 2017 at 9:30 am
Thank you for your feedback, Jeff. That was very helpful and I was finally able to use the "Persisted" option 🙂.
I am wondering though, if there's a preferred index type for a computed column - clustered vs nonclustered vs covered? Any advice on this would be greatly appreciated as always. Thanks in advance!
December 22, 2017 at 11:14 am
Okay, so another questions for the experts out there....so as posted above, I'm in the process of attempting to optimize a crazy query, and I think I'm on a good path of using indexed computed columns to rid use of functions in WHERE clauses, i.e.:
WHERE FILD2.[FirstInvoiceDateKey] <> '-1' --(2:02m 7.9M)
AND FILD2.[InvoiceLengthMonthNumber] <= round((DATEDIFF(DAY,CAST(LEFT(RIGHT(FILD2.[FirstInvoiceDateKey],4),2) + '/' + RIGHT(FILD2.[FirstInvoiceDateKey],2) + '/'
+ LEFT(FILD2.[FirstInvoiceDateKey],4) AS DATE),CAST(LEFT(RIGHT(FILD2.[OriginalMaturityDateKey],4),2) + '/' + RIGHT(FILD2.[OriginalMaturityDateKey],2)
+ '/' + LEFT(FILD2.[OriginalMaturityDateKey],4) AS DATE)))/(365/12),0)
My question is, when aliases are used on columns like in the above, i.e. "FILD2", which refers to a subquery that is joined to, earlier in the query, does it affect the computed column and/or prevent it from optimizing the overall query's performance? I just want to clarify, because throughout the overall query, various different aliases are used when subqueries are referring to these same columns.
Please let me know if I need to further clarify. Thanks in advance for your valuable insight!
December 22, 2017 at 9:56 pm
daniness - Wednesday, December 20, 2017 9:30 AMThank you for your feedback, Jeff. That was very helpful and I was finally able to use the "Persisted" option 🙂.I am wondering though, if there's a preferred index type for a computed column - clustered vs nonclustered vs covered? Any advice on this would be greatly appreciated as always. Thanks in advance!
Non-Clustered. Clustered indexes should always be based on immutable keys and, for larger tables, can be generally used to prevent page splits during inserts for the table itself (which is the Clustered Index, if there is one). It also helps a lot if the Clustered Index is narrow, unique, and ever-increasing, as well. There are exceptions to those rules but those are the most common generic rules.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply