To Index a Computed Column...

  • Hi All,

    So in an effort to optimize a larger query (which I've posted about here before), I'm trying to utilize computed columns, as I see chunks of code that are repeated throughout the query on certain columns. Amongst other things I've tried/will try, I'm hoping to be able to improve performance/run time, by implementing indexing on the computed column, but it turns out upon researching computed columns, it must be deterministic. So in order for this to happen, I'm trying to use schemabinding with the computed column, but am unsure how to go about doing so. This chunk of code is repeated about 7 times in the overall query:


    ,DC.OriginalEnrollmentTermNumber-ROUND((DATEDIFF(DAY,CAST(LEFT(RIGHT(FCLD.[CoverageEffectiveDateKey],4),2) + '/'

    + RIGHT(FCLD.[CoverageEffectiveDateKey],2) + '/' + LEFT(FCLD.[CoverageEffectiveDateKey],4) AS DATE), CAST(LEFT(RIGHT(FCLD.[CoverageEndDateKey],4),2) + '/'

    + RIGHT(FCLD.[CoverageEndDateKey],2) + '/' + LEFT(FCLD.[CoverageEndDateKey],4) AS DATE)))/(365/12),0) as MonthsCancelled2


    and I've created a computed column for this using the following:

    Alter table FctCoverageLengthDetail

    Add [CalculatedMonths] as ROUND((DATEDIFF(DAY,CAST(LEFT(RIGHT([CoverageEffectiveDateKey],4),2) + '/'

    + RIGHT([CoverageEffectiveDateKey],2) + '/' + LEFT([CoverageEffectiveDateKey],4) AS DATE), CAST(LEFT(RIGHT([CoverageEndDateKey],4),2) + '/'

    + RIGHT([CoverageEndDateKey],2) + '/' + LEFT([CoverageEndDateKey],4) AS DATE)))/(365/12),0) --Persisted


    , but it wouldn't let me specify "Persisted" (which would've helped towards the indexing cause), as it's saying "Computed column 'CalculatedMonths' in table 'FctCoverageLengthDetail' cannot be persisted because the column is non-deterministic.", which I'm thinking is the case because the data would change, being that it's date-based, and also depending on what's in the Where clause. In any case, I'm just having a hard time figuring out how to be able to index the computed column, and would appreciate your much valued feedback.

    Thank you! 

  • daniness - Friday, December 15, 2017 10:54 AM

    Hi All,

    So in an effort to optimize a larger query (which I've posted about here before), I'm trying to utilize computed columns, as I see chunks of code that are repeated throughout the query on certain columns. Amongst other things I've tried/will try, I'm hoping to be able to improve performance/run time, by implementing indexing on the computed column, but it turns out upon researching computed columns, it must be deterministic. So in order for this to happen, I'm trying to use schemabinding with the computed column, but am unsure how to go about doing so. This chunk of code is repeated about 7 times in the overall query:


    ,DC.OriginalEnrollmentTermNumber-ROUND((DATEDIFF(DAY,CAST(LEFT(RIGHT(FCLD.[CoverageEffectiveDateKey],4),2) + '/'

    + RIGHT(FCLD.[CoverageEffectiveDateKey],2) + '/' + LEFT(FCLD.[CoverageEffectiveDateKey],4) AS DATE), CAST(LEFT(RIGHT(FCLD.[CoverageEndDateKey],4),2) + '/'

    + RIGHT(FCLD.[CoverageEndDateKey],2) + '/' + LEFT(FCLD.[CoverageEndDateKey],4) AS DATE)))/(365/12),0) as MonthsCancelled2


    and I've created a computed column for this using the following:

    Alter table FctCoverageLengthDetail

    Add [CalculatedMonths] as ROUND((DATEDIFF(DAY,CAST(LEFT(RIGHT([CoverageEffectiveDateKey],4),2) + '/'

    + RIGHT([CoverageEffectiveDateKey],2) + '/' + LEFT([CoverageEffectiveDateKey],4) AS DATE), CAST(LEFT(RIGHT([CoverageEndDateKey],4),2) + '/'

    + RIGHT([CoverageEndDateKey],2) + '/' + LEFT([CoverageEndDateKey],4) AS DATE)))/(365/12),0) --Persisted


    , but it wouldn't let me specify "Persisted" (which would've helped towards the indexing cause), as it's saying "Computed column 'CalculatedMonths' in table 'FctCoverageLengthDetail' cannot be persisted because the column is non-deterministic.", which I'm thinking is the case because the data would change, being that it's date-based, and also depending on what's in the Where clause. In any case, I'm just having a hard time figuring out how to be able to index the computed column, and would appreciate your much valued feedback.

    Thank you! 

    what datatype is "CoverageEffectiveDateKey"?

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • Hi J Livingston SQL,
    "CoverageEffectiveDateKey" is of the int datatype...it displays dates in the "yyyymmdd" format...it's actually a foreign key being used by several tables in the datamart here. Please let me know if I can further clarify anything else. Thanks!

  • It looks like you are using string manipulation to convert your CoverageEffectiveDateKey and CoverageEndDateKey from yyyymmdd format to mm/dd/yyyy format.  DATEDIFF is perfectly capable of handling strings in yyyymmdd format, so don't bother doing the string manipulation.  (You may still need to convert to a string if your fields are INT keys.)

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • fyi

    DECLARE @start AS INT= 20171215;
    DECLARE @end AS INT= 20180119;
    SELECT CONVERT( DATETIME, CONVERT(VARCHAR(8), @start));
    SELECT CONVERT( DATETIME, CONVERT(VARCHAR(8), @end));
    SELECT DATEDIFF(DAY, CONVERT( DATETIME, CONVERT(VARCHAR(8), @start)),CONVERT( DATETIME, CONVERT(VARCHAR(8), @end)))

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • daniness - Friday, December 15, 2017 11:30 AM

    Hi J Livingston SQL,
    "CoverageEffectiveDateKey" is of the int datatype...it displays dates in the "yyyymmdd" format...it's actually a foreign key being used by several tables in the datamart here. Please let me know if I can further clarify anything else. Thanks!

    I'm just curious... why did they select the INT datatype to store dates when the DATE datatype is a byte less, does checking to ensure it's a real date, and easily allows for some date math.  INT doesn't allow for any of that directly.

    Also, your calculation is a bit confusing... you're using all Integer Math and yet you use the ROUND function.  What do you really want to use for the number of days in a month?  30 or 30.416666 or a more proper calculation based on the day of the month?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff,
    Thanks for your reply. So upon looking into it further, the CoverageEffectiveDateKey is actually a foreign key that has been set to the INT datatype, which I traced from table to table of different SSIS packages and ultimately, it appears to pull from a datetime field from another table. Regarding the calculation, I believe an integer value for months is intended to be used. Please let me know if I can further clarify. Thanks.

  • The "Deterministic" rules are pretty nasty when it comes to CONVERT and implicit converts.  That's one of the many reasons why dates represented as integers are so frowned upon.  Here are the rules according to Books Online.

    The following functions are not always deterministic, but can be used in indexed views or indexes on computed columns when they are specified in a deterministic manner.

    Function Comments

    CAST

    Deterministic unless used with datetime, smalldatetime, or sql_variant.

    CONVERT

    Deterministic unless one of these conditions exists:

    • Source type is sql_variant.
    • Target type is sql_variant and its source type is nondeterministic.
    • Source or target type is datetime or smalldatetime, the other source or target type is a character string, and a nondeterministic style is specified. To be deterministic, the style parameter must be a constant. Additionally, styles less than or equal to 100 are nondeterministic, except for styles 20 and 21. Styles greater than 100 are deterministic, except for styles 106, 107, 109 and 113.

    Here's your PERSISTED computed column using the intended formula (365/12 was incorrect... it needed to be 365/12.0 or 30.416666 to meet the requirements according to the ROUND used in the original formula).  I'm not sure that I agree with that formula but that's up to you.  The requirement to return the number of months as an INT required yet another CONVERT.


    CREATE TABLE dbo.FctCoverageLengthDetail
            (
             CoverageEffectiveDateKey   INT
            ,CoverageEndDateKey         INT
            ,CalculatedMonths        AS CONVERT(INT,ROUND(
                                            DATEDIFF(dd
                                                    ,CONVERT(DATETIME,CONVERT(CHAR(8),CoverageEffectiveDateKey),112)
                                                    ,CONVERT(DATETIME,CONVERT(CHAR(8),CoverageEndDateKey)      ,112)
                                                    )/30.416666 --(365/12.0)
                                            ,0)) PERSISTED
            )
    ;

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thank you for your feedback, Jeff. That was very helpful and I was finally able to use the "Persisted" option 🙂.

    I am wondering though, if there's a preferred index type for a computed column - clustered vs nonclustered vs covered? Any advice on this would be greatly appreciated as always. Thanks in advance!

  • Okay, so another questions for the experts out there....so as posted above, I'm in the process of attempting to optimize a crazy query, and I think I'm on a good path of using indexed computed columns to rid use of functions in WHERE clauses, i.e.:

    WHERE FILD2.[FirstInvoiceDateKey] <> '-1' --(2:02m 7.9M)

    AND FILD2.[InvoiceLengthMonthNumber] <= round((DATEDIFF(DAY,CAST(LEFT(RIGHT(FILD2.[FirstInvoiceDateKey],4),2) + '/' + RIGHT(FILD2.[FirstInvoiceDateKey],2) + '/'

    + LEFT(FILD2.[FirstInvoiceDateKey],4) AS DATE),CAST(LEFT(RIGHT(FILD2.[OriginalMaturityDateKey],4),2) + '/' + RIGHT(FILD2.[OriginalMaturityDateKey],2)

    + '/' + LEFT(FILD2.[OriginalMaturityDateKey],4) AS DATE)))/(365/12),0)


    My question is, when aliases are used on columns like in the above, i.e. "FILD2", which refers to a subquery that is joined to, earlier in the query, does it affect the computed column and/or prevent it from optimizing the overall query's performance? I just want to clarify, because throughout the overall query, various different aliases are used when subqueries are referring to these same columns. 
    Please let me know if I need to further clarify. Thanks in advance for your valuable insight!

  • daniness - Wednesday, December 20, 2017 9:30 AM

    Thank you for your feedback, Jeff. That was very helpful and I was finally able to use the "Persisted" option 🙂.

    I am wondering though, if there's a preferred index type for a computed column - clustered vs nonclustered vs covered? Any advice on this would be greatly appreciated as always. Thanks in advance!

    Non-Clustered.  Clustered indexes should always be based on immutable keys and, for larger tables, can be generally used to prevent page splits during inserts for the table itself (which is the Clustered Index, if there is one).  It also helps a lot if the Clustered Index is narrow, unique, and ever-increasing, as well.  There are exceptions to those rules but those are the most common generic rules.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply