Scalar Functions with Query Plans: Not all is obvious!

  • Comments posted to this topic are about the item Scalar Functions with Query Plans: Not all is obvious!

    Mike Byrd

  • Totally agree here. But what would happen if we take out the GETUTCDATE() into a variable and change the call of DATEDIFF with it? What would be the impact?

  • Same query plan, same cost, same IO, same cpu time.  No change defining it as a variable first.

    Mike Byrd

  • I thought this rule only applied to functions in the WHERE clause.  Is performance increased when applying this to other parts of the query?

  • To state the obvious, in the first query the calculation has to be done for every row of data while in the second query it only needs to be done once.

  • Lynn Pettis - Monday, June 4, 2018 2:21 PM

    To state the obvious, in the first query the calculation has to be done for every row of data while in the second query it only needs to be done once.

    I've realized that after I posted the comment 🙂

  • It is clear that the  query change was an improvement but I suggest that some index work would make a dramatic improvement.

    OP did not provide index or key information but it looks like the query probably returns relatively few rows.

    A covering index, reducing the columns in the select, and/or redefined PK could reduce the logical reads to the single digit range and eliminate the Lookup completely.

    If the query is run often enough that ~5% improvement is worth the effort then the index improvement would certainly be worth the effort.

  • Index work had already been accomplished -- hence the index + key lookup.  The query returns every column in the table, and I am just opposed to creating a non-clustered index replicating the original table.  If you download my presentation on "Climbing the B-Tree" from http://www.sqlsaturday.com/734/Sessions/Schedule.aspx you can see that a fully column non-clustered index can potentially take up as much disk space as the clustered index thus costing additional IO retrievals as opposed to the narrow index used in both queries.  Yes there is usually a small result set -- hence the optimizer selecting the index seek plus key lookup.  Key Lookups are not always bad!  As you can see from the index definition below:
    CREATE NONCLUSTERED INDEX IX_Alert_UserID ON dbo.Alert
         (UserID ASC,AlertTypeID ASC)
         INCLUDE (DelFlag,DateUpdated)
    GO
    where all the columns (except in the SELECT clause) are included.  I call this a semi-covering nonclustered index.

    Mike Byrd

  • Ray Herring - Tuesday, June 5, 2018 11:01 AM

    It is clear that the  query change was an improvement but I suggest that some index work would make a dramatic improvement.

    OP did not provide index or key information but it looks like the query probably returns relatively few rows.

    A covering index, reducing the columns in the select, and/or redefined PK could reduce the logical reads to the single digit range and eliminate the Lookup completely.

    If the query is run often enough that ~5% improvement is worth the effort then the index improvement would certainly be worth the effort.

    With all those columns included, if the DateUpdated is the most common way to access the table, I would suggest a change in the clustered index (which I'm afraid that it could be in the AlertID identity column which happens to work as PK as well).

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Mike Byrd - Tuesday, June 5, 2018 12:44 PM

    Index work had already been accomplished -- hence the index + key lookup.  The query returns every column in the table, and I am just opposed to creating a non-clustered index replicating the original table.  If you download my presentation on "Climbing the B-Tree" from http://www.sqlsaturday.com/734/Sessions/Schedule.aspx you can see that a fully column non-clustered index can potentially take up as much disk space as the clustered index thus costing additional IO retrievals as opposed to the narrow index used in both queries.  Yes there is usually a small result set -- hence the optimizer selecting the index seek plus key lookup.  Key Lookups are not always bad!  As you can see from the index definition below:
    CREATE NONCLUSTERED INDEX IX_Alert_UserID ON dbo.Alert
         (UserID ASC,AlertTypeID ASC)
         INCLUDE (DelFlag,DateUpdated)
    GO
    where all the columns (except in the SELECT clause) are included.  I call this a semi-covering nonclustered index.

    I understand your reply but nothing you've said invalidates my comments and suggestions.  For example from the limited schema information it seems that DelFlag and DateUpdated would be more appropriate as part of the index rather than as an included column.  The query plan you posted does indicate a large number of rows are output from the index seek and about the same number for both versions.  (i.e. every row for the UserId, AlertType.) Adding those columns to the index would reduce the number of rows in the interim result set substantially, thereby reducing the number of looks and the logical reads.
    Also, as one of the other responders mentioned, reworking the clustered index might be the better approach.
    Of course all of this depends on how the query is used and what other queries hit the table.  You are certainly in the best position to evaluate that.
    I have many tables that have 2x or even 3x space dedicated to non-clustered indexes.  I have other tables that have none at all.

  • Ray Herring - Tuesday, June 5, 2018 1:23 PM

    Mike Byrd - Tuesday, June 5, 2018 12:44 PM

    Index work had already been accomplished -- hence the index + key lookup.  The query returns every column in the table, and I am just opposed to creating a non-clustered index replicating the original table.  If you download my presentation on "Climbing the B-Tree" from http://www.sqlsaturday.com/734/Sessions/Schedule.aspx you can see that a fully column non-clustered index can potentially take up as much disk space as the clustered index thus costing additional IO retrievals as opposed to the narrow index used in both queries.  Yes there is usually a small result set -- hence the optimizer selecting the index seek plus key lookup.  Key Lookups are not always bad!  As you can see from the index definition below:
    CREATE NONCLUSTERED INDEX IX_Alert_UserID ON dbo.Alert
         (UserID ASC,AlertTypeID ASC)
         INCLUDE (DelFlag,DateUpdated)
    GO
    where all the columns (except in the SELECT clause) are included.  I call this a semi-covering nonclustered index.

    I understand your reply but nothing you've said invalidates my comments and suggestions.  For example from the limited schema information it seems that DelFlag and DateUpdated would be more appropriate as part of the index rather than as an included column.  The query plan you posted does indicate a large number of rows are output from the index seek and about the same number for both versions.  (i.e. every row for the UserId, AlertType.) Adding those columns to the index would reduce the number of rows in the interim result set substantially, thereby reducing the number of looks and the logical reads.
    Also, as one of the other responders mentioned, reworking the clustered index might be the better approach.
    Of course all of this depends on how the query is used and what other queries hit the table.  You are certainly in the best position to evaluate that.
    I have many tables that have 2x or even 3x space dedicated to non-clustered indexes.  I have other tables that have none at all.

    I would actually suggest that DelFlag could be a filter for the index as is seems to be a simple flag for 'Y' or 'N'. That would reduce the size of the index, especially comparing it to the future when deleted rows could get to the millions. This, as all of the suggestions, is just a guess based on limited information.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Actually all the suggestions made are good ones, and as most have pointed out all is very environment sensitive.  90% of the DelFlag column are marked 'N' and the others marked 'Y'.  There is some periodic garbage cleanup of the Ys.  I tried a non-clustered covering index on DateUpdated (with updated statistics) but the optimizer still picked the covering index based on UserID.  Changing the Clustered Index is not really an option because of the table's usage in other queries.  The major point to the article though was to reinforce "Don't use Scalar Functions with Table Columns" and unfortunately at first glance the Graphical Query Plan doesn't give you much of a clue other than the query costs.

    Mike Byrd

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply