Performance problem when joining to a view.

  • Hi,

    I have a simple query that joins a largeish fact table (3 million rows) to a view that returns 120 rows. The SKEY in the view is returned via a scalar function. The view returns instantly if queried on it's own however when joined to the fact table in the simple query below results in a query execution plan that runs forever. Interestingly if I change the INNER JOIN to a LEFT OUTER JOIN the query returns the matched results almost instantly.

    Select

    Dimension.Age_Band.[10_Year_Age_Band],

    Count(*)

    From

    Fact.APC_Episodes

    Inner Join Dimension.Age_Band ON

    Fact.APC_Episodes.AGE_BAND_SKEY = Age_Band.AGE_BAND_SKEY

    Group By

    Dimension.Age_Band.[10_Year_Age_Band]

    I know joining to a view using a column generated by a scalar function is not a good recipe for performance. I also know that I could fix this by populating a physical table with the view first as I have already tested this though I hoping not to have to go down that route.

    Does anyone have any ideas why a LEFT OUTER JOIN works and not an INNER JOIN or anyway I can get the query optimizer to generate an execution plan that works?

    Thanks

    Daniel Forrester

  • Hi Dan, have a look at the execution plan, the leftmost operator, SELECT. Rightclick to raise the properties tab. Look for something like "Reason for early termination..." and post the value string. Better still, post the actual execution plan as a .sqlplan attachment.

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • How long does it take to get a return if you run the following, making the appropriate change in the FROM clause, of course?

    SELECT TOP 10 *

    FROM yourviewname

    ;

    Shifting gears back to looking at the code... does the Age_Band.AGE_BAND_SKEY column contain ONLY unique values?

    And, yes. Please attach the execution plan as a file along with the necessary DDL. Please see the second link under "Helpful Links" in my signature line below for what we need and how we need it to help you troubleshoot a performance problem. Thanks.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Guys,

    I found a work around which seems to trick the query optimizer into working properly by comparing the Age_Band View to other dimensions that don't seem to cause the problem. I have found that if I introduce a Union statement that determines the type of the columns (in my example AGE_BAND_SKEY) within the view/sub query it works and the results are returned instantly. In the example below if I comment out the first statement in the sub query (Select 0 As AGE_BAND_SKEY) this query never finished, put in the union and instant response.

    Select

    Count(*)

    From

    Fact.APC_Episodes

    Inner Join (

    Select

    0 As AGE_BAND_SKEY

    Where

    1=2

    Union All

    Select

    dbo.fn_Generate_SKEY(Age_Band.Age,Age_Band.Valid_From) As AGE_BAND_SKEY

    From

    NHSH_Reference.Dimension.Age_Band With (NoLock)

    ) As Age_Band ON

    Fact.APC_Episodes.AGE_BAND_SKEY = Age_Band.AGE_BAND_SKEY

    I have attached the plans for your interest however I don't want to spend too much time on this. The reason for the views is because I am trying to rapidly prototype a data mart based on existing data and therefore reduce the time spent on the ETL development. I'm sure I'll have to do this properly at some point.

    Thanks for your interest.

    Daniel

  • Daniel Forrester 123 (1/19/2015)


    I'm sure I'll have to do this properly at some point.

    Why not now? The logic and code is fresh in your mind. It almost always takes less time to do it correctly the first time instead of slamming out something that works today but you have to go back and fix at some point.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • I wonder if the reason is the Scalar Function, which seems to be generating an on the fly surrogate key, is there a reason why the Age band view isn't materialised as a Physical table?

    Any chance you can post the scalar so it can be evaluated.

    Also the WITH(NOLOCK) Hint could do with being removed as you can get dirty reads and so inconsistent data, possibly not as much of an issue on a static dataset, as it would be on an OLTP system.

    _________________________________________________________________________
    SSC Guide to Posting and Best Practices

  • Daniel Forrester 123 (1/19/2015)


    Hi Guys,

    I found a work around which seems to trick the query optimizer into working properly by comparing the Age_Band View to other dimensions that don't seem to cause the problem. I have found that if I introduce a Union statement that determines the type of the columns (in my example AGE_BAND_SKEY) within the view/sub query it works and the results are returned instantly. In the example below if I comment out the first statement in the sub query (Select 0 As AGE_BAND_SKEY) this query never finished, put in the union and instant response.

    Select

    Count(*)

    From

    Fact.APC_Episodes

    Inner Join (

    Select

    0 As AGE_BAND_SKEY

    Where

    1=2

    Union All

    Select

    dbo.fn_Generate_SKEY(Age_Band.Age,Age_Band.Valid_From) As AGE_BAND_SKEY

    From

    NHSH_Reference.Dimension.Age_Band With (NoLock)

    ) As Age_Band ON

    Fact.APC_Episodes.AGE_BAND_SKEY = Age_Band.AGE_BAND_SKEY

    I have attached the plans for your interest however I don't want to spend too much time on this. The reason for the views is because I am trying to rapidly prototype a data mart based on existing data and therefore reduce the time spent on the ETL development. I'm sure I'll have to do this properly at some point.

    Thanks for your interest.

    Daniel

    Thanks, Daniel, but I don't see any of the plans attached. It would be fun to look this one over because the optimizer has obviously been tricked into performance.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jason-299789 (1/19/2015)


    Also the WITH(NOLOCK) Hint could do with being removed as you can get dirty reads and so inconsistent data, possibly not as much of an issue on a static dataset, as it would be on an OLTP system.

    Of course if the data is static it is highly unlikely to have locks anyway. πŸ˜‰ I agree 100% about removing that hint unless actually needed (and all too often it is just no fully understood).

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • The execution plan is your bestest buddy in situations like this.

    Joining to views is a common code smell in T-SQL. The language just doesn't lend itself to code reuse. While it looks like it should be great to write a series of JOINs once into a view and then use them over and over, it's actually a poor choice.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • You could also defer decoding/lookup until after the counts are computed. This might make the final query more "natural-looking":

    Select

    age.[10_Year_Age_Band],

    epi.count

    From (

    Select

    AGE_BAND_SKEY, Count(*) AS epi_count

    From

    Fact.APC_Episodes

    Group By

    AGE_BAND_SKEY

    ) AS epi

    Inner Join NHSH_Reference.Dimension.Age_Band age With (NoLock) ON

    age.AGE_BAND_SKEY = epi.AGE_BAND_SKEY

    Order by

    [10_Year_Age_Band]

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Hopefully the plans have attached successfully this time. Like I said before the use of views is to enable me to quickly prototype the dimensions for a data mart to support the proposed reporting requirements. With the specifications being pretty slack the views enable me to make changes and see the results straight away. It is not my recommended final solution!

    Regards

    Daniel

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply