Query regression by index or statistics?

  • Uwe Ricken

    SSCrazy

    Points: 2986

    Comments posted to this topic are about the item Query regression by index or statistics?

    Microsoft Certified Master: SQL Server 2008
    MVP - Data Platform (2013 - ...)
    my blog: http://www.sqlmaster.de (german only!)

  • This was removed by the editor as SPAM

  • Carlo Romagnano

    SSC-Insane

    Points: 21667

    Both filtered index and statistics suffer of parameterization in the where clause.

    In the following query, the optimizer can't predict which filtered statistic should be used (think to stored procs).

    declare @lan VARCHAR(100) = 'english'

    SELECT M.message_id,

    M.text,

    L.language

    FROM dbo.messages AS M INNER JOIN dbo.languages AS L

    ON (M.language_id = L.language_id)

    WHERE L.language = @lan;

    GO

    A clustered index always works better:

    CREATE CLUSTERED INDEX nix_messages_language_id ON dbo.messages (language_id) with drop_existing;

  • Uwe Ricken

    SSCrazy

    Points: 2986

    Thank you Carlo for your valued comment. Pleas take into consideration that your example is completely different from the original question. The usage of a variable in a batch will ALWAYS lead to the usage of the density vector of the statistics. Only literal predicates / Procedures / sp_executesql can "sniff" the parameter and use the histogram. Otherwise the value of the predicate is not available for the query estimator at compile time!

    Your example will always go for the estimate of 4,050 records because the density is 8,100 records divided by 2 (languages).

    If you use a stored procedure than my example will work like a charm because in this case it can sniff the value. But than you run into another problem called parameter sniffing 🙂

    CREATE PROC dbo.GetMessages

    @lang CHAR(20)

    AS

    SET NOCOUNT ON;

    SELECT m.message_id,

    m.text,

    l.language

    FROM dbo.messages AS M INNER JOIN dbo.languages AS L

    ON (M.language_id = L.language_id)

    WHERE L.language = @Lang

    OPTION (RECOMPILE);

    SET NOCOUNT OFF

    GO

    The above procedure uses the OPTION RECOMPILE to avoid parameter sniffing!

    Microsoft Certified Master: SQL Server 2008
    MVP - Data Platform (2013 - ...)
    my blog: http://www.sqlmaster.de (german only!)

  • Rune Bivrin

    SSCertifiable

    Points: 7506

    Thank you very much, Uwe!

    That was a very enlightening question and explanation. Definitely something to add to the tool belt for those pesky edge cases.


    Just because you're right doesn't mean everybody else is wrong.

  • Sean Lange

    SSC Guru

    Points: 286391

    Definitely an interesting question. Of course I would start with a clustered index so we aren't dealing with heaps. I realize the topic being discussed is different though. 😀

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • MMullane-222251

    SSC Veteran

    Points: 279

    Uwe, do you have a test showing that creating a Clustered Index wouldn't improve the execution plan?

  • Igor Micev

    SSC-Dedicated

    Points: 33108

    Fantastic question!

    None of the indexes options will improve the execution plan, but the proposed statistics.

    Igor Micev,
    My blog: www.igormicev.com

  • George Vobr

    SSCrazy Eights

    Points: 8881

    Rune Bivrin (1/5/2017)


    Thank you very much, Uwe!

    That was a very enlightening question and explanation. Definitely something to add to the tool belt for those pesky edge cases.

    Igor Micev (1/5/2017)


    Fantastic question!

    None of the indexes options will improve the execution plan, but the proposed statistics.

    +1

    Thanks Uwe for a very interesting question, here is another useful link:

    https://www.mssqltips.com/sqlservertip/2353/performance-advantages-of-sql-server-filtered-statistics/

  • Uwe Ricken

    SSCrazy

    Points: 2986

    MMullane-222251 (1/5/2017)


    Uwe, do you have a test showing that creating a Clustered Index wouldn't improve the execution plan?

    Hi,

    of course you can test it with a clustered index on one or both tables but that will not lead to a sufficient solution because neither a heap nor a table will solve the generic problem what the query optimizer has in this case

    -- Would the estimates been better with clustered indexes?

    CREATE UNIQUE CLUSTERED INDEX cuix_languages_id ON dbo.languages (language_id);

    GO

    The execution plan will look nearby exactly the same as before. The only thing what changes is the TABLE SCAN which turns into a CLUSTERED INDEX SCAN.

    BTW: The same result will happen if you put a clustered index on the [language] attribute.

    The problem is coming from one special issue: The result from the outer table (dbo.languages) is not predictable: What Id will I get when I ask for the language = 'english').

    Due to THAT fact Microsoft SQL Server has to guess. What does Microsoft SQL Server know in the moment when you hit [F5]?

    - I will get ONE record out of two (50%) from dbo.languages!

    - I have 8.100 records in the "inner table"

    What would you now estimate? You don't have more information!

    If I have 8.100 records I have a chance to get 50% of the records back = 4050.

    Unfortunately I cannot upload depictions here - I will take the time in my holidays to write an article about it at simple-talk and will send the link to this discussion.

    All the best from cold Germany...

    Microsoft Certified Master: SQL Server 2008
    MVP - Data Platform (2013 - ...)
    my blog: http://www.sqlmaster.de (german only!)

  • Konstantin Reu

    Hall of Fame

    Points: 3149

    Finally interesting question here. I got it due research.

    Also

    CREATE unique index IX_lang ON [dbo].[languages](language) WHERE language = 'deutsch'

    lead indirectly to filtered statistics and desired query plan

    Thanks.

    MS SQL 2008 MCITP x 3
    MS SQL 2012 MCSE x 2

  • Ken Wymore

    SSCoach

    Points: 16321

    Konstantin Reu (1/5/2017)


    Finally interesting question here. I got it due research.

    Also

    CREATE unique index IX_lang ON [dbo].[languages](language) WHERE language = 'deutsch'

    lead indirectly to filtered statistics and desired query plan

    Thanks.

    I was wondering this same thing. Thanks for the info!

  • Revenant

    SSC-Forever

    Points: 42468

    Very elaborate and interesting question - just reading it thoroughly took a cup of coffee. Thanks, Uwe!

  • Ken Wymore

    SSCoach

    Points: 16321

    I really liked the setup for this question as it can be easily translated for real world scenarios. I found it fun to test the different options on the tables to see what the actual results would be versus just guessing at what might work. Worth the effort. Thanks Uwe!

  • Uwe Ricken

    SSCrazy

    Points: 2986

    Thank you all for your positive feedback.

    I love to create QotD with a bit more "substance" from my real world experiences.

    I will try to write more of them in the upcoming year 🙂

    Microsoft Certified Master: SQL Server 2008
    MVP - Data Platform (2013 - ...)
    my blog: http://www.sqlmaster.de (german only!)

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply