Query regression by index or statistics?

Question

Post reply

Query regression by index or statistics?

Uwe Ricken

Hall of Fame

Points: 3171
More actions
January 4, 2017 at 10:04 pm

#333173

Comments posted to this topic are about the item Query regression by index or statistics?
Microsoft Certified Master: SQL Server 2008
MVP - Data Platform (2013 - ...)
my blog: http://www.sqlmaster.de (german only!)

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply

Carlo Romagnano SSC-Insane Points: 22772 More actions · Answer 1

Both filtered index and statistics suffer of parameterization in the where clause.

In the following query, the optimizer can't predict which filtered statistic should be used (think to stored procs).

declare @lan VARCHAR(100) = 'english'

SELECTM.message_id,

M.text,

L.language

FROMdbo.messages AS M INNER JOIN dbo.languages AS L

ON (M.language_id = L.language_id)

WHEREL.language = @lan;

GO

A clustered index always works better:

CREATE CLUSTERED INDEX nix_messages_language_id ON dbo.messages (language_id) with drop_existing;

Uwe Ricken Hall of Fame Points: 3171 More actions · Answer 2

Thank you Carlo for your valued comment. Pleas take into consideration that your example is completely different from the original question. The usage of a variable in a batch will ALWAYS lead to the usage of the density vector of the statistics. Only literal predicates / Procedures / sp_executesql can "sniff" the parameter and use the histogram. Otherwise the value of the predicate is not available for the query estimator at compile time!

Your example will always go for the estimate of 4,050 records because the density is 8,100 records divided by 2 (languages).

If you use a stored procedure than my example will work like a charm because in this case it can sniff the value. But than you run into another problem called parameter sniffing

CREATE PROC dbo.GetMessages

@lang CHAR(20)

AS

SET NOCOUNT ON;

SELECTm.message_id,

m.text,

l.language

FROMdbo.messages AS M INNER JOIN dbo.languages AS L

ON (M.language_id = L.language_id)

WHEREL.language = @Lang

OPTION(RECOMPILE);

SET NOCOUNT OFF

GO

The above procedure uses the OPTION RECOMPILE to avoid parameter sniffing!

Microsoft Certified Master: SQL Server 2008
MVP - Data Platform (2013 - ...)
my blog: http://www.sqlmaster.de (german only!)

Rune Bivrin SSCrazy Eights Points: 8618 More actions · Answer 3

Thank you very much, Uwe!

That was a very enlightening question and explanation. Definitely something to add to the tool belt for those pesky edge cases.

Just because you're right doesn't mean everybody else is wrong.

Sean Lange SSC Guru Points: 286609 More actions · Answer 4

Definitely an interesting question. Of course I would start with a clustered index so we aren't dealing with heaps. I realize the topic being discussed is different though.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

MMullane-222251 Old Hand Points: 379 More actions · Answer 5

Uwe, do you have a test showing that creating a Clustered Index wouldn't improve the execution plan?

Igor Micev SSC-Dedicated Points: 33110 More actions · Answer 6

Fantastic question!

None of the indexes options will improve the execution plan, but the proposed statistics.

Igor Micev,My blog: www.igormicev.com

George Vobr SSChampion Points: 10095 More actions · Answer 7

Rune Bivrin (1/5/2017)
Thank you very much, Uwe!
That was a very enlightening question and explanation. Definitely something to add to the tool belt for those pesky edge cases.

Igor Micev (1/5/2017)
Fantastic question!
None of the indexes options will improve the execution plan, but the proposed statistics.

+1

Thanks Uwe for a very interesting question, here is another useful link:

https://www.mssqltips.com/sqlservertip/2353/performance-advantages-of-sql-server-filtered-statistics/

Uwe Ricken Hall of Fame Points: 3171 More actions · Answer 8

MMullane-222251 (1/5/2017)
Uwe, do you have a test showing that creating a Clustered Index wouldn't improve the execution plan?

Hi,

of course you can test it with a clustered index on one or both tables but that will not lead to a sufficient solution because neither a heap nor a table will solve the generic problem what the query optimizer has in this case

-- Would the estimates been better with clustered indexes?

CREATE UNIQUE CLUSTERED INDEX cuix_languages_id ON dbo.languages (language_id);

GO

The execution plan will look nearby exactly the same as before. The only thing what changes is the TABLE SCAN which turns into a CLUSTERED INDEX SCAN.

BTW: The same result will happen if you put a clustered index on the [language] attribute.

The problem is coming from one special issue: The result from the outer table (dbo.languages) is not predictable: What Id will I get when I ask for the language = 'english').

Due to THAT fact Microsoft SQL Server has to guess. What does Microsoft SQL Server know in the moment when you hit [F5]?

- I will get ONE record out of two (50%) from dbo.languages!

- I have 8.100 records in the "inner table"

What would you now estimate? You don't have more information!

If I have 8.100 records I have a chance to get 50% of the records back = 4050.

Unfortunately I cannot upload depictions here - I will take the time in my holidays to write an article about it at simple-talk and will send the link to this discussion.

All the best from cold Germany...

Microsoft Certified Master: SQL Server 2008
MVP - Data Platform (2013 - ...)
my blog: http://www.sqlmaster.de (german only!)

Konstantin Reu Hall of Fame Points: 3149 More actions · Answer 9

Finally interesting question here. I got it due research.

Also

CREATE unique index IX_lang ON [dbo].[languages](language) WHERE language = 'deutsch'

lead indirectly to filtered statistics and desired query plan

Thanks.

MS SQL 2008 MCITP x 3
MS SQL 2012 MCSE x 2

Ken SSCoach Points: 17191 More actions · Answer 10

Konstantin Reu (1/5/2017)
Finally interesting question here. I got it due research.
Also
CREATE unique index IX_lang ON [dbo].[languages](language) WHERE language = 'deutsch'
lead indirectly to filtered statistics and desired query plan
Thanks.

I was wondering this same thing. Thanks for the info!

Revenant SSC-Forever Points: 42467 More actions · Answer 11

Very elaborate and interesting question - just reading it thoroughly took a cup of coffee. Thanks, Uwe!

Ken SSCoach Points: 17191 More actions · Answer 12

I really liked the setup for this question as it can be easily translated for real world scenarios. I found it fun to test the different options on the tables to see what the actual results would be versus just guessing at what might work. Worth the effort. Thanks Uwe!

Uwe Ricken Hall of Fame Points: 3171 More actions · Answer 13

Thank you all for your positive feedback.

I love to create QotD with a bit more "substance" from my real world experiences.

I will try to write more of them in the upcoming year

Microsoft Certified Master: SQL Server 2008
MVP - Data Platform (2013 - ...)
my blog: http://www.sqlmaster.de (german only!)

Konstantin Reu Hall of Fame Points: 3149 More actions · Answer 14

Here is article with exactly the same case study

https://blogs.msdn.microsoft.com/psssql/2010/09/28/case-of-using-filtered-statistics/

It does not have deep technical explanation, unfortunately. Respect to author of question.

MS SQL 2008 MCITP x 3
MS SQL 2012 MCSE x 2

Query regression by index or statistics?

Cookies on SQLServerCentral