• bradmcgehee@hotmail.com (8/2/2011)


    I am providing most of the information requested, except the XML execution plans. You will have to work with the graphical execution plans.

    Why not? It's not that I can't recreate a reasonable facsimile of your data; it's that I can't mimic your data distribution with a random data loader. Why would you purposely tie the hands of those you're asking?

    Query 1 and 2:

    Here’s a little background on what they do. Both queries are used to help track the number of clicks articles receive on SQLServerCentral.com. When a web page is accessed, Query #2 runs first, looking to see if that particular article has ever been viewed before or not. If not, then another query (which is not in the top 10 list) is run to create a new record where clicks are tracked in a table; but if the article has been clicked on before, then query #1 runs, which increments a counter to track the number of clicks for that page. As you can see from figure 2, both of these queries have a high number of executions, high duration, high CPU usage, and high logical reads over the 24 hour period being analyzed.

    That's all lovely, but why isn't this information attached to the articles table (unlisted, but assumed to exist), where the Article ID (obviously available, since it's in the HTTP header) is easily seekable and the counter can simply be incremented in place on an existing clustered ID for the article? That would also get it out of this logging table, which is apparently a mismatch and thus has a wonky data distribution. The article should always exist in the articles table, too, which means half of this check can be thrown out the window. From this initial discussion's perspective, this is an attribute being applied in the wrong place.

    This seems like an odd design choice to me, and one that needs further analysis. Worst case, remove the 'check' and simply insert a row into the table when the article is first generated. A minimal sketch of the simpler approach is below.
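
    To make that concrete, here's a minimal sketch of the "attach it to the articles table" idea. The table and column names (dbo.Articles, ArticleID, ClickCount) are my assumptions, not the actual SSC schema:

    -- Hypothetical sketch: names are assumptions, not the real SSC schema.
    -- If the articles table owns the counter (or a counter row is seeded when
    -- the article is created), the existence check in query #2 disappears and
    -- the whole round trip collapses to one seek plus update on the clustered key.
    CREATE PROCEDURE dbo.IncrementArticleClick
        @ArticleID int
    AS
    BEGIN
        SET NOCOUNT ON;

        UPDATE dbo.Articles
        SET    ClickCount = ClickCount + 1
        WHERE  ArticleID = @ArticleID;   -- single seek on the existing clustered ID
    END;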

    Query 3:

    I found all of this information from a Profiler trace so that I could better understand how this particular query works, and to ensure that the data I used for the parameters was typical.

    Take a few minutes to review the actual code I ran, the IO STATISTICS results, and the execution plans. Based on this information, what, if anything, would you do to try to optimize this query?

    Without knowing the wrapper and the intent of the calling query, it's hard to know whether this is just the cost of doing business or not. What is the purpose of the calling proc using a second proc and forcing a temp table into the mix? Is this a cross-linked-server query? Is this the coder's way of writing 'modular' code?

    The query itself also seems odd out of context. It does a double pull from the source, selecting PostId twice but never using the variable. The P.ThreadID = @ThreadID predicate could be swapped for the two parameters in the WHERE clause of the first select, which would remove the reason for the double call entirely. Can we confirm this is a correct translation of the original code, and/or see the originating proc with a sample parameter? It doesn't make sense at first glance; a rough sketch of what I mean follows.
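
    To illustrate, here's a rough and entirely hypothetical reconstruction of the shape I think is being described; the table (dbo.cs_Posts), columns, and parameter values are guesses, since we don't have the original proc:

    -- Guessed reconstruction of the pattern, not the actual code.
    DECLARE @ThreadID  int = 12345,
            @SectionID int = 17;

    -- As described: a first pull grabs PostId into a variable that is never used...
    DECLARE @PostID int;
    SELECT @PostID = P.PostID
    FROM   dbo.cs_Posts AS P
    WHERE  P.SectionID = @SectionID
    AND    P.ThreadID  = @ThreadID;

    -- ...and a second pull hits the same source again on P.ThreadID = @ThreadID.
    SELECT P.PostID, P.Subject, P.PostDate
    FROM   dbo.cs_Posts AS P
    WHERE  P.ThreadID = @ThreadID;

    -- Folding both parameters into a single SELECT drops the unused variable
    -- and removes the second trip to the table entirely.
    SELECT P.PostID, P.Subject, P.PostDate
    FROM   dbo.cs_Posts AS P
    WHERE  P.SectionID = @SectionID
    AND    P.ThreadID  = @ThreadID;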

    Query 4:

    The last of the queries we are examining in this article is the longest of the four, and is used by the SQLServerCentral.com blog to display information about blog authors.

    The indexing is off in cs_sections. Nothing leads with SectionID and ApplicationType as the second column; it's all ApplicationKey first. Unless there are other queries that rely on ApplicationKey leading, swap those columns around in the unique index IX_cs_Sections_Validation (sketched below) and that may help on selectivity; I can't know without seeing the heuristics. 97 rows should NOT take up that much firepower, though.
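
    Something along these lines; the current key column list is a guess since the DDL isn't posted, and the point is just to lead with SectionID:

    -- Column list assumed; adjust to the real definition.
    CREATE UNIQUE NONCLUSTERED INDEX IX_cs_Sections_Validation
        ON dbo.cs_Sections (SectionID, ApplicationType, ApplicationKey)
        WITH (DROP_EXISTING = ON);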

    After that, I'd like to know the heuristics on the selectivity of SectionID in cs_weblog_WebLogs. 89 rows isn't that expensive to scan, though, so it might be a moot point. The kind of check I mean is shown below.
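
    This is the sort of check I mean; the statistics name here is hypothetical, and any stats object on SectionID would do:

    -- The histogram is the "heuristics" I'm after.
    DBCC SHOW_STATISTICS ('dbo.cs_weblog_WebLogs', IX_cs_weblog_WebLogs_SectionID)
        WITH HISTOGRAM;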

    This query being an issue is something of a puzzle. I really want to see why 97 rows by 89 rows (even cross joined) takes 18ms to produce, unless it's really the time to deliver the data that's the issue and we're looking at async I/O waits, which are a whole different story. That's really something I think you'd have to be on-server for (a quick check is sketched below). It's relatively tight other than the flip on the index, and that w.* pretty much kills any covering-index trims.
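
    If someone is on-server (this needs VIEW SERVER STATE), a quick look at the wait stats would settle it; the wait types listed are the usual suspects for delivery time, not something confirmed for this box:

    -- If the elapsed time is mostly the client consuming the rows, it shows up
    -- as ASYNC_NETWORK_IO rather than anything the plan itself can fix.
    SELECT wait_type,
           waiting_tasks_count,
           wait_time_ms,
           signal_wait_time_ms
    FROM   sys.dm_os_wait_stats
    WHERE  wait_type IN ('ASYNC_NETWORK_IO', 'ASYNC_IO_COMPLETION')
    ORDER BY wait_time_ms DESC;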

    Other notes:

    I personally get some different top 10s in the 15-minute totals overview and in the 5-minute average. I don't have a 24-hour option available from the free review.

    Ninja, thanks for asking about the DDL. The first thing that struck me as I went through the process was "Where's the schema?... must be attached at the end or something."

    Brad, you know the drill. If you want intelligent answers, we need to see the execution plans; otherwise we're swinging an axe in the dark. With no heuristics, no visibility into blocks and locks at the time of execution, and no ability to run traces on things, we're already limited. There's no reason to tie the other hand up too.
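
    For what it's worth, grabbing the actual XML plans is cheap. Either of these would do it; the proc name below is a placeholder, not one of the real ones:

    -- Option 1: wrap the statement under test.
    SET STATISTICS XML ON;
    -- EXEC dbo.SomeProcUnderTest @SomeParam = 1;   -- placeholder
    SET STATISTICS XML OFF;

    -- Option 2: pull the cached (estimated) plans for the heaviest statements.
    SELECT TOP (10)
           qs.execution_count,
           qs.total_worker_time,
           qs.total_logical_reads,
           qp.query_plan
    FROM   sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
    ORDER BY qs.total_worker_time DESC;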


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA