RE: Join performance gets much worse after combining two selects that are each fairly fast

SSC Guru

Points: 100851

September 7, 2012 at 5:41 pm

tmoleary (9/7/2012)
MainDocumentAndAddendaActualPlan.sqlplan is from the first query
MainDocumentAndDiagnosesActualPlan.sqlplan is from the second query
MainDocumentAddendaAndDiagnosesEstimatedPlan.sqlplan is from the third query

Starting with these, the first problem is the 286 billion row estimate going into the select at the tail of the combined query.

It's relatively alright coming off the ICD table scan (Heap table) to include the vDiagnosis, Visit_IDs, and PatientMap_2_patients tables, but then it self destructs including information fron TIUDocument_8925 and VHAEnterpriseStdTitle_Filter. It's full level statement optimization, so it's possible the statistics are just horribly stale here. I'd still triple check your joins, and take a good look at ICD to determine if some indexes will make sense there, particularly a sound clustered index.

MainDocumentAndAddendaIntoTempTableActualPlan.sqlplan is from saving the first query results in a temp table
MainDocumentAndDiagnosesIntoTempTableActualPlan.sqlplan is from saving the second query results in a temp table

These should just be repetitions from the first set but using an INSERT instead of a select at the tail. They aren't, but nothing in them during a fast-glance review showed that there was any significant concerns. If anything, they looked a little more efficient, even if there are more components.

However:

MainDocumentAddendaAndDiagnosesFromTempTablesActualPlan.sqlplan is like the third query, but it uses the temp tables. That query looks like this:
SELECT doc_ad.TIUDocumentSID, doc_ad.ScrSSN, doc_ad.TIUDocumentSID, doc_dx.ICDCode
FROM #DocAndAdID AS doc_ad
LEFT JOIN #DocAndDxID AS doc_dx
ON doc_ad.TIUDocumentSID = doc_dx.TIUDocumentSID
ORDER BY doc_ad.TIUDocumentSID, doc_ad.TIUAddendumSID

Nice, clean, and orderly. You could even possibly align the index on one of the #tmps to the rest of the data's indexes and only need a single #tmp insert, lowering the # of statements in the process. How long did the multiple temp table build take to run end to end? Btw, if you include an index on both temps for TIUDocumentSID you'll get yourself a merge join, and if you include TIUAddendumSID in the index for #DocAndAdIDs you should be able to get it pre-sorted before this query runs.

I would also recommend clustered indexing ICD on ICDSID for cleaner queries. That heap can't be helping matters here. At the very least, a non-clustered index on ICDSID including ICDCode as a leaf level inclusion would allow for that index to be used instead of the full table.

Also, when you get some maintenance time, get a statistics update with fullscan going, and while you're at it check to make sure your defraggers are running as well (just always worth a look). My guess for this is query complexity + stale statistics are the primary cause of the issue, compounded by a heap in the middle. Temp table divide and conquer is not bad practice however so I would certainly approach it from that perspective unless you aren't getting good time runs from the practice.

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA