Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

Join performance gets much worse after combining two selects that are each fairly fast Expand / Collapse
Author
Message
Posted Friday, September 7, 2012 3:25 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Thursday, November 15, 2012 2:08 PM
Points: 11, Visits: 62
I have a query that seems to take forever to run. I broke it into parts to try to understand it better.

This query takes about 1 or 2 seconds to return main document and addendum id numbers:

SELECT pat.TIUDocumentSID, pat.ScrSSN, ad.TIUDocumentSID AS TIUAddendumSID
FROM PatientMap_2_patients AS pat
LEFT JOIN Src.TIUDocument_8925_New AS ad
ON pat.Sta3n = ad.Sta3n AND pat.TIUDocumentIEN = ad.ParentIEN
ORDER BY pat.TIUDocumentSID

This query takes about 10 seconds to return main document and diagnosis id numbers:

SELECT pat.TIUDocumentSID, pat.ScrSSN, icd.ICDCode
FROM PatientMap_2_patients AS pat
LEFT JOIN Visit_IDs AS vis
ON pat.Sta3n = vis.Sta3n AND pat.VisitIEN = vis.VisitIEN
LEFT JOIN VDiagnosis AS diag
ON vis.VisitSID = diag.VisitSID
LEFT JOIN ICD AS icd
ON diag.ICDSID = icd.ICDSID
ORDER BY pat.TIUDocumentSID

This query returns all three -- main document, addendum and diagnosis id numbers. I don't know how long it takes to finish, because I killed it after an hour when it still hadn't returned.

SELECT pat.TIUDocumentSID, pat.ScrSSN, ad.TIUDocumentSID AS TIUAddendumSID, icd.ICDCode
FROM PatientMap_2_patients AS pat
LEFT JOIN TIUDocument_8925_New AS ad
ON pat.Sta3n = ad.Sta3n AND pat.TIUDocumentIEN = ad.ParentIEN
LEFT JOIN Visit_IDs AS vis
ON pat1.Sta3n = vis.Sta3n AND pat1.VisitIEN = vis.VisitIEN
LEFT JOIN VDiagnosis AS diag
ON vis.VisitSID = diag.VisitSID
LEFT JOIN ICD AS icd
ON diag.ICDSID = icd.ICDSID;
ORDER BY pat.TIUDocumentSID, TIUAddendumSID

It's estimated execution plan says that 84% of the cost is in sorting the results after they are all joined together. If I skip the ORDER BY step, it says that 75% of the cost is in the final "Parallelism (Gather Streams)" step.

On the other hand, if I select the results of the first two queries into temp tables and change the third query so it selects from those tables, that whole process takes about 25 seconds. The tables I am working with are very small -- PatientMap_2_patients has 608 rows in it, and the third query is supposed to return 1201 rows. Would creating and dropping temp tables to make this run faster scale to working with tables that have up to 80 million rows? Or is there any way to make the third query run faster? I can add indexes to the PatientMap_2_patients and ICD tables, but not to TIUDocument_8925_New, Visit_IDs or VDiagnosis.
Thanks,
Mike
Post #1356263
Posted Friday, September 7, 2012 3:32 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 1:23 PM
Points: 13,082, Visits: 12,542
There are probably a number of things that can be done. However we don't have enough details to even hazard a guess.

Take a look at this article. http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

It does a great job explaining the kinds of information we would need to help with performance issues.


_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Moden's splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Understanding and Using APPLY (Part 1)
Understanding and Using APPLY (Part 2)
Post #1356267
Posted Friday, September 7, 2012 3:52 PM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Saturday, October 18, 2014 4:43 AM
Points: 5,401, Visits: 7,512
Two brief suggestions, then we'll need sqlplans and the like to actually help you troubleshoot the direct cause.

Add OPTION ( MAXDOP(1) ) to the full query and confirm it's not the parallelism goofing you up. Occassionally it can.

Divide and conquer. Drop the first query into a #tmp with a reasonable index for the rest of the joins, then use that temp table as the primary source connecting to the rest. It's not an uncommon optimization for complex components with a lot of row variation.

Part of the issue could also be the extended ORDER BY, where the first components were easily handled via existing indexing, and the second requires secondary sorting or is causing it to use alternate indexing, but that's one of those 'examine the .sqlplan and schema' kind of reviews.



- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. |Tally Tables

Twitter: @AnyWayDBA
Post #1356273
Posted Friday, September 7, 2012 4:54 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Thursday, November 15, 2012 2:08 PM
Points: 11, Visits: 62
The .sqlplan files for the queries that I put in the first message are attached.

MainDocumentAndAddendaActualPlan.sqlplan is from the first query
MainDocumentAndDiagnosesActualPlan.sqlplan is from the second query
MainDocumentAddendaAndDiagnosesEstimatedPlan.sqlplan is from the third query

MainDocumentAndAddendaIntoTempTableActualPlan.sqlplan is from saving the first query results in a temp table
MainDocumentAndDiagnosesIntoTempTableActualPlan.sqlplan is from saving the second query results in a temp table
MainDocumentAddendaAndDiagnosesFromTempTablesActualPlan.sqlplan is like the third query, but it uses the temp tables. That query looks like this:

SELECT doc_ad.TIUDocumentSID, doc_ad.ScrSSN, doc_ad.TIUDocumentSID, doc_dx.ICDCode
FROM #DocAndAdID AS doc_ad
LEFT JOIN #DocAndDxID AS doc_dx
ON doc_ad.TIUDocumentSID = doc_dx.TIUDocumentSID
ORDER BY doc_ad.TIUDocumentSID, doc_ad.TIUAddendumSID



  Post Attachments 
MainDocumentAddendaAndDiagnosesEstimatedPlan.sqlplan (5 views, 87.58 KB)
MainDocumentAddendaAndDiagnosesFromTempTablesActualPlan.sqlplan (3 views, 9.30 KB)
MainDocumentAndAddendaActualPlan.sqlplan (3 views, 45.45 KB)
MainDocumentAndAddendaIntoTempTableActualPlan.sqlplan (1 view, 130.24 KB)
MainDocumentAndDiagnosesActualPlan.sqlplan (1 view, 43.12 KB)
MainDocumentAndDiagnosesIntoTempTableActualPlan.sqlplan (1 view, 50.53 KB)
Post #1356299
Posted Friday, September 7, 2012 5:41 PM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Saturday, October 18, 2014 4:43 AM
Points: 5,401, Visits: 7,512
tmoleary (9/7/2012)

MainDocumentAndAddendaActualPlan.sqlplan is from the first query
MainDocumentAndDiagnosesActualPlan.sqlplan is from the second query
MainDocumentAddendaAndDiagnosesEstimatedPlan.sqlplan is from the third query


Starting with these, the first problem is the 286 billion row estimate going into the select at the tail of the combined query.

It's relatively alright coming off the ICD table scan (Heap table) to include the vDiagnosis, Visit_IDs, and PatientMap_2_patients tables, but then it self destructs including information fron TIUDocument_8925 and VHAEnterpriseStdTitle_Filter. It's full level statement optimization, so it's possible the statistics are just horribly stale here. I'd still triple check your joins, and take a good look at ICD to determine if some indexes will make sense there, particularly a sound clustered index.


MainDocumentAndAddendaIntoTempTableActualPlan.sqlplan is from saving the first query results in a temp table
MainDocumentAndDiagnosesIntoTempTableActualPlan.sqlplan is from saving the second query results in a temp table


These should just be repetitions from the first set but using an INSERT instead of a select at the tail. They aren't, but nothing in them during a fast-glance review showed that there was any significant concerns. If anything, they looked a little more efficient, even if there are more components.

However:

MainDocumentAddendaAndDiagnosesFromTempTablesActualPlan.sqlplan is like the third query, but it uses the temp tables. That query looks like this:

SELECT doc_ad.TIUDocumentSID, doc_ad.ScrSSN, doc_ad.TIUDocumentSID, doc_dx.ICDCode
FROM #DocAndAdID AS doc_ad
LEFT JOIN #DocAndDxID AS doc_dx
ON doc_ad.TIUDocumentSID = doc_dx.TIUDocumentSID
ORDER BY doc_ad.TIUDocumentSID, doc_ad.TIUAddendumSID



Nice, clean, and orderly. You could even possibly align the index on one of the #tmps to the rest of the data's indexes and only need a single #tmp insert, lowering the # of statements in the process. How long did the multiple temp table build take to run end to end? Btw, if you include an index on both temps for TIUDocumentSID you'll get yourself a merge join, and if you include TIUAddendumSID in the index for #DocAndAdIDs you should be able to get it pre-sorted before this query runs.

I would also recommend clustered indexing ICD on ICDSID for cleaner queries. That heap can't be helping matters here. At the very least, a non-clustered index on ICDSID including ICDCode as a leaf level inclusion would allow for that index to be used instead of the full table.

Also, when you get some maintenance time, get a statistics update with fullscan going, and while you're at it check to make sure your defraggers are running as well (just always worth a look). My guess for this is query complexity + stale statistics are the primary cause of the issue, compounded by a heap in the middle. Temp table divide and conquer is not bad practice however so I would certainly approach it from that perspective unless you aren't getting good time runs from the practice.



- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. |Tally Tables

Twitter: @AnyWayDBA
Post #1356305
Posted Friday, September 7, 2012 7:13 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Thursday, November 15, 2012 2:08 PM
Points: 11, Visits: 62
The database I am working with is part of a data warehouse made available for research teams, so there are some limits to what I can do with adding indexes to tables. In particular, TIUDocument_8925_New is a view to some tables, and we don't have access to those tables (except via the view). The join of TIUDocument_8925 and VHAEnterpriseStdTitle_Filter is in the view's definition, so that join is apparently executed whenever a query references TIUDocument_8925_New. We have little or no leverage in asking for indexes to be changed on those underlying tables. I can change the indexes on the PatientMap_2Patients and ICD tables (and the temp tables, of course) though, since we created them.

Adding the MAXDOP hint had a big impact. The original query that ran for an hour without returning results finished in about 12 minutes with the MAXDOP hint. When I used the hint on queries that used one or the other of the temp tables, they ran faster, especially the one that used the diagnoses temp table. Using only the addenda temp table with MAXDOP in the full query took 4 minutes, and using only the diagnoses temp table with MAXDOP took 33 seconds. Using both temp tables allowed the query to run in about 3 seconds. They query lans for these queries are attached.

It takes about 25 seconds to create each of the two temp tables, and 3 seconds for the query that uses them, but if I execute the temp table creation statements and the query together, the whole thing takes about 1:25.

I don't know very much about SQL tuning, so I need to ask you to explain how to do some of the things you suggested. These are:
- align the index on one of the #tmps to the rest of the data's indexes [Could you give an example of this? I'm not sure if I understand how to do it.]
- include TIUAddendumSID in the index for #DocAndAdIDs [does this mean defining the index on both columns or mentioning TIUAddendumSID in an INCLUDE clause?]
- get a statistics update with fullscan going [I don't know how to do this. Could you tell me what command(s) to use?]

Thanks,
Mike


  Post Attachments 
MainDocumentAddendaAndDiagnoses_FromAddendaTempTable_WithMAXDOPHint_ActualPlan.sqlplan (0 views, 19.68 KB)
MainDocumentAddendaAndDiagnoses_FromDiagnosesTempTable_WithMAXDOPHint_ActualPlan.sqlplan (0 views, 56.50 KB)
MainDocumentAddendaAndDiagnoses_WithMAXDOPHint_ActualPlan.sqlplan (0 views, 67.37 KB)
Post #1356327
Posted Monday, September 10, 2012 12:36 PM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Saturday, October 18, 2014 4:43 AM
Points: 5,401, Visits: 7,512
tmoleary (9/7/2012)
The database I am working with is part of a data warehouse made available for research teams, so there are some limits to what I can do with adding indexes to tables. In particular, TIUDocument_8925_New is a view to some tables, and we don't have access to those tables (except via the view). The join of TIUDocument_8925 and VHAEnterpriseStdTitle_Filter is in the view's definition, so that join is apparently executed whenever a query references TIUDocument_8925_New. We have little or no leverage in asking for indexes to be changed on those underlying tables. I can change the indexes on the PatientMap_2Patients and ICD tables (and the temp tables, of course) though, since we created them.

Ah, so you're mixing in with a vendor system. Always a bit of a pain.

Adding the MAXDOP hint had a big impact. The original query that ran for an hour without returning results finished in about 12 minutes with the MAXDOP hint. When I used the hint on queries that used one or the other of the temp tables, they ran faster, especially the one that used the diagnoses temp table. Using only the addenda temp table with MAXDOP in the full query took 4 minutes, and using only the diagnoses temp table with MAXDOP took 33 seconds. Using both temp tables allowed the query to run in about 3 seconds. They query lans for these queries are attached.

That's some interesting data. So gathering streams on the queries is causing you some signficant problems. Can you describe the # of cores you've got going on the source server? Also, give it a go with maxdop(4) instead of 1. It may just be overextending itself into the nether reaches.


I don't know very much about SQL tuning, so I need to ask you to explain how to do some of the things you suggested. These are:

Of course, not a problem.


- align the index on one of the #tmps to the rest of the data's indexes [Could you give an example of this? I'm not sure if I understand how to do it.]

Aligning indexes means making sure the indexes have the same column order and sorting order as other existing indexing. For example, if you have an Index that's on columns C, A DESC, B, you'll want your temp table to have the equivalent clustered index to help match up cleaner on the join to the permanent table.

- include TIUAddendumSID in the index for #DocAndAdIDs [does this mean defining the index on both columns or mentioning TIUAddendumSID in an INCLUDE clause?]

Because you're using it in the order by, you should include it in the actual index definition. Sorry for the confusion there.

- get a statistics update with fullscan going [I don't know how to do this. Could you tell me what command(s) to use?]

This is more an administrative command, and I'm not sure if you're also the DBA as well as the DBDev on this system. The command you want is UPDATE STATISTICS <tablename> WITH FULLSCAN. You'd end up originally running this on every table in the query to see if it makes a difference. Depending on the size of the underlying tables this can beat up some performance, so you want to make sure you run it during maintenance periods where you're not also reindexing or rebuilding the indexes in these tables.

Thanks,
Mike[/quote]



- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. |Tally Tables

Twitter: @AnyWayDBA
Post #1356955
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse