optimizing a complex join

  • I have a really ugly looking update statement. What it does is look for duplicate ordinal sort indexes in some records and move some to the end if it finds any. The way this happens is that we are bringing in edits from another database. Anyway, to make the problem more complex, the records have a hierarchy and the ordinal sort indexes are within the parent only. Now before anyone jumps all over me about poor design, I will save you the trouble; it's bad, but it's out there. Anyway, I have a statement that works, but my concern is that the most limiting constraint on the joins is buried too deep and make cause SQL Server to gather a bunch of unneeded data for other parts of the join. Here it is:

    Update tblExplorer_AuditProcedures set SortIndex = apDupes.TopOrdinal + apDupes.DupeRow from

    tblExplorer_AuditProcedures inner join

    (SELECT ap.ProcedureID, (Rank() OVER (Partition by ap.Parent_ProcedureID,

    ap.SortIndex ORDER BY ap.FileTypeOrigin) -1) as DupeRow, Ords.TopOrdinal FROM

    dbo.tblExplorer_AuditProcedures AS ap

    INNER JOIN

    dbo.tblExplorer_AuditProcedures AS ap1 ON ap.Parent_ProcedureID = ap1.Parent_ProcedureID

    AND ap.ProcedureID <> ap1.ProcedureID AND ap.SortIndex = ap1.SortIndex Inner join

    (SELECT Parent_ProcedureID, MAX(SortIndex) AS TopOrdinal

    FROM dbo.tblExplorer_AuditProcedures

    GROUP BY Parent_ProcedureID) as Ords

    on ap.Parent_ProcedureID = Ords.Parent_ProcedureID) as apDupes

    on tblExplorer_AuditProcedures.ProcedureID = apDupes.ProcedureID

    where DupeRow > 0

    There will be very few records that meet the criteria that I put in bold. All the others just provide additional information about those records. Even the where clause is only exercised on the master records that have dupes (the sort by FileTypeOrigin for ranking assures the master records will be at the top). My concern is that SQL may figure out what the max ordinal is for every parent proc in the database even though only a few are needed or sorting and ranking all procedures and then checking for matches. I am not sure if there is a way to make it check the bold condition first to feed the others (I am not sure it doesn’t now).

    If you know how to optimize this, I would prefer an explanation as I have other similar update statements to apply the same sort of optimization to. But if you want to provide me a magic solution that works with no explanation, I can live with that. 🙂

    Thanks!

    Andy

  • If you really want to force it to check those first, then you can make a Temp table to hold the results of that join. Populate it first and then replace the join in the update with the temp table. You may want to add keys and indexes to the temp table also.

    To do much more, we would have to see the table definitions, query plan and some sample data (see this article on how to do this: http://www.sqlservercentral.com/articles/Best+Practices/61537/).

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • What do the indexes on these tables look like? For example, because of this statement:

    AND ap.ProcedureID <> ap1.ProcedureID

    If ProcedureID is the leading edge of the index used in the join, you're going to get full table or index scans. If the other columns are the leading edge and this is only part, then you're more likely to get either a seek or small scan against a range within the index.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Thanks. I may try the temp table. The reason I didn't try to provide all sorts of detail is that I was/am trying to ask a somewhat generic question. If I have 3 conditions - A, B and C and test them like this:

    where A and B and C

    I would expect it to short circuit and not test B and C if A is false. Is there a similar thing that can be done with joins? They are all inner joins, so if any of them return no records, it seems like it would short circuit. Is that correct? If it is correct, what order are they tested in give the following pseudo code join:

    from T1 inner join T2 on A inner join T2 on B inner join T3 on C

  • Grant Fritchey (10/17/2008)


    What do the indexes on these tables look like? For example, because of this statement:

    AND ap.ProcedureID <> ap1.ProcedureID

    If ProcedureID is the leading edge of the index used in the join, you're going to get full table or index scans. If the other columns are the leading edge and this is only part, then you're more likely to get either a seek or small scan against a range within the index.

    There is only one table, referenced a shocking number of times in one statement. ProcedureID is the key field. In that part you referenced, I am looking for a record under the same parent, with the same sort index but that isn't the same record.

  • arbarnhart (10/17/2008)


    Thanks. I may try the temp table. The reason I didn't try to provide all sorts of detail is that I was/am trying to ask a somewhat generic question. If I have 3 conditions - A, B and C and test them like this:

    where A and B and C

    I would expect it to short circuit and not test B and C if A is false.

    The order of conditions in a join/where (and in fact the order of joins) does not guarantee the order of execution. The optimiser can and will change the order that conditions are evaluated and joins are done to get the most optimal execution plan it can find.

    See - http://sqlinthewild.co.za/index.php/2007/07/29/order-of-execution/

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I have had that discussion with other developers here and I thought SQL did a pretty good job. I lean toward creating single statements whenever possible to give SQL server the best chance to optimize it. But can it "see through" paranthesis to look at all my subqueres together or does it only optimize within a sub query?

    Thanks!

    Andy

  • It will treat them, as much as it can, as a whole. But the more complicated the query, the longer it will take the optimizer to find a good plan. As it takes the optimizer longer, it will tend to drop out of the optimization process with the best plan it has at the moment. That could be a horrific plan. Any time you can, you should simplify things to ensure a better plan.

    From the sounds of this data, it really might benefit from a break down in processing, getting a smaller data set up front and then using that to filter the larger set, even though it's in two steps, could be the faster way to go. Testing is the only way to be sure sometimes.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply