Need help with runaway query using multiple APPLY joins

  • I’ve never written a query with multiple APPLY joins before and I’m running into some troubles with my first one. The below SQL statement runs within 10 seconds if I comment out either one of the APPLY joins and its corresponding field columns. However, when I try to execute with both APPLY joins, the query runs indefinitely. The longest I’ve waited before cancelling it is 90 minutes.

    Now, I know there are probably other ways I could write this query to get me the results I’m looking for. I’m posting this on the board because I’m curious about finding out why multiple APPLY joins could cause SQL Server to run away. I’m hoping to gain some insight so that I can better understand how APPLY joins work so that in case I have a big need to do this again in the future (without suitable workarounds) I can code it correctly.

    Here are some things I’ve tried so far…

    1.Changed the States table into a subquery that only returns a single state

    2.Change all the references inside the APPLY subqueries so that they had different aliases (just in case they were conflicting with each other).

    3.Changed the CROSS applies to OUTER applies. States has 50 records and only 32 have matching permit data so the 18 extra iterations using OUTER APPLY don’t impact performance any when an APPLY is used by itself.

    SELECT s.state_name

    , COUNT(DISTINCT DUPS.PermitNumber) AS NumOfDupPermits

    , SUM(DistinctPermits) AS DistinctPermits

    FROM States S

    CROSS APPLY (SELECT w.StateID, COUNT(*) as DistinctPermits

    FROM Permit P INNER JOIN Well W1 ON P.WellID = w.WellID

    WHERE w.StateID=S.Stateid

    GROUP BY P.PermitNumber, w.StateID) STD

    CROSS APPLY (select P.PermitNumber, w.StateID AS State_IDs, COUNT(*) as counts

    FROM Permit P INNER JOIN Well W2 ON P.WellID = w.WellID

    WHERE w.StateID=S.Stateid

    GROUP BY P.PermitNumber, w.StateID

    HAVING count(*)>1) DUPS

    group by s.state_name

  • Do you have the DDL for these tables and some sample data?

  • Take a look at the estimated execution plan. It will tell you the choices SQL Server is making based on the code you provided and the statistics available on the tables and indexes.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Check your table aliases:

    SELECT s.state_name

    , COUNT(DISTINCT DUPS.PermitNumber) AS NumOfDupPermits

    , SUM(DistinctPermits) AS DistinctPermits

    FROM States S

    CROSS APPLY (

    SELECT w.StateID, COUNT(*) as DistinctPermits

    FROM Permit P

    INNER JOIN Well W1 ON P.WellID = w.WellID -- W1 / w

    WHERE w.StateID = S.Stateid

    GROUP BY P.PermitNumber, w.StateID

    ) STD

    CROSS APPLY (

    select P.PermitNumber, w.StateID AS State_IDs, COUNT(*) as counts

    FROM Permit P

    INNER JOIN Well W2 ON P.WellID = w.WellID -- W2 / w

    WHERE w.StateID = S.Stateid

    GROUP BY P.PermitNumber, w.StateID

    HAVING count(*)>1

    ) DUPS

    group by s.state_name

    Since this query as it stands cannot run, it has to be different to the actual query you are running. The differences could be significant. If you post the estimated execution plan as Grant suggests, we get to see your actual query, and a few other useful clues too.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work (3/21/2014)


    Check your table aliases:

    SELECT s.state_name

    , COUNT(DISTINCT DUPS.PermitNumber) AS NumOfDupPermits

    , SUM(DistinctPermits) AS DistinctPermits

    FROM States S

    CROSS APPLY (

    SELECT w.StateID, COUNT(*) as DistinctPermits

    FROM Permit P

    INNER JOIN Well W1 ON P.WellID = w.WellID -- W1 / w

    WHERE w.StateID = S.Stateid

    GROUP BY P.PermitNumber, w.StateID

    ) STD

    CROSS APPLY (

    select P.PermitNumber, w.StateID AS State_IDs, COUNT(*) as counts

    FROM Permit P

    INNER JOIN Well W2 ON P.WellID = w.WellID -- W2 / w

    WHERE w.StateID = S.Stateid

    GROUP BY P.PermitNumber, w.StateID

    HAVING count(*)>1

    ) DUPS

    group by s.state_name

    Since this query as it stands cannot run, it has to be different to the actual query you are running. The differences could be significant. If you post the estimated execution plan as Grant suggests, we get to see your actual query, and a few other useful clues too.

    Yes, sorry about posting invalid SQL. Right before I made this post, I tried messing with the aliases (#2 in my original post) and when that was unsuccessful I thought I had reverted all the changes I made but definitely missed some.

    Here's the functional SQL.

    SELECT s.state_name

    , COUNT(DISTINCT DUPS.StatePermitSerialNumber) AS NumOfDupPermits, SUM(DUPS.AmendedDups) As AmendedDups --<<comment out these fields when you comment out the DUPS apply join

    , SUM(DistinctPermits) AS DistinctPermits --<<comment out this field when you comment out the STD apply join

    FROM States S

    CROSS APPLY (SELECT W1.StateID, COUNT(*) as DistinctPermits

    FROM Permit P1 INNER JOIN Well W1 ON P1.WellID = W1.WellID

    WHERE w1.StateID=S.Stateid

    GROUP BY p1.StatePermitSerialNumber, w1.StateID) STD

    CROSS APPLY (select P2.StatePermitSerialNumber, w2.StateID AS State_IDs, COUNT(*) as counts,SUM(DISTINCT ISNULL(P2.AmendedPermit,0)) AmendedDups

    FROM Permit P2 INNER JOIN Well W2 ON P2.WellID = W2.WellID

    WHERE w2.StateID=S.Stateid

    GROUP BY p2.StatePermitSerialNumber, w2.StateID

    HAVING count(*)>1) DUPS

    group by s.state_name

    I've attached the estimated execution plan for the query with both joins and then I've included the actual execution plan for two queries where one of the joins is commented out. Thanks for your help!

  • I think you can simplify the query something like below, although some tweaking might need done:

    SELECT

    s.state_name,

    derived.NumOfDupPermits, derived.AmendedDups, derived.DistinctPermits

    FROM (

    SELECT

    W.StateID,

    COUNT(P.StatePermitSerialNumber) - COUNT(DISTINCT P.StatePermitSerialNumber) AS NumOfDupPermits,

    CASE WHEN COUNT(P.StatePermitSerialNumber) - COUNT(DISTINCT P.StatePermitSerialNumber) <= 0 THEN 0

    ELSE SUM(DISTINCT ISNULL(P.AmendedPermit,0)) END AS AmendedDups,

    COUNT(DISTINCT P.StatePermitSerialNumber) AS DistinctPermits

    FROM Permit P

    INNER JOIN Well W ON

    P.WellID = W.WellID

    GROUP BY

    W.StateID

    ) AS derived

    INNER JOIN States S ON

    S.StateID = derived.StateID

    ORDER BY

    s.state_name

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Thanks.

    So I actually came up with an alternate solution shortly after making this post. However, getting an alternate solution was not my goal with this post. My goal with this post was to understand SQL Server better and try to understand why two APPLY joins causes this query to run away. I haven't done many APPLY joins so if there's something I'm doing wrong here that causes multiple APPLY joins to go crazy I want to know.

  • bncaffey (3/27/2014)


    Thanks.

    So I actually came up with an alternate solution shortly after making this post. However, getting an alternate solution was not my goal with this post. My goal with this post was to understand SQL Server better and try to understand why two APPLY joins causes this query to run away. I haven't done many APPLY joins so if there's something I'm doing wrong here that causes multiple APPLY joins to go crazy I want to know.

    First, how many rows are in Well and Permit tables?

    Second, would you please post the actual code? All of your execution plans say they're using "tblWell" and "tblPermit". Yes, we'll make fun of you for having "tbl" in your database but that's only temporary. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Well has 5.6M rows and Permit has 1.4M rows.

    The SQL posted is the actual working SQL. I just started at this company (yes, these tables are in a company data mart) so views were created to hide the tbl-ness!

  • So does anyone have experience writing queries with multiple APPLY joins?

  • bncaffey (3/31/2014)


    So does anyone have experience writing queries with multiple APPLY joins?

    Yes, but I've not observed this before. I can't see anything obvious in the plans either. If you have the patience and the time to do it, the actual plan might offer more clues.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply