Cross Apply Executing Too Many Times

  • I have a rather complex set of queries that are mostly embedded in table valued functions and call one another in chains. I've had a number of very peculiar performance issues that seem to come and go, but that don't seem to be directly related to the queries themselves (heavily indexed tables).

    Currently I'm examining a particular cross apply that seems to be behaving strangely. I'm simplifying a bit, but the two following queries return the "same" result, just different ways; the main difference is that the first one has an additional column hardcoded instead of relying on the cross join to fill it in.

    select 'LA13G12ST-05', 10000, pidcm.met, pidcm.quantityFound

    from table_valued_function('LA13G12ST-05', 10000, '00000630', 1, 'S0000013122', 'LA2013') pidcm

    union all

    select 'LA13G12ST-05', 20000, pidcm.met, pidcm.quantityFound

    from table_valued_function('LA13G12ST-05', 20000, '00000630', 1, 'S0000013122', 'LA2013') pidcm

    select 'LA13G12ST-05', pidc.[Line No_], pidcm.met, pidcm.quantityFound

    from (select distinct pidc.[Line No_]

    from table1 pid

    join table2 pidc

    on pid.[Pidc Code] = pidc.Code

    where pid.Code = 'LA2013'

    and pid.[Line No_] = 10000) pidc

    cross

    apply table_valued_function('LA13G12ST-05', pidc.[Line No_], '00000630', 1, 'S0000013122', 'LA2013') pidcm

    Yes, the subquery does return exactly two rows, 10000 and 20000

    So, when I run the both queries, I get the expected (and identical) results from both

    LA13G12ST-051000003

    LA13G12ST-052000000

    LA13G12ST-051000003

    LA13G12ST-052000000

    However, the statistics are very different:

    SQL Server parse and compile time:

    CPU time = 0 ms, elapsed time = 3 ms.

    (2 row(s) affected)

    Table '#71DCD509'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#70E8B0D0'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 813 ms, elapsed time = 809 ms.

    (2 row(s) affected)

    Table '#71DCD509'. Scan count 73, logical reads 78, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Table2'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Table1'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 28859 ms, elapsed time = 28898 ms.

    Curious on why the latter required 73 times as many scans, I ran the SQL profiler, and found that the table valued function is running 72 times in the cross apply example.

    It may be purely a coincidence, but there happen to be 72 rows in the subquery if you remove the where clause from it. So, it looks to me as if SQL Server is unrolling my subquery, moving its "where" to the outer query, executing the cross apply, and then filtering. Unfortunately, this ends up being much more expensive.

    So, is there some way to convince the query optimizer to do otherwise? I had expecting that adding the "distinct" to the subquery (not technically needed) would help the optimizer know it should do the inner part first, but it didn't.

    Being from an Oracle background, the first thing that comes to mind is to put the inner query in a cursor, and loop through its results calling the function for each, so I can control the number of function calls. Such behavior seems to be frowned upon in this forum, and disparagingly referred to as RBAR. So, I'm open to alternatives.

    I'm sure someone will ask whether it is possible to avoid table valued functions and roll all the logic into a single SQL statement. It would be very difficult (perhaps impossible) because of two things:

    1) Analytic functions aren't fully available in SQL Server 2008R2 (and I can't easily upgrade to 2012); specifically, I need to limit something by a running total, and the "rows between" clause isn't available yet on the analytic sum function.

    2) In several cases, completely different queries need to run depending upon some other setting or function result (case statement runs different queries). It might be possible perhaps to join all the possible results and discard the ones that don't apply, but I'm not sure that would be good for performance anyway.

    Edit:

    I found a workaround, sort of; I can select the subquery into a temporary table and then cross apply against that temporary table instead of the subquery. That seems to solve the particular problem, but it also forces this function to be a multi-statement table valued function, whereas before it could have been in-line. Since it is calling a multi-statement TVF (one with a gigantic case), I suppose the distinction is largely irrelevant.

    I don't suppose there is some optimizer hint that says "this part is expensive"? I tried force order, but it didn't help (maybe it applies only to joins, not applies).

  • OPTION (RECOMPILE)
    fixed this issue for me.

  • CROSS APPLY is a row level operator. So your table_valued_function will be run once for each row the rest of the query returns.
    So my guess is that the query:
    select *
    from (select distinct pidc.[Line No_]
            from table1 pid
            join table2 pidc
              on pid.[Pidc Code] = pidc.Code
           where pid.Code = 'LA2013'
             and pid.[Line No_] = 10000) pidc

    will return over 70 rows.

  • Thanks for the input. But since I posted this question six years ago, I don't have all the code instrumented and ready to test suggestions right now.

  • LOL, Sorry I didn't notice the original post was that old. Someone else brought it back from the dead.

  • Jonathan AC Roberts - Monday, January 14, 2019 10:35 AM

    CROSS APPLY is a row level operator. So your table_valued_function will be run once for each row the rest of the query returns.
    So my guess is that the query:
    select *
    from (select distinct pidc.[Line No_]
            from table1 pid
            join table2 pidc
              on pid.[Pidc Code] = pidc.Code
           where pid.Code = 'LA2013'
             and pid.[Line No_] = 10000) pidc

    will return over 70 rows.

    I'm pretty sure that's not true for iTVFs.  To be clear as to what I mean, while it may look like an iTVF is working as if it were being called for every row, it works more like a view and isn't actually called once for each row.  That's a part of what makes  iTVFs to much faster than even an equivalent mTVF.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Monday, January 14, 2019 5:54 PM

    Jonathan AC Roberts - Monday, January 14, 2019 10:35 AM

    CROSS APPLY is a row level operator. So your table_valued_function will be run once for each row the rest of the query returns.
    So my guess is that the query:
    select *
    from (select distinct pidc.[Line No_]
            from table1 pid
            join table2 pidc
              on pid.[Pidc Code] = pidc.Code
           where pid.Code = 'LA2013'
             and pid.[Line No_] = 10000) pidc

    will return over 70 rows.

    I'm pretty sure that's not true for iTVFs.  To be clear as to what I mean, while it may look like an iTVF is working as if it were being called for every row, it works more like a view and isn't actually called once for each row.  That's a part of what makes  iTVFs to much faster than even an equivalent mTVF.

    Jeff Moden - Monday, January 14, 2019 5:54 PM

    Jonathan AC Roberts - Monday, January 14, 2019 10:35 AM

    CROSS APPLY is a row level operator. So your table_valued_function will be run once for each row the rest of the query returns.
    So my guess is that the query:
    select *
    from (select distinct pidc.[Line No_]
            from table1 pid
            join table2 pidc
              on pid.[Pidc Code] = pidc.Code
           where pid.Code = 'LA2013'
             and pid.[Line No_] = 10000) pidc

    will return over 70 rows.

    I'm pretty sure that's not true for iTVFs.  To be clear as to what I mean, while it may look like an iTVF is working as if it were being called for every row, it works more like a view and isn't actually called once for each row.  That's a part of what makes  iTVFs to much faster than even an equivalent mTVF.

    It might not be called once for each row but the results are the same as if it were called once for each row. So it's a good way to think about what results you are going to get. And in this case it looks like the optimiser hasn't worked out a better way to do it.

  • Jonathan AC Roberts - Monday, January 14, 2019 7:36 PM

    Jeff Moden - Monday, January 14, 2019 5:54 PM

    Jonathan AC Roberts - Monday, January 14, 2019 10:35 AM

    CROSS APPLY is a row level operator. So your table_valued_function will be run once for each row the rest of the query returns.
    So my guess is that the query:
    select *
    from (select distinct pidc.[Line No_]
            from table1 pid
            join table2 pidc
              on pid.[Pidc Code] = pidc.Code
           where pid.Code = 'LA2013'
             and pid.[Line No_] = 10000) pidc

    will return over 70 rows.

    I'm pretty sure that's not true for iTVFs.  To be clear as to what I mean, while it may look like an iTVF is working as if it were being called for every row, it works more like a view and isn't actually called once for each row.  That's a part of what makes  iTVFs to much faster than even an equivalent mTVF.

    Jeff Moden - Monday, January 14, 2019 5:54 PM

    Jonathan AC Roberts - Monday, January 14, 2019 10:35 AM

    CROSS APPLY is a row level operator. So your table_valued_function will be run once for each row the rest of the query returns.
    So my guess is that the query:
    select *
    from (select distinct pidc.[Line No_]
            from table1 pid
            join table2 pidc
              on pid.[Pidc Code] = pidc.Code
           where pid.Code = 'LA2013'
             and pid.[Line No_] = 10000) pidc

    will return over 70 rows.

    I'm pretty sure that's not true for iTVFs.  To be clear as to what I mean, while it may look like an iTVF is working as if it were being called for every row, it works more like a view and isn't actually called once for each row.  That's a part of what makes  iTVFs to much faster than even an equivalent mTVF.

    It might not be called once for each row but the results are the same as if it were called once for each row. So it's a good way to think about what results you are going to get. And in this case it looks like the optimiser hasn't worked out a better way to do it.

    I just don't want people walking away from this thinking that iTVFs are as bad as a Scalar UDF or and mTVF.  For me, I think of a CROSS APPLY to an iTVF is more like joining to a view but with a lot more control.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply