where fldValue=123 vs join to table with one row

  • I have a huge DB and a view that pulls records from multiple tables, some as big as 2M records. I have great indexes. I can query: select from myView where fldValue = 123.

    It returns 3K records in .3 seconds. Happy. Now I need to return data for multiple fldValue values.

    Select from myView where fldValue IN (123,456,789)

    Works great. However this stinks:

    Select from myView V inner join myTemp T ON V.fldValue = T.fldValue

    So for a test, I reduced the temp table to one record. Still falls on its face. I have a few whacky solutions. My question is, why would this happen and is there a non whacky way to make the join to the temp table perform correctly?

    Thanks

    ST

  • souLTower (1/23/2013)


    I have a huge DB and a view that pulls records from multiple tables, some as big as 2M records. I have great indexes. I can query: select from myView where fldValue = 123.

    It returns 3K records in .3 seconds. Happy. Now I need to return data for multiple fldValue values.

    Select from myView where fldValue IN (123,456,789)

    Works great. However this stinks:

    Select from myView V inner join myTemp T ON V.fldValue = T.fldValue

    So for a test, I reduced the temp table to one record. Still falls on its face. I have a few whacky solutions. My question is, why would this happen and is there a non whacky way to make the join to the temp table perform correctly?

    Thanks

    ST

    What's the DDL for that temp table? Do you have the same PK and indexing on the temp table as you do on the tables underlying the view? Or are you leaving the temp table in a heap?

    Roland Alexander 
    The Monday Morning DBA 
    There are two means of refuge from the miseries of life: music and cats. ~ Albert Schweitzer

  • Can you post the actual execution plan for both queries here?

    A wild guess is that parallell execution with hash match or sort is involved, and that the distribution of processed rows between each thread is so skewed that some threads run out of memory to store the hash buckets and therefore spilling them to tempdb.

    Look for parallellism operators in the actual exec plan, and see if number of rows processed by each thread is very skewed. Also run a profiler trace on your spid with Sort Warnings and Hash Warnings included when executing the query.

  • The execution plans are pretty big. I'm not sure how to post them.

    I get completely different execution plans between fldValue=123 and fldValue IN (select value from tempTable). I know that the optimizer is a mystery but to me, I'm asking for the same thing. It boils down to:

    A) where fldValue=CONSTANT

    B) where fldValue IN (CONSTANT, CONSTANT2, etc)

    C) where fldValue IN (Select from table with 1 row)

    Execution plan and performace for A and B is the same. Very fast.

    C is terrible

    Is there a hint or something I can try? I've tried adding an index to the temp table, using a table variable, no joy. I'm left with:

    Create the IN () list as a SQL variable and run the query dynamically.

    Query the temp table, get the value I want row by row, query the main query for each value and put the results in a temp table. Then query the temp table.

    All of those ideas seem stupid to me.

    Thanks for any additional thoughts.

    ST

  • souLTower (1/23/2013)


    The execution plans are pretty big. I'm not sure how to post them.

    I get completely different execution plans between fldValue=123 and fldValue IN (select value from tempTable). I know that the optimizer is a mystery but to me, I'm asking for the same thing. It boils down to:

    A) where fldValue=CONSTANT

    B) where fldValue IN (CONSTANT, CONSTANT2, etc)

    C) where fldValue IN (Select from table with 1 row)

    Execution plan and performace for A and B is the same. Very fast.

    C is terrible

    Is there a hint or something I can try? I've tried adding an index to the temp table, using a table variable, no joy. I'm left with:

    Create the IN () list as a SQL variable and run the query dynamically.

    Query the temp table, get the value I want row by row, query the main query for each value and put the results in a temp table. Then query the temp table.

    All of those ideas seem stupid to me.

    Thanks for any additional thoughts.

    ST

    Still like to know how that temp table is organized...

    Roland Alexander 
    The Monday Morning DBA 
    There are two means of refuge from the miseries of life: music and cats. ~ Albert Schweitzer

  • Temp table is a heap with 1 row, 2 columns of int, int.

    Looking for how to post an execution plan.

  • Uploaded the two execution plans.

    Thanks!

    ST

  • Well, I can see the difference. I'm interested in knowing if there's a way to improve it.

    When I use a constant in my query, the data is filtered at the first query in the execution plan. This greatly reduces the records upstream. When I use IN (temp), the records are filtered about halfway through the execution.

  • You need to get rid of the massive GROUP BY; you should be able to do the SUM() in a subquery rather than the main query, but I can't be sure which columns come from which views/tables so I can't rewrite any part of the query now.

    Until then, at least GROUP BY the ORDER BY columns first, to reduce the sorting required to satisfy the query, viz:

    GROUP BY fldConsulNo, fldCCGroupCd, fldCCSortOrder,

    RC.fldCCCd, fldSubCCNo,

    fldCCGroupDesc,

    fldConsulName,

    tblConsul.fldConsulKey,

    fldCCDesc, fldSubCCName, v.fldCostCenterKey

    ORDER BY fldConsulNo, fldCCGroupCd, fldCCSortOrder, fldCCCd , fldSubCCNo

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • That helps but still doesn't explain why the

    where IN (sub select)

    has such a different execution plan than

    WHERE IN (constants)

    What I've done to improve the performance (and I'm not really proud of) is to loop over the list of values, query the main view into a temp table using WHERE fldValue = syntax, then query the temp table for the final output. I tried changing indexes, join hints, lots of things but this worked best and was extensible whether querying against 1 value or 100.

  • souLTower (1/24/2013)


    That helps but still doesn't explain why the

    where IN (sub select)

    has such a different execution plan than

    WHERE IN (constants)

    What I've done to improve the performance (and I'm not really proud of) is to loop over the list of values, query the main view into a temp table using WHERE fldValue = syntax, then query the temp table for the final output. I tried changing indexes, join hints, lots of things but this worked best and was extensible whether querying against 1 value or 100.

    When you operate with constants SQL Server builds an execution plan considering the actual values assigned to the constants.

    When it selects it from the table it jhas no idea what kind of values are in there (because the table is a heap) and it has to build more "pessimistic", but more universal plan, which might be not that good in this particular case but will perform better in average in other cases.

    Add an index to your lookup table. It should help the optimizer in building more effective plan considering the current data in the table.

    _____________
    Code for TallyGenerator

  • I have downloaded the plans and they are exactly the same. I downloaded them twice in case of PEBKAC issues.

    The SQL Guy @ blogspot[/url]

    @SeanPearceSQL

    About Me[/url]

  • Sorry, I just realized that the fast and slow executions are included in each file. Compare the top one to the bottom one. You can see in the fast one that the execution plan starts by using "Where tblConsul.fldPlanKey=30472"

    In the slower plan, the index on the temp table (IDX_TMP) isn't introduced until midway through the execution. This is after 7+ million rows have been queried to the right of this execution point.

    So my question, which I understand may not be able to be answered, is why is a different table used to determine the filter in the two different cases AND, is there a way to prevent it or to let the optimizer know that there is a more efficient strategy?

    ST

  • My impression is that your query is too complicated and the query optimizer has no chance of compiling the most efficient plan. You can see this in the execution plan you posted.

    StatementOptmEarlyAbortReason="TimeOut"

    This means that the optimizer timed out while evaluating plans and occurs in both of your queries.

    You mention that you want to know the difference between

    Select from myView where fldValue IN (123,456,789)

    and

    Select from myView V inner join myTemp T ON V.fldValue = T.fldValue

    However your queries are so complicated it is preventing you from comparing an apple with an apple.

    If I were you I wouldn't be at all interested in the above comparison, but rather in breaking up the query to give the optimizer a better chance.

    Most alarming to me is MemoryGrant="25184" for query 1 and MemoryGrant="529712" for query 2.

    http://blogs.msdn.com/b/sqlqueryprocessing/archive/2010/02/16/understanding-sql-server-memory-grant.aspx

    Query 1 has 9 hash matches while query 2 has 14 hash matches.

    The SQL Guy @ blogspot[/url]

    @SeanPearceSQL

    About Me[/url]

  • Can you tell me how the following two queries compare please.

    PLEASE NOTE: You have not aliased all your columns so I have no idea which tables they come from. You must edit the following queries to remove any non-aliased columns that are not from vwREPQICASSCostsByCC.

    SELECT

    fldConsulNo,

    fldConsulName,

    fldCCGroupCd,

    fldCCGroupDesc,

    fldCCSortOrder,

    fldCCDesc,

    fldSubCCNo,

    fldSubCCName,

    v.fldCostCenterKey,

    fldAmount

    v.fldPlanKey

    v.fldCostCenterKey

    v.fldConsulKey

    v.fldSubCostCenterKey

    FROM

    vwREPQICASSCostsByCC v

    WHERE

    v.fldStageKey = 4

    AND v.fldPLanKey IN (30472);

    SELECT

    fldConsulNo,

    fldConsulName,

    fldCCGroupCd,

    fldCCGroupDesc,

    fldCCSortOrder,

    fldCCDesc,

    fldSubCCNo,

    fldSubCCName,

    v.fldCostCenterKey,

    fldAmount

    v.fldPlanKey

    v.fldCostCenterKey

    v.fldConsulKey

    v.fldSubCostCenterKey

    FROM

    vwREPQICASSCostsByCC v

    WHERE

    v.fldStageKey = 4

    ANDv.fldPLanKey IN (SELECT X.fldIMPPlanKey FROM #tmpPlan X);

    The SQL Guy @ blogspot[/url]

    @SeanPearceSQL

    About Me[/url]

Viewing 15 posts - 1 through 15 (of 19 total)

You must be logged in to reply to this topic. Login to reply