where fldValue=123 vs join to table with one row

Question

where fldValue=123 vs join to table with one row

souLTower

SSCarpal Tunnel

Points: 4026
More actions
January 23, 2013 at 6:50 am

#269713

I have a huge DB and a view that pulls records from multiple tables, some as big as 2M records. I have great indexes. I can query: select from myView where fldValue = 123.
It returns 3K records in .3 seconds. Happy. Now I need to return data for multiple fldValue values.
Select from myView where fldValue IN (123,456,789)
Works great. However this stinks:
Select from myView V inner join myTemp T ON V.fldValue = T.fldValue
So for a test, I reduced the temp table to one record. Still falls on its face. I have a few whacky solutions. My question is, why would this happen and is there a non whacky way to make the join to the temp table perform correctly?
Thanks
ST

Viewing 15 posts - 1 through 15 (of 19 total)

You must be logged in to reply to this topic. Login to reply

Roland Alexander STL SSCrazy Points: 2915 More actions · Answer 1

souLTower (1/23/2013)
I have a huge DB and a view that pulls records from multiple tables, some as big as 2M records. I have great indexes. I can query: select from myView where fldValue = 123.
It returns 3K records in .3 seconds. Happy. Now I need to return data for multiple fldValue values.
Select from myView where fldValue IN (123,456,789)
Works great. However this stinks:
Select from myView V inner join myTemp T ON V.fldValue = T.fldValue
So for a test, I reduced the temp table to one record. Still falls on its face. I have a few whacky solutions. My question is, why would this happen and is there a non whacky way to make the join to the temp table perform correctly?
Thanks
ST

What's the DDL for that temp table? Do you have the same PK and indexing on the temp table as you do on the tables underlying the view? Or are you leaving the temp table in a heap?

Roland Alexander
The Monday Morning DBA
There are two means of refuge from the miseries of life: music and cats. ~ Albert Schweitzer

Nils Gustav Stråbø SSChampion Points: 11259 More actions · Answer 2

Can you post the actual execution plan for both queries here?

A wild guess is that parallell execution with hash match or sort is involved, and that the distribution of processed rows between each thread is so skewed that some threads run out of memory to store the hash buckets and therefore spilling them to tempdb.

Look for parallellism operators in the actual exec plan, and see if number of rows processed by each thread is very skewed. Also run a profiler trace on your spid with Sort Warnings and Hash Warnings included when executing the query.

souLTower SSCarpal Tunnel Points: 4026 More actions · Answer 3

The execution plans are pretty big. I'm not sure how to post them.

I get completely different execution plans between fldValue=123 and fldValue IN (select value from tempTable). I know that the optimizer is a mystery but to me, I'm asking for the same thing. It boils down to:

A) where fldValue=CONSTANT

B) where fldValue IN (CONSTANT, CONSTANT2, etc)

C) where fldValue IN (Select from table with 1 row)

Execution plan and performace for A and B is the same. Very fast.

C is terrible

Is there a hint or something I can try? I've tried adding an index to the temp table, using a table variable, no joy. I'm left with:

Create the IN () list as a SQL variable and run the query dynamically.

Query the temp table, get the value I want row by row, query the main query for each value and put the results in a temp table. Then query the temp table.

All of those ideas seem stupid to me.

Thanks for any additional thoughts.

ST

Roland Alexander STL SSCrazy Points: 2915 More actions · Answer 4

souLTower (1/23/2013)
The execution plans are pretty big. I'm not sure how to post them.
I get completely different execution plans between fldValue=123 and fldValue IN (select value from tempTable). I know that the optimizer is a mystery but to me, I'm asking for the same thing. It boils down to:
A) where fldValue=CONSTANT
B) where fldValue IN (CONSTANT, CONSTANT2, etc)
C) where fldValue IN (Select from table with 1 row)
Execution plan and performace for A and B is the same. Very fast.
C is terrible
Is there a hint or something I can try? I've tried adding an index to the temp table, using a table variable, no joy. I'm left with:
Create the IN () list as a SQL variable and run the query dynamically.
Query the temp table, get the value I want row by row, query the main query for each value and put the results in a temp table. Then query the temp table.
All of those ideas seem stupid to me.
Thanks for any additional thoughts.
ST

Still like to know how that temp table is organized...

Roland Alexander
The Monday Morning DBA
There are two means of refuge from the miseries of life: music and cats. ~ Albert Schweitzer

souLTower SSCarpal Tunnel Points: 4026 More actions · Answer 5

Temp table is a heap with 1 row, 2 columns of int, int.

Looking for how to post an execution plan.

souLTower SSCarpal Tunnel Points: 4026 More actions · Answer 6

souLTower

SSCarpal Tunnel

Points: 4026

January 23, 2013 at 9:09 am

#1580157

Uploaded the two execution plans.

Thanks!

ST

souLTower SSCarpal Tunnel Points: 4026 More actions · Answer 7

Well, I can see the difference. I'm interested in knowing if there's a way to improve it.

When I use a constant in my query, the data is filtered at the first query in the execution plan. This greatly reduces the records upstream. When I use IN (temp), the records are filtered about halfway through the execution.

ScottPletcher SSC Guru Points: 100940 More actions · Answer 8

You need to get rid of the massive GROUP BY; you should be able to do the SUM() in a subquery rather than the main query, but I can't be sure which columns come from which views/tables so I can't rewrite any part of the query now.

Until then, at least GROUP BY the ORDER BY columns first, to reduce the sorting required to satisfy the query, viz:

GROUP BY fldConsulNo, fldCCGroupCd, fldCCSortOrder,

RC.fldCCCd, fldSubCCNo,

fldCCGroupDesc,

fldConsulName,

tblConsul.fldConsulKey,

fldCCDesc, fldSubCCName, v.fldCostCenterKey

ORDER BY fldConsulNo, fldCCGroupCd, fldCCSortOrder, fldCCCd , fldSubCCNo

SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

souLTower SSCarpal Tunnel Points: 4026 More actions · Answer 9

That helps but still doesn't explain why the

where IN (sub select)

has such a different execution plan than

WHERE IN (constants)

What I've done to improve the performance (and I'm not really proud of) is to loop over the list of values, query the main view into a temp table using WHERE fldValue = syntax, then query the temp table for the final output. I tried changing indexes, join hints, lots of things but this worked best and was extensible whether querying against 1 value or 100.

Sergiy SSC Guru Points: 110206 More actions · Answer 10

souLTower (1/24/2013)
That helps but still doesn't explain why the
where IN (sub select)
has such a different execution plan than
WHERE IN (constants)
What I've done to improve the performance (and I'm not really proud of) is to loop over the list of values, query the main view into a temp table using WHERE fldValue = syntax, then query the temp table for the final output. I tried changing indexes, join hints, lots of things but this worked best and was extensible whether querying against 1 value or 100.

When you operate with constants SQL Server builds an execution plan considering the actual values assigned to the constants.

When it selects it from the table it jhas no idea what kind of values are in there (because the table is a heap) and it has to build more "pessimistic", but more universal plan, which might be not that good in this particular case but will perform better in average in other cases.

Add an index to your lookup table. It should help the optimizer in building more effective plan considering the current data in the table.

_____________
Code for TallyGenerator

Sean Pearce SSCoach Points: 15999 More actions · Answer 11

I have downloaded the plans and they are exactly the same. I downloaded them twice in case of PEBKAC issues.

The SQL Guy @ blogspot[/url]

@SeanPearceSQL

About Me[/url]

souLTower SSCarpal Tunnel Points: 4026 More actions · Answer 12

Sorry, I just realized that the fast and slow executions are included in each file. Compare the top one to the bottom one. You can see in the fast one that the execution plan starts by using "Where tblConsul.fldPlanKey=30472"

In the slower plan, the index on the temp table (IDX_TMP) isn't introduced until midway through the execution. This is after 7+ million rows have been queried to the right of this execution point.

So my question, which I understand may not be able to be answered, is why is a different table used to determine the filter in the two different cases AND, is there a way to prevent it or to let the optimizer know that there is a more efficient strategy?

ST

Sean Pearce SSCoach Points: 15999 More actions · Answer 13

My impression is that your query is too complicated and the query optimizer has no chance of compiling the most efficient plan. You can see this in the execution plan you posted.

StatementOptmEarlyAbortReason="TimeOut"

This means that the optimizer timed out while evaluating plans and occurs in both of your queries.

You mention that you want to know the difference between

Select from myView where fldValue IN (123,456,789)

and

Select from myView V inner join myTemp T ON V.fldValue = T.fldValue

However your queries are so complicated it is preventing you from comparing an apple with an apple.

If I were you I wouldn't be at all interested in the above comparison, but rather in breaking up the query to give the optimizer a better chance.

Most alarming to me is MemoryGrant="25184" for query 1 and MemoryGrant="529712" for query 2.

http://blogs.msdn.com/b/sqlqueryprocessing/archive/2010/02/16/understanding-sql-server-memory-grant.aspx

Query 1 has 9 hash matches while query 2 has 14 hash matches.

The SQL Guy @ blogspot[/url]

@SeanPearceSQL

About Me[/url]

Sean Pearce SSCoach Points: 15999 More actions · Answer 14

Can you tell me how the following two queries compare please.

PLEASE NOTE: You have not aliased all your columns so I have no idea which tables they come from. You must edit the following queries to remove any non-aliased columns that are not from vwREPQICASSCostsByCC.

SELECT

fldConsulNo,

fldConsulName,

fldCCGroupCd,

fldCCGroupDesc,

fldCCSortOrder,

fldCCDesc,

fldSubCCNo,

fldSubCCName,

v.fldCostCenterKey,

fldAmount

v.fldPlanKey

v.fldCostCenterKey

v.fldConsulKey

v.fldSubCostCenterKey

FROM

vwREPQICASSCostsByCC v

WHERE

v.fldStageKey = 4

AND v.fldPLanKey IN (30472);

SELECT

fldConsulNo,

fldConsulName,

fldCCGroupCd,

fldCCGroupDesc,

fldCCSortOrder,

fldCCDesc,

fldSubCCNo,

fldSubCCName,

v.fldCostCenterKey,

fldAmount

v.fldPlanKey

v.fldCostCenterKey

v.fldConsulKey

v.fldSubCostCenterKey

FROM

vwREPQICASSCostsByCC v

WHERE

v.fldStageKey = 4

ANDv.fldPLanKey IN (SELECT X.fldIMPPlanKey FROM #tmpPlan X);

The SQL Guy @ blogspot[/url]

@SeanPearceSQL

About Me[/url]