Performance optimization of UNIOn

Question

Post reply

Performance optimization of UNIOn

Mohammed Imtiaz

SSC Veteran

Points: 254
More actions
December 8, 2008 at 1:32 am

#216175

Hi,
i have query as shown below
SELECT
'D_Account' AS SourceName
, 'Dimension' AS SourceType
, Fact.Account_MemberId_Label AS MissingLabel
, Fact.Datasource_MemberId_Label AS DataSource
, Fact.Scenario_MemberId_Label AS Scenario
, Sum(Fact.[Value]) AS [Value]
FROM dbo.COI_OSB_WINDOWS_FactData Fact
WHERE NOT EXISTS(SELECT D.Label FROM dbo.D_Account D
WHERE Fact.Account_MemberId_Label = D.Label)
AND Fact.Account_MemberId_Label != 'None'
GROUP BY Fact.Account_MemberId_Label, Fact.Datasource_MemberId_Label, Fact.Scenario_MemberId_Label
UNION
SELECT
'D_Allocation' AS SourceName
, 'Dimension' AS SourceType
, Fact.Allocation_MemberId_Label AS MissingLabel
, Fact.Datasource_MemberId_Label AS DataSource
, Fact.Scenario_MemberId_Label AS Scenario
, Sum(Fact.[Value]) AS [Value]
FROM dbo.COI_OSB_WINDOWS_FactData Fact
WHERE NOT EXISTS(SELECT D.Label FROM D_Allocation D
WHERE Fact.Allocation_MemberId_Label = D.Label)
AND Fact.Allocation_MemberId_Label != 'None'
GROUP BY Fact.Allocation_MemberId_Label, Fact.Datasource_MemberId_Label, Fact.Scenario_MemberId_Label
this union is jus for 2 tables, but i run a query where there are around 24 such tables
i just want to optimize this query as it take long time...
kindly help me out in optimizing it.
Regards
Imtiaz

Viewing 15 posts - 1 through 15 (of 29 total)

You must be logged in to reply to this topic. Login to reply

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 1

Maybe I'm missing something, but the two queries unioned look the same.

Why do you need to union 24 results sets? There may be a better way.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

arup chakraborty Hall of Fame Points: 3640 More actions · Answer 2

UNION ALL is faster than UNION. So, if u allow any duplicate data, then use UNION ALL.

Also u may try to tune each query individually, like

use WHERE NOT EXISTS(SELECT 1 FROM D_Allocation D...

instead of WHERE NOT EXISTS(SELECT D.Label FROM D_Allocation D..

Also grouping at the last of UNIONing 24 queries, not each individual queries.

Mohammed Imtiaz SSC Veteran Points: 254 More actions · Answer 3

Mohammed Imtiaz

SSC Veteran

Points: 254

December 8, 2008 at 2:10 am

#909028

Hi Gail,

The inner query contains different tables

Mohammed Imtiaz SSC Veteran Points: 254 More actions · Answer 4

Hi Arup,

wel can we do something instead of NOT EXIST

Thanks for the grouping trick

Mohammed Imtiaz SSC Veteran Points: 254 More actions · Answer 5

Mohammed Imtiaz

SSC Veteran

Points: 254

December 8, 2008 at 2:30 am

#909037

EACH GROUPING HAS DIFFERENT COLUMN IN IT

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 6

mohd.imtiaz (12/8/2008)
EACH GROUPING HAS DIFFERENT COLUMN IN IT

No need to shout.

Since there's no way that the resultsets can overlap (there's a literal string that's different in each one), use UNION ALL instead of union. Union means concatenate, sort resulting rows and eliminate duplicates. Union all just concatenates rows.

It may still be slow though. What indexes do you have on the fact tables?

Can you post the execution plan of one of the queries within the union? (saved as a .sqlplan file, zipped and attached)

Also u may try to tune each query individually, like
use WHERE NOT EXISTS(SELECT 1 FROM D_Allocation D...
instead of WHERE NOT EXISTS(SELECT D.Label FROM D_Allocation D..

There's no difference between those two. Exists does not do any data retrieval and ignores columns specified.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Linson.Daniel Ten Centuries Points: 1153 More actions · Answer 7

Why dont you make use of a variable/temporary table to put all your data into and then retrieve the data using select *......

I assure you its more easy and better in performance compared to all that union your making use of...:)

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 8

Please don't type in all caps. It's the online equivalent of shouting at someone.

From what I can see from the exec plan, you have no indexes at all on a table with 6 million rows. That's going to be slow no matter what. You needs some effective indexes if this is going to run any faster than it is now.

Can you please post the table definition (create statement) and the definitions of the indexes on that table?

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Grant Fritchey SSC Guru Points: 398716 More actions · Answer 9

You've got a scan going against 6 million rows and returning nothing. That's a problem.

I'd suggest checking the indexes on COI_AppDb. Also, the <> construct can cause problems with index usage.

That's before you get to the UNION. Gail already made the best suggestion of all there. Since each query is already returning a unique result set, use UNION ALL.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 10

mohd.imtiaz (12/9/2008)
Hi Gail,
here is the table defination

Index definitions?

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

arup chakraborty Hall of Fame Points: 3640 More actions · Answer 11

HI Gail,

Want 2 know something.

"Also u may try to tune each query individually, like

use WHERE NOT EXISTS(SELECT 1 FROM D_Allocation D...

instead of WHERE NOT EXISTS(SELECT D.Label FROM D_Allocation D..

There's no difference between those two. Exists does not do any data retrieval and ignores columns specified."

I have a notion than "SELECT D.Label" will take more system resources than "SELECT 1". And since EXISTS works on boolean value, hence no difference between the two but the performance is better for "SELECT 1"

I have actually tested it and SELECT 1 gives lesser duration.

arup chakraborty Hall of Fame Points: 3640 More actions · Answer 12

HI Imtiaz,

EXISTS is the fastest operator to check existence. It works on boolean value and whenever finds any TRUE it terminates. Also, it is not taking the NULL values.

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 13

arup_kc (12/10/2008)
I have a notion than "SELECT D.Label" will take more system resources than "SELECT 1". And since EXISTS works on boolean value, hence no difference between the two but the performance is better for "SELECT 1"

Exists ignored columns passed to it. There's no difference in the way it executes if a column is passed, or if a constant is passed. During the parsing phase, all columns are stripped out of an exists so by the time it gets to the optimiser, there's no columns listed.

I have actually tested it and SELECT 1 gives lesser duration.

Post your test?

In the meantime, here's mine. I ran each one several times, the cpu times of all three vary from around 290 ms to about 422 ms and elapsed time from 1400 to 1600 ms.

Setup first

Create table #TestingExists1 (

id int identity primary key,

somestring char(2)

)

insert into #TestingExists1 (somestring)

select top 1000000 CHAR(97+ FLOOR(RAND(a.number * 5412)*26)) + CHAR(97+ FLOOR(RAND(b.number * 3462)*26))

from master..spt_values a cross join master..spt_values b

where a.name is null and b.name is null

select COUNT(distinct somestring) from #TestingExists1 -- 676 distinct values

select distinct top 100 somestring

into #testingExists2

from #TestingExists1

go

And testing. Done on SQL 2008 Developer edition RTM x64

set statistics io on

go

set statistics time on

go

-- Exists with a column

select * from #TestingExists1 where exists (select somestring from #testingExists2 where #testingExists2.somestring = #TestingExists1.somestring) option (maxdop 1)

/*

Table '#testingExists2'. Scan count 1, logical reads 2

Table '#TestingExists1'. Scan count 3, logical reads 1860

Table 'Worktable'. Scan count 0, logical reads 0

SQL Server Execution Times:

CPU time = 328 ms, elapsed time = 1437 ms.

*/

-- exists with *

select * from #TestingExists1 where exists (select * from #testingExists2 where #testingExists2.somestring = #TestingExists1.somestring) option (maxdop 1)

/*

Table 'Worktable'. Scan count 0, logical reads 0

Table '#TestingExists1'. Scan count 1, logical reads 1860, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table '#testingExists2'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 359 ms, elapsed time = 1426 ms.

*/

-- exists with 1

select * from #TestingExists1 where exists (select 1 from #testingExists2 where #testingExists2.somestring = #TestingExists1.somestring) option (maxdop 1)

/*

Table 'Worktable'. Scan count 0, logical reads 0

Table '#TestingExists1'. Scan count 1, logical reads 1860, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table '#testingExists2'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 344 ms, elapsed time = 1433 ms.

*/

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Adi Cohn-120898 SSC-Dedicated Points: 33944 More actions · Answer 14

arup_kc (12/10/2008)
HI Gail,
Want 2 know something.
"Also u may try to tune each query individually, like
use WHERE NOT EXISTS(SELECT 1 FROM D_Allocation D...
instead of WHERE NOT EXISTS(SELECT D.Label FROM D_Allocation D..
There's no difference between those two. Exists does not do any data retrieval and ignores columns specified."
I have a notion than "SELECT D.Label" will take more system resources than "SELECT 1". And since EXISTS works on boolean value, hence no difference between the two but the performance is better for "SELECT 1"
I have actually tested it and SELECT 1 gives lesser duration.

If you check the query plan for all versions (using a constant such as 1, using a star (*) or using a column name), you'll find out that it has the exact same query plan. Since the query is being performed the exact same way each time, and each time it will pass to the client the same amount of data, I don't see how come one version is faster then the other. Can you explain how did you test it?

Adi

--------------------------------------------------------------
To know how to ask questions and increase the chances of getting asnwers:
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/