Performance optimization of UNIOn

  • Hi,

    i have query as shown below

    SELECT

    'D_Account' AS SourceName

    , 'Dimension' AS SourceType

    , Fact.Account_MemberId_Label AS MissingLabel

    , Fact.Datasource_MemberId_Label AS DataSource

    , Fact.Scenario_MemberId_Label AS Scenario

    , Sum(Fact.[Value]) AS [Value]

    FROM dbo.COI_OSB_WINDOWS_FactData Fact

    WHERE NOT EXISTS(SELECT D.Label FROM dbo.D_Account D

    WHERE Fact.Account_MemberId_Label = D.Label)

    AND Fact.Account_MemberId_Label != 'None'

    GROUP BY Fact.Account_MemberId_Label, Fact.Datasource_MemberId_Label, Fact.Scenario_MemberId_Label

    UNION

    SELECT

    'D_Allocation' AS SourceName

    , 'Dimension' AS SourceType

    , Fact.Allocation_MemberId_Label AS MissingLabel

    , Fact.Datasource_MemberId_Label AS DataSource

    , Fact.Scenario_MemberId_Label AS Scenario

    , Sum(Fact.[Value]) AS [Value]

    FROM dbo.COI_OSB_WINDOWS_FactData Fact

    WHERE NOT EXISTS(SELECT D.Label FROM D_Allocation D

    WHERE Fact.Allocation_MemberId_Label = D.Label)

    AND Fact.Allocation_MemberId_Label != 'None'

    GROUP BY Fact.Allocation_MemberId_Label, Fact.Datasource_MemberId_Label, Fact.Scenario_MemberId_Label

    this union is jus for 2 tables, but i run a query where there are around 24 such tables

    i just want to optimize this query as it take long time...

    kindly help me out in optimizing it.

    Regards

    Imtiaz

  • Maybe I'm missing something, but the two queries unioned look the same.

    Why do you need to union 24 results sets? There may be a better way.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • UNION ALL is faster than UNION. So, if u allow any duplicate data, then use UNION ALL.

    Also u may try to tune each query individually, like

    use WHERE NOT EXISTS(SELECT 1 FROM D_Allocation D...

    instead of WHERE NOT EXISTS(SELECT D.Label FROM D_Allocation D..

    Also grouping at the last of UNIONing 24 queries, not each individual queries.

  • Hi Gail,

    The inner query contains different tables

  • Hi Arup,

    wel can we do something instead of NOT EXIST

    Thanks for the grouping trick

  • EACH GROUPING HAS DIFFERENT COLUMN IN IT

  • mohd.imtiaz (12/8/2008)


    EACH GROUPING HAS DIFFERENT COLUMN IN IT

    No need to shout.

    Since there's no way that the resultsets can overlap (there's a literal string that's different in each one), use UNION ALL instead of union. Union means concatenate, sort resulting rows and eliminate duplicates. Union all just concatenates rows.

    It may still be slow though. What indexes do you have on the fact tables?

    Can you post the execution plan of one of the queries within the union? (saved as a .sqlplan file, zipped and attached)

    Also u may try to tune each query individually, like

    use WHERE NOT EXISTS(SELECT 1 FROM D_Allocation D...

    instead of WHERE NOT EXISTS(SELECT D.Label FROM D_Allocation D..

    There's no difference between those two. Exists does not do any data retrieval and ignores columns specified.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Why dont you make use of a variable/temporary table to put all your data into and then retrieve the data using select *......

    I assure you its more easy and better in performance compared to all that union your making use of...:)

  • Please don't type in all caps. It's the online equivalent of shouting at someone.

    From what I can see from the exec plan, you have no indexes at all on a table with 6 million rows. That's going to be slow no matter what. You needs some effective indexes if this is going to run any faster than it is now.

    Can you please post the table definition (create statement) and the definitions of the indexes on that table?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • You've got a scan going against 6 million rows and returning nothing. That's a problem.

    I'd suggest checking the indexes on COI_AppDb. Also, the <> construct can cause problems with index usage.

    That's before you get to the UNION. Gail already made the best suggestion of all there. Since each query is already returning a unique result set, use UNION ALL.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • mohd.imtiaz (12/9/2008)


    Hi Gail,

    here is the table defination

    Index definitions?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • HI Gail,

    Want 2 know something.

    "Also u may try to tune each query individually, like

    use WHERE NOT EXISTS(SELECT 1 FROM D_Allocation D...

    instead of WHERE NOT EXISTS(SELECT D.Label FROM D_Allocation D..

    There's no difference between those two. Exists does not do any data retrieval and ignores columns specified."

    I have a notion than "SELECT D.Label" will take more system resources than "SELECT 1". And since EXISTS works on boolean value, hence no difference between the two but the performance is better for "SELECT 1"

    I have actually tested it and SELECT 1 gives lesser duration.

  • HI Imtiaz,

    EXISTS is the fastest operator to check existence. It works on boolean value and whenever finds any TRUE it terminates. Also, it is not taking the NULL values.

  • arup_kc (12/10/2008)


    I have a notion than "SELECT D.Label" will take more system resources than "SELECT 1". And since EXISTS works on boolean value, hence no difference between the two but the performance is better for "SELECT 1"

    Exists ignored columns passed to it. There's no difference in the way it executes if a column is passed, or if a constant is passed. During the parsing phase, all columns are stripped out of an exists so by the time it gets to the optimiser, there's no columns listed.

    I have actually tested it and SELECT 1 gives lesser duration.

    Post your test?

    In the meantime, here's mine. I ran each one several times, the cpu times of all three vary from around 290 ms to about 422 ms and elapsed time from 1400 to 1600 ms.

    Setup first

    Create table #TestingExists1 (

    id int identity primary key,

    somestring char(2)

    )

    insert into #TestingExists1 (somestring)

    select top 1000000 CHAR(97+ FLOOR(RAND(a.number * 5412)*26)) + CHAR(97+ FLOOR(RAND(b.number * 3462)*26))

    from master..spt_values a cross join master..spt_values b

    where a.name is null and b.name is null

    select COUNT(distinct somestring) from #TestingExists1 -- 676 distinct values

    select distinct top 100 somestring

    into #testingExists2

    from #TestingExists1

    go

    And testing. Done on SQL 2008 Developer edition RTM x64

    set statistics io on

    go

    set statistics time on

    go

    -- Exists with a column

    select * from #TestingExists1 where exists (select somestring from #testingExists2 where #testingExists2.somestring = #TestingExists1.somestring) option (maxdop 1)

    /*

    Table '#testingExists2'. Scan count 1, logical reads 2

    Table '#TestingExists1'. Scan count 3, logical reads 1860

    Table 'Worktable'. Scan count 0, logical reads 0

    SQL Server Execution Times:

    CPU time = 328 ms, elapsed time = 1437 ms.

    */

    -- exists with *

    select * from #TestingExists1 where exists (select * from #testingExists2 where #testingExists2.somestring = #TestingExists1.somestring) option (maxdop 1)

    /*

    Table 'Worktable'. Scan count 0, logical reads 0

    Table '#TestingExists1'. Scan count 1, logical reads 1860, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#testingExists2'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 359 ms, elapsed time = 1426 ms.

    */

    -- exists with 1

    select * from #TestingExists1 where exists (select 1 from #testingExists2 where #testingExists2.somestring = #TestingExists1.somestring) option (maxdop 1)

    /*

    Table 'Worktable'. Scan count 0, logical reads 0

    Table '#TestingExists1'. Scan count 1, logical reads 1860, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#testingExists2'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 344 ms, elapsed time = 1433 ms.

    */

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • arup_kc (12/10/2008)


    HI Gail,

    Want 2 know something.

    "Also u may try to tune each query individually, like

    use WHERE NOT EXISTS(SELECT 1 FROM D_Allocation D...

    instead of WHERE NOT EXISTS(SELECT D.Label FROM D_Allocation D..

    There's no difference between those two. Exists does not do any data retrieval and ignores columns specified."

    I have a notion than "SELECT D.Label" will take more system resources than "SELECT 1". And since EXISTS works on boolean value, hence no difference between the two but the performance is better for "SELECT 1"

    I have actually tested it and SELECT 1 gives lesser duration.

    If you check the query plan for all versions (using a constant such as 1, using a star (*) or using a column name), you'll find out that it has the exact same query plan. Since the query is being performed the exact same way each time, and each time it will pass to the client the same amount of data, I don't see how come one version is faster then the other. Can you explain how did you test it?

    Adi

    --------------------------------------------------------------
    To know how to ask questions and increase the chances of getting asnwers:
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Viewing 15 posts - 1 through 15 (of 29 total)

You must be logged in to reply to this topic. Login to reply