Tuning slow query

  • LOOKUP_BI-756009

    SSCertifiable

    Points: 7286

    SELECT DISTINCT A.Status

    FROM dbo.Test1 A

    LEFT JOIN dbo.Test2 B

    ON A.Status= B.Status

    WHERE B.Status IS NULL

    AND A.Status IS NOT NULL

    Number of rows in dbo.Test1 A = 150 000

    Number of rows in dbo.Test2 B = 4500

    There are clusterd indexes on both table on ID column

    This query is taking 35 minutes.

    How can I make it faster

    I have tested the following

    1) Included column A.Status & B.Status on the clustered index -Time taken 34 minutes

    The query execution plan shows that the LEFT JOIN operation cost is 60% and the filter on B.Status IS NULL is 18%.These two task has the highest %

  • John Rowan

    SSC Guru

    Points: 56440

    For starters, I'd consider indexing the Status columns. Also, remove the Distinct and add a GROUP BY A.Status clause.

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • LOOKUP_BI-756009

    SSCertifiable

    Points: 7286

    Hi John,

    I did include column Status into clusterd Index for both table A & B.

    The Status column in table A has many duplicates, thats the reason for the distinct

  • John Rowan

    SSC Guru

    Points: 56440

    So the clustered index is a composit index? What other columns are in it?

    It would help if you could post the table DDL (full DDL w/ index defs too), a few rows of sample data, and most importantly the execution plan.

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • LOOKUP_BI-756009

    SSCertifiable

    Points: 7286

    John Rowan (12/9/2010)


    So the clustered index is a composit index? What other columns are in it?

    It would help if you could post the table DDL (full DDL w/ index defs too), a few rows of sample data, and most importantly the execution plan.

    Initially just the column ID was the clusterd index on both tables, but I had included column Status to the clustered index for my testing

    I will post the DDL shortly, meanwhile I tried this and it seemed to run much faster.Could you have a look at it and confirm if my logic is correct

    SELECT DISTINCT A.Status INTO #TEMP FROM dbo.Test1 A where A.Status IS NOT NULL

    SELECT C.Status from #TEMP C

    LEFT JOIN dbo.Test2 B

    ON C.Status= B.Status

    WHERE B.Status IS NULL

    IT took roughly 1 minute

  • John Rowan

    SSC Guru

    Points: 56440

    An execution plan here would help a bunch. You shouldn't have to bother with the temp table to make this work.

    The challenge with adding Status to the clustered index as a secondary column is that you are not using the first column in the index in the query. So if your clustered index is ID, Status and you are not using ID in the query, its not really taking advantage of the index like it could.

    The clustered index key is included in all non-clutered indexes, so creating a non-clustered index on the Status columns is probably the way to go here.

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • LOOKUP_BI-756009

    SSCertifiable

    Points: 7286

    I got your point John, let me give that a try first

  • WO

    Old Hand

    Points: 339

    I agree with John that the execution plan would be very informative, and that the temp table shouldn't be necessary.

    Have you tried rewriting the query using an exists clause instead of the left outer join? since the population in table test2 is relatively small i would expect this to give a better result

    something like:

    select distinct a.status

    from dbo.test1 a

    where a.status not in (select distinct b.status

    from dbo.test2 b)

    and a.status is not null

    Willem


  • Jeff Moden

    SSC Guru

    Points: 996622

    LOOKUP_BI-756009 (12/9/2010)


    SELECT DISTINCT A.Status

    FROM dbo.Test1 A

    LEFT JOIN dbo.Test2 B

    ON A.Status= B.Status

    WHERE B.Status IS NULL

    AND A.Status IS NOT NULL

    Number of rows in dbo.Test1 A = 150 000

    Number of rows in dbo.Test2 B = 4500

    There are clusterd indexes on both table on ID column

    This query is taking 35 minutes.

    [font="Arial Black"]How can I make it faster[/font]

    I have tested the following

    1) Included column A.Status & B.Status on the clustered index -Time taken 34 minutes

    The query execution plan shows that the LEFT JOIN operation cost is 60% and the filter on B.Status IS NULL is 18%.These two task has the highest %

    Just going by the names of the columns in the ON clause... you probably can't make this as fast as you'd like even with indexes. You've most likely made a many-to-many join which is really a Cartesian Join (ie: CROSS JOIN). You need additional JOIN criteria to make this problem go away.

    Of course, that's a total guess on my part because none of us actually know anything about your tables or their columns. My suggestion would be to read and heed the article at the second link in my signature line below.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "If "pre-optimization" is the root of all evil, then what does the resulting no optimization lead to?"

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • TheSQLGuru

    SSC Guru

    Points: 134017

    I'm with Jeff on this, as usual. Show the estimated query plan and not the incredibly FAT join line. Put your mouse over it and see the telephone number that pops up for estimated rows.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • LOOKUP_BI-756009

    SSCertifiable

    Points: 7286

    Yes, I tried John's response and did much more quicker. Thank You all.

  • Jeff Moden

    SSC Guru

    Points: 996622

    LOOKUP_BI-756009 (12/13/2010)


    Yes, I tried John's response and did much more quicker. Thank You all.

    Still, I believe you have a many-to-many join and a bit of extra criteria would help. If additional criteria is not possible, you may want to have a look at the overall design of the tables and data... a change may be in order. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "If "pre-optimization" is the root of all evil, then what does the resulting no optimization lead to?"

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • 250128836

    Newbie

    Points: 3

    Please test Query SQL:

    SELECT DISTINCT A.Status

    FROM dbo.Test1 A

    LEFT JOIN dbo.Test2 B

    ON A.Status= B.Status

    WHERE A.Status IS NOT NULL

    AND B.Status IS NULL

    Because of Number of rows in dbo.Test1 A more than Number of rows in dbo.Test1 B

    😛

  • Malcolm Daughtree

    SSCrazy

    Points: 2373

    250128836 (12/15/2010)


    Please test Query SQL:

    SELECT DISTINCT A.Status

    FROM dbo.Test1 A

    LEFT JOIN dbo.Test2 B

    ON A.Status= B.Status

    WHERE A.Status IS NOT NULL

    AND B.Status IS NULL

    Because of Number of rows in dbo.Test1 A more than Number of rows in dbo.Test1 B

    😛

    Not being picky - well Ok I am but the above query makes no sence....

    as the query is internaly processed

    Step 1

    FROM dbo.Test1 A

    LEFT JOIN dbo.Test2 B ON A.Status = B.Status

    every thing a AND those IN b that has the same status AS A

    A LEFT JOIN will show every Test1 status and all the matching Test2.status records Nulls are not evaluated

    hence negating the need for the where clause becauce only B records with an equalling A Status will be returned.

    Left joins, right joins, and Inner joins are 'Exclusive' joins if you want to use the following where clause

    WHERE A.Status IS NOT NULL

    AND B.Status IS NULL

    Then you will need to use the inclusive OUTER join and then evaluate for nulls.

    Lastly remove the Distinct keyword and include instead a GROUP BY A.status because the Group by clause is actioned after the where clause and you are forcing SQL to double handle its internal grouping.

Viewing 14 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply