Slow self join

  • I have a stored procedure which updates a table with the code below.

    It is rather slow (table with 360,000 rows).

    Any suggestions?

    Thanks,

    Julian

    -- ADD average FTE Hrs for the Week based table with daily data

    UPDATE DRPDATA

    SET FTEHrsAVG =

    (

    SELECTAVG(B.FTEHrs)

    FROMDRPDATA B

    WHEREA.MWID = B.MWID

    AND

    DATEPART(ISO_WEEK,A.BEGINDATUM)= DATEPART(ISO_WEEK,B.BEGINDATUM)

    AND

    A.[Org eenheid code] = B.[Org eenheid code]

    AND

    B.FlexVast = 'VAST'

    GROUP BYB.MWID,b.[Org eenheid code], DATEPART(ISO_WEEK,B.BEGINDATUM)

    )

    FROMDRPDATA A;

  • that looks like an accidental cross join to me.

    it explicitly says UPDATE DRPDATA, but since the FROM has the table name as being aliased,and the unaliased table is not referenced anywhere else, there is an implied cross join like this:

    UPDATE DRPDATA

    SET ....

    FROM DRPDATA A ,DRPDATA

    the UPDATE should be UPDATE ALIASNAME explicitly.

    -- ADD average FTE Hrs for the Week based table with daily data

    UPDATE A

    SET A.FTEHrsAVG =

    (

    SELECTAVG(B.FTEHrs)

    FROMDRPDATA B

    WHEREA.MWID = B.MWID

    AND

    DATEPART(ISO_WEEK,A.BEGINDATUM)= DATEPART(ISO_WEEK,B.BEGINDATUM)

    AND

    A.[Org eenheid code] = B.[Org eenheid code]

    AND

    B.FlexVast = 'VAST'

    GROUP BYB.MWID,b.[Org eenheid code], DATEPART(ISO_WEEK,B.BEGINDATUM)

    )

    FROMDRPDATA A;

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Some other things to consider:

    1. Do you have indices on the join columns? (MWID, [Org Eenheid Code], BEGINDATUM)

    2. If you have an index on BEGINDATUM, the expression

    DATEPART(ISO_WEEK,A.BEGINDATUM)= DATEPART(ISO_WEEK,B.BEGINDATUM)

    will stop SQL server from using that index (Since it's a function call it's not a SARG). To get around that, try to reformulate it as

    BEGINDATUM > <beginning of the week> AND BEGINDATUM < <beginning of following week>

  • In this case I would consider using a temporary table

    ๐Ÿ˜Ž

  • Hello Lowell,

    Thanks for your response.

    For some reason the original code is 15 seconds faster (2.25 min iso 2.5 min)....

    Cheers,

    Julian

  • The code is part of a stored procedure which bulk inserts data from a csv file, using a Staging table.

    Thanks for the hint. Job for later.

    Cheers,

    Julian

    gbritton1 (5/6/2014)


    Some other things to consider:

    1. Do you have indices on the join columns? (MWID, [Org Eenheid Code], BEGINDATUM)

    2. If you have an index on BEGINDATUM, the expression

    DATEPART(ISO_WEEK,A.BEGINDATUM)= DATEPART(ISO_WEEK,B.BEGINDATUM)

    will stop SQL server from using that index (Since it's a function call it's not a SARG). To get around that, try to reformulate it as

    BEGINDATUM > <beginning of the week> AND BEGINDATUM < <beginning of following week>

  • Without sample data to test against...

    Try this.

    ;WITH Updater AS (

    SELECT

    FTEHrsAVG,

    NewFTEHrsAVG = AVG(CASE WHEN FlexVast = 'VAST' THEN FTEHrs ELSE NULL END)

    OVER(PARTITION BY MWID, [Org eenheid code], DATEPART(ISO_WEEK,A.BEGINDATUM))

    FROM DRPDATA o

    WHERE EXISTS (

    SELECT 1 FROM DRPDATA i

    WHERE i.MWID = o.MWID

    AND DATEPART(ISO_WEEK,i.BEGINDATUM)= DATEPART(ISO_WEEK,o.BEGINDATUM)

    AND i.[Org eenheid code] = o.[Org eenheid code]

    AND i.FlexVast = 'VAST')

    )

    UPDATE Updater SET FTEHrsAVG = NewFTEHrsAVG

    You might not need the EXISTS check, it depends on what you want to do with unaffected partitions.

    If the number of affected rows is relatively high and unaffected partitions should have FTEHrsAVG set to NULL, then try without the EXISTS check - it should be very quick.


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • ChrisM@home (5/6/2014)


    Without sample data to test against...

    Try this.

    ;WITH Updater AS (

    SELECT

    FTEHrsAVG,

    NewFTEHrsAVG = AVG(CASE WHEN FlexVast = 'VAST' THEN FTEHrs ELSE NULL END)

    OVER(PARTITION BY MWID, [Org eenheid code], DATEPART(ISO_WEEK,A.BEGINDATUM))

    FROM DRPDATA o

    WHERE EXISTS (

    SELECT 1 FROM DRPDATA i

    WHERE i.MWID = o.MWID

    AND DATEPART(ISO_WEEK,i.BEGINDATUM)= DATEPART(ISO_WEEK,o.BEGINDATUM)

    AND i.[Org eenheid code] = o.[Org eenheid code]

    AND i.FlexVast = 'VAST')

    )

    UPDATE Updater SET FTEHrsAVG = NewFTEHrsAVG

    You might not need the EXISTS check, it depends on what you want to do with unaffected partitions.

    If the number of affected rows is relatively high and unaffected partitions should have FTEHrsAVG set to NULL, then try without the EXISTS check - it should be very quick.

    That is amazing. Would you perhaps have a reference to other examples of CTE usage like these?

  • Bouke Bruinsma (5/6/2014)


    ChrisM@home (5/6/2014)


    Without sample data to test against...

    Try this.

    ;WITH Updater AS (

    SELECT

    FTEHrsAVG,

    NewFTEHrsAVG = AVG(CASE WHEN FlexVast = 'VAST' THEN FTEHrs ELSE NULL END)

    OVER(PARTITION BY MWID, [Org eenheid code], DATEPART(ISO_WEEK,A.BEGINDATUM))

    FROM DRPDATA o

    WHERE EXISTS (

    SELECT 1 FROM DRPDATA i

    WHERE i.MWID = o.MWID

    AND DATEPART(ISO_WEEK,i.BEGINDATUM)= DATEPART(ISO_WEEK,o.BEGINDATUM)

    AND i.[Org eenheid code] = o.[Org eenheid code]

    AND i.FlexVast = 'VAST')

    )

    UPDATE Updater SET FTEHrsAVG = NewFTEHrsAVG

    You might not need the EXISTS check, it depends on what you want to do with unaffected partitions.

    If the number of affected rows is relatively high and unaffected partitions should have FTEHrsAVG set to NULL, then try without the EXISTS check - it should be very quick.

    That is amazing. Would you perhaps have a reference to other examples of CTE usage like these?

    Thanks. That's why I lurk here - to learn stuff like this. It's what I'd recommend to you too. When something interesting or new comes up, copy it and pull it apart locally to see how it works - and if you can improve on it.

    โ€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.โ€ - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • I totally agree. I pick up something useful everytime I browse around the forums. And I could give quite a bit of examples of practical usage in my work of stuff I learned here.

    So yeah, i'll be lurking the forums.

  • Hello Chris,

    Thank you very much!

    It is ligthning fast (4 sec a.o.t. 1.5 min).

    Cheers,

    Julian

    ChrisM@home (5/6/2014)


    Without sample data to test against...

    Try this.

    ;WITH Updater AS (

    SELECT

    FTEHrsAVG,

    NewFTEHrsAVG = AVG(CASE WHEN FlexVast = 'VAST' THEN FTEHrs ELSE NULL END)

    OVER(PARTITION BY MWID, [Org eenheid code], DATEPART(ISO_WEEK,A.BEGINDATUM))

    FROM DRPDATA o

    WHERE EXISTS (

    SELECT 1 FROM DRPDATA i

    WHERE i.MWID = o.MWID

    AND DATEPART(ISO_WEEK,i.BEGINDATUM)= DATEPART(ISO_WEEK,o.BEGINDATUM)

    AND i.[Org eenheid code] = o.[Org eenheid code]

    AND i.FlexVast = 'VAST')

    )

    UPDATE Updater SET FTEHrsAVG = NewFTEHrsAVG

    You might not need the EXISTS check, it depends on what you want to do with unaffected partitions.

    If the number of affected rows is relatively high and unaffected partitions should have FTEHrsAVG set to NULL, then try without the EXISTS check - it should be very quick.

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply