query performance tuning

  • Hello! I am running a query which is taking 23 mins to run in my dev server. I need your expert advice on how to improve the performance. I can change the architecture a little bit as it is still in dev environment.

    declare

    @market varchar(5),

    @EIA_plant_id int = 999999,

    @EIA_unit_id varchar(16) = '0000',

    @data_type int,

    @startdate datetime,

    @enddate datetime,

    @starthour int,

    @endhour int

    set @market = N'DA',

    set @data_type = 0

    set @startdate = N'2015-12-01'

    set @enddate = N'2015-12-10'

    set @starthour = 0

    set @endhour = 23

    -- DA Raw

    IF @market = 'DA' and @data_type = 0

    BEGIN

    SELECT

    --CASE WHEN max(gb.bidding_group_id) is not null or @EIA_unit_id != '0000' THEN min(gu.unit_id) ELSE max(gu.unit_id) END as unit_id,

    CASE WHEN max(gb.bidding_group_id) is null or @EIA_unit_id != '0000' THEN max(gu.unit_name) ELSE max(gb.bidding_group_name) END as unit_name

    ,max(gu.[EIA_plant_code]) as EIA_plant_ID

    ,CASE WHEN max(gb.bidding_group_id) is null or @EIA_unit_id != '0000' THEN max(gu.eia_unit_id) ELSE NULL END as EIA_unit_ID,

    max(br.Region) as region,

    --max(br.[Owner Code]) as owner_code,

    --br.[Unit Code] as bid_code_id,

    max(br.[Unit Type]) as unit_type_code,

    br.[Date Time Beginning (est)] AS [date_beginning_EST],

    max(br.price1) AS bid1,

    max(br.price2) AS bid2,

    max(br.price3) AS bid3,

    max(br.price4) AS bid4,

    max(br.price5) AS bid5,

    max(br.price6) AS bid6,

    max(br.price7) AS bid7,

    max(br.price8) AS bid8,

    max(br.price9) AS bid9,

    max(br.price10) AS bid10,

    CASE WHEN max(gb.bidding_group_id) is null or @EIA_unit_id != '0000' THEN convert(decimal(6,2),max(uGen.[weight] * br.mw1)) ELSE max(br.mw1) END AS MW1,

    CASE WHEN max(gb.bidding_group_id) is null or @EIA_unit_id != '0000' THEN convert(decimal(6,2),max(uGen.[weight] * br.mw2)) ELSE max(br.mw2) END AS MW2,

    CASE WHEN max(gb.bidding_group_id) is null or @EIA_unit_id != '0000' THEN convert(decimal(6,2),max(uGen.[weight] * br.mw3)) ELSE max(br.mw3) END AS MW3,

    CASE WHEN max(gb.bidding_group_id) is null or @EIA_unit_id != '0000' THEN convert(decimal(6,2),max(uGen.[weight] * br.mw4)) ELSE max(br.mw4) END AS MW4,

    CASE WHEN max(gb.bidding_group_id) is null or @EIA_unit_id != '0000' THEN convert(decimal(6,2),max(uGen.[weight] * br.mw5)) ELSE max(br.mw5) END AS MW5,

    CASE WHEN max(gb.bidding_group_id) is null or @EIA_unit_id != '0000' THEN convert(decimal(6,2),max(uGen.[weight] * br.mw6)) ELSE max(br.mw6) END AS MW6,

    CASE WHEN max(gb.bidding_group_id) is null or @EIA_unit_id != '0000' THEN convert(decimal(6,2),max(uGen.[weight] * br.mw7)) ELSE max(br.mw7) END AS MW7,

    CASE WHEN max(gb.bidding_group_id) is null or @EIA_unit_id != '0000' THEN convert(decimal(6,2),max(uGen.[weight] * br.mw8)) ELSE max(br.mw8) END AS MW8,

    CASE WHEN max(gb.bidding_group_id) is null or @EIA_unit_id != '0000' THEN convert(decimal(6,2),max(uGen.[weight] * br.mw9)) ELSE max(br.mw9) END AS MW9,

    CASE WHEN max(gb.bidding_group_id) is null or @EIA_unit_id != '0000' THEN convert(decimal(6,2),max(uGen.[weight] * br.mw10)) ELSE max(br.mw10) END AS MW10,

    CASE WHEN max(gb.bidding_group_id) is null or @EIA_unit_id != '0000' THEN convert(decimal(6,2),max(uGen.[weight] * br.[economic min])) ELSE max(br.mw1) END AS economic_min,

    CASE WHEN max(gb.bidding_group_id) is null or @EIA_unit_id != '0000' THEN convert(decimal(6,2),max(uGen.[weight] * br.[economic max])) ELSE max(br.mw1) END AS economic_max,

    max(convert(decimal(4,2),br.[Slope])) AS slope,

    max(convert(decimal(4,2),br.[Must Run Flag])) AS must_run_flag,

    max(convert(decimal(4,2),br.[Unit Available Flag])) AS unit_available_flag,

    max(br.[LMP]) AS LMP_hourly,

    CASE WHEN max(gb.bidding_group_id) is null or @EIA_unit_id != '0000' THEN convert(decimal(6,2),max(uGen.[weight] * br.[MW])) ELSE max(br.[MW]) END AS MW_hourly,

    CASE WHEN max(gb.bidding_group_id) is null or @EIA_unit_id != '0000' THEN convert(decimal(6,2),max(uGen.[weight] * br.[Self Scheduled MW])) ELSE max(br.[Self Scheduled MW]) END AS MW_self_schedule,

    max(convert(decimal(4,2),br.[Economic Flag])) AS economic_flag,

    max(convert(decimal(4,2),br.[Emergency Flag])) AS emergency_flag,

    CASE WHEN max(gb.bidding_group_id) is null or @EIA_unit_id != '0000' THEN convert(decimal(6,2),max(uGen.[weight] * br.[Emergency Min])) ELSE max(br.[Emergency Min]) END AS emergency_min,

    CASE WHEN max(gb.bidding_group_id) is null or @EIA_unit_id != '0000' THEN convert(decimal(6,2),max(uGen.[weight] * br.[Emergency Max])) ELSE max(br.[Emergency Max]) END AS emergency_max,

    CASE WHEN max(gb.bidding_group_id) is null or @EIA_unit_id != '0000' THEN convert(decimal(6,2),max(uGen.[weight] * br.[Target MW Reduction])) ELSE max(br.[Target MW Reduction]) END AS target_MW_reduction,

    max(br.[Curtailment Offer Price]) AS curtailment_offer_price

    --max(gu.bidding_group_id) as bid_group_id,

    --max(uGen.[weight]) as UC_weight

    --max(CONVERT(int,br.bitDataChecked)) AS bit_data_checked,

    FROM dbo.global_unit gu

    INNER JOINdbo.unit_to_gennode AS uGen ON uGen.unit_id=gu.unit_id

    INNER JOIN dbo.gennode_to_unit_code AS guCode ON uGen.gennode = guCode.gennode

    INNER JOIN dbo.bids_raw_da_ AS br ON guCode.unit_code = br.[Unit Code] --THIS TABLE HAS 30 million woes

    LEFT JOIN dbo.global_bidding_group gb on gb.bidding_group_id = gu.bidding_group_id

    WHERE

    @EIA_plant_id = CASE WHEN @EIA_plant_id != 999999 THEN gu.EIA_plant_code ELSE 999999 END

    AND @EIA_unit_id = CASE WHEN @EIA_unit_id != '0000' THEN gu.EIA_unit_id ELSE '0000' END

    --AND (bitDataChecked = 1 or bitDataChecked = 0)

    AND [Date Time Beginning (est)] >= @startdate

    AND [Date Time Beginning (est)] <= @enddate

    AND datepart(hour, br.[Date Time Beginning (EST)]) >= @starthour

    and datepart(hour, br.[Date Time Beginning (EST)]) <= @endhour

    --AND starthour >= @starthour -- i tried to add persisted computed columns on Date Time Beginning (est). no luck.

    --AND starthour<= @endhour

    GROUP BY CASE WHEN gb.bidding_group_id is not null THEN gb.bidding_group_id ELSE gu.unit_id END, [Date Time Beginning (EST)], [Unit Code]

    ORDER BY max(gu.unit_id), date_beginning_EST, [Unit Code]

    --option(hash join,merge join)

    END

    What are the things i should change to make it run it better. please help.

  • query execution plan is attached.

  • You'll need to post the DDL for the tables you are querying for a better answer. Im not at a SQL Server so I can't look at the execution plan but here's what is jumping out at me: the ORDER BY max() clause at the end. That is certainly one thing that's wrecking your performance and isn't necessary to get the correct results. Losing that is your "low-hanging fruit" performance fix. Let the application sort your result set.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • 1) Your WHERE clause is VERY complex for the optimizer, leading to bad estimates and both a suboptimal plan and multiple spills to tempdb for sorting.

    2) You have functions around columns in the WHERE clause, which contributes to the above and also can prevent the use of index seeks (if/when they are appropriate).

    3) Maybe your dev is underpowered πŸ™‚

    4) You have date range filtering, which is almost an automatic OPTION (RECOMPILE) for the statement. I promise you don't want the same query plan for 1 day range as you do for a 10 year range!

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • CASE statement in the GROUP BY clause is also going to lead to some pretty poor choices by the optimizer.

    If you have two different sets of logic, rather than try to cram them into a single procedure, create two procedures and call them from a wrapper procedure based on the needs of the separated logic. That's going to give the optimizer a better chance at arriving at a good plan.

    Plus everything already pointed out.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Thank you so much. I will try to make the changes and will get back with more questions.

    I really appreciate you all for taking out your time and helping me. πŸ™‚

  • I've got a question about this piece of code:

    AND [Date Time Beginning (est)] >= @startdate

    AND [Date Time Beginning (est)] <= @enddate

    AND datepart(hour, br.[Date Time Beginning (EST)]) >= @starthour

    and datepart(hour, br.[Date Time Beginning (EST)]) <= @endhour

    If you have, say,

    @starthour = 8 and

    @endhour = 10

    the query will return events beginning only within 3 hours of each day between @startdate and @enddate.

    For example, if @startdate = 27 June and @enddate = 29 June the query will select events which started :

    27 June between 8:00am and 10:59:59.997 am,

    28 June between 8:00am and 10:59:59.997 am,

    29 June between 8:00am and 10:59:59.997 am.

    Is it what is expected?

    _____________
    Code for TallyGenerator

  • yes this is expected. I need data in that particular range of hours in order to keep the result set small.

  • From the plan I can see the column [Date Time Beginning (EST)] belongs to table [bids_raw_da_miso]

    But I don't see this table anywhere in the query.

    And if you select by range then you better have a clustered index on [bids_raw_da_miso].[Date Time Beginning (EST)], or a covering index with [Date Time Beginning (EST)] as a 1st column and including other columns from that table which are used in the query.

    _____________
    Code for TallyGenerator

  • ekant_alone (7/5/2016)


    yes this is expected. I need data in that particular range of hours in order to keep the result set small.

    So your range might be several days long, capturing say 9am to 11am for each day?

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply