Query optimizing 2

  • So after my fail in the first post I recreated the problem. The goal is to optimize this query in two ways:

    - Do I really need a nested select statement?

    - The "CASE WHEN" statements are really slow on a large dataset. Is there a better way?

    This is the query:

    select

    Sold_To_Country,

    sum(Sales_New_York) Sales_New_York,

    sum(Sales_Detroit) Sales_Detroit,

    sum(Sales_Los_Angeles) Sales_Los_Angeles,

    sum(Sales_Paris) Sales_Paris,

    sum(Sales_Lyon) Sales_Lyon,

    sum(Sales_Bonn) Sales_Bonn,

    sum(Sales_Hamburg) Sales_Hamburg,

    sum(Sales_Frankfurt) Sales_Frankfurt

    from

    (

    select

    b.Sold_To_Country,

    (case when ToCity = 'New York' then SUM(SalesAmount) else 0 end) Sales_New_York,

    (case when ToCity = 'Detroit' then SUM(SalesAmount) else 0 end) Sales_Detroit,

    (case when ToCity = 'Los Angeles' then SUM(SalesAmount) else 0 end) Sales_Los_Angeles,

    (case when ToCity = 'Paris' then SUM(SalesAmount) else 0 end) Sales_Paris,

    (case when ToCity = 'Lyon' then SUM(SalesAmount) else 0 end) Sales_Lyon,

    (case when ToCity = 'Bonn' then SUM(SalesAmount) else 0 end) Sales_Bonn,

    (case when ToCity = 'Hamburg' then SUM(SalesAmount) else 0 end) Sales_Hamburg,

    (case when ToCity = 'Frankfurt' then SUM(SalesAmount) else 0 end) Sales_Frankfurt

    from

    dbo.CountriesInvolved a inner join dbo.Sales_City b

    on a.Sold_To_Country = b.Sold_To_Country

    group by b.Sold_To_Country, ToCity

    ) c

    group by Sold_To_Country

  • i think you can just do it with inline SUM..CASE...

    you almost had the syntax in your example:

    does this do what you wanted?

    SELECT

    b.Sold_To_Country,

    SUM(CASE WHEN ToCity = 'New York' THEN (SalesAmount) ELSE 0 END) Sales_New_York,

    SUM(CASE WHEN ToCity = 'Detroit' THEN (SalesAmount) ELSE 0 END) Sales_Detroit,

    SUM(CASE WHEN ToCity = 'Los Angeles' THEN (SalesAmount) ELSE 0 END) Sales_Los_Angeles,

    SUM(CASE WHEN ToCity = 'Paris' THEN (SalesAmount) ELSE 0 END) Sales_Paris,

    SUM(CASE WHEN ToCity = 'Lyon' THEN (SalesAmount) ELSE 0 END) Sales_Lyon,

    SUM(CASE WHEN ToCity = 'Bonn' THEN (SalesAmount) ELSE 0 END) Sales_Bonn,

    SUM(CASE WHEN ToCity = 'Hamburg' THEN (SalesAmount) ELSE 0 END) Sales_Hamburg,

    SUM(CASE WHEN ToCity = 'Frankfurt' THEN (SalesAmount) ELSE 0 END) Sales_Frankfurt

    FROM dbo.CountriesInvolved a

    INNER JOIN dbo.Sales_City b

    ON a.Sold_To_Country = b.Sold_To_Country

    GROUP BY b.Sold_To_Country, ToCity

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • It's not the desired result. Here's the deal:

    Using this query:

    select

    b.Sold_To_Country,

    (case when ToCity = 'New York' then SUM(SalesAmount) else 0 end) Sales_New_York,

    (case when ToCity = 'Detroit' then SUM(SalesAmount) else 0 end) Sales_Detroit,

    (case when ToCity = 'Los Angeles' then SUM(SalesAmount) else 0 end) Sales_Los_Angeles,

    (case when ToCity = 'Paris' then SUM(SalesAmount) else 0 end) Sales_Paris,

    (case when ToCity = 'Lyon' then SUM(SalesAmount) else 0 end) Sales_Lyon,

    (case when ToCity = 'Bonn' then SUM(SalesAmount) else 0 end) Sales_Bonn,

    (case when ToCity = 'Hamburg' then SUM(SalesAmount) else 0 end) Sales_Hamburg,

    (case when ToCity = 'Frankfurt' then SUM(SalesAmount) else 0 end) Sales_Frankfurt

    from

    dbo.CountriesInvolved a inner join dbo.Sales_City b

    on a.Sold_To_Country = b.Sold_To_Country

    group by b.Sold_To_Country, ToCity

    Results in this:

    So I built another query around it (which can be seen in the first post,) sum again and have only one group left (without a subgroup):

    And that's the desired result. So that's challenge 1.

    The second challenge is to replace the the "case when's". They force a table scan. It's very inefficient execution time wise.

  • Rokh (9/15/2011)


    It's not the desired result. Here's the deal:

    Using this query:

    select

    b.Sold_To_Country,

    (case when ToCity = 'New York' then SUM(SalesAmount) else 0 end) Sales_New_York,

    (case when ToCity = 'Detroit' then SUM(SalesAmount) else 0 end) Sales_Detroit,

    (case when ToCity = 'Los Angeles' then SUM(SalesAmount) else 0 end) Sales_Los_Angeles,

    (case when ToCity = 'Paris' then SUM(SalesAmount) else 0 end) Sales_Paris,

    (case when ToCity = 'Lyon' then SUM(SalesAmount) else 0 end) Sales_Lyon,

    (case when ToCity = 'Bonn' then SUM(SalesAmount) else 0 end) Sales_Bonn,

    (case when ToCity = 'Hamburg' then SUM(SalesAmount) else 0 end) Sales_Hamburg,

    (case when ToCity = 'Frankfurt' then SUM(SalesAmount) else 0 end) Sales_Frankfurt

    from

    dbo.CountriesInvolved a inner join dbo.Sales_City b

    on a.Sold_To_Country = b.Sold_To_Country

    group by b.Sold_To_Country, ToCity

    Results in this:

    So I built another query around it (which can be seen in the first post,) sum again and have only one group left (without a subgroup):

    And that's the desired result. So that's challenge 1.

    The second challenge is to replace the the "case when's". They force a table scan. It's very inefficient execution time wise.

    Lutz made a minor error (copy'n'paste error, most likely) and I've crossed it out below but he was definitely on the right track... Also, which table is "ToCity" and "Sales Amount" in? I ask because I think you might not even need the join to the CountriesInvolved table. And, no... the CASE WHEN's aren't the problem for such code. The fact that you're trying to SUM and PIVOT at the same time is the problem. Lemme know about the things I asked about above and I'll show you how to write this for max performance.

    SELECT

    b.Sold_To_Country,

    SUM(CASE WHEN ToCity = 'New York' THEN (SalesAmount) ELSE 0 END) Sales_New_York,

    SUM(CASE WHEN ToCity = 'Detroit' THEN (SalesAmount) ELSE 0 END) Sales_Detroit,

    SUM(CASE WHEN ToCity = 'Los Angeles' THEN (SalesAmount) ELSE 0 END) Sales_Los_Angeles,

    SUM(CASE WHEN ToCity = 'Paris' THEN (SalesAmount) ELSE 0 END) Sales_Paris,

    SUM(CASE WHEN ToCity = 'Lyon' THEN (SalesAmount) ELSE 0 END) Sales_Lyon,

    SUM(CASE WHEN ToCity = 'Bonn' THEN (SalesAmount) ELSE 0 END) Sales_Bonn,

    SUM(CASE WHEN ToCity = 'Hamburg' THEN (SalesAmount) ELSE 0 END) Sales_Hamburg,

    SUM(CASE WHEN ToCity = 'Frankfurt' THEN (SalesAmount) ELSE 0 END) Sales_Frankfurt

    FROM dbo.CountriesInvolved a

    INNER JOIN dbo.Sales_City b

    ON a.Sold_To_Country = b.Sold_To_Country

    GROUP BY b.Sold_To_Country, ToCity

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Also, if your table has more cities in it than what you have in the query, add the following WHERE clause just before the GROUP BY...

    WHERE ToCity IN ('New York', 'Detroit', 'Los Angeles', 'Paris', 'Lyon', 'Bonn', 'Hamburg', 'Frankfurt')

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (9/15/2011)


    Also, if your table has more cities in it than what you have in the query, add the following WHERE clause just before the GROUP BY...

    WHERE ToCity IN ('New York', 'Detroit', 'Los Angeles', 'Paris', 'Lyon', 'Bonn', 'Hamburg', 'Frankfurt')

    The original query (I merely simulated the structure to keep things simple here) has this where clause 🙂

    . Also, which table is "ToCity" and "Sales Amount" in? I ask because I think you might not even need the join to the CountriesInvolved table.

    The join is needed as CountriesInvolved is the resultset of a CTE, deciding which coutries I need to analyse.

    So it all starts like this:

    ;WITH CountriesInvolved AS

    (

    Sold_To_Country,

    -- Other calculations and stuff

    ),

    ExtendedDetails AS

    (

    select

    --stuff mainly consisting of sum(case when else end) cases

    from

    CountriesInvolved a inner join Sales_City b

    on a.a.Sold_To_Country = b.Sold_To_Country

    )

    -- more sql here to bind the resultset of above with 3 other tables

    I appreciate the help I'm getting here by the way. Thanks guys.

  • I still need to know which table ToCity and SalesAmount are in to optimize this to the tune of about a 300%+ performance gain. Are they in the Sales_City table (educated guess but still just a guess)? If not, please let me know where.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Yes, they're in the Sales_City table! In the process of making screenies I forgot to post the answer.

  • Well I'll be dammed. The first query is faster then the more readable and shorter query...

    I've tested it using 19 million records.

    select

    Sold_To_Country,

    sum(Sales_New_York) Sales_New_York,

    sum(Sales_Detroit) Sales_Detroit,

    sum(Sales_Los_Angeles) Sales_Los_Angeles,

    sum(Sales_Paris) Sales_Paris,

    sum(Sales_Lyon) Sales_Lyon,

    sum(Sales_Bonn) Sales_Bonn,

    sum(Sales_Hamburg) Sales_Hamburg,

    sum(Sales_Frankfurt) Sales_Frankfurt

    from

    (

    select

    b.Sold_To_Country,

    (case when ToCity = 'New York'then SUM(SalesAmount) else 0 end) Sales_New_York,

    (case when ToCity = 'Detroit'then SUM(SalesAmount) else 0 end) Sales_Detroit,

    (case when ToCity = 'Los Angeles' then SUM(SalesAmount) else 0 end) Sales_Los_Angeles,

    (case when ToCity = 'Paris'then SUM(SalesAmount) else 0 end) Sales_Paris,

    (case when ToCity = 'Lyon'then SUM(SalesAmount) else 0 end) Sales_Lyon,

    (case when ToCity = 'Bonn'then SUM(SalesAmount) else 0 end) Sales_Bonn,

    (case when ToCity = 'Hamburg'then SUM(SalesAmount) else 0 end) Sales_Hamburg,

    (case when ToCity = 'Frankfurt'then SUM(SalesAmount) else 0 end) Sales_Frankfurt

    from

    dbo.CountriesInvolved a inner join dbo.Sales_City b

    on a.Sold_To_Country = b.Sold_To_Country

    group by b.Sold_To_Country, ToCity

    ) c

    group by Sold_To_Country

    -- this one below produces the same output but is twice as slow...

    SELECT

    b.Sold_To_Country,

    SUM(CASE WHEN ToCity = 'New York' THEN (SalesAmount) ELSE 0 END) Sales_New_York,

    SUM(CASE WHEN ToCity = 'Detroit' THEN (SalesAmount) ELSE 0 END) Sales_Detroit,

    SUM(CASE WHEN ToCity = 'Los Angeles' THEN (SalesAmount) ELSE 0 END) Sales_Los_Angeles,

    SUM(CASE WHEN ToCity = 'Paris' THEN (SalesAmount) ELSE 0 END) Sales_Paris,

    SUM(CASE WHEN ToCity = 'Lyon' THEN (SalesAmount) ELSE 0 END) Sales_Lyon,

    SUM(CASE WHEN ToCity = 'Bonn' THEN (SalesAmount) ELSE 0 END) Sales_Bonn,

    SUM(CASE WHEN ToCity = 'Hamburg' THEN (SalesAmount) ELSE 0 END) Sales_Hamburg,

    SUM(CASE WHEN ToCity = 'Frankfurt' THEN (SalesAmount) ELSE 0 END) Sales_Frankfurt

    FROM dbo.CountriesInvolved a INNER JOIN dbo.Sales_City b

    ON a.Sold_To_Country = b.Sold_To_Country

    GROUP BY b.Sold_To_Country

  • Rokh (9/16/2011)


    Well I'll be dammed. The first query is faster then the more readable and shorter query...

    I've tested it using 19 million records.

    In that case, could you post the CREATE TABLE statements for the two tables involved here... I've got an idea to really make it sail...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • CREATE TABLE [dbo].[CountriesInvolved](

    [Sold_To_Country] [nvarchar](255) NULL

    ) ON [PRIMARY]

    CREATE TABLE [dbo].[Sales_City](

    [Sold_To_Country] [nvarchar](255) NULL,

    [ToCity] [nvarchar](255) NULL,

    [SalesAmount] [float] NULL

    ) ON [PRIMARY]

  • Thanks,

    Try this... it's a technique called "Pre-Aggregation" (phrase coined by Peter Larsson). It usually runs quite a bit faster than many straight forward Cross-Tab methods and is a bit less code to write than the faster method you found.

    WITH

    ctePreAgg AS

    (

    SELECT sc.Sold_To_Country,

    sc.ToCity,

    SalesAmount = SUM(sc.SalesAmount)

    FROM dbo.Sales_City sc

    INNEr JOIN dbo.CountriesInvolved ci

    ON sc.Sold_To_Country = ci.Sold_To_Country

    GROUP BY sc.Sold_To_Country, sc.ToCity

    )

    SELECT Sold_To_Country,

    SUM(CASE WHEN ToCity = 'New York' THEN (SalesAmount) ELSE 0 END) Sales_New_York,

    SUM(CASE WHEN ToCity = 'Detroit' THEN (SalesAmount) ELSE 0 END) Sales_Detroit,

    SUM(CASE WHEN ToCity = 'Los Angeles' THEN (SalesAmount) ELSE 0 END) Sales_Los_Angeles,

    SUM(CASE WHEN ToCity = 'Paris' THEN (SalesAmount) ELSE 0 END) Sales_Paris,

    SUM(CASE WHEN ToCity = 'Lyon' THEN (SalesAmount) ELSE 0 END) Sales_Lyon,

    SUM(CASE WHEN ToCity = 'Bonn' THEN (SalesAmount) ELSE 0 END) Sales_Bonn,

    SUM(CASE WHEN ToCity = 'Hamburg' THEN (SalesAmount) ELSE 0 END) Sales_Hamburg,

    SUM(CASE WHEN ToCity = 'Frankfurt' THEN (SalesAmount) ELSE 0 END) Sales_Frankfurt

    FROM ctePreAgg

    GROUP BY Sold_To_Country

    OPTION (RECOMPILE)

    ;

    Of course, I tested with NonClustered indexes on both tables where "Sold_To_Country" appears.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Funny, has about the same execution speed as my original query. have to deal with the fact that implementing CASE/WHEN statements like this implies a serious performance hit. It forces table scans. So be it 🙂

    Thanks for your help! Appreciate it a lot!

  • Rokh (9/21/2011)


    Funny, has about the same execution speed as my original query. have to deal with the fact that implementing CASE/WHEN statements like this implies a serious performance hit. It forces table scans. So be it 🙂

    Thanks for your help! Appreciate it a lot!

    No it does NOT. You have a scan here because you have no where clause.

  • Oki, so I added a where clause. The execution plan still shows 39% table scan on the Sales_City table.

    The final query looks like this then:

    select

    Sold_To_Country,

    sum(Sales_New_York) Sales_New_York,

    sum(Sales_Detroit) Sales_Detroit,

    sum(Sales_Los_Angeles) Sales_Los_Angeles,

    sum(Sales_Paris) Sales_Paris,

    sum(Sales_Lyon) Sales_Lyon,

    sum(Sales_Bonn) Sales_Bonn,

    sum(Sales_Hamburg) Sales_Hamburg,

    sum(Sales_Frankfurt) Sales_Frankfurt

    from

    (

    select

    b.Sold_To_Country,

    (case when ToCity = 'New York' then SUM(SalesAmount) else 0 end) Sales_New_York,

    (case when ToCity = 'Detroit' then SUM(SalesAmount) else 0 end) Sales_Detroit,

    (case when ToCity = 'Los Angeles' then SUM(SalesAmount) else 0 end) Sales_Los_Angeles,

    (case when ToCity = 'Paris' then SUM(SalesAmount) else 0 end) Sales_Paris,

    (case when ToCity = 'Lyon' then SUM(SalesAmount) else 0 end) Sales_Lyon,

    (case when ToCity = 'Bonn' then SUM(SalesAmount) else 0 end) Sales_Bonn,

    (case when ToCity = 'Hamburg' then SUM(SalesAmount) else 0 end) Sales_Hamburg,

    (case when ToCity = 'Frankfurt' then SUM(SalesAmount) else 0 end) Sales_Frankfurt

    from

    dbo.CountriesInvolved a inner join dbo.Sales_City b

    on a.Sold_To_Country = b.Sold_To_Country

    where

    -- decreases table scan with 12%.

    ToCity IN ('New York', 'Detroit', 'Los Angeles', 'Paris', 'Lyon', 'Bonn', 'Hamburg', 'Frankfurt')

    group by b.Sold_To_Country, ToCity

    ) c

    group by Sold_To_Country

Viewing 15 posts - 1 through 15 (of 39 total)

You must be logged in to reply to this topic. Login to reply