Could this be done better?

  • Howdy,

    Late last night I got a request for some data. The format was the biggest challenge, since the definitions of small, medium, and large businesses are different ranges for reach country, and the coded variables for them also end up being a different three number sequence for each country and business size. The format was:

    country, total, total small, total medium, total large

    So I wrote this, altered a bit to accommodate the test data. I'm fine with how and how fast it works, so you don't have to rack your brain. But if you see an obvious improvement I'd love to hear it.

    with data (orc_country, orc_quotacell, statusflag) as (

    select 'Canada', 1, 1 UNION ALL

    select 'Canada', 2, 1 UNION ALL

    select 'Canada', 3, 1 UNION ALL

    select 'Canada', 1, 1 UNION ALL

    select 'Canada', 2, 1 UNION ALL

    select 'Canada', 3, 1 UNION ALL

    select 'Canada', 2, 1 UNION ALL

    select 'Canada', 3, 1 UNION ALL

    select 'Canada', 3, 1 UNION ALL

    select 'Canada', 1, 1 UNION ALL

    select 'Canada', 2, 1 UNION ALL

    select 'India', 4, 1 UNION ALL

    select 'India', 5, 1 UNION ALL

    select 'India', 6, 1 UNION ALL

    select 'India', 4, 1 UNION ALL

    select 'India', 5, 1 UNION ALL

    select 'India', 6, 1 UNION ALL

    select 'India', 5, 1 UNION ALL

    select 'India', 6, 1 UNION ALL

    select 'India', 4, 1 UNION ALL

    select 'India', 5, 1 UNION ALL

    select 'India', 6, 1 UNION ALL

    select 'India', 5, 1 UNION ALL

    select 'India', 6, 1 UNION ALL

    select 'Singapore', 7, 1 UNION ALL

    select 'Singapore', 8, 1 UNION ALL

    select 'Singapore', 9, 1 UNION ALL

    select 'Singapore', 7, 1 UNION ALL

    select 'Singapore', 8, 1 UNION ALL

    select 'Singapore', 9, 1 UNION ALL

    select 'Singapore', 7, 1 UNION ALL

    select 'Singapore', 8, 1 UNION ALL

    select 'Singapore', 9, 1 UNION ALL

    select 'Singapore', 8, 1 UNION ALL

    select 'Singapore', 9, 1 UNION ALL

    select 'Thailand', 10, 1 UNION ALL

    select 'Thailand', 11, 1 UNION ALL

    select 'Thailand', 12, 1 UNION ALL

    select 'Thailand', 10, 1 UNION ALL

    select 'Thailand', 11, 1 UNION ALL

    select 'Thailand', 12, 1 UNION ALL

    select 'Thailand', 10, 1 UNION ALL

    select 'Thailand', 11, 1 UNION ALL

    select 'Thailand', 12, 1 UNION ALL

    select 'Thailand', 11, 1 UNION ALL

    select 'Thailand', 12, 1 UNION ALL

    select 'UK', 13, 1 UNION ALL

    select 'UK', 14, 1 UNION ALL

    select 'UK', 15, 1 UNION ALL

    select 'UK', 13, 1 UNION ALL

    select 'UK', 13, 1 UNION ALL

    select 'UK', 14, 1 UNION ALL

    select 'UK', 15, 1 UNION ALL

    select 'UK', 14, 1 UNION ALL

    select 'UK', 14, 1 UNION ALL

    select 'UK', 15, 1 UNION ALL

    select 'UK', 14, 1 UNION ALL

    select 'UK', 15, 1 UNION ALL

    select 'United Arab Emirates', 16, 1 UNION ALL

    select 'United Arab Emirates', 17, 1 UNION ALL

    select 'United Arab Emirates', 16, 1 UNION ALL

    select 'United Arab Emirates', 16, 1 UNION ALL

    select 'United Arab Emirates', 17, 1 UNION ALL

    select 'United Arab Emirates', 16, 1 UNION ALL

    select 'United Arab Emirates', 17, 1 UNION ALL

    select 'United Arab Emirates', 16, 1 UNION ALL

    select 'United Arab Emirates', 17, 1 UNION ALL

    select 'United Arab Emirates', 16, 1 UNION ALL

    select 'United Arab Emirates', 17, 1 UNION ALL

    select 'United Arab Emirates', 18, 1

    ),

    agg(country, sm, md, lg) as (

    select s1.orc_country

    , min(s2.orc_quotacell) as [Sm]

    , min(s2.orc_quotacell) + 1 as [Med]

    , max(s2.orc_quotacell) as [Large]

    from data s1 left join data s2

    on s1.orc_country = s2.orc_country

    group by s1.orc_country

    )

    select s.orc_country

    ,sum(case when s.orc_country = a.country then 1 else 0 end) as [Total]

    ,sum(case when s.orc_quotacell = a.sm then 1 else 0 end) as [Small]

    ,sum(case when s.orc_quotacell = a.md then 1 else 0 end) as [Medium]

    ,sum(case when s.orc_quotacell = a.lg then 1 else 0 end) as [Large]

    from data s,

    agg a

    where s.statusflag = 1

    group by s.orc_country

    order by orc_country

  • Here is my solution along with yours:

    with data (orc_country, orc_quotacell, statusflag) as (

    select 'Canada', 1, 1 UNION ALL

    select 'Canada', 2, 1 UNION ALL

    select 'Canada', 3, 1 UNION ALL

    select 'Canada', 1, 1 UNION ALL

    select 'Canada', 2, 1 UNION ALL

    select 'Canada', 3, 1 UNION ALL

    select 'Canada', 2, 1 UNION ALL

    select 'Canada', 3, 1 UNION ALL

    select 'Canada', 3, 1 UNION ALL

    select 'Canada', 1, 1 UNION ALL

    select 'Canada', 2, 1 UNION ALL

    select 'India', 4, 1 UNION ALL

    select 'India', 5, 1 UNION ALL

    select 'India', 6, 1 UNION ALL

    select 'India', 4, 1 UNION ALL

    select 'India', 5, 1 UNION ALL

    select 'India', 6, 1 UNION ALL

    select 'India', 5, 1 UNION ALL

    select 'India', 6, 1 UNION ALL

    select 'India', 4, 1 UNION ALL

    select 'India', 5, 1 UNION ALL

    select 'India', 6, 1 UNION ALL

    select 'India', 5, 1 UNION ALL

    select 'India', 6, 1 UNION ALL

    select 'Singapore', 7, 1 UNION ALL

    select 'Singapore', 8, 1 UNION ALL

    select 'Singapore', 9, 1 UNION ALL

    select 'Singapore', 7, 1 UNION ALL

    select 'Singapore', 8, 1 UNION ALL

    select 'Singapore', 9, 1 UNION ALL

    select 'Singapore', 7, 1 UNION ALL

    select 'Singapore', 8, 1 UNION ALL

    select 'Singapore', 9, 1 UNION ALL

    select 'Singapore', 8, 1 UNION ALL

    select 'Singapore', 9, 1 UNION ALL

    select 'Thailand', 10, 1 UNION ALL

    select 'Thailand', 11, 1 UNION ALL

    select 'Thailand', 12, 1 UNION ALL

    select 'Thailand', 10, 1 UNION ALL

    select 'Thailand', 11, 1 UNION ALL

    select 'Thailand', 12, 1 UNION ALL

    select 'Thailand', 10, 1 UNION ALL

    select 'Thailand', 11, 1 UNION ALL

    select 'Thailand', 12, 1 UNION ALL

    select 'Thailand', 11, 1 UNION ALL

    select 'Thailand', 12, 1 UNION ALL

    select 'UK', 13, 1 UNION ALL

    select 'UK', 14, 1 UNION ALL

    select 'UK', 15, 1 UNION ALL

    select 'UK', 13, 1 UNION ALL

    select 'UK', 13, 1 UNION ALL

    select 'UK', 14, 1 UNION ALL

    select 'UK', 15, 1 UNION ALL

    select 'UK', 14, 1 UNION ALL

    select 'UK', 14, 1 UNION ALL

    select 'UK', 15, 1 UNION ALL

    select 'UK', 14, 1 UNION ALL

    select 'UK', 15, 1 UNION ALL

    select 'United Arab Emirates', 16, 1 UNION ALL

    select 'United Arab Emirates', 17, 1 UNION ALL

    select 'United Arab Emirates', 16, 1 UNION ALL

    select 'United Arab Emirates', 16, 1 UNION ALL

    select 'United Arab Emirates', 17, 1 UNION ALL

    select 'United Arab Emirates', 16, 1 UNION ALL

    select 'United Arab Emirates', 17, 1 UNION ALL

    select 'United Arab Emirates', 16, 1 UNION ALL

    select 'United Arab Emirates', 17, 1 UNION ALL

    select 'United Arab Emirates', 16, 1 UNION ALL

    select 'United Arab Emirates', 17, 1 UNION ALL

    select 'United Arab Emirates', 18, 1

    ),

    agg(country, sm, md, lg) as (

    select s1.orc_country

    , min(s2.orc_quotacell) as [Sm]

    , min(s2.orc_quotacell) + 1 as [Med]

    , max(s2.orc_quotacell) as [Large]

    from data s1 left join data s2

    on s1.orc_country = s2.orc_country

    group by s1.orc_country

    )

    select s.orc_country

    ,sum(case when s.orc_country = a.country then 1 else 0 end) as [Total]

    ,sum(case when s.orc_quotacell = a.sm then 1 else 0 end) as [Small]

    ,sum(case when s.orc_quotacell = a.md then 1 else 0 end) as [Medium]

    ,sum(case when s.orc_quotacell = a.lg then 1 else 0 end) as [Large]

    from data s,

    agg a

    where s.statusflag = 1

    group by s.orc_country

    order by orc_country;

    go

    with data (orc_country, orc_quotacell, statusflag) as (

    select 'Canada', 1, 1 UNION ALL

    select 'Canada', 2, 1 UNION ALL

    select 'Canada', 3, 1 UNION ALL

    select 'Canada', 1, 1 UNION ALL

    select 'Canada', 2, 1 UNION ALL

    select 'Canada', 3, 1 UNION ALL

    select 'Canada', 2, 1 UNION ALL

    select 'Canada', 3, 1 UNION ALL

    select 'Canada', 3, 1 UNION ALL

    select 'Canada', 1, 1 UNION ALL

    select 'Canada', 2, 1 UNION ALL

    select 'India', 4, 1 UNION ALL

    select 'India', 5, 1 UNION ALL

    select 'India', 6, 1 UNION ALL

    select 'India', 4, 1 UNION ALL

    select 'India', 5, 1 UNION ALL

    select 'India', 6, 1 UNION ALL

    select 'India', 5, 1 UNION ALL

    select 'India', 6, 1 UNION ALL

    select 'India', 4, 1 UNION ALL

    select 'India', 5, 1 UNION ALL

    select 'India', 6, 1 UNION ALL

    select 'India', 5, 1 UNION ALL

    select 'India', 6, 1 UNION ALL

    select 'Singapore', 7, 1 UNION ALL

    select 'Singapore', 8, 1 UNION ALL

    select 'Singapore', 9, 1 UNION ALL

    select 'Singapore', 7, 1 UNION ALL

    select 'Singapore', 8, 1 UNION ALL

    select 'Singapore', 9, 1 UNION ALL

    select 'Singapore', 7, 1 UNION ALL

    select 'Singapore', 8, 1 UNION ALL

    select 'Singapore', 9, 1 UNION ALL

    select 'Singapore', 8, 1 UNION ALL

    select 'Singapore', 9, 1 UNION ALL

    select 'Thailand', 10, 1 UNION ALL

    select 'Thailand', 11, 1 UNION ALL

    select 'Thailand', 12, 1 UNION ALL

    select 'Thailand', 10, 1 UNION ALL

    select 'Thailand', 11, 1 UNION ALL

    select 'Thailand', 12, 1 UNION ALL

    select 'Thailand', 10, 1 UNION ALL

    select 'Thailand', 11, 1 UNION ALL

    select 'Thailand', 12, 1 UNION ALL

    select 'Thailand', 11, 1 UNION ALL

    select 'Thailand', 12, 1 UNION ALL

    select 'UK', 13, 1 UNION ALL

    select 'UK', 14, 1 UNION ALL

    select 'UK', 15, 1 UNION ALL

    select 'UK', 13, 1 UNION ALL

    select 'UK', 13, 1 UNION ALL

    select 'UK', 14, 1 UNION ALL

    select 'UK', 15, 1 UNION ALL

    select 'UK', 14, 1 UNION ALL

    select 'UK', 14, 1 UNION ALL

    select 'UK', 15, 1 UNION ALL

    select 'UK', 14, 1 UNION ALL

    select 'UK', 15, 1 UNION ALL

    select 'United Arab Emirates', 16, 1 UNION ALL

    select 'United Arab Emirates', 17, 1 UNION ALL

    select 'United Arab Emirates', 16, 1 UNION ALL

    select 'United Arab Emirates', 16, 1 UNION ALL

    select 'United Arab Emirates', 17, 1 UNION ALL

    select 'United Arab Emirates', 16, 1 UNION ALL

    select 'United Arab Emirates', 17, 1 UNION ALL

    select 'United Arab Emirates', 16, 1 UNION ALL

    select 'United Arab Emirates', 17, 1 UNION ALL

    select 'United Arab Emirates', 16, 1 UNION ALL

    select 'United Arab Emirates', 17, 1 UNION ALL

    select 'United Arab Emirates', 18, 1

    ),

    BaseData as (

    select

    d.orc_country,

    d.orc_quotacell,

    d.statusflag,

    rn = dense_rank() over (partition by d.orc_country order by d.orc_quotacell)

    from

    data d

    )

    select

    bd.orc_country,

    count(bd.orc_country) Total,

    sum(case rn when 1 then 1 else 0 end) Small,

    sum(case rn when 2 then 1 else 0 end) Medium,

    sum(case rn when 3 then 1 else 0 end) Large

    from

    BaseData bd

    group by

    bd.orc_country;

    Also attaching the actual execution plans generated when I ran them.

  • Hot damn, Lynn, that's awesome. Attached plans from running on my actual table. I'm expecting data for 13 more countries, so I'll compare again when I have everything.

    Is there a specific reason you went with dense rank over regular rank or just row number?

    Thanks

  • erikd (11/24/2013)


    Hot damn, Lynn, that's awesome. Attached plans from running on my actual table. I'm expecting data for 13 more countries, so I'll compare again when I have everything.

    Is there a specific reason you went with dense rank over regular rank or just row number?

    Thanks

    Actually, I tried row_number() first and displaying the data generated from the BaseData cte quickly showed me that that wasn't what I wanted. I needed to rank the values representing small, medium, and large with the same values; in the case 1,2, and 3. Rank wold not accomplish this due to ties in the values, this left DENSE_RANK.

    To see the differences just do a select from BaseData using row_number(), rank(), and dense_rank() to see the different return values for rn.

  • What's the difference in execution times?

  • Mine was 503ms, yours was 96ms :blush:

  • erikd (11/24/2013)


    Mine was 503ms, yours was 96ms :blush:

    Try something for me, move the WHERE statusflag = 1 into the BaseData cte and rerun the code.

    Be sure to report the execution plan as well, I'd like to see if there is any difference. Not thinkingthere will be based on what I have alread seen in this plan. Looks like all the records may have a statusflag = 1, which means the filter on this column is not relevant.

  • Hi Lynn,

    They're all 1 in the test data, but not in my table:

    statusflagDescription Counts

    1 Released 40158

    10 Duplicate within File75

    30 Do Not Call List 20

    45 Removed cell phones170

    70 Invalid Area Code 1

    So the filter isn't entirely irrelevant. Anyway, I moved the where. Here's the execution plan for it. 96ms still, but now the analyzer recommends an index. Interesting.

  • erikd (11/24/2013)


    Hi Lynn,

    They're all 1 in the test data, but not in my table:

    statusflagDescription Counts

    1 Released 40158

    10 Duplicate within File75

    30 Do Not Call List 20

    45 Removed cell phones170

    70 Invalid Area Code 1

    So the filter isn't entirely irrelevant. Anyway, I moved the where. Here's the execution plan for it. 96ms still, but now the analyzer recommends an index. Interesting.

    Looking at your counts, I don't really see a reason to filter on status while reading from the table. It wouldn't filter out a lot of rows of data, which is what I was looking for here. If you have queries that select data where the statusflag <> 1 would benefit from a filtered index where statusflag <> 1.

  • Is it generally better to keep filters out of CTEs? Since your suggestion to move it, I've tried it in some other queries I've written and the query plans no longer suggest indexes. I guess I falsely assumed that getting all your filtering and aggregation done in the CTE and then querying that smaller set of results would be faster.

  • erikd (11/25/2013)


    Is it generally better to keep filters out of CTEs? Since your suggestion to move it, I've tried it in some other queries I've written and the query plans no longer suggest indexes. I guess I falsely assumed that getting all your filtering and aggregation done in the CTE and then querying that smaller set of results would be faster.

    I'm going to give the standard answer, It depends. If the filter will greatly reduce the number of records to be processed then you should probably filter it as soon as possible. If it makes sense to add the index that sql server suggests, then it may be worthwhile especially if the index is a covering index and much smaller than then table itself.

    All you can do is test, test, and test some more. One of the things you probably noticed when comparing the two execution plans is that my version eliminated two of your clustered index (table) scans.

  • That's sort of what I figured. Thanks again Lynn.

  • The key here will be having an index to support the ranking function and avoid a "sort" iterator in the execution plan.

    I would filter first, then aggregate, then enumerate the result (rank) and then pivot.

    DECLARE @T TABLE (

    sk int NOT NULL IDENTITY(1, 1),

    orc_country varchar(25) NOT NULL,

    orc_quotacell int NOT NULL,

    statusflag tinyint NOT NULL,

    UNIQUE CLUSTERED (orc_country, orc_quotacell, sk)

    );

    with data (orc_country, orc_quotacell, statusflag) as (

    select 'Canada', 1, 1 UNION ALL

    select 'Canada', 2, 1 UNION ALL

    select 'Canada', 3, 1 UNION ALL

    select 'Canada', 1, 1 UNION ALL

    select 'Canada', 2, 1 UNION ALL

    select 'Canada', 3, 1 UNION ALL

    select 'Canada', 2, 1 UNION ALL

    select 'Canada', 3, 1 UNION ALL

    select 'Canada', 3, 1 UNION ALL

    select 'Canada', 1, 1 UNION ALL

    select 'Canada', 2, 1 UNION ALL

    select 'India', 4, 1 UNION ALL

    select 'India', 5, 1 UNION ALL

    select 'India', 6, 1 UNION ALL

    select 'India', 4, 1 UNION ALL

    select 'India', 5, 1 UNION ALL

    select 'India', 6, 1 UNION ALL

    select 'India', 5, 1 UNION ALL

    select 'India', 6, 1 UNION ALL

    select 'India', 4, 1 UNION ALL

    select 'India', 5, 1 UNION ALL

    select 'India', 6, 1 UNION ALL

    select 'India', 5, 1 UNION ALL

    select 'India', 6, 1 UNION ALL

    select 'Singapore', 7, 1 UNION ALL

    select 'Singapore', 8, 1 UNION ALL

    select 'Singapore', 9, 1 UNION ALL

    select 'Singapore', 7, 1 UNION ALL

    select 'Singapore', 8, 1 UNION ALL

    select 'Singapore', 9, 1 UNION ALL

    select 'Singapore', 7, 1 UNION ALL

    select 'Singapore', 8, 1 UNION ALL

    select 'Singapore', 9, 1 UNION ALL

    select 'Singapore', 8, 1 UNION ALL

    select 'Singapore', 9, 1 UNION ALL

    select 'Thailand', 10, 1 UNION ALL

    select 'Thailand', 11, 1 UNION ALL

    select 'Thailand', 12, 1 UNION ALL

    select 'Thailand', 10, 1 UNION ALL

    select 'Thailand', 11, 1 UNION ALL

    select 'Thailand', 12, 1 UNION ALL

    select 'Thailand', 10, 1 UNION ALL

    select 'Thailand', 11, 1 UNION ALL

    select 'Thailand', 12, 1 UNION ALL

    select 'Thailand', 11, 1 UNION ALL

    select 'Thailand', 12, 1 UNION ALL

    select 'UK', 13, 1 UNION ALL

    select 'UK', 14, 1 UNION ALL

    select 'UK', 15, 1 UNION ALL

    select 'UK', 13, 1 UNION ALL

    select 'UK', 13, 1 UNION ALL

    select 'UK', 14, 1 UNION ALL

    select 'UK', 15, 1 UNION ALL

    select 'UK', 14, 1 UNION ALL

    select 'UK', 14, 1 UNION ALL

    select 'UK', 15, 1 UNION ALL

    select 'UK', 14, 1 UNION ALL

    select 'UK', 15, 1 UNION ALL

    select 'United Arab Emirates', 16, 1 UNION ALL

    select 'United Arab Emirates', 17, 1 UNION ALL

    select 'United Arab Emirates', 16, 1 UNION ALL

    select 'United Arab Emirates', 16, 1 UNION ALL

    select 'United Arab Emirates', 17, 1 UNION ALL

    select 'United Arab Emirates', 16, 1 UNION ALL

    select 'United Arab Emirates', 17, 1 UNION ALL

    select 'United Arab Emirates', 16, 1 UNION ALL

    select 'United Arab Emirates', 17, 1 UNION ALL

    select 'United Arab Emirates', 16, 1 UNION ALL

    select 'United Arab Emirates', 17, 1 UNION ALL

    select 'United Arab Emirates', 18, 1

    )

    INSERT INTO @T (orc_country, orc_quotacell, statusflag)

    SELECT

    *

    FROM

    data;

    WITH C1 AS (

    SELECT

    orc_country,

    orc_quotacell,

    COUNT(*) AS cnt,

    DENSE_RANK() OVER(PARTITION BY orc_country ORDER BY orc_quotacell) AS rn

    FROM

    @T

    WHERE

    statusflag = 1

    GROUP BY

    orc_country,

    orc_quotacell

    )

    SELECT

    P.orc_country,

    ISNULL([1], 0) + ISNULL([2], 0) + ISNULL([3], 0) AS Total,

    ISNULL([1], 0) AS Small,

    ISNULL([2], 0) AS Medium,

    ISNULL([3], 0) AS Large

    FROM

    (

    SELECT orc_country, cnt, rn FROM C1

    ) AS T

    PIVOT

    (

    SUM(cnt)

    FOR rn IN ([1], [2], [3])

    ) AS P;

    GO

    This works because your sample data happens to have maximum three distinct values for each country. I wonder if this assumption is correct, otherwise can you describe the problem with words?

  • One, using a table variable isn't necessarily a good choice for testing. No matter how many rows of data exist, the optimizer will assume 1 row as there are no statistics kept on a table variable.

    Two, looking at the execution plans based on the small data set, both your pivot solution and my solution are identical.

    Three, based on the sample data and the code originally developed I am making an assumption that each of the 3 values for each country indicate small, medium, large. The purpose of the DENSE_RANK function is to conform the data to the same set of values that represents small, medium, and large.

    Four, as far as indexing on the actual data source the OP may not have control over that as there may be other data values as well and the existing index(es) may be needed for other purposes.

    Five, it would be interesting to see how your solution works when adapted to use the actual data source.

  • Lynn Pettis (11/25/2013)


    Five, it would be interesting to see how your solution works when adapted to use the actual data source.

    Ask and ye shall receive.

    The first query plan is using a table variable. It took 386ms.

    The second query skips the table variable and performs the query on the actual table. It took 26ms.

    Which is funny, because I read a lot about pivots being slowpokes, and tend to avoid them.

Viewing 15 posts - 1 through 15 (of 22 total)

You must be logged in to reply to this topic. Login to reply