Select Top x% From Group Based on Total

  • I tried searching for this but I must be using the wrong search terms. What I need to do it select the top 80 percent of records per group based on the group total. To be clear I am not trying to just grab the top x percent of rows. My example below might be better than me trying to explain 🙂

    Table 1:

    DealerID RepairID

    1 1

    1 2

    1 3

    1 4

    1 5

    1 6

    1 7

    1 8

    1 9

    1 10

    2 11

    2 12

    2 13

    Table 2:

    RepairID Tire

    1 225/65R17 102T

    2 225/65R17 102T

    3 225/65R17 102T

    4 235/60R18 102V

    5 235/60R18 102V

    6 235/60R18 102V

    7 235/60R18 102V

    8 205/55R16 89H

    9 205/70R15 89H

    13 225/65R17 102T

    Table 1 has the total number of repair orders per dealer. This can be obtained by simply grouping on DealerID and counting the number of RepairIDs.

    Table 2 has information on some of the repair orders and it is needed to select the top 80% of tire sizes. This table will be joined to Table 1 and grouped by DealerID and Tire.

    Desired Output:

    DealerIDTire RepairsOfThisTireRepairCount PercentOfTotalRepairOrders

    1 235/60R18 102V4 10 40

    1 225/65R17 102T3 10 30

    1 205/55R16 89H1 10 10

    2 225/65R17 102T1 3 33

    The equation I am given to calculate the top tires per dealer is as follows: Total # of tires for the size / Total # of repair orders per dealer.

    Here is what I have so far though I think I might have to rewrite all of it.

    SELECT

    DealerID ,

    COUNT(RepairID)

    INTO #TotalDealerRepairOrders

    FROM

    dbo.Table1

    GROUP BY

    DealerID

    SELECT

    DealerID,

    Tire,

    COUNT(RepairID) AS 'Tire Count'

    INTO

    #tempCounts

    FROM

    dbo.Table1 INNER JOIN

    dbo.Table2 ON Table1.RepairID = Table2.RepairID

    GROUP BY

    Table1.DealerID, Tire

    SELECT

    tc.DealerId ,

    tc.Tire,

    CONVERT(DECIMAL(18,4), tc.[Tire Count]) / tro.NumberROs * 100 AS 'Percent'

    FROM

    #tempCounts tc INNER JOIN

    #TotalDealerRepairOrders tro ON tro.DealerID = tc.DealerId

    GROUP BY

    tc.DealerID, tc.Tire, tc.[Tire Count], tro.NumberROs

    HAVING

    SUM(tc.[Tire Count]) <= .8 * tro.NumberROs

    ORDER BY

    tc.[Tire Count] DESC

    DROP TABLE #TotalDealerRepairOrders

    DROP TABLE #tempCounts

    I feel like I should be using the Partition function. Thanks for your help.

  • What do you mean by "Total # of tires for the size"?

    Here's your data set up in a consumable format:

    DROP TABLE #Table1

    CREATE TABLE #Table1 (DealerID INT, RepairID INT)

    INSERT INTO #Table1 VALUES

    (1,1),(1,2),(1,3),(1,4),(1,5),(1,6),(1,7),

    (1,8),(1,9),(1,10),(2,11),(2,12),(2,13)

    DROP TABLE #Table2

    CREATE TABLE #Table2 (RepairID INT, Tire VARCHAR(15))

    INSERT INTO #Table2 VALUES (1,'225/65R17 102T'),

    (2,'225/65R17 102T'),(3,'225/65R17 102T'),(4,'235/60R18 102V'),

    (5,'235/60R18 102V'),(6,'235/60R18 102V'),(7,'235/60R18 102V'),

    (8,'205/55R16 89H'),(9,'205/70R15 89H'),(13,'225/65R17 102T')

    Here's a query built upon your data which may go some way towards what you're trying to do:

    SELECT DealerID, Tire, RepairsOfThisTyre, RepairCount = SUM(RepairsOfThisTyre) OVER(PARTITION BY DealerID)

    FROM (

    SELECT DealerID, Tire, RepairsOfThisTyre = COUNT(*)

    FROM #Table1 d

    INNER JOIN #Table2 r ON r.RepairID = d.RepairID

    GROUP BY DealerID, Tire

    ) d

    ORDER BY DealerID, RepairsOfThisTyre DESC

    How close is this to your requested result?

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • It is close to what I wrote in my example.

    Your query returns this:

    DealerIDTire RepairsOfThisTyreRepairCount

    1 235/60R18 102V 4 9

    1 225/65R17 102T 3 9

    1 205/55R16 89H 1 9

    1 205/70R15 89H 1 9

    2 225/65R17 102T 1 1

    I need it to return this:

    DealerIDTire RepairsOfThisTyreRepairCount PercentOfTotalRepairOrders

    1 235/60R18 102V4 10 40

    1 225/65R17 102T3 10 30

    1 205/55R16 89H1 10 10

    2 225/65R17 102T1 3 33

    The RepairCount is the total number of repair orders in Table 1. The record with DealerID 1 and Tire 205/70R15 89H is not included since the 3 records above it make up the top 80%.

  • Gotcha - think you're looking for this:

    ;WITH

    Preagg AS (

    SELECT DealerID, Tire, RepairsOfThisTyre,

    RepairsForThisDealer = SUM(RepairsOfThisTyre) OVER(PARTITION BY DealerID)

    FROM (

    SELECT DealerID, Tire,

    RepairsOfThisTyre = COUNT(*)

    FROM #Table1 d

    INNER JOIN #Table2 r ON r.RepairID = d.RepairID

    GROUP BY DealerID, Tire

    ) d

    ),

    FinalAgg AS (

    SELECT DealerID, Tire, RepairsOfThisTyre, RepairsForThisDealer,

    x.PCT,

    q = SUM(x.PCT) OVER(PARTITION BY DealerID ORDER BY x.PCT DESC, Tire),

    rn = ROW_NUMBER() OVER(PARTITION BY DealerID ORDER BY RepairsOfThisTyre DESC)

    FROM Preagg

    CROSS APPLY (SELECT PCT = (RepairsOfThisTyre*100.00)/RepairsForThisDealer) x

    )

    SELECT f.*

    FROM FinalAgg f

    CROSS APPLY (

    SELECT TOP 1 *

    FROM FinalAgg fi

    WHERE fi.DealerID = f.DealerID

    AND q >= 80

    ORDER BY rn

    ) x

    WHERE f.rn <= x.rn

    ORDER BY f.DealerID, f.RepairsOfThisTyre DESC

    Give it a shot. It's highly likely that it can be improved.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • The Preagg query is just counting the total number or records per dealer in table 2.

    I switched the inner join to a left outer join and got some crazy results. The only thing I can think of to correct it is to do this:

    SELECT

    DealerId,

    COUNT(RepairID) AS 'RepairsForThisDealer'

    INTO

    #temp

    FROM

    #Table1

    GROUP BY

    DealerId

    ;WITH

    Preagg AS (

    SELECT

    d.DealerID,

    Tire,

    COUNT(r.RepairID) AS RepairsOfThisTyre,

    RepairsForThisDealer

    FROM

    #Table1 d

    INNER JOIN #Table2 r ON r.RepairID = d.RepairID

    INNER JOIN #temp t ON t.DealerId = d.DealerId

    GROUP BY d.DealerID, Tire, RepairsForThisDealer

    ),

    FinalAgg AS (

    SELECT DealerID, Tire, RepairsOfThisTyre, RepairsForThisDealer,

    x.PCT,

    q = SUM(x.PCT) OVER(PARTITION BY DealerID ORDER BY x.PCT DESC, Tire),

    rn = ROW_NUMBER() OVER(PARTITION BY DealerID ORDER BY RepairsOfThisTyre DESC)

    FROM Preagg

    CROSS APPLY (SELECT PCT = (RepairsOfThisTyre*100.00)/RepairsForThisDealer) x

    )

    SELECT f.*

    FROM FinalAgg f

    CROSS APPLY (

    SELECT TOP 1 *

    FROM FinalAgg fi

    WHERE fi.DealerID = f.DealerID

    AND q >= 80

    ORDER BY rn

    ) x

    WHERE f.rn <= x.rn

    ORDER BY f.DealerID, f.RepairsOfThisTyre DESC

    I get the correct percents, but now it doesn't show dealer 2.

  • I also posted this question in StackOverflow. Here is the solution they came up with:

    select DealerId, Tire, count(*) as RepairsOfThisTire,

    max(cnt) as TotalRepairs,

    count(*) * 1.0 / max(cnt) as PercentOfTotal

    from (select t1.*, t2.Tire,

    row_number() over (partition by t1.DealerId

    order by (case when t2.tire is null then 1 else 0 end),

    t1.RepairId

    ) as seqnum,

    count(*) over (partition by t1.DealerId) as cnt

    from table1 t1 left join

    table2 t2

    on t1.RepairId = t2.RepairId

    ) t

    where seqnum <= 0.8*cnt and tire is not null

    group by DealerId, Tire

    order by 1, 2;

  • Just an FYI, you shouldn't order by ordinal values. You really should specify the column names you are ordering by. If I remember correctly ordering by ordinal values is being deprecated.

  • Thanks for the tip. I will make sure to change it in my query.

  • Curious, is this what you actually want? I added a third dealer with only one order for one tire. This dealer does not show up in the results set.

    CREATE TABLE dbo.Table1 (DealerId INT, RepairId INT);

    INSERT INTO dbo.Table1 VALUES

    (1,1),(1,2),(1,3),(1,4),(1,5),(1,6),(1,7),

    (1,8),(1,9),(1,10),(2,11),(2,12),(2,13),

    (3,14);

    CREATE TABLE dbo.Table2 (RepairId INT, tire VARCHAR(15));

    INSERT INTO dbo.Table2 VALUES (1,'225/65R17 102T'),

    (2,'225/65R17 102T'),(3,'225/65R17 102T'),(4,'235/60R18 102V'),

    (5,'235/60R18 102V'),(6,'235/60R18 102V'),(7,'235/60R18 102V'),

    (8,'205/55R16 89H'),(9,'205/70R15 89H'),(13,'225/65R17 102T'),

    (14,'235/60R18 102V');

    select * from dbo.Table1;

    select * from dbo.Table2;

    select DealerId, tire, count(*) as RepairsOfThisTire,

    max(cnt) as TotalRepairs,

    count(*) * 1.0 / max(cnt) as PercentOfTotal

    from (select t1.*, t2.tire,

    row_number() over (partition by t1.DealerId

    order by (case when t2.tire is null then 1 else 0 end),

    t1.RepairId

    ) as seqnum,

    count(*) over (partition by t1.DealerId) as cnt

    from dbo.Table1 t1 left join

    dbo.Table2 t2

    on t1.RepairId = t2.RepairId

    ) t

    where seqnum <= 0.8*cnt and tire is not null

    group by DealerId, tire

    order by DealerId, tire, RepairsOfThisTire desc;

    go

    DROP TABLE dbo.Table1;

    DROP TABLE dbo.Table2;

  • Added two more orders for the same tire and Dealer 3 shows up but only reports the tire as 66% of the total when in actuality it is 100%.

    CREATE TABLE dbo.Table1 (DealerId INT, RepairId INT);

    INSERT INTO dbo.Table1 VALUES

    (1,1),(1,2),(1,3),(1,4),(1,5),(1,6),(1,7),

    (1,8),(1,9),(1,10),(2,11),(2,12),(2,13),

    (3,14),(3,15),(3,16);

    CREATE TABLE dbo.Table2 (RepairId INT, tire VARCHAR(15));

    INSERT INTO dbo.Table2 VALUES (1,'225/65R17 102T'),

    (2,'225/65R17 102T'),(3,'225/65R17 102T'),(4,'235/60R18 102V'),

    (5,'235/60R18 102V'),(6,'235/60R18 102V'),(7,'235/60R18 102V'),

    (8,'205/55R16 89H'),(9,'205/70R15 89H'),(13,'225/65R17 102T'),

    (14,'235/60R18 102V'),(15,'235/60R18 102V'),(16,'235/60R18 102V');

    select * from dbo.Table1;

    select * from dbo.Table2;

    select DealerId, tire, count(*) as RepairsOfThisTire,

    max(cnt) as TotalRepairs,

    count(*) * 1.0 / max(cnt) as PercentOfTotal

    from (select t1.*, t2.tire,

    row_number() over (partition by t1.DealerId

    order by (case when t2.tire is null then 1 else 0 end),

    t1.RepairId

    ) as seqnum,

    count(*) over (partition by t1.DealerId) as cnt

    from dbo.Table1 t1 left join

    dbo.Table2 t2

    on t1.RepairId = t2.RepairId

    ) t

    where seqnum <= 0.8*cnt and tire is not null

    group by DealerId, tire

    order by DealerId, tire, RepairsOfThisTire desc;

    go

    DROP TABLE dbo.Table1;

    DROP TABLE dbo.Table2;

  • Lynn Pettis (3/21/2014)


    Added two more orders for the same tire and Dealer 3 shows up but only reports the tire as 66% of the total when in actuality it is 100%.

    Hmm...Interesting. I think you stumbled on a test case that we did not think of. Namely when all the orders are the same tire size. It is not likely going to ever happen but it is good to cover all bases.

    The query is supposed to go through Table 1 grouping by dealerID and counting up all a dealers orders. This number will be the denominator in the calculation. It then looks at table 2 grouping by dealerID and by tire and dividing the count by the total orders in table 1 to get the % of the total orders. Once I have a % calculated I then need to only select the tires that make up 80% of the dealers total count, everything else is left out.

  • Added another dealer with 10 orders, 9 for one timre and 1 for a different tire. The new dealer shows up, but again the count of the Repairs for the tire shows as 8 and 80% when actually it is 9 and 90%.

    CREATE TABLE dbo.Table1 (DealerId INT, RepairId INT);

    INSERT INTO dbo.Table1 VALUES

    (1,1),(1,2),(1,3),(1,4),(1,5),(1,6),(1,7),

    (1,8),(1,9),(1,10),(2,11),(2,12),(2,13),

    (3,14),(3,15),(3,16),

    (4,17),(4,18),(4,19),(4,20),(4,21),(4,22),(4,23),(4,24),(4,25),(4,26);

    CREATE TABLE dbo.Table2 (RepairId INT, tire VARCHAR(15));

    INSERT INTO dbo.Table2 VALUES (1,'225/65R17 102T'),

    (2,'225/65R17 102T'),(3,'225/65R17 102T'),(4,'235/60R18 102V'),

    (5,'235/60R18 102V'),(6,'235/60R18 102V'),(7,'235/60R18 102V'),

    (8,'205/55R16 89H'),(9,'205/70R15 89H'),(13,'225/65R17 102T'),

    (17,'235/60R18 102V'),(18,'235/60R18 102V'),(19,'235/60R18 102V'),

    (20,'235/60R18 102V'),(21,'235/60R18 102V'),(22,'235/60R18 102V'),

    (23,'235/60R18 102V'),(24,'235/60R18 102V'),(25,'235/60R18 102V'),

    (26,'225/65R17 102T');

    select * from dbo.Table1;

    select * from dbo.Table2;

    select DealerId, tire, count(*) as RepairsOfThisTire,

    max(cnt) as TotalRepairs,

    count(*) * 1.0 / max(cnt) as PercentOfTotal

    from (select t1.*, t2.tire,

    row_number() over (partition by t1.DealerId

    order by (case when t2.tire is null then 1 else 0 end),

    t1.RepairId

    ) as seqnum,

    count(*) over (partition by t1.DealerId) as cnt

    from dbo.Table1 t1 left join

    dbo.Table2 t2

    on t1.RepairId = t2.RepairId

    ) t

    where seqnum <= 0.8*cnt and tire is not null

    group by DealerId, tire

    order by DealerId, tire, RepairsOfThisTire desc;

    go

    DROP TABLE dbo.Table1;

    DROP TABLE dbo.Table2;

  • Modified the order by so that it listed output in descending order by RepairsOfThisTire by DealerId, which what you original indicated what you also wanted.

    CREATE TABLE dbo.Table1 (DealerId INT, RepairId INT);

    INSERT INTO dbo.Table1 VALUES

    (1,1),(1,2),(1,3),(1,4),(1,5),(1,6),(1,7),

    (1,8),(1,9),(1,10),(2,11),(2,12),(2,13),

    (3,14),(3,15),(3,16),

    (4,17),(4,18),(4,19),(4,20),(4,21),(4,22),(4,23),(4,24),(4,25),(4,26);

    CREATE TABLE dbo.Table2 (RepairId INT, tire VARCHAR(15));

    INSERT INTO dbo.Table2 VALUES (1,'225/65R17 102T'),

    (2,'225/65R17 102T'),(3,'225/65R17 102T'),(4,'235/60R18 102V'),

    (5,'235/60R18 102V'),(6,'235/60R18 102V'),(7,'235/60R18 102V'),

    (8,'205/55R16 89H'),(9,'205/70R15 89H'),(13,'225/65R17 102T'),

    (14,'235/60R18 102V'),(15,'235/60R18 102V'),(16,'235/60R18 102V'),

    (17,'235/60R18 102V'),(18,'235/60R18 102V'),(19,'235/60R18 102V'),

    (20,'235/60R18 102V'),(21,'235/60R18 102V'),(22,'235/60R18 102V'),

    (23,'235/60R18 102V'),(24,'235/60R18 102V'),(25,'235/60R18 102V'),

    (26,'225/65R17 102T');

    select * from dbo.Table1;

    select * from dbo.Table2;

    select DealerId, tire, count(*) as RepairsOfThisTire,

    max(cnt) as TotalRepairs,

    count(*) * 1.0 / max(cnt) as PercentOfTotal

    from (select t1.*, t2.tire,

    row_number() over (partition by t1.DealerId

    order by (case when t2.tire is null then 1 else 0 end),

    t1.RepairId

    ) as seqnum,

    count(*) over (partition by t1.DealerId) as cnt

    from dbo.Table1 t1 left join

    dbo.Table2 t2

    on t1.RepairId = t2.RepairId

    ) t

    where seqnum <= 0.8*cnt and tire is not null

    group by DealerId, tire

    order by DealerId, RepairsOfThisTire desc, tire;

    go

    DROP TABLE dbo.Table1;

    DROP TABLE dbo.Table2;

  • tfendt (3/21/2014)


    The Preagg query is just counting the total number or records per dealer in table 2.

    ...

    It also counts the total number of repairs per dealer.

    ;WITH

    Preagg AS (

    SELECT DealerID, Tire, RepairsOfThisTyre,

    RepairsForThisDealer = SUM(RepairsOfThisTyre) OVER(PARTITION BY DealerID)

    FROM (

    SELECT DealerID, Tire,

    RepairsOfThisTyre = COUNT(*)

    FROM Table1 d

    INNER JOIN Table2 r ON r.RepairID = d.RepairID

    GROUP BY DealerID, Tire

    ) d

    ) SELECT * FROM Preagg


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • tfendt (3/21/2014)


    I also posted this question in StackOverflow. Here is the solution:

    select DealerId, Tire, count(*) as RepairsOfThisTire,

    max(cnt) as TotalRepairs,

    count(*) * 1.0 / max(cnt) as PercentOfTotal

    from (select t1.*, t2.Tire,

    row_number() over (partition by t1.DealerId

    order by (case when t2.tire is null then 1 else 0 end),

    t1.RepairId

    ) as seqnum,

    count(*) over (partition by t1.DealerId) as cnt

    from table1 t1 left join

    table2 t2

    on t1.RepairId = t2.RepairId

    ) t

    where seqnum <= 0.8*cnt and tire is not null

    group by DealerId, Tire

    order by 1, 2;

    Thank you for you time.

    This query works only because of the sequence of RepairId in table2, which already orders most-used tyres ahead of least-used tyres, probably by accident. If table2 really is artificially ordered in this way, you're ok, otherwise you will get incorrect results. Here's a copy of Lyn's dataset slightly adjusted to show this:

    DROP TABLE dbo.Table1

    CREATE TABLE dbo.Table1 (DealerId INT, RepairId INT);

    INSERT INTO dbo.Table1 VALUES

    (1,1),(1,2),(1,3),(1,4),(1,5),(1,6),(1,7),

    (1,8),(1,9),(1,10),(2,11),(2,12),(2,13),

    (3,14),(3,15),(3,16);

    DROP TABLE dbo.Table2

    CREATE TABLE dbo.Table2 (RepairId INT, tire VARCHAR(15));

    INSERT INTO dbo.Table2 VALUES (9,'225/65R17 102T'),

    (7,'225/65R17 102T'),(3,'225/65R17 102T'),(4,'235/60R18 102V'),

    (5,'235/60R18 102V'),(6,'235/60R18 102V'),(8,'235/60R18 102V'),

    (2,'205/55R16 89H'),(1,'205/70R15 89H'),(13,'225/65R17 102T'),

    (14,'235/60R18 102V'),(15,'235/60R18 102V'),(16,'235/60R18 102V');

    and a query which works:

    SELECT DealerID, tire, RepairsOfThisTire, TotalRepairs,

    PercentOfTotal = (RepairsOfThisTire*100.00)/TotalRepairs

    FROM (

    SELECT DealerID, TotalRepairs, tire, RepairsOfThisTire,

    Sector = NTILE(5) OVER(PARTITION BY DealerID ORDER BY RepairsOfThisTire DESC, tire DESC)

    FROM (

    SELECT

    t1.DealerID, t2.tire,

    TotalRepairs = COUNT(*) OVER (PARTITION BY t1.DealerID),

    RepairsOfThisTire = COUNT(*) OVER(PARTITION BY t1.DealerID, t2.tire)

    FROM dbo.Table1 t1

    LEFT JOIN dbo.Table2 t2 ON t2.RepairId = t1.RepairId

    ) d

    WHERE tire IS NOT NULL

    ) e

    WHERE Sector < 5

    GROUP BY DealerID, TotalRepairs, tire, RepairsOfThisTire--, PercentOfTotal

    ORDER BY DealerID, RepairsOfThisTire DESC, tire;


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

Viewing 15 posts - 1 through 15 (of 25 total)

You must be logged in to reply to this topic. Login to reply