SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Select Top x% From Group Based on Total


Select Top x% From Group Based on Total

Author
Message
tfendt
tfendt
SSC Veteran
SSC Veteran (275 reputation)SSC Veteran (275 reputation)SSC Veteran (275 reputation)SSC Veteran (275 reputation)SSC Veteran (275 reputation)SSC Veteran (275 reputation)SSC Veteran (275 reputation)SSC Veteran (275 reputation)

Group: General Forum Members
Points: 275 Visits: 342
Lynn Pettis (3/21/2014)
Added two more orders for the same tire and Dealer 3 shows up but only reports the tire as 66% of the total when in actuality it is 100%.


Hmm...Interesting. I think you stumbled on a test case that we did not think of. Namely when all the orders are the same tire size. It is not likely going to ever happen but it is good to cover all bases.

The query is supposed to go through Table 1 grouping by dealerID and counting up all a dealers orders. This number will be the denominator in the calculation. It then looks at table 2 grouping by dealerID and by tire and dividing the count by the total orders in table 1 to get the % of the total orders. Once I have a % calculated I then need to only select the tires that make up 80% of the dealers total count, everything else is left out.
Lynn Pettis
Lynn Pettis
SSC Guru
SSC Guru (94K reputation)SSC Guru (94K reputation)SSC Guru (94K reputation)SSC Guru (94K reputation)SSC Guru (94K reputation)SSC Guru (94K reputation)SSC Guru (94K reputation)SSC Guru (94K reputation)

Group: General Forum Members
Points: 94379 Visits: 38955
Added another dealer with 10 orders, 9 for one timre and 1 for a different tire. The new dealer shows up, but again the count of the Repairs for the tire shows as 8 and 80% when actually it is 9 and 90%.



CREATE TABLE dbo.Table1 (DealerId INT, RepairId INT);

INSERT INTO dbo.Table1 VALUES
(1,1),(1,2),(1,3),(1,4),(1,5),(1,6),(1,7),
(1,8),(1,9),(1,10),(2,11),(2,12),(2,13),
(3,14),(3,15),(3,16),
(4,17),(4,18),(4,19),(4,20),(4,21),(4,22),(4,23),(4,24),(4,25),(4,26);

CREATE TABLE dbo.Table2 (RepairId INT, tire VARCHAR(15));

INSERT INTO dbo.Table2 VALUES (1,'225/65R17 102T'),
(2,'225/65R17 102T'),(3,'225/65R17 102T'),(4,'235/60R18 102V'),
(5,'235/60R18 102V'),(6,'235/60R18 102V'),(7,'235/60R18 102V'),
(8,'205/55R16 89H'),(9,'205/70R15 89H'),(13,'225/65R17 102T'),
(17,'235/60R18 102V'),(18,'235/60R18 102V'),(19,'235/60R18 102V'),
(20,'235/60R18 102V'),(21,'235/60R18 102V'),(22,'235/60R18 102V'),
(23,'235/60R18 102V'),(24,'235/60R18 102V'),(25,'235/60R18 102V'),
(26,'225/65R17 102T');


select * from dbo.Table1;
select * from dbo.Table2;

select DealerId, tire, count(*) as RepairsOfThisTire,
max(cnt) as TotalRepairs,
count(*) * 1.0 / max(cnt) as PercentOfTotal
from (select t1.*, t2.tire,
row_number() over (partition by t1.DealerId
order by (case when t2.tire is null then 1 else 0 end),
t1.RepairId
) as seqnum,
count(*) over (partition by t1.DealerId) as cnt
from dbo.Table1 t1 left join
dbo.Table2 t2
on t1.RepairId = t2.RepairId
) t
where seqnum <= 0.8*cnt and tire is not null
group by DealerId, tire
order by DealerId, tire, RepairsOfThisTire desc;

go
DROP TABLE dbo.Table1;
DROP TABLE dbo.Table2;




Cool
Lynn Pettis

For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here or when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here and here
Managing Transaction Logs

SQL Musings from the Desert Fountain Valley SQL (My Mirror Blog)
Lynn Pettis
Lynn Pettis
SSC Guru
SSC Guru (94K reputation)SSC Guru (94K reputation)SSC Guru (94K reputation)SSC Guru (94K reputation)SSC Guru (94K reputation)SSC Guru (94K reputation)SSC Guru (94K reputation)SSC Guru (94K reputation)

Group: General Forum Members
Points: 94379 Visits: 38955
Modified the order by so that it listed output in descending order by RepairsOfThisTire by DealerId, which what you original indicated what you also wanted.



CREATE TABLE dbo.Table1 (DealerId INT, RepairId INT);

INSERT INTO dbo.Table1 VALUES
(1,1),(1,2),(1,3),(1,4),(1,5),(1,6),(1,7),
(1,8),(1,9),(1,10),(2,11),(2,12),(2,13),
(3,14),(3,15),(3,16),
(4,17),(4,18),(4,19),(4,20),(4,21),(4,22),(4,23),(4,24),(4,25),(4,26);

CREATE TABLE dbo.Table2 (RepairId INT, tire VARCHAR(15));

INSERT INTO dbo.Table2 VALUES (1,'225/65R17 102T'),
(2,'225/65R17 102T'),(3,'225/65R17 102T'),(4,'235/60R18 102V'),
(5,'235/60R18 102V'),(6,'235/60R18 102V'),(7,'235/60R18 102V'),
(8,'205/55R16 89H'),(9,'205/70R15 89H'),(13,'225/65R17 102T'),
(14,'235/60R18 102V'),(15,'235/60R18 102V'),(16,'235/60R18 102V'),
(17,'235/60R18 102V'),(18,'235/60R18 102V'),(19,'235/60R18 102V'),
(20,'235/60R18 102V'),(21,'235/60R18 102V'),(22,'235/60R18 102V'),
(23,'235/60R18 102V'),(24,'235/60R18 102V'),(25,'235/60R18 102V'),
(26,'225/65R17 102T');


select * from dbo.Table1;
select * from dbo.Table2;

select DealerId, tire, count(*) as RepairsOfThisTire,
max(cnt) as TotalRepairs,
count(*) * 1.0 / max(cnt) as PercentOfTotal
from (select t1.*, t2.tire,
row_number() over (partition by t1.DealerId
order by (case when t2.tire is null then 1 else 0 end),
t1.RepairId
) as seqnum,
count(*) over (partition by t1.DealerId) as cnt
from dbo.Table1 t1 left join
dbo.Table2 t2
on t1.RepairId = t2.RepairId
) t
where seqnum <= 0.8*cnt and tire is not null
group by DealerId, tire
order by DealerId, RepairsOfThisTire desc, tire;

go
DROP TABLE dbo.Table1;
DROP TABLE dbo.Table2;




Cool
Lynn Pettis

For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here or when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here and here
Managing Transaction Logs

SQL Musings from the Desert Fountain Valley SQL (My Mirror Blog)
ChrisM@home
ChrisM@home
SSCertifiable
SSCertifiable (5.2K reputation)SSCertifiable (5.2K reputation)SSCertifiable (5.2K reputation)SSCertifiable (5.2K reputation)SSCertifiable (5.2K reputation)SSCertifiable (5.2K reputation)SSCertifiable (5.2K reputation)SSCertifiable (5.2K reputation)

Group: General Forum Members
Points: 5176 Visits: 10605
tfendt (3/21/2014)
The Preagg query is just counting the total number or records per dealer in table 2.

...


It also counts the total number of repairs per dealer.

;WITH 
Preagg AS (
SELECT DealerID, Tire, RepairsOfThisTyre,
RepairsForThisDealer = SUM(RepairsOfThisTyre) OVER(PARTITION BY DealerID)
FROM (
SELECT DealerID, Tire,
RepairsOfThisTyre = COUNT(*)
FROM Table1 d
INNER JOIN Table2 r ON r.RepairID = d.RepairID
GROUP BY DealerID, Tire
) d
) SELECT * FROM Preagg




Low-hanging fruit picker and defender of the moggies





For better assistance in answering your questions, please read this.




Understanding and using APPLY, (I) and (II) Paul White

Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
ChrisM@home
ChrisM@home
SSCertifiable
SSCertifiable (5.2K reputation)SSCertifiable (5.2K reputation)SSCertifiable (5.2K reputation)SSCertifiable (5.2K reputation)SSCertifiable (5.2K reputation)SSCertifiable (5.2K reputation)SSCertifiable (5.2K reputation)SSCertifiable (5.2K reputation)

Group: General Forum Members
Points: 5176 Visits: 10605
tfendt (3/21/2014)
I also posted this question in StackOverflow. Here is the solution:

select DealerId, Tire, count(*) as RepairsOfThisTire,
max(cnt) as TotalRepairs,
count(*) * 1.0 / max(cnt) as PercentOfTotal
from (select t1.*, t2.Tire,
row_number() over (partition by t1.DealerId
order by (case when t2.tire is null then 1 else 0 end),
t1.RepairId
) as seqnum,
count(*) over (partition by t1.DealerId) as cnt
from table1 t1 left join
table2 t2
on t1.RepairId = t2.RepairId
) t
where seqnum <= 0.8*cnt and tire is not null
group by DealerId, Tire
order by 1, 2;



Thank you for you time.


This query works only because of the sequence of RepairId in table2, which already orders most-used tyres ahead of least-used tyres, probably by accident. If table2 really is artificially ordered in this way, you're ok, otherwise you will get incorrect results. Here's a copy of Lyn's dataset slightly adjusted to show this:
DROP TABLE dbo.Table1
CREATE TABLE dbo.Table1 (DealerId INT, RepairId INT);

INSERT INTO dbo.Table1 VALUES
(1,1),(1,2),(1,3),(1,4),(1,5),(1,6),(1,7),
(1,8),(1,9),(1,10),(2,11),(2,12),(2,13),
(3,14),(3,15),(3,16);

DROP TABLE dbo.Table2
CREATE TABLE dbo.Table2 (RepairId INT, tire VARCHAR(15));

INSERT INTO dbo.Table2 VALUES (9,'225/65R17 102T'),
(7,'225/65R17 102T'),(3,'225/65R17 102T'),(4,'235/60R18 102V'),
(5,'235/60R18 102V'),(6,'235/60R18 102V'),(8,'235/60R18 102V'),
(2,'205/55R16 89H'),(1,'205/70R15 89H'),(13,'225/65R17 102T'),
(14,'235/60R18 102V'),(15,'235/60R18 102V'),(16,'235/60R18 102V');



and a query which works:
SELECT DealerID, tire, RepairsOfThisTire, TotalRepairs, 
PercentOfTotal = (RepairsOfThisTire*100.00)/TotalRepairs
FROM (
SELECT DealerID, TotalRepairs, tire, RepairsOfThisTire,
Sector = NTILE(5) OVER(PARTITION BY DealerID ORDER BY RepairsOfThisTire DESC, tire DESC)
FROM (
SELECT
t1.DealerID, t2.tire,
TotalRepairs = COUNT(*) OVER (PARTITION BY t1.DealerID),
RepairsOfThisTire = COUNT(*) OVER(PARTITION BY t1.DealerID, t2.tire)
FROM dbo.Table1 t1
LEFT JOIN dbo.Table2 t2 ON t2.RepairId = t1.RepairId
) d
WHERE tire IS NOT NULL
) e
WHERE Sector < 5
GROUP BY DealerID, TotalRepairs, tire, RepairsOfThisTire--, PercentOfTotal
ORDER BY DealerID, RepairsOfThisTire DESC, tire;




Low-hanging fruit picker and defender of the moggies





For better assistance in answering your questions, please read this.




Understanding and using APPLY, (I) and (II) Paul White

Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
thava
thava
SSC-Addicted
SSC-Addicted (499 reputation)SSC-Addicted (499 reputation)SSC-Addicted (499 reputation)SSC-Addicted (499 reputation)SSC-Addicted (499 reputation)SSC-Addicted (499 reputation)SSC-Addicted (499 reputation)SSC-Addicted (499 reputation)

Group: General Forum Members
Points: 499 Visits: 557
might be this will be handy and more readable

with cte as (
SELECT DISTINCT t1.DealerID, t2.tire,
TotalRepairs = SUM(1) OVER (PARTITION BY t1.DealerID),
RepairsOfThisTire = SUM(1) OVER(PARTITION BY t1.DealerID, t2.tire),
PercentOfTotalRepairOrders = ROUND((SUM(1.0) OVER(PARTITION BY t1.DealerID, t2.tire)/SUM(1.0) OVER (PARTITION BY t1.DealerID)) *100,0)
FROM dbo.Table1 t1
LEFT JOIN dbo.Table2 t2
ON t2.RepairId = t1.RepairId
)
SELECT * FROM cte
WHERE tire IS NOT NULL
ORDER BY PercentOfTotalRepairOrders DESC


since you are using Sql 2012 you can use Lead or Lag function to calculate Total percent by Dealer Group and you can get the desire result



Every rule in a world of bits and bytes, can be bend or eventually be broken
MyBlog About Common dialog control
A Visualizer for viewing SqlCommand object script
tfendt
tfendt
SSC Veteran
SSC Veteran (275 reputation)SSC Veteran (275 reputation)SSC Veteran (275 reputation)SSC Veteran (275 reputation)SSC Veteran (275 reputation)SSC Veteran (275 reputation)SSC Veteran (275 reputation)SSC Veteran (275 reputation)

Group: General Forum Members
Points: 275 Visits: 342
ChrisM@home (3/23/2014)

This query works only because of the sequence of RepairId in table2, which already orders most-used tyres ahead of least-used tyres, probably by accident. If table2 really is artificially ordered in this way, you're ok, otherwise you will get incorrect results. Here's a copy of Lyn's dataset slightly adjusted to show this:
DROP TABLE dbo.Table1
CREATE TABLE dbo.Table1 (DealerId INT, RepairId INT);

INSERT INTO dbo.Table1 VALUES
(1,1),(1,2),(1,3),(1,4),(1,5),(1,6),(1,7),
(1,8),(1,9),(1,10),(2,11),(2,12),(2,13),
(3,14),(3,15),(3,16);

DROP TABLE dbo.Table2
CREATE TABLE dbo.Table2 (RepairId INT, tire VARCHAR(15));

INSERT INTO dbo.Table2 VALUES (9,'225/65R17 102T'),
(7,'225/65R17 102T'),(3,'225/65R17 102T'),(4,'235/60R18 102V'),
(5,'235/60R18 102V'),(6,'235/60R18 102V'),(8,'235/60R18 102V'),
(2,'205/55R16 89H'),(1,'205/70R15 89H'),(13,'225/65R17 102T'),
(14,'235/60R18 102V'),(15,'235/60R18 102V'),(16,'235/60R18 102V');



and a query which works:
SELECT DealerID, tire, RepairsOfThisTire, TotalRepairs, 
PercentOfTotal = (RepairsOfThisTire*100.00)/TotalRepairs
FROM (
SELECT DealerID, TotalRepairs, tire, RepairsOfThisTire,
Sector = NTILE(5) OVER(PARTITION BY DealerID ORDER BY RepairsOfThisTire DESC, tire DESC)
FROM (
SELECT
t1.DealerID, t2.tire,
TotalRepairs = COUNT(*) OVER (PARTITION BY t1.DealerID),
RepairsOfThisTire = COUNT(*) OVER(PARTITION BY t1.DealerID, t2.tire)
FROM dbo.Table1 t1
LEFT JOIN dbo.Table2 t2 ON t2.RepairId = t1.RepairId
) d
WHERE tire IS NOT NULL
) e
WHERE Sector < 5
GROUP BY DealerID, TotalRepairs, tire, RepairsOfThisTire--, PercentOfTotal
ORDER BY DealerID, RepairsOfThisTire DESC, tire;





The problem with this query is it returns data for dealer id 3. The business rule I was given was that they only want to see the top 80%. In the case of dealer id 3 the query is showing the top 100%. In this example the top 80% cannot be calculated since all the orders are the same tire. The main issue I see and need to get clarification from the business on is what happens if the top tire is > 80%?
ChrisM@Work
ChrisM@Work
SSC-Forever
SSC-Forever (41K reputation)SSC-Forever (41K reputation)SSC-Forever (41K reputation)SSC-Forever (41K reputation)SSC-Forever (41K reputation)SSC-Forever (41K reputation)SSC-Forever (41K reputation)SSC-Forever (41K reputation)

Group: General Forum Members
Points: 41248 Visits: 20000
tfendt (3/24/2014)
...
The problem with this query is it returns data for dealer id 3. The business rule I was given was that they only want to see the top 80%. In the case of dealer id 3 the query is showing the top 100%. In this example the top 80% cannot be calculated since all the orders are the same tire. The main issue I see and need to get clarification from the business on is what happens if the top tire is > 80%?


Exactly. What do they want if the number of rows is not divisible by 5. Modification of the query to accommodate the decision should be straightforward.

“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Exploring Recursive CTEs by Example Dwain Camps
tfendt
tfendt
SSC Veteran
SSC Veteran (275 reputation)SSC Veteran (275 reputation)SSC Veteran (275 reputation)SSC Veteran (275 reputation)SSC Veteran (275 reputation)SSC Veteran (275 reputation)SSC Veteran (275 reputation)SSC Veteran (275 reputation)

Group: General Forum Members
Points: 275 Visits: 342
tfendt (3/24/2014)
ChrisM@home (3/23/2014)

This query works only because of the sequence of RepairId in table2, which already orders most-used tyres ahead of least-used tyres, probably by accident. If table2 really is artificially ordered in this way, you're ok, otherwise you will get incorrect results. Here's a copy of Lyn's dataset slightly adjusted to show this:
DROP TABLE dbo.Table1
CREATE TABLE dbo.Table1 (DealerId INT, RepairId INT);

INSERT INTO dbo.Table1 VALUES
(1,1),(1,2),(1,3),(1,4),(1,5),(1,6),(1,7),
(1,8),(1,9),(1,10),(2,11),(2,12),(2,13),
(3,14),(3,15),(3,16);

DROP TABLE dbo.Table2
CREATE TABLE dbo.Table2 (RepairId INT, tire VARCHAR(15));

INSERT INTO dbo.Table2 VALUES (9,'225/65R17 102T'),
(7,'225/65R17 102T'),(3,'225/65R17 102T'),(4,'235/60R18 102V'),
(5,'235/60R18 102V'),(6,'235/60R18 102V'),(8,'235/60R18 102V'),
(2,'205/55R16 89H'),(1,'205/70R15 89H'),(13,'225/65R17 102T'),
(14,'235/60R18 102V'),(15,'235/60R18 102V'),(16,'235/60R18 102V');



and a query which works:
SELECT DealerID, tire, RepairsOfThisTire, TotalRepairs, 
PercentOfTotal = (RepairsOfThisTire*100.00)/TotalRepairs
FROM (
SELECT DealerID, TotalRepairs, tire, RepairsOfThisTire,
Sector = NTILE(5) OVER(PARTITION BY DealerID ORDER BY RepairsOfThisTire DESC, tire DESC)
FROM (
SELECT
t1.DealerID, t2.tire,
TotalRepairs = COUNT(*) OVER (PARTITION BY t1.DealerID),
RepairsOfThisTire = COUNT(*) OVER(PARTITION BY t1.DealerID, t2.tire)
FROM dbo.Table1 t1
LEFT JOIN dbo.Table2 t2 ON t2.RepairId = t1.RepairId
) d
WHERE tire IS NOT NULL
) e
WHERE Sector < 5
GROUP BY DealerID, TotalRepairs, tire, RepairsOfThisTire--, PercentOfTotal
ORDER BY DealerID, RepairsOfThisTire DESC, tire;





The problem with this query is it returns data for dealer id 3. The business rule I was given was that they only want to see the top 80%. In the case of dealer id 3 the query is showing the top 100%. In this example the top 80% cannot be calculated since all the orders are the same tire. The main issue I see and need to get clarification from the business on is what happens if the top tire is > 80%?


I do like this query however. I am not 100% sure on how it works though. How does it know to select the top 80%? Why NTILE(5)?
ChrisM@Work
ChrisM@Work
SSC-Forever
SSC-Forever (41K reputation)SSC-Forever (41K reputation)SSC-Forever (41K reputation)SSC-Forever (41K reputation)SSC-Forever (41K reputation)SSC-Forever (41K reputation)SSC-Forever (41K reputation)SSC-Forever (41K reputation)

Group: General Forum Members
Points: 41248 Visits: 20000
Have a look at the NTILE function in Books Online (SQL Server help).
NTILE(n) splits the data into n equal-sized partitions numbered 1 to n in a user-defined order and with defined rules for when the data set is not divisible by n.

“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Exploring Recursive CTEs by Example Dwain Camps
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search