Selecting TOP PERCENT based on GROUP ?

Question

Selecting TOP PERCENT based on GROUP ?

SQL_By_Chance

Hall of Fame

Points: 3915
More actions
January 9, 2012 at 10:20 am

#249304

Hi,
I wanted to know if we can select percent of records for each group?
something like :-
SELECT TOP 30 PERCENT *
FROM LOCATION_TABLE
GROUP BY STATE
ORDER BY CITYNAME
______________________________________________________________________
Ankit
MCITP Database developer SQL 2008,MCTS Database Administration SQL 2008
"Remembering that you are going to die is the best way I know to avoid the trap of thinking you have something to lose.
You are already naked. There is no reason not to follow your heart.”

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply

padhis SSCommitted Points: 1762 More actions · Answer 1

SQL_By_Chance (1/9/2012)
Hi,
I wanted to know if we can select percent of records for each group?
something like :-
SELECT TOP 30 PERCENT *
FROM LOCATION_TABLE
GROUP BY STATE
ORDER BY CITYNAME

As you have only grouped by cityname so You can't use all column names in select list. replace * by column name which you are using in group by clause

SELECT TOP 30 PERCENT CITYNAME

FROM LOCATION_TABLE

GROUP BY STATE

ORDER BY CITYNAME

GSquared SSC Guru Points: 260824 More actions · Answer 2

Select the states and cities, then use Cross Apply (or Outer Apply) to get the top 30 percent for each.

Is that what you're trying to do?

Something like this:

select state, city, location

from dbo.MyCitiesTable

cross apply

(select top 30 percent location

from dbo.MyLocationsTable

where MyLocationsTable.CityID = MyCitiesTable.ID

order by location) as Locations ;

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

SQL_By_Chance Hall of Fame Points: 3915 More actions · Answer 3

Thanks for the reply.

By chance, I came across a very good blog about the same which uses the following CTE :-

(http://weblogs.sqlteam.com/jeffs/archive/2008/02/21/Top-N-Percent-per-Group.aspx)

with AttributebyRegion as

(

select addressid,statecode, cityname,

rank() over (partition by cityname,statecode order by addressid desc) as AddedRank,

count(*) over (partition by cityname,statecode) as RegionCount

from lOCATTRIBUTE_Flagid

)

select

distinct statecode, COUNT_State = FLOOR (COUNT(RegionCount * .30))

from

AttributebyRegion

where

AddedRank <= (RegionCount * .30)

group by statecode

order by statecode

-----

______________________________________________________________________

Ankit
MCITP Database developer SQL 2008,MCTS Database Administration SQL 2008
"Remembering that you are going to die is the best way I know to avoid the trap of thinking you have something to lose.
You are already naked. There is no reason not to follow your heart.”

GSquared SSC Guru Points: 260824 More actions · Answer 4

The problem with that one is that, when you go back to read it a year from now, it won't be clear what it's doing and how. You can overcome that with adequate documentation, but make sure that it's in comments in the code, so it doesn't get lost.

The Apply version tells you what it's doing just by reading the code. Minimal documentation needed.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

SQL_By_Chance Hall of Fame Points: 3915 More actions · Answer 5

G,

your query has 2 input tables but if you check CTE it is taking input from the same table.

On the first thought, I too thought of using a function and applying it to outer query but that was degrading performance.

Regards,

Ankit

______________________________________________________________________

Ankit
MCITP Database developer SQL 2008,MCTS Database Administration SQL 2008
"Remembering that you are going to die is the best way I know to avoid the trap of thinking you have something to lose.
You are already naked. There is no reason not to follow your heart.”

Paul White SSC Guru Points: 150467 More actions · Answer 6

I would also prefer an APPLY based solution. Using AdventureWorks:

WITH

Data AS

(

SELECT

a.*

FROM Person.Address AS a

),

StateProvince AS

(

SELECT DISTINCT

d.StateProvinceID

FROM Data AS d

)

SELECT

Selected.AddressID,

Selected.AddressLine1,

Selected.AddressLine2,

Selected.City,

Selected.StateProvinceID,

Selected.PostalCode

FROM StateProvince

CROSS APPLY

(

SELECT TOP (30) PERCENT

d2.*

FROM Data AS d2

WHERE

d2.StateProvinceID = StateProvince.StateProvinceID

ORDER BY

d2.City

) AS Selected

ORDER BY

StateProvince.StateProvinceID,

Selected.City;

This seems much clearer to me. Proper indexing might be necessary on larger input sets.

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi

GSquared SSC Guru Points: 260824 More actions · Answer 7

SQL_By_Chance (1/10/2012)
G,
your query has 2 input tables but if you check CTE it is taking input from the same table.
On the first thought, I too thought of using a function and applying it to outer query but that was degrading performance.
Regards,
Ankit

What function? My query doesn't use a function. It uses an inline derived table.

As for two tables vs one, a Distinct operator on one table will be more expensive than a query of a table that has the aggregate-base properly normalized, unless we're talking about very small datasets.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon