Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Selecting TOP PERCENT based on GROUP ?


Selecting TOP PERCENT based on GROUP ?

Author
Message
SQL_By_Chance
SQL_By_Chance
SSC-Enthusiastic
SSC-Enthusiastic (123 reputation)SSC-Enthusiastic (123 reputation)SSC-Enthusiastic (123 reputation)SSC-Enthusiastic (123 reputation)SSC-Enthusiastic (123 reputation)SSC-Enthusiastic (123 reputation)SSC-Enthusiastic (123 reputation)SSC-Enthusiastic (123 reputation)

Group: General Forum Members
Points: 123 Visits: 299
Hi,

I wanted to know if we can select percent of records for each group?

something like :-

SELECT TOP 30 PERCENT *
FROM LOCATION_TABLE
GROUP BY STATE
ORDER BY CITYNAME

______________________________________________________________________

Ankit
MCITP Database developer SQL 2008,MCTS Database Administration SQL 2008
"Remembering that you are going to die is the best way I know to avoid the trap of thinking you have something to lose.
You are already naked. There is no reason not to follow your heart.”
padhis
padhis
SSC Journeyman
SSC Journeyman (94 reputation)SSC Journeyman (94 reputation)SSC Journeyman (94 reputation)SSC Journeyman (94 reputation)SSC Journeyman (94 reputation)SSC Journeyman (94 reputation)SSC Journeyman (94 reputation)SSC Journeyman (94 reputation)

Group: General Forum Members
Points: 94 Visits: 125
SQL_By_Chance (1/9/2012)
Hi,

I wanted to know if we can select percent of records for each group?

something like :-

SELECT TOP 30 PERCENT *
FROM LOCATION_TABLE
GROUP BY STATE
ORDER BY CITYNAME



As you have only grouped by cityname so You can't use all column names in select list. replace * by column name which you are using in group by clause


SELECT TOP 30 PERCENT CITYNAME
FROM LOCATION_TABLE
GROUP BY STATE
ORDER BY CITYNAME
GSquared
GSquared
SSChampion
SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)

Group: General Forum Members
Points: 14375 Visits: 9729
Select the states and cities, then use Cross Apply (or Outer Apply) to get the top 30 percent for each.

Is that what you're trying to do?

Something like this:

select state, city, location
from dbo.MyCitiesTable
cross apply
(select top 30 percent location
from dbo.MyLocationsTable
where MyLocationsTable.CityID = MyCitiesTable.ID
order by location) as Locations ;



- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
SQL_By_Chance
SQL_By_Chance
SSC-Enthusiastic
SSC-Enthusiastic (123 reputation)SSC-Enthusiastic (123 reputation)SSC-Enthusiastic (123 reputation)SSC-Enthusiastic (123 reputation)SSC-Enthusiastic (123 reputation)SSC-Enthusiastic (123 reputation)SSC-Enthusiastic (123 reputation)SSC-Enthusiastic (123 reputation)

Group: General Forum Members
Points: 123 Visits: 299
Thanks for the reply.

By chance, I came across a very good blog about the same which uses the following CTE :-
(http://weblogs.sqlteam.com/jeffs/archive/2008/02/21/Top-N-Percent-per-Group.aspx)

with AttributebyRegion as
(
select addressid,statecode, cityname,
rank() over (partition by cityname,statecode order by addressid desc) as AddedRank,
count(*) over (partition by cityname,statecode) as RegionCount
from lOCATTRIBUTE_Flagid
)
select
distinct statecode, COUNT_State = FLOOR (COUNT(RegionCount * .30))
from
AttributebyRegion
where
AddedRank <= (RegionCount * .30)
group by statecode
order by statecode

-----

______________________________________________________________________

Ankit
MCITP Database developer SQL 2008,MCTS Database Administration SQL 2008
"Remembering that you are going to die is the best way I know to avoid the trap of thinking you have something to lose.
You are already naked. There is no reason not to follow your heart.”
GSquared
GSquared
SSChampion
SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)

Group: General Forum Members
Points: 14375 Visits: 9729
The problem with that one is that, when you go back to read it a year from now, it won't be clear what it's doing and how. You can overcome that with adequate documentation, but make sure that it's in comments in the code, so it doesn't get lost.

The Apply version tells you what it's doing just by reading the code. Minimal documentation needed.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
SQL_By_Chance
SQL_By_Chance
SSC-Enthusiastic
SSC-Enthusiastic (123 reputation)SSC-Enthusiastic (123 reputation)SSC-Enthusiastic (123 reputation)SSC-Enthusiastic (123 reputation)SSC-Enthusiastic (123 reputation)SSC-Enthusiastic (123 reputation)SSC-Enthusiastic (123 reputation)SSC-Enthusiastic (123 reputation)

Group: General Forum Members
Points: 123 Visits: 299
G,

your query has 2 input tables but if you check CTE it is taking input from the same table.

On the first thought, I too thought of using a function and applying it to outer query but that was degrading performance.

Regards,
Ankit

______________________________________________________________________

Ankit
MCITP Database developer SQL 2008,MCTS Database Administration SQL 2008
"Remembering that you are going to die is the best way I know to avoid the trap of thinking you have something to lose.
You are already naked. There is no reason not to follow your heart.”
Paul White
Paul White
SSChampion
SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)

Group: General Forum Members
Points: 10338 Visits: 11350
I would also prefer an APPLY based solution. Using AdventureWorks:


WITH
Data AS
(
SELECT
a.*
FROM Person.Address AS a
),
StateProvince AS
(
SELECT DISTINCT
d.StateProvinceID
FROM Data AS d
)
SELECT
Selected.AddressID,
Selected.AddressLine1,
Selected.AddressLine2,
Selected.City,
Selected.StateProvinceID,
Selected.PostalCode
FROM StateProvince
CROSS APPLY
(
SELECT TOP (30) PERCENT
d2.*
FROM Data AS d2
WHERE
d2.StateProvinceID = StateProvince.StateProvinceID
ORDER BY
d2.City
) AS Selected
ORDER BY
StateProvince.StateProvinceID,
Selected.City;


This seems much clearer to me. Proper indexing might be necessary on larger input sets.



Paul White
SQLPerformance.com
SQLblog.com
@SQL_Kiwi
GSquared
GSquared
SSChampion
SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)

Group: General Forum Members
Points: 14375 Visits: 9729
SQL_By_Chance (1/10/2012)
G,

your query has 2 input tables but if you check CTE it is taking input from the same table.

On the first thought, I too thought of using a function and applying it to outer query but that was degrading performance.

Regards,
Ankit


What function? My query doesn't use a function. It uses an inline derived table.

As for two tables vs one, a Distinct operator on one table will be more expensive than a query of a table that has the aggregate-base properly normalized, unless we're talking about very small datasets.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search