Is there a fast way for SQL to do a Group By Count on items within a group?

  • select top 15 count(*) as cnt, state from table
    group by state
    order by cnt desc

    select top 15 count(*) as cnt, city from table
    group by city
    order by cnt desc

    select top 15 count(*) as cnt, company from table
    group by company
    order by cnt desc

    Can the above three queries be combined into one and still be fast? If so, how?

    What I am trying to do is an item count, by group, similar to one's Inbox in Outlook.

    Thanks

  • check out Windowing functions. Something like:

    select count(*) over (partition by city) citycount
         , count(*) over (partition by state) statecount
    ...
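    If a single, fast batch is the goal, the three TOP 15 queries can also be combined with UNION ALL plus ROW_NUMBER. A hedged sketch only: dbo.Contacts and its columns are assumed stand-ins for the OP's actual table.

    -- One pass per grouping column, ranked within each category;
    -- dbo.Contacts with state/city/company columns is an assumption.
    SELECT Category, Value, Cnt
    FROM (
        SELECT 'State' AS Category, state AS Value, COUNT(*) AS Cnt,
               ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC) AS rn
        FROM dbo.Contacts GROUP BY state
        UNION ALL
        SELECT 'City', city, COUNT(*),
               ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC)
        FROM dbo.Contacts GROUP BY city
        UNION ALL
        SELECT 'Company', company, COUNT(*),
               ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC)
        FROM dbo.Contacts GROUP BY company
    ) AS ranked
    WHERE rn <= 15
    ORDER BY Category, Cnt DESC;

    This returns one result set with a Category label instead of three separate result sets, at the cost of three scans (or index scans) of the base table.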

  • Thanks

  • isuckatsql (5/1/2014)


    select top 15 count(*) as cnt, state from table
    group by state
    order by cnt desc

    select top 15 count(*) as cnt, city from table
    group by city
    order by cnt desc

    select top 15 count(*) as cnt, company from table
    group by company
    order by cnt desc

    Can the above three queries be combined into one and still be fast? If so, how?

    What I am trying to do is an item count, by group, similar to one's Inbox in Outlook.

    Thanks

    I want to see your expected output. If New York City is the most populous city but California is the most populous state, how would a single output with the respective columns and counts portray that? NYC is in NY, not CA...

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Kevin,

    You are correct!

    I am not sure on the best way to do this.

    Based on the Microsoft Outlook scenario, I assume you would need one SP for each of the folders (Inbox, Sent, Junk, etc.), which I'm sure would be horribly slow.

    One of the suggestions made was to hold all the records in memory, then make the counts based on the records contained only in memory. Then update the records in memory every two hours.

    Any other ideas?

    Thanks

    Ian

  • isuckatsql (5/2/2014)


    Kevin,

    You are correct!

    I am not sure on the best way to do this.

    Based on the Microsoft Outlook scenario, I assume you would need one SP for each of the folders (Inbox, Sent, Junk, etc.), which I'm sure would be horribly slow.

    One of the suggestions made was to hold all the records in memory, then make the counts based on the records contained only in memory. Then update the records in memory every two hours.

    Any other ideas?

    Thanks

    Ian

    I'm not sure what this recommendation is based on...

    Before SQL 2014 you cannot really control what data will be held in memory and for how long.

    How would such a suggestion be implemented to work under each and every condition? What SQL command would you use to "force" it into memory?

    Or am I missing something here?



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question
    How to post performance related questions
    Links for Tally Table, Cross Tabs and Dynamic Cross Tabs, Delimited Split Function

  • Have you thought about how the TOP(15) requirement will work? What if your user wants to see - and expand - a state node which isn't in the top 15?

    Mimicking the Outlook menu system is fairly common. As long as you put a little thought into the data structure driving it, performance doesn't have to be an issue.
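    One hedged way to handle the "expand a node outside the top 15" case: rank all groups, then return the top 15 plus whichever node the user selected. Table and column names here are assumptions, not the OP's actual schema.

    -- Top 15 states by count, plus a user-selected state even if it
    -- falls outside the top 15 (dbo.Contacts is an assumed table name).
    DECLARE @selected varchar(50) = 'MT';

    SELECT state, cnt
    FROM (
        SELECT state, COUNT(*) AS cnt,
               ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC) AS rn
        FROM dbo.Contacts
        GROUP BY state
    ) AS ranked
    WHERE rn <= 15 OR state = @selected
    ORDER BY cnt DESC;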

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Lutz is correct, as usual. You cannot command SQL Server to "keep this table in memory all the time" like you want. However, it WILL stay in memory "all the time" if it is regularly referenced, which this type of dashboard aggregates probably will. So you do what you are suggesting - make one or more aggregate tables and refresh them on a period that is acceptable to the stake holders. To speed the refresh of said data I would keep track of the last "something" that was counted up through and use that as an indexed, seekable predicate in your query that computes the increments for your aggregates. There are various ways to accomplish this efficiently.
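    A minimal sketch of that watermark-based incremental refresh, with every object name an illustrative assumption (an identity column Id is also assumed as the indexed, seekable predicate):

    -- Persisted aggregate table plus a watermark of the last row counted.
    CREATE TABLE dbo.StateCounts (state varchar(50) PRIMARY KEY, cnt bigint NOT NULL);
    CREATE TABLE dbo.RefreshWatermark (LastId int NOT NULL);

    -- Periodic refresh: count only rows added since the last run.
    DECLARE @last int = (SELECT LastId FROM dbo.RefreshWatermark);
    DECLARE @max  int = (SELECT MAX(Id) FROM dbo.Contacts);

    MERGE dbo.StateCounts AS t
    USING (SELECT state, COUNT(*) AS cnt
           FROM dbo.Contacts
           WHERE Id > @last AND Id <= @max
           GROUP BY state) AS s
    ON t.state = s.state
    WHEN MATCHED THEN UPDATE SET t.cnt = t.cnt + s.cnt
    WHEN NOT MATCHED THEN INSERT (state, cnt) VALUES (s.state, s.cnt);

    UPDATE dbo.RefreshWatermark SET LastId = @max;

    The refresh cost is then proportional to the new rows since the last run, not the whole table.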

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Is it better to do this with SQL or C# ?

    I have the table running in memory via C#. The initial load takes 52 seconds, then each count (as a test, all cities in CA) takes 0.03 seconds.

    Thanks

    Seems quite silly (and unscalable) to me to read all the data into memory just to re-read it all iteratively. 🙂

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • I agree, Kevin.

    But if the OP really wants to "force" the data to stay in memory, one way could be a DLL that queries the aggregated data and keeps the result in memory. Another DLL would just return the data of the previous one.

    The question is: why? If the aggregated data doesn't have to be accurate, why not create a tiny table that'll hold the aggregated results and refresh it as needed?
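    If up-to-date counts ever do matter, an indexed view is one hedged alternative to a manually refreshed table, since SQL Server maintains it automatically as rows change (SCHEMABINDING restrictions and edition caveats apply; the names below are assumptions):

    -- Materialized per-state counts, kept current by the engine.
    CREATE VIEW dbo.vStateCounts WITH SCHEMABINDING AS
    SELECT state, COUNT_BIG(*) AS cnt
    FROM dbo.Contacts
    GROUP BY state;
    GO
    -- The unique clustered index is what persists the view's result.
    CREATE UNIQUE CLUSTERED INDEX IX_vStateCounts ON dbo.vStateCounts (state);

    The trade-off is extra write cost on the base table, versus a stale-but-cheap summary table on a timer.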



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question
    How to post performance related questions
    Links for Tally Table, Cross Tabs and Dynamic Cross Tabs, Delimited Split Function

    A little trick with SIGN and the OVER clause makes this a simple query.

    😎

    USE AdventureWorks2012;
    GO

    WITH MULTI_COUNTING AS
    (
        SELECT /* Sum the row numbers, gives distinct count */
            ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS PEAD_RID
            ,SUM(PEAD_AddressLine1)    OVER (PARTITION BY (SELECT NULL)) AS COUNT_AddressLine1
            ,SUM(PEAD_City)            OVER (PARTITION BY (SELECT NULL)) AS COUNT_City
            ,SUM(PEAD_StateProvinceID) OVER (PARTITION BY (SELECT NULL)) AS COUNT_StateProvinceID
            ,SUM(PEAD_PostalCode)      OVER (PARTITION BY (SELECT NULL)) AS COUNT_PostalCode
        FROM
        (
            SELECT /* Row number 1 returns 1, else 0 */
                1 + SIGN(1 - ROW_NUMBER() OVER (PARTITION BY PEAD.AddressLine1    ORDER BY (SELECT NULL))) AS PEAD_AddressLine1
                ,1 + SIGN(1 - ROW_NUMBER() OVER (PARTITION BY PEAD.City            ORDER BY (SELECT NULL))) AS PEAD_City
                ,1 + SIGN(1 - ROW_NUMBER() OVER (PARTITION BY PEAD.StateProvinceID ORDER BY (SELECT NULL))) AS PEAD_StateProvinceID
                ,1 + SIGN(1 - ROW_NUMBER() OVER (PARTITION BY PEAD.PostalCode      ORDER BY (SELECT NULL))) AS PEAD_PostalCode
            FROM Person.Address PEAD
        ) AS COUNT_BASE
    )
    SELECT
        MC.COUNT_AddressLine1
        ,MC.COUNT_City
        ,MC.COUNT_StateProvinceID
        ,MC.COUNT_PostalCode
    FROM MULTI_COUNTING MC
    WHERE MC.PEAD_RID = 1;

    Results:

    COUNT_AddressLine1   COUNT_City   COUNT_StateProvinceID   COUNT_PostalCode
    ------------------   ----------   ---------------------   ----------------
    13567                575          74                      661

  • But this query doesn't return the result set as requested since there's a TOP 15 involved for each group...



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question
    How to post performance related questions
    Links for Tally Table, Cross Tabs and Dynamic Cross Tabs, Delimited Split Function

  • LutzM (5/3/2014)


    But this query doesn't return the result set as requested since there's a TOP 15 involved for each group...

    Oops, the lights are on but no one's at home 😛

    [mental note: read the full post]
