Is there a fast way for SQL to do a Group By Count on items within a group?

  • select top 15 count(*) as cnt, state from table
    group by state
    order by cnt desc

    select top 15 count(*) as cnt, city from table
    group by city
    order by cnt desc

    select top 15 count(*) as cnt, company from table
    group by company
    order by cnt desc

    Can the above three queries be combined into one and still be fast? If so, how?

    What I am trying to do is an item count, by group, similar to one's Inbox in Outlook.

    Thanks

  • check out Windowing functions. Something like:

    select count(*) over (partition by city) citycount
         , count(*) over (partition by state) statecount
    ...
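    If a single, fast batch is the goal, the three TOP 15 queries can also be combined with UNION ALL plus ROW_NUMBER. A hedged sketch only: dbo.Contacts and its columns are assumed stand-ins for the OP's actual table.

    -- One pass per grouping column, ranked within each category;
    -- dbo.Contacts with state/city/company columns is an assumption.
    SELECT Category, Value, Cnt
    FROM (
        SELECT 'State' AS Category, state AS Value, COUNT(*) AS Cnt,
               ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC) AS rn
        FROM dbo.Contacts GROUP BY state
        UNION ALL
        SELECT 'City', city, COUNT(*),
               ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC)
        FROM dbo.Contacts GROUP BY city
        UNION ALL
        SELECT 'Company', company, COUNT(*),
               ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC)
        FROM dbo.Contacts GROUP BY company
    ) AS ranked
    WHERE rn <= 15
    ORDER BY Category, Cnt DESC;

    This returns one result set with a Category label instead of three separate result sets, at the cost of three scans (or index scans) of the base table.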

  • Thanks

  • isuckatsql (5/1/2014)


    select top 15 count(*) as cnt, state from table
    group by state
    order by cnt desc

    select top 15 count(*) as cnt, city from table
    group by city
    order by cnt desc

    select top 15 count(*) as cnt, company from table
    group by company
    order by cnt desc

    Can the above three queries be combined into one and still be fast? If so, how?

    What I am trying to do is an item count, by group, similar to one's Inbox in Outlook.

    Thanks

    I want to see your expected output. If New York City is the most populous city but California is the most populous state, how would a single output with the respective columns and counts portray that? NYC is in NY, not CA...

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Kevin,

    You are correct!

    I am not sure on the best way to do this.

    Based on the Microsoft Outlook scenario, I assume you would need one SP for each of the folders (Inbox, Sent, Junk, etc.), which I'm sure would be horribly slow.

    One of the suggestions made was to hold all the records in memory, then make the counts based on the records contained only in memory. Then update the records in memory every two hours.

    Any other ideas?

    Thanks

    Ian

  • isuckatsql (5/2/2014)


    Kevin,

    You are correct!

    I am not sure on the best way to do this.

    Based on the Microsoft Outlook scenario, I assume you would need one SP for each of the folders (Inbox, Sent, Junk, etc.), which I'm sure would be horribly slow.

    One of the suggestions made was to hold all the records in memory, then make the counts based on the records contained only in memory. Then update the records in memory every two hours.

    Any other ideas?

    Thanks

    Ian

    I'm not sure what this recommendation is based on...

    Before SQL 2014 you cannot really control what data will be held in memory and for how long.

    How would such a suggestion be implemented to work under each and every condition? What SQL command would you use to "force" it into memory?

    Or am I missing something here?



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question
    How to post performance related questions
    Links for Tally Table, Cross Tabs and Dynamic Cross Tabs, Delimited Split Function

  • Have you thought about how the TOP(15) requirement will work? What if your user wants to see - and expand - a state node which isn't in the top 15?

    Mimicking the Outlook menu system is fairly common. As long as you put a little thought into the data structure driving it, performance doesn't have to be an issue.
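    One hedged way to handle the "expand a node outside the top 15" case: rank all groups, then return the top 15 plus whichever node the user selected. Table and column names here are assumptions, not the OP's actual schema.

    -- Top 15 states by count, plus a user-selected state even if it
    -- falls outside the top 15 (dbo.Contacts is an assumed table name).
    DECLARE @selected varchar(50) = 'MT';

    SELECT state, cnt
    FROM (
        SELECT state, COUNT(*) AS cnt,
               ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC) AS rn
        FROM dbo.Contacts
        GROUP BY state
    ) AS ranked
    WHERE rn <= 15 OR state = @selected
    ORDER BY cnt DESC;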

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Lutz is correct, as usual. You cannot command SQL Server to "keep this table in memory all the time" like you want. However, it WILL stay in memory "all the time" if it is regularly referenced, which this type of dashboard aggregates probably will. So you do what you are suggesting - make one or more aggregate tables and refresh them on a period that is acceptable to the stake holders. To speed the refresh of said data I would keep track of the last "something" that was counted up through and use that as an indexed, seekable predicate in your query that computes the increments for your aggregates. There are various ways to accomplish this efficiently.
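    A minimal sketch of that watermark-based incremental refresh, with every object name an illustrative assumption (an identity column Id is also assumed as the indexed, seekable predicate):

    -- Persisted aggregate table plus a watermark of the last row counted.
    CREATE TABLE dbo.StateCounts (state varchar(50) PRIMARY KEY, cnt bigint NOT NULL);
    CREATE TABLE dbo.RefreshWatermark (LastId int NOT NULL);

    -- Periodic refresh: count only rows added since the last run.
    DECLARE @last int = (SELECT LastId FROM dbo.RefreshWatermark);
    DECLARE @max  int = (SELECT MAX(Id) FROM dbo.Contacts);

    MERGE dbo.StateCounts AS t
    USING (SELECT state, COUNT(*) AS cnt
           FROM dbo.Contacts
           WHERE Id > @last AND Id <= @max
           GROUP BY state) AS s
    ON t.state = s.state
    WHEN MATCHED THEN UPDATE SET t.cnt = t.cnt + s.cnt
    WHEN NOT MATCHED THEN INSERT (state, cnt) VALUES (s.state, s.cnt);

    UPDATE dbo.RefreshWatermark SET LastId = @max;

    The refresh cost is then proportional to the new rows since the last run, not the whole table.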

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Is it better to do this with SQL or C# ?

    I have the table running in memory via C#. The initial load takes 52 seconds, then each count (as a test, all cities in CA) takes 0.03 seconds.

    Thanks

    Seems quite silly (and unscalable) to me to read all the data into memory just to re-read it all iteratively. 🙂

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • I agree, Kevin.

    But if the OP really wants to "force" the data to stay in memory, one way could be a DLL that queries the aggregated data and keeps the result in memory. Another DLL would just return the data of the previous one.

    The question is: why? If the aggregated data doesn't have to be accurate, why not create a tiny table that'll hold the aggregated results and refresh it as needed?
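    If up-to-date counts ever do matter, an indexed view is one hedged alternative to a manually refreshed table, since SQL Server maintains it automatically as rows change (SCHEMABINDING restrictions and edition caveats apply; the names below are assumptions):

    -- Materialized per-state counts, kept current by the engine.
    CREATE VIEW dbo.vStateCounts WITH SCHEMABINDING AS
    SELECT state, COUNT_BIG(*) AS cnt
    FROM dbo.Contacts
    GROUP BY state;
    GO
    -- The unique clustered index is what persists the view's result.
    CREATE UNIQUE CLUSTERED INDEX IX_vStateCounts ON dbo.vStateCounts (state);

    The trade-off is extra write cost on the base table, versus a stale-but-cheap summary table on a timer.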



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question
    How to post performance related questions
    Links for Tally Table, Cross Tabs and Dynamic Cross Tabs, Delimited Split Function

    A little trick with SIGN and the OVER clause makes this a simple query.

    😎

    USE AdventureWorks2012;
    GO

    WITH MULTI_COUNTING AS
    (
        SELECT /* Sum the row numbers, gives distinct count */
            ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS PEAD_RID
            ,SUM(PEAD_AddressLine1)    OVER (PARTITION BY (SELECT NULL)) AS COUNT_AddressLine1
            ,SUM(PEAD_City)            OVER (PARTITION BY (SELECT NULL)) AS COUNT_City
            ,SUM(PEAD_StateProvinceID) OVER (PARTITION BY (SELECT NULL)) AS COUNT_StateProvinceID
            ,SUM(PEAD_PostalCode)      OVER (PARTITION BY (SELECT NULL)) AS COUNT_PostalCode
        FROM
        (
            SELECT /* Row number 1 returns 1, else 0 */
                1 + SIGN(1 - ROW_NUMBER() OVER (PARTITION BY PEAD.AddressLine1    ORDER BY (SELECT NULL))) AS PEAD_AddressLine1
                ,1 + SIGN(1 - ROW_NUMBER() OVER (PARTITION BY PEAD.City            ORDER BY (SELECT NULL))) AS PEAD_City
                ,1 + SIGN(1 - ROW_NUMBER() OVER (PARTITION BY PEAD.StateProvinceID ORDER BY (SELECT NULL))) AS PEAD_StateProvinceID
                ,1 + SIGN(1 - ROW_NUMBER() OVER (PARTITION BY PEAD.PostalCode      ORDER BY (SELECT NULL))) AS PEAD_PostalCode
            FROM Person.Address PEAD
        ) AS COUNT_BASE
    )
    SELECT
        MC.COUNT_AddressLine1
        ,MC.COUNT_City
        ,MC.COUNT_StateProvinceID
        ,MC.COUNT_PostalCode
    FROM MULTI_COUNTING MC
    WHERE MC.PEAD_RID = 1;

    Results:

    COUNT_AddressLine1   COUNT_City   COUNT_StateProvinceID   COUNT_PostalCode
    ------------------   ----------   ---------------------   ----------------
    13567                575          74                      661

  • But this query doesn't return the result set as requested since there's a TOP 15 involved for each group...



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question
    How to post performance related questions
    Links for Tally Table, Cross Tabs and Dynamic Cross Tabs, Delimited Split Function

  • LutzM (5/3/2014)


    But this query doesn't return the result set as requested since there's a TOP 15 involved for each group...

    Oops, the lights are on but no one's at home 😛

    [mental note: read the full post]
