how to best summarize data

  • I’ve got the following data and I’m trying to get to the following results. I’m assuming I need to use a CASE statement, but was looking for some help and to find out the best way to optimize the query as well because I will be hitting hundreds of thousands of rows to get the summarized data.

    Name CategorySourceUserIDEnteredUserIDOrder #

    Joe SmithMEDWjsmithbob 2012-01

    Joe SmithMEDWjsmithjsmith 2012-02

    Joe SmithLABVjsmithjsmith 2012-03

    Bob WhiteMED PROTOCOL bobbob 2012-04

    Bob WhiteRADWbobjsmith 2012-05

    Bob WhiteCTVbobkatie 2012-06

    Bob WhiteMED MD ORDER bobjsmith 2012-07

    I want to get a total count of orders by Name then by Category like below.

    NameCategoryTotal Orders% Phys Entered%Other Entered Protocol %

    Joe SmithMED250%50%0%

    Joe SmithLAB1100%0%0%

    Bob WhiteMED20%50%50%

    Bob WhiteRAD10%100%0%

    Bob WhiteCT10%100%0%

    Total Orders = # of orders by Name & Category

    % Phys Entered = # of orders where user ID = EnteredUser ID AND Source <> ‘PROTOCOL’ / Total Orders by Name & Category

    %Other Entered = # of orders where userID <> EnteredUserID AND Source <> ‘PROTOCOL’/ total Orders by Name & Category

    Protocol % = # of orders where Source = ‘PROTOCOL’ / Total Orders

  • christina.honnert (9/11/2012)


    I’ve got the following data and I’m trying to get to the following results. I’m assuming I need to use a CASE statement, but was looking for some help and to find out the best way to optimize the query as well because I will be hitting hundreds of thousands of rows to get the summarized data.

    Name CategorySourceUserIDEnteredUserIDOrder #

    Joe SmithMEDWjsmithbob 2012-01

    Joe SmithMEDWjsmithjsmith 2012-02

    Joe SmithLABVjsmithjsmith 2012-03

    Bob WhiteMED PROTOCOL bobbob 2012-04

    Bob WhiteRADWbobjsmith 2012-05

    Bob WhiteCTVbobkatie 2012-06

    Bob WhiteMED MD ORDER bobjsmith 2012-07

    I want to get a total count of orders by Name then by Category like below.

    NameCategoryTotal Orders% Phys Entered%Other Entered Protocol %

    Joe SmithMED250%50%0%

    Joe SmithLAB1100%0%0%

    Bob WhiteMED20%50%50%

    Bob WhiteRAD10%100%0%

    Bob WhiteCT10%100%0%

    Total Orders = # of orders by Name & Category

    % Phys Entered = # of orders where user ID = EnteredUser ID AND Source <> ‘PROTOCOL’ / Total Orders by Name & Category

    %Other Entered = # of orders where userID <> EnteredUserID AND Source <> ‘PROTOCOL’/ total Orders by Name & Category

    Protocol % = # of orders where Source = ‘PROTOCOL’ / Total Orders

    It is not too likely you are going to get much help with your query unless you post some more details. I see you are pretty new around here. You should take a look at the first link in my signature for best practices when posting questions.

    I can't provide you much detail but I don't see how CASE is going to help in your situation. What you need to do is look at aggregate data. You will need to do some grouping and use some aggregate functions.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • I think you're going to need COUNTs and GROUP BY

    SELECT Name, Category, COUNT(Category) AS CountCat

    FROM ..yourTable..

    GROUP BY Name, Category

    then add your percentages in by getting the MAX as a subquery (I'm ready to be corrected here)

    SELECT Name, Category, ( 100 * COUNT(Category) / (SELECT COUNT(*) FROM ..yourTable.. ) ) AS PCNT

    FROM ..yourTable..

    GROUP BY Name, Category

    Hopefully this will give you the framework of an approach to this problem.

  • Just a warning, Christina, if your data is set up the way I think it is, t.brown's solution will not immediately give you the answer you're looking for. You will definitely have to go tweaking in order to find the best way to do this.

    However, if you're able to give us table DDL and an insert statement with sample data, and let us know what code you've tried to make this work, we'll help you tweak until you get it right.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • I did end up figuring this out by the following:

    select b.Name, a.Category, COUNT(a.OrderNumber) as 'Total Orders',

    ((sum(

    CASE

    WHEN b.UserID = a.EnteredUserID and c.OrdSourceID <> 'PROTOCOL'

    THEN 1

    ELSE 0

    END) * 100 / nullif(COUNT(a.OrderNumber),0))) as 'Phys Entered %',

    ((SUM(

    CASE

    WHEN b.UserID <> a.EnteredUserID and c.OrdSourceID <> 'PROTOCOL'

    THEN 1

    ELSE 0

    END) * 100 / nullif(COUNT(a.OrderNumber),0))) as 'Other Entered %',

    ((SUM(

    CASE

    WHEN c.OrdSourceID = 'PROTOCOL'

    THEN 1

    ELSE 0

    END) * 100 / nullif(COUNT(a.OrderNumber),0))) as 'Revised Order %'

    from OeOrders a

    INNER JOIN OeOrders2 c ON a.OrderID = c.OrderID

    LEFT OUTER JOIN DMisUsers b ON a.ProviderID = b.ProviderID

    where (a.OrderDateTime between '2012-01-01 00:00:00:000' AND '2012-09-09 23:59:59:999')

    and a.Status <> 'CANC'

    and b.Name is not null

    group by b.Name, a.Category

    order by b.Name, a.Category

  • How many statuses do you have? If it's 10 or less, your performance may be increased if you switch that WHERE clause to say something like Status IN ('Active','Expired',...'n').

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply