Group by Age Range

  • Hello

    I have the following query, altered to protect the table names.

    SELECT '0-10' Age, COUNT(CASE WHEN Age BETWEEN 0 AND 10 THEN Age ELSE NULL END) Total

    ,COUNT(CASE WHEN Age BETWEEN 0 AND 10 THEN Age ELSE NULL END)*1.0/COUNT(Age) Percentage

    from dbo.PERSON ppp

    union

    select

    '11-20', COUNT(CASE WHEN Age BETWEEN 11 AND 20 THEN Age ELSE NULL END)

    ,COUNT(CASE WHEN Age BETWEEN 11 AND 20 THEN Age ELSE NULL END)*1.0/COUNT(Age)

    from dbo.PERSON ppp union

    select

    '21-30', COUNT(CASE WHEN Age BETWEEN 21 AND 30 THEN Age ELSE NULL END)

    ,COUNT(CASE WHEN Age BETWEEN 21 AND 30 THEN Age ELSE NULL END)*1.0/COUNT(Age)

    from dbo.PERSON ppp union

    select

    '31-40', COUNT(CASE WHEN Age BETWEEN 31 AND 40 THEN Age ELSE NULL END)

    ,COUNT(CASE WHEN Age BETWEEN 31 AND 40 THEN Age ELSE NULL END)*1.0/COUNT(Age)

    from dbo.PERSON ppp union

    select

    '41-50', COUNT(CASE WHEN Age BETWEEN 41 AND 50 THEN Age ELSE NULL END)

    ,COUNT(CASE WHEN Age BETWEEN 41 AND 50 THEN Age ELSE NULL END)*1.0/COUNT(Age)

    from dbo.PERSON ppp union

    select

    '51-60', COUNT(CASE WHEN Age BETWEEN 51 AND 60 THEN Age ELSE NULL END)

    ,COUNT(CASE WHEN Age BETWEEN 51 AND 60 THEN Age ELSE NULL END)*1.0/COUNT(Age)

    from dbo.PERSON ppp union

    select

    '61-70', COUNT(CASE WHEN Age BETWEEN 61 AND 70 THEN Age ELSE NULL END)

    ,COUNT(CASE WHEN Age BETWEEN 61 AND 70 THEN Age ELSE NULL END)*1.0/COUNT(Age)

    from dbo.PERSON ppp union

    select

    '71-80', COUNT(CASE WHEN Age BETWEEN 71 AND 80 THEN Age ELSE NULL END)

    ,COUNT(CASE WHEN Age BETWEEN 71 AND 80 THEN Age ELSE NULL END)*1.0/COUNT(Age)

    from dbo.PERSON ppp union

    select

    '81-90', COUNT(CASE WHEN Age BETWEEN 81 AND 90 THEN Age ELSE NULL END)

    ,COUNT(CASE WHEN Age BETWEEN 81 AND 90 THEN Age ELSE NULL END)*1.0/COUNT(Age)

    from dbo.PERSON ppp union

    select

    '91-100', COUNT(CASE WHEN Age BETWEEN 91 AND 100 THEN Age ELSE NULL END)

    ,COUNT(CASE WHEN Age BETWEEN 91 AND 100 THEN Age ELSE NULL END)*1.0/COUNT(Age)

    from dbo.PERSON ppp union

    select

    '100+', COUNT(CASE WHEN Age > 100 THEN Age ELSE NULL END)

    ,COUNT(CASE WHEN Age > 100 THEN Age ELSE NULL END)*1.0/COUNT(Age)

    from dbo.PERSON ppp union

    select

    'Not Entered', COUNT(CASE WHEN isnull(Age, 9999) = 9999 THEN 1 ELSE NULL END)

    ,COUNT(CASE WHEN isnull(Age, 999) = 999 THEN 1 ELSE NULL END)*1.0/COUNT(*)

    from dbo.PERSON ppp

    Which returns

    AgeTotalPercentage

    0-103550.045489492567

    11-206090.078036904151

    21-307000.089697590978

    31-406930.088800615069

    41-507360.094310609943

    51-606920.088672475653

    61-709040.115838031778

    71-8013970.179010763710

    81-9013700.175550999487

    91-1003400.043567401332

    100+80.001025115325

    Not Entered92660.542823667252

    Excellent, but wait, why the large number of exceptions? That is, records with no age entered.

    It turns out the table has duplicate rows for each record. Here's a quick test setup. One row has the age, the other related record is null.

    create table #PERSON(personId int not null identity(1,1) primary key,

    Age int null);

    insert #Person (Age) values(7)

    insert #Person (Age) values(17)

    insert #Person (Age) values(27)

    insert #Person (Age) values(37)

    insert #Person (Age) values(47)

    insert #Person (Age) values(57)

    insert #Person (Age) values(67)

    insert #Person (Age) values(77)

    insert #Person (Age) values(87)

    insert #Person (Age) values(97)

    insert #Person (Age) values(107)

    insert #Person (Age) values(8)

    insert #Person (Age) values(18)

    insert #Person (Age) values(28)

    insert #Person (Age) values(38)

    insert #Person (Age) values(48)

    insert #Person (Age) values(58)

    insert #Person (Age) values(68)

    insert #Person (Age) values(78)

    insert #Person (Age) values(88)

    insert #Person (Age) values(98)

    insert #Person (Age) values(108)

    insert #Person (Age) values(3)

    insert #Person (Age) values(13)

    insert #Person (Age) values(23)

    insert #Person (Age) values(33)

    insert #Person (Age) values(43)

    create table PERSON(pkey int not null identity (1,1),

    personId int not null,

    Age int null);

    Insert Person (PersonId, Age)

    select * from #PERSON p

    Insert Person (PersonId)

    select personId from #PERSON p

    delete person where age > 90

    SELECT * from PERSON p order by personId, age

    -- drop table #PERSON

    -- drop table PERSON

    If you run the first query on this test setup, you get

    AgeTotalPercentage

    0-1030.130434782608

    11-2030.130434782608

    21-3030.130434782608

    31-4030.130434782608

    41-5030.130434782608

    51-6020.086956521739

    61-7020.086956521739

    71-8020.086956521739

    81-9020.086956521739

    91-10000.000000000000

    100+00.000000000000

    Not Entered270.540000000000

    When really only the people over 90 have not had their age entered into the system. Not entered should be 4. How do I ignore null rows for ones that have a valid age, but not others?

    Dave J


    http://glossopian.co.uk/
    "I don't know what I don't know."

  • I would do the whole thing this way:

    declare @Total float;

    select @Total =

    (select count(*)

    from #Person);

    select

    '0-10' as Age,

    count(*) as Total,

    count(*)/@Total as Percentage

    from #Person

    where age between 0 and 10

    union all

    select

    'Not Entered',

    count(*),

    count(*)/@Total

    from #Person

    where age is null

    and personid not in

    (select personid

    from #Person

    where age is not null);

    It'll be much more efficient than what you've got so far, and it'll give you an accurate count on the final query. Just copy and paste the firsts bit and plug in each of your ranges.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Here's another way

    WITH AgeRanges(AgeDesc,AgeMin,AgeMax) AS(

    SELECT '0-10',0,10

    UNION ALL

    SELECT '11-20',11,20

    UNION ALL

    SELECT '21-30',21,30

    UNION ALL

    SELECT '31-40',31,40

    UNION ALL

    SELECT '41-50',41,50

    UNION ALL

    SELECT '51-60',51,60

    UNION ALL

    SELECT '61-70',61,70

    UNION ALL

    SELECT '71-80',71,80

    UNION ALL

    SELECT '81-90',81,90

    UNION ALL

    SELECT '91-100',91,100

    UNION ALL

    SELECT '100+',101,9999

    UNION ALL

    SELECT 'Not Entered',NULL,NULL),

    PersonAges AS (

    SELECT personId,MAX(Age) AS Age FROM dbo.Person GROUP BY personId

    )

    SELECT AgeDesc,

    COUNT(personId) AS Total,

    COUNT(personId)*1.0 / (SELECT COUNT(Age) FROM PersonAges) AS Percentage

    FROM AgeRanges

    LEFT OUTER JOIN PersonAges ON (Age BETWEEN AgeMin AND AgeMax) OR (Age IS NULL AND AgeMin IS NULL)

    GROUP BY AgeDesc

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • Thanks to both of you. Marks solution is interesting as it uses a 2005/08 feature I've seen but never used. This is my first position where SQL > 2000! :w00t:

    My original stab at it returns this

    AgeTotalPercentage

    0-109080.049942247401

    11-2014000.077003465155

    21-3016640.091524118585

    31-4015870.087288928001

    41-5017260.094934272042

    51-6016040.088223970078

    61-7021420.117815301688

    71-8031440.172927781750

    81-9032120.176667950057

    91-1007710.042406908310

    100+230.001265056927

    Not Entered220010.547533721566

    I get this from GSquared query, altered to point to the real table

    AgeTotalPercentage

    0-109080.022588750404259

    Not Entered24350.0605766599497475

    So the Not entered looks good to me, but the 0-10 range percentage differs. Marks query returns

    AgeDescTotalPercentage

    0-10 9080.049934007919

    11-2014000.076990761108

    21-3016640.091509018917

    31-4015870.087274527056

    41-5017270.094973603167

    51-6016040.088209414870

    61-7021430.117850857897

    71-8031450.172954245490

    81-9032120.176638803343

    91-1007710.042399912010

    100+230.001264848218

    Not Entered20740.114056313242

    Which agrees with my 0-10 count, but differs from GSquared's Not Entered total. As I'm in the UK I suspect I have some time to figure it out, and I'll be mighty proud if I get my head round Marks code. I'll post back any findings.

    Once again, many thanks to you both

    Dave J


    http://glossopian.co.uk/
    "I don't know what I don't know."

  • Cracked it, thanks to GSquared and Marks help. I noticed that Mark results differed from mine by one on 3 age groups, 41-50, 61-70 and 71-80, and frankly as I don't know this 'with' stuff works I concentrated on GSquared's solution. To cut a long story short I added this to the top of the script:

    declare @notEntered float, @notEnteredPercentage float

    select @total = COUNT(distinct personid) from person

    select @notEntered = count(*)

    from person

    where age is null

    and personid not in

    (select personid

    from person

    where age is not null);

    select @notEnteredPercentage = @notEntered/@total

    and replaced the last select in the union with this

    select

    'Not Entered', @notEntered, round(@notEnteredPercentage , 3)

    works a charm.

    Apologies for saying GSquareds percentage was off earlier, my bad. I changed count(*) to count(age) and forgot I had done so. :blush:

    Edit: I think I needed to on reflection.

    Thanks guys.

    Dave J


    http://glossopian.co.uk/
    "I don't know what I don't know."

  • Glad we could help.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply