How to manually make up the final result for this query on this dataset

  • Hello,
    I have a table defined as below:
    CREATE TABLE [dbo].[tmpRate](
      [ID] [int] IDENTITY(1,1) NOT NULL,
      [CancerType] [varchar](50) NULL,
      [Sex] [varchar](20) NULL,
      [Rate] [float] NULL
    ) ON [PRIMARY]

    I have a dataset as below:


    Insert into tmpRate Values (138,'Acute myeloid leukemia','Females',0.34)
    Insert into tmpRate Values (139,'Acute myeloid leukemia','Males',0.13)
    Insert into tmpRate Values (141,'Anus','Females',0.44)
    Insert into tmpRate Values (142,'Anus','Males',0.55)
    Insert into tmpRate Values (144,'Bladder (including in situ)','Females',0.76)
    Insert into tmpRate Values (145,'Bladder (including in situ)','Males',0.77)
    Insert into tmpRate Values (147,'Bones and joints (20 to 99 years)','Females',0.18)

    I am looking for a query to get the result like below (if say a type is missing in the raw dataset for a sex, then it shows up as 0 for that missing):

    1,'Acute lymphocytic leukemia','Males'  ,0.11
    2,'Acute lymphocytic leukemia','Females' ,0
    3,'Acute myeloid leukemia','Females',0.34
    4,'Acute myeloid leukemia','Males',0.13
    5,'Anus','Males',0.55
    6,'Anus','Females',0.44
    7,'Bladder (including in situ)','Males',0.77
    8,'Bladder (including in situ)','Females',0.76
    9,'Bones and joints (20 to 99 years)','Females',0.18
    10,'Bones and joints (20 to 99 years)','Males',0

    In this dataset sample, there is no or not applicable for Males to have 'Bones and joints (20 to 99 years)', so the result shows it as 0

    Final result is to be sorted by CancerType and then Rate

    Thank you very much.

  • Thanks for posting such clean details. You can do this a number of ways. I used a cte here to help demonstrate the logical separation you need. I use your table to build the full set of cancer types and then simply use a left join.


    with CancerTypes as
    (
        select distinct tr.CancerType
            , x.Sex
        from tmpRate tr
        cross join (values('Males'),('Females')) x(Sex)
    )

    select ct.CancerType
        , ct.Sex
        , Rate = isnull(r.Rate, 0)
    from CancerTypes ct
    left join tmpRate r on r.CancerType = ct.CancerType and r.Sex = ct.Sex

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean Lange - Monday, March 6, 2017 1:46 PM

    Thanks for posting such clean details. You can do this a number of ways. I used a cte here to help demonstrate the logical separation you need. I use your table to build the full set of cancer types and then simply use a left join.


    with CancerTypes as
    (
        select distinct tr.CancerType
            , x.Sex
        from tmpRate tr
        cross join (values('Males'),('Females')) x(Sex)
    )

    select ct.CancerType
        , ct.Sex
        , Rate = isnull(r.Rate, 0)
    from CancerTypes ct
    left join tmpRate r on r.CancerType = ct.CancerType and r.Sex = ct.Sex

    Thank you for your quick response and it is working as I expect.

  • Sorry I have to come back again:

    When I am doing another similar stats query, my adapted script doesn't produce the right result, here is my queries and results:

    with CancerTypes as
    (
      select distinct tr.CancerType
       , x.Sex
      from tmpRate tr
      cross join (values('Males'),('Females')) x(Sex)
    )

    This gives me:

    Which is correct.

    Another query below:
    with SurvivalRate as
    (
    select c.CancerSite, c.Sex, Cnt = sum(Cast(c.[Value] as float)) from [dbo].[CaCancer5YrSurvRate] c
    where unit = 'Number of cases' and [Value] not in ('F', '..') and c.sex <> 'Both sexes'
    Group by c.CancerSite, c.Sex
    )

    Gives the following result which is correct too:

    Now I want to combine them with the following script:

    with CancerTypes as
    (
      select distinct tr.CancerType
       , x.Sex
      from tmpRate tr
      cross join (values('Males'),('Females')) x(Sex)
    )
    ,
    SurvivalRate as
    (
    select c.CancerSite, c.Sex, Cnt = sum(Cast(c.[Value] as float)) from [dbo].[CaCancer5YrSurvRate] c
    where unit = 'Number of cases' and [Value] not in ('F', '..') and c.sex <> 'Both sexes'
    Group by c.CancerSite, c.Sex
    --order by c.cancersite, c.sex
    )

    select c.cancertype, c.sex, isnull(s.cnt, 0) as Cnt
    From SurvivalRate s
    left outer join CancerTypes c on
    c.Sex = s.Sex and
    c.CancerType = s.CancerSite
    order by s.cancersite, s.sex

    which produces the following result:

    You can see the result is NOT correct because the result should have 0 for Males on the type of Ovary and 0 for Females on the type of Penis.

    So what's wrong with the script?
    I validated the script by saving the two "with" results into tables and pull data directly from tables, I get the right result below:

    Thank you very much.

  • Try changing the order of the LEFT OUTER JOIN to list the CancerTypes table reference first.

    Steve (aka sgmunson) πŸ™‚ πŸ™‚ πŸ™‚
    Rent Servers for Income (picks and shovels strategy)

  • From SurvivalRate s
    left outer join CancerTypes c on ...
    - which means "take all rows from SurvivalRate, and any rows from CancerTypes which have a match on the join columns".
    Rows in CancerTypes which don't have a match in SurvivalRate are omitted.
    Rows in SurvivalRate which don't have a match in CancerTypes are retained.

    Try reversing to:

    FROM CancerTypes c
    LEFT JOIN SurvivalRate s ON ...
    - which means "take all rows from  CancerTypes, and any rows from SurvivalRate which have a match on the join columns"

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Thanks both same suggestion, now I get the right result with the revised script below for later reference:

    with CancerTypes as
    (
      select distinct tr.CancerType
       , x.Sex
      from tmpRate tr
      cross join (values('Males'),('Females')) x(Sex)
    )
    ,
    SurvivalRate as
    (
    select c.CancerSite, c.Sex, Cnt = sum(Cast(c.[Value] as float)) from [dbo].[CaCancer5YrSurvRate] c
    where unit = 'Number of cases' and [Value] not in ('F', '..') and c.sex <> 'Both sexes'
    Group by c.CancerSite, c.Sex
    --order by c.cancersite, c.sex
    )

    select c.cancertype, c.sex, isnull(s.cnt, 0) as Cnt
    From CancerTypes c
    left outer join SurvivalRate s on
    c.CancerType = s.CancerSite and c.Sex = s.Sex
    order by c.cancertype, c.sex

    The result is below:

    Thanks lots.

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply