Query Result Help

  • Here is the query that is thrwoing me off....If I comment the ReportDate portion out, I get aCount as 728 and bCount as 716 which is the right count. But when I bring in ReportDate in the group by clause

    the acutal counts are off. aCount comes to be 728 but bCount comes to be 725 and I am not able to figure out why. By the way ReportDate in the table is datetime.

    Please help.

    SELECT

    login as UserID

    ,CONVERT(varchar(8), mydate, 112) as ReportDate

    ,COUNT(distinct a) as aCount

    ,COUNT(distinct b) as bCount

    FROM A

    WHERE CONVERT(varchar(8), mydate, 112) BETWEEN '20111001' AND '20111101'

    and login = '2853'

    GROUP BY Login,CONVERT(varchar(8), mydate, 112) as ReportDate

  • I will be glad to help, if you follow the posting guidelines.

    Please post your DDL and sample data.

  • SQL_Surfer (1/28/2012)


    Here is the query that is thrwoing me off....If I comment the ReportDate portion out, I get aCount as 728 and bCount as 716 which is the right count. But when I bring in ReportDate in the group by clause

    the acutal counts are off. aCount comes to be 728 but bCount comes to be 725 and I am not able to figure out why. By the way ReportDate in the table is datetime.

    Please help.

    SELECT

    login as UserID

    ,CONVERT(varchar(8), mydate, 112) as ReportDate

    ,COUNT(distinct a) as aCount

    ,COUNT(distinct b) as bCount

    FROM A

    WHERE CONVERT(varchar(8), mydate, 112) BETWEEN '20111001' AND '20111101'

    and login = '2853'

    GROUP BY Login,CONVERT(varchar(8), mydate, 112) as ReportDate

    count(distinct) is evaluated AFTER "group by" which means that duplicates within "group by" are eliminated from the count, that's why.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • Thanks. How do I get around that?

  • I think that PaulB gave you a good hint:

    When you eliminate grouping by the date, you get counts for one set only and all duplicates of b within that set are eliminated.

    With grouping by the date, you are dealing with (potentially) 32 groups and duplicates of b are eliminated only if they fall within the same group, i.e., belong to the same date. If they fall within different groups, they are not eliminated and your overall count is therefore higher.

  • SQL_Surfer (1/28/2012)


    Thanks. How do I get around that?

    Well... both result sets show you two different views of reality and cannot be used to validate each other.

    Depending on business requirements and the way data was modeled in your particular system you have to decide which one is the one that represents reality.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.

Viewing 6 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply