What's wrong with my DISTINCT

  • I've got another noob question...

    When I run this query to just grab the emp_id I get 6474 but when I request more fields I get 6605?

    select distinct CONVERT (varchar(9),EMPLOYEE_ID) AS emp_id

    , CONVERT(datetime,(CONVERT(varchar(8),ADMISSION_DTE))) AS Adm_date

    , ADMISSION_CDE

    , ADMIT_TYPE_CDE

    , CONVERT(datetime,(CONVERT(varchar(8),DISC_DTE))) AS Dis_date

    , DISC_CDE

    , DISC_TYPE_CDE

    from CurrentTable

    WHERE [BranchID]='950' AND (DISC_DTE IS NULL OR (DISC_DTE > 20120630 AND DISC_DTE < 20130201))

    order by emp_id , adm_date

    ...I just want the record for the most recent adm_date what's up with my distinct?

  • momba (2/5/2013)


    I've got another noob question...

    When I run this query to just grab the emp_id I get 6474 but when I request more fields I get 6605?

    select distinct CONVERT (varchar(9),EMPLOYEE_ID) AS emp_id

    , CONVERT(datetime,(CONVERT(varchar(8),ADMISSION_DTE))) AS Adm_date

    , ADMISSION_CDE

    , ADMIT_TYPE_CDE

    , CONVERT(datetime,(CONVERT(varchar(8),DISC_DTE))) AS Dis_date

    , DISC_CDE

    , DISC_TYPE_CDE

    from CurrentTable

    WHERE [BranchID]='950' AND (DISC_DTE IS NULL OR (DISC_DTE > 20120630 AND DISC_DTE < 20130201))

    order by emp_id , adm_date

    ...I just want the record for the most recent adm_date what's up with my distinct?

    The problem is with your understanding of the DISTINCT operator. It returns all distinct records based on all the columns in the select list. Each row will be distinct.

  • Distinct just removes complete duplicate rows (rows where all the columns you select are exactly the same)

    What is that query supposed to return?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I'm trying to pull out the last record for each employee, i.e. if they appear in the table two or more times, I just want the record attached to the later adm_date (which should result in just 6474 records).

  • momba (2/5/2013)


    I'm trying to pull out the last record for each employee, i.e. if they appear in the table two or more times, I just want the record attached to the later adm_date (which should result in just 6474 records).

    if you want the latest date, you want to use GROUP BY., and actually use the MAX*() function on the date so you get that last(latest/max date

    SELECT

    CONVERT (VARCHAR(9), EMPLOYEE_ID) AS emp_id,

    MAX(CONVERT(DATETIME, (CONVERT(VARCHAR(8), ADMISSION_DTE)))) AS Adm_date,

    ADMISSION_CDE,

    ADMIT_TYPE_CDE,

    CONVERT(DATETIME, (CONVERT(VARCHAR(8), DISC_DTE))) AS Dis_date,

    DISC_CDE,

    DISC_TYPE_CDE

    FROM CurrentTable

    WHERE [BranchID] = '950'

    AND ( DISC_DTE IS NULL

    OR ( DISC_DTE > 20120630

    AND DISC_DTE < 20130201 ) )

    GROUP BY

    CONVERT (VARCHAR(9), EMPLOYEE_ID),

    ADMISSION_CDE,

    ADMIT_TYPE_CDE,

    CONVERT(DATETIME, (CONVERT(VARCHAR(8), DISC_DTE))),

    DISC_CDE,

    DISC_TYPE_CDE

    ORDER BY

    emp_id,

    adm_date

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • This could be more efficient (esp if "pkstuff" tight and/or the table is indexed properly for the correlated SELECT):

    select CONVERT (varchar(9),EMPLOYEE_ID) AS emp_id

    , CONVERT(datetime,(CONVERT(varchar(8),ADMISSION_DTE))) AS Adm_date

    , ADMISSION_CDE

    , ADMIT_TYPE_CDE

    , CONVERT(datetime,(CONVERT(varchar(8),DISC_DTE))) AS Dis_date

    , DISC_CDE

    , DISC_TYPE_CDE

    from CurrentTable ct1

    WHERE [BranchID]='950' AND (DISC_DTE IS NULL OR (DISC_DTE > 20120630 AND DISC_DTE < 20130201))

    and ct1.admission_dte = (SELECT MAX(ct2.admission_dte) from curenttable ct1 WHERE ct1.pkstuff = ct2.pkstuff)

    order by emp_id , adm_date

    You can also solve this with this construct: , ROW_NUMBER() OVER(partition by ... order by admission_dte desc) as rownumber

    WHERE rownumber = 1

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • RE: Lowell's post....results are still too high... when I try to follow your logic in the code I get lost in the syntax around "GROUP BY" so I started back at the basics. If I just do emp_id and adm_date the numbers are right...

    SELECT

    CONVERT (VARCHAR(9), EMPLOYEE_ID) AS EMP_id,

    MAX(CONVERT(DATETIME, (CONVERT(VARCHAR(8), ADMISSION_DTE)))) AS Adm_date

    from CurrentTable

    WHERE [BranchID] = '950'

    AND ( DISC_DTE IS NULL

    OR ( DISC_DTE > 20120630

    AND DISC_DTE < 20130201 ) )

    GROUP BY

    EMPLOYEE_ID

    .... but as soon as I add in additional fields the numbers get higher and higher. ???

    I'm going to try the next example.

  • momba (2/5/2013)


    RE: Lowell's post....results are still too high... when I try to follow your logic in the code I get lost in the syntax around "GROUP BY" so I started back at the basics. If I just do emp_id and adm_date the numbers are right...

    SELECT

    CONVERT (VARCHAR(9), EMPLOYEE_ID) AS EMP_id,

    MAX(CONVERT(DATETIME, (CONVERT(VARCHAR(8), ADMISSION_DTE)))) AS Adm_date

    from CurrentTable

    WHERE [BranchID] = '950'

    AND ( DISC_DTE IS NULL

    OR ( DISC_DTE > 20120630

    AND DISC_DTE < 20130201 ) )

    GROUP BY

    EMPLOYEE_ID

    .... but as soon as I add in additional fields the numbers get higher and higher. ???

    I'm going to try the next example.

    This:

    WITH BaseData AS (

    SELECT

    rn = ROW_NUMBER OVER (PARTITION BY EMPLOYEE_ID ORDER BY ADMISSION_DTE DESC),

    CONVERT (VARCHAR(9), EMPLOYEE_ID) AS EMP_id,

    MAX(CONVERT(DATETIME, (CONVERT(VARCHAR(8), ADMISSION_DTE)))) AS Adm_date

    FROM

    CurrentTable

    WHERE

    [BranchID] = '950'

    AND ( DISC_DTE IS NULL

    OR ( DISC_DTE > 20120630

    AND DISC_DTE < 20130201 ) )

    )

    SELECT

    *

    FROM

    BaseData

    WHERE

    rn = 1;

    Any columns you need added, ad to the SELECT in the CTE. You can also define which columns to select from the CTE instead of using the *, this way you can eliminate the column rn.

  • Lynn Pettis (2/5/2013)


    momba (2/5/2013)


    RE: Lowell's post....results are still too high... when I try to follow your logic in the code I get lost in the syntax around "GROUP BY" so I started back at the basics. If I just do emp_id and adm_date the numbers are right...

    SELECT

    CONVERT (VARCHAR(9), EMPLOYEE_ID) AS EMP_id,

    MAX(CONVERT(DATETIME, (CONVERT(VARCHAR(8), ADMISSION_DTE)))) AS Adm_date

    from CurrentTable

    WHERE [BranchID] = '950'

    AND ( DISC_DTE IS NULL

    OR ( DISC_DTE > 20120630

    AND DISC_DTE < 20130201 ) )

    GROUP BY

    EMPLOYEE_ID

    .... but as soon as I add in additional fields the numbers get higher and higher. ???

    I'm going to try the next example.

    This:

    WITH BaseData AS (

    SELECT

    rn = ROW_NUMBER OVER (PARTITION BY EMPLOYEE_ID ORDER BY ADMISSION_DTE DESC),

    CONVERT (VARCHAR(9), EMPLOYEE_ID) AS EMP_id,

    MAX(CONVERT(DATETIME, (CONVERT(VARCHAR(8), ADMISSION_DTE)))) AS Adm_date

    FROM

    CurrentTable

    WHERE

    [BranchID] = '950'

    AND ( DISC_DTE IS NULL

    OR ( DISC_DTE > 20120630

    AND DISC_DTE < 20130201 ) )

    )

    SELECT

    *

    FROM

    BaseData

    WHERE

    rn = 1;

    Any columns you need added, ad to the SELECT in the CTE. You can also define which columns to select from the CTE instead of using the *, this way you can eliminate the column rn.

    ... for some reason SSMS 2012 doesn't like "OVER" in the "ROW NUMBER OVER" syntax. I'm working through it now...

  • Cause Lynn left out some brackets

    WITH BaseData AS (

    SELECT

    rn = ROW_NUMBER() OVER (PARTITION BY EMPLOYEE_ID ORDER BY ADMISSION_DTE DESC),

    CONVERT (VARCHAR(9), EMPLOYEE_ID) AS EMP_id,

    MAX(CONVERT(DATETIME, (CONVERT(VARCHAR(8), ADMISSION_DTE)))) AS Adm_date

    FROM

    CurrentTable

    WHERE

    [BranchID] = '950'

    AND ( DISC_DTE IS NULL

    OR ( DISC_DTE > 20120630

    AND DISC_DTE < 20130201 ) )

    )

    SELECT

    *

    FROM

    BaseData

    WHERE

    rn = 1;

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • looks like a copy/paste error:

    ROW_NUMBER needs parenthesis:

    change that query to ROW_NUMBER() and it should work fine.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • More like a picnic problem. That's what I get for writing code on a smartphone.

    Thanks for catching the missing parens.

  • :0( Now it says I'm missing the "GROUP BY" clause... (still working on it)...

  • momba (2/5/2013)


    :0( Now it says I'm missing the "GROUP BY" clause... (still working on it)...

    No group by needed., forgot to take off the MAX.

    WITH BaseData AS (

    SELECT

    rn = ROW_NUMBER() OVER (PARTITION BY EMPLOYEE_ID ORDER BY ADMISSION_DTE DESC),

    CONVERT (VARCHAR(9), EMPLOYEE_ID) AS EMP_id,

    ADMISSION_DTE

    FROM

    CurrentTable

    WHERE

    [BranchID] = '950'

    AND ( DISC_DTE IS NULL

    OR ( DISC_DTE > 20120630

    AND DISC_DTE < 20130201 ) )

    )

    SELECT

    *

    FROM

    BaseData

    WHERE

    rn = 1;

  • ...I need to study this row number() thing. Will let you know how it goes when I straigten it out. Thanks for being so patient with my newbieness.

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply