Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


What's wrong with my DISTINCT


What's wrong with my DISTINCT

Author
Message
momba
momba
Grasshopper
Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)

Group: General Forum Members
Points: 19 Visits: 49
I've got another noob question...

When I run this query to just grab the emp_id I get 6474 but when I request more fields I get 6605?
select distinct CONVERT (varchar(9),EMPLOYEE_ID) AS emp_id
, CONVERT(datetime,(CONVERT(varchar(8),ADMISSION_DTE))) AS Adm_date
, ADMISSION_CDE
, ADMIT_TYPE_CDE
, CONVERT(datetime,(CONVERT(varchar(8),DISC_DTE))) AS Dis_date
, DISC_CDE
, DISC_TYPE_CDE
from CurrentTable
WHERE [BranchID]='950' AND (DISC_DTE IS NULL OR (DISC_DTE > 20120630 AND DISC_DTE < 20130201))
order by emp_id , adm_date



...I just want the record for the most recent adm_date what's up with my distinct?
Lynn Pettis
Lynn Pettis
SSC-Insane
SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)

Group: General Forum Members
Points: 24207 Visits: 37975
momba (2/5/2013)
I've got another noob question...

When I run this query to just grab the emp_id I get 6474 but when I request more fields I get 6605?
select distinct CONVERT (varchar(9),EMPLOYEE_ID) AS emp_id
, CONVERT(datetime,(CONVERT(varchar(8),ADMISSION_DTE))) AS Adm_date
, ADMISSION_CDE
, ADMIT_TYPE_CDE
, CONVERT(datetime,(CONVERT(varchar(8),DISC_DTE))) AS Dis_date
, DISC_CDE
, DISC_TYPE_CDE
from CurrentTable
WHERE [BranchID]='950' AND (DISC_DTE IS NULL OR (DISC_DTE > 20120630 AND DISC_DTE < 20130201))
order by emp_id , adm_date



...I just want the record for the most recent adm_date what's up with my distinct?


The problem is with your understanding of the DISTINCT operator. It returns all distinct records based on all the columns in the select list. Each row will be distinct.

Cool
Lynn Pettis

For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here or when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here and here
Managing Transaction Logs

SQL Musings from the Desert Fountain Valley SQL (My Mirror Blog)
GilaMonster
GilaMonster
SSC-Forever
SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)

Group: General Forum Members
Points: 47259 Visits: 44392
Distinct just removes complete duplicate rows (rows where all the columns you select are exactly the same)

What is that query supposed to return?


Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass


momba
momba
Grasshopper
Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)

Group: General Forum Members
Points: 19 Visits: 49
I'm trying to pull out the last record for each employee, i.e. if they appear in the table two or more times, I just want the record attached to the later adm_date (which should result in just 6474 records).
Lowell
Lowell
SSChampion
SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)

Group: General Forum Members
Points: 14959 Visits: 38977
momba (2/5/2013)
I'm trying to pull out the last record for each employee, i.e. if they appear in the table two or more times, I just want the record attached to the later adm_date (which should result in just 6474 records).



if you want the latest date, you want to use GROUP BY., and actually use the MAX*() function on the date so you get that last(latest/max date

SELECT
CONVERT (VARCHAR(9), EMPLOYEE_ID) AS emp_id,
MAX(CONVERT(DATETIME, (CONVERT(VARCHAR(8), ADMISSION_DTE)))) AS Adm_date,
ADMISSION_CDE,
ADMIT_TYPE_CDE,
CONVERT(DATETIME, (CONVERT(VARCHAR(8), DISC_DTE))) AS Dis_date,
DISC_CDE,
DISC_TYPE_CDE
FROM CurrentTable
WHERE [BranchID] = '950'
AND ( DISC_DTE IS NULL
OR ( DISC_DTE > 20120630
AND DISC_DTE < 20130201 ) )
GROUP BY
CONVERT (VARCHAR(9), EMPLOYEE_ID),
ADMISSION_CDE,
ADMIT_TYPE_CDE,
CONVERT(DATETIME, (CONVERT(VARCHAR(8), DISC_DTE))),
DISC_CDE,
DISC_TYPE_CDE
ORDER BY
emp_id,
adm_date



Lowell

--
help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

TheSQLGuru
TheSQLGuru
SSCertifiable
SSCertifiable (6K reputation)SSCertifiable (6K reputation)SSCertifiable (6K reputation)SSCertifiable (6K reputation)SSCertifiable (6K reputation)SSCertifiable (6K reputation)SSCertifiable (6K reputation)SSCertifiable (6K reputation)

Group: General Forum Members
Points: 5963 Visits: 8312
This could be more efficient (esp if "pkstuff" tight and/or the table is indexed properly for the correlated SELECT):

select CONVERT (varchar(9),EMPLOYEE_ID) AS emp_id
, CONVERT(datetime,(CONVERT(varchar(8),ADMISSION_DTE))) AS Adm_date
, ADMISSION_CDE
, ADMIT_TYPE_CDE
, CONVERT(datetime,(CONVERT(varchar(8),DISC_DTE))) AS Dis_date
, DISC_CDE
, DISC_TYPE_CDE
from CurrentTable ct1
WHERE [BranchID]='950' AND (DISC_DTE IS NULL OR (DISC_DTE > 20120630 AND DISC_DTE < 20130201))
and ct1.admission_dte = (SELECT MAX(ct2.admission_dte) from curenttable ct1 WHERE ct1.pkstuff = ct2.pkstuff)
order by emp_id , adm_date



You can also solve this with this construct: , ROW_NUMBER() OVER(partition by ... order by admission_dte desc) as rownumber

WHERE rownumber = 1

Best,

Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru at GMail
momba
momba
Grasshopper
Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)

Group: General Forum Members
Points: 19 Visits: 49
RE: Lowell's post....results are still too high... when I try to follow your logic in the code I get lost in the syntax around "GROUP BY" so I started back at the basics. If I just do emp_id and adm_date the numbers are right...

SELECT
CONVERT (VARCHAR(9), EMPLOYEE_ID) AS EMP_id,
MAX(CONVERT(DATETIME, (CONVERT(VARCHAR(8), ADMISSION_DTE)))) AS Adm_date
from CurrentTable
WHERE [BranchID] = '950'
AND ( DISC_DTE IS NULL
OR ( DISC_DTE > 20120630
AND DISC_DTE < 20130201 ) )
GROUP BY
EMPLOYEE_ID

.... but as soon as I add in additional fields the numbers get higher and higher. ???

I'm going to try the next example.
Lynn Pettis
Lynn Pettis
SSC-Insane
SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)

Group: General Forum Members
Points: 24207 Visits: 37975
momba (2/5/2013)
RE: Lowell's post....results are still too high... when I try to follow your logic in the code I get lost in the syntax around "GROUP BY" so I started back at the basics. If I just do emp_id and adm_date the numbers are right...

SELECT
CONVERT (VARCHAR(9), EMPLOYEE_ID) AS EMP_id,
MAX(CONVERT(DATETIME, (CONVERT(VARCHAR(8), ADMISSION_DTE)))) AS Adm_date
from CurrentTable
WHERE [BranchID] = '950'
AND ( DISC_DTE IS NULL
OR ( DISC_DTE > 20120630
AND DISC_DTE < 20130201 ) )
GROUP BY
EMPLOYEE_ID

.... but as soon as I add in additional fields the numbers get higher and higher. ???

I'm going to try the next example.


This:



WITH BaseData AS (
SELECT
rn = ROW_NUMBER OVER (PARTITION BY EMPLOYEE_ID ORDER BY ADMISSION_DTE DESC),
CONVERT (VARCHAR(9), EMPLOYEE_ID) AS EMP_id,
MAX(CONVERT(DATETIME, (CONVERT(VARCHAR(8), ADMISSION_DTE)))) AS Adm_date
FROM
CurrentTable
WHERE
[BranchID] = '950'
AND ( DISC_DTE IS NULL
OR ( DISC_DTE > 20120630
AND DISC_DTE < 20130201 ) )
)
SELECT
*
FROM
BaseData
WHERE
rn = 1;




Any columns you need added, ad to the SELECT in the CTE. You can also define which columns to select from the CTE instead of using the *, this way you can eliminate the column rn.

Cool
Lynn Pettis

For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here or when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here and here
Managing Transaction Logs

SQL Musings from the Desert Fountain Valley SQL (My Mirror Blog)
momba
momba
Grasshopper
Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)

Group: General Forum Members
Points: 19 Visits: 49
Lynn Pettis (2/5/2013)
momba (2/5/2013)
RE: Lowell's post....results are still too high... when I try to follow your logic in the code I get lost in the syntax around "GROUP BY" so I started back at the basics. If I just do emp_id and adm_date the numbers are right...

SELECT
CONVERT (VARCHAR(9), EMPLOYEE_ID) AS EMP_id,
MAX(CONVERT(DATETIME, (CONVERT(VARCHAR(8), ADMISSION_DTE)))) AS Adm_date
from CurrentTable
WHERE [BranchID] = '950'
AND ( DISC_DTE IS NULL
OR ( DISC_DTE > 20120630
AND DISC_DTE < 20130201 ) )
GROUP BY
EMPLOYEE_ID

.... but as soon as I add in additional fields the numbers get higher and higher. ???

I'm going to try the next example.


This:



WITH BaseData AS (
SELECT
rn = ROW_NUMBER OVER (PARTITION BY EMPLOYEE_ID ORDER BY ADMISSION_DTE DESC),
CONVERT (VARCHAR(9), EMPLOYEE_ID) AS EMP_id,
MAX(CONVERT(DATETIME, (CONVERT(VARCHAR(8), ADMISSION_DTE)))) AS Adm_date
FROM
CurrentTable
WHERE
[BranchID] = '950'
AND ( DISC_DTE IS NULL
OR ( DISC_DTE > 20120630
AND DISC_DTE < 20130201 ) )
)
SELECT
*
FROM
BaseData
WHERE
rn = 1;




Any columns you need added, ad to the SELECT in the CTE. You can also define which columns to select from the CTE instead of using the *, this way you can eliminate the column rn.


... for some reason SSMS 2012 doesn't like "OVER" in the "ROW NUMBER OVER" syntax. I'm working through it now...
GilaMonster
GilaMonster
SSC-Forever
SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)

Group: General Forum Members
Points: 47259 Visits: 44392
Cause Lynn left out some brackets

WITH BaseData AS (
SELECT
rn = ROW_NUMBER() OVER (PARTITION BY EMPLOYEE_ID ORDER BY ADMISSION_DTE DESC),
CONVERT (VARCHAR(9), EMPLOYEE_ID) AS EMP_id,
MAX(CONVERT(DATETIME, (CONVERT(VARCHAR(8), ADMISSION_DTE)))) AS Adm_date
FROM
CurrentTable
WHERE
[BranchID] = '950'
AND ( DISC_DTE IS NULL
OR ( DISC_DTE > 20120630
AND DISC_DTE < 20130201 ) )
)
SELECT
*
FROM
BaseData
WHERE
rn = 1;




Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass


Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search