January 14, 2016 at 6:51 am
I have an issue where rows were duplicated by a certain workflow and I need to identify the rows that were duplicated. I need to know the rows that have the FIN and MRN duplicated and I can get those easy enough, but I also want to see the FirstName, LastName and WhenDischarged fields that may not be unique for the rows with the duplicate FIN(Account Number) and MRN(Medical Record Number).
This script gets me all of the data that I need, but I only want to see the WhenDischarged field when one or both entries are NULL. I do not want to see the duplicated rows when the WhenDischarged date for both entries is populated with data.
select WhenDischarged, FirstName, LastName, AccountNumber, MedicalRecordNumber
from
[Patient] p
where
1 < (select count(*) from [Patient] i where i.AccountNumber = p.AccountNumber and i.MedicalRecordNumber = p.MedicalRecordNumber)
order by AccountNumber
This script works a little better, but it only shows me when one of the WhenDischarged fields is NULL for the duplicated records and not when both WhenDischarged fields are NULL.
select p.MedicalRecordNumber, p.AccountNumber, p.FirstName, p.LastName, p.WhenDischarged
from Patient p
inner join
(
select MedicalRecordNumber
from Patient
group by MedicalRecordNumber
having sum(case when WhenDischarged is not null then 1 else 0 end) = 1
and sum(case when WhenDischarged is null then 1 else 0 end) = 1
) d
on p.MedicalRecordNumber = d.MedicalRecordNumber
where
1 < (select count(*) from [Patient] i where i.AccountNumber = p.AccountNumber and i.MedicalRecordNumber = p.MedicalRecordNumber)
order by AccountNumber
In Summary I want rows where the FIN and MRN are exact matches (There are only two rows for each patient with duplicate data). I want FirstName and LastName even if they do not match. I want WhenDischarged when both are NULL, when one is NULL, but not when both are not NULL
January 14, 2016 at 7:17 am
Have a play with ROW_NUMBER(). This should be close:
SELECT MedicalRecordNumber, AccountNumber, FirstName, LastName, WhenDischarged
FROM (
SELECT p.MedicalRecordNumber, p.AccountNumber, p.FirstName, p.LastName, p.WhenDischarged,
rn = ROW_NUMBER() OVER(PARTITION BY p.MedicalRecordNumber, p.AccountNumber ORDER BY p.WhenDischarged)
FROM Patient p
) d
WHERE d.rn = 1
AND d.WhenDischarged IS NULL
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
January 14, 2016 at 7:52 am
Hi Chris,
Thanks for the suggestion, but I cannot get that to show what I need.
It is Pulling
MedicalRecordNumberAccountNumberFirstNameLastNameWhenDischarged
12345 0012345 John Smith NULL
12346 0012346 Jane Doe NULL
12347 0012347 John Doe NULL
12348 0012348 Jane Smith NULL
What I need it to do is display both records with the duplicate MRN/FIN and then display First and Last Names whether duplicates or unique and then WhenDischarged when 1 is NULL or both are NULL, but not when both have a Date populated.
MedicalRecordNumberAccountNumberFirstNameLastNameWhenDischarged
12345 0012345 John Smith NULL
12345 0012345 John Smithe NULL
12346 0012346 Jane Doe NULL
12346 0012346 Jane Doe 2016-01-01 23:59:00.000
12347 0012347 John Doe 2016-01-02 23:59:00.000
12347 0012347 John Doe NULL
12348 0012348 Jane Smith NULL
12348 0012348 Janey Smith NULL
I would not want to see these rows when both duplicate records have the WhenDischarged field populated.
MedicalRecordNumberAccountNumberFirstNameLastNameWhenDischarged
12345 0012345 John Smith 2015-12-23 14:23:00.000
12345 0012345 John Smithe 2015-12-28 13:56:00.000
12346 0012346 Jane Doe 2016-01-03 13:56:00.000
12346 0012346 Jane Doe 2016-01-01 23:59:00.000
January 14, 2016 at 7:54 am
Sorry for the poor format of the results. I cannot figure out how to get this site to use the spacing I typed out.
January 14, 2016 at 7:59 am
Experiment with the result set. Take out the rn=1 filter so you can see the full set.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
January 14, 2016 at 8:11 am
wolfgheist (1/14/2016)
Hi Chris,Thanks for the suggestion, but I cannot get that to show what I need.
It is Pulling
MedicalRecordNumberAccountNumberFirstNameLastNameWhenDischarged
12345 0012345 John Smith NULL
12346 0012346 Jane Doe NULL
12347 0012347 John Doe NULL
12348 0012348 Jane Smith NULL
What I need it to do is display both records with the duplicate MRN/FIN and then display First and Last Names whether duplicates or unique and then WhenDischarged when 1 is NULL or both are NULL, but not when both have a Date populated.
MedicalRecordNumberAccountNumberFirstNameLastNameWhenDischarged
12345 0012345 John Smith NULL
12345 0012345 John Smithe NULL
12346 0012346 Jane Doe NULL
12346 0012346 Jane Doe 2016-01-01 23:59:00.000
12347 0012347 John Doe 2016-01-02 23:59:00.000
12347 0012347 John Doe NULL
12348 0012348 Jane Smith NULL
12348 0012348 Janey Smith NULL
I would not want to see these rows when both duplicate records have the WhenDischarged field populated.
MedicalRecordNumberAccountNumberFirstNameLastNameWhenDischarged
12345 0012345 John Smith 2015-12-23 14:23:00.000
12345 0012345 John Smithe 2015-12-28 13:56:00.000
12346 0012346 Jane Doe 2016-01-03 13:56:00.000
12346 0012346 Jane Doe 2016-01-01 23:59:00.000
Try this query, which sets up some sample data so we can both see the same thing. Expand the data set if you need to, or change values within it.
;WITH SampleData (MedicalRecordNumber, AccountNumber, FirstName, LastName, WhenDischarged) AS (
SELECT 12345, '0012345','John','Smith', CAST(NULL AS date) UNION ALL
SELECT 12345, '0012345','John','Smithe', NULL UNION ALL
SELECT 12346, '0012346','Jane','Doe', NULL UNION ALL
SELECT 12346, '0012346','Jane','Doe', '2016-01-01 23:59:00.000' UNION ALL
SELECT 12347, '0012347','John','Doe', '2016-01-02 23:59:00.000' UNION ALL
SELECT 12347, '0012347','John','Doe',NULL UNION ALL
SELECT 12348, '0012348','Jane','Smith', '2016-01-01 23:59:00.000' UNION ALL
SELECT 12348, '0012348','Janey','Smith', '2016-01-01 23:59:00.000')
SELECT
MedicalRecordNumber, AccountNumber, FirstName, LastName, WhenDischarged,
rn, PickMe = CASE WHEN d.rn = 1 AND d.WhenDischarged IS NULL THEN 'YES' ELSE 'NO' END
FROM (
SELECT *,
rn = ROW_NUMBER() OVER(PARTITION BY p.MedicalRecordNumber, p.AccountNumber ORDER BY p.WhenDischarged)
FROM SampleData p
) d
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
January 14, 2016 at 8:35 am
I am playing around with it, but it keeps pulling too much data. It is pulling 20k+ rows, when my first script above pulls 4302 rows that contains all of the data including when the discharge data is populated for both records which I do not want and the second pulls 1302 rows, but is missing the Whendischarged when one is Null and one has a value.
January 14, 2016 at 8:42 am
How are you creating the little window with correct spacing format? Here are the results that I am seeing from your last query.
MRN FIN First Last Discharge rn PickMe
123450012345JohnSmitheNULL1YES
123450012345JohnSmithNULL2NO
123460012346JaneDoeNULL1YES
123460012346JaneDoe2016-01-012NO
123470012347JohnDoeNULL1YES
123470012347JohnDoe2016-01-022NO
123480012348JaneySmith2016-01-011NO
123480012348JaneSmith2016-01-012NO
January 14, 2016 at 8:43 am
wolfgheist (1/14/2016)
I am playing around with it, but it keeps pulling too much data. It is pulling 20k+ rows, when my first script above pulls 4302 rows that contains all of the data including when the discharge data is populated for both records which I do not want and the second pulls 1302 rows, but is missing the Whendischarged when one is Null and one has a value.
Bear in mind that your script doesn't work correctly (or you wouldn't be here).
wolfgheist (1/14/2016)
In Summary I want rows where the FIN and MRN are exact matches (There are only two rows for each patient with duplicate data). I want FirstName and LastName even if they do not match. I want WhenDischarged when both are NULL, when one is NULL, but not when both are not NULL
Since the last script I posted does exactly this (without eliminating the rows from the output), can you add rows to the sample data section to show how the script isn't working?
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
January 14, 2016 at 8:44 am
ROW_NUMBER() is overused. Here the OP needs a single value from one record to be available across the entire partition, namely whether a NULL exists on any record in the partition. Since ROW_NUMBER() assigns a different value to every single record in the partition, it cannot fulfill this need. Here, I've used MAX(), precisely because it returns the same value for every record within a partition.
I've also included COUNT() in case the OP wants to filter only those records where a duplicate exists.
;WITH SampleData (MedicalRecordNumber, AccountNumber, FirstName, LastName, WhenDischarged) AS (
SELECT 12345, '0012345','John','Smith', CAST(NULL AS date) UNION ALL
SELECT 12345, '0012345','John','Smithe', NULL UNION ALL
SELECT 12346, '0012346','Jane','Doe', NULL UNION ALL
SELECT 12346, '0012346','Jane','Doe', '2016-01-01 23:59:00.000' UNION ALL
SELECT 12347, '0012347','John','Doe', '2016-01-02 23:59:00.000' UNION ALL
SELECT 12347, '0012347','John','Doe',NULL UNION ALL
SELECT 12348, '0012348','Jane','Smith', '2016-01-01 23:59:00.000' UNION ALL
SELECT 12348, '0012348','Janey','Smith', '2016-01-01 23:59:00.000')
SELECT
MedicalRecordNumber, AccountNumber, FirstName, LastName, WhenDischarged, d.cnt
FROM (
SELECT *,
last_discharge = MAX(ISNULL(p.WhenDischarged, '9999-12-31')) OVER(PARTITION BY p.MedicalRecordNumber, p.AccountNumber),
cnt = COUNT(*) OVER(PARTITION BY p.MedicalRecordNumber, p.AccountNumber)
FROM SampleData p
) d
WHERE d.last_discharge = '9999-12-31'
ORDER BY d.MedicalRecordNumber, d.AccountNumber, d.WhenDischarged
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
January 14, 2016 at 8:47 am
wolfgheist (1/14/2016)
I am playing around with it, but it keeps pulling too much data. It is pulling 20k+ rows, when my first script above pulls 4302 rows that contains all of the data including when the discharge data is populated for both records which I do not want and the second pulls 1302 rows, but is missing the Whendischarged when one is Null and one has a value.
Use the IFCode shortcuts menu, just to the left of the window you type into.
This is the resultset in code = "plain" and font = "Courier New"
[font="Courier New"]MRN FIN First Last Discharge rn PickMe
12345 0012345 John Smithe NULL 1 YES
12345 0012345 John Smith NULL 2 NO
12346 0012346 Jane Doe NULL 1 YES
12346 0012346 Jane Doe 2016-01-01 2 NO
12347 0012347 John Doe NULL 1 YES
12347 0012347 John Doe 2016-01-02 2 NO
12348 0012348 Janey Smith 2016-01-01 1 NO
12348 0012348 Jane Smith 2016-01-01 2 NO [/font]
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
January 14, 2016 at 8:56 am
ChrisM@Work (1/14/2016)
wolfgheist (1/14/2016)
I am playing around with it, but it keeps pulling too much data. It is pulling 20k+ rows, when my first script above pulls 4302 rows that contains all of the data including when the discharge data is populated for both records which I do not want and the second pulls 1302 rows, but is missing the Whendischarged when one is Null and one has a value.Bear in mind that your script doesn't work correctly (or you wouldn't be here).
I fully agree 🙂
wolfgheist (1/14/2016)
In Summary I want rows where the FIN and MRN are exact matches (There are only two rows for each patient with duplicate data). I want FirstName and LastName even if they do not match. I want WhenDischarged when both are NULL, when one is NULL, but not when both are not NULLSince the last script I posted does exactly this (without eliminating the rows from the output), can you add rows to the sample data section to show how the script isn't working?
Well it displayed some things that do not quite match up with what I am looking for. I have spaced them in sets of what I am looking for and what I do not want.
123450012345JohnSmitheNULL1YES - I want this row
123450012345JohnSmithNULL2NO - I want this row
123460012346JaneDoeNULL1YES - I want this row
123460012346JaneDoe2016-01-012NO - I want this row
123470012347JohnDoeNULL1YES - I want this row
123470012347JohnDoe2016-01-022NO - I want this row
123480012348JaneySmith2016-01-011NO - I do not want this row
123480012348JaneSmith2016-01-012NO - I do not want this row
January 14, 2016 at 8:58 am
drew.allen (1/14/2016)
ROW_NUMBER() is overused. Here the OP needs a single value from one record to be available across the entire partition, namely whether a NULL exists on any record in the partition. Since ROW_NUMBER() assigns a different value to every single record in the partition, it cannot fulfill this need. Here, I've used MAX(), precisely because it returns the same value for every record within a partition.I've also included COUNT() in case the OP wants to filter only those records where a duplicate exists.
;WITH SampleData (MedicalRecordNumber, AccountNumber, FirstName, LastName, WhenDischarged) AS (
SELECT 12345, '0012345','John','Smith', CAST(NULL AS date) UNION ALL
SELECT 12345, '0012345','John','Smithe', NULL UNION ALL
SELECT 12346, '0012346','Jane','Doe', NULL UNION ALL
SELECT 12346, '0012346','Jane','Doe', '2016-01-01 23:59:00.000' UNION ALL
SELECT 12347, '0012347','John','Doe', '2016-01-02 23:59:00.000' UNION ALL
SELECT 12347, '0012347','John','Doe',NULL UNION ALL
SELECT 12348, '0012348','Jane','Smith', '2016-01-01 23:59:00.000' UNION ALL
SELECT 12348, '0012348','Janey','Smith', '2016-01-01 23:59:00.000')
SELECT
MedicalRecordNumber, AccountNumber, FirstName, LastName, WhenDischarged, d.cnt
FROM (
SELECT *,
last_discharge = MAX(ISNULL(p.WhenDischarged, '9999-12-31')) OVER(PARTITION BY p.MedicalRecordNumber, p.AccountNumber),
cnt = COUNT(*) OVER(PARTITION BY p.MedicalRecordNumber, p.AccountNumber)
FROM SampleData p
) d
WHERE d.last_discharge = '9999-12-31'
ORDER BY d.MedicalRecordNumber, d.AccountNumber, d.WhenDischarged
Drew
From the sample set this pulls what I am looking for.
123450012345JohnSmitheNULL2
123450012345JohnSmithNULL2
123460012346JaneDoeNULL2
123460012346JaneDoe2016-01-012
123470012347JohnDoeNULL2
123470012347JohnDoe2016-01-022
January 14, 2016 at 9:02 am
wolfgheist (1/14/2016)
ChrisM@Work (1/14/2016)
wolfgheist (1/14/2016)
I am playing around with it, but it keeps pulling too much data. It is pulling 20k+ rows, when my first script above pulls 4302 rows that contains all of the data including when the discharge data is populated for both records which I do not want and the second pulls 1302 rows, but is missing the Whendischarged when one is Null and one has a value.Bear in mind that your script doesn't work correctly (or you wouldn't be here).
I fully agree 🙂
wolfgheist (1/14/2016)
In Summary I want rows where the FIN and MRN are exact matches (There are only two rows for each patient with duplicate data). I want FirstName and LastName even if they do not match. I want WhenDischarged when both are NULL, when one is NULL, but not when both are not NULLSince the last script I posted does exactly this (without eliminating the rows from the output), can you add rows to the sample data section to show how the script isn't working?
Well it displayed some things that do not quite match up with what I am looking for. I have spaced them in sets of what I am looking for and what I do not want.
123450012345JohnSmitheNULL1YES - I want this row
123450012345JohnSmithNULL2NO - I want this row
123460012346JaneDoeNULL1YES - I want this row
123460012346JaneDoe2016-01-012NO - I want this row
123470012347JohnDoeNULL1YES - I want this row
123470012347JohnDoe2016-01-022NO - I want this row
123480012348JaneySmith2016-01-011NO - I do not want this row
123480012348JaneSmith2016-01-012NO - I do not want this row
So you want ALL row pairs, except where WhenDischarged for both rows of a pair is not null?
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
January 14, 2016 at 9:02 am
This script appears to be doing exactly what I want. Thanks both of you guys. You are awesome!!! 🙂
[Code="sql']SELECT
MedicalRecordNumber, AccountNumber, FirstName, LastName, WhenDischarged, d.cnt
FROM (
SELECT *,
last_discharge = MAX(ISNULL(p.WhenDischarged, '9999-12-31')) OVER(PARTITION BY p.MedicalRecordNumber, p.AccountNumber),
cnt = COUNT(*) OVER(PARTITION BY p.MedicalRecordNumber, p.AccountNumber)
FROM Patient p
) d
WHERE d.last_discharge = '9999-12-31' and cnt = 2
ORDER BY d.MedicalRecordNumber, d.AccountNumber, d.WhenDischarged[/code]
Viewing 15 posts - 1 through 15 (of 16 total)
You must be logged in to reply to this topic. Login to reply