Compare Dates in an iterative manner for a particular set of records

Question

Compare Dates in an iterative manner for a particular set of records

Viewing 12 posts - 31 through 41 (of 41 total)

You must be logged in to reply to this topic. Login to reply

SqlOnMyMind SSCertifiable Points: 5062 More actions · Answer 1

dfarran (11/13/2008)
It is also possible to do the query with just left joins,
SELECT a.PersonID, a.Version , a.dedate,
[is improper] = CASE WHEN MAX(c.version) IS NOT NULL THEN 1
ELSE 0 END
FROM personrecord a
LEFT JOIN personrecord b ON b.personid = a.personid
LEFT JOIN personrecord c ON c.personid = a.personid
AND c.version >= b.version
AND c.dedate < b.dedate
GROUP BY a.PersonID, a.Version, a.dedate
with the primary key on the table (personid, version) this performs slightly better than the cross join with both the small(8) and medium(circa .75m) rowcount.
I would avoid using the version = version -1 because this does not cater for missing versions (try doubling the version numbers in test data)

Good point about not relying on version = version-1!

SqlOnMyMind SSCertifiable Points: 5062 More actions · Answer 2

Peter M. (11/12/2008)
I'd be more interested in seeing the execution times where the data size is non-trivial. My suspicion would lead me to believe the row_number() solution might scale better based on my own experience using the related dense_rank() function.
You could even use two dense rank functions that are partitioned identically but have an order by that is based on the version number in one and the date field in the other, then using that to compare when the resulting ranks are not equal. It would require a derived table joined to the original table, but you could get the results you are looking for.

and thanks, Jeff Moden, for pointing out the use of the clustered index on PersonId, Version.

I was interested to see how the different solutions would scale, so I expanded the PersonRecord table to 1 million PersonIDs, and expanded the rows of data to 5 records per PersonId, using Jeff's code. And I added the clustered index.

And I added dfarran's query.

Here are the results:

Original Query 153,733 Milliseconds duration

James Goodwin -

Left join on inner join 136,703 Milliseconds duration

Matt C -

SQL2005 row_number()

over (partition 184,280 Milliseconds duration

Adam Haines -

inner joins and CASE 138,906 Milliseconds duration

dfarrans -

just left joins,

no version-1 245,016 Milliseconds duration

Granted, these are total execution times, so include I/O.

Also, I did not alter the queries that used version-1, which is a weakness if the version sequence is not perfect.

AlexT-359331 SSC Veteran Points: 219 More actions · Answer 3

Am I missing something ?

select a.PersonID,a.Version, a.DEDate , 1 AS [Is ImProper]

from PersonRecord a

inner join PersonRecord b on a.PersonID = b.PersonID and a.Version>b.Version and a.DEDate<b.DEDate

SqlOnMyMind SSCertifiable Points: 5062 More actions · Answer 4

AlexT (11/13/2008)
Am I missing something ?
select a.PersonID,a.Version, a.DEDate , 1 AS [Is ImProper]
from PersonRecord a
inner join PersonRecord b on a.PersonID = b.PersonID and a.Version>b.Version and a.DEDate<b.DEDate

Yes, you then need to apply this flag to all the records for that PersonId. (So, if a PersonId has 5 rows and 2 of them are out of order, then all 5 rows will have the ImProper = 1, and those PersonIds with no rows out of order will have ImProper = 0.)

AlexT-359331 SSC Veteran Points: 219 More actions · Answer 5

yes whatever,

select distinct a.PersonID,a.Version, a.DEDate , case when b.PersonID is null then 0 else 1 end AS [Is ImProper]

from PersonRecord a

left join (

select a.PersonID

from PersonRecord a

inner join PersonRecord b on a.PersonID = b.PersonID and a.Version>b.Version and a.DEDate<b.DEDate

) b on a.PersonID = b.PersonID

but initial tast was just get bad data entry dates

SqlOnMyMind SSCertifiable Points: 5062 More actions · Answer 6

AlexT (11/13/2008)
yes whatever,
select distinct a.PersonID,a.Version, a.DEDate , case when b.PersonID is null then 0 else 1 end AS [Is ImProper]
from PersonRecord a
left join (
select a.PersonID
from PersonRecord a
inner join PersonRecord b on a.PersonID = b.PersonID and a.Version>b.Version and a.DEDate<b.DEDate
) b on a.PersonID = b.PersonID
but initial tast was just get bad data entry dates

Thanks, this achieves the desired results, and it's a straight-forward, easy-to-read solution.

When I tested it on my 5 million row table, total execution time was 164580 Milliseconds duration

Edited: Although I think I would put the "select distinct" in the subselect, for cases where one PersonId has multiple rows that are out of order.

AlexT-359331 SSC Veteran Points: 219 More actions · Answer 7

that supposed to be faster, if clustered index applied on (PersonID, Version )

select distinct a.PersonID,a.Version, a.DEDate , case when b.PersonID is null then 0 else 1 end AS [Is ImProper]

from PersonRecord a

left join (

select a.PersonID, a.Version

from PersonRecord a

inner join PersonRecord b on a.PersonID = b.PersonID and a.Version>b.Version and a.DEDate<b.DEDate

) b on a.PersonID = b.PersonID and a.Version=b.Version

thanks

SqlOnMyMind SSCertifiable Points: 5062 More actions · Answer 8

AlexT (11/13/2008)
that supposed to be faster, if clustered index applied on (PersonID, Version )
select distinct a.PersonID,a.Version, a.DEDate , case when b.PersonID is null then 0 else 1 end AS [Is ImProper]
from PersonRecord a
left join (
select a.PersonID, a.Version
from PersonRecord a
inner join PersonRecord b on a.PersonID = b.PersonID and a.Version>b.Version and a.DEDate<b.DEDate
) b on a.PersonID = b.PersonID and a.Version=b.Version
thanks

Good point about the fact that Version is part of the clustered index.

Unfortunately, incorporating version into the subselct breaks the solution. It no longer returns the desired results.

AlexT-359331 SSC Veteran Points: 219 More actions · Answer 9

yes sorry, my mistake. if you don't mind you can try this

select a.PersonID, a.Version, a.DEDate,

case when b.PersonID is null then 0 else 1 end AS [Is ImProper]

from PersonRecord a

left join (

select distinct a.PersonID

from PersonRecord a

inner join PersonRecord b on a.PersonID = b.PersonID and a.Version>b.Version and a.DEDate<b.DEDate

) b on a.PersonID = b.PersonID

thanks

SqlOnMyMind SSCertifiable Points: 5062 More actions · Answer 10

AlexT (11/13/2008)
yes sorry, my mistake. if you don't mind you can try this
select a.PersonID, a.Version, a.DEDate,
case when b.PersonID is null then 0 else 1 end AS [Is ImProper]
from PersonRecord a
left join (
select distinct a.PersonID
from PersonRecord a
inner join PersonRecord b on a.PersonID = b.PersonID and a.Version>b.Version and a.DEDate<b.DEDate
) b on a.PersonID = b.PersonID

This query does produce the desired effect, and yes, moving the distinct inside the subselect does result in a significant performance lift. Total execution time was 144470 Milliseconds! (vs. previous run time of was 164580 Milliseconds)

AlexT-359331 SSC Veteran Points: 219 More actions · Answer 11

that means that you have a lot "bad" de dates in table. in reality difference probably will be more significant

SqlOnMyMind SSCertifiable Points: 5062 More actions · Answer 12

No, actually, only 10,000 PersonIds in my test table of 5 million rows (1 million unique PersonIDs) have records that are out of order; that's about 1% of the ids. That also explains the nice lift, because the subselect has far fewer rows for the distinct to operate on.

Note: the subselect finds roughly 20,000 rows, distinct return 10,000 rows.