Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase «««12345»»

Compare Dates in an iterative manner for a particular set of records Expand / Collapse
Author
Message
Posted Thursday, November 13, 2008 7:36 AM
SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Monday, October 27, 2014 6:49 AM
Points: 1,585, Visits: 1,859
dfarran (11/13/2008)
It is also possible to do the query with just left joins,

SELECT a.PersonID, a.Version , a.dedate,
[is improper] = CASE WHEN MAX(c.version) IS NOT NULL THEN 1
ELSE 0 END
FROM personrecord a
LEFT JOIN personrecord b ON b.personid = a.personid
LEFT JOIN personrecord c ON c.personid = a.personid
AND c.version >= b.version
AND c.dedate < b.dedate
GROUP BY a.PersonID, a.Version, a.dedate

with the primary key on the table (personid, version) this performs slightly better than the cross join with both the small(8) and medium(circa .75m) rowcount.

I would avoid using the version = version -1 because this does not cater for missing versions (try doubling the version numbers in test data)

Good point about not relying on version = version-1!
Post #602108
Posted Thursday, November 13, 2008 7:59 AM
SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Monday, October 27, 2014 6:49 AM
Points: 1,585, Visits: 1,859
Peter M. (11/12/2008)
I'd be more interested in seeing the execution times where the data size is non-trivial. My suspicion would lead me to believe the row_number() solution might scale better based on my own experience using the related dense_rank() function.

You could even use two dense rank functions that are partitioned identically but have an order by that is based on the version number in one and the date field in the other, then using that to compare when the resulting ranks are not equal. It would require a derived table joined to the original table, but you could get the results you are looking for.


and thanks, Jeff Moden, for pointing out the use of the clustered index on PersonId, Version.

I was interested to see how the different solutions would scale, so I expanded the PersonRecord table to 1 million PersonIDs, and expanded the rows of data to 5 records per PersonId, using Jeff's code. And I added the clustered index.

And I added dfarran's query.

Here are the results:
Original Query 153,733 Milliseconds duration

James Goodwin -
Left join on inner join 136,703 Milliseconds duration

Matt C -
SQL2005 row_number()
over (partition 184,280 Milliseconds duration

Adam Haines -
inner joins and CASE 138,906 Milliseconds duration

dfarrans -
just left joins,
no version-1 245,016 Milliseconds duration

Granted, these are total execution times, so include I/O.

Also, I did not alter the queries that used version-1, which is a weakness if the version sequence is not perfect.
Post #602141
Posted Thursday, November 13, 2008 8:30 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Monday, August 25, 2014 10:23 AM
Points: 149, Visits: 591
Am I missing something ?

select a.PersonID,a.Version, a.DEDate , 1 AS [Is ImProper]
from PersonRecord a
inner join PersonRecord b on a.PersonID = b.PersonID and a.Version>b.Version and a.DEDate<b.DEDate

Post #602177
Posted Thursday, November 13, 2008 8:36 AM
SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Monday, October 27, 2014 6:49 AM
Points: 1,585, Visits: 1,859
AlexT (11/13/2008)
Am I missing something ?

select a.PersonID,a.Version, a.DEDate , 1 AS [Is ImProper]
from PersonRecord a
inner join PersonRecord b on a.PersonID = b.PersonID and a.Version>b.Version and a.DEDate<b.DEDate


Yes, you then need to apply this flag to all the records for that PersonId. (So, if a PersonId has 5 rows and 2 of them are out of order, then all 5 rows will have the ImProper = 1, and those PersonIds with no rows out of order will have ImProper = 0.)
Post #602183
Posted Thursday, November 13, 2008 9:04 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Monday, August 25, 2014 10:23 AM
Points: 149, Visits: 591
yes whatever,

select distinct a.PersonID,a.Version, a.DEDate , case when b.PersonID is null then 0 else 1 end AS [Is ImProper]
from PersonRecord a
left join (
select a.PersonID
from PersonRecord a
inner join PersonRecord b on a.PersonID = b.PersonID and a.Version>b.Version and a.DEDate<b.DEDate
) b on a.PersonID = b.PersonID

but initial tast was just get bad data entry dates
Post #602203
Posted Thursday, November 13, 2008 9:19 AM
SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Monday, October 27, 2014 6:49 AM
Points: 1,585, Visits: 1,859
AlexT (11/13/2008)
yes whatever,

select distinct a.PersonID,a.Version, a.DEDate , case when b.PersonID is null then 0 else 1 end AS [Is ImProper]
from PersonRecord a
left join (
select a.PersonID
from PersonRecord a
inner join PersonRecord b on a.PersonID = b.PersonID and a.Version>b.Version and a.DEDate<b.DEDate
) b on a.PersonID = b.PersonID

but initial tast was just get bad data entry dates


Thanks, this achieves the desired results, and it's a straight-forward, easy-to-read solution.

When I tested it on my 5 million row table, total execution time was 164580 Milliseconds duration

Edited: Although I think I would put the "select distinct" in the subselect, for cases where one PersonId has multiple rows that are out of order.
Post #602211
Posted Thursday, November 13, 2008 9:33 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Monday, August 25, 2014 10:23 AM
Points: 149, Visits: 591
that supposed to be faster, if clustered index applied on (PersonID, Version )

select distinct a.PersonID,a.Version, a.DEDate , case when b.PersonID is null then 0 else 1 end AS [Is ImProper]
from PersonRecord a
left join (
select a.PersonID, a.Version
from PersonRecord a
inner join PersonRecord b on a.PersonID = b.PersonID and a.Version>b.Version and a.DEDate<b.DEDate
) b on a.PersonID = b.PersonID and a.Version=b.Version

thanks

Post #602223
Posted Thursday, November 13, 2008 9:39 AM
SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Monday, October 27, 2014 6:49 AM
Points: 1,585, Visits: 1,859
AlexT (11/13/2008)
that supposed to be faster, if clustered index applied on (PersonID, Version )

select distinct a.PersonID,a.Version, a.DEDate , case when b.PersonID is null then 0 else 1 end AS [Is ImProper]
from PersonRecord a
left join (
select a.PersonID, a.Version
from PersonRecord a
inner join PersonRecord b on a.PersonID = b.PersonID and a.Version>b.Version and a.DEDate<b.DEDate
) b on a.PersonID = b.PersonID and a.Version=b.Version

thanks



Good point about the fact that Version is part of the clustered index.
Unfortunately, incorporating version into the subselct breaks the solution. It no longer returns the desired results.
Post #602231
Posted Thursday, November 13, 2008 10:54 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Monday, August 25, 2014 10:23 AM
Points: 149, Visits: 591
yes sorry, my mistake. if you don't mind you can try this

select a.PersonID, a.Version, a.DEDate,
case when b.PersonID is null then 0 else 1 end AS [Is ImProper]
from PersonRecord a
left join (
select distinct a.PersonID
from PersonRecord a
inner join PersonRecord b on a.PersonID = b.PersonID and a.Version>b.Version and a.DEDate<b.DEDate
) b on a.PersonID = b.PersonID

thanks

Post #602273
Posted Thursday, November 13, 2008 12:58 PM
SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Monday, October 27, 2014 6:49 AM
Points: 1,585, Visits: 1,859
AlexT (11/13/2008)
yes sorry, my mistake. if you don't mind you can try this

select a.PersonID, a.Version, a.DEDate,
case when b.PersonID is null then 0 else 1 end AS [Is ImProper]
from PersonRecord a
left join (
select distinct a.PersonID
from PersonRecord a
inner join PersonRecord b on a.PersonID = b.PersonID and a.Version>b.Version and a.DEDate<b.DEDate
) b on a.PersonID = b.PersonID



This query does produce the desired effect, and yes, moving the distinct inside the subselect does result in a significant performance lift. Total execution time was 144470 Milliseconds! (vs. previous run time of was 164580 Milliseconds)
Post #602337
« Prev Topic | Next Topic »

Add to briefcase «««12345»»

Permissions Expand / Collapse