Compare Dates in an iterative manner for a particular set of records

Question

Compare Dates in an iterative manner for a particular set of records

Viewing 15 posts - 16 through 30 (of 42 total)

You must be logged in to reply to this topic. Login to reply

SqlOnMyMind SSCertifiable Points: 5062 More actions · Answer 1

Adam Haines (11/12/2008)
Good article!
This type of query can be handled much easier in SQL 2005, but I would still try to avoid a correlated subquery in SQL 2000. You can move the query into a derived table and make it inline. Obviously, this is over simplifying the solution but you get the idea.
SELECT
PersonRecord.*,
Improper.[IsImProper]
FROM PersonRecord
INNER JOIN(
SELECT
a.PersonID,
MAX(CASE
WHEN a.Version B.DEDate
THEN 1
ELSE 0
END) as IsImProper
FROM PersonRecord A
INNER JOIN PersonRecord B
ON A.PersonID = B.PersonID AND
A.Version = B.Version -1
GROUP BY A.PersonId
) AS Improper
ON Improper.[PersonID] = PersonRecord.PersonID

correction: the CASE clause should read:

WHEN a.DEDate > B.DEDate

but this does give the correct results.

The 2005 (partition) technique does not give the desired result of marking all rows for the personID.

James Goodwin Hall of Fame Points: 3691 More actions · Answer 2

Reply to :Why use the cross join?
Yes it could be done in a way you have shown but then the whole personid should be marked as improper if any DEDATE is found mismatched

Divya,

Your given query is essentially marking the personID as improper. In that case a query like:

SELECT PersonID, 1 as improper

FROM #PersonRecord A

INNER JOIN #PersonRecord B

ON A.PersonID = B.PersonID

AND A.Version = B.Version-1

WHERE A.DEDate >= B.DEDate

GROUP BY PersonID

Seems much more simple than the original code.

In Addition, the CROSS JOIN is not necessary (and probably removed by the optimizer) in the original code. The following is the same logic with the Cross join replaced with an INNER JOIN:

SELECT *,

(SELECT CASE

WHEN (SUM(CASE WHEN B.DEDate =1

THEN 1 ELSE 0

END AS isawk

from #PersonRecord B INNER JOIN #PersonRecord C

ON B.PersonID = C.PersonID

AND B.Version < C.Version

WHERE B.PersonID=A.PersonID) AS [Is ImProper]

from #PersonRecord A

If I were aiming for this result, I would probably use something like:

SELECT A.PersonID, Version, DEDate, COALESCE(Improper,0) as Improper

FROM #PersonRecord A

LEFT JOIN (SELECT DISTINCT B.PersonID, 1 as improper

FROM #PersonRecord B

INNER JOIN #PersonRecord C

ON B.PersonID = C.PersonID

AND B.Version = C.Version-1

WHERE B.DEDate >= C.DEDate) as t

ON A.PersonID = t.PersonID

I prefer to put the subquery in the FROM Clause instead of in the SELECT Clause, if possible. The benefit of this is seen when you want to move that subquery into a VIEW so that it can be used as a source in multiple queries.

--

JimFive

sharmago Valued Member Points: 60 More actions · Answer 3

derailed from the topic and question.....

do not use more than two instances of given table...

try to give output like this.

PersonID IsImproper

1 1

2 1

3 0

4 1

Adam Haines SSC-Insane Points: 23217 More actions · Answer 4

Carla Wilson (11/12/2008)
Adam Haines (11/12/2008)
Good article!
This type of query can be handled much easier in SQL 2005, but I would still try to avoid a correlated subquery in SQL 2000. You can move the query into a derived table and make it inline. Obviously, this is over simplifying the solution but you get the idea.
SELECT
PersonRecord.*,
Improper.[IsImProper]
FROM PersonRecord
INNER JOIN(
SELECT
a.PersonID,
MAX(CASE
WHEN a.Version B.DEDate
THEN 1
ELSE 0
END) as IsImProper
FROM PersonRecord A
INNER JOIN PersonRecord B
ON A.PersonID = B.PersonID AND
A.Version = B.Version -1
GROUP BY A.PersonId
) AS Improper
ON Improper.[PersonID] = PersonRecord.PersonID
correction: the CASE clause should read:
WHEN a.DEDate > B.DEDate
but this does give the correct results.
The 2005 (partition) technique does not give the desired result of marking all rows for the personID.

Thanks. The site completely stripped my code :(. I will edit my post.

My blog: http://jahaines.blogspot.com

SqlOnMyMind SSCertifiable Points: 5062 More actions · Answer 5

I was curious how the different techniques performed, so I loaded up the PersonRecord table with more data using a Numbers table (please see article http://www.sqlservercentral.com/articles/TSQL/62867/ [/url]for Numbers table)

insert into PersonRecord

select N,0,'11/12/2008'

from Numbers

where N between 4 and 11000

insert into PersonRecord

select N,1,'11/13/2008'

from Numbers

where N between 4 and 11000

insert into PersonRecord

select N,2,'11/10/2008'

from Numbers

where (N between 4 and 11000) and N % 100 = 0

So total of 11000 PersonIds, and 22118 total rows in the PersonRecord table.

I then recorded the execution time for 4 of the techniques.

Original Query 610 Milliseconds duration

Adam Haines

- inner joins and CASE 576 Milliseconds duration

James Goodwin

- Left join on inner join 656 Milliseconds duration

Matt C - SQL2005

row_number() over (partition) 860 Milliseconds duration

Just thought you would find this interesting.

(edited to add link to Numbers table article)

SqlAsSecondLanguage Mr or Mrs. 500 Points: 529 More actions · Answer 6

Very interesting on the elapsed times... I would have figured I would have been the quickest without having to perform a table join, but I guess either the partition or the row_nuber generation makes it quite a bit slower. Thanks for putting that together!

At any rate, is was a good article and I enjoyed learning a couple of new aproaches!

Peter M. Newbie Points: 1 More actions · Answer 7

I'd be more interested in seeing the execution times where the data size is non-trivial. My suspicion would lead me to believe the row_number() solution might scale better based on my own experience using the related dense_rank() function.

You could even use two dense rank functions that are partitioned identically but have an order by that is based on the version number in one and the date field in the other, then using that to compare when the resulting ranks are not equal. It would require a derived table joined to the original table, but you could get the results you are looking for.

Jeff Moden SSC Guru Points: 1004683 More actions · Answer 8

Carla Wilson (11/12/2008)
I was curious how the different techniques performed, so I loaded up the PersonRecord table with more data using a Numbers table (please see article http://www.sqlservercentral.com/articles/TSQL/62867/ [/url]for Numbers table)
insert into PersonRecord
select N,0,'11/12/2008'
from Numbers
where N between 4 and 11000
insert into PersonRecord
select N,1,'11/13/2008'
from Numbers
where N between 4 and 11000
insert into PersonRecord
select N,2,'11/10/2008'
from Numbers
where (N between 4 and 11000) and N % 100 = 0
So total of 11000 PersonIds, and 22118 total rows in the PersonRecord table.
I then recorded the execution time for 4 of the techniques.
Original Query 610 Milliseconds duration
Adam Haines
- inner joins and CASE 576 Milliseconds duration
James Goodwin
- Left join on inner join 656 Milliseconds duration
Matt C - SQL2005
row_number() over (partition) 860 Milliseconds duration
Just thought you would find this interesting.
(edited to add link to Numbers table article)

Aye, nicely done. What would be more interesting is the CPU time and the I/O hit...

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004683 More actions · Answer 9

Sorry... looks like those were CPU times.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004683 More actions · Answer 10

You should see what happens with the correct and incorrect indexing. Adams code (using his because Carla rated it as the fastest) is more than twice as fast as the cross-join if the Clustered PK of PersonID,Version is correctly added to the table. If a non-unique clustered index is added to the same columns instead of the clustered PK, Adams code is almost 3 times as slow.

Neither method of indexing either helped nor hurt the cross-join.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004683 More actions · Answer 11

Sorry... almost forgot... borrowed on Carla's good code and expanded the test data set a bit for my testing...

CREATE TABLE PersonRecord ( PersonID int NOT NULL, Version int NOT NULL,DEDate datetime NOT NULL)

go

insert into PersonRecord

select N,0,'11/12/2008'

from Tally

where N between 1 and 11000

insert into PersonRecord

select N,1,'11/13/2008'

from Tally

where N between 1 and 11000

insert into PersonRecord

select N,2,'11/10/2008'

from Tally

where (N between 1 and 11000) and N % 100 = 0

insert into PersonRecord

select N,2,'11/14/2008'

from Tally

where (N between 1 and 11000) and N % 100 > 0

insert into PersonRecord

select N,3,'11/15/2008'

from Tally

where N between 1 and 11000

insert into PersonRecord

select N,4,'11/16/2008'

from Tally

where N between 1 and 11000

--===== Uncomment to test with Clustered PK

--ALTER TABLE PersonRecord

--ADD PRIMARY KEY CLUSTERED (PersonID,Version)

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

prabhu.rangan SSC Rookie Points: 36 More actions · Answer 12

Hi Members,

I have two tables named 'result' and 'status' for which i have 'status_key_id' in my result table with default 'null'. I have to update the 'key_id' of 'status' table to 'status_key_id' in 'result' table.

Here in status table there will be only one row with 'started_at' and 'ended_at' and in result table there will be multiple rows within the date range of 'status' started_at and ended_at.

Here no inputs are given, just i have to check the 'status' table 'started_at >= result table 'started_at' and 'status table ended_at <= result table ended_at' and i have to update the 'status' key_id to 'status_key_id' in result table.

Please provide me a sample query with these requirement...

Regards,

Prabhu.

Divya Agrawal Hall of Fame Points: 3149 More actions · Answer 13

Carla's comparison was really very nice.

Thanks I have learnt a lot many new things by writing this article

--Divya

dfarran Grasshopper Points: 16 More actions · Answer 14

It is also possible to do the query with just left joins,

SELECT a.PersonID, a.Version , a.dedate,

[is improper] = CASE WHEN MAX(c.version) IS NOT NULL THEN 1

ELSE 0 END

FROM personrecord a

LEFT JOIN personrecord b ON b.personid = a.personid

LEFT JOIN personrecord c ON c.personid = a.personid

AND c.version >= b.version

AND c.dedate < b.dedate

GROUP BY a.PersonID, a.Version, a.dedate

with the primary key on the table (personid, version) this performs slightly better than the cross join with both the small(8) and medium(circa .75m) rowcount.

I would avoid using the version = version -1 because this does not cater for missing versions (try doubling the version numbers in test data)

Jeff Moden SSC Guru Points: 1004683 More actions · Answer 15

prabhu.rangan (11/12/2008)
Hi Members,
I have two tables named 'result' and 'status' for which i have 'status_key_id' in my result table with default 'null'. I have to update the 'key_id' of 'status' table to 'status_key_id' in 'result' table.
Here in status table there will be only one row with 'started_at' and 'ended_at' and in result table there will be multiple rows within the date range of 'status' started_at and ended_at.
Here no inputs are given, just i have to check the 'status' table 'started_at >= result table 'started_at' and 'status table ended_at <= result table ended_at' and i have to update the 'status' key_id to 'status_key_id' in result table.
Please provide me a sample query with these requirement...
Regards,
Prabhu.

Prabhu,

It would probably be better for you to open your own thread on this... this particular thread is a discussion about an article on the subject of finding out of order rows based on dates and versions.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)