Compare Dates in an iterative manner for a particular set of records

  • Adam Haines (11/12/2008)


    Good article!

    This type of query can be handled much easier in SQL 2005, but I would still try to avoid a correlated subquery in SQL 2000. You can move the query into a derived table and make it inline. Obviously, this is over simplifying the solution but you get the idea.

    SELECT

    PersonRecord.*,

    Improper.[IsImProper]

    FROM PersonRecord

    INNER JOIN(

    SELECT

    a.PersonID,

    MAX(CASE

    WHEN a.Version B.DEDate

    THEN 1

    ELSE 0

    END) as IsImProper

    FROM PersonRecord A

    INNER JOIN PersonRecord B

    ON A.PersonID = B.PersonID AND

    A.Version = B.Version -1

    GROUP BY A.PersonId

    ) AS Improper

    ON Improper.[PersonID] = PersonRecord.PersonID

    correction: the CASE clause should read:

    WHEN a.DEDate > B.DEDate

    but this does give the correct results.

    The 2005 (partition) technique does not give the desired result of marking all rows for the personID.

  • Reply to :Why use the cross join?

    Yes it could be done in a way you have shown but then the whole personid should be marked as improper if any DEDATE is found mismatched

    Divya,

    Your given query is essentially marking the personID as improper. In that case a query like:

    SELECT PersonID, 1 as improper

    FROM #PersonRecord A

    INNER JOIN #PersonRecord B

    ON A.PersonID = B.PersonID

    AND A.Version = B.Version-1

    WHERE A.DEDate >= B.DEDate

    GROUP BY PersonID

    Seems much more simple than the original code.

    In Addition, the CROSS JOIN is not necessary (and probably removed by the optimizer) in the original code. The following is the same logic with the Cross join replaced with an INNER JOIN:

    SELECT *,

    (SELECT CASE

    WHEN (SUM(CASE WHEN B.DEDate =1

    THEN 1 ELSE 0

    END AS isawk

    from #PersonRecord B INNER JOIN #PersonRecord C

    ON B.PersonID = C.PersonID

    AND B.Version < C.Version

    WHERE B.PersonID=A.PersonID) AS [Is ImProper]

    from #PersonRecord A

    If I were aiming for this result, I would probably use something like:

    SELECT A.PersonID, Version, DEDate, COALESCE(Improper,0) as Improper

    FROM #PersonRecord A

    LEFT JOIN (SELECT DISTINCT B.PersonID, 1 as improper

    FROM #PersonRecord B

    INNER JOIN #PersonRecord C

    ON B.PersonID = C.PersonID

    AND B.Version = C.Version-1

    WHERE B.DEDate >= C.DEDate) as t

    ON A.PersonID = t.PersonID

    I prefer to put the subquery in the FROM Clause instead of in the SELECT Clause, if possible. The benefit of this is seen when you want to move that subquery into a VIEW so that it can be used as a source in multiple queries.

    --

    JimFive

  • derailed from the topic and question.....

    do not use more than two instances of given table...

    try to give output like this.

    PersonID IsImproper

    1 1

    2 1

    3 0

    4 1

  • Carla Wilson (11/12/2008)


    Adam Haines (11/12/2008)


    Good article!

    This type of query can be handled much easier in SQL 2005, but I would still try to avoid a correlated subquery in SQL 2000. You can move the query into a derived table and make it inline. Obviously, this is over simplifying the solution but you get the idea.

    SELECT

    PersonRecord.*,

    Improper.[IsImProper]

    FROM PersonRecord

    INNER JOIN(

    SELECT

    a.PersonID,

    MAX(CASE

    WHEN a.Version B.DEDate

    THEN 1

    ELSE 0

    END) as IsImProper

    FROM PersonRecord A

    INNER JOIN PersonRecord B

    ON A.PersonID = B.PersonID AND

    A.Version = B.Version -1

    GROUP BY A.PersonId

    ) AS Improper

    ON Improper.[PersonID] = PersonRecord.PersonID

    correction: the CASE clause should read:

    WHEN a.DEDate > B.DEDate

    but this does give the correct results.

    The 2005 (partition) technique does not give the desired result of marking all rows for the personID.

    Thanks. The site completely stripped my code :(. I will edit my post.

  • I was curious how the different techniques performed, so I loaded up the PersonRecord table with more data using a Numbers table (please see article http://www.sqlservercentral.com/articles/TSQL/62867/ [/url]for Numbers table)

    insert into PersonRecord

    select N,0,'11/12/2008'

    from Numbers

    where N between 4 and 11000

    insert into PersonRecord

    select N,1,'11/13/2008'

    from Numbers

    where N between 4 and 11000

    insert into PersonRecord

    select N,2,'11/10/2008'

    from Numbers

    where (N between 4 and 11000) and N % 100 = 0

    So total of 11000 PersonIds, and 22118 total rows in the PersonRecord table.

    I then recorded the execution time for 4 of the techniques.

    Original Query 610 Milliseconds duration

    Adam Haines

    - inner joins and CASE 576 Milliseconds duration

    James Goodwin

    - Left join on inner join 656 Milliseconds duration

    Matt C - SQL2005

    row_number() over (partition) 860 Milliseconds duration

    Just thought you would find this interesting.

    (edited to add link to Numbers table article)

  • Very interesting on the elapsed times... I would have figured I would have been the quickest without having to perform a table join, but I guess either the partition or the row_nuber generation makes it quite a bit slower. Thanks for putting that together!

    At any rate, is was a good article and I enjoyed learning a couple of new aproaches!

  • I'd be more interested in seeing the execution times where the data size is non-trivial. My suspicion would lead me to believe the row_number() solution might scale better based on my own experience using the related dense_rank() function.

    You could even use two dense rank functions that are partitioned identically but have an order by that is based on the version number in one and the date field in the other, then using that to compare when the resulting ranks are not equal. It would require a derived table joined to the original table, but you could get the results you are looking for.

  • Carla Wilson (11/12/2008)


    I was curious how the different techniques performed, so I loaded up the PersonRecord table with more data using a Numbers table (please see article http://www.sqlservercentral.com/articles/TSQL/62867/ [/url]for Numbers table)

    insert into PersonRecord

    select N,0,'11/12/2008'

    from Numbers

    where N between 4 and 11000

    insert into PersonRecord

    select N,1,'11/13/2008'

    from Numbers

    where N between 4 and 11000

    insert into PersonRecord

    select N,2,'11/10/2008'

    from Numbers

    where (N between 4 and 11000) and N % 100 = 0

    So total of 11000 PersonIds, and 22118 total rows in the PersonRecord table.

    I then recorded the execution time for 4 of the techniques.

    Original Query 610 Milliseconds duration

    Adam Haines

    - inner joins and CASE 576 Milliseconds duration

    James Goodwin

    - Left join on inner join 656 Milliseconds duration

    Matt C - SQL2005

    row_number() over (partition) 860 Milliseconds duration

    Just thought you would find this interesting.

    (edited to add link to Numbers table article)

    Aye, nicely done. What would be more interesting is the CPU time and the I/O hit...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Sorry... looks like those were CPU times.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • You should see what happens with the correct and incorrect indexing. Adams code (using his because Carla rated it as the fastest) is more than twice as fast as the cross-join if the Clustered PK of PersonID,Version is correctly added to the table. If a non-unique clustered index is added to the same columns instead of the clustered PK, Adams code is almost 3 times as slow.

    Neither method of indexing either helped nor hurt the cross-join.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Sorry... almost forgot... borrowed on Carla's good code and expanded the test data set a bit for my testing...

    CREATE TABLE PersonRecord ( PersonID int NOT NULL, Version int NOT NULL,DEDate datetime NOT NULL)

    go

    insert into PersonRecord

    select N,0,'11/12/2008'

    from Tally

    where N between 1 and 11000

    insert into PersonRecord

    select N,1,'11/13/2008'

    from Tally

    where N between 1 and 11000

    insert into PersonRecord

    select N,2,'11/10/2008'

    from Tally

    where (N between 1 and 11000) and N % 100 = 0

    insert into PersonRecord

    select N,2,'11/14/2008'

    from Tally

    where (N between 1 and 11000) and N % 100 > 0

    insert into PersonRecord

    select N,3,'11/15/2008'

    from Tally

    where N between 1 and 11000

    insert into PersonRecord

    select N,4,'11/16/2008'

    from Tally

    where N between 1 and 11000

    --===== Uncomment to test with Clustered PK

    --ALTER TABLE PersonRecord

    --ADD PRIMARY KEY CLUSTERED (PersonID,Version)

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Members,

    I have two tables named 'result' and 'status' for which i have 'status_key_id' in my result table with default 'null'. I have to update the 'key_id' of 'status' table to 'status_key_id' in 'result' table.

    Here in status table there will be only one row with 'started_at' and 'ended_at' and in result table there will be multiple rows within the date range of 'status' started_at and ended_at.

    Here no inputs are given, just i have to check the 'status' table 'started_at >= result table 'started_at' and 'status table ended_at <= result table ended_at' and i have to update the 'status' key_id to 'status_key_id' in result table.

    Please provide me a sample query with these requirement...

    Regards,

    Prabhu.

  • Carla's comparison was really very nice.

    Thanks I have learnt a lot many new things by writing this article

    --Divya

  • It is also possible to do the query with just left joins,

    SELECT a.PersonID, a.Version , a.dedate,

    [is improper] = CASE WHEN MAX(c.version) IS NOT NULL THEN 1

    ELSE 0 END

    FROM personrecord a

    LEFT JOIN personrecord b ON b.personid = a.personid

    LEFT JOIN personrecord c ON c.personid = a.personid

    AND c.version >= b.version

    AND c.dedate < b.dedate

    GROUP BY a.PersonID, a.Version, a.dedate

    with the primary key on the table (personid, version) this performs slightly better than the cross join with both the small(8) and medium(circa .75m) rowcount.

    I would avoid using the version = version -1 because this does not cater for missing versions (try doubling the version numbers in test data)

  • prabhu.rangan (11/12/2008)


    Hi Members,

    I have two tables named 'result' and 'status' for which i have 'status_key_id' in my result table with default 'null'. I have to update the 'key_id' of 'status' table to 'status_key_id' in 'result' table.

    Here in status table there will be only one row with 'started_at' and 'ended_at' and in result table there will be multiple rows within the date range of 'status' started_at and ended_at.

    Here no inputs are given, just i have to check the 'status' table 'started_at >= result table 'started_at' and 'status table ended_at <= result table ended_at' and i have to update the 'status' key_id to 'status_key_id' in result table.

    Please provide me a sample query with these requirement...

    Regards,

    Prabhu.

    Prabhu,

    It would probably be better for you to open your own thread on this... this particular thread is a discussion about an article on the subject of finding out of order rows based on dates and versions.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 16 through 30 (of 41 total)

You must be logged in to reply to this topic. Login to reply