Help with query

  • Hi all,

    I'm having a moment, and I can't seem to get the query right. I'm trying to get the actual start and end dates for each rev_id.
    Basically,  each row has the same start date, but I want the start date to be the end date of the previous revision, based on the end date by article_id.

    Here's some sample data:

    IF OBJECT_ID('tData') IS NOT NULL
    DROP TABLE tData
    GO

    CREATE TABLE tData
        (nArticle_Id    INTEGER        NOT NULL,
        nRev_Id        INTEGER        NOT NULL PRIMARY KEY,
        dStart        DATETIME    NOT NULL,
        dEnd        DATETIME    NULL)
    GO

    INSERT    tData VALUES (6457, 51, '2015-01-08 08:11:40.000', '2015-01-08 08:23:02.020')
    INSERT    tData VALUES (6457, 95, '2015-01-08 08:11:40.000', '2015-01-08 10:28:18.173')
    INSERT    tData VALUES (6457, 82, '2015-01-08 08:11:40.000', '2015-02-05 13:59:13.027')
    INSERT    tData VALUES (6457, 102, '2015-01-08 08:11:40.000', NULL)
    GO

    And here are the expected results (sorry, not having much luck with the formatting):

    nArticle_Id  nRev_Id dStart      dEnd
    ----------- ----------- ----------------------- -----------------------
       6457    51 2015-01-08 08:11:40.000 2015-01-08 08:23:02.020
       6457    82 2015-01-08 08:23:02.020 2015-02-05 13:59:13.027
       6457    95 2015-02-05 13:59:13.027 2015-01-08 10:28:18.173
       6457   102 2015-01-08 10:28:18.173 NULL

    Can anyone help?

    TIA,

    Paul


  • DECLARE @tData TABLE
    (
        nArticle_Id int not null,
        nRev_Id int not null PRIMARY KEY,
        dStart datetime2 not null,
        dEnd datetime2 null
    );

    INSERT into @tData
    VALUES
        (6457, 51, '2015-01-08 08:11:40.000', '2015-01-08 08:23:02.020'),
        (6457, 95, '2015-01-08 08:11:40.000', '2015-01-08 10:28:18.173'),
        (6457, 82, '2015-01-08 08:11:40.000', '2015-02-05 13:59:13.027'),
        (6457, 102, '2015-01-08 08:11:40.000', NULL)
    ;

    SELECT
    t.nArticle_Id,
    t.nRev_Id,
    t.dStart,
    (SELECT t2.dEnd FROM @tData t2 WHERE t2.nArticle_Id = t.nArticle_Id and t2.nRev_Id = (SELECT MAX(t3.nRev_Id) FROM @tdata t3 WHERE t3.nRev_id < t.nRev_ID)) as RealStartDate,
    t.dEnd
    FROM @tData t
    ;

  • You can use the LAG function to return the previous end date - something like LAG(dEnd,1) OVER(PARTITION BY nArticle_Id ORDER BY nRev_Id) As NewStartDate

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • it would be easier on a newer version of SQL Server, but for 2008 how about this?

    SELECT nArticle_Id, nRev_Id, dStart, dEnd, ROW_NUMBER() OVER (PARTITION BY nArticle_Id ORDER BY nRev_Id) AS sort
      INTO #sorted
      FROM tData;

    CREATE UNIQUE CLUSTERED INDEX IX_sorted ON #sorted (nArticle_Id, sort);

    SELECT c.nArticle_Id, c.nRev_Id, ISNULL(p.dEnd, c.dStart), c.dEnd
      FROM #sorted c
      LEFT OUTER JOIN #sorted p ON c.nArticle_Id = p.nArticle_Id AND c.sort = p.sort + 1;

    DROP TABLE #sorted;

  • All:

    I goofed on the expected results in the OP: here's how they should appear.
    The PK (nrev_Id)  can be out of calendar sequence (because users), so that can't be used for ordering.

    nArticle_Id    nRev_Id            dPublished                                dExpired
    6457            51                        2015-01-08 08:11:40.000        2015-01-08 08:23:02.020
    6457            95                         2015-01-08 08:23:02.020        2015-01-08 10:28:18.173
    6457            82                        2015-01-08 10:28:18.173        2015-02-05 13:59:13.027
    6457            102                      2015-02-05 13:59:13.027        NULL

    Bill, Thanks for taking the time to post the query, and if nothing else, pointing me in the right direction.  I think I'll be able to get it now.
    Jeffrey: LAG would appear to be ideal it, but is not available until SQL 2012

  • Thanks Chris, that did it when I changed the ORDER BY to use the date in the ROW_NUMBER()

    Much appreciated!

  • Jeffrey Williams 3188 - Thursday, April 20, 2017 12:12 PM

    You can use the LAG function to return the previous end date - something like LAG(dEnd,1) OVER(PARTITION BY nArticle_Id ORDER BY nRev_Id) As NewStartDate

    This was posted in a SQL 2008 forum, and LAG was introduced in SQL 2012.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Chris Harshman - Thursday, April 20, 2017 12:21 PM

    it would be easier on a newer version of SQL Server, but for 2008 how about this?

    SELECT nArticle_Id, nRev_Id, dStart, dEnd, ROW_NUMBER() OVER (PARTITION BY nArticle_Id ORDER BY nRev_Id) AS sort
      INTO #sorted
      FROM tData;

    CREATE UNIQUE CLUSTERED INDEX IX_sorted ON #sorted (nArticle_Id, sort);

    SELECT c.nArticle_Id, c.nRev_Id, ISNULL(p.dEnd, c.dStart), c.dEnd
      FROM #sorted c
      LEFT OUTER JOIN #sorted p ON c.nArticle_Id = p.nArticle_Id AND c.sort = p.sort + 1;

    DROP TABLE #sorted;

    Hi Chris,
    Just wanted to know, why the index was created. Please explain

  • VSSGeorge - Wednesday, May 31, 2017 7:02 AM

    Hi Chris,
    Just wanted to know, why the index was created. Please explain

    The index isn't completely necessary, but depending on the number of distinct nArticle_ID values, and how many rows there are per nArticle_ID, then the index will help the self join of the #sorted table perform better.

  • This query would perform much better:
    SELECT t.nArticle_Id,
    t.nRev_Id,
    isnull(pt.dEnd,t.dStart) Start,
    t.dEnd
    FROM @tData t
    outer apply (select top 1 dEnd from @tData p
            where p.nArticle_Id = t.nArticle_Id and p.nRev_Id < t.nRev_Id
            order by p.nRev_Id desc) pt

    _____________
    Code for TallyGenerator

  • Now that there are multiple solutions for this problem, I have to ask, if the "current" start date will be replaced with the previous end date, why is there any need for the start date column in the table to being with?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • In short,
    - the guy who modeled this insisted on doing it (t)his way. We let him. He doesn't work here anymore.
    - I'm trying to get the "real" date ranges (as in, a given revision was live between X and Y dates) for each revision. For reasons that I have yet to fathom, users want the create date to inherit from previous revisions of the same article - in some cases. In other cases, the current date at the time of revision is what they want.
    - Finally, I was pretty sure I was looking at an on-coming Triangular Join (thanks, Jeff!). As we know, works fine with a few hundred rows, not so much with a few hundred thousand.

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply