Self Join Question

  • I've got a table with the following fields in it:

    UniqueID
    Sequence
    BeforeStatus
    AfterStatus
    HistoryStatusDate

    Sample data looks like :

    08-40089445, 1, '','BNT',10/17/17
    08-40089445, 2, 'BNT','VFD',1/18/18
    08-40089445, 3, 'VFD','NST',1/23/18
    08-40089445, 4, 'NST','SLD',1/23/18

    What I need to do is basically make a "beginning history" and  "ending history" date for each record:
    08-40089445, 1, '','BNT',10/17/17, '1/17/18'
    08-40089445, 2, 'BNT','VFD',1/18/18, '1/22/18'
    08-40089445, 3, 'VFD','NST',1/23/18,'1/23/18'
    08-40089445, 4, 'NST','SLD',1/23/18,'' 
    When I write a self join such as

    SELECT * FRom
    (SELECT s.*,h.HistoryStatusDate statusEnddate
    FROM [StatusHistory] S
    JOIN [StatusHistory] h
    ON s.UniqueID=h.UniqueID
    WHERE s.Sequence=h.Sequence-1
    ) s
    --Still trying to find the 7th record for this example
    WHERE s.[unique#]='08-40089445'

    I end up getting the first 3 back but always miss the 4th (in my example).

    Anyone have an idea of a better way to write this/solve this problem?  (it's tracking when a record changes a status but it's done in an iSeries so getting the programmers to put the "ending" date on isn't very likely, and the query is ignoring due to the sequence-1 portion for the 4th (max) record.

  • Terry Gamble - Wednesday, January 31, 2018 1:22 PM

    I've got a table with the following fields in it:

    UniqueID
    Sequence
    BeforeStatus
    AfterStatus
    HistoryStatusDate

    Sample data looks like :

    08-40089445, 1, '','BNT',10/17/17
    08-40089445, 2, 'BNT','VFD',1/18/18
    08-40089445, 3, 'VFD','NST',1/23/18
    08-40089445, 4, 'NST','SLD',1/23/18

    What I need to do is basically make a "beginning history" and  "ending history" date for each record:
    08-40089445, 1, '','BNT',10/17/17, '1/17/18'
    08-40089445, 2, 'BNT','VFD',1/18/18, '1/22/18'
    08-40089445, 3, 'VFD','NST',1/23/18,'1/23/18'
    08-40089445, 4, 'NST','SLD',1/23/18,'' 
    When I write a self join such as

    SELECT * FRom
    (SELECT s.*,h.HistoryStatusDate statusEnddate
    FROM [StatusHistory] S
    JOIN [StatusHistory] h
    ON s.UniqueID=h.UniqueID
    WHERE s.Sequence=h.Sequence-1
    ) s
    --Still trying to find the 7th record for this example
    WHERE s.[unique#]='08-40089445'

    I end up getting the first 3 back but always miss the 4th (in my example).

    Anyone have an idea of a better way to write this/solve this problem?  (it's tracking when a record changes a status but it's done in an iSeries so getting the programmers to put the "ending" date on isn't very likely, and the query is ignoring due to the sequence-1 portion for the 4th (max) record.

    You should be able to get those end dates using LEAD. No need for any joins.


  • This is maybe not the right forum/version as I don't have the "lead" function that I can use/find.  Is there a true "SQL" based solution using something like the self join?  Versus the newer functions like LEAD (which btw I sincerely wish I could use).

  • You posted in the SQL Server 2017 forum. In 2017, Lead is a true SQL function ?


  • Which version are you using?


  • I'm not sure what version of SQL that my BI tool rides on (Sisense).  I was looking through it for any solution and the only thing I found was self-joins to solve my issue and it drew me back to this site to ask the question.

  • Try changing your JOIN to a LEFT JOIN

  • I had tried that.  The table is 8 million rows right now and even trying to limit it to a specific unique ID the server basically locks up on the query (the cube goes to using 110+ GB memory and doesn't return the result so I end up having to kill the cube).  It's an annoying issue.

  • Terry Gamble - Wednesday, January 31, 2018 1:22 PM

    I've got a table with the following fields in it:

    UniqueID
    Sequence
    BeforeStatus
    AfterStatus
    HistoryStatusDate

    Sample data looks like :

    08-40089445, 1, '','BNT',10/17/17
    08-40089445, 2, 'BNT','VFD',1/18/18
    08-40089445, 3, 'VFD','NST',1/23/18
    08-40089445, 4, 'NST','SLD',1/23/18

    What I need to do is basically make a "beginning history" and  "ending history" date for each record:
    08-40089445, 1, '','BNT',10/17/17, '1/17/18'
    08-40089445, 2, 'BNT','VFD',1/18/18, '1/22/18'
    08-40089445, 3, 'VFD','NST',1/23/18,'1/23/18'
    08-40089445, 4, 'NST','SLD',1/23/18,'' 
    When I write a self join such as

    SELECT * FRom
    (SELECT s.*,h.HistoryStatusDate statusEnddate
    FROM [StatusHistory] S
    JOIN [StatusHistory] h
    ON s.UniqueID=h.UniqueID
    WHERE s.Sequence=h.Sequence-1
    ) s
    --Still trying to find the 7th record for this example
    WHERE s.[unique#]='08-40089445'

    I end up getting the first 3 back but always miss the 4th (in my example).

    Anyone have an idea of a better way to write this/solve this problem?  (it's tracking when a record changes a status but it's done in an iSeries so getting the programmers to put the "ending" date on isn't very likely, and the query is ignoring due to the sequence-1 portion for the 4th (max) record.

    Could you please try to use OFFSET and FETCH for finding nth record as you wish. But still the performance is a concern.

  • DesNorton - Wednesday, January 31, 2018 10:31 PM

    Try changing your JOIN to a LEFT JOIN

    LEFT JOIN alone will not cut it. You'll also have to move the WHERE condition to ON:

    CREATE TABLE #StatusHistory (
     UniqueID varchar(30) ,
     Sequence int,
     BeforeStatus varchar(10),
     AfterStatus varchar(10),
     HistoryStatusDate date,
     PRIMARY KEY(UniqueID, Sequence)
     )

    INSERT INTO #StatusHistory VALUES('08-40089445',1,'','BNT','20171017')
    INSERT INTO #StatusHistory VALUES('08-40089445',2,'BNT','VFD','20180118')
    INSERT INTO #StatusHistory VALUES('08-40089445',3,'VFD','NST','20180123')
    INSERT INTO #StatusHistory VALUES('08-40089445',4,'NST','SLD','20180123')

    SELECT * FRom
    (SELECT s.*,h.HistoryStatusDate statusEnddate
    FROM [#StatusHistory] S
    LEFT JOIN [#StatusHistory] h
    ON s.UniqueID=h.UniqueID
    AND  s.Sequence=h.Sequence-1
    ) s
    --Still trying to find the 7th record for this example
    WHERE s.[UniqueID]='08-40089445'
    ORDER BY s.UniqueID, s.Sequence
    Result:
    UniqueID                       Sequence    BeforeStatus AfterStatus HistoryStatusDate statusEnddate
    ------------------------------ ----------- ------------ ----------- ----------------- -------------
    08-40089445                    1                        BNT         2017-10-17        2018-01-18
    08-40089445                    2           BNT          VFD         2018-01-18        2018-01-23
    08-40089445                    3           VFD          NST         2018-01-23        2018-01-23
    08-40089445                    4           NST          SLD         2018-01-23        NULL

    (4 rows affected)

    However, I'd be reluctant to rely on the Sequence column being an unbroken sequence (the -1 condition). So I'd likely utilize an OUTER APPLY in stead of the LEFT JOIN:
    SELECT * FRom
    (SELECT s.*,h.HistoryStatusDate statusEnddate
    FROM [#StatusHistory] S
    OUTER APPLY (
     SELECT TOP(1) HistoryStatusDate
     FROM [#StatusHistory] h
     WHERE s.UniqueID=h.UniqueID
     AND h.Sequence > s.Sequence
     ORDER BY h.Sequence ASC
     ) h
    ) s
    --Still trying to find the 7th record for this example
    WHERE s.[UniqueID]='08-40089445'
    ORDER BY s.UniqueID, s.Sequence

    If we put in a 5 instead of 4 in the last row (thereby creating a gap in the Sequence column sequence), we can see the difference:
    UniqueID                       Sequence    BeforeStatus AfterStatus HistoryStatusDate statusEnddate
    ------------------------------ ----------- ------------ ----------- ----------------- -------------
    08-40089445                    1                        BNT         2017-10-17        2018-01-18
    08-40089445                    2           BNT          VFD         2018-01-18        2018-01-23
    08-40089445                    3           VFD          NST         2018-01-23        NULL
    08-40089445                    5           NST          SLD         2018-01-23        NULL

    (4 rows affected)

    UniqueID                       Sequence    BeforeStatus AfterStatus HistoryStatusDate statusEnddate
    ------------------------------ ----------- ------------ ----------- ----------------- -------------
    08-40089445                    1                        BNT         2017-10-17        2018-01-18
    08-40089445                    2           BNT          VFD         2018-01-18        2018-01-23
    08-40089445                    3           VFD          NST         2018-01-23        2018-01-23
    08-40089445                    5           NST          SLD         2018-01-23        NULL

    (4 rows affected)

    Only the last one can cope.

    Anyway, both methods will need an index on UniqueID, Sequence in order to perform adequately if more than a few rows are to be processed.

  • subramaniam.chandrasekar - Thursday, February 1, 2018 6:32 AM

    Terry Gamble - Wednesday, January 31, 2018 1:22 PM

    Terry Gamble - Wednesday, January 31, 2018 1:22 PM

    I've got a table with the following fields in it:

    UniqueID
    Sequence
    BeforeStatus
    AfterStatus
    HistoryStatusDate

    Sample data looks like :

    08-40089445, 1, '','BNT',10/17/17
    08-40089445, 2, 'BNT','VFD',1/18/18
    08-40089445, 3, 'VFD','NST',1/23/18
    08-40089445, 4, 'NST','SLD',1/23/18

    What I need to do is basically make a "beginning history" and  "ending history" date for each record:
    08-40089445, 1, '','BNT',10/17/17, '1/17/18'
    08-40089445, 2, 'BNT','VFD',1/18/18, '1/22/18'
    08-40089445, 3, 'VFD','NST',1/23/18,'1/23/18'
    08-40089445, 4, 'NST','SLD',1/23/18,'' 
    When I write a self join such as

    SELECT * FRom
    (SELECT s.*,h.HistoryStatusDate statusEnddate
    FROM [StatusHistory] S
    JOIN [StatusHistory] h
    ON s.UniqueID=h.UniqueID
    WHERE s.Sequence=h.Sequence-1
    ) s
    --Still trying to find the 7th record for this example
    WHERE s.[unique#]='08-40089445'

    I end up getting the first 3 back but always miss the 4th (in my example).

    Anyone have an idea of a better way to write this/solve this problem?  (it's tracking when a record changes a status but it's done in an iSeries so getting the programmers to put the "ending" date on isn't very likely, and the query is ignoring due to the sequence-1 portion for the 4th (max) record.

    >> I've got a table with the following fields in it: <<

    you might want to actually read the SQL standards, so you’ll know the difference between a column in a field. And why a row is not anything like a record. They are totally different concepts. Also, I can you post DDL instead of just a little shopping list of column names? Do you know what DDL is? I guess we have to start guessing and keys, too. There is no such thing as a “unique_id†because in RDBMS. An identifier must be the identifier of something in particular, and a “unique†is both a reserved word and not an entity. Then “Sequence†is yet another reserved word in SQL. A status is a state of being of something and it has to have a start and end time for this state. You never read a book on data modeling and you need to very badly. Finally, the only display format allowed for dates in ANSI-ISO Standard SQL is based on ISO 8601, not your local dialect. What little you did post is wrong.

    https://www.red-gate.com/simple-talk/sql/t-sql-programming/contiguous-time-periods/

    start with a skeleton, and then build on it. Will then need to look at another article of mine on state transition constraints

    https://www.red-gate.com/simple-talk/sql/t-sql-programming/state-transition-constraints/

    CREATE TABLE Foobar_History
    (foobar_id CHAR(11), NOT NULL
    CHECK (foobar_id LIKE ‘08-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]’),
    foobar_status CHAR(3) NOT NULL
    CHECK (foobar_status IN ('BNT','VFD', 'NST', 'SLD))
    foobar_status_start_date DATE NOT NULL
    foobar_status_end_date DATE,
    CHECK(foobar_status_start_date <= foobar_status_end_date),
    PRIMARY KEY (foobar_id, foobar_status_start_date)
    );

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • Terry Gamble - Wednesday, January 31, 2018 1:22 PM

    I've got a table with the following fields in it:

    UniqueID
    Sequence
    BeforeStatus
    AfterStatus
    HistoryStatusDate

    Sample data looks like :

    08-40089445, 1, '','BNT',10/17/17
    08-40089445, 2, 'BNT','VFD',1/18/18
    08-40089445, 3, 'VFD','NST',1/23/18
    08-40089445, 4, 'NST','SLD',1/23/18

    What I need to do is basically make a "beginning history" and  "ending history" date for each record:
    08-40089445, 1, '','BNT',10/17/17, '1/17/18'
    08-40089445, 2, 'BNT','VFD',1/18/18, '1/22/18'
    08-40089445, 3, 'VFD','NST',1/23/18,'1/23/18'
    08-40089445, 4, 'NST','SLD',1/23/18,'' 
    When I write a self join such as

    SELECT * FRom
    (SELECT s.*,h.HistoryStatusDate statusEnddate
    FROM [StatusHistory] S
    JOIN [StatusHistory] h
    ON s.UniqueID=h.UniqueID
    WHERE s.Sequence=h.Sequence-1
    ) s
    --Still trying to find the 7th record for this example
    WHERE s.[unique#]='08-40089445'

    I end up getting the first 3 back but always miss the 4th (in my example).

    Anyone have an idea of a better way to write this/solve this problem?  (it's tracking when a record changes a status but it's done in an iSeries so getting the programmers to put the "ending" date on isn't very likely, and the query is ignoring due to the sequence-1 portion for the 4th (max) record.

    Try selecting UniqueID, Seq, After Status, Date as Start Date
    In the sub query, do the join as a Left or Outer Join, where the Seq is + 1
    Select the Date from this as the End Date.

    This should give you a Start and End Date for each status, and in the case of the last Seq a NULL.
    Not sure if your platform is iSeries or SQL Server (or something else). Many flavors of SQL, each with some unique features.

Viewing 0 posts

You must be logged in to reply to this topic. Login to reply