Self Join Question

  • I've got a table with the following fields in it:

    UniqueID
    Sequence
    BeforeStatus
    AfterStatus
    HistoryStatusDate

    Sample data looks like :

    08-40089445, 1, '','BNT',10/17/17
    08-40089445, 2, 'BNT','VFD',1/18/18
    08-40089445, 3, 'VFD','NST',1/23/18
    08-40089445, 4, 'NST','SLD',1/23/18

    What I need to do is basically make a "beginning history" and  "ending history" date for each record:
    08-40089445, 1, '','BNT',10/17/17, '1/17/18'
    08-40089445, 2, 'BNT','VFD',1/18/18, '1/22/18'
    08-40089445, 3, 'VFD','NST',1/23/18,'1/23/18'
    08-40089445, 4, 'NST','SLD',1/23/18,'' 
    When I write a self join such as

    SELECT * FRom
    (SELECT s.*,h.HistoryStatusDate statusEnddate
    FROM [StatusHistory] S
    JOIN [StatusHistory] h
    ON s.UniqueID=h.UniqueID
    WHERE s.Sequence=h.Sequence-1
    ) s
    --Still trying to find the 7th record for this example
    WHERE s.[unique#]='08-40089445'

    I end up getting the first 3 back but always miss the 4th (in my example).

    Anyone have an idea of a better way to write this/solve this problem?  (it's tracking when a record changes a status but it's done in an iSeries so getting the programmers to put the "ending" date on isn't very likely, and the query is ignoring due to the sequence-1 portion for the 4th (max) record.

  • Terry Gamble - Wednesday, January 31, 2018 1:22 PM

    I've got a table with the following fields in it:

    UniqueID
    Sequence
    BeforeStatus
    AfterStatus
    HistoryStatusDate

    Sample data looks like :

    08-40089445, 1, '','BNT',10/17/17
    08-40089445, 2, 'BNT','VFD',1/18/18
    08-40089445, 3, 'VFD','NST',1/23/18
    08-40089445, 4, 'NST','SLD',1/23/18

    What I need to do is basically make a "beginning history" and  "ending history" date for each record:
    08-40089445, 1, '','BNT',10/17/17, '1/17/18'
    08-40089445, 2, 'BNT','VFD',1/18/18, '1/22/18'
    08-40089445, 3, 'VFD','NST',1/23/18,'1/23/18'
    08-40089445, 4, 'NST','SLD',1/23/18,'' 
    When I write a self join such as

    SELECT * FRom
    (SELECT s.*,h.HistoryStatusDate statusEnddate
    FROM [StatusHistory] S
    JOIN [StatusHistory] h
    ON s.UniqueID=h.UniqueID
    WHERE s.Sequence=h.Sequence-1
    ) s
    --Still trying to find the 7th record for this example
    WHERE s.[unique#]='08-40089445'

    I end up getting the first 3 back but always miss the 4th (in my example).

    Anyone have an idea of a better way to write this/solve this problem?  (it's tracking when a record changes a status but it's done in an iSeries so getting the programmers to put the "ending" date on isn't very likely, and the query is ignoring due to the sequence-1 portion for the 4th (max) record.

    You should be able to get those end dates using LEAD. No need for any joins.

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • This is maybe not the right forum/version as I don't have the "lead" function that I can use/find.  Is there a true "SQL" based solution using something like the self join?  Versus the newer functions like LEAD (which btw I sincerely wish I could use).

  • You posted in the SQL Server 2017 forum. In 2017, Lead is a true SQL function ?

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Which version are you using?

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • I'm not sure what version of SQL that my BI tool rides on (Sisense).  I was looking through it for any solution and the only thing I found was self-joins to solve my issue and it drew me back to this site to ask the question.

  • Try changing your JOIN to a LEFT JOIN

  • I had tried that.  The table is 8 million rows right now and even trying to limit it to a specific unique ID the server basically locks up on the query (the cube goes to using 110+ GB memory and doesn't return the result so I end up having to kill the cube).  It's an annoying issue.

  • Terry Gamble - Wednesday, January 31, 2018 1:22 PM

    I've got a table with the following fields in it:

    UniqueID
    Sequence
    BeforeStatus
    AfterStatus
    HistoryStatusDate

    Sample data looks like :

    08-40089445, 1, '','BNT',10/17/17
    08-40089445, 2, 'BNT','VFD',1/18/18
    08-40089445, 3, 'VFD','NST',1/23/18
    08-40089445, 4, 'NST','SLD',1/23/18

    What I need to do is basically make a "beginning history" and  "ending history" date for each record:
    08-40089445, 1, '','BNT',10/17/17, '1/17/18'
    08-40089445, 2, 'BNT','VFD',1/18/18, '1/22/18'
    08-40089445, 3, 'VFD','NST',1/23/18,'1/23/18'
    08-40089445, 4, 'NST','SLD',1/23/18,'' 
    When I write a self join such as

    SELECT * FRom
    (SELECT s.*,h.HistoryStatusDate statusEnddate
    FROM [StatusHistory] S
    JOIN [StatusHistory] h
    ON s.UniqueID=h.UniqueID
    WHERE s.Sequence=h.Sequence-1
    ) s
    --Still trying to find the 7th record for this example
    WHERE s.[unique#]='08-40089445'

    I end up getting the first 3 back but always miss the 4th (in my example).

    Anyone have an idea of a better way to write this/solve this problem?  (it's tracking when a record changes a status but it's done in an iSeries so getting the programmers to put the "ending" date on isn't very likely, and the query is ignoring due to the sequence-1 portion for the 4th (max) record.

    Could you please try to use OFFSET and FETCH for finding nth record as you wish. But still the performance is a concern.

  • DesNorton - Wednesday, January 31, 2018 10:31 PM

    Try changing your JOIN to a LEFT JOIN

    LEFT JOIN alone will not cut it. You'll also have to move the WHERE condition to ON:

    CREATE TABLE #StatusHistory (
     UniqueID varchar(30) ,
     Sequence int,
     BeforeStatus varchar(10),
     AfterStatus varchar(10),
     HistoryStatusDate date,
     PRIMARY KEY(UniqueID, Sequence)
     )

    INSERT INTO #StatusHistory VALUES('08-40089445',1,'','BNT','20171017')
    INSERT INTO #StatusHistory VALUES('08-40089445',2,'BNT','VFD','20180118')
    INSERT INTO #StatusHistory VALUES('08-40089445',3,'VFD','NST','20180123')
    INSERT INTO #StatusHistory VALUES('08-40089445',4,'NST','SLD','20180123')

    SELECT * FRom
    (SELECT s.*,h.HistoryStatusDate statusEnddate
    FROM [#StatusHistory] S
    LEFT JOIN [#StatusHistory] h
    ON s.UniqueID=h.UniqueID
    AND  s.Sequence=h.Sequence-1
    ) s
    --Still trying to find the 7th record for this example
    WHERE s.[UniqueID]='08-40089445'
    ORDER BY s.UniqueID, s.Sequence
    Result:
    UniqueID                       Sequence    BeforeStatus AfterStatus HistoryStatusDate statusEnddate
    ------------------------------ ----------- ------------ ----------- ----------------- -------------
    08-40089445                    1                        BNT         2017-10-17        2018-01-18
    08-40089445                    2           BNT          VFD         2018-01-18        2018-01-23
    08-40089445                    3           VFD          NST         2018-01-23        2018-01-23
    08-40089445                    4           NST          SLD         2018-01-23        NULL

    (4 rows affected)

    However, I'd be reluctant to rely on the Sequence column being an unbroken sequence (the -1 condition). So I'd likely utilize an OUTER APPLY in stead of the LEFT JOIN:
    SELECT * FRom
    (SELECT s.*,h.HistoryStatusDate statusEnddate
    FROM [#StatusHistory] S
    OUTER APPLY (
     SELECT TOP(1) HistoryStatusDate
     FROM [#StatusHistory] h
     WHERE s.UniqueID=h.UniqueID
     AND h.Sequence > s.Sequence
     ORDER BY h.Sequence ASC
     ) h
    ) s
    --Still trying to find the 7th record for this example
    WHERE s.[UniqueID]='08-40089445'
    ORDER BY s.UniqueID, s.Sequence

    If we put in a 5 instead of 4 in the last row (thereby creating a gap in the Sequence column sequence), we can see the difference:
    UniqueID                       Sequence    BeforeStatus AfterStatus HistoryStatusDate statusEnddate
    ------------------------------ ----------- ------------ ----------- ----------------- -------------
    08-40089445                    1                        BNT         2017-10-17        2018-01-18
    08-40089445                    2           BNT          VFD         2018-01-18        2018-01-23
    08-40089445                    3           VFD          NST         2018-01-23        NULL
    08-40089445                    5           NST          SLD         2018-01-23        NULL

    (4 rows affected)

    UniqueID                       Sequence    BeforeStatus AfterStatus HistoryStatusDate statusEnddate
    ------------------------------ ----------- ------------ ----------- ----------------- -------------
    08-40089445                    1                        BNT         2017-10-17        2018-01-18
    08-40089445                    2           BNT          VFD         2018-01-18        2018-01-23
    08-40089445                    3           VFD          NST         2018-01-23        2018-01-23
    08-40089445                    5           NST          SLD         2018-01-23        NULL

    (4 rows affected)

    Only the last one can cope.

    Anyway, both methods will need an index on UniqueID, Sequence in order to perform adequately if more than a few rows are to be processed.

  • subramaniam.chandrasekar - Thursday, February 1, 2018 6:32 AM

    Terry Gamble - Wednesday, January 31, 2018 1:22 PM

    Terry Gamble - Wednesday, January 31, 2018 1:22 PM

    I've got a table with the following fields in it:

    UniqueID
    Sequence
    BeforeStatus
    AfterStatus
    HistoryStatusDate

    Sample data looks like :

    08-40089445, 1, '','BNT',10/17/17
    08-40089445, 2, 'BNT','VFD',1/18/18
    08-40089445, 3, 'VFD','NST',1/23/18
    08-40089445, 4, 'NST','SLD',1/23/18

    What I need to do is basically make a "beginning history" and  "ending history" date for each record:
    08-40089445, 1, '','BNT',10/17/17, '1/17/18'
    08-40089445, 2, 'BNT','VFD',1/18/18, '1/22/18'
    08-40089445, 3, 'VFD','NST',1/23/18,'1/23/18'
    08-40089445, 4, 'NST','SLD',1/23/18,'' 
    When I write a self join such as

    SELECT * FRom
    (SELECT s.*,h.HistoryStatusDate statusEnddate
    FROM [StatusHistory] S
    JOIN [StatusHistory] h
    ON s.UniqueID=h.UniqueID
    WHERE s.Sequence=h.Sequence-1
    ) s
    --Still trying to find the 7th record for this example
    WHERE s.[unique#]='08-40089445'

    I end up getting the first 3 back but always miss the 4th (in my example).

    Anyone have an idea of a better way to write this/solve this problem?  (it's tracking when a record changes a status but it's done in an iSeries so getting the programmers to put the "ending" date on isn't very likely, and the query is ignoring due to the sequence-1 portion for the 4th (max) record.

    >> I've got a table with the following fields in it: <<

    you might want to actually read the SQL standards, so you’ll know the difference between a column in a field. And why a row is not anything like a record. They are totally different concepts. Also, I can you post DDL instead of just a little shopping list of column names? Do you know what DDL is? I guess we have to start guessing and keys, too. There is no such thing as a “unique_id†because in RDBMS. An identifier must be the identifier of something in particular, and a “unique†is both a reserved word and not an entity. Then “Sequence†is yet another reserved word in SQL. A status is a state of being of something and it has to have a start and end time for this state. You never read a book on data modeling and you need to very badly. Finally, the only display format allowed for dates in ANSI-ISO Standard SQL is based on ISO 8601, not your local dialect. What little you did post is wrong.

    https://www.red-gate.com/simple-talk/sql/t-sql-programming/contiguous-time-periods/

    start with a skeleton, and then build on it. Will then need to look at another article of mine on state transition constraints

    https://www.red-gate.com/simple-talk/sql/t-sql-programming/state-transition-constraints/

    CREATE TABLE Foobar_History
    (foobar_id CHAR(11), NOT NULL
    CHECK (foobar_id LIKE ‘08-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]’),
    foobar_status CHAR(3) NOT NULL
    CHECK (foobar_status IN ('BNT','VFD', 'NST', 'SLD))
    foobar_status_start_date DATE NOT NULL
    foobar_status_end_date DATE,
    CHECK(foobar_status_start_date <= foobar_status_end_date),
    PRIMARY KEY (foobar_id, foobar_status_start_date)
    );

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • Terry Gamble - Wednesday, January 31, 2018 1:22 PM

    I've got a table with the following fields in it:

    UniqueID
    Sequence
    BeforeStatus
    AfterStatus
    HistoryStatusDate

    Sample data looks like :

    08-40089445, 1, '','BNT',10/17/17
    08-40089445, 2, 'BNT','VFD',1/18/18
    08-40089445, 3, 'VFD','NST',1/23/18
    08-40089445, 4, 'NST','SLD',1/23/18

    What I need to do is basically make a "beginning history" and  "ending history" date for each record:
    08-40089445, 1, '','BNT',10/17/17, '1/17/18'
    08-40089445, 2, 'BNT','VFD',1/18/18, '1/22/18'
    08-40089445, 3, 'VFD','NST',1/23/18,'1/23/18'
    08-40089445, 4, 'NST','SLD',1/23/18,'' 
    When I write a self join such as

    SELECT * FRom
    (SELECT s.*,h.HistoryStatusDate statusEnddate
    FROM [StatusHistory] S
    JOIN [StatusHistory] h
    ON s.UniqueID=h.UniqueID
    WHERE s.Sequence=h.Sequence-1
    ) s
    --Still trying to find the 7th record for this example
    WHERE s.[unique#]='08-40089445'

    I end up getting the first 3 back but always miss the 4th (in my example).

    Anyone have an idea of a better way to write this/solve this problem?  (it's tracking when a record changes a status but it's done in an iSeries so getting the programmers to put the "ending" date on isn't very likely, and the query is ignoring due to the sequence-1 portion for the 4th (max) record.

    Try selecting UniqueID, Seq, After Status, Date as Start Date
    In the sub query, do the join as a Left or Outer Join, where the Seq is + 1
    Select the Date from this as the End Date.

    This should give you a Start and End Date for each status, and in the case of the last Seq a NULL.
    Not sure if your platform is iSeries or SQL Server (or something else). Many flavors of SQL, each with some unique features.

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply