Self Join Question

Question

Self Join Question

Terry Gamble

Ten Centuries

Points: 1012
More actions
January 31, 2018 at 1:22 pm

#328738

I've got a table with the following fields in it:
UniqueID
Sequence
BeforeStatus
AfterStatus
HistoryStatusDate
Sample data looks like :
08-40089445, 1, '','BNT',10/17/17
08-40089445, 2, 'BNT','VFD',1/18/18
08-40089445, 3, 'VFD','NST',1/23/18
08-40089445, 4, 'NST','SLD',1/23/18
What I need to do is basically make a "beginning history" and "ending history" date for each record:
08-40089445, 1, '','BNT',10/17/17, '1/17/18'
08-40089445, 2, 'BNT','VFD',1/18/18, '1/22/18'
08-40089445, 3, 'VFD','NST',1/23/18,'1/23/18'
08-40089445, 4, 'NST','SLD',1/23/18,''
When I write a self join such as
SELECT * FRom
(SELECT s.*,h.HistoryStatusDate statusEnddate
FROM [StatusHistory] S
JOIN [StatusHistory] h
ON s.UniqueID=h.UniqueID
WHERE s.Sequence=h.Sequence-1
) s
--Still trying to find the 7th record for this example
WHERE s.[unique#]='08-40089445'
I end up getting the first 3 back but always miss the 4th (in my example).
Anyone have an idea of a better way to write this/solve this problem? (it's tracking when a record changes a status but it's done in an iSeries so getting the programmers to put the "ending" date on isn't very likely, and the query is ignoring due to the sequence-1 portion for the 4th (max) record.

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

Phil Parkin SSC Guru Points: 246686 More actions · Answer 1

Terry Gamble - Wednesday, January 31, 2018 1:22 PM
I've got a table with the following fields in it:
UniqueID
Sequence
BeforeStatus
AfterStatus
HistoryStatusDate
Sample data looks like :
08-40089445, 1, '','BNT',10/17/17
08-40089445, 2, 'BNT','VFD',1/18/18
08-40089445, 3, 'VFD','NST',1/23/18
08-40089445, 4, 'NST','SLD',1/23/18
What I need to do is basically make a "beginning history" and "ending history" date for each record:
08-40089445, 1, '','BNT',10/17/17, '1/17/18'
08-40089445, 2, 'BNT','VFD',1/18/18, '1/22/18'
08-40089445, 3, 'VFD','NST',1/23/18,'1/23/18'
08-40089445, 4, 'NST','SLD',1/23/18,''
When I write a self join such as
SELECT * FRom
(SELECT s.*,h.HistoryStatusDate statusEnddate
FROM [StatusHistory] S
JOIN [StatusHistory] h
ON s.UniqueID=h.UniqueID
WHERE s.Sequence=h.Sequence-1
) s
--Still trying to find the 7th record for this example
WHERE s.[unique#]='08-40089445'
I end up getting the first 3 back but always miss the 4th (in my example).
Anyone have an idea of a better way to write this/solve this problem? (it's tracking when a record changes a status but it's done in an iSeries so getting the programmers to put the "ending" date on isn't very likely, and the query is ignoring due to the sequence-1 portion for the 4th (max) record.

You should be able to get those end dates using LEAD. No need for any joins.

If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

Terry Gamble Ten Centuries Points: 1012 More actions · Answer 2

This is maybe not the right forum/version as I don't have the "lead" function that I can use/find. Is there a true "SQL" based solution using something like the self join? Versus the newer functions like LEAD (which btw I sincerely wish I could use).

Phil Parkin SSC Guru Points: 246686 More actions · Answer 3

You posted in the SQL Server 2017 forum. In 2017, Lead is a true SQL function ?

If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

Phil Parkin SSC Guru Points: 246686 More actions · Answer 4

Which version are you using?

If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

Terry Gamble Ten Centuries Points: 1012 More actions · Answer 5

I'm not sure what version of SQL that my BI tool rides on (Sisense). I was looking through it for any solution and the only thing I found was self-joins to solve my issue and it drew me back to this site to ask the question.

DesNorton SSC-Insane Points: 24276 More actions · Answer 6

Try changing your JOIN to a LEFT JOIN

How to post data/code on a forum to get the best help.

Terry Gamble Ten Centuries Points: 1012 More actions · Answer 7

I had tried that. The table is 8 million rows right now and even trying to limit it to a specific unique ID the server basically locks up on the query (the cube goes to using 110+ GB memory and doesn't return the result so I end up having to kill the cube). It's an annoying issue.

subramaniam.chandrasekar SSCarpal Tunnel Points: 4223 More actions · Answer 8

Terry Gamble - Wednesday, January 31, 2018 1:22 PM
I've got a table with the following fields in it:
UniqueID
Sequence
BeforeStatus
AfterStatus
HistoryStatusDate
Sample data looks like :
08-40089445, 1, '','BNT',10/17/17
08-40089445, 2, 'BNT','VFD',1/18/18
08-40089445, 3, 'VFD','NST',1/23/18
08-40089445, 4, 'NST','SLD',1/23/18
What I need to do is basically make a "beginning history" and "ending history" date for each record:
08-40089445, 1, '','BNT',10/17/17, '1/17/18'
08-40089445, 2, 'BNT','VFD',1/18/18, '1/22/18'
08-40089445, 3, 'VFD','NST',1/23/18,'1/23/18'
08-40089445, 4, 'NST','SLD',1/23/18,''
When I write a self join such as
SELECT * FRom
(SELECT s.*,h.HistoryStatusDate statusEnddate
FROM [StatusHistory] S
JOIN [StatusHistory] h
ON s.UniqueID=h.UniqueID
WHERE s.Sequence=h.Sequence-1
) s
--Still trying to find the 7th record for this example
WHERE s.[unique#]='08-40089445'
I end up getting the first 3 back but always miss the 4th (in my example).
Anyone have an idea of a better way to write this/solve this problem? (it's tracking when a record changes a status but it's done in an iSeries so getting the programmers to put the "ending" date on isn't very likely, and the query is ignoring due to the sequence-1 portion for the 4th (max) record.

Could you please try to use OFFSET and FETCH for finding nth record as you wish. But still the performance is a concern.

kaj SSCommitted Points: 1580 More actions · Answer 9

DesNorton - Wednesday, January 31, 2018 10:31 PM
Try changing your JOIN to a LEFT JOIN

LEFT JOIN alone will not cut it. You'll also have to move the WHERE condition to ON:

CREATE TABLE #StatusHistory ( UniqueID varchar(30) , Sequence int, BeforeStatus varchar(10), AfterStatus varchar(10), HistoryStatusDate date, PRIMARY KEY(UniqueID, Sequence) )

INSERT INTO #StatusHistory VALUES('08-40089445',1,'','BNT','20171017')
INSERT INTO #StatusHistory VALUES('08-40089445',2,'BNT','VFD','20180118')
INSERT INTO #StatusHistory VALUES('08-40089445',3,'VFD','NST','20180123')
INSERT INTO #StatusHistory VALUES('08-40089445',4,'NST','SLD','20180123')

SELECT * FRom
(SELECT s.*,h.HistoryStatusDate statusEnddate
FROM [#StatusHistory] S
LEFT JOIN [#StatusHistory] h
ON s.UniqueID=h.UniqueID
AND s.Sequence=h.Sequence-1
) s
--Still trying to find the 7th record for this example
WHERE s.[UniqueID]='08-40089445'
ORDER BY s.UniqueID, s.Sequence
Result:
UniqueID                       Sequence    BeforeStatus AfterStatus HistoryStatusDate statusEnddate
------------------------------ ----------- ------------ ----------- ----------------- -------------
08-40089445                    1                        BNT         2017-10-17        2018-01-18
08-40089445                    2           BNT          VFD         2018-01-18        2018-01-23
08-40089445                    3           VFD          NST         2018-01-23        2018-01-23
08-40089445                    4           NST          SLD         2018-01-23        NULL

(4 rows affected)

However, I'd be reluctant to rely on the Sequence column being an unbroken sequence (the -1 condition). So I'd likely utilize an OUTER APPLY in stead of the LEFT JOIN:
SELECT * FRom (SELECT s.*,h.HistoryStatusDate statusEnddate FROM [#StatusHistory] S OUTER APPLY ( SELECT TOP(1) HistoryStatusDate FROM [#StatusHistory] h WHERE s.UniqueID=h.UniqueID AND h.Sequence > s.Sequence ORDER BY h.Sequence ASC ) h ) s --Still trying to find the 7th record for this example WHERE s.[UniqueID]='08-40089445' ORDER BY s.UniqueID, s.Sequence
If we put in a 5 instead of 4 in the last row (thereby creating a gap in the Sequence column sequence), we can see the difference:
UniqueID                       Sequence    BeforeStatus AfterStatus HistoryStatusDate statusEnddate
------------------------------ ----------- ------------ ----------- ----------------- -------------
08-40089445                    1                        BNT         2017-10-17        2018-01-18
08-40089445                    2           BNT          VFD         2018-01-18        2018-01-23
08-40089445                    3           VFD          NST         2018-01-23        NULL
08-40089445                    5           NST          SLD         2018-01-23        NULL

(4 rows affected)

UniqueID                       Sequence    BeforeStatus AfterStatus HistoryStatusDate statusEnddate
------------------------------ ----------- ------------ ----------- ----------------- -------------
08-40089445                    1                        BNT         2017-10-17        2018-01-18
08-40089445                    2           BNT          VFD         2018-01-18        2018-01-23
08-40089445                    3           VFD          NST         2018-01-23        2018-01-23
08-40089445                    5           NST          SLD         2018-01-23        NULL

(4 rows affected)

Only the last one can cope.

Anyway, both methods will need an index on UniqueID, Sequence in order to perform adequately if more than a few rows are to be processed.

jcelko212 32090 SSCrazy Eights Points: 9294 More actions · Answer 10

subramaniam.chandrasekar - Thursday, February 1, 2018 6:32 AM

Terry Gamble - Wednesday, January 31, 2018 1:22 PM

Terry Gamble - Wednesday, January 31, 2018 1:22 PM
I've got a table with the following fields in it:
UniqueID
Sequence
BeforeStatus
AfterStatus
HistoryStatusDate
Sample data looks like :
08-40089445, 1, '','BNT',10/17/17
08-40089445, 2, 'BNT','VFD',1/18/18
08-40089445, 3, 'VFD','NST',1/23/18
08-40089445, 4, 'NST','SLD',1/23/18
What I need to do is basically make a "beginning history" and "ending history" date for each record:
08-40089445, 1, '','BNT',10/17/17, '1/17/18'
08-40089445, 2, 'BNT','VFD',1/18/18, '1/22/18'
08-40089445, 3, 'VFD','NST',1/23/18,'1/23/18'
08-40089445, 4, 'NST','SLD',1/23/18,''
When I write a self join such as
SELECT * FRom
(SELECT s.*,h.HistoryStatusDate statusEnddate
FROM [StatusHistory] S
JOIN [StatusHistory] h
ON s.UniqueID=h.UniqueID
WHERE s.Sequence=h.Sequence-1
) s
--Still trying to find the 7th record for this example
WHERE s.[unique#]='08-40089445'
I end up getting the first 3 back but always miss the 4th (in my example).
Anyone have an idea of a better way to write this/solve this problem? (it's tracking when a record changes a status but it's done in an iSeries so getting the programmers to put the "ending" date on isn't very likely, and the query is ignoring due to the sequence-1 portion for the 4th (max) record.

>> I've got a table with the following fields in it: <<

you might want to actually read the SQL standards, so youâ€™ll know the difference between a column in a field. And why a row is not anything like a record. They are totally different concepts. Also, I can you post DDL instead of just a little shopping list of column names? Do you know what DDL is? I guess we have to start guessing and keys, too. There is no such thing as a â€œunique_idâ€ because in RDBMS. An identifier must be the identifier of something in particular, and a â€œuniqueâ€ is both a reserved word and not an entity. Then â€œSequenceâ€ is yet another reserved word in SQL. A status is a state of being of something and it has to have a start and end time for this state. You never read a book on data modeling and you need to very badly. Finally, the only display format allowed for dates in ANSI-ISO Standard SQL is based on ISO 8601, not your local dialect. What little you did post is wrong.

https://www.red-gate.com/simple-talk/sql/t-sql-programming/contiguous-time-periods/

start with a skeleton, and then build on it. Will then need to look at another article of mine on state transition constraints

https://www.red-gate.com/simple-talk/sql/t-sql-programming/state-transition-constraints/

CREATE TABLE Foobar_History (foobar_id CHAR(11), NOT NULL CHECK (foobar_id LIKE â€˜08-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]â€™), foobar_status CHAR(3) NOT NULL CHECK (foobar_status IN ('BNT','VFD', 'NST', 'SLD)) foobar_status_start_date DATE NOT NULL foobar_status_end_date DATE, CHECK(foobar_status_start_date <= foobar_status_end_date), PRIMARY KEY (foobar_id, foobar_status_start_date) );

Please post DDL and follow ANSI/ISO standards when asking for help.

Greg Edwards-268690 SSC-Insane Points: 20676 More actions · Answer 11

Terry Gamble - Wednesday, January 31, 2018 1:22 PM
I've got a table with the following fields in it:
UniqueID
Sequence
BeforeStatus
AfterStatus
HistoryStatusDate
Sample data looks like :
08-40089445, 1, '','BNT',10/17/17
08-40089445, 2, 'BNT','VFD',1/18/18
08-40089445, 3, 'VFD','NST',1/23/18
08-40089445, 4, 'NST','SLD',1/23/18
What I need to do is basically make a "beginning history" and "ending history" date for each record:
08-40089445, 1, '','BNT',10/17/17, '1/17/18'
08-40089445, 2, 'BNT','VFD',1/18/18, '1/22/18'
08-40089445, 3, 'VFD','NST',1/23/18,'1/23/18'
08-40089445, 4, 'NST','SLD',1/23/18,''
When I write a self join such as
SELECT * FRom
(SELECT s.*,h.HistoryStatusDate statusEnddate
FROM [StatusHistory] S
JOIN [StatusHistory] h
ON s.UniqueID=h.UniqueID
WHERE s.Sequence=h.Sequence-1
) s
--Still trying to find the 7th record for this example
WHERE s.[unique#]='08-40089445'
I end up getting the first 3 back but always miss the 4th (in my example).
Anyone have an idea of a better way to write this/solve this problem? (it's tracking when a record changes a status but it's done in an iSeries so getting the programmers to put the "ending" date on isn't very likely, and the query is ignoring due to the sequence-1 portion for the 4th (max) record.

Try selecting UniqueID, Seq, After Status, Date as Start Date
In the sub query, do the join as a Left or Outer Join, where the Seq is + 1
Select the Date from this as the End Date.

This should give you a Start and End Date for each status, and in the case of the last Seq a NULL.
Not sure if your platform is iSeries or SQL Server (or something else). Many flavors of SQL, each with some unique features.