Break in continous rows

Question

Post reply

Break in continous rows

sac.nan

Old Hand

Points: 384
More actions
December 2, 2016 at 1:28 am

#337870

Hello,
I have a table and data like below
CREATE TABLE [dbo].[p2p](
[p2pid] [int] NULL,
[p2psource] [int] NULL
) ON [PRIMARY]
GO
INSERT [dbo].[p2p] ([p2pid], [p2psource]) VALUES (123, 131)
GO
INSERT [dbo].[p2p] ([p2pid], [p2psource]) VALUES (131, 142)
GO
INSERT [dbo].[p2p] ([p2pid], [p2psource]) VALUES (142, 146)
GO
INSERT [dbo].[p2p] ([p2pid], [p2psource]) VALUES (146, 147)
GO
INSERT [dbo].[p2p] ([p2pid], [p2psource]) VALUES (147, 124)
GO
INSERT [dbo].[p2p] ([p2pid], [p2psource]) VALUES (125, 156)
GO
INSERT [dbo].[p2p] ([p2pid], [p2psource]) VALUES (156, 165)
GO
INSERT [dbo].[p2p] ([p2pid], [p2psource]) VALUES (165, 126)
GO
INSERT [dbo].[p2p] ([p2pid], [p2psource]) VALUES (126, 162)
GO
INSERT [dbo].[p2p] ([p2pid], [p2psource]) VALUES (124, 123)
GO
Since p2pid != previous row p2psource when p2pid=125 the rank column should reset and start from 1
The expected output should be
p2pid rank
1231
1312
1423
1464
1475
1246
1251
1562
1653
1264
1625

Viewing 15 posts - 1 through 15 (of 21 total)

You must be logged in to reply to this topic. Login to reply

John Mitchell-245523 SSC Guru Points: 148809 More actions · Answer 1

Ah yes, this is a classic gaps and islands problem. Have a read through this, and see if it helps.

John

sac.nan Old Hand Points: 384 More actions · Answer 2

John Mitchell-245523 (12/2/2016)
Ah yes, this is a classic gaps and islands problem. Have a read through this, and see if it helps.
John

Thanks but I doubt the gaps/island solution will work here.In my case p2pid value of the current row needs to be compared with the previous row of p2psource and if they are same they have to be ranked in a continuous sequence and if not, the sequence should reset with 1 and continue till it finds the next non continuous value.

J Livingston SQL SSC Guru Points: 51550 More actions · Answer 3

how are you intending to order the rows as you have shown? do you have other columns that can be used to guarantee the order?

________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day

sac.nan Old Hand Points: 384 More actions · Answer 4

Unfortunately there is no predefined order.The design is not in our hand.Best way I could think of ordering was something like this.

drop table #t

create table #t(id intidentity,p2pid int,p2psource int)

insert into #t

select p2pid,p2psource from p2p

select * from #t

John Mitchell-245523 SSC Guru Points: 148809 More actions · Answer 5

That will bite you very hard at some point. What happens when someone creates a clustered index on p2p? The rows will (or are most likely to) go into your temp table in the order of the clustering key, rather than the order you showed them in your original post. Even with no clustered index, there's no guarantee that the rows will go in in the order you expect.

Ignoring that, I'm struggling with the fact that your data set has 10 rows but your result set has 11. This is because you've gone 147 - 124 - 125 at the hiatus. Surely if the values (124 and 125 here) don't match, you should choose one or other of them, not both? Please will you explain what you're modelling, to help me visualise it?

Thanks

John

sac.nan Old Hand Points: 384 More actions · Answer 6

This data format comes through a 3rd party application as is and gets dumped into the table without any specific order.

Yes if the values 124 and 125 don't match it would mean a start of a new sequencing order.So in case of 124 and 125 , 124 would act as the last value of one hiatus and 125 would be the first value of the next sequencing order so this way there will be 11 rows.The business requirement is such that the final result set should display the value at which the individual hiatus ended.

In short every sequencing order should have the first value from p2pid and end value from p2psource.

John Mitchell-245523 SSC Guru Points: 148809 More actions · Answer 7

Well, then you need an identity column in your table and a cast-iron process for making sure the data goes into the table in the same order that you receive it, or all of this is going to be a waste of time. How do you receive it - in a text file, direct from another database, or something else?

You didn't answer the question about exactly what you're modelling with this process. Please will you supply a bit more sample data as well, that has (say) three breaks in it instead of just one?

John

sac.nan Old Hand Points: 384 More actions · Answer 8

Yes we do have an identity column in the table.What I meant about data not being in order was that the data does not have a specific order for p2pid or p2psource. We receive data through 3rd part API's.

CREATE TABLE [dbo].[p2p](

[id] [int] IDENTITY(1,1) NOT NULL,

[p2pid] [int] NULL,

[p2psource] [int] NULL,

CONSTRAINT [PK_p2p] PRIMARY KEY CLUSTERED

(

[id] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

) ON [PRIMARY]

GO

SET IDENTITY_INSERT [dbo].[p2p] ON

GO

INSERT [dbo].[p2p] ([id], [p2pid], [p2psource]) VALUES (1, 123, 131)

GO

INSERT [dbo].[p2p] ([id], [p2pid], [p2psource]) VALUES (2, 131, 142)

GO

INSERT [dbo].[p2p] ([id], [p2pid], [p2psource]) VALUES (3, 142, 146)

GO

INSERT [dbo].[p2p] ([id], [p2pid], [p2psource]) VALUES (4, 146, 147)

GO

INSERT [dbo].[p2p] ([id], [p2pid], [p2psource]) VALUES (5, 147, 124)

GO

INSERT [dbo].[p2p] ([id], [p2pid], [p2psource]) VALUES (6, 125, 156)

GO

INSERT [dbo].[p2p] ([id], [p2pid], [p2psource]) VALUES (7, 156, 165)

GO

INSERT [dbo].[p2p] ([id], [p2pid], [p2psource]) VALUES (8, 165, 126)

GO

INSERT [dbo].[p2p] ([id], [p2pid], [p2psource]) VALUES (9, 126, 162)

GO

INSERT [dbo].[p2p] ([id], [p2pid], [p2psource]) VALUES (10, 124, 123)

GO

INSERT [dbo].[p2p] ([id], [p2pid], [p2psource]) VALUES (11, 123, 185)

GO

INSERT [dbo].[p2p] ([id], [p2pid], [p2psource]) VALUES (12, 185, 241)

GO

INSERT [dbo].[p2p] ([id], [p2pid], [p2psource]) VALUES (13, 241, 300)

GO

INSERT [dbo].[p2p] ([id], [p2pid], [p2psource]) VALUES (14, 300, 110)

GO

INSERT [dbo].[p2p] ([id], [p2pid], [p2psource]) VALUES (15, 110, 55)

GO

INSERT [dbo].[p2p] ([id], [p2pid], [p2psource]) VALUES (16, 55, 500)

GO

INSERT [dbo].[p2p] ([id], [p2pid], [p2psource]) VALUES (17, 10, 200)

GO

INSERT [dbo].[p2p] ([id], [p2pid], [p2psource]) VALUES (18, 200, 220)

GO

INSERT [dbo].[p2p] ([id], [p2pid], [p2psource]) VALUES (19, 220, 65)

GO

SET IDENTITY_INSERT [dbo].[p2p] OFF

GO

In the above data two additional break occurs at row 10 and row 17

J Livingston SQL SSC Guru Points: 51550 More actions · Answer 9

maybe...???

WITH C1

AS (SELECT id,

p2pid,

p2psource,

lag(p2psource, 1, p2pid) OVER(ORDER BY id) AS lag_s

FROM p2p),

C2

AS (SELECT *,

SUM(CASE

WHEN p2pid = lag_s

THEN 0

ELSE 1

END) OVER(ORDER BY id ROWS UNBOUNDED PRECEDING) AS grp

FROM C1)

SELECT *,

ROW_NUMBER() OVER(PARTITION BY grp ORDER BY ID) AS RN

FROM C2;

________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day

sac.nan Old Hand Points: 384 More actions · Answer 10

J Livingston SQL (12/2/2016)
maybe...???
WITH C1
AS (SELECT id,
p2pid,
p2psource,
lag(p2psource, 1, p2pid) OVER(ORDER BY id) AS lag_s
FROM p2p),
C2
AS (SELECT *,
SUM(CASE
WHEN p2pid = lag_s
THEN 0
ELSE 1
END) OVER(ORDER BY id ROWS UNBOUNDED PRECEDING) AS grp
FROM C1)
SELECT *,
ROW_NUMBER() OVER(PARTITION BY grp ORDER BY ID) AS RN
FROM C2;

Thanks almost there but for row no 7 lag_s value of 125 should be displayed which would mean a start of a new sequencing order and 124 should be the end of the previous sequencing order.

1231

1312

1423

1464

1475

1246

1251

1562

1653

1264

1625

J Livingston SQL SSC Guru Points: 51550 More actions · Answer 11

sac.nan (12/2/2016)
J Livingston SQL (12/2/2016)
maybe...???
WITH C1
AS (SELECT id,
p2pid,
p2psource,
lag(p2psource, 1, p2pid) OVER(ORDER BY id) AS lag_s
FROM p2p),
C2
AS (SELECT *,
SUM(CASE
WHEN p2pid = lag_s
THEN 0
ELSE 1
END) OVER(ORDER BY id ROWS UNBOUNDED PRECEDING) AS grp
FROM C1)
SELECT *,
ROW_NUMBER() OVER(PARTITION BY grp ORDER BY ID) AS RN
FROM C2;
Thanks almost there but for row no 7 lag_s value of 125 should be displayed which would mean a start of a new sequencing order and 124 should be the end of the previous sequencing order.
1231
1312
1423
1464
1475
1246
1251
1562
1653
1264
1625

maybe I am misunderstanding something....but this would appear to be adding an additional row....EDIT ~~I think John mentioned this earlier but not sure I saw your answer?~~ jusr reread the thread....and saw your reply

________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day

John Mitchell-245523 SSC Guru Points: 148809 More actions · Answer 12

J Livingston SQL (12/2/2016)
sac.nan (12/2/2016)
J Livingston SQL (12/2/2016)
maybe...???
WITH C1
AS (SELECT id,
p2pid,
p2psource,
lag(p2psource, 1, p2pid) OVER(ORDER BY id) AS lag_s
FROM p2p),
C2
AS (SELECT *,
SUM(CASE
WHEN p2pid = lag_s
THEN 0
ELSE 1
END) OVER(ORDER BY id ROWS UNBOUNDED PRECEDING) AS grp
FROM C1)
SELECT *,
ROW_NUMBER() OVER(PARTITION BY grp ORDER BY ID) AS RN
FROM C2;
Thanks almost there but for row no 7 lag_s value of 125 should be displayed which would mean a start of a new sequencing order and 124 should be the end of the previous sequencing order.
1231
1312
1423
1464
1475
1246
1251
1562
1653
1264
1625
maybe I am misunderstanding something....but this would appear to be adding an additional row....I think John mentioned this earlier but not sure I saw your answer?

Yes, the answer was kind of "that's the requirement", which is fair enough, I suppose. I'd still love to know what this is for, though. The only way I could see of starting was by doing a self-join on ID = ID+1, but that didn't get me all the way there and I'm afraid I don't have all afternoon to spend on this.

John

J Livingston SQL SSC Guru Points: 51550 More actions · Answer 13

John Mitchell-245523 (12/2/2016)
J Livingston SQL (12/2/2016)
sac.nan (12/2/2016)
J Livingston SQL (12/2/2016)
maybe...???
WITH C1
AS (SELECT id,
p2pid,
p2psource,
lag(p2psource, 1, p2pid) OVER(ORDER BY id) AS lag_s
FROM p2p),
C2
AS (SELECT *,
SUM(CASE
WHEN p2pid = lag_s
THEN 0
ELSE 1
END) OVER(ORDER BY id ROWS UNBOUNDED PRECEDING) AS grp
FROM C1)
SELECT *,
ROW_NUMBER() OVER(PARTITION BY grp ORDER BY ID) AS RN
FROM C2;
Thanks almost there but for row no 7 lag_s value of 125 should be displayed which would mean a start of a new sequencing order and 124 should be the end of the previous sequencing order.
1231
1312
1423
1464
1475
1246
1251
1562
1653
1264
1625
maybe I am misunderstanding something....but this would appear to be adding an additional row....I think John mentioned this earlier but not sure I saw your answer?
I'm afraid I don't have all afternoon to spend on this.
John

neither do I John

________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day

sac.nan Old Hand Points: 384 More actions · Answer 14

John Mitchell-245523 (12/2/2016)
Yes, the answer was kind of "that's the requirement", which is fair enough, I suppose. I'd still love to know what this is for, though. The only way I could see of starting was by doing a self-join on ID = ID+1, but that didn't get me all the way there and I'm afraid I don't have all afternoon to spend on this.
John

Unfortunately I am way down the value chain and in no position to question the requirements :crying:

And thank you both John and Livingston for the help.Greatly appreciate your efforts.