Break in continous rows

  • Hello,

    I have a table and data like below

    CREATE TABLE [dbo].[p2p](

    [p2pid] [int] NULL,

    [p2psource] [int] NULL

    ) ON [PRIMARY]

    GO

    INSERT [dbo].[p2p] ([p2pid], [p2psource]) VALUES (123, 131)

    GO

    INSERT [dbo].[p2p] ([p2pid], [p2psource]) VALUES (131, 142)

    GO

    INSERT [dbo].[p2p] ([p2pid], [p2psource]) VALUES (142, 146)

    GO

    INSERT [dbo].[p2p] ([p2pid], [p2psource]) VALUES (146, 147)

    GO

    INSERT [dbo].[p2p] ([p2pid], [p2psource]) VALUES (147, 124)

    GO

    INSERT [dbo].[p2p] ([p2pid], [p2psource]) VALUES (125, 156)

    GO

    INSERT [dbo].[p2p] ([p2pid], [p2psource]) VALUES (156, 165)

    GO

    INSERT [dbo].[p2p] ([p2pid], [p2psource]) VALUES (165, 126)

    GO

    INSERT [dbo].[p2p] ([p2pid], [p2psource]) VALUES (126, 162)

    GO

    INSERT [dbo].[p2p] ([p2pid], [p2psource]) VALUES (124, 123)

    GO

    Since p2pid != previous row p2psource when p2pid=125 the rank column should reset and start from 1

    The expected output should be

    p2pid rank

    1231

    1312

    1423

    1464

    1475

    1246

    1251

    1562

    1653

    1264

    1625

  • Ah yes, this is a classic gaps and islands problem. Have a read through this, and see if it helps.

    John

  • John Mitchell-245523 (12/2/2016)


    Ah yes, this is a classic gaps and islands problem. Have a read through this, and see if it helps.

    John

    Thanks but I doubt the gaps/island solution will work here.In my case p2pid value of the current row needs to be compared with the previous row of p2psource and if they are same they have to be ranked in a continuous sequence and if not, the sequence should reset with 1 and continue till it finds the next non continuous value.

  • how are you intending to order the rows as you have shown? do you have other columns that can be used to guarantee the order?

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • Unfortunately there is no predefined order.The design is not in our hand.Best way I could think of ordering was something like this.

    drop table #t

    create table #t(id intidentity,p2pid int,p2psource int)

    insert into #t

    select p2pid,p2psource from p2p

    select * from #t

  • That will bite you very hard at some point. What happens when someone creates a clustered index on p2p? The rows will (or are most likely to) go into your temp table in the order of the clustering key, rather than the order you showed them in your original post. Even with no clustered index, there's no guarantee that the rows will go in in the order you expect.

    Ignoring that, I'm struggling with the fact that your data set has 10 rows but your result set has 11. This is because you've gone 147 - 124 - 125 at the hiatus. Surely if the values (124 and 125 here) don't match, you should choose one or other of them, not both? Please will you explain what you're modelling, to help me visualise it?

    Thanks

    John

  • This data format comes through a 3rd party application as is and gets dumped into the table without any specific order.

    Yes if the values 124 and 125 don't match it would mean a start of a new sequencing order.So in case of 124 and 125 , 124 would act as the last value of one hiatus and 125 would be the first value of the next sequencing order so this way there will be 11 rows.The business requirement is such that the final result set should display the value at which the individual hiatus ended.

    In short every sequencing order should have the first value from p2pid and end value from p2psource.

  • Well, then you need an identity column in your table and a cast-iron process for making sure the data goes into the table in the same order that you receive it, or all of this is going to be a waste of time. How do you receive it - in a text file, direct from another database, or something else?

    You didn't answer the question about exactly what you're modelling with this process. Please will you supply a bit more sample data as well, that has (say) three breaks in it instead of just one?

    John

  • Yes we do have an identity column in the table.What I meant about data not being in order was that the data does not have a specific order for p2pid or p2psource. We receive data through 3rd part API's.

    CREATE TABLE [dbo].[p2p](

    [id] [int] IDENTITY(1,1) NOT NULL,

    [p2pid] [int] NULL,

    [p2psource] [int] NULL,

    CONSTRAINT [PK_p2p] PRIMARY KEY CLUSTERED

    (

    [id] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    GO

    SET IDENTITY_INSERT [dbo].[p2p] ON

    GO

    INSERT [dbo].[p2p] ([id], [p2pid], [p2psource]) VALUES (1, 123, 131)

    GO

    INSERT [dbo].[p2p] ([id], [p2pid], [p2psource]) VALUES (2, 131, 142)

    GO

    INSERT [dbo].[p2p] ([id], [p2pid], [p2psource]) VALUES (3, 142, 146)

    GO

    INSERT [dbo].[p2p] ([id], [p2pid], [p2psource]) VALUES (4, 146, 147)

    GO

    INSERT [dbo].[p2p] ([id], [p2pid], [p2psource]) VALUES (5, 147, 124)

    GO

    INSERT [dbo].[p2p] ([id], [p2pid], [p2psource]) VALUES (6, 125, 156)

    GO

    INSERT [dbo].[p2p] ([id], [p2pid], [p2psource]) VALUES (7, 156, 165)

    GO

    INSERT [dbo].[p2p] ([id], [p2pid], [p2psource]) VALUES (8, 165, 126)

    GO

    INSERT [dbo].[p2p] ([id], [p2pid], [p2psource]) VALUES (9, 126, 162)

    GO

    INSERT [dbo].[p2p] ([id], [p2pid], [p2psource]) VALUES (10, 124, 123)

    GO

    INSERT [dbo].[p2p] ([id], [p2pid], [p2psource]) VALUES (11, 123, 185)

    GO

    INSERT [dbo].[p2p] ([id], [p2pid], [p2psource]) VALUES (12, 185, 241)

    GO

    INSERT [dbo].[p2p] ([id], [p2pid], [p2psource]) VALUES (13, 241, 300)

    GO

    INSERT [dbo].[p2p] ([id], [p2pid], [p2psource]) VALUES (14, 300, 110)

    GO

    INSERT [dbo].[p2p] ([id], [p2pid], [p2psource]) VALUES (15, 110, 55)

    GO

    INSERT [dbo].[p2p] ([id], [p2pid], [p2psource]) VALUES (16, 55, 500)

    GO

    INSERT [dbo].[p2p] ([id], [p2pid], [p2psource]) VALUES (17, 10, 200)

    GO

    INSERT [dbo].[p2p] ([id], [p2pid], [p2psource]) VALUES (18, 200, 220)

    GO

    INSERT [dbo].[p2p] ([id], [p2pid], [p2psource]) VALUES (19, 220, 65)

    GO

    SET IDENTITY_INSERT [dbo].[p2p] OFF

    GO

    In the above data two additional break occurs at row 10 and row 17

  • maybe...???

    WITH C1

    AS (SELECT id,

    p2pid,

    p2psource,

    lag(p2psource, 1, p2pid) OVER(ORDER BY id) AS lag_s

    FROM p2p),

    C2

    AS (SELECT *,

    SUM(CASE

    WHEN p2pid = lag_s

    THEN 0

    ELSE 1

    END) OVER(ORDER BY id ROWS UNBOUNDED PRECEDING) AS grp

    FROM C1)

    SELECT *,

    ROW_NUMBER() OVER(PARTITION BY grp ORDER BY ID) AS RN

    FROM C2;

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • J Livingston SQL (12/2/2016)


    maybe...???

    WITH C1

    AS (SELECT id,

    p2pid,

    p2psource,

    lag(p2psource, 1, p2pid) OVER(ORDER BY id) AS lag_s

    FROM p2p),

    C2

    AS (SELECT *,

    SUM(CASE

    WHEN p2pid = lag_s

    THEN 0

    ELSE 1

    END) OVER(ORDER BY id ROWS UNBOUNDED PRECEDING) AS grp

    FROM C1)

    SELECT *,

    ROW_NUMBER() OVER(PARTITION BY grp ORDER BY ID) AS RN

    FROM C2;

    Thanks almost there but for row no 7 lag_s value of 125 should be displayed which would mean a start of a new sequencing order and 124 should be the end of the previous sequencing order.

    1231

    1312

    1423

    1464

    1475

    1246

    1251

    1562

    1653

    1264

    1625

  • sac.nan (12/2/2016)


    J Livingston SQL (12/2/2016)


    maybe...???

    WITH C1

    AS (SELECT id,

    p2pid,

    p2psource,

    lag(p2psource, 1, p2pid) OVER(ORDER BY id) AS lag_s

    FROM p2p),

    C2

    AS (SELECT *,

    SUM(CASE

    WHEN p2pid = lag_s

    THEN 0

    ELSE 1

    END) OVER(ORDER BY id ROWS UNBOUNDED PRECEDING) AS grp

    FROM C1)

    SELECT *,

    ROW_NUMBER() OVER(PARTITION BY grp ORDER BY ID) AS RN

    FROM C2;

    Thanks almost there but for row no 7 lag_s value of 125 should be displayed which would mean a start of a new sequencing order and 124 should be the end of the previous sequencing order.

    1231

    1312

    1423

    1464

    1475

    1246

    1251

    1562

    1653

    1264

    1625

    maybe I am misunderstanding something....but this would appear to be adding an additional row....EDIT I think John mentioned this earlier but not sure I saw your answer? jusr reread the thread....and saw your reply

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • J Livingston SQL (12/2/2016)


    sac.nan (12/2/2016)


    J Livingston SQL (12/2/2016)


    maybe...???

    WITH C1

    AS (SELECT id,

    p2pid,

    p2psource,

    lag(p2psource, 1, p2pid) OVER(ORDER BY id) AS lag_s

    FROM p2p),

    C2

    AS (SELECT *,

    SUM(CASE

    WHEN p2pid = lag_s

    THEN 0

    ELSE 1

    END) OVER(ORDER BY id ROWS UNBOUNDED PRECEDING) AS grp

    FROM C1)

    SELECT *,

    ROW_NUMBER() OVER(PARTITION BY grp ORDER BY ID) AS RN

    FROM C2;

    Thanks almost there but for row no 7 lag_s value of 125 should be displayed which would mean a start of a new sequencing order and 124 should be the end of the previous sequencing order.

    1231

    1312

    1423

    1464

    1475

    1246

    1251

    1562

    1653

    1264

    1625

    maybe I am misunderstanding something....but this would appear to be adding an additional row....I think John mentioned this earlier but not sure I saw your answer?

    Yes, the answer was kind of "that's the requirement", which is fair enough, I suppose. I'd still love to know what this is for, though. The only way I could see of starting was by doing a self-join on ID = ID+1, but that didn't get me all the way there and I'm afraid I don't have all afternoon to spend on this.

    John

  • John Mitchell-245523 (12/2/2016)


    J Livingston SQL (12/2/2016)


    sac.nan (12/2/2016)


    J Livingston SQL (12/2/2016)


    maybe...???

    WITH C1

    AS (SELECT id,

    p2pid,

    p2psource,

    lag(p2psource, 1, p2pid) OVER(ORDER BY id) AS lag_s

    FROM p2p),

    C2

    AS (SELECT *,

    SUM(CASE

    WHEN p2pid = lag_s

    THEN 0

    ELSE 1

    END) OVER(ORDER BY id ROWS UNBOUNDED PRECEDING) AS grp

    FROM C1)

    SELECT *,

    ROW_NUMBER() OVER(PARTITION BY grp ORDER BY ID) AS RN

    FROM C2;

    Thanks almost there but for row no 7 lag_s value of 125 should be displayed which would mean a start of a new sequencing order and 124 should be the end of the previous sequencing order.

    1231

    1312

    1423

    1464

    1475

    1246

    1251

    1562

    1653

    1264

    1625

    maybe I am misunderstanding something....but this would appear to be adding an additional row....I think John mentioned this earlier but not sure I saw your answer?

    I'm afraid I don't have all afternoon to spend on this.

    John

    neither do I John

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • John Mitchell-245523 (12/2/2016)


    Yes, the answer was kind of "that's the requirement", which is fair enough, I suppose. I'd still love to know what this is for, though. The only way I could see of starting was by doing a self-join on ID = ID+1, but that didn't get me all the way there and I'm afraid I don't have all afternoon to spend on this.

    John

    Unfortunately I am way down the value chain and in no position to question the requirements :crying:

    And thank you both John and Livingston for the help.Greatly appreciate your efforts.

Viewing 15 posts - 1 through 15 (of 21 total)

You must be logged in to reply to this topic. Login to reply