How to find islands and get max and min range values

  • I am trying to create groups for each Name and save the max and min ID for each group. The new group starts if the Length is higher then the previous Length for a particular Name, ordered by ID.

    The source data table is as follow:

    CREATE TABLE [dbo].[TEST](
    [Name] [nvarchar](50) NOT NULL,
    [Length] [int] NOT NULL,
    [ID] [int] NOT NULL
    ) ON [PRIMARY]
    GO
    INSERT INTO [dbo].[TEST]
    ([Name]
    ,[Length]
    ,[ID])
    VALUES
    ('A',30,1),
    ('B',30,2),
    ('C',40,3),
    ('A',25,4),
    ('A',20,5),
    ('B',35,6),
    ('A',20,7),
    ('B',30,8),
    ('A',25,9),
    ('A',20,10)
    GO

    The result table should look like this:

    I approached this like a problem of finding the islands. The first island for Name = A is defined by Length 30 to 20. The second island is again for Name = A but Length 25 to 20.

    I used Row_number() function but I don’t know how to create the proper partitions: in the example above, we have two partitions for the Name = A, for example.

    Thanks for help.

    • This topic was modified 1 month, 3 weeks ago by  itlk. Reason: Missing script to create a sample table
    • This topic was modified 1 month, 3 weeks ago by  itlk.
  • The source data table is as follow...

    Okay, how about a create table script?

  • Added a script

  • Looking at the code and just spitballing an idea, if you order the data by Name first and by ID second, you can use lead to see if the length is greater than or equal to the current value and that the Name is the same.  If that is true, it is part of a group.  If it is false, it is the end of a group. So you can easily mark the end of a group.  Using LAG and similar logic you can find the start of a group.

    Using the above values, you should be able to use the start of the group to get the Length Start and Start ID values and using the end of the group to get the Length End and End ID values.

    May need to also do a check to make sure that the start ID and end ID aren't the same and if they are, then you mark the Length End and End ID as NULL.

    Might not be the most efficient approach, but should get the desired results.

    EDIT - Potential solution (VERY likely not the most efficient approach) is:

    WITH [cte]
    AS
    (
    SELECT
    [Name]
    , [Length]
    , [ID]
    , CASE
    WHEN [length] >= LEAD( [Length]
    , 1
    ) OVER (ORDER BY [Name]
    , [ID]
    )
    THEN 1
    ELSE 0
    END AS [leading]
    , CASE
    WHEN [length] <= LAG( [length]
    , 1
    ) OVER (ORDER BY [name]
    , [id]
    )
    THEN 1
    ELSE 0
    END AS [lagging]
    FROM[dbo].[temp]
    )
    , [rawData]
    AS
    (
    SELECT
    [cte].[Name]
    , [cte].[Length]
    , [cte].[ID]
    , [cte].[leading]
    , [cte].[lagging]
    , LEAD( [length]
    , 1
    ) OVER (PARTITION BY
    [name]
    ORDER BY [name]
    ) AS [End_Length]
    , LEAD( [id]
    , 1
    ) OVER (PARTITION BY
    [name]
    ORDER BY [name]
    ) AS [end_ID]
    FROM[cte]
    WHERE[cte].[leading]= 0
    OR[cte].[lagging] = 0
    )
    SELECT
    [rawData].[Name]
    , [rawData].[Length] AS [Start_length]
    , CASE
    WHEN [rawData].[leading] = 0
    AND [rawData].[lagging] = 0
    THEN NULL
    ELSE NULLIF([rawData].[End_Length], [rawData].[length])
    END AS [End_Length]
    , [rawData].[ID] AS [Start_ID]
    , CASE
    WHEN [rawData].[leading] = 0
    AND [rawData].[lagging] = 0
    THEN NULL
    ELSE [rawData].[end_ID]
    END AS [end_ID]
    FROM[rawData]
    WHERE [rawData].[lagging] = 0;

    • This reply was modified 1 month, 3 weeks ago by  Mr. Brian Gale. Reason: added some code
    • This reply was modified 1 month, 3 weeks ago by  Mr. Brian Gale.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!

  • Thanks for your reply. I think I am missing something: the rawData does not have Name column. Is perhaps cte.Name missing in the select statement?

    Also [rawData].[Length] is meant to be rawData.leading?

    Thank you.

  • That was a code formatting error...  I have corrected the code.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!

  • It seems to be worse now:

  • I found it: I changed the Temp table for the Test table in your code.

    It seems to be working just fine. I will give it a deeper look.

    Thank you.

Viewing 8 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply