Group by column over partition ordered by date

  • Hi Everyone.

    I need a script that finds the groups (by one column) whose first and second records, ordered by date, both have a NULL value.

    I need the 311888 and 123477 itinerary groups because their first and second rows, ordered by CreationDate, both have a NULL LegSegment. In pseudo-SQL, something like:

    select * from Test group by Itinerary order by CreationDate where (first LegSegment and secondLegSegment are null ordered by CreationDate)

    This is a query example to build the scenario:

    CREATE TABLE [dbo].[Test](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [Itinerary] [int] NULL,
    [LegSegment] [int] NULL,
    [CreationDate] [datetime] NULL,
    CONSTRAINT [PK_Test2] PRIMARY KEY CLUSTERED
    (
    [Id] ASC
    ))

    INSERT INTO [dbo].[Test]
    ([Itinerary]
    ,[LegSegment]
    ,[CreationDate])
    VALUES (311888,NULL,'2022-05-20 12:43:34.090'),(311888,NULL,'2022-05-21 12:43:34.090'), (311888,817430,'2022-05-22 12:43:34.090')
    , (271888,NULL,'2022-05-20 12:43:34.090'),(271888,256321,'2022-05-21 12:43:34.090'), (271888,NULL,'2022-05-22 12:43:34.090')
    , (312288,NULL,'2022-05-20 12:43:34.090'),(312288,345689,'2022-05-21 12:43:34.090'), (312288,817430,'2022-05-22 12:43:34.090')
    , (123456,345678,'2022-05-20 12:43:34.090'),(123456,344657,'2022-05-21 12:43:34.090'), (123456,817430,'2022-05-22 12:43:34.090')
    , (777555,NULL,'2022-05-20 12:43:34.090'),(777555,555778,'2022-05-21 12:43:34.090'), (777555,456345,'2022-05-22 12:43:34.090')
    , (123477,NULL,'2022-05-20 12:43:34.090'),(123477,NULL,'2022-05-21 12:43:34.090'), (123477,567897,'2022-05-22 12:43:34.090')
    , (123477,567846,'2022-05-23 12:43:34.090')
    GO

    Thanks in advance.

  • You could try something like this

    -- first_two = 1 when the row is the 1st or 2nd of its itinerary and LegSegment is NULL
    with rn_cte as (
        select *, case when row_number() over (partition by [Itinerary] order by [CreationDate])=1 and LegSegment is null then 1
                       when row_number() over (partition by [Itinerary] order by [CreationDate])=2 and LegSegment is null then 1
                       else 0 end first_two
        from dbo.Test)
    select [Itinerary]
    from rn_cte
    group by [Itinerary]
    having sum(first_two)=2
    order by [Itinerary];

    No one shall expel us from the paradise that Cantor has created for us

  • I'm trying to think of a more efficient way, but for now this should at least function correctly:

    ;WITH cte_add_row_num AS (
    SELECT *, ROW_NUMBER() OVER(PARTITION BY Itinerary ORDER BY CreationDate) AS row_num
    FROM dbo.Test
    )
    SELECT T.*
    FROM dbo.Test T
    INNER JOIN cte_add_row_num carn1 ON carn1.Itinerary = T.Itinerary AND carn1.row_num = 1 AND carn1.LegSegment IS NULL
    INNER JOIN cte_add_row_num carn2 ON carn2.Itinerary = T.Itinerary AND carn2.row_num = 2 AND carn2.LegSegment IS NULL
    ORDER BY T.Itinerary, T.CreationDate

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • By first and second do you mean according to CreationDate? ID looks like it would also serve as an ordering column, but I did not want to make that assumption.

    What output do you need? If you only need the Itinerary values, does this work? If you need the IDs, then another approach will be needed.

    SELECT a.Itinerary
    FROM (
        SELECT Itinerary, LegSegment,
               ROW_NUMBER() OVER (PARTITION BY Itinerary ORDER BY CreationDate) AS RowNum
        FROM dbo.Test
    ) AS a
    WHERE a.LegSegment IS NULL
      AND a.RowNum < 3
    GROUP BY a.Itinerary
    HAVING COUNT(a.Itinerary) = 2;

    I couldn't think of a good way to avoid using ROW_NUMBER, but I am interested to see alternatives.
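
    For anyone looking for an alternative to ROW_NUMBER, one possible sketch uses CROSS APPLY with TOP (2) to fetch each itinerary's two earliest rows and then checks that both have a NULL LegSegment. It relies on COUNT(column) skipping NULLs. The aliases i and f are illustrative, and using Id as a tiebreaker for equal CreationDate values is an assumption, not something the original poster confirmed. This is not necessarily faster than the ROW_NUMBER versions; that depends on indexing and the actual execution plan.

    ```sql
    -- Sketch only: itineraries whose two earliest rows both have a NULL LegSegment.
    SELECT i.Itinerary
    FROM (SELECT DISTINCT Itinerary FROM dbo.Test) AS i
    CROSS APPLY (
        -- Two earliest rows for this itinerary; Id as a tiebreaker is an assumption.
        SELECT TOP (2) t.LegSegment
        FROM dbo.Test AS t
        WHERE t.Itinerary = i.Itinerary
        ORDER BY t.CreationDate, t.Id
    ) AS f
    GROUP BY i.Itinerary
    HAVING COUNT(*) = 2              -- the itinerary has at least two rows
       AND COUNT(f.LegSegment) = 0   -- COUNT(column) skips NULLs, so 0 means both are NULL
    ORDER BY i.Itinerary;
    ```

    Against the sample data in the question, this should return 123477 and 311888.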

  • As I see it, it's not ROW_NUMBER itself that causes the inefficiency (SQL Server is remarkably efficient at providing ROW_NUMBER); it's the multiple uses of it, and thus the multiple scans of the table.

    I believe the OP wanted to list all columns for all rows where the Itinerary met the conditions, so I would think at least two table scans would be required, but I'm trying to think of a (reasonable) way to get rid of the third one.

    If there are a lot of rows in the table, that could make a big difference. If not, it may not be worth worrying about.
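
    One way to reference the table only once while still returning every column of every qualifying row is a windowed aggregate layered over the ROW_NUMBER result. The following is a sketch under that assumption; the names cte_flagged and leading_nulls are made up for illustration. SQL Server does not allow a window function nested directly inside another one, hence the two CTEs. Whether this actually beats a join-based query should be checked against the real execution plan, since windowed aggregates can bring their own spool and sort costs.

    ```sql
    WITH cte_add_row_num AS (
        SELECT Id, Itinerary, LegSegment, CreationDate,
               ROW_NUMBER() OVER (PARTITION BY Itinerary ORDER BY CreationDate) AS row_num
        FROM dbo.Test
    ),
    cte_flagged AS (
        SELECT Id, Itinerary, LegSegment, CreationDate,
               -- Per itinerary: how many of the first two rows have a NULL LegSegment.
               SUM(CASE WHEN row_num <= 2 AND LegSegment IS NULL THEN 1 ELSE 0 END)
                   OVER (PARTITION BY Itinerary) AS leading_nulls
        FROM cte_add_row_num
    )
    SELECT Id, Itinerary, LegSegment, CreationDate
    FROM cte_flagged
    WHERE leading_nulls = 2          -- both of the first two rows are NULL
    ORDER BY Itinerary, CreationDate;
    ```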


  • ;WITH cte_add_row_num AS (
    SELECT *, ROW_NUMBER() OVER(PARTITION BY Itinerary ORDER BY CreationDate) AS row_num
    FROM dbo.Test
    )
    SELECT T.*
    FROM dbo.Test T
    INNER JOIN (
    SELECT Itinerary
    FROM cte_add_row_num carn
    WHERE row_num IN (1, 2) AND LegSegment IS NULL
    GROUP BY Itinerary
    HAVING COUNT(*) = 2
    ) AS carn ON carn.Itinerary = T.Itinerary
    ORDER BY T.Itinerary, T.CreationDate


  • I have a big database where I have to implement this query. I ran these scripts, and all the solutions worked like a charm except #4059312. You are really skilled.

    Thank you, all of you 🙂

    These are the final scripts with the original tables from the real database; all of them work:

    with rn_cte as (
    select *, case when row_number() over (partition by tripItineraryMasterID order by CreationDtSrv)=1 and prevLegSegmentsID is null then 1
    when row_number() over (partition by tripItineraryMasterID order by CreationDtSrv)=2 and prevLegSegmentsID is null then 1
    else 0 end first_two
    from tripLegSegments)
    select tripItineraryMasterID
    from rn_cte
    group by tripItineraryMasterID
    having sum(first_two)=2
    order by tripItineraryMasterID;

    SELECT a.tripItineraryMasterID
    FROM (
    SELECT tripItineraryMasterID, prevLegSegmentsID,
    ROW_NUMBER() OVER (PARTITION BY tripItineraryMasterID ORDER BY CreationDtSrv) AS RowNum
    FROM tripLegSegments
    ) AS a
    WHERE a.prevLegSegmentsID IS NULL
    AND a.RowNum < 3
    GROUP BY a.tripItineraryMasterID
    HAVING COUNT(a.tripItineraryMasterID) = 2;

    WITH cte_add_row_num AS (
    SELECT *, ROW_NUMBER() OVER(PARTITION BY tripItineraryMasterID ORDER BY CreationDtSrv) AS row_num
    FROM tripLegSegments
    )
    SELECT T.*
    FROM tripLegSegments T
    INNER JOIN (
    SELECT tripItineraryMasterID
    FROM cte_add_row_num carn
    WHERE row_num IN (1, 2) AND prevLegSegmentsID IS NULL
    GROUP BY tripItineraryMasterID
    HAVING COUNT(*) = 2
    ) AS carn ON carn.tripItineraryMasterID = T.tripItineraryMasterID
    ORDER BY T.tripItineraryMasterID, T.CreationDtSrv
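
    On a large table, an index keyed on the partitioning column followed by the ordering column may let SQL Server number the rows without an extra sort. This is a sketch that assumes no comparable index already exists on tripLegSegments; the index name is hypothetical, and the benefit should be weighed against the real workload and write overhead.

    ```sql
    -- Hypothetical index name; key order mirrors PARTITION BY / ORDER BY in the scripts above.
    CREATE NONCLUSTERED INDEX IX_tripLegSegments_Itinerary_CreationDtSrv
        ON tripLegSegments (tripItineraryMasterID, CreationDtSrv)
        INCLUDE (prevLegSegmentsID);   -- covers the NULL check without a key lookup
    ```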

     
