Grouping

  • I have a project that I am afraid may require a cursor. I was wondering has anyone had a problem like this and solved it without one?

    Say I have a table with the following data

    DateAndTimeLevel

    9/28/2010 1:0012

    9/28/2010 1:0112

    9/28/2010 1:0212

    9/28/2010 1:0314

    9/28/2010 1:0414

    9/28/2010 1:0514

    9/28/2010 1:0614

    9/28/2010 1:0712

    9/28/2010 1:08 12

    9/28/2010 1:09 12

    9/28/2010 1:10 12

    I want to start a new group every time the Level changes to get the following:

    Level FirstLast

    129/28/2010 1:009/28/2010 1:02

    149/28/2010 1:039/28/2010 1:06

    129/28/2010 1:079/28/2010 1:10

    If I write a query grouping on Level and returning the min and max of DateAndTime I get, as expected, this:

    Level FirstLast

    129/28/2010 1:009/28/2010 1:10

    149/28/2010 1:039/28/2010 1:06

    Does anyone have any ideas on how to do this without RBAR?

    Thanks, Chris

  • Try this:

    DECLARE @tab TABLE

    (

    [Date] DATETIME ,

    [Level] INT

    )

    INSERT INTO @tab ([Date] ,[Level] )

    SELECT '9/28/2010 1:00', 12

    UNION ALL

    SELECT '9/28/2010 1:01', 12

    UNION ALL

    SELECT '9/28/2010 1:02', 12

    UNION ALL

    SELECT '9/28/2010 1:03', 14

    UNION ALL

    SELECT '9/28/2010 1:04', 14

    UNION ALL

    SELECT '9/28/2010 1:05', 14

    UNION ALL

    SELECT '9/28/2010 1:06', 14

    UNION ALL

    SELECT '9/28/2010 1:07', 12

    UNION ALL

    SELECT '9/28/2010 1:08', 12

    UNION ALL

    SELECT '9/28/2010 1:09', 12

    UNION ALL

    SELECT '9/28/2010 1:10', 12

    ; WITH cte0 AS

    (

    SELECT

    ID = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    ,[DATE]

    ,[Level]

    FROM @tab

    )

    ,CTE1 AS

    (

    SELECT ID,[Level],[Date] ,

    ID-ROW_NUMBER() OVER(PARTITION BY [Level] ORDER BY ID) AS rn

    FROM cte0

    )

    ,CTE2 AS

    (

    SELECT ID,[Date] ,

    [Level],

    MIN(ID) OVER(PARTITION BY rn,[Level]) AS minID

    FROM CTE1

    )--select * From CTE2 order by ID

    ,cte3 AS

    (

    SELECT ID, [Date] , [Level],

    DENSE_RANK() OVER(ORDER BY minID) AS SetNum

    FROM CTE2

    )

    , CTE4 AS

    (

    SELECT ID, [Date] , [Level], SetNum ,

    ROW_NUMBER() OVER(PARTITION BY [Level], SetNum ORDER BY ID,SetNum) AS Row_Num

    FROM cte3

    )

    SELECT OuterQ.[Level]

    ,FirstDate =

    (

    SELECT TOP 1 SubQ.[Date]

    FROM CTE4 SubQ

    WHERE SubQ.[Level] = OuterQ.[Level] AND

    SubQ.SetNum = OuterQ.SetNum

    ORDER BY SubQ.Row_Num

    )

    ,LastDate =

    (

    SELECT TOP 1 SubQ.[Date]

    FROM CTE4 SubQ

    WHERE SubQ.[Level] = OuterQ.[Level] AND

    SubQ.SetNum = OuterQ.SetNum

    ORDER BY SubQ.Row_Num DESC

    )

    FROM CTE4 OuterQ

    GROUP BY

    OuterQ.[Level] , OuterQ.SetNum

    ORDER BY

    OuterQ.SetNum

    Thanks to Mark , Chris Morris , Venkat, Sayed for their help in this (Please refer this thread what help they did )

  • :Wow: Wow! That is absolutely amazing. It does exactly what I wanted and although it takes a while on a production database (3:47 over a base table of 137,003 rows) I will take it on face value that it is a lot faster than a cursor. I have done a fair amount of SQL but it uses two items that I have never used:

    CTEs - Are they basically dynamically generated views within a query?

    OVER (PARTITION BY) - Is a way to apply summary functions (the wrong word, I know) over a subset of rows?

    Can anyone suggest a book that goes into more advanced T-SQL?

    Thanks a ton!

    Chris

  • Hi there,

    This may also work.. See if this helps..

    ;WITH cte AS

    (

    SELECT [Date], [Level], ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS id

    , ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - ROW_NUMBER() OVER (PARTITION BY [Level] ORDER BY (SELECT NULL)) AS diff

    FROM @tab

    )

    SELECT [Level], MIN([Date]) AS [First], MAX([Date]) AS [Last]

    FROM cte

    GROUP BY [Level], diff

    ORDER BY MAX(id)

    Cheers! 😀

  • :Wow: :Wow: Double wow! That took all of 2 seconds! Now, some caching into memory may have been involved so I reran the first query and it took 3:46 so the second approach is definitely faster.

    Thanks again for all of your help!!

  • I did find one little bug...

    If I apply this query to a certain range of data I get the entire data set rather than the summary. However, if I comment out the ORDER BY MAX(id) then it works? It does not really matter because I have a ton of data to summarize and I will be appending data to a summary table every weekend. I am just not sure why some data causes it to break and the fix makes no sense.

    Thanks again, Chris

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply