Windowing function - partition by .. finding the right partition

  • Hi,

    I am trying to get a row_number windowing function to work the way I hope it could work ..

    Using the following data definitions


    CREATE TABLE [dbo].[DataChanges](
    [ID] [numeric](12, 0) NOT NULL,
    [S_CODE] [varchar](8) NULL,
    [SnapshotDateTime] [datetime] NOT NULL
    ) ON [PRIMARY]

    And then with the following test data


    INSERT INTO [dbo].[DataChanges] ([ID] ,[S_CODE] ,[SnapshotDateTime]) VALUES (1446491,9,'2017-07-13 09:26:50')
    INSERT INTO [dbo].[DataChanges] ([ID] ,[S_CODE] ,[SnapshotDateTime]) VALUES (1446491,9,'2017-07-13 09:28:04')
    INSERT INTO [dbo].[DataChanges] ([ID] ,[S_CODE] ,[SnapshotDateTime]) VALUES (1446491,9,'2017-07-13 09:30:01')
    INSERT INTO [dbo].[DataChanges] ([ID] ,[S_CODE] ,[SnapshotDateTime]) VALUES (1446491,9,'2017-07-13 09:32:01')
    INSERT INTO [dbo].[DataChanges] ([ID] ,[S_CODE] ,[SnapshotDateTime]) VALUES (1446491,7,'2017-07-13 09:34:01')
    INSERT INTO [dbo].[DataChanges] ([ID] ,[S_CODE] ,[SnapshotDateTime]) VALUES (1446491,7,'2017-07-13 09:36:01')
    INSERT INTO [dbo].[DataChanges] ([ID] ,[S_CODE] ,[SnapshotDateTime]) VALUES (1446491,7,'2017-07-13 09:38:01')
    INSERT INTO [dbo].[DataChanges] ([ID] ,[S_CODE] ,[SnapshotDateTime]) VALUES (1446493,25,'2017-07-13 09:26:50')
    INSERT INTO [dbo].[DataChanges] ([ID] ,[S_CODE] ,[SnapshotDateTime]) VALUES (1446493,25,'2017-07-13 09:28:04')
    INSERT INTO [dbo].[DataChanges] ([ID] ,[S_CODE] ,[SnapshotDateTime]) VALUES (1446493,12,'2017-07-13 09:30:01')
    INSERT INTO [dbo].[DataChanges] ([ID] ,[S_CODE] ,[SnapshotDateTime]) VALUES (1446493,12,'2017-07-13 09:32:01')
    INSERT INTO [dbo].[DataChanges] ([ID] ,[S_CODE] ,[SnapshotDateTime]) VALUES (1446493,30,'2017-07-13 09:34:01')
    INSERT INTO [dbo].[DataChanges] ([ID] ,[S_CODE] ,[SnapshotDateTime]) VALUES (1446493,30,'2017-07-13 09:36:01')
    INSERT INTO [dbo].[DataChanges] ([ID] ,[S_CODE] ,[SnapshotDateTime]) VALUES (1446493,30,'2017-07-13 09:38:01')
    INSERT INTO [dbo].[DataChanges] ([ID] ,[S_CODE] ,[SnapshotDateTime]) VALUES (1446494,1,'2017-07-13 09:26:50')
    INSERT INTO [dbo].[DataChanges] ([ID] ,[S_CODE] ,[SnapshotDateTime]) VALUES (1446494,1,'2017-07-13 09:28:04')
    INSERT INTO [dbo].[DataChanges] ([ID] ,[S_CODE] ,[SnapshotDateTime]) VALUES (1446494,1,'2017-07-13 09:30:01')
    INSERT INTO [dbo].[DataChanges] ([ID] ,[S_CODE] ,[SnapshotDateTime]) VALUES (1446494,1,'2017-07-13 09:32:01')
    INSERT INTO [dbo].[DataChanges] ([ID] ,[S_CODE] ,[SnapshotDateTime]) VALUES (1446494,5,'2017-07-13 09:34:01')
    INSERT INTO [dbo].[DataChanges] ([ID] ,[S_CODE] ,[SnapshotDateTime]) VALUES (1446494,6,'2017-07-13 09:36:01')
    INSERT INTO [dbo].[DataChanges] ([ID] ,[S_CODE] ,[SnapshotDateTime]) VALUES (1446494,6,'2017-07-13 09:38:01')

    Resulting in the following pretty dataset


    ID    S_CODE    SnapshotDateTime
    1446491    9    2017-07-13 09:26:50.000
    1446491    9    2017-07-13 09:28:04.000
    1446491    9    2017-07-13 09:30:01.000
    1446491    9    2017-07-13 09:32:01.000
    1446491    7    2017-07-13 09:34:01.000
    1446491    7    2017-07-13 09:36:01.000
    1446491    7    2017-07-13 09:38:01.000
    1446493    25    2017-07-13 09:26:50.000
    1446493    25    2017-07-13 09:28:04.000
    1446493    12    2017-07-13 09:30:01.000
    1446493    12    2017-07-13 09:32:01.000
    1446493    30    2017-07-13 09:34:01.000
    1446493    30    2017-07-13 09:36:01.000
    1446493    30    2017-07-13 09:38:01.000
    1446494    1    2017-07-13 09:26:50.000
    1446494    1    2017-07-13 09:28:04.000
    1446494    1    2017-07-13 09:30:01.000
    1446494    1    2017-07-13 09:32:01.000
    1446494    5    2017-07-13 09:34:01.000
    1446494    6    2017-07-13 09:36:01.000
    1446494    6    2017-07-13 09:38:01.000

    basically S_CODE value is retrieved for each ID every 2 minutes (from the SnapshotDateTime)

    I am trying to write a windowing function that will highlight to me wherever there's a "S_code" change within the same ID

    So as a result, I would like to have something like

    ID    S_CODE    SnapshotDateTime    rwNbr
    1446491    9    2017-07-13 09:26:50    1
    1446491    9    2017-07-13 09:28:04    1
    1446491    9    2017-07-13 09:30:01    1
    1446491    9    2017-07-13 09:32:01    1
    1446491    7    2017-07-13 09:34:01    2
    1446491    7    2017-07-13 09:36:01    2
    1446491    7    2017-07-13 09:38:01    2
    1446493    25    2017-07-13 09:26:50    1
    1446493    25    2017-07-13 09:28:04    1
    1446493    12    2017-07-13 09:30:01    2
    1446493    12    2017-07-13 09:32:01    2
    1446493    30    2017-07-13 09:34:01    3
    1446493    30    2017-07-13 09:36:01    3
    1446493    30    2017-07-13 09:38:01    3
    1446494    1    2017-07-13 09:26:50    1
    1446494    1    2017-07-13 09:28:04    1
    1446494    1    2017-07-13 09:30:01    1
    1446494    1    2017-07-13 09:32:01    1
    1446494    5    2017-07-13 09:34:01    2
    1446494    6    2017-07-13 09:36:01    3
    1446494    6    2017-07-13 09:38:01    3

    see how the RowNumber resets itself to 1 whenever there's a new ID and increments when there's a new S_Code but retains the same value as long as the S_CODE (and ID) don't change ..

    If I run the following code:

    ;With FRowNums As (
    SELECT ID, S_CODE, SnapshotDateTime
    ,row_number() OVER (PARTITION BY ID, S_CODE ORDER BY SnapshotDateTime) as rwNbr
    FROM dbo.DataChanges
    ),
    SRowNums AS
    (
    SELECT ID, S_CODE, SnapshotDateTime
    ,row_number() OVER (PARTITION BY ID ORDER BY SnapshotDateTime) as rwNbr
    FROM dbo.DataChanges
    )
    SELECT
    t1.ID, t1.S_CODE, t1.rwNbr, t2.RwNbr as T2ROWNBR
    FROM FRowNums t1
    INNER JOIN SRowNums t2 ON t2.ID = t1.ID

    Then that gives me almost what I want


    ID    S_CODE    rwNbr    T2ROWNBR
    1446491    7    1    1
    1446491    7    2    1
    1446491    7    3    1
    1446491    9    1    1
    1446491    9    2    1
    1446491    9    3    1
    1446491    9    4    1
    1446491    7    1    2
    1446491    7    2    2
    1446491    7    3    2
    1446491    9    1    2
    1446491    9    2    2
    1446491    9    3    2
    1446491    9    4    2
    1446491    7    1    3
    1446491    7    2    3
    1446491    7    3    3
    .....

    But not quite .. If I were to concatenate these rownumber columns then I would get what I want but surely I should be able to get there in one go?

    Can anybody provide a hint or 2?

    Cheers

    B

  • Use LAG instead

    SELECT ID, S_CODE, SnapshotDateTime,
       CASE WHEN S_CODE = LAG(S_CODE) OVER(PARTITION BY ID ORDER BY SnapshotDateTime) THEN 'Same' ELSE 'S_CODE changed' END
    FROM DataChanges

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • Hi Mark,

    thanks for that! that worked quite well.

    Cheers,

    B

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply