Collapsing Hierarchy of messages into single row

  • All,
    I am working on a messaging system which contains message chains. I have attached an Excel sheet with the first two tabs being test data and the third tab has the desired results. The T-SQL I am working with is below

    with cteMessage as (
        select DISTINCT T1.MessageID,
             COALESCE(T1.ParentMessageID, T1.MessageID) as 'ParentMessageID',
             T2.RecipientID,
             convert(varchar(max), STUFF((SELECT ', ' + convert(varchar, b.RecipientID)
        FROM MessageRecipient b
        WHERE b.MessageID = T2.MessageID
        FOR XML PATH('')), 1, 2, '')) as 'RecipientIDs',
             T2.Retired,
             1 as 'Level',
             convert(varchar(max), T1.MessageBody) as 'MessageBody'
        from dbo.Message T1 inner join dbo.MessageRecipient T2 on T1.MessageID = T2.MessageID
        WHERE T1.ParentMessageID IS NULL
        UNION ALL
            select T1.MessageID,
                 T1.ParentMessageID,
                 T2.RecipientID,
                 convert(varchar(max), STUFF((SELECT ', ' + convert(varchar, b.RecipientID)
                 FROM MessageRecipient b
                 WHERE b.MessageID = T2.MessageID
                    FOR XML PATH('')), 1, 2, '')),
                 T2.Retired,
                 T3.Level + 1 as 'Level',
               convert(varchar(max), T1.MessageBody) + '|' + T3.MessageBody as 'MessageBody'
        from dbo.Message T1
        inner join dbo.MessageRecipient T2 on T1.MessageID = T2.MessageID
        inner join cteMessage T3 on T3.MessageID = T1.ParentMessageID
        WHERE T1.ParentMessageID is not null
        )
        ,cteMessageList as (select *, row_number() over (partition by ParentMessageID order by Level DESC) as RowID from cteMessage)

        select * from cteMessageList where RowID = 1

    Unfortunately, it is producing erratic results. Does anyone have any idea what I am missing?
    I appreciate the help in advance.
    Michael

  • Could you provide the DDL?  I think you will get better (and faster) responses if you provide the DDL so others can build the required tables and then work with the data.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • I added it to my original post. Thanks for the tip!!

  • Michael Esposito - Wednesday, July 5, 2017 11:52 AM

    I added it to my original post. Thanks for the tip!!

    some sample data insert scripts would also help....along with expected results for that sample data 🙂🙂

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • How about this (based on the excel file):
    DECLARE @message TABLE (MessageID INT, parentMessageID INT, CreatorID INT, Messagebody VARCHAR(MAX), senddate DATETIME2, retired BIT, retireddate datetime2)
    DECLARE @messageRecipient TABLE (messagerecipientID INT, [MessageID] INT, [RecipientID] INT, readdate DATETIME2, retired BIT, retireddate DATETIME2)

    INSERT INTO @message VALUES
    (1, NULL, 7, 'This is the original message.', '2017-06-29 12:47:00.0000000 -04:00', 0, NULL),
    (2, NULL, 7, 'This is another oirignal message to just Brice.', '2017-06-29 12:47:00.0000000 -04:00', 0, NULL),
    (3, 1, 2, 'This is a response to the first message.', '2017-06-29 12:47:00.0000000 -04:00', 0, NULL),
    (4, 3, 3, 'This is a response to the first message...again.', '2017-06-29 12:47:00.0000000 -04:00', 0, NULL),
    (5, 2, 4, 'This is a response from Brice to Michael', '2017-06-29 12:47:00.0000000 -04:00', 0, NULL)
    INSERT INTO @messageRecipient VALUES
    (1, 1, 2, NULL, 0, NULL),
    (2, 1, 3, NULL, 0, NULL),
    (3, 1, 4, NULL, 0, NULL),
    (4, 1, 7, NULL, 0, NULL),
    (5, 2, 4, NULL, 0, NULL),
    (6, 3, 2, NULL, 0, NULL),
    (7, 3, 3, NULL, 0, NULL),
    (8, 3, 4, NULL, 0, NULL),
    (9, 3, 7, NULL, 0, NULL),
    (17, 4, 2, NULL, 0, NULL),
    (18, 4, 3, NULL, 0, NULL),
    (19, 4, 4, NULL, 0, NULL),
    (20, 4, 7, NULL, 0, NULL),
    (21, 2, 7, NULL, 0, NULL),
    (22, 5, 2, NULL, 0, NULL),
    (23, 5, 4, NULL, 0, NULL)

    ;WITH    [cteMessage]
             AS (
                 SELECT DISTINCT
                        [T1].[MessageID] ,
                        COALESCE([T1].[ParentMessageID], [T1].[MessageID]) AS 'ParentMessageID' ,
                        [T2].[RecipientID] ,
                        CONVERT(VARCHAR(MAX), STUFF((
                                                     SELECT
                                                            ', '
                                                            + CONVERT(VARCHAR, .[RecipientID])
                                                        FROM
                                                            @MessageRecipient
                                                        WHERE
                                                            .[MessageID] = [T2].[MessageID]
                                                    FOR
                                                     XML PATH('')
                                                    ), 1, 2, '')) AS 'RecipientIDs' ,
                        [T2].[Retired] ,
                        1 AS 'Level' ,
                        CONVERT(VARCHAR(MAX), [T1].[MessageBody]) AS 'MessageBody'
                    FROM
                        @message [T1]
                    INNER JOIN @MessageRecipient [T2]
                    ON    [T1].[MessageID] = [T2].[MessageID]
                    WHERE
                        [T1].[ParentMessageID] IS NULL
                 UNION ALL
                 SELECT
                        [T1].[MessageID] ,
                        [T1].[ParentMessageID] ,
                        [T2].[RecipientID] ,
                        CONVERT(VARCHAR(MAX), STUFF((
                                                     SELECT
                                                            ', '
                                                            + CONVERT(VARCHAR, .[RecipientID])
                                                        FROM
                                                            @MessageRecipient
                                                        WHERE
                                                            .[MessageID] = [T2].[MessageID]
                                                    FOR
                                                     XML PATH('')
                                                    ), 1, 2, '')) ,
                        [T2].[Retired] ,
                        [T3].[Level] + 1 AS 'Level' ,
                        CONVERT(VARCHAR(MAX), [T1].[MessageBody]) + '|'
                        + [T3].[MessageBody] AS 'MessageBody'
                    FROM
                        @message [T1]
                    INNER JOIN @messagerecipient [T2]
                    ON    [T1].[MessageID] = [T2].[MessageID]
                    INNER JOIN [cteMessage] [T3]
                    ON    [T3].[MessageID] = [T1].[ParentMessageID]
                    WHERE
                        [T1].[ParentMessageID] IS NOT NULL
                 ),
            [cteMessageList]
             AS (
                 SELECT
                         MAX(messageID) OVER (PARTITION BY recipientIDs) AS MessageID,
                        [cteMessage].[ParentMessageID] ,
                        [cteMessage].[RecipientIDs] ,
                        [cteMessage].[Level] ,
                        [cteMessage].[MessageBody] ,
                        MAX(level) OVER (PARTITION BY recipientIDs) AS maxLevel,
                        ROW_NUMBER() OVER ( PARTITION BY [cteMessage].[ParentMessageID] ORDER BY [cteMessage].[Level] DESC ) AS [RowID]
                    FROM
                        [cteMessage]
                 )
        SELECT [cteMessageList].[MessageID] ,
             [cteMessageList].[ParentMessageID] ,
             [cteMessageList].[RecipientIDs] ,
             [cteMessageList].[Level] ,
             [cteMessageList].[MessageBody] ,
             [cteMessageList].[RowID]
            FROM
                [cteMessageList]
            WHERE
                [cteMessageList].[RowID] = 1 AND level = maxlevel

    Creates the tables as variables so you will need to modify that.  And it uses the data inside your excel spreadsheet, which has a typo on the DesiredResults worksheet.  It states that MessageID 5 has recipient ID's 7 and 4, but if you look at Message Recipient, it clearly shows that it went to 2 and 4.
    The above works with the one particular dataset you provided, but I believe it will be unreliable as I am doing the "max level" partitioning over the recipientIDs which feels like it could introduce errors in a larger data set.
    I'm working out a different solution now...

    EDIT - I can't see any nice way to partition things.  I was trying a few different approaches, but came up empty.
    I think what you need is a conversation ID that would keep track of the conversations.  Then you could use a depth counter in your recursive CTE to keep track how deep a conversation goes.  Then in the second CTE, you'd select the max(depth) over (partition by conversationID) as maxDepth (for example) and in the final select put "WHERE depth = maxDepth AND RowID = 1".  I don't see any nice column to partition it by to get better results unless by recipientIDs is an acceptable solution?
    But somebody else on here may have a better idea 🙂

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • But somebody else on here may have a better idea Smile 

    Any book on SQL Querying by Itzik Ben-Gan shows how to deal with hierarchies, graphs and networks as well. Joe Celkocomes to mind, too - "SQL for Smarties".
    Both Itzik and Joe demonstrate brilliant ideas and solutions, so good luck  and enjoy

    🙂

  • Zidar - Thursday, July 13, 2017 10:50 AM

    But somebody else on here may have a better idea Smile 

    Any book on SQL Querying by Itzik Ben-Gan shows how to deal with hierarchies, graphs and networks as well. Joe Celkocomes to mind, too - "SQL for Smarties".
    Both Itzik and Joe demonstrate brilliant ideas and solutions, so good luck  and enjoy

    🙂

    I am not sure if you were indicating that my solution was not good and I should read some stuff by them OR if that was directed at the OP.  I personally would have designed the table to have a conversation ID as well as a message ID and parent message ID.  And would have normalized things a bit differently.  I would have had a user table, a message table and a conversation table.  Conversation table would be mostly int columns with foreign keys off to the message table and user table.  But that doesn't mean my solution would be the "best" one.
    My solution posted above works with the provided input, but the issue is that my method may screw things up as it doesn't do a nice way of conversation group matching.  BUT I have come up with a better method for marking the conversation grouping so this method should work better in most cases:
    DECLARE @message TABLE (MessageID INT, parentMessageID INT, CreatorID INT, Messagebody VARCHAR(MAX), senddate DATETIME2, retired BIT, retireddate datetime2)
    DECLARE @messageRecipient TABLE (messagerecipientID INT, [MessageID] INT, [RecipientID] INT, readdate DATETIME2, retired BIT, retireddate DATETIME2)

    INSERT INTO @message VALUES
    (1, NULL, 7, 'This is the original message.', '2017-06-29 12:47:00.0000000 -04:00', 0, NULL),
    (2, NULL, 7, 'This is another oirignal message to just Brice.', '2017-06-29 12:47:00.0000000 -04:00', 0, NULL),
    (3, 1, 2, 'This is a response to the first message.', '2017-06-29 12:47:00.0000000 -04:00', 0, NULL),
    (4, 3, 3, 'This is a response to the first message...again.', '2017-06-29 12:47:00.0000000 -04:00', 0, NULL),
    (5, 2, 4, 'This is a response from Brice to Michael', '2017-06-29 12:47:00.0000000 -04:00', 0, NULL)
    INSERT INTO @messageRecipient VALUES
    (1, 1, 2, NULL, 0, NULL),
    (2, 1, 3, NULL, 0, NULL),
    (3, 1, 4, NULL, 0, NULL),
    (4, 1, 7, NULL, 0, NULL),
    (5, 2, 4, NULL, 0, NULL),
    (6, 3, 2, NULL, 0, NULL),
    (7, 3, 3, NULL, 0, NULL),
    (8, 3, 4, NULL, 0, NULL),
    (9, 3, 7, NULL, 0, NULL),
    (17, 4, 2, NULL, 0, NULL),
    (18, 4, 3, NULL, 0, NULL),
    (19, 4, 4, NULL, 0, NULL),
    (20, 4, 7, NULL, 0, NULL),
    (21, 2, 7, NULL, 0, NULL),
    (22, 5, 2, NULL, 0, NULL),
    (23, 5, 4, NULL, 0, NULL)

    ;WITH  [cteMessage]
        AS (
         SELECT DISTINCT
           [T1].[MessageID] ,
           COALESCE([T1].[ParentMessageID], [T1].[MessageID]) AS 'ParentMessageID' ,
           [T2].[RecipientID] ,
           CONVERT(VARCHAR(MAX), STUFF((
                     SELECT
                       ', '
                       + CONVERT(VARCHAR, .[RecipientID])
                      FROM
                       @MessageRecipient
                      WHERE
                       .[MessageID] = [T2].[MessageID]
                    FOR
                     XML PATH('')
                    ), 1, 2, '')) AS 'RecipientIDs' ,
           1 AS 'Level' ,
           CONVERT(VARCHAR(MAX), [T1].[MessageBody]) AS 'MessageBody',
                        [T1].MessageID AS ConversationID
          FROM
           @message [T1]
          INNER JOIN @MessageRecipient [T2]
          ON  [T1].[MessageID] = [T2].[MessageID]
          WHERE
           [T1].[ParentMessageID] IS NULL
         UNION ALL
         SELECT
           [T1].[MessageID] ,
           [T1].[ParentMessageID] ,
           [T2].[RecipientID] ,
           CONVERT(VARCHAR(MAX), STUFF((
                     SELECT
                       ', '
                       + CONVERT(VARCHAR, .[RecipientID])
                      FROM
                       @MessageRecipient
                      WHERE
                       .[MessageID] = [T2].[MessageID]
                    FOR
                     XML PATH('')
                    ), 1, 2, '')) AS 'RecipientIDs' ,
           [T3].[Level] + 1 AS 'Level' ,
           CONVERT(VARCHAR(MAX), [T1].[MessageBody]) + '|'
           + [T3].[MessageBody] AS 'MessageBody',
                        [T3].[ConversationID]
          FROM
           @message [T1]
          INNER JOIN @messagerecipient [T2]
          ON  [T1].[MessageID] = [T2].[MessageID]
          INNER JOIN [cteMessage] [T3]
          ON  [T3].[MessageID] = [T1].[ParentMessageID]
          WHERE
           [T1].[ParentMessageID] IS NOT NULL
         ),
       [cteMessageList]
        AS (
         SELECT
           MAX(messageID) OVER (PARTITION BY recipientIDs) AS MessageID,
           [cteMessage].[ParentMessageID] ,
           [cteMessage].[RecipientIDs] ,
           [cteMessage].[Level] ,
           [cteMessage].[MessageBody] ,
                        ROW_NUMBER() OVER (PARTITION BY [cteMessage].[ConversationID] ORDER BY [cteMessage].[Level] desc) AS conversationRowID
          FROM
           [cteMessage]
         )
      SELECT [cteMessageList].[MessageID] ,
       [cteMessageList].[ParentMessageID] ,
       [cteMessageList].[RecipientIDs] ,
       [cteMessageList].[Level] ,
       [cteMessageList].[MessageBody]
       FROM
        [cteMessageList]
       WHERE
        [cteMessageList].[conversationRowID] = 1

    I also stripped out the unused columns from the CTE.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Michael Esposito - Wednesday, July 5, 2017 11:52 AM

    I added it to my original post. Thanks for the tip!!

    Here's another tip.  Test data in an Excel spreadsheet is mostly useless to those trying to help.  Please see the first link under "Helpful Links" in my signature line below for how to post "Readily Consumable Data", which inherently also provides the necessary DDL.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply