t-sql 2012 split and locate unique messages

  • In a t-sql 2012 database, I am suppose to count the number of times each 'canned' message is used for
     Elementary students in the last school year and the current school  year.

    The following is the sql that I currently am using:
    Declare @Delimiter char(2) = '. ';

     ;With GetTheComments(Comments) As
     (SELECT GS.Comments
      FROM dbo.Enrol Enrol
      JOIN dbo.Student Student
          ON Student.StudentID = Enrol.StudentID
      JOIN dbo.GS GS
          ON GS.StudentID = Student.Studentid
           AND (GS.Comments IS NOT NULL)
        AND (GS.Comments <> '')
      where Enrol.grade in ('KG','01','02','03','04','05','06')
      and Enrol.endYear between 2016 and 2017)

      ),
      Pieces (Comments, start, stop) AS (
    SELECT Comments, CAST(1 AS bigint) As start, CAST(CHARINDEX(@Delimiter, Comments) AS bigint) As stop
    From GetTheComments
    UNION ALL
    SELECT Comments, CAST(stop + Len(@Delimiter) As bigint) As start, CAST(CHARINDEX(@Delimiter, Comments, stop + Len(@Delimiter)) AS bigint) As stop
    FROM Pieces
    WHERE stop > 0),

    EachComment(Comments) As
    (SELECT LTrim(SUBSTRING(Comments, start, CASE WHEN stop > 0 THEN stop-start ELSE LEN(Comments) END)) AS Comments
    FROM Pieces)

    Select Comments, Count(*) As Counts
    From EachComment
    where Comments <> ''
    Group By Comments
    Order By Counts Desc, Comments Asc;
    An example of multiple messages in the one GS.Comments field would look like the following:
    This student is good at math. This student turns in their assignments on time. This student seems to enjoy school.

     An example of when one messages is in  the one GS.Comments field would look like the following:

    This student seems to enjoy school

    The problem is the GS.Comments field is defined as varchar(1200). There can be one message in the field and/or there can be lots of messages in this field. Each message ends with a period and there is a space between each message that have
    more than 1 message in the varchar(1200) field. When there is only 1 message, there is no period afer the message.

    The problem is there can be results that look like the following

    message                                                                count
    This student seems to enjoy school.                           150
    This student seems to enjoy school                              25
    .
    Basically there is a count for a message with a period and there is a couunt for when there is not a period for the same message.

    Thus can you show me what I can do with the sql listed above so the messages are treated as being the same. Basically I want 1 message with 1 count for what I listed above.

  • dianerstein 8713 - Thursday, May 25, 2017 4:12 PM

    In a t-sql 2012 database, I am suppose to count the number of times each 'canned' message is used for
     Elementary students in the last school year and the current school  year.

    The following is the sql that I currently am using:
    Declare @Delimiter char(2) = '. ';

     ;With GetTheComments(Comments) As
     (SELECT GS.Comments
      FROM dbo.Enrol Enrol
      JOIN dbo.Student Student
          ON Student.StudentID = Enrol.StudentID
      JOIN dbo.GS GS
          ON GS.StudentID = Student.Studentid
           AND (GS.Comments IS NOT NULL)
        AND (GS.Comments <> '')
      where Enrol.grade in ('KG','01','02','03','04','05','06')
      and Enrol.endYear between 2016 and 2017)

      ),
      Pieces (Comments, start, stop) AS (
    SELECT Comments, CAST(1 AS bigint) As start, CAST(CHARINDEX(@Delimiter, Comments) AS bigint) As stop
    From GetTheComments
    UNION ALL
    SELECT Comments, CAST(stop + Len(@Delimiter) As bigint) As start, CAST(CHARINDEX(@Delimiter, Comments, stop + Len(@Delimiter)) AS bigint) As stop
    FROM Pieces
    WHERE stop > 0),

    EachComment(Comments) As
    (SELECT LTrim(SUBSTRING(Comments, start, CASE WHEN stop > 0 THEN stop-start ELSE LEN(Comments) END)) AS Comments
    FROM Pieces)

    Select Comments, Count(*) As Counts
    From EachComment
    where Comments <> ''
    Group By Comments
    Order By Counts Desc, Comments Asc;
    An example of multiple messages in the one GS.Comments field would look like the following:
    This student is good at math. This student turns in their assignments on time. This student seems to enjoy school.

     An example of when one messages is in  the one GS.Comments field would look like the following:

    This student seems to enjoy school

    The problem is the GS.Comments field is defined as varchar(1200). There can be one message in the field and/or there can be lots of messages in this field. Each message ends with a period and there is a space between each message that have
    more than 1 message in the varchar(1200) field. When there is only 1 message, there is no period afer the message.

    The problem is there can be results that look like the following

    message                                                                count
    This student seems to enjoy school.                           150
    This student seems to enjoy school                              25
    .
    Basically there is a count for a message with a period and there is a couunt for when there is not a period for the same message.

    Thus can you show me what I can do with the sql listed above so the messages are treated as being the same. Basically I want 1 message with 1 count for what I listed above.

    Use REPLACE to strip out the punctuation and write it to another column?  Didn't you post the previous stuff about this under your other account, Wendy Elizabeth? Did you use NGrams that Alan Burstein recommended?
    You talk about the same topics, have the same speech patterns, ask the same questions... so you're the same person. Right?
    The original post is here

  • pietlinden - Thursday, May 25, 2017 8:48 PM

    Use REPLACE to strip out the punctuation and write it to another column?  Didn't you post the previous stuff about this under your other account, Wendy Elizabeth? Did you use NGrams that Alan Burstein recommended?
    You talk about the same topics, have the same speech patterns, ask the same questions... so you're the same person. Right?
    The original post is here

    I remember reading that thread as it developed.  I thought Alan's solution for it was excellent.

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply