String occurrence count with Cross apply

  • Hi All,

    See sample data below. I'm trying to count the number of occurrences of strings stored in table @word without a while loop. Any help is appreciated. Thank you.

    DECLARE @t TABLE (Id INT IDENTITY(1,1), String VARCHAR(MAX))

    INSERT INTO @t

    SELECT 'There are a lot of Multidimensional Expressions (MDX) resources available' AS String UNION ALL

    SELECT 'but most teaching aids out there are geared towards professionals with cube development experience' UNION ALL

    SELECT 'As a result SQL developers with no cube development experience start learning MDX on a poor footing' UNION ALL

    SELECT 'because many of the learning aids completely disregard SQL as a good frame of reference for starting the learning process' UNION ALL

    SELECT 'I am going to introduce MDX' UNION ALL

    SELECT 'by drawing only on the similarity and differences between MDX and SQL' UNION ALL

    SELECT 'and more importantly tackle some core MDX and cube concepts along the way' UNION ALL

    SELECT 'In this first part, I will explain how to navigate the cube object with MDX' UNION ALL

    SELECT 'by cutting through some of the quirkiness' UNION ALL

    SELECT 'that makes MDX different from T-SQL even though both were are derived from SQL language' UNION ALL

    SELECT 'Before we move on let’s see how' UNION ALL

    SELECT 'SQL Server 2012 is changing SQL Server data analytics' UNION ALL

    SELECT 'and why MDX is still very important in that regard'

    DECLARE @word TABLE (Id INT IDENTITY(1,1), Word VARCHAR(100))

    INSERT INTO @word

    SELECT 'SQL' AS Word UNION ALL

    SELECT 'MDX' UNION ALL

    SELECT 'cube' UNION ALL

    SELECT '2012' UNION ALL

    SELECT 'learning'

    SELECT * from @t

    SELECT * from @word

    -- Expected result

    SELECT 'SQL' AS Word, 7 AS WordCount UNION ALL

    SELECT 'MDX', 8 UNION ALL

    SELECT 'cube', 4 UNION ALL

    SELECT '2012', 1 UNION ALL

    SELECT 'learning', 3

  • appreciate that you have provided sample data....but in your "real" world and to ensure scalability for a solution...can you tell us approx. number of rows in @t table and confirm that you need "string" to be varchar(max)

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • also, can you please tell us whether you are looking for exact matches or whether partial matches count?

    e.g. if you are looking for "SQL", does "MySQL" count as a hit?

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • Thank you for your quick response. Table @t may store up to 25,000 rows (I used table variable just for demonstration purposes) and yes it is a varchar(max).

    Yes, I would like "MySQL" to be included too.

    Thank you.

  • select Word,sum((len(t.string)-len(replace(t.string,Word,'')))/len(Word)) as count

    from @word as w

    cross join @t as t

    group by Word

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • Thank you!

    I have two follow up questions.

    Is there a way to change the query so that it only counts the word(s) once per row?

    Also, I would like to count the rows that do not include any of those words and return the count in the result set (maybe with a union).

    Thank you for your time.

  • Maybe something like this?

    SELECT ISNULL( w.Word, 'None'),

    COUNT(t.String),

    ISNULL( SUM((LEN(t.String)-LEN(REPLACE(t.String,w.Word,'')))/LEN(w.Word)), COUNT(t.String))

    FROM @word w

    RIGHT

    JOIN @t t ON String LIKE '%' + Word + '%'

    GROUP BY Word

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • clayman (6/17/2014)


    Thank you for your quick response. Table @t may store up to 25,000 rows (I used table variable just for demonstration purposes) and yes it is a varchar(max).

    Yes, I would like "MySQL" to be included too.

    Thank you.

    Understood that it IS a VARCHAR(MAX). I believe the question is, does it have to be? The reason why everyone is asking that is because there's a function that might be able to do a whole lot more than you might expect for such a thing. The only trouble with it is that it runs twice as slow if it's converted to VARCHAR(MAX).

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (6/17/2014)


    clayman (6/17/2014)


    Thank you for your quick response. Table @t may store up to 25,000 rows (I used table variable just for demonstration purposes) and yes it is a varchar(max).

    Yes, I would like "MySQL" to be included too.

    Thank you.

    Understood that it IS a VARCHAR(MAX). I believe the question is, does it have to be? The reason why everyone is asking that is because there's a function that might be able to do a whole lot more than you might expect for such a thing. The only trouble with it is that it runs twice as slow if it's converted to VARCHAR(MAX).

    The data type in the database is TEXT (I know..) so I'm converting it to VARCHAR(MAX) in the stored procedure. Whether it has to be VARCHAR(MAX) or not, I'm not sure about that. All I know these string can pretty long and I don't want to lose any data.

  • clayman (6/17/2014)


    Thank you!

    I have two follow up questions.

    Is there a way to change the query so that it only counts the word(s) once per row?

    Also, I would like to count the rows that do not include any of those words and return the count in the result set (maybe with a union).

    Thank you for your time.

    Yes, try this

    select isnull(Word,'No Match') as Word,sum(1) as count

    from @t t

    left outer join @word as w

    on t.string like '%'+w.word+'%'

    group by Word

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • Thank you.

    I would also like to search for synonym words, however they should not be counted separately. Note, these synonyms could also be placed in a new table, they don't have to be comma separated strings.

    Would full text indexing make more sense in this scenario (then I could just add those synonyms in the tsenu.xml file)? Thank you.

    DECLARE @t TABLE (Id INT IDENTITY(1,1), String VARCHAR(MAX))

    INSERT INTO @t

    SELECT 'I bought a new shirt, I purchased a new t-shirt.' AS String UNION ALL

    SELECT 'He looks very miserable today. He looks very sad today.' UNION ALL

    SELECT 'I’ve been searching for that book for weeks!I’ve been looking for that book for weeks!' UNION ALL

    SELECT 'He looks sad since he could not afford to buy a shirt'

    DECLARE @word TABLE (Id INT IDENTITY(1,1), Word VARCHAR(100), WordSyn VARCHAR(500))

    INSERT INTO @word

    SELECT 'buy' AS Word,'bought, purchase, purchased, obtain, obtained' AS WordSynUNION ALL

    SELECT 'sad','miserable, unhappy, depressed, gloomy'UNION ALL

    SELECT 'search','look for, looking for, looked for, searching for, search for, searched for'

    select isnull(Word,'No Match') as Word,sum(1) as count

    from @t t

    left outer join @word as w

    on t.string like '%'+w.word+'%'

    group by Word

    -- Expected output would be

    select 'buy' as [Word], 3 as [count] UNION ALL

    select 'sad', 3 UNION ALL

    select 'search', 2

  • Viewing 11 posts - 1 through 10 (of 10 total)

    You must be logged in to reply to this topic. Login to reply