Function - Word count in a sentence

  • Comments posted to this topic are about the item Function - Word count in a sentence

  • declare @var1 varchar(max) = 'It''s now or never I ain''t gonna live forever'

    select len(@var1)-len(replace(@var1,' ','')) + 1

  • Thank you for script.

    But it's need to SqlServer 2012 because of lag function.

    why not just split & count?

    CREATE FUNCTION dbo.kg_ifn_StrToTable

    (

    @List NVARCHAR(MAX),

    @Delimiter NVARCHAR(1)

    )

    RETURNS TABLE

    WITH SCHEMABINDING

    AS

    RETURN

    (

    SELECT Sonuc = y.i.value('(./text())[1]', 'nvarchar(MAX)')

    FROM

    (

    SELECT x = CONVERT(XML, '<i>'

    + REPLACE(@List, @Delimiter, '</i><i>')

    + '</i>').query('.')

    ) AS a CROSS APPLY x.nodes('i') AS y(i)

    );

    GO

    Select COUNT(1) from dbo.kg_ifn_StrToTable('It''s now or never I ain''t gonna live forever', ' ')

    -- 300.000 word / 4 sec

    And it's running on SqlServer 2008

  • SQL Guy 1 (11/8/2016)


    declare @var1 varchar(max) = 'It''s now or never I ain''t gonna live forever'

    select len(@var1)-len(replace(@var1,' ','')) + 1

    My thoughts exactly. But you probably need to boil down double, triple, etc. spaces to a single space first. Something along the lines of:

    Replace (Replace (Replace (@Source, @CharacterToRemove + @CharacterToRemove, @CharacterToRemove + Char (8)), Char (8) + @CharacterToRemove, ''), Char (8), '')

  • julian.fletcher (11/11/2016)


    SQL Guy 1 (11/8/2016)


    declare @var1 varchar(max) = 'It''s now or never I ain''t gonna live forever'

    select len(@var1)-len(replace(@var1,' ','')) + 1

    My thoughts exactly. But you probably need to boil down double, triple, etc. spaces to a single space first. Something along the lines of:

    Replace (Replace (Replace (@Source, @CharacterToRemove + @CharacterToRemove, @CharacterToRemove + Char (8)), Char (8) + @CharacterToRemove, ''), Char (8), '')

    I'd be tempted to extend that to all non alphanumeric characters or apostrophies so it handles punctuation and multiple lines.

    Still would have problems if it needed handle something like an IP address, url or formatted phone numbers but you'd need to define the definition of a word a bit better in relation to such things.

  • Same Function - compatible with 2008:

    CREATE FUNCTION dbo.udf_WordCount_2008

    (

    --declare

    @STR VARCHAR(8000)

    --= 'It''s now or never I ain''t gonna live forever'

    )

    RETURNS TABLE AS RETURN

    WITH Tally (n) AS

    (

    SELECT TOP (LEN(@str)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0)) a(n)

    CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) b(n)

    CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) c(n)

    CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d(n)

    )

    , BreakChar as

    (

    SELECT SUBSTRING(@str , n , 1) [Char] , N

    FROM Tally

    )

    , Analize as

    (

    SELECT a.* , b.Char PrevChar

    FROM BreakChar a

    JOIN BreakChar b

    on a.n = b.n+1

    )

    SELECT WordCount = COUNT(1) + 1

    FROM Analize

    WHERE [Char] != PrevChar

    AND PrevChar = ' '

  • hkravitz (11/12/2016)


    Same Function - compatible with 2008:

    Thanks for the 2008 version.

  • Excellent. Thanks.

    Sometimes you need to use the function in a query, so that it needs to return INT instead of a set.
    This is a little amendment.

    CREATE FUNCTION [dbo].[fnWordCount] (@str VARCHAR(8000) )
    RETURNS INT AS
    BEGIN
    DECLARE @WordCount INT=0;
    WITH Tally (n) AS
    ( SELECT TOP (LEN(@str)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM(VALUES (0),(0),(0),(0),(0),(0),(0),(0)) a(n)
    CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) b(n)
    CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) c(n)
    CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d(n)
    ),
    BreakChar as
    (
    SELECT SUBSTRING(@str , n , 1) [Char] , N
    FROM Tally
    ),
    Analize as
    (
    SELECT [Char], LAG([Char],1) OVER (ORDER BY N) PrevChar
    FROM BreakChar
    )
    SELECT @WordCount = COUNT(1) + 1
    FROM Analize
    WHERE [Char] != PrevChar AND PrevChar = ' ';
    RETURN @WordCount;
    END

    Igor Micev,My blog: www.igormicev.com

  • Ercüment EÅŸkar - Friday, November 11, 2016 12:43 AM

    Thank you for script.But it's need to SqlServer 2012 because of lag function.why not just split & count?CREATE FUNCTION dbo.kg_ifn_StrToTable( @List NVARCHAR(MAX), @Delimiter NVARCHAR(1))RETURNS TABLEWITH SCHEMABINDINGAS RETURN ( SELECT Sonuc = y.i.value('(./text())[1]', 'nvarchar(MAX)') FROM ( SELECT x = CONVERT(XML, '<i>' + REPLACE(@List, @Delimiter, '</i><i>') + '</i>').query('.') ) AS a CROSS APPLY x.nodes('i') AS y(i) );GOSelect COUNT(1) from dbo.kg_ifn_StrToTable('It''s now or never I ain''t gonna live forever', ' ') -- 300.000 word / 4 secAnd it's running on SqlServer 2008

    The XML method of splitting is quite slow compared to other methods especially if you don't actually need blobs.  Please see the following article on that.
    http://www.sqlservercentral.com/articles/Tally+Table/72993/

    If you want to save yourself a trip to the article, here's one of the performance charts from the article with split counts in the range of what you might expect in a sentence.  The XML splitter is as bad as a WHILE loop, mostly because of the concatenation that takes place in the XML Splitter.
    https://www.sqlservercentral.com/Forums/Uploads/Images/09c29891-14dd-4d83-b08e-b677.gif
    For splits with more elements but still in the range of VARCHAR(8000) (and, yes, there's a separate splitter for NVARCHAR(4000), please see the following chart.
    https://www.sqlservercentral.com/Forums/Uploads/Images/e64f1e79-1ace-4631-80a9-0f0e.gif
    The skinny Black lines in both charts represent a relatively newer version of the DelimitedSplit8K function and that's before the additional improvements that someone else made and certainly before the improvement for 2012 that Eirikur Erricson posted in the following article.
    http://www.sqlservercentral.com/Authors/Articles/Eirikur_Eiriksson/1346039/

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Igor Micev - Sunday, January 15, 2017 7:50 AM

    Excellent. Thanks.

    Sometimes you need to use the function in a query, so that it needs to return INT instead of a set.
    This is a little amendment.

    CREATE FUNCTION [dbo].[fnWordCount] (@str VARCHAR(8000) )
    RETURNS INT AS
    BEGIN
    DECLARE @WordCount INT=0;
    WITH Tally (n) AS
    ( SELECT TOP (LEN(@str)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM(VALUES (0),(0),(0),(0),(0),(0),(0),(0)) a(n)
    CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) b(n)
    CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) c(n)
    CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d(n)
    ),
    BreakChar as
    (
    SELECT SUBSTRING(@str , n , 1) [Char] , N
    FROM Tally
    ),
    Analize as
    (
    SELECT [Char], LAG([Char],1) OVER (ORDER BY N) PrevChar
    FROM BreakChar
    )
    SELECT @WordCount = COUNT(1) + 1
    FROM Analize
    WHERE [Char] != PrevChar AND PrevChar = ' ';
    RETURN @WordCount;
    END

    Hi,

    Turning the function into a scalar function effects the performance, you can still use the function (inline) in a select statement using a cross apply clause and use any calculation you are trying to achieve.

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply