How to make Soundex functions include the first letter

  • I need DIFFERENCE('Kolton','Colton') to equal 4 rather than 3.

    But SOUNDEX keeps the first letter of the word:

    SOUNDEX('Kolton') = K435

    SOUNDEX('Colton') = C435

    Does anyone know a workaround?

    Thanks,

    Tom

  • trapnell (10/8/2015)


    I need DIFFERENCE('Kolton','Colton') to equal 4 rather than 3.

    But SOUNDEX keeps the first letter of the word:

    SOUNDEX('Kolton') = K435

    SOUNDEX('Colton') = C435

    Does anyone know a workaround?

    Thanks,

    Tom

    Not really sure what you are after here. SOUNDEX is a builtin function so you can't change how it works. Phonetically those two spellings should be the same thing.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Soundex has been around since 1918 and it's algorithm for letter substitution is documented here:

    https://en.wikipedia.org/wiki/Soundex

    Perhaps you can use REPLACE() function to massage the keywords before you do SOUNDEX() comparison.

    SOUNDEX(replace('Kolton','K','C')) = C435

    SOUNDEX(replace('Colton','K','C')) = C435

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • I'm not sure if this helps.

    I sure had fun replicating the SOUNDEX function which I then changed it into the DIFFERENCE function, all by keeping it as an inline table-valued function to help performance.

    CREATE FUNCTION iFullDifference(

    @String1 varchar(30),

    @String2 varchar(30)

    ) RETURNS TABLE WITH SCHEMABINDING

    AS

    RETURN

    WITH

    E(n) AS(

    SELECT n FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0))E(n)

    ),

    E2(n) AS(

    SELECT a.n FROM E a UNION ALL SELECT a.n FROM E a

    ),

    cteTally(n) AS(

    SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) n

    FROM E2

    ),

    cteNumbers AS(

    SELECT n,

    NULLIF(

    CASE WHEN SUBSTRING( @String1, n, 1) LIKE '[bfpv]' THEN 1

    WHEN SUBSTRING( @String1, n, 1) LIKE '[cgjkqsxz]' THEN 2

    WHEN SUBSTRING( @String1, n, 1) LIKE '[dt]' THEN 3

    WHEN SUBSTRING( @String1, n, 1) LIKE '[l]' THEN 4

    WHEN SUBSTRING( @String1, n, 1) LIKE '[mn]' THEN 5

    WHEN SUBSTRING( @String1, n, 1) LIKE '[r]' THEN 6

    END,

    CASE WHEN SUBSTRING( @String1, n - 1, 1) LIKE '[hw]'

    THEN

    CASE WHEN SUBSTRING( @String1, n - 2, 1) LIKE '[bfpv]' THEN 1

    WHEN SUBSTRING( @String1, n - 2, 1) LIKE '[cgjkqsxz]' THEN 2

    WHEN SUBSTRING( @String1, n - 2, 1) LIKE '[dt]' THEN 3

    WHEN SUBSTRING( @String1, n - 2, 1) LIKE '[l]' THEN 4

    WHEN SUBSTRING( @String1, n - 2, 1) LIKE '[mn]' THEN 5

    WHEN SUBSTRING( @String1, n - 2, 1) LIKE '[r]' THEN 6

    END

    ELSE

    CASE WHEN SUBSTRING( @String1, n - 1, 1) LIKE '[bfpv]' THEN 1

    WHEN SUBSTRING( @String1, n - 1, 1) LIKE '[cgjkqsxz]' THEN 2

    WHEN SUBSTRING( @String1, n - 1, 1) LIKE '[dt]' THEN 3

    WHEN SUBSTRING( @String1, n - 1, 1) LIKE '[l]' THEN 4

    WHEN SUBSTRING( @String1, n - 1, 1) LIKE '[mn]' THEN 5

    WHEN SUBSTRING( @String1, n - 1, 1) LIKE '[r]' THEN 6

    END

    END) num

    FROM cteTally

    WHERE SUBSTRING( @String1, n, 1) NOT LIKE '[aeiouyhw]'

    AND n <= LEN(@String1)

    ),

    cteNumbers2 AS(

    SELECT n,

    NULLIF(

    CASE WHEN SUBSTRING( @String2, n, 1) LIKE '[bfpv]' THEN 1

    WHEN SUBSTRING( @String2, n, 1) LIKE '[cgjkqsxz]' THEN 2

    WHEN SUBSTRING( @String2, n, 1) LIKE '[dt]' THEN 3

    WHEN SUBSTRING( @String2, n, 1) LIKE '[l]' THEN 4

    WHEN SUBSTRING( @String2, n, 1) LIKE '[mn]' THEN 5

    WHEN SUBSTRING( @String2, n, 1) LIKE '[r]' THEN 6

    END,

    CASE WHEN SUBSTRING( @String2, n - 1, 1) LIKE '[hw]'

    THEN

    CASE WHEN SUBSTRING( @String2, n - 2, 1) LIKE '[bfpv]' THEN 1

    WHEN SUBSTRING( @String2, n - 2, 1) LIKE '[cgjkqsxz]' THEN 2

    WHEN SUBSTRING( @String2, n - 2, 1) LIKE '[dt]' THEN 3

    WHEN SUBSTRING( @String2, n - 2, 1) LIKE '[l]' THEN 4

    WHEN SUBSTRING( @String2, n - 2, 1) LIKE '[mn]' THEN 5

    WHEN SUBSTRING( @String2, n - 2, 1) LIKE '[r]' THEN 6

    END

    ELSE

    CASE WHEN SUBSTRING( @String2, n - 1, 1) LIKE '[bfpv]' THEN 1

    WHEN SUBSTRING( @String2, n - 1, 1) LIKE '[cgjkqsxz]' THEN 2

    WHEN SUBSTRING( @String2, n - 1, 1) LIKE '[dt]' THEN 3

    WHEN SUBSTRING( @String2, n - 1, 1) LIKE '[l]' THEN 4

    WHEN SUBSTRING( @String2, n - 1, 1) LIKE '[mn]' THEN 5

    WHEN SUBSTRING( @String2, n - 1, 1) LIKE '[r]' THEN 6

    END

    END) num

    FROM cteTally

    WHERE SUBSTRING( @String2, n, 1) NOT LIKE '[aeiouyhw]'

    AND n <= LEN(@String2)

    )

    SELECT SUM( CASE WHEN ISNULL( a.num, 0) = ISNULL( b.num, 0) THEN 1 ELSE 0 END) Difference

    FROM cteTally t

    LEFT

    JOIN (

    SELECT TOP 4 ROW_NUMBER() OVER(ORDER BY n) n, num

    FROM cteNumbers

    WHERE num IS NOT NULL) a ON t.n = a.n

    LEFT

    JOIN (

    SELECT TOP 4 ROW_NUMBER() OVER(ORDER BY n) n, num

    FROM cteNumbers2

    WHERE num IS NOT NULL) b ON t.n = b.n

    WHERE t.n <= 4

    GO

    SELECT *, DIFFERENCE(String1, String2)

    FROM (VALUES( 'Kolton', 'Colton')) x(String1, String2) --This is sample data.

    CROSS APPLY iFullDifference(String1, String2)

    GO

    DROP FUNCTION iFullDifference

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Fascinating! You do that just for fun?

    There are several things in your code I've never seen before. For example, I don't understand why

    SELECT n FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0))E(n)

    works. I'm going to work through that code until I do understand.

    Thanks,

    Tom

  • trapnell (10/9/2015)


    Fascinating! You do that just for fun?

    There are several things in your code I've never seen before. For example, I don't understand why

    SELECT n FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0))E(n)

    works. I'm going to work through that code until I do understand.

    Thanks,

    Tom

    Some do jigsaw puzzles for fun, I do SQL puzzles. 😀

    That code works, starting on SQL Server 2008, as an implementation of Table Value Constructors. Basically, I'm concatenating "sets" of values, each set is a row and it can have several columns separated by commas between the parenthesis. In this case, it's a single column.

    With that, I'm just creating rows which will later be assigned a number to recreate a tally table[/url].

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • trapnell (10/9/2015)


    Fascinating! You do that just for fun?

    There are several things in your code I've never seen before. For example, I don't understand why

    SELECT n FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0))E(n)

    works. I'm going to work through that code until I do understand.

    Thanks,

    Tom

    That is using a table valued constructor. You defined a list of values, give it an alias and define the column name(s). You can read more about this kind of thing here. https://msdn.microsoft.com/en-us/library/dd776382.aspx

    For an example like Luis is using scroll down to the examples section and look at example C. 🙂

    --EDIT--

    LOL Luis has the fastest fingers on the keyboard ever. Seems you beat me to the punch yet again old friend. 😛

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply