String Comparison function

  • Hi All,

    I am trying to write a function to compare the characters between 2 strings and eliminate the similarities to be able to return at the end the number of differences between them.

    Having in mind i need the bigger number of differences to be returned also if a character is repeated in one of the 2 words it will be eliminated once because it exist only one time in other string.

    I will give an example below to be more clear

    --Start

    declare @string1 as varchar(50)='imos'

    declare @string2 as varchar(50)='nasos';

    WITH n (n) AS (

    SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (n)

    ), Tally (n) AS (

    SELECT TOP (SELECT MAX(LEN(x.String)) FROM (VALUES (@String1), (@String2)) x (String))

    ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM

    n n1, n n2, n n3

    ), Alphas AS (

    SELECT a.Letter FROM (VALUES ('a'),('b'),('c'),('d'),('e'),('f'),('g'),('h'),('i'),('j'),('k'),

    ('l'),('m'),('n'),('o'),('p'),('q'),('r'),('s'),('t'),('u'),('v'),('w'),('x'),('y'),('z')) a (Letter)

    ), Split1 (Letter) AS (

    SELECT

    SUBSTRING(@String1, t.n, 1)

    FROM

    Tally t

    WHERE

    SUBSTRING(@String1, t.n, 1) IN (SELECT a.Letter FROM Alphas a)

    ), split2 (Letter) AS (

    SELECT

    SUBSTRING(@String2, t.n, 1)

    FROM

    Tally t

    WHERE

    SUBSTRING(@String2, t.n, 1) IN (SELECT a.Letter FROM Alphas a))

    select Letter,1 from Split1 st1 union all select letter,2 from split2 st2

    --End

    The differences in first string from second one are 2 (i,m) while the differences in second string from first one are 3(nas).

    So the function should return 3 in previous example.

    Thanks in advance

    Nader

  • Here's one, but it's not quick and I can't think of a way to make it quick, but maybe someone else can.

    CREATE FUNCTION FindDifferenceCount(@string1 AS VARCHAR(50), @string2 AS VARCHAR(50))

    RETURNS TABLE

    WITH SCHEMABINDING

    AS

    RETURN(

    WITH

    N(N) AS (

    SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1)) N (N)

    ) -- 8 rows

    , TALLY (N) AS (

    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM N N1, N N2

    ) -- 64 rows (input strings are 50, so 64 is more than enough)

    , LETTERS1(L,P) AS (

    SELECT SUBSTRING(@string1, T.N, 1), ROW_NUMBER() OVER(PARTITION BY SUBSTRING(@string1, T.N, 1) ORDER BY (SELECT NULL))

    FROM TALLY T

    WHERE SUBSTRING(@string1, T.N, 1) LIKE '[a-zA-Z]' COLLATE Latin1_General_BIN

    ) -- Select only the letters and an ordinal from @string1

    , LETTERS2(L,P) AS (

    SELECT SUBSTRING(@string2, T.N, 1), ROW_NUMBER() OVER(PARTITION BY SUBSTRING(@string2, T.N, 1) ORDER BY (SELECT NULL))

    FROM TALLY T

    WHERE SUBSTRING(@string2, T.N, 1) LIKE '[a-zA-Z]' COLLATE Latin1_General_BIN

    )

    SELECT DISTINCT

    CASE -- Find the largest number of differences

    WHEN COUNT(R1.L) OVER() > COUNT(R2.L) OVER() THEN COUNT(R1.L) OVER()

    ELSE COUNT(R2.L) OVER()

    END AS DIFFERENCES

    FROM LETTERS1 R1

    FULL OUTER JOIN LETTERS2 R2

    ON R2.L = R1.L

    AND R2.P = R1.P

    WHERE R1.L IS NULL OR R2.L IS NULL -- We only care if there is no match

    );

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • Thank you very much mister.magoo.

    Your solution is great, the problem i had in mine is when a letter was duplicated in one of the words which is S in my example, it gets eliminated because it existed once in other string, while what i needed is to count it.

    Not sure how did you solve that in your solution.

    Thanks Nader

  • To account for repeated letters, I use row_number() partitioned by letter to assign a "positional" value.

    So, for the word "test", the results of CTE "LETTERS1" would be

    L P

    e 1

    s 1

    t 1

    t 2 <-- this is the second "t"

    Then when I match to the other string, I do the join on "L" and "P".

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • mister.magoo (10/14/2015)


    To account for repeated letters, I use row_number() partitioned by letter to assign a "positional" value.

    So, for the word "test", the results of CTE "LETTERS1" would be

    L P

    e 1

    s 1

    t 1

    t 2 <-- this is the second "t"

    Then when I match to the other string, I do the join on "L" and "P".

    Great thank you very much

  • Viewing 5 posts - 1 through 4 (of 4 total)

    You must be logged in to reply to this topic. Login to reply