Comparing strings

  • how to compare a column from one table with two columns in other table keeping the primary key same in both the tables?

    After comparing if the two columns in the table are 70% like the column in the first table, the function must return success else false should be returned.

    Please give any suggestions.

    Thanks

  • It's hard to help based on the information provided so far.

    Please read the first link in my signature and provide some sample data (together with table def and expected results) in a ready to use format.



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • See table 1 has

    id,

    firstname,

    lastname,

    transactionamount

    table2 has

    id,

    fullname,

    transactionamount

    How to compare the firstname,lastname with fullname

    And after comaprision check whether the full name is 70% of the the both firrstname and lastname.

    Example:

    firstname: William

    lastname: John

    Fullname: Wil John

    After comaprision, full name is not 70% of the firstname+lastname,

    When this triggers, alert should be fired.

    Please suggest.

    Thanks.

  • Based on what formula should the percentage value be calculated?



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Beginner_2008 (11/5/2011)


    See table 1 has

    id,

    firstname,

    lastname,

    transactionamount

    table2 has

    id,

    fullname,

    transactionamount

    How to compare the firstname,lastname with fullname

    And after comaprision check whether the full name is 70% of the the both firrstname and lastname.

    Example:

    firstname: William

    lastname: John

    Fullname: Wil John

    After comaprision, full name is not 70% of the firstname+lastname,

    When this triggers, alert should be fired.

    Please suggest.

    Thanks.

    Ok... so is Wil John supposed to be a match for William John or not? Including spaces, there are 8 characters in Wil John and 12 characters in William John and that's only a 66.6666% match.

    As already stated, we don't know what you mean by 70%. What are the full set of rules for determining the match?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Beginner_2008 (11/5/2011)


    how to compare a column from one table with two columns in other table keeping the primary key same in both the tables?

    After comparing if the two columns in the table are 70% like the column in the first table, the function must return success else false should be returned.

    Please give any suggestions.

    Thanks

    70% is a figure plucked out of the air, correct?

    This so-called "fuzzy matching" is a mix of art and science. It's complicated and difficult. To do it with any degree of accuracy requires a number of algorithms. Each will have its own match percentage.

    Your best bet is to buy an off the shelf package. Failing that, SSIS has a fuzzy matching block, Google it. There are a number of algorithms which you can pick up as CLR-ready code such as Jaro-Winkler. In practice I've found that custom building yields better and often faster results but it does take a while to write. If you really wish to go down this route then token-matching is a good place to start: http://www.sqlservercentral.com/Forums/FindPost605062.aspx.

    Edit: You will need more than the name alone if you are working with larger data sets. Zip code is good, matched address better. The names table I'm currently working with has several thousand Smiths, of which perhaps two hundred are W Smith. The full address, matched to a certified standard, is required for this.


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • This should get you started:

    -- Create some sample data to play with

    DROP TABLE #Fullname

    CREATE TABLE #Fullname (ID int identity(1,1), Fullname VARCHAR(50))

    INSERT INTO #Fullname (fullname)

    SELECT 'Phill R Williams' UNION ALL

    SELECT 'P Smith' UNION ALL

    SELECT 'William Johns' UNION ALL

    SELECT 'Sam Brown'

    DROP TABLE #Splitname

    CREATE TABLE #Splitname (ID int identity(1,1), Firstname VARCHAR(50), Lastname VARCHAR(50))

    INSERT INTO #Splitname (Firstname, Lastname)

    SELECT 'Phil', 'Williams' UNION ALL

    SELECT 'Peter', 'Smith' UNION ALL

    SELECT 'Will', 'Johns' UNION ALL

    SELECT 'Sam', 'Browne'

    SELECT

    r.*,

    t.Fullname,

    m2.*,

    [Match%] = CAST((m2.FirstnameMatchPC + m2.LastnameMatchPC)/2.00 AS DECIMAL(5,2))

    FROM #Splitname r

    INNER JOIN #Fullname t ON t.ID = r.ID

    -- find the position of the rightmost space in the surname

    -- this is assumed to be the space between forename(s) and surname

    -- use it to split the fullname string into firstname and lastname

    CROSS APPLY(

    SELECT

    Firstname = LEFT(fullname,Split.Position-1),

    Lastname = SUBSTRING(fullname,Split.Position+1,(DATALENGTH(fullname)-Split.Position+1))

    FROM (

    SELECT

    Position = MAX(n)

    FROM ( -- create sequential list of numbers where largest = length of t.fullname

    SELECT n = t1.n+t2.n

    FROM (VALUES (0),(10),(20),(30),(40),(50),(60),(70),(80),(90)) t1(n)

    CROSS JOIN (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) t2(n)

    WHERE t1.n+t2.n <= (ISNULL(DATALENGTH(t.fullname),0))

    ) tally

    WHERE SUBSTRING(t.fullname,n,1) = ' '

    ) Split

    ) t2

    -- attempt to match at the simplest level

    CROSS APPLY (

    SELECT

    FirstnameMatchPC = CASE

    WHEN t2.Firstname = r.Firstname THEN 100

    WHEN (len(t2.Firstname) = 1 OR len(r.Firstname) = 1) AND LEFT(t2.Firstname,1) = LEFT(r.Firstname,1) THEN 20

    WHEN (t2.Firstname LIKE r.Firstname + '%' OR r.Firstname LIKE t2.Firstname + '%') THEN 50

    ELSE 0 END,

    LastnameMatchPC = CASE WHEN t2.Lastname = r.Lastname THEN 100 ELSE 0 END

    ) m1

    -- when simple matching fails, get the hammer out

    CROSS APPLY (

    SELECT

    FirstnameMatchPC = CASE

    WHEN m1.FirstnameMatchPC > 0 THEN m1.FirstnameMatchPC -- carry forward the result from previous step / only use token matching where necessary

    ELSE (

    SELECT MatchPC = CAST(100*(COUNT(*) / (LEN(r.Firstname)-2.00)) AS DECIMAL(5,2))

    FROM (

    SELECT n = t1.n+t2.n

    FROM (VALUES (0),(10),(20),(30),(40),(50),(60),(70),(80),(90)) t1(n)

    CROSS JOIN (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) t2(n)

    WHERE t1.n+t2.n <= (ISNULL(DATALENGTH(r.Firstname),0))

    ) Tally

    WHERE CHARINDEX(SUBSTRING(r.Firstname, Tally.n, 3), t2.Firstname) > 0

    AND LEN(SUBSTRING(r.Firstname, Tally.n, 3)) = 3

    )

    END,

    LastnameMatchPC = CASE

    WHEN m1.LastnameMatchPC > 0 THEN m1.LastnameMatchPC

    ELSE (

    SELECT MatchPC = CAST(100*(COUNT(*) / (LEN(r.Lastname)-2.00)) AS DECIMAL(5,2))

    FROM (

    SELECT n = t1.n+t2.n

    FROM (VALUES (0),(10),(20),(30),(40),(50),(60),(70),(80),(90)) t1(n)

    CROSS JOIN (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) t2(n)

    WHERE t1.n+t2.n <= (ISNULL(DATALENGTH(r.Lastname),0))

    ) Tally

    WHERE CHARINDEX(SUBSTRING(r.Lastname, Tally.n, 3), t2.Lastname) > 0

    AND LEN(SUBSTRING(r.Lastname, Tally.n, 3)) = 3

    )

    END

    ) m2


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply