Partial match & Update

  • I need update the AppID in table1 by comparing Table2. the data looks like below. Here the problem is AppNames in the both tables are not exactly same but technically they are same with partial match.

    Table1

    AppID||||Appname

    Null ||||CSO Services Operations

    Null |||| CSP

    Null ||||AAC Claims Systems

    Table 2

    AppID|||||Appname

    1 |||||Corporate Services Operations(CSO)

    2 |||||Credit Servicing Portal(CSP)

    3 |||||American Assuarance Company (AAC) Claims Systems

    Please let me know how to get rid of this task.

    thanks

  • you are going to have to cleanup the data, it looks to me;

    in your example, using charindex, you only get a single match.

    unless you can guarantee the three letter abbreviations won't return false matches ("COR" would find "Google Corporation", for example), i don't see anything to reliably join the data.

    /*

    AppIDAppnameAppIDAppname

    NullCSP2Credit Servicing Portal(CSP)

    */

    ;With Table1([AppID],[Appname])

    AS

    (

    SELECT 'Null','CSO Services Operations' UNION ALL

    SELECT 'Null','CSP' UNION ALL

    SELECT 'Null','AAC Claims Systems'

    ),Table2([AppID],[Appname])

    AS

    (

    SELECT '1','Corporate Services Operations(CSO)' UNION ALL

    SELECT '2','Credit Servicing Portal(CSP)' UNION ALL

    SELECT '3','American Assuarance Company (AAC) Claims Systems'

    )

    SELECT *

    FROM Table1

    INNER JOIN Table2

    ON CHARINDEX(Table1.Appname,Table2.Appname) > 0

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Hi Lowell,

    Thanks for your pormpt reply. But i do have 1000 records to macth in both tables. So, we have to put all the thousand values in the way you mentioned?? or do we have any other way to do it

    Please advise me

    Thanks

  • DBA12345 (7/3/2013)


    Hi Lowell,

    Thanks for your pormpt reply. But i do have 1000 records to macth in both tables. So, we have to put all the thousand values in the way you mentioned?? or do we have any other way to do it

    Please advise me

    Thanks

    well the obvious fix is hard work. you've inherited an app that didn't normalize these fields, and now you need to sanitize and normalize it after the fact...it's not fun.

    1. you have to create a master table for the possible values of AppName, with an identity primary key/unique constraint on Appname., making sure to add an acronym for versatility matching.

    my wild guess:

    CREATE TABLE [dbo].[APPNAMELOOKUP] (

    [APPNAMEID] INT IDENTITY(1,1) NOT NULL,

    APPACRONYM VARCHAR(3),

    [APPNAME] VARCHAR(100) NOT NULL,

    CONSTRAINT [PK__AppNameLookup__11C2C212] PRIMARY KEY CLUSTERED (AppNameID),

    CONSTRAINT [UQ__AppNameLookup__12B6E64B] UNIQUE NONCLUSTERED (AppName))

    2. you have to go through all the descriptions and select only ONE per "real" appname.

    3.Whatever application that allows users to freetext an appName must now be modified to select the value from a drop down menu instead.

    4. Your existing tables now need new column , which now point to the new table as a foreign key, instead of having descriptions

    5. that means a migration by analysis to update the new column to the "correct" value for the Designated Appname. This would probably be a series of scripts, That you custom write yourself, that updates based on comparing charindex like above, then trying for partial matches, and then by 3 letter acronyms, and then finally, for any that did not get caught in the above scripts, a stack of manually assigning records based on eyeball analysis.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • I'm with Lowell on cleaning up the data.

    However, this might help you, but you still have to review the matches given.

    ;With Table1([AppID],[Appname])

    AS

    (

    SELECT 'Null','CSO Services Operations' UNION ALL

    SELECT 'Null','CSP' UNION ALL

    SELECT 'Null','AAC Claims Systems'

    ),Table2([AppID],[Appname])

    AS

    (

    SELECT '1','Corporate Services Operations(CSO)' UNION ALL

    SELECT '2','Credit Servicing Portal(CSP)' UNION ALL

    SELECT '3','American Assuarance Company (AAC) Claims Systems'

    )

    SELECT *

    FROM Table1

    JOIN Table2 ON Table1.Appname LIKE SUBSTRING( Table2.Appname, CHARINDEX( '(',Table2.Appname) + 1, CHARINDEX( ')',Table2.Appname) - CHARINDEX( '(',Table2.Appname) - 1) + '%'

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • DBA12345 (7/3/2013)


    I need update the AppID in table1 by comparing Table2. the data looks like below. Here the problem is AppNames in the both tables are not exactly same but technically they are same with partial match.

    Table1

    AppID||||Appname

    Null ||||CSO Services Operations

    Null |||| CSP

    Null ||||AAC Claims Systems

    Table 2

    AppID|||||Appname

    1 |||||Corporate Services Operations(CSO)

    2 |||||Credit Servicing Portal(CSP)

    3 |||||American Assuarance Company (AAC) Claims Systems

    Please let me know how to get rid of this task.

    thanks

    Are all of the AppNames in Table1 absolutely guaranteed to contain the abbreviation contained in the parenthesis of the AppNames in Table2 and are all the AppNames in Table2 absolutely guaranteed to contain the abbreviation in parenthesis?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Lot's of assumptions here including all of Jeff's, Lowell's that you don't have an abbreviation like COR that matches to Corporation and that also there are not extra parens in Table2, but here's a shot at it:

    WITH Table1([AppID],[Appname]) AS (

    SELECT 'Null','CSO Services Operations' UNION ALL

    SELECT 'Null',' CSP' UNION ALL

    SELECT 'Null','AAC Claims Systems'

    )

    ,Table2([AppID],[Appname]) AS (

    SELECT '1','Corporate Services Operations(CSO)' UNION ALL

    SELECT '2','Credit Servicing Portal(CSP)' UNION ALL

    SELECT '3','American Assuarance Company (AAC) Claims Systems'

    )

    SELECT a.*, e.*

    FROM Table2 a

    CROSS APPLY dbo.PatternSplitCM(a.Appname, '[()A-Za-z]') b

    CROSS APPLY (SELECT RIGHT(Item, LEN(Item)-CHARINDEX('(', Item))) c(Item2)

    CROSS APPLY (SELECT '%' + STUFF(Item2, LEN(Item2), 1, '') + '%') d(Item3)

    INNER JOIN Table1 e ON e.AppName LIKE Item3

    WHERE [Matched]=1 AND CHARINDEX('(', Item) > 0 AND CHARINDEX(')', Item) > 0

    PatternSplitCM can be found in the 4th article in my signature links.

    I wouldn't want to be responsible for this running in Prod though without a whole lot of real thorough testing through all the possible cases. Still, it might be possible to make it work if you could apply additional assumptions like that the abbreviation always starts in the first non-blank character of Table1.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply