Comparing tables for Like values

  • Hello,

    First post on this site; and its been a great tool for me, but I havent been able to find exactly what Im looking for so Im posing on the forums! Hope someone can lend a hand:

    I have a table of raw data with supplier names, and i need to join it to our supplier database and pull the supplier numbers.

    The issue is that the raw data does not match our database entries for these suppliers; sometimes there are extra periods, commas, or abbreviations (i.e. FedEx, FederalExpress, FedEx, inc.) etc. I'm trying to create a query that will search for entries that are similar. I tried setting a variable to be equal to the raw data field, and then using a LIKE '%@Variable%' to try and return anything that would contain it, but it didnt return any rows.

    Any help would be greatly appreciated!

    Thanks!

  • slevin37 (7/17/2015)


    Hello,

    First post on this site; and its been a great tool for me, but I havent been able to find exactly what Im looking for so Im posing on the forums! Hope someone can lend a hand:

    I have a table of raw data with supplier names, and i need to join it to our supplier database and pull the supplier numbers.

    The issue is that the raw data does not match our database entries for these suppliers; sometimes there are extra periods, commas, or abbreviations (i.e. FedEx, FederalExpress, FedEx, inc.) etc. I'm trying to create a query that will search for entries that are similar. I tried setting a variable to be equal to the raw data field, and then using a LIKE '%@Variable%' to try and return anything that would contain it, but it didnt return any rows.

    Any help would be greatly appreciated!

    Thanks!

    Hi and welcome to the forums! It is not totally clear from your post what you are trying but I think you are trying to do a wildcard search with a variable? When you type your variable name the way you did it will try to find rows that string literal, not the value of the variable. You would need to do something like this:

    '%' + @Variable + '%'

    Unless you really need to use that leading wildcard it would be better if you don't. That renders your query nonSARGable and any indexes on that column would be rendered useless.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • A couple of things that may help.

    First do a Google search for "Fuzzy SQL Match". Some of the results might be helpful.

    Second, I've have to do something similar before. I sometimes made use of the SOUNDEX when comparing names, as well as as replacing all punctuation characters with nothing ('').



    Alvin Ramard
    Memphis PASS Chapter[/url]

    All my SSC forum answers come with a money back guarantee. If you didn't like the answer then I'll gladly refund what you paid for it.

    For best practices on asking questions, please read the following article: Forum Etiquette: How to post data/code on a forum to get the best help[/url]

  • EDIT*

  • Sean Lange (7/17/2015)


    slevin37 (7/17/2015)


    Hello,

    First post on this site; and its been a great tool for me, but I havent been able to find exactly what Im looking for so Im posing on the forums! Hope someone can lend a hand:

    I have a table of raw data with supplier names, and i need to join it to our supplier database and pull the supplier numbers.

    The issue is that the raw data does not match our database entries for these suppliers; sometimes there are extra periods, commas, or abbreviations (i.e. FedEx, FederalExpress, FedEx, inc.) etc. I'm trying to create a query that will search for entries that are similar. I tried setting a variable to be equal to the raw data field, and then using a LIKE '%@Variable%' to try and return anything that would contain it, but it didnt return any rows.

    Any help would be greatly appreciated!

    Thanks!

    Hi and welcome to the forums! It is not totally clear from your post what you are trying but I think you are trying to do a wildcard search with a variable? When you type your variable name the way you did it will try to find rows that string literal, not the value of the variable. You would need to do something like this:

    '%' + @Variable + '%'

    Unless you really need to use that leading wildcard it would be better if you don't. That renders your query nonSARGable and any indexes on that column would be rendered useless.

    Thanks! I tried your suggestion but got an error that the subquery returned more than one value. I tried selecting distinct but still to no avail. I know its probably impossible to really pinpoint any errors without physically seeing the database, but do you have any suggestions as to what may be throwing it off?

  • slevin37 (7/17/2015)


    Alvin Ramard (7/17/2015)


    A couple of things that may help.

    First do a Google search for "Fuzzy SQL Match". Some of the results might be helpful.

    Second, I've have to do something similar before. I sometimes made use of the SOUNDEX when comparing names, as well as as replacing all punctuation characters with nothing ('').

    Thanks! I tried your suggestion but got an error that the subquery returned more than one value. I tried selecting distinct but still to no avail. I know its probably impossible to really pinpoint any errors without physically seeing the database, but do you have any suggestions as to what may be throwing it off?

    Can you share the query? Without at least that we are shooting clay pigeons in the dark.

    From the error I suspect there is a logic error in your query because the error message is very clear that you have a subquery that is returning multiple rows when it should be returning only 1.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • I should have said that the ideas I mentioned are not intended to be used by themselves, but can be used to find items that are potential matches. Soundex will often match words that sound similar but are definitely diffferent.



    Alvin Ramard
    Memphis PASS Chapter[/url]

    All my SSC forum answers come with a money back guarantee. If you didn't like the answer then I'll gladly refund what you paid for it.

    For best practices on asking questions, please read the following article: Forum Etiquette: How to post data/code on a forum to get the best help[/url]

  • Sean Lange (7/17/2015)


    slevin37 (7/17/2015)


    Alvin Ramard (7/17/2015)


    A couple of things that may help.

    First do a Google search for "Fuzzy SQL Match". Some of the results might be helpful.

    Second, I've have to do something similar before. I sometimes made use of the SOUNDEX when comparing names, as well as as replacing all punctuation characters with nothing ('').

    Thanks! I tried your suggestion but got an error that the subquery returned more than one value. I tried selecting distinct but still to no avail. I know its probably impossible to really pinpoint any errors without physically seeing the database, but do you have any suggestions as to what may be throwing it off?

    Can you share the query? Without at least that we are shooting clay pigeons in the dark.

    From the error I suspect there is a logic error in your query because the error message is very clear that you have a subquery that is returning multiple rows when it should be returning only 1.

    Sure thing. For now, all i'm trying to do is select the fields from both to compare them and see how much different the spellings are:

    Declare @supplier varchar(255)

    Set @Supplier = (Select vendor_name from [suppliers])

    Select mapped_supplier_name, supp.vendor_name from [TBL_RAWDATA] left join [suppliers] as supp on supp.[vendor_name] = [mapped_supplier_name] where Mapped_Supplier_Name LIKE @Supplier + '%'

  • slevin37 (7/17/2015)


    Sure thing. For now, all i'm trying to do is select the fields from both to compare them and see how much different the spellings are:

    Declare @supplier varchar(255)

    Set @Supplier = (Select vendor_name from [suppliers])

    Select mapped_supplier_name, supp.vendor_name from [TBL_RAWDATA] left join [suppliers] as supp on supp.[vendor_name] = [mapped_supplier_name] where Mapped_Supplier_Name LIKE @Supplier + '%'

    Ahh here we go. The problem is the subquery. It will return every vendor_name from the suppliers table. Since you did this as a subquery it will fail because the subquery returns more than 1 row. How would it know which one to use?

    I like Alvin's recommendation of SOUNDEX. Since this is just a one off query for analysis we don't have to beat ourselves up for performance. Something like this might get you close.

    select *

    from suppliers

    where SOUNDEX(vendor_name) in

    (

    Select SOUNDEX(vendor_name)

    from [suppliers]

    group by SOUNDEX(vendor_name)

    having COUNT(*) > 1

    )

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean Lange (7/17/2015)


    slevin37 (7/17/2015)


    Sure thing. For now, all i'm trying to do is select the fields from both to compare them and see how much different the spellings are:

    Declare @supplier varchar(255)

    Set @Supplier = (Select vendor_name from [suppliers])

    Select mapped_supplier_name, supp.vendor_name from [TBL_RAWDATA] left join [suppliers] as supp on supp.[vendor_name] = [mapped_supplier_name] where Mapped_Supplier_Name LIKE @Supplier + '%'

    Ahh here we go. The problem is the subquery. It will return every vendor_name from the suppliers table. Since you did this as a subquery it will fail because the subquery returns more than 1 row. How would it know which one to use?

    I like Alvin's recommendation of SOUNDEX. Since this is just a one off query for analysis we don't have to beat ourselves up for performance. Something like this might get you close.

    select *

    from suppliers

    where SOUNDEX(vendor_name) in

    (

    Select SOUNDEX(vendor_name)

    from [suppliers]

    group by SOUNDEX(vendor_name)

    having COUNT(*) > 1

    )

    where would the join for the other table come into this?

    edit* - I added in the join, but this is still giving me exact matches only. I know there are records that are similar to some of these, but only my exact matches are showing up in the query:

    select vendor_name, [TBL_RAWDATA].Mapped_Supplier_Name

    from suppliers left join [TBL_RAWDATA] on vendor_name = Mapped_Supplier_Name

    where SOUNDEX(vendor_name) in

    (

    Select SOUNDEX(mapped_supplier_name)

    from [TBL_RAWDATA]

    group by SOUNDEX(Mapped_Supplier_Name)

    having COUNT(*) > 1

    )

  • slevin37 (7/17/2015)


    Sean Lange (7/17/2015)


    slevin37 (7/17/2015)


    Sure thing. For now, all i'm trying to do is select the fields from both to compare them and see how much different the spellings are:

    Declare @supplier varchar(255)

    Set @Supplier = (Select vendor_name from [suppliers])

    Select mapped_supplier_name, supp.vendor_name from [TBL_RAWDATA] left join [suppliers] as supp on supp.[vendor_name] = [mapped_supplier_name] where Mapped_Supplier_Name LIKE @Supplier + '%'

    Ahh here we go. The problem is the subquery. It will return every vendor_name from the suppliers table. Since you did this as a subquery it will fail because the subquery returns more than 1 row. How would it know which one to use?

    I like Alvin's recommendation of SOUNDEX. Since this is just a one off query for analysis we don't have to beat ourselves up for performance. Something like this might get you close.

    select *

    from suppliers

    where SOUNDEX(vendor_name) in

    (

    Select SOUNDEX(vendor_name)

    from [suppliers]

    group by SOUNDEX(vendor_name)

    having COUNT(*) > 1

    )

    where would the join for the other table come into this?

    Well now I am a little confused. I thought you were looking for similar spellings of vendor in the suppliers table. Are you trying to join these tables on similar spellings? If so you could use SOUNDEX in your join, but the performance is going to suffer. If I really understood what you are trying to do I could be more helpful. I am still mostly guessing at what you want.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sorry about that; Let me try to explain the whole purpose:

    We have a raw data file with mapped supplier names. These "normalized" mapped names are not exact matches to our supplier table, so i need to see a comparison of the supplier names (from the supplier table) and how the mapped supplier names compare. So essentially, I need to pull data from the two different tables and compare them on likeness. Make sense?

  • slevin37 (7/17/2015)


    Sorry about that; Let me try to explain the whole purpose:

    We have a raw data file with mapped supplier names. These "normalized" mapped names are not exact matches to our supplier table, so i need to see a comparison of the supplier names (from the supplier table) and how the mapped supplier names compare. So essentially, I need to pull data from the two different tables and compare them on likeness. Make sense?

    No problem. Explaining technical problems is not easy. Something like this maybe?

    Select mapped_supplier_name

    , supp.vendor_name

    from [TBL_RAWDATA]

    join [suppliers] as supp on SOUNDEX(supp.[vendor_name]) = SOUNDEX([mapped_supplier_name])

    where Mapped_Supplier_Name <> supp.[vendor_name]

    That is my last guess. If that doesn't do it then you need to post ddl and sample data along with the desired output so we aren't guessing anymore. Please take a few minutes and read the first link in my signature for best practices when posting questions.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean Lange (7/17/2015)


    slevin37 (7/17/2015)


    Sorry about that; Let me try to explain the whole purpose:

    We have a raw data file with mapped supplier names. These "normalized" mapped names are not exact matches to our supplier table, so i need to see a comparison of the supplier names (from the supplier table) and how the mapped supplier names compare. So essentially, I need to pull data from the two different tables and compare them on likeness. Make sense?

    No problem. Explaining technical problems is not easy. Something like this maybe?

    Select mapped_supplier_name

    , supp.vendor_name

    from [TBL_RAWDATA]

    join [suppliers] as supp on SOUNDEX(supp.[vendor_name]) = SOUNDEX([mapped_supplier_name])

    where Mapped_Supplier_Name <> supp.[vendor_name]

    That is my last guess. If that doesn't do it then you need to post ddl and sample data along with the desired output so we aren't guessing anymore. Please take a few minutes and read the first link in my signature for best practices when posting questions.

    Thanks, and I definitely will do! Thanks for all the help regardless!

Viewing 14 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply