Soundex-Like functionality

  • Background

    I have inherited a system which has a table that stores CompanyName , and a column called ‘soundexe’; soundexe contains letters which are then compared to other company names using Like.

    CREATE TABLE [dbo].[Commission_Soundex] (

    [SoundID] [int] IDENTITY (1, 1) NOT NULL ,

    [Soundexe] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    [CompanyName] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    [LeftOn] [bit] NOT NULL CONSTRAINT [DF_Commission_Soundex_LeftOn] DEFAULT (1),

    [RightOn] [bit] NOT NULL CONSTRAINT [DF_Commission_Soundex_RightOn] DEFAULT (1),

    [NotLike] [bit] NOT NULL CONSTRAINT [DF_Commission_Soundex_BothOn] DEFAULT (1),

    [BothOff] [bit] NOT NULL CONSTRAINT [DF_Commission_Soundex_BothOff] DEFAULT (0),

    CONSTRAINT [PK_Commission_Soundex] PRIMARY KEY CLUSTERED

    Here are a few rows of the table for you to consider:

    SoundIDSoundexeCompanyNameLeftOnRightOnNotLikeBothOff

    1ABNABM Ambro Vancouver1100

    2ABNABN_AMRO1100

    3ACCENTUAccenture1100

    4AEGONAEGON0001

    5AG EDAG Edwards1100

    6AICAIC Limited0100

    7AIGAIG 20060100

    8TRIMAAIM TRIMARK1100

    9ALLIANCE CAAlliance Capital Management1100

    10ALLIANZAllianz0001

    11ALLSTATAllstate Financial1100

    The Problem and Question

    My problem is this – I need to update this table with hundreds of companies and update this ‘soundexe’ column also. But I can not find the algorithm which was used to create it.

    My hope is that someone will look at this data and tell me some wonderful function that would take the company name ‘AIM TRIMARK’ and convert it to ‘TRIMA’.

    Thanks so much!!!

    p.s. I have attached a spreadsheet of the table for those of you wanting more examples.

  • Uh, unless I don't quite understand your question, this is just a simple update statement.

    DECLARE @Commission_Soundex TABLE (

    [SoundID] [int] IDENTITY (1, 1) NOT NULL ,

    [Soundexe] [varchar] (50) NULL ,

    [CompanyName] [varchar] (50) NULL ,

    [LeftOn] [bit] NOT NULL DEFAULT (1),

    [RightOn] [bit] NOT NULL DEFAULT (1),

    [NotLike] [bit] NOT NULL DEFAULT (1),

    [BothOff] [bit] NOT NULL DEFAULT (0)

    )

    INSERT INTO @Commission_Soundex (SoundExe, CompanyName, LeftOn, RightOn, NotLike, BothOff)

    SELECT 'ABN', 'ABM Ambro Vancouver', 1, 1, 0, 0 UNION ALL

    SELECT 'ABN', 'ABN_AMRO', 1, 1, 0, 0 UNION ALL

    SELECT 'ACCENTU', 'Accenture', 1, 1, 0, 0 UNION ALL

    SELECT 'AEGON', 'AEGON', 0, 0, 0, 1 UNION ALL

    SELECT 'AG ED', 'AG Edwards', 1, 1, 0, 0 UNION ALL

    SELECT 'AIC', 'AIC Limited', 0, 1, 0, 0 UNION ALL

    SELECT 'AIG', 'AIG 2006', 0, 1, 0, 0 UNION ALL

    SELECT 'TRIMA', 'AIM TRIMARK', 1, 1, 0, 0 UNION ALL

    SELECT 'ALLIANCE CA', 'Alliance Capital Management', 1, 1, 0, 0 UNION ALL

    SELECT 'ALLIANZ', 'Allianz', 0, 0, 0, 1 UNION ALL

    SELECT 'ALLSTAT', 'Allstate Financial', 1, 1, 0, 0

    SELECT * FROM @Commission_Soundex

    UPDATE @Commission_Soundex

    SET CompanyName = 'TRIMA'

    WHERE CompanyName = 'AIM TRIMARK'

    SELECT * FROM @Commission_Soundex

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • Thanks, but I must not have explained myself well. I don't know what the algorithm is that converts the 'CompanyName' to 'Soundexe'. For example, CompanyName = 'Accenture', and Soundexe = 'ACCENTU'.

    At first I thought it was simply a 'soundex' function - but it is not. I really don't think that anyone could crack this, but I am hopeful!!!

  • What makes you think that the SoundExe value is derived from the CompanyName? What does the SoundExe values represent?

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • I think that the Soundexe column is derived from the company name because in a Stored Procedure used later, the following SQL is executed:

    SELECT DISTINCT C.COMPANYNAME, CS.SOUNDEXE, D.COMPANYNAME, D.SALESREPID, YEAR(ORDERDATE) AS YR

    FROM COMMISSION_SOUNDEX CS LEFT JOIN

    DISCOUNT D ON CS.COMPANYNAME=D.COMPANYNAME JOIN

    COMPANY C ON C.COMPANYNAME LIKE ('%' + CS.SOUNDEXE + '%') JOIN

    CUSTOMER CU ON C.COMPANYID=CU.COMPANYID LEFT JOIN

    ORDERS O ON CU.CUSTOMERID=O.CUSTOMERID

    WHERE CS.LEFTON=0 AND CS.RIGHTON=1

    AND ORDERDATE IS NOT NULL

    There are several iterations of this logic with the following changes:

    COMPANY C ON C.COMPANYNAME LIKE ('%' + CS.SOUNDEXE) JOIN

    WHERE CS.LEFTON=1 AND CS.RIGHTON=0

    and so on. (Notice the % is only on the Left side now, and the query is looking for 'lefton' = 1.)

    So, it is being used to find companies which are spelled similarly. The newly found 'companies' are placed in a temp table which must be reviewed by a human to determine if the newly found company is indeed a branch of the company. (All of this is used to calculate commissions.)

  • Yes, but this would appear to me to be more of a business rule. How does the data initially get inserted into the table?

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • I don't know how it was intially entered into the table. All I know is that the table has not been updated since January, 2007. I have searched through SQL Server to find references to the table to no avail. So, I really have no idea how is should be updated. It does appear to be either a mass load or a program that created the data however. There is actually a 'last_updated column in the database and they have the same datetime in them.

  • Without documentation, it's hard to say. It does appear that the SoundExe values are all comprised of the left most X characters of the name, although I cannot see a pattern as to how many values are used for each name. I don't think there is an algorithm that will reproduce the values.

    Since you cannot find code that inserts the data and it appears to be static, I would suggest finding all of the code that queries the tables and go through it until you have a very good understanding of how the data is used. Then use your knowledge of the data access rules to manually change the CompanyName and SoundExe values. This is the only way I can think of to ensure that once you change the values, your queries will still return the expected results.

    This is probably not the majic answer that you were looking for, but unless someone else chimes in here, I'm out of ideas. You should also find the guys who originally designed the system and pummel them upon the neck and head :D.

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • I would guess that some developer thought he could wire a better SoundEx and something was coded into a front end application that inserted the data. If I were you, I would search all of the source code repositories at your company for the string "soundexe" and hope that something comes back.

    There is also the chance the the data was imported from some source and this information came with it.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply