June 11, 2008 at 3:11 pm
Background
I have inherited a system which has a table that stores CompanyName , and a column called ‘soundexe’; soundexe contains letters which are then compared to other company names using Like.
CREATE TABLE [dbo].[Commission_Soundex] (
[SoundID] [int] IDENTITY (1, 1) NOT NULL ,
[Soundexe] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[CompanyName] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[LeftOn] [bit] NOT NULL CONSTRAINT [DF_Commission_Soundex_LeftOn] DEFAULT (1),
[RightOn] [bit] NOT NULL CONSTRAINT [DF_Commission_Soundex_RightOn] DEFAULT (1),
[NotLike] [bit] NOT NULL CONSTRAINT [DF_Commission_Soundex_BothOn] DEFAULT (1),
[BothOff] [bit] NOT NULL CONSTRAINT [DF_Commission_Soundex_BothOff] DEFAULT (0),
CONSTRAINT [PK_Commission_Soundex] PRIMARY KEY CLUSTERED
Here are a few rows of the table for you to consider:
SoundIDSoundexeCompanyNameLeftOnRightOnNotLikeBothOff
1ABNABM Ambro Vancouver1100
2ABNABN_AMRO1100
3ACCENTUAccenture1100
4AEGONAEGON0001
5AG EDAG Edwards1100
6AICAIC Limited0100
7AIGAIG 20060100
8TRIMAAIM TRIMARK1100
9ALLIANCE CAAlliance Capital Management1100
10ALLIANZAllianz0001
11ALLSTATAllstate Financial1100
The Problem and Question
My problem is this – I need to update this table with hundreds of companies and update this ‘soundexe’ column also. But I can not find the algorithm which was used to create it.
My hope is that someone will look at this data and tell me some wonderful function that would take the company name ‘AIM TRIMARK’ and convert it to ‘TRIMA’.
Thanks so much!!!
p.s. I have attached a spreadsheet of the table for those of you wanting more examples.
June 11, 2008 at 3:21 pm
Uh, unless I don't quite understand your question, this is just a simple update statement.
DECLARE @Commission_Soundex TABLE (
[SoundID] [int] IDENTITY (1, 1) NOT NULL ,
[Soundexe] [varchar] (50) NULL ,
[CompanyName] [varchar] (50) NULL ,
[LeftOn] [bit] NOT NULL DEFAULT (1),
[RightOn] [bit] NOT NULL DEFAULT (1),
[NotLike] [bit] NOT NULL DEFAULT (1),
[BothOff] [bit] NOT NULL DEFAULT (0)
)
INSERT INTO @Commission_Soundex (SoundExe, CompanyName, LeftOn, RightOn, NotLike, BothOff)
SELECT 'ABN', 'ABM Ambro Vancouver', 1, 1, 0, 0 UNION ALL
SELECT 'ABN', 'ABN_AMRO', 1, 1, 0, 0 UNION ALL
SELECT 'ACCENTU', 'Accenture', 1, 1, 0, 0 UNION ALL
SELECT 'AEGON', 'AEGON', 0, 0, 0, 1 UNION ALL
SELECT 'AG ED', 'AG Edwards', 1, 1, 0, 0 UNION ALL
SELECT 'AIC', 'AIC Limited', 0, 1, 0, 0 UNION ALL
SELECT 'AIG', 'AIG 2006', 0, 1, 0, 0 UNION ALL
SELECT 'TRIMA', 'AIM TRIMARK', 1, 1, 0, 0 UNION ALL
SELECT 'ALLIANCE CA', 'Alliance Capital Management', 1, 1, 0, 0 UNION ALL
SELECT 'ALLIANZ', 'Allianz', 0, 0, 0, 1 UNION ALL
SELECT 'ALLSTAT', 'Allstate Financial', 1, 1, 0, 0
SELECT * FROM @Commission_Soundex
UPDATE @Commission_Soundex
SET CompanyName = 'TRIMA'
WHERE CompanyName = 'AIM TRIMARK'
SELECT * FROM @Commission_Soundex
June 11, 2008 at 3:29 pm
Thanks, but I must not have explained myself well. I don't know what the algorithm is that converts the 'CompanyName' to 'Soundexe'. For example, CompanyName = 'Accenture', and Soundexe = 'ACCENTU'.
At first I thought it was simply a 'soundex' function - but it is not. I really don't think that anyone could crack this, but I am hopeful!!!
June 11, 2008 at 3:36 pm
What makes you think that the SoundExe value is derived from the CompanyName? What does the SoundExe values represent?
June 11, 2008 at 3:50 pm
I think that the Soundexe column is derived from the company name because in a Stored Procedure used later, the following SQL is executed:
SELECT DISTINCT C.COMPANYNAME, CS.SOUNDEXE, D.COMPANYNAME, D.SALESREPID, YEAR(ORDERDATE) AS YR
FROM COMMISSION_SOUNDEX CS LEFT JOIN
DISCOUNT D ON CS.COMPANYNAME=D.COMPANYNAME JOIN
COMPANY C ON C.COMPANYNAME LIKE ('%' + CS.SOUNDEXE + '%') JOIN
CUSTOMER CU ON C.COMPANYID=CU.COMPANYID LEFT JOIN
ORDERS O ON CU.CUSTOMERID=O.CUSTOMERID
WHERE CS.LEFTON=0 AND CS.RIGHTON=1
AND ORDERDATE IS NOT NULL
There are several iterations of this logic with the following changes:
COMPANY C ON C.COMPANYNAME LIKE ('%' + CS.SOUNDEXE) JOIN
WHERE CS.LEFTON=1 AND CS.RIGHTON=0
and so on. (Notice the % is only on the Left side now, and the query is looking for 'lefton' = 1.)
So, it is being used to find companies which are spelled similarly. The newly found 'companies' are placed in a temp table which must be reviewed by a human to determine if the newly found company is indeed a branch of the company. (All of this is used to calculate commissions.)
June 11, 2008 at 3:57 pm
Yes, but this would appear to me to be more of a business rule. How does the data initially get inserted into the table?
June 11, 2008 at 4:04 pm
I don't know how it was intially entered into the table. All I know is that the table has not been updated since January, 2007. I have searched through SQL Server to find references to the table to no avail. So, I really have no idea how is should be updated. It does appear to be either a mass load or a program that created the data however. There is actually a 'last_updated column in the database and they have the same datetime in them.
June 11, 2008 at 4:24 pm
Without documentation, it's hard to say. It does appear that the SoundExe values are all comprised of the left most X characters of the name, although I cannot see a pattern as to how many values are used for each name. I don't think there is an algorithm that will reproduce the values.
Since you cannot find code that inserts the data and it appears to be static, I would suggest finding all of the code that queries the tables and go through it until you have a very good understanding of how the data is used. Then use your knowledge of the data access rules to manually change the CompanyName and SoundExe values. This is the only way I can think of to ensure that once you change the values, your queries will still return the expected results.
This is probably not the majic answer that you were looking for, but unless someone else chimes in here, I'm out of ideas. You should also find the guys who originally designed the system and pummel them upon the neck and head :D.
June 12, 2008 at 7:10 am
I would guess that some developer thought he could wire a better SoundEx and something was coded into a front end application that inserted the data. If I were you, I would search all of the source code repositories at your company for the string "soundexe" and hope that something comes back.
There is also the chance the the data was imported from some source and this information came with it.
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply