Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

Soundex-Like functionality Expand / Collapse
Author
Message
Posted Wednesday, June 11, 2008 3:11 PM


Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Wednesday, July 23, 2008 8:59 AM
Points: 4, Visits: 8
Background
I have inherited a system which has a table that stores CompanyName , and a column called ‘soundexe’; soundexe contains letters which are then compared to other company names using Like.

CREATE TABLE [dbo].[Commission_Soundex] (
[SoundID] [int] IDENTITY (1, 1) NOT NULL ,
[Soundexe] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[CompanyName] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[LeftOn] [bit] NOT NULL CONSTRAINT [DF_Commission_Soundex_LeftOn] DEFAULT (1),
[RightOn] [bit] NOT NULL CONSTRAINT [DF_Commission_Soundex_RightOn] DEFAULT (1),
[NotLike] [bit] NOT NULL CONSTRAINT [DF_Commission_Soundex_BothOn] DEFAULT (1),
[BothOff] [bit] NOT NULL CONSTRAINT [DF_Commission_Soundex_BothOff] DEFAULT (0),
CONSTRAINT [PK_Commission_Soundex] PRIMARY KEY CLUSTERED

Here are a few rows of the table for you to consider:
SoundID Soundexe CompanyName LeftOn RightOn NotLike BothOff
1 ABN ABM Ambro Vancouver 1 1 0 0
2 ABN ABN_AMRO 1 1 0 0
3 ACCENTU Accenture 1 1 0 0
4 AEGON AEGON 0 0 0 1
5 AG ED AG Edwards 1 1 0 0
6 AIC AIC Limited 0 1 0 0
7 AIG AIG 2006 0 1 0 0
8 TRIMA AIM TRIMARK 1 1 0 0
9 ALLIANCE CA Alliance Capital Management 1 1 0 0
10 ALLIANZ Allianz 0 0 0 1
11 ALLSTAT Allstate Financial 1 1 0 0

The Problem and Question
My problem is this – I need to update this table with hundreds of companies and update this ‘soundexe’ column also. But I can not find the algorithm which was used to create it.

My hope is that someone will look at this data and tell me some wonderful function that would take the company name ‘AIM TRIMARK’ and convert it to ‘TRIMA’.

Thanks so much!!!

p.s. I have attached a spreadsheet of the table for those of you wanting more examples.


  Post Attachments 
CompanyNames.xls (0 views, 56.50 KB)
Post #515536
Posted Wednesday, June 11, 2008 3:21 PM
Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: 2 days ago @ 10:01 AM
Points: 3,844, Visits: 3,841
Uh, unless I don't quite understand your question, this is just a simple update statement.

DECLARE @Commission_Soundex TABLE  (
[SoundID] [int] IDENTITY (1, 1) NOT NULL ,
[Soundexe] [varchar] (50) NULL ,
[CompanyName] [varchar] (50) NULL ,
[LeftOn] [bit] NOT NULL DEFAULT (1),
[RightOn] [bit] NOT NULL DEFAULT (1),
[NotLike] [bit] NOT NULL DEFAULT (1),
[BothOff] [bit] NOT NULL DEFAULT (0)
)

INSERT INTO @Commission_Soundex (SoundExe, CompanyName, LeftOn, RightOn, NotLike, BothOff)
SELECT 'ABN', 'ABM Ambro Vancouver', 1, 1, 0, 0 UNION ALL
SELECT 'ABN', 'ABN_AMRO', 1, 1, 0, 0 UNION ALL
SELECT 'ACCENTU', 'Accenture', 1, 1, 0, 0 UNION ALL
SELECT 'AEGON', 'AEGON', 0, 0, 0, 1 UNION ALL
SELECT 'AG ED', 'AG Edwards', 1, 1, 0, 0 UNION ALL
SELECT 'AIC', 'AIC Limited', 0, 1, 0, 0 UNION ALL
SELECT 'AIG', 'AIG 2006', 0, 1, 0, 0 UNION ALL
SELECT 'TRIMA', 'AIM TRIMARK', 1, 1, 0, 0 UNION ALL
SELECT 'ALLIANCE CA', 'Alliance Capital Management', 1, 1, 0, 0 UNION ALL
SELECT 'ALLIANZ', 'Allianz', 0, 0, 0, 1 UNION ALL
SELECT 'ALLSTAT', 'Allstate Financial', 1, 1, 0, 0

SELECT * FROM @Commission_Soundex

UPDATE @Commission_Soundex
SET CompanyName = 'TRIMA'
WHERE CompanyName = 'AIM TRIMARK'

SELECT * FROM @Commission_Soundex





John Rowan

======================================================
======================================================
Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
Post #515540
Posted Wednesday, June 11, 2008 3:29 PM


Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Wednesday, July 23, 2008 8:59 AM
Points: 4, Visits: 8
Thanks, but I must not have explained myself well. I don't know what the algorithm is that converts the 'CompanyName' to 'Soundexe'. For example, CompanyName = 'Accenture', and Soundexe = 'ACCENTU'.

At first I thought it was simply a 'soundex' function - but it is not. I really don't think that anyone could crack this, but I am hopeful!!!
Post #515546
Posted Wednesday, June 11, 2008 3:36 PM
Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: 2 days ago @ 10:01 AM
Points: 3,844, Visits: 3,841
What makes you think that the SoundExe value is derived from the CompanyName? What does the SoundExe values represent?



John Rowan

======================================================
======================================================
Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
Post #515551
Posted Wednesday, June 11, 2008 3:50 PM


Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Wednesday, July 23, 2008 8:59 AM
Points: 4, Visits: 8
I think that the Soundexe column is derived from the company name because in a Stored Procedure used later, the following SQL is executed:


SELECT DISTINCT C.COMPANYNAME, CS.SOUNDEXE, D.COMPANYNAME, D.SALESREPID, YEAR(ORDERDATE) AS YR
FROM COMMISSION_SOUNDEX CS LEFT JOIN
DISCOUNT D ON CS.COMPANYNAME=D.COMPANYNAME JOIN
COMPANY C ON C.COMPANYNAME LIKE ('%' + CS.SOUNDEXE + '%') JOIN
CUSTOMER CU ON C.COMPANYID=CU.COMPANYID LEFT JOIN
ORDERS O ON CU.CUSTOMERID=O.CUSTOMERID
WHERE CS.LEFTON=0 AND CS.RIGHTON=1
AND ORDERDATE IS NOT NULL

There are several iterations of this logic with the following changes:

COMPANY C ON C.COMPANYNAME LIKE ('%' + CS.SOUNDEXE) JOIN
WHERE CS.LEFTON=1 AND CS.RIGHTON=0
and so on. (Notice the % is only on the Left side now, and the query is looking for 'lefton' = 1.)

So, it is being used to find companies which are spelled similarly. The newly found 'companies' are placed in a temp table which must be reviewed by a human to determine if the newly found company is indeed a branch of the company. (All of this is used to calculate commissions.)
Post #515562
Posted Wednesday, June 11, 2008 3:57 PM
Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: 2 days ago @ 10:01 AM
Points: 3,844, Visits: 3,841
Yes, but this would appear to me to be more of a business rule. How does the data initially get inserted into the table?



John Rowan

======================================================
======================================================
Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
Post #515564
Posted Wednesday, June 11, 2008 4:04 PM


Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Wednesday, July 23, 2008 8:59 AM
Points: 4, Visits: 8
I don't know how it was intially entered into the table. All I know is that the table has not been updated since January, 2007. I have searched through SQL Server to find references to the table to no avail. So, I really have no idea how is should be updated. It does appear to be either a mass load or a program that created the data however. There is actually a 'last_updated column in the database and they have the same datetime in them.
Post #515567
Posted Wednesday, June 11, 2008 4:24 PM
Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: 2 days ago @ 10:01 AM
Points: 3,844, Visits: 3,841
Without documentation, it's hard to say. It does appear that the SoundExe values are all comprised of the left most X characters of the name, although I cannot see a pattern as to how many values are used for each name. I don't think there is an algorithm that will reproduce the values.

Since you cannot find code that inserts the data and it appears to be static, I would suggest finding all of the code that queries the tables and go through it until you have a very good understanding of how the data is used. Then use your knowledge of the data access rules to manually change the CompanyName and SoundExe values. This is the only way I can think of to ensure that once you change the values, your queries will still return the expected results.

This is probably not the majic answer that you were looking for, but unless someone else chimes in here, I'm out of ideas. You should also find the guys who originally designed the system and pummel them upon the neck and head :D.




John Rowan

======================================================
======================================================
Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
Post #515570
Posted Thursday, June 12, 2008 7:10 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Tuesday, August 10, 2010 5:07 AM
Points: 2,732, Visits: 23,078
I would guess that some developer thought he could wire a better SoundEx and something was coded into a front end application that inserted the data. If I were you, I would search all of the source code repositories at your company for the string "soundexe" and hope that something comes back.

There is also the chance the the data was imported from some source and this information came with it.
Post #515866
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse