May 26, 2010 at 2:34 pm
My job requires dealing with data from any source, and I've run into this sort of situation a number of times.
I'll get source data with a column called Name or something, and it'll have "Mr John Q Public Jr" in it. Of course, there's always crazies who put something in like "Reverend Mister James T. Kirk, Junior Ph.D."
I need the latter to wind up looking like "Rev Mr James T Kirk Jr PhD". This also occurs with state names, country names, address parts, etc.
I've got a few tables with search/replace columns for different uses, and each table has a corresponding function to go with it.
Those tables have the search column as the PK and a replace column. Very simple stuff.
For example, the XREF.AddressAbbreviations table has the following:
INSERT INTO XREF.AddressAbbreviations
([Search],[Replace])
VALUES
('Street','ST'),
('Avenue','AVE'),
etc..
The code for the function looks like this:
CREATE FUNCTION dbo.varchar_AbbreviateAddress
(
@haystack varchar(1000)
)
returns varchar(4000)
WITH returns NULL ON NULL INPUT
AS
BEGIN
SET @haystack = ' '+@haystack+' ';
SELECT @haystack = REPLACE(@haystack,' '+[search]+' ',' '+[replace]+' ')
FROM XREF.AddressAbbreviations
WHERE @haystack LIKE '% '+[search]+' %'
RETURN LTRIM(RTRIM(@haystack))
END
However, as the astute have already guessed, this goes through XREF.AddressAbbreviations (or XREF.NameAbbreviations, etc.) for each and every record.
I've had good luck with the output, and the output is only used for matching purposes anyway, but it is fairly agonizing.
Any pointers? (I do try to do this to source data that I can alter, so that when it's compared multiple times to different records, it's not altering the same record multiple times.)
May 26, 2010 at 3:23 pm
If you have to do this a lot - and it sounds like it, I would recommend you take a look at Melissa Data for this task. It will pay for itself in no time - and you have the benefit of being able to validate address information and normalize name information from your application.
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
May 27, 2010 at 7:45 am
I think I've come up with a way that should drastically improve the speed, but I'm having trouble coming up with a way to make it reusable.
Right now, on the back end, it's traversing the ~ 50-100 record XREF.___Abbreviation - table once for each source record, usually for thousands of such records.
If I were able to switch the order, traversing the abbreviation table once (probably in a cursor), and doing the REPLACE() on all source records, would that significantly improve the speed?
My first idea was to build a nested REPLACE() string and execute that dynamically, but that cannot be done within a function.
My second idea (before I remembered that dynamic SQL cannot be run within a function) was to loop through the abbreviate table, and dynamically execute one replace at a time for each abbreviation record.
I'm still stumped.
Viewing 3 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply