Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

Scalar function too slow Expand / Collapse
Author
Message
Posted Monday, February 04, 2013 3:28 PM


Mr or Mrs. 500

Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500

Group: General Forum Members
Last Login: Today @ 4:21 PM
Points: 512, Visits: 2,245
I created a function which basicallly takes the following input:



then performs various checks on the name details to determine a match score for the name part of a record. The outcome of the function can be sure, likely, possible etc.... This is all part of a recod linkage exercise that I'm working on.

The function performs badly as expected due to the number if IF statements but I can't seee how I can convert the following (see part of function below) to a iTVF? Replace all the IF with CASEes?

CREATE FUNCTION [dbo].[Individual_Name_Score] (

@Master_mkNormalisedName NVARCHAR(50)
,@Master_mkName1 NVARCHAR(50)
,@Master_mkName2 NVARCHAR(50)
,@Master_mkName3 NVARCHAR(50)

,@Duplicate_mkNormalisedName NVARCHAR(50)
,@Duplicate_mkName1 NVARCHAR(50)
,@Duplicate_mkName2 NVARCHAR(50)
,@Duplicate_mkName3 NVARCHAR(50)

)
RETURNS VARCHAR(50)
AS
BEGIN

-- Split the master record
DECLARE @Master_Surname NVARCHAR(50) = ''
DECLARE @Master_Forename NVARCHAR(50) = ''
DECLARE @Master_MiddleName NVARCHAR(50) = ''

-- Split the duplicate recrod
DECLARE @Duplicate_Surname NVARCHAR(50) = ''
DECLARE @Duplicate_Forename NVARCHAR(50) = ''
DECLARE @Duplicate_MiddleName NVARCHAR(50) = ''

DECLARE @ScorerType VARCHAR(50) = ''

--initialise variables used to determine name match score

-- get the normalised name parts and the phonetic name parts for the master record


SET @Master_Surname = LEFT(@Master_mkNormalisedName, CHARINDEX(',',@Master_mkNormalisedName)-1)
SET @Master_Forename = LEFT(SUBSTRING(@Master_mkNormalisedName, CHARINDEX(',',@Master_mkNormalisedName)+1,
LEN(@Master_mkNormalisedName)), CHARINDEX(',',SUBSTRING(@Master_mkNormalisedName,
CHARINDEX(',',@Master_mkNormalisedName)+1,LEN(@Master_mkNormalisedName)))-1)
SET @Master_MiddleName = REVERSE(LEFT(REVERSE(@Master_mkNormalisedName), CHARINDEX(',',REVERSE(@Master_mkNormalisedName))-1))

-- get the normalised name parts and the phonetic name parts for the duplicate record
SET @Duplicate_Surname = LEFT(@Duplicate_mkNormalisedName, CHARINDEX(',',@Duplicate_mkNormalisedName)-1)
SET @Duplicate_Forename = LEFT(SUBSTRING(@Duplicate_mkNormalisedName, CHARINDEX(',',@Duplicate_mkNormalisedName)+1,
LEN(@Duplicate_mkNormalisedName)), CHARINDEX(',',SUBSTRING(@Duplicate_mkNormalisedName,
CHARINDEX(',',@Duplicate_mkNormalisedName)+1,LEN(@Duplicate_mkNormalisedName)))-1)
SET @Duplicate_MiddleName = REVERSE(LEFT(REVERSE(@Duplicate_mkNormalisedName), CHARINDEX(',',REVERSE(@Duplicate_mkNormalisedName))-1))

-- if both surnames are empty
IF LEN(@Master_Surname) = 0 AND LEN(@Duplicate_Surname) = 0
BEGIN
BEGIN SET @ScorerType = 'bothEmpty' RETURN @ScorerType END
END
----one of them is empty then return
ELSE IF LEN(@Master_Surname) = 0 OR LEN(@Duplicate_Surname) = 0
BEGIN
BEGIN SET @ScorerType = 'oneEmpty' RETURN @ScorerType END
END
-- surnames both match
ELSE IF @Master_Surname = @Duplicate_Surname
BEGIN
-- forenames both the same
IF @Master_Forename = @Duplicate_Forename
BEGIN
IF @Master_MiddleName = @Duplicate_MiddleName BEGIN SET @ScorerType = 'sure' RETURN @ScorerType END
IF LEN(@Master_MiddleName) = 0 AND LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'sure' RETURN @ScorerType END
IF LEN(@Master_MiddleName) = 0 OR LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'sure' RETURN @ScorerType END
IF dbo.LEVENSHTEIN(@Master_MiddleName, @Duplicate_MiddleName) < 3 BEGIN SET @ScorerType = 'likely' RETURN @ScorerType END
IF dbo.JaroWinkler(@Master_MiddleName, @Duplicate_MiddleName) > 0.6 BEGIN SET @ScorerType = 'likely' RETURN @ScorerType END
IF @Master_MiddleName <> @Duplicate_MiddleName BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END
END
-- forenames sound the same
ELSE IF @Master_mkName2 = @Duplicate_mkName2
BEGIN
IF @Master_MiddleName = @Duplicate_MiddleName BEGIN SET @ScorerType = 'sure' RETURN @ScorerType END
IF LEN(@Master_MiddleName) = 0 AND LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'likely' RETURN @ScorerType END
IF LEN(@Master_MiddleName) = 0 OR LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'likely' RETURN @ScorerType END
IF dbo.LEVENSHTEIN(@Master_MiddleName, @Duplicate_MiddleName) < 3 BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END
IF dbo.JaroWinkler(@Master_MiddleName, @Duplicate_MiddleName) > 0.6 BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END
IF @Master_MiddleName <> @Duplicate_MiddleName BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END
END
-- forenames both empty
ELSE IF LEN(@Master_Forename) = 0 AND LEN(@Duplicate_Forename) = 0
BEGIN
IF @Master_MiddleName = @Duplicate_MiddleName BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END
IF LEN(@Master_MiddleName) = 0 AND LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'likely' RETURN @ScorerType END
IF LEN(@Master_MiddleName) = 0 OR LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END
IF dbo.LEVENSHTEIN(@Master_MiddleName, @Duplicate_MiddleName) < 3 BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END
IF dbo.JaroWinkler(@Master_MiddleName, @Duplicate_MiddleName) > 0.6 BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END
IF @Master_MiddleName <> @Duplicate_MiddleName BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END
END
-- one of the forenames is empty
ELSE IF LEN(@Master_Forename) = 0 OR LEN(@Duplicate_Forename) = 0
BEGIN
IF @Master_MiddleName = @Duplicate_MiddleName BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END
IF LEN(@Master_MiddleName) = 0 AND LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'likely' RETURN @ScorerType END
IF LEN(@Master_MiddleName) = 0 OR LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'likely' RETURN @ScorerType END
IF dbo.LEVENSHTEIN(@Master_MiddleName, @Duplicate_MiddleName) < 3 BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END
IF dbo.JaroWinkler(@Master_MiddleName, @Duplicate_MiddleName) > 0.6 BEGIN SET @ScorerType = 'likely' RETURN @ScorerType END
IF @Master_MiddleName <> @Duplicate_MiddleName BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END
END
-- forenames approximately the same
ELSE IF dbo.LEVENSHTEIN(@Master_Forename, @Duplicate_Forename) < 3
BEGIN
IF @Master_MiddleName = @Duplicate_MiddleName BEGIN SET @ScorerType = 'likely' RETURN @ScorerType END
IF LEN(@Master_MiddleName) = 0 AND LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'likely' RETURN @ScorerType END
IF LEN(@Master_MiddleName) = 0 OR LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'likely' RETURN @ScorerType END
IF dbo.LEVENSHTEIN(@Master_MiddleName, @Duplicate_MiddleName) < 3 BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END
IF dbo.JaroWinkler(@Master_MiddleName, @Duplicate_MiddleName) > 0.6 BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END
IF @Master_MiddleName <> @Duplicate_MiddleName BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END
END
-- forenames approximately sound the same
ELSE IF dbo.LEVENSHTEIN(@Master_mkName2, @Duplicate_mkName2) < 3
BEGIN
IF @Master_MiddleName = @Duplicate_MiddleName BEGIN SET @ScorerType = 'likely' RETURN @ScorerType END
IF LEN(@Master_MiddleName) = 0 AND LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END
IF LEN(@Master_MiddleName) = 0 OR LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END
IF dbo.LEVENSHTEIN(@Master_MiddleName, @Duplicate_MiddleName) < 3 BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END
IF dbo.JaroWinkler(@Master_MiddleName, @Duplicate_MiddleName) > 0.6 BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END
IF @Master_MiddleName <> @Duplicate_MiddleName BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END
END
-- forenames possibly the same
ELSE IF dbo.JaroWinkler(@Master_Forename, @Duplicate_Forename) > 0.6
BEGIN
IF @Master_MiddleName = @Duplicate_MiddleName BEGIN SET @ScorerType = 'likely' RETURN @ScorerType END
IF LEN(@Master_MiddleName) = 0 AND LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'likely' RETURN @ScorerType END
IF LEN(@Master_MiddleName) = 0 OR LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'likely' RETURN @ScorerType END
IF dbo.LEVENSHTEIN(@Master_MiddleName, @Duplicate_MiddleName) < 3 BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END
IF dbo.JaroWinkler(@Master_MiddleName, @Duplicate_MiddleName) > 0.6 BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END
IF @Master_MiddleName <> @Duplicate_MiddleName BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END
END
-- forenames are not equal
ELSE
BEGIN
IF @Master_MiddleName = @Duplicate_MiddleName BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END
IF LEN(@Master_MiddleName) = 0 AND LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END
IF LEN(@Master_MiddleName) = 0 OR LEN(@Duplicate_MiddleName) = 0 BEGIN BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END END
IF dbo.LEVENSHTEIN(@Master_MiddleName, @Duplicate_MiddleName) < 3 BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END
IF dbo.JaroWinkler(@Master_MiddleName, @Duplicate_MiddleName) > 0.6 BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END
IF @Master_MiddleName <> @Duplicate_MiddleName BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END
END
END

-- if lastname sounds equal:
--ELSE IF @Master_mkName1 = @Duplicate_mkName1
-- BEGIN
-- IF @Master_Forename = @Duplicate_Forename
-- BEGIN
-- IF @Master_MiddleName = @Duplicate_MiddleName BEGIN SET @ScorerType = 'likely' RETURN @ScorerType END
-- IF LEN(@Master_MiddleName) = 0 AND LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'likely' RETURN @ScorerType END
-- IF LEN(@Master_MiddleName) = 0 OR LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'likely' RETURN @ScorerType END
-- IF dbo.LEVENSHTEIN(@Master_MiddleName, @Duplicate_MiddleName) < 3 BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END
-- IF dbo.JaroWinkler(@Master_MiddleName, @Duplicate_MiddleName) > 0.6 BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END
-- IF @Master_MiddleName <> @Duplicate_MiddleName BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END
-- END

-- ELSE IF @Master_mkName2 = @Duplicate_mkName2
-- BEGIN
-- IF @Master_MiddleName = @Duplicate_MiddleName BEGIN SET @ScorerType = 'likely' RETURN @ScorerType END
-- IF LEN(@Master_MiddleName) = 0 AND LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END
-- IF LEN(@Master_MiddleName) = 0 OR LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END
-- IF dbo.LEVENSHTEIN(@Master_MiddleName, @Duplicate_MiddleName) < 3 BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END
-- IF dbo.JaroWinkler(@Master_MiddleName, @Duplicate_MiddleName) > 0.6 BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END
-- IF @Master_MiddleName <> @Duplicate_MiddleName BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END
-- END

-- ELSE IF LEN(@Master_Forename) = 0 AND LEN(@Duplicate_Forename) = 0
-- BEGIN
-- IF @Master_MiddleName = @Duplicate_MiddleName BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END
-- IF LEN(@Master_MiddleName) = 0 AND LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'likely' RETURN @ScorerType END
-- IF LEN(@Master_MiddleName) = 0 OR LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END
-- IF dbo.LEVENSHTEIN(@Master_MiddleName, @Duplicate_MiddleName) < 3 BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END
-- IF dbo.JaroWinkler(@Master_MiddleName, @Duplicate_MiddleName) > 0.6 BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END
-- IF @Master_MiddleName <> @Duplicate_MiddleName BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END
-- END

-- ELSE IF LEN(@Master_Forename) = 0 OR LEN(@Duplicate_Forename) = 0
-- BEGIN
-- IF @Master_MiddleName = @Duplicate_MiddleName BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END
-- IF LEN(@Master_MiddleName) = 0 AND LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'likely' RETURN @ScorerType END
-- IF LEN(@Master_MiddleName) = 0 OR LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'likely' RETURN @ScorerType END
-- IF dbo.LEVENSHTEIN(@Master_MiddleName, @Duplicate_MiddleName) < 3 BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END
-- IF dbo.JaroWinkler(@Master_MiddleName, @Duplicate_MiddleName) > 0.6 BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END
-- IF @Master_MiddleName <> @Duplicate_MiddleName BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END
-- END

-- ELSE IF dbo.LEVENSHTEIN(@Master_Forename, @Duplicate_Forename) < 3
-- BEGIN
-- IF @Master_MiddleName = @Duplicate_MiddleName BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END
-- IF LEN(@Master_MiddleName) = 0 AND LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END
-- IF LEN(@Master_MiddleName) = 0 OR LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END
-- IF dbo.LEVENSHTEIN(@Master_MiddleName, @Duplicate_MiddleName) < 3 BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END
-- IF dbo.JaroWinkler(@Master_MiddleName, @Duplicate_MiddleName) > 0.6 BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END
-- IF @Master_MiddleName <> @Duplicate_MiddleName BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END
-- END

-- ELSE IF dbo.LEVENSHTEIN(@Master_mkName2, @Duplicate_mkName2) < 3
-- BEGIN
-- IF @Master_MiddleName = @Duplicate_MiddleName BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END
-- IF LEN(@Master_MiddleName) = 0 AND LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END
-- IF LEN(@Master_MiddleName) = 0 OR LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END
-- IF dbo.LEVENSHTEIN(@Master_MiddleName, @Duplicate_MiddleName) < 3 BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END
-- IF dbo.JaroWinkler(@Master_MiddleName, @Duplicate_MiddleName) > 0.6 BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END
-- IF @Master_MiddleName <> @Duplicate_MiddleName BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END
-- END

-- ELSE IF dbo.JaroWinkler(@Master_Forename, @Duplicate_Forename) > 0.6
-- BEGIN
-- IF @Master_MiddleName = @Duplicate_MiddleName BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END
-- IF LEN(@Master_MiddleName) = 0 AND LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END
-- IF LEN(@Master_MiddleName) = 0 OR LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END
-- IF dbo.LEVENSHTEIN(@Master_MiddleName, @Duplicate_MiddleName) < 3 BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END
-- IF dbo.JaroWinkler(@Master_MiddleName, @Duplicate_MiddleName) > 0.6 BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END
-- IF @Master_MiddleName <> @Duplicate_MiddleName BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END
-- END
-- ELSE
-- BEGIN
-- IF @Master_MiddleName = @Duplicate_MiddleName BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END
-- IF LEN(@Master_MiddleName) = 0 AND LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END
-- IF LEN(@Master_MiddleName) = 0 OR LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END
-- IF dbo.LEVENSHTEIN(@Master_MiddleName, @Duplicate_MiddleName) < 3 BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END
-- IF dbo.JaroWinkler(@Master_MiddleName, @Duplicate_MiddleName) > 0.6 BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END
-- IF @Master_MiddleName <> @Duplicate_MiddleName BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END
-- END
-- END



-----------------------------------
http://www.SQL4n00bs.com
Post #1415512
Posted Tuesday, February 05, 2013 4:05 AM


Mr or Mrs. 500

Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500

Group: General Forum Members
Last Login: Today @ 4:21 PM
Points: 512, Visits: 2,245
So many views and no replies?! Where are the clever guys these days?!

-----------------------------------
http://www.SQL4n00bs.com
Post #1415708
Posted Tuesday, February 05, 2013 7:59 AM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: Today @ 3:32 PM
Points: 8,980, Visits: 8,540
Abu Dina (2/5/2013)
So many views and no replies?! Where are the clever guys these days?!


I think the biggest challenge to getting an answer is that this code is really scary!!! It is a scalar function with lots and lots of calls to other scalar functions (LEVENSHTEIN, JaroWinkler). You execute a scalar function, examine the results and then execute another scalar function. It is no surprise the performance is not acceptable.

The problem is we don't know what this is supposed to do. I know you have a certain format for the data being passed in. I have recently seen your threads splitting your strings into 3 which is obviously used inside here. You will need to rewrite this and very possibly those other 2 functions if you want to have a chance at making this faster.




_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Moden's splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Post #1415859
Posted Tuesday, February 05, 2013 9:17 AM


Mr or Mrs. 500

Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500

Group: General Forum Members
Last Login: Today @ 4:21 PM
Points: 512, Visits: 2,245
Thanks for your input Sean. I really appreciate it.

As mentioned, this is part one part of a record linkage exercise I'm working on. The way it works is like this:

I have two record sets (both contain Firstname, Surname, address1, Town, Postcode) and I want to find duplicate individuals.

Of course the eaiest thing to do is to join the two sets where all 5 columns match but as you know, data is never as clean as one would like. So another approach is to join the two data sets on phonetic code of the surname and phonetic address1.

Once that's done, I take each matching set and then compare the other fields and give them scores. The function above is meant to work out the name score.

So in my effort to make it faster, I pass all the required details to the function I then do checks like:

If Surname from record Set A = Surname from record Set B
AND forename = forename
then the ultimate score will depends on the contents of the middlename

IF A.Surname = B.surname
AND Phonetic Forename = phonetic Forename
then the score will against depends on middle name

etc...

But as you can see, there are so many variations like

IF Surname SOUNDS like Surname
AND forename = forename

etc.. etc...

Does this make sense?!


Does this make sense?


-----------------------------------
http://www.SQL4n00bs.com
Post #1415936
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse