|
|
|
Mr or Mrs. 500
      
Group: General Forum Members
Last Login: Today @ 4:21 PM
Points: 512,
Visits: 2,245
|
|
I created a function which basicallly takes the following input:

then performs various checks on the name details to determine a match score for the name part of a record. The outcome of the function can be sure, likely, possible etc.... This is all part of a recod linkage exercise that I'm working on.
The function performs badly as expected due to the number if IF statements but I can't seee how I can convert the following (see part of function below) to a iTVF? Replace all the IF with CASEes?
CREATE FUNCTION [dbo].[Individual_Name_Score] (
@Master_mkNormalisedName NVARCHAR(50) ,@Master_mkName1 NVARCHAR(50) ,@Master_mkName2 NVARCHAR(50) ,@Master_mkName3 NVARCHAR(50)
,@Duplicate_mkNormalisedName NVARCHAR(50) ,@Duplicate_mkName1 NVARCHAR(50) ,@Duplicate_mkName2 NVARCHAR(50) ,@Duplicate_mkName3 NVARCHAR(50)
) RETURNS VARCHAR(50) AS BEGIN -- Split the master record DECLARE @Master_Surname NVARCHAR(50) = '' DECLARE @Master_Forename NVARCHAR(50) = '' DECLARE @Master_MiddleName NVARCHAR(50) = ''
-- Split the duplicate recrod DECLARE @Duplicate_Surname NVARCHAR(50) = '' DECLARE @Duplicate_Forename NVARCHAR(50) = '' DECLARE @Duplicate_MiddleName NVARCHAR(50) = ''
DECLARE @ScorerType VARCHAR(50) = '' --initialise variables used to determine name match score
-- get the normalised name parts and the phonetic name parts for the master record
SET @Master_Surname = LEFT(@Master_mkNormalisedName, CHARINDEX(',',@Master_mkNormalisedName)-1) SET @Master_Forename = LEFT(SUBSTRING(@Master_mkNormalisedName, CHARINDEX(',',@Master_mkNormalisedName)+1, LEN(@Master_mkNormalisedName)), CHARINDEX(',',SUBSTRING(@Master_mkNormalisedName, CHARINDEX(',',@Master_mkNormalisedName)+1,LEN(@Master_mkNormalisedName)))-1) SET @Master_MiddleName = REVERSE(LEFT(REVERSE(@Master_mkNormalisedName), CHARINDEX(',',REVERSE(@Master_mkNormalisedName))-1)) -- get the normalised name parts and the phonetic name parts for the duplicate record SET @Duplicate_Surname = LEFT(@Duplicate_mkNormalisedName, CHARINDEX(',',@Duplicate_mkNormalisedName)-1) SET @Duplicate_Forename = LEFT(SUBSTRING(@Duplicate_mkNormalisedName, CHARINDEX(',',@Duplicate_mkNormalisedName)+1, LEN(@Duplicate_mkNormalisedName)), CHARINDEX(',',SUBSTRING(@Duplicate_mkNormalisedName, CHARINDEX(',',@Duplicate_mkNormalisedName)+1,LEN(@Duplicate_mkNormalisedName)))-1) SET @Duplicate_MiddleName = REVERSE(LEFT(REVERSE(@Duplicate_mkNormalisedName), CHARINDEX(',',REVERSE(@Duplicate_mkNormalisedName))-1)) -- if both surnames are empty IF LEN(@Master_Surname) = 0 AND LEN(@Duplicate_Surname) = 0 BEGIN BEGIN SET @ScorerType = 'bothEmpty' RETURN @ScorerType END END ----one of them is empty then return ELSE IF LEN(@Master_Surname) = 0 OR LEN(@Duplicate_Surname) = 0 BEGIN BEGIN SET @ScorerType = 'oneEmpty' RETURN @ScorerType END END -- surnames both match ELSE IF @Master_Surname = @Duplicate_Surname BEGIN -- forenames both the same IF @Master_Forename = @Duplicate_Forename BEGIN IF @Master_MiddleName = @Duplicate_MiddleName BEGIN SET @ScorerType = 'sure' RETURN @ScorerType END IF LEN(@Master_MiddleName) = 0 AND LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'sure' RETURN @ScorerType END IF LEN(@Master_MiddleName) = 0 OR LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'sure' RETURN @ScorerType END IF dbo.LEVENSHTEIN(@Master_MiddleName, @Duplicate_MiddleName) < 3 BEGIN SET @ScorerType = 'likely' RETURN @ScorerType END IF dbo.JaroWinkler(@Master_MiddleName, @Duplicate_MiddleName) > 0.6 BEGIN SET @ScorerType = 'likely' RETURN @ScorerType END IF @Master_MiddleName <> @Duplicate_MiddleName BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END END -- forenames sound the same ELSE IF @Master_mkName2 = @Duplicate_mkName2 BEGIN IF @Master_MiddleName = @Duplicate_MiddleName BEGIN SET @ScorerType = 'sure' RETURN @ScorerType END IF LEN(@Master_MiddleName) = 0 AND LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'likely' RETURN @ScorerType END IF LEN(@Master_MiddleName) = 0 OR LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'likely' RETURN @ScorerType END IF dbo.LEVENSHTEIN(@Master_MiddleName, @Duplicate_MiddleName) < 3 BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END IF dbo.JaroWinkler(@Master_MiddleName, @Duplicate_MiddleName) > 0.6 BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END IF @Master_MiddleName <> @Duplicate_MiddleName BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END END -- forenames both empty ELSE IF LEN(@Master_Forename) = 0 AND LEN(@Duplicate_Forename) = 0 BEGIN IF @Master_MiddleName = @Duplicate_MiddleName BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END IF LEN(@Master_MiddleName) = 0 AND LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'likely' RETURN @ScorerType END IF LEN(@Master_MiddleName) = 0 OR LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END IF dbo.LEVENSHTEIN(@Master_MiddleName, @Duplicate_MiddleName) < 3 BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END IF dbo.JaroWinkler(@Master_MiddleName, @Duplicate_MiddleName) > 0.6 BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END IF @Master_MiddleName <> @Duplicate_MiddleName BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END END -- one of the forenames is empty ELSE IF LEN(@Master_Forename) = 0 OR LEN(@Duplicate_Forename) = 0 BEGIN IF @Master_MiddleName = @Duplicate_MiddleName BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END IF LEN(@Master_MiddleName) = 0 AND LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'likely' RETURN @ScorerType END IF LEN(@Master_MiddleName) = 0 OR LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'likely' RETURN @ScorerType END IF dbo.LEVENSHTEIN(@Master_MiddleName, @Duplicate_MiddleName) < 3 BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END IF dbo.JaroWinkler(@Master_MiddleName, @Duplicate_MiddleName) > 0.6 BEGIN SET @ScorerType = 'likely' RETURN @ScorerType END IF @Master_MiddleName <> @Duplicate_MiddleName BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END END -- forenames approximately the same ELSE IF dbo.LEVENSHTEIN(@Master_Forename, @Duplicate_Forename) < 3 BEGIN IF @Master_MiddleName = @Duplicate_MiddleName BEGIN SET @ScorerType = 'likely' RETURN @ScorerType END IF LEN(@Master_MiddleName) = 0 AND LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'likely' RETURN @ScorerType END IF LEN(@Master_MiddleName) = 0 OR LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'likely' RETURN @ScorerType END IF dbo.LEVENSHTEIN(@Master_MiddleName, @Duplicate_MiddleName) < 3 BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END IF dbo.JaroWinkler(@Master_MiddleName, @Duplicate_MiddleName) > 0.6 BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END IF @Master_MiddleName <> @Duplicate_MiddleName BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END END -- forenames approximately sound the same ELSE IF dbo.LEVENSHTEIN(@Master_mkName2, @Duplicate_mkName2) < 3 BEGIN IF @Master_MiddleName = @Duplicate_MiddleName BEGIN SET @ScorerType = 'likely' RETURN @ScorerType END IF LEN(@Master_MiddleName) = 0 AND LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END IF LEN(@Master_MiddleName) = 0 OR LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END IF dbo.LEVENSHTEIN(@Master_MiddleName, @Duplicate_MiddleName) < 3 BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END IF dbo.JaroWinkler(@Master_MiddleName, @Duplicate_MiddleName) > 0.6 BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END IF @Master_MiddleName <> @Duplicate_MiddleName BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END END -- forenames possibly the same ELSE IF dbo.JaroWinkler(@Master_Forename, @Duplicate_Forename) > 0.6 BEGIN IF @Master_MiddleName = @Duplicate_MiddleName BEGIN SET @ScorerType = 'likely' RETURN @ScorerType END IF LEN(@Master_MiddleName) = 0 AND LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'likely' RETURN @ScorerType END IF LEN(@Master_MiddleName) = 0 OR LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'likely' RETURN @ScorerType END IF dbo.LEVENSHTEIN(@Master_MiddleName, @Duplicate_MiddleName) < 3 BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END IF dbo.JaroWinkler(@Master_MiddleName, @Duplicate_MiddleName) > 0.6 BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END IF @Master_MiddleName <> @Duplicate_MiddleName BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END END -- forenames are not equal ELSE BEGIN IF @Master_MiddleName = @Duplicate_MiddleName BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END IF LEN(@Master_MiddleName) = 0 AND LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END IF LEN(@Master_MiddleName) = 0 OR LEN(@Duplicate_MiddleName) = 0 BEGIN BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END END IF dbo.LEVENSHTEIN(@Master_MiddleName, @Duplicate_MiddleName) < 3 BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END IF dbo.JaroWinkler(@Master_MiddleName, @Duplicate_MiddleName) > 0.6 BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END IF @Master_MiddleName <> @Duplicate_MiddleName BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END END END -- if lastname sounds equal: --ELSE IF @Master_mkName1 = @Duplicate_mkName1 -- BEGIN -- IF @Master_Forename = @Duplicate_Forename -- BEGIN -- IF @Master_MiddleName = @Duplicate_MiddleName BEGIN SET @ScorerType = 'likely' RETURN @ScorerType END -- IF LEN(@Master_MiddleName) = 0 AND LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'likely' RETURN @ScorerType END -- IF LEN(@Master_MiddleName) = 0 OR LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'likely' RETURN @ScorerType END -- IF dbo.LEVENSHTEIN(@Master_MiddleName, @Duplicate_MiddleName) < 3 BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END -- IF dbo.JaroWinkler(@Master_MiddleName, @Duplicate_MiddleName) > 0.6 BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END -- IF @Master_MiddleName <> @Duplicate_MiddleName BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END -- END -- ELSE IF @Master_mkName2 = @Duplicate_mkName2 -- BEGIN -- IF @Master_MiddleName = @Duplicate_MiddleName BEGIN SET @ScorerType = 'likely' RETURN @ScorerType END -- IF LEN(@Master_MiddleName) = 0 AND LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END -- IF LEN(@Master_MiddleName) = 0 OR LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END -- IF dbo.LEVENSHTEIN(@Master_MiddleName, @Duplicate_MiddleName) < 3 BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END -- IF dbo.JaroWinkler(@Master_MiddleName, @Duplicate_MiddleName) > 0.6 BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END -- IF @Master_MiddleName <> @Duplicate_MiddleName BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END -- END -- ELSE IF LEN(@Master_Forename) = 0 AND LEN(@Duplicate_Forename) = 0 -- BEGIN -- IF @Master_MiddleName = @Duplicate_MiddleName BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END -- IF LEN(@Master_MiddleName) = 0 AND LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'likely' RETURN @ScorerType END -- IF LEN(@Master_MiddleName) = 0 OR LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END -- IF dbo.LEVENSHTEIN(@Master_MiddleName, @Duplicate_MiddleName) < 3 BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END -- IF dbo.JaroWinkler(@Master_MiddleName, @Duplicate_MiddleName) > 0.6 BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END -- IF @Master_MiddleName <> @Duplicate_MiddleName BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END -- END -- ELSE IF LEN(@Master_Forename) = 0 OR LEN(@Duplicate_Forename) = 0 -- BEGIN -- IF @Master_MiddleName = @Duplicate_MiddleName BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END -- IF LEN(@Master_MiddleName) = 0 AND LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'likely' RETURN @ScorerType END -- IF LEN(@Master_MiddleName) = 0 OR LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'likely' RETURN @ScorerType END -- IF dbo.LEVENSHTEIN(@Master_MiddleName, @Duplicate_MiddleName) < 3 BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END -- IF dbo.JaroWinkler(@Master_MiddleName, @Duplicate_MiddleName) > 0.6 BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END -- IF @Master_MiddleName <> @Duplicate_MiddleName BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END -- END -- ELSE IF dbo.LEVENSHTEIN(@Master_Forename, @Duplicate_Forename) < 3 -- BEGIN -- IF @Master_MiddleName = @Duplicate_MiddleName BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END -- IF LEN(@Master_MiddleName) = 0 AND LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END -- IF LEN(@Master_MiddleName) = 0 OR LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END -- IF dbo.LEVENSHTEIN(@Master_MiddleName, @Duplicate_MiddleName) < 3 BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END -- IF dbo.JaroWinkler(@Master_MiddleName, @Duplicate_MiddleName) > 0.6 BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END -- IF @Master_MiddleName <> @Duplicate_MiddleName BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END -- END -- ELSE IF dbo.LEVENSHTEIN(@Master_mkName2, @Duplicate_mkName2) < 3 -- BEGIN -- IF @Master_MiddleName = @Duplicate_MiddleName BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END -- IF LEN(@Master_MiddleName) = 0 AND LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END -- IF LEN(@Master_MiddleName) = 0 OR LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END -- IF dbo.LEVENSHTEIN(@Master_MiddleName, @Duplicate_MiddleName) < 3 BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END -- IF dbo.JaroWinkler(@Master_MiddleName, @Duplicate_MiddleName) > 0.6 BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END -- IF @Master_MiddleName <> @Duplicate_MiddleName BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END -- END -- ELSE IF dbo.JaroWinkler(@Master_Forename, @Duplicate_Forename) > 0.6 -- BEGIN -- IF @Master_MiddleName = @Duplicate_MiddleName BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END -- IF LEN(@Master_MiddleName) = 0 AND LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END -- IF LEN(@Master_MiddleName) = 0 OR LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END -- IF dbo.LEVENSHTEIN(@Master_MiddleName, @Duplicate_MiddleName) < 3 BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END -- IF dbo.JaroWinkler(@Master_MiddleName, @Duplicate_MiddleName) > 0.6 BEGIN SET @ScorerType = 'possible' RETURN @ScorerType END -- IF @Master_MiddleName <> @Duplicate_MiddleName BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END -- END -- ELSE -- BEGIN -- IF @Master_MiddleName = @Duplicate_MiddleName BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END -- IF LEN(@Master_MiddleName) = 0 AND LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END -- IF LEN(@Master_MiddleName) = 0 OR LEN(@Duplicate_MiddleName) = 0 BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END -- IF dbo.LEVENSHTEIN(@Master_MiddleName, @Duplicate_MiddleName) < 3 BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END -- IF dbo.JaroWinkler(@Master_MiddleName, @Duplicate_MiddleName) > 0.6 BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END -- IF @Master_MiddleName <> @Duplicate_MiddleName BEGIN SET @ScorerType = 'zero' RETURN @ScorerType END -- END -- END
----------------------------------- http://www.SQL4n00bs.com
|
|
|
|
|
Mr or Mrs. 500
      
Group: General Forum Members
Last Login: Today @ 4:21 PM
Points: 512,
Visits: 2,245
|
|
So many views and no replies?! Where are the clever guys these days?!
----------------------------------- http://www.SQL4n00bs.com
|
|
|
|
|
SSCrazy Eights
        
Group: General Forum Members
Last Login: Today @ 3:32 PM
Points: 8,980,
Visits: 8,540
|
|
Abu Dina (2/5/2013)
So many views and no replies?! Where are the clever guys these days?! 
I think the biggest challenge to getting an answer is that this code is really scary!!! It is a scalar function with lots and lots of calls to other scalar functions (LEVENSHTEIN, JaroWinkler). You execute a scalar function, examine the results and then execute another scalar function. It is no surprise the performance is not acceptable.
The problem is we don't know what this is supposed to do. I know you have a certain format for the data being passed in. I have recently seen your threads splitting your strings into 3 which is obviously used inside here. You will need to rewrite this and very possibly those other 2 functions if you want to have a chance at making this faster.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Moden's splitter.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
|
|
|
|
|
Mr or Mrs. 500
      
Group: General Forum Members
Last Login: Today @ 4:21 PM
Points: 512,
Visits: 2,245
|
|
Thanks for your input Sean. I really appreciate it.
As mentioned, this is part one part of a record linkage exercise I'm working on. The way it works is like this:
I have two record sets (both contain Firstname, Surname, address1, Town, Postcode) and I want to find duplicate individuals.
Of course the eaiest thing to do is to join the two sets where all 5 columns match but as you know, data is never as clean as one would like. So another approach is to join the two data sets on phonetic code of the surname and phonetic address1.
Once that's done, I take each matching set and then compare the other fields and give them scores. The function above is meant to work out the name score.
So in my effort to make it faster, I pass all the required details to the function I then do checks like:
If Surname from record Set A = Surname from record Set B AND forename = forename then the ultimate score will depends on the contents of the middlename
IF A.Surname = B.surname AND Phonetic Forename = phonetic Forename then the score will against depends on middle name
etc...
But as you can see, there are so many variations like
IF Surname SOUNDS like Surname AND forename = forename
etc.. etc...
Does this make sense?!
Does this make sense?
----------------------------------- http://www.SQL4n00bs.com
|
|
|
|