|
|
|
SSC-Addicted
      
Group: General Forum Members
Last Login: Today @ 10:21 AM
Points: 489,
Visits: 2,140
|
|
I have a lot of respect for the optimisation gurus on this forum so when one of these guys (Jeff Moden) makes a comment like in this thread http://www.sqlservercentral.com/Forums/Topic1420602-3077-2.aspx#bm1420669 then I get really worried.
I've also found that if a function has over 200 IF statements, that it's time to go back to the drawing board. Many such "hopeless" functions frequently do have a much more effective and easy to code solution. there may also be the case where a stored procedure is more appropriate than a function
This has been on my mind since the end of last week so now I want to put this issue to sleep by opening a new discussion to see if any of you good people can help me!
The function in question is used to determine whether two records have the same Firstname, Middlename and Surname.
Consider the following example of two records:

It is clear that these two records refer to the same person but doing a self join on ContactName, Address1 and Postcode would not return these two records as duplicates because there are differences in Address1 and name.
Another approach is to join the records only on postcode then compare the name elements to see calculate a matching score for the name part.
SELECT a.contactname as master_contactname, b.contactname as duplicate_contactname FROM someTable as a inner join someTable as b on a.Postcode = b.PostCode WHERE a.ID < b.ID
My Name scoring function takes 8 parameters (4 for the master record and the other 4 of the potential duplicate record)
1) NormalisedName (Forname, Surname, Middlename) 2) PhoneticForename 3) PhoneticSurname 4) PhoneticMiddlename
Now this is where the 200 + IF statements start and the reason for them is as follows:
IF Master_Surname = Duplicate_Surname AND Master_Firstname = Duplicate_Firstname THEN the ultimate outcome of the name comparison will depend on how the middle names compare BUT the middle names can have 5 different possibilities
IF Master_Middlename = Duplicate_Middlename then EXACT match IF Master_Middlename is empty or Duplicate_Middlename is empty then LIKELY match IF Master_Middlename SOUNDS the same as Duplicate_Middlename then LIKELY match
...etc ...etc
I also have to account for when Master_Surname SOUNDS like Duplicate Surname and for this I also have to look at the firstname and middlename
Hope this makes sense?
I can't see how else to do this without using so many IFs?
In the end I created a SQL CLR in C# which is mega fast compared to the SQL UDF equivalent.
Any suggestions?
----------------------------------- http://www.SQL4n00bs.com
|
|
|
|
|
SSC Veteran
      
Group: General Forum Members
Last Login: Wednesday, May 08, 2013 10:24 AM
Points: 237,
Visits: 413
|
|
| I would have used a multi step UNION, with one of the columns returned the rank value you need. That would be more optimal than the UDF. I would have to test it out to see it compare against the CLR.
|
|
|
|
|
SSC-Addicted
      
Group: General Forum Members
Last Login: Today @ 10:21 AM
Points: 489,
Visits: 2,140
|
|
I don't understand what you mean by a multi step UNION. Can you give a simple example?
----------------------------------- http://www.SQL4n00bs.com
|
|
|
|
|
SSC Veteran
      
Group: General Forum Members
Last Login: Wednesday, May 08, 2013 10:24 AM
Points: 237,
Visits: 413
|
|
I whipped this up quick, and looking at it you would probably want to combine all the EXACTSinto one UNION etc... but I broke them out so you can see what I was doing.
Something like this:
CREATE PROCEDURE dbo.ProcName (@ID1 INT, @ID2 INT) AS BEGIN
SELECT A.id,'EXACT' FROM sometable A CROSS APPLY sometable B WHERE A.ID=@id1 AND B.id=@id2 AND A.surname=B.surname AND A.firstname=B.firstname UNION SELECT A.id,'EXACT' FROM sometable A CROSS APPLY sometable B WHERE A.ID=@id1 AND B.id=@id2 AND A.middle=B.middle UNION SELECT A.id,'LIKELY' FROM sometable A CROSS APPLY sometable B WHERE A.ID=@id1 AND B.id=@id2 AND (A.middle=NULL OR A.middle ='' OR A.middle=NULL OR A.middle ='') etc .....
END
|
|
|
|
|
SSC-Addicted
      
Group: General Forum Members
Last Login: Today @ 10:21 AM
Points: 489,
Visits: 2,140
|
|
Thanks for this. I see what you're doing but I will have to play around with it to get it to fit with my name matching matrix which looks something like the below: (there are other ones for when the surnames SOUND the same, Surnames approximately the same, surnames sound approximately the same, surnames containment, surname sound containment etc....
<lastnames match="equal"> <firstnames match="equal"> <middlenames match="equal">sure</middlenames> <middlenames match="both_empty">sure</middlenames> <middlenames match="one_empty">sure</middlenames> <middlenames match="approx">likely</middlenames> <middlenames match="contains">likely</middlenames> <middlenames match="unequal">possible</middlenames> </firstnames>
<firstnames match="sounds_equal"> <middlenames match="equal">sure</middlenames> <middlenames match="both_empty">likely</middlenames> <middlenames match="one_empty">likely</middlenames> <middlenames match="approx">possible</middlenames> <middlenames match="contains">possible</middlenames> <middlenames match="unequal">zero</middlenames> </firstnames>
<firstnames match="both_empty"> <middlenames match="equal">zero</middlenames> <middlenames match="both_empty">likely</middlenames> <middlenames match="one_empty">zero</middlenames> <middlenames match="approx">zero</middlenames> <middlenames match="contains">zero</middlenames> <middlenames match="unequal">zero</middlenames> </firstnames>
<firstnames match="one_empty"> <middlenames match="equal">zero</middlenames> <middlenames match="both_empty">likely</middlenames> <middlenames match="one_empty">likely</middlenames> <middlenames match="approx">zero</middlenames> <middlenames match="contains">likely</middlenames> <middlenames match="unequal">zero</middlenames> </firstnames>
<firstnames match="approx"> <middlenames match="equal">likely</middlenames> <middlenames match="both_empty">likely</middlenames> <middlenames match="one_empty">likely</middlenames> <middlenames match="approx">possible</middlenames> <middlenames match="contains">possible</middlenames> <middlenames match="unequal">zero</middlenames> </firstnames>
<firstnames match="sounds_approx"> <middlenames match="equal">likely</middlenames> <middlenames match="both_empty">possible</middlenames> <middlenames match="one_empty">possible</middlenames> <middlenames match="approx">zero</middlenames> <middlenames match="contains">possible</middlenames> <middlenames match="unequal">zero</middlenames> </firstnames>
<firstnames match="contains"> <middlenames match="equal">likely</middlenames> <middlenames match="both_empty">likely</middlenames> <middlenames match="one_empty">likely</middlenames> <middlenames match="approx">possible</middlenames> <middlenames match="contains">possible</middlenames> <middlenames match="unequal">zero</middlenames> </firstnames>
<firstnames match="unequal"> <middlenames match="equal">zero</middlenames> <middlenames match="both_empty">zero</middlenames> <middlenames match="one_empty">zero</middlenames> <middlenames match="approx">zero</middlenames> <middlenames match="contains">possible</middlenames> <middlenames match="unequal">zero</middlenames> </firstnames>
</lastnames>
----------------------------------- http://www.SQL4n00bs.com
|
|
|
|