Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Clean international Phone Numbers


Clean international Phone Numbers

Author
Message
Jack Corbett
  Jack Corbett
SSChampion
SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)

Group: General Forum Members
Points: 12334 Visits: 14863
I have about 6300 phone numbers from around the world that I need to clean up before loading into SQL Server. I have gotten to the point where I have removed non-numeric characters and other junk data, but now I am somewhat stuck. Some of the numbers that were entered in the US have the country code (+xx or just xx). Some have the international prefix for dialing an international number from that country or the US (011). We would like to store just the in-country number, for example in the US that would be 10 digits, xxxxxxxxxx. We are planning on using masking, stored in our countries table, to display the number appropriately on the UI.

Here is what I think would work, but I can't get the implementation down. In a script, load the international prefix and if the first n characters match it, remove them, and the same thing for the country code.

Anyone know how to load a datatable in a script component set as a transform?



Jack Corbett

Applications Developer

Don't let the good be the enemy of the best. -- Paul Fleming
At best you can say that one job may be more secure than another, but total job security is an illusion. -- Rod at work

Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
How to Post Performance Problems
Crosstabs and Pivots or How to turn rows into columns Part 1
Crosstabs and Pivots or How to turn rows into columns Part 2
XPSCodes-667952
XPSCodes-667952
SSC Journeyman
SSC Journeyman (85 reputation)SSC Journeyman (85 reputation)SSC Journeyman (85 reputation)SSC Journeyman (85 reputation)SSC Journeyman (85 reputation)SSC Journeyman (85 reputation)SSC Journeyman (85 reputation)SSC Journeyman (85 reputation)

Group: General Forum Members
Points: 85 Visits: 153
Is it required to store the country codes as well? If not, you could store the right most 10 digits, that will give the phone number, right? By the way, this is under the assumption that all countries have 10 digit phone numbers Smile

--------------------------------------------------------------------------------------
Save our world, its all we have! A must watch video Pale Blue Dot
Jack Corbett
  Jack Corbett
SSChampion
SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)

Group: General Forum Members
Points: 12334 Visits: 14863
Not all countries have 10 character phone numbers. Some are more, some are less.



Jack Corbett

Applications Developer

Don't let the good be the enemy of the best. -- Paul Fleming
At best you can say that one job may be more secure than another, but total job security is an illusion. -- Rod at work

Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
How to Post Performance Problems
Crosstabs and Pivots or How to turn rows into columns Part 1
Crosstabs and Pivots or How to turn rows into columns Part 2
Frank Bazan
Frank Bazan
Old Hand
Old Hand (310 reputation)Old Hand (310 reputation)Old Hand (310 reputation)Old Hand (310 reputation)Old Hand (310 reputation)Old Hand (310 reputation)Old Hand (310 reputation)Old Hand (310 reputation)

Group: General Forum Members
Points: 310 Visits: 1087
These clean up jobs are always tricky.

I think you're doing the right thing already, but I'd go even further with it. The first part is trying to standardise the input as much as you can so you can set up rules downstream. e.g. In a derived column, use an expression to replace your '+' with 00, remove any brackets and spaces and perhaps replace any single leading zeros with 00 if the length of the string is greater than say 10. Now you should have just a string with a list of numbers. Hopefully you have trapped all the possible variations and standardised them in this step.

I would then split the rows (conditional split) using this derived column into international numbers (00 prefix) and national numbers. Further work is going to be required on the international ones. What you could do is create a reference table for your international dialling codes and use a look up to return the codes to the pipeline based on say the first 3 characters of your string after the leading 00. If a record is found, then use an expression to cut those left most characters out of your string.

Following that if you have any remaining, you should be able to identify a particular rule that applies to these so you can include them in the cleanup too.

Finally when you insert these into the destination, I'd want to keep the countryid as a reference even if its not part of the telephone number column as you never know when you might want to use it.

Hope this helps.

Kindest Regards,

Frank Bazan
Jack Corbett
  Jack Corbett
SSChampion
SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)

Group: General Forum Members
Points: 12334 Visits: 14863
Frank,

Thanks for the input. I actually think I have it figured out. I am doing something similar to what you are suggesting. In my destination system I have a countries table where I am storing the international prefix and the countty code. I have added another Source and I am using a Merge Join to get it all into 1 dataset. Then in a script component I am removing all the non-numeric characters and stripping the extension when it exists. Then I just have the "clean" phone number. Then I do a compare on the country code like this:


If strPhone <> String.Empty Then
strCode = strPhone.Substring(0, CodeLength)

If strCode = Code Then
strPhone = Phone.Substring(CodeLength)
End If
End If



And yes, I do need the check for empty string as there are some phone numbers that, once purged of non-numeric characters are empty.

I'll post how the final results turn out.

As you said the cleanup is tricky because the source system did absolutely no checking for phone number validity. I had names, email addresses, and other "data" in the phone numberHehe



Jack Corbett

Applications Developer

Don't let the good be the enemy of the best. -- Paul Fleming
At best you can say that one job may be more secure than another, but total job security is an illusion. -- Rod at work

Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
How to Post Performance Problems
Crosstabs and Pivots or How to turn rows into columns Part 1
Crosstabs and Pivots or How to turn rows into columns Part 2
Jack Corbett
  Jack Corbett
SSChampion
SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)

Group: General Forum Members
Points: 12334 Visits: 14863
The solution in my last post appears to be working just like I want it to. Might not be the prettiest code in the world but it is getting me the data I think I want.



Jack Corbett

Applications Developer

Don't let the good be the enemy of the best. -- Paul Fleming
At best you can say that one job may be more secure than another, but total job security is an illusion. -- Rod at work

Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
How to Post Performance Problems
Crosstabs and Pivots or How to turn rows into columns Part 1
Crosstabs and Pivots or How to turn rows into columns Part 2
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (52K reputation)SSC Guru (52K reputation)SSC Guru (52K reputation)SSC Guru (52K reputation)SSC Guru (52K reputation)SSC Guru (52K reputation)SSC Guru (52K reputation)SSC Guru (52K reputation)

Group: General Forum Members
Points: 52130 Visits: 40320
Oh... You really need to double check and make sure, Jack. As you probably know, Country Codes can be 1, 2, or 3 digits long... what you might not know (depending on what you're doing) is that International calls can also have a "city" code of 0 to 4 digits. Unless your lookup tables are spot on can you start looking for perfect matches starting at the 7th digit and working your way back, you can have a hell of a mess on your hands. And, don't forget that calls to Canada, Guam, Marianas, Puerto Rico, and other US Territories etc, are all under the North American Numbering Plan Association (NANPA) control and can all be reached with 1+NPA+NXX dialing.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Jack Corbett
  Jack Corbett
SSChampion
SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)

Group: General Forum Members
Points: 12334 Visits: 14863
Thanks for the input Jeff. Needless to say I learned a lot about international phone numbers recently. I am pretty confident I have the right codes. I know I have the right country codes, the problem with the international dialing prefix is that the majority of the data was entered in the US so they have the 011 US/North America international prefix. The spot check I did looks pretty good, but it will take the users to verify it.



Jack Corbett

Applications Developer

Don't let the good be the enemy of the best. -- Paul Fleming
At best you can say that one job may be more secure than another, but total job security is an illusion. -- Rod at work

Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
How to Post Performance Problems
Crosstabs and Pivots or How to turn rows into columns Part 1
Crosstabs and Pivots or How to turn rows into columns Part 2
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search