Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

Clean international Phone Numbers Expand / Collapse
Author
Message
Posted Monday, March 10, 2008 3:23 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Yesterday @ 12:10 PM
Points: 10,191, Visits: 13,115
I have about 6300 phone numbers from around the world that I need to clean up before loading into SQL Server. I have gotten to the point where I have removed non-numeric characters and other junk data, but now I am somewhat stuck. Some of the numbers that were entered in the US have the country code (+xx or just xx). Some have the international prefix for dialing an international number from that country or the US (011). We would like to store just the in-country number, for example in the US that would be 10 digits, xxxxxxxxxx. We are planning on using masking, stored in our countries table, to display the number appropriately on the UI.

Here is what I think would work, but I can't get the implementation down. In a script, load the international prefix and if the first n characters match it, remove them, and the same thing for the country code.

Anyone know how to load a datatable in a script component set as a transform?




Jack Corbett

Applications Developer

Don't let the good be the enemy of the best. -- Paul Fleming

Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
How to Post Performance Problems
Crosstabs and Pivots or How to turn rows into columns Part 1
Crosstabs and Pivots or How to turn rows into columns Part 2
Post #466995
Posted Monday, March 10, 2008 9:28 PM
SSC Journeyman

SSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC Journeyman

Group: General Forum Members
Last Login: Tuesday, November 26, 2013 5:24 PM
Points: 83, Visits: 153
Is it required to store the country codes as well? If not, you could store the right most 10 digits, that will give the phone number, right? By the way, this is under the assumption that all countries have 10 digit phone numbers :)

--------------------------------------------------------------------------------------
Save our world, its all we have! A must watch video Pale Blue Dot
Post #467129
Posted Monday, March 10, 2008 9:36 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Yesterday @ 12:10 PM
Points: 10,191, Visits: 13,115
Not all countries have 10 character phone numbers. Some are more, some are less.



Jack Corbett

Applications Developer

Don't let the good be the enemy of the best. -- Paul Fleming

Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
How to Post Performance Problems
Crosstabs and Pivots or How to turn rows into columns Part 1
Crosstabs and Pivots or How to turn rows into columns Part 2
Post #467134
Posted Tuesday, March 11, 2008 8:10 AM


SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Thursday, August 21, 2014 5:15 AM
Points: 268, Visits: 1,074
These clean up jobs are always tricky.

I think you're doing the right thing already, but I'd go even further with it. The first part is trying to standardise the input as much as you can so you can set up rules downstream. e.g. In a derived column, use an expression to replace your '+' with 00, remove any brackets and spaces and perhaps replace any single leading zeros with 00 if the length of the string is greater than say 10. Now you should have just a string with a list of numbers. Hopefully you have trapped all the possible variations and standardised them in this step.

I would then split the rows (conditional split) using this derived column into international numbers (00 prefix) and national numbers. Further work is going to be required on the international ones. What you could do is create a reference table for your international dialling codes and use a look up to return the codes to the pipeline based on say the first 3 characters of your string after the leading 00. If a record is found, then use an expression to cut those left most characters out of your string.

Following that if you have any remaining, you should be able to identify a particular rule that applies to these so you can include them in the cleanup too.

Finally when you insert these into the destination, I'd want to keep the countryid as a reference even if its not part of the telephone number column as you never know when you might want to use it.

Hope this helps.


Kindest Regards,

Frank Bazan
Post #467430
Posted Tuesday, March 11, 2008 8:21 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Yesterday @ 12:10 PM
Points: 10,191, Visits: 13,115
Frank,

Thanks for the input. I actually think I have it figured out. I am doing something similar to what you are suggesting. In my destination system I have a countries table where I am storing the international prefix and the countty code. I have added another Source and I am using a Merge Join to get it all into 1 dataset. Then in a script component I am removing all the non-numeric characters and stripping the extension when it exists. Then I just have the "clean" phone number. Then I do a compare on the country code like this:

 If strPhone <> String.Empty Then
strCode = strPhone.Substring(0, CodeLength)

If strCode = Code Then
strPhone = Phone.Substring(CodeLength)
End If
End If

And yes, I do need the check for empty string as there are some phone numbers that, once purged of non-numeric characters are empty.

I'll post how the final results turn out.

As you said the cleanup is tricky because the source system did absolutely no checking for phone number validity. I had names, email addresses, and other "data" in the phone number




Jack Corbett

Applications Developer

Don't let the good be the enemy of the best. -- Paul Fleming

Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
How to Post Performance Problems
Crosstabs and Pivots or How to turn rows into columns Part 1
Crosstabs and Pivots or How to turn rows into columns Part 2
Post #467443
Posted Tuesday, March 11, 2008 3:05 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Yesterday @ 12:10 PM
Points: 10,191, Visits: 13,115
The solution in my last post appears to be working just like I want it to. Might not be the prettiest code in the world but it is getting me the data I think I want.



Jack Corbett

Applications Developer

Don't let the good be the enemy of the best. -- Paul Fleming

Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
How to Post Performance Problems
Crosstabs and Pivots or How to turn rows into columns Part 1
Crosstabs and Pivots or How to turn rows into columns Part 2
Post #467779
Posted Tuesday, March 11, 2008 5:16 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Yesterday @ 8:14 PM
Points: 35,215, Visits: 31,665
Oh... You really need to double check and make sure, Jack. As you probably know, Country Codes can be 1, 2, or 3 digits long... what you might not know (depending on what you're doing) is that International calls can also have a "city" code of 0 to 4 digits. Unless your lookup tables are spot on can you start looking for perfect matches starting at the 7th digit and working your way back, you can have a hell of a mess on your hands. And, don't forget that calls to Canada, Guam, Marianas, Puerto Rico, and other US Territories etc, are all under the North American Numbering Plan Association (NANPA) control and can all be reached with 1+NPA+NXX dialing.

--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #467826
Posted Tuesday, March 11, 2008 5:53 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Yesterday @ 12:10 PM
Points: 10,191, Visits: 13,115
Thanks for the input Jeff. Needless to say I learned a lot about international phone numbers recently. I am pretty confident I have the right codes. I know I have the right country codes, the problem with the international dialing prefix is that the majority of the data was entered in the US so they have the 011 US/North America international prefix. The spot check I did looks pretty good, but it will take the users to verify it.



Jack Corbett

Applications Developer

Don't let the good be the enemy of the best. -- Paul Fleming

Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
How to Post Performance Problems
Crosstabs and Pivots or How to turn rows into columns Part 1
Crosstabs and Pivots or How to turn rows into columns Part 2
Post #467832
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse