Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase «««1234»»

Does anyone have a good 'validate email' function? Expand / Collapse
Author
Message
Posted Thursday, August 28, 2014 10:16 AM


SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Today @ 2:50 PM
Points: 472, Visits: 1,914
Steve Jones - SSC Editor (8/28/2014)
It's tough, as Troy Hunt found out.

http://www.troyhunt.com/2013/11/dont-trust-net-web-forms-email-regex.html

Honestly, it's good to try to validate, but don't discard data that doesn't meet the standard you set. It's highly likely your validation is broken. Even if you say "it's all North America", people move all the time and you might end up with foreign items over time. Or your company expands.

Throw suspected items into a queue and have someone review them. Contact the person. If you do this in a front end form, give the user a way to bypass validation and give their email with a captcha or some other type of "human" validation that their email is correct.


This seems sort of directed at me.

The email addresses I was processing were not from a web form, they were provided to us either by clients, or by data marts. Nothing was deleted, it was just given a 'status flag' that meant the email inspection found something wonky. A certain percentage of the contact information was expected to be inaccurate or incorrect. So if the project was running a long time without a high response rate, and the PM didn't want to buy more sample, the remainder email addresses could be released.

I had to do a lot of similar stuff with phone number cleaning/checking and formatting so the dialer could make the calls correctly.

Market research is a bummer.

*shrug*

Post #1608340
Posted Tuesday, September 2, 2014 9:56 AM
SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Today @ 10:48 AM
Points: 881, Visits: 2,440
GilaMonster (8/27/2014)
sqldriver (8/27/2014)
GilaMonster (8/27/2014)
sqldriver (8/27/2014)
I used to use this. Not perfect by any stretch but got most of the junk out.


Does it flag this as invalid?
MyAccount+SSC@Gmail.com

http://gmailblog.blogspot.com/2008/03/2-hidden-ways-to-get-more-from-your.html


It does.


Which is why I point people at that linked blog post on this topic, as what is allowed is much broader than most regex statements allow.


Yes, my favorite reference is Mail::RFC822::Address: regexp-based address validation

Which has a sample Perl RegEx for RFC822 validation, NOT including the arbitrarily nested comments allowed by the RFC.
(??:\r)?[ \t])*(???:[^()<>@,;:\\".\[\] \000-\031]+(???:\r)?[ \t]
)+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(??:\r)?[ \t]))*"(??:
\r)?[ \t])*)(?:\.(??:\r)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(??
?:\r)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(??:\r)?[
\t]))*"(??:\r)?[ \t])*))*@(??:\r)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\0
31]+(???:\r)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\
](??:\r)?[ \t])*)(?:\.(??:\r)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+
(???:\r)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:
(?:\r)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(???:\r)?[ \t])+|\Z
|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(??:\r)?[ \t]))*"(??:\r)
?[ \t])*)*\<(??:\r)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(???:\
r)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](??:\r)?[
\t])*)(?:\.(??:\r)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(???:\r)
?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](??:\r)?[ \t]
)*))*(?:,@(??:\r)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(???:\r)?[
\t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](??:\r)?[ \t])*
)(?:\.(??:\r)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(???:\r)?[ \t]
)+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](??:\r)?[ \t])*))*)
*??:\r)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(???:\r)?[ \t])+
|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(??:\r)?[ \t]))*"(??:\r
)?[ \t])*)(?:\.(??:\r)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(???:
\r)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(??:\r)?[ \t
]))*"(??:\r)?[ \t])*))*@(??:\r)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031
]+(???:\r)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](
??:\r)?[ \t])*)(?:\.(??:\r)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?
??:\r)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](??
:\r)?[ \t])*))*\>(??:\r)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-\031]+(??
?:\r)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(??:\r)?
[ \t]))*"(??:\r)?[ \t])*)*??:\r)?[ \t])*(???:[^()<>@,;:\\".\[\]
\000-\031]+(???:\r)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|
\\.|(??:\r)?[ \t]))*"(??:\r)?[ \t])*)(?:\.(??:\r)?[ \t])*(?:[^()<>
@,;:\\".\[\] \000-\031]+(???:\r)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"
(?:[^\"\r\\]|\\.|(??:\r)?[ \t]))*"(??:\r)?[ \t])*))*@(??:\r)?[ \t]
)*(?:[^()<>@,;:\\".\[\] \000-\031]+(???:\r)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|\[([^\[\]\r\\]|\\.)*\](??:\r)?[ \t])*)(?:\.(??:\r)?[ \t])*(?
:[^()<>@,;:\\".\[\] \000-\031]+(???:\r)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[
\]]))|\[([^\[\]\r\\]|\\.)*\](??:\r)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-
\031]+(???:\r)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(
??:\r)?[ \t]))*"(??:\r)?[ \t])*)*\<(??:\r)?[ \t])*(?:@(?:[^()<>@,;
:\\".\[\] \000-\031]+(???:\r)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([
^\[\]\r\\]|\\.)*\](??:\r)?[ \t])*)(?:\.(??:\r)?[ \t])*(?:[^()<>@,;:\\"
.\[\] \000-\031]+(???:\r)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\
]\r\\]|\\.)*\](??:\r)?[ \t])*))*(?:,@(??:\r)?[ \t])*(?:[^()<>@,;:\\".\
[\] \000-\031]+(???:\r)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\
r\\]|\\.)*\](??:\r)?[ \t])*)(?:\.(??:\r)?[ \t])*(?:[^()<>@,;:\\".\[\]
\000-\031]+(???:\r)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]
|\\.)*\](??:\r)?[ \t])*))*)*??:\r)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \0
00-\031]+(???:\r)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\
.|(??:\r)?[ \t]))*"(??:\r)?[ \t])*)(?:\.(??:\r)?[ \t])*(?:[^()<>@,
;:\\".\[\] \000-\031]+(???:\r)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?
:[^\"\r\\]|\\.|(??:\r)?[ \t]))*"(??:\r)?[ \t])*))*@(??:\r)?[ \t])*
(?:[^()<>@,;:\\".\[\] \000-\031]+(???:\r)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
\[\]]))|\[([^\[\]\r\\]|\\.)*\](??:\r)?[ \t])*)(?:\.(??:\r)?[ \t])*(?:[
^()<>@,;:\\".\[\] \000-\031]+(???:\r)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]
]))|\[([^\[\]\r\\]|\\.)*\](??:\r)?[ \t])*))*\>(??:\r)?[ \t])*)(?:,\s*(
??:[^()<>@,;:\\".\[\] \000-\031]+(???:\r)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|"(?:[^\"\r\\]|\\.|(??:\r)?[ \t]))*"(??:\r)?[ \t])*)(?:\.(?
?:\r)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(???:\r)?[ \t])+|\Z|(?=[
\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(??:\r)?[ \t]))*"(??:\r)?[ \t
])*))*@(??:\r)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(???:\r)?[ \t
])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](??:\r)?[ \t])*)(?
:\.(??:\r)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(???:\r)?[ \t])+|
\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](??:\r)?[ \t])*))*|(?:
[^()<>@,;:\\".\[\] \000-\031]+(???:\r)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\
]]))|"(?:[^\"\r\\]|\\.|(??:\r)?[ \t]))*"(??:\r)?[ \t])*)*\<(??:\r)
?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(???:\r)?[ \t])+|\Z|(?=[\["
()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](??:\r)?[ \t])*)(?:\.(??:\r)
?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(???:\r)?[ \t])+|\Z|(?=[\["()<>
@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](??:\r)?[ \t])*))*(?:,@(??:\r)?[
\t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(???:\r)?[ \t])+|\Z|(?=[\["()<>@,
;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](??:\r)?[ \t])*)(?:\.(??:\r)?[ \t]
)*(?:[^()<>@,;:\\".\[\] \000-\031]+(???:\r)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|\[([^\[\]\r\\]|\\.)*\](??:\r)?[ \t])*))*)*??:\r)?[ \t])*)?
(?:[^()<>@,;:\\".\[\] \000-\031]+(???:\r)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
\[\]]))|"(?:[^\"\r\\]|\\.|(??:\r)?[ \t]))*"(??:\r)?[ \t])*)(?:\.(??:
\r)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(???:\r)?[ \t])+|\Z|(?=[\[
"()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(??:\r)?[ \t]))*"(??:\r)?[ \t])
*))*@(??:\r)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(???:\r)?[ \t])
+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](??:\r)?[ \t])*)(?:\
.(??:\r)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(???:\r)?[ \t])+|\Z
|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](??:\r)?[ \t])*))*\>(?
?:\r)?[ \t])*))*)?;\s*)


P.S. I tried code="plain" but the sad faces, well, they're not stopped by that.
Post #1609675
Posted Tuesday, September 2, 2014 10:32 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Today @ 12:37 PM
Points: 2,020, Visits: 3,474
Nadrek (9/2/2014)

but the sad faces, well, they're not stopped by that.


They're not stopped by that. They cannot be stopped by anything. They're the imprisoned souls needed to power this most unholy of regexes, staring back at you from the abyss. They are coming.
Post #1609695
Posted Tuesday, September 2, 2014 11:04 AM
SSCarpal Tunnel

SSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal Tunnel

Group: General Forum Members
Last Login: Today @ 4:25 PM
Points: 4,280, Visits: 3,720
sqldriver (8/28/2014)

The email addresses I was processing were not from a web form, they were provided to us either by clients, or by data marts. Nothing was deleted, it was just given a 'status flag' that meant the email inspection found something wonky. A certain percentage of the contact information was expected to be inaccurate or incorrect. So if the project was running a long time without a high response rate, and the PM didn't want to buy more sample, the remainder email addresses could be released.

I had to do a lot of similar stuff with phone number cleaning/checking and formatting so the dialer could make the calls correctly.

Market research is a bummer.

Hey, you know it's your fault that the invalid email addresses and phone numbers don't work. Because they're invalid, that shouldn't impact the response rate, should it? Your situation is certainly not unique. Guess who else works in market research?



Tally Tables - Performance Personified
String Splitting with True Performance
Best practices on how to ask questions
Post #1609709
Posted Tuesday, September 2, 2014 11:07 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 2:26 PM
Points: 13,093, Visits: 12,573
Nadrek (9/2/2014)
[quote]

Which has a sample Perl RegEx for RFC822 validation, NOT including the arbitrarily nested comments allowed by the RFC.


Heaven help you if you have to debug that thing. That to me does not look like a viable piece of code I would put anywhere near my system. That whole being able to support my code thing comes to mind.


_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Moden's splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Understanding and Using APPLY (Part 1)
Understanding and Using APPLY (Part 2)
Post #1609712
Posted Tuesday, September 2, 2014 11:10 AM
SSCarpal Tunnel

SSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal Tunnel

Group: General Forum Members
Last Login: Today @ 4:25 PM
Points: 4,280, Visits: 3,720
Sean Lange (9/2/2014)
Nadrek (9/2/2014)
[quote]

Which has a sample Perl RegEx for RFC822 validation, NOT including the arbitrarily nested comments allowed by the RFC.


That to me does not look like a viable piece of code I would put anywhere near my system.

+1000



Tally Tables - Performance Personified
String Splitting with True Performance
Best practices on how to ask questions
Post #1609718
Posted Tuesday, September 2, 2014 11:27 AM


SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Today @ 2:50 PM
Points: 472, Visits: 1,914
Ed Wagner (9/2/2014)
sqldriver (8/28/2014)

The email addresses I was processing were not from a web form, they were provided to us either by clients, or by data marts. Nothing was deleted, it was just given a 'status flag' that meant the email inspection found something wonky. A certain percentage of the contact information was expected to be inaccurate or incorrect. So if the project was running a long time without a high response rate, and the PM didn't want to buy more sample, the remainder email addresses could be released.

I had to do a lot of similar stuff with phone number cleaning/checking and formatting so the dialer could make the calls correctly.

Market research is a bummer.

Hey, you know it's your fault that the invalid email addresses and phone numbers don't work. Because they're invalid, that shouldn't impact the response rate, should it? Your situation is certainly not unique. Guess who else works in market research?


What do you mean you don't know if these Thai phone numbers are cell phones? We can't have cell phones on the predictive dialer! CASRO will exile us!
Post #1609722
Posted Tuesday, September 2, 2014 11:37 AM
SSCarpal Tunnel

SSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal Tunnel

Group: General Forum Members
Last Login: Today @ 4:25 PM
Points: 4,280, Visits: 3,720
sqldriver (9/2/2014)
Ed Wagner (9/2/2014)
sqldriver (8/28/2014)

The email addresses I was processing were not from a web form, they were provided to us either by clients, or by data marts. Nothing was deleted, it was just given a 'status flag' that meant the email inspection found something wonky. A certain percentage of the contact information was expected to be inaccurate or incorrect. So if the project was running a long time without a high response rate, and the PM didn't want to buy more sample, the remainder email addresses could be released.

I had to do a lot of similar stuff with phone number cleaning/checking and formatting so the dialer could make the calls correctly.

Market research is a bummer.

Hey, you know it's your fault that the invalid email addresses and phone numbers don't work. Because they're invalid, that shouldn't impact the response rate, should it? Your situation is certainly not unique. Guess who else works in market research?


What do you mean you don't know if these Thai phone numbers are cell phones? We can't have cell phones on the predictive dialer! CASRO will exile us!

Yes they will, unless you have permission to use cell phone numbers. I think that predictive dialers are falling out of favor anyway industry-wide, given all the restrictions. There are many little quirks with market research data. And watch where (geographically) that data is stored if you any business with European respondents.

I'm in the process of writing a decent email validation, but I'm a ways away from having anything yet. I will not be using the regular expression solution, as we already have one in place and it's a nightmare. I see phase 1 in my head, so it's just a matter of getting the time to write it, get it working and making it fast. Ideally for phase 2, I'd like to "call up" the email server, ask if the account lives there and store the server's response. I don't see that part yet, but one thing at a time.



Tally Tables - Performance Personified
String Splitting with True Performance
Best practices on how to ask questions
Post #1609725
Posted Tuesday, September 2, 2014 11:41 AM
SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Today @ 10:48 AM
Points: 881, Visits: 2,440
Sean Lange (9/2/2014)
Nadrek (9/2/2014)
[quote]

Which has a sample Perl RegEx for RFC822 validation, NOT including the arbitrarily nested comments allowed by the RFC.


Heaven help you if you have to debug that thing. That to me does not look like a viable piece of code I would put anywhere near my system. That whole being able to support my code thing comes to mind.


Judging by RFC822, no accurate 'validate email' code is going to be viable or supportable except by a domain expert .
Post #1609729
Posted Tuesday, September 2, 2014 5:41 PM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 3:10 PM
Points: 7,814, Visits: 9,563
Nadrek (9/2/2014)
Judging by RFC822, no accurate 'validate email' code is going to be viable or supportable except by a domain expert .


And if you go by RFC 822 you are going to have problems anyway, because it's out of date and no longer a valid description of the internet email address rules. The new RFC (RFC 2822) which superceded RFC 822 has itself been obsoleted by an even newer rfc (RFC 5322) and even RFC 5322 is partly obsoleted by RFC 6854 which replaces part of RFC 5322.

But being a domain expert is irrelevant anyway, because "valid" means "conforming to the internet rules for email addresses" rather than "being an actual email address" or "being an email address potentially allowed by the xyz.com receiving domain".


Tom
Post #1609848
« Prev Topic | Next Topic »

Add to briefcase «««1234»»

Permissions Expand / Collapse