Bug in CEmailValidator

Me neither, man! Reported this as a bug. Cheers.

Trejder, with the word “bug” you imply that the person who created this validator (Qiang in this case) made a mistake. That’s why I don’t like the word “bug” in this context: the docs already make quite clear why this regex is used (see the mentioned reference). There is no 100% regex that filters out all invalid email addresses. Any regex is a tradeoff in some way. The regex used simply reflects the standard defined in RFC 2822. It’s guaranteed not to filter out any valid email address. IMO we should stay with the standards - you never know which TLDs might be available in a few years.

If you still want to use a different pattern, you can. But Yii should adhere to existing standards as much as possible.

Here is a good read about validating email, even though it was written in 2007 - http://www.linuxjournal.com/article/9585?page=0,0

Mike, let me explain,

First of all - there is no information about any trade-off in the Yii documentation. To be honest, there is nothing at all except a link to regular-expressions.info, where - as I wrote above - I found a bunch of regexes, none of which works for me (unless I made a mistake, but I shouldn’t have - I only copied each whole regex into the ‘pattern’=>’’ field).

Second of all - you wrote that Yii implements The Official Standard: RFC 2822 while regular-expressions.info says “You can (but you shouldn’t–read on) implement it”, which makes me a bit confused.

And third of all - I’m saying that there is a bug in how CEmailValidator validates e-mails, without specifically pointing out whether this is a problem of the validator (or its creator) itself or of the protocol / solution / source it relies on. But I’m surprised that an official RFC standard allows e-mails that even a newbie like me knows are incorrect.

I’m a complete newbie to regular expressions and I don’t know whether preventing numbers in the top-level domain of an e-mail address being validated is a problem or not. For me personally this is a huge bug. And I don’t understand where that trade-off is - i.e. what do we get in exchange for this missing check?

I don’t know about others, but in my company, the possibility to enter numbers as a top-level domain is and always will be marked by any tester as a bug, and no explanation about whether or not it relies on any standard will be accepted. Pity, but true.

I will not discuss the sense of the standards defined in RFCs. In the same way we could discuss whether it makes sense that it rains sometimes. :)

About the RFC pattern: it says you should read on to understand why you shouldn’t use it. If you do that, you find out that only the format in which the regex is specified is a problem - not the rule itself.

So did you give the other suggested rule a try? You need to enclose regular expressions in ‘/’.


/[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+(?:[A-Z]{2}|com|org|net|edu|gov|mil|biz|info|mobi|name|aero|asia|jobs|museum)\b/

I am just wondering: if it doesn’t work, and you shouldn’t really use it, why on Earth does it ship with Yii then?

That’s what Trejder said - it’s not what the link says and not how it is implemented.

You are right (again) - I missed your post about adhering to a standard, even if that standard is slightly flawed.

Upon reading the topic again, the bug turned into a feature and my point became moot. ;)

Yes! I tried all rules on page mentioned by you.

Seems that I’m a moron or sth. Here are my attempts with reg.ex you provided and results in application:


array('email', 'email', 'pattern'=>"/[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+(?:[A-Z]{2}|com|org|net|edu|gov|mil|biz|info|mobi|name|aero|asia|jobs|museum)\b/"), //Email has to be a valid email address

Exception: preg_match() [function.preg-match]: Unknown modifier ‘=’


array('email', 'email', 'pattern'=>'/[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+(?:[A-Z]{2}|com|org|net|edu|gov|mil|biz|info|mobi|name|aero|asia|jobs|museum)\b/'), //Email has to be a valid email address

Parse error: syntax error, unexpected T_DIV_EQUAL in D:\Dev\xampp\htdocs\www\protected\models\ContactForm.php on line 24


array('email', 'email', 'pattern'=>/[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+(?:[A-Z]{2}|com|org|net|edu|gov|mil|biz|info|mobi|name|aero|asia|jobs|museum)\b/), //Email has to be a valid email address

Parse error: syntax error, unexpected ‘/’ in D:\Dev\xampp\htdocs\www\protected\models\ContactForm.php on line 24

I know that these are basic and I am missing something painfully obvious, but… it doesn’t work any way I tried (single quotes, double quotes, no quotes). Pretty much the same as with the other regexps from that page.

BTW: Thank you for your enlightening discussion above. Helped me really much! :confused:

You must escape the delimiter and maybe certain characters. Simply do this:


'pattern' => '/' . preg_quote($regex, '/') . '/'
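
One caveat, shown as a minimal sketch: `preg_quote()` escapes every regex metacharacter, not only the delimiter, so wrapping a whole pattern in it turns the pattern into a literal string match. If the `/` delimiter is the only problem, escaping just that character is enough. The pattern and the sample input below are made up for illustration.

```php
<?php
// Hypothetical pattern that contains the delimiter character '/'.
$regex = '^[a-z]+/[0-9]+$';

// preg_quote() escapes all metacharacters, so this only matches the
// literal text "^[a-z]+/[0-9]+$", not strings such as "abc/123".
$quoted = '/' . preg_quote($regex, '/') . '/';

// Escaping only the delimiter keeps the regex semantics intact.
$escaped = '/' . str_replace('/', '\/', $regex) . '/';

var_dump(preg_match($quoted, 'abc/123'));  // int(0)
var_dump(preg_match($escaped, 'abc/123')); // int(1)
```

The `str_replace()` call assumes the pattern does not already contain an escaped `\/`; escaping the slash by hand inside the pattern string works just as well.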

Thanks to Antonio, I was able to solve both problems (a valid regex for filtering out numbers in the top-level domain, and PHP rejecting [fatal error] the other regex rules) with the following code:


	/**
	 * Declares the validation rules.
	 */
	public function rules()
	{
		$email_regex_pattern = '/^[a-zA-Z0-9!#$%&\'*+\\/=?^_`{|}~-]+(?:\.[a-zA-Z0-9!#$%&\'*+\\/=?^_`{|}~-]+)*@(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?\.)+[a-zA-Z](?:[a-zA-Z]*[a-zA-Z])?$/';

		return array
		(
			array('name, email, body', 'required'), //Name, email and body are required
			array('email', 'email', 'pattern'=>$email_regex_pattern), //Email has to be a valid email address
		);
	}
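
To see what this pattern accepts and rejects outside of Yii, here is a small sketch that runs it through `preg_match()` directly. The pattern is copied from the `rules()` method above; the sample addresses are chosen for illustration only.

```php
<?php
// Pattern copied from the rules() method in this post.
$email_regex_pattern = '/^[a-zA-Z0-9!#$%&\'*+\\/=?^_`{|}~-]+(?:\.[a-zA-Z0-9!#$%&\'*+\\/=?^_`{|}~-]+)*@(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?\.)+[a-zA-Z](?:[a-zA-Z]*[a-zA-Z])?$/';

// Illustrative sample addresses (not from the original discussion,
// apart from test@test.111).
$samples = array(
    'test@test.com',
    'user@mail.example.org',
    'test@test.111',
    'test@test.c0m',
);

$results = array();
foreach ($samples as $address) {
    // preg_match() returns 1 on a match and 0 otherwise.
    $results[$address] = (preg_match($email_regex_pattern, $address) === 1);
    echo $address, ' => ', $results[$address] ? 'accepted' : 'rejected', "\n";
}
```

The first two addresses are accepted and the last two rejected, because the final label may contain letters only. For the same reason the pattern also rejects punycode TLDs such as `xn--p1ai`, which contain digits and hyphens.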

Thanks again, Antonio! :]

Apologies to the rest of the guys for having sent a private message to Trej, and thank you very much, Trej, for posting the solution here.

@Antonio: No problem for me, at all. It was my pleasure to cite you! :]

@Mike and jacmoe: Hm… It seems that there is a solution that validates e-mails for not having numbers in TLDs. Maybe we could reopen the discussion after your enlightening comments to discuss why a regex the same as or similar to Antonio’s does not ship with Yii by default?

EDIT: Just got a notification that Qiang marked my bug as WontFix. Hm… it seems more and more people think it is OK that the built-in validator passes invalid e-mails as valid. I can’t understand it, but I won’t argue!

Please note: we are talking about a real validation bug, not a slight one like those explained in the article provided by mdomba. For me (both as a user and a developer) it is OK if the e-mail validator can’t accept e-mails in a form like “Mark Anthony” <mark.anthony@somehost.com>, because in my opinion fewer than 1% of users will enter an e-mail in such a format when specifically asked to enter an e-mail only (most contact forms I’ve used have a separate field for the name, so one doesn’t have to write it in the e-mail field). But it is an obvious bug if a contact form built upon Yii’s default e-mail validator passes e-mails like test@test.111, because if I receive such form contents I won’t be able to answer. Obvious at least to me.

But, as I said - I won’t argue if I’m the only one who thinks this is a bug and the rest consider it a welcome feature.

I see that qiang just closed the ticket, just as I wanted to suggest "wontfix".

@Trejder:

RFC 2822 doesn’t forbid numbers after the last dot in email addresses per se. That’s why your assumption that addresses with numbers in the TLD are wrong is not the full truth. It might be true that there’s no such TLD now, but that doesn’t mean there never will be (it could already have changed in a year). The current implementation makes sure that no valid email address is ever filtered out, by getting as close to the RFC as possible. That’s a different statement from: “It filters out every wrong address”. That’s the tradeoff mentioned above: there’s no simple regex that accomplishes both.

And one more note:

You might get much closer to what you look for, if you enable the checkMx feature of the validator.
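
A minimal sketch of what that could look like in a model’s rules, assuming the option is exposed as the `checkMX` property of `CEmailValidator` (verify the exact name against the class reference for your Yii version). The function name `contactFormRules()` is hypothetical and only stands in for a model’s `rules()` method so that the snippet runs on its own.

```php
<?php
// Hypothetical stand-in for a model's rules() method.
function contactFormRules()
{
    return array(
        array('name, email, body', 'required'),
        // 'checkMX' (assumed property name) asks the validator to also
        // look up an MX record for the domain part of the address.
        array('email', 'email', 'checkMX' => true),
    );
}

print_r(contactFormRules());
```

The MX lookup needs working DNS at validation time, so it only helps on servers that can reach a resolver.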

@Trejder:

I didn’t understand the issue at first, but it’s simple enough: the standard itself is incomplete (or ahead of itself).

It makes sense to follow the standard, even if the real world doesn’t.

Who knows what kinds of email addresses we will have in the near future?

+1 for having raised this issue, and another plus for finding a solution to the problem. :)

I will definitely be using that regex in my email validators today.

Mike,

You’ve already explained this. By reopening the topic I was rather suggesting a strongly off-topic discussion: is using a regex as close to RFC 2822 as possible by default in Yii a good idea? For me personally (mainly as a developer, but also as a user) the trade-off you are talking about makes no sense. As I already explained, I would far rather have a regex that filters out wrong e-mail addresses than one that is - let’s say - very open to possible future changes to TLDs, at the price of letting invalid e-mails through.

Why? Simple. Because the possibility of IANA allowing numbers in TLDs is a future problem. And in my opinion even saying “future” is very optimistic; I would rather say “virtual” or “abstract” here. I mean - look at IANA’s history of designating new TLDs. How many changes have we had in the past twenty years? Ten? Twenty? That makes at most one change per year. How many TLDs during that time were allowed to have numbers? Zero. How many in the whole history of IANA and DNS? Zero. What is the chance that something will change here in the foreseeable future - i.e. that IANA will allow numbers in TLDs? I would say less than five percent. That’s a rather small chance, don’t you agree?

And (back to the main thread) users being able to enter an e-mail like test@test.111 is a current, not a future, problem. If I’m not mistaken, you guys are proposing that Yii ship with a solution that will by default be changed by every developer in every new application, as it does not fit current needs (it is open to unlikely future changes, but who would care about them right now?). Isn’t something wrong with that? For me this is like selling a car with summer tires in Greenland or at the North Pole. Of course, there is a possibility that 1% of buyers in this area will want to take their newly bought car to Europe or even Africa. But that is one percent satisfied, while the other 99% will have to start by changing to winter tires. You get my idea?

My conclusion isn’t that RFC 2822 or a regex based on it is wrong. My conclusion is that Yii using it by default is a mistake. It is currently more open to possible future changes than geared to satisfying current developers’ needs. But that’s just my private opinion.

I followed this thread from the beginning and it’s interesting how you all “argue” about some regex that should filter valid/invalid emails…

In my opinion that is a very small problem in validating email… even if you filter out numeric domains… a much bigger problem is when a user enters a wrong/misspelled domain or address…

and that comes from 8 years of experience as the webmaster of a site offering private accommodations (and more), where users visiting the site send inquiries about the offers, and where we have more than 100 emails per day sent from an online email form…

So the best thing to minimize the errors is (like Mike suggested) to enable the checkMX check… that part is very important and helpful because it checks whether the entered domain has an MX record (has a mail server)… so misspelled domains get filtered here…

That leaves the misspelled username… that can be checked only by writing a script that will connect to the mail server and check whether that email exists…

but just by implementing the MX check I got 3-5 bouncing emails per day instead of the 20-30 per day there were before the MX check…

Edit:

If anybody is interested here is my implementation:

Note that this code is pre Yii and pre PHP 5.2:




if(""==$email)
{
   $emailErr=true;
   $emailErrMsg='E-mail address not entered!';
}
else {
   // Note: eregi() and split() were removed in PHP 7;
   // on current PHP use preg_match() and explode() instead.
   if(!eregi('^([._a-z0-9-]+[._a-z0-9-]*)@(([a-z0-9-]+\.)*([a-z0-9-]+)(\.[a-z]{2,5})?)$',$email))
   {
      $emailErr=true;
      $emailErrMsg='E-mail address NOT valid!<br />';
   }
   else
   {
      // split email address into username and domain
      list($eName,$eDomain)=split("@",$email);
      if(function_exists("checkdnsrr")) //.. does not exist under windows!
      {
         if(!checkdnsrr($eDomain,"MX"))
         {
            $emailErr=true;
            // double quotes so that $eDomain is interpolated into the message
            $emailErrMsg="Please, check your e-mail. Domain: <strong>$eDomain</strong> does not exist!<br />";
         }
      }
   }
}
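
For anyone on a newer PHP, here is a rough sketch of the same two-step check (format first, then MX lookup) using `preg_match()` and `explode()`. The pattern is adapted from the legacy code above. The function name `validateEmailWithMx()` and the `$dnsCheck` parameter are assumptions made for this example: the parameter lets a fake resolver be passed in so the snippet runs without network access, and it defaults to PHP’s `checkdnsrr()`.

```php
<?php
/**
 * Returns an error message, or null when the address passes both checks.
 * $dnsCheck is a hypothetical injection point: any callable taking
 * ($host, $type) and returning bool. It defaults to checkdnsrr().
 */
function validateEmailWithMx($email, $dnsCheck = 'checkdnsrr')
{
    if ($email === '') {
        return 'E-mail address not entered!';
    }
    // Pattern adapted from the legacy code; the "i" flag replaces eregi().
    $pattern = '/^([._a-z0-9-]+[._a-z0-9-]*)@(([a-z0-9-]+\.)*([a-z0-9-]+)(\.[a-z]{2,5})?)$/i';
    if (!preg_match($pattern, $email)) {
        return 'E-mail address NOT valid!';
    }
    // explode() replaces split(); the limit of 2 keeps the domain in one piece.
    list($name, $domain) = explode('@', $email, 2);
    if (!call_user_func($dnsCheck, $domain, 'MX')) {
        return "Please, check your e-mail. Domain: $domain has no MX record!";
    }
    return null;
}

// Fake resolver for demonstration: only example.com "has" an MX record.
$fakeDns = function ($host, $type) {
    return $host === 'example.com' && $type === 'MX';
};

var_dump(validateEmailWithMx('user@example.com', $fakeDns)); // NULL
var_dump(validateEmailWithMx('user@exmaple.com', $fakeDns)); // MX error message
var_dump(validateEmailWithMx('not-an-email', $fakeDns));     // format error message
```

In real use the second argument would simply be left out, so that the lookup goes through `checkdnsrr()`.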



That is what I’ve been trying to tell everyone since the very beginning. I’m not arguing about filtering e-mails itself, only with Mike, jacmoe, Qiang and probably many more people stating that this is a fair trade-off: favouring some abstract, virtual changes to TLDs (I personally believe that IANA will never allow numbers in TLDs, because what would be the reason for doing so? Implementing a .c0m domain alongside .com? Isn’t this world already bloated with domain-squatting security issues?) at the price of forcing current users to suffer from incomplete e-mail validation.

Of course, there are many things to take care of in an email validation sequence. But in general, saying that we will pass e-mails with numbers in the TLD only to leave room for future changes in TLDs is wrong. And I take back what I said before: if the RFC is built upon such an assumption, then it is wrong. In my personal opinion.

Checking MX is not an option if you are writing an application for localhost (or an intranet) that will run on a computer not connected to the internet and will write all user feedback to a locally set up mail server / account.

If that server will allow you to do this. I’m not sure whether PHP can connect to a mail server on the same port as mail clients do. If not - the destination server’s firewall configuration may block it. And if it can - the performance degradation would, in my opinion, be too high to use MX checking in e-mail validation.

In the example you provided you check for an MX record. If I’m not mistaken, the article you cited above has a part saying that this is wrong, as some mail servers can be fully functional without publishing an MX record, limiting themselves to publishing A records only. But DNS stuff and configuration is black magic to me, so maybe what I’m saying here is complete nonsense.
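
The A-record point can be sketched like this: a domain without an MX record may still receive mail through its address record (RFC 5321 describes this as an implicit MX), so a more lenient check could fall back to an A/AAAA lookup before rejecting the domain. The function name `domainMayAcceptMail()` and the injectable `$dnsCheck` parameter are assumptions made for this example; by default the lookup goes through PHP’s `checkdnsrr()`.

```php
<?php
/**
 * Returns true if the domain has an MX record or, failing that,
 * an A or AAAA record. $dnsCheck is a hypothetical injection point
 * (callable taking ($host, $type), returning bool).
 */
function domainMayAcceptMail($domain, $dnsCheck = 'checkdnsrr')
{
    foreach (array('MX', 'A', 'AAAA') as $type) {
        if (call_user_func($dnsCheck, $domain, $type)) {
            return true;
        }
    }
    return false;
}

// Fake DNS table for demonstration (made-up hosts and records).
$records = array(
    'with-mx.example' => array('MX', 'A'),
    'a-only.example'  => array('A'),
);
$fakeDns = function ($host, $type) use ($records) {
    return isset($records[$host]) && in_array($type, $records[$host], true);
};

var_dump(domainMayAcceptMail('with-mx.example', $fakeDns)); // bool(true)
var_dump(domainMayAcceptMail('a-only.example', $fakeDns));  // bool(true)
var_dump(domainMayAcceptMail('missing.example', $fakeDns)); // bool(false)
```

The trade-off is that almost any domain with a website has an A record, so the fallback catches fewer typos than a strict MX-only check.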

We are talking about different things… you are still on the numbers in the domain… I’m talking about misspelled domains, which are more common than numbers in the domain…

for example "cmo" instead of "com"… or any other alternative…

not to mention misspelled domains… like "gmali" instead of "gmail"…

First thing in my code is the check for local/server execution… and I always have an "isLocal" variable set appropriately… much other code runs differently on localhost than on the server… for example… on localhost I just display the email in a new window so that I can see what is sent… on the server the mail is sent…

Because of spammers, more and more mail servers have adopted various anti-spam precautions… one of them is to reject e-mails from a domain that does not have an MX record… so in today’s internet world, if you don’t have an MX record for your mail server… many mails sent from your server will just bounce back…