Bug in CEmailValidator

Hi there

To validate input in a standard application generated by yiic, I’m using the standard out-of-the-box email validator:


	public function rules()
	{
		return array
		(
			...
			array('email', 'email'), //Email has to be a valid email address
			...
		);
	}

Today my tester found out that he can submit the standard contact form with a sender e-mail address like test@test.111, which is treated as valid.

I’m pretty sure that this is a bug because, if I’m not mistaken, the ccTLD specification does not allow numbers in any country-code top-level domain. However, I would like to confirm this bug with other users before opening a new bug ticket.

<deleted>

Data deleted to avoid confusion with regular expression patterns

</deleted>

more info here: http://www.regular-expressions.info/email.html

Cheers!

UPDATE:

Sorry Trej, you said ‘out of the box’. Apologies

If you read the docs, you pretty soon find this link with some more explanation:

http://www.regular-e…info/email.html

If the default pattern doesn’t satisfy you, you can supply your own improved regular expression or use checkMX.
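As a rough illustration of what an MX check does, here is a minimal plain-PHP sketch. The helper name `hasMxRecord()` and its injectable `$lookup` parameter are hypothetical and only serve to make the idea testable offline; it is not the code Yii runs. In a Yii model, the option itself is switched on with a rule such as `array('email', 'email', 'checkMX' => true)`.

```php
<?php
/**
 * Hypothetical helper: true when the domain part of $email has an MX record.
 * $lookup defaults to PHP's checkdnsrr(); it can be swapped for a fake
 * so the logic can be tried without a network connection.
 */
function hasMxRecord($email, $lookup = 'checkdnsrr')
{
    $at = strrpos($email, '@');
    if ($at === false) {
        return false; // no "@", so there is no domain to look up
    }
    $domain = substr($email, $at + 1);
    if ($domain === false || $domain === '') {
        return false; // nothing after the "@"
    }
    return (bool) call_user_func($lookup, $domain, 'MX');
}

// Offline stand-in for DNS: pretends only example.com has an MX record.
$fakeLookup = function ($host, $type) {
    return $host === 'example.com' && $type === 'MX';
};

var_dump(hasMxRecord('someone@example.com', $fakeLookup));
var_dump(hasMxRecord('test@test.111', $fakeLookup));
```

A real lookup depends on the network and on DNS being reachable from the server, so it is slower than a regex and can fail for reasons unrelated to the address itself.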

EDIT:

This time Antonio beat me :)

You can play with your custom pattern here Trej: http://tools.netshiftmedia.com/regexlibrary/

@Mike — took me nearly 500 posts to accomplish this lol!

Guys!

I know that I can use my own pattern, and I know where to find a correct one. I wasn’t asking what to do to validate my e-mail address properly. I was asking: isn’t it a bug that the built-in e-mail validator accepts e-mails that are known to be invalid?

Better now? :] I don’t think it is a good idea to ship a validator with the framework that is useless in some respects. I mean, what is the point, if we know that this validator is buggy and users will keep asking how to validate e-mails properly, and you will keep answering by sending them to regular-expressions.info over and over again?

EDIT: All right, I’m a moron! :confused: I don’t know how to use my own patterns! :confused:

If I use Antonio’s:


/^((([a-z]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+(\.([a-z]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+)*)|((\x22)((((\x20|\x09)*(\x0d\x0a))?(\x20|\x09)+)?(([\x01-\x08\x0b\x0c\x0e-\x1f\x7f]|\x21|[\x23-\x5b]|[\x5d-\x7e]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(\\([\x01-\x09\x0b\x0c\x0d-\x7f]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF]))))*(((\x20|\x09)*(\x0d\x0a))?(\x20|\x09)+)?(\x22)))@((([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?$/i

I get an error saying: “preg_match(): Compilation failed: PCRE does not support \L, \l, \N, \U, or \u at offset 44”. And when I try to use either of the last two from the page you both mentioned, I get an error saying: “preg_match(): Unknown modifier ‘+’”! :confused:

This is why I apologize :P
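Both error messages can be reproduced in isolation. The sketch below uses short made-up patterns (not the ones from the thread) and assumes the second error comes from passing a pattern without delimiters, which is the usual cause of “Unknown modifier”:

```php
<?php
// 1) JavaScript-style \uXXXX escapes: PCRE refuses to compile them.
$jsStyle = '/^[a-z\u00E0-\u00FF]+$/i';
// PCRE spelling of the same range: \x{XXXX} plus the "u" (UTF-8) modifier.
$pcreStyle = '/^[a-z\x{00E0}-\x{00FF}]+$/iu';

$subject = "caf\xC3\xA9"; // "café" encoded as UTF-8

// "@" silences the compilation warning; preg_match() then returns false.
$jsResult = @preg_match($jsStyle, $subject);
$pcreResult = preg_match($pcreStyle, $subject);

// 2) A pattern with no delimiters: PHP treats "[" ... "]" as the delimiter
//    pair and then tries to read the following "+" as a modifier.
$bare = '[a-z0-9]+@[a-z0-9.-]+';
$bareResult = @preg_match($bare, 'someone@example.com');
$delimitedResult = preg_match('/' . $bare . '/', 'someone@example.com');

var_dump($jsResult, $pcreResult, $bareResult, $delimitedResult);
```

So a regex copied from a JavaScript validator needs its `\uXXXX` escapes rewritten, and a regex copied from a tutorial page needs delimiters added before PHP will accept it.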

You are not a moron! Don’t say that, man… maybe I am the moron, as that was the regular expression from the best JavaScript validator that I use in my own projects, and you are right, I didn’t check whether PHP could handle it or not… Apologies again man :(

In order to work as you suggest, the pattern within the class should get rid of the 0-9 declarations in the last part; that way it will only match lowercase and uppercase letters there. Just like the simplest regular expression ever for matching emails:

^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})$ <-- see the last part after the dot

Nevertheless, the above is no good for anything but two- or three-character extensions.

To tell you the truth, I do not know why the default pattern allows numbers on domain ext.

Peace man

Me neither, man! Reported this as a bug. Cheers.

Trejder, with the word “bug” you imply that the person who created this validator (Qiang in this case) made a mistake. That’s why I don’t like the word “bug” in this context, because the docs already make quite clear why this regex is used (see the mentioned reference): there is no 100% regex that filters out all invalid email addresses. Any regex is a trade-off in some way. The regex used simply reflects the standard defined in RFC 2822. It’s guaranteed not to filter out any valid email address. IMO we should stay with the standards: you never know which TLDs might be available in some years.

If you still want to use a different pattern, you can. But Yii should adhere to existing standards as much as possible.

Here is a good read about validating email, even though it was written in 2007 - http://www.linuxjournal.com/article/9585?page=0,0

Mike, let me explain,

First of all - there is no information about any trade-off in the Yii documentation. To be honest, there is nothing at all except a link to regular-expressions.info, where, as I wrote above, I found a bunch of regexes, none of which works for me (unless I made a mistake, but I shouldn’t have: I only copied the whole regex into the ‘pattern’=>’’ field).

Second of all - you wrote that Yii implements The Official Standard: RFC 2822 while regular-expressions.info says “You can (but you shouldn’t–read on) implement it”, which makes me a bit confused.

And third of all - I’m saying that there is a bug in how CEmailValidator validates e-mails, without specifically pointing out whether the problem lies in the validator (or its creator) itself or in the protocol / solution / source it relies on. But I’m surprised that the official RFC standard allows e-mails that even a newbie like me knows are incorrect.

I’m a complete newbie to regular expressions and I don’t know whether preventing numbers in the top-level domain of the e-mail address being validated is a problem or not. For me personally this is a huge bug. And I don’t understand where the trade-off is, i.e. what do we gain in exchange for this missing check?

I don’t know about others, but in my company the possibility of entering numbers as a top-level domain is, and always will be, marked by any tester as a bug, and no explanation about whether or not it relies on any standard will be accepted. Pity, but true.

I will not discuss the sense of the standards defined in the RFC. In the same way we could discuss whether it makes sense that it rains sometimes. :)

About the RFC pattern: it says you should read on to understand why you shouldn’t use it. If you do that, you find out that only the format in which the regex is specified is a problem, not the rule itself.

So did you give the other suggested rule a try? You need to enclose regular expressions in ‘/’.


/[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+(?:[A-Z]{2}|com|org|net|edu|gov|mil|biz|info|mobi|name|aero|asia|jobs|museum)\b/

I am just wondering: if it doesn’t work, and you shouldn’t really use it, why on Earth does it ship with Yii then?

That’s what Trejder said - it’s not what the link says and not how it is implemented.

You are right (again) - I missed your post about adhering to a standard, even if that standard is slightly flawed.

Upon reading the topic again, the bug turned into a feature and my point became moot. ;)

Yes! I tried all rules on page mentioned by you.

Seems that I’m a moron or sth. Here are my attempts with reg.ex you provided and results in application:


array('email', 'email', 'pattern'=>"/[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+(?:[A-Z]{2}|com|org|net|edu|gov|mil|biz|info|mobi|name|aero|asia|jobs|museum)\b/"), //Email has to be a valid email address

Exception: preg_match() [function.preg-match]: Unknown modifier ‘=’


array('email', 'email', 'pattern'=>'/[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+(?:[A-Z]{2}|com|org|net|edu|gov|mil|biz|info|mobi|name|aero|asia|jobs|museum)\b/'), //Email has to be a valid email address

Parse error: syntax error, unexpected T_DIV_EQUAL in D:\Dev\xampp\htdocs\www\protected\models\ContactForm.php on line 24


array('email', 'email', 'pattern'=>/[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+(?:[A-Z]{2}|com|org|net|edu|gov|mil|biz|info|mobi|name|aero|asia|jobs|museum)\b/), //Email has to be a valid email address

Parse error: syntax error, unexpected ‘/’ in D:\Dev\xampp\htdocs\www\protected\models\ContactForm.php on line 24

I know that these are basic and I am missing something painfully obvious, but… it doesn’t work whichever way I try (single quotes, double quotes, no quotes). Pretty much the same as with the other regexes from that page.

BTW: Thank you for your enlightening discussion above. Helped me really much! :confused:

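You must escape the delimiter wherever it appears inside the pattern (or choose a different delimiter). Don’t run the whole pattern through preg_quote(): that escapes every metacharacter and turns the regex into a literal string. Simply do this:


'pattern' => '/' . str_replace('/', '\/', $regex) . '/'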
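To see why the “Unknown modifier ‘=’” error appears and what fixes it, here is a minimal sketch. The pattern in `$regex` is a shortened, made-up example that merely contains a literal “/” like the one in the thread. Note that preg_quote() escapes every regex metacharacter, not just the delimiter, so it is only suitable for literal text:

```php
<?php
// Shortened, hypothetical pattern with a literal "/" inside a character class.
$regex = '^[a-z0-9+/=-]+@[a-z0-9-]+\.[a-z]+$';
$email = 'a/b@example.com';

// Unescaped: PHP ends the pattern at the first "/" and reads "=" as a modifier.
$broken = @preg_match('/' . $regex . '/', $email);

// Fix 1: escape only the delimiter.
$escaped = preg_match('/' . str_replace('/', '\/', $regex) . '/', $email);

// Fix 2: pick a delimiter that does not occur in the pattern.
$altDelimiter = preg_match('~' . $regex . '~', $email);

// preg_quote() turns the whole regex into literal text, so it no longer
// matches e-mail addresses, only the pattern string itself.
$quoted = '/' . preg_quote($regex, '/') . '/';
$quotedOnEmail = preg_match($quoted, $email);
$quotedOnItself = preg_match($quoted, $regex);

var_dump($broken, $escaped, $altDelimiter, $quotedOnEmail, $quotedOnItself);
```

Choosing a different delimiter is often the least error-prone option, because the pattern can then be pasted in unchanged.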

Thanks to Antonio, I was able to solve both problems (a valid regex for filtering out numbers in the top-level domain, and the rejection [PHP fatal error] of the other regex rules) with the following code:


	/**
	 * Declares the validation rules.
	 */
	public function rules()
	{
		$email_regex_pattern = '/^[a-zA-Z0-9!#$%&\'*+\\/=?^_`{|}~-]+(?:\.[a-zA-Z0-9!#$%&\'*+\\/=?^_`{|}~-]+)*@(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?\.)+[a-zA-Z](?:[a-zA-Z]*[a-zA-Z])?$/';

		return array
		(
			array('name, email, body', 'required'), //Name, email and body are required
			array('email', 'email', 'pattern'=>$email_regex_pattern), //Email has to be a valid email address
		);
	}

Thanks again, Antonio! :]
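For anyone who wants to try the pattern outside Yii, here is a small standalone check. The pattern string is copied from the rules() method above; the wrapper function `isAcceptedEmail()` and the sample addresses are just illustrations:

```php
<?php
// Pattern copied verbatim from the rules() method in this post.
$email_regex_pattern = '/^[a-zA-Z0-9!#$%&\'*+\\/=?^_`{|}~-]+(?:\.[a-zA-Z0-9!#$%&\'*+\\/=?^_`{|}~-]+)*@(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?\.)+[a-zA-Z](?:[a-zA-Z]*[a-zA-Z])?$/';

// Hypothetical wrapper, used only for this demonstration.
function isAcceptedEmail($email, $pattern)
{
    return preg_match($pattern, $email) === 1;
}

// The last label must now consist of letters only.
var_dump(isAcceptedEmail('test@example.com', $email_regex_pattern));
var_dump(isAcceptedEmail('user@example.co.uk', $email_regex_pattern));
var_dump(isAcceptedEmail('test@test.111', $email_regex_pattern));
var_dump(isAcceptedEmail('test@test.c0m', $email_regex_pattern));
```

The pattern still says nothing about whether the TLD actually exists; it only requires that the part after the last dot is made of letters.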

Apologies for the rest of the guys for having sent a private message to Trej, and thank you very much Trej for posting the solution here.

@Antonio: No problem for me, at all. It was my pleasure to cite you! :]

@Mike and jacmoe: Hm… It seems that there is a solution that validates e-mails for not having numbers in TLDs. Maybe, after your enlightening comments, we could reopen the discussion about why a regex the same as or similar to Antonio’s does not ship with Yii by default?

EDIT: Just got a notification that Qiang marked my bug as WontFix. Hm… it seems more and more people think it is OK that the built-in validator accepts invalid e-mails as valid. I can’t understand it, but I won’t argue!

Please note: we are talking about a real validation bug. Not a slight one like those explained in the article provided by mdomba. For me (both as a user and as a developer) it is OK if the e-mail validator can’t accept e-mails in a form such as “Mark Anthony” <mark.anthony@somehost.com>. In my opinion fewer than 1% of users will enter an e-mail in such a format when specifically asked to enter the e-mail only (in most contact forms I’ve used there is a separate field for the name, and one doesn’t have to write it in the e-mail field). But it is an obvious bug if a contact form built upon Yii’s default e-mail validator accepts e-mails like test@test.111, because if I receive such a form I won’t be able to answer. Obvious at least to me.

But, as I said, I won’t argue if I’m the only one who thinks this is a bug and the rest consider it a welcome feature.

I see that qiang just closed the ticket, right when I wanted to suggest "wontfix".

@Trejder:

RFC 2822 doesn’t forbid numbers after the last dot in email addresses per se. That’s why your assumption that addresses with numbers in the TLD are wrong is not the full truth. It might be true that there’s no such TLD now, but that doesn’t mean there never will be (it could already have changed in a year). The current implementation makes sure that no valid email address is ever filtered out, by getting as close to the RFC as possible. That’s a different statement from: “It filters out every wrong address”. That’s the trade-off mentioned above: there’s no simple regex that accomplishes both.