Bug in CEmailValidator

mikl · December 28, 2010, 9:44pm

And one more note:

You might get much closer to what you are looking for if you enable the checkMX feature of the validator.
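As a minimal sketch of what enabling that option can look like in a Yii 1.x model: the `ContactForm` class and its `email` attribute are hypothetical names used only for illustration, and the fragment assumes it lives inside a Yii application (it is a rules configuration, not a standalone script).

```php
<?php
// Hypothetical form model; only the rules() configuration matters here.
class ContactForm extends CFormModel
{
    public $email;

    public function rules()
    {
        return array(
            array('email', 'required'),
            // 'email' is the built-in alias for CEmailValidator.
            // checkMX is off by default; turning it on adds a DNS lookup
            // for the domain part of the address.
            array('email', 'email', 'checkMX' => true),
        );
    }
}
```

The lookup needs a working DNS resolver on the server, so it adds a network round trip to every validation.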

jacmoe · December 28, 2010, 10:27pm

@Trejder:

I didn’t understand the issue at first, but it’s simple enough: the standard itself is incomplete (or ahead of itself).

It makes sense to follow the standard, even if the real world doesn’t.

Who knows what kinds of email addresses we will have in the near future?

+1 for having raised this issue, and another plus for finding a solution to the problem.

I will definitely be using that regex in my email validators today.

trejder · December 29, 2010, 7:30am

Mike,

You’ve already explained this. By reopening the topic I was rather suggesting a strongly off-topic discussion: is it a good idea for Yii to use, by default, a regex that is as close to RFC 2822 as possible? For me personally (mainly as a developer, but also as a user) the trade-off you are talking about makes no sense. As I already explained, I would much rather have a regex that filters out wrong e-mail addresses than one that is, let’s say, very open to possible future changes to TLDs, at the cost of letting invalid addresses through.

Why? Simple. Because the possibility of IANA allowing numbers in TLDs is a future problem. And in my opinion even saying "future" is very optimistic; I would rather say "virtual" or "abstract" here. I mean, look at IANA's history of designating new TLDs. How many changes have we had in the past twenty years? Ten? Twenty? That makes at most one change per year. How many TLDs during that time were allowed to have numbers? Zero. How many in the whole history of IANA and DNS? Zero. What is the chance that something will change here in the foreseeable future, i.e. that IANA will allow numbers in TLDs? I would say less than five percent. That is a rather small chance, don’t you agree?

And (back to the main thread) users being able to enter an e-mail like test@test.111 is a current problem, not a future one. If I’m not mistaken, you guys are proposing that Yii ship with a solution that, by default, every developer will have to change in every new application, as it does not fit current needs (it is open to unlikely future changes, but who cares about those right now?). Isn’t something wrong with that? For me this is like selling a car with summer tires in Greenland or at the North Pole. Of course, there is a possibility that 1% of buyers in this area will want to take their newly bought car to Europe or even Africa. But that is one percent satisfied, while the other 99% will have to start by changing to winter tires. Do you get my idea?

My conclusion isn’t that RFC 2822, or a regex based on it, is wrong. My conclusion is that Yii using it by default is a mistake. It is currently more open to possible future changes than it is suited to current developers’ needs. But that’s just my private opinion.

mdomba · December 29, 2010, 8:06am

I followed this thread from the beginning and it’s interesting how you all “argue” about some regex that should filter valid/invalid emails…

In my opinion that is a very small problem in validating email… even if you filter out the numeric domains… a much bigger problem is when a user enters a wrong/misspelled domain or address…

and that comes from 8 years of experience as webmaster of a site offering private accommodation (and more), where visitors send inquiries about the offers, and where we have more than 100 emails per day sent from an online email form…

So the best thing to minimize the errors is (like Mike suggested) to enable the MX record check (checkMX)… that part is very important and helpful because it checks whether the entered domain has an MX record (has a mail server)… so misspelled domains get filtered here…

What remains is the misspelled username… that can be checked only by writing a script that connects to the mail server and checks whether that mailbox exists…

but just by implementing the MX check I got 3-5 bouncing emails per day instead of the 20-30 per day I had before the MX check…

Edit:

If anybody is interested here is my implementation:

Note that this code is pre Yii and pre PHP 5.2:




if(""==$email)
{
   $emailErr=true;
   $emailErrMsg='E-mail address not entered!';
}
else {
   // basic syntax check (eregi is case-insensitive)
   if(!eregi('^([._a-z0-9-]+[._a-z0-9-]*)@(([a-z0-9-]+\.)*([a-z0-9-]+)(\.[a-z]{2,5})?)$',$email))
   {
      $emailErr=true;
      $emailErrMsg='E-mail address NOT valid!<br />';
   }
   else
   {
      // split email address into username and domain
      list($eName,$eDomain)=split("@",$email);
      if(function_exists("checkdnsrr")) //.. does not exist under windows!
      {
         // the domain must publish an MX record
         if(!checkdnsrr($eDomain,"MX"))
         {
            $emailErr=true;
            // double quotes so that $eDomain is interpolated into the message
            $emailErrMsg="Please, check your e-mail. Domain: <strong>$eDomain</strong> does not exist!<br />";
         }
      }
   }
}
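For anyone reading this on a newer PHP: `eregi()` and `split()` were deprecated in PHP 5.3 and removed in PHP 7, so the snippet above only runs on old installations. A rough equivalent with `preg_match()` might look like the sketch below. The function name and the `$dnsCheck` parameter are hypothetical; the parameter only exists so that the DNS lookup can be swapped for a stub when testing.

```php
<?php
/**
 * Returns an error message, or null when the address passes both checks.
 * $dnsCheck is a hypothetical injection point with the same signature as
 * checkdnsrr(), so the network lookup can be replaced in tests.
 */
function validateEmailWithMx($email, $dnsCheck = 'checkdnsrr')
{
    if ($email === '') {
        return 'E-mail address not entered!';
    }

    // Same pattern as in the post, with PCRE delimiters added.
    // i = case-insensitive, D = "$" matches only at the very end of the string.
    $pattern = '/^([._a-z0-9-]+[._a-z0-9-]*)@(([a-z0-9-]+\.)*([a-z0-9-]+)(\.[a-z]{2,5})?)$/iD';
    if (!preg_match($pattern, $email)) {
        return 'E-mail address NOT valid!';
    }

    // The pattern guarantees exactly one "@"; everything after it is the domain.
    $domain = substr($email, strpos($email, '@') + 1);

    // Skip the lookup when no resolver function is available.
    if (is_callable($dnsCheck) && !call_user_func($dnsCheck, $domain, 'MX')) {
        return "Please check your e-mail. Domain $domain does not exist!";
    }

    return null;
}
```

Note that the pattern itself still accepts an address such as test@test.111; only the MX lookup would reject it.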

trejder · December 29, 2010, 8:58am

That is what I’ve been trying to tell everyone since the very beginning. I’m not arguing about filtering e-mails as such, only with Mike, jacmoe, Qiang and probably many more people stating that this is a fair trade-off: favouring some abstract, virtual changes to TLDs (I personally believe that IANA will never allow numbers in TLDs, because what would be the reason for doing so? Implementing a .c0m domain for .com? In a world already bloated with domain-squatting security issues?) at the price of forcing current users to suffer from incomplete e-mail validation.

Of course, there are many things to take care of in an email validation sequence. But in general, saying that we will let through e-mails with numbers in the TLD only to leave room for future changes in TLDs is wrong. And I take back what I just said before. If the RFC is built upon such an assumption, then it is wrong. In my personal opinion.

Checking MX is not an option if you are writing an application for localhost (or an intranet) that will run on a computer not connected to the internet and will write all user feedback to a locally set-up mail server / account.

If that server allows you to do this. I’m not sure if PHP can connect to a mail server on the same port as mail clients do. If not, the destination server's firewall configuration may block it. And if it can, the performance degradation would, in my opinion, be too high to use MX checking in e-mail validation.

In the example you provided you are checking for an MX record. If I’m not mistaken, the article you cited above has a part saying that this is wrong, as some mail servers can be fully functional without publishing an MX record, limiting themselves to publishing A records only. But DNS stuff and configuration is complete black magic to me, so maybe what I’m saying here is complete nonsense.
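The point about A records is real: the SMTP specification (RFC 5321) describes falling back to the domain's address records when no MX record exists. A minimal sketch of a more lenient check, assuming a hypothetical helper name and a hypothetical `$lookup` parameter (same signature as `checkdnsrr()`) so the lookup can be stubbed:

```php
<?php
/**
 * True when $domain publishes an MX record or, failing that, an A or AAAA record.
 * $lookup is a hypothetical parameter that allows the DNS call to be replaced in tests.
 */
function domainCanReceiveMail($domain, $lookup = 'checkdnsrr')
{
    // Try the record types in order of preference and stop at the first hit.
    foreach (array('MX', 'A', 'AAAA') as $type) {
        if (call_user_func($lookup, $domain, $type)) {
            return true;
        }
    }
    return false;
}
```

The trade-off is that a mistyped domain which happens to resolve to a web server (A record only) would now pass, so this catches fewer typos than a strict MX check.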

mdomba · December 29, 2010, 9:33am

We are talking about different things… you are still on the numbers in the domain… I’m talking about misspelled domains, which are more common than numbers in the domain…

for example "cmo" instead of "com"… or any other alternative…

not to mention misspelled domains… like "gmali" instead of "gmail"…

The first thing in my code is the check for local/server execution… and I always have an "isLocal" variable set appropriately… a lot of other code runs differently on localhost than on the server… for example… on localhost I just display the email in a new window so that I can see what is sent… on the server the mail is sent…

Because of spammers, more and more mail servers have adopted various anti-spam precautions… one of them is to reject e-mails whose domain does not have an MX record… so in today’s internet world, if you don’t have an MX record for your mail server… many mails sent from your server will just bounce back…

trejder · December 29, 2010, 10:54am

That ends the discussion, at least on my side.

Yes! I am aware of this. I just wrote it as an example that in this situation, without the possibility of using checkMX, you are left with only CEmailValidator, and have to change its default regex if you wish to filter numbers in the domain.
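To make the "filter numbers in the domain" idea concrete without touching the validator's full pattern, here is a small sketch of an extra check that only looks at the last label of the domain. The function name is hypothetical, and the rule (letters only, at least two of them) is an assumption about what "no numbers in the TLD" should mean.

```php
<?php
/**
 * Hypothetical helper: true when the part after the last dot of the domain
 * consists of letters only (at least two of them).
 */
function hasAlphabeticTld($email)
{
    // Capture the final dot-separated label that follows the "@".
    if (!preg_match('/@[^@]*\.([^.@]+)$/D', $email, $m)) {
        return false; // no "@" or no dot in the domain part
    }
    return (bool) preg_match('/^[a-z]{2,}$/iD', $m[1]);
}
```

With this rule test@test.111 is rejected and test@test.com is accepted. One caveat: internationalized TLDs in their ASCII form start with "xn--" and may contain digits, so a letters-only rule would reject those as well.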