CModel::sanitize()

Quite often validation rules may be costly (eg checking for existing records in DB). Besides, validation is meant to check attributes without modifying them (even though the latter is also possible). What I suggest is a separate sanitization logic that should be performed during assignment. Sanitization rules should be defined the same way as validation rules. This way, you could separate simple input processing from (potentially) costly validation. At the same time, you could separate destructive processing (modifying input) from non-destructive processing, making for a cleaner and more organized code.

Any thoughts are welcome.

I found your post after trying to discover if Yii had these kind of features built in. What you suggest sounds good but I am wondering if you have pursued this any further, I would like to find a way to do this where I am not peppering my code with sanitisation efforts as it is more prone to error. Unfortunately I am not yet that familiar with Yii so I don’t know what the best way to approach it is, is there a Yii way?

Any advice you or anyone else has would be most appreciated.

Callum

Glad to see that someone else sees the value in this idea. Frankly, I gave up on it because everyone seemed to ignore it on this forum. Perhaps I should give a usage example. Let’s suppose we have an active record class named User:




class User extends CActiveRecord

{	

	/**

	 * Returns the static model of the specified AR class.

	 * @return User the static model class

	 */

	public static function model($className=__CLASS__)

	{

		return parent::model($className);

	}


	/**

	 * @return string the associated database table name

	 */

	public function tableName()

	{

		return 'user';

	}


	/**

	 * @return array validation rules for model attributes.

	 */

	public function rules()

	{

		// NOTE: you should only define rules for those attributes that

		// will receive user inputs.

		return array(

			array('name', 'length', 'max'=>255),

			//example of a costly validation rule

			array('country_id', 'exist', 'className' => 'Country', 'attributeName' => 'id'), 

			// The following rule is used by search().

			// Please remove those attributes that should not be searched.

			array('id, name', 'safe', 'on'=>'search'),

		);

	}


	/**

	 * @return array sanitization rules for model attributes.

	 */	

	public function sanitizationRules()

	{

		return array(

			//supposing there is a built-in sanitization class for trimming a string

			array('name', 'trim', 'length' => 20),

			//suposing there is a built-in sanitization class that filters a string using regex

			array('name', 'filter', 'pattern', '/\W/'), //filter out non-word characters

		);

	}

}



As you can see, I’ve added a method sanitizationRules() that looks very similar to rules(). In this example, I have 2 rules: the first one trims username to 20 chars and the second one filters out any non-word characters. Now, all this can be accomplished by adding these sanitization rules to the existing validation rules. The only difference is that sanitization would be performed on assignment, so as soon as you do:




$user->name = "Jim O'Rourke";



sanitization would be immediatelly performed and name would be transformed to “Jim ORourke” (let’s pretend that this is what we want to achieve). Also, you will not be forced to perform validation (which might be costly) if all you need to do is sanitize data. In addition, you will have a clean separation of validation logic from sanitization logic, instead of mixing them all together.

Hope this makes more sense now.

I like the idea of sanitising user data.

Implemented this partially.

Feedback and suggestions highly welcome.

If this is considered usefull I’ll open a pull request against master.

Documentation, some tests and guide entries are still to-do.

Also I’m going to refactor the methods name from sanatize to sanitize, noticed that to late yesterday…

edit: Yes, I know there are alot typos, I will correct them as soon as I find them ;D

Nice work, Suralc! My initial idea was to perform sanitization on assignment, but on closer inspection I see that this is not that easy to achieve. You could call sanitize() from CActiveRecord::setAttribute(), but you can’t do this in CModel. Does anyone have any ideas how this could be achieved?

Pr got declined. If you have a stong opinion related to this, it would be nice to hear it on github or here.

Well I can see why they would decline it. It doesn’t seem like much of an improvement, simply because both validation and sanitization are performed at the same time. I’m not saying that you did a bad job, Suralc. As I pointed out in my previous post, there doesn’t seem to be a better way. Ideally, sanitization should be performed on assignment, and if this could be achieved, I think that it would be a massive improvement. Then you would specify your sanitization rules (or leave them blank if you don’t want to sanitize anything) and as soon as you assign a value to your model attribute, it gets sanitized according to the specified rules. At that point you may choose not to validate at all (for whatever reason), but you will still have your data sanitized.

As for their argument that you can perform sanitization in validators, with all due respect I think they are wrong. Yes, it can be done, but it’s a bit of a misuse. When you’re validating something, you only expect to get a feedback on whether it’s valid or not, you don’t want to change it. Also (and this is a crucial point), you may choose to validate more than once and expect the same result every time. If validators change your data, this will not be the case. Consider this example:




class User extends CActiveRecord {

{

	//...

	public function rules()

	{

		return array(

			array('name', 'filter', 'filter'=>'addslashes'),

		);

	}

	//...

}


$user = new User();

$user->name = "Jim O'Rourke";

if ($user->validate())

	$user->save();



(NOTE: I chose to use addslashes() because it illustrates the point, not because this is a real world example. So don’t get hung up on that.)

What would happen in the above example is that your model would be validated twice (because CModel::save() performs validation by default) and you will end up saving the string “Jim O\\\'Rourke” to your database. And I’m not doing anything weird in this code, this is textbook stuff. I could write $user->save(false) and get around the issue by avoiding duplicate validation. But all that seems plain wrong. Validation should not change data, IMHO. Sanitization does that job.

Currrent implementation doesn’t differentiate it too much. Both are triggered at the same time so there will be no difference except sanitization rules are declared in the different method.

Yes, I agree. But let’s consider how this can be implemented properly. I am repeating myself a lot, but I think it’s very important that sanitization should occur on assignment. How this can be implemented is another issue altogether. It would require changes in CModel, so that every model attribute is set using magic __set() method. Then you would simply call sanitize() method from __set() and ensure sanitization on assignment.