Changes and new features for CModel

qiang · July 26, 2011, 1:04am

When doing type conversion, we need to pay attention to precision (e.g. a BIGINT value does not fit in a PHP integer on platforms where integers are 4 bytes, and floating-point numbers lose precision). That’s why PDO uses strings to represent everything.
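To make the precision concern concrete, here is a minimal PHP sketch. The helper `toIntIfExact()` is hypothetical: it converts a numeric string (as a driver might return it for a BIGINT column) to a PHP int only when the value fits, and otherwise keeps the exact string. The last line shows a float cast silently rounding a large integer.

```php
<?php
// Hypothetical helper: convert a numeric string to int only when no precision is lost,
// otherwise keep the original string unchanged.
function toIntIfExact(string $raw)
{
    // FILTER_VALIDATE_INT returns false for non-integers and for values
    // outside PHP_INT_MIN..PHP_INT_MAX on the current platform.
    $int = filter_var($raw, FILTER_VALIDATE_INT);
    return $int === false ? $raw : $int;
}

var_dump(toIntIfExact('42'));                  // int(42)
var_dump(toIntIfExact('9223372036854775808')); // stays a string: too big for a PHP int

// A float cast silently rounds: 2^53 + 1 and 2^53 become the same float.
var_dump((float) '9007199254740993' === (float) '9007199254740992'); // bool(true)
```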

mindplay · July 31, 2011, 10:59pm

In my opinion, conversion is something that is entirely missing from the current architecture.

For example, I’m frequently vexed by the absence of real timestamp handling, e.g. converting timestamps to/from the persistent (SQL) format. (And yes, I know about the date/time behavior component.)

In my opinion, conversion and validation should be fully separated: conversions happen first, while values are bound to an object (setAttributes()), and may cause validation errors if the format isn’t right. This shields your validators from having to deal with bad data.

This would enable you to handle date conversion separately. Let’s say a date/time value is converted to a timestamp by a converter. Now you can write various date validators, or you can safely use a range validator to check the date/time range, without worrying that you might receive a string or some other invalid value - validators would never be run on a value that could not be parsed and converted in the first place.

Validators that expect numbers, for example, will never have to deal with a string.

The trouble with setAttributes() is that it doesn’t incorporate any explicit conversion step. Sure, you can hack it in via behaviors or validators, but it’s still a hack.
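The "convert first, validate second" idea above can be sketched in plain PHP (7.1+ assumed). The function names `convertDate()`, `validateRange()` and `bindBirthday()` are hypothetical, not Yii API; the point is only that the range validator is typed to accept an int and is never reached when conversion fails.

```php
<?php
// Stage 1 (conversion): 'Y-m-d' string -> Unix timestamp, or null if it cannot be parsed.
function convertDate(string $input): ?int
{
    $date = DateTime::createFromFormat('!Y-m-d', $input, new DateTimeZone('UTC'));
    $problems = DateTime::getLastErrors(); // false on PHP >= 8.2 when there were none
    if ($date === false || ($problems !== false && $problems['warning_count'] > 0)) {
        return null; // unparseable, or silently "repaired" by PHP (e.g. 2011-02-30)
    }
    return $date->getTimestamp();
}

// Stage 2 (validation): may assume an int, never a raw string.
function validateRange(int $timestamp, int $min, int $max): bool
{
    return $timestamp >= $min && $timestamp <= $max;
}

// Binding: the range validator only runs on successfully converted input.
function bindBirthday(array $input, array &$errors): ?int
{
    $timestamp = convertDate((string) ($input['birthday'] ?? ''));
    if ($timestamp === null) {
        $errors['birthday'][] = 'Invalid date format.';
        return null;
    }
    if (!validateRange($timestamp, 0, time())) {
        $errors['birthday'][] = 'Date out of range.';
    }
    return $timestamp;
}

$errors = [];
bindBirthday(['birthday' => 'yesterday'], $errors);
print_r($errors); // only the conversion error, the range check never ran
```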

qiang · August 1, 2011, 12:55am

Could you propose a way to support conversion and separate it from validation?

gusnips · August 1, 2011, 2:08am

I actually did something like that, and I use it in a big project I maintain.

I’m not sure if it is the same functionality you want to accomplish.

My main need is to unparse data before validation, fill it into forms, and print it parsed, all in a unified way.

Anyway, the implementation is like this:

Declaring the types in the model:

```php
public function types()
{
    return array(
        // syntax: 'attribute'=>array('type', options...), almost like validators
        'user_id'=>array('relation','relationName'=>'myRelation'),
        'create_time'=>array('datetime','timestamp'=>true),
        'valid'=>'boolean', // a plain string is treated as a type without options
        'birthday'=>array('date','format'=>'d/m/Y'),
        'website'=>array('link'),
        'passwd'=>array('password','repeat'=>'repeatPassword','hash'=>'sha1'),
        'comments'=>array('text','widget'=>array('ext.ckeditor.ckeditor'),'widgetOptions'=>array('...')),
    );
}
```

Usage:

```php
echo $model->parse('birthday');
// before parse: $model->birthday is an integer timestamp
// output: 05/04/1984

echo $model->parse('website');
// before parse: mysite.com
// output: <a href='www.mysite.com' target='_blank'>mysite.com</a>
```

CActiveForm usage:

```php
$form->typeField($model,'passwd');
// output: 2 password fields, one for the password itself and one for the repeat field,
// which must be a public property of the model named repeatPassword, as declared in the type

$form->typeField($model,'comments');
// output: the widget declared in the widget option of the type
```

I’ll post the rest of the implementation if you think it’s useful.

Basically, each type has a class that handles the parse method, the beforeValidate conversion and the conversion to a form attribute. This type class is created the first time you access it.

This implementation lets me output data any way I want, while the predefined types save me time.

I’ve also made a version of the model generator that automatically generates the types() method.

What do you think ?
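The "one class per type, created on first access" design described above could look roughly like this. It is a hypothetical sketch (PHP 7+ assumed), not gusnips' actual code: `AttributeType`, `DateType` and `Person` are illustrative names, and only a `date` type is implemented.

```php
<?php
// Hypothetical handler interface: one class per declared type.
interface AttributeType
{
    public function parse($value): string;  // stored value -> display string
    public function unparse(string $input); // user input -> stored value, before validation
}

class DateType implements AttributeType
{
    private $format;

    public function __construct(array $options)
    {
        $this->format = $options['format'] ?? 'Y-m-d';
    }

    public function parse($value): string
    {
        return gmdate($this->format, (int) $value);
    }

    public function unparse(string $input)
    {
        $date = DateTime::createFromFormat('!' . $this->format, $input, new DateTimeZone('UTC'));
        // Leave unparseable input untouched so a validator can report it.
        return $date === false ? $input : $date->getTimestamp();
    }
}

class Person
{
    public $birthday;
    private $typeHandlers = [];

    public function types(): array
    {
        return ['birthday' => ['date', 'format' => 'd/m/Y']];
    }

    // Instantiate the handler for $attribute the first time it is needed.
    private function typeHandler(string $attribute): AttributeType
    {
        if (!isset($this->typeHandlers[$attribute])) {
            $config = (array) $this->types()[$attribute];    // 'boolean' -> ['boolean']
            $class = ucfirst(array_shift($config)) . 'Type'; // 'date' -> 'DateType'
            $this->typeHandlers[$attribute] = new $class($config);
        }
        return $this->typeHandlers[$attribute];
    }

    public function parse(string $attribute): string
    {
        return $this->typeHandler($attribute)->parse($this->$attribute);
    }

    public function unparse(string $attribute, string $input): void
    {
        $this->$attribute = $this->typeHandler($attribute)->unparse($input);
    }
}

$person = new Person();
$person->unparse('birthday', '05/04/1984'); // stores an integer timestamp
echo $person->parse('birthday'), PHP_EOL;    // 05/04/1984
```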

mikl · August 1, 2011, 7:49am

In my opinion, what makes validating data in i18n formats so hard is that the current validation mechanism does both: it validates the type format and the logical content of the data. For a clean i18n solution, I think both need to be separated.

One workflow could be:

Input data

1. Assign attributes
   As suggested in a previous post, we could use "formatted<name>" (or a shorter prefix) to mark attribute values that are in a locale-specific format.
2. Validate type (format)
   Here only the type of data is checked for validity: check if a float looks like a float, a date like a date or an email like an email address. For several data types the format depends on the locale: “1.234,56” is a valid float if the locale is “de_DE”, “1,234.56” is valid if it’s “en_US”. This step would work on the “formatted<name>” attribute if one is set, and on the standard attribute if not.
3. Convert to DB (or PHP) format
   If there’s no “formatted<name>” value set for an attribute, this step can be left out. If there was an error in (2) for the attribute, validation should stop here.
4. Validate data (logical value)
   Only if an attribute has no errors in (2) can we check the data logically: check min and max values, ranges, etc. Even MX record validation is part of this step.
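The four input steps above can be sketched for a single float attribute, "price". This is a hypothetical illustration (PHP 7.1+ assumed): the `SEPARATORS` table covers only two locales, and `isLocalFloat()`, `toPhpFloat()` and `assignPrice()` are made-up names, not a proposed API.

```php
<?php
// Hypothetical separator table: only two locales, for illustration.
const SEPARATORS = [
    'de_DE' => ['thousands' => '.', 'decimal' => ','],
    'en_US' => ['thousands' => ',', 'decimal' => '.'],
];

// Step 2: does the input look like a float in this locale?
function isLocalFloat(string $input, string $locale): bool
{
    $t = preg_quote(SEPARATORS[$locale]['thousands'], '/');
    $d = preg_quote(SEPARATORS[$locale]['decimal'], '/');
    $pattern = '/^-?(?:\d{1,3}(?:' . $t . '\d{3})*|\d+)(?:' . $d . '\d+)?$/D';
    return preg_match($pattern, $input) === 1;
}

// Step 3: convert to PHP format (drop thousands separators, use '.' as decimal point).
function toPhpFloat(string $input, string $locale): float
{
    $s = SEPARATORS[$locale];
    return (float) str_replace([$s['thousands'], $s['decimal']], ['', '.'], $input);
}

// Steps 2-4 for the value assigned in step 1 to "formattedPrice".
function assignPrice(string $formattedPrice, string $locale, float $min, float $max, array &$errors): ?float
{
    if (!isLocalFloat($formattedPrice, $locale)) {
        $errors['price'][] = 'Not a valid number.';
        return null; // stop here: no logical validation on unparseable input
    }
    $price = toPhpFloat($formattedPrice, $locale);
    if ($price < $min || $price > $max) { // step 4: logical validation
        $errors['price'][] = 'Out of range.';
    }
    return $price;
}

$errors = [];
var_dump(assignPrice('1.234,56', 'de_DE', 0, 10000, $errors)); // float(1234.56)
```

Because a format error returns early, the view can tell the two failure kinds apart: a format error means the raw input should be echoed back, a range error means the converted value exists.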

Output data

As suggested above, we could use virtual attributes (like "formattedBirthday" for the attribute "birthday") to output the data in the right locale format to views and forms. The reason for this extra attribute name is that it keeps the original value accessible, e.g. to perform some calculations.

mindplay · August 1, 2011, 12:02pm

Gustavo:


Another callback - and yet another reason to list all of your property-names again.

That is the direction set forth by Yii - I wish you would consider annotations, so we could get rid of all the callbacks and the repetitive arrays listing all the property names again and again.

"all the other frameworks are doing it"

phtamas · August 1, 2011, 5:19pm

Different views of the same model may require different formatting. For example short localized date format in forms, medium format on a "table of contents"-type page, long date format on "details" page and RFC822 for RSS feed. Which one should be the "real" one?

The same applies to the input side. Input may come from different sources and in different formats. IMO it’s always better to normalize input in the controller layer (by using helpers) than to try to do that in the model. The model must not know where the data came from, so it can’t know what input format to expect.
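Both halves of this point can be shown in a small sketch (PHP 7.1+ assumed): the model holds one canonical Unix timestamp, each view picks its own output format, and the controller, which knows where the input came from, picks the parse format. `normalizeDate()` is a hypothetical helper name.

```php
<?php
$timestamp = gmmktime(0, 0, 0, 4, 5, 1984); // canonical value stored in the model

// Output side: different views format the same value differently.
$formats = [
    'form'    => 'd/m/Y',     // short format in a form field
    'details' => 'l, j F Y',  // long format on a "details" page
    'rss'     => DATE_RFC822, // RSS feed
];
foreach ($formats as $view => $format) {
    echo $view, ': ', gmdate($format, $timestamp), PHP_EOL;
}

// Input side: the caller passes the format of the source; null means "not parseable".
function normalizeDate(string $input, string $format): ?int
{
    $date = DateTime::createFromFormat('!' . $format, $input, new DateTimeZone('UTC'));
    return $date === false ? null : $date->getTimestamp();
}

var_dump(normalizeDate('05/04/1984', 'd/m/Y') === $timestamp);                   // from a form
var_dump(normalizeDate('1984-04-05T00:00:00+00:00', DATE_ATOM) === $timestamp); // from an API
```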

mindplay · August 1, 2011, 5:27pm

Hence view-models, which are not a concept we currently have in Yii - or rather, they are a choice: you can create and use view-models if you like, decoupling your domain-model from the view.

The more common approach in Yii is to bind your views directly to the domain-model, which sometimes causes problems and forces you to introduce view-specific methods into your domain-models. View-models are invariably more complex than domain-models, that’s commonly known; using view-models, you can keep view concerns out of your domain-model and simplify (flatten) your object graphs for easier view rendering and binding…

(I’m not saying the complexity goes away, it just moves and gets isolated in view-models)
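A minimal view-model sketch, assuming PHP 7+ and hypothetical classes `User`, `Address` and `UserProfileView`: the domain model keeps typed data and a nested object graph, while the view-model flattens and pre-formats exactly what one page needs, so the formatting concern lives outside the domain model.

```php
<?php
class Address
{
    public $city;

    public function __construct(string $city)
    {
        $this->city = $city;
    }
}

// Domain model: canonical, typed data and a nested object graph.
class User
{
    public $name;
    public $birthday; // Unix timestamp
    public $address;  // Address

    public function __construct(string $name, int $birthday, Address $address)
    {
        $this->name = $name;
        $this->birthday = $birthday;
        $this->address = $address;
    }
}

// View-model for one page: flattened and pre-formatted, so the view needs no logic.
class UserProfileView
{
    public $name;
    public $birthday; // display string
    public $city;     // flattened from $user->address->city

    public function __construct(User $user, string $dateFormat)
    {
        $this->name = $user->name;
        $this->birthday = gmdate($dateFormat, $user->birthday);
        $this->city = $user->address->city;
    }
}

$view = new UserProfileView(new User('Ann', gmmktime(0, 0, 0, 4, 5, 1984), new Address('Oslo')), 'd/m/Y');
echo "{$view->name}, born {$view->birthday}, lives in {$view->city}", PHP_EOL;
```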

qiang · August 1, 2011, 6:35pm

@Gustavo: coincidentally, my initial response was also to add a types() method. However, I think your implementation is more like another CForm.

@Mike: as phtamas pointed out, there are different formats. Letting the framework handle them automagically seems to be a difficult task.

@mindplay: we are not going to go the annotation way because it has several critical drawbacks (e.g. can’t be dynamically determined; have trouble if comments are stripped off; new learning curve; etc.).

Based on these discussions, it seems to me our current rules() mechanism can still work fine: a set of conversion rules can be declared at the beginning of rules() to ensure they run before the real validation rules. Formatting is a different story, which is related to the view layer. For complex applications, I think it is a good idea to separate view models from domain models.

mikl · August 2, 2011, 7:28am

@phtamas: Right, there are a lot of different formats. But following the convention-over-configuration strategy, the framework could by default use the most common format per locale (configurable?) and still provide the option to specify formats at a per-attribute and per-scenario level. And you would still have access to the "raw" value in your views for the cases where you need to display an attribute in a very special format.

@qiang: I think this is such a common requirement for any non-English locale user that at least some basic support should be provided by the framework. Using the rules() mechanism for this is very cumbersome (e.g. defining a regex for all local date/decimal formats plus a conversion rule) and repetitive.

Also conversion is logically not part of validation. This can lead to strange problems, like:

1. The user enters a number in local format and submits the form.
2. The attribute had an error and you need to render the form content again.

Now: how can you tell in your view whether the conversion failed, in which case you can output the raw number value again, or whether a range validator failed, in which case you have to convert the value back to the right locale format before output? Separating type and logical validation in the model solves this.

I do understand though, that this is not an easy thing to implement.

phtamas · August 2, 2011, 11:11am

@Mike: I like the idea of two-level validation and simplified format conversion. I just don’t want it to be done automatically in my models. Maybe a redesigned CForm - with custom field types - would be a more appropriate choice for that. Custom field classes could define conversion and format validation rules - and they would be reusable.

mikl · August 2, 2011, 12:01pm

@phtamas: Can you suggest a usage example? And what if you don’t use CForm (which I never have)? Actually, even from a strict MVC perspective, I would not consider it wrong that a model can handle data conversion from/to a local format for convenience. Any other solution will require more manual work, either in the controller action or in the view.

If we don’t want this in the model core, then maybe a behavior is a better option.

qiang · August 2, 2011, 1:10pm

@Mike: Here’s a solution (workaround) to your problem, assuming you want to deal with attribute A:

1. Declare a new attribute formatA.
2. In rules(), declare a conversion rule for formatA, which converts formatA to A. If there’s any conversion error, add the error to A (not formatA).
3. In rules(), add more rules to validate A.
4. In the view, collect input formatA, but display errors for A.

You can write the conversion rule as a validator class so that you don’t have to repeat the regex everywhere.

Clearly, to solve your problem, we need to keep two pieces of data: A and formatA. The above solution is mainly used when the input widget can’t do the conversion and everything has to be done in the model. If an input widget (such as CJuiDatePicker) can do the conversion, then we can skip formatA in the model.

While, strictly speaking, data conversion is different from validation (the latter usually shouldn’t change the attribute value), they have a lot in common. For example, as with validation, you also want to display errors occurring during conversion.

Basically I’m trying not to introduce new protocols/conventions unless necessary.
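The formatA/A workaround can be imitated outside Yii to show the ordering at work. This is a hypothetical standalone sketch (PHP 7.1+ assumed): `Order`, `formatPrice`, `convertPrice()` and `checkPrice()` are made-up names, and the callable list only mimics the way rules() are declared in order; it is not how CModel is implemented. The conversion assumes a de_DE-style input.

```php
<?php
class Order
{
    public $price;        // A: canonical float used by the rest of the app
    public $formatPrice;  // formatA: raw form input such as "1.234,56"
    private $errors = [];

    // Rules run in declaration order, so the conversion rule is listed first.
    private function rules(): array
    {
        return [
            [$this, 'convertPrice'], // formatA -> A
            [$this, 'checkPrice'],   // ordinary validation of A
        ];
    }

    public function convertPrice(): void
    {
        // Assumes de_DE-style input: '.' groups thousands, ',' is the decimal mark.
        $normalized = str_replace(['.', ','], ['', '.'], (string) $this->formatPrice);
        if (!is_numeric($normalized)) {
            $this->errors['price'][] = 'Price must be a number.'; // reported on A, not formatA
            return;
        }
        $this->price = (float) $normalized;
    }

    public function checkPrice(): void
    {
        if (!isset($this->errors['price']) && $this->price < 0) {
            $this->errors['price'][] = 'Price must not be negative.';
        }
    }

    public function validate(): bool
    {
        $this->errors = [];
        foreach ($this->rules() as $rule) {
            $rule();
        }
        return $this->errors === [];
    }

    public function getErrors(): array
    {
        return $this->errors;
    }
}

$order = new Order();
$order->formatPrice = '1.234,56';
var_dump($order->validate(), $order->price); // bool(true) float(1234.56)
```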

samdark · August 2, 2011, 2:47pm

qiang

Good solution. We came up with something like this in our current project. I think it’s a good thing to mention in the guide since, I guess, it’s not that rare.

mikl · August 3, 2011, 6:39am

Qiang, your workaround works fine if you can always use ‘formatA’ in forms. But I think this only works well for new records. For example, what if you edit a DB record? Then formatA is empty after loading, and you’d have to manually format A in your views instead. You could use afterFind to populate formatA from A - but that would again be a lot of repetitive work.

To me the question is: what is the most convenient solution? It should require as little typing as possible, avoid too much repetition, and therefore come with the most common default settings. As you found a way to add virtual attributes from a behavior (see here), we could have a behavior that auto-handles all this conversion.

So, in conclusion: if you all agree, we can take this topic out of the discussion, as it can be solved outside the core.

Sidenote: I have had such a behavior in mind for a long time now. But to make it generic, it needs to handle parsing of a lot of localized formats automatically and even allow configuring custom parse formats. If someone has an idea, drop me a note.

qiang · August 3, 2011, 1:16pm

Yes, it needs some additional work for the update action. Another workaround (which is what I am using in my projects) is to define a getter and setter for A:

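```php
public function getInputA() { return formatA($this->A); } // formatA(): hypothetical formatter
public function setInputA($v) { $this->A = parseA($v); }  // parseA(): hypothetical parser
```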
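In Yii, CComponent resolves `$model->inputA` to `getInputA()`/`setInputA()`. To keep this sketch standalone (PHP 7.1+ assumed), the tiny `__get`/`__set` below only imitates that mapping; `Person` and `inputBirthday` are hypothetical names. Because the getter formats from A on every read, a record loaded from the DB needs no afterFind step.

```php
<?php
class Person
{
    public $birthday; // A: Unix timestamp, as loaded from the DB

    // Minimal imitation of CComponent's getter/setter mapping.
    public function __get($name)
    {
        return $this->{'get' . $name}();
    }

    public function __set($name, $value)
    {
        $this->{'set' . $name}($value);
    }

    public function getInputBirthday(): string
    {
        return $this->birthday === null ? '' : gmdate('d/m/Y', $this->birthday);
    }

    public function setInputBirthday(string $value): void
    {
        $date = DateTime::createFromFormat('!d/m/Y', $value, new DateTimeZone('UTC'));
        // null lets a 'required' rule on A report the bad input
        $this->birthday = $date === false ? null : $date->getTimestamp();
    }
}

$person = new Person();
$person->birthday = gmmktime(0, 0, 0, 4, 5, 1984); // as if loaded by find()
echo $person->inputBirthday, PHP_EOL;              // 05/04/1984
```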

Ben · August 3, 2011, 2:00pm

I think at least support for type conversion should be in the core. For example by providing filter steps at appropriate stages. When and how those need to be applied might depend on the use case.

For example, when loading data from the DB into models, you know what format will be passed to the model. Validation can be skipped; only conversion must be done. This could be achieved with a couple of converters for the basic SQL types (http://en.wikipedia.org/wiki/SQL#Data_types). Provide a way to register additional custom converters for DBMS-specific data types (enum or set in MySQL comes to mind) and for real custom usage (serialized objects in a BLOB, building a DOM from a string representation in a VARCHAR). Custom converters that prove to be useful can then be added with minor releases.

The other way around, collecting data from user input and trying to convert it is certainly harder, since the framework can’t know what format will be provided.

So maybe it makes sense to differentiate between trusted and untrusted sources?

/////////////////////////////////////

// Edit:

Just found out that PHP comes with a data filtering extension, which is included and enabled by default since v5.2:

http://php.net/manual/en/book.filter.php
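A sketch of the trusted-source path described above, using the filter extension (PHP 7+ assumed). `ConverterRegistry`, `register()` and `fromDb()` are hypothetical names; only a handful of SQL types are covered, and a custom converter for MySQL's SET type is registered as an example.

```php
<?php
class ConverterRegistry
{
    private $converters = [];

    public function __construct()
    {
        $this->register('INTEGER', function ($v) { return filter_var($v, FILTER_VALIDATE_INT); });
        $this->register('FLOAT', function ($v) { return filter_var($v, FILTER_VALIDATE_FLOAT); });
        $this->register('BOOLEAN', function ($v) {
            return filter_var($v, FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE);
        });
        $this->register('VARCHAR', 'strval');
    }

    // Add or replace a converter, e.g. for DBMS-specific types.
    public function register(string $sqlType, callable $converter): void
    {
        $this->converters[strtoupper($sqlType)] = $converter;
    }

    // Trusted source (the DB): convert only, no validation step. SQL NULL stays null.
    public function fromDb(string $sqlType, $value)
    {
        if ($value === null) {
            return null;
        }
        $converter = $this->converters[strtoupper($sqlType)];
        return $converter($value);
    }
}

$registry = new ConverterRegistry();
$registry->register('SET', function ($v) { return $v === '' ? [] : explode(',', $v); });
var_dump($registry->fromDb('integer', '42')); // int(42)
print_r($registry->fromDb('SET', 'red,blue'));
```

Untrusted user input would still need the separate convert-then-validate path, since there the format is not known in advance.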

mindplay · August 9, 2011, 6:29pm

In that case, rather than adding yet another callback, I would propose unifying all the callbacks into a single method - getMetadata().

```php
class User extends CActiveRecord
{
  public function getMetadata()
  {
    return array(
      'tableName' => 'users',
      'validators' => array(
        // the usual validation configuration
      ),
      // etc.
    );
  }
}
```

Now you could have some sort of generic (and configurable) mechanism for metadata-handlers, e.g. standard handlers for ‘tableName’ and ‘validators’.

One problem with this approach (and with the existing approach in Yii) is that all of the metadata is instance-specific, which means you can’t get to it until you have an instance.

I understand your reservations about annotations, and with annotations you have the opposite problem - there is no instance-specific metadata, all metadata is general to the class.

I guess it might be possible to support both?

```php
class User extends CActiveRecord
{
  public static function getClassMetadata()
  {
    return array(
      'tableName' => 'users',
      'validators' => array(
        // the static (class-general) validation configuration
      ),
      // etc.
    );
  }

  public function getMetadata()
  {
    return array(
      'validators' => array(
        // a couple more validators specific to this instance...
      ),
    );
  }
}
```

Now with two callbacks, one for class-metadata and one for instance-metadata, you can have a more optimized and less memory-intensive metadata-handler architecture, in which getClassMetadata() is invoked only once per class, and getMetadata() is invoked dynamically when you ask for a handler.

Handlers should be optional - if no handler is associated with a particular type of metadata (such as "tableName"), you get the raw value, in this case a string, but could be an array or something else.
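The two-callback idea with optional handlers could be wired up roughly as follows. This is a hypothetical sketch (PHP 7+ assumed): `MetadataManager`, `setHandler()` and `get()` are made-up names, `User` is a plain class rather than a CActiveRecord so the sketch runs on its own, and array values from class and instance metadata are simply concatenated.

```php
<?php
class User
{
    public static $classMetadataCalls = 0; // only here to show the per-class caching

    public static function getClassMetadata(): array
    {
        self::$classMetadataCalls++;
        return ['tableName' => 'users', 'validators' => [['name', 'required']]];
    }

    public function getMetadata(): array
    {
        return ['validators' => [['email', 'email']]]; // instance-specific additions
    }
}

class MetadataManager
{
    private static $classCache = []; // class name => class metadata
    private $handlers = [];          // metadata key => callable

    public function setHandler(string $key, callable $handler): void
    {
        $this->handlers[$key] = $handler;
    }

    public function get($model, string $key)
    {
        $class = get_class($model);
        if (!isset(self::$classCache[$class])) {
            self::$classCache[$class] = $class::getClassMetadata(); // once per class
        }
        $classValue = self::$classCache[$class][$key] ?? null;
        $instanceValue = $model->getMetadata()[$key] ?? null;    // on every request
        if (is_array($classValue) && is_array($instanceValue)) {
            $value = array_merge($classValue, $instanceValue);
        } else {
            $value = $instanceValue ?? $classValue;
        }
        // No handler registered for this key: return the raw value.
        return isset($this->handlers[$key]) ? ($this->handlers[$key])($value) : $value;
    }
}

$manager = new MetadataManager();
$user = new User();
echo $manager->get($user, 'tableName'), PHP_EOL;  // users (raw value, no handler)
$manager->setHandler('validators', 'count');
echo $manager->get($user, 'validators'), PHP_EOL; // 2 (class + instance validators)
```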

mindplay · August 9, 2011, 6:46pm

Forget that it uses annotations - where the metadata comes from is beside the point - but take a look at the mechanism implemented in this demo-script:

http://code.google.com/p/php-annotations/wiki/DemoScript

Quick overview:

1. Look through the object properties (not through the post-data) and search for matching values in the post-data.
2. Attempt conversion based on the property-type; the converter must have access to the validation error-list, in case it fails to convert the input.
3. On successful conversion, run the validators as usual; but now you can write clean validators that assume correctly-typed input (property-validators don’t run unless conversion succeeds).
4. If all fields converted successfully, run the object’s state validation-method, if present.
5. On successful validation, apply the updated properties to the object.

This way, conversion and validation runs in stages, and each stage is simpler because you can make a number of presumptions at each stage, separating and simplifying the responsibilities of converters and validators.

Note that this demo is based on the idea of managing the state of a form in a separate (auto generated) view-model - it does not make changes to the object until the form passes validation. You may or may not like that aspect of this implementation - I personally find that it makes a lot of sense, given my general perception that form validation tends to be more complex than (and/or different from) object validation.
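The five stages can be sketched in plain PHP (7.1+ assumed). This is not the php-annotations demo code: property types come from a plain array instead of annotations, only `int`, `date` and string types are handled, and `convertValue()`, `bindForm()` and `Booking` are hypothetical names. As in the demo, the object is only modified once everything has passed.

```php
<?php
function convertValue(string $type, string $raw)
{
    switch ($type) {
        case 'int':
            $value = filter_var($raw, FILTER_VALIDATE_INT);
            return $value === false ? null : $value;
        case 'date':
            $date = DateTime::createFromFormat('!Y-m-d', $raw, new DateTimeZone('UTC'));
            return $date === false ? null : $date->getTimestamp();
        default:
            return trim($raw);
    }
}

function bindForm($object, array $types, array $validators, array $post, array &$errors): bool
{
    $values = [];
    $converted = true;
    foreach ($types as $property => $type) {                    // 1. walk the object's properties
        if (!array_key_exists($property, $post)) {
            continue;
        }
        $value = convertValue($type, (string) $post[$property]); // 2. convert by type
        if ($value === null) {
            $errors[$property][] = "Invalid $type value.";
            $converted = false;
            continue;                                            // validators never see bad input
        }
        foreach ($validators[$property] ?? [] as $validator) {   // 3. typed validators
            $message = $validator($value);
            if ($message !== null) {
                $errors[$property][] = $message;
            }
        }
        $values[$property] = $value;
    }
    if ($converted && method_exists($object, 'validateState')) { // 4. object-state check
        $object->validateState($values, $errors);
    }
    if ($errors !== []) {
        return false;                                            // object left untouched
    }
    foreach ($values as $property => $value) {                   // 5. apply the new values
        $object->$property = $value;
    }
    return true;
}

class Booking
{
    public $guests = 1;
    public $arrival = 0;
    public $departure = 0;

    public function validateState(array $values, array &$errors): void
    {
        if (isset($values['arrival'], $values['departure']) && $values['departure'] <= $values['arrival']) {
            $errors['departure'][] = 'Departure must be after arrival.';
        }
    }
}

$types = ['guests' => 'int', 'arrival' => 'date', 'departure' => 'date'];
$validators = ['guests' => [function (int $n) { return $n >= 1 ? null : 'At least one guest.'; }]];

$booking = new Booking();
$errors = [];
$post = ['guests' => '2', 'arrival' => '2011-08-10', 'departure' => '2011-08-12'];
var_dump(bindForm($booking, $types, $validators, $post, $errors)); // bool(true)
```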

kernel · August 27, 2011, 12:05pm

Rules/validators should be an independent component, usable not only by models but also for general purposes. There could be two layers: 1. rule and 2. ruleset.

Rules should be more context sensitive.
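One possible reading of the rule/ruleset split, as a hypothetical sketch (PHP 7.1+ assumed): a `Rule` checks a single value, and a `RuleSet` maps fields to rules with optional contexts. It validates a plain array, so it does not depend on a model. All class names are illustrative.

```php
<?php
interface Rule
{
    // Returns an error message, or null if $value passes.
    public function check($value): ?string;
}

class RequiredRule implements Rule
{
    public function check($value): ?string
    {
        return ($value === null || $value === '') ? 'Required.' : null;
    }
}

class RangeRule implements Rule
{
    private $min;
    private $max;

    public function __construct(float $min, float $max)
    {
        $this->min = $min;
        $this->max = $max;
    }

    public function check($value): ?string
    {
        return ($value >= $this->min && $value <= $this->max)
            ? null
            : "Must be between {$this->min} and {$this->max}.";
    }
}

class RuleSet
{
    private $rules = []; // list of [field, Rule, contexts]

    public function add(string $field, Rule $rule, array $contexts = []): self
    {
        $this->rules[] = [$field, $rule, $contexts];
        return $this;
    }

    // Validate any array of data; rules limited to other contexts are skipped.
    public function validate(array $data, string $context = ''): array
    {
        $errors = [];
        foreach ($this->rules as [$field, $rule, $contexts]) {
            if ($contexts !== [] && !in_array($context, $contexts, true)) {
                continue;
            }
            $message = $rule->check($data[$field] ?? null);
            if ($message !== null) {
                $errors[$field][] = $message;
            }
        }
        return $errors;
    }
}

$rules = (new RuleSet())
    ->add('email', new RequiredRule())
    ->add('age', new RangeRule(0, 120))
    ->add('password', new RequiredRule(), ['register']); // only checked when registering

print_r($rules->validate(['email' => '', 'age' => 150]));
```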