ActiveRecord architecture

mindplay · July 2, 2010, 1:59am

Today I had a realization I wish I’d had when I wrote my own object-relational mappers (twice!) in the past.

Today I got in touch with Vance Lucas, who has resumed development of his phpDataMapper project, now known also known as Spot - a very interesting project, perhaps not so much to those who use Yii exclusively for all of their projects, but anyway…

He asked some questions, and got me thinking about a very important topic - something that should be taken into consideration if such a thing as "generation 2" of Active Record for Yii is ever proposed for any reason.

And that is: validation vs type integrity.

The discussion made me realize, that a type integrity layer is something that is entirely missing from Yii’s implementation of Active Record - or perhaps from it’s database abstraction layer, I’m still not 100% sure where type integrity features really belong. I’ve become convinced that they don’t belong in the model, however.

Active Record has validators, of course - it inherits those from CModel, and pretty much relies on those for type integrity constraints.

The problem with that approach, is that you’re really relying on the same feature to solve problems in two different domains: validation of user input (communication between the model and some other component, typically a form) - and type integrity constraints (imposed by the underlying storage model, a relational database).

That in itself would be fine, if not for the fact that the requirements for solving problems in those two different domains are actually much less similar than they may appear - yes, they both validate input in some form, but when you look into it, that’s pretty much where the similarities end.

The Type Integrity Layer:

Implements type safety for the persistence layer, enforces data integrity constraints in the framework.
Mediates between PHP datatypes and SQL string representations of SQL datatypes.
Handles simple datatypes only (string, integer, float, date, time, enumerator, etc.).
Is developer-friendly - throws exceptions that help developers solve storage-related problems.

The Validation Layer:

Implements detailed validation for models, enforces "business" rules in your application.
Mediates between PHP datatypes and raw form-values or widget-specific value representations.
Handles complex/compound/abstract types (e-mail address, URL, numeric range, etc.).
Is user-friendly - accepts localized input, and reports errors that help users correct their input.

The validation layer has support for features that aren’t needed, or even appropriate, for the data integrity layer. For example, a floating-point attribute in the validation layer needs to be aware of what characters are used as thousand- and decimal-separators. While a floating-point value in the data integrity layer is always formatted with “.” as the decimal-separator, and no thousand-separator.

It would appear that you would end up with some overlap - for example, string-length would have to be validated both in the data-integrity layer, and in the validation layer. And this is probably what makes most programmers think that such an approach must be wrong - we don’t like to implement the same things twice.

But only the basic validations for datatype, string-length, and whether a field accepts null-value, actually overlap, and not the conditions under which they are validated. For example, for an integer attribute, the type-integrity layer should check if the value is integer, and in a range supported by the column - while the validator should check if the value is within the range required by your application.

This example demonstrates an important difference validation and data integrity checks. Suppose you’re storing an USIGNED TINYINT (range 0-255) for your column - but you write a validator that requires a range between 0 and 1000. The problem here is obvious, and since the developer is responsible for this mistake, this will result in an exception - clearly you picked a datatype that doesn’t meet the storage requirements for your business rules.

That’s a very simple example, but it demonstrates a real problem. For example, last week I had to fix a bug where, seemingly, pasting HTML into a WYSIWYG editor on a form would cause the content to truncated when you hit save. Two developers spent at least a couple of hours trying to solve this mysterious bug - maybe there was something wrong with the WYSIWYG editor? Maybe there was some invalid characters in the HTML this user was trying to paste? Maybe weird markup was carrying over from Word? We would chase this bug with var_dump() and die() statements in various locations, until, finally, we realized - the column in the database was a STRING, and would only accept 255 characters, it had nothing to do with the editor or the content.

A data-integrity layer gives you an extra layer of insulation against silent errors - as a developer, if there’s a disconnect between what you’re trying to store, and how you’re trying to store it, you need to know. Even if the system is in production, an exception (and a roll-back if you’re doing your job properly!) is always better than letting the users think that everything is fine, when you’ve actually just thrown away half of the content they just submitted. Someone might submit dozens of entries before realizing that things aren’t working as well as they seem to - it’s better to get a developer on the job as soon as possible, with an error message that can help resolve the problem quickly.

Customers tend to be more forgiving of errors that are caught and fixed early on (hopefully during pre-launch user testing!) than of periodic errors that linger mysteriously for months, before anybody comes up with a qualified guess as to what may be causing it!

Anyway, I’m side-tracking here, and I think I’ve made my point. I think there is a missing piece to the puzzle here, and I hope this helps complete the picture at some point in the future…

veris · July 2, 2010, 7:37am

We need these layers!

I have many "," <–> "." corrections in my models before/after validation and before saving.

The reason is of course that human readable number format differs from sql’s data format and the data needs to be converted every time i display it to the user or save it to the database.

Current AR implementation lacks a layer that would deal with these situations. I don’t have an idea how to implement this properly, but have some quick fixes.

I have developed an ActiveRecord class that extends CActiveRecord and upgraded the __get method to automatically convert numbers every time i add “_n2hr” or “_n2sys” at the end of a attribute’s name. I don’t need to define getter/setter functions of every attribute for both situations for every model. It’s just a quick fix and only for these situations.

I think of upgrading AR little bit more (but still quick fix). It will have a private variable holding the list of "number" attributes and will handle the conversion fully automatically. Of course i will modify before/after functions.

Elevating the problem to be more abstract i think we could find a more flexible solution pretty easily but implementing it needs proper knowledge of CActiveRecord, CActiveRecordMetaData, CDbTableSchema and so on. I have in mind the functionality and architecture both and the compatibility with AR idea.

zaccaria · July 2, 2010, 8:13am

Another example is the problem in managing date and time formats (and numbers too).

we rely on sql format for such data types, so we have to write some behavior for convert data from sql format to user-readable format. This behaviour has to be changed if we change database.

Should be nice if there will be an abstraction layer for all types: the development team can fix a standard type for numbers, data, emails and anithing is needed and then write validation rules (and conversion fuciont) for validate (convert) user input into this Yii format.

Later the dbAstraction layer (CDbConnection) will add the data-type check and translation to the various dbTypes.

That allows to change database system without problem with types, because if the types of database are not compliant with Yii types, the developer will have exception.

I guess that "The Type Integrity Layer" can reside all in framework, because are just standard type check between Yii and Db type, and so will not require the programmer to implement two layer.

Thank you very much mindplay for all your philosopy lessons!!!

veris · July 2, 2010, 9:26am

Of course putting data conversion in behaviors makes the application less portable but currently i don’t see a better solution that i could use. I could easily be lacking some knowledge and if you can, correct me.

I think if we put some integrity conversions in AR by ourselves, we should have in mind:

It should be easy to revert when proper layer gets implemented in Yii.
Put them all into one class (for example ActiveRecord that extends CActiveRecord).
Use behaviors and only our own functions/attributes belonging to ActiveRecord.
Do not override any of existing functions or other classes. For example not to implement any additional attributes in validators and rules() method that would control the conversion.

Generally in my opinion the main thing is that there are two data flows that need data conversions and checks etc., not just one:

USER(browser) <–flow1–> Yii(php) <–flow2–> DB(pdo)

My first thought:

Flow 1 can be controlled by CModel.

Flow 2 can be controlled by CDbConnection that cooperates with Pdo as you mentioned.

It seems that every flow is a layer, so we have two separate layers.

What if there were more flows or layers?

It needs to think of "data" in a more abstract way and distinguish for example:

data states
data flows
conversions and checks that belong to these flows

Data flow is between sender and receiver.

The data could have definitions that:

Allow sender to pack the data from his readable format into some standard format.
Allow receiver to unpack the data from standard format into his readable format.
Allow perform checks on the data and pack wrong data back to sender into the format readable by sender.

What do you think of this?

I know that it needs more thinking…

mikl · July 2, 2010, 9:41am

@mindplay:

I feel like you’re on a right path here. I came to a similar conclusion recently, when trying to separate “type related” rules (like max string length in a DB column) that should always apply and “scenario related” rules (like when is an attribute safe or in which scenario should an attribute be unique). It required adding lot of scenarios to not make all type rules safe in all scenarios (a drawback of the current safe attributes definition in 1.1.x).

Your proposal seems to also solve this separation issues. Any idea how a concrete implementation could look? I usually start here, how i’d like use such a feature later. So let me take a start from that perspective and propose something:


/**

 * New model method: Returns the type rules of your attributes.

 */

public function types() {

    array(

      'id'=>array('numeric','type'='tinyint'),

      'name'=>array('string','maxlength'=>255),

      'email'=>array('string','maxlength'=>255),

    );

}


public function rules() {

    array (

      'name'=>array('unique','on'=>'register'),

      'email'=>array('email'),

      ...

    );

}

It’s lacking the mediation part though (like converting dates back and forth to/from locale representation). So suggestions how this could be done?

veris · July 2, 2010, 10:10am

Ok, some other thoughts on the subject.

Data flows between these layers:

USER

output contolled by Yii (CHtml functions) or Php echo function.
input controlled by $_POST and $_GET

PHP (Yii)

this layer doesn’t need any conversions and control, allows complex manipulation on the data by native php functions.

PDO(db)

input and output by DAO or AR, so CDbConnection is at the bottom of both

So we could implement a class that represents the data and allows to define output and input checks and conversions between 3 above layers.

AR fits here almost perfectly. It just needs:

tweaks for conversions on the fly when echoing attribute values (from php to user)
tweaks for automatic converting after $ar->setAttributes() (from user to php)
tweaks for automatic converting before $ar->save() (from php to pdo)

AR might have ‘outputType’ attribute to control the conversions. We can output values by html encoding to the browser or just output them as text in a console application.

Maybe implementing $ar->setOutputType($type)

and redesigning magic __get method could do the trick…

If user doesn’t use AR and use just DAO… the conversion PHP <–> PDO should be in CDbConnection by using for example wrapper classes to some data types to perform the conversion.

Ok, these were some lazy thoughts and don’t get them too seriously

I don’t have much experience in creating and modifying design patterns.

I love Yii and for me Yii is perfect!

zaccaria · July 2, 2010, 10:15am

@mike

I think that the type integrity layer should not rely on rules in model and should not be holded by CActiveRecord.

My idea is to do somethinglike that: in CDbCommandBuilder at the line 153




	/**

	 * Creates an INSERT command.

	 * @param mixed the table schema ({@link CDbTableSchema}) or the table name (string).

	 * @param array data to be inserted (column name=>column value). If a key is not a valid column name, the corresponding value will be ignored.

	 * @return CDbCommand insert command

	 */

	public function createInsertCommand($table,$data)

	{

		[...]

		$i=0;

		foreach($data as $name=>$value)

		{

			if(($column=$table->getColumn($name))!==null && ($value!==null || $column->allowNull))

			{

				$fields[]=$column->rawName;

				if($value instanceof CDbExpression)

				{

				 [...]

				}

				else

				{

					$column->checkType($value);  ///line added

					$placeholders[]=self::PARAM_PREFIX.$i;

					$values[self::PARAM_PREFIX.$i]=$column->typecast($value);

					$i++;

				}

			}

		}

		[...]

	}

And then we can implement our check in CDbColumnSchema:




	/**

	 * Check if the data respect the db formats

	 * @param mixed input value

	 * @throws DbTypeMismathcException

	 */

	public function checkType($value)

	{

		switch($this->type)

		{

			case 'string': 

				if (strlen($value)>$this->size)

					throw new DbTypeMismathcException(

						Yii::t(

							'yii', 

							'The string for the field {table}.{field} is too long. ({strlen} chars, {size} allowed)', 

							array ('{field}'=>$this->table, '{field}'=>$this->rawName,'{strlen}'=>strlen($value),'{size}'=>$this->size)

						)

					);

			break;

			case 'integer':[...]

			case 'boolean':[...]

			[...]

			

		}

	}

As you see, the type integrity system relies ONLY on db types, and so no configuration (= extra work for programmer) is needed in activeRecord definition.

Is possible to throw the more datailed exception we want (like I did, for example), so if a user will have the error, there will be all advantages mindPlay said, including:

The item is not saved, there will be a rollback and the user can signalate the error
In the log there will be our exception, in wich is written wich field of database modify

About translation, there is already the function typecast. This can be expanded and can be added a translation for all dates, form the Yii format (still do choose) to the db actual format.

The check for data should be that the data we are going to save is a valid Yii date (otherways exception).

What do you think about it?

mikl · July 2, 2010, 10:38am

Hmm. I admit my vision on all this is still very vague.

@zaccharia:

From how i understand your proposal, you do number conversion right when the SQL is created. But then how can you do e.g. check the range of a number? Validation must be performed before SQL is built. So a number like in german "1.203,44" (which is 1203.44) needs to be converted before.

I wonder, if there’s still another level involved, separate from type integrity and data validation: namely locale specific type conversion. Because conversion is neither really related to validation nor to type integrity.

zaccaria · July 2, 2010, 11:16am

The conversion from locale to integer should be done by Validation Layer, so the type integrity will receive 1203.44.

The Yii type should be the more close possible to the standard database type. The type integrity level should provide a common type for the validation (presentation) layer. That what a db abstraction layer should do, and Yii do it VERY well, I am not complaining about it but just advincing a small feature.

Locale concern input validation because the input are collected in locale format, so they should be first normalized and later validated.

like that:

user input (1.203,44)

<de-localization>

   Yii format (1203.44) 


       &lt;validation&gt; 


           Yii format 


              &lt;type integrity&gt; 


                   Yii format 


                      &lt;db conversion&gt; 


                          save on db

the type integrity check will be done only if a save is required (and the dbConversion as well).

Right now we have nearly all level:

With the formatter we can do the de-localization level

validation is implemented

Just miss the check for db type and a more strong abstraction of database type (for date type, for all other is not needed).

Ps: I am italian, no ‘h’ in zaccaria is needed

mikl · July 2, 2010, 11:57am

Sorry for that

veris · July 2, 2010, 12:04pm

What do you think of that?

EDIT: Updated to what zaccaria proposed in #9

EDIT: Updated to what Mike said in #12

mikl · July 2, 2010, 12:09pm

@veris:

Close to what i just thought, too. But one problem: If user submits data, conversion occurs before validation happens. So what if data is of wrong locale format and can’t be converted?

We need:

locale specific validation (e.g. correct number format)
"standard" validation (e.g. number range)
type validation (as described by mindplay)

veris · July 2, 2010, 12:21pm

I’ll try to update the diagram as we have new ideas.

@mike:

Good point!

I think that User locale is what Yii doesn’t know by itself.

It should be defined in app configuration or posted with the data.

We should prepare html forms for this.

So we can implement lightweight check of the POST/GET data if there is locale information and perform the conversion then.

After the conversion we just put the data into AR’s attributes. We don’t validate it here.

We will validate just before saving it to database as it is now.

I’ll try to update the diagram to show what i mean.

zaccaria · July 2, 2010, 12:36pm

@veris: I quote what you said.

Just keep in mind that the validation as Yii does is correct and still needed.

I just propose to add a second level of validation, authomatical, developer friendly and right before save, in addition and not instead Yii validator.

In your first diagram there was only one validation.

About validation, I suggest to speak about it in a new post, because we are going a bit ‘out of topic’.

The second level of validation proposed by mindPlay is already an articolated and complex topic, let’s speak only of validation and let discuss translation in another place.

EDIT: now the diagram is what I mean

mindplay · July 2, 2010, 12:47pm

Yes! This point is of crucial importance. Because we’re currently relying on a relational DBMS for storage, we do need to design the back-end storage schema - but once that is done, the business logic of our application should never need to get mixed up with storage concerns.

Converting a timestamp to/from SQL format is a storage concern. Abstracting away storage concerns is the whole point of the Active Record pattern.

This is closely related to another post I put up recently - see here. This may contain parts of the solution to this problem, or at least ideas/inspiration for a more generic solution to these problems…

mindplay · July 2, 2010, 12:57pm

Mike:


/**

 * New model method: Returns the type rules of your attributes.

 */

public function types() {

    array(

      'id'=>array('numeric','type'='tinyint'),

      'name'=>array('string','maxlength'=>255),

      'email'=>array('string','maxlength'=>255),

    );

}


public function rules() {

    array (

      'name'=>array('unique','on'=>'register'),

      'email'=>array('email'),

      ...

    );

}

You don’t need to manually configure the types - it’s already specified by the back-end storage, your database schema, and repeating this information again in code will introduce another potential for errors.

Yii already has a nice API that lets you reflect on the metadata of your tables - it will tell you the column types, max. length, if they’re nullable, etc.

What’s needed, is another lower layer for type conversion that ensures type integrity. Since you already defined your storage requirements when you defined your schema, what you need now is not another specification, but a layer that enforces what you have already specified.

(that’s assuming that we’re not going to try and jump ahead to schemaless / non-relational databases - but in the first place, I would never propose using the Active Record pattern for those, because AR is a pattern designed to abstract away the complexity of SQL, and schemaless databases don’t usually rely on SQL in the first place…)

mindplay · July 2, 2010, 1:18pm

I agree, this goes beyond the discussion I was trying to start with this post - handling different locale is more of a user-interface concern, and likely belongs in yet another layer, probably in an existing layer, perhaps form components or widgets…

veris · July 2, 2010, 1:32pm

Yes, we mixed up too many things and gone a bit from the main topic.

My current diagram shows more of an idea of simple quick fixes than an architectural design that will help to understand the whole idea and two different domains of the problem - type integrity and validation. There are some ideas too, but we can’t put too many domains into 1 diagram.

I need to rethink whole thing again and eventually come up with another diagram.

My idea here was to show that architecture of AR should be designed by having the data flow in mind and also 3 layers: presentation, php and db and also maybe some lightweight conversions on different stages of the data flow.

Best regards

zaccaria · July 2, 2010, 1:50pm

Here there are 2 files with proposal of solution.

The file ZDbConnection is an alternative DbConnection that introduce a MySql schema with datatypes check (I did only for mysql, sorry!)

The file ZMySqlColumnSchema is the column schema with the chek. Only string are validated (you can add data, enum and even numerical format check).

For use it you have to place the files in controls and then configure your main.php:




'db'=>array(

    'class'=>'ZDbConnection''

    [...]

)

742

ZDbConnection.php

741

ZMysqlColumnSchema.php

mindplay · July 2, 2010, 1:54pm

Did anyone look at my sample mediator code?

I took another look at it myself, and I think this could be the beginnings of a type abstraction layer? The first missing piece though, is a mechanism for implicit type mediation for simple non-object attributes - string, int, float, etc… we don’t want an object instance for any simple attribute that can be properly and accurately represented with a native PHP variant, but we do still want the type integrity, so some object still needs to be responsible for creating a mediator for the underlying storage engine and converting values back and forth.

Even dates could probably be supported by implicit conversion, if you’re comfortable using timestamps (integers) to represent dates in PHP - personally, that’s my preference, some people prefer an object, and both could be easily and interchangibly supported, I think…