Today I had a realization I wish I’d had when I wrote my own object-relational mappers (twice!) in the past.
Today I got in touch with Vance Lucas, who has resumed development of his phpDataMapper project, now also known as Spot - a very interesting project, perhaps not so much to those who use Yii exclusively for all of their projects, but anyway…
He asked some questions, and got me thinking about a very important topic - something that should be taken into consideration if such a thing as "generation 2" of Active Record for Yii is ever proposed for any reason.
And that is: validation vs type integrity.
The discussion made me realize that a type integrity layer is something that is entirely missing from Yii’s implementation of Active Record - or perhaps from its database abstraction layer, I’m still not 100% sure where type integrity features really belong. I’ve become convinced that they don’t belong in the model, however.
Active Record has validators, of course - it inherits those from CModel, and pretty much relies on those for type integrity constraints.
The problem with that approach is that you’re really relying on the same feature to solve problems in two different domains: validation of user input (communication between the model and some other component, typically a form) - and type integrity constraints (imposed by the underlying storage model, a relational database).
That in itself would be fine, if not for the fact that the requirements for solving problems in those two different domains are actually much less similar than they may appear - yes, they both validate input in some form, but when you look into it, that’s pretty much where the similarities end.
The Type Integrity Layer:
- Implements type safety for the persistence layer, enforces data integrity constraints in the framework.
- Mediates between PHP datatypes and SQL string representations of SQL datatypes.
- Handles simple datatypes only (string, integer, float, date, time, enumerator, etc.).
- Is developer-friendly - throws exceptions that help developers solve storage-related problems.
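To make that list a bit more concrete, here is a minimal sketch of what such a layer might look like. It is written in Python rather than PHP purely for illustration, and every name in it (`TypeIntegrityError`, `to_sql_integer`, `to_sql_string`, `to_sql_date`) is hypothetical, not part of Yii or Spot. The point is only that each function takes a native value, refuses anything the column cannot hold, and complains to the developer with an exception.

```python
from datetime import date


class TypeIntegrityError(Exception):
    """Developer-facing storage error; not meant to be shown to end users."""


def to_sql_integer(value, minimum, maximum):
    """Convert an int to its SQL string form, enforcing the column's range."""
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeIntegrityError(f"expected int, got {type(value).__name__}")
    if not minimum <= value <= maximum:
        raise TypeIntegrityError(
            f"{value} is outside the column range {minimum}..{maximum}"
        )
    return str(value)


def to_sql_string(value, max_length):
    """Pass a string through unchanged, but never allow silent truncation."""
    if not isinstance(value, str):
        raise TypeIntegrityError(f"expected str, got {type(value).__name__}")
    if len(value) > max_length:
        raise TypeIntegrityError(
            f"string of length {len(value)} exceeds column limit of {max_length}"
        )
    return value


def to_sql_date(value):
    """Format a date as YYYY-MM-DD, independent of any locale."""
    if not isinstance(value, date):
        raise TypeIntegrityError(f"expected date, got {type(value).__name__}")
    return f"{value.year:04d}-{value.month:02d}-{value.day:02d}"
```

Note that nothing here knows about forms, locales or error messages for users; that is deliberate.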
The Validation Layer:
- Implements detailed validation for models, enforces "business" rules in your application.
- Mediates between PHP datatypes and raw form-values or widget-specific value representations.
- Handles complex/compound/abstract types (e-mail address, URL, numeric range, etc.).
- Is user-friendly - accepts localized input, and reports errors that help users correct their input.
The validation layer has support for features that aren’t needed, or even appropriate, for the data integrity layer. For example, a floating-point attribute in the validation layer needs to be aware of what characters are used as thousand- and decimal-separators, whereas a floating-point value in the data integrity layer is always formatted with “.” as the decimal-separator, and no thousand-separator.
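The floating-point case can be sketched in a few lines. This is an illustrative Python sketch, not PHP and not anything Yii provides; the function names and the default separators (a comma for decimals, a dot for thousands) are assumptions. The first function belongs to the validation side and answers the user; the second belongs to the integrity side and answers the developer.

```python
def parse_localized_float(text, decimal_sep=",", thousand_sep="."):
    """Validation side: turn user input such as '1.234,5' into a float.

    Returns a (value, error) pair; exactly one of the two is None.
    """
    # Drop grouping characters first, then normalise the decimal separator.
    cleaned = text.strip().replace(thousand_sep, "").replace(decimal_sep, ".")
    try:
        return float(cleaned), None
    except ValueError:
        return None, f"'{text}' is not a valid number."


def format_float_for_storage(value):
    """Integrity side: always '.' as decimal separator, never any grouping."""
    if isinstance(value, bool) or not isinstance(value, float):
        raise TypeError(f"expected float, got {type(value).__name__}")
    return repr(value)
```

A sketch like this is deliberately lenient about what `float()` accepts; a real validator would probably be stricter. The storage formatter, on the other hand, has no options at all, because the database only ever speaks one dialect.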
It would appear that you would end up with some overlap - for example, string-length would have to be validated both in the data-integrity layer, and in the validation layer. And this is probably what makes most programmers think that such an approach must be wrong - we don’t like to implement the same things twice.
But only the basic validations for datatype, string-length, and whether a field accepts null-value, actually overlap, and not the conditions under which they are validated. For example, for an integer attribute, the type-integrity layer should check if the value is integer, and in a range supported by the column - while the validator should check if the value is within the range required by your application.
This example demonstrates an important difference between validation and data integrity checks. Suppose you’re storing an UNSIGNED TINYINT (range 0-255) in your column - but you write a validator that allows a range between 0 and 1000. The problem here is obvious, and since the developer is responsible for this mistake, storing a value above 255 should result in an exception - clearly you picked a datatype that doesn’t meet the storage requirements for your business rules.
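Here is roughly how I picture that mismatch playing out, again as a Python sketch with made-up names (`validate_range`, `check_column_range`, `save_quantity`). The validator hands a message back to the user; the column check raises, because a value that passed the business rule but does not fit the column is the developer's problem.

```python
class TypeIntegrityError(Exception):
    """Developer-facing storage error."""


TINYINT_UNSIGNED = (0, 255)  # range of the column from the example above


def validate_range(value, minimum, maximum):
    """Validation layer: return a message for the user, or None if valid."""
    if not minimum <= value <= maximum:
        return f"Please enter a number between {minimum} and {maximum}."
    return None


def check_column_range(value, column_range):
    """Integrity layer: raise if the value cannot be stored in the column."""
    low, high = column_range
    if not low <= value <= high:
        raise TypeIntegrityError(
            f"{value} does not fit a column with range {low}..{high}"
        )
    return value


def save_quantity(value):
    """Apply the business rule (0..1000) first, then the storage constraint."""
    error = validate_range(value, 0, 1000)
    if error is not None:
        return error  # the user can fix this
    check_column_range(value, TINYINT_UNSIGNED)  # the developer must fix this
    return None
```

With this in place, `save_quantity(2000)` returns a message for the form, `save_quantity(100)` goes through, and `save_quantity(700)` raises `TypeIntegrityError`, which is exactly the case where the datatype and the business rule disagree.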
That’s a very simple example, but it demonstrates a real problem. For example, last week I had to fix a bug where, seemingly, pasting HTML into a WYSIWYG editor on a form would cause the content to be truncated when you hit save. Two developers spent at least a couple of hours trying to solve this mysterious bug - maybe there was something wrong with the WYSIWYG editor? Maybe there were some invalid characters in the HTML this user was trying to paste? Maybe weird markup was carrying over from Word? We chased this bug with var_dump() and die() statements in various locations, until, finally, we realized that the column in the database was a STRING, and would only accept 255 characters. It had nothing to do with the editor or the content.
A data-integrity layer gives you an extra layer of insulation against silent errors - as a developer, if there’s a disconnect between what you’re trying to store, and how you’re trying to store it, you need to know. Even if the system is in production, an exception (and a roll-back if you’re doing your job properly!) is always better than letting the users think that everything is fine, when you’ve actually just thrown away half of the content they just submitted. Someone might submit dozens of entries before realizing that things aren’t working as well as they seem to - it’s better to get a developer on the job as soon as possible, with an error message that can help resolve the problem quickly.
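The "exception plus roll-back" idea can be sketched with Python's standard `sqlite3` module. The schema, the 255-character limit and the names `open_demo_db` and `save_article` are all assumptions for the sake of the example. SQLite does not enforce declared string lengths by itself, so the check has to live in code here, which happens to make the point rather well.

```python
import sqlite3


class TypeIntegrityError(Exception):
    """Developer-facing storage error."""


MAX_BODY_LENGTH = 255  # assumed limit of the body column


def open_demo_db():
    """Create an in-memory database with two related tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE article (title TEXT)")
    conn.execute("CREATE TABLE article_body (title TEXT, body TEXT)")
    conn.commit()
    return conn


def save_article(conn, title, body):
    """Write both rows, or neither of them."""
    # Used as a context manager, the connection commits on success and
    # rolls back if an exception escapes the block.
    with conn:
        conn.execute("INSERT INTO article (title) VALUES (?)", (title,))
        if len(body) > MAX_BODY_LENGTH:
            raise TypeIntegrityError(
                f"body has {len(body)} characters, "
                f"column holds at most {MAX_BODY_LENGTH}"
            )
        conn.execute(
            "INSERT INTO article_body (title, body) VALUES (?, ?)",
            (title, body),
        )
```

If the body is too long, the exception propagates to whoever called `save_article`, and the half-written `article` row is rolled back instead of lingering in the database next to truncated content.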
Customers tend to be more forgiving of errors that are caught and fixed early on (hopefully during pre-launch user testing!) than of periodic errors that linger mysteriously for months, before anybody comes up with a qualified guess as to what may be causing them!
Anyway, I’m side-tracking here, and I think I’ve made my point. I think there is a missing piece to the puzzle here, and I hope this helps complete the picture at some point in the future…