HTMLPurifier - limit input to alphanumeric characters

I’ve downloaded and configured this Yii extension for purifying user input:

It removes malicious XSS strings, but what I’d like to do is limit the input to alphanumeric characters only.

// For example, say someone tacks this onto the query string:

?somevar=SomeValue') AND 8420=8420 AND ('crap'='crap

// I can grab and purify the input simultaneously with:

$somevar = Yii::app()->input->get('somevar');

// However, when I echo $somevar, it contains the original cruft:

SomeValue') AND 8420=8420 AND ('crap'='crap

How can I make HTMLPurifier (with or without the Yii ‘input’ extension) limit the input to alphanumeric characters?

Any help appreciated, thank you!


Isn’t it better just to use model validation?

Hey Sam, thanks for replying.

There are cases where I want to validate the data before dealing with any models.

For instance, someone is signing up for my services through a URL. One of the things the controller expects in the $_GET array is a variable called "name".

// For example, say someone tacks this onto the query string:

?name=BigWidget') AND 8420=8420 AND ('crap'='crap

$name = Yii::app()->request->getQuery('name');

$condition = "name='{$name}'";

$package = Package::model()->find($condition);

The find() call above throws this exception:

Code: 500

File: /home/blah/src/yii/framework/db/CDbCommand.php

Line: 516

CDbCommand failed to execute the SQL statement: SQLSTATE[42000]: 

Syntax error or access violation: 1064 You have an error in your SQL syntax; 

check the manual that corresponds to your MySQL server version for the right 

syntax to use near ') AND 8420=8420 AND ('crap'='crap' LIMIT 1' at line 1

As you can see, find() tried to run this query, which has invalid syntax:


SELECT * FROM product WHERE name='BigWidget') AND 8420=8420 AND ('crap'='crap'


I’d prefer to NOT even be sending such unsanitized input to find() in the first place, which is why I was hoping HTML purifier had an option to strip the input so that only alphanumerics (for example) are allowed.

Short of that, I’d have to write my own wrapper for CmsInput. But that’s just plain sad. Because then my wrapper would wrap around CmsInput, which in turn wraps around HtmlPurifier!
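As far as I know, HTMLPurifier is built to filter HTML markup rather than to enforce a character whitelist, so the alphanumeric restriction is probably easier to do in plain PHP. Here is a minimal sketch of what such a wrapper could boil down to. The names `alnumOnly` and `isAlnum` are hypothetical and not part of Yii, CmsInput, or HTMLPurifier.

```php
<?php
// Hypothetical helpers (not part of Yii, CmsInput, or HTMLPurifier).

// Strip every character that is not an ASCII letter or digit.
function alnumOnly($value)
{
    return preg_replace('/[^A-Za-z0-9]/', '', $value);
}

// Alternatively, reject the value outright instead of silently altering it.
function isAlnum($value)
{
    return $value !== '' && ctype_alnum($value);
}

$raw = "BigWidget') AND 8420=8420 AND ('crap'='crap";

echo alnumOnly($raw), "\n";      // BigWidgetAND84208420ANDcrapcrap
var_dump(isAlnum($raw));         // bool(false)
var_dump(isAlnum('BigWidget'));  // bool(true)
```

Rejecting with `isAlnum` is usually the safer of the two, since a stripped value may still pass for a legitimate one. Either way, this complements bound parameters in the query and does not replace them.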



Why are you not using bound query parameters?

Never grab query strings directly.

If you use a param and then cast the raw value to int, nothing bad can happen. ;)


$post = Post::model()->find('postID=:postID', array(':postID' => $postID));


I truly hate this editor!!
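To show why bound parameters make the hostile query string harmless, here is a small sketch using plain PDO, which Yii's database layer is built on. It assumes the pdo_sqlite extension is available. The in-memory `package` table and the `findPackageByName` helper are made up for illustration.

```php
<?php
// Sketch only: an in-memory SQLite table stands in for the real database.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE package (id INTEGER PRIMARY KEY, name TEXT)');
$pdo->exec("INSERT INTO package (name) VALUES ('BigWidget')");

// Look up a package by name; the value is bound, never concatenated into the SQL.
function findPackageByName(PDO $pdo, $name)
{
    $stmt = $pdo->prepare('SELECT id, name FROM package WHERE name = :name LIMIT 1');
    $stmt->execute(array(':name' => $name));
    return $stmt->fetch(PDO::FETCH_ASSOC); // false when no row matches
}

$hostile = "BigWidget') AND 8420=8420 AND ('crap'='crap";

var_dump(findPackageByName($pdo, 'BigWidget')); // the matching row
var_dump(findPackageByName($pdo, $hostile));    // bool(false): no match, and no SQL error

// For numeric keys, casting the raw value adds a second line of defence.
$id = (int) "1 OR 1=1"; // becomes 1
```

The hostile string is compared as an ordinary value, so the lookup finds nothing instead of producing a syntax error or an injected condition.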

You’re right. Time to bind my parameters.

QUESTION: Since binding will take care of potential SQL injection attacks, is there ANY REASON to use HtmlPurifier?

For anyone interested in more on binding, see

Solution is below for anyone interested.

$name = Yii::app()->request->getQuery('name');


$criteria = new CDbCriteria;

$criteria->addCondition('name=:name');

$criteria->params = array(':name' => $name);

$package = Package::model()->find($criteria);


// The two lines below are the old, injectable version that the code above replaces:

// $condition = "name='{$name}'";

// $package = Package::model()->find($condition);

Thanks to both of you for replying.



You would still use it to purify the output of whatever your users enter in text fields, textareas, …


I wouldn’t purify it before it enters the database as I would like to be able to see what they enter (and spank/ban them for it).

I am not really a security expert (yet).

Purifying the output on every request is not a good idea since it can slow down an application. Storing both the original and the purified version may be a solution.

That’s a good idea. But it’s still ‘output’ whether or not it’s cached in the database. :)

I am doing pretty much the same thing with my wiki module I am writing now: source code and generated output.


You can cache it so there will be no need to store both purified and original content.
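Here is a rough sketch of the caching idea, so that each distinct piece of content is purified only once. The `purifyCached` helper is hypothetical. The plain array stands in for a real cache, which in a Yii app would presumably be the application's cache component. `strip_tags` is only a placeholder so the example runs without HTMLPurifier installed, and it is not a safe substitute for it.

```php
<?php
// Hypothetical helper: purify once per distinct input and reuse the result.
// $purify is any callable that cleans a string; $cache is a plain array here,
// standing in for a real cache component.
function purifyCached($raw, $purify, array &$cache)
{
    $key = 'purified_' . md5($raw);
    if (!array_key_exists($key, $cache)) {
        $cache[$key] = call_user_func($purify, $raw); // slow path, runs once
    }
    return $cache[$key];
}

// Placeholder purifier that counts how often it is actually invoked.
$calls = 0;
$purify = function ($html) use (&$calls) {
    $calls++;
    return strip_tags($html);
};

$cache = array();
$raw = 'Hello <script>alert(1)</script>world';

echo purifyCached($raw, $purify, $cache), "\n";
echo purifyCached($raw, $purify, $cache), "\n"; // served from the cache
echo $calls, "\n";                              // 1
```

Because the key is derived from the raw content, an edited post gets a new cache entry automatically, and the original text stays untouched in the database.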