String Gets Truncated On Insert

I have a really weird problem I hope someone can help me with:

I’m trying to load a webpage, and save the text from the page into a MySQL database. Before saving I’m replacing all control characters and most special characters with a single space. The text string comes out just right after processing, but whenever I try to save it to my database, it gets truncated. To show the problem in code:

$sql = "INSERT INTO " . RawHtml::model()->tableName() . "(raw_text) VALUES (:raw_text)";

$cmd = Yii::app()->db->createCommand($sql);

$data[':raw_text'] = $model->fulltext;


$data[':raw_text'] = "";

$str = "ab' -";

for($i=0; $i<10000; $i++) {

    $data[':raw_text'] .= $str[rand(0,4)];


$data[':raw_text'] .= "xxx";




Here I’m inserting some random characters into $data[’:raw_text’] for debugging. This works fine, and the string is of the appropriate length in the database and is printed in full in the die(). However, when I remove the debugging code, the string inserted into the db gets truncated to about 4000-5000 characters. The die() still prints the whole string, without any truncation. The db column is a LONGTEXT.

Does anyone have any idea what’s happening here? :blink:

What is $model->fulltext?

$model is an instance of the model this function is in (called Sources), instantiated with

$model = new Sources();

$fulltext is

public $fulltext

in Sources. The attribute is populated in:

    private function fetchSite($url) {

        mb_internal_encoding( 'UTF-8'); 

        mb_regex_encoding( 'UTF-8');

        $cu = curl_init();

        $timeout = 5;

        curl_setopt($cu, CURLOPT_URL, $url);

        curl_setopt($cu, CURLOPT_RETURNTRANSFER, 1);

        curl_setopt($cu, CURLOPT_CONNECTTIMEOUT, $timeout);

        $data = curl_exec($cu);


        $cu = null;

        if ($data === false || strlen($data) < 500) return false;

        $data = preg_replace('~<\s*\bscript\b[^>]*>(.*?)<\s*\/\s*script\s*>~is', '', $data);

        $data = preg_replace('~<\s*\bstyle\b[^>]*>(.*?)<\s*\/\s*style\s*>~is', '', $data);

        $data = preg_replace('/<\/option><option(.)+>/', " ", $data);

        $data = trim(strip_tags($data));

        $data = preg_replace('/([\xA7-\xDF\xE4-\xFF\x1-\x20]+)/', ' ', $data);

        //$data = preg_replace('/(?:\s\s+|\n|\t|\r)/', " ", $data);

        //$data = preg_replace('/\s+/', " ", $data);

        $this->fulltext = $data;


that’s called earlier in the code. $model->fulltext is not modified elsewhere. And when outputting $data[’:raw_text’] which is a copy of $model->fulltext in die() the value is not truncated, so something is happening while executing the query.

I would debug it this way:

  1. Try to submit test string (for example, 10000 letters “a”), not a real text. Data can be truncated on some weird char, that you’ve missed during replacement.

  2. Check post_max_size

And, btw, here’s your answer I suppose:

This is a step in the right direction, but still didn’t solve the issue. I’ve set emulatePrepare to false in the main config, and I’m also disabling emulates with

$db = Yii::app()->db;

$db->emulatePrepare = false;

No effect. The character that seems to be the culprit is the pound sign (£), so I thought that I’ll try removing it for debugging, but even that doesn’t work. I’ve tried using preg_replace, str_replace, but the pound sign just wont budge. The source text is UTF-8, my script is UTF-8, php internal mysql table encoding is UTF-8…

Also, I tried inserting some pound signs into the db from the same script, but that doesn’t work any better. Any ideas?

Just to clarify, this script is being run as a console command. And yes, I’m using console.php as the config file, but the same db configuration exists also in main.php.

Ok, have you tried to submit a test string instead of real text? what’s the result?


$sql = "INSERT INTO " . RawHtml::model()->tableName() . "(raw_text) VALUES (:raw_text)";

$db = Yii::app()->db;

$db->emulatePrepare = false;

$cmd = $db->createCommand($sql);            

$data[':raw_text'] = "";

$str = "äÖ' £";

for($i=0; $i<10000; $i++) {

    $data[':raw_text'] .= $str[rand(0,4)];


$data[':raw_text'] .= "xxx";            


It doesn’t work. All multibyte characters seem to break the insert. Outputting the string to console comes up with “¤’ÃÃäÃÃäÅ”.

That’s crap.

Ok, checklist

And try to do it without Yii (just php + pdo).

It isn’t working even without Yii. Here’s the code:




$sql = "SET NAMES 'utf8'; INSERT INTO tbl_raw_html (raw_text) VALUES (:raw_text)";

$db = new PDO("mysql:host=localhost;dbname=...;charset=utf8", ...);

$data[':raw_text'] = "";

$str = "äÖ' £";

for($i=0; $i<10000; $i++) {

    $data[':raw_text'] .= $str[rand(0,4)];


$data[':raw_text'] .= "xxx";

$stmt = $db->prepare($sql);



Same behaviour as before. I’ve also done everything on the checklist. I’m losing hope. :-[

You may be loading a different version of php.ini when launching from the console.

See if any of the information on this page helps.


$dbHandle = new PDO(

    "mysql:host= localhost;dbname=...;charset=utf8", ...user, ...pwd,



$dbHandle -> exec("SET CHARACTER SET utf8");

Btw what’s the encoding of source files?

Thanks! Now we’re getting somewhere! Indeed the php.ini is different when launching from the console, so I added these lines to my script:

ini_set("mbstring.func_overload", "7");

ini_set("mbstring.internal_encoding", "UTF-8");

Then I changed my test string generator to:

$data[':raw_text'] = "ASD";

$str = "asdÖ£ä";

for($i=0; $i<10000; $i++) {

    $data[':raw_text'] = $data[':raw_text'].mb_substr($str, rand(0,5), 1);


And this works! However the actual data I want to insert still isn’t working, even though the data is UTF-8 encoded to begin with, but I’ve tried converting it to UTF-8 too just to be sure, to no avail. All my source files are UTF-8 encoded.

I figured it out! In fact, it was my own fault it wasn’t working. When removing control characters from the source data, I was using this:

$data = preg_replace('/([\xA7-\xDF\xE4-\xFF\x1-\x20]+)/', ' ', $data);

And apparently that was the mistake. Removing that line makes it work.