What's the real limitation of ActiveRecord?

Really? That sounds odd.

Any multi-threaded RDBMS should be able to handle multiple open cursors - otherwise, how could it run more than one query at a time? If such a restriction exists, it must be artificial, and I would be very surprised if it were the default behavior…

The ORM I wrote at my last job would read and construct objects progressively - we must have built 20 websites on MySQL 4 and 5 while I worked there, and I never ran into that problem.

I’m not saying you’re wrong - I don’t know whether it’s true or false; I just can’t imagine that MySQL would have this restriction and that I would somehow never have run into it in my ten or so years of using it…

As I said a while back, before this thread died: you may be right - I’m still not sure about that.

But the fact remains that processing large numbers of models is a very real need, and one you cannot elegantly work around. I discussed this issue with a colleague who is a Rails expert, and it turns out Rails has this feature.

Rails 2.3 added a new method, find_in_batches.

And long before they did that, there was a third-party plugin (still maintained) that adds this feature.

The third-party plugin is particularly interesting, because the author claims it can actually handle ORDER BY.

But the basic approach is the same - the recordset is processed in smaller batches, avoiding the memory burden.

In the case of the standard Rails implementation, they achieve this by forcing your query to sort by primary keys, so that, for example, if you select posts with many tags, each set of tags arrives in order with its related post, e.g. ORDER BY post.id, tag.id.
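To illustrate the idea in Yii terms, here is a rough sketch (the findInBatches() helper is hypothetical - it is not part of Yii or Rails - and it assumes an integer primary key column named id):

// Walk the table in primary-key order, one batch at a time, resuming
// after the last key seen - the same idea as Rails' find_in_batches.
function findInBatches(CActiveRecord $finder, $batchSize, $callback)
{
    $lastId = 0;
    while (true) {
        $criteria = new CDbCriteria;
        $criteria->addCondition('id > :lastId');
        $criteria->params[':lastId'] = $lastId;
        $criteria->order = 'id';      // the forced primary-key ordering
        $criteria->limit = $batchSize;
        $batch = $finder->findAll($criteria);
        if (empty($batch)) {
            break;                    // all records processed
        }
        $callback($batch);
        $lastId = end($batch)->id;    // resume after the last seen key
    }
}

// usage (PHP 5.3+): findInBatches(Post::model(), 500, function ($posts) { ... });

Because each query seeks on the primary key index instead of scanning past an ever-growing OFFSET, the batches stay cheap even deep into the table.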

Turns out, in 99% of cases this limitation doesn’t get in the way of anything you’re trying to do, because when you’re reading a very large number of records, it’s usually during a batch operation of some sort.

Still, the third-party implementation would be worth a peek - if the plugin does what the author claims, that would be even better. For example, I often have clients who request a CSV export of some data, and they usually expect that if the paged HTML view they’re using is set to order by customer name, their export comes out ordered the same way.

Hello,

I had the same issue looping over lots of AR models.

I found http://chris-backhou…l-in-chunks/925, but had to add garbage collection:

ini_set('max_execution_time', 10 * 60);
gc_enable();

// records to read in one chunk (experiment with this number)
$chunk = 250;
// count only the records matching the condition used below,
// so the loop does not run extra empty iterations
$countCriteria = new CDbCriteria;
$countCriteria->addInCondition('issueWorkflowStatus', array('Add. Info Needed', 'Open'));
$total = MyModel::model()->count($countCriteria);
$numOK = $numNOK = 0;

for ($i = 0; $i < $total; $i += $chunk)
{
    $criteria = new CDbCriteria;
    $criteria->addInCondition('issueWorkflowStatus', array('Add. Info Needed', 'Open'));
    // a distinct sort order column ensures all records are read in a defined
    // order, so consecutive OFFSET/LIMIT windows neither overlap nor skip rows
    // (note: if doSomeUpdateMethod() changes issueWorkflowStatus so that a
    // record no longer matches the condition, later windows will shift)
    $criteria->order = 'displayIssueNbr';
    $criteria->offset = $i;
    $criteria->limit = $chunk;
    // get one chunk of AR models
    $models = MyModel::model()->findAll($criteria);
    foreach ($models as $model)
    {
        $model->doSomeUpdateMethod();
        if ($model->save()) {
            $numOK++;
        } else {
            $numNOK++;
        }
        $model = null;
    }
    // release the chunk and reclaim memory before fetching the next one
    $models = null;
    gc_collect_cycles();
}

gc_disable();



Regards,

Joachim

Just to throw in my two cents: I’ve created the exporter extension to allow exporting very large data sets, for example a whole table from the database, as CSV or other formats. I’m using it to download files of up to 60 MB with more than 200k rows, and I think the upper limit is much, much higher.

At the same time, this works the same way as rendering an ordinary CGridView, so you can reuse the column configuration and the filters that create the data provider. That makes it super easy to add an export button to an existing grid.

The trick is to create a CDbDataReader, read the results row by row, format each output row, and stream it to the client. This is the most efficient way: it requires the least memory, and it executes the database query only once.
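For anyone curious, the core of the trick looks roughly like this (a simplified sketch using a plain SQL command rather than the extension’s actual code - the table and column names here are made up):

header('Content-Type: text/csv');
header('Content-Disposition: attachment; filename="export.csv"');

$out = fopen('php://output', 'w');  // write straight to the response body
// query() returns a CDbDataReader - a forward-only cursor over the result set
$reader = Yii::app()->db->createCommand(
    'SELECT id, name, email FROM customer ORDER BY name'
)->query();
foreach ($reader as $row) {
    fputcsv($out, $row);            // format and send one row at a time
}
fclose($out);

Since only one row is held in memory at a time, the footprint stays flat no matter how many rows the query returns. (With MySQL you may additionally need an unbuffered connection - PDO::MYSQL_ATTR_USE_BUFFERED_QUERY set to false - otherwise the driver buffers the whole result set client-side anyway.)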

I had to use a special version of CActiveFinder to allow building a CDbDataReader from a CDbCriteria with the ‘with’ parameter. I’ve created a PR to get this included in 1.1.15.