Yii2: BatchQuery not "batch'ing" at all?


(Blizzke) #1

I have a forum with about 6 million posts and wanted to try the "each()" functionality via batch as it seems ideal for me. I need to assemble a list of the last X posts with specific criteria (that cannot be obtained from the database).

The code is as follows (tried via ActiveQuery as well but the same end result):




$query = (new Query())->from('posts')->orderBy(['postId' => SORT_DESC]);

foreach ($query->each(10) as $post) {

   ...



The code hangs and crashes on an out of memory. I’ve traced the problem to the BatchQueryResult::fetchData()-function where it allocates a new instance for the _dataReader:




if ($this->_dataReader === null) {

   $this->_dataReader = $this->query->createCommand($this->db)->query();

}



The query that comes out of that is just - obviously - what I put in:




SELECT * FROM `posts` ORDER BY `postId` DESC



And then the query() function is called, which ends up in \yii\db\Command::queryInternal() where at the line with "$this->pdoStatement->execute();" everything goes kaboom.

So how exactly is this batch thing supposed to help us if it just seems to try and select the entire dataset? :)

Or is PDO at fault here, trying to pre-cache everything or something?


#2

I just tried this too and you’re right.

However, it seems that the memory saving is not in making several smaller SQL calls but only converting a handful of those to ActiveRecord objects at a time. The AR objects being much larger in size than the array of data for the SQL row.

The documentation is not very clear, and I too thought that there was some SQL offset and limit going on.

EDIT: Even with just Query (i.e. not ActiveRecord) I’m getting a memory usage of 32.5 MB when fetching with all() on a table of about 23,000 rows, and only 2.6 MB when using each().

Compared to the 2.2 MB baseline, that is very good.

So the memory saving must happen while fetching the data from the DB, even though it’s only one SQL query.

Do you know if it’s the PHP instance that is out of memory, or the DB? Is the DB on the same server?


(Blizzke) #3

It’s just PHP, my limit is set at 128M, the DB has a enough memory to do a select on the entire posts table.

In any case, I’ve built a “query result slicer” largely based on the principle behind BatchQueryResult.

It operates based on the assumption that you’ll always enumerate results according to a specific (indexed!) field, in a specific order.

At leat that is the case for me.

The component takes the name of that field and a sorting direction and modifies the query automatically on each fetchData to obtain the next result.

Extra logics aren’t really hard. After the first run you initialise the “last_value” using the last row. On the second run I modify the query to include an extra piece in the were condition that says “and field > :last_value” (or “<” depending on the sort direction) with the value from the previous round. On all other iterations you just update the last_value parameter.

For reset functionality you need to keep the very first field value and put “last_value” to the first + [size=“2”]1 (because you’ve modified the query)[/size]

It’s not as pretty and elegant as I usually like my code, but it does the job very well.

It executes a lot less queries compared to fetching everything row by row, so it works for me.


(Y Korotia) #4

use $query->batch() instead of $query->each()


(Sergeymorkovkin) #5

Faced the very same problem. No LIMIT construct is done and it just crashes out of memory.