thefsb
(Fsb)
September 16, 2019, 5:41pm
1
I’m working on a script to produce statistics from the PHP FPM access log file. The log has the REQUEST_URI for each request but I want to group requests by route (module/controller/action) not by URI. So I need a way to get the route from the URI in the context of a console command. I have the web UrlManager
component but I’m not sure how to use it.
Using ideas from tests/framework/web/UrlManagerTest
, I found I can get what’s needed using Request::setPathInfo()
on a Request
object and passing that to UrlManager::parseRequest()
.
But this is extremely slow. Processing the log file is more than 50x slower using this than with simple string processing (that doesn’t accomplish everything I need). It took 15 minutes to process a log file with 250k lines.
Can you suggest a faster way to do this?
samdark
(Alexander Makarov)
September 16, 2019, 6:38pm
2
I think that’s the only way if you want to use UrlManager.
Do you process rules in batches? Do you instantiate UrlManager once for a batch?
Do you have rule cache set up?
thefsb
(Fsb)
September 17, 2019, 7:44pm
3
I want to use the app’s URL rules. I don’t specifically want to use UrlManager.
I instantiate the UrlManager once and one Request object and use them to process each line in the log file with 283k lines. See below
I don’t think the cache was set up so I now added 'cache' => new ArrayCache(),
in the app config but the processing is not faster.
$request = new Request();
$urlManager = Yii::$app->getUrlManager();
while (!feof($logFile)) {
$line = fgets($logFile);
if ($line === false) {
break;
}
if (preg_match($pattern, $line, $matches)) {
[$pathInfo] = explode('?', $matches['request'], 2);
$request->setPathInfo($pathInfo);
$_POST['_method'] = $matches['method'];
try {
[$route] = $urlManager->parseRequest($request);
} catch (UrlNormalizerRedirectException $redirectException) {
$request->setPathInfo($redirectException->url[0]);
[$route] = $urlManager->parseRequest($request);
}
$stats[sprintf('%-4s %.3s %s', $matches['method'], $matches['status'], $route)]['raw'][] = [$matches['time'], $matches['cpu']];
}
}
samdark
(Alexander Makarov)
September 18, 2019, 8:42am
4
No idea then Url matching was not optimized for such usage so probably profiling the code would reveal a bottleneck that isn’t important for normal request-response usage but matters for your case.
samdark
(Alexander Makarov)
September 18, 2019, 8:43am
5
One more thing is that number of rules you have matters. Maybe worth trying to optimize config array by combining some.
thefsb
(Fsb)
September 18, 2019, 7:01pm
6
Thanks for the suggestions, Alex. I’m going to try approaching this a different way: write the route to the log file and this problem goes away.
Put something like this in the app’s various Controller::beforeAction()
methods
header("yii-route: {$action->controller->module->id}/{$action->controller->id}/{$action->id}");
Configure FPM to write that to the access log by adding %{yii-route}o
to access.format in the pool config.
Configure nginx to strip the output header with fastcgi_hide_header .
samdark
(Alexander Makarov)
September 18, 2019, 8:52pm
7
I was about to suggest it but then I’ve thought again since it’s access log it should be very intensive and may affect application performance.
thefsb
(Fsb)
September 19, 2019, 12:43am
8
Why do you think writing the FPM access log is very intensive?
samdark
(Alexander Makarov)
September 19, 2019, 7:36am
9
Because in this case it’s done by PHP, not by FPM itself, correct? Or maybe I’m not getting it
thefsb
(Fsb)
September 19, 2019, 1:21pm
10
The log is written by FPM. It works like this
The PHP app writes the route to a custom HTTP output header.
FPM copies it to the pool access log (which in my case it is writing anyway).
nginx deletes the header.
An example FPM pool config
access.log = /var/log/php-fpm/$pool.access.log
access.format = "%{REMOTE_ADDR}e - %t \"%m %{REQUEST_URI}e%Q%q\" %s \"%{My-Custom-Header}o\" %d %M %C"
In which
%d: time taken to serve the request
%C: %CPU used by the request
%M: peak of memory allocated by PHP
Using the log I can answer questions such as
Which actions (routes)
are requested most often?
taken together occupy server threads the most (sum of %d
)?
taken together occupy CPU cores the most (sum of (%d * %C)
?
For each route, view descriptive statistics (I like to see mean, median, and 95th percentile) of
time taken to serve the request %d
CPU load %d * %C
peak memory used
Now mature optimization possible :>
(Btw: Also need to set up the log directory and rotation.)
1 Like
thefsb
(Fsb)
September 30, 2019, 9:11pm
11
In the end I put this in my app’s main config
'on beforeAction' => function ($event) {
if (isset($event->action->controller->module)) {
header("X-Myapp-Route: {$event->action->controller->module->id}/{$event->action->controller->id}/{$event->action->id}");
}
},
The header name is whatever is suitable for your app.
Idk if it is possible for $event->action->controller->module
to not be set in this handler but if it is then the result would be really bad, hence the if
.
1 Like