Get route from URL offline


(Fsb) #1

I’m working on a script to produce statistics from the PHP FPM access log file. The log has the REQUEST_URI for each request but I want to group requests by route (module/controller/action) not by URI. So I need a way to get the route from the URI in the context of a console command. I have the web UrlManager component but I’m not sure how to use it.

Using ideas from tests/framework/web/UrlManagerTest, I found I can get what’s needed using Request::setPathInfo() on a Request object and passing that to UrlManager::parseRequest().

But this is extremely slow. Processing the log file is more than 50x slower using this than with simple string processing (that doesn’t accomplish everything I need). It took 15 minutes to process a log file with 250k lines.

Can you suggest a faster way to do this?


(Alexander Makarov) #2

I think that’s the only way if you want to use UrlManager.

  1. Do you process rules in batches? Do you instantiate UrlManager once for a batch?
  2. Do you have rule cache set up?

(Fsb) #3

I want to use the app’s URL rules. I don’t specifically want to use UrlManager.

I instantiate the UrlManager once and one Request object and use them to process each line in the log file with 283k lines. See below

I don’t think the cache was set up so I now added 'cache' => new ArrayCache(), in the app config but the processing is not faster.

$request = new Request();
$urlManager = Yii::$app->getUrlManager();
while (!feof($logFile)) {
    $line = fgets($logFile);
    if ($line === false) {
        break;
    }

    if (preg_match($pattern, $line, $matches)) {
        [$pathInfo] = explode('?', $matches['request'], 2);
        $request->setPathInfo($pathInfo);
        $_POST['_method'] = $matches['method'];
        try {
            [$route] = $urlManager->parseRequest($request);
        } catch (UrlNormalizerRedirectException $redirectException) {
            $request->setPathInfo($redirectException->url[0]);
            [$route] = $urlManager->parseRequest($request);
        }
        $stats[sprintf('%-4s %.3s %s', $matches['method'], $matches['status'], $route)]['raw'][] = [$matches['time'], $matches['cpu']];
    }
}

(Alexander Makarov) #4

No idea then :frowning: Url matching was not optimized for such usage so probably profiling the code would reveal a bottleneck that isn’t important for normal request-response usage but matters for your case.


(Alexander Makarov) #5

One more thing is that number of rules you have matters. Maybe worth trying to optimize config array by combining some.


(Fsb) #6

Thanks for the suggestions, Alex. I’m going to try approaching this a different way: write the route to the log file and this problem goes away.

  1. Put something like this in the app’s various Controller::beforeAction() methods

    header("yii-route: {$action->controller->module->id}/{$action->controller->id}/{$action->id}");
    
  2. Configure FPM to write that to the access log by adding %{yii-route}o to access.format in the pool config.

  3. Configure nginx to strip the output header with fastcgi_hide_header.


(Alexander Makarov) #7

I was about to suggest it but then I’ve thought again since it’s access log it should be very intensive and may affect application performance.


(Fsb) #8

Why do you think writing the FPM access log is very intensive?


(Alexander Makarov) #9

Because in this case it’s done by PHP, not by FPM itself, correct? Or maybe I’m not getting it :slight_smile:


(Fsb) #10

The log is written by FPM. It works like this

  1. The PHP app writes the route to a custom HTTP output header.
  2. FPM copies it to the pool access log (which in my case it is writing anyway).
  3. nginx deletes the header.

An example FPM pool config

access.log = /var/log/php-fpm/$pool.access.log
access.format = "%{REMOTE_ADDR}e - %t \"%m %{REQUEST_URI}e%Q%q\" %s \"%{My-Custom-Header}o\" %d %M %C"

In which

  • %d: time taken to serve the request
  • %C: %CPU used by the request
  • %M: peak of memory allocated by PHP

Using the log I can answer questions such as

  • Which actions (routes)
    • are requested most often?
    • taken together occupy server threads the most (sum of %d)?
    • taken together occupy CPU cores the most (sum of (%d * %C)?
  • For each route, view descriptive statistics (I like to see mean, median, and 95th percentile) of
    • time taken to serve the request %d
    • CPU load %d * %C
    • peak memory used

Now mature optimization possible :>

(Btw: Also need to set up the log directory and rotation.)


(Fsb) #11

In the end I put this in my app’s main config

    'on beforeAction' => function ($event) {
        if (isset($event->action->controller->module)) {
            header("X-Myapp-Route: {$event->action->controller->module->id}/{$event->action->controller->id}/{$event->action->id}");
        }
    },

The header name is whatever is suitable for your app.

Idk if it is possible for $event->action->controller->module to not be set in this handler but if it is then the result would be really bad, hence the if.