Message translation design

In the application we are going to use TranslatorInterface and call its translate() method, passing the message key (ID), the parameters to substitute into the final string, the category and the target locale.

interface TranslatorInterface
{
    public function translate(
        string $id,
        array $parameters = [],
        ?string $category = null,
        ?string $locale = null
    ): string;
}

This call is expected to return a final string that can be output to the end user:

public function actionTest(TranslatorInterface $translator)
{
    $translated = $translator->translate('salary.paid', ['amount' => 1000], 'ui', 'en_US');
}

For the following translation:

return [
    'salary.paid' => 'Salary paid: {amount}.'
];

It gives you “Salary paid: 1000.”

Under the hood

What happens when we call TranslatorInterface::translate()? Translator is expected to obtain
the message given its key (id), category and target locale, and then format
it by replacing placeholders in the message with values from params.

In order to achieve greater flexibility in terms of message storage formats and formatting, we introduce two interfaces:

  1. MessageReaderInterface, which retrieves a message given its ID, category and target locale:

    interface MessageReaderInterface
    {
        public function getMessage(string $id, string $category, string $locale): string;
    }
    
  2. MessageFormatterInterface, which formats the message with the given parameters:

    interface MessageFormatterInterface
    {
        public function format(string $message, array $parameters, string $locale): string;
    }
    

Translator is configured per category, with a combination of a reader and a formatter for each category source:

class Translator implements TranslatorInterface
{
    public function addCategorySource(string $category, MessageReaderInterface $reader, MessageFormatterInterface $formatter): self
    {
        // ...
    }
}

Overall it looks like the following:

  • Translator::translate('salary.paid', ['amount' => 1000], 'ui', 'en_US');
    • MessageReaderInterface::getMessage('salary.paid', 'ui', 'en_US');
    • MessageFormatterInterface::format('Salary paid: {amount}.', ['amount' => 1000], 'en_US');
    • return 'Salary paid: 1000.';
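The call chain above can be sketched as a minimal Translator. This is an illustration under assumptions, not the proposed implementation: the default category and locale (`app`, `en_US`) and the exception for an unregistered category are hypothetical, and the demo reader and formatter are throwaway anonymous classes.

```php
<?php

// Interfaces as proposed above (nullable types made explicit).
interface MessageReaderInterface
{
    public function getMessage(string $id, string $category, string $locale): string;
}

interface MessageFormatterInterface
{
    public function format(string $message, array $parameters, string $locale): string;
}

interface TranslatorInterface
{
    public function translate(string $id, array $parameters = [], ?string $category = null, ?string $locale = null): string;
}

final class Translator implements TranslatorInterface
{
    /** @var array<string, array{MessageReaderInterface, MessageFormatterInterface}> */
    private array $sources = [];

    // Hypothetical defaults for calls that omit category or locale.
    public function __construct(
        private string $defaultCategory = 'app',
        private string $defaultLocale = 'en_US'
    ) {
    }

    public function addCategorySource(string $category, MessageReaderInterface $reader, MessageFormatterInterface $formatter): self
    {
        $this->sources[$category] = [$reader, $formatter];
        return $this;
    }

    public function translate(string $id, array $parameters = [], ?string $category = null, ?string $locale = null): string
    {
        $category ??= $this->defaultCategory;
        $locale ??= $this->defaultLocale;

        if (!isset($this->sources[$category])) {
            throw new InvalidArgumentException("No message source registered for category \"$category\".");
        }
        [$reader, $formatter] = $this->sources[$category];

        // 1. Obtain the raw message; 2. substitute the parameters into it.
        $message = $reader->getMessage($id, $category, $locale);
        return $formatter->format($message, $parameters, $locale);
    }
}

// Demo reader: an in-memory array; returns the ID when no translation exists.
$reader = new class implements MessageReaderInterface {
    private array $messages = ['en_US' => ['ui' => ['salary.paid' => 'Salary paid: {amount}.']]];

    public function getMessage(string $id, string $category, string $locale): string
    {
        return $this->messages[$locale][$category][$id] ?? $id;
    }
};

// Demo formatter: plain {placeholder} substitution.
$formatter = new class implements MessageFormatterInterface {
    public function format(string $message, array $parameters, string $locale): string
    {
        $replacements = [];
        foreach ($parameters as $name => $value) {
            $replacements['{' . $name . '}'] = (string) $value;
        }
        return strtr($message, $replacements);
    }
};

$translator = (new Translator())->addCategorySource('ui', $reader, $formatter);
$translated = $translator->translate('salary.paid', ['amount' => 1000], 'ui', 'en_US');
// $translated is "Salary paid: 1000."
```

Because the demo reader returns the ID when nothing is found, a full string used as a key still gets formatted, which is the fallback behaviour the proposal relies on.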

Possible message reader implementations are:

  • PHP files returning an array of id => message.
  • JSON where keys are ids and values are messages.
  • Database.
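A PHP-file reader, the first option above, might look roughly like this sketch. The class name and the directory layout `{basePath}/{locale}/{category}.php` are assumptions made for illustration; the proposal does not fix either.

```php
<?php

interface MessageReaderInterface
{
    public function getMessage(string $id, string $category, string $locale): string;
}

/**
 * Hypothetical reader for PHP files that return an `id => message` array.
 * Assumed layout: {basePath}/{locale}/{category}.php
 */
final class PhpFileMessageReader implements MessageReaderInterface
{
    /** @var array<string, array<string, string>> Loaded files keyed by "locale/category". */
    private array $cache = [];

    public function __construct(private string $basePath)
    {
    }

    public function getMessage(string $id, string $category, string $locale): string
    {
        $key = $locale . '/' . $category;
        if (!isset($this->cache[$key])) {
            $file = $this->basePath . '/' . $key . '.php';
            $data = [];
            if (is_file($file)) {
                $data = require $file;
            }
            $this->cache[$key] = is_array($data) ? $data : [];
        }
        // Fall back to the ID itself so full-string keys keep working.
        return $this->cache[$key][$id] ?? $id;
    }
}
```

Each file is loaded once per locale/category pair and cached; a missing file simply behaves like an empty catalog.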

Possible message formatters are:

  • Simple formatter that just replaces {placeholder} with the corresponding parameter value.
  • A powerful intl-based formatter, like the one in Yii 2, that supports plurals etc.
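The simple formatter from the list above fits in a few lines. This is a sketch; the class name is hypothetical. It ignores the locale and leaves unknown placeholders untouched.

```php
<?php

interface MessageFormatterInterface
{
    public function format(string $message, array $parameters, string $locale): string;
}

/**
 * Hypothetical simple formatter: replaces each {name} with the matching
 * parameter value. The locale is ignored.
 */
final class SimpleMessageFormatter implements MessageFormatterInterface
{
    public function format(string $message, array $parameters, string $locale): string
    {
        $replacements = [];
        foreach ($parameters as $name => $value) {
            $replacements['{' . $name . '}'] = (string) $value;
        }
        // strtr() with an array replaces in a single pass, so a substituted
        // value that itself contains "{...}" is never expanded again.
        return strtr($message, $replacements);
    }
}
```

Using `strtr()` rather than repeated `str_replace()` calls avoids order-dependent double substitution.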

Translation extractor

Similar to Yii 2, we are going to have a command-line tool that goes through the source code, extracts message keys and writes them into a translation resource, merging them with existing messages (if any). For this purpose, in addition to MessageReaderInterface, we’ve introduced MessageWriterInterface:

interface MessageWriterInterface
{
    public function write(array $messages): void;
}
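A writer for the PHP-file format could merge extracted keys like the following sketch. Assumptions (hypothetical, not in the proposal): the target file is passed to the constructor, since `write()` receives only the messages; the extractor supplies new keys with an empty string as the message; and existing translations always win over extracted ones.

```php
<?php

interface MessageWriterInterface
{
    public function write(array $messages): void;
}

/**
 * Hypothetical writer for a single PHP translation file (id => message).
 * Keys already present keep their translation; only new keys are added.
 */
final class PhpFileMessageWriter implements MessageWriterInterface
{
    public function __construct(private string $file)
    {
    }

    public function write(array $messages): void
    {
        $existing = [];
        if (is_file($this->file)) {
            $existing = require $this->file;
        }
        if (!is_array($existing)) {
            $existing = [];
        }

        // Array union: left-hand (existing) entries take precedence.
        $merged = $existing + $messages;
        ksort($merged);

        $code = "<?php\n\nreturn " . var_export($merged, true) . ";\n";
        if (file_put_contents($this->file, $code, LOCK_EX) === false) {
            throw new RuntimeException("Cannot write {$this->file}.");
        }
    }
}
```

Sorting keys keeps the generated file stable between extractor runs, which makes diffs easier to review.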

Recommending usage of ids instead of full messages

We are going to recommend using IDs (message keys) instead of full messages as was done in Yii 2:

$translated = $translator->translate('salary.paid', ['amount' => 1000], 'ui', 'en_US');
// instead of
$translated = $translator->translate('Salary paid: {amount}.', ['amount' => 1000], 'ui', 'en_US');

That would allow:

  1. Not having to care whether the source language is English (a non-English source language is not recommended, since it’s harder to find a translator who can handle both your less common source language and a less common target language).
  2. Switching to another formatter without touching application code. That would require adjusting translations, though.

Note that it would still be possible to use full strings as keys since, when no translation is found, the message key itself is passed to the formatter.

Gettext compatibility problem

GNU gettext is a popular way to handle string translation with good tools such as
Poedit. Ideally, we’d like to allow using it but there is a problem with plurals.

Gettext, unlike intl, handles plurals at the message storage level, not at the formatting level. It has separate keys for the singular form and each plural form, and this is reflected in how it is used.

Messages without plurals are obtained with gettext($id), while messages with plurals are obtained with a dedicated function, ngettext($idSingular, $idPlural, $n), i.e. the message is selected based on the value of $n.

With an intl formatter it does not make sense to store messages that way or to have a separate method for getting plurals. The approach is different:

'There {catsNumber,plural,=0{are no cats} =1{is one cat} other{are # cats}}!'
  1. Both strings for n=1 and n>1 are in a single message.
  2. $n could be named differently. In our message it is catsNumber.
  3. There could be multiple plurals in a single message so multiple $n values.
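To make the intl side concrete, the message above can be formatted with PHP's `MessageFormatter::formatMessage()` from ext-intl. The wrapper function name is hypothetical; the point is that the plural argument is called `catsNumber`, not a fixed `$n`, and both singular and plural texts live in one message.

```php
<?php

/**
 * Hypothetical helper formatting the cats message with ext-intl.
 * ICU selects the branch (=0, =1, other) from the named argument.
 */
function formatCats(int $catsNumber, string $locale = 'en_US'): string
{
    $pattern = 'There {catsNumber,plural,=0{are no cats} =1{is one cat} other{are # cats}}!';
    $result = MessageFormatter::formatMessage($locale, $pattern, ['catsNumber' => $catsNumber]);
    if ($result === false) {
        throw new RuntimeException('Formatting failed: ' . intl_get_error_message());
    }
    return $result;
}
```

Here `#` is replaced by the locale-formatted value of `catsNumber`.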

Considering all that, there’s a problem: an API that makes sense for gettext doesn’t make sense when the formatter is intl-based, i.e. for PHP files + intl formatter, DB + intl formatter etc.

Possible solutions:

  1. Do not implement gettext.
  2. Do not use native gettext plurals, use only singular strings and format these using intl.

Neither is ideal, and we need help deciding how to handle this.

Packages


  • translator - interfaces.
  • message-* - message sources.
  • formatter-* - message formatters.

Other considerations

Likely there’s no need for separate MessageReaderInterface and MessageWriterInterface. These could be merged into MessageSourceInterface.
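A merged interface might look like the sketch below. Both the interface and the in-memory implementation are hypothetical; in particular, giving `write()` category and locale parameters is an assumption, since the proposed MessageWriterInterface::write() takes only the messages.

```php
<?php

/**
 * Hypothetical merged interface: one object both reads and writes messages.
 */
interface MessageSourceInterface
{
    public function getMessage(string $id, string $category, string $locale): string;

    /** @param array<string, string> $messages id => message */
    public function write(string $category, string $locale, array $messages): void;
}

/**
 * In-memory implementation for illustration (e.g. tests).
 */
final class InMemoryMessageSource implements MessageSourceInterface
{
    /** @var array<string, array<string, array<string, string>>> locale => category => id => message */
    private array $messages = [];

    public function getMessage(string $id, string $category, string $locale): string
    {
        // Fall back to the ID, as the reader does.
        return $this->messages[$locale][$category][$id] ?? $id;
    }

    public function write(string $category, string $locale, array $messages): void
    {
        // Existing translations win, mirroring the extractor's merge behaviour.
        $current = $this->messages[$locale][$category] ?? [];
        $this->messages[$locale][$category] = $current + $messages;
    }
}
```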


Or… have a separate gettext API that doesn’t have to be compatible with the intl API. If I understand correctly, that would be formatter-gettext? The only downside is that a developer won’t be able to switch between intl and gettext without extra work on the sources, but according to your post that’s not possible in Yii 2 anyway (unless there are no plural forms, that is).

Some languages, like mine (Polish), have more than one plural form. Below is a rough example of an Inflector class that picks the correct form based on the passed value.

namespace ds\helpers\numbers;


final class Inflector
{

    private int $number;
    private string $singularForm;
    private string $firstPluralForm;
    private string $secondPluralForm;

    public function __construct(int $number, string $singularForm, string $firstPluralForm, string $secondPluralForm)
    {
        $this->number = $number;
        $this->singularForm = $singularForm;
        $this->firstPluralForm = $firstPluralForm;
        $this->secondPluralForm = $secondPluralForm;
    }

    public function __toString(): string
    {
        return $this->inflect();
    }

    private function inflect(): string
    {
        if (1 === $this->number) {
            return $this->singularForm;
        }

        $n10 = $this->number % 10;
        $n100 = $this->number % 100;
        return ((($n10 > 1) && ($n10 < 5)) && (($n100 < 10) || ($n100 > 20))) ? $this->firstPluralForm : $this->secondPluralForm;
    }

}

The following example produces three different texts describing a number of months, depending on the value passed into Inflector:

  • 1 miesiąc (English: 1 month)
  • 2 miesiące (English: 2 months)
  • 5 miesięcy (English: 5 months)

return new Inflector($amount, "{$amount} miesiąc", "{$amount} miesiące", "{$amount} miesięcy");

Accurate inflection is extremely important if you display written amounts in banking or payment systems.

These three texts can obviously be produced with ngettext(), but how would you get the same result with intl under your proposal?

How about integrations with existing solutions mentioned here, e.g. https://www.toptal.com/php/build-multilingual-app-with-gettext?

I hardly ever use gettext. I always collect messages via the console and translate them manually.

Even though I don’t use gettext, a framework is in a different position. It should still support gettext as a standard, widely used tool, even if without support for plurals. In that case there should be a warning in the documentation with an explanation.

The criterion here is grammatical correctness, meaning plurals need to be supported in all forms.
Supporting ONLY singular strings would be a surprising solution.
Gettext is only an alternative tool; it can be supported in a limited way.

@Bizley if we decide not to fit gettext into our API (or to change our API to fit gettext), then we will end up with two separate translation implementations without a common interface.

For intl you’ll have to use TranslatorInterface; for gettext, a separate interface with an explicit translatePlural().

Developers will have to choose right at the start of the project and won’t be able to switch from one implementation to another.

I don’t understand the problem, to be honest. Why not check whether the first parameter is a number (or force a specific key, like n) and, based on that, treat the call as gettext() or ngettext()? So $translator->translate('salary.paid', ['n' => 1000], 'ui', 'en_US'); would use ngettext() and $translator->translate('salary.paid', ['someParam' => 'Some value'], 'ui', 'en_US'); would use gettext(). We don’t need a separate interface to handle the gettext format.


That’s fairly easy:

echo $translator->translate('got.bowls', ['amount' => 121], 'ui', 'pl_PL');

translation message would be

'got.bowls' => 'Otrzymał: {amount, plural, one{# miskę} few{# miski} many{# misek} other{# miski}}',

Result for 121 would be:

Otrzymał: 121 misek
// if I got the plurals for the language in question correctly :)

The rules for plurals are part of intl’s ICU data package: https://intl.rmcreative.ru/tables?locale=pl_PL

That’s how it currently works in Yii 2.
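Applied to the Inflector example from earlier in the thread, the same mechanism can be sketched with ext-intl's `MessageFormatter::formatMessage()`. The helper name is hypothetical; the month message reuses the Inflector's three forms for `one`, `few` and `many`, and adds `miesiąca` for `other` (fractional values), which is an addition of this sketch.

```php
<?php

/**
 * Hypothetical helper: formats an ICU message for pl_PL (requires ext-intl).
 * ICU picks one/few/many/other from the CLDR plural rules for Polish.
 */
function formatPolish(string $pattern, array $values): string
{
    $result = MessageFormatter::formatMessage('pl_PL', $pattern, $values);
    if ($result === false) {
        throw new RuntimeException('Formatting failed: ' . intl_get_error_message());
    }
    return $result;
}

// The bowls message from the reply above, with the argument named "amount".
$bowls = 'Otrzymał: {amount, plural, one{# miskę} few{# miski} many{# misek} other{# miski}}';

// The Inflector's month forms expressed as a single ICU message.
$months = '{amount, plural, one{# miesiąc} few{# miesiące} many{# miesięcy} other{# miesiąca}}';

if (extension_loaded('intl')) {
    echo formatPolish($bowls, ['amount' => 121]), PHP_EOL;
    foreach ([1, 2, 5, 12, 22] as $n) {
        echo formatPolish($months, ['amount' => $n]), PHP_EOL;
    }
}
```

Unlike the Inflector, the selection rule lives in ICU's locale data, so the application code does not hard-code the modulo arithmetic.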

Yii 2 is mentioned there as well. It seems the article’s author did not review the solutions in detail when listing them. I did:

  • symfony/translation works like Yii 2; both were created using Yii 1.1 as a baseline back in the day. It uses the gettext format for storage only, without gettext plurals. Plurals are handled by a non-gettext formatter: Symfony uses its own instead of intl.
  • oscarotero/Gettext and zend/i18n use a separate method for plural forms.
  • Laravel translation isn’t worth mentioning. It’s way too simplistic and complicated at the same time for any real i18n needs. You have to use the dated and inconvenient “choice” format for plurals.

Because intl allows multiple plurals in a single string, we cannot force the name:

'multi.plurals' => 'There are {catsNumber, plural, one{one cat} other{# cats}} and {dogsNumber, plural, one{one dog} other{# dogs}}',

That could be used as

echo $translator->translate('multi.plurals', ['catsNumber' => 13, 'dogsNumber' => 10], 'ui', 'en_US');
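A quick way to see why one fixed key cannot work is to format such a message directly with ext-intl's `MessageFormatter::formatMessage()`. The helper name is hypothetical; each plural branch is driven by its own named argument.

```php
<?php

/**
 * Hypothetical helper: two independent plurals in one ICU message.
 * A single fixed key such as "n" could not drive both branches.
 */
function formatPets(int $cats, int $dogs): string
{
    $pattern = 'There are {catsNumber, plural, one{one cat} other{# cats}} and '
        . '{dogsNumber, plural, one{one dog} other{# dogs}}';
    $result = MessageFormatter::formatMessage('en_US', $pattern, [
        'catsNumber' => $cats,
        'dogsNumber' => $dogs,
    ]);
    if ($result === false) {
        throw new RuntimeException('Formatting failed: ' . intl_get_error_message());
    }
    return $result;
}
```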

Because a number doesn’t automatically mean a plural form:

'total' => 'Total: {n}'

and because ngettext requires two message IDs instead of just one.

That is a limitation of gettext (no multiple plurals). Anyone who decides to use it should be aware of the consequences and follow some conventions.

And it will not automatically use a plural form, because you need to define it first (it should fall back to the singular form if no plural forms are defined).
And I think this will be an advantage, because translators don’t need to, but still could, introduce a plural form in this case. I maintain the Polish language pack for Flarum, which uses a separate method for plurals, and plugin authors often forget about plurals and use the method for singular translations. My work would be so much easier if I could introduce a plural form based on the first number-like parameter, even if the plugin author did not think about plural forms.

You can repeat the same ID twice (you probably need to do this anyway with path-like keys instead of the original phrase).

I’m aware that this is not perfect solution, but at least it provides some level of compatibility, so user can write code compatible with both gettext and non-gettext backends.


I agree with Rob on this one. And OK, we can have the same interface, but since the implementation is different anyway, we can also have a different implementation for storing messages, right? I’m not familiar with gettext, so I don’t really see the problem here. Can we do it like this: with intl the message key remains as is, but with gettext the key gets the parameter names appended under the hood (or something like that)?

I was pretty puzzled by this statement and set up a small test project for it. I found that:

  1. ngettext('singular_with_plural', 'singular_with_plural', 21) properly selects plural message related to singular one if exists.
  2. ngettext('singular_without_plural', 'singular_without_plural', null) properly selects singular message if no related plural message exists.

So we, indeed, can safely use a single method for both plurals and singulars. That pretty much solves the dilemma. Gettext fits into the design.
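The single-call approach described above can be sketched as a gettext-backed lookup. Several details are assumptions for illustration only: the function name is hypothetical, the translation category is used as the gettext domain, and the plural count is read from an `n` parameter (the convention floated earlier in the thread), defaulting to 1. It uses PHP's `dngettext()` from ext-gettext; with no text domain bound, `dngettext()` simply returns the ID.

```php
<?php

/**
 * Hypothetical gettext-backed lookup using one ngettext-style call for every
 * message. Assumptions: category = gettext domain; count comes from an "n"
 * parameter and defaults to 1 (singular).
 */
function gettextMessage(string $id, string $category, array $parameters): string
{
    $count = isset($parameters['n']) && is_int($parameters['n']) ? $parameters['n'] : 1;

    // The ID is passed as both the singular and the plural msgid. Per the test
    // described above, gettext then picks a plural translation when the catalog
    // has one and the singular translation otherwise; untranslated IDs come
    // back unchanged.
    return dngettext($category, $id, $id, $count);
}
```

The returned string would then go through whatever formatter is configured for the category, the same as with any other reader.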

Thank you very much for a hint!
