Elasticsearch and stemming

Hi.
I have these two rows saved to Elasticsearch:

elephant likes apples

and

the apple are wonderful

My Elasticsearch AR model is this:

class Article extends \yii\elasticsearch\ActiveRecord
{
    public function attributes()
    {
        return ['content'];
    }
    
    /**
     * @return array This model's mapping
     */
    public static function mapping()
    {
        return [
            'properties' => [
                'content'     => ['type' => 'text'],
            ]
        ];
    }   
    
    public static function index()
    {
        return 'articlesstem';
    }
    
    /**
     * Create this model's index
     */
    public static function createIndex2()
    {
        $db = static::getDb();
        $command = $db->createCommand();
        $command->createIndex(static::index(), [
            //'aliases' => [ /* ... */ ],
            'mappings' => static::mapping(),
            'settings' => [ 
                'analysis' => [
                    'analizer' => [
                        'my_analizer' => [
                           "tokenizer"=> "whitespace",
                           "filter" => ["stemmer"]
                        ]    
                    ]
                ]                
            ],
        ]);
    }
}

With this search in the controller:

        $search = 'apple';

        $els = ElasticArticle::find()->query(['match' => ['content' => [
            'query' => $search,
        ]]])->all();

I get only one AR model, with the exact match

‘the apple are wonderful’

How can I properly add stemming so that my query finds both ‘apple’ and ‘apples’?

Maybe you can look into ‘fuzziness’: Elasticsearch Fuzzy Query - Techniques, Use Cases & Examples

But that is not a complete/good solution for this specific case, as ‘apple’ is only 5 characters long.
So you could mimic something like that yourself, if the search:

  • is less than 6 characters
  • does not contain spaces (is one word)
  • does not end with “s”

Then add an “OR” search for the same word with an “s” appended.

So someone searched for ‘apple’, but you query for:

{
  "query": {
    "bool": {
      "should": [
        { "match": { "content": "apple" } },
        { "match": { "content": "apples" } }
      ]
    }
  }
}
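The rule above (short, single word, no trailing “s”) could be applied in PHP before running the search. This is only a rough sketch: the function name buildPluralAwareQuery and its $field parameter are made up for illustration, and the length limit of 6 is simply the one suggested above.

```php
<?php

/**
 * Hypothetical helper: returns the "query" part of a search body.
 * Short single words that do not end in "s" are expanded into
 * bool/should with two match clauses (word OR word + "s").
 * Anything else falls back to a plain match query.
 */
function buildPluralAwareQuery(string $search, string $field = 'content'): array
{
    $term = trim($search);
    $match = ['match' => [$field => $term]];

    // strlen() counts bytes; mb_strlen() would be the better choice for non-ASCII input.
    $isShort   = strlen($term) < 6;
    $isOneWord = preg_match('/\s/', $term) !== 1;
    $endsWithS = strtolower(substr($term, -1)) === 's';

    if ($term === '' || !$isShort || !$isOneWord || $endsWithS) {
        return $match;
    }

    return [
        'bool' => [
            'should' => [
                $match,
                ['match' => [$field => $term . 's']],
            ],
        ],
    ];
}
```

With the model from the question, the result would presumably be passed straight to query(), e.g. ElasticArticle::find()->query(buildPluralAwareQuery($search))->all(). Note that this only covers plain “-s” plurals; it does nothing for forms like “berries”.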

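The problem was that I needed to set my analyzer as the default one.
That means that instead of my_analizer I had to name it default. The mapping for content does not reference any analyzer, so a custom analyzer under another name was never applied to that field.

There was also a typo: the settings key must be analyzer, not analizer.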
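For anyone landing here later, the corrected 'settings' part would look roughly like the sketch below. It only applies the two changes described above to the array from the question; the wrapper function stemmingIndexSettings is a made-up name so that the snippet can stand on its own.

```php
<?php

/**
 * Hypothetical wrapper around the corrected settings array.
 * Changes compared to the question:
 *   - the key is spelled 'analyzer' (was 'analizer')
 *   - the analyzer is named 'default' (was 'my_analizer'), so it is
 *     used for text fields that do not name an analyzer in the mapping
 */
function stemmingIndexSettings(): array
{
    return [
        'analysis' => [
            'analyzer' => [
                'default' => [
                    'tokenizer' => 'whitespace',
                    'filter'    => ['stemmer'],
                ],
            ],
        ],
    ];
}
```

In createIndex2() this would replace the inline array, i.e. 'settings' => stemmingIndexSettings(). Analysis settings take effect when documents are indexed, so an index created with the old settings would most likely have to be dropped, recreated and refilled before ‘apple’ matches ‘apples’.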