Re: Full text search bug ('russian' regconfig)

From: Artur Zakirov <zaartur(at)gmail(dot)com>
To: egocenter <egocenter(at)yandex(dot)ru>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Full text search bug ('russian' regconfig)
Date: 2020-02-20 01:04:54
Message-ID: 0f991eaf-2394-41a2-9d8e-c36aef35fbb1@gmail.com
Lists: pgsql-bugs

Hello

On 2/19/2020 5:35 PM, egocenter wrote:
> Text search doesn't work correctly when the text and the query are the SAME string (Russian dictionary config);
> as you can see in the example, the tsvector gets different lexemes from the tsquery for identical text:
>
> tsv = 'дан':1 'магазин':2 'нужн':3 'посеща':4 'точн':5
> tsq = 'нужн' & 'точн' & 'дан' & 'посещаем' & 'магазин'

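This happens because you call to_tsvector() twice. The 'russian'
configuration uses a Snowball dictionary, which applies a stemming
algorithm to cut off word endings, so running it over already stemmed
text can shorten a lexeme a second time (which would explain 'посеща' in
your tsvector versus 'посещаем' in your tsquery). Your query works if
to_tsvector() isn't called twice on the same text: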

=# SELECT
     web_query_and @@ ts_title,
     web_query_and @@ 'зачем нужны точные данные о посещаемости магазинов',
     *
   FROM
     (SELECT
        to_tsvector('russian', 'зачем нужны точные данные о посещаемости магазинов') AS ts_title,
        websearch_to_tsquery('russian', 'зачем нужны точные данные о посещаемости магазинов?') AS web_query_and
     ) AS main;

It gives 'true' for the first column.
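
To see the double stemming in isolation, something like the following
sketch may help. It assumes the built-in 'russian_stem' Snowball dictionary
(the one the 'russian' configuration normally uses) and feeds a single word
from your example through it once and then a second time. The column
aliases are only illustrative:

```sql
-- Assumption: a stock PostgreSQL install where the 'russian' configuration
-- maps words to the built-in 'russian_stem' Snowball dictionary.

-- One pass: stem the original word.
SELECT ts_lexize('russian_stem', 'посещаемости') AS stemmed_once;

-- Two passes: feed the already stemmed lexeme through the stemmer again.
-- ts_lexize() returns text[], so [1] picks the first (only) lexeme.
SELECT ts_lexize(
         'russian_stem',
         (ts_lexize('russian_stem', 'посещаемости'))[1]
       ) AS stemmed_twice;
```

If the two results differ, a tsvector built from already stemmed text
cannot match a tsquery built from the original text, which is consistent
with the lexemes you posted.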

--
Artur
