Re: Text search prefix matching and stop words

From: Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>
To: mnelson(at)binarykeep(dot)com
Cc: PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Text search prefix matching and stop words
Date: 2021-10-08 20:30:41
Message-ID: CALT9ZEG-i0prBw5N7pMAPqL_Kj=g_xK-oKjumE6-q0TVvOfB4A@mail.gmail.com
Lists: pgsql-bugs

>
> Prefix matching should not omit stop words, as matching lexemes may
> legitimately begin with stop words.
>
> # select to_tsquery('english', 'over:*') @@ to_tsvector('english', 'overhaul');
> NOTICE: text-search query contains only stop words or doesn't contain lexemes, ignored
> ?column?
> ----------
> f
> (1 row)
>
> I noticed this after implementing interactive, incremental search in an
> application. As the user typed "overhaul," with each successive character
> executing a search, "ove" and "overh" matched a particular document, but
> "over" did not.

Many thanks for the report!

I am not sure that this is a bug. I think this is simply how to_tsquery
conversion works: stop words are removed first, and only then is the prefix
(:*) template processed.
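
A toy Python model may make the ordering clearer. It is only an illustration, not PostgreSQL's implementation: stemming is left out, and STOP_WORDS, toy_to_tsquery, toy_cast and toy_match are hypothetical names, with STOP_WORDS a small assumed subset of the english stop-word list. The point it shows is that once "over" is dropped as a stop word, its ":*" flag is dropped with it, so nothing is left to match.

```python
# Toy model of to_tsquery's order of operations: stop words are removed
# first, and only surviving words keep their ":*" prefix flag.
# Illustration only, not PostgreSQL's code; stemming is omitted and
# STOP_WORDS is a hypothetical subset of the english stop-word list.

STOP_WORDS = {"a", "and", "of", "over", "the"}

Query = list[tuple[str, bool]]  # (word, is_prefix) pairs, ANDed together


def split_prefix(term: str) -> tuple[str, bool]:
    """Split a trailing ':*' marker off a single search term."""
    if term.endswith(":*"):
        return term[:-2], True
    return term, False


def toy_to_tsquery(term: str) -> Query:
    """Lowercase, drop stop words, then keep the prefix flag on survivors."""
    word, is_prefix = split_prefix(term)
    word = word.lower()
    if word in STOP_WORDS:
        return []  # the whole term, prefix flag included, is discarded
    return [(word, is_prefix)]


def toy_cast(term: str) -> Query:
    """Like a ::tsquery cast: no normalization and no stop-word filter."""
    word, is_prefix = split_prefix(term)
    return [(word, is_prefix)]


def toy_match(query: Query, lexemes: set[str]) -> bool:
    """AND all query items together; an empty query matches nothing."""
    if not query:
        return False
    return all(
        any(lex.startswith(word) if is_prefix else lex == word for lex in lexemes)
        for word, is_prefix in query
    )


doc = {"overhaul"}
for typed in ["ove", "over", "overh"]:
    print(typed,
          toy_match(toy_to_tsquery(typed + ":*"), doc),
          toy_match(toy_cast(typed + ":*"), doc))
```

Run as-is, it prints "ove True True", "over False True" and "overh True True": only the to_tsquery-style path fails on "over", which mirrors the behaviour in the report.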

If you want to process input as the user types each successive character,
you can cast the string directly to the tsquery type until the input is
finished:

select 'over:*'::tsquery @@ to_tsvector('english', 'overhaul');

and, once the user has finished typing, process the final input with
to_tsquery, which applies the stop-word filter.
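
On the application side, the approach above could be sketched as a small helper that picks the query expression depending on whether input is finished. This is a sketch under stated assumptions, handling a single search word: build_search and tsquery_prefix_literal are hypothetical names, the tsvector column is assumed to be called tsv, and the %s placeholders assume a DB-API driver with the "format" paramstyle (such as psycopg). The quoting follows the tsvector/tsquery input rule that embedded quotes and backslashes are doubled.

```python
# Hypothetical client-side helper (not a PostgreSQL or driver API):
#   - while the user is typing, cast a quoted prefix literal to tsquery,
#     which skips stemming and the stop-word filter;
#   - once input is finished, pass the word to to_tsquery('english', ...),
#     which applies stemming and the stop-word filter.
# Assumptions: a tsvector column named tsv and "%s" placeholders.


def tsquery_prefix_literal(word: str) -> str:
    """Quote one word as a tsquery lexeme with a ':*' prefix marker.

    Backslashes and single quotes are doubled, as the tsquery input
    syntax requires. Lowercasing is an assumption of this sketch,
    because the cast does not normalize case the way to_tsquery does.
    """
    escaped = word.lower().replace("\\", "\\\\").replace("'", "''")
    return f"'{escaped}':*"


def build_search(word: str, finished: bool) -> tuple[str, list[str]]:
    """Return (SQL condition, parameters) for the hypothetical tsv column."""
    if finished:
        return "tsv @@ to_tsquery('english', %s)", [word]
    return "tsv @@ %s::tsquery", [tsquery_prefix_literal(word)]


print(build_search("over", finished=False))
print(build_search("overhaul", finished=True))
```

Switching to to_tsquery at the end matters because the cast never stems: a fully typed word whose stem differs, such as "documents" (stored as the lexeme 'document'), would stop matching as a raw prefix. Note also that to_tsquery parses operators in its input, so arbitrary multi-word user text may raise a syntax error; this sketch does not handle that.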

If to_tsquery worked the way you describe, I expect it could never apply the
stop-word filter to prefix-templated input, because a prefix cannot be
compared to stop words.

--
Best regards,
Pavel Borisov

Postgres Professional: http://www.postgrespro.com


Browse pgsql-bugs by date

  From Date Subject
Next Message Pavel Borisov 2021-10-08 20:32:28 Re: Text search prefix matching and stop words
Previous Message Matthew Nelson 2021-10-08 18:17:16 Text search prefix matching and stop words