Re: Flexible configuration for full-text search

From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: Aleksandr Parfenov <a(dot)parfenov(at)postgrespro(dot)ru>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Flexible configuration for full-text search
Date: 2018-08-23 22:06:08
Message-ID: CAPpHfdu5RaCYyKbvgi2NPp_FOFt4wEskxopQs+5L1Fu3W7w2aw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Apr 6, 2018 at 10:52 AM Aleksandr Parfenov
<a(dot)parfenov(at)postgrespro(dot)ru> wrote:
> On Thu, 5 Apr 2018 17:26:10 +0300
> Teodor Sigaev <teodor(at)sigaev(dot)ru> wrote:
> > 4) The initial approach suggested distinguishing three states of the
> > dictionary result: null (unknown word), stopword and usual word. Now
> > there are only two, so we have lost the ability to catch stopwords.
> > One way to use stopwords is: suppose we have two identical fts
> > configurations, except that one skips stopwords and the other doesn't.
> > The second configuration is used for indexing, and the first one for
> > search by default. But if we can't find
> > anything ('to be or to be' - phrase contains stopwords only) then we
> > can use the second configuration. For now, we need to keep two variants
> > of each dictionary - with and without stopwords. But if it's possible to
> > distinguish stop and nonstop words in the configuration then we don't
> > need to have duplicated dictionaries.
>
> With the proposed way to configure it is possible to create a special
> dictionary only for stopword checking and use it at decision-making
> time.
>
> For example, we can create dictionary english_stopword which will
> return word itself in case of stopword and NULL otherwise. With such
> dictionary we create a configuration:
>
> ALTER TEXT SEARCH CONFIGURATION test_cfg ALTER MAPPING FOR asciiword,
> word WITH
> CASE english_stopword WHEN NO MATCH THEN english_hunspell END;
>
> In the described example, english_hunspell can be implemented without
> any stopword processing at all, so we can split stopword processing
> and the processing of other words into separate dictionaries.
>
> The key point of the patch is to process stopwords the same way as
> others at the level of the PostgreSQL internals and give users an
> instrument to process them in a special way via configurations.

If we're going to do it that way, by providing separate dictionaries
for stop words, then I think we should do the same for the built-in
dictionaries and configurations. So I think this patch should also
split the built-in dictionaries into stemmers and stop word dictionaries,
and provide corresponding configurations over them. We would also
need some benchmarking to show that the new way of defining
configurations performs no worse than the previous one.
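
To make the proposed split concrete, here is a rough sketch of how a
stemmer and a separate stop word dictionary can be approximated with
the existing dictionary templates, without the CASE syntax from the
patch. The names english_stopword_only, english_stem_nostop and
english_split are made up for illustration. Note one difference from
the english_stopword dictionary described above: the simple template
discards stop words instead of returning the word itself.

```sql
-- Stop word dictionary only: a stop word is recognized and discarded;
-- with Accept = false any other word is reported as unrecognized, so
-- the next dictionary in the mapping is consulted.
CREATE TEXT SEARCH DICTIONARY english_stopword_only (
    TEMPLATE = pg_catalog.simple,
    StopWords = english,
    Accept = false
);

-- Stemmer only: Snowball without a StopWords parameter does no
-- stop word filtering of its own.
CREATE TEXT SEARCH DICTIONARY english_stem_nostop (
    TEMPLATE = pg_catalog.snowball,
    Language = english
);

-- Hypothetical configuration built over the two split dictionaries.
CREATE TEXT SEARCH CONFIGURATION english_split (COPY = pg_catalog.english);

ALTER TEXT SEARCH CONFIGURATION english_split
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                      word, hword, hword_part
    WITH english_stopword_only, english_stem_nostop;

-- The two configurations should produce the same lexemes.
SELECT to_tsvector('english', 'The quick brown foxes are jumping');
SELECT to_tsvector('english_split', 'The quick brown foxes are jumping');

-- Very rough timing comparison (run in psql with \timing on).
-- Appending g keeps the argument from being folded into a constant.
SELECT count(to_tsvector('english', 'the quick brown foxes ' || g))
  FROM generate_series(1, 100000) AS g;
SELECT count(to_tsvector('english_split', 'the quick brown foxes ' || g))
  FROM generate_series(1, 100000) AS g;
```

The last two queries are only a starting point for the benchmarking
mentioned above; a real comparison would need realistic documents and
repeated runs.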

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
