Re: Flexible configuration for full-text search

From: Andres Freund <andres(at)anarazel(dot)de>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Aleksandr Parfenov <a(dot)parfenov(at)postgrespro(dot)ru>, Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Flexible configuration for full-text search
Date: 2018-04-05 18:37:49
Message-ID: 20180405183749.vlocjuikmztk7jec@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2018-04-05 17:26:10 +0300, Teodor Sigaev wrote:
> Some notices:
>
> 0) patch conflicts with last changes in gram.y, conflicts are trivial.
>
> 1) jsonb in catalog. I'm ok with it, any opinions?
>
> 2) pg_ts_config_map.h, "jsonb mapdicts" isn't decorated with #ifdef
> CATALOG_VARLEN like other varlena columns in catalog. If that's right, please
> explain and add a comment.
>
> 3) I see changes in pg_catalog, including drop column, change its type,
> change index, change function etc. Did you pay attention to pg_upgrade? I
> don't see it in patch.
>
> 4) Seems, it could work:
> ALTER TEXT SEARCH CONFIGURATION russian
> ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
> word, hword, hword_part
> WITH english_stem union (russian_stem, simple);
> ^^^^^^^^^^^^^^^^^^^^^ simple way instead of
> WITH english_stem union (case russian_stem when match then keep else simple end);
>
> 4) Initial approach suggested to distinguish three states of dictionary
> result: null (unknown word), stopword and usual word. Now there are only two,
> and we lost the possibility to catch stopwords. One way to use stopwords is:
> suppose we have two identical fts configurations, except one skips stopwords
> and the other doesn't. The second configuration is used for indexing, and the
> first one for search by default. But if we can't find anything ('to be or to
> be' - phrase contains stopwords only) then we can use the second
> configuration. For now, we need to keep two variants of each dictionary - with
> and without stopwords. But if it's possible to distinguish stop and nonstop
> words in configuration then we don't need to have duplicated dictionaries.

Just to be clear: I object to attempting to merge this into v11. This
introduces a new user interface, arrived late in the development cycle,
and hasn't seen that much review. Not something that should be merged
two minutes before midnight.

I think it's good to continue reviewing, don't get me wrong.

- Andres
