Re: Flexible configuration for full-text search

From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Aleksandr Parfenov <a(dot)parfenov(at)postgrespro(dot)ru>, Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Flexible configuration for full-text search
Date: 2018-04-05 14:26:10
Message-ID: 196303f9-b456-bd23-fcd7-f4bfe6119115@sigaev.ru
Lists: pgsql-hackers

Some notes:

0) The patch conflicts with the latest changes in gram.y; the conflicts are trivial.

1) jsonb in the catalog. I'm OK with it; any opinions?

2) In pg_ts_config_map.h, "jsonb mapdicts" isn't wrapped in #ifdef
CATALOG_VARLEN like other varlena columns in the catalog. If that's right,
please explain why and add a comment.

3) I see changes in pg_catalog, including dropping a column, changing its type,
changing an index, changing a function, etc. Did you pay attention to
pg_upgrade? I don't see it in the patch.

4) It seems this could work:
ALTER TEXT SEARCH CONFIGURATION russian
ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
word, hword, hword_part
WITH english_stem union (russian_stem, simple);
                        ^^^^^^^^^^^^^^^^^^^^^^ a simple way, instead of
WITH english_stem union (case russian_stem when match then keep else simple end);
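
To make the comparison concrete, here is a minimal Python sketch of how I read the two forms. It assumes that a comma list means "take the output of the first dictionary that recognizes the token" and that union merges both outputs; these semantics are my reading of the example, not something taken from the patch. The dictionaries toy_english and toy_russian and all helper names are hypothetical toys (real snowball stemmers recognize every token).

```python
from typing import Callable, Dict, List, Optional

# A toy dictionary maps a token to a list of lexemes,
# or returns None when it does not recognize the token.
Lookup = Callable[[str], Optional[List[str]]]


def table_dict(table: Dict[str, List[str]]) -> Lookup:
    """Build a toy dictionary from a fixed token -> lexemes table."""
    return lambda token: table.get(token.lower())


def simple(token: str) -> Optional[List[str]]:
    """Toy stand-in for 'simple': lowercases and accepts any token."""
    return [token.lower()]


def first_match(*dicts: Lookup) -> Lookup:
    """Assumed meaning of '(a, b)': the first dictionary that matches wins."""
    def lookup(token: str) -> Optional[List[str]]:
        for d in dicts:
            result = d(token)
            if result is not None:
                return result
        return None
    return lookup


def case_keep_else(condition: Lookup, fallback: Lookup) -> Lookup:
    """Assumed meaning of 'case a when match then keep else b end'."""
    def lookup(token: str) -> Optional[List[str]]:
        result = condition(token)
        return result if result is not None else fallback(token)
    return lookup


def union(left: Lookup, right: Lookup) -> Lookup:
    """Merge the outputs of both sides, dropping duplicate lexemes."""
    def lookup(token: str) -> Optional[List[str]]:
        a, b = left(token), right(token)
        if a is None and b is None:
            return None
        merged: List[str] = []
        for lexeme in (a or []) + (b or []):
            if lexeme not in merged:
                merged.append(lexeme)
        return merged
    return lookup


# Hypothetical toy dictionaries, for illustration only.
toy_english = table_dict({"databases": ["databas"]})
toy_russian = table_dict({"базы": ["баз"]})

short_form = union(toy_english, first_match(toy_russian, simple))
long_form = union(toy_english, case_keep_else(toy_russian, simple))

for token in ["Databases", "базы", "xyz"]:
    assert short_form(token) == long_form(token)
```

Under those assumptions both forms give the same lexemes for every token, which is why the shorter spelling looks sufficient to me.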

5) The initial approach suggested distinguishing three states of a dictionary
result: null (unknown word), stopword, and usual word. Now there are only two,
so we have lost the ability to catch stopwords. One way to use stopwords:
suppose we have two identical FTS configurations, except that one skips
stopwords and the other doesn't. The second configuration is used for indexing,
and the first one for search by default. But if we can't find anything
('to be or to be' - the phrase contains only stopwords), then we can use the
second configuration. For now, we need to keep two variants of each
dictionary: with and without stopwords. But if it's possible to distinguish
stop and non-stop words in the configuration, then we don't need duplicated
dictionaries.
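
A rough Python sketch of that idea, only to illustrate the three states and the fallback. Everything here is a toy assumption: the names LexizeState, lexize, to_lexemes and search are hypothetical, the stopword list is made up, and matching is a plain AND over lexemes rather than real phrase search.

```python
from enum import Enum
from typing import Dict, List, Set, Tuple


class LexizeState(Enum):
    UNKNOWN = "unknown"    # dictionary does not recognize the token
    STOPWORD = "stopword"  # recognized, but it is a stopword
    WORD = "word"          # recognized, usual word


STOPWORDS = {"to", "be", "or"}  # toy list, for illustration only


def lexize(token: str) -> Tuple[LexizeState, List[str]]:
    """Toy dictionary that reports one of three states for a token."""
    lowered = token.lower()
    if not lowered.isalpha():
        return LexizeState.UNKNOWN, []
    if lowered in STOPWORDS:
        return LexizeState.STOPWORD, [lowered]
    return LexizeState.WORD, [lowered]


def to_lexemes(text: str, keep_stopwords: bool) -> List[str]:
    """One dictionary, two behaviours: the caller decides about stopwords."""
    lexemes: List[str] = []
    for token in text.split():
        state, result = lexize(token)
        if state is LexizeState.WORD:
            lexemes.extend(result)
        elif state is LexizeState.STOPWORD and keep_stopwords:
            lexemes.extend(result)
    return lexemes


def match(index: Dict[int, Set[str]], terms: List[str]) -> List[int]:
    """Return ids of documents containing every term; no terms, no hits."""
    if not terms:
        return []
    return [doc for doc, lex in index.items() if all(t in lex for t in terms)]


def search(index: Dict[int, Set[str]], query: str) -> List[int]:
    """Search without stopwords first; retry with them if nothing is found."""
    hits = match(index, to_lexemes(query, keep_stopwords=False))
    if not hits:
        hits = match(index, to_lexemes(query, keep_stopwords=True))
    return hits


# The index is built with stopwords kept, like the second configuration.
docs = {1: "to be or to be", 2: "postgres full text search"}
index = {doc: set(to_lexemes(text, keep_stopwords=True)) for doc, text in docs.items()}
```

With this toy index, search(index, "to be or to be") finds nothing on the first pass because every token is a stopword, and then finds document 1 on the second pass, while a single dictionary serves both passes.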

Aleksandr Parfenov wrote:
> On Fri, 30 Mar 2018 14:43:30 +0000
> Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru> wrote:
>
>> The following review has been posted through the commitfest
>> application: make installcheck-world: tested, passed
>> Implements feature: tested, passed
>> Spec compliant: tested, passed
>> Documentation: tested, passed
>>
>> LGTM.
>>
>> The new status of this patch is: Ready for Committer
>
> It seems that after d204ef6 (MERGE SQL Command) in master the patch
> doesn't apply due to a conflict in keywords lists (grammar and header).
> The new version of the patch without conflicts is attached.
>

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/
