websearch_to_tsquery fails to transform compound words from a thesaurus dictionary

From: Jean Gabriel <pgml(at)hasbani(dot)ca>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: websearch_to_tsquery fails to transform compound words from a thesaurus dictionary
Date: 2022-06-14 14:38:36
Message-ID: d9874680-8292-0728-dca0-f9312afd3221@hasbani.ca
Lists: pgsql-bugs

Hello,

Affected versions: PG 11 to 14.3 (all).
Affected OS: Windows 10 and x86_64-pc-linux-gnu (via dbfiddle)

Issue:

A thesaurus dictionary can replace a compound word (a phrase) with
another term. The example given in the documentation is
"supernovae stars : *sn". With websearch_to_tsquery, this substitution
does not occur and the original words are kept, **OR**, if one of the
words also has its own single-word entry in the thesaurus, only that
single-word substitution is applied.

Why it is a problem:

Since the other text search functions (to_tsvector, plainto_tsquery) do
apply the substitution, a document containing the compound word cannot
be found when using websearch_to_tsquery.

Expected result:

websearch_to_tsquery should transform compound words from the thesaurus.

Good to know:

1) The expected behavior occurs with single words from the thesaurus.
2) The bad behavior occurs regardless of pre- or post-stemming.
3) If the compound word is double-quoted, websearch_to_tsquery returns
the expected output in V14 but an incorrect one in previous versions.

Steps to reproduce:
Create a test_theasaurus.ths file with the following lines:

supernovae stars : *sn
supernovae : *sn
abc def: xy

CREATE TEXT SEARCH DICTIONARY test_thesaurus (
    TEMPLATE = thesaurus,
    DictFile = test_theasaurus,
    Dictionary = pg_catalog.english_stem
);

CREATE TEXT SEARCH CONFIGURATION public.test ( COPY = pg_catalog.english );

ALTER TEXT SEARCH CONFIGURATION public.test
        ALTER MAPPING FOR hword, hword_part, word, asciihword,
                          hword_asciipart, asciiword
        WITH public.test_thesaurus, english_stem;

select to_tsvector('test','abc def')
       @@ websearch_to_tsquery('test','abc def'); --FALSE - wrong result
select to_tsvector('test','supernovae stars')
       @@ websearch_to_tsquery('test','supernovae stars'); --FALSE - wrong result

select websearch_to_tsquery('test','abc def');
--'abc def' --> no transformation occurred
select websearch_to_tsquery('test','supernovae stars');
--'sn' & 'star' --> 1st word is listed by itself in the thesaurus and
--was transformed

select websearch_to_tsquery('test','"abc def"');
-- 'xy' --> in V14, double-quoted compound words are transformed as expected
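
Following from the quoted case above, a possible stopgap on V14 is to
quote the compound word in the match itself. This is only a sketch
against the same 'test' configuration: it assumes the quoted form keeps
producing 'xy' and 'sn' as described, it was not part of the original
test run, and it would not help on versions before V14.

```sql
-- Sketch only: relies on the V14 double-quote behavior described above.
-- Results are not asserted here; check them on the target server.
select to_tsvector('test','abc def')
       @@ websearch_to_tsquery('test','"abc def"');
select to_tsvector('test','supernovae stars')
       @@ websearch_to_tsquery('test','"supernovae stars"');
```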

select to_tsvector('test','abc def'),
       plainto_tsquery('test','abc def');
--'xy', expected behavior in other functions
select to_tsvector('test','supernovae stars'),
       plainto_tsquery('test','supernovae stars');
--'sn', expected behavior in other functions
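
To see the difference at a glance, the query functions can be put side
by side on the same input. This is a sketch using the 'test'
configuration created above. phraseto_tsquery was not part of the tests
above and is included only as an extra point of comparison, so no
output is claimed for it.

```sql
-- Compare how each query function handles the same compound word.
-- Column aliases are illustrative only.
select plainto_tsquery('test','supernovae stars')      as plain_query,
       phraseto_tsquery('test','supernovae stars')     as phrase_query,
       websearch_to_tsquery('test','supernovae stars') as websearch_query;

select plainto_tsquery('test','abc def')      as plain_query,
       phraseto_tsquery('test','abc def')     as phrase_query,
       websearch_to_tsquery('test','abc def') as websearch_query;
```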

Let me know if there is anything else I can provide!

Thank you for taking the time to look at this issue; it is much appreciated.

JG
