Re: Phrase search vs. multi-lexeme tokens

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Phrase search vs. multi-lexeme tokens
Date: 2021-01-06 17:18:32
Message-ID: 10026.1609953512@sss.pgh.pa.us
Lists: pgsql-hackers

Alexander Korotkov <aekorotkov(at)gmail(dot)com> writes:
> # select to_tsvector('pg_class foo') @@ websearch_to_tsquery('"pg_class foo"');
> ?column?
> ----------
> f

Yeah, surely this is wrong.

> # select to_tsquery('pg_class <-> foo');
> to_tsquery
> ------------------------------
> ( 'pg' & 'class' ) <-> 'foo'

> I think if a user writes 'pg_class <-> foo', then it's expected to
> match 'pg_class foo' regardless of which lexemes 'pg_class' is
> split into.

Indeed. It seems to me that this:

regression=# select to_tsquery('pg_class');
to_tsquery
----------------
'pg' & 'class'
(1 row)

is wrong all by itself. Now that we have phrase search, a much
saner translation would be "'pg' <-> 'class'". If we fixed that
then it seems like the more complex case would just work.
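
To make that reasoning concrete, here is a toy model in Python. It is only a sketch, not PostgreSQL's matching code. It assumes that to_tsvector('pg_class foo') puts 'pg', 'class' and 'foo' at positions 1, 2 and 3, and that an AND nested inside a phrase operator needs both operands to match at the same position. The names positions() and matches() and the tuple encoding of queries are made up for the illustration.

```python
# Toy model only: this is NOT PostgreSQL's implementation.
# A "tsvector" is modelled as a dict mapping lexeme -> set of positions.

def positions(doc, node):
    """Return the set of positions at which `node` matches inside `doc`.

    node is one of:
      "lexeme"                  a plain lexeme
      ("and", left, right)      AND nested in a phrase: both operands must
                                match at the same position (assumed semantics)
      ("phrase", left, right)   `<->`: right matches exactly one position
                                after a match of left
    """
    if isinstance(node, str):
        return set(doc.get(node, ()))
    op, left, right = node
    lpos = positions(doc, left)
    rpos = positions(doc, right)
    if op == "and":
        return lpos & rpos
    if op == "phrase":
        return {p for p in rpos if p - 1 in lpos}
    raise ValueError(f"unknown operator: {op}")


def matches(doc, query):
    """A query matches if it matches at one position or more."""
    return bool(positions(doc, query))


# Assumed result of to_tsvector('pg_class foo'): one lexeme per position.
doc = {"pg": {1}, "class": {2}, "foo": {3}}

# ( 'pg' & 'class' ) <-> 'foo'   -- what to_tsquery produces today
current = ("phrase", ("and", "pg", "class"), "foo")

# ( 'pg' <-> 'class' ) <-> 'foo' -- the suggested translation
proposed = ("phrase", ("phrase", "pg", "class"), "foo")

print(matches(doc, current))   # False
print(matches(doc, proposed))  # True
```

In this model the AND form can never match, because 'pg' and 'class' never share a position, while the phrase form matches with 'foo' at position 3.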

I read your patch over quickly and it seems like a reasonable
approach (but sadly underdocumented). Can we extend the idea
to fix the to_tsquery case?

regards, tom lane
