Re: Phrase search vs. multi-lexeme tokens

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Phrase search vs. multi-lexeme tokens
Date:	2021-01-06 17:18:32
Message-ID:	10026.1609953512@sss.pgh.pa.us
Lists:	pgsql-hackers

Alexander Korotkov <aekorotkov(at)gmail(dot)com> writes:
> # select to_tsvector('pg_class foo') @@ websearch_to_tsquery('"pg_class foo"');
> ?column?
> ----------
> f

Yeah, surely this is wrong.

> # select to_tsquery('pg_class <-> foo');
> to_tsquery
> ------------------------------
> ( 'pg' & 'class' ) <-> 'foo'

> I think if a user writes 'pg_class <-> foo', then it's expected to
> match 'pg_class foo' regardless of which lexemes 'pg_class' is
> split into.

Indeed. It seems to me that this:

regression=# select to_tsquery('pg_class');
to_tsquery
----------------
'pg' & 'class'
(1 row)

is wrong all by itself. Now that we have phrase search, a much
saner translation would be "'pg' <-> 'class'". If we fixed that
then it seems like the more complex case would just work.
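
To make the difference between the two translations concrete, here is a
minimal sketch that bypasses the parser and dictionaries by casting
literals directly. The positions in the hand-built tsvector (pg at 1,
class at 2, foo at 3) are an assumption about how 'pg_class foo' gets
split, not output copied from a server, and the column aliases are only
illustrative.

```sql
-- Assumed lexeme positions for 'pg_class foo': pg=1, class=2, foo=3.
-- Current-style translation: AND of the sub-lexemes, followed by 'foo'.
SELECT 'pg:1 class:2 foo:3'::tsvector @@ '( pg & class ) <-> foo'::tsquery
       AS and_then_phrase;

-- Proposed-style translation: the sub-lexemes chained as a phrase.
SELECT 'pg:1 class:2 foo:3'::tsvector @@ 'pg <-> class <-> foo'::tsquery
       AS phrase_chain;
```

The second query should match, since the three lexemes sit at consecutive
positions. The first is expected not to match, in line with the failure
reported above: inside a <-> operand, match positions are significant, so
'pg' and 'class' would apparently both have to match immediately before
'foo'.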

I read your patch over quickly and it seems like a reasonable
approach (but sadly underdocumented). Can we extend the idea
to fix the to_tsquery case?

regards, tom lane

In response to

Phrase search vs. multi-lexeme tokens at 2020-11-12 13:09:51 from Alexander Korotkov

Responses

Re: Phrase search vs. multi-lexeme tokens at 2021-01-07 03:36:05 from Alexander Korotkov
