From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Phrase search vs. multi-lexeme tokens |
Date: | 2021-01-06 17:18:32 |
Message-ID: | 10026.1609953512@sss.pgh.pa.us |
Lists: | pgsql-hackers |
Alexander Korotkov <aekorotkov(at)gmail(dot)com> writes:
> # select to_tsvector('pg_class foo') @@ websearch_to_tsquery('"pg_class foo"');
>  ?column?
> ----------
>  f
Yeah, surely this is wrong.
> # select to_tsquery('pg_class <-> foo');
>           to_tsquery
> ------------------------------
>  ( 'pg' & 'class' ) <-> 'foo'
> I think if a user writes 'pg_class <-> foo', then it's expected to
> match 'pg_class foo' independently on which lexemes 'pg_class' is
> split into.
Indeed. It seems to me that this:
regression=# select to_tsquery('pg_class');
   to_tsquery
----------------
 'pg' & 'class'
(1 row)
is wrong all by itself. Now that we have phrase search, a much
saner translation would be "'pg' <-> 'class'". If we fixed that
then it seems like the more complex case would just work.
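To illustrate the difference between the two translations, here is a minimal sketch that writes both query forms by hand. It assumes the built-in 'simple' configuration (chosen here only to keep stemming out of the picture) and the default parser, which splits 'pg_class' into the lexemes 'pg' and 'class' at adjacent positions. The exact results may depend on the server version.

```sql
-- Inspect the lexemes and positions produced for the document text.
SELECT to_tsvector('simple', 'pg_class foo');

-- Phrase form: each lexeme must directly follow the previous one,
-- which is what a user writing 'pg_class <-> foo' would expect to match.
SELECT to_tsvector('simple', 'pg_class foo')
       @@ to_tsquery('simple', 'pg <-> class <-> foo');

-- AND form nested under the phrase operator, written out explicitly.
-- This mirrors the ( 'pg' & 'class' ) <-> 'foo' translation shown above,
-- which is the form that fails to match in the reported case.
SELECT to_tsvector('simple', 'pg_class foo')
       @@ to_tsquery('simple', '(pg & class) <-> foo');
```

Since the split lexemes are written out explicitly, the second and third statements do not rely on how to_tsquery itself translates 'pg_class', so they can be used to compare the two forms on any server that supports phrase search.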
I read your patch over quickly and it seems like a reasonable
approach (but sadly underdocumented). Can we extend the idea
to fix the to_tsquery case?
regards, tom lane