Re: BUG #16744: ts_headline behaves incorrectly with <-> and proximity operators

From: Stas Obydionnov <stas(at)hellofyllo(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #16744: ts_headline behaves incorrectly with <-> and proximity operators
Date: 2020-11-25 19:23:02
Message-ID: CAKZLNo3TTZFq0WNQVOn_b_Yp97n2tEs+7Zk4oYkq9Fdst___pQ@mail.gmail.com
Lists: pgsql-bugs

Thanks Tom,

I probably provided a bad example.
Here is another one, taken from a similar bug that was opened a couple of
years ago and never answered.

Consider the following query:

SELECT ts_headline('English',
'This Commercial Bank does not have any Equity in Europe but European
Commercial Bank does',
to_tsquery('English','European <-> Commercial <-> Bank')::tsquery);

The returned result is:
This <b>Commercial</b> <b>Bank</b> does not have any Equity in Europe but
<b>European</b> <b>Commercial</b> <b>Bank</b> does

This highlights the standalone occurrence of "Commercial Bank" at the start
of the text in addition to the phrase "European Commercial Bank".

However, the output I would expect is:
This Commercial Bank does not have any Equity in Europe but <b>European</b>
<b>Commercial</b> <b>Bank</b> does

That is, only *European Commercial Bank* should be highlighted, because the
three words are joined with the <-> operator in the to_tsquery expression.
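
For what it's worth, the phrase match itself can be checked separately from
the highlighting. A minimal sketch, assuming the built-in english
configuration (the doc and q aliases are only illustrative names):

```sql
-- Check whether the document satisfies the phrase query at all,
-- independently of how ts_headline() decides to mark it up.
-- "doc" and "q" are illustrative aliases, not part of any API.
SELECT to_tsvector('english', doc) @@ q AS phrase_matches,
       ts_headline('english', doc, q)   AS headline
FROM (VALUES (
        'This Commercial Bank does not have any Equity in Europe but European Commercial Bank does',
        to_tsquery('english', 'European <-> Commercial <-> Bank')
     )) AS t(doc, q);
```

The @@ test only reports whether the three words occur adjacently somewhere
in the text; it says nothing about which occurrences ts_headline() will mark.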

Regards,
Stas.

On Tue, Nov 24, 2020 at 8:18 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
> > When running the following code
> > select ts_headline('Alpha Beta Gama', phraseto_tsquery ('alpha gama'))
> > or
> > select ts_headline('Alpha Beta Gama', to_tsquery ('alpha <-> gama'))
> > I would expect the result not to be highlighted,
>
> That's operating as designed, I think. Per the code comment:
>
> * If we found nothing acceptable, select min_words words starting at
> * the beginning.
>
> The expectation really is that it's on you to not select documents that
> don't match your search query. Once you've selected a document to
> display, ts_headline() is just going to do the best it can to produce
> something useful. "Not highlight anything" wasn't deemed particularly
> useful, and I agree with that judgment.
>
> Also, once it's selected a document fragment to display, it will highlight
> all words within that fragment that appear in the search query, whether or
> not the particular occurrence is part of the match-if-any. Thus
>
> regression=# select ts_headline('Alpha Beta Gama foo bar alpha gama',
> phraseto_tsquery ('alpha gama'));
> ts_headline
> ----------------------------------------------------------------
> <b>Alpha</b> Beta <b>Gama</b> foo bar <b>alpha</b> <b>gama</b>
> (1 row)
>
> Again, this is a value judgment about what's useful.
>
> regards, tom lane
>
