Re: Rethinking our fulltext phrase-search implementation

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Rethinking our fulltext phrase-search implementation
Date: 2016-12-20 20:22:12
Message-ID: 24144.1482265332@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> I've been thinking about how to fix the problem Andreas Seltenreich
> reported at
> https://postgr.es/m/87eg1y2s3x.fsf@credativ.de

Attached is a proposed patch that deals with the problems discussed
here and in <26706.1482087250@sss.pgh.pa.us>. Is anyone interested
in reviewing this, or should I just push it?

BTW, I noticed that ts_headline() doesn't seem to behave all that nicely
for phrase searches, e.g.:

regression=# SELECT ts_headline('simple', '1 2 3 1 3'::text, '2 <-> 3', 'ShortWord=0');
          ts_headline
--------------------------------
 1 <b>2</b> <b>3</b> 1 <b>3</b>
(1 row)

Highlighting the second "3", which is not a match, seems pretty dubious.
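
To spell out why, here is a minimal Python sketch (not PostgreSQL's actual
phrase-matching code; the function name phrase_match_positions is made up
for illustration, and it assumes plain whitespace tokenization) that lists
where one token is immediately followed by another. For the text above it
finds only one pair, so only the first "3" belongs to a match:

```python
def phrase_match_positions(tokens, first, second, distance=1):
    """Return (pos_first, pos_second) pairs, 1-based, where `second`
    occurs exactly `distance` tokens after `first`."""
    matches = []
    for i, tok in enumerate(tokens):
        j = i - distance  # index where `first` would have to sit
        if tok == second and j >= 0 and tokens[j] == first:
            matches.append((j + 1, i + 1))
    return matches


tokens = "1 2 3 1 3".split()
print(phrase_match_positions(tokens, "2", "3"))  # [(2, 3)]
```

The "3" at position 5 follows a "1", not a "2", so it is not in the result.
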
Negated items are even worse; they don't change the result at all:

regression=# SELECT ts_headline('simple', '1 2 3 1 3'::text, '!2 <-> 3', 'ShortWord=0');
          ts_headline
--------------------------------
 1 <b>2</b> <b>3</b> 1 <b>3</b>
(1 row)

However, the code involved seems unrelated to the present patch, and
it's also about as close to completely uncommented as I've seen anywhere
in the PG code base. So I'm not excited about touching it.

regards, tom lane

Attachment                Content-Type   Size
fix-phrase-search.patch   text/x-diff    61.1 KB
