BUG #17569: false negative / positive results when using <-> (followed by) and tsvector limit (16383) hit

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: magicagent(at)gmail(dot)com
Subject: BUG #17569: false negative / positive results when using <-> (followed by) and tsvector limit (16383) hit
Date: 2022-08-03 17:51:10
Message-ID: 17569-47d11a72a38bf8ae@postgresql.org
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 17569
Logged by: Alex Malek
Email address: magicagent(at)gmail(dot)com
PostgreSQL version: 14.4
Operating system: Red Hat
Description:

It is well documented that "Position values in tsvector must be greater than
0 and no more than 16,383".

However, this limit can produce false positive or false negative search
results when doing a FOLLOWED BY (phrase) search in a document with more
than 16,383 words.

The false negative seems particularly bad / unexpected.

The false positive results happen when a word is at or before position
16,382, because every word at or past position 16,383 appears to be at
16,383:

SELECT tq, text, text @@ tq AS ok,
       repeat(' foo ',16381) || text @@ tq AS false_pos
FROM (VALUES (websearch_to_tsquery('"red cat"'),
              'red dogs chase with black cats')) t(tq, text);
       tq        |              text              | ok | false_pos
-----------------+--------------------------------+----+-----------
 'red' <-> 'cat' | red dogs chase with black cats | f  | t
(1 row)

The false negative happens for any phrase located at or after position
16,383, since all of its words appear to be at 16,383:

SELECT tq, text, text @@ tq AS small,
       repeat(' foo ',16381) || text @@ tq AS false_neg
FROM (VALUES (websearch_to_tsquery('"black cat"'),
              'red dogs chase with black cats')) t(tq, text);
        tq         |              text              | small | false_neg
-------------------+--------------------------------+-------+-----------
 'black' <-> 'cat' | red dogs chase with black cats | t     | f
(1 row)
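
Both results can be reproduced with a small Python model of the position
limit. This is only a simplified sketch, not PostgreSQL's implementation: it
assumes the lexemes are already normalized (so "cats" is written as "cat"),
treats the stop word "with" as an ordinary token that still occupies a
position, and the names MAX_POS, positions and followed_by are made up for
illustration.

```python
MAX_POS = 16383  # documented upper bound for tsvector position values


def positions(lexemes):
    """Map each lexeme to its 1-based positions, clamped to MAX_POS."""
    index = {}
    for i, lex in enumerate(lexemes, start=1):
        index.setdefault(lex, set()).add(min(i, MAX_POS))
    return index


def followed_by(index, left, right):
    """True if some position of `right` is exactly one past a position of `left`."""
    return any(p + 1 in index.get(right, ()) for p in index.get(left, ()))


doc = ["red", "dog", "chase", "with", "black", "cat"]
padded = ["foo"] * 16381 + doc  # "red" lands at 16382, the rest clamp to 16383

short, long_ = positions(doc), positions(padded)
print(followed_by(short, "red", "cat"), followed_by(long_, "red", "cat"))      # False True
print(followed_by(short, "black", "cat"), followed_by(long_, "black", "cat"))  # True False
```

In this model "red" stays at 16,382 while "cat" is clamped from 16,387 to
16,383, so the two look adjacent (false positive); "black" and "cat" are both
clamped to 16,383, so their distance is 0 instead of 1 (false negative).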
