Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jean-Pierre Pelletier <jppelletier(at)e-djuster(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <obartunov(at)gmail(dot)com>
Subject: Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?
Date: 2016-06-07 22:05:10
Message-ID: 16167.1465337110@sss.pgh.pa.us
Lists: pgsql-hackers

Jean-Pierre Pelletier <jppelletier(at)e-djuster(dot)com> writes:
> I wanted to test whether phraseto_tsquery(), new in 9.6, could be used for
> matching consecutive words, but it won't work for us if it cannot handle
> consecutive *duplicate* words.

> For example, the following returns true:
> select phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue');

> Is this expected ?

I concur that that seems like a rather useless behavior. If we have
"x <-> y" it is not possible to match at distance zero, while if we
have "x <-> x" it seems unlikely that the user is expecting us to
treat that identically to "x". So phrase search simply should not
consider distance-zero matches.
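To make the argument concrete, here is a small Python toy model, not PostgreSQL's actual code. It assumes (my reading of the report, not a quotation from the source) that the pre-patch check accepted any position offset from 0 up to the operator's distance, and that the fix raises the lower bound to 1. The function name and the `allow_zero` flag are hypothetical.

```python
def phrase_match(left_positions, right_positions, max_distance, allow_zero):
    """Toy model of 'left <-> right': True if some right position follows
    some left position by an acceptable offset.

    allow_zero=True models the assumed pre-patch check (offset 0 accepted);
    allow_zero=False models the proposed behavior (offset must be >= 1).
    """
    lowest = 0 if allow_zero else 1
    return any(
        lowest <= r - l <= max_distance
        for l in left_positions
        for r in right_positions
    )


# to_tsvector('simple', 'blue') stores the lexeme 'blue' at position 1 only.
single_blue = [1]
# to_tsvector('simple', 'blue blue') stores 'blue' at positions 1 and 2.
double_blue = [1, 2]

# 'blue' <-> 'blue' against a single 'blue': only a distance-zero match exists.
print(phrase_match(single_blue, single_blue, 1, allow_zero=True))
print(phrase_match(single_blue, single_blue, 1, allow_zero=False))
# Against two consecutive occurrences, a genuine distance-one match exists.
print(phrase_match(double_blue, double_blue, 1, allow_zero=False))
```

In this model the single-'blue' document matches only when offset zero is allowed, while the two-'blue' document still matches once it is disallowed, which is the behavior the original poster was asking for.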

The attached one-liner patch seems to fix this problem, though I am
uncertain whether any other places need to be changed to match.
Also, there is a regression test case that changes:

*** /home/postgres/pgsql/src/test/regress/expected/tstypes.out Thu May 5 19:21:17 2016
--- /home/postgres/pgsql/src/test/regress/results/tstypes.out Tue Jun 7 17:55:41 2016
***************
*** 897,903 ****
  SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:A');
   ts_rank_cd
  ------------
!   0.0714286
  (1 row)

  SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:B');
--- 897,903 ----
  SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:A');
   ts_rank_cd
  ------------
!           0
  (1 row)

  SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:B');

I'm not sure if this case is intentionally exhibiting the behavior that
both parts of "s:* <-> sa:A" can be matched to the same lexeme, or if the
result simply wasn't thought about carefully.
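The following Python sketch illustrates why this test case flips. It is a toy model under stated assumptions, not the tsquery executor: the helper names are hypothetical, the query is evaluated left to right as (a <-> s:*) <-> sa:A, a phrase match is taken to sit at its right operand's position, and unlabeled positions get the default weight D.

```python
# Toy representation of ' a:1 sa:2A sb:2D g'::tsvector:
# lexeme -> list of (position, weight); 'g' carries no position.
TSVECTOR = {"a": [(1, "D")], "sa": [(2, "A")], "sb": [(2, "D")], "g": []}


def positions(vector, term, prefix=False, weight=None):
    """Positions of lexemes equal to `term` (or starting with it when
    prefix=True), optionally restricted to one weight label."""
    found = set()
    for lexeme, entries in vector.items():
        if lexeme == term or (prefix and lexeme.startswith(term)):
            for pos, w in entries:
                if weight is None or w == weight:
                    found.add(pos)
    return found


def follows(left, right, allow_zero):
    """Positions in `right` lying within offset 1 after a position in `left`;
    offset 0 is accepted only when allow_zero is True."""
    lowest = 0 if allow_zero else 1
    return {r for r in right for l in left if lowest <= r - l <= 1}


def chain_matches(vector, allow_zero):
    """Evaluate the toy equivalent of 'a <-> s:* <-> sa:A'."""
    a = positions(vector, "a")
    s_prefix = positions(vector, "s", prefix=True)
    sa_weight_a = positions(vector, "sa", weight="A")
    return bool(follows(follows(a, s_prefix, allow_zero), sa_weight_a, allow_zero))


print(chain_matches(TSVECTOR, allow_zero=True))
print(chain_matches(TSVECTOR, allow_zero=False))
```

In this model, both "s:*" and "sa:A" can only be satisfied at position 2, so the chain succeeds only if the last step is allowed to match at offset zero, that is, if both operands land on the same lexeme "sa".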

regards, tom lane

Attachment: phrase-search-no-match-at-distance-0.patch (text/x-diff, 658 bytes)
