From: | Oleg Bartunov <obartunov(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Jean-Pierre Pelletier <jppelletier(at)e-djuster(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, Teodor Sigaev <teodor(at)sigaev(dot)ru> |
Subject: | Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ? |
Date: | 2016-06-08 20:39:42 |
Message-ID: | CAF4Au4wkjS6D2dG9Z1_VFJ95zojhwpVvkY4JGq6W-BwL3+tJyQ@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Jun 8, 2016 at 1:05 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Jean-Pierre Pelletier <jppelletier(at)e-djuster(dot)com> writes:
>> I wanted to test if phraseto_tsquery(), new with 9.6, could be used for
>> matching consecutive words but it won't work for us if it cannot handle
>> consecutive *duplicate* words.
>
>> For example, the following returns true: select
>> phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue');
>
>> Is this expected ?
>
> I concur that that seems like a rather useless behavior. If we have
> "x <-> y" it is not possible to match at distance zero, while if we
> have "x <-> x" it seems unlikely that the user is expecting us to
> treat that identically to "x". So phrase search simply should not
> consider distance-zero matches.

What about a word with several infinitives?

select to_tsvector('en', 'leavings');
      to_tsvector
------------------------
 'leave':1 'leavings':1
(1 row)

select to_tsvector('en', 'leavings') @@ 'leave <0> leavings'::tsquery;
 ?column?
----------
 t
(1 row)

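
To make the two cases concrete, here is a toy sketch in Python. It is not PostgreSQL's implementation: the names `TsVector` and `phrase_match` are made up for illustration, and it assumes that `<N>` means an exact positional distance of N. Under that assumption, 'blue <-> blue' cannot match a single 'blue', while an explicit `<0>` still matches two lexemes that the dictionary emitted at the same position.

```python
from typing import Dict, List

# Toy stand-in for a tsvector: lexeme -> word positions.
# Hypothetical model for illustration only, not PostgreSQL internals.
TsVector = Dict[str, List[int]]


def phrase_match(vec: TsVector, left: str, right: str, distance: int) -> bool:
    """Return True if `right` occurs exactly `distance` positions after `left`.

    Assumption: `left <N> right` requires an exact distance of N,
    and `<->` is shorthand for `<1>`.
    """
    left_positions = vec.get(left, [])
    right_positions = set(vec.get(right, []))
    return any(pos + distance in right_positions for pos in left_positions)


# to_tsvector('simple', 'blue') from the original report
blue: TsVector = {"blue": [1]}

# to_tsvector('en', 'leavings') from the example above:
# two lexemes share position 1
leavings: TsVector = {"leave": [1], "leavings": [1]}

# 'blue <-> blue' against a single 'blue': no second occurrence at position 2
blue_phrase = phrase_match(blue, "blue", "blue", 1)

# 'leave <0> leavings': both lexemes sit at the same position
same_position = phrase_match(leavings, "leave", "leavings", 0)

# 'leave <-> leavings': nothing at position 2
adjacent = phrase_match(leavings, "leave", "leavings", 1)

print(blue_phrase, same_position, adjacent)
```

The point of the sketch is only that refusing distance-zero matches for `<->` and allowing them for an explicit `<0>` are not in conflict in this simplified model.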
>
> The attached one-liner patch seems to fix this problem, though I am
> uncertain whether any other places need to be changed to match.
> Also, there is a regression test case that changes:
>
> *** /home/postgres/pgsql/src/test/regress/expected/tstypes.out Thu May 5 19:21:17 2016
> --- /home/postgres/pgsql/src/test/regress/results/tstypes.out Tue Jun 7 17:55:41 2016
> ***************
> *** 897,903 ****
> SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:A');
> ts_rank_cd
> ------------
> ! 0.0714286
> (1 row)
>
> SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:B');
> --- 897,903 ----
> SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:A');
> ts_rank_cd
> ------------
> ! 0
> (1 row)
>
> SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:B');
>
>
> I'm not sure if this case is intentionally exhibiting the behavior that
> both parts of "s:* <-> sa:A" can be matched to the same lexeme, or if the
> result simply wasn't thought about carefully.
>
> regards, tom lane
>