Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, obartunov(at)gmail(dot)com
Cc: Jean-Pierre Pelletier <jppelletier(at)e-djuster(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?
Date: 2016-06-15 16:05:39
Message-ID: 57617CD3.4040702@sigaev.ru
Lists: pgsql-hackers

>> what about a word with several infinitives?
>
>> select to_tsvector('en', 'leavings');
>> to_tsvector
>> ------------------------
>> 'leave':1 'leavings':1
>> (1 row)
>
>> select to_tsvector('en', 'leavings') @@ 'leave <0> leavings'::tsquery;
>> ?column?
>> ----------
>> t
>> (1 row)

The second example is not correct:

select phraseto_tsquery('en', 'leavings')
will produce 'leave | leavings'

and

select phraseto_tsquery('en', 'leavings cats')
will produce 'leave <-> cat | leavings <-> cat'

which seems correct, so we don't need special treatment of <0>.
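Here is a toy Python model of why the OR form needs no special <0> handling. It is a sketch, not PostgreSQL's implementation, and it makes two assumptions:

- A tsvector is treated as a mapping from lexeme to a set of positions.
- The positions come from the 'leavings' output quoted above ('leave' and 'leavings' both at 1), plus 'cat' at position 2.

The helper names phrase_matches and or_of_phrases are hypothetical.

```python
from typing import Dict, List, Set

# Toy model: lexeme -> set of positions (illustration only, not the real tsvector format).
TsVector = Dict[str, Set[int]]


def phrase_matches(vec: TsVector, lexemes: List[str]) -> bool:
    """True if the lexemes occur at consecutive positions (a chain of <-> operators)."""
    frontier = set(vec.get(lexemes[0], set()))
    for lex in lexemes[1:]:
        # Keep only positions exactly one step after a position that matched so far.
        frontier = {p for p in vec.get(lex, set()) if p - 1 in frontier}
    return bool(frontier)


def or_of_phrases(vec: TsVector, alternatives: List[List[str]]) -> bool:
    """Model of 'A <-> C | B <-> C': the query matches if any alternative phrase matches."""
    return any(phrase_matches(vec, alt) for alt in alternatives)


# 'leavings cats' (assumed positions): both dictionary outputs share position 1.
doc: TsVector = {"leave": {1}, "leavings": {1}, "cat": {2}}

# Shape of phraseto_tsquery('en', 'leavings cats') as quoted above:
# 'leave <-> cat | leavings <-> cat'
query = [["leave", "cat"], ["leavings", "cat"]]

print(or_of_phrases(doc, query))  # True
```

Each alternative only ever compares lexemes at different positions. The same-position case never reaches a phrase operator, so <0> has nothing special to do.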

> This brings up something else that I am not very sold on: to wit,
> do we really want the "less than or equal" distance behavior at all?
> The documentation gives the example that
> phraseto_tsquery('cat ate some rats')
> produces
> ( 'cat' <-> 'ate' ) <2> 'rat'
> because "some" is a stopword. However, that pattern will also match
> "cat ate rats", which seems surprising and unexpected to me; certainly
> it would surprise a user who did not realize that "some" is a stopword.
>
> So I think there's a reasonable case for decreeing that <N> should only
> match lexemes *exactly* N apart. If we did that, we would no longer have
> the misbehavior that Jean-Pierre is complaining about, and we'd not need
> to argue about whether <0> needs to be treated specially.
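The difference between the two rules can be modeled in a few lines of Python. This is a simplified sketch, not the tsquery executor, and it rests on these assumptions:

- The left side of a phrase operator is located at the position of its last lexeme.
- The current "less than or equal" rule accepts any distance from 0 to N, consistent with the 'blue blue' report in the subject line.
- The stopword "some" still consumes position 3.

All helper names are hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

TsVector = Dict[str, Set[int]]
Rule = Callable[[int, int], bool]


def exact(d: int, n: int) -> bool:
    """Proposed rule: <N> matches only lexemes exactly N positions apart."""
    return d == n


def at_most(d: int, n: int) -> bool:
    """Lax rule, assumed here to accept any distance from 0 to N."""
    return 0 <= d <= n


def chain_matches(vec: TsVector, steps: List[Tuple[str, int]], rule: Rule) -> bool:
    """Evaluate a left-associative chain such as ('cat' <-> 'ate') <2> 'rat'.

    steps holds (lexeme, N) pairs; N for the first lexeme is ignored. The left
    side is located at the position of its last matched lexeme (a simplification).
    """
    frontier = set(vec.get(steps[0][0], set()))
    for lex, n in steps[1:]:
        frontier = {p for p in vec.get(lex, set())
                    if any(rule(p - q, n) for q in frontier)}
    return bool(frontier)


cat_ate_rat = [("cat", 0), ("ate", 1), ("rat", 2)]
with_stop = {"cat": {1}, "ate": {2}, "rat": {4}}     # 'cat ate some rats'
without_stop = {"cat": {1}, "ate": {2}, "rat": {3}}  # 'cat ate rats'

blue_blue = [("blue", 0), ("blue", 1)]               # 'blue <-> blue'
one_blue = {"blue": {1}}                             # a single 'blue'

for name, rule in (("at_most", at_most), ("exact", exact)):
    print(name,
          chain_matches(with_stop, cat_ate_rat, rule),
          chain_matches(without_stop, cat_ate_rat, rule),
          chain_matches(one_blue, blue_blue, rule))
# at_most True True True
# exact True False False
```

Under the exact rule, "cat ate rats" no longer matches the stopword-derived <2>. The duplicated 'blue <-> blue' phrase also stops matching a single 'blue', which is the behavior argued for above.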

Agreed, that seems easy to change. I thought I saw an issue with
hyphenated words but, fortunately, I had forgotten that a hyphenated word and
its parts don't share a position:
# select to_tsvector('foo-bar');
to_tsvector
-----------------------------
'bar':3 'foo':2 'foo-bar':1
# select phraseto_tsquery('foo-bar');
phraseto_tsquery
-----------------------------------
( 'foo-bar' <-> 'foo' ) <-> 'bar'
and
# select to_tsvector('foo-bar') @@ phraseto_tsquery('foo-bar');
?column?
----------
t
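A small Python model shows why distinct positions keep hyphenated words safe under the exact rule. It is a sketch under stated assumptions, not the actual executor. The first vector copies the positions shown above. The second is a hypothetical layout in which 'foo-bar' and 'foo' share position 1, presumably the kind of case that would have caused trouble. The helper exact_chain is illustrative.

```python
from typing import Dict, List, Set

TsVector = Dict[str, Set[int]]


def exact_chain(vec: TsVector, lexemes: List[str]) -> bool:
    """Match lexemes joined by <-> under the exact rule (each exactly 1 position apart)."""
    frontier = set(vec.get(lexemes[0], set()))
    for lex in lexemes[1:]:
        frontier = {p for p in vec.get(lex, set()) if p - 1 in frontier}
    return bool(frontier)


phrase = ["foo-bar", "foo", "bar"]  # ( 'foo-bar' <-> 'foo' ) <-> 'bar'

actual = {"foo-bar": {1}, "foo": {2}, "bar": {3}}  # to_tsvector('foo-bar') above
print(exact_chain(actual, phrase))  # True

# Hypothetical layout where the compound and its first part share a position:
shared = {"foo-bar": {1}, "foo": {1}, "bar": {2}}
print(exact_chain(shared, phrase))  # False
```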

The patch is attached.

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/

Attachment Content-Type Size
phrase_exact_distance.patch binary/octet-stream 4.9 KB
