From: | Teodor Sigaev <teodor(at)sigaev(dot)ru> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, obartunov(at)gmail(dot)com |
Cc: | Jean-Pierre Pelletier <jppelletier(at)e-djuster(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ? |
Date: | 2016-06-15 16:05:39 |
Message-ID: | 57617CD3.4040702@sigaev.ru |
Lists: | pgsql-hackers |
>> what's about word with several infinitives
>
>> select to_tsvector('en', 'leavings');
>> to_tsvector
>> ------------------------
>> 'leave':1 'leavings':1
>> (1 row)
>
>> select to_tsvector('en', 'leavings') @@ 'leave <0> leavings'::tsquery;
>> ?column?
>> ----------
>> t
>> (1 row)
The second example is not correct:
select phraseto_tsquery('en', 'leavings')
will produce 'leave | leavings'
and
select phraseto_tsquery('en', 'leavings cats')
will produce 'leave <-> cat | leavings <-> cat'
which seems correct, so we don't need special treatment of <0>.
> This brings up something else that I am not very sold on: to wit,
> do we really want the "less than or equal" distance behavior at all?
> The documentation gives the example that
> phraseto_tsquery('cat ate some rats')
> produces
> ( 'cat' <-> 'ate' ) <2> 'rat'
> because "some" is a stopword. However, that pattern will also match
> "cat ate rats", which seems surprising and unexpected to me; certainly
> it would surprise a user who did not realize that "some" is a stopword.
>
> So I think there's a reasonable case for decreeing that <N> should only
> match lexemes *exactly* N apart. If we did that, we would no longer have
> the misbehavior that Jean-Pierre is complaining about, and we'd not need
> to argue about whether <0> needs to be treated specially.
Agreed, that seems easy to change. I thought I had seen an issue with
hyphenated words but, fortunately, I had forgotten that the parts of a
hyphenated word don't share a position:
# select to_tsvector('foo-bar');
to_tsvector
-----------------------------
'bar':3 'foo':2 'foo-bar':1
# select phraseto_tsquery('foo-bar');
phraseto_tsquery
-----------------------------------
( 'foo-bar' <-> 'foo' ) <-> 'bar'
and
# select to_tsvector('foo-bar') @@ phraseto_tsquery('foo-bar');
?column?
----------
t
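To make the difference between the two rules concrete, here is a toy Python model of phrase matching on lexeme positions. It is only a sketch and not the code in the patch: the function name phrase_positions and the flag exact are made up for illustration, and the "less than or equal" branch assumes that a gap of 0 also counted, which is what the 'blue blue' report suggests.

```python
# Toy model of <N> matching on lexeme positions.
# Hypothetical helper for illustration; not PostgreSQL's implementation.

def phrase_positions(left, right, distance, exact=True):
    """Return the positions in `right` that follow a position in `left`.

    exact=True  : the gap must be exactly `distance` (the proposed rule).
    exact=False : any gap from 0 to `distance` counts (assumed old rule).
    """
    hits = set()
    for r in right:
        for l in left:
            gap = r - l
            if exact:
                ok = gap == distance
            else:
                ok = 0 <= gap <= distance
            if ok:
                hits.add(r)
    return sorted(hits)


# "cat ate rats" -> cat:1 ate:2 rat:3, query ( 'cat' <-> 'ate' ) <2> 'rat'
cat_ate = phrase_positions([1], [2], 1)                   # [2]
print(phrase_positions(cat_ate, [3], 2, exact=True))      # [] : no match
print(phrase_positions(cat_ate, [3], 2, exact=False))     # [3]: old rule matches

# 'blue':1 against 'blue' <-> 'blue'
print(phrase_positions([1], [1], 1, exact=True))          # [] : no match
print(phrase_positions([1], [1], 1, exact=False))         # [1]: old rule matches

# 'foo-bar':1 'foo':2 'bar':3 against ( 'foo-bar' <-> 'foo' ) <-> 'bar'
step = phrase_positions([1], [2], 1)
print(phrase_positions(step, [3], 1))                     # [3]: still matches
```

Under the exact rule the hyphenated case keeps matching because each part sits exactly one position after the previous one, while the 'blue blue' and "cat ate rats" cases stop matching.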
The patch is attached.
--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/
Attachment | Content-Type | Size |
---|---|---|
phrase_exact_distance.patch | binary/octet-stream | 4.9 KB |