Re: Simplifying Text Search

From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Bruce Momjian" <bruce(at)momjian(dot)us>
Cc: "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Simplifying Text Search
Date: 2007-11-13 14:44:08
Message-ID: 877ikmjjd3.fsf@oxford.xeocode.com
Lists: pgsql-hackers

"Bruce Momjian" <bruce(at)momjian(dot)us> writes:

> I realized this when editing the documentation but not clearly. I
> noticed that:
>
> http://momjian.us/main/writings/pgsql/sgml/textsearch-intro.html#TEXTSEARCH-MATCHING
>
> tsvector @@ tsquery
> tsquery @@ tsvector
> text @@ tsquery
> text @@ text
>
> The first two of these we saw already. The form text @@ tsquery is
> equivalent to to_tsvector(x) @@ y. The form text @@ text is equivalent
> to to_tsvector(x) @@ plainto_tsquery(y).
>
> was quite odd, especially the "text @@ text" case, and in fact it makes
> casting almost required unless you can remember which one is a query and
> which is a vector (hint, the vector is first). What really adds to the
> confusion is that the operator is two _identical_ characters, meaning
> the operator is symmetric, and it behaves symmetrically if you cast one
> side, but as vector @@ query if you don't.

I find this odd as well. Effectively, rather than defining the casting
behaviour in a global way, we're defining operators specifically for text
which do the casts internally. That seems like a bad idea, especially
given that other data types implement @@ operators as well.

I feel like the right idea is to throw out all but tsvector @@ tsquery and
define casts as necessary to get that to work in every (non-inverted) case
above.

Actually the only case which wouldn't work with just that is a bare

'foo' @@ 'bar'

And even that would work fine until you load _int.sql or ltree.sql which
define conflicting operators.
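
As a rough illustration of the forms discussed above, here is how the text cases look when spelled out so that only tsvector @@ tsquery is needed. This is a sketch, not the proposed casts themselves: it uses explicit to_tsvector(), to_tsquery() and plainto_tsquery() calls, relies on the default text search configuration, and the sample strings are made up.

```sql
-- The one operator that would remain: tsvector @@ tsquery.
SELECT to_tsvector('a fat cat sat on a mat') @@ to_tsquery('cat & mat');

-- The "text @@ text" form written out using the equivalence from the
-- documentation quoted above: to_tsvector(x) @@ plainto_tsquery(y).
SELECT to_tsvector('a fat cat sat on a mat') @@ plainto_tsquery('cat mat');
```

Written this way, which side is the document and which side is the query is visible in the statement itself rather than implied by operand order.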

Separately, I feel like we should rename this operator to something like ~= or
=? instead. @@ doesn't look like any kind of equality or matching operator, and
it looks symmetric. We also already have @@ operators for geometric data types,
which are prefix unary operators, and this is very different from those.

I would suggest something like =?

PS: I thought of a wacky idea which would look neat but be mainly silly. I
thought I would mention it anyway, though. Suppose we define a unary postfix
operator "text ?" which just casts text to tsquery, and then define a
"text = tsquery" operator which does what @@ does. Then you could write
queries like:

WHERE col = 'foo & bar' ?
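
To make the idea concrete, here is a minimal, untested sketch of what those two definitions might look like. The function names text_as_tsquery and text_matches_tsquery are hypothetical, the default text search configuration is assumed, and it only applies to servers that still accept postfix operators (support for them was removed in PostgreSQL 14).

```sql
-- Hypothetical helper: parse a text value as a tsquery.
CREATE FUNCTION text_as_tsquery(text) RETURNS tsquery
    AS 'SELECT to_tsquery($1)' LANGUAGE sql;

-- Postfix "?" on text: only LEFTARG is given, so the operand is on the left.
CREATE OPERATOR ? (LEFTARG = text, PROCEDURE = text_as_tsquery);

-- Hypothetical helper: does what @@ does for text against a tsquery.
CREATE FUNCTION text_matches_tsquery(text, tsquery) RETURNS boolean
    AS 'SELECT to_tsvector($1) @@ $2' LANGUAGE sql;

-- "text = tsquery" as a matching operator.
CREATE OPERATOR = (LEFTARG = text, RIGHTARG = tsquery,
                   PROCEDURE = text_matches_tsquery);

-- Usage, with an explicit cast and parentheses to keep parsing unambiguous.
SELECT 'a fat cat'::text = ('fat & cat'::text ?);
```

Whether the unparenthesized form with a bare string literal parses and resolves the way the WHERE clause above suggests is not something this sketch verifies.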

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!
