Re: Simplifying Text Search

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Simplifying Text Search
Date: 2007-11-13 06:48:39
Message-ID: 1194936519.2644.261.camel@ebony.site
Lists: pgsql-hackers

On Mon, 2007-11-12 at 23:03 -0500, Bruce Momjian wrote:
> Simon Riggs wrote:
> > On Mon, 2007-11-12 at 11:56 -0500, Tom Lane wrote:
> > > Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> > > > So we end up with a normal sounding function that is overloaded to
> > > > provide all of the various goodies.
> > >
> > > As best I can tell, @@ does exactly this already. This is just a
> > > different spelling of the same capability, and I don't actually
> > > find it better. Why is "text_search(x,y)" better than "x @@ y"?
> > > We don't recommend that people write "texteq(x,y)" instead of
> > > "x = y".
> >
> > Most people don't understand those differences. x = y means "make sure
> > they are the same" to most people. They don't see what you (and I) see:
> > function and operator interchangeability. So text_search() is better
> > than @@ and = is better than texteq(). Life ain't neat...
> >
> > Right now, Full Text Search SQL looks like complete gibberish and it
> > dissuades many people from using what is an awesome set of features. I
> > just want to add a little sugar to help people get started.
>
> I realized this when editing the documentation but not clearly. I
> noticed that:
>
> http://momjian.us/main/writings/pgsql/sgml/textsearch-intro.html#TEXTSEARCH-MATCHING
>
> tsvector @@ tsquery
> tsquery @@ tsvector
> text @@ tsquery
> text @@ text
>
> The first two of these we saw already. The form text @@ tsquery is
> equivalent to to_tsvector(x) @@ y. The form text @@ text is equivalent
> to to_tsvector(x) @@ plainto_tsquery(y).
>
> was quite odd, especially the "text @@ text" case, and in fact it makes
> casting almost required unless you can remember which one is a query and
> which is a vector (hint, the vector is first). What really adds to the
> confusion is that the operator is two _identical_ characters, meaning
> the operator is symmetric, and it behaves symmetrically if you cast one side,
> but as vector @@ query if you don't.

I'm thinking we can have an inlinable function

contains(text, text) returns int

Return values are limited to 0, 1, or NULL, as with SQL/MM.
The function is close to the SQL/MM one, but not an exact match.

contains(sourceText, searchText) is a macro for

    case to_tsvector(default_text_search_config, sourceText) @@
         to_tsquery(default_text_search_config, searchText)
      when true then 1
      when false then 0
      else null
    end

that allows us to write indexable queries like this

WHERE contains(sourceText, searchText) > 0

where the index must still have been built on a constant config.
I haven't checked that this still works; it may not, in which case we
need something slightly more complex to make sure it's still indexable.
This is the difficult part.
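
For reference, a rough sketch of what "built on a constant config" looks
like with an expression index. The table docs, the column body, the index
name, and the 'english' configuration are all made up for illustration,
and none of this has been checked against the contains() wrapper.

```sql
-- Hypothetical table, used only for this illustration.
CREATE TABLE docs (
    id   serial PRIMARY KEY,
    body text
);

-- The index expression names the configuration explicitly ('english' is an
-- assumed choice). An index expression has to be immutable, so it cannot
-- depend on the default_text_search_config setting.
CREATE INDEX docs_body_fts_idx
    ON docs USING gin (to_tsvector('english', body));

-- A query written against the same expression can use the index.
SELECT id
FROM docs
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'text & search');
```

Whether WHERE contains(body, '...') > 0 reduces to a form the planner can
match against an index like this is exactly the unverified part.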

So changes are:
- add SQL function
- simplify first 2 pages of docs using this function
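
A minimal sketch of what the SQL function might look like, assuming the
one-argument forms of to_tsvector() and to_tsquery() (which pick up
default_text_search_config) are an acceptable stand-in for passing the
config explicitly. This is a starting point for discussion, not a tested
patch.

```sql
-- Sketch only: returns 1, 0, or NULL as described above.
-- Marked STABLE because the result depends on default_text_search_config.
CREATE FUNCTION contains(text, text) RETURNS int AS $$
    SELECT CASE to_tsvector($1) @@ to_tsquery($2)
               WHEN true  THEN 1
               WHEN false THEN 0
               ELSE NULL
           END
$$ LANGUAGE sql STABLE;

-- Example call; the sample strings are arbitrary.
SELECT contains('a fat cat sat on a mat', 'cat & mat');
```

A NULL in either argument makes the @@ comparison NULL, which falls through
to the ELSE branch, so the function returns NULL in that case.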

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
