Re: Simplifying Text Search

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Simplifying Text Search
Date: 2007-11-20 19:25:32
Message-ID: 200711201925.lAKJPWq19902@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


This has been saved for the 8.4 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---------------------------------------------------------------------------

Simon Riggs wrote:
> Something Tom Dunstan just mentioned has made me ask the question "Why
> does our full text search feature look so strange?". It's the
> operator-laden syntax that causes the problem.
>
> By any stretch, this query is difficult for most people to understand:
>
> SELECT * FROM text_table
> WHERE to_tsvector('fat cats ate fat rats') @@ to_tsquery('fat & rat');
>
> Wouldn't it be much simpler to just have a function, so we can write
> this query like this?
>
> SELECT * FROM text_table
> WHERE text_search('haystack needle haystack', 'needle');
>
> We then explain to people that while the above is nice, it will presume
> that both the function inputs are Text, which isn't any good for complex
> searches, indexing and dictionaries etc.., so then we move to:
>
> SELECT * FROM text_table
> WHERE text_search('haystack needle haystack'::tsvector,
> 'needle'::tsquery);
>
> or perhaps
>
> SELECT * FROM text_table
> WHERE full_text_search('haystack needle haystack', 'needle & hay');
>
> which would automatically do the conversions to tsvector and tsquery for
> us. No more tedious casting, easy to read.
>
> [*text_search() functions would return bool]
>
> So we end up with a normal sounding function that is overloaded to
> provide all of the various goodies. We can include the text_search(text,
> text) version of the function in the normal chapter on functions, with a
> pointer to the more complex stuff elsewhere.
>
> Sound good?
>
> We can then explain everything without having to use @@ operators. They
> can then be introduced as an option.
>
> The side benefit of this is that we can then allow our wonderful new
> functionality to be more easily usable by things like Hibernate. We just
> tell them we have this new function and thats all they need to know.
>
> I know that under the covers the @@ operator is necessary because we
> hang various pieces of optimizer information from it. Each function
> signature gets an operator with matching signature, so there's a 1:1
> correspondence in most use cases. So to make this all hang together,
> there'd need to be a some smarts that says: if there is only one
> operator on a function then use the operator's optimizer information
> when you see just the function. That information can be assessed at DDL
> time, so we can keep accurate track of operator counts in pgproc.
>
> An alternative approach might be to make the first operator created on a
> function the "primary" operator. All other operators would then be
> secondary operators, so that adding operators would not change the
> inference mechanism.
>
> I've not got sufficient knowledge to say how hard the
> function-to-operator inference is, but it would be dang useful in making
> text search and many other programs readable and easy to interface to.
> In the end that is going to mean wider usage of that functionality, with
> more people feeling like they can dip their toes into the water.
>
> I must confess I have insufficient time to do this myself right now, not
> least me discovering exactly how. I'm spending time on this now because
> I'm the one that has to explain this stuff to people and things like
> this can make a huge difference in their understanding and eventual
> uptake.
>
> Thoughts?
>
> --
> Simon Riggs
> 2ndQuadrant http://www.2ndQuadrant.com
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: You can help support the PostgreSQL project by donating at
>
> http://www.postgresql.org/about/donate

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2007-11-20 19:29:42 Re: possible to create multivalued index from xpath() results in 8.3?
Previous Message Bruce Momjian 2007-11-20 19:24:13 Re: [HACKERS] pg_ctl -t N register ??