Re: Updated tsearch documentation

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Updated tsearch documentation
Date: 2007-07-17 21:24:09
Message-ID: 200707172124.l6HLO9212328@momjian.us
Lists: pgsql-advocacy pgsql-hackers

Oleg Bartunov wrote:
> On Tue, 17 Jul 2007, Bruce Momjian wrote:
>
> > I think the tsearch documentation is nearing completion:
> >
> > http://momjian.us/expire/fulltext/HTML/textsearch.html
> >
> > but I am not happy with how tsearch is enabled in a user table:
> >
> > http://momjian.us/expire/fulltext/HTML/textsearch-app-tutorial.html
> >
> > Aside from the fact that it needs more examples, it only illustrates an
> > example where someone creates a table, populates it, then adds a
> > tsvector column, populates that, then creates an index.
> >
> > That seems quite inflexible. Is there a way to avoid having a separate
> > tsvector column? What happens if the table is dynamic? How is that
> > column updated based on table changes? Triggers? Where are the
> > examples? Can you create an index like this:
>
> I agree that there could be more examples, but text search doesn't
> require anything special!
> *Example* of trigger function is documented on
> http://momjian.us/expire/fulltext/HTML/textsearch-opfunc.html

Yes, I see that in tsearch() here:

http://momjian.us/expire/fulltext/HTML/textsearch-opfunc.html#TEXTSEARC$

I assume my_filter_name is optional, right? I have updated the prototype
to be:

tsearch([vector_column_name], [my_filter_name], text_column_name [, ... ])

Is this accurate? What does this text below it mean?

There can be many functions and text columns specified in a tsearch()
trigger. The following rule is used: a function is applied to all
subsequent TEXT columns until the next matching column occurs.

Why are we allowing my_filter_name here? Isn't that something for a
custom trigger? Is calling it tsearch() a good idea? Why not
tsvector_trigger()?

> > CREATE INDEX textsearch_id ON pgweb USING gin(to_tsvector(column));
> >
> > That avoids having to have a separate column because you can just say:
> >
> > WHERE to_tsquery('XXX') @@ to_tsvector(column)
>
> yes, it's possible, but without ranking, since currently it's impossible
> to store any information in the index (it's pg's feature). btw, this should
> also work for a GiST index.

What if they use @@@? Wouldn't that work, because it is going to check
the heap?

> That kind of search is useful if there is another natural ordering of search
> results, for example, by timestamp.
>
> >
> > How do we make sure that the to_tsquery is using the same text search
> > configuration as the 'column' or index? Perhaps we should suggest:
>
> please, keep in mind, it's not mandatory to use the same configuration
> at search time, that was used at index creation.

Well, sort of. If you have stop words in the tsquery configuration, you
aren't going to hit any matches in the tsvector, right? Same for
synonyms, I suppose. I can see that stemming would work if there was a
mismatch between tsquery and tsvector.
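
To make one direction of the stop-word mismatch concrete, here is a
minimal sketch. It assumes the 'english' and 'simple' configurations are
both available, that 'english' treats "the" as a stop word, and that
'simple' applies no stop-word list or stemming. The sample strings are
made up for illustration.

```sql
-- Vector built with 'english': the stop word "the" is dropped, so only
-- a lexeme for "cat" should be stored.
SELECT to_tsvector('english', 'the cat');

-- Query parsed with 'simple' (assumed to have no stop words): "the" is
-- kept as a query term, but the vector above never stored it, so no
-- match is expected.
SELECT to_tsvector('english', 'the cat') @@ to_tsquery('simple', 'the');

-- Same configuration on both sides: "cats" and "cat" should normalize
-- to the same lexeme, so a match is expected.
SELECT to_tsvector('english', 'the cats') @@ to_tsquery('english', 'cat');
```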

> > CREATE INDEX textsearch_idx ON pgweb USING gin(to_tsvector('english',column));
> >
> > so that at least the configuration is documented in the index.
>
> yes, it's better to always explicitly specify the configuration name and
> not rely on the default configuration.
> Unfortunately, the configuration name isn't saved in the index.

I was more concerned that there is nothing documenting the configuration
used by the index or the tsvector table column trigger. By doing:

CREATE INDEX textsearch_idx ON pgweb USING gin(to_tsvector('english',column));

you guarantee that the index uses 'english' for all its entries. If a
query omits the 'english' or uses a different configuration, its
expression no longer matches the index, so it will heap scan the table,
which at least gives the right answer.
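
As a sketch of that matching rule, assume a hypothetical text column
named body on pgweb (the statements above use "column" only as a
placeholder) and that a 'simple' configuration is also available:

```sql
CREATE INDEX textsearch_idx ON pgweb USING gin(to_tsvector('english', body));

-- Same to_tsvector() expression as the index definition, so the planner
-- can consider using textsearch_idx.
SELECT * FROM pgweb
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'XXX');

-- Different configuration inside to_tsvector(): the expression no longer
-- matches the index definition, so this is expected to scan the table.
SELECT * FROM pgweb
WHERE to_tsvector('simple', body) @@ to_tsquery('simple', 'XXX');
```

Running EXPLAIN on each query is the simplest way to confirm which plan
is actually chosen.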

Also, how do you guarantee that tsearch() triggers always use the same
configuration? The existing tsearch() API seems to make that
impossible. I am wondering if we need to add the configuration name as
a mandatory parameter to tsearch().
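
For comparison, a custom trigger can pin the configuration itself. This
is only a minimal sketch: it assumes a hypothetical pgweb table with
text columns title and body and a tsvector column textsearch, and that
PL/pgSQL is installed in the database. None of these names come from
the tsearch() API.

```sql
-- Hypothetical trigger function: rebuilds the tsvector column from the
-- text columns, always using the 'english' configuration.
CREATE FUNCTION pgweb_textsearch_update() RETURNS trigger AS $$
BEGIN
    NEW.textsearch := to_tsvector('english',
        coalesce(NEW.title, '') || ' ' || coalesce(NEW.body, ''));
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Keep the column current on every insert and update.
CREATE TRIGGER pgweb_textsearch_trigger
    BEFORE INSERT OR UPDATE ON pgweb
    FOR EACH ROW EXECUTE PROCEDURE pgweb_textsearch_update();
```

Here the configuration name is written once, in the function body, so
every row's tsvector is built the same way.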

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
