The examples in the previous section illustrated full text matching using simple constant strings. This section shows how to search table data, optionally using indexes.
It is possible to do a full text search without an index. A
simple query to print the
each row that contains the word
body field is:
SELECT title FROM pgweb WHERE to_tsvector('english', body) @@ to_tsquery('english', 'friend');
This will also find related words such as
since all these are reduced to the same normalized lexeme.
The query above specifies that the
english configuration is to be used to parse and
normalize the strings. Alternatively we could omit the
SELECT title FROM pgweb WHERE to_tsvector(body) @@ to_tsquery('friend');
This query will use the configuration set by default_text_search_config.
A more complex example is to select the ten most recent
documents that contain
table in the
SELECT title FROM pgweb WHERE to_tsvector(title || ' ' || body) @@ to_tsquery('create & table') ORDER BY last_mod_date DESC LIMIT 10;
For clarity we omitted the
coalesce function calls which would be needed to
find rows that contain
NULL in one of
the two fields.
Although these queries will work without an index, most applications will find this approach too slow, except perhaps for occasional ad-hoc searches. Practical use of text searching usually requires creating an index.
We can create a GIN index (Section 12.9) to speed up text searches:
CREATE INDEX pgweb_idx ON pgweb USING GIN (to_tsvector('english', body));
Notice that the 2-argument version of
to_tsvector is used. Only text search functions
that specify a configuration name can be used in expression indexes
(Section 11.7). This is
because the index contents must be unaffected by default_text_search_config.
If they were affected, the index contents might be inconsistent
because different entries could contain
tsvectors that were created with different text
search configurations, and there would be no way to guess which was
which. It would be impossible to dump and restore such an index
Because the two-argument version of
to_tsvector was used in the index above, only a
query reference that uses the 2-argument version of
to_tsvector with the same configuration name will
use that index. That is,
to_tsvector('english', body) @@ 'a & b' can use the
WHERE to_tsvector(body) @@ 'a
& b' cannot. This ensures that an index will be used
only with the same configuration used to create the index
It is possible to set up more complex expression indexes wherein the configuration name is specified by another column, e.g.:
CREATE INDEX pgweb_idx ON pgweb USING GIN (to_tsvector(config_name, body));
config_name is a column in
pgweb table. This allows mixed
configurations in the same index while recording which
configuration was used for each index entry. This would be useful,
for example, if the document collection contained documents in
different languages. Again, queries that are meant to use the index
must be phrased to match, e.g.,
to_tsvector(config_name, body) @@ 'a & b'.
Indexes can even concatenate columns:
CREATE INDEX pgweb_idx ON pgweb USING GIN (to_tsvector('english', title || ' ' || body));
Another approach is to create a separate
tsvector column to hold the output of
to_tsvector. This example is a concatenation of
to ensure that one field will still be indexed when the other is
ALTER TABLE pgweb ADD COLUMN textsearchable_index_col tsvector; UPDATE pgweb SET textsearchable_index_col = to_tsvector('english', coalesce(title,'') || ' ' || coalesce(body,''));
Then we create a GIN index to speed up the search:
CREATE INDEX textsearch_idx ON pgweb USING GIN (textsearchable_index_col);
Now we are ready to perform a fast full text search:
SELECT title FROM pgweb WHERE textsearchable_index_col @@ to_tsquery('create & table') ORDER BY last_mod_date DESC LIMIT 10;
When using a separate column to store the
tsvector representation, it is necessary to create a
trigger to keep the
body changes. Section 12.4.3
explains how to do that.
One advantage of the separate-column approach over an expression
index is that it is not necessary to explicitly specify the text
search configuration in queries in order to make use of the index.
As shown in the example above, the query can depend on
default_text_search_config. Another advantage is
that searches will be faster, since it will not be necessary to
to_tsvector calls to verify
index matches. (This is more important when using a GiST index than
a GIN index; see Section 12.9.)
The expression-index approach is simpler to set up, however, and it
requires less disk space since the
tsvector representation is not stored explicitly.
If you see anything in the documentation that is not correct, does not match your experience with the particular feature or requires further clarification, please use this form to report a documentation issue.