From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Chris Spencer <chrisspen(at)gmail(dot)com> |
Cc: | pgsql-novice(at)postgresql(dot)org |
Subject: | Re: How to enable partial matching on a GIN index |
Date: | 2016-01-11 20:33:37 |
Message-ID: | 2444.1452544417@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-novice |
Chris Spencer <chrisspen(at)gmail(dot)com> writes:
>> What's your grounds for claiming that?
> It returns no results whenever I use partial search terms.
Oh; that's got nothing to do with whether an index is used or not.
The index just makes it faster.
> If I search for,
> say, "hospital" it returns results containing the exact word "hospital",
> but if I search for "hosp" it returns nothing. The doc page explaining that
> PG "can" do partial matches, and not that it "does" do partial matches, led
> me to believe this is the expected default behaviour. Is this not the case?
There are a few problems with your example ...
> ALTER TABLE mytable ADD COLUMN search_index tsvector;
> CREATE INDEX mytable_search_index_gin ON mytable USING gin(search_index);
> INSERT INTO mytable (name, search_index) VALUES ('hospital',
> plainto_tsquery('pg_catalog.english', 'hospital'));
That couldn't have worked as written, because tsvector != tsquery.
I assume you meant to_tsvector() in the INSERT.
> SELECT * FROM mytable WHERE (search_index) @@
> (plainto_tsquery('pg_catalog.english', 'hospital')); -- returns results
Right ...
> SELECT * FROM mytable WHERE (search_index) @@
> (plainto_tsquery('pg_catalog.english', 'hosp')); -- returns nothing
This isn't right, you have to use
to_tsquery('pg_catalog.english', 'hosp:*')
to specify a partial match. You didn't give a partial match indicator
(":*"), and if you had, plainto_tsquery() would have stripped it off,
because it throws away all punctuation.
Another pitfall to keep in mind is that stemming may result in searches
not matching that look like they should. For example, what's really
stored in your tsvector column for this example is
# select to_tsvector('pg_catalog.english', 'hospital');
to_tsvector
-------------
'hospit':1
(1 row)
and a prefix match has to match or be a prefix of that string exactly.
So you might expect to_tsquery('pg_catalog.english', 'hospita:*')
to match that entry, but it won't. (In some cases, stemming of the query
word will hide this effect, which is why a search for 'hospital' works;
but in this example 'hospita' doesn't look enough like an English word
to trigger removal of 'a'.)
If you're confused, it always pays to look at the actual outputs of
the to_tsvector and to_tsquery functions. There's no particular magic
after that point, it's just string matching; but the word-break and
stemmer functions can do surprising things.
If you expect to be doing mostly prefix-type searches, you might end
up deciding you want to use the "simple" text search configuration,
which I'm pretty sure does no stemming at all.
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Chris Spencer | 2016-01-11 21:31:24 | Re: How to enable partial matching on a GIN index |
Previous Message | Chris Spencer | 2016-01-11 20:14:04 | Re: How to enable partial matching on a GIN index |