On Thu, 2009-10-22 at 18:28 +0200, Jesper Krogh wrote:
> I somehow would expect the index-search to take advantage of the MCV's
> informations in the statistics that sort of translate it into a search
> and post-filtering (as PG's queryplanner usually does at the SQL-level).

MCVs are full values that are found in columns or indexes -- you aren't
likely to have two entire documents that are exactly equal, so MCVs are
useless in your example.

I believe that stop words are a more common way of accomplishing what
you want to do, but they are slightly more limited: they are dropped
from both the tsvector and the query, so they won't be checked at any
level, and are best used for truly common words like "and". From your
example, I assume that you still want the word checked, but it's not
selective enough to be usefully checked by the index.

In effect, what you want are words that aren't searched (or stored) in
the index, but are included in the tsvector (so the RECHECK still
works). That sounds like it would solve your problem and it would reduce
index size, improve update performance, etc. I don't know how difficult
it would be to implement, but it sounds reasonable to me.
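
To make the idea concrete, here is a toy model in Python rather than
anything PostgreSQL actually does. It is only a sketch: the names
COMMON_WORDS, build_index and search are made up for illustration, the
"tsvector" is just a set of lowercased words, and the "index" is a plain
inverted index that skips the common words. The search uses the index
for the selective words only, then rechecks every candidate against the
full word set.

```python
from collections import defaultdict

# Hypothetical list of "common words": kept in each document's word set
# (the tsvector analogue) but never stored in the index.
COMMON_WORDS = {"database"}


def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for a real text parser."""
    return set(text.lower().split())


def build_index(docs):
    """Build an inverted index (word -> set of doc ids) that skips common words."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text) - COMMON_WORDS:
            index[word].add(doc_id)
    return index


def search(docs, index, query):
    """AND-search: use the index for indexed words, then recheck every candidate."""
    terms = tokenize(query)
    indexed_terms = terms - COMMON_WORDS
    if indexed_terms:
        candidates = set.intersection(*(index.get(t, set()) for t in indexed_terms))
    else:
        # Only common words in the query: nothing to look up, so scan everything.
        candidates = set(docs)
    # Recheck against the full word set, which still contains common words.
    return sorted(d for d in candidates if terms <= tokenize(docs[d]))


docs = {
    1: "postgres database tuning",
    2: "postgres release notes",
    3: "database design",
}
index = build_index(docs)
```

With this data, search(docs, index, "postgres database") gets documents
1 and 2 from the index and the recheck drops document 2, because only
the recheck ever looks at "database". A query made up entirely of common
words falls back to scanning every document, which is the cost of
keeping them out of the index.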

One disadvantage is that it's more metadata to manage -- all of the
existing data like dictionaries and stop words, plus this new list of
"common words". Also, it would mean that almost every match requires
RECHECK. It would be interesting to know how common a word needs to be
before it's better to leave it out of the index.
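
As a starting point for that question, one could at least measure how
common each word is. The sketch below (Python, with made-up function
names and sample data) computes the fraction of documents containing
each word and lists those at or above a cutoff. The cutoff is an
arbitrary parameter here, not a recommendation; finding the real
break-even point would take benchmarking against actual index and
recheck costs.

```python
from collections import Counter


def document_frequencies(docs):
    """Return {word: fraction of documents that contain the word}."""
    counts = Counter()
    for text in docs:
        counts.update(set(text.lower().split()))  # count each word once per doc
    return {word: n / len(docs) for word, n in counts.items()}


def common_word_candidates(docs, threshold):
    """Words whose document frequency is at or above the (assumed) threshold."""
    freqs = document_frequencies(docs)
    return sorted(w for w, f in freqs.items() if f >= threshold)


# Made-up sample documents, for illustration only.
docs = [
    "postgres database tuning",
    "postgres release notes",
    "database design",
    "postgres database backup",
]
```

For example, common_word_candidates(docs, 0.75) picks out "database" and
"postgres", which each appear in three of the four sample documents.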