I've been poking at the FTS problem example recently submitted by
Artur Dabrowski. It contains an index declared like so:
CREATE INDEX idx_keywords_ger ON search_tab
USING gin (to_tsvector('german'::regconfig, keywords));
which can be queried like so
select * from search_tab where
to_tsvector('german', keywords) @@ to_tsquery('german', 'whatever');
which is all fine and matches suggestions in our documentation. But I
was dismayed to discover that ANALYZE wasn't storing any statistics
for this expression index, which rather hobbles the planner's ability
to produce useful estimates for the selectivity of such WHERE clauses.
The reason for that turns out to be this bit in analyze.c:
* Can't analyze if the opclass uses a storage type
* different from the expression result type. We'd get
* confused because the type shown in pg_attribute for
* the index column doesn't match what we are getting
* from the expression. Perhaps this can be fixed
* someday, but for now, punt.
if (exprType(indexkey) !=
Since a tsvector index does have a storage type different from tsvector
(in fact, it's shown as TEXT), this code decides to punt and not analyze
the column. I think it's past time we did something about this.
After a bit of study of the code, it appears to me that it's not too
difficult to fix: we just have to use the expression's result type
rather than the index column's atttypid in the subsequent processing.
ANALYZE never actually looks at the index column contents (indeed
cannot, since our index AMs provide no API for extracting the contents),
so atttypid isn't really all that relevant to it. However, this'll
imply an API/ABI break for custom typanalyze functions: we'll need to
store the actual type OID in struct VacAttrStats, and use that rather
than looking to attr->atttypid or any of the other type-dependent
fields of the Form_pg_attribute. So that seems to make it not practical
I thought of an ugly hack that would avoid an API/ABI break: since
VacAttrStats.attr is a copy of the pg_attribute data, we could scribble
on it, changing atttypid, attlen, attbyval, etc to match the index
expression result type. This seems pretty grotty, but it would allow
the fix to be back-patched into existing branches.
Comments? I'm not sure which way to jump here.
regards, tom lane
pgsql-hackers by date
|Next:||From: Tom Lane||Date: 2010-07-31 04:40:03|
|Subject: rbtree code breaks GIN's adherence to maintenance_work_mem|
|Previous:||From: Robert Haas||Date: 2010-07-31 01:55:45|
|Subject: Re: reducing NUMERIC size for 9.1|