Re: Operator Classes and ANALYZE

From: Rusty Conover <rconover(at)infogears(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: Operator Classes and ANALYZE
Date: 2006-11-06 21:47:55
Message-ID: 77971F3B-C24E-4155-8B8B-09F2AE0C0903@infogears.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs


On Nov 6, 2006, at 1:53 PM, Simon Riggs wrote:

> On Fri, 2006-11-03 at 15:16 -0700, Rusty Conover wrote:
>
>> It doesn't appear that ANALYZE uses the specified operator class for
>> producing statistics on an index when that operator class is not the
>> default for the data type. This appears to be leading to poor query
>> planning.
>
>> For speed of indexing a few million urls I'm indexing them with a
>> custom data type. The data type aptly named "urlhash" has:
>
> Have you read the CREATE TYPE man page, specifically with regard to
> the
> analyze_function clause? Basically, if you want anything different,
> you
> have to write an analysis function yourself. This is what PostGIS
> (www.postgis.org) does, if you want to look for specific code
> examples.

I have read it and works great, because it says it uses the default
operators = and < to build the statistics which should work great for
the default operator class. I don't think I need to write my own
analyze function because the default behavior works so well. I just
want the ANALYZE call to use the index's opclass definitions of = and
< if the index is created with a custom operator class that is not
the default for the data type.

>
> ANALYZE collects stats for tables, not indexes, using the default
> operator class for the datatype. So even though you've clearly
> specified
> an opclass for the index, no stats will be collected using it.
>

I don't think thats correct because, with indexes based on functions
there are statistics that clearly can't be gathered just directly
from the table. Analyze does look at indexes to build up the
histogram statistics and correlation. See compute_index_stats in
analyze.c.

> Alternatively, perhaps you have fallen foul of this situation?
>

> /*
> * Can't analyze if the opclass uses a storage type
> * different from the expression result type. We'd get
> * confused because the type shown in pg_attribute for
> * the index column doesn't match what we are getting
> * from the expression. Perhaps this can be fixed
> * someday, but for now, punt.
> */
>
> It's in the analyze.c code, but not in the docs.
>
> Be interested in a full report of your research, once you're done.
>

I don't think this applies because the storage type is the same its
just a separate operator class being used.

I think the problem may be with the "ordering_oper" and
"equality_oper" only being passed the type rather then the type and
the operator class that's being used for the index. Looking more it
goes back into typcache.c which only caches the default operators
from the default btree or hash operator classes. Changing this looks
like it would require a bit of effort if it is the right path to pursue.

Thanks,

Rusty
--
Rusty Conover
InfoGears Inc.
Web: http://www.infogears.com

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Simon Riggs 2006-11-06 22:20:54 Re: Operator Classes and ANALYZE
Previous Message Simon Riggs 2006-11-06 20:53:28 Re: Operator Classes and ANALYZE