Skip site navigation (1) Skip section navigation (2)

Re: Operator Classes and ANALYZE

From: Rusty Conover <rconover(at)infogears(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: Operator Classes and ANALYZE
Date: 2006-11-06 21:47:55
Message-ID: 77971F3B-C24E-4155-8B8B-09F2AE0C0903@infogears.com (view raw or flat)
Thread:
Lists: pgsql-bugs
On Nov 6, 2006, at 1:53 PM, Simon Riggs wrote:

> On Fri, 2006-11-03 at 15:16 -0700, Rusty Conover wrote:
>
>> It doesn't appear that ANALYZE uses the specified operator class for
>> producing statistics on an index when that operator class is not the
>> default for the data type.  This appears to be leading to poor query
>> planning.
>
>> For speed of indexing a few million urls I'm indexing them with a
>> custom data type. The data type aptly named "urlhash" has:
>
> Have you read the CREATE TYPE man page, specifically with regard to  
> the
> analyze_function clause? Basically, if you want anything different,  
> you
> have to write an analysis function yourself. This is what PostGIS
> (www.postgis.org) does, if you want to look for specific code  
> examples.


I have read it and works great, because it says it uses the default  
operators = and < to build the statistics which should work great for  
the default operator class.    I don't think I need to write my own  
analyze function because the default behavior works so well.  I just  
want the ANALYZE call to use the index's opclass definitions of = and  
< if the index is created with a custom operator class that is not  
the default for the data type.


>
> ANALYZE collects stats for tables, not indexes, using the default
> operator class for the datatype. So even though you've clearly  
> specified
> an opclass for the index, no stats will be collected using it.
>

I don't think thats correct because, with indexes based on functions  
there are statistics that clearly can't be gathered just directly  
from the table.  Analyze does look at indexes to build up the  
histogram statistics and correlation.  See compute_index_stats in  
analyze.c.


> Alternatively, perhaps you have fallen foul of this situation?
>

> /*
>  * Can't analyze if the opclass uses a storage type
>  * different from the expression result type. We'd get
>  * confused because the type shown in pg_attribute for
>  * the index column doesn't match what we are getting
>  * from the expression. Perhaps this can be fixed
>  * someday, but for now, punt.
>  */
>
> It's in the analyze.c code, but not in the docs.
>
> Be interested in a full report of your research, once you're done.
>

I don't think this applies because the storage type is the same its  
just a separate operator class being used.

I think the problem may be with the "ordering_oper" and  
"equality_oper" only being passed the type rather then the type and  
the operator class that's being used for the index.  Looking more it  
goes back into typcache.c which only caches the default operators  
from the default btree or hash operator classes.  Changing this looks  
like it would require a bit of effort if it is the right path to pursue.

Thanks,

Rusty
--
Rusty Conover
InfoGears Inc.
Web: http://www.infogears.com



In response to

Responses

pgsql-bugs by date

Next:From: Simon RiggsDate: 2006-11-06 22:20:54
Subject: Re: Operator Classes and ANALYZE
Previous:From: Simon RiggsDate: 2006-11-06 20:53:28
Subject: Re: Operator Classes and ANALYZE

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group