Skip site navigation (1) Skip section navigation (2)

Operator Classes and ANALYZE

From: Rusty Conover <rconover(at)infogears(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Operator Classes and ANALYZE
Date: 2006-11-03 22:16:35
Message-ID: 46F770BA-EDFD-49B5-BF62-1B6F6BB1C185@infogears.com (view raw or flat)
Thread:
Lists: pgsql-bugs
Hi All,

It doesn't appear that ANALYZE uses the specified operator class for  
producing statistics on an index when that operator class is not the  
default for the data type.  This appears to be leading to poor query  
planning.

I'm using:

# select version();
-[ RECORD  
1 ]--------------------------------------------------------------------- 
-------------------------
version | PostgreSQL 8.1.5 on i686-pc-linux-gnu, compiled by GCC gcc  
(GCC) 4.1.0 20060304 (Red Hat 4.1.0-3)

For speed of indexing a few million urls I'm indexing them with a  
custom data type. The data type aptly named "urlhash" has:

A 2 byte host id (just the hostname, not many hosts are involved so  
this is fine)
A 2 byte locality id (parts of the path)
A 16 byte MD5 of the full url all in a custom data type called aptly.

For a total length of 20 bytes.  By default the text output of the  
data type is of the form of ([data in hex]).

Everything is nice and fast (due to good locality) for the default  
operator class that respects the entire data type.  Everything is  
unique so there is an n_distinct of -1 in pg_stats for the index.

Now, I created an operator class for urlhash that just does  
comparisons on the host id "urlhash_host_ops" and I created the index  
using:

# create index url_uh2_idx on url using btree  (texttourlhash(url)  
urlhash_host_ops);

After running vacuum analyze that statistics for the new index using  
the second operator class still say that every value is unique.  It  
appears it's using the default operator class to produce the  
statistics.  By doing so, the planner doesn't work great since it  
assumes that only one row will be returned.

Note: I prefixed all default operators with @ to denote the host class.

# explain select * from url where url @= 'http:// 
www.prolitegear.com/'::urlhash ;
                                        QUERY PLAN
------------------------------------------------------------------------ 
-----------------
Index Scan using url_uh2_idx on url  (cost=0.00..6.02 rows=1 width=105)
    Index Cond: ((url)::urlhash @=  
'(93d48c2ab505280c99322630a24f61c6533bc368)'::urlhash)


# select count(*) from url where url @= 'http:// 
www.prolitegear.com/'::urlhash;
count
-------
   6992

There are hosts with orders of magnitude difference in total number  
of rows (100,000 or more), this was just chosen as an example.

Here are the stats for the index:

# select * from pg_stats where tablename = 'url_uh2_idx';
-[ RECORD 1 ]----- 
+----------------------------------------------------------------------- 
------------------------------------------------------------------------ 
------------------------------------------------------------------------ 
------------------------------------------------------------------------ 
------------------------------------------------------------------------ 
------------------------------------------------------------------------ 
--------------------------------------------
schemaname        | public
tablename         | url_uh2_idx
attname           | pg_expression_1
null_frac         | 0
avg_width         | 20
n_distinct        | -1
most_common_vals  |
most_common_freqs |
histogram_bounds  | {(11890a55087088877a5a01ea5b8a8459e62491c9), 
(3c983e95c87b3ba8669ade6fe1be797495be671f), 
(4ac4698bc9a61b94eef527c6599217a0bbcd1ad6), 
(4ac4b7c874b9b16ceab038e93646510a8fb756ca), 
(60d74740f0a49e68e89e01232373ed2da0ca4c2f), 
(67fe336eb73e327491a4e8bf224bce5857879188), 
(7975336e56ad4ee211ebee9ff2a6f1e4c9e43ca5), 
(88a9336e7a8e5419ff63ed1cec6985d91cd15688), 
(909b76d12cf6c84255a33a008bf91f0cf0f711c4), 
(d687336e90c41fc92776dc35896797ef873c38f5), 
(f4aaba84e7e86a4c7d7c4d5930128cb3a5889cbf)}
correlation       | -0.109073

Should I just create a cast function rather then an entire operator  
class that just respects the host part?  Is this bad style?  If you  
have questions I'd be happy to explain more.

Thanks,

Rusty
--
Rusty Conover
InfoGears Inc.
Web: http://www.infogears.com




Responses

pgsql-bugs by date

Next:From: Harald Armin MassaDate: 2006-11-04 00:17:30
Subject: Re: BUG #2731: Cannot install PostgreSQL server on WinXP Media Center Edition
Previous:From: Jeff DavisDate: 2006-11-03 20:25:22
Subject: Re: COPY fails on 8.1 with invalid byte sequences in text

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group