query plans and immutable functions

From: Reece Hart <reece(at)harts(dot)net>
To: SF PostgreSQL <sfpug(at)postgresql(dot)org>
Subject: query plans and immutable functions
Date: 2006-05-09 17:51:00
Message-ID: 1147197060.27835.54.camel@tallac.gene.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: sfpug

I've got a simple 1-predicate query that uses an index when the
condition is literal but does a seq scan when the condition uses the
result of an immutable function. This leads to terrible performance in
the latter case. I believe that this is a bug candidate and I'll
forward it to pgsql-bugs if other sfpug folks agree.

Environs:

unison(at)csb-dev=> select version();
version
------------------------------------------------------------------------------------------------------------------
PostgreSQL 8.1.3 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.0.2 20050901 (prerelease) (SUSE Linux)

unison(at)csb-dev=> \d pseq
Table "unison.pseq"
Column | Type | Modifiers
---------+--------------------------+--------------------------------------------------------
pseq_id | integer | not null default nextval('pseq_pseq_id_seq'::regclass)
seq | text | not null
len | integer | not null
md5 | character(32) | not null
added | timestamp with time zone | not null default now()
Indexes:
"pseq_pkey" PRIMARY KEY, btree (pseq_id) CLUSTER
"pseq_md5" UNIQUE, btree (md5)
"pseq_seqhash" UNIQUE, btree (seqhash(seq))
"pseq_added" btree (added)
"pseq_len" btree (len)
Triggers:
pseq_iu_trigger BEFORE INSERT OR UPDATE ON pseq FOR EACH ROW EXECUTE PROCEDURE pseq_iu_trigger()

unison(at)csb-dev=> analyze pseq;
ANALYZE
Time: 75.583 ms

Protein sequences are stored in the database. pseq_iu_trigger fills len
and md5. The pseq_md5 index is UNIQUE above, but that was for testing
this problem. The behavior is the same when it is not unique.

clean_sequence(<sequence>) returns a canonical version of the sequence.
It's implemented in C and is marked immutable:

unison(at)csb-dev=> \df+ unison.clean_sequence
List of functions
Schema | Name | Result data type | Argument data types | Owner | Language | Source code | Description
--------+----------------+------------------+---------------------+----------+----------+-------------------+-------------
unison | clean_sequence | text | text | postgres | c | pg_clean_sequence |

unison(at)csb-dev=> select proname,provolatile from pg_proc where proname='clean_sequence';
proname | provolatile
----------------+-------------
clean_sequence | i

I'm using PostgreSQL's md5 (pg_catalog), which is also implemented in C
and marked immutable.

Now the problem: I'd like to look up a sequence by md5. When I do this
using the function in the predicate, I get:

unison(at)csb-dev=> select pseq_id from pseq where md5=md5(clean_sequence('SDVLELTDEN'));
pseq_id
---------
7387455
Time: 7044.530 ms

unison(at)csb-dev=> explain select pseq_id from pseq where md5=md5(clean_sequence('SDVLELTDEN'));
QUERY PLAN
--------------------------------------------------------------------
Seq Scan on pseq (cost=0.00..441233.47 rows=36772 width=4)
Filter: ((md5)::text = 'f6c80739f2f752132b89535c070192b9'::text)
Time: 1.493 ms

unison(at)csb-dev=> explain select pseq_id from pseq where md5=md5('SDVLELTDEN');
QUERY PLAN
--------------------------------------------------------------------
Seq Scan on pseq (cost=0.00..441233.47 rows=36772 width=4)
Filter: ((md5)::text = 'f6c80739f2f752132b89535c070192b9'::text)

Note that the function was replaced with its result in the plan.
However, when the literal value is used in the original query, I get a
plan that uses the index and much better performance:

unison(at)csb-dev=> explain select pseq_id from pseq where md5='f6c80739f2f752132b89535c070192b9';
QUERY PLAN
---------------------------------------------------------------------
Index Scan using pseq_md5 on pseq (cost=0.00..6.01 rows=1 width=4)
Index Cond: (md5 = 'f6c80739f2f752132b89535c070192b9'::bpchar)
(2 rows)

Time: 1.402 ms

unison(at)csb-dev=> select pseq_id from pseq where md5='f6c80739f2f752132b89535c070192b9';
pseq_id
---------
7387455
(1 row)

Time: 1.424 ms

So, it appears to me that although the planner recognized that the
function call is immutable during the query and may be replaced by its
return value, it didn't use this information when constructing the query
plan.

Feedback?

Thanks,
Reece

--
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0

Responses

Browse sfpug by date

  From Date Subject
Next Message Reece Hart 2006-05-09 18:16:58 Re: query plans and immutable functions
Previous Message Josh Berkus 2006-05-09 00:23:45 Carpool tommorrow