Skip site navigation (1) Skip section navigation (2)

query plans and immutable functions

From: Reece Hart <reece(at)harts(dot)net>
To: SF PostgreSQL <sfpug(at)postgresql(dot)org>
Subject: query plans and immutable functions
Date: 2006-05-09 17:51:00
Message-ID: 1147197060.27835.54.camel@tallac.gene.com (view raw or flat)
Thread:
Lists: sfpug
I've got a simple 1-predicate query that uses an index when the
condition is literal but does a seq scan when the condition uses the
result of an immutable function. This leads to terrible performance in
the latter case.  I believe that this is a bug candidate and I'll
forward it to pgsql-bugs if other sfpug folks agree.

Environs:

        unison(at)csb-dev=> select version();
                                                             version
        ------------------------------------------------------------------------------------------------------------------
         PostgreSQL 8.1.3 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.0.2 20050901 (prerelease) (SUSE Linux)
        
        unison(at)csb-dev=> \d pseq
                                             Table "unison.pseq"
         Column  |           Type           |                       Modifiers
        ---------+--------------------------+--------------------------------------------------------
         pseq_id | integer                  | not null default nextval('pseq_pseq_id_seq'::regclass)
         seq     | text                     | not null
         len     | integer                  | not null
         md5     | character(32)            | not null
         added   | timestamp with time zone | not null default now()
        Indexes:
            "pseq_pkey" PRIMARY KEY, btree (pseq_id) CLUSTER
            "pseq_md5" UNIQUE, btree (md5)
            "pseq_seqhash" UNIQUE, btree (seqhash(seq))
            "pseq_added" btree (added)
            "pseq_len" btree (len)
        Triggers:
            pseq_iu_trigger BEFORE INSERT OR UPDATE ON pseq FOR EACH ROW EXECUTE PROCEDURE pseq_iu_trigger()
        
        unison(at)csb-dev=> analyze pseq;
        ANALYZE
        Time: 75.583 ms

Protein sequences are stored in the database.  pseq_iu_trigger fills len
and md5.  The pseq_md5 index is UNIQUE above, but that was for testing
this problem.  The behavior is the same when it is not unique.


clean_sequence(<sequence>) returns a canonical version of the sequence.
It's implemented in C and is marked immutable:

        unison(at)csb-dev=> \df+ unison.clean_sequence
                                                            List of functions
         Schema |      Name      | Result data type | Argument data types |  Owner   | Language |    Source code    | Description
        --------+----------------+------------------+---------------------+----------+----------+-------------------+-------------
         unison | clean_sequence | text             | text                | postgres | c        | pg_clean_sequence |
        
        unison(at)csb-dev=> select proname,provolatile from pg_proc where proname='clean_sequence';
            proname     | provolatile
        ----------------+-------------
         clean_sequence | i

I'm using PostgreSQL's md5 (pg_catalog), which is also implemented in C
and marked immutable.


Now the problem: I'd like to look up a sequence by md5. When I do this
using the function in the predicate, I get:

        unison(at)csb-dev=> select pseq_id from pseq where md5=md5(clean_sequence('SDVLELTDEN'));
         pseq_id
        ---------
         7387455
        Time: 7044.530 ms
        
        unison(at)csb-dev=> explain select pseq_id from pseq where md5=md5(clean_sequence('SDVLELTDEN'));
                                     QUERY PLAN
        --------------------------------------------------------------------
         Seq Scan on pseq  (cost=0.00..441233.47 rows=36772 width=4)
           Filter: ((md5)::text = 'f6c80739f2f752132b89535c070192b9'::text)
        Time: 1.493 ms
        
        unison(at)csb-dev=> explain select pseq_id from pseq where md5=md5('SDVLELTDEN');
                                     QUERY PLAN
        --------------------------------------------------------------------
         Seq Scan on pseq  (cost=0.00..441233.47 rows=36772 width=4)
           Filter: ((md5)::text = 'f6c80739f2f752132b89535c070192b9'::text)

Note that the function was replaced with its result in the plan.
However, when the literal value is used in the original query, I get a
plan that uses the index and much better performance:

        unison(at)csb-dev=> explain select pseq_id from pseq where md5='f6c80739f2f752132b89535c070192b9';
                                     QUERY PLAN
        ---------------------------------------------------------------------
         Index Scan using pseq_md5 on pseq  (cost=0.00..6.01 rows=1 width=4)
           Index Cond: (md5 = 'f6c80739f2f752132b89535c070192b9'::bpchar)
        (2 rows)
        
        Time: 1.402 ms
        
        unison(at)csb-dev=> select pseq_id from pseq where md5='f6c80739f2f752132b89535c070192b9';
         pseq_id
        ---------
         7387455
        (1 row)
        
        Time: 1.424 ms


So, it appears to me that although the planner recognized that the
function call is immutable during the query and may be replaced by its
return value, it didn't use this information when constructing the query
plan.

Feedback?

Thanks,
Reece
 
-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0

Responses

sfpug by date

Next:From: Reece HartDate: 2006-05-09 18:16:58
Subject: Re: query plans and immutable functions
Previous:From: Josh BerkusDate: 2006-05-09 00:23:45
Subject: Carpool tommorrow

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group