Re: query plans and immutable functions

From: elein <elein(at)varlena(dot)com>
To: Reece Hart <reece(at)harts(dot)net>
Cc: SF PostgreSQL <sfpug(at)postgresql(dot)org>
Subject: Re: query plans and immutable functions
Date: 2006-05-09 18:49:46
Message-ID: 20060509184946.GS26910@varlena.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: sfpug

Do you get the same results with explain analyze?
This makes a difference. The immutability of a function
is a hint. It is used when there are multiple calls
to the function in the same statement. It may have
no affect on a single call in one statement.

Also try different sequences to avoid caching--the
second time usually is faster than the first.

After checking explain analyze you may want to fiddle
with the cost of the clean_sequence function if possible.
However, I don't know how that is done or if it still
possible. I knew it as a postgres feature, not a
postgresql feature.

--elein
elein(at)varlena(dot)com

On Tue, May 09, 2006 at 10:51:00AM -0700, Reece Hart wrote:
> I've got a simple 1-predicate query that uses an index when the condition is
> literal but does a seq scan when the condition uses the result of an immutable
> function. This leads to terrible performance in the latter case. I believe
> that this is a bug candidate and I'll forward it to pgsql-bugs if other sfpug
> folks agree.
>
> Environs:
>
> unison(at)csb-dev=> select version();
> version
> ------------------------------------------------------------------------------------------------------------------
> PostgreSQL 8.1.3 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.0.2 20050901 (prerelease) (SUSE Linux)
>
> unison(at)csb-dev=> \d pseq
> Table "unison.pseq"
> Column | Type | Modifiers
> ---------+--------------------------+--------------------------------------------------------
> pseq_id | integer | not null default nextval('pseq_pseq_id_seq'::regclass)
> seq | text | not null
> len | integer | not null
> md5 | character(32) | not null
> added | timestamp with time zone | not null default now()
> Indexes:
> "pseq_pkey" PRIMARY KEY, btree (pseq_id) CLUSTER
> "pseq_md5" UNIQUE, btree (md5)
> "pseq_seqhash" UNIQUE, btree (seqhash(seq))
> "pseq_added" btree (added)
> "pseq_len" btree (len)
> Triggers:
> pseq_iu_trigger BEFORE INSERT OR UPDATE ON pseq FOR EACH ROW EXECUTE PROCEDURE pseq_iu_trigger()
>
> unison(at)csb-dev=> analyze pseq;
> ANALYZE
> Time: 75.583 ms
>
>
> Protein sequences are stored in the database. pseq_iu_trigger fills len and
> md5. The pseq_md5 index is UNIQUE above, but that was for testing this
> problem. The behavior is the same when it is not unique.
>
>
> clean_sequence(<sequence>) returns a canonical version of the sequence. It's
> implemented in C and is marked immutable:
>
> unison(at)csb-dev=> \df+ unison.clean_sequence
> List of functions
> Schema | Name | Result data type | Argument data types | Owner | Language | Source code | Description
> --------+----------------+------------------+---------------------+----------+----------+-------------------+-------------
> unison | clean_sequence | text | text | postgres | c | pg_clean_sequence |
>
> unison(at)csb-dev=> select proname,provolatile from pg_proc where proname='clean_sequence';
> proname | provolatile
> ----------------+-------------
> clean_sequence | i
>
>
> I'm using PostgreSQL's md5 (pg_catalog), which is also implemented in C and
> marked immutable.
>
>
> Now the problem: I'd like to look up a sequence by md5. When I do this using
> the function in the predicate, I get:
>
> unison(at)csb-dev=> select pseq_id from pseq where md5=md5(clean_sequence('SDVLELTDEN'));
> pseq_id
> ---------
> 7387455
> Time: 7044.530 ms
>
> unison(at)csb-dev=> explain select pseq_id from pseq where md5=md5(clean_sequence('SDVLELTDEN'));
> QUERY PLAN
> --------------------------------------------------------------------
> Seq Scan on pseq (cost=0.00..441233.47 rows=36772 width=4)
> Filter: ((md5)::text = 'f6c80739f2f752132b89535c070192b9'::text)
> Time: 1.493 ms
>
> unison(at)csb-dev=> explain select pseq_id from pseq where md5=md5('SDVLELTDEN');
> QUERY PLAN
> --------------------------------------------------------------------
> Seq Scan on pseq (cost=0.00..441233.47 rows=36772 width=4)
> Filter: ((md5)::text = 'f6c80739f2f752132b89535c070192b9'::text)
>
>
> Note that the function was replaced with its result in the plan. However, when
> the literal value is used in the original query, I get a plan that uses the
> index and much better performance:
>
> unison(at)csb-dev=> explain select pseq_id from pseq where md5='f6c80739f2f752132b89535c070192b9';
> QUERY PLAN
> ---------------------------------------------------------------------
> Index Scan using pseq_md5 on pseq (cost=0.00..6.01 rows=1 width=4)
> Index Cond: (md5 = 'f6c80739f2f752132b89535c070192b9'::bpchar)
> (2 rows)
>
> Time: 1.402 ms
>
> unison(at)csb-dev=> select pseq_id from pseq where md5='f6c80739f2f752132b89535c070192b9';
> pseq_id
> ---------
> 7387455
> (1 row)
>
> Time: 1.424 ms
>
>
>
> So, it appears to me that although the planner recognized that the function
> call is immutable during the query and may be replaced by its return value, it
> didn't use this information when constructing the query plan.
>
> Feedback?
>
> Thanks,
> Reece
>
> --
> Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0
>

In response to

Browse sfpug by date

  From Date Subject
Next Message Josh Berkus 2006-05-10 00:16:07 Re: query plans and immutable functions
Previous Message Reece Hart 2006-05-09 18:16:58 Re: query plans and immutable functions