Re: Performance problems with multiple layers of functions

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc: Svenne Krap <svenne(at)krap(dot)dk>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance problems with multiple layers of functions
Date: 2006-03-24 16:02:34
Message-ID: 5090.1143216154@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

"Jim C. Nasby" <jnasby(at)pervasive(dot)com> writes:
> On Fri, Mar 24, 2006 at 01:49:17PM +0100, Svenne Krap wrote:
>> explain select dataset_id, entity, sum(amount) from entrydata_current
>> where flow_direction in (select * from outflow_direction(dataset_id))
>> and dataset_id in (select * from get_dataset_ids(122)) group by
>> dataset_id, entity;

> The issue is that the planner has no way to know what's comming back
> from get_dataset_ids.

More specifically, the first IN is not optimizable into a join because
the results of the sub-SELECT depend on the current row of the outer
query. The second IN is being optimized fine, but the first one is
what's killing you.

I'd suggest refactoring the functions into something that returns a set
of outflow_direction/dataset_id pairs, and then phrase the query as

where (flow_direction, dataset_id) in (select * from new_func(122))

You could do it without refactoring:

where (flow_direction, dataset_id) in
(select outflow_direction(id),id from get_dataset_ids(122) id)

however this won't work if outflow_direction() is a plpgsql function
because of limitations in plpgsql's set-function support. (It will work
if outflow_direction() is a SQL function, or you could kluge it as a SQL
function wrapper around a plpgsql function.)

regards, tom lane

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2006-03-24 16:25:55 Re: limitation using LIKE on ANY(array)
Previous Message Tom Lane 2006-03-24 15:51:44 Re: Array performance