Re: Function is called multiple times in subselect

From: Chris Campbell <chris(at)bignerdranch(dot)com>
To:
Cc: PostgreSQL Bugs List <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: Function is called multiple times in subselect
Date: 2004-03-12 16:01:37
Message-ID: 4051DEE1.60109@bignerdranch.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Alex J. Avriette wrote:

> And if you are concerned about the query's cost vs its actual output
> (you weren't clear in your original message), there is the 'iscachable'
> pragma for functions.

I was concerned that my function was being called 4 times for each row
of the result. The cost of the function call is quite high, so the cost
of the query was 4 times what I expected.

So my concern was the query's cost.

SELECT query.i,
query.squared AS test1,
query.squared + 1 AS test2,
query.squared + 2 AS test3,
query.squared + 3 AS test4
FROM (
SELECT i,
square_it(i) AS squared
FROM foo
OFFSET 0
) query;

As Tom explained to me, the optimizer flattens the subselect, so it then
becomes:

SELECT i,
square_it(i) AS test1,
square_it(i) + 1 AS test2,
square_it(i) + 2 AS test3,
square_it(i) + 3 AS test4
FROM foo;

Thus, each reference to query.squared in the outer query is replaced
with a call to the function.

In this trivial example, that's not really a performance issue. But my
real-world function calculates about 8 values and returns them as a
record, which I then pick apart in the outer query. When the optimizer
flattens the subquery and each reference to the function result is
replaced with a call to the function, it makes for a very expensive query.

CREATE TYPE patient_balances_type AS (patient_id INTEGER,
account_id INTEGER,
date DATE,
due_now INTEGER,
future_due INTEGER,
copay_balance INTEGER,
expected_insurance INTEGER,
total_balance INTEGER,
contract_amount INTEGER,
real_due_now INTEGER,
real_future_due INTEGER);

CREATE OR REPLACE FUNCTION patient_balances(INTEGER, INTEGER, DATE)
RETURNS patient_balances_type AS '...' LANGUAGE 'plpgsql';

My actual query looked something like:

SELECT query.appointment_id AS appointment_id,
query.date,
query.start_time,
query.duration,
query.patient_id,
(query.bal).total_balance,
(query.bal).expected_insurance,
(query.bal).future_due,
(query.bal).due_now
FROM
(
SELECT appt.appointment_id,
appt.start_time,
appt.duration,
appt.date,
p.patient_id,
p.account_id,
patient_balances(p.patient_id,
p.account_id,
d.system_date) AS bal
FROM
patients AS p
JOIN appointments appt ON
(p.patient_id = appt.patient_id)
JOIN system_date d ON
(appt.date = d.system_date)
) query;

When that subquery was flattened, each reference to query.bal was
replaced with a call to patient_balances(). Which was pretty expensive.

Tom's suggestion off using OFFSET 0 to cause the optimizer not to
flatten the subquery is exactly what I was looking for.

Is there a better way to be picking apart the result of my function than
putting it in a subquery, since the subquery will be flattened by
default? There are a bunch of places I've done this that I need to go
back to now and add an OFFSET 0 because I didn't realize the optimizer
would negate my cleverly crafted record-dissecting subselect. :)

Or is there a way to advise the optimizer of the cost of my function, so
that it will choose to not flatten the subquery (since the total cost of
doing that will be higher)? Or should the optimizer be assuming that
function calls are fairly expensive by default, and not flattening
subqueries that have function calls? So it's not a bug, but maybe a
feature request? :)

Thanks!

- Chris

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2004-03-12 16:33:19 Re: Function is called multiple times in subselect
Previous Message Tom Lane 2004-03-12 14:41:39 Re: Function is called multiple times in subselect