Re: Function execution costs 'n all that

From: Mark Cave-Ayland <mark(dot)cave-ayland(at)ilande(dot)co(dot)uk>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Brian Hurt <bhurt(at)janestcapital(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, Richard Troy <rtroy(at)ScienceTools(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Function execution costs 'n all that
Date: 2007-01-16 08:23:12
Message-ID: 1168935792.5652.46.camel@mca-desktop
Lists: pgsql-hackers

On Mon, 2007-01-15 at 15:05 -0500, Tom Lane wrote:
> Brian Hurt <bhurt(at)janestcapital(dot)com> writes:
> > Non-developer here, but we use a lot of plpgsql functions at work. And
> > the functions we use fall into two broad, ill-defined catagories-
> > "expensive" functions and "cheap" functions. What I'd like as a user is
> > some way to tell the planner "this function is expensive- prefer plans
> > which call this function less even if they're otherwise more expensive"
> > or "this function is cheap, prefer plans that are otherwise less
> > expensive even if they call this function more often". Precise cost
> > estimates aren't that important, IMHO.
>
> Right, so a plain constant cost would be plenty for your situation.
>
> I suspect there's an 80/20 rule at work here --- the estimator-function
> side of this will take most of the effort to design/implement, but not
> get used nearly as much as the plain-constant form ... maybe we should
> just do the constant for starters and see how many people really want to
> write C-code estimators ...
>
> regards, tom lane

Hi Tom et al,

Having worked with stored procedures on large datasets for reporting, I
would say that a non-constant estimator for the number of rows returned
would be useful, whereas a single constant should be fine for CPU cost.
Where I have struggled is in joining set-returning functions onto
slightly more exotic queries when doing large-scale data processing as
part of a custom report or an application upgrade.

Using PL/pgSQL, I would find it useful to have access to the constants
passed into a function so they can be used to help provide a row count
estimate (typically useful when passing in table/column names), e.g.

SELECT * FROM my_func('my_table1') AS t1, my_table2 AS t2
WHERE t1.id = t2.id;

CREATE FUNCTION my_func(text) RETURNS SETOF my_table1 AS $$
...
$$ LANGUAGE 'plpgsql' COST 1.0 ROWS my_func_row_cost;
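
Purely to make the example concrete (the body above is deliberately
elided, and this is not part of the proposal itself), my_func might look
something like the following in today's PL/pgSQL, assuming the table
named in the argument shares my_table1's row type:

CREATE FUNCTION my_func(text) RETURNS SETOF my_table1 AS $$
DECLARE
    r my_table1%ROWTYPE;
BEGIN
    -- Scan whichever table was named at call time; without an estimator
    -- the planner has no way of knowing how many rows this will produce.
    FOR r IN EXECUTE 'SELECT * FROM ' || quote_ident($1) LOOP
        RETURN NEXT r;
    END LOOP;
    RETURN;
END;
$$ LANGUAGE 'plpgsql';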

In my cost function, I could then estimate the number of rows with
something like the following, where the constant arguments from the call
site are passed into the cost function as parameters:

CREATE FUNCTION my_func_row_cost(text) RETURNS bigint AS $$
DECLARE
    foo bigint;
BEGIN
    -- Use the constant table name to look up an actual row count.
    EXECUTE 'SELECT COUNT(*) FROM ' || quote_ident($1) INTO foo;
    RETURN foo;
END;
$$ LANGUAGE 'plpgsql';
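
To make the intended interaction concrete: when planning the earlier
query, the planner would (under this scheme) evaluate the estimator with
the same constant argument and use the result in place of its default
row estimate, i.e. something equivalent to:

-- Hypothetical planner-side lookup while planning the query above;
-- the result becomes the row estimate for my_func('my_table1').
SELECT my_func_row_cost('my_table1');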

In the case where a parameter is not a constant but a column reference,
it would be reasonable in my mind to simply replace that parameter with
NULL when passing it to the row cost function, e.g.

SELECT * FROM my_table1
WHERE my_table1.junk = (SELECT my_func(my_table1.name));

In this case, the text parameter passed to my_func_row_cost would be
replaced by NULL to indicate that it was non-constant.

Of course, even with constants passed in, it still may not be possible
to calculate the number of rows that will be returned - it could be that
the only parameter passed to the cost function has been converted to
NULL because it is a column reference. Perhaps in this case it would be
useful to specify that returning NULL from my_func_row_cost means "I
can't return anything meaningful, so use the fallback values".
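
To sketch that convention (function names as above, purely
illustrative), the estimator could be made NULL-aware along these lines:

CREATE FUNCTION my_func_row_cost(text) RETURNS bigint AS $$
DECLARE
    foo bigint;
BEGIN
    -- The argument was not a constant at plan time, so no useful
    -- estimate is possible: return NULL and let the planner fall back
    -- to its default row estimate.
    IF $1 IS NULL THEN
        RETURN NULL;
    END IF;
    EXECUTE 'SELECT COUNT(*) FROM ' || quote_ident($1) INTO foo;
    RETURN foo;
END;
$$ LANGUAGE 'plpgsql';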

Kind regards,

Mark.
