Re: Functionscan estimates

From: Neil Conway <neilc(at)samurai(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Jim C(dot) Nasby" <decibel(at)decibel(dot)org>, Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>, Josh Berkus <josh(at)agliodbs(dot)com>, Michael Fuhr <mike(at)fuhr(dot)org>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Functionscan estimates
Date: 2005-04-10 05:44:00
Message-ID: 4258BD20.7090907@samurai.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

Tom Lane wrote:
> The larger point is that writing an estimator for an SRF is frequently a
> task about as difficult as writing the SRF itself

True, although I think this doesn't necessarily kill the idea. If
writing an estimator for a given SRF is too difficult, the user is no
worse off than they are today. Hopefully there would be a fairly large
class of SRFs for which writing an estimator would be relatively simple,
and result in improved planner behavior.

> I don't foresee a whole lot of use of an estimator hook designed as
> proposed here. In particular, if the API is such that we can only
> use the estimator when all the function arguments are plan-time
> constants, it's not going to be very helpful.

Yes :( One approach might be to break the function's domain into pieces
and have the estimator function calculate the estimated result set size
for each piece. So, given a trivial function like:

foo(int):
if $1 < 10 then produce 100 rows
else produce 10000 rows

If the planner has encoded the distribution of input tuples to the
function as a histogram, it could invoke the SRF's estimator function
for the boundary values of each histogram bucket, and use that to get an
idea of the function's likely result set size at runtime.

And yes, the idea as sketched is totally unworkable :) For one thing,
the difficulty of doing this grows rapidly as the number of arguments to
the function increases. But perhaps there is some variant of this idea
that might work...

Another thought is that the estimator could provide information on the
cost of evaluating the function, the number of tuples produced by the
function, and even the distribution of those tuples.

BTW, why is this on -performance? It should be on -hackers.

-Neil

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Oliver Jowett 2005-04-10 05:54:16 Re: prepared statements don't log arguments?
Previous Message Neil Conway 2005-04-10 05:25:25 Re: Functionscan estimates

Browse pgsql-performance by date

  From Date Subject
Next Message John A Meinel 2005-04-10 13:09:46 Re: performance - triggers, row existence etc.
Previous Message Neil Conway 2005-04-10 05:25:25 Re: Functionscan estimates