Re: Parameterized-path cost comparisons need some work

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Parameterized-path cost comparisons need some work
Date: 2012-02-29 18:40:05
Message-ID: 4457.1330540805@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Tue, Feb 28, 2012 at 5:35 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> The most obvious thing to do about this is to consider that one path can
>> dominate another on cost only if it is both cheaper *and* produces no
>> more rows.

> Hmm. Are you sure that's the right rule?

On further reflection it had seemed like we might have to treat the
rowcount as another independent metric of value.

> I am having trouble
> constructing an example, but I feel like there might be cases where
> it's possible to have path A, not parameterized, path B, parameterized
> by R, and path C, parameterized by S, and maybe also path D,
> parameterized by both R and S. In that case, I feel like paths B and
> C are incomparable.

Indeed, and the code already knows that. However, in this example, path
A is capable of dominating the other three (being strictly less
parameterized than them), and B and C are each capable of dominating D.
The problem is just that I'd neglected to consider that rowcount now
also becomes a figure of merit.

> Even if one is cheaper than the other and also
> produces fewer rows, they're not the same rows; so the expense of a
> subsequent join might be different with one vs. the other.

Yeah, if the quals being used are different, that's possible in
principle; but I don't foresee being able to estimate such a thing.
I think just using the number of rows is as good as we're likely to
be able to do.

> Even in the simple case where there's only one possible
> parameterization, it seems very difficult to compare the cost of a
> parameterized path A with the cost of an unparameterized path B.

I don't believe this. We can certainly compare costs, and we can
certainly compare rowcount. The possibility that the specific set
of rows selected might affect subsequent join costs seems to me to
be too second-order to justify ignoring these first-order differences.

The variant of the argument that had occurred to me is that if you
consider that rowcount is an independent metric, then it might be that a
less-parameterized path can never dominate a more-parameterized path,
because even if the former is cheaper it should always generate more
rows. However, I'm not convinced that I believe that --- in particular,
with a poor choice of parameterizing quals it might not be true.
If we did have a more-parameterized path that wasn't any more selective,
we'd really want add_path to notice that and throw it out.

> I almost wonder if the bottom-up approach is the wrong way to tackle
> this entirely.

I don't find this idea too attractive ... and in any case we're not
doing rm -rf src/backend/optimizer/ and start over for 9.2.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2012-02-29 18:53:03 Re: review: CHECK FUNCTION statement
Previous Message Peter Eisentraut 2012-02-29 18:32:48 Re: controlling the location of server-side SSL files