Re: Parameterized-path cost comparisons need some work

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Parameterized-path cost comparisons need some work
Date: 2012-04-12 19:27:57
Message-ID: 27109.1334258877@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote:
> So I'm back to thinking we need to look explicitly at the rowcount
> comparison as well as all the existing conditions in add_path.

> One annoying thing about that is that it will reduce the usefulness of
> add_path_precheck, because that's called before we compute the rowcount
> estimates (and indeed not having to make the rowcount estimates is one
> of the major savings from the precheck). I think what we'll have to do
> is assume that a difference in parameterization could result in a
> difference in rowcount, and hence only a dominant path with exactly the
> same parameterization can result in failing the precheck.

I've been experimenting some more with this, and have observed that in
the test cases I'm using, adding rowcount as an additional criterion in
add_path doesn't cost much of anything: it doesn't seem to affect the
runtime significantly, and it only seldom changes the keep/reject
decisions. So that's good news.

Unfortunately, the precheck situation is actually worse than I thought:
there are plenty of cases where parameterized paths can have the exact
same parameterization (that is, same sets of required outer rels) and
yet have different row estimates, because one might use different join
clauses than the other. All you need to be at risk is more than one
join clause between the same two rels, with those clauses matching
different indexes or index columns. This entirely destroys the logic of
add_path_precheck as currently constituted, because it implies we can
never reject a parameterized path before computing its rowcount.

I said upthread that I wouldn't cry if we got rid of add_path_precheck
again, but it still looks like that would cost us a noticeable hit in
planning speed. I've considered three other alternatives:

1. Lobotomize add_path_precheck so it always returns true for a
parameterized path. This sounds horrid, but in the test cases I'm using
it seems that this only results in doing the full path construction for
a very small number of additional paths.

2. Refactor so that we obtain the row estimate during the first not the
second cost estimation step. This doesn't look promising; I have not
actually coded and tested it, but eyeballing gprof numbers for the
current code suggests it would give back a considerable percentage of
the savings from having a precheck at all.

3. Rearrange plan generation so that a parameterized path always uses
all join clauses available from the specified outer rels. (Any that
don't work as indexquals would have to be applied as filter conditions.)
If we did that, then we would be back to a situation where all paths
with the same parameterization should yield the same rowcount, thus
justifying letting add_path_precheck work as it does now.

#3 would amount to pushing quals that would otherwise be checked at the
nestloop join node down to the lowest inner-relation level where they
could be checked. This is something I'd suspected would be a good idea
to start with, but hadn't gotten around to implementing for non-index
quals. It had not occurred to me that it might simplify cost estimation
to always do that.

I'm going to take a closer look at #3, but it may not be practical to
try to squeeze it into 9.2; if not, I think #1 will do as a stopgap.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Martijn van Oosterhout 2012-04-12 22:02:37 Re: Last gasp
Previous Message Greg Smith 2012-04-12 17:38:15 Re: Last gasp