Re: Parameterized-path cost comparisons need some work

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Parameterized-path cost comparisons need some work
Date: 2012-04-17 16:14:41
Message-ID: 10175.1334679281@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Thu, Apr 12, 2012 at 3:27 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> 3. Rearrange plan generation so that a parameterized path always uses
>> all join clauses available from the specified outer rels.  (Any that
>> don't work as indexquals would have to be applied as filter conditions.)
>> If we did that, then we would be back to a situation where all paths
>> with the same parameterization should yield the same rowcount, thus
>> justifying letting add_path_precheck work as it does now.
>>
>> #3 would amount to pushing quals that would otherwise be checked at the
>> nestloop join node down to the lowest inner-relation level where they
>> could be checked.  This is something I'd suspected would be a good idea
>> to start with, but hadn't gotten around to implementing for non-index
>> quals.  It had not occurred to me that it might simplify cost estimation
>> to always do that.

> This seems like it could be quite a significant win.

I've been hacking away on a patch to do this, and attached is something
that I think is pretty close to committable. It needs another going-over
and some new regression test cases, but it seems to work, and it fixes a
number of things besides the above-mentioned issue. In particular, this
has a much more principled approach than HEAD does to the problem of where
to place parameterizable join clauses in the plan tree; that can be seen
in the one change in the existing regression tests, where we no longer
generate a redundant upper-level copy of an OR join clause that the old
code wasn't bright enough to get rid of.

The patch is a bit large because I chose to revise the data representation.
Instead of each Path having its own required_outer, rows, and
param_clauses fields, now a parameterized Path has a pointer to a
ParamPathInfo struct that it shares with other Paths for the same rel and
the same parameterization. This guarantees that such paths will have the
same estimated rowcount, because we only compute that once per
parameterization, which should save some work as well as making the world
safe for add_path_precheck.

The only place where this approach proved a bit tricky was in handling
AppendPaths and MergeAppendPaths, which didn't surprise me because that
was a rough spot for the old way too (and indeed they aren't handled
completely correctly in HEAD). A parameterized path is now *required*
to enforce all clauses that the join clause movement rules assign to it;
but Append and MergeAppend don't do qual checking, and I didn't feel like
changing that. The method that I have settled on is to require all child
paths of a parameterized append to have the exact same parameterization,
IOW we push the qual checks down below the append. Now the interesting
point about that is that we want to support Appends wherein some children
are seqscans and some are indexscans (consider a partitioned table where
the parent is a dummy empty table with no indexes). The "raw" situation
there is that we'll have a plain seqscan path for the parent and then a
collection of similarly-parameterized indexscan paths for the live
partition children. To make it possible to convert that case into a
parameterized append path, I added parameterization support to seqscans
and then wrote "reparameterize_path", which changes a Path to increase
its parameterization level (and thereby assign it more pushed-down join
clauses to check at runtime). That allows us to reconfigure the seqscan
to match the other children. I've also added such support to
SubqueryScan, on the grounds that the other common use of append paths is
UNION ALL across subqueries. We might later want to add parameterization
support to other path types, but this seemed like enough for the moment.

BTW, after writing the code for it I decided to remove creation of
parameterized MergeAppendPaths from allpaths.c, though there is still some
support for them elsewhere. On reflection it seemed to me that the code
was willing to create far too many of these, much more than their
potential usefulness could justify (remember that parameterized paths must
be on the inside of a nestloop, so their sort ordering is typically of
marginal use). We can put that back if we can think of a more restrictive
heuristic for when to create them.

The core of the patch is in the new functions get_baserel_parampathinfo
and get_joinrel_parampathinfo, which look up or construct ParamPathInfos,
and join_clause_is_parameterizable_for and
join_clause_is_parameterizable_within, which control
movement of parameterizable join clauses. (I'm not that thrilled with the
names of the latter two functions, anybody got a better idea?) The rest
of it is pretty much boilerplate changes and replacing ad-hoc logic with
uses of this stuff.

I have a couple of other ideas in mind in the way of mop-up, but they are
not in this patch to keep it from bloating even more. First, I'm thinking
we should get rid of RelOptInfo.baserestrictcost, thus forcing all scan
cost estimators to invoke cost_qual_eval explicitly. That field has been
vestigial from a planning-speed standpoint for a long time, ever since we
started caching eval costs in RestrictInfos. The most commonly used cost
estimators don't use it anymore as of this patch, and it'd likely be best
to have a uniform coding pattern there. Second, I've gotten dissatisfied
with the terminology "required_outer" that was used in the original param
plans patch. I'm considering a global search and replace with
"param_relids" or some variant of that.

Comments?

regards, tom lane

Attachment Content-Type Size
param-plans-revision-1.patch text/x-patch 183.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jameison Martin 2012-04-17 16:22:42 patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap
Previous Message Stephen Frost 2012-04-17 15:27:16 Re: Gsoc2012 idea, tablesample