Re: [HACKERS] path toward faster partition pruning

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] path toward faster partition pruning
Date: 2018-03-02 15:54:26
Message-ID: CAKJS1f_sXkUEU=Ft_a_O5znpmUSShvvtTs9MYtOJV9kjX00Kcg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2 March 2018 at 08:13, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> 2. All processing of clauses should happen at plan time, not run time,
> so that we're not repeating work. If, for example, a prepared query
> is executed repeatedly, the clauses should get fully processed when
> it's planned, and then the only thing that should happen at execution
> time is that we take the values to which we now have access and use
> them to decide what to prune. With this design, we're bound to repeat
> at least the portion of the work done by get_partitions_from_clauses()
> at runtime, and as things stand, it looks like the current version of
> the run-time partition pruning patch repeats the
> generate_partition_clauses() work at runtime as well.
>
> It seems to me as though what we ought to be doing is extracting
> clauses that are of the correct general form and produces a list of
> <partition-column-index>, <partition-operator-strategy>, <expression>
> triplets sorted by increasing partition column index and operator
> strategy number and in a form that can be represented as a Node tree.
> This can be attached to the plan for use at runtime or, if there are
> at least some constants among the expressions, used at plan time for
> partition exclusion. So the main interfaces would be something like
> this:

We did try this already as I also thought the same a while back and
even wrote a broken patch to do that. The thing is,
PartitionClauseInfo is designed to be optimal for having
get_partitions_from_clauses() called on it multiple times over, as
will likely happen when run-time pruning is pruning away unneeded
partitioned during a Parameterized nested loop join. To make it a Node
type, it's not quite as simple as changing arrays to Lists as the
keyclauses List contains PartClause, which are also not a Node type,
and that struct contains a FmgrInfo, which is also not a node type, so
we'd need to go try to make all those Node types (I doubt that's going
to happen) ... but ...

It's probably not impossible to come up with some intermediate
partially processed type that can be a Node type. Ideally, this could
be quickly converted into a PartitionClauseInfo during execution. The
problem with this is that this adds a rather silly additional step
during query planning to convert the intermediate processed list into
the fully processed PaitionClauseInfo that get_partitions_from_clauses
would still need.

I don't think building it is going to cost a huge amount. Presumably,
there are not many partitioned tables with 10 rows, so probably having
the get_partitions_from_clauses work as quickly as possible is better
than saving 100 nanoseconds in executor startup.

That being said, there's still a small issue with the run-time pruning
patch which is caused by me not pre-processing the clauses during
planning. Ideally, I'd be able to pre-process at least enough to
determine if any Params match the partition key so that I know if
run-time pruning can be used or not. As of now, I'm not doing that as
it seems wasteful to pre-process during planning just to get the Param
Ids out, then not be able to carry the pre-processed ones over to the
executor. We also can't really reuse the pre-processed state that was
generated during the planner's calls to generate_partition_clauses()
since we'll additionally also be passing in the parameterized path
clauses as well as the baserestrictinfo clauses.

Given the above, I'm happy to listen to ideas on this, as I'm really
not sure what's best here.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2018-03-02 16:03:18 Re: [HACKERS] path toward faster partition pruning
Previous Message Tom Lane 2018-03-02 15:49:13 Re: perltidy version