Re: why partition pruning doesn't work?

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: why partition pruning doesn't work?
Date: 2018-06-14 11:23:42
Message-ID: CAKJS1f-Dqc_J5=Ukkmrkz_RoN_Lq+vzzppfDE51K7ACRo1rqqQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 14 June 2018 at 19:17, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> I had sent a patch to try to get rid of the open(NoLock) there a couple of
> months ago [1]. The idea was to both lock and open the relation in
> ExecNonLeafAppendTables, which is the first time all partitioned tables in
> a given Append node are locked for execution. Also, the patch makes it a
> responsibility of ExecEndAppend to release the relcache pins, so the
> recently added ExecDestroyPartitionPruneState would not be needed.

Robert and I briefly discussed something more complete a while back.
Not much detail was talked about, but the basic idea was to store the
Relation somewhere in the planner an executor that we could lookup by
rel index rather than having to relation_open all the time.

I just had a very quick look around and thought maybe RangeTblEntry
might be a good spot to store the Relation and current lock level that
we hold on that relation. This means that we can use
PlannerInfo->simple_rte_array in the planner and
EState->es_range_table in the executor. The latter of those would be
much nicer if we built an array instead of keeping the list (can
probably build that during InitPlan()). We could then get hold of a
Relation object when needed without having to do relation_open(...,
NoLock).

Code which currently looks like:

reloid = getrelid(scanrelid, estate->es_range_table);
rel = heap_open(reloid, lockmode);

In places like ExecOpenScanRelation, could be replaced with some
wrapper function call like: rte_open_rel(estate, scanrelid, lockmode);

This could also be used to ERROR out if rte_open_rel() was done
initially with NoLock. Right now there are so many places that we do
relation_open(..., NoLock) and write a comment /* Lock already taken
by ... */, which we hope is always true.

If the Relation is already populated in the RangeTblEntry then the
function would just return that, otherwise, we'd just look inside the
RangeTblEntry for the relation Oid and do a heap_open on it, and store
the lockmode that's held. We could then also consider getting of a
bunch of fields like boundinfo and nparts from RelOptInfo.

We could also perhaps do a WARNING about lock upgrade hazards in
there, at least maybe in debug builds.

However, I only spent about 10 mins looking into this, there may be
some giant holes in the idea. It would need much more research.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2018-06-14 11:36:40 Re: Logging transaction IDs for DDL.
Previous Message Nikhil Sontakke 2018-06-14 11:02:06 Re: pgsql: Store 2PC GID in commit/abort WAL recs for logical decoding