Re: why partition pruning doesn't work?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: why partition pruning doesn't work?
Date: 2018-07-15 17:02:22
Message-ID: 4114.1531674142@sss.pgh.pa.us
Lists: pgsql-hackers

Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> writes:
> On 2018/06/19 2:05, Tom Lane wrote:
>> Or maybe what we should do is drop ExecLockNonLeafAppendTables/
>> ExecOpenAppendPartitionedTables entirely and teach InitPlan to do it.

> Hmm, for InitPlan to do what ExecOpenAppendPartitionedTables does, we'd
> have to have all the partitioned tables (contained in partitioned_rels
> fields of all Append/MergeAppend/ModifyTable nodes of a plan) also listed
> in a global list like rootResultRelations and nonleafResultRelations of
> PlannedStmt.

> Attached updated patch.

I've been looking at this patch, and while it's not unreasonable on its
own terms, I'm growing increasingly distressed at the ad-hoc and rather
duplicative nature of the data structures that have gotten stuck into
plan trees thanks to partitioning (the rootResultRelations and
nonleafResultRelations lists being prime examples).

It struck me this morning that a whole lot of the complication here is
solely due to needing to identify the right type of relation lock to take
during executor startup, and that THAT WORK IS TOTALLY USELESS. In every
case, we must already hold a suitable lock before we ever get to the
executor; either one acquired during the parse/plan pipeline, or one
re-acquired by AcquireExecutorLocks in the case of a cached plan.
Otherwise it's entirely possible that the plan has been invalidated by
concurrent DDL --- and it's not the executor's job to detect that and
re-plan; that *must* have been done upstream.
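
(For reference, the cached-plan path already re-takes locks per RTE along
roughly these lines; a simplified sketch of what AcquireExecutorLocks has
to do today, with the unlock path and partitioned-table bookkeeping elided:)

    foreach(lc, plannedstmt->rtable)
    {
        RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc);
        PlanRowMark *rc;
        LOCKMODE    lockmode;

        rt_index++;
        if (rte->rtekind != RTE_RELATION)
            continue;

        /* Re-derive the lock strength the parse/plan pipeline used */
        if (list_member_int(plannedstmt->resultRelations, rt_index))
            lockmode = RowExclusiveLock;
        else if ((rc = get_plan_rowmark(plannedstmt->rowMarks, rt_index)) != NULL &&
                 RowMarkRequiresRowShareLock(rc->markType))
            lockmode = RowShareLock;
        else
            lockmode = AccessShareLock;

        LockRelationOid(rte->relid, lockmode);
    }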

Moreover, it's important from a deadlock-avoidance standpoint that these
locks get acquired in the same order as they were acquired during the
initial parse/plan pipeline. I think it's reasonable to assume they were
acquired in RTE order in that stage, so AcquireExecutorLocks is OK. But
even if the current executor logic happens to take them in that order,
it's both non-obvious that it does and horribly fragile, seeing that the
responsibility for this is split across InitPlan, ExecOpenScanRelation,
and ExecLockNonLeafAppendTables.

So I'm thinking that what we really ought to do here is simplify executor
startup to just open all rels with NoLock, and get rid of any supporting
data structures that turn out to have no other use. (David Rowley's
nearby patch to create a properly hierarchical executor data structure for
partitioning info is likely to tie into this too, by removing some other
vestigial uses of those lists.)
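
(Concretely, ExecOpenScanRelation could shrink to essentially the sketch
below; signature and EXPLAIN-only handling simplified away for
illustration, and the other open-a-rel call sites would get the same
treatment:)

    /* Sketch only: executor startup just opens the rel with NoLock,
     * trusting that the parse/plan pipeline or AcquireExecutorLocks
     * has already locked it appropriately. */
    Relation
    ExecOpenScanRelation(EState *estate, Index scanrelid)
    {
        RangeTblEntry *rte = rt_fetch(scanrelid, estate->es_range_table);

        return heap_open(rte->relid, NoLock);
    }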

I think that this idea has been discussed in the past, and we felt at
the time that having the executor take its own locks was a good safety
measure, and a relatively cheap one since the lock manager is pretty
good at short-circuiting duplicative lock requests. But those are
certainly not free. Moreover, I'm not sure that this is really a
safety measure at all: if the executor were really taking any lock
not already held, it'd be masking a DDL hazard.

To investigate this further, I made the attached not-meant-for-commit
hack to verify whether InitPlan and related executor startup functions
were actually taking any not-previously-held locks. I could only find
one such case: the parser always opens tables selected FOR UPDATE with
RowShareLock, but if we end up implementing the resulting row mark
with ROW_MARK_COPY, the executor is acquiring just AccessShareLock
(because ExecOpenScanRelation thinks it needs to acquire some lock).
The patch as presented hacks ExecOpenScanRelation to avoid that, and
it passes check-world.

What we'd be better off doing, if we go this route, is to install an
assertion-build-only test that verifies during relation_open(NoLock)
that some kind of lock is already held on the rel. That would protect
not only the executor, but a boatload of existing places that open
rels with NoLock on the currently-unverified assumption that a lock is
already held.
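
(In assert-enabled builds that check might look about like this; it
assumes a lock-manager probe along the lines of
CheckRelationLockedByMe(rel, mode, orstronger), which we'd have to add
if nothing suitable exists already:)

    /* Sketch, to go inside relation_open() after the rel is looked up
     * ("r" being the just-opened Relation): a caller passing NoLock is
     * asserting that some lock is already held, so check that in
     * assert-enabled builds.  Bootstrap mode is excepted since it runs
     * without normal locking. */
    if (lockmode == NoLock)
        Assert(IsBootstrapProcessingMode() ||
               CheckRelationLockedByMe(r, AccessShareLock, true));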

I'm also rather strongly tempted to add a field to RangeTblEntry
that records the appropriate lock strength to take, so that we don't
have to rely on keeping AcquireExecutorLocks' logic to decide on the
lock type in sync with whatever the parse/plan pipeline does. (One
could then imagine adding assertions in the executor that this field
shows a lock strength of at least X, in place of actually opening
the rel with X.)
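
(Roughly: parse analysis / planning would store the strength it locked
with in the new field, and downstream consumers stop guessing.  A sketch,
with "rellockmode" being only a placeholder name:)

    /* Sketch: suppose the parser/planner records the lock it took in a
     * new RangeTblEntry field (placeholder name "rellockmode"):
     *   RowExclusiveLock  for INSERT/UPDATE/DELETE target rels
     *   RowShareLock      for FOR UPDATE/FOR SHARE rels
     *   AccessShareLock   for everything else
     * AcquireExecutorLocks then needs no lock-choice logic of its own: */
    foreach(lc, plannedstmt->rtable)
    {
        RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc);

        if (rte->rtekind == RTE_RELATION)
            LockRelationOid(rte->relid, rte->rellockmode);
    }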

BTW, there'd be a lot to be said for having InitPlan just open all
the rels and build an array of Relation pointers that parallels the
RTE list, rather than doing heap_opens in random places elsewhere.
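(A sketch of that idea, with the array field name just a placeholder:)

    /* Sketch: during InitPlan, open each relation RTE once (NoLock, per
     * the above) and remember the Relation pointers in an array indexed
     * by RT index; scan/modify node init then just fetches the pointer
     * instead of calling heap_open itself. */
    ListCell   *lc;
    int         rti = 0;

    estate->es_relations = (Relation *)
        palloc0(list_length(estate->es_range_table) * sizeof(Relation));

    foreach(lc, estate->es_range_table)
    {
        RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc);

        if (rte->rtekind == RTE_RELATION)
            estate->es_relations[rti] = heap_open(rte->relid, NoLock);
        rti++;
    }

    /* e.g., in a scan node's ExecInit function: */
    scanrel = estate->es_relations[scanrelid - 1];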

regards, tom lane

Attachment Content-Type Size
verify-executor-locks-are-already-held.patch text/x-diff 5.7 KB
