Re: why partition pruning doesn't work?

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: why partition pruning doesn't work?
Date: 2018-07-18 08:59:52
Message-ID: ab10ce44-18b0-244a-c9c8-67d7020e2ba9@lab.ntt.co.jp
Lists: pgsql-hackers

On 2018/07/16 2:02, Tom Lane wrote:
> Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> writes:
>> On 2018/06/19 2:05, Tom Lane wrote:
>>> Or maybe what we should do is drop ExecLockNonLeafAppendTables/
>>> ExecOpenAppendPartitionedTables entirely and teach InitPlan to do it.
>
>> Hmm, for InitPlan to do what ExecOpenAppendPartitionedTables does, we'd
>> have to have all the partitioned tables (contained in partitioned_rels
>> fields of all Append/MergeAppend/ModifyTable nodes of a plan) also listed
>> in a global list like rootResultRelations and nonleafResultRelations of
>> PlannedStmt.
>
>> Attached updated patch.
>
> I've been looking at this patch, and while it's not unreasonable on its
> own terms, I'm growing increasingly distressed at the ad-hoc and rather
> duplicative nature of the data structures that have gotten stuck into
> plan trees thanks to partitioning (the rootResultRelations and
> nonleafResultRelations lists being prime examples).
>
> It struck me this morning that a whole lot of the complication here is
> solely due to needing to identify the right type of relation lock to take
> during executor startup, and that THAT WORK IS TOTALLY USELESS.

I agree. IIUC, that's also the reason why PlannedStmt had to grow a
resultRelations field in the first place.

> In every
> case, we must already hold a suitable lock before we ever get to the
> executor; either one acquired during the parse/plan pipeline, or one
> re-acquired by AcquireExecutorLocks in the case of a cached plan.
> Otherwise it's entirely possible that the plan has been invalidated by
> concurrent DDL --- and it's not the executor's job to detect that and
> re-plan; that *must* have been done upstream.
>
> Moreover, it's important from a deadlock-avoidance standpoint that these
> locks get acquired in the same order as they were acquired during the
> initial parse/plan pipeline. I think it's reasonable to assume they were
> acquired in RTE order in that stage, so AcquireExecutorLocks is OK. But,
> if the current logic in the executor gets them in that order, it's both
> non-obvious that it does so and horribly fragile if it does, seeing that
> the responsibility for this is split across InitPlan,
> ExecOpenScanRelation, and ExecLockNonLeafAppendTables.
>
> So I'm thinking that what we really ought to do here is simplify executor
> startup to just open all rels with NoLock, and get rid of any supporting
> data structures that turn out to have no other use. (David Rowley's
> nearby patch to create a properly hierarchical executor data structure for
> partitioning info is likely to tie into this too, by removing some other
> vestigial uses of those lists.)

I agree it would be nice to be able to get rid of those lists.

> I think that this idea has been discussed in the past, and we felt at
> the time that having the executor take its own locks was a good safety
> measure, and a relatively cheap one since the lock manager is pretty
> good at short-circuiting duplicative lock requests. But those are
> certainly not free. Moreover, I'm not sure that this is really a
> safety measure at all: if the executor were really taking any lock
> not already held, it'd be masking a DDL hazard.
>
> To investigate this further, I made the attached not-meant-for-commit
> hack to verify whether InitPlan and related executor startup functions
> were actually taking any not-previously-held locks. I could only find
> one such case: the parser always opens tables selected FOR UPDATE with
> RowShareLock, but if we end up implementing the resulting row mark
> with ROW_MARK_COPY, the executor is acquiring just AccessShareLock
> (because ExecOpenScanRelation thinks it needs to acquire some lock).
> The patch as presented hacks ExecOpenScanRelation to avoid that, and
> it passes check-world.
>
> What we'd be better off doing, if we go this route, is to install an
> assertion-build-only test that verifies during relation_open(NoLock)
> that some kind of lock is already held on the rel. That would protect
> not only the executor, but a boatload of existing places that open
> rels with NoLock on the currently-unverified assumption that a lock is
> already held.

+1

> I'm also rather strongly tempted to add a field to RangeTblEntry
> that records the appropriate lock strength to take, so that we don't
> have to rely on keeping AcquireExecutorLocks' logic to decide on the
> lock type in sync with whatever the parse/plan pipeline does. (One
> could then imagine adding assertions in the executor that this field
> shows a lock strength of at least X, in place of actually opening
> the rel with X.)
>
> BTW, there'd be a lot to be said for having InitPlan just open all
> the rels and build an array of Relation pointers that parallels the
> RTE list, rather than doing heap_opens in random places elsewhere.

+1 to this. I had actually written such a patch instead of the one I
posted on this thread, but I wasn't confident that I had found and fixed
all the places in the executor that should use the Relation pointer from
that array instead of fetching it themselves.

Thanks,
Amit
