Re: Partition-wise join for join between (declaratively) partitioned tables

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Partition-wise join for join between (declaratively) partitioned tables
Date: 2017-09-12 19:09:38
Message-ID: CA+TgmobEygfb9wdz0nBZF-pCoVaoQK0w2uKfRsb2yjiq4WZ0sg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Sep 12, 2017 at 3:46 AM, Amit Langote
<Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> In this case, AcquireExecutorLocks will lock all the relations in
> PlannedStmt.rtable, which must include all partitioned tables of all
> partition trees involved in the query. Of those, it will lock the tables
> whose RT indexes appear in PlannedStmt.nonleafResultRelations with
> RowExclusiveLock mode. PlannedStmt.nonleafResultRelations is a global
> list of all partitioned table RT indexes obtained by concatenating
> partitioned_rels lists of all ModifyTable nodes involved in the query
> (set_plan_refs does that). We need to distinguish nonleafResultRelations,
> because we need to take the stronger lock on a given table before any
> weaker one if it happens to appear in the query as a non-result relation
> too, to avoid lock strength upgrade deadlock hazard.

Hmm. The problem with this theory in my view is that it doesn't
explain why InitPlan() and ExecOpenScanRelation() lock the relations
instead of just assuming that they are already locked either by
AcquireExecutorLocks or by planning. If ExecLockNonLeafAppendTables()
doesn't really need to take locks, then ExecOpenScanRelation() must
not need to do it either. We invented ExecLockNonLeafAppendTables()
on the occasion of removing the scans of those tables which would
previously have caused ExecOpenScanRelation() to be invoked, so as to
keep the locking behavior unchanged.

AcquireExecutorLocks() looks like an odd bit of code to me. The
executor itself locks result tables in InitPlan() and then everything
else during InitPlan() and all of the others later on while walking
the plan tree -- comments in InitPlan() say that this is to avoid a
lock upgrade hazard if a result rel is also a source rel. But
AcquireExecutorLocks() has no such provision; it just locks everything
in RTE order. In theory, that's a deadlock hazard of another kind, as
we just talked about in the context of EIBO. In fact, expanding in
bound order has made the situation worse: before, expansion order and
locking order were the same, so maybe having AcquireExecutorLocks()
work in RTE order coincidentally happened to give the same result as
the executor code itself as long as there are no result relations.
But this is certainly not true any more. I'm not sure it's worth
expending a lot of time on this -- it's evidently not a problem in
practice, or somebody probably would've complained before now.

But that having been said, I don't think we should assume that all the
locks taken from the executor are worthless because plancache.c will
always do the job for us. I don't know of a case where we execute a
saved plan without going through the plan cache, but that doesn't mean
that there isn't one or that there couldn't be one in the future.
It's not the job of these partitioning patches to whack around the way
we do locking in general -- they should preserve the existing behavior
as much as possible. If we want to get rid of the locking in the
executor altogether, that's a separate discussion where, I have a
feeling, there will prove to be better reasons for the way things are
than we are right now supposing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2017-09-12 19:11:45 Re: Clarification in pg10's pgupgrade.html step 10 (upgrading standby servers)
Previous Message Pavel Stehule 2017-09-12 19:07:36 Re: Re: issue: record or row variable cannot be part of multiple-item INTO list