Re: Partition-wise join for join between (declaratively) partitioned tables

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Partition-wise join for join between (declaratively) partitioned tables
Date: 2017-09-12 07:46:58
Message-ID: 040c1322-cf92-ab47-5246-dae08d5b6f7a@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2017/09/11 21:07, Ashutosh Bapat wrote:
> On Mon, Sep 11, 2017 at 5:19 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Mon, Sep 11, 2017 at 6:45 AM, Ashutosh Bapat
>> <ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
>>> So, all partitioned partitions are getting locked correctly. Am I
>>> missing something?
>>
>> That's not a valid test. In that scenario, you're going to hold all
>> the locks acquired by the planner, all the locks acquired by the
>> rewriter, and all the locks acquired by the executor, but when using
>> prepared queries, it's possible to execute the plan after the planner
>> and rewriter locks are no longer held.
>
> I see the same thing when I use prepare and execute

So I looked at this a bit closely and came to the conclusion that we may
not need to keep partitioned table RT indexes in the
(Merge)Append.partitioned_rels after all, as far as execution-time locking
is concerned.

Consider two cases:

1. Plan is created and executed in the same transaction

In this case, locks taken on the partitioned tables by the planner will
suffice.

2. Plan is executed in a different transaction from the one in which it
was created (a cached plan)

In this case, AcquireExecutorLocks will lock all the relations in
PlannedStmt.rtable, which must include all partitioned tables of all
partition trees involved in the query. Of those, it will lock the tables
whose RT indexes appear in PlannedStmt.nonleafResultRelations with
RowExclusiveLock mode. PlannedStmt.nonleafResultRelations is a global
list of all partitioned table RT indexes obtained by concatenating
partitioned_rels lists of all ModifyTable nodes involved in the query
(set_plan_refs does that). We need to distinguish nonleafResultRelations,
because we need to take the stronger lock on a given table before any
weaker one if it happens to appear in the query as a non-result relation
too, to avoid lock strength upgrade deadlock hazard.

Moreover, because all the tables from plannedstmt->rtable, including the
partitioned tables, will be added to PlannedStmt.relationsOids, any
invalidation events affecting the partitioned tables (for example,
add/remove a partition) will cause the plan involving partitioned tables
to be recreated.

In none of this do we rely on the partitioned table RT indexes appearing
in the (Merge)Append node itself. Maybe, we should just remove
partitioned_rels from (Merge)AppendPath and (Merge)Append node in a
separate patch and move on.

Thoughts?

Thanks,
Amit

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tatsuo Ishii 2017-09-12 07:53:49 Re: More efficient truncation of pg_stat_activity query strings
Previous Message Ashutosh Bapat 2017-09-12 07:39:02 Re: Partition-wise join for join between (declaratively) partitioned tables