Re: Partition-wise join for join between (declaratively) partitioned tables

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Partition-wise join for join between (declaratively) partitioned tables
Date: 2017-09-05 11:01:31
Message-ID: CAFjFpRfRDhWp=oguNjyzN=NMoOD+RCC3wS+b+xbGKwKUk0dRKg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Sep 5, 2017 at 3:00 PM, Ashutosh Bapat
<ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
> On Tue, Sep 5, 2017 at 1:16 PM, Amit Langote
> <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> On 2017/09/05 15:43, Ashutosh Bapat wrote:
>>> Ok. Can you please answer my previous questions?
>>>
>>> AFAIU, the list contained RTIs of the relations, which didnt' have
>>> corresponding AppendRelInfos to lock those relations. Now that we
>>> create AppendRelInfos even for partitioned partitions with my 0001
>>> patch, I don't think
>>> we need the list to take care of the locks. Is there any other reason
>>> why we maintain that list (apart from the trigger case I have raised
>>> and Fujita-san says that the list is not required in that case as
>>> well.)?
>>
>> AppendRelInfos exist within the planner (they come to be and go away
>> within the planner). Once we leave the planner, that information is gone.
>>
>> Executor will receive a plan node that won't contain that information:
>>
>> 1. Append has an appendplans field, which contains one plan tree for every
>> leaf partition. None of its fields, other than partitined_rels,
>> contains the RT indexes of the partitioned tables. Similarly in the
>> case of MergeAppend.
>>
>> 2. ModifyTable has a resultRelations fields which contains a list of leaf
>> partition RT indexes and a plans field which contains one plan tree for
>> every RT index in the resultRelations list (that is a plan tree that
>> will scan the particular leaf partition). None of its fields, other
>> than partitined_rels, contains the RT indexes of the partitioned
>> tables.
>>
>> I learned over the course of developing the patch that added this
>> partitioned_rels field [1] that the executor needs to identify all the
>> affected tables by a given plan tree so that it could lock them. Executor
>> needs to lock them separately even if the plan itself was built after
>> locking all the relevant tables [2]. For example, see
>> ExecLockNonLeafAppendTables(), which will lock the tables in the
>> (Merge)Append.partitioned_rels list.
>>
>> While I've been thinking all along that the same thing must be happening
>> for RT indexes in ModifyTable.partitioned_rels list (I said so a couple of
>> times on this thread), it's actually not. Instead,
>> ModifyTable.partitioned_rels of all ModifyTable nodes in a give query are
>> merged into PlannedStmt.nonleafResultRelations (which happens in
>> set_plan_refs) and that's where the executor finds them to lock them
>> (which happens in InitPlan).
>>
>> So, it appears that ModifyTable.partitioned_rels is indeed unused in the
>> executor. But we still can't get rid of it from the ModifyTable node
>> itself without figuring out a way (a channel) to transfer that information
>> into PlannedStmt.nonleafResultRelations.
>
> Thanks a lot for this detailed analysis. IIUC, in my 0001 patch, I am
> not adding any partitioned partition other than the parent itself. But
> since every partitioned partition in turn acts as parent, it appears
> its own list. The list obtained by concatenating all such lists
> together contains all the partitioned partition RTIs. In my patch, I
> need to teach accumulate_append_subpath() to accumulate
> partitioned_rels as well.
>

accumulate_append_subpath() is executed for every path instead of
every relation, so changing it would collect the same list multiple
times. Instead, I found the old way of associating all intermediate
partitions with the root partitioned relation work better. Here's the
updated patch set.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment Content-Type Size
pg_dp_join_patches_v29.tar.gz application/x-gzip 165.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2017-09-05 11:04:06 Re: Proposal: pg_rewind to skip config files
Previous Message Konstantin Knizhnik 2017-09-05 10:58:56 Re: JIT compiling expressions/deform + inlining prototype v2.0