Re: Partitioned tables and relfilenode

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Partitioned tables and relfilenode
Date: 2017-03-15 06:32:10
Message-ID: 744d20fe-fc7b-f89e-8d06-6496ec537b86@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2017/03/15 13:38, Ashutosh Bapat wrote:
> On Wed, Mar 15, 2017 at 3:39 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Mon, Mar 13, 2017 at 6:24 AM, Amit Langote
>> <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>> The previous proposal was for expand_inherited_rtentry to not create RT
>>> entries and AppendRelInfo's for the non-leaf tables, but I think that
>>> doesn't work, as I tried to explain above. We need RTEs because that
>>> seems to be the only way currently for informing the executor of the
>>> non-leaf tables. We need AppendRelInfo's to store the RT indexes of those
>>> RTEs for the latter planning steps to collect them in partitioned_rels
>>> mentioned above. So with the latest patch, we do create the RT entry and
>>> AppendRelInfo for non-leaf tables. AppendRelInfo created in this case is
>>> a minimal one; only parent_relid and child_relid are valid. To make the
>>> latter planning steps ignore these minimal AppendRelInfo's, every
>>> AppendRelInfo is now marked with child_relkind. Only
>>> set_append_rel_pathlist() and inheritance_planner() process them to
>>> collect the child_relid into the partitioned_rels list to be stored in
>>> AppendPath/MergeAppendPath and ModifyTablePath, respectively.
>>
>> I see your point, but I still think this kind of stinks. You've got
>> all kinds of logic that is now conditional on child_is_partitioned,
>> and that seems like a recipe for bugs and future maintenance
>> difficulties. It would be much nicer if we could come up with a
>> design that doesn't create the AppendRelInfo in the first place,
>> because then all of that stuff could just work. Can we get away with
>> creating an RTE for each partitioned table (other than the parent,
>> perhaps; for that one it would be nice to use the inh-flagged RTE we
>> already have) but NOT creating an AppendRelInfo to go with it? If
>> we've got the list of RTIs for the new RTEs associated with the append
>> path in some other form, can't we get by without also having an
>> AppendRelInfo to hold onto that translation?
>>
>
> Will it help to retain the partition hierarchy as inheritance
> hierarchy and then collapse it while creating append paths. That will
> be needed by partition-wise join, will be helpful in partition pruning
> without using constraints and so on. So, may be could use that
> infrastructure to simplify the logic here. The patch is available as
> 0013 in [1].
>
> [1] CAFjFpRfqotRR6cM3sooBHMHEVdkFfAZ6PyYg4GRZsoMuW08HjQ(at)mail(dot)gmail(dot)com

IMHO, it would be better to keep those patches separate because the
problems being solved are different. By the way, one of the reasons that
patch (as I had written it) was skipped was because it didn't cover the
inheritance_planner() case [1]. Your patch 0013 at the link should be
updated (maybe I should report on the partitionwise joins thread as well)
in some way to handle the update/delete case, because this happens:

create table p (a int, b char) partition by list (a);
create table p1 partition of p for values in (1) partition by list (b);
create table p1a partition of p1 for values in ('a');
create table p2 partition of p for values in (2);

explain (costs off) update p set a = a, b = 'b';
QUERY PLAN
-----------------------------------
Update on p
Update on p
Update on p1 p
Update on p2
-> Seq Scan on p
-> Result
-> Append
-> Seq Scan on p1
-> Seq Scan on p1a
-> Seq Scan on p2
(10 rows)

update p set a = a, b = 'b';
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/CA%2BTgmobMy%3DrqM%3DMTN_FUEfD-PiWSCSonH%2BZ1_SjL6ZmQ2GGz1w%40mail.gmail.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2017-03-15 07:00:25 Re: [COMMITTERS] pgsql: Fix cardinality estimates for parallel joins.
Previous Message Peter van Hardenberg 2017-03-15 06:29:35 Defaulting psql to ON_ERROR_ROLLBACK=interactive