Re: Partitioned tables and relfilenode

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Partitioned tables and relfilenode
Date: 2017-03-17 08:57:23
Message-ID: ac0aac85-91a4-3079-d859-f2deb59df4ca@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2017/03/16 22:16, Robert Haas wrote:
> On Thu, Mar 16, 2017 at 6:03 AM, Amit Langote
> <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> I think we'll need to store *somewhere* the mapping of which inh=false
>> partitioned table RTE is the child of which inh=true (IOW, parent)
>> partitioned table RTE.
>
> I mean, for the children you're going to scan, that seems to be
> necessary so that you can do things like translate targetlists to use
> the correct varno. But for the children you're not going to scan,
> well, you need to know which ones they are so you can lock them, but
> do you really need the parent-child mappings? Or just a list of which
> ones there are?

I thought we'd want to keep the RT indexes segregated per controlling
node, which is why we'd want to store the mapping somehow. There may be
multiple partitioned tables in a query and one of them might be the result
relation. If we propagate all of them in one list to the executor,
InitPlan wouldn't know which ones are target relations, for example. But
I guess you don't mean passing all of them in one big list to the executor.

>> I've come to think that AppendRelInfos, although
>> contain extraneous information that won't be used, are better than
>> inventing something new altogether for time being. AppendRelInfos are
>> referred to a few times by query_planner() steps before we eventually get
>> to either set_append_rel_pathlist() or inheritance_planner(), so not
>> changing that approach seems less worrisome for now. So now if we both
>> create child RTEs and AppendRelInfos for the partitioned tables, we don't
>> need to change expand_inherited_rtentry() at all with this patch.
>> Finally, set_append_rel_size/pathlist() and inheritance_planner() skip the
>> child partitioned table RTEs, because no path/plan need to be created. We
>> can do away with having to create RelOptInfos for child partitioned table
>> RTEs, which I found to be not that invasive.
>
> Yes, but on the flip side, you're having to add code in a lot of
> places -- I think I counted 7 -- where you turn around and ignore
> those AppendRelInfos.

Perhaps you were looking at the previous version with "minimal" appinfos
containing the child_is_partitioned field? Following are the instances in
the latest patch posted yesterday:

- In set_append_re_size, do not count the partitioned child tables

- In set_append_rel_pathlist, do not create paths, but add to the list of
the partitioned tables to be included in the final Append path

- In create_lateral_join_info(), no need to propagate lateral join info to
partitioned child tables

- In inheritance_planner, no need to create plans for partitioned child
tables, but add to the list of the partitioned tables to be included in
the final ModifyTable path

In the previous version, because AppendRelInfo structure was itself
modified, there were more instances because more places would need to know
about the new kinds of appinfos (the minimal ones), which isn't the case
with the latest patch. Although admittedly, there will be a lot of
useless manipulation of those appinfos whose only purpose is to carry the
parent-child relid mapping up to where we finally consume that
information. Anyway, I've tried to implement an approach that doesn't
create AppendRelInfos for partitioned child tables as described below.

> That's a lot; how do we know we've got them
> all? I'm not sure what the patch would look like the other way, but
> I'm hoping that you could just keep the list of partitioned table RTIs
> someplace that mostly gets ignored, and then all of that special
> handling could be ripped out.

Since we'll want to save the RT indexes when we create child RTEs, that is
in expand_inherited_rtentry(), we'd do use a data structure that parallels
root->append_rel_list. The structure might consist of a the parent_relid
and a list of partitioned child RT indexes. Since we build this
information early during subquery_planner() it might equally be subject to
errors of omission, although that somehow sounds less likely.

I've tried to implement something like that in the attached updated 0001.
The mapping structure PartitionedChildRelInfo (a Node in fact) consists of
parent_relid and the list of partitioned child RT indexes. A global list
of these nodes is maintained in the root PlannerInfo called pcinfo_list.
In set_append_rel_pathlist() and inheritance_planner(), we search the
aforementioned list and copy the correct list to
Append/MergeAppend/ModifyTable path created for the parent. Thoughts?

I also noticed that PlanRowMarks that are created by
expand_inherited_rtentry() for partitioned child tables are problematic -
they must somehow be ignored in the executor. One way to do that is to
set isParent (making it a dummy entry) when initializing them, which is
what the patch does. Comment is updated to say that just like inheritance
parent relations, PlanRowMarks for partitioned child tables are also
marked as dummy.

Thanks,
Amit

Attachment Content-Type Size
0001-Avoid-creating-scan-nodes-for-partitioned-tables.patch text/x-diff 64.5 KB
0002-Do-not-allocate-storage-for-partitioned-tables.patch text/x-diff 3.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dave Page 2017-03-17 09:00:51 Re: pg_ls_waldir() & pg_ls_logdir()
Previous Message Kyotaro HORIGUCHI 2017-03-17 08:35:05 Re: asynchronous execution