Re: Partitioned tables and relfilenode

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Partitioned tables and relfilenode
Date: 2017-02-15 07:14:49
Message-ID: CAB7nPqREJRXSTk9fQC2oQgAyQXapeNxjs1SQC2NDS7JOm-FKBQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Feb 10, 2017 at 3:19 PM, Amit Langote
<Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> The new partitioned tables do not contain any data by themselves. Any
> data inserted into a partitioned table is routed to and stored in one of
> its partitions. In fact, it is impossible to insert *any* data before a
> partition (to be precise, a leaf partition) is created. It seems wasteful
> then to allocate physical storage (files) for partitioned tables. If we
> do not allocate the storage, then we must make sure that the right thing
> happens when a command that is intended to manipulate a table's storage
> encounters a partitioned table, the "right thing" here being that the
> command's code either throws an error or warning (in some cases) if the
> specified table is a partitioned table or ignores any partitioned tables
> when it reads the list of relations to process from pg_class. Commands
> that need to be taught about this are vacuum, analyze, truncate, and alter
> table. Specifically:

Thanks. I have been looking a bit at this set of patches.

> - In case of vacuum, specifying a partitioned table causes a warning
> - In case of analyze, we do not throw an error or warning but simply
> avoid calling do_analyze_rel() *non-recursively*. Further in
> acquire_inherited_sample_rows(), any partitioned tables in the list
> returned by find_all_inheritors() are skipped.
> - In case of truncate, only the part which manipulates table's physical
> storage is skipped for partitioned tables.

I am wondering if instead those should not just be errors saying that
such operations are just not support. This could be relaxed in the
future to mean that a vacuum, truncate or analyze on the partitioned
table triggers an operator in cascade.

> - ATRewriteTables() skips on the AlteredTableInfo entries for partitioned
> tables, because there is nothing to be done.

Perhaps that makes sense, foreign tables do that.

> - Since we cannot create indexes on partitioned tables anyway, there is
> no need to handle cluster and reindex (they throw a meaningful error
> already due to the lack of indexes.)

Yep.

> Patches 0001 and 0002 of the attached implement the above part. 0001
> teaches the above mentioned commands to do the "right thing" as described
> above and 0002 teaches heap_create() and heap_create_with_catalog() to not
> create any physical storage (none of the forks) for partitioned tables.

- else
+ /*
+ * Although we cannot analyze partitioned tables themselves, we are
+ * still be able to do the recursive ANALYZE.
+ */
+ else if (onerel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
{
/* No need for a WARNING if we already complained during VACUUM */
if (!(options & VACOPT_VACUUM))
It seems to me that it is better style to use an empty else if with
only the comment. Comments related to a condition that are outside
this condition should be conditional in their formulations. Comments
within a condition can be affirmations when they refer to a decided
state of things.

From patch 2:
@@ -1351,7 +1352,12 @@ heap_create_with_catalog(const char *relname,
relkind == RELKIND_TOASTVALUE ||
relkind == RELKIND_PARTITIONED_TABLE);

- heap_create_init_fork(new_rel_desc);
+ /*
+ * We do not want to create any storage objects for a partitioned
+ * table.
+ */
+ if (relkind != RELKIND_PARTITIONED_TABLE)
+ heap_create_init_fork(new_rel_desc);
}
This does not make much sense with an assertion telling exactly the
contrary a couple of lines before. I think that it would make more
sense to move the assertion on relkind directly in
heap_create_init_fork().

> Then comes 0003, which concerns inheritance planning. In a regular
> inheritance set (ie, the inheritance set corresponding to an inheritance
> hierarchy whose root is a regular table), the inheritance parents are
> included in their role as child members, because they might contribute
> rows to the final result. So AppendRelInfo's are created for each such
> table by the planner prep phase, which the later planning steps use to
> create a scan plan for those tables as the Append's child plans.
> Currently, the partitioned tables are also processed by the optimizer as
> inheritance sets. Partitioned table inheritance parents do not own any
> storage, so we *must* not create scan plans for them. So we do not need
> to process them as child members of the inheritance set. 0003 teaches
> expand_inherited_rtentry() to not add partitioned tables as child members.
> Also, since the root partitioned table RTE is no longer added to the
> Append list as the 1st child member, inheritance_planner() cannot assume
> that it can install the 1st child RTE as the nominalRelation of a given
> ModifyTable node, instead the original root parent table RTE is installed
> as the nominalRelation.

This is a collection of checks on relkind == RELKIND_PARTITIONED_TABLE
to avoid interactions with partition tables. Did you consider doing
something in the executor instead?
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Khandekar 2017-02-15 07:33:08 Re: Parallel Append implementation
Previous Message Seki, Eiji 2017-02-15 06:33:29 Re: Proposal: GetOldestXminExtend for ignoring arbitrary vacuum flags