Partitioned tables and relfilenode

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Partitioned tables and relfilenode
Date: 2017-02-10 06:19:47
Lists: pgsql-hackers

The new partitioned tables do not contain any data by themselves. Any
data inserted into a partitioned table is routed to and stored in one of
its partitions. In fact, it is impossible to insert *any* data before a
partition (to be precise, a leaf partition) is created. It seems wasteful
then to allocate physical storage (files) for partitioned tables. If we
do not allocate the storage, then we must make sure that the right thing
happens when a command that is intended to manipulate a table's storage
encounters a partitioned table. The "right thing" here is that the
command's code either throws an error or warning (in some cases) if the
specified table is a partitioned table, or ignores any partitioned tables
when it reads the list of relations to process from pg_class. Commands
that need to be taught about this are vacuum, analyze, truncate, and alter
table. Specifically:

- In case of vacuum, specifying a partitioned table causes a warning.

- In case of analyze, we do not throw an error or warning but simply
avoid calling do_analyze_rel() *non-recursively*. Further, in
acquire_inherited_sample_rows(), any partitioned tables in the list
returned by find_all_inheritors() are skipped.

- In case of truncate, only the part which manipulates the table's
physical storage is skipped for partitioned tables.

- ATRewriteTables() skips the AlteredTableInfo entries for partitioned
tables, because there is nothing to be done.

- Since we cannot create indexes on partitioned tables anyway, there is
no need to handle cluster and reindex (they already throw a meaningful
error due to the lack of indexes).

Patches 0001 and 0002 of the attached implement the above part. 0001
teaches the above-mentioned commands to do the "right thing" as described
above, and 0002 teaches heap_create() and heap_create_with_catalog() to not
create any physical storage (none of the forks) for partitioned tables.
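
The storage decision can be pictured with the following standalone
sketch. It is only a toy model under stated assumptions, not the patch
itself: ToyRelation, toy_relkind_needs_storage() and toy_heap_create()
are hypothetical names, and using 0 to mean "no relfilenode" is an
assumption made for the example. The relkind letters are the real
pg_class values.

```c
#include <stdbool.h>

/* Relkind letters as stored in pg_class.relkind. */
#define RELKIND_RELATION          'r'
#define RELKIND_VIEW              'v'
#define RELKIND_COMPOSITE_TYPE    'c'
#define RELKIND_FOREIGN_TABLE     'f'
#define RELKIND_PARTITIONED_TABLE 'p'

/* Assumption for this sketch: 0 means "no storage allocated". */
#define TOY_INVALID_FILENODE 0u

/* Hypothetical, simplified relation descriptor. */
typedef struct ToyRelation
{
    char     relkind;
    unsigned relfilenode;
} ToyRelation;

/* Relkinds that never own files; partitioned tables join this group. */
static bool
toy_relkind_needs_storage(char relkind)
{
    switch (relkind)
    {
        case RELKIND_VIEW:
        case RELKIND_COMPOSITE_TYPE:
        case RELKIND_FOREIGN_TABLE:
        case RELKIND_PARTITIONED_TABLE:
            return false;
        default:
            return true;
    }
}

/*
 * Build a descriptor, assigning the proposed filenode only when the
 * relkind actually needs physical storage.
 */
static ToyRelation
toy_heap_create(char relkind, unsigned new_filenode)
{
    ToyRelation rel;

    rel.relkind = relkind;
    rel.relfilenode = toy_relkind_needs_storage(relkind)
        ? new_filenode
        : TOY_INVALID_FILENODE;
    return rel;
}
```

The point of the sketch is that the check sits at creation time, so no
fork is ever created for a partitioned table and nothing has to be
cleaned up later.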

Then comes 0003, which concerns inheritance planning. In a regular
inheritance set (i.e., the inheritance set corresponding to an inheritance
hierarchy whose root is a regular table), the inheritance parents are
included in their role as child members, because they might contribute
rows to the final result. So AppendRelInfo's are created for each such
table by the planner prep phase, which the later planning steps use to
create a scan plan for those tables as the Append's child plans.
Currently, the partitioned tables are also processed by the optimizer as
inheritance sets. Partitioned table inheritance parents do not own any
storage, so we *must* not create scan plans for them. So we do not need
to process them as child members of the inheritance set. 0003 teaches
expand_inherited_rtentry() to not add partitioned tables as child members.
Also, since the root partitioned table RTE is no longer added to the
Append list as the 1st child member, inheritance_planner() cannot assume
that it can install the 1st child RTE as the nominalRelation of a given
ModifyTable node; instead, the original root parent table RTE is installed
as the nominalRelation.
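
As a rough illustration of the idea (not the actual planner code), the
standalone sketch below expands a list of inheritors into child members
while leaving out partitioned tables, and records the root parent as the
nominal relation instead of the first child. ToyRTE, ToyExpansion,
TOY_MAX_CHILDREN and toy_expand_inherited() are hypothetical names
invented for this example; the input array is assumed to list the root
parent first.

```c
#include <stddef.h>

/* Relkind letters as stored in pg_class.relkind. */
#define RELKIND_RELATION          'r'
#define RELKIND_PARTITIONED_TABLE 'p'

/* Arbitrary cap, chosen only to keep the sketch allocation-free. */
#define TOY_MAX_CHILDREN 16

/* Hypothetical, simplified range table entry. */
typedef struct ToyRTE
{
    int  rti;       /* range table index */
    char relkind;
} ToyRTE;

/* Result of expanding one inheritance set. */
typedef struct ToyExpansion
{
    int    nominal_relation;    /* rti of the original root parent */
    size_t nchildren;
    int    child_rtis[TOY_MAX_CHILDREN];
} ToyExpansion;

/*
 * inheritors[0] is assumed to be the root parent. Partitioned tables
 * are never added as child members, so no scan is planned for them.
 */
static ToyExpansion
toy_expand_inherited(const ToyRTE *inheritors, size_t n)
{
    ToyExpansion result = {0};

    if (n == 0)
        return result;

    /* The root parent, not the first child, is the nominal relation. */
    result.nominal_relation = inheritors[0].rti;

    for (size_t i = 0; i < n; i++)
    {
        if (inheritors[i].relkind == RELKIND_PARTITIONED_TABLE)
            continue;
        if (result.nchildren < TOY_MAX_CHILDREN)
            result.child_rtis[result.nchildren++] = inheritors[i].rti;
    }
    return result;
}
```

With a partitioned root at index 1, the child list starts at the first
leaf partition, which is why the nominal relation has to be taken from
the root rather than from the head of the child list.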

Together, the above patches implement the first item listed in the "Some
notes:" part of an email [1] on the original declarative partitioning
thread, which says:

"We should try to teach the executor never to scan the parent. That's
never necessary with this system, and it might add significant overhead.
We should also try to get rid of the idea of the parent having storage
(i.e. a relfilenode)."

Thoughts, comments?



Attachment                                                     Content-Type  Size
0001-Partitioned-tables-are-empty-themselves.patch             text/x-diff   6.4 KB
0002-Do-not-allocate-storage-for-partitioned-tables.patch      text/x-diff   1.3 KB
0003-Always-plan-partitioned-tables-as-inheritance-sets.patch  text/x-diff   14.6 KB

