path toward faster partition pruning

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: path toward faster partition pruning
Date: 2017-08-21 06:37:18
Views: Raw Message | Whole Thread | Download mbox
Lists: pgsql-hackers

I've been working on implementing a way to perform plan-time
partition-pruning that is hopefully faster than the current method of
using constraint exclusion to prune each of the potentially many
partitions one-by-one. It's not fully cooked yet though.

Meanwhile, I thought I'd share a couple of patches that implement some
restructuring of the planner code related to partitioned table inheritance
planning that I think would be helpful. They are to be applied on top of
the patches being discussed at [1]. Note that these patches themselves
don't implement the actual code that replaces constraint exclusion as a
method of performing partition pruning. I will share that patch after
debugging it some more.

The main design goal of the patches I'm sharing here now is to defer the
locking and opening of leaf partitions in a given partition tree to a
point after set_append_rel_size() is called on the root partitioned table.
Currently, AFAICS, we need to lock and open the child tables in
expand_inherited_rtentry() only to set the translated_vars field in
AppendRelInfo that we create for the child. ISTM, we can defer the
creation of a child AppendRelInfo to a point when it (its field
translated_vars in particular) will actually be used and so lock and open
the child tables only at such a time. Although we don't lock and open the
partition child tables in expand_inherited_rtentry(), their RT entries are
still created and added to root->parse->rtable, so that
setup_simple_rel_arrays() knows the maximum number of entries
root->simple_rel_array will need to hold and allocate the memory for that
array accordingly. Slots in simple_rel_array[] corresponding to
partition child tables will be empty until they are created when
set_append_rel_size() is called on the root parent table and it determines
the partitions that will be scanned after all.

Patch augments the existing PartitionedChildRelInfo node, which currently
holds only the partitioned child rel RT indexes, to carry some more
information about the partition tree, which includes the information
returned by RelationGetPartitionDispatchInfo() when it is called from
expand_inherited_rtentry() (per the proposed patch in [1], we call it to
be able to add partitions to the query tree in the bound order).
Actually, since PartitionedChildRelInfo now contains more information
about the partition tree than it used to before, I thought the struct's
name is no longer relevant, so renamed it to PartitionRootInfo and renamed
root->pcinfo_list accordingly to prinfo_list. That seems okay because we
only use that node internally.

Then during the add_base_rels_to_query() step, when build_simple_rel()
builds a RelOptInfo for the root partitioned table, it also initializes
some newly introduced fields in RelOptInfo from the information contained
in PartitionRootInfo of the table. The aforementioned fields are only
initialized in RelOptInfos of root partitioned tables. Note that the
add_base_rels_to_query() step won't add the partition "otherrel"
RelOptInfos yet (unlike the regular inheritance case, where they are,
after looking them up in root->append_rel_list).

When set_append_rel_size() is called on the root partitioned table, it
will call a find_partitions_for_query(), which using the partition tree
information, determines the partitions that will need to be scanned for
the query. This processing happens recursively, that is, we first
determine the root-parent's partitions and then for each partition that's
partitioned, we will determine its partitions and so on. As we determine
partitions in this per-partitioned-table manner, we maintain a pair
(parent_relid, list-of-partition-relids-to-scan) for each partitioned
table and also a single list of all leaf partitions determined so far.
Once all partitions have been determined, we turn to locking the leaf
partitions. The locking happens in the order of OIDs as
find_all_inheritors would have returned in expand_inherited_rtentry(); the
list of OIDs in that original order is also stored in the table's
PartitionRootInfo node. For each OID in that list, check if that OID is
in the set of leaf partition OIDs that was just computed, and if so, lock
it. For all chosen partitions that are partitioned tables (including the
root), we create a PartitionAppendInfo node which stores the
aforementioned pair (parent_relid, list-of-partitions-relids-to-scan), and
append it to a list in the root table's RelOptInfo, with the root table's
PartitionAppendInfo at the head of the list. Note that the list of
partitions in this pair contains only the immediate partitions, so that
the original parent-child relationship is reflected in the list of
PartitionAppendInfos thus collected. The next patch that will implement
actual partition-pruning will add some more code that will run under

set_append_rel_size() processing then continues for the root partitioned
table. It is at this point that we will create the RelOptInfos and
AppendRelInfos for partitions. First for those of the root partitioned
table and then for those of each partitioned table when
set_append_rel_size() will be recursively called for the latter.

Note that this is still largely a WIP patch and the implementation details
might change per both the feedback here and the discussion at [1].



Attachment Content-Type Size
0001-Teach-pg_inherits.c-a-bit-about-partitioning.patch text/plain 26.3 KB
0002-Allow-locking-only-partitioned-children-in-partition.patch text/plain 14.0 KB
0003-WIP-Defer-opening-and-locking-partitions-to-set_appe.patch text/plain 55.2 KB


Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2017-08-21 06:41:52 Re: [COMMITTERS] pgsql: Account for catalog snapshot in PGXACT->xmin updates.
Previous Message Amit Langote 2017-08-21 06:10:55 Re: expanding inheritance in partition bound order