Re: path toward faster partition pruning

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: path toward faster partition pruning
Date: 2017-08-23 11:16:10
Message-ID: CAFjFpRdb_fkmJHFjvAbB+Ln0t45fWjekLd5pY=sv+eAhBAKXPQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Aug 21, 2017 at 12:07 PM, Amit Langote
<Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> I've been working on implementing a way to perform plan-time
> partition-pruning that is hopefully faster than the current method of
> using constraint exclusion to prune each of the potentially many
> partitions one-by-one. It's not fully cooked yet though.
>
> Meanwhile, I thought I'd share a couple of patches that implement some
> restructuring of the planner code related to partitioned table inheritance
> planning that I think would be helpful. They are to be applied on top of
> the patches being discussed at [1]. Note that these patches themselves
> don't implement the actual code that replaces constraint exclusion as a
> method of performing partition pruning. I will share that patch after
> debugging it some more.
>
> The main design goal of the patches I'm sharing here now is to defer the
> locking and opening of leaf partitions in a given partition tree to a
> point after set_append_rel_size() is called on the root partitioned table.
> Currently, AFAICS, we need to lock and open the child tables in
> expand_inherited_rtentry() only to set the translated_vars field in
> AppendRelInfo that we create for the child. ISTM, we can defer the
> creation of a child AppendRelInfo to a point when it (its field
> translated_vars in particular) will actually be used and so lock and open
> the child tables only at such a time. Although we don't lock and open the
> partition child tables in expand_inherited_rtentry(), their RT entries are
> still created and added to root->parse->rtable, so that
> setup_simple_rel_arrays() knows the maximum number of entries
> root->simple_rel_array will need to hold and allocate the memory for that
> array accordingly. Slots in simple_rel_array[] corresponding to
> partition child tables will be empty until they are created when
> set_append_rel_size() is called on the root parent table and it determines
> the partitions that will be scanned after all.

The partition pruning can happen only after the quals have been
distributed to Rels i.e. after deconstruct_jointree(),
reconsider_outer_join_clauses() and generate_base_implied_equalities()
have been called. If the goal is to not heap_open() the partitions
which are pruned, we can't do that in expand_inherited_rtentry(). One
reason why I think we don't want to heap_open() partition relations is
to avoid relcache bloat because of opened partition relations, which
are ultimately pruned. But please note that according to your patches,
we still need to populate catalog caches to get relkind and reltype
etc.

There are many functions that traverse simple_rel_array[] after it's
created. Most of them assume that the empty entries in that array
correspond to non-simple range entries like join RTEs. But now we are
breaking that assumption. Most of these functions also skip "other"
relations, so that may be OK now, but I am not sure if it's really
going to be fine if we keep empty slots in place of partition
relations. There may be three options here 1. add placeholder
RelOptInfos for partition relations (may be mark those specially) and
mark the ones which get pruned as dummy later. 2. Prune the partitions
before any functions scans simple_rel_array[] or postpone creating
simple_rel_array till pruning. 3. Examine all the current scanners
esp. the ones which will be called before pruning to make sure that
skipping "other" relations is going to be kosher.

>
> Patch augments the existing PartitionedChildRelInfo node, which currently
> holds only the partitioned child rel RT indexes, to carry some more
> information about the partition tree, which includes the information
> returned by RelationGetPartitionDispatchInfo() when it is called from
> expand_inherited_rtentry() (per the proposed patch in [1], we call it to
> be able to add partitions to the query tree in the bound order).
> Actually, since PartitionedChildRelInfo now contains more information
> about the partition tree than it used to before, I thought the struct's
> name is no longer relevant, so renamed it to PartitionRootInfo and renamed
> root->pcinfo_list accordingly to prinfo_list. That seems okay because we
> only use that node internally.
>
> Then during the add_base_rels_to_query() step, when build_simple_rel()
> builds a RelOptInfo for the root partitioned table, it also initializes
> some newly introduced fields in RelOptInfo from the information contained
> in PartitionRootInfo of the table. The aforementioned fields are only
> initialized in RelOptInfos of root partitioned tables. Note that the
> add_base_rels_to_query() step won't add the partition "otherrel"
> RelOptInfos yet (unlike the regular inheritance case, where they are,
> after looking them up in root->append_rel_list).

Partition-wise join requires the partition hierarchy to be expanded
level-by-level keeping in-tact the parent-child relationship between
partitioned table and its partitions. Your patch doesn't do that and
adds all the partitioning information in the root partitioned table's
RelOptInfo. OTOH, partition-wise join patch adds partition bounds, key
expressions, OID and RelOptInfos of the immediate partitions
(including partitioned partitions) to RelOptInfo of a partitioned
table (see patch 0002 in the latest set of patches at [1]). I don't
see much point in having conflicting changes in both of our patches.
May be you should review that patch from my set and we can find a set
of members which help both partition pruning and partition-wise join.

>
> When set_append_rel_size() is called on the root partitioned table, it
> will call a find_partitions_for_query(), which using the partition tree
> information, determines the partitions that will need to be scanned for
> the query. This processing happens recursively, that is, we first
> determine the root-parent's partitions and then for each partition that's
> partitioned, we will determine its partitions and so on. As we determine
> partitions in this per-partitioned-table manner, we maintain a pair
> (parent_relid, list-of-partition-relids-to-scan) for each partitioned
> table and also a single list of all leaf partitions determined so far.
> Once all partitions have been determined, we turn to locking the leaf
> partitions. The locking happens in the order of OIDs as
> find_all_inheritors would have returned in expand_inherited_rtentry(); the
> list of OIDs in that original order is also stored in the table's
> PartitionRootInfo node. For each OID in that list, check if that OID is
> in the set of leaf partition OIDs that was just computed, and if so, lock
> it. For all chosen partitions that are partitioned tables (including the
> root), we create a PartitionAppendInfo node which stores the
> aforementioned pair (parent_relid, list-of-partitions-relids-to-scan), and
> append it to a list in the root table's RelOptInfo, with the root table's
> PartitionAppendInfo at the head of the list. Note that the list of
> partitions in this pair contains only the immediate partitions, so that
> the original parent-child relationship is reflected in the list of
> PartitionAppendInfos thus collected. The next patch that will implement
> actual partition-pruning will add some more code that will run under
> find_partitions_for_query().
>
> set_append_rel_size() processing then continues for the root partitioned
> table. It is at this point that we will create the RelOptInfos and
> AppendRelInfos for partitions. First for those of the root partitioned
> table and then for those of each partitioned table when
> set_append_rel_size() will be recursively called for the latter.

set_append_rel_size(), set_append_rel_pathlist() are already
recursive, so if we process expansion and pruning for one level in
those functions, the recursion will automatically take care of doing
so for every level.

>
>
> Note that this is still largely a WIP patch and the implementation details
> might change per both the feedback here and the discussion at [1].

The changes to code which handle expansion in this patch set should
really be part of expansion in bound order thread so that it's easy to
review all changes together. And this thread can then only concentrate
on partition pruning.

[1] http://postgr.es/m/CAFjFpRd9Vqh_=-Ldv-XqWY006d07TJ+VXuhXCbdj=P1jukYBrw@mail.gmail.com

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2017-08-23 11:38:10 Re: Page Scan Mode in Hash Index
Previous Message Jeevan Chalke 2017-08-23 11:13:04 Re: Partition-wise aggregation/grouping