Quick Links

Re: Partition-wise join for join between (declaratively) partitioned tables

From:	Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partition-wise join for join between (declaratively) partitioned tables
Date:	2017-03-30 05:22:45
Message-ID:	CAFjFpRf_JFMOGz7o7p3KrSX-S+k6XZQx473xt3iTxKG+yvVpXw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Wed, Mar 29, 2017 at 8:39 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Mar 28, 2017 at 12:54 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> Regarding 0002, I think the parts that involve factoring out
>> find_param_path_info() are uncontroversial. Regarding the changes to
>> adjust_appendrel_attrs(), my main question is whether we wouldn't be
>> better off using an array representation rather than a List
>> representation. In other words, this function could take PlannerInfo
>> *root, Node *node, int nappinfos, AppendRelInfo **appinfos. Existing
>> callers doing adjust_appendrel_attrs(root, whatever, appinfo) could
>> just do adjust_appendrel_attrs(root, whatever, 1, &appinfo), not
>> needing to allocate. To make this work, adjust_child_relids() and
>> find_appinfos_by_relids() would need to be adjusted to use a similar
>> argument-passing convention. I suspect this makes iterating over the
>> AppendRelInfos mildly faster, too, apart from the memory savings.
>
> Still regarding 0002, looking at adjust_appendrel_attrs_multilevel,
> could we have a common code path for the baserel and joinrel cases?
> It seems like perhaps you could just loop over root->append_rel_list.
> For each appinfo, if (bms_is_member(appinfo->child_relid,
> child_rel->relids)) bms_add_member(parent_relids,
> appinfo->parent_relid).
>
> This implementation would have some loss of efficiency in the
> single-rel case because we'd scan all of the AppendRelInfos in the
> list even if there's only one relid. But you could fix that by
> writing it like this:
>
> foreach (lc, root->append_rel_list)
> {
> if (bms_is_member(appinfo->child_relid, child_rel->relids))
> {
> bms_add_member(parent_relids, appinfo->parent_relid);
> if (child_rel->reloptkind == RELOPT_OTHER_MEMBER_REL)
> break; /* only one relid to find, and we've found it */
> }
> }
> Assert(bms_num_members(child_rel->relids) == bms_num_members(parent_relids));
>
> That seems pretty slick. It is just as fast as the current
> implementation for the single-rel case. It allocates no memory
> (unlike what you've got now). And it handles the joinrel case using
> essentially the same code as the simple rel case.

I got rid of those differences completely by using trick similar to
adjust_child_relids_multilevel(), which uses top_parent_relids instead
of rel->reloptkind to decide whether we have reached the top parent or
not. Those can trickle down from the topmost caller to any depth in
recursion. This also avoids any call to find_*_rel(), which was the
main reason why we had different code paths for base and join
relation.

>
> In 0003, it seems that it would be more consistent with what you did
> elsewhere if the last argument to allow_star_schema_join were named
> inner_paramrels rather than innerparams. Other than that, I don't see
> anything to complain about.

I had used the same name as the local variable declared with the same
purpose. But this change looks good. Done.

>
> In 0004:
>
> + Assert(!rel->part_rels[cnt_parts]);
> + rel->part_rels[cnt_parts] = childrel;
>
> break here?

Right, done.

>
> +static void
> +get_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
> + Relation
> relation, bool inhparent)
> +{
> + /* No partitioning information for an unpartitioned relation. */
> + if (relation->rd_rel->relkind != RELKIND_PARTITIONED_TABLE ||
> + !inhparent ||
>
> I still think the inhparent check should be moved to the caller.

Done.

>
> In 0005:
>
> + * Returns a list of the RT indexes of the partitioned
> child relations
> + * with any of joining relations' rti as the root parent RT index.
>
> I found this wording confusing. Maybe: Build and return a list
> containing the RTI of every partitioned relation which is a child of
> some rel included in the join.

This is better. Thanks. Done.

>
> + * Note: Only call this function on joins between partitioned tables.
>
> Or what, the boogeyman will come and get you?
>
> (In other words, I don't think that's a very informative comment.)

I mimicked the prologue of earlier function. I guess, similar comment
in the prologue of earlier function is written because if we use
something other than a partitioned table there, the assertion at the
end of that function would trip. Similarly, for this function, the
assertion at the end of the function will trip, if we use it for
something other than a join relation.

PFA patches rebased on f90d23d0c51895e0d7db7910538e85d3d38691f0.

>
> I don't think 0011 is likely to be acceptable in current form. I
> can't imagine that we just went to the trouble of getting rid of
> AppendRelInfos for child partitioned rels only to turn around and put
> them back again. If you just need the parent-child mappings, you can
> get that from the PartitionedChildRelInfo list.

I will reply to this separately.

>
> Unfortunately, I don't think we're likely to be able to get this whole
> patch series into a committable form in the next few days, but I'd
> like to keep reviewing it and working with you on it; there's always
> next cycle.

Thanks for all your efforts in reviewing the patches and for your
excellent suggestions to improve the patches.

As I have stated earlier, it will help if we can get 0001 committed,
may be 0002. 0004 introduces the concept of partitioning scheme which
seems to be vital for partititon-wise aggregation, partition pruning
and may be sorting optimization discussed in [1]. If we are able to
commit it in this commitfest, the patches for those optimizations can
take advantage of partitioning scheme. I understand it's really close
to the end of this commitfest and we may not be able to commit even
those patches.

https://www.mail-archive.com/pgsql-hackers(at)postgresql(dot)org/msg308742.html

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment	Content-Type	Size
pg_dp_join_patches_v16.zip	application/zip	70.7 KB

In response to

Re: Partition-wise join for join between (declaratively) partitioned tables at 2017-03-29 03:09:11 from Robert Haas

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Tom Lane	2017-03-30 05:25:19	Somebody has not thought through subscription locking considerations
Previous Message	Ashutosh Bapat	2017-03-30 05:14:02	Re: Partition-wise join for join between (declaratively) partitioned tables