Re: Partition-wise join for join between (declaratively) partitioned tables

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Partition-wise join for join between (declaratively) partitioned tables
Date: 2017-03-14 12:04:38
Message-ID: CAFjFpRf4wXhQeKhi4NhA9B3vo8BK99zkm-o8tEBq3qb8cwbLzQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 14, 2017 at 5:47 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Mar 13, 2017 at 3:24 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> Haven't looked at 0007 yet.
>
> + if (rel->part_scheme)
> + {
> + int cnt_parts;
> +
> + for (cnt_parts = 0; cnt_parts < nparts; cnt_parts++)
> + {
> + if (rel->part_oids[cnt_parts] ==
> childRTE->relid)
> + {
> + Assert(!rel->part_rels[cnt_parts]);
> + rel->part_rels[cnt_parts] = childrel;
> + }
> + }
> + }
>
> It's not very appealing to use an O(n^2) algorithm here. I wonder if
> we could arrange things so that inheritance expansion expands
> partitions in the right order, and then we could just match them up
> one-to-one. This would probably require an alternate version of
> find_all_inheritors() that expand_inherited_rtentry() would call only
> for partitioned tables.

That seems a much better solution, but
1. Right now when we expand a multi-level partitioned table, we
include indirect partitions as direct children in inheritance
hierachy. part_rels array OTOH should correspond to the partitioning
scheme and should hold RelOptInfos of direct partitions. 0013 patch
fixes that to include only direct partitions as direct children
preserving partitioning hierarchy in the inheritance hierarchy. That
patch right now uses find_inheritance_children() to get Oids of direct
partitions, but instead it could return rd_partdesc->oids in the form
of list; OIDs ordered same as the array. Once we do that, we should
expect the appinfos to appear in the same order as the
rd_partdesc->oids and so RelOptInfo::part_oids. We just need to make
sure that the order is preserved and assign part_rels as they appear
in that loop.

One would argue that we preserve the OIDs only for single-level
partitioned tables, but in expand_inheritance_rtentry(), if we want to
detect whether a relation is single-level partitioned or multi-level,
we need to look up its direct partitions to see if they are further
partitioned. That will look a bit ugly and will not be necessary once
we have 0013. In case we decide to defer multi-level partitioned table
changes to v11 and based on the progress in [1], I will work on fixing
the order in which appinfos are created for single-level partitioned
tables.

> Failing that, another idea would be to use
> qsort() or qsort_arg() to put the partitions in the right order.

I didn't get this. I could not find documentation for qsort_arg(). Can
you please elaborate? I guess, if we fix expand_inheritance_rtentry()
we don't need this. It looks like we will change
expand_inheritance_rtentry() anyway.

>
> + if (relation->rd_rel->relkind != RELKIND_PARTITIONED_TABLE ||
> + !inhparent ||
> + !(rel->part_scheme = find_partition_scheme(root, relation)))
>
> Maybe just don't call this function in the first place in the
> !inhparent case, instead of passing down an argument that must always
> be true.

The function serves a single place to re/set partitioning information.
It would set the partitioning information if the above three
conditions are met. Otherwise it would nullify that information. If we
decide not to call this function when !inhparent, we will need to
nullify the partitioning information outside of this function as well
as inside this function, duplicating the code.

>
> + /* Match the partition key types. */
> + for (cnt_pks = 0; cnt_pks < partnatts; cnt_pks++)
> + {
> + /*
> + * For types, it suffices to match the type
> id, mod and collation;
> + * len, byval and align are depedent on the first two.
> + */
> + if (part_key->partopfamily[cnt_pks] !=
> part_scheme->partopfamily[cnt_pks] ||
> + part_key->partopcintype[cnt_pks] !=
> part_scheme->partopcintype[cnt_pks] ||
> + part_key->parttypid[cnt_pks] !=
> part_scheme->key_types[cnt_pks] ||
> + part_key->parttypmod[cnt_pks] !=
> part_scheme->key_typmods[cnt_pks] ||
> + part_key->parttypcoll[cnt_pks] !=
> part_scheme->key_collations[cnt_pks])
> + break;
> + }
>
> I think memcmp() might be better than a for-loop.

Done.

PFA patches.

[1] https://www.postgresql.org/message-id/2b0d42f2-3a53-763b-c9c2-47139e4b1c2e@lab.ntt.co.jp
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment Content-Type Size
pg_dp_join_patches_v3.zip application/zip 89.4 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2017-03-14 12:04:46 Re: Partition-wise join for join between (declaratively) partitioned tables
Previous Message Ashutosh Bapat 2017-03-14 12:04:27 Re: Partition-wise join for join between (declaratively) partitioned tables