Re: expanding inheritance in partition bound order

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: expanding inheritance in partition bound order
Date: 2017-08-10 02:11:56
Message-ID: 4e073db7-c6a1-1819-f522-9907e2f12a39@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2017/08/05 2:25, Robert Haas wrote:
> Concretely, my proposal is:
>
> 1. Before calling RelationGetPartitionDispatchInfo, the calling code
> should use find_all_inheritors to lock all the relevant relations (or
> the planner could use find_all_inheritors to get a list of relation
> OIDs, store it in the plan in order, and then at execution time we
> visit them in that order and lock them).
>
> 2. RelationGetPartitionDispatchInfo assumes the relations are already locked.
>
> 3. While we're optimizing, in the first loop inside of
> RelationGetPartitionDispatchInfo, don't call heap_open(). Instead,
> use get_rel_relkind() to see whether we've got a partitioned table; if
> so, open it. If not, there's no need.
>
> 4. For safety, add a function bool RelationLockHeldByMe(Oid) and add
> to this loop a check if (!RelationLockHeldByMe(lfirst_oid(lc1))
> elog(ERROR, ...). Might be interesting to stuff that check into the
> relation_open(..., NoLock) path, too.
>
> One objection to this line of attack is that there might be a good
> case for locking only the partitioned inheritors first and then going
> back and locking the leaf nodes in a second pass, or even only when
> required for a particular row. However, that doesn't require putting
> everything in bound order - it only requires moving the partitioned
> children to the beginning of the list. And I think rather than having
> new logic for that, we should teach find_inheritance_children() to do
> that directly. I have a feeling Ashutosh is going to cringe at this
> suggestion, but my idea is to do this by denormalizing: add a column
> to pg_inherits indicating whether the child is of
> RELKIND_PARTITIONED_TABLE. Then, when find_inheritance_children scans
> pg_inherits, it can pull that flag out for free along with the
> relation OID, and qsort() first by the flag and then by the OID. It
> can also return the number of initial elements of its return value
> which have that flag set.
>
> Then, in find_all_inheritors, we split rels_list into
> partitioned_rels_list and other_rels_list, and process
> partitioned_rels_list in its entirety before touching other_rels_list;
> they get concatenated at the end.
>
> Now, find_all_inheritors and find_inheritance_children can also grow a
> flag bool only_partitioned_children; if set, then we skip the
> unpartitioned children entirely.
>
> With all that in place, you can call find_all_inheritors(blah blah,
> false) to lock the whole hierarchy, or find_all_inheritors(blah blah,
> true) to lock just the partitioned tables in the hierarchy. You get a
> consistent lock order either way, and if you start with only the
> partitioned tables and later want the leaf partitions too, you just go
> through the partitioned children in the order they were returned and
> find_inheritance_children(blah blah, false) on each one of them and
> the lock order is exactly consistent with what you would have gotten
> if you'd done find_all_inheritors(blah blah, false) originally.

I tried implementing this in the attached set of patches.

[PATCH 2/5] Teach pg_inherits.c a bit about partitioning

Both find_inheritance_children and find_all_inheritors now list
partitioned child tables before non-partitioned ones and return
the number of partitioned tables in an optional output argument

[PATCH 3/5] Relieve RelationGetPartitionDispatchInfo() of doing locking

Anyone who wants to call RelationGetPartitionDispatchInfo() must first
acquire locks using find_all_inheritors.

TODO: Add RelationLockHeldByMe() and put if (!RelationLockHeldByMe())
elog(ERROR, ...) check in RelationGetPartitionDispatchInfo()

[PATCH 4/5] Teach expand_inherited_rtentry to use partition bound order

After locking the child tables using find_all_inheritors, we discard
the list of child table OIDs that it generates and rebuild the same
using the information returned by RelationGetPartitionDispatchInfo.

[PATCH 5/5] Store in pg_inherits if a child is a partitioned table

Catalog changes so that is_partitioned property of child tables is now
stored in pg_inherits. This avoids consulting syscache to get that
property as is currently implemented in patch 2/5.

I haven't yet done anything about changing the timing of opening and
locking leaf partitions, because it will require some more thinking about
the required planner changes. But the above set of patches will get us
far enough to get leaf partition sub-plans appear in the partition bound
order (same order as what partition tuple-routing uses in the executor).

With the above patches, we get the desired order of child sub-plans in
Append and ModifyTable plans for partitioned tables:

create table p (a int) partition by range (a);
create table p4 partition of p for values from (30) to (40);
create table p3 partition of p for values from (20) to (30);
create table p2 partition of p for values from (10) to (20);
create table p1 partition of p for values from (1) to (10);
create table p0 partition of p for values from (minvalue) to (1) partition
by list (a);
create table p00 partition of p0 for values in (0);
create table p01 partition of p0 for values in (-1);
create table p02 partition of p0 for values in (-2);

explain select count(*) from p;
QUERY PLAN
-------------------------------------------------------------------
Aggregate (cost=293.12..293.13 rows=1 width=8)
-> Append (cost=0.00..248.50 rows=17850 width=0)
-> Seq Scan on p1 (cost=0.00..35.50 rows=2550 width=0)
-> Seq Scan on p2 (cost=0.00..35.50 rows=2550 width=0)
-> Seq Scan on p3 (cost=0.00..35.50 rows=2550 width=0)
-> Seq Scan on p4 (cost=0.00..35.50 rows=2550 width=0)
-> Seq Scan on p02 (cost=0.00..35.50 rows=2550 width=0)
-> Seq Scan on p01 (cost=0.00..35.50 rows=2550 width=0)
-> Seq Scan on p00 (cost=0.00..35.50 rows=2550 width=0)

explain update p set a = a;
QUERY PLAN
--------------------------------------------------------------
Update on p (cost=0.00..248.50 rows=17850 width=10)
Update on p1
Update on p2
Update on p3
Update on p4
Update on p02
Update on p01
Update on p00
-> Seq Scan on p1 (cost=0.00..35.50 rows=2550 width=10)
-> Seq Scan on p2 (cost=0.00..35.50 rows=2550 width=10)
-> Seq Scan on p3 (cost=0.00..35.50 rows=2550 width=10)
-> Seq Scan on p4 (cost=0.00..35.50 rows=2550 width=10)
-> Seq Scan on p02 (cost=0.00..35.50 rows=2550 width=10)
-> Seq Scan on p01 (cost=0.00..35.50 rows=2550 width=10)
-> Seq Scan on p00 (cost=0.00..35.50 rows=2550 width=10)
(15 rows)

> P.S. While I haven't reviewed 0002 in detail, I think the concept of
> minimizing what needs to be built in RelationGetPartitionDispatchInfo
> is a very good idea.

I put this patch ahead in the list and so it's now 0001.

Thanks,
Amit

Attachment Content-Type Size
0001-Decouple-RelationGetPartitionDispatchInfo-from-execu.patch text/plain 37.7 KB
0002-Teach-pg_inherits.c-a-bit-about-partitioning.patch text/plain 20.7 KB
0003-Relieve-RelationGetPartitionDispatchInfo-of-doing-an.patch text/plain 6.9 KB
0004-Teach-expand_inherited_rtentry-to-use-partition-boun.patch text/plain 2.0 KB
0005-Store-in-pg_inherits-if-a-child-is-a-partitioned-tab.patch text/plain 6.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2017-08-10 02:14:00 Re: dubious error message from partition.c
Previous Message Thomas Munro 2017-08-10 02:02:57 Re: Causal reads take II