Re: ATTACH/DETACH PARTITION CONCURRENTLY

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Sergei Kornilov <sk(at)zsrv(dot)org>, Amit Langote <langote_amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: ATTACH/DETACH PARTITION CONCURRENTLY
Date: 2019-02-04 05:02:04
Message-ID: CAKJS1f_wBQJmn9s7+j2ftZ=8rDESV6Ti1hu4W3CHVXMnSFXXUg@mail.gmail.com
Lists: pgsql-hackers

On Mon, 4 Feb 2019 at 16:45, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Sat, Feb 2, 2019 at 7:19 PM David Rowley
> <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
> > I think we do need to ensure that the PartitionDesc matches between
> > worker and leader. Have a look at choose_next_subplan_for_worker() in
> > nodeAppend.c. Notice that a call is made to
> > ExecFindMatchingSubPlans().
>
> Thanks for the tip. I see that code, but I'm not sure that I
> understand why it matters here. First, if I'm not mistaken, what's
> being returned by ExecFindMatchingSubPlans is a Bitmapset of subplan
> indexes, not anything that refers to a PartitionDesc directly. And
> second, even if it did, it looks like the computation is done
> separately in every backend and not shared among backends, so even if
> it were directly referring to PartitionDesc indexes, it still won't be
> assuming that they're the same in every backend. Can you further
> explain your thinking?

In a Parallel Append, each parallel worker will call ExecInitAppend(),
which calls ExecCreatePartitionPruneState(). That function makes a
call to RelationGetPartitionDesc() and records the partdesc's
boundinfo in context->boundinfo. This means that if we perform any
pruning in the parallel worker in choose_next_subplan_for_worker()
then find_matching_subplans_recurse() will use the PartitionDesc from
the parallel worker to translate the partition indexes into the
Append's subnodes.

If the PartitionDesc in the parallel worker has one more partition
than existed when the plan was built, then the partition index to
subplan index translation will be incorrect, because
find_matching_subplans_recurse() will call get_matching_partitions()
using the context with the PartitionDesc containing the additional
partition. The return value from get_matching_partitions() is fine;
it's the code inside the while ((i = bms_next_member(partset, i)) >= 0)
loop that will do the wrong thing.
It could even crash if partset has an index out of bounds of the
subplan_map or subpart_map arrays.
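
To make the failure mode concrete, here is a minimal standalone sketch
in C. It is not PostgreSQL code: find_partition(), subplan_for_key(),
OUT_OF_RANGE and the bound arrays are all hypothetical names made up
for illustration, and the subplan_map array is only a simplified
stand-in for the real one. It assumes two range partitions at plan time
and a third one attached before the worker builds its own view of the
bounds.

```c
#include <assert.h>

#define OUT_OF_RANGE (-2)   /* sketch-only marker, not a PostgreSQL value */

/*
 * Hypothetical stand-in for the plan-time translation array: entry i is
 * the Append subplan index for partition index i as of plan time.
 */
static const int subplan_map[] = {0, 1};
static const int subplan_map_len = 2;

/* Exclusive upper bounds of the partitions, in bound order. */
static const int plan_upper[] = {10, 20};       /* p1: [0,10), p2: [10,20) */
static const int worker_upper[] = {0, 10, 20};  /* p0: [-10,0) attached later */

/* Return the index of the first partition whose upper bound exceeds key. */
static int
find_partition(const int *upper, int nparts, int key)
{
    assert(nparts > 0);
    for (int i = 0; i < nparts; i++)
    {
        if (key < upper[i])
            return i;
    }
    return -1;                  /* no partition accepts this key */
}

/*
 * Translate a key to a subplan index using the given view of the bounds.
 * The bounds check exists only so the sketch can report the problem
 * instead of reading past the end of the array.
 */
static int
subplan_for_key(const int *upper, int nparts, int key)
{
    int part_index = find_partition(upper, nparts, key);

    if (part_index < 0 || part_index >= subplan_map_len)
        return OUT_OF_RANGE;
    return subplan_map[part_index];
}
```

With the plan-time bounds, subplan_for_key(plan_upper, 2, 5) gives
subplan 0, which is the scan of p1. With the worker's bounds,
subplan_for_key(worker_upper, 3, 5) gives subplan 1, i.e. the scan of
p2, which is the wrong subplan for that key, and
subplan_for_key(worker_upper, 3, 15) gives OUT_OF_RANGE, which in this
sketch corresponds to indexing past the end of the map.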

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
