Re: ATTACH/DETACH PARTITION CONCURRENTLY

From: Andres Freund <andres(at)anarazel(dot)de>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: ATTACH/DETACH PARTITION CONCURRENTLY
Date: 2018-08-07 13:29:25
Message-ID: 20180807132925.grxgp3mtg4i6mpib@alap3.anarazel.de
Lists: pgsql-hackers

On 2018-08-08 01:23:51 +1200, David Rowley wrote:
> On 8 August 2018 at 00:47, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > On 2018-08-08 00:40:12 +1200, David Rowley wrote:
> >> 1. Obtain a ShareUpdateExclusiveLock on the partitioned table rather
> >> than an AccessExclusiveLock.
> >> 2. Do all the normal partition attach partition validation.
> >> 3. Insert pg_partition record with partvalid = true.
> >> 4. Invalidate relcache entry for the partitioned table
> >> 5. Any loops over a partitioned table's PartitionDesc must check
> >> PartitionIsValid(). This will return true if the current snapshot
> >> should see the partition or not. The partition is valid if partisvalid
> >> = true and the xmin precedes or is equal to the current snapshot.
> >
> > How does this protect against other sessions actively using the relcache
> > entry? Currently it is *NOT* safe to receive invalidations for
> > e.g. partitioning contents afaics.
>
> I'm not proposing that sessions running older snapshots can't see that
> there's a new partition. The code I have uses PartitionIsValid() to
> test if the partition should be visible to the snapshot. The
> PartitionDesc will always contain details for all partitions stored in
> pg_partition whether they're valid to the current snapshot or not. I
> did it this way as there's no way to invalidate the relcache based on
> a point in transaction, only a point in time.

I don't think that solves the problem: an arriving relcache
invalidation would trigger a rebuild of rd_partdesc while it is
still referenced by running code.

You'd need to build infrastructure to prevent that.

One approach would be to make sure that everything relying on
rd_partdesc staying the same stores its value in a local variable, and
then *not* free the old version of rd_partdesc (etc) when the refcount >
0, but delay that to the RelationClose() that makes refcount reach
0. That'd be the start of a framework for more such concurrency
handling.
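
A rough sketch of that deferred-free idea, as a toy model and not actual
relcache code. Every name here (ToyRelation, ToyPartDesc, toy_open,
toy_invalidate, toy_close, toy_live_descs) is hypothetical and only
stands in for the real structures; the stale list is one assumed way to
remember old descriptors until the refcount drops to 0.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a partition descriptor; not PostgreSQL's struct. */
typedef struct ToyPartDesc
{
    int         nparts;
    struct ToyPartDesc *next_stale; /* link in the relation's stale list */
} ToyPartDesc;

/* Hypothetical stand-in for a relcache entry. */
typedef struct ToyRelation
{
    int         refcnt;     /* number of current users of the entry */
    ToyPartDesc *partdesc;  /* current descriptor */
    ToyPartDesc *stale;     /* old descriptors that may still be in use */
} ToyRelation;

/* Count of allocated descriptors, kept only so the behaviour is observable. */
int toy_live_descs = 0;

ToyPartDesc *
toy_build_partdesc(int nparts)
{
    ToyPartDesc *pd = malloc(sizeof(ToyPartDesc));

    if (pd == NULL)
        abort();
    pd->nparts = nparts;
    pd->next_stale = NULL;
    toy_live_descs++;
    return pd;
}

void
toy_free_partdesc(ToyPartDesc *pd)
{
    free(pd);
    toy_live_descs--;
}

/* Take a reference on the entry. */
void
toy_open(ToyRelation *rel)
{
    rel->refcnt++;
}

/*
 * Simulate an invalidation: build a new descriptor, but if anybody holds
 * a reference, park the old one on the stale list instead of freeing it.
 */
void
toy_invalidate(ToyRelation *rel, int new_nparts)
{
    ToyPartDesc *old = rel->partdesc;

    rel->partdesc = toy_build_partdesc(new_nparts);
    if (old == NULL)
        return;
    if (rel->refcnt > 0)
    {
        old->next_stale = rel->stale;
        rel->stale = old;
    }
    else
        toy_free_partdesc(old);
}

/* Drop a reference; the close that reaches 0 frees the parked descriptors. */
void
toy_close(ToyRelation *rel)
{
    assert(rel->refcnt > 0);
    if (--rel->refcnt == 0)
    {
        while (rel->stale != NULL)
        {
            ToyPartDesc *next = rel->stale->next_stale;

            toy_free_partdesc(rel->stale);
            rel->stale = next;
        }
    }
}
```

A caller would do toy_open(), copy rel->partdesc into a local variable,
and keep using that local even if toy_invalidate() runs in between; the
old descriptor only goes away in the toy_close() that brings refcnt to 0.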

Regards,

Andres Freund
