|From:||Andres Freund <andres(at)anarazel(dot)de>|
|To:||David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>|
|Cc:||PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>|
|Subject:||Re: ATTACH/DETACH PARTITION CONCURRENTLY|
|Views:||Raw Message | Whole Thread | Download mbox | Resend email|
On 2018-08-08 01:23:51 +1200, David Rowley wrote:
> On 8 August 2018 at 00:47, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > On 2018-08-08 00:40:12 +1200, David Rowley wrote:
> >> 1. Obtain a ShareUpdateExclusiveLock on the partitioned table rather
> >> than an AccessExclusiveLock.
> >> 2. Do all the normal partition attach partition validation.
> >> 3. Insert pg_partition record with partvalid = true.
> >> 4. Invalidate relcache entry for the partitioned table
> >> 5. Any loops over a partitioned table's PartitionDesc must check
> >> PartitionIsValid(). This will return true if the current snapshot
> >> should see the partition or not. The partition is valid if partisvalid
> >> = true and the xmin precedes or is equal to the current snapshot.
> > How does this protect against other sessions actively using the relcache
> > entry? Currently it is *NOT* safe to receive invalidations for
> > e.g. partitioning contents afaics.
> I'm not proposing that sessions running older snapshots can't see that
> there's a new partition. The code I have uses PartitionIsValid() to
> test if the partition should be visible to the snapshot. The
> PartitionDesc will always contain details for all partitions stored in
> pg_partition whether they're valid to the current snapshot or not. I
> did it this way as there's no way to invalidate the relcache based on
> a point in transaction, only a point in time.
I don't think that solves the problem that an arriving relcache
invalidation would trigger a rebuild of rd_partdesc, while it actually
is referenced by running code.
You'd need to build infrastructure to prevent that.
One approach would be to make sure that everything relying on
rt_partdesc staying the same stores its value in a local variable, and
then *not* free the old version of rt_partdesc (etc) when the refcount >
0, but delay that to the RelationClose() that makes refcount reach
0. That'd be the start of a framework for more such concurrenct
|Next Message||Don Seiler||2018-08-07 13:31:38||Re: [PATCH] Include application_name in "connection authorized" log message|
|Previous Message||David Rowley||2018-08-07 13:23:51||Re: ATTACH/DETACH PARTITION CONCURRENTLY|