Re: hot_standby_feedback vs excludeVacuum and snapshots

From: Noah Misch <noah(at)leadboat(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc: Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: hot_standby_feedback vs excludeVacuum and snapshots
Date: 2018-07-04 20:27:56
Message-ID: 20180704202756.GA269861@rfd.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jun 08, 2018 at 11:03:38AM -0700, Andres Freund wrote:
> On 2018-06-08 09:23:02 +0100, Simon Riggs wrote:
> > For context, AEL locks are normally removed by COMMIT or ABORT.
> > StandbyReleaseOldLocks() is just a sweeper to catch anything that
> > didn't send an abort before it died, so it hardly ever activates. The
> > coding of StandbyReleaseOldLocks() is backwards... if it ain't in the
> > running list, then we remove it.
> >
> > But that wasn't working correctly either, since as of 49bff5300d527 we
> > assigned AELs to subxids. Subxids weren't in the running list and so
> > AELs held by them would have been removed at the wrong time, an extant
> > bug in PG10. It looks to me like they would have been removed either
> > early or late, up to the next runningxact info record. They would be
> > removed, so no leakage, but the late timing wouldn't be noticeable for
> > tests or most usage, since it would look just like lock contention.
> > Early release might give same issue of block access to non-existent
> > block/file.
>
> Yikes, that's kinda bad. It can also just cause plain crashes, no? The
> on-disk / catalog state isn't necessarily consistent during DDL, which
> is why we hold AE locks. At the very least it can cause corruption of
> in-use relcache entries and such.

Yes. If I'm reading the commit message right, the committed code also
releases locks too early:

> commit 15378c1
> Author: Simon Riggs <simon(at)2ndQuadrant(dot)com>
> AuthorDate: Sat Jun 16 14:03:29 2018 +0100
> Commit: Simon Riggs <simon(at)2ndQuadrant(dot)com>
> CommitDate: Sat Jun 16 14:03:29 2018 +0100
>
> Remove AELs from subxids correctly on standby
>
> Issues relate only to subtransactions that hold AccessExclusiveLocks
> when replayed on standby.
>
> Prior to PG10, aborting subtransactions that held an
> AccessExclusiveLock failed to release the lock until top level commit or
> abort. 49bff5300d527 fixed that.
>
> However, 49bff5300d527 also introduced a similar bug where subtransaction
> commit would fail to release an AccessExclusiveLock, leaving the lock to
> be removed sometimes early and sometimes late. This commit fixes
> that bug also. Backpatch to PG10 needed.

Subtransaction commit is too early to release an arbitrary
AccessExclusiveLock. The primary releases every AccessExclusiveLock at
top-level transaction commit, top-level transaction abort, or subtransaction
abort. CommitSubTransaction() doesn't do that; it transfers locks to the
parent sub(xact). Standby nodes can't safely remove an arbitrary lock earlier
than the primary would.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2018-07-04 20:43:19 Re: Legacy GiST invalid tuples
Previous Message Tom Lane 2018-07-04 20:10:20 Re: Legacy GiST invalid tuples