Re: hot_standby_feedback vs excludeVacuum and snapshots

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: hot_standby_feedback vs excludeVacuum and snapshots
Date: 2018-06-09 13:10:11
Message-ID: CANP8+jJ6XE8EJ3HX5kRtC_nLQG8d=BASDat4kE00oMACO_SfoA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 8 June 2018 at 19:03, Andres Freund <andres(at)anarazel(dot)de> wrote:
> Hi,
>
> On 2018-06-08 09:23:02 +0100, Simon Riggs wrote:
>> I have also found another bug which affects what we do next.
>>
>> For context, AEL locks are normally removed by COMMIT or ABORT.
>> StandbyReleaseOldLocks() is just a sweeper to catch anything that
>> didn't send an abort before it died, so it hardly ever activates. The
>> coding of StandbyReleaseOldLocks() is backwards... if it ain't in the
>> running list, then we remove it.
>>
>> But that wasn't working correctly either, since as of 49bff5300d527 we
>> assigned AELs to subxids. Subxids weren't in the running list and so
>> AELs held by them would have been removed at the wrong time, an extant
>> bug in PG10. It looks to me like they would have been removed either
>> early or late, up to the next runningxact info record. They would be
>> removed, so no leakage, but the late timing wouldn't be noticeable for
>> tests or most usage, since it would look just like lock contention.
>> Early release might give same issue of block access to non-existent
>> block/file.
>
> Yikes, that's kinda bad. It can also just cause plain crashes, no? The
> on-disk / catalog state isn't necessarily consistent during DDL, which
> is why we hold AE locks. At the very least it can cause corruption of
> in-use relcache entries and such. In practice the fact this probably
> hits only a limited number of people because it requires DDL to happen
> in subtransactions, which probably isn't *that* common.

Yep, kinda bad, hence fix.

>> So the attached patch fixes both the bug in the recent commit and the
>> one I just found by observation of 49bff5300d527, since they are
>> related.
>
> Can we please keep them separate?

The attached patch is separate. It fixes 49bff5300d527 and also the
committed code, but should work fine even if we revert. Will
doublecheck.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2018-06-09 13:31:13 Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Previous Message Simon Riggs 2018-06-09 13:00:36 Re: hot_standby_feedback vs excludeVacuum and snapshots