Re: [PATCH] Disable bgworkers during servers start in pg_upgrade

From: Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Denis Laxalde <denis(dot)laxalde(at)dalibo(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [PATCH] Disable bgworkers during servers start in pg_upgrade
Date: 2021-01-27 13:41:32
Message-ID: 20210127144132.215fe4bb@firost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Wed, 27 Jan 2021 11:25:11 +0100
Denis Laxalde <denis(dot)laxalde(at)dalibo(dot)com> wrote:

> Andres Freund a écrit :
> > On 2021-01-21 16:23:58 +0100, Denis Laxalde wrote:
> > > We found an issue in pg_upgrade on a cluster with a third-party
> > > background worker. The upgrade goes fine, but the new cluster is then in
> > > an inconsistent state. The background worker comes from the PoWA
> > > extension but the issue does not appear to related to this particular
> > > code.
> >
> > Well, it does imply that that backgrounder did something, as the pure
> > existence of a bgworker shouldn't affect anything. Presumably the issue is
> > that the bgworker actually does transactional writes, which causes problems
> > because the xids / multixacts from early during pg_upgrade won't actually
> > be valid after we do pg_resetxlog etc.

Indeed, it does some writes. As soon as the powa bgworker starts, it takes
"snapshots" of pg_stat_statements state and record them in a local table. To
avoid concurrent run, it takes a lock on some of its local rows using SELECT FOR
UPDATE, hence the mxid consumption.

The inconsistency occurs at least at two place:

* the datminmxid and relminmxid fields pg_dump(all)'ed and restored in the new
cluster.
* the multixid fields in the controlfile read during the check phase and
restored later using pg_resetxlog.

> > > As a solution, it seems that, for similar reasons that we restrict
> > > socket access to prevent accidental connections (from commit
> > > f763b77193), we should also prevent background workers to start at this
> > > step.
> >
> > I think that'd have quite the potential for negative impact - [...]
>
> Thank you for those insights!

+1

> > I wonder if we could
> >
> > a) set default_transaction_read_only to true, and explicitly change it
> > in the sessions that need that.

According to Denis' tests discussed off-list, it works fine in regard with the
powa bgworker, albeit some complaints in logs. However, I wonder how fragile it
could be as bgworker could easily manipulate either the GUC or "BEGIN READ
WRITE". I realize this is really uncommon practices, but bgworker code from
third parties might be surprising.

> > b) when in binary upgrade mode / -b, error out on all wal writes in
> > sessions that don't explicitly set a session-level GUC to allow
> > writes.

It feels safer because more specific to the subject. But I wonder if the
benefice worth adding some (limited?) complexity and a small/quick conditional
block in a very hot code path for a very limited use case.

What about c) where the bgworker are not loaded by default during binary upgrade
mode / -b unless they explicitly set a bgw_flags (BGWORKER_BINARY_UPGRADE ?)
when they are required during pg_upgrade?

Regards,

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Li Japin 2021-01-27 14:05:06 Re: Support ALTER SUBSCRIPTION ... ADD/DROP PUBLICATION ... syntax
Previous Message vignesh C 2021-01-27 13:35:16 Re: Printing backtrace of postgres processes