
From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: improving wraparound behavior
Date: 2019-05-04 00:45:16
Message-ID: 20190504004516.6cspdxxgddcd2rme@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2019-05-03 18:42:35 -0400, Robert Haas wrote:
> On Fri, May 3, 2019 at 4:47 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > I'd actually say the proper fix would be to instead move the truncation
> > to *after* finishing updating relfrozenxid etc. If we truncate, the
> > additional cost of another in-place pg_class update, to update relpages,
> > is basically insignificant. And the risk of errors, or being cancelled,
> > during truncation is much higher than before (due to the AEL).
>
> That would prevent the ERROR from impeding relfrozenxid advancement,
> but it does not prevent the error itself, nor the XID consumption. If
> autovacuum is hitting that ERROR, it will spew errors in the log but
> succeed in advancing relfrozenxid anyway. I don't think that's as
> nice as the behavior I proposed, but it's certainly better than the
> status quo. If you are hitting that error due to a manual VACUUM,
> under your proposal, you'll stop the manual VACUUM as soon as you hit
> the first table where this happens, which is not what you want.
> You'll also keep consuming XIDs, which is not what you want either,
> especially if you are in single-user mode because the number of
> remaining XIDs is less than a million.

Part of my opposition to just disabling truncation when close to wraparound
is that it still allows us to get close to wraparound because of truncation
issues in the first place. IMO preventing us from getting closer to
wraparound is more important than making it more "comfortable" to be in a
wraparound situation.

The second problem I see is that even somebody close to a wraparound
might have an urgent need to free up some space. So I'm a bit wary of
just disabling it.

I wonder if there's a reasonable way that'd allow us to do the WAL logging
for the truncation without using an xid. One way would be to just get
rid of the lock on the primary, as previously discussed. But we could
also drive the locking through the WAL records that do the actual
truncation - then there'd be no need for an xid. It's probably not an
entirely trivial change, but I don't think it'd be too bad?

> > > Also, I think that old prepared transactions and stale replication
> > > slots should be emphasized more prominently. Maybe something like:
> > >
> > > HINT: Commit or roll back old prepared transactions, drop stale
> > > replication slots, or kill long-running sessions.
> > > Ensure that autovacuum is progressing, or run a manual database-wide VACUUM.
> >
> > I think it'd be good to instead compute what the actual problem is. It'd
> > not be particularly hard to show some of these in the errdetail:
> >
> > 1) the xid horizon (xid + age) of the problematic database; potentially,
> > if connected to that database, additionally compute what the oldest
> > xid is (although that's computationally potentially too expensive)

s/oldest xid/oldest relfrozenxid/

> > 2) the xid horizon (xid + age) due to prepared transactions, and the
> > oldest transaction's name
> > 3) the xid horizon (xid + age) due to replication slot, and the "oldest"
> > slot's name
> > 4) the xid horizon (xid + age) and pid for the connection with the
> > oldest snapshot.
> >
> > I think that'd make it much easier for users to pinpoint what's going on.
> >
> > In fact, I think we probably should additionally add a function that can
> > display the above. That'd make it much easier to write monitoring
> > queries.
>
> I think that the error and hint that you get from
> GetNewTransactionId() has to be something that we can generate very
> quickly, without doing anything that might hang on cluster with lots
> of databases or lots of relations; but if there's useful detail we can
> display there, that's good. With a view, it's more OK if it takes a
> long time on a big cluster.

Yea, I agree it has to be reasonably fast. But all of the above, with
the exception of the optional "oldest table", should be cheap enough to
compute. Sure, a scan through PGXACT isn't cheap, but compared to an
ereport() and an impending shutdown it's peanuts - in contrast to
scanning a pg_class that could be many gigabytes.
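
To illustrate, here's approximately the kind of information I mean,
sketched as plain SQL against the existing catalogs (just an
illustration, not the proposed function; column availability, e.g.
backend_xmin, varies a bit across versions):

-- per-database horizon (oldest datfrozenxid)
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY age(datfrozenxid) DESC;

-- horizon due to prepared transactions, with the oldest one's name
SELECT gid, prepared, age(transaction) AS xid_age
FROM pg_prepared_xacts
ORDER BY age(transaction) DESC;

-- horizon due to replication slots, with the "oldest" slot's name
SELECT slot_name, age(xmin) AS xid_age, age(catalog_xmin) AS catalog_xid_age
FROM pg_replication_slots
ORDER BY greatest(age(xmin), age(catalog_xmin)) DESC NULLS LAST;

-- horizon due to backends with old snapshots / open transactions
SELECT pid, age(backend_xid) AS xid_age, age(backend_xmin) AS xmin_age
FROM pg_stat_activity
ORDER BY greatest(age(backend_xid), age(backend_xmin)) DESC NULLS LAST;

A builtin function/view would essentially consolidate that, and the same
computation could feed the errdetail.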

Greetings,

Andres Freund
