Re: improving wraparound behavior

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: improving wraparound behavior
Date: 2019-05-03 22:42:35
Message-ID: CA+TgmoY=j-0r+d9PnyD3FAMXOnhmhpsnKBZiwSVyDQofE3rNag@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, May 3, 2019 at 4:47 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> I'd actually say the proper fix would be to instead move the truncation
> to *after* finishing updating relfrozenxid etc. If we truncate, the
> additional cost of another in-place pg_class update, to update relpages,
> is basically insignificant. And the risk of errors, or being cancelled,
> during truncation is much higher than before (due to the AEL).

That would prevent the ERROR from impeding relfrozenxid advancement,
but it does not prevent the error itself, nor the XID consumption. If
autovacuum is hitting that ERROR, it will spew errors in the log but
succeed in advancing relfrozenxid anyway. I don't think that's as
nice as the behavior I proposed, but it's certainly better than the
status quo. If you are hitting that error due to a manual VACUUM,
under your proposal, you'll stop the manual VACUUM as soon as you hit
the first table where this happens, which is not what you want.
You'll also keep consuming XIDs, which is not what you want either,
especially if you are in single-user mode because the number of
remaining XIDs is less that a million.

> > Also, I think that old prepared transactions and stale replication
> > slots should be emphasized more prominently. Maybe something like:
> >
> > HINT: Commit or roll back old prepared transactions, drop stale
> > replication slots, or kill long-running sessions.
> > Ensure that autovacuum is progressing, or run a manual database-wide VACUUM.
>
> I think it'd be good to instead compute what the actual problem is. It'd
> not be particularly hard to show some these in the errdetail:
>
> 1) the xid horizon (xid + age) of the problematic database; potentially,
> if connected to that database, additionally compute what the oldest
> xid is (although that's computationally potentially too expensive)
> 2) the xid horizon (xid + age) due to prepared transactions, and the
> oldest transaction's name
> 3) the xid horizon (xid + age) due to replication slot, and the "oldest"
> slot's name
> 4) the xid horizon (xid + age) and pid for the connection with the
> oldest snapshot.
>
> I think that'd allow users much much easier to pinpoint what's going on.
>
> In fact, I think we probably should additionally add a function that can
> display the above. That'd make it much easier to write monitoring
> queries.

I think that the error and hint that you get from
GetNewTransactionId() has to be something that we can generate very
quickly, without doing anything that might hang on cluster with lots
of databases or lots of relations; but if there's useful detail we can
display there, that's good. With a view, it's more OK if it takes a
long time on a big cluster.

> IMO we also ought to compute the *actual* relfrozenxid/relminmxid for a
> table. I.e. the oldest xid actually present. It's pretty common for most
> tables to have effective horizons that are much newer than what
> GetOldestXmin()/vacuum_set_xid_limits() can return. Obviously we can do
> so only when scanning all non-frozen pages. But being able to record
> "more aggressive" horizons would often save unnecessary work. And it
> ought to not be hard. I think especially for regular non-freeze,
> non-wraparound vacuums that'll often result in a much newer relfrozenxid
> (as we'll otherwise just have GetOldestXmin() - vacuum_freeze_min_age).

Sure, that would make sense.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-05-03 22:46:10 Re: improving wraparound behavior
Previous Message Tom Lane 2019-05-03 22:29:35 First-draft release notes for back branches are up