Re: Temporary tables prevent autovacuum, leading to XID wraparound

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Temporary tables prevent autovacuum, leading to XID wraparound
Date: 2018-02-02 16:48:15
Message-ID: CA+TgmobT3m=+dU5HF3VGVqiZ2O+v6P5wN1Gj+Prq+hj7dAm9AQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Feb 2, 2018 at 1:27 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> Thank you for suggestion. It sounds more smarter. So it would be more
> better if we vacuums database for anti-wraparound in ascending order
> of relfrozenxid?

Currently, we're doing it based on datfrozenxid. I was looking for a
small, relatively safe change to improve things, which led to my
proposal: continue using datfrozenxid when the database isn't being
vacuumed, but when it is, substitute the datfrozenxid that we expect
the database *will have* after the tables currently being vacuumed are
finished for the actual datfrozenxid.

On further reflection, though, I see problems. Suppose db1 has
age(datfrozenxid) = 600m and two tables with age(relfrozenxid) of 600m
and 400m. db2 has age(datfrozenxid) = 500m. Then suppose #1 starts
vacuuming the table with age 600m. The AV launcher sees that, with
that table out of the way, we'll be able to bring datfrozenxid forward
to 400m and so sends worker #2 to db2. But actually, when the worker
in db1 finishes vacuuming the table with age 600m, it's going to have
to vacuum the table with age 400m before performing any relfrozenxid
update. So actually, the table with age 400m is holding the
relfrozenxid at 600m just as much as if it too had a relfrozenxid of
600m. In fact, any tables in the same database that need routine
vacuuming are a problem, too: we're going to vacuum all of those
before updating relfrozenxid, too.

Maybe we should start by making the scheduling algorithm used by the
individual workers smarter:

1. Let's teach do_autovacuum() that, when there are any tables needing
vacuum for wraparound, either for relfrozenxid or relminmxid, it
should vacuum only those and forget about the rest. This seems like
an obvious optimization to prevent us from delaying
datfrozenxid/datminmxid updates for the sake of vacuuming tables that
are "merely" bloated.

2. Let's have do_autovacuum() sort the tables to be vacuumed in
ascending order by relfrozenxid, so older tables are vacuumed first.
A zero-order idea is to put tables needing relfrozenxid vacuuming
before tables needing relminmxid vacuuming, and sort the latter by
ascending relminmxid, but maybe there's something smarter possible
there. The idea here is to vacuum tables in order of priority rather
than in whatever order they happen to be physically mentioned in
pg_class.

3. Let's have do_autovacuum() make an extra call to
vac_update_datfrozenxid() whenever the next table to be vacuumed is at
least 10 million XIDs (or MXIDs) newer than the first one it vacuumed
either since the last call to vac_update_datfrozenxid() or, if none,
since it started. That way, we don't have to wait for every single
table to get vacuumed before we can consider advancing
relfrozenxid/relminmxid.

4. When it's performing vacuuming for wraparound, let's have AV
workers advertise in shared memory the oldest relfrozenxid and
relminmxid that it might exist in the database. Given #1 and #2, this
is pretty easy, since we start by moving through tables in increasing
relfrozenxid order and then shift to moving through them in increasing
relminmxid order. When we're working through the relfrozenxid tables,
the oldest relminmxid doesn't move, and the oldest relfrozenxid is
that of the next table in the list. When we're working through the
relminmxid tables, it's the reverse. We need a little cleverness to
figure out what value to advertise when we're on the last table in
each list -- it should be the next-higher value, even though that will
be above the relevant threshold, not a sentinel value.

5. With those steps in place, I think we can now adopt my previous
idea to have the AV launcher use any advertised relfrozenxid (and, as
I now realize, relminmxid) instead of the ones found in pg_database,
because now we know that individual workers are definitely focused on
getting relfrozenxid (and/or relminmxid) as soon as possible, and
vacuuming unrelated tables won't help them do it any faster.

This gets us fairly close to vacuuming tables in decreasing order of
wraparound danger across the entire cluster. It's not perfect. It
prefers to keep vacuuming tables in the same database rather than
having a worker exit and maybe launching a new one in a different
database -- but the alternative is not very appealing. If we didn't
do it that way, and if we had a million tables with XIDs that were
closely spaced spread across different databases, we'd have to
terminate and relaunching workers at a very high rate to get
everything sorted out, which would be inefficient and annoying to
program. Also, it keeps the existing hard prioritization of
relfrozenxid over relminmxid, which could theoretically be wrong for
some installation. But I think that might not be a big problem in
practice, and it seems like that could be separately improved at
another time.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David G. Johnston 2018-02-02 16:57:49 Re: Add RANGE with values and exclusions clauses to the Window Functions
Previous Message Konstantin Knizhnik 2018-02-02 16:36:13 Re: Built-in connection pooling