Re: pgsql: Improve autovacuum logging for aggressive and anti-wraparound ru

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: michael(at)paquier(dot)xyz
Cc: sawada(dot)mshk(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, alvherre(at)2ndquadrant(dot)com, sk(at)zsrv(dot)org, nasbyj(at)amazon(dot)com, andres(at)anarazel(dot)de, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: pgsql: Improve autovacuum logging for aggressive and anti-wraparound ru
Date: 2018-10-09 09:15:36
Message-ID: 20181009.181536.142257785.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-committers pgsql-hackers

At Fri, 5 Oct 2018 15:35:04 +0900, Michael Paquier <michael(at)paquier(dot)xyz> wrote in <20181005063504(dot)GB14664(at)paquier(dot)xyz>
> On Fri, Oct 05, 2018 at 12:16:03PM +0900, Michael Paquier wrote:
> > So, I have come back to this stuff, and finished with the attached
> > instead, so as the assertion is in a single place. I find that
> > clearer. The comments have also been improved. Thoughts?
>
> And so... I have been looking at committing this thing, and while
> testing in-depth I have been able to trigger a case where an autovacuum
> has been able to be not aggressive but anti-wraparound, which is exactly
> what should not be possible, no? I have simply created an instance with
> autovacuum_freeze_max_age = 200000, then ran pgbench with
> autovacuum_freeze_table_age=200000 set for each table, and also ran
> installcheck-world in parallel. This has been able to trigger the
> assertion pretty quickly.

I investigated this and, in short, it can happen.

It is a kind of race condition between two autovacuum
processes. do_autovacuum() looks into pg_class (using a snapshot)
and vacuum_set_xid_limits() looks into the relcache. If two
workers vacuum the same relation concurrently and one of them
finishes first, the other receives a relcache invalidation and
sees the updated relfrozenxid. If this happens between
do_autovacuum() and vacuum_set_xid_limits(), the latter sees a
newer relfrozenxid than the former. The problem appears when
relfrozenxid has advanced by more than 5% of
autovacuum_freeze_max_age.
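
To make the 5% figure concrete, here is a minimal sketch of the two
checks, not the actual PostgreSQL code. It assumes that the
anti-wraparound decision compares the age of relfrozenxid against
autovacuum_freeze_max_age, and that the aggressive decision compares
it against the table-age setting capped at 95% of
autovacuum_freeze_max_age. The function names is_wraparound() and
is_aggressive() are hypothetical, XIDs are treated as plain counters
(no modulo-2^32 comparison), and the 95% cap uses integer arithmetic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified model, not PostgreSQL source: XIDs are plain counters,
 * so the circular XID comparison rules are ignored on purpose.
 */

/* Hypothetical stand-in for the check do_autovacuum() makes from pg_class. */
bool
is_wraparound(uint32_t next_xid, uint32_t relfrozenxid,
              uint32_t freeze_max_age)
{
    assert(next_xid >= relfrozenxid);
    return next_xid - relfrozenxid > freeze_max_age;
}

/* Hypothetical stand-in for the check vacuum_set_xid_limits() makes. */
bool
is_aggressive(uint32_t next_xid, uint32_t relfrozenxid,
              uint32_t freeze_table_age, uint32_t freeze_max_age)
{
    /* Assumed cap: 95% of freeze_max_age, computed with integers. */
    uint32_t cap = freeze_max_age - freeze_max_age / 20;
    uint32_t limit = freeze_table_age < cap ? freeze_table_age : cap;

    assert(next_xid >= relfrozenxid);
    return next_xid - relfrozenxid >= limit;
}
```

With both settings at 200000 and the next XID at 1000000, a
relfrozenxid of 700000 read from the snapshot gives an age of 300000,
so the worker is flagged as anti-wraparound. If a concurrent worker
then moves relfrozenxid to 950000, the age seen through the relcache
is 50000, below the assumed 190000 threshold, so the same vacuum is
not aggressive. If relfrozenxid had moved only from 799000 to 805000,
the age would still be 195000 and both flags would agree.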

If lazy_vacuum_rel() sees this situation, the relation has already
been aggressively vacuumed by a concurrent worker. We can safely
ignore the state, but we also know that this vacuum is useless.

1. Just allow the case there (and add comment).
Causes redundant anti-wraparound vacuum.

2. Skip the relation when this condition holds.

I think that we can safely skip the relation in that
case. (attached)

3. Ensure that do_autovacuum() always sees the same relfrozenxid
as vacuum_set_xid_limits().

4. Prevent concurrent vacuuming of the same relation rigorously,
somehow.
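
The difference between options 1 and 2 can be sketched as follows.
This is an illustration only and is not the attached patch; the enum
VacuumDecision and the function decide_after_xid_limits() are
hypothetical names, and the two flags are assumed to be the
anti-wraparound flag passed down from do_autovacuum() and the
aggressive flag computed by vacuum_set_xid_limits().

```c
#include <stdbool.h>

/* Hypothetical result type, for illustration only. */
typedef enum
{
    VACUUM_PROCEED,
    VACUUM_SKIP
} VacuumDecision;

/*
 * Sketch of option 2: an anti-wraparound request that turns out not to
 * be aggressive means a concurrent worker has already advanced
 * relfrozenxid, so the relation is skipped. Option 1 would instead
 * return VACUUM_PROCEED here and accept the redundant vacuum.
 */
VacuumDecision
decide_after_xid_limits(bool wraparound, bool aggressive)
{
    if (wraparound && !aggressive)
        return VACUUM_SKIP;

    return VACUUM_PROCEED;
}
```

Only the combination that the assertion was meant to forbid is
treated specially; every other combination proceeds as before.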

Thoughts?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
skip_vacuum_after_concurrently_processed.patch text/x-patch 980 bytes
