Re: pgsql: Improve autovacuum logging for aggressive and anti-wraparound ru

From: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, michael(at)paquier(dot)xyz
Cc: sawada(dot)mshk(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, alvherre(at)2ndquadrant(dot)com, sk(at)zsrv(dot)org, nasbyj(at)amazon(dot)com, andres(at)anarazel(dot)de, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: pgsql: Improve autovacuum logging for aggressive and anti-wraparound ru
Date: 2019-03-08 22:05:53
Message-ID: 920b7f84-407c-005c-4b7d-e8266a17d22b@2ndQuadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers pgsql-hackers


On 10/9/18 5:15 AM, Kyotaro HORIGUCHI wrote:
> At Fri, 5 Oct 2018 15:35:04 +0900, Michael Paquier <michael(at)paquier(dot)xyz> wrote in <20181005063504(dot)GB14664(at)paquier(dot)xyz>
>> On Fri, Oct 05, 2018 at 12:16:03PM +0900, Michael Paquier wrote:
>>> So, I have come back to this stuff, and finished with the attached
>>> instead, so as the assertion is in a single place. I find that
>>> clearer. The comments have also been improved. Thoughts?
>> And so... I have been looking at committing this thing, and while
>> testing in-depth I have been able to trigger a case where an autovacuum
>> has been able to be not aggressive but anti-wraparound, which is exactly
>> what should not be possible, no? I have simply created an instance with
>> autovacuum_freeze_max_age = 200000, then ran pgbench with
>> autovacuum_freeze_table_age=200000 set for each table, and also ran
>> installcheck-world in parallel. This has been able to trigger the
>> assertion pretty quickly.
> I investigated it and in short, it can happen.
>
> It is a kind of race consdition between two autovacuum
> processes. do_autovacuum() looks into pg_class (using a snapshot)
> and vacuum_set_xid_limits() looks into relcache. If concurrent
> vacuum happens and one has finished the relation, another gets
> relcache invalidation and relfrozenxid is updated. If this
> happens between do_autovacuum() and vacuum_set_xid_limits(), the
> latter sees newer relfrozenxid than the former. The problem
> happens when it moves by more than 5% of
> autovacuum_freeze_max_age.
>
> If lazy_vacuum_rel() sees the situation, the relation is already
> aggressively vacuumed by a cocurrent worker. We can just ingore
> the state safely but also we know that the vacuum is useless.
>
> 1. Just allow the case there (and add comment).
> Causes redundant anti-wraparound vacuum.
>
> 2. Skip the relation by the condition.
>
> I think that we can safely skip the relation in the
> case. (attached)
>
> 3. Ensure that do_autovacuum always sees the same relfrozenxid
> with vacuum_set_xid_limits().
>
> 4. Prevent concurrent acuuming of the same relation rigorously,
> somehow.
>
> Thoughts?
>

I notice that this seems never to have been acted on. I think we should
apply this and remove the (confusing) message setting for the case we'll
now be avoiding. If not we should at least comment there that this is a
case we only expect to see in pathological cases.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-committers by date

  From Date Subject
Next Message Alvaro Herrera 2019-03-08 22:14:59 pgsql: Fix crash with old libxml2
Previous Message Tom Lane 2019-03-08 17:01:34 pgsql: Minor improvements for reformat_dat_file.pl.

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-03-08 23:04:16 Re: BUG #15383: Join Filter cost estimation problem in 10.5
Previous Message Tom Lane 2019-03-08 21:52:44 Re: Ordered Partitioned Table Scans