Re: Autovacuum vs vac_update_datfrozenxid() vs ?

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Autovacuum vs vac_update_datfrozenxid() vs ?
Date: 2020-03-25 08:29:00
Message-ID: 20200325082900.GB177907@paquier.xyz
Lists: pgsql-hackers

On Mon, Mar 23, 2020 at 04:50:36PM -0700, Andres Freund wrote:
> Which all in theory makes sense - but there's two fairly sizable races
> here:
> 1) Because lastSaneFrozenXid is determined *before* the snapshot for
> the pg_class, it's entirely possible that concurrently another vacuum on
> another table in the same database starts & finishes. And that very well
> can end up using a relfrozenxid that's newer than the current
> database.
> 2) Much worse, because vac_update_relstats() uses heap_inplace_update(),
> there's actually no snapshot semantics here anyway! Which means that
> the window for a race isn't just between the ReadNewTransactionId()
> and the systable_beginscan(), but instead lasts for the duration of
> the entire scan.
>
> Either of these can then trigger vac_update_datfrozenxid() to
> *silently* bail out without touching anything.

Hmm. It looks like you are right on both points, with 2) being much
more plausible to trigger. Don't we have other issues with the cutoff
calculations done within vacuum_set_xid_limits()? We decide if an
aggressive job happens depending on the data in the relation cache,
but that may not play well with concurrent jobs that have just
finished and sent invalidations?
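
To make sure we are talking about the same window, the sequence looks
roughly like this (paraphrasing vac_update_datfrozenxid() from memory
and simplifying, so the details may be off):

    lastSaneFrozenXid = ReadNewTransactionId();

    /* The window opens here: a concurrent vacuum can advance a
     * table's relfrozenxid past lastSaneFrozenXid through
     * heap_inplace_update() at any time while the scan below runs. */
    scan = systable_beginscan(relation, InvalidOid, false, NULL, 0, NULL);
    while ((classTup = systable_getnext(scan)) != NULL)
    {
        Form_pg_class classForm = (Form_pg_class) GETSTRUCT(classTup);

        if (TransactionIdPrecedes(lastSaneFrozenXid,
                                  classForm->relfrozenxid))
            bogus = true;   /* value looks like it is from the future */
        [...]
    }

    if (bogus)
        return;             /* silently bail out, pg_database untouched */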

> I think there's also another (even larger?) race in
> vac_update_datfrozenxid(): Unless I miss something, two backends can
> concurrently run through the scan in vac_update_datfrozenxid() for two
> different tables in the same database, both can check that they need to
> update the pg_database row, and then one of them can overwrite the
> results of the other. And the one that updates last might actually be
> the one with an older horizon. This is possible since there is no 'per
> database' locking in heap_vacuum_rel().

The chances of hitting this get higher with a larger number of max
workers, and it is easy to end up with multiple concurrent workers on
the same database on a busy system. Perhaps one reason why it was
harder for me to reproduce the problem is that I was just using the
default for autovacuum_max_workers. It looks worth trying with a
crazily high value for the max number of workers and more relations
doing a bunch of heavy updates (the application I saw facing a
lockdown uses literally hundreds of relations with a poor man's
partitioning schema, and some tables of that schema are heavily
updated).
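
The kind of setup I have in mind for a reproduction attempt would be
something like that (just a sketch of the knobs to tweak, not a
tested recipe):

    autovacuum_max_workers = 32             # deliberately very high
    autovacuum_naptime = 1s                 # spawn workers aggressively
    autovacuum_freeze_max_age = 200000      # reach the wraparound path fast
    autovacuum_multixact_freeze_max_age = 400000

with a few hundred relations receiving constant updates, say via a
custom pgbench script spread across all of them.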

> The way I suspect that interacts with the issue in [1] is that once
> that has happened, do_start_worker() figures out it's doing a xid
> wraparound vacuum based on the "wrong" datfrozenxid:
>     /* Check to see if this one is at risk of wraparound */
>     if (TransactionIdPrecedes(tmp->adw_frozenxid, xidForceLimit))
>     {
>         if (avdb == NULL ||
>             TransactionIdPrecedes(tmp->adw_frozenxid,
>                                   avdb->adw_frozenxid))
>             avdb = tmp;
>         for_xid_wrap = true;
>         continue;
>     }
>     else if (for_xid_wrap)
>         continue;           /* ignore not-at-risk DBs */
>
> which means autovac launcher will only start workers for that database -
> but there might actually not be any work for it.

Actually, I don't think that pg_database.datfrozenxid explains
everything. From what I saw, this field was getting freshly updated.
So workers are spawned for a wraparound, and the relation-level stats
cause workers to try to autovacuum a table, but nothing happens in
the end.

> I have some theories as to why this is more common in 12, but I've not
> fully nailed them down.

It seems to me that 2aa6e33 also helps in triggering those issues more
easily, as it creates a kind of cascading effect with the workers
still getting spawned by the launcher. However, the workers end up
handling relations for which there is nothing to do: I suspect that
skipping the anti-wraparound but non-aggressive jobs reduces the
number of relation stat updates done via vac_update_relstats(), so
individual relations finish in a state where they consider that they
cannot do any more work, where anti-wraparound but non-aggressive jobs
become the norm and nothing happens, as we saw in the -general thread
mentioned by Andres upthread. And then things just loop over again and
again. Note that the other thread also mentions that issuing a manual
VACUUM FREEZE temporarily fixes the lockdown issue, as it forces an
update of the relation stats. So it seems to me that reverting 2aa6e33
is a necessary first step to prevent the lockdown from happening, but
that still does not address the actual issues causing anti-wraparound
but non-aggressive jobs to exist.
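
For the archives, the logic introduced by 2aa6e33 boils down to, if I
recall it correctly, an early exit of this kind in the vacuum code
path (paraphrased, the exact placement and comments differ):

    /* An anti-wraparound request that would not be aggressive anyway
     * is skipped entirely, so vac_update_relstats() never runs for
     * the relation and its stats never move forward. */
    if (params->is_wraparound && !aggressive)
        return;

which is why reverting it should at least bring back the relation
stat updates for such jobs.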
--
Michael
