Re: Interesting glitch in autovacuum

From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Interesting glitch in autovacuum
Date: 2008-09-10 17:17:27
Message-ID: 20080910171726.GH4399@alvh.no-ip.org
Lists: pgsql-hackers

Tom Lane wrote:
> I observed a curious bug in autovac just now. Since plain vacuum avoids
> calling GetTransactionSnapshot, an autovac worker that happens not to
> analyze any tables will never call GetTransactionSnapshot at all.
> This means it will arrive at vac_update_datfrozenxid with
> RecentGlobalXmin never having been changed from its boot value of
> FirstNormalTransactionId, which means that it will fail to update the
> database's datfrozenxid ... or, if the current value of datfrozenxid
> is past 2 billion, that it will improperly advance datfrozenxid to
> sometime in the future.

Ouch :-(
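
To make the failure mode concrete, here is a minimal sketch of how normal XIDs compare modulo 2^32. It is a simplified model of the signed-difference test, not the backend source: the name xid_precedes is hypothetical, and the handling of special (non-normal) XIDs is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Boot value of RecentGlobalXmin, as described in the quoted text. */
#define FirstNormalTransactionId ((TransactionId) 3)

/*
 * Hypothetical helper: circular comparison of two normal XIDs.
 * id1 "precedes" id2 when the 32-bit difference, read as signed,
 * is negative.  Assumes the usual two's-complement conversion.
 */
static bool
xid_precedes(TransactionId id1, TransactionId id2)
{
    int32_t diff = (int32_t) (id1 - id2);

    return diff < 0;
}
```

With this comparison, a datfrozenxid of 1000 does not precede the boot value 3, so no advance would look necessary. A datfrozenxid of 2500000000 does precede 3, because 3 sits less than 2^31 steps ahead of it on the circle, so the boot value looks like a point in the future.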

> I've only directly tested this in HEAD, but I suspect the problem goes
> back a ways.

Well, this logic was introduced in 8.2; I'm not sure if there's a
problem in 8.1, but I don't think so.

> On reflection I'm not even sure that this is strictly an autovacuum
> bug. It can be cast more generically as "RecentGlobalXmin getting
> used without ever having been set", and it sure looks to me like the
> HOT patch may have introduced a few risks of that sort.

Agreed.

Maybe we should boot RecentGlobalXmin with InvalidTransactionId, and ensure
wherever it's going to be used that it's not that.
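
Roughly what I have in mind, as a standalone sketch rather than a patch. It uses InvalidTransactionId (the invalid-XID constant, value 0) as the "never set" marker. The variable and function names below are made up for illustration and do not exist in the tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId ((TransactionId) 0)

/* Hypothetical stand-in for the cached value; starts out "never set". */
static TransactionId recent_global_xmin = InvalidTransactionId;

/* Hypothetical setter, standing in for whatever computes a snapshot. */
static void
set_recent_global_xmin(TransactionId xmin)
{
    recent_global_xmin = xmin;
}

/*
 * Hypothetical accessor: returns false instead of handing back a
 * value that was never computed, so callers cannot silently use it.
 */
static bool
get_recent_global_xmin(TransactionId *xmin)
{
    if (recent_global_xmin == InvalidTransactionId)
        return false;

    *xmin = recent_global_xmin;
    return true;
}
```

A caller would have to handle the false return explicitly, for example by computing a fresh value first. In the backend an assertion at each point of use might serve the same purpose.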

> I'm thinking that maybe an appropriate fix is to insert a
> GetTransactionSnapshot call at the beginning of InitPostgres'
> transaction, thus ensuring that every backend has some vaguely sane
> value for RecentGlobalXmin before it tries to do any database access.

AFAIR there's an "initial transaction" in InitPostgres or something like
that. Since that transaction goes away quickly, taking the snapshot there
would ensure it does not last much longer.

> Another thought is that even with that, an autovac worker is likely
> to reach vac_update_datfrozenxid with a RecentGlobalXmin value that
> was computed at the start of its run, and is thus rather old.
> I wonder why vac_update_datfrozenxid is using the variable at all
> rather than doing GetOldestXmin? It's not like that function is
> so performance-critical that it needs to avoid calling GetOldestXmin.

The function is called only once per autovacuum iteration, and once per
manually invoked vacuum, so it's certainly not performance-critical.

> Lastly, now that we have the PROC_IN_VACUUM test in GetSnapshotData,
> is it actually necessary for lazy vacuum to avoid setting a snapshot?
> It seems like it might be a good idea for it to do so in order to
> keep its RecentGlobalXmin reasonably current.

Hmm, I think I'd rather be inclined to get a snapshot just when it's
going to finish. That way, RecentGlobalXmin will be up to date even if
the

> I've only looked at this in HEAD, but I am thinking that we have
> a real problem here in both HEAD and 8.3. I'm less sure how bad
> things are in the older branches.

8.2 does contain the vac_update_datfrozenxid problem at the very least.
Older versions do not have that logic, so they are probably safe.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
