Re: GetOldestXmin going backwards is dangerous after all

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GetOldestXmin going backwards is dangerous after all
Date: 2013-02-04 10:37:52
Message-ID: 20130204103752.GB6645@awork2.anarazel.de
Lists: pgsql-hackers

On 2013-02-01 19:24:02 -0500, Tom Lane wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > Having said that, I agree that a fix in GetOldestXmin() would be nice
> > if we could find one, but since the comment describes at least three
> > different ways the value can move backwards, I'm not sure that there's
> > really a practical solution there, especially if you want something we
> > can back-patch.
>
> Actually, wait a second. As you say, the comment describes three known
> ways to make it go backwards. It strikes me that all three are fixable:
>
> * if allDbs is FALSE and there are no transactions running in the current
> * database, GetOldestXmin() returns latestCompletedXid. If a transaction
> * begins after that, its xmin will include in-progress transactions in other
> * databases that started earlier, so another call will return a lower value.
>
> The reason this is a problem is that GetOldestXmin ignores XIDs of
> processes that are connected to other DBs. It now seems to me that this
> is a flat-out bug. It can ignore their xmins, but it should include
> their XIDs, because the point of considering those XIDs is that they may
> contribute to the xmins of snapshots computed in the future by processes
> in our own DB. And snapshots never exclude any XIDs on the basis of
> which DB they're in. (They can't really, since we can't know when the
> snap is taken whether it might be used to examine shared catalogs.)
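
[Illustrative sketch, not part of the original message.] The argument quoted above can be played through in a toy Python model. This is not PostgreSQL code: `Backend`, `snapshot_xmin`, `oldest_xmin`, `demo` and the `include_foreign_xids` flag are hypothetical names, XID wraparound is ignored, and the "no transactions running" value is simplified to one past latestCompletedXid.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Backend:
    db: str
    xid: Optional[int] = None   # assigned transaction ID, if any
    xmin: Optional[int] = None  # xmin of the backend's snapshot, if any


def snapshot_xmin(backends: List[Backend], latest_completed_xid: int) -> int:
    # A snapshot never excludes XIDs by database, so its xmin is the
    # oldest in-progress XID anywhere in the (toy) cluster.
    xids = [b.xid for b in backends if b.xid is not None]
    return min(xids, default=latest_completed_xid + 1)


def oldest_xmin(backends: List[Backend], latest_completed_xid: int,
                my_db: str, include_foreign_xids: bool) -> int:
    # Toy version of the allDbs = FALSE case.
    result = latest_completed_xid + 1
    for b in backends:
        same_db = b.db == my_db
        # Proposed change: always count XIDs, whatever the database.
        if b.xid is not None and (same_db or include_foreign_xids):
            result = min(result, b.xid)
        # xmins of backends in other databases are ignored either way.
        if b.xmin is not None and same_db:
            result = min(result, b.xmin)
    return result


def demo(include_foreign_xids: bool) -> Tuple[int, int]:
    latest_completed = 200
    # One transaction is in progress in another database.
    backends = [Backend(db="other", xid=100)]
    first = oldest_xmin(backends, latest_completed, "mine", include_foreign_xids)
    # A transaction now starts in our database and takes a snapshot.
    backends.append(Backend(db="mine",
                            xmin=snapshot_xmin(backends, latest_completed)))
    second = oldest_xmin(backends, latest_completed, "mine", include_foreign_xids)
    return first, second
```

In this model, `demo(False)` gives a second value lower than the first, while `demo(True)` gives the same value both times, which is the behaviour the quoted paragraph argues for.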

> * The return value is also adjusted with vacuum_defer_cleanup_age, so
> * increasing that setting on the fly is another easy way to make
> * GetOldestXmin() move backwards, with no consequences for data integrity.
>
> And as for that, it's been pretty clear for awhile that allowing
> vacuum_defer_cleanup_age to change on the fly was a bad idea we'd
> eventually have to undo. The day of reckoning has arrived: it needs
> to be PGC_POSTMASTER.

ISTM that the original problem can still occur, even after Simon's
commit:
1) start with -c vacuum_defer_cleanup_age=0
2) autovacuum vacuums "test";
3) restart with -c vacuum_defer_cleanup_age=10000
4) autovacuum vacuums "test"'s toast table;

should result in about the same ERROR, shouldn't it?
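
[Illustrative sketch, not part of the original message.] The four steps above can be replayed in a toy Python model of the horizon computation. It is only a sketch under stated assumptions: `oldest_xmin_with_defer` and `replay` are hypothetical names, the XID values are made up, wraparound is ignored, and the only PostgreSQL detail relied on is that the result is clamped to the first normal XID (3).

```python
from typing import List, Tuple

FIRST_NORMAL_XID = 3  # lowest XID usable by ordinary transactions


def oldest_xmin_with_defer(latest_completed_xid: int,
                           running_xids: List[int],
                           defer_cleanup_age: int) -> int:
    # Oldest XID still of interest, before applying the deferral.
    horizon = min([latest_completed_xid + 1] + list(running_xids))
    # vacuum_defer_cleanup_age pushes the horizon back by that many XIDs.
    horizon -= defer_cleanup_age
    return max(horizon, FIRST_NORMAL_XID)


def replay() -> Tuple[int, int]:
    # Steps 1 and 2: setting is 0 when autovacuum processes "test".
    heap_horizon = oldest_xmin_with_defer(50000, [], 0)
    # Steps 3 and 4: after a restart with 10000, the toast table is
    # vacuumed; a few more transactions have completed in between.
    toast_horizon = oldest_xmin_with_defer(50100, [], 10000)
    return heap_horizon, toast_horizon
```

With these made-up numbers the toast table is vacuumed with an older horizon than the main table was, even though the setting only changed across a restart.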

Given that there seemingly isn't yet a way to fix that people agree on,
and that it "only" results in a transient error, I think the fix for this
should be pushed after the next point release.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
