Re: PD_ALL_VISIBLE flag was incorrectly set happend during repeatable vacuum

From: daveg <daveg(at)sonic(dot)net>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PD_ALL_VISIBLE flag was incorrectly set happend during repeatable vacuum
Date: 2011-03-08 02:51:28
Message-ID: 20110308025128.GB21941@sonic.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin pgsql-hackers

On Tue, Mar 08, 2011 at 02:07:53AM +0000, Greg Stark wrote:
> On Mon, Mar 7, 2011 at 11:53 PM, daveg <daveg(at)sonic(dot)net> wrote:
> >> Looking at the code, I don't see how that situation could arise, though.
> >> The value calculated by GetOldestXmin() should never move backwards. And
> >> GetOldestXmin() is called in lazy_vacuum_rel(), after it has acquired a
> >> lock on the table, which should protect from a race condition where two
> >> vacuums could run on the table one after another, in a way where the
> >> later vacuum runs with an OldestXmin calculated before the first vacuum.
> >>
> >> Hmm, fiddling with vacuum_defer_cleanup_age on the fly could cause that,
> >> though. You don't do that, do you?
> >
> > No.
> >
> > I've updated the patch to collect db and schema and added Merlins patch as
> > well and run it for a while. The attached log is all the debug messages
> > for pg_statistic page 333 from one database. I've also attached the two
> > most recent page images for that particular page, the last digits in the
> > filename are the hour and minute of when the page was saved.
>
>
> Well from that log you definitely have OldestXmin going backwards. And
> not by a little bit either. at 6:33 it set the all_visible flag and
> then at 7:01 it was almost 1.3 million transactions earlier. In fact
> to precisely the same value that was in use for a transaction at 1:38.
> That seems like a bit of a coincidence though it's not repeated
> earlier.
>
> It also seems odd that it happens only with this one block of this one table.

It happens on several tables on thousands of blocks in dozens of dbs daily
on four separate but similar hosts. I have over 240MB of page images saved
from one host so far but assumed that would be too much to send, so I just
filtered it down to the one block.


> What does SHOW ALL show for the current settings in effect? And what

Attached.

> was process 23896, are there any other log messages from it? When did
> it start?

The process appears to be an autovacuum worker, all it logged was the attached
messages from the debugging for this.

Thanks

-dg

--
David Gould daveg(at)sonic(dot)net 510 536 1443 510 282 0869
If simplicity worked, the world would be overrun with insects.

Attachment Content-Type Size
pd_all.pid.23896 text/plain 168.4 KB
pd_all.show_all.txt text/plain 30.4 KB

In response to

Browse pgsql-admin by date

  From Date Subject
Next Message Bob Lunney 2011-03-08 04:24:00 PG Server Crash
Previous Message Greg Stark 2011-03-08 02:07:53 Re: PD_ALL_VISIBLE flag was incorrectly set happend during repeatable vacuum

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2011-03-08 03:04:26 Re: pgsql: Basic Recovery Control functions for use in Hot Standby. Pause,
Previous Message Fujii Masao 2011-03-08 02:36:09 Re: pgsql: Basic Recovery Control functions for use in Hot Standby. Pause,