From: | daveg <daveg(at)sonic(dot)net> |
---|---|
To: | Greg Stark <gsstark(at)mit(dot)edu> |
Cc: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: PD_ALL_VISIBLE flag was incorrectly set happend during repeatable vacuum |
Date: | 2011-03-08 02:51:28 |
Message-ID: | 20110308025128.GB21941@sonic.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-admin pgsql-hackers |
On Tue, Mar 08, 2011 at 02:07:53AM +0000, Greg Stark wrote:
> On Mon, Mar 7, 2011 at 11:53 PM, daveg <daveg(at)sonic(dot)net> wrote:
> >> Looking at the code, I don't see how that situation could arise, though.
> >> The value calculated by GetOldestXmin() should never move backwards. And
> >> GetOldestXmin() is called in lazy_vacuum_rel(), after it has acquired a
> >> lock on the table, which should protect from a race condition where two
> >> vacuums could run on the table one after another, in a way where the
> >> later vacuum runs with an OldestXmin calculated before the first vacuum.
> >>
> >> Hmm, fiddling with vacuum_defer_cleanup_age on the fly could cause that,
> >> though. You don't do that, do you?
> >
> > No.
> >
> > I've updated the patch to collect db and schema and added Merlins patch as
> > well and run it for a while. The attached log is all the debug messages
> > for pg_statistic page 333 from one database. I've also attached the two
> > most recent page images for that particular page, the last digits in the
> > filename are the hour and minute of when the page was saved.
>
>
> Well from that log you definitely have OldestXmin going backwards. And
> not by a little bit either. at 6:33 it set the all_visible flag and
> then at 7:01 it was almost 1.3 million transactions earlier. In fact
> to precisely the same value that was in use for a transaction at 1:38.
> That seems like a bit of a coincidence though it's not repeated
> earlier.
>
> It also seems odd that it happens only with this one block of this one table.
It happens on several tables on thousands of blocks in dozens of dbs daily
on four separate but similar hosts. I have over 240MB of page images saved
from one host so far but assumed that would be too much to send, so I just
filtered it down to the one block.
> What does SHOW ALL show for the current settings in effect? And what
Attached.
> was process 23896, are there any other log messages from it? When did
> it start?
The process appears to be an autovacuum worker, all it logged was the attached
messages from the debugging for this.
Thanks
-dg
--
David Gould daveg(at)sonic(dot)net 510 536 1443 510 282 0869
If simplicity worked, the world would be overrun with insects.
Attachment | Content-Type | Size |
---|---|---|
pd_all.pid.23896 | text/plain | 168.4 KB |
pd_all.show_all.txt | text/plain | 30.4 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Bob Lunney | 2011-03-08 04:24:00 | PG Server Crash |
Previous Message | Greg Stark | 2011-03-08 02:07:53 | Re: PD_ALL_VISIBLE flag was incorrectly set happend during repeatable vacuum |
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2011-03-08 03:04:26 | Re: pgsql: Basic Recovery Control functions for use in Hot Standby. Pause, |
Previous Message | Fujii Masao | 2011-03-08 02:36:09 | Re: pgsql: Basic Recovery Control functions for use in Hot Standby. Pause, |