Re: [HACKERS] Re: PD_ALL_VISIBLE flag was incorrectly set happend during repeatable vacuum

From: daveg <daveg(at)sonic(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, David Christensen <david(at)endpoint(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, pgsql-admin(at)postgresql(dot)org, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Re: PD_ALL_VISIBLE flag was incorrectly set happend during repeatable vacuum
Date: 2011-03-03 07:12:08
Message-ID: 20110303071208.GA4053@sonic.net
Lists: pgsql-admin pgsql-hackers

On Tue, Mar 01, 2011 at 08:40:37AM -0500, Robert Haas wrote:
> On Mon, Feb 28, 2011 at 10:32 PM, Greg Stark <gsstark(at)mit(dot)edu> wrote:
> > On Tue, Mar 1, 2011 at 1:43 AM, David Christensen <david(at)endpoint(dot)com> wrote:
> >> Was this cluster upgraded to 8.4.4 from 8.4.0?  It sounds to me like a known bug in 8.4.0 which was fixed by this commit:
> >>
> >
> > The reproduction script described was running vacuum repeatedly. A
> > single vacuum run ought to be sufficient to clean up the problem if it
> > was left-over.
> >
> > I wonder if it would help to write a regression test that runs 100 or
> > so vacuums and see if the build farm turns up any examples of this
> > behaviour.
>
> One other thing to keep in mind here is that the warning message we've
> chosen can be a bit misleading. The warning is:
>
> WARNING: PD_ALL_VISIBLE flag was incorrectly set in relation "test" page 1
>
> ...which implies that the state of the tuples is correct, and that the
> page-level bit is wrong in comparison. But I recently saw a case
> where the infomask got clobbered, resulting in this warning. The page
> level bit was correct, at least relative to the intended page
> contents; it was a tuple on the page that was screwed up. It
> might have been better to pick a more neutral phrasing, like "page is
> marked all-visible but some tuples are not visible".

Yeesh. Yikes. I hope that this is not the case as we are seeing thousands of
these daily on each of 4 large production hosts. Mostly on catalogs,
especially pg_statistic. However it does occur on some high delete/insert
traffic user tables too.

Question: what would be the consequence of simply patching out the setting
of this flag? Assuming that the incorrect PD_ALL_VISIBLE flag is the only
problem (big assumption perhaps) then simply never setting it would at least
avoid the possibility of returning wrong answers, presumably at some
performance cost. We possibly could live with that until we get a handle
on the real cause and fix.

I had a look and don't really see anything except lazy vacuum (vacuumlazy.c) that sets it,
so it seems simple to disable.
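
To make the idea concrete, here is a minimal toy model of the check behind the
warning and of what "never setting the flag" would mean. It is a sketch under
stated assumptions, not PostgreSQL source: ToyTuple, ToyPage, vacuum_page and
the set_flag switch are hypothetical names invented for illustration, and the
per-tuple visibility test is reduced to a boolean. Only the PD_ALL_VISIBLE bit
value is taken from bufpage.h.

```python
from dataclasses import dataclass, field
from typing import List

# Same bit value as PD_ALL_VISIBLE in bufpage.h; everything else is a toy.
PD_ALL_VISIBLE = 0x0004


@dataclass
class ToyTuple:
    # Hypothetical stand-in for the real per-tuple visibility test.
    visible_to_all: bool


@dataclass
class ToyPage:
    tuples: List[ToyTuple] = field(default_factory=list)
    flags: int = 0


def vacuum_page(page: ToyPage, set_flag: bool = True) -> List[str]:
    """Toy vacuum pass over one page; returns the warnings it raised.

    set_flag=False models "patching out" the code that sets the flag.
    """
    warnings = []
    all_visible = all(t.visible_to_all for t in page.tuples)
    flag_is_set = bool(page.flags & PD_ALL_VISIBLE)

    if flag_is_set and not all_visible:
        # The check sees only a disagreement. It cannot tell whether the
        # page-level bit or a tuple header is the one that is wrong.
        warnings.append(
            "page is marked all-visible but some tuples are not visible")
        page.flags &= ~PD_ALL_VISIBLE
    elif all_visible and not flag_is_set and set_flag:
        page.flags |= PD_ALL_VISIBLE
    return warnings
```

In this model, a page whose flag is set while one tuple is not visible to all
produces exactly one warning and has the flag cleared. With set_flag=False the
flag is never set, so the disagreement cannot arise, but a clobbered tuple
would also go unnoticed by this particular check.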

Or have I understood this incorrectly?

Anything else I can be doing to try to track this down?

-dg

--
David Gould daveg(at)sonic(dot)net 510 536 1443 510 282 0869
If simplicity worked, the world would be overrun with insects.
