From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Something is rotten in the state of Denmark... |
Date: | 2015-04-02 18:55:22 |
Message-ID: | 19248.1428000922@sss.pgh.pa.us |
Lists: | pgsql-hackers |
Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Thu, Apr 2, 2015 at 2:40 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> However, I'm having second thoughts about whether we've fully diagnosed
>> this. Three out of the four failures we've seen in the buildfarm reported
>> "cache lookup failed for access method 403", not "could not open relation
>> with OID 2601" ... and I'm so far only able to replicate the latter
>> symptom. It's really unclear how the former one could arise, because
>> nothing that vacuum.sql does would change xmin of the rows in pg_am.
> It probably changes the *relfilenode* of pg_am, because it runs VACUUM
> FULL on that catalog. Perhaps some backend sees the old relfilenode
> value and tries to do a heap scan, interpreting the now-truncated file as
> empty?
Yeah, I came up with the same theory a few minutes later. Trying to
reproduce on that basis.
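
To make the stale-relfilenode theory concrete, here is a toy Python model of it. It is only a sketch: the `Storage` class, the `vacuum_full` and `lookup_am` helpers, and the relfilenode values 1000 and 1001 are all hypothetical and do not mirror PostgreSQL internals. The only values taken from the thread are OID 2601 for pg_am and access method OID 403.

```python
# Toy model only: names and relfilenode numbers are hypothetical.

class Storage:
    """Maps relfilenode -> list of rows (a stand-in for on-disk heap files)."""

    def __init__(self):
        self.files = {}

    def heap_scan(self, relfilenode):
        # A truncated (zero-length) file simply yields no rows.
        return list(self.files.get(relfilenode, []))


def vacuum_full(storage, catalog, rel_oid, new_relfilenode):
    """Rewrite the relation into a new file, then truncate the old one."""
    old = catalog[rel_oid]
    storage.files[new_relfilenode] = list(storage.files[old])
    storage.files[old] = []            # old file is now empty
    catalog[rel_oid] = new_relfilenode


def lookup_am(storage, relfilenode, am_oid):
    """Scan the given file for an access method row, as a catalog lookup would."""
    for row in storage.heap_scan(relfilenode):
        if row["oid"] == am_oid:
            return row
    raise LookupError(f"cache lookup failed for access method {am_oid}")


PG_AM_OID = 2601
storage = Storage()
storage.files[1000] = [{"oid": 403, "amname": "btree"}]
catalog = {PG_AM_OID: 1000}

stale_relfilenode = catalog[PG_AM_OID]   # a backend remembers this value
vacuum_full(storage, catalog, PG_AM_OID, 1001)

# A backend using the current relfilenode still finds the row.
fresh = lookup_am(storage, catalog[PG_AM_OID], 403)

# A backend using the stale relfilenode scans an empty file and fails.
try:
    lookup_am(storage, stale_relfilenode, 403)
    stale_error = None
except LookupError as exc:
    stale_error = str(exc)
```

In this model the stale backend fails with the same message text the buildfarm reported, purely because it scanned the truncated file. Whether that is what actually happens in the backend is what the reproduction attempt is meant to establish.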
Actually, now that I think it through, the "could not open relation"
error is pretty odd in itself. If we are trying to open pg_am using
a stale catalog snapshot, it seems like we ought to reliably find its
old pg_class tuple (the one with the obsolete relfilenode), rather than
finding nothing. But the latter is the behavior I'm seeing.
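
The expectation in the paragraph above can be sketched with a deliberately simplified visibility rule. This is an assumption-laden illustration, not PostgreSQL's real snapshot logic: it treats every transaction ID below the snapshot's ID as committed, ignores in-progress transactions, and uses made-up xids (100, 200) and relfilenodes (1000, 1001). `TupleVersion`, `visible`, and `find_pg_class_row` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TupleVersion:
    relfilenode: int
    xmin: int                  # transaction that created this version
    xmax: Optional[int]        # transaction that replaced it, if any


def visible(t, snapshot_xid):
    """Simplified rule: all xids below snapshot_xid count as committed."""
    created = t.xmin < snapshot_xid
    deleted = t.xmax is not None and t.xmax < snapshot_xid
    return created and not deleted


def find_pg_class_row(versions, snapshot_xid):
    """Return the row versions of pg_am's pg_class entry visible to a snapshot."""
    return [t for t in versions if visible(t, snapshot_xid)]


# Hypothetical history: xid 100 created the row; xid 200 (the VACUUM FULL)
# replaced it with a version carrying the new relfilenode.
versions = [
    TupleVersion(relfilenode=1000, xmin=100, xmax=200),
    TupleVersion(relfilenode=1001, xmin=200, xmax=None),
]

stale = find_pg_class_row(versions, snapshot_xid=150)   # taken before the rewrite
fresh = find_pg_class_row(versions, snapshot_xid=250)   # taken after the rewrite
```

Under this rule a stale snapshot sees exactly one version (the old one) and a fresh snapshot sees exactly one version (the new one); neither sees zero rows. That is why "finding nothing" is the surprising outcome.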
regards, tom lane