Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes
Date: 2018-08-29 16:56:07
Message-ID: 22112.1535561767@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote:
> * We now recursively enter ScanPgRelation, which (again) needs to do a
> search using pg_class_oid_index, so it (again) opens and locks that.
> BUT: LockRelationOid sees that *this process already has share lock on
> pg_class_oid_index*, so it figures it can skip AcceptInvalidationMessages.

BTW, I now have a theory for why we suddenly started seeing this problem
in mid-June: commits a54e1f158 et al added a ScanPgRelation call where
there had been none before (in RelationReloadNailed, for non-index rels).
That didn't create the problem, but it probably increased the odds of
seeing it happen.

Also ... isn't the last "relation->rd_isvalid = true" in
RelationReloadNailed wrong? If it got cleared during ScanPgRelation,
I do not think we want to believe that we got an up-to-date row.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2018-08-29 17:38:57 Re: FailedAssertion on partprune
Previous Message Jonathan S. Katz 2018-08-29 16:38:33 Re: Something's busted in plpgsql composite-variable handling