Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes
Date: 2018-08-29 03:51:47
Message-ID: 20180829035147.xarkwumymohayfsj@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

On 2018-08-28 23:32:51 -0400, Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > On 2018-08-28 20:27:14 -0700, Andres Freund wrote:
> >> Locally that triggers the problem within usually a few seconds.
>
> > FWIW, it does so including versions as old as 9.2.

9.0 as well, so it's definitely not some recently backpatched change.

> Interesting. One thing I'd like to know is why this only started
> showing up in the buildfarm a few weeks ago. Maybe that's just a
> timing effect, but it seems odd.

I suspect it's largely timing and invalidation traffic. It's been
really hard to reproduce with core regression tests after all. If my
theory is right - and I'm getting more and more certain - that we're
trying to access a remapped relation during invalidation, before reading
the relevant relmapper invalidation, that also makes sense: You need to
be unlucky enough that there's a relmapper invalidation *after* an
invalidation for a currently "open" relation (otherwise we'd not do
inval processing). Most of the time we'll likely just be "too fast" to
process all the invalidations (thereby not getting to the point that
pg_class has been remapped later in the queue).

I think it might just be related to some modifed tests changing when
exactly autovacuum is triggered, which then in turn triggers the
invalidation.

Unfortunately the only "real" fix I can think of is to change the
relcache invalidation logic to only ever mark entries as 'invalid', and
not do any eager rebuilding of the entries. That might be a good idea
reasons for performance anyway, but is quite scary to imagine doing in a
minor release.

Greetings,

Andres Freund

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Sand Stone 2018-08-29 04:44:07 Re: dsa_allocate() faliure
Previous Message Tom Lane 2018-08-29 03:32:51 Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes