Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes
Date: 2018-08-29 03:13:08
Message-ID: 20180829031308.h46ema2zo7lhcb5d@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2018-08-28 19:56:58 -0700, Andres Freund wrote:
> Hi Everyone,
>
>
> Tom, I think this could use your eyes.
>
>
> On 2018-08-28 00:52:13 -0700, Andres Freund wrote:
> > I've a few leads that I'm currently testing out. One observation I think
> > might be crucial is that the problem, in Tomas' testcase with just
> > VACUUM FULL of pg_class and INSERTs into another rel, appears to only
> > happen if autovacuum is enabled. The crashes are always immediately
> > after autovacuum did an analyze of the relation.
>
> I've run this with autovacuum disabled for quite a while, without the
> problem re-occuring. It does however re-occur with pgbench running an
> ANALYZE on the user defined table.
>
>
> > What I see on the apply side, I think, that invalidation records for
> > contents of pg_class and pg_class itself are intermixed.
>
> I've not 100% figured this out yet, and I need to switch to another task
> for a bit, but what I think is happening here is that we have a
> relatively fundamental issue with how rewrites of catalog tables are
> handled. A slightly instrumented version of postgres shows the
> following:
>
> 2018-08-29 04:30:17.515 CEST [19551] LOG: statement: INSERT INTO t(b,c) SELECT i,i FROM generate_series(1,59) s(i);
> ...
> 2018-08-29 04:30:17.559 CEST [19554] LOG: statement: vacuum full pg_class;
> 2018-08-29 04:30:17.571 CEST [19557] LOG: swapping mapped 1259/1281520 w 1281526/1281526
> 2018-08-29 04:30:17.571 CEST [19557] STATEMENT: vacuum full pg_class;
> 2018-08-29 04:30:17.572 CEST [19557] LOG: reindexing 2662 to 1281529 on 1259/1281526
> 2018-08-29 04:30:17.572 CEST [19557] STATEMENT: vacuum full pg_class;
> 2018-08-29 04:30:17.576 CEST [19557] LOG: reindexing 2663 to 1281530 on 1259/1281526
> 2018-08-29 04:30:17.576 CEST [19557] STATEMENT: vacuum full pg_class;
> 2018-08-29 04:30:17.578 CEST [19557] LOG: reindexing 3455 to 1281531 on 1259/1281526
> 2018-08-29 04:30:17.578 CEST [19557] STATEMENT: vacuum full pg_class;
> 2018-08-29 04:30:17.582 CEST [19557] LOG: releasing lock on 1259
> 2018-08-29 04:30:17.582 CEST [19557] STATEMENT: vacuum full pg_class;
> 2018-08-29 04:30:17.582 CEST [19557] LOG: duration: 87.389 ms
> 2018-08-29 04:30:17.582 CEST [19557] LOG: statement: vacuum full pg_class;
> 2018-08-29 04:30:17.583 CEST [19563] LOG: duration: 630.481 ms
>
> 2018-08-29 04:30:17.584 CEST [19563] LOG: statement: analyze t;
> 2018-08-29 04:30:17.597 CEST [19555] LOG: swapping mapped 1259/1281526 w 1281532/1281532
> 2018-08-29 04:30:17.597 CEST [19555] STATEMENT: vacuum full pg_class;
> 2018-08-29 04:30:17.597 CEST [19555] LOG: reindexing 2662 to 1281535 on 1259/1281532
> 2018-08-29 04:30:17.597 CEST [19555] STATEMENT: vacuum full pg_class;
> 2018-08-29 04:30:17.599 CEST [19555] LOG: reindexing 2663 to 1281536 on 1259/1281532
> 2018-08-29 04:30:17.599 CEST [19555] STATEMENT: vacuum full pg_class;
> 2018-08-29 04:30:17.601 CEST [19555] LOG: reindexing 3455 to 1281537 on 1259/1281532
> 2018-08-29 04:30:17.601 CEST [19555] STATEMENT: vacuum full pg_class;
> 2018-08-29 04:30:17.605 CEST [19555] LOG: releasing lock on 1259
> 2018-08-29 04:30:17.605 CEST [19555] STATEMENT: vacuum full pg_class;
> 2018-08-29 04:30:17.605 CEST [19551] WARNING: could not read block 3 in file "base/16384/1281529": read only 0 of 8192 bytes
> (it's a warning because the process SIGSTOPs itself afterwards, for
> easier debugging)
>
> So what we can see is that a first VACUUM FULL of pg_class builds a new
> pg_class_oid_index/2662 index under filefilenode 1281526. Then a
> *second* VACUUM FULL again reindexes 2663, this time to 1281532.
>
> But *after* the VACUUM FULL has finished and released the lock (that's
> printed just before the UnlockRelationIdForSession in vacuum_rel()), we
> can see that the INSERT from pid 19551 still accesses the OLD index,
> from *before* the second VACUUM FULL. Which at that point doesn't exist
> anymore.
>
> I think the reason for that is that an invalidation for the relmapper
> changes due to the second VACUUM has been queued, but currently the
> invalidation processing hasn't yet gotten to that entry yet. Instead
> currently the entry for the user defined table is being rebuilt, which
> then tries to access pg_class, but under its "old" relfilenode. Which
> then fails. The fact that inex scans trigger this problem isn't
> relevant - the heap entry for pg_class also refers to the old entry, as
> I've verified using gdb.
>
>
> I'm unclear on how can fix this, as long as we rebuild cache entries
> during invalidation, rather than during access. For most cache entries
> it's not a problem to rebuild them relying on other outdated content,
> we'll just rebuild them again later. But for things like the relation
> mapper that's not true, unfortunately.

Fwiw, if I use gdb to issue InvalidateSystemCaches(),
RelationInitPhysicalAddr(sysscan->irel) makes the relcache entry have
the correct relfilenode, and restarting the relcache invalidation (by
judicious use of 'ret') "fixes" the issue.

Greetings,

Andres Freund

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2018-08-29 03:18:25 Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes
Previous Message Andres Freund 2018-08-29 02:56:58 Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes