Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes
Date: 2018-08-29 02:56:58
Message-ID: 20180829025658.brk3hfdczo3f73al@alap3.anarazel.de
Lists: pgsql-hackers

Hi Everyone,

Tom, I think this could use your eyes.

On 2018-08-28 00:52:13 -0700, Andres Freund wrote:
> I've a few leads that I'm currently testing out. One observation I think
> might be crucial is that the problem, in Tomas' testcase with just
> VACUUM FULL of pg_class and INSERTs into another rel, appears to only
> happen if autovacuum is enabled. The crashes are always immediately
> after autovacuum did an analyze of the relation.

I've run this with autovacuum disabled for quite a while, without the
problem recurring. It does, however, recur when pgbench runs an ANALYZE
on the user-defined table.

> What I see on the apply side, I think, that invalidation records for
> contents of pg_class and pg_class itself are intermixed.

I've not 100% figured this out yet, and I need to switch to another task
for a bit, but what I think is happening here is that we have a
relatively fundamental issue with how rewrites of catalog tables are
handled. A slightly instrumented version of postgres shows the
following:

2018-08-29 04:30:17.515 CEST [19551] LOG: statement: INSERT INTO t(b,c) SELECT i,i FROM generate_series(1,59) s(i);
...
2018-08-29 04:30:17.559 CEST [19554] LOG: statement: vacuum full pg_class;
2018-08-29 04:30:17.571 CEST [19557] LOG: swapping mapped 1259/1281520 w 1281526/1281526
2018-08-29 04:30:17.571 CEST [19557] STATEMENT: vacuum full pg_class;
2018-08-29 04:30:17.572 CEST [19557] LOG: reindexing 2662 to 1281529 on 1259/1281526
2018-08-29 04:30:17.572 CEST [19557] STATEMENT: vacuum full pg_class;
2018-08-29 04:30:17.576 CEST [19557] LOG: reindexing 2663 to 1281530 on 1259/1281526
2018-08-29 04:30:17.576 CEST [19557] STATEMENT: vacuum full pg_class;
2018-08-29 04:30:17.578 CEST [19557] LOG: reindexing 3455 to 1281531 on 1259/1281526
2018-08-29 04:30:17.578 CEST [19557] STATEMENT: vacuum full pg_class;
2018-08-29 04:30:17.582 CEST [19557] LOG: releasing lock on 1259
2018-08-29 04:30:17.582 CEST [19557] STATEMENT: vacuum full pg_class;
2018-08-29 04:30:17.582 CEST [19557] LOG: duration: 87.389 ms
2018-08-29 04:30:17.582 CEST [19557] LOG: statement: vacuum full pg_class;
2018-08-29 04:30:17.583 CEST [19563] LOG: duration: 630.481 ms

2018-08-29 04:30:17.584 CEST [19563] LOG: statement: analyze t;
2018-08-29 04:30:17.597 CEST [19555] LOG: swapping mapped 1259/1281526 w 1281532/1281532
2018-08-29 04:30:17.597 CEST [19555] STATEMENT: vacuum full pg_class;
2018-08-29 04:30:17.597 CEST [19555] LOG: reindexing 2662 to 1281535 on 1259/1281532
2018-08-29 04:30:17.597 CEST [19555] STATEMENT: vacuum full pg_class;
2018-08-29 04:30:17.599 CEST [19555] LOG: reindexing 2663 to 1281536 on 1259/1281532
2018-08-29 04:30:17.599 CEST [19555] STATEMENT: vacuum full pg_class;
2018-08-29 04:30:17.601 CEST [19555] LOG: reindexing 3455 to 1281537 on 1259/1281532
2018-08-29 04:30:17.601 CEST [19555] STATEMENT: vacuum full pg_class;
2018-08-29 04:30:17.605 CEST [19555] LOG: releasing lock on 1259
2018-08-29 04:30:17.605 CEST [19555] STATEMENT: vacuum full pg_class;
2018-08-29 04:30:17.605 CEST [19551] WARNING: could not read block 3 in file "base/16384/1281529": read only 0 of 8192 bytes
(it's a warning because the process SIGSTOPs itself afterwards, for
easier debugging)

So what we can see is that the first VACUUM FULL of pg_class builds a
new pg_class_oid_index (OID 2662) under relfilenode 1281529. Then a
*second* VACUUM FULL reindexes 2662 again, this time to 1281535.

But *after* the second VACUUM FULL has finished and released the lock
(that's printed just before the UnlockRelationIdForSession in
vacuum_rel()), we can see that the INSERT from pid 19551 still accesses
the OLD index from *before* the second VACUUM FULL, which at that point
no longer exists.
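To make the failure mode above concrete, here is a toy model (hypothetical Python, not PostgreSQL code) of the relation mapper: an OID-to-relfilenode mapping where VACUUM FULL swaps in a new file and unlinks the old one, while a backend still holds the cached, stale mapping. All names and numbers are taken from the log excerpt; the class and methods are illustrative only.

```python
# Toy sketch of the relmapper hazard; this is NOT PostgreSQL source code.
class RelMapper:
    def __init__(self):
        # pg_class_oid_index (OID 2662) after the first VACUUM FULL
        self.map = {2662: 1281529}
        self.files = {1281529: b"index data"}

    def vacuum_full_swap(self, oid, new_node):
        old = self.map[oid]
        self.map[oid] = new_node         # mapper now points at the new file...
        self.files[new_node] = b"index data"
        del self.files[old]              # ...and the old file is gone

    def read(self, filenode):
        if filenode not in self.files:
            raise IOError(
                f'could not read block 3 in file "base/16384/{filenode}"')
        return self.files[filenode]

m = RelMapper()
stale = m.map[2662]                      # backend caches the mapping
m.vacuum_full_swap(2662, 1281535)        # second VACUUM FULL; invalidation
                                         # for the backend is still queued
try:
    m.read(stale)                        # backend uses its stale filenode
except IOError as e:
    print(e)                             # mirrors the WARNING in the log
```

The point is simply that once the old file is unlinked, any access through the not-yet-invalidated mapping fails exactly like the WARNING in the log.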

I think the reason is that an invalidation for the relmapper change
made by the second VACUUM FULL has been queued, but invalidation
processing hasn't gotten to that entry yet. Instead, the entry for the
user-defined table is currently being rebuilt, which tries to access
pg_class, but under its "old" relfilenode - which then fails. The fact
that index scans trigger the problem isn't relevant; the heap entry for
pg_class also refers to the old relfilenode, as I've verified using gdb.

I'm unclear on how we can fix this, as long as we rebuild cache entries
during invalidation rather than during access. For most cache entries
it's not a problem to rebuild them relying on other outdated content;
we'll just rebuild them again later. But for things like the relation
mapper that's not true, unfortunately.
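The ordering hazard can be sketched in a few lines (again a hypothetical Python model, not the real sinval code): invalidation messages are applied strictly in queue order, and rebuilding a relcache entry *during* that processing consults the relmapper before the relmapper's own invalidation, further down the queue, has been applied.

```python
# Toy model of in-order invalidation processing; NOT PostgreSQL source code.
relmap = {"pg_class": "old_filenode"}          # backend-local cached mapping
relmap_authoritative = {"pg_class": "new_filenode"}

def rebuild_relcache_entry(rel):
    # Rebuilding must scan pg_class, via the *currently cached* mapping.
    return ("rebuilt", rel, relmap["pg_class"])

queue = [
    ("relcache", "t"),       # invalidation for the user table: rebuilt eagerly
    ("relmap", "pg_class"),  # relmapper invalidation queued *after* it
]

seen = []
for kind, target in queue:
    if kind == "relcache":
        seen.append(rebuild_relcache_entry(target))   # uses the stale mapping
    elif kind == "relmap":
        relmap[target] = relmap_authoritative[target]

# The rebuild ran against old_filenode, a file that no longer exists on disk.
```

Rebuilding lazily, at next access instead of during invalidation processing, would avoid this particular ordering, since by then the whole queue (including the relmap message) would have been drained.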

Greetings,

Andres Freund
