Re: Failures in constraints regression test, "read only 0 of 8192 bytes"

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: Failures in constraints regression test, "read only 0 of 8192 bytes"
Date: 2024-03-10 05:48:22
Message-ID: CA+hUKG+XOrCi3UwiK5dNL_B8Eav6hMk334L4Qpctfw4MPDUYaw@mail.gmail.com
Lists: pgsql-hackers

On Sun, Mar 10, 2024 at 5:02 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> Thanks, reproduced here (painfully slowly). Looking...

I changed that ERROR to a PANIC and now I can see that
_bt_metaversion() is failing to read a meta page (block 0), and the
file is indeed of size 0 in my filesystem. Which is not cool, for a
btree. Looking at btbuildempty(), we have this sequence:

    bulkstate = smgr_bulk_start_rel(index, INIT_FORKNUM);

    /* Construct metapage. */
    metabuf = smgr_bulk_get_buf(bulkstate);
    _bt_initmetapage((Page) metabuf, P_NONE, 0, allequalimage);
    smgr_bulk_write(bulkstate, BTREE_METAPAGE, metabuf, true);

    smgr_bulk_finish(bulkstate);

Ooh. One idea would be that the smgr lifetime stuff is b0rked,
introducing corruption. Bulk write itself isn't pinning the smgr
relation, it's relying purely on the object not being invalidated,
which the theory of 21d9c3ee's commit message allowed for but ... here
it's destroyed (HASH_REMOVE'd) sooner under CLOBBER_CACHE_ALWAYS,
which I think we failed to grok. If that's it, I'm surprised that
things don't implode more spectacularly. Perhaps HASH_REMOVE should
clobber objects in debug builds, similar to pfree?
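
To make the clobber-on-remove idea concrete, here is a rough sketch using
a toy fixed-size table. It is not dynahash, and every name in it
(ToyTable, toy_enter, toy_remove, TOY_CLOBBER_BYTE) is made up for
illustration. The only point is that a removal can wipe the slot, so a
stale pointer reads obvious garbage instead of plausible old data:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; this is a toy table, not dynahash. */
#define TOY_NSLOTS       8
#define TOY_CLOBBER_BYTE 0x7F

typedef struct ToyEntry
{
    int key;
    int nblocks;    /* stand-in for cached per-relation state */
} ToyEntry;

typedef struct ToyTable
{
    bool     used[TOY_NSLOTS];
    ToyEntry slots[TOY_NSLOTS];
} ToyTable;

/* Find the entry for key, or claim a free slot for it; NULL if full. */
ToyEntry *
toy_enter(ToyTable *t, int key)
{
    assert(t != NULL);

    for (int i = 0; i < TOY_NSLOTS; i++)
        if (t->used[i] && t->slots[i].key == key)
            return &t->slots[i];

    for (int i = 0; i < TOY_NSLOTS; i++)
    {
        if (!t->used[i])
        {
            t->used[i] = true;
            t->slots[i].key = key;
            t->slots[i].nblocks = 0;
            return &t->slots[i];
        }
    }
    return NULL;
}

/*
 * Remove the entry for key.  With clobber set (think: debug build), wipe
 * the slot so a stale pointer reads garbage rather than plausible old data.
 */
bool
toy_remove(ToyTable *t, int key, bool clobber)
{
    assert(t != NULL);

    for (int i = 0; i < TOY_NSLOTS; i++)
    {
        if (t->used[i] && t->slots[i].key == key)
        {
            t->used[i] = false;
            if (clobber)
                memset(&t->slots[i], TOY_CLOBBER_BYTE, sizeof(ToyEntry));
            return true;
        }
    }
    return false;
}
```

A caller that keeps the pointer returned by toy_enter() across a
toy_remove(..., true) would see 0x7F bytes in every field, which tends to
fail loudly rather than quietly.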

For that hypothesis, the corruption might not be happening in the
above-quoted code itself, because it doesn't seem to have an
invalidation acceptance point (unless I'm missing it). Some other
bulk write got mixed up? Not sure yet.

I won't be surprised if the answer is: if you're holding a reference,
you have to get a pin (referring to bulk_write.c).
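
A minimal sketch of that rule, again with invented names (ToyRel,
toyrel_pin, toyrel_unpin, toyrel_try_destroy) rather than the real smgr
or bulk_write.c code: whoever holds a long-lived pointer takes a pin, and
the destroy path declines to tear the object down while any pin is
outstanding.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a cached relation object. */
typedef struct ToyRel
{
    int  pincount;  /* number of holders that need the object to survive */
    bool valid;     /* false once the object has been destroyed */
} ToyRel;

void
toyrel_init(ToyRel *rel)
{
    rel->pincount = 0;
    rel->valid = true;
}

/* Take a pin before stashing a pointer to rel anywhere long-lived. */
void
toyrel_pin(ToyRel *rel)
{
    assert(rel->valid);
    rel->pincount++;
}

/* Drop a pin once the stashed pointer is no longer used. */
void
toyrel_unpin(ToyRel *rel)
{
    assert(rel->pincount > 0);
    rel->pincount--;
}

/*
 * Invalidation path: destroy the object only if nobody holds a pin.
 * Returns true if it was destroyed, false if it had to be left alone.
 */
bool
toyrel_try_destroy(ToyRel *rel)
{
    if (rel->pincount > 0)
        return false;
    rel->valid = false;
    return true;
}
```

In this toy model a bulk-write state would call toyrel_pin() when it
starts and toyrel_unpin() when it finishes, so an invalidation arriving
in between could not pull the object out from under it.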
