Re: pg_visibility's pg_check_visible() yields false positive when working in parallel with autovacuum

From: Andres Freund <andres(at)anarazel(dot)de>
To: Daniel Shelepanov <deniel1495(at)mail(dot)ru>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: pg_visibility's pg_check_visible() yields false positive when working in parallel with autovacuum
Date: 2022-02-18 17:51:19
Message-ID: 20220218175119.7hwv7ksamfjwijbx@alap3.anarazel.de
Lists: pgsql-bugs

Hi,

On 2022-02-18 08:56:37 -0800, Andres Freund wrote:
> Could you try to minimize the script? A 300 line reproducer is quite long. And
> it looks like it won't even work in non postgres-pro tree.
>
> One thing to do would be to modify pg_visibility to elog(PANIC, "something")
> when it encounters corruption. Then you would have a chance of inspecting the
> state of the tuple/page in that moment.

Oh, I have been able to reliably reproduce this on HEAD. I modified
record_corrupt_item() to PANIC and then:

psql regression:
BEGIN; SELECT txid_current();
<leave open>

psql postgres
DROP TABLE IF EXISTS vacuum_test_0;
create table vacuum_test_0 as select 42 i;
vacuum (disable_page_skipping) vacuum_test_0;
select * from pg_check_visible('vacuum_test_0');

At which point the modified record_corrupt_item() immediately PANICs.

This reproduces in earlier versions too, at least back to 10.

I *think* this is a false positive:

- PGPROC->xmin is computed without regard for the database in which the other
sessions are running. Due to the txid_current() session, this includes an
older xid.

- During the VACUUM in vis.sql, the only connection to the database that
pgbench connects to is the VACUUM itself, which is ignored when determining
horizons (due to PROC_IN_VACUUM). Therefore the horizon is computed as
latestCompletedXid + 1.

- But during pg_check_visible(), the current session is *not* marked as
PROC_IN_VACUUM. So the horizon is the xid from the txid_current().

Boom.
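
To make the sequence above concrete, here is a toy model in C. It is only a
sketch of the reasoning, not PostgreSQL code: ToySession, toy_snapshot_xmin(),
toy_horizon() and toy_scenario() are made-up names, the xid values (100, 106)
are arbitrary, and xid wraparound is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid;           /* toy xid, no wraparound handling */
#define NO_XID 0

enum { DB_REGRESSION = 1, DB_POSTGRES = 2 };

typedef struct {
    int  db;                    /* database the session is connected to */
    Xid  xid;                   /* xid of the open transaction, or NO_XID */
    Xid  xmin;                  /* xmin advertised by the session, or NO_XID */
    bool in_vacuum;             /* stands in for PROC_IN_VACUUM */
} ToySession;

typedef struct {
    Xid  vacuum_horizon;
    Xid  check_horizon;
    bool marked_all_visible;
    bool reported_corrupt;
} ToyResult;

static Xid min_valid(Xid a, Xid b) { return (b != NO_XID && b < a) ? b : a; }

/* Snapshot xmin: looks at running xids in *all* databases. */
static Xid toy_snapshot_xmin(const ToySession *s, int n, Xid latest_completed)
{
    Xid result = latest_completed + 1;
    for (int i = 0; i < n; i++)
        result = min_valid(result, s[i].xid);
    return result;
}

/* Horizon for one database: skips other databases and in_vacuum sessions. */
static Xid toy_horizon(const ToySession *s, int n, int db, Xid latest_completed)
{
    Xid result = latest_completed + 1;
    for (int i = 0; i < n; i++) {
        if (s[i].db != db || s[i].in_vacuum)
            continue;
        result = min_valid(result, s[i].xid);
        result = min_valid(result, s[i].xmin);
    }
    return result;
}

static ToyResult toy_scenario(void)
{
    ToySession s[2] = {
        { DB_REGRESSION, 100, NO_XID, false },  /* BEGIN; SELECT txid_current(); */
        { DB_POSTGRES, NO_XID, NO_XID, false }, /* session running the test */
    };
    Xid tuple_xmin = 106;       /* inserted by CREATE TABLE AS, committed */
    Xid latest_completed = 106;
    ToyResult r;

    assert(tuple_xmin <= latest_completed);

    /* VACUUM: the only session in this database is flagged and ignored. */
    s[1].in_vacuum = true;
    r.vacuum_horizon = toy_horizon(s, 2, DB_POSTGRES, latest_completed);
    r.marked_all_visible = tuple_xmin < r.vacuum_horizon;

    /* pg_check_visible(): not flagged, and its own xmin spans all databases. */
    s[1].in_vacuum = false;
    s[1].xmin = toy_snapshot_xmin(s, 2, latest_completed);
    r.check_horizon = toy_horizon(s, 2, DB_POSTGRES, latest_completed);
    r.reported_corrupt = r.marked_all_visible && !(tuple_xmin < r.check_horizon);
    return r;
}
```

Calling toy_scenario() gives two different horizons for the same tuple: the
VACUUM step sees latestCompletedXid + 1 and marks the page all-visible, while
the check step is held back by the xid of the session in the other database
and so flags the tuple.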

Greetings,

Andres Freund

Attachment: repro.sql (application/sql, 720 bytes)
