Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Michail Nikolaev <michail(dot)nikolaev(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE
Date: 2023-01-05 21:49:23
Message-ID: CAEze2WjggdAo5A0oaE6EAhrM-EPtPexcqt1f3HEM_Oyx3u8_Pg@mail.gmail.com
Lists: pgsql-hackers

On Thu, 5 Jan 2023 at 14:12, Michail Nikolaev <michail(dot)nikolaev(at)gmail(dot)com>
wrote:
>
> Hello, hackers.
>
> It seems like PG 14 works incorrectly with vacuum_defer_cleanup_age
> (or just with rows that have not been cleaned up, not sure) and
> SELECT FOR UPDATE + UPDATE. I am not certain, but hot_standby_feedback
> is probably able to cause the same issues.
>
> Steps to reproduce:
>
> [steps]
>
> I was able to see such a set of errors (looks scary):
>
> ERROR: MultiXactId 30818104 has not been created yet -- apparent wraparound
> ERROR: could not open file "base/13757/16385.1" (target block
> 39591744): previous segment is only 24 blocks

This looks quite suspicious too - it wants to access a block roughly 300GB
into the relation, where only 24 blocks (196kB) exist.
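
For reference, the arithmetic behind that comparison can be sketched as
follows. This assumes the default 8kB block size and 1GB segment size; the
error message states neither, and the constant and function names are
mine, not PostgreSQL's.

```python
# Rough arithmetic for the "could not open file" error quoted above.
# ASSUMPTIONS: default 8 kB block size and 1 GiB segment files.
BLOCK_SIZE = 8192        # bytes per block (assumed default)
SEGMENT_SIZE = 1 << 30   # bytes per segment file (assumed default)


def block_offset_bytes(block_number: int) -> int:
    """Byte offset of a block from the start of the relation."""
    return block_number * BLOCK_SIZE


def segment_of(block_number: int) -> int:
    """Index of the segment file that would contain the block."""
    return block_offset_bytes(block_number) // SEGMENT_SIZE


target_block = 39591744   # "target block" from the error message
existing_blocks = 24      # "previous segment is only 24 blocks"

print(block_offset_bytes(target_block))   # bytes requested
print(segment_of(target_block))           # segment that would be needed
print(existing_blocks * BLOCK_SIZE)       # bytes that actually exist
```

Under those assumptions the target block sits about 324 * 10^9 bytes into
the relation, in what would be segment 302, while the first segment holds
only 196608 bytes.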

> ERROR: attempted to lock invisible tuple
> ERROR: could not access status of transaction 38195704
> DETAIL: Could not open file "pg_subtrans/0246": No such file or directory.

I just saw two instances of this "attempted to lock invisible tuple" error
on the 15.1 image (run on Docker in Ubuntu in WSL) with your reproducer
script, so this does not seem to be specific to PG14 (14.6).

And, after some vacuuming and restarting the process, I got the following:

client 29 script 0 aborted in command 2 query 0: ERROR: heap tid from
index tuple (111,1) points past end of heap page line pointer array at
offset 262 of block 1 in index "something_is_wrong_here_pkey"

There is indeed something wrong there; the page can't be read by
pageinspect:

$ select get_raw_page('public.something_is_wrong_here', 111)::bytea;
ERROR: invalid page in block 111 of relation base/5/16385

I don't have access to the v14 data anymore (I tried a restart, which
dropped the data :-( ), but will retain my v15 instance for some time to
help any debugging.

Kind regards,

Matthias van de Meent
