Re: collect_corrupt_items_vacuum.patch

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Daniel Shelepanov <deniel1495(at)mail(dot)ru>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: collect_corrupt_items_vacuum.patch
Date: 2022-07-27 21:50:46
Message-ID: CA+TgmoZtw7yNZogHFrgatHnvc2rOX9hoRRq=GtUBZx9mxh86Vg@mail.gmail.com
Lists: pgsql-hackers

On Mon, Apr 4, 2022 at 4:51 AM Daniel Shelepanov <deniel1495(at)mail(dot)ru> wrote:
> I’ve been working on this [https://www.postgresql.org/message-id/flat/cfcca574-6967-c5ab-7dc3-2c82b6723b99%40mail.ru] bug. Finally, I’ve come up with the patch you can find attached. Basically, what it does is raise a PROC_IN_VACUUM flag and reset it afterwards. I know this seems kinda crunchy and I hope you guys will give me some hints on where to continue. This [https://www.postgresql.org/message-id/20220218175119.7hwv7ksamfjwijbx%40alap3.anarazel.de] message contains a reproduction script. Thank you very much in advance.

I noticed the CommitFest entry for this thread today and decided to
take a look. I think the general issue here can be stated in this way:
suppose a VACUUM computes an all-visible cutoff X, i.e. it thinks all
committed XIDs < X are all-visible. Then, at a later time, pg_visibility
computes an all-visible cutoff Y, i.e. it thinks all committed XIDs <
Y are all-visible. If Y < X, pg_check_visible() might falsely report
corruption, because VACUUM might have marked as all-visible some page
containing tuples which pg_check_visible() thinks aren't really
all-visible.
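
A toy model of that scenario may make the failure easier to see. This is
only a sketch, not PostgreSQL code: the toy_* names are hypothetical,
XIDs are plain integers with wraparound ignored, and "all-visible" is
reduced to "every inserting XID on the page is older than the cutoff".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy XID type; real TransactionIds wrap around, which this sketch ignores. */
typedef uint32_t toy_xid;

/*
 * Hypothetical stand-in for VACUUM's decision: the page may be marked
 * all-visible only if every tuple was inserted by an XID older than the
 * cutoff X that VACUUM computed.
 */
static bool
toy_vacuum_can_mark_all_visible(const toy_xid *xmins, size_t n, toy_xid cutoff)
{
    for (size_t i = 0; i < n; i++)
    {
        if (xmins[i] >= cutoff)
            return false;
    }
    return true;
}

/*
 * Hypothetical stand-in for the checker: given a page that is already
 * marked all-visible, count the tuples that look "not all-visible" under
 * the checker's own cutoff Y.  Each counted tuple would be reported as
 * corruption.
 */
static size_t
toy_check_visible(const toy_xid *xmins, size_t n, toy_xid cutoff)
{
    size_t reported = 0;

    for (size_t i = 0; i < n; i++)
    {
        if (xmins[i] >= cutoff)
            reported++;
    }
    return reported;
}
```

With inserting XIDs {90, 95, 99}, a VACUUM cutoff X = 100 allows the
page to be marked all-visible. A checker that later computes Y = 100
reports nothing, but one that computes Y = 97 reports the tuple with
XID 99, even though nothing is wrong with the page.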

In reality, the oldest all-visible XID cannot move backward, but
ComputeXidHorizons() lets it move backward, because it's intended for
use by a caller who wants to mark pages all-visible, and it's only
concerned with making sure that the value is old enough to be safe.
And that's a problem for the way that pg_visibility is (mis-)using it.

To say that another way, ComputeXidHorizons() is perfectly fine with
returning a value that is older than the true answer, as long as it
never returns a value that is newer than the true answer. pg_visibility
wants the opposite. Here, a value that is newer than the true value
can't do worse than hide corruption, which is sort of OK, but a value
that's older than the true value can report corruption where none
exists, which is very bad.
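
The asymmetry can be written down as two small classification
functions. Again this is an illustrative sketch with hypothetical
names, not anything in the PostgreSQL tree; XIDs are plain integers and
wraparound is ignored.

```c
#include <stdint.h>

typedef uint32_t toy_xid;   /* wraparound ignored in this sketch */

typedef enum
{
    TOY_EXACT,              /* estimate equals the true horizon */
    TOY_SAFE,               /* error in a direction the caller tolerates */
    TOY_UNSAFE_MARKING,     /* marker could set all-visible too early */
    TOY_HIDES_CORRUPTION,   /* checker could miss real corruption */
    TOY_FALSE_CORRUPTION    /* checker could report corruption that isn't there */
} toy_outcome;

/* A caller that marks pages all-visible: older than the truth is fine. */
static toy_outcome
toy_outcome_for_marker(toy_xid estimate, toy_xid truth)
{
    if (estimate == truth)
        return TOY_EXACT;
    return (estimate < truth) ? TOY_SAFE : TOY_UNSAFE_MARKING;
}

/* A caller that checks all-visible bits: the tolerable direction is reversed. */
static toy_outcome
toy_outcome_for_checker(toy_xid estimate, toy_xid truth)
{
    if (estimate == truth)
        return TOY_EXACT;
    return (estimate < truth) ? TOY_FALSE_CORRUPTION : TOY_HIDES_CORRUPTION;
}
```

The same too-old estimate that is harmless for the marker is the worst
case for the checker, which is why a horizon that is merely "old enough
to be safe" does not suit pg_visibility.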

I have a feeling, therefore, that this isn't really a complete fix. I
think it might address one way for the horizon reported by
ComputeXidHorizons() to move backward, but not all the ways.

Unfortunately, I am out of time for today to study this... but will
try to find more time on another day.

--
Robert Haas
EDB: http://www.enterprisedb.com
