Re: Fixing a couple of buglets in how VACUUM sets visibility map bits

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Fixing a couple of buglets in how VACUUM sets visibility map bits
Date: 2023-01-10 19:47:41
Message-ID: CAH2-WzkZjvZG5tho2gXHZL3GSDc6ZqLeYta5fXvn9eOx2wGBkw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jan 10, 2023 at 10:50 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> Look, I don't want to spend time arguing about what seem to me to be
> basic principles of good software engineering. When I don't put test
> cases into my patches, people complain at me and tell me that I'm a
> bad software engineer because I didn't include test cases. Your
> argument here seems to be that you're such a good software engineer
> that you don't need any test cases to know what the bug is or that
> you've fixed it correctly.

That's not what I said. This is a straw man.

What I actually said was that there is no reason to declare up front
that the only circumstances under which a fix could be committed is
when a clean repro is available. I never said that a test case has
little or no value, and I certainly didn't assert that we definitely
don't need a test case to proceed with a commit -- since I am not in
the habit of presumptuously attaching conditions to such things well
in advance.

> I don't particularly appreciate the implication that I either lack
> relevant or expertise or don't actually take time, either.

The implication was only that you didn't take the time. Clearly you
have the expertise. Obviously you're very far from stupid.

I have been unable to reproduce the problem, and think it's possible
that the issue cannot be triggered in practice. Though only through
sheer luck. Here's why that is:

While pruning will remove aborted dead tuples, freezing will not
remove the xmax of an aborted update unless the XID happens to be <
OldestXmin. With my problem scenario, the page will be all_visible in
prunestate, but not all_frozen -- so it dodges the relevant
visibilitymap_set() call site.

That just leaves inserts that abort, I think. An aborted insert will
be totally undone by pruning, but that does still leave behind an
LP_DEAD item that needs to be vacuumed in the second heap pass. This
means that we can only set the page all-visible/all-frozen in the VM
in the second heap pass, which also dodges the relevant
visibilitymap_set() call site.

In summary, I think that there is currently no way that we can have
the VM (or the PD_ALL_VISIBLE flag) concurrently unset, while leaving
the page all_frozen. It can happen and leave the page all_visible, but
not all_frozen, due to these very fine details. (Assuming I haven't
missed another path to the problem with aborted Multis or something,
but looks like I haven't.)

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2023-01-10 19:52:20 Re: logical decoding and replication of sequences, take 2
Previous Message Mark Dilger 2023-01-10 19:47:11 Re: Transparent column encryption