Re: Fixing a couple of buglets in how VACUUM sets visibility map bits

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Fixing a couple of buglets in how VACUUM sets visibility map bits
Date: 2023-01-11 00:39:30
Message-ID: CAH2-WzmisLS0Gxj=+NJLAxqC9bZmUV4W3CdD0dxnisf7WeXPcQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jan 10, 2023 at 12:08 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> Actually, FreezeMultiXactId() can fully remove an xmax that has some
> member XIDs >= OldestXmin, provided FRM_NOOP processing isn't
> possible, at least when no individual member is still running. Doesn't
> have to involve transaction aborts at all.
>
> Let me go try to break it that way...

Attached patch shows how this could break.

It adds an assertion that checks that the expected
PD_ALL_VISIBLE/VM_ALL_VISIBLE() conditions hold at the right point. It
also comments out FreezeMultiXactId()'s FRM_NOOP handling.

The FRM_NOOP case is really just an optimization, and shouldn't be
needed for correctness. This is amply demonstrated by running "meston
test" with the patch applied, which will pass without incident.

I can get the PD_ALL_VISIBLE assertion to fail by following this
procedure with the patch applied:

* Run a plain VACUUM to set all the pages from a table all-visible,
but not all-frozen.

* Set a breakpoint that will hit after all_visible_according_to_vm is
set to true, for an interesting blkno.

* Run VACUUM FREEZE. We need FREEZE in order to be able to hit the
relevant visibilitymap_set() call site (the one that passes
VISIBILITYMAP_ALL_FROZEN as its flags, without also passing
VISIBILITYMAP_ALL_VISIBLE).

Now all_visible_according_to_vm is set to true, but we don't have a
lock/pin on the same heap page just yet.

* Acquire several non-conflicting row locks on a row on the block in
question, so that a new multi is allocated.

* End every session whose XID is stored in our multi (commit/abort).

* Within GDB, continue from before -- observe assertion failure.

Obviously this scenario doesn't demonstrate the presence of a bug --
not quite. But it does prove that we rely on FRM_NOOP to not allow the
VM to become corrupt, which just doesn't make any sense, and can't
have been intended. At a minimum, it strongly suggests that the
current approach is very fragile.

--
Peter Geoghegan

Attachment Content-Type Size
0001-Debug-freeze-map-issue.patch application/octet-stream 1.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2023-01-11 00:48:16 Re: Todo: Teach planner to evaluate multiple windows in the optimal order
Previous Message Thomas Munro 2023-01-11 00:12:46 Re: Use windows VMs instead of windows containers on the CI