Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
Date: 2022-11-11 00:48:17
Message-ID: CAH2-WzmtP1rpk6At_xS4zLYL+s-645o-SbnVE3ZzkWwa3kkmAw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Sep 20, 2022 at 3:12 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> Attached is v2, which I'm just posting to keep CFTester happy. No real
> changes here.

Attached is v3. I'd like to move forward with commit soon. I'll do so
in the next few days, barring objections.

v3 has vacuumlazy.c pass NewRelfrozenXid instead of FreezeLimit for
the purposes of generating recovery conflicts during subsequent REDO
of the resulting xl_heap_freeze_page WAL record. This more general
approach is preparation for my patch to add page-level freezing [1].
It might theoretically lead to more recovery conflicts, but in
practice the impact should be negligible. For one thing, VACUUM must
freeze *something* before any recovery conflict can happen during
subsequent REDO on a replica in hot standby. It's far more likely that
any disruptive recovery conflicts come from pruning.

It also makes the cutoff_xid field from the xl_heap_freeze_page WAL
record into a "standard latestRemovedXid format" field. In other words
it backs up an XID passed by the vacuumlazy.c caller during original
execution (not in the REDO routine, as on HEAD). To make things
clearer, the patch also renames the nearby xl_heap_visible.cutoff_xid
field to xl_heap_visible.latestRemovedXid. Now there are no WAL
records with a field called "cutoff_xid" (they're all called
"latestRemovedXid" now). This matches PRUNE records, and B-Tree DELETE
records.

The overall picture is that all REDO routines (for both heapam and
index AMs) now advertise that they have a field that they use to
generate recovery conflicts that follows a standard format. All
latestRemovedXid XIDs are applied in a standard way during REDO: by
passing them to ResolveRecoveryConflictWithSnapshot(). Users can grep
the output of tools like pg_waldump to find latestRemovedXid fields,
without necessarily needing to give any thought to which kind of WAL
records are involved, or even the rmgr. Presenting this information
precisely and uniformly seems useful to me. (Perhaps we should have a
truly generic name, which latestRemovedXid isn't, but that can be
handled separately.)
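
To illustrate the grep-ability point, here is a minimal sketch of scanning
pg_waldump-style text for latestRemovedXid fields without caring about the
record type or rmgr. The sample lines are made up for illustration and only
loosely modeled on pg_waldump output (the real layout varies by PostgreSQL
version and record type); the helper name conflict_xids is hypothetical.

```python
import re

# Hypothetical sample lines, loosely modeled on pg_waldump output. The exact
# layout differs between PostgreSQL versions and record types.
SAMPLE = """\
rmgr: Heap2 len (rec/tot): 59/59, tx: 0, lsn: 0/01000028, desc: PRUNE latestRemovedXid 731 nredirected 0 ndead 1
rmgr: Btree len (rec/tot): 72/72, tx: 0, lsn: 0/01000068, desc: DELETE latestRemovedXid 729; ndeleted 2
rmgr: Heap len (rec/tot): 54/54, tx: 740, lsn: 0/010000B0, desc: INSERT off 3
"""

# Match the field name followed by its XID, tolerating an optional colon.
FIELD = re.compile(r"\blatestRemovedXid:?\s+(\d+)")
# Capture the resource manager name at the start of a line, if present.
RMGR = re.compile(r"^rmgr:\s+(\S+)")


def conflict_xids(lines):
    """Yield (rmgr, xid) for every line carrying a latestRemovedXid field."""
    for line in lines:
        field = FIELD.search(line)
        if field is None:
            continue  # this record cannot generate a recovery conflict
        rmgr = RMGR.match(line)
        yield (rmgr.group(1) if rmgr else "?", int(field.group(1)))


if __name__ == "__main__":
    for rmgr, xid in conflict_xids(SAMPLE.splitlines()):
        print(rmgr, xid)
```

The same function could be fed real output by passing it sys.stdin instead of
the sample text, assuming the field is printed under the latestRemovedXid name.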

[1] https://commitfest.postgresql.org/39/3843/
--
Peter Geoghegan

Attachment Content-Type Size
v3-0001-Shrink-freeze-WAL-records-via-deduplication.patch application/octet-stream 23.9 KB
