Re: Show various offset arrays for heap WAL records

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Subject: Re: Show various offset arrays for heap WAL records
Date: 2023-03-21 22:37:12
Message-ID: CAH2-Wz=0ztNEy+8G3_Xj56qPfZ7eJzrHtfOtGdLavjrZJJMmeA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Mar 13, 2023 at 6:41 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> There are several different things that seem important to me
> personally. These are in tension with each other, to a degree. These
> are:
>
> 1. Like Andres, I'd really like to have some way of inspecting things
> like heapam PRUNE, VACUUM, and FREEZE_PAGE records in significant
> detail. These record types happen to be very important in general, and
> the ability to see detailed information about the WAL record would
> definitely help with some debugging scenarios. I've really missed
> stuff like this while debugging serious issues under time pressure.

One problem that I often run into when performing analysis of VACUUM
using pg_walinspect is the issue of *who* pruned which heap page, for
any given PRUNE record. Was it VACUUM/autovacuum, or was it
opportunistic pruning? There is no way of knowing for sure right now.
You *cannot* rely on an xid of 0 as an indicator of a given PRUNE
record coming from VACUUM; it could just have been an opportunistic
prune operation that happened to take place when a SELECT query ran,
before any XID was ever allocated.

I think that we should do something like the attached, to completely
avoid this ambiguity. This patch adds a new XLOG_HEAP2 bit that's
similar to XLOG_HEAP_INIT_PAGE -- XLOG_HEAP2_BYVACUUM. This allows all
XLOG_HEAP2 record types to indicate that they took place during
VACUUM, by XOR'ing the flag with the record type/info when
XLogInsert() is called. For now this is only used by PRUNE records.
Tools like pg_walinspect will report a separate "Heap2/PRUNE+BYVACUUM"
record_type, as well as the unadorned Heap2/PRUNE record_type, which
we'll now know must have been opportunistic pruning.

The approach of using a bit in the style of the heapam init bit makes
sense to me, because the bit is available, and works in a way that is
minimally invasive. Also, one can imagine needing to resolve a similar
ambiguity in the future, when (say) opportunistic freezing is added.

I think that it makes sense to treat this within the scope of
Melanie's ongoing work to improve the instrumentation of these records
-- meaning that it's in scope for Postgres 16. Admittedly this is a
slightly creative interpretation, so if others disagree then I won't
argue. This is quite a small patch, though, which makes debugging
significantly easier. I think that there could be a great deal of
utility in being able to easily "pair up" corresponding
"Heap2/PRUNE+BYVACUUM" and "Heap2/VACUUM" records in debugging
scenarios. I can imagine linking these to "Heap2/FREEZE_PAGE" and
"Heap2/VISIBLE" records, too, since they're all closely related record
types.

--
Peter Geoghegan

Attachment Content-Type Size
v1-0001-Record-which-PRUNE-records-are-from-VACUUM.patch application/octet-stream 5.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2023-03-21 23:28:53 Re: Allow logical replication to copy tables in binary format
Previous Message Matthias van de Meent 2023-03-21 22:22:40 Re: Save a few bytes in pg_attribute