Re: Show various offset arrays for heap WAL records

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Subject: Re: Show various offset arrays for heap WAL records
Date: 2023-03-14 01:41:09
Message-ID: CAH2-Wz=pyNmfMnjeKpOcSqai9jiTzS3yZ+df=Nj1XGEUJuTRww@mail.gmail.com
Lists: pgsql-hackers

On Mon, Mar 13, 2023 at 4:01 PM Melanie Plageman
<melanieplageman(at)gmail(dot)com> wrote:
> On Fri, Jan 27, 2023 at 3:02 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > I'm not sure what's best in terms of formatting details but I
> > definitely like the idea of making pg_waldump show more details.

> If I'm not mistaken, this would be quite difficult without changing
> rm_desc to return some kind of self-describing data type.

I'd say that it would depend on how far you went with it. Basic
information about the tuple wouldn't require any of that. I suggest
leaving this part out for now, though.

> So, we can scrap any README or big comment, but are there other changes
> to the code or structure you think would avoid it being seen as an
> API?

I think that it would be good to try to build something that looks
like an API, while making zero promises about its stability -- at
least until further notice. Kind of like how there are no guarantees
about the stability of internal interfaces within the Linux kernel.

There is no reason not to take a firm position on some things now.
Things like punctuation, and symbol names for generic cross-record
symbols like snapshotConflictHorizon. Many of the differences that
exist now are wholly gratuitous -- just accidents. It would make sense
to standardize-away these clearly unnecessary variations. And to
document the new standard. I'd be surprised if anybody disagreed with
me on this point.

> I have added detail to xl_btree_delete and xl_btree_vacuum. I have added
> the updated/deleted target offset numbers and the updated tuples
> metadata.
>
> I wondered if there was any reason to do xl_btree_dedup deduplication
> intervals.

No reason. It wouldn't be hard to cover xl_btree_dedup deduplication
intervals -- each element is a page offset number, and a corresponding
count of index tuples to merge together in the REDO routine. That's
slightly different to anything else, but not in a way that seems like
it requires very much additional effort.
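
To make the shape of that data concrete, here is a minimal standalone
sketch of describing an array of such intervals. The DedupInterval struct
is only a stand-in for the real nbtree definition, and both the function
name desc_dedup_intervals and the "(offset, count)" output format are
invented for illustration; none of this is taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for PostgreSQL's offset number type, so this compiles alone. */
typedef uint16_t OffsetNumber;

/* Hypothetical mirror of one deduplication interval. */
typedef struct DedupInterval
{
    OffsetNumber baseoff;   /* page offset number where the interval starts */
    uint16_t     nitems;    /* count of index tuples to merge during REDO */
} DedupInterval;

/*
 * Write "intervals: [(baseoff, nitems), ...]" into buf.  The output is
 * truncated (but still null-terminated) if buf is too small.  Returns the
 * length of the string left in buf.
 */
static size_t
desc_dedup_intervals(char *buf, size_t buflen,
                     const DedupInterval *intervals, int nintervals)
{
    size_t used;
    int    n;

    assert(buf != NULL && buflen > 0);
    buf[0] = '\0';

    n = snprintf(buf, buflen, "intervals: [");
    if (n < 0 || (size_t) n >= buflen)
        return strlen(buf);
    used = (size_t) n;

    for (int i = 0; i < nintervals; i++)
    {
        n = snprintf(buf + used, buflen - used, "%s(%u, %u)",
                     i > 0 ? ", " : "",
                     (unsigned) intervals[i].baseoff,
                     (unsigned) intervals[i].nitems);
        if (n < 0 || (size_t) n >= buflen - used)
            return strlen(buf);     /* ran out of room */
        used += (size_t) n;
    }

    snprintf(buf + used, buflen - used, "]");
    return strlen(buf);
}
```

With two intervals, {2, 3} and {7, 2}, this produces
"intervals: [(2, 3), (7, 2)]". The only real difference from a plain
offset array is that each element carries a count alongside the offset.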

> I wanted to include at least a minimal example for those following along
> with this thread that would cause creation of one of the record types
> which I have enhanced, but I had a little trouble making a reliable
> example.
>
> Below is my strategy for getting a Heap PRUNE record with redirects, but
> it occasionally doesn't end up working and I wasn't sure why (I can do
> more investigation if we think that having some kind of test for this is
> useful).

I'm not sure, but offhand I think that there could be a number of
annoying little implementation details that make it hard to come up
with a perfectly reliable test case. Perhaps try it while using VACUUM
VERBOSE, with the proviso that we should only expect the revised
example workflow to show a redirect record as intended when the
VERBOSE output confirms that VACUUM actually ran as expected, in
whatever way. For example, VACUUM can't have failed to acquire a
cleanup lock on a heap page due to the current phase of the moon.
VACUUM shouldn't have its "removable cutoff" held back by
who-knows-what when the test case is run, either.

Some of the tests for VACUUM use a temp table, since they conveniently
cannot have their "removable cutoff" held back -- not since commit
a7212be8. Of course, that strategy won't help you here. Getting VACUUM
to behave very predictably for testing purposes has proven tricky at
times.

> > I agree, in general, though long term the best approach is one that
> > has a configurable level of verbosity, with some kind of roughly
> > uniform definition of verbosity (kinda like DEBUG1 - DEBUG5, though
> > probably with only 2 or 3 distinct levels).
>
> Given this comment and Robert's concern quoted below, I am wondering if
> the consensus is that a lack of verbosity control is a dealbreaker for
> adding offsets or not.

There are several different things that seem important to me
personally. These are in tension with each other, to a degree. These
are:

1. Like Andres, I'd really like to have some way of inspecting things
like heapam PRUNE, VACUUM, and FREEZE_PAGE records in significant
detail. These record types happen to be very important in general, and
the ability to see detailed information about the WAL record would
definitely help with some debugging scenarios. I've really missed
stuff like this while debugging serious issues under time pressure.

2. To a lesser extent I would like to see similar detailed information
for nbtree's DELETE, VACUUM, and possibly DEDUPLICATE record types.
They might also come in handy for debugging, in about the same way.

3. More manageable verbosity.

I think that it would be okay to put off coming up with a solution to
3, for now. 1 and 2 seem more important than 3.

> I think if there was a more structured output of rmgrdesc, then this
> would also solve the verbosity level problem. Consumers could decide on
> their verbosity level -- in various pg_walinspect function outputs, that
> would probably just be column selection. For pg_waldump, I imagine that
> some kind of parameter or flag would work.
>
> Unless you are suggesting that we add a verbosity parameter to the
> rmgrdesc function API now?

The verbosity problem will get somewhat worse if we do just my items 1
and 2, so it would be nice if we at least had a strategy in mind that
delivers on item 3/verbosity -- though the implementation can appear
in a later release. Maybe something simple would work, like promising
to output (say) 30 characters or less in terse mode, and making no
such promise otherwise. Terse mode wouldn't just truncate the output
of verbose mode -- it would never display information that could in
principle exceed the 30 character allowance, even with records that
happen to fall under the limit.
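
A rough standalone sketch of that rule, purely as an illustration: each
field carries a worst-case width, and terse mode decides what to show from
those worst cases alone, never from the length of the current record's
text. The DescField struct, the desc_fields function, and the specific
widths are all hypothetical; nothing like this exists in the rmgrdesc
code today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define TERSE_LIMIT 30          /* the "(say) 30 characters" allowance */
#define UNBOUNDED   SIZE_MAX    /* field has no fixed worst-case width */

/* Hypothetical description of one already-rendered output field. */
typedef struct DescField
{
    const char *text;       /* "name: value" text for this record */
    size_t      max_width;  /* widest this field could be for any record */
} DescField;

/*
 * Join fields with ", " into buf.  In terse mode a field is shown only if
 * the worst-case widths of everything shown so far, plus this field, stay
 * within TERSE_LIMIT.  Returns the length of the string left in buf.
 */
static size_t
desc_fields(char *buf, size_t buflen,
            const DescField *fields, int nfields, bool terse)
{
    size_t used = 0;    /* characters actually written */
    size_t worst = 0;   /* worst-case width of the fields accepted so far */

    assert(buf != NULL && buflen > 0);
    buf[0] = '\0';

    for (int i = 0; i < nfields; i++)
    {
        int n;

        if (terse)
        {
            size_t worst_sep = (worst > 0) ? 2 : 0;

            /* The first test also rejects UNBOUNDED and avoids overflow. */
            if (fields[i].max_width > TERSE_LIMIT ||
                worst + worst_sep + fields[i].max_width > TERSE_LIMIT)
                continue;
            worst += worst_sep + fields[i].max_width;
        }

        n = snprintf(buf + used, buflen - used, "%s%s",
                     used > 0 ? ", " : "", fields[i].text);
        if (n < 0 || (size_t) n >= buflen - used)
            break;      /* buffer full; buf is still null-terminated */
        used += (size_t) n;
    }

    return strlen(buf);
}
```

Given the fields {"ndead: 1", 12}, {"dead: [3]", UNBOUNDED} and
{"nunused: 0", 14}, terse mode yields "ndead: 1, nunused: 0" while verbose
mode yields "ndead: 1, dead: [3], nunused: 0". The offset array is left
out of the terse output even though it is tiny in this particular record,
because its width has no upper bound in general.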

I can't feel too bad about putting this part off. A pager like pspg is
already table stakes when using pg_walinspect in any sort of serious
way. As I said upthread, absurdly wide output is already fairly
common.

--
Peter Geoghegan
