Re: Show various offset arrays for heap WAL records

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Subject: Re: Show various offset arrays for heap WAL records
Date: 2023-03-13 23:00:59
Message-ID: CAAKRu_ZpxbZc4rUCwb=FWCmGmwrosU6hTsz6ioxTBJUfZd4YBA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks for the various perspectives and feedback.

Attached v2 has additional info for xl_btree_vacuum and xl_btree_delete.

I've quoted various emails by various senders below and replied.

On Fri, Jan 27, 2023 at 3:02 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Fri, Jan 27, 2023 at 12:24 PM Melanie Plageman
> <melanieplageman(at)gmail(dot)com> wrote:
> > I believe I have addressed this in the attached patch.
>
> I'm not sure what's best in terms of formatting details but I
> definitely like the idea of making pg_waldump show more details. I'd
> even like to have a way to extract the tuple data, when it's
> operations on tuples and we have those tuples in the payload. That'd
> be a lot more verbose than what you are doing here, though, and I'm
> not saying you should go do it right now or anything like that.

If I'm not mistaken, this would be quite difficult without changing
rm_desc to return some kind of self-describing data type.

On Tue, Jan 31, 2023 at 4:52 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> On Fri, Jan 27, 2023 at 9:24 AM Melanie Plageman
> <melanieplageman(at)gmail(dot)com> wrote:
> > I started documenting it in the rmgr_utils.h header file in a comment,
> > however it may be worth a README?
> >
> > I haven't polished this description of the format (or added examples,
> > etc) or used it in the btree-related functions because I assume the
> > format and helper function API will need more discussion.
>
> I think that standardization is good, but ISTM that we need clarity on
> what the scope is -- what is *not* being standardized? It may or may
> not be useful to call the end result an API. Or it may not make sense
> to do so in the first committed version, even though we may ultimately
> end up as something that deserves to be called an API. The obligation
> to not break tools that are scraping the output in whatever way seems
> kind of onerous right now -- just not having any gratuitous
> inconsistencies (e.g., fixing totally inconsistent punctuation, making
> the names for fields across WAL records consistent when they serve
> exactly the same purpose) would be a big improvement.

So, we can scrap any README or big comment, but are there other changes
to the code or structure you think would avoid it being seen as an
API?

> As I mentioned in passing already, I actually don't think that the
> B-Tree WAL records are all that special, as far as this stuff goes.
> For example, the DELETE Btree record type is very similar to heapam's
> PRUNE record type, and practically identical to Btree's VACUUM record
> type. All of these record types use the same basic conventions, like a
> snapshotConflictHorizon field for recovery conflicts (which is
> generated in a very similar way during original execution, and
> processed in precisely the same way during REDO), and arrays of page
> offset numbers sorted in ascending order.
>
> There are some remaining details where things from an index AM WAL
> record aren't directly analogous (or pretty much identical) to some
> other heapam WAL records, such as the way that the DELETE Btree record
> type deals with deleting a subset of TIDs from a posting list index
> tuple (generated by B-Tree deduplication). But even these exceptions
> don't require all that much discussion. You could either choose to
> only display the array of deleted index tuple page offset numbers, as
> well as the similar array of "updated" index tuple page offset numbers
> from xl_btree_delete, in which case you just display two arrays of
> page offset numbers, in the same standard way. You may or may not want
> to also show each individual xl_btree_update entry -- doing so would
> be kinda like showing the details of individual freeze plans, except
> that you'd probably display something very similar to the page offset
> number display here too (even though these aren't page offset numbers,
> they're 0-based offsets into the posting list's item pointer data
> array).

I have added detail to xl_btree_delete and xl_btree_vacuum. I have added
the updated/deleted target offset numbers and the updated tuples
metadata.

I wondered if there was any reason to do xl_btree_dedup deduplication
intervals.

> BTW, there is also a tendency for non-btree index AM WAL records to be
> fairly similar or even near-identical to the B-Tree WAL records. While
> Hash indexes are very different to B-Tree indexes at a high level, it
> is nevertheless the case that xl_hash_vacuum_one_page is directly
> based on xl_btree_delete/xl_btree_vacuum, and that xl_hash_insert is
> directly based on xl_btree_insert. There are some other WAL record
> types that are completely different across hash and B-Tree, which is a
> reflection of the fact that the index grows using a totally different
> approach in each AM -- but that doesn't seem like something that
> throws up any roadblocks for you (these can all be displayed as simple
> structs anyway).

I chose not to take on any other index types until I saw if this was viable.

> > Perhaps there should also be example output of the offset arrays in
> > pgwalinspect docs?
>
> That would definitely make sense.

I wanted to include at least a minimal example for those following along
with this thread that would cause creation of one of the record types
which I have enhanced, but I had a little trouble making a reliable
example.

Below is my strategy for getting a Heap PRUNE record with redirects, but
it occasionally doesn't end up working and I wasn't sure why (I can do
more investigation if we think that having some kind of test for this is
useful).

CREATE EXTENSION pg_walinspect;
DROP TABLE IF EXISTS lsns;
CREATE TABLE lsns(name TEXT, lsn pg_lsn);

DROP TABLE IF EXISTS baz;
create table baz(a int, b int) with (autovacuum_enabled=false);
insert into baz select i, i % 3 from generate_series(1,100)i;

update baz set b = 0 where b = 1;
update baz set b = 7 where b = 0;
INSERT INTO lsns VALUES('start_lsn', (SELECT pg_current_wal_lsn()));
vacuum baz;
select count(*) from baz;
INSERT INTO lsns VALUES('end_lsn', (SELECT pg_current_wal_lsn()));
SELECT * FROM pg_get_wal_records_info((select lsn from lsns where name
= 'start_lsn'),
(select lsn from lsns where name = 'end_lsn'))
WHERE record_type LIKE 'PRUNE%' AND resource_manager = 'Heap2' LIMIT 1;

> > Personally, I like having the infomasks for the freeze plans. If we
> > someday have a more structured input to rmgr_desc, we could then easily
> > have them in their own column and use functions like
> > heap_tuple_infomask_flags() on them.
>
> I agree, in general, though long term the best approach is one that
> has a configurable level of verbosity, with some kind of roughly
> uniform definition of verbosity (kinda like DEBUG1 - DEBUG5, though
> probably with only 2 or 3 distinct levels).

Given this comment and Robert's concern quoted below, I am wondering if
the consensus is that a lack of verbosity control is a dealbreaker for
adding offsets or not.

On Wed, Feb 1, 2023 at 12:52 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Feb 1, 2023 at 12:47 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> > On Wed, Feb 1, 2023 at 5:20 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > > If we're dumping a lot of details out of each WAL record, we might
> > > want to switch to a multi-line format of some kind. No one enjoys a
> > > 460-character wide line, let alone 46000.
> >
> > I generally prefer it when I can use psql without using expanded table
> > format mode, and without having to use a pager. Of course that isn't
> > always possible, but it often is. I just don't think that that's going
> > to become feasible with pg_walinspect queries any time soon, since it
> > really requires a comprehensive strategy to deal with the issue of
> > verbosity.
>
> Well, if we're thinking of making the output a lot more verbose, it
> seems like we should at least do a bit of brainstorming about what
> that strategy could be.

In terms of strategies for controlling output verbosity, it seems
difficult to do without changing the rmgrdesc function signature. Unless
you are thinking of trying to reparse the rmgrdesc string output on the
pg_walinspect/pg_waldump side?

I think if there was a more structured output of rmgrdesc, then this
would also solve the verbosity level problem. Consumers could decide on
their verbosity level -- in various pg_walinspect function outputs, that
would probably just be column selection. For pg_waldump, I imagine that
some kind of parameter or flag would work.

Unless you are suggesting that we add a verbosity parameter to the
rmgrdesc function API now?

On Thu, Mar 2, 2023 at 3:17 AM Peter Eisentraut
<peter(dot)eisentraut(at)enterprisedb(dot)com> wrote:
> On 01.03.23 17:11, Melanie Plageman wrote:
> > diff --git a/contrib/pg_walinspect/pg_walinspect--1.0.sql b/contrib/pg_walinspect/pg_walinspect--1.0.sql
> > index 08b3dd5556..eb8ff82dd8 100644
> > --- a/contrib/pg_walinspect/pg_walinspect--1.0.sql
> > +++ b/contrib/pg_walinspect/pg_walinspect--1.0.sql
> > @@ -17,7 +17,7 @@ CREATE FUNCTION pg_get_wal_record_info(IN in_lsn pg_lsn,
> > OUT main_data_length int4,
> > OUT fpi_length int4,
> > OUT description text,
> > - OUT block_ref text
> > + OUT block_ref int4[][]
> > )
> > AS 'MODULE_PATHNAME', 'pg_get_wal_record_info'
> > LANGUAGE C STRICT PARALLEL SAFE;
>
> A change like this would require a new extension version and an upgrade
> script.
>
> I suppose it's ok to postpone that work while the actual meat of the
> patch is still being worked out, but I figured I'd mention it in case it
> wasn't considered yet.

Thanks for letting me know. This pg_walinspect patch ended up being
discussed over in [1].

- Melanie

[1] https://www.postgresql.org/message-id/flat/CAAKRu_bORebdZmcV8V4cZBzU8M_C6tDDdbiPhCZ6i-iuSXW9TA%40mail.gmail.com

Attachment Content-Type Size
v2-0001-Add-rmgr_desc-utilities.patch text/x-patch 16.2 KB
v2-0002-Add-detail-to-some-btree-xlog-records.patch text/x-patch 7.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2023-03-13 23:10:08 Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken
Previous Message Nathan Bossart 2023-03-13 22:32:03 Re: meson: Non-feature feature options