Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: bharath(dot)rupireddyforpostgres(at)gmail(dot)com
Cc: pgsql(at)j-davis(dot)com, ashu(dot)coek88(at)gmail(dot)com, andrew(at)dunslane(dot)net, robertmhaas(at)gmail(dot)com, stark(at)mit(dot)edu, schneider(at)ardentperf(dot)com, bruce(at)momjian(dot)us, pgsql-hackers(at)lists(dot)postgresql(dot)org, satyanarlapuram(at)gmail(dot)com, marvin_liang(at)qq(dot)com, actyzhang(at)outlook(dot)com
Subject: Re: pg_walinspect - a new extension to get raw WAL data and WAL stats
Date: 2022-03-11 02:38:22
Message-ID: 20220311.113822.1784045072426345189.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At Thu, 10 Mar 2022 22:15:42 +0530, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote in
> On Thu, Mar 10, 2022 at 1:52 PM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> >
> > On Wed, 2022-03-02 at 22:37 +0530, Bharath Rupireddy wrote:
> > >
> > > Attaching v6 patch set with above review comments addressed. Please
> > > review it further.
>
> Thanks Jeff for reviewing it. I've posted the latest v7 patch-set
> upthread [1] which is having more simple-yet-useful-and-effective
> functions.
>
> > * Don't issue WARNINGs or other messages for ordinary situations, like
> > when pg_get_wal_records_info() hits the end of WAL.
>
> v7 patch-set [1] has no warnings, but the functions will error out if
> future LSN is specified.
>
> > * It feels like the APIs that allow waiting for the end of WAL are
> > slightly off. Can't you just do pg_get_wal_records_info(start_lsn,
> > least(pg_current_wal_flush_lsn(), end_lsn)) if you want the non-waiting
> > behavior? Try to make the API more orthogonal, where a few basic
> > functions can be combined to give you everything you need, rather than
> > specifying extra parameters and issuing WARNINGs. I
>
> v7 patch-set [1] onwards waiting mode has been removed for all of the
> functions, again to keep things simple-yet-useful-and-effective.
> However, we can always add new pg_walinspect functions that wait for
> future WAL in the next versions once basic stuff gets committed and if
> many users ask for it.
>
> > * In the docs, include some example output. I don't see any output in
> > the tests, which makes sense because it's mostly non-deterministic, but
> > it would be helpful to see sample output of at least
> > pg_get_wal_records_info().
>
> +1. Added for pg_get_wal_records_info and pg_get_wal_stats.
>
> > * Is pg_get_wal_stats() even necessary, or can you get the same
> > information with a query over pg_get_wal_records_info()? For instance,
> > if you want to group by transaction ID rather than rmgr, then
> > pg_get_wal_stats() is useless.
>
> Yes, you are right pg_get_wal_stats provides WAL stats per resource
> manager which is similar to pg_waldump with --start, --end and --stats
> option. It provides more information than pg_get_wal_records_info and
> is a good way of getting stats than adding more columns to
> pg_get_wal_records_info, calculating percentage in sql and having
> group by clause. IMO, pg_get_wal_stats is more readable and useful.
>
> > * Would be nice to have a pg_wal_file_is_valid() or similar, which
> > would test that it exists, and the header matches the filename (e.g. if
> > it was recycled but not used, that would count as invalid). I think
> > pg_get_first_valid_wal_record_lsn() would make some cases look invalid
> > even if the file is valid -- for example, if a wal record spans many
> > wal segments, the segments might look invalid because they contain no
> > complete records, but the file itself is still valid and contains valid
> > wal data.
>
> Actually I haven't tried testing a single WAL record spanning many WAL
> files yet(I'm happy to try it if someone suggests such a use-case). In
> that case too I assume pg_get_first_valid_wal_record_lsn() shouldn't
> have a problem because it just gives the next valid LSN and it's
> previous LSN using existing WAL reader API XLogFindNextRecord(). It
> opens up the WAL file segments using (some dots to connect -
> page_read/read_local_xlog_page, WALRead,
> segment_open/wal_segment_open). Thoughts?
>
> I don't think it's necessary to have a function pg_wal_file_is_valid()
> that given a WAL file name as input checks whether a WAL file exists
> or not, probably not in the core (xlogfuncs.c) too. These kinds of
> functions can open up challenges in terms of user input validation and
> may cause unnecessary problems, please see some related discussion
> [2].
>
> > * Is there a reason you didn't include the timeline ID in
> > pg_get_wal_records_info()?
>
> I'm right now allowing the functions to read WAL from the current
> server's timeline which I have mentioned in the docs. The server's
> current timeline is available via pg_control_checkpoint()'s
> timeline_id. So, having timeline_id as a column doesn't make sense.
> Again this is to keep things simple-yet-useful-and-effective. However,
> we can add new pg_walinspect functions to read WAL from historic as
> well as current timelines in the next versions once basic stuff gets
> committed and if many users ask for it.
>
> + <para>
> + All the functions of this module will provide the WAL information using the
> + current server's timeline ID.
> + </para>
>
> > * Can we mark this extension 'trusted'? I'm not 100% clear on the
> > standards for that marker, but it seems reasonable for a database owner
> > with the right privileges might want to install it.
>
> 'trusted' extensions concept is added by commit 50fc694 [3]. Since
> pg_walinspect deals with WAL, we strictly want to control who creates
> and can execute functions exposed by it, so I don't know if 'trusted'
> is a good idea here. Also, pageinspect isn't a 'trusted' extension.
>
> > * pg_get_raw_wal_record() seems too powerful for pg_monitor. Maybe that
> > function should require pg_read_server_files? Or at least
> > pg_read_all_data?
>
> pg_read_all_data may not be the right choice, but pg_read_server_files
> is. However, does it sound good if some functions are allowed to be
> executed by users with a pg_monitor role and others
> pg_get_raw_wal_record by users with pg_read_server_files? Since the
> extension itself can be created by superusers, isn't the
> pg_get_raw_wal_record sort of safe with pg_mointor itself?
>
> If hackers don't agree, I'm happy to grant execution on
> pg_get_raw_wal_record() to the pg_read_server_files role.
>
> Attaching the v8 patch-set resolving above comments and some tests for
> checking function permissions. Please review it further.
>
> [1] https://www.postgresql.org/message-id/CALj2ACWtToUQ5hCCBJP%2BmKeVUcN-g7cMb9XvhAcicPxUDsdcKg%40mail.gmail.com
> [2] https://www.postgresql.org/message-id/CA%2BTgmobYrTgMEF0SV%2ByDYyCCh44DAGjZVs7BYGrD8xD3vwNjHA%40mail.gmail.com
> [3] commit 50fc694e43742ce3d04a5e9f708432cb022c5f0d
> Author: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
> Date: Wed Jan 29 18:42:43 2020 -0500
>
> Invent "trusted" extensions, and remove the pg_pltemplate catalog.

I played with this a bit, and would like to share some thoughts on it.

It seems to me too rigorous that pg_get_wal_records_info/stats()
reject future LSNs as end-LSN and I think WARNING or INFO and stop at
the real end-of-WAL is more kind to users. I think the same with the
restriction that start and end LSN are required to be different.

The definition of end-lsn is fuzzy here. If I fed a future LSN to the
functions, they tell me the beginning of the current insertion point
in error message. On the other hand they don't accept the same
value as end-LSN. I think it is right that they tell the current
insertion point and they should take the end-LSN as the LSN to stop
reading.

I think pg_get_wal_stats() is worth to have but I think it should be
implemented in SQL. Currently pg_get_wal_records_info() doesn't tell
about FPI since pg_waldump doesn't but it is internally collected (of
course!) and easily revealed. If we do that, the
pg_get_wal_records_stats() would be reduced to the following SQL
statement

SELECT resource_manager resmgr,
count(*) AS N,
(count(*) * 100 / sum(count(*)) OVER tot)::numeric(5,2) AS "%N",
sum(total_length) AS "combined size",
(sum(total_length) * 100 / sum(sum(total_length)) OVER tot)::numeric(5,2) AS "%combined size",
sum(fpi_len) AS fpilen,
(sum(fpi_len) * 100 / sum(sum(fpi_len)) OVER tot)::numeric(5,2) AS "%fpilen"
FROM pg_get_wal_records_info('0/1000000', '0/175DD7f')
GROUP by resource_manager
WINDOW tot AS ()
ORDER BY "combined size" desc;

The only difference with pg_waldump is the statement above doesn't
show lines for the resource managers that don't contained in the
result of pg_get_wal_records_info(). But I don't think that matters.

Sometimes the field description has very long (28kb long) content. It
makes the result output almost unreadable and I had a bit hard time
struggling with the output full of '-'s. I would like have a default
limit on the length of such fields that can be long but I'm not sure
we want that.

The difference between pg_get_wal_record_info and _records_ other than
the number of argument is the former accepts incorrect LSNs.

The following works,
pg_get_wal_record_info('0/1000000');
pg_get_wal_records_info('0/1000000');

but this doesn't
pg_get_wal_records_info('0/1000000', '0/1000000');
> ERROR: WAL start LSN must be less than end LSN

But the following works
pg_get_wal_records_info('0/1000000', '0/1000029');
> 0/1000028 | 0/0 | 0

So I think we can consolidate the two functions as:

- pg_get_wal_records_info('0/1000000');

(current behavior) find the first record and show all records
thereafter.

- pg_get_wal_records_info('0/1000000', '0/1000000');

finds the first record since the start lsn and show it.

- pg_get_wal_records_info('0/1000000', '0/1000030');

finds the first record since the start lsn then show records up to
the end-lsn.

And about pg_get_raw_wal_record(). I don't see any use-case of the
function alone on SQL interface. Even if we need to inspect broken
WAL files, it needs profound knowledge of WAL format and tools that
doesn't work on SQL interface.

However like pageinspect, if we separate the WAL-record fetching and
parsing it could be thought as useful.

pg_get_wal_records_info woule be like:

SELECT * FROM pg_walinspect_parse(raw)
FROM (SELECT * FROM pg_walinspect_get_raw(start_lsn, end_lsn));

And pg_get_wal_stats woule be like:

SELECT * FROM pg_walinpect_stat(pg_walinspect_parse(raw))
FROM (SELECT * FROM pg_walinspect_get_raw(start_lsn, end_lsn)));

Regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2022-03-11 02:46:57 Re: Column Filtering in Logical Replication
Previous Message Masahiko Sawada 2022-03-11 02:25:04 Re: Skipping logical replication transactions on subscriber side