Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Jeremy Schneider <schneider(at)ardentperf(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>, marvin_liang(at)qq(dot)com, actyzhang(at)outlook(dot)com
Subject: Re: pg_walinspect - a new extension to get raw WAL data and WAL stats
Date: 2022-03-10 16:45:42
Message-ID: CALj2ACWhcbW_s4BXLyCpLWcCppZN9ncTXHbn4dv1F0Vpe0kxqA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 10, 2022 at 1:52 PM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>
> On Wed, 2022-03-02 at 22:37 +0530, Bharath Rupireddy wrote:
> >
> > Attaching v6 patch set with above review comments addressed. Please
> > review it further.

Thanks Jeff for reviewing it. I've posted the latest v7 patch-set
upthread [1] which is having more simple-yet-useful-and-effective
functions.

> * Don't issue WARNINGs or other messages for ordinary situations, like
> when pg_get_wal_records_info() hits the end of WAL.

v7 patch-set [1] has no warnings, but the functions will error out if
future LSN is specified.

> * It feels like the APIs that allow waiting for the end of WAL are
> slightly off. Can't you just do pg_get_wal_records_info(start_lsn,
> least(pg_current_wal_flush_lsn(), end_lsn)) if you want the non-waiting
> behavior? Try to make the API more orthogonal, where a few basic
> functions can be combined to give you everything you need, rather than
> specifying extra parameters and issuing WARNINGs. I

v7 patch-set [1] onwards waiting mode has been removed for all of the
functions, again to keep things simple-yet-useful-and-effective.
However, we can always add new pg_walinspect functions that wait for
future WAL in the next versions once basic stuff gets committed and if
many users ask for it.

> * In the docs, include some example output. I don't see any output in
> the tests, which makes sense because it's mostly non-deterministic, but
> it would be helpful to see sample output of at least
> pg_get_wal_records_info().

+1. Added for pg_get_wal_records_info and pg_get_wal_stats.

> * Is pg_get_wal_stats() even necessary, or can you get the same
> information with a query over pg_get_wal_records_info()? For instance,
> if you want to group by transaction ID rather than rmgr, then
> pg_get_wal_stats() is useless.

Yes, you are right pg_get_wal_stats provides WAL stats per resource
manager which is similar to pg_waldump with --start, --end and --stats
option. It provides more information than pg_get_wal_records_info and
is a good way of getting stats than adding more columns to
pg_get_wal_records_info, calculating percentage in sql and having
group by clause. IMO, pg_get_wal_stats is more readable and useful.

> * Would be nice to have a pg_wal_file_is_valid() or similar, which
> would test that it exists, and the header matches the filename (e.g. if
> it was recycled but not used, that would count as invalid). I think
> pg_get_first_valid_wal_record_lsn() would make some cases look invalid
> even if the file is valid -- for example, if a wal record spans many
> wal segments, the segments might look invalid because they contain no
> complete records, but the file itself is still valid and contains valid
> wal data.

Actually I haven't tried testing a single WAL record spanning many WAL
files yet(I'm happy to try it if someone suggests such a use-case). In
that case too I assume pg_get_first_valid_wal_record_lsn() shouldn't
have a problem because it just gives the next valid LSN and it's
previous LSN using existing WAL reader API XLogFindNextRecord(). It
opens up the WAL file segments using (some dots to connect -
page_read/read_local_xlog_page, WALRead,
segment_open/wal_segment_open). Thoughts?

I don't think it's necessary to have a function pg_wal_file_is_valid()
that given a WAL file name as input checks whether a WAL file exists
or not, probably not in the core (xlogfuncs.c) too. These kinds of
functions can open up challenges in terms of user input validation and
may cause unnecessary problems, please see some related discussion
[2].

> * Is there a reason you didn't include the timeline ID in
> pg_get_wal_records_info()?

I'm right now allowing the functions to read WAL from the current
server's timeline which I have mentioned in the docs. The server's
current timeline is available via pg_control_checkpoint()'s
timeline_id. So, having timeline_id as a column doesn't make sense.
Again this is to keep things simple-yet-useful-and-effective. However,
we can add new pg_walinspect functions to read WAL from historic as
well as current timelines in the next versions once basic stuff gets
committed and if many users ask for it.

+ <para>
+ All the functions of this module will provide the WAL information using the
+ current server's timeline ID.
+ </para>

> * Can we mark this extension 'trusted'? I'm not 100% clear on the
> standards for that marker, but it seems reasonable for a database owner
> with the right privileges might want to install it.

'trusted' extensions concept is added by commit 50fc694 [3]. Since
pg_walinspect deals with WAL, we strictly want to control who creates
and can execute functions exposed by it, so I don't know if 'trusted'
is a good idea here. Also, pageinspect isn't a 'trusted' extension.

> * pg_get_raw_wal_record() seems too powerful for pg_monitor. Maybe that
> function should require pg_read_server_files? Or at least
> pg_read_all_data?

pg_read_all_data may not be the right choice, but pg_read_server_files
is. However, does it sound good if some functions are allowed to be
executed by users with a pg_monitor role and others
pg_get_raw_wal_record by users with pg_read_server_files? Since the
extension itself can be created by superusers, isn't the
pg_get_raw_wal_record sort of safe with pg_mointor itself?

If hackers don't agree, I'm happy to grant execution on
pg_get_raw_wal_record() to the pg_read_server_files role.

Attaching the v8 patch-set resolving above comments and some tests for
checking function permissions. Please review it further.

[1] https://www.postgresql.org/message-id/CALj2ACWtToUQ5hCCBJP%2BmKeVUcN-g7cMb9XvhAcicPxUDsdcKg%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CA%2BTgmobYrTgMEF0SV%2ByDYyCCh44DAGjZVs7BYGrD8xD3vwNjHA%40mail.gmail.com
[3] commit 50fc694e43742ce3d04a5e9f708432cb022c5f0d
Author: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Date: Wed Jan 29 18:42:43 2020 -0500

Invent "trusted" extensions, and remove the pg_pltemplate catalog.

Regards,
Bharath Rupireddy.

Attachment Content-Type Size
v8-0001-pg_walinspect.patch application/octet-stream 36.2 KB
v8-0001-pg_walinspect-tests.patch application/octet-stream 8.7 KB
v8-0001-pg_walinspect-docs.patch application/octet-stream 12.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2022-03-10 16:48:16 Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
Previous Message Dmitry Dolgov 2022-03-10 16:38:37 Re: pg_stat_statements and "IN" conditions