Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

From: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Jeremy Schneider <schneider(at)ardentperf(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>, marvin_liang(at)qq(dot)com, actyzhang(at)outlook(dot)com
Subject: Re: pg_walinspect - a new extension to get raw WAL data and WAL stats
Date: 2022-02-14 15:01:51
Message-ID: CAE9k0P=9SReU_613TXytZmpwL3ZRpnC5zrf96UoNCATKpK-UxQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Here are few comments:

+/*
+ * Verify the authenticity of the given raw WAL record.
+ */
+Datum
+pg_verify_raw_wal_record(PG_FUNCTION_ARGS)
+{

Do we really need this function? I see that whenever the record is
read, we verify it. So could there be a scenario where any of these
functions would return an invalid WAL record?

--

Should we add a function that returns the pointer to the first and
probably the last WAL record in the WAL segment? This would help users
to inspect the wal records in the entire wal segment if they wish to
do so.

--

+PG_FUNCTION_INFO_V1(pg_get_raw_wal_record);
+PG_FUNCTION_INFO_V1(pg_get_first_valid_wal_record_lsn);
+PG_FUNCTION_INFO_V1(pg_verify_raw_wal_record);
+PG_FUNCTION_INFO_V1(pg_get_wal_record_info);
+PG_FUNCTION_INFO_V1(pg_get_wal_records_info);

I think we should allow all these functions to be executed in wait and
*nowait* mode. If a user specifies nowait mode, the function should
return if no WAL data is present, rather than waiting for new WAL data
to become available, default behaviour could be anything you like.

--

+Datum
+pg_get_wal_records_info(PG_FUNCTION_ARGS)
+{
+#define PG_GET_WAL_RECORDS_INFO_COLS 10

We could probably have another variant of this function that would
work even if the end pointer is not specified, in which case the
default end pointer would be the last WAL record in the WAL segment.
Currently it mandates the use of an end pointer which slightly reduces
flexibility.

--

+
+/*
+ * Get the first valid raw WAL record lsn.
+ */
+Datum
+pg_get_first_valid_wal_record_lsn(PG_FUNCTION_ARGS)

I think this function should return a pointer to the nearest valid WAL
record which can be the previous WAL record to the LSN entered by the
user or the next WAL record. If a user unknowingly enters an lsn that
does not exist then in such cases we should probably return the lsn of
the previous WAL record instead of hanging or waiting for the new WAL
record to arrive.

--

Another important point I would like to mention here is - have we made
an attempt to ensure that we try to share as much of code with
pg_waldump as possible so that if any changes happens in the
pg_waldump in future it gets applied here as well and additionally it
will also reduce the code duplication.

I haven't yet looked into the code in detail. I will have a look at it
asap. thanks.

--
With Regards,
Ashutosh Sharma.

On Sat, Feb 12, 2022 at 5:03 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Thu, Feb 10, 2022 at 9:55 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> >
> > On Sun, Feb 6, 2022 at 7:45 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > > For what it's worth, I am generally in favor of having something like
> > > this in PostgreSQL. I think it's wrong of us to continue assuming that
> > > everyone has command-line access. Even when that's true, it's not
> > > necessarily convenient. If you choose to use a relational database,
> > > you may be the sort of person who likes SQL. And if you are, you may
> > > want to have the database tell you what's going on via SQL rather than
> > > command-line tools or operating system utilities. Imagine if we didn't
> > > have pg_stat_activity and you had to get that information by running a
> > > separate binary. Would anyone like that? Why is this case any
> > > different?
> >
> > +1. An SQL interface is significantly easier to work with. Especially
> > because it can use the built-in LSN type, pg_lsn.
> >
> > I don't find the slippery slope argument convincing. There aren't that
> > many other things that are like pg_waldump, but haven't already been
> > exposed via an SQL interface. Offhand, I can't think of any.
>
> On Sat, Feb 12, 2022 at 4:03 AM Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> >
> > Almost completely off topic, but this reminded me of an incident about
> > 30 years ago at my first gig as an SA/DBA. There was an application
> > programmer who insisted on loading a set of values from a text file into
> > a temp table (it was Ingres, anyone remember that?). Why? Because he
> > knew how to write "Select * from mytable order by mycol" but didn't know
> > how to drive the Unix sort utility at the command line. When I was
> > unable to restrain myself from smiling at this he got very angry and
> > yelled at me loudly.
> >
> > So, yes, some people do like SQL and hate the command line.
>
> Thanks a lot for the comments. I'm looking forward to the review of
> the latest v4 patches posted at [1].
>
> [1] https://www.postgresql.org/message-id/CALj2ACUS9%2B54QGPtUjk76dcYW-AMKp3hPe-U%2BpQo2-GpE4kjtA%40mail.gmail.com
>
> Regards,
> Bharath Rupireddy.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2022-02-14 15:16:19 Re: Merging statistics from children instead of re-sampling everything
Previous Message Ashutosh Sharma 2022-02-14 14:56:49 Re: Identify missing publications from publisher while create/alter subscription.