Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Jeremy Schneider <schneider(at)ardentperf(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>, marvin_liang(at)qq(dot)com, actyzhang(at)outlook(dot)com
Subject: Re: pg_walinspect - a new extension to get raw WAL data and WAL stats
Date: 2022-03-02 17:07:43
Message-ID: CALj2ACVY37ed+BT_WeM2KQS1TUM90HO9NjNktaKCX2G6X5iXsQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 2, 2022 at 8:12 PM Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> wrote:
>
> Some review comments on v5 patch (v5-0001-pg_walinspect.patch)

Thanks for reviewing.

> +--
> +-- pg_get_wal_records_info()
> +--
> +CREATE FUNCTION pg_get_wal_records_info(IN start_lsn pg_lsn,
> + IN end_lsn pg_lsn,
> + IN wait_for_wal boolean DEFAULT false,
> + OUT lsn pg_lsn,
>
> What does the wait_for_wal flag mean here when one has already
> specified the start and end lsn? AFAIU, If a user has specified a
> start and stop LSN, it means that the user knows the extent to which
> he/she wants to display the WAL records in which case we need to stop
> once the end lsn has reached . So what is the meaning of wait_for_wal
> flag? Does it look sensible to have the wait_for_wal flag here? To me
> it doesn't.

Users can always specify a future end_lsn and set wait_for_wal to
true, then the pg_get_wal_records_info/pg_get_wal_stats functions can
wait for the WAL. IMO, this is useful. If you remember you were okay
with wait/nowait versions for some of the functions upthread [1]. I'm
not going to retain this behaviour for both
pg_get_wal_records_info/pg_get_wal_stats as it is similar to
pg_waldump's --follow option.

> ==
>
> +--
> +-- pg_get_wal_records_info_till_end_of_wal()
> +--
> +CREATE FUNCTION pg_get_wal_records_info_till_end_of_wal(IN start_lsn pg_lsn,
> + OUT lsn pg_lsn,
> + OUT prev_lsn pg_lsn,
> + OUT xid xid,
>
> Why is this function required? Is pg_get_wal_records_info() alone not
> enough? I think it is. See if we can make end_lsn optional in
> pg_get_wal_records_info() and lets just have it alone. I think it can
> do the job of pg_get_wal_records_info_till_end_of_wal function.
>
> ==
>
> +--
> +-- pg_get_wal_stats_till_end_of_wal()
> +--
> +CREATE FUNCTION pg_get_wal_stats_till_end_of_wal(IN start_lsn pg_lsn,
> + OUT resource_manager text,
> + OUT count int8,
>
> Above comment applies to this function as well. Isn't pg_get_wal_stats() enough?

I'm doing the following input validations for these functions to not
cause any issues with invalid LSN. If I were to have the default value
for end_lsn as 0/0, I can't perform input validations right? That is
the reason I'm having separate functions {pg_get_wal_records_info,
pg_get_wal_stats}_till_end_of_wal() versions.

/* Validate input. */
if (XLogRecPtrIsInvalid(start_lsn))
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("invalid WAL record start LSN")));

if (XLogRecPtrIsInvalid(end_lsn))
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("invalid WAL record end LSN")));

if (start_lsn >= end_lsn)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("WAL record start LSN must be less than end LSN")));

> ==
>
>
> + if (loc <= read_upto)
> + break;
> +
> + /* Let's not wait for WAL to be available if
> indicated */
> + if (loc > read_upto &&
> + state->private_data != NULL)
> + {
>
> Why loc > read_upto? The first if condition is (loc <= read_upto)
> followed by the second if condition - (loc > read_upto). Is the first
> if condition (loc <= read_upto) not enough to indicate that loc >
> read_upto?

Yeah, that's unnecessary, I improved the comment there and removed loc
> read_upto.

> ==
>
> +#define IsEndOfWALReached(state) \
> + (state->private_data != NULL && \
> + (((ReadLocalXLOGPage2Private *)
> xlogreader->private_data)->no_wait == true) && \
> + (((ReadLocalXLOGPage2Private *)
> xlogreader->private_data)->reached_end_of_wal == true))
>
> I think we should either use state or xlogreader. First line says
> state->private_data and second line xlogreader->private_data.

I've changed it to use state instead of xlogreader.

> ==
>
> + (((ReadLocalXLOGPage2Private *)
> xlogreader->private_data)->reached_end_of_wal == true))
> +
>
> There is a new patch coming to make the end of WAL messages less
> scary. It introduces the EOW flag in xlogreaderstate maybe we can use
> that instead of introducing new flags in private area to represent the
> end of WAL.

Yeah that would be great. But we never know which one gets committed
first. Until then it's not good to have dependencies on two "on-going"
patches. Later, we can change.

> ==
>
> +/*
> + * XLogReaderRoutine->page_read callback for reading local xlog files
> + *
> + * This function is same as read_local_xlog_page except that it works in both
> + * wait and no wait mode. The callers can specify about waiting in private_data
> + * of XLogReaderState.
> + */
> +int
> +read_local_xlog_page_2(XLogReaderState *state, XLogRecPtr targetPagePtr,
> + int reqLen, XLogRecPtr
> targetRecPtr, char *cur_page)
> +{
> + XLogRecPtr read_upto,
>
> Do we really need this function? Can't we make use of an existing WAL
> reader function - read_local_xlog_page()?

I clearly explained the reasons upthread [2]. Please let me know if
you have more thoughts/doubts here, we can connect offlist.

Attaching v6 patch set with above review comments addressed. Please
review it further.

[1] https://www.postgresql.org/message-id/CAE9k0P%3D9SReU_613TXytZmpwL3ZRpnC5zrf96UoNCATKpK-UxQ%40mail.gmail.com
+PG_FUNCTION_INFO_V1(pg_get_raw_wal_record);
+PG_FUNCTION_INFO_V1(pg_get_first_valid_wal_record_lsn);
+PG_FUNCTION_INFO_V1(pg_verify_raw_wal_record);
+PG_FUNCTION_INFO_V1(pg_get_wal_record_info);
+PG_FUNCTION_INFO_V1(pg_get_wal_records_info);

I think we should allow all these functions to be executed in wait and
*nowait* mode. If a user specifies nowait mode, the function should
return if no WAL data is present, rather than waiting for new WAL data
to become available, default behaviour could be anything you like.

[2] https://www.postgresql.org/message-id/CALj2ACUtqWX95uAj2VNJED0PnixEeQ%3D0MEzpouLi%2Bzd_iTugRA%40mail.gmail.com
I've added a new function read_local_xlog_page_2 (similar to
read_local_xlog_page but works in wait and no wait mode) and the
callers can specify whether to wait or not wait using private_data.
Actually, I wanted to use the private_data structure of
read_local_xlog_page but the logical decoding already has context as
private_data, that is why I had to have a new function. I know it
creates a bit of duplicate code, but its cleaner than using
backend-local variables or additional flags in XLogReaderState or
adding wait/no-wait boolean to page_read callback. Any other
suggestions are welcome here.

With this, I'm able to have wait/no wait versions for any functions.
But for now, I'm having wait/no wait for two functions
(pg_get_wal_records_info and pg_get_wal_stats) for which it makes more
sense.

Regards,
Bharath Rupireddy.

Attachment Content-Type Size
v6-0001-pg_walinspect.patch application/octet-stream 49.9 KB
v6-0001-pg_walinspect-tests.patch application/octet-stream 8.2 KB
v6-0001-pg_walinspect-docs.patch application/octet-stream 8.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Steele 2022-03-02 17:17:34 Re: Allow root ownership of client certificate key
Previous Message Chapman Flack 2022-03-02 17:04:59 Re: pg_stop_backup() v2 incorrectly marked as proretset