Re: Combine pg_walinspect till_end_of_wal functions with others

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Julien Rouhaud <rjuju123(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Combine pg_walinspect till_end_of_wal functions with others
Date: 2023-03-13 10:23:37
Message-ID: CALj2ACWLj9ViWeexS=8YBZh+ssrtPEXzF3in3j_5Hfp83O8Hsg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Mar 13, 2023 at 12:26 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Fri, Mar 10, 2023 at 04:45:06PM +0530, Bharath Rupireddy wrote:
>
> After that comes the rest of the patch, and I have found a couple of
> mistakes.
>
> - pg_get_wal_records_info(start_lsn pg_lsn, end_lsn pg_lsn)
> + pg_get_wal_records_info(start_lsn pg_lsn, end_lsn pg_lsn DEFAULT NULL)
> returns setof record
> [...]
> - pg_get_wal_stats(start_lsn pg_lsn, end_lsn pg_lsn, per_record boolean DEFAULT false)
> + pg_get_wal_stats(start_lsn pg_lsn, end_lsn pg_lsn DEFAULT NULL, per_record boolean DEFAULT false)
>
> This part of the documentation is now incorrect.

Oh, yeah. Thanks for fixing it.

> +-- Make sure checkpoints don't interfere with the test.
> +SELECT 'init' FROM pg_create_physical_replication_slot('regress_pg_walinspect_slot', true, false);
>
> Adding a physical slot is better for stability of course, but the test
> also has to drop it or installcheck would cause an existing cluster to
> have that still around. The third argument could be true, as well, so
> as we'd use a temporary slot.

# Disabled because these tests require "wal_level=replica", which
# some installcheck users do not have (e.g. buildfarm clients).
NO_INSTALLCHECK = 1

pg_walinspect can't be run under installcheck. I don't think dropping
the slot at the end is needed, it's unnecessary. I saw
oldextversions.sql using the same replication slot name, well no
problem, but I changed it to a unique name.

> Hmm. I would simplify that, and just mention that an error is raised
> when the start LSN is available, without caring about the rest (valid
> end LSN being past the current insert LSN, and error if start > end,
> the second being obvious).

Okay.

> + <note>
> + <para>
> + Note that <function>pg_get_wal_records_info_till_end_of_wal</function> and
> + <function>pg_get_wal_stats_till_end_of_wal</function> functions have been
> + removed in the <filename>pg_walinspect</filename> version
> + <literal>1.1</literal>. The same functionality can be achieved with
> + <function>pg_get_wal_records_info</function> and
> + <function>pg_get_wal_stats</function> functions by specifying a future
> + <replaceable>end_lsn</replaceable>. However, <function>till_end_of_wal</function>
> + functions will still work if the extension is installed explicitly with
> + version <literal>1.0</literal>.
> + </para>
> + </note>
>
> Not convinced that this is necessary.

As hackers we know that these functions have been removed and how to
achieve till_end_of_wal with the other functions. I noticed that
you've removed my changes (see below) from the docs that were saying
how to get info/stats till_end_of_wal. That leaves end users confused
as to how they can achieve till_end_of_wal functionality. All users
can't look for commit history/message but they can easily read the
docs. I prefer to have the following (did so in the attached v7) and
get rid of the above note if you don't feel strongly about it.

+ If a future <replaceable>end_lsn</replaceable>
+ (i.e. the LSN server doesn't know about) is specified, it returns
+ informaton till end of WAL.

> + GetInputLSNs(fcinfo, &start_lsn, &end_lsn, till_end_of_wal);
> +
> + stats_per_record = PG_GETARG_BOOL(2);
>
> This code in GetWalStats() is incorrect.
> pg_get_wal_stats_till_end_of_wal() has a stats_per_record, but as
> *second* argument, so this would be broken.

Oh, yeah. Thanks for fixing it.

> + GetInputLSNs(fcinfo, &start_lsn, &end_lsn, till_end_of_wal);
>
> Coming from the last point, I think that this interface is confusing,
> and actually incorrect. From what I can see, we should be doing what
> ~15 has by grepping the argument values within the main function
> calls, and just pass them down to the internal routines GetWalStats()
> and GetWALRecordsInfo().

Hm, what you have in v6 works for me.

> At the end, I am finishing with the attached. ValidateInputLSNs()
> ought to be called, IMO, when the caller of the SQL functions can
> directly specify an end_lsn. This means that there is no point to do
> this check in the two till_end_* functions. This has as cost two
> extra checks to make sure that the start_lsn is not higher than the
> current LSN, but I am fine to live with that. It seemed rather
> natural to me to let ValidateInputLSNs() do a refresh of the end_lsn
> if it sees that it is higher than the current LSN. And if you look
> closely, you will see that we only call *once* GetCurrentLSN() for
> each function call, so the maths are more precise.
>
> I have cleaned up the comments of the modules, while on it, as there
> was not much value in copy-pasting how a function fails while there is
> a centralized validation code path. The tests for the till_end()
> functions have been moved to the test path where we install 1.0.

I have some comments and fixed them in the attached v7 patch:

1.
+ * pg_get_wal_records_info
*
+ * pg_get_wal_stats
*
I think you wanted to be consistent with function comments with
function names atop, but missed adding for all functions. Actually, I
don't have a strong opinion on these changes as they unnecessarily
bloat the changes, so I removed them.

2.
+ if (start_lsn > curr_lsn)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("cannot accept future start LSN"),
- errdetail("Last known WAL LSN on the database system
is at %X/%X.",
- LSN_FORMAT_ARGS(curr_lsn))));
- }
+ errmsg("WAL start LSN must be less than current LSN")));

I don't like this inconsistency much, especially when
pg_get_wal_record_info emits "cannot accept future input LSN" with the
current LSN details (this current LSN will give a bit more information
to the user). Also, let's be consistent across returning NULLs if
input LSN/start LSN equal to the current LSN. I've done these changes
in the attached v7 patch.

3. I wanted COUNT(*) >= 0 for successful function execution to be
COUNT(*) >= 1 so that we will check for at least the functions
returning 1 record. And failures to be SELECT * FROM. This was my
intention but I don't see that in this patch or in the previous
test-refactoring commit. I added that in the attached v7 patch again.
Also, made test comments conssitent.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v7-0001-Rework-pg_walinspect-functions.patch application/x-patch 31.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Melih Mutlu 2023-03-13 10:29:32 Re: Allow logical replication to copy tables in binary format
Previous Message Daniel Gustafsson 2023-03-13 10:10:13 Re: Allow tests to pass in OpenSSL FIPS mode