Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: David Christensen <david(dot)christensen(at)crunchydata(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Ian Lawrence Barwick <barwick(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
Date: 2022-11-15 10:41:05
Message-ID: CALj2ACUukW74XXb2n3FFAdRiisF06jSNDR8DegOpMevq_0yVoA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Nov 15, 2022 at 1:29 AM David Christensen
<david(dot)christensen(at)crunchydata(dot)com> wrote:
>
> Enclosed is v8, which uses the replication slot method to retain WAL
> as well as fsync'ing the output directory when everything is done.

Thanks. It mostly is in good shape. However, few more comments:

1.
+ if it does not exist. The images saved will be subject to the same
+ filtering and limiting criteria as display records, but in this
+ mode <application>pg_waldump</application> will not output any other
+ information.
May I know what's the intention of the statement 'The images saved
....'? If it's not necessary and convey anything useful to the user,
can we remove it?

2.
+#include "storage/checksum.h"
+#include "storage/checksum_impl.h"
I think we don't need the above includes as we got rid of verifying
page checksums. The patch compiles without them for me.

3.
+ char *save_fpw_path;
Can we rename the above variable to save_fpi_path, just to be in sync
with what we expose to the user, the option name 'save-fpi'?

4.
+ if (config.save_fpw_path != NULL)
+ {
+ /* Fsync our output directory */
+ fsync_fname(config.save_fpw_path, true);
+ }
I guess adding a comment there as to why we aren't fsyncing for every
file that gets created, but once per the directory at the end. That'd
help clarify doubts that other members might get while looking at the
code.

5.
+ if (config.save_fpw_path != NULL)
+ {
+ /* Fsync our output directory */
+ fsync_fname(config.save_fpw_path, true);
+ }
So, are we sure that we don't want to fsync for time_to_stop exit(0)
cases, say when CTRL+C'ed. Looks like we handle time_to_stop safely
meaning exiting with return code 0, shouldn't we fsync the directory?

6.
+ else if (config.save_fpw_path)
Let's use the same convention to check non-NULLness,
config.save_fpw_path != NULL.

7.
+CHECKPOINT;
+SELECT pg_switch_wal();
+UPDATE test_table SET a = a + 1;
+SELECT pg_switch_wal();
I don't think switching WAL after checkpoint is necessary here,
because the checkpoint ensures all the WAL gets flushed to disk.
Please remove it.

PS: I've seen the following code:
+my $walfile = [sort { $a <=> $b } glob("$waldir/00*")]->[1]; # we
want the second WAL file, which will be a complete WAL file with
full-page writes for our specific relation.

8.
+$node->safe_psql('postgres', <<EOF);
+EOF
Why EOF is used here? Can't we do something like below to execute
multiple statements?
$node->safe_psql(
'postgres', qq[
SELECT data FROM pg_logical_slot_get_changes('regression_slot1', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
SELECT data FROM pg_logical_slot_get_changes('regression_slot2', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
SELECT data FROM pg_logical_slot_get_changes('regression_slot3', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
SELECT data FROM pg_logical_slot_get_changes('regression_slot4', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
]);

Same here:
+$node->safe_psql('postgres', <<EOQ);
+SELECT pg_drop_replication_slot('regress_pg_waldump_slot');
+EOQ

9.
+my $walfile = [sort { $a <=> $b } glob("$waldir/00*")]->[1]; # we
want the second WAL file, which will be a complete WAL file with
full-page writes for our specific relation.
Is it guaranteed that just looking at the second WAL file in the
pg_wal directory assures WAL file with FPIs? I think we have to save
the WAL file that contains FPIs, that is the file after, CHECKPOINT,
UPDATE and pg_switch_wal. I think you can store output LSN of
pg_switch_wal

10.
+$node->safe_psql('postgres', <<EOQ);
+SELECT pg_drop_replication_slot('regress_pg_waldump_slot');
+EOQ
+done_testing();

Do we need to explicitly drop the slot here? I think we don't
specifically drop the replication slot in all the places, my guess is
after done_testing(), the node would get destroyed and also the slot.
I think it's not required.

11.
+# verify filename formats matches w/--save-fpi
+for my $fullpath (glob "$tmp_folder/raw/*")
Do we need to look for the exact match of the file that gets created
in the save-fpi path? While checking for this is great, it makes the
test code non-portable (may not work on Windows or other platforms,
no?) and complex? This way, you can get rid of get_block_info() as
well? And +for my $fullpath (glob "$tmp_folder/raw/*")
will also get simplified.

I think you can further simplify the tests by:
create the node
generate an FPI
call pg_waldump with save-fpi option
check the target directory for a file that contains the relid,
something like '%relid%'.

The above would still serve the purpose, tests the code without much complexity.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Borisov 2022-11-15 10:43:06 Re: Unit tests for SLRU
Previous Message Daniel Gustafsson 2022-11-15 10:39:20 Re: Unit tests for SLRU