Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: David Christensen <david(dot)christensen(at)crunchydata(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Ian Lawrence Barwick <barwick(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
Date: 2022-11-10 08:33:29
Message-ID: CALj2ACVnA8tQBsFWPfk4062YVe=HhaypVsD5j6PCfJYDML9Q+Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Nov 10, 2022 at 1:31 AM David Christensen
<david(dot)christensen(at)crunchydata(dot)com> wrote:
>

Thanks for providing the v7 patch, please see my comments and responses below.

> > 2. I'm unable to understand the use-case for --fixup-fpi option.
> > pg_waldump is supposed to be just WAL reader, and must not return any
> > modified information, with --fixup-fpi option, the patch violates this
> > principle i.e. it sets page LSN and returns. Without actually
> > replaying the WAL record on the page, how is it correct to just set
> > the LSN? How will it be useful? ISTM, we must ignore this option
> > unless there's a strong use-case.
>
> How I was envisioning this was for cases like extreme surgery for
> corrupted pages, where you extract the page from WAL but it has lsn
> and checksum set so you could do something like `dd if=fixup-block
> of=relation ...`, so it *simulates* the replay of said fullpage blocks
> in cases where for some reason you can't play the intermediate
> records; since this is always a fullpage block, it's capturing what
> would be the snapshot so you could manually insert somewhere as needed
> without needing to replay (say if dealing with an incomplete or
> corrupted WAL stream).

Recovery sets the page LSN after it replayed the WAL record on the
page right? Recovery does this - base_page/FPI +
apply_WAL_record_and_then_set_applied_WAL_record's_LSN =
new_version_of_page. Essentially, in your patch, you are just setting
the WAL record LSN with the page contents being the base page's. I'm
still not sure what's the real use-case here. We don't have an
independent function in postgres, given a base page and a WAL record
that just replays the WAL record and output's the new version of the
page, so I think what you do in the patch with fixup option seems
wrong to me.

> > 5.
> > + if (!RestoreBlockImage(record, block_id, page))
> > + continue;
> > +
> > + /* we have our extracted FPI, let's save it now */
> > After extracting the page from the WAL record, do we need to perform a
> > checksum on it?

I think you just need to do the following, this will ensure the
authenticity of the page that pg_waldump returns.
if ((PageHeader) page)->pd_checksum != pg_checksum_page((char *) page, blk))
{
pg_fatal("page checksum failed");
}

> > case 'W':
> > config.save_fpw_path = pg_strdup(optarg);
> > case 'X':
> > config.fixup_fpw = true;
> > config.save_fpw_path = pg_strdup(optarg);
>
> Like separate opt processing with their own `break` statement?
> Probably a bit more readable/conventional.

Yes.

Some more comments:

1.
+ PGAlignedBlock zerobuf;
Essentially, it's not a zero buffer, please rename the variable to
something like 'buf' or 'page_buf' or someother?

2.
+ if (pg_pwrite(fileno(OPF), page, BLCKSZ, 0) != BLCKSZ)
Replace pg_pwrite with fwrite() and avoid fileno() system calls that
should suffice here, AFICS, we don't need pg_pwrite.

3.
+ if (config.save_fpw_path != NULL)
+ {
+ /* Create the dir if it doesn't exist */
+ if (pg_mkdir_p(config.save_fpw_path, pg_dir_create_mode) < 0)
I think you still need pg_check_dir() here, how about something like below?

if (pg_check_dir(config.save_fpw_path) == 0)
{
if (pg_mkdir_p(config.save_fpw_path, pg_dir_create_mode) < 0)
{
/* error */
}
}

4.
+ /* Create the dir if it doesn't exist */
+ if (pg_mkdir_p(config.save_fpw_path, pg_dir_create_mode) < 0)
+ {
+ pg_log_error("could not create output directory \"%s\": %m",
+ config.save_fpw_path);
+ goto bad_argument;
Why is the directory creation error a bad_argument? I think you need
just pg_fatal() here.

5.
+ fsync(fileno(OPF));
+ fclose(OPF);
I think just the fsync() isn't enough, you still need fsync_fname()
and/or fsync_parent_path(), perhaps after for (block_id = 0; block_id
<= XLogRecMaxBlockId(record); block_id++) loop.

6. Speaking of which, do we need to do fsync()'s optionally? If we
were to write many such FPI files, aren't there going to be more
fsync() calls and imagine this feature being used on a production
server and a lot of fsync() will definitely make running server
fsync() ops slower. I think we need a new option whether pg_waldump
ever do fsync() or not, something similar to --no-sync of
pg_receivewal/pg_upgrade/pg_dump/pg_initdb/pg_checksums etc. I would
like it if the pg_waldump's --no-sync is coded as 0001 and 0002 can
make use of it.

7.
+ pg_fatal("couldn't write out complete fullpage image to
file: %s", filename);
We typically use "full page image" in the output strings, please correct.

8.
+
+ if (((PageHeader) page)->pd_checksum)
+ ((PageHeader) page)->pd_checksum =
pg_checksum_page((char *) page, blk);
Why do you need to set the page's checksum by yourself? I don't think
this is the right way, pg_waldump should just return what it sees in
the WAL record, of course, after verifying a few checks (like checksum
is correct or not), but it mustn't set or compute anything new in the
returned page.

Few comments on the tests:
1.
+$primary->init('-k');
+$primary->append_conf('postgresql.conf', "max_wal_size='100MB'");
+$primary->append_conf('postgresql.conf', "wal_level='replica'");
+$primary->append_conf('postgresql.conf', 'archive_mode=on');
+$primary->append_conf('postgresql.conf', "archive_command='/bin/false'");
+$primary->start;
I don't think we need these many append_conf calls here, see how
others are doing it with just one single call:

$node->append_conf(
'postgresql.conf', qq{
listen_addresses = '$hostaddr'
krb_server_keyfile = '$keytab'
log_connections = on
lc_messages = 'C'
});

2.
+$primary->append_conf('postgresql.conf', "max_wal_size='100MB'");
Do you really need 100MB max_wal_size? Why can't you just initialize
your cluster with 1MB wal files instead of 16MB and set max_wal_size
to 4MB or a bit more, something like 019_replslot_limit.pl does?

3.
+# generate data/wal to examine
+$primary->safe_psql('postgres', q(CREATE DATABASE db1));
+$primary->safe_psql('db1', <<EOF);
+CREATE TABLE test_table AS SELECT generate_series(1,100) a;
+CHECKPOINT;
+SELECT pg_switch_wal();
+UPDATE test_table SET a = a + 1;
+CHECKPOINT;
+SELECT pg_switch_wal();
+UPDATE test_table SET a = a + 1;
+CHECKPOINT;
+SELECT pg_switch_wal();
+EOF
I don't think you need these many complex things to generate WAL,
multiple CHECKPOINT;s can make tests slower. To keep it simple, you
can just create a table, insert a single row, checkpoint, update the
row, switch the wal - no need to test if your feature generates
multiple WAL files, it's enough to test if it generates just one.
Please simplfiy the tests.

4.
+$primary->append_conf('postgresql.conf', "wal_level='replica'");
+$primary->append_conf('postgresql.conf', 'archive_mode=on');
+$primary->append_conf('postgresql.conf', "archive_command='/bin/false'");
Why do you need to set wal_level to replica, out of the box your
cluster comes with replica only no?
And why do you need archive_mode on and set the command to do nothing?
Why archiving is needed for testing your feature firstly?

5.
+my $primary = PostgreSQL::Test::Cluster->new('primary');
Can you rename your node to other than primary? Because this isn't a
test of replication where primary and standby nodes get created. How
about just 'node'?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Juan José Santamaría Flecha 2022-11-10 09:59:41 Meson doesn't define HAVE_LOCALE_T for mscv
Previous Message Michael Paquier 2022-11-10 08:17:46 Re: Unit tests for SLRU