Re: Stronger safeguard for archive recovery not to miss data

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>, 'Kyotaro Horiguchi' <horikyota(dot)ntt(at)gmail(dot)com>
Cc: "david(at)pgmasters(dot)net" <david(at)pgmasters(dot)net>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "laurenz(dot)albe(at)cybertec(dot)at" <laurenz(dot)albe(at)cybertec(dot)at>
Subject: Re: Stronger safeguard for archive recovery not to miss data
Date: 2021-04-05 03:34:53
Message-ID: 8208c4c1-2b09-5a89-18cc-d472bbc5a203@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2021/04/04 11:58, osumi(dot)takamichi(at)fujitsu(dot)com wrote:
>> IMO it's better to comment why this server restart is necessary.
>> As far as I understand correctly, this is necessary to ensure the WAL file
>> containing the record about the change of wal_level (to minimal) is archived,
>> so that the subsequent archive recovery will be able to replay it.
> OK, added some comments. Further, I felt the way I wrote this part was not good at all and self-evident
> and developers who read this test would feel uneasy about that point.
> So, a little bit fixed that test so that we can get clearer conviction for wal archive.

LGTM. Thanks for updating the patch!

Attached is the updated version of the patch. I applied the following changes.
Could you review this version? Barring any objection, I'm thinking to
commit this.

+sub check_wal_level
+{
+ my ($target_wal_level, $explanation) = @_;
+
+ is( $node->safe_psql(
+ 'postgres', q{show wal_level}),
+ $target_wal_level,
+ $explanation);

Do we really need this test? This test doesn't seem to be directly related
to the test purpose. It seems to be testing the behavior of the PostgresNode
functions. So I removed this from the patch.

+# This protection should apply to recovery mode
+my $another_node = get_new_node('another_node');

The same test is performed twice with different settings. So isn't it better
to merge the code for both tests into one common function, to simplify
the code? I did that.

I also applied some cosmetic changes.

>>> By the way, when I build postgres with this patch and enable-coverage
>>> option, the results of RT becomes unstable. Does someone know the
>> reason ?
>>> When it fails, I get stderr like below
>>
>> I have no idea about this. Does this happen even without the patch?
> Unfortunately, no. I get this only with --enable-coverage and with my patch,
> althought regression tests have passed with this patch.
> OSS HEAD doesn't produce the stderr even with --enable-coverage.

Could you check whether the latest patch still causes this issue or not?
If it still causes, could you check which part (the change of xlog.c or
the addition of regression test) caused the issue?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachment Content-Type Size
stronger_safeguard_for_archive_recovery_v07.patch text/plain 4.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2021-04-05 03:59:14 Re: Get memory contexts of an arbitrary backend process
Previous Message Bharath Rupireddy 2021-04-05 03:27:41 Re: Logical Replication - improve error message while adding tables to the publication in check_publication_add_relation