Re: Race condition in recovery?

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: robertmhaas(at)gmail(dot)com, dilipbalaut(at)gmail(dot)com, hlinnaka(at)iki(dot)fi, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Race condition in recovery?
Date: 2021-06-10 01:12:40
Message-ID: 20210610.101240.1270925505780628275.horikyota.ntt@gmail.com
Lists: pgsql-hackers

At Wed, 09 Jun 2021 19:09:54 -0400, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote in
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > Got it. I have now committed the patch to all branches, after adapting
> > your changes just a little bit.
> > Thanks to you and Kyotaro-san for all the time spent on this. What a slog!
>
> conchuela failed its first encounter with this test case:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela&dt=2021-06-09%2021%3A12%3A25
>
> That machine has a certain, er, history of flakiness; so this may
> not mean anything. Still, we'd better keep an eye out to see if
> the test needs more stabilization.

https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=conchuela&dt=2021-06-09%2021%3A12%3A25&stg=recovery-check

> ==~_~===-=-===~_~== pgsql.build/src/test/recovery/tmp_check/log/025_stuck_on_old_timeline_cascade.log ==~_~===-=-===~_~==
....
> 2021-06-09 23:31:10.439 CEST [893820:1] LOG: started streaming WAL from primary at 0/2000000 on timeline 1
> 2021-06-09 23:31:10.439 CEST [893820:2] FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 000000010000000000000002 has already been removed

The script 025_stuck_on_old_timeline.pl (and I) forgot to set
wal_keep_size (wal_keep_segments on 9.6-12), so the primary was free
to remove a WAL segment that the standby still needed.
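
As a rough sketch, a setting of this kind in the primary's
postgresql.conf would look something like the following. The values
are illustrative assumptions, not taken from the attached patches.

```
# PostgreSQL 13 and later: keep at least this much WAL for standbys
wal_keep_size = 128MB

# PostgreSQL 9.6 to 12: the equivalent setting, counted in segments
# (8 segments at the default 16MB segment size)
#wal_keep_segments = 8
```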

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment                                                         Content-Type  Size
set_walkeepsize_025_stuck_on_old_timeline_pl_13-master.patch       text/x-patch  582 bytes
set_walkeepsize_025_stuck_on_old_timeline_pl_9_6-12.patch          text/x-patch  582 bytes
