Re: RecoveryWalAll and RecoveryWalStream wait events

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Atsushi Torikoshi <atorik(at)gmail(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: RecoveryWalAll and RecoveryWalStream wait events
Date: 2020-03-18 09:59:51
Message-ID: bacc5242-50e1-3e1e-550d-df393861a1bf@oss.nttdata.com
Lists: pgsql-hackers

On 2020/03/18 17:56, Atsushi Torikoshi wrote:
>
>
> On Tue, Mar 17, 2020 at 11:55 AM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
>
> >  >    Waiting when WAL data is not available from any kind of sources
> >  >    (local, archive or stream) before trying again to retrieve WAL data,
> >
> > I think 'local' means pg_wal here, but the comment on
> > WaitForWALToBecomeAvailable() says checking pg_wal in
> > standby mode is 'not documented', so I'm a little bit worried
> > that users may be confused.
>
> This logic seems to be documented in high-availability.sgml.
>
>
> Thanks! I didn't notice it.
> I think you mean the below sentence.
>
> >  The standby server will also attempt to restore any WAL found in the standby cluster's pg_wal directory.

I meant the following part in the doc.

---------------------
At startup, the standby begins by restoring all WAL available in the archive
location, calling restore_command. Once it reaches the end of WAL available
there and restore_command fails, it tries to restore any WAL available in the
pg_wal directory. If that fails, and streaming replication has been configured,
the standby tries to connect to the primary server and start streaming WAL from
the last valid record found in archive or pg_wal. If that fails or streaming
replication is not configured, or if the connection is later disconnected,
the standby goes back to step 1 and tries to restore the file from the archive
again. This loop of retries from the archive, pg_wal, and via streaming
replication goes on until the server is stopped or failover is triggered by a
trigger file.
---------------------
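
To make the retry order in that paragraph concrete, here is a rough sketch in Python. It is only an illustration of the documented loop, not the server's actual code: find_wal_source, the fetchers mapping, on_wait and max_rounds are all hypothetical names, and max_rounds exists only so the example terminates (the real standby keeps looping until it is stopped or failover is triggered).

```python
from typing import Callable, Dict, Optional

# Source order as described in the documentation quoted above.
SOURCE_ORDER = ("archive", "pg_wal", "stream")


def find_wal_source(
    fetchers: Dict[str, Callable[[], bool]],
    streaming_configured: bool = True,
    max_rounds: int = 3,
    on_wait: Optional[Callable[[int], None]] = None,
) -> Optional[str]:
    """Return the first source that yields WAL, or None after max_rounds.

    Each fetcher is a hypothetical callable returning True when WAL was
    obtained from that source and False otherwise.
    """
    for round_no in range(max_rounds):
        for source in SOURCE_ORDER:
            # Streaming is only attempted when it has been configured.
            if source == "stream" and not streaming_configured:
                continue
            if fetchers[source]():
                return source
        # Every source failed in this round: wait, then go back to step 1.
        if on_wait is not None:
            on_wait(round_no)
    return None
```

In this sketch, the on_wait call marks the point where no source (archive, pg_wal or stream) had WAL available and the standby pauses before retrying from the archive again.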

> It seems the comment on WaitForWALToBecomeAvailable()
> does not go along with the high-availability.sgml, do we need
> modification on the comment on the function?

No, not for now, I think. But would you like to improve the docs?

> But, anyway, you think that "pg_wal" should be used instead
> of "local" here?
>
>
> I don't have special opinion here.
> It might be better because high-availability.sgml does not use
> "local" but "pg_wal" for the explanation,  but I also feel it's
> obvious in this context.

Ok, I changed that from "local" to "pg_wal" in the patch for
the master. Attached is the updated version of the patch.
If you're OK with this, I'd like to commit two patches that I posted
in this thread.

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters

Attachment Content-Type Size
improve_recovery_wait_event_for_master_v2.patch text/plain 5.1 KB
