| From: | Koen De Groote <kdg(dot)dev(at)gmail(dot)com> |
|---|---|
| To: | Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at> |
| Cc: | PostgreSQL General <pgsql-general(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: In case of network issues, how long before archive_command does retries |
| Date: | 2022-05-19 13:43:38 |
| Message-ID: | CAGbX52E5AyuiJTU0FJQ25XEYqE6XtfbguJN49bJM747=zL3p7w@mail.gmail.com |
| Lists: | pgsql-general |
Hello Laurenz,
Thanks for the reply. That would mean the source code is here:
https://github.com/postgres/postgres/blob/REL_11_0/src/backend/postmaster/pgarch.c
Just to be sure: is the "signal" you speak of the result of the
command executed by archive_command?
If my understanding of the code is right, then as long as no SIGTERM or
other signal arrives, a WAL segment is never skipped because the
archive_command failed too many times or took too long? Will the archiver
simply check again every 60 seconds (PGARCH_AUTOWAKE_INTERVAL)? Or is 60
seconds the point where it stops trying and waits for the next time
archive_command is invoked?
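To make the first reading concrete, here is a rough Python model of the retry behaviour as described in this thread: three attempts one second apart, then a 60-second back-off, repeated until the command succeeds. It is only a sketch under assumptions, not a transcription of pgarch.c. The function name, the simulated clock, and the `network_back_at` parameter are made up for illustration, the command itself is assumed to take no time, and signals are ignored.

```python
# Simplified model of the archiver's retry behaviour (assumptions, not pgarch.c).
NUM_ARCHIVE_RETRIES = 3          # attempts per cycle, as named in the thread
PGARCH_AUTOWAKE_INTERVAL = 60    # seconds of back-off, as named in the thread
RETRY_SLEEP = 1                  # seconds between attempts within one cycle


def simulate_archiver(network_back_at, max_time=600):
    """Return (attempt_times, succeeded) for a single WAL segment.

    The hypothetical archive_command fails while the simulated clock is
    earlier than network_back_at and succeeds afterwards.
    """
    clock = 0
    attempts = []
    while clock <= max_time:
        failures = 0
        while True:
            attempts.append(clock)
            if clock >= network_back_at:
                return attempts, True      # archived; segment would be marked done
            failures += 1
            if failures >= NUM_ARCHIVE_RETRIES:
                break                      # give up for this cycle only
            clock += RETRY_SLEEP
        # Back off; the same segment is retried in the next cycle, never skipped.
        clock += PGARCH_AUTOWAKE_INTERVAL
    return attempts, False


if __name__ == "__main__":
    print(simulate_archiver(network_back_at=100))
```

In this model a segment is never dropped: a failed cycle only delays the next attempt, which would match the "a few minutes" experienced after the NFS mount comes back.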
I'm assuming that as long as the file is still in the pg_wal directory and
there is no ".done" file for that WAL segment under pg_wal/archive_status,
it will keep trying forever (or until someone forcibly switches the
timeline, for instance with a base backup)?
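One way to watch this from the outside is to look at the status files themselves. The sketch below lists segments that still have a ".ready" file under pg_wal/archive_status, which is how pending segments are marked. The helper name and the data directory path are placeholders, and the script assumes it runs as a user that can read the data directory.

```python
from pathlib import Path


def pending_wal_segments(data_dir):
    """Return the names of WAL segments still waiting to be archived.

    A segment is treated as pending when archive_status contains
    "<segment>.ready"; archived ones have "<segment>.done" instead.
    """
    status_dir = Path(data_dir) / "pg_wal" / "archive_status"
    suffix = ".ready"
    return sorted(p.name[: -len(suffix)] for p in status_dir.glob("*" + suffix))


if __name__ == "__main__":
    # Placeholder path; substitute the real data directory.
    for segment in pending_wal_segments("/var/lib/postgresql/11/main"):
        print(segment)
```

If the list keeps growing while the NFS mount is down and drains once it is back, that suggests the archiver is retrying rather than skipping segments.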
Apologies, I already sent this message once, but only to Laurenz. Sending
again to have it in the archives.
Regards,
Koen
On Thu, May 19, 2022 at 9:10 AM Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
wrote:
> On Wed, 2022-05-18 at 22:51 +0200, Koen De Groote wrote:
> > I've got a setup where archive_command will gzip the wal archive to a
> > directory that is itself an NFS mount.
> >
> > When connection is gone or blocked, archive_command fails after the
> > timeout specified by the NFS mount, as expected. (for a soft mount. hard
> > mount hangs, as expected)
> >
> > However, on restoring connection, it's not clear to me how long it takes
> > before the command is retried.
> >
> > Experience says "a few minutes", but I can't find documentation on an
> > exact algorithm.
> >
> > To be clear, the question is: if archive_command fails, what are the
> > specifics of retrying? Is there a timeout? How is that timeout defined?
> >
> > Is this detailed somewhere? Perhaps in the source code? I couldn't find
> > it in the documentation.
> >
> > For detail, I'm using postgres 11, running on Ubuntu 20.
>
> You can find the details in "src/backend/postmaster/pgarch.c".
>
> The archiver will try to archive three times (NUM_ARCHIVE_RETRIES) in an
> interval of one second, then back off until it receives a signal,
> PostgreSQL shuts down or a minute has passed.
>
> Yours,
> Laurenz Albe
> --
> Cybertec | https://www.cybertec-postgresql.com
>