checkpoint_timeout parameter & WAL archive delay, pgbackrest fails

From:	KK CHN <kkchn(dot)in(at)gmail(dot)com>
To:	pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject:	checkpoint_timeout parameter & WAL archive delay, pgbackrest fails
Date:	2025-05-02 07:16:29
Message-ID:	CAKgGyB97a_dGXFg-1n76Cu9-_OTOSkA1_QA76QwV5kY5dwqZ3w@mail.gmail.com
Lists:	pgsql-general

Hi folks,
The pgbackrest backup to one of my repo servers sometimes fails with the
error that a WAL file could not be archived before the 60000 ms
timeout.

The pgbackrest stanza check command is sometimes successful, but sometimes
fails.

I don't know why PG is unable to copy WAL files from pg_wal to
/data/myarchive_dir in real time. I consistently observe a delay of around
10 minutes before a WAL file in pg_wal appears in /data/my_archive_dir.
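
The delay described above could be measured rather than eyeballed. The following is a minimal sketch, assuming both directories are readable by the user running it and that the archived copy's modification time reflects when it was copied (true for a plain cp without -p). The function name archive_lag_seconds is hypothetical, and the result is only an approximation because segments in pg_wal are recycled after checkpoints.

```python
from pathlib import Path

HEX_DIGITS = set("0123456789ABCDEF")


def archive_lag_seconds(wal_dir, archive_dir):
    """Return {segment_name: lag_in_seconds} for segments found in both directories.

    The lag is the archived copy's mtime minus the pg_wal file's mtime.
    """
    lags = {}
    for wal_file in Path(wal_dir).iterdir():
        name = wal_file.name
        # WAL segment files have 24-character upper-case hexadecimal names;
        # this skips archive_status/, .history, .backup and .partial entries.
        if len(name) != 24 or not set(name) <= HEX_DIGITS:
            continue
        archived = Path(archive_dir) / name
        if archived.is_file():
            lags[name] = archived.stat().st_mtime - wal_file.stat().st_mtime
    return lags


# Example call (paths are placeholders, adjust to the real locations):
# print(archive_lag_seconds("/var/lib/postgres/16/data/pg_wal", "/data/archive"))
```

A lag that stays close to a fixed interval on a quiet system would point at how often segments are completed and handed to the archiver, not at the copy itself being slow.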

On investigation I found that our DB admin has set checkpoint_timeout
to 10 minutes in the postgresql.conf file.

I think this causes the WAL archiving delay, and that my pgbackrest
backup to the remote repo server fails as a result.

What is the ideal value for "checkpoint_timeout" to overcome this
issue? I don't want pgbackrest backups to fail because of this
parameter. (Is it possible to set a very small value for
checkpoint_timeout? What is the minimum value, or can I set it to 0?)
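
On the question of a minimum: the PostgreSQL documentation gives the valid range of checkpoint_timeout as 30 seconds to one day, so 0 is not accepted. The sketch below checks a setting against that range. It is a simplified illustration, not PostgreSQL's own parser: the function names are hypothetical, and it handles only whole numbers with the time units us, ms, s, min, h and d (seconds when no unit is given).

```python
import re

# Seconds per time unit accepted in PostgreSQL configuration values.
UNIT_SECONDS = {"us": 1e-6, "ms": 1e-3, "s": 1, "min": 60, "h": 3600, "d": 86400}

MIN_SECONDS = 30      # documented lower bound for checkpoint_timeout
MAX_SECONDS = 86400   # documented upper bound (one day)


def checkpoint_timeout_seconds(value):
    """Convert a setting such as '10min' or '300' to seconds."""
    match = re.fullmatch(r"\s*(\d+)\s*([a-z]*)\s*", value)
    if not match:
        raise ValueError(f"cannot parse {value!r}")
    number, unit = match.groups()
    unit = unit or "s"  # checkpoint_timeout defaults to seconds
    if unit not in UNIT_SECONDS:
        raise ValueError(f"unknown time unit {unit!r}")
    return int(number) * UNIT_SECONDS[unit]


def in_valid_range(value):
    """True if the value lies within the documented 30 s to 1 d range."""
    return MIN_SECONDS <= checkpoint_timeout_seconds(value) <= MAX_SECONDS


print(in_valid_range("10min"))  # True
print(in_valid_range("0"))      # False
```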

archive_command = 'pgbackrest --stanza=My_Repo archive-push %p && cp %p /data/archive/%f'
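
Because the two commands in archive_command are joined with &&, the cp step runs only when the pgbackrest step exits with status 0, and a non-zero status from pgbackrest becomes the status of the whole command. The sketch below illustrates that shell behaviour with placeholder commands: `(exit N)` stands in for pgbackrest and `echo copied` stands in for cp. The helper name run_chain is hypothetical, and a POSIX sh is assumed to be available.

```python
import subprocess


def run_chain(first_exit_code):
    """Run '<first> && <second>' through sh and report the outcome.

    '(exit N)' is a stand-in for the first command finishing with status N;
    'echo copied' is a stand-in for the second command.
    """
    script = f"(exit {first_exit_code}) && echo copied"
    result = subprocess.run(["sh", "-c", script], capture_output=True, text=True)
    return result.returncode, result.stdout.strip()


print(run_chain(0))   # first command succeeds, second runs
print(run_chain(82))  # first command fails with 82, second is skipped
```

With a first exit status of 82 the second command never runs and the chain as a whole returns 82.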

In the PostgreSQL logs I see this:

ERROR: [082]: unable to push WAL file '000000010000026300000002' to the
archive asynchronously after 60 second(s)
HINT: check '/var/log/pgbackrest/My_Repo-archive-push-async.log' for
errors.
INFO: archive-push command end: aborted with exception [082]
2025-05-02 12:15:17 IST LOG: archive command failed with exit code 82
2025-05-02 12:15:17 IST DETAIL: The failed archive command was: pgbackrest
--stanza=My_Repo archive-push pg_wal/000000010000026300000002 && cp
pg_wal/000000010000026300000002 /data/archive/000000010000026300000002
INFO: archive-push command begin 2.52.1: [pg_wal/000000010000026300000002]
--archive-async --compress-type=zst --exec-id=2848559-384cf49c
--log-level-console=info --log-level-file=debug --log-level-stderr=info
--pg1-path= /var/lib/postgres/16/data --pg-version-force=16
--process-max=6 --repo1-host=10.50.12.202 --repo1-host-user=pgbackrest
--spool-path=/var/spool/pgbackrest --stanza=My_Repo
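
When reading logs like the one above, it can help to decode the WAL segment name. A segment file name is 24 hexadecimal digits: 8 for the timeline, 8 for the high part of the position (the "log" number), and 8 for the segment within that log. The sketch below decodes the name from the error message, assuming the default 16 MB WAL segment size, which this thread does not state. The function name parse_wal_segment_name is hypothetical.

```python
def parse_wal_segment_name(name, wal_segment_size=16 * 1024 * 1024):
    """Split a 24-hex-digit WAL segment file name into its parts.

    wal_segment_size defaults to 16 MB, which is an assumption here.
    """
    if len(name) != 24:
        raise ValueError("a WAL segment name has exactly 24 hex digits")
    timeline = int(name[0:8], 16)
    log_id = int(name[8:16], 16)
    segment = int(name[16:24], 16)
    # Each log number covers 4 GiB of WAL, split into fixed-size segments.
    segments_per_log = 0x100000000 // wal_segment_size
    return {
        "timeline": timeline,
        "log_id": log_id,
        "segment": segment,
        "segment_number": log_id * segments_per_log + segment,
        # LSN at which this segment starts, in the usual hi/lo hex form.
        "start_lsn": f"{log_id:X}/{segment * wal_segment_size:X}",
    }


print(parse_wal_segment_name("000000010000026300000002"))
```

Comparing the decoded start position with the server's current WAL position gives a rough idea of how far archiving is behind.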

top output on DB cluster:

top - 12:37:00 up 66 days, 17:24, 2 users, load average: 4.04, 4.72, 4.56
Tasks: 902 total, 4 running, 897 sleeping, 0 stopped, 1 zombie
%Cpu(s): 7.4 us, 1.7 sy, 0.0 ni, 89.9 id, 0.4 wa, 0.2 hi, 0.4 si, 0.0 st
MiB Mem : 31837.6 total, 706.1 free, 15243.0 used, 24741.0 buff/cache
MiB Swap: 8060.0 total, 6634.0 free, 1426.0 used. 16608.9 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
2839363 postgre+  20   0 8965608   7.2g   7.1g S  70.2  23.0   2:02.61 postgres
2864108 postgre+  20   0 8967848   7.1g   7.1g S  64.9  22.8   0:30.04 postgres
2865547 postgre+  20   0 8965432   7.1g   7.1g S  39.1  22.8   0:32.30 postgres
2865752 postgre+  20   0 8964352   6.9g   6.9g S  16.6  22.3   0:32.94 postgres

Model name: Intel(R) Xeon(R) Gold 6430
BIOS Model name: Intel(R) Xeon(R) Gold 6430
CPU family: 6
Model: 143
Thread(s) per core: 1
Core(s) per socket: 16

These are 16 vCPUs; the OS is RHEL 9 and PostgreSQL is version 16.

Any hints on how to make pgbackrest take backups reliably are much
appreciated.

Thanks,
Krishane
