From: | KK CHN <kkchn(dot)in(at)gmail(dot)com> |
---|---|
To: | pgsql-general(at)lists(dot)postgresql(dot)org |
Subject: | Pgbackrest failure for INCR and DIFF but not FULL backup |
Date: | 2025-05-07 11:17:29 |
Message-ID: | CAKgGyB-8ap3-zna_gkpEQNfzgWzu4QwO5Y6U_-2q1JcjK=UTvw@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hi folks,
I am facing a strange issue, Pgbackrest backup fails for DIFF or INCR
backups but not Full backup, with the * error WAL file cannot be archived
before 60000 ms timeout.*
The pgbackrest " *stanza check* " command *sometimes succeeds, but
sometimes fails.*
I don't know why * PG is unable to* *copy WAL files from pg_wal to
/data/myarchive_di*r* in real time*. I always observed a delay of a few
minutes for a wal file from pg_wal to appear in /data/my_archive_dir.
I'hv observed in the postgresql.conf (checkpoint_timeout = 5 m,
max_wal_size = 16 GB, wal_keep_size=15GB, min_wal_size=80MB etc.)
and I found pg_wal dir size is 16 GB on disk (du -h -d 1 ) and
/data/archive = 1.2 T
/dev/mapper/rhel_bcga68-data 5.0T 1.8T 3.3T 35% /data
Can we suspect the 5 M is the reason for the WAL archiving delay ?
Backup to a remote RepoServer for INCR or DIFF backup always fails, I
found a full backup always succeeds!!!
What is the ideal value needed to be set for "*checkpoint_timeout*" ?
*Or this doesn't have any impact on pgbackrest failure ?*
*archive_command = 'pgbackrest --stanza=My_Repo archive-push %p && cp %p
/data/archive/%f' *
From postgresql logs I am seeing this ..
ERROR: [082]: unable to push WAL file '000000010000026300000002' to the
archive asynchronously after 60 second(s)
HINT: check '/var/log/pgbackrest/My_Repo-archive-push-async.log' for
errors.
INFO: archive-push command end: aborted with exception [082]
2025-05-02 12:15:17 IST LOG: archive command failed with exit code 82
2025-05-02 12:15:17 IST DETAIL: The failed archive command was: pgbackrest
--stanza=My_Repo archive-push pg_wal/000000010000026300000002 && cp
pg_wal/000000010000026300000002 /data/archive/000000010000026300000002
INFO: archive-push command begin 2.52.1: [pg_wal/000000010000026300000002]
--archive-async --compress-type=zst --exec-id=2848559-384cf49c
--log-level-console=info --log-level-file=debug --log-level-stderr=info
--pg1-path= /var/lib/postgres/16/data --pg-version-force=16
--process-max=6 --repo1-host=10.x.y.202 --repo1-host-user=pgbackrest
--spool-path=/var/spool/pgbackrest --stanza=My_Repo
top output on DB cluster:
top - 12:37:00 up 66 days, 17:24, 2 users, load average: 4.04, 4.72, 4.56
Tasks: 902 total, 4 running, 897 sleeping, 0 stopped, 1 zombie
%Cpu(s): 7.4 us, 1.7 sy, 0.0 ni, 89.9 id, 0.4 wa, 0.2 hi, 0.4 si,
0.0 st
MiB Mem : 31837.6 total, 706.1 free, 15243.0 used, 24741.0 buff/cache
MiB Swap: 8060.0 total, 6634.0 free, 1426.0 used. 16608.9 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+
COMMAND
2839363 postgre+ 20 0 8965608 7.2g 7.1g S 70.2 23.0 2:02.61
postgres
2864108 postgre+ 20 0 8967848 7.1g 7.1g S 64.9 22.8 0:30.04
postgres
2865547 postgre+ 20 0 8965432 7.1g 7.1g S 39.1 22.8 0:32.30
postgres
2865752 postgre+ 20 0 8964352 6.9g 6.9g S 16.6 22.3 0:32.94
postgres
Model name: Intel(R) Xeon(R) Gold 6430
BIOS Model name: Intel(R) Xeon(R) Gold 6430
CPU family: 6
Model: 143
Thread(s) per core: 1
Core(s) per socket: 16
These are vCPUs (16 nos) , OS RHEL 9, postgres 16, pgbackrest 2.52.1
*Any hints most welcome to find the root cause / troubleshoot
the pgbackrest failures for DIFF/ INCR backups.*
Thank you
Krishane
*For More Inputs: *
For more info : you can see the full backup success here. but a diff
backup fails.
full backup: 20250505-070204F
timestamp start/stop: 2025-05-05 07:02:04+05:30 / 2025-05-05
22:11:23+05:30
wal start/stop: 000000010000026F00000066 /
000000010000027300000045
database size: 503.1GB, database backup size: 503.1GB
repo1: backup size: 79GB
When I try diff backup it always fails.
[root(at)Repo ~]# tail -f /var/log/pgbackrest/My_Repo-backup.log
stack trace:
command/archive/find.c:walSegmentFind:191:(this: {WalSegmentFind},
walSegment: {"000000010000027B0000006C"})
command/backup/backup.c:backupArchiveCheckCopy:(backupData: {BackupData},
manifest: {Manifest})
command/backup/backup.c:cmdBackup:(void)
main.c:main:(debug log level required
for parameters)
--------------------------------------------------------------------
2025-05-07 15:47:49.760 P00 INFO: backup command end: aborted with
exception [082]
2025-05-07 15:47:49.760 P00 DEBUG: command/exit::exitSafe: => 82
2025-05-07 15:47:49.860 P00 DEBUG: main::main: => 82
^C
[ root(at)Repo ~ ~]# date
Wednesday 07 May 2025 04:06:37 PM IST
*The postgres log says*
=316781-61d82f85 --log-level-console=info --log-level-file=debug
--log-level-stderr=info --pg1-path=
/var/lib/postgres/16/data --pg-version-force=16 --process-max=3
--repo1-host=10.x.y.202 --repo1-host-user=pgbackrest
--spool-path=/var/spool/pgbackrest --stanza=My_Repo
INFO: pushed WAL file '000000010000027B00000082' to the archive
asynchronously
INFO: archive-push command end: completed successfully (37506ms)
INFO: archive-push command begin 2.52.1: [pg_wal/000000010000027B00000083]
--archive-async --compress-type=zst --exec-id=317334-27a30b57
--log-level-console=info --log-level-file=debug --log-level-stderr=info
--pg1-path= /var/lib/postgres/16/data --pg-version-force=16
--process-max=3 --repo1-host=10.x.y.202 --repo1-host-user=pgbackrest
--spool-path=/var/spool/pgbackrest --stanza=My_Repo
ERROR: [082]: unable to push WAL file '000000010000027B00000083' to the
archive asynchronously after 60 second(s)
HINT: check '/var/log/pgbackrest/My_Repo-archive-push-async.log' for
errors.
INFO: archive-push command end: aborted with exception [082]
2025-05-07 16:22:56 IST LOG: archive command failed with exit code 82
[ root(at)Repo ~ ~]#
From | Date | Subject | |
---|---|---|---|
Next Message | Alicja Kucharczyk | 2025-05-07 12:16:49 | Re: huge_pages=on cause could not map anonymous shared memory: Cannot allocate memory |
Previous Message | Bogdan Siara | 2025-05-07 11:00:18 | huge_pages=on cause could not map anonymous shared memory: Cannot allocate memory |