Clear base backup progress reporting on error

From: Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Clear base backup progress reporting on error
Date: 2026-06-26 08:58:50
Message-ID: EA1A6CD2-EFA6-462B-9A02-03003555AB4A@gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

While testing “[deb674454] Add backup_type column to pg_stat_progress_basebackup”, I didn’t find anything wrong with the feature itself, but I noticed a suspicious existing behavior of pg_stat_progress_basebackup. After a BASE_BACKUPcommand fails, pg_stat_progress_basebackup can still show the failed session.

See this simple repro:

1. Start a server

2. Under the data directory, create a unreadable file, that will cause base backup to fail:
```
% cd <data-dir>
% mkdir data
% touch data/zzz_unreadable
% chmod 000 data/zzz_unreadable
```

3. In psql, check pg_stat_progress_basebackup. It shows nothing:
```
evantest=# SELECT pid, phase, backup_type, backup_streamed > 0 AS streamed FROM pg_stat_progress_basebackup;
pid | phase | backup_type | streamed
-----+-------+-------------+----------
(0 rows)
```

4. Start a walsender, issue a BASE_BACKUP command, and the command should fail:
```
% psql 'dbname=postgres replication=database’
postgres=# BASE_BACKUP (checkpoint 'fast’);
<… omit some output …>
ERROR: could not open file "./data/zzz_unreadable": Permission denied
```

5. In psql, check pg_stat_progress_basebackup again:
```
evantest=# SELECT pid, phase, backup_type, backup_streamed > 0 AS streamed FROM pg_stat_progress_basebackup;
pid | phase | backup_type | streamed
-------+--------------------------+-------------+----------
50156 | streaming database files | full | t
(1 row)
```

It still shows the failed base backup session with phase “streaming database files”, which seems wrong.

After debugging, I think the problem is:

* bbsink_progress_new calls pgstat_progress_start_command.
* pgstat_progress_end_command is called by basebackup_progress_done, and basebackup_progress_done is only called by perform_base_backup after a backup succeeds.
* If a backup fails, the FINALLY clause calls bbsink_cleanup, which calls sink->bbs_ops->cleanup(sink). This eventually calls bbsink_forward_cleanup, so in the failure path, pgstat_progress_end_command is not called.

This patch makes the following changes:

* Add a new cleanup helper that calls pgstat_progress_end_command and bbsink_forward_cleanup.
* Change bbsink_progress_ops’s cleanup callback to point to the new helper.
* Remove the call to basebackup_progress_done from perform_base_backup.
* Since basebackup_progress_done has no other caller, remove it.

With this change, pgstat_progress_start_command and pgstat_progress_end_command are called in paired new/cleanup paths, which looks more consistent.

I don’t think this is a serious bug, because with the usual pg_basebackup command path, the client normally disconnects after the error, so the stale progress entry is not observable. It is mainly visible when the same replication connection stays open after the failed BASE_BACKUP command. So I feel it might not be worth adding a TAP test. But if others think differently, I will be happy to add one.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Attachment Content-Type Size
v1-0001-Clear-base-backup-progress-reporting-during-sink-.patch application/octet-stream 3.5 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Shinya Kato 2026-06-26 09:07:48 Re: Report oldest xmin source when autovacuum cannot remove tuples
Previous Message shveta malik 2026-06-26 08:53:04 Re: Support EXCEPT for ALL SEQUENCES publications