Trap errors from streaming child in pg_basebackup to exit early

From: Daniel Gustafsson <daniel(at)yesql(dot)se>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>
Subject: Trap errors from streaming child in pg_basebackup to exit early
Date: 2021-08-26 09:25:06
Message-ID: 0F69E282-97F9-4DB7-8D6D-F927AA6340C8@yesql.se
Lists: pgsql-hackers

When using pg_basebackup with WAL streaming (-X stream), we have observed a
number of times in production that the streaming child exited prematurely
(through no fault of the code it seems, most likely due to network middleboxes),
which causes the backup to fail, but only after it has run to completion. On
long-running backups this can consume a lot of time before it’s noticed.

By trapping the failure of the streaming process we can instead exit early to
allow the user to fix and/or restart the process.

The attached patch adds a SIGCHLD handler on Unix, and catches the return value
from the Windows thread, in order to break out early from the main loop. It
still needs a test, and proper testing on Windows, but early feedback on the
approach would be appreciated.
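
For readers without the attachment at hand, the Unix side of the idea can be
sketched roughly as below. This is a standalone illustration, not the code
from the patch: the names child_exited, sigchld_handler, sleep_ms and
run_main_loop are hypothetical, the forked child merely sleeps and exits with
an error to imitate a streaming process that dies, and the "work" in the loop
is a short sleep standing in for receiving backup data.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Set by the signal handler; hypothetical name, not taken from the patch. */
static volatile sig_atomic_t child_exited = 0;

/* Only record that a child terminated; the main loop acts on the flag. */
static void
sigchld_handler(int signo)
{
	(void) signo;
	child_exited = 1;
}

static void
sleep_ms(int ms)
{
	struct timespec ts;

	ts.tv_sec = ms / 1000;
	ts.tv_nsec = (long) (ms % 1000) * 1000000L;
	nanosleep(&ts, NULL);		/* may return early on a signal; that is fine */
}

/*
 * Stand-in for the backup main loop.  Forks a child that "streams" for
 * child_lifetime_ms and then exits with an error, and processes up to
 * max_chunks units of work, stopping early once the child is gone.
 * Returns the number of chunks processed, or -1 on setup failure.
 */
int
run_main_loop(int child_lifetime_ms, int max_chunks, bool *aborted)
{
	struct sigaction sa;
	pid_t		child;
	int			chunks = 0;

	assert(max_chunks >= 0 && aborted != NULL);
	*aborted = false;
	child_exited = 0;

	/* Install the handler before forking so no SIGCHLD can be missed. */
	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = sigchld_handler;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = SA_RESTART;
	if (sigaction(SIGCHLD, &sa, NULL) != 0)
		return -1;

	child = fork();
	if (child < 0)
		return -1;
	if (child == 0)
	{
		/* Child: pretend to stream for a while, then fail. */
		sleep_ms(child_lifetime_ms);
		_exit(1);
	}

	while (chunks < max_chunks)
	{
		if (child_exited)
		{
			/* Streaming child is gone: give up now, not at the end. */
			*aborted = true;
			break;
		}
		sleep_ms(10);			/* stand-in for receiving one chunk */
		chunks++;
	}

	/* Stop the child if it is still running, then reap it. */
	if (!child_exited)
		kill(child, SIGTERM);
	waitpid(child, NULL, 0);
	return chunks;
}
```

Calling run_main_loop(0, 100, &aborted) should set aborted and return well
before all 100 chunks are processed, while run_main_loop(5000, 3, &aborted)
should complete all three chunks, since the child outlives the loop. The
handler only sets a flag because very little is safe to do inside a signal
handler; the decision to abort is left to the main loop.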

--
Daniel Gustafsson https://vmware.com/

Attachment Content-Type Size
0001-Quick-exit-on-log-stream-child-exit-in-pg_basebackup.patch application/octet-stream 2.9 KB
