Backend crash on non-exclusive backup cancel

From: David Steele <david(at)pgmasters(dot)net>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Backend crash on non-exclusive backup cancel
Date: 2017-02-28 01:33:34
Lists: pgsql-bugs pgsql-hackers

I found this issue while working on a pg_stop_backup() patch. If a
non-exclusive pg_stop_backup() is cancelled and then attempted again, the
backend crashes on an assertion:

$ test/pg/bin/psql
psql (10devel)
Type "help" for help.

postgres=# select * from pg_start_backup('label', true, false);
(1 row)

postgres=# select * from pg_stop_backup(false);
^CCancel request sent
ERROR: canceling statement due to user request
postgres=# select * from pg_stop_backup(false);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!> \q

From the server log:

2017-02-28 01:21:34.755 UTC STATEMENT: select * from pg_stop_backup(false);
TRAP: FailedAssertion("!(XLogCtl->Insert.nonExclusiveBackups > 0)",
File: "/postgres/src/backend/access/transam/xlog.c", Line: 10723)

This error was produced in master at 30df93f. Configure settings are
--enable-cassert --enable-tap-tests --with-openssl.

Disabling assertions "works", but there is still a problem. A backend
that keeps cancelling pg_stop_backup() without ever resetting the
exclusive flag in xlogfuncs.c can decrement the shared variable
XLogCtl->Insert.nonExclusiveBackups as many times as it wants. As far
as I can see, the worst that will happen is that
XLogCtl->Insert.forcePageWrites won't get set back to false, but that's
still a bug.
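
The failure mode described above can be modelled in a few lines of
standalone C. This is only a toy sketch, not the code in xlog.c: the
struct and function names (ModelShared, ModelBackend, model_start_backup,
model_stop_backup) and the MODEL_CASSERT macro are hypothetical, and the
"cancel" argument stands in for a query cancel arriving while the stop
is waiting. It assumes the counter is decremented before that wait and
that the per-backend flag is only cleared on successful completion.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the shared state; field names borrowed from the report. */
typedef struct
{
    int  nonExclusiveBackups;
    bool forcePageWrites;
} ModelShared;

/* Hypothetical per-backend flag kept outside the shared state. */
typedef struct
{
    bool backup_running;
} ModelBackend;

void
model_start_backup(ModelShared *s, ModelBackend *b)
{
    s->nonExclusiveBackups++;
    s->forcePageWrites = true;
    b->backup_running = true;
}

/* Returns true if the stop completed, false if it was refused or cancelled. */
bool
model_stop_backup(ModelShared *s, ModelBackend *b, bool cancel)
{
    if (!b->backup_running)
        return false;

#ifdef MODEL_CASSERT
    /* Only checked in an assert-enabled build of this model. */
    assert(s->nonExclusiveBackups > 0);
#endif
    s->nonExclusiveBackups--;
    if (s->nonExclusiveBackups == 0)
        s->forcePageWrites = false;

    if (cancel)
        return false;           /* assumption: flag is left set on cancel */

    b->backup_running = false;
    return true;
}
```

In this model, one start followed by two cancelled stops leaves the
counter at -1. A later start/stop pair from another backend then brings
it to 0 and back to -1, so forcePageWrites is never reset to false.
Compiling with -DMODEL_CASSERT makes the second stop trip the assertion
instead.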

This condition should throw "backup is not in progress" just as an
exclusive backup would, whether assertions are enabled or not.

I believe the solution is to move the exclusive flag to xlog.c and only
decrement XLogCtl->Insert.nonExclusiveBackups when exclusive is true,
otherwise return an error. Even then, it wouldn't be clear if the
backup had completed or not. I suppose any cancelled non-exclusive
pg_stop_backup() should be considered aborted whether a stop backup
record was written or not?
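
One possible reading of that proposal, again as a standalone toy sketch
rather than a patch: the flag is checked and cleared in the same place
where the counter is decremented, so a repeated stop gets an error
instead of a second decrement, and a cancelled stop counts as aborted.
All names here (FixedShared, FixedBackend, FixedStopResult,
fixed_start_backup, fixed_stop_backup) are hypothetical, and the
"cancel" argument again stands in for a query cancel during the wait.

```c
#include <stdbool.h>

typedef enum
{
    FIXED_STOP_OK,
    FIXED_STOP_CANCELLED,       /* backup is considered aborted */
    FIXED_STOP_NOT_IN_PROGRESS  /* maps to "backup is not in progress" */
} FixedStopResult;

/* Toy stand-in for the shared state; field names borrowed from the report. */
typedef struct
{
    int  nonExclusiveBackups;
    bool forcePageWrites;
} FixedShared;

/* Hypothetical per-backend flag, now managed next to the counter. */
typedef struct
{
    bool backup_running;
} FixedBackend;

void
fixed_start_backup(FixedShared *s, FixedBackend *b)
{
    s->nonExclusiveBackups++;
    s->forcePageWrites = true;
    b->backup_running = true;
}

FixedStopResult
fixed_stop_backup(FixedShared *s, FixedBackend *b, bool cancel)
{
    /* Refuse the call outright if this backend has no backup running. */
    if (!b->backup_running)
        return FIXED_STOP_NOT_IN_PROGRESS;

    /* Clear the flag together with the decrement, before any wait. */
    b->backup_running = false;
    s->nonExclusiveBackups--;
    if (s->nonExclusiveBackups == 0)
        s->forcePageWrites = false;

    return cancel ? FIXED_STOP_CANCELLED : FIXED_STOP_OK;
}
```

With this ordering, a cancelled stop followed by a second stop returns
FIXED_STOP_NOT_IN_PROGRESS and the counter stays at 0. The open question
from the paragraph above remains: the caller cannot tell from the
cancelled result whether a stop backup record was written.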

If that makes sense, I'm happy to work up a patch. This is definitely an
edge case, and I seriously doubt it is causing any issues in the field.


