Re: Backend crash on non-exclusive backup cancel

From: David Steele <david(at)pgmasters(dot)net>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: PostgreSQL mailing lists <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: Backend crash on non-exclusive backup cancel
Date: 2017-02-28 13:21:18
Message-ID: 9e95bb2a-c767-df8d-94b2-8da9b05d9a26@pgmasters.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

On 2/27/17 10:05 PM, Michael Paquier wrote:
> On Tue, Feb 28, 2017 at 10:33 AM, David Steele <david(at)pgmasters(dot)net> wrote:
>>
>> I believe the solution is to move the exclusive flag to xlog.c and only
>> decrement XLogCtl->Insert.nonExclusiveBackups when exclusive is true,
>> otherwise return an error. Even then, it wouldn't be clear if the
>> backup had completed or not.
>
> I understand by this sentence that you mean
> nonexclusive_backup_running, and I think that I agree with that.

Whoops! Yes, I meant the nonexclusive_backup_running flag.

> We
> need a way to reset it properly the session-level switch in case of an
> interrupt found, and that needs visibly to happen in
> pg_stop_backup_callback and pg_start_backup_callback(). At the same
> time I think that exclusive_backup_running had better be moved to
> xlog.c as well. If I look at the failure in details when issuing a
> cancel,

Agreed.

> I can see that XLogCtl->Insert.nonExclusiveBackups gets
> decremented at the end do_pg_stop_backup, but
> nonexclusive_backup_running never gets set back to false because of
> the query cancellation.

Exactly.

>> I suppose any cancelled non-exclusive
>> pg_stop_backup() should be considered aborted whether a stop backup
>> record was written or not?
>
> That's not necessarily true, I can see a stop backup able to finish as
> well by issuing a cancel request. It seems to me that we just need to
> have the shmem information updated at the same time as the
> session-level switches for consistency and we're good. The
> inconsistency in places when updating the session-level flags and the
> shmem-level flags is what is causing harm.

I'm sure this could be done but it will require quite a bit of
refactoring and I'm not sure that it's worth it. In my mind it would be
enough to document that cancelled backups should be considered invalid
whether the stop backup record was written or not. However, I'm willing
to go with the majority opinion.

>> If that makes sense I'm happy to work up a patch. This is definitely an
>> edge case and I seriously doubt it is causing any issues in the field.
>
> <...>
>
> David, are you willing to write a patch or should I? Changing
> nonexclusive_backup_running/exclusive_backup_running to be an enum
> would make the code more readable as well IMO.

I would like to see if anyone else weighs in on this first, but yes I am
planning to write a patch. Agreed on the enum.

--
-David
david(at)pgmasters(dot)net

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Stephen Frost 2017-02-28 15:53:55 Re: BUG #14543: libpq fails with group readable ssl keys
Previous Message Magnus Hagander 2017-02-28 11:15:30 Re: BUG #14543: libpq fails with group readable ssl keys

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2017-02-28 13:23:05 Re: avoid bloat from CREATE INDEX CONCURRENTLY
Previous Message Simon Riggs 2017-02-28 13:19:12 Avoiding bloat in CIC