From: | David Steele <david(at)pgmasters(dot)net> |
---|---|
To: | Michael Paquier <michael(dot)paquier(at)gmail(dot)com> |
Cc: | PostgreSQL mailing lists <pgsql-bugs(at)postgresql(dot)org> |
Subject: | Re: Backend crash on non-exclusive backup cancel |
Date: | 2017-03-01 03:44:53 |
Message-ID: | 6a8f9d7e-ce4e-d850-12d3-7787815447ea@pgmasters.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs pgsql-hackers |
On 2/28/17 9:08 PM, Michael Paquier wrote:
> On Tue, Feb 28, 2017 at 10:21 PM, David Steele <david(at)pgmasters(dot)net> wrote:
>> On 2/27/17 10:05 PM, Michael Paquier wrote:
>>> That's not necessarily true, I can see a stop backup able to finish as
>>> well by issuing a cancel request. It seems to me that we just need to
>>> have the shmem information updated at the same time as the
>>> session-level switches for consistency and we're good. The
>>> inconsistency in places when updating the session-level flags and the
>>> shmem-level flags is what is causing harm.
>>
>> I'm sure this could be done but it will require quite a bit of
>> refactoring and I'm not sure that it's worth it. In my mind it would be
>> enough to document that cancelled backups should be considered invalid
>> whether the stop backup record was written or not. However, I'm willing
>> to go with the majority opinion.
>
> I would think that addressing the problem is the way to go, hitting an
> assertion in this code path means that we are doing something wrong so
> simply removing it does not sound like a correct answer to me.
>
>>>> If that makes sense I'm happy to work up a patch. This is definitely an
>>>> edge case and I seriously doubt it is causing any issues in the field.
>>>
>>> <...>
>>>
>>> David, are you willing to write a patch or should I? Changing
>>> nonexclusive_backup_running/exclusive_backup_running to be an enum
>>> would make the code more readable as well IMO.
>>
>> I would like to see if anyone else weighs in on this first, but yes I am
>> planning to write a patch. Agreed on the enum.
>
> This was itching me yesterday so I wrote a patch able to address this
> problem...
I know that feeling.
> At least it can be registered to the CF on time and give it
> more visibility.
Indeed.
> By the way, I have noticed a second bug in the
> current logic while looking at the problem you have reported:
> 1) Start backup in session 1:
> =# select pg_start_backup('popo');
> pg_start_backup
> -----------------
> 0/2000028
> (1 row)
> 2) pg_stop_backup() in session 2
> 3) And now when trying to work on backups with session 1:
> =# select pg_start_backup('popo');
> ERROR: 55000: a backup is already in progress in this session
> LOCATION: pg_start_backup, xlogfuncs.c:87
> =# select pg_stop_backup();
> ERROR: 55000: exclusive backup not in progress
> LOCATION: do_pg_stop_backup, xlog.c:10642
Yeah, that doesn't look good.
> So the handling around exclusive_backup_running is broken now as it is
> possible to lock a session easily when handling backups. And the root
> of the problem is that checks on exclusive_backup_running are not
> necessary. I have fixed as well this problem as per the attached. Note
> however that SESSION_BACKUP_EXCLUSIVE still makes sense in the patch
> attached when doing checks with pg_stop_backup_v2() for a
> non-exclusive backup run. It is also useful for debugging.
Excellent! I'll have a look, and thanks for working up a patch.
--
-David
david(at)pgmasters(dot)net
From | Date | Subject | |
---|---|---|---|
Next Message | rick | 2017-03-01 16:08:41 | BUG #14572: pgadmin restore command path error |
Previous Message | Michael Paquier | 2017-03-01 02:08:55 | Re: Backend crash on non-exclusive backup cancel |
From | Date | Subject | |
---|---|---|---|
Next Message | Pavel Stehule | 2017-03-01 03:45:53 | Re: SQL/JSON in PostgreSQL |
Previous Message | Kuntal Ghosh | 2017-03-01 03:43:15 | Re: Performance degradation in TPC-H Q18 |