From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | Michael Paquier <michael(dot)paquier(at)gmail(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Assertion failure when the non-exclusive pg_stop_backup aborted. |
Date: | 2017-09-22 02:34:05 |
Message-ID: | CAD21AoA+isXYL1_ZXMnk9xJhYEL5h6rxJtTovLi7fumqfmCYgg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Sep 21, 2017 at 5:56 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
>
> Yep, but the deficiency is caused by the use before_shmem_exit() in
> the SQL-level functions present in 9.6~ which make the cleanup happen
> potentially twice. If you look at the code of basebackup.c, you would
> notice that the cleanups of the counters only happen within the
> PG_ENSURE_ERROR_CLEANUP() blocks, so a backpatch to 9.6 should be
> enough.
Thank you for explaining. I understood and agreed..
On Fri, Sep 22, 2017 at 8:02 AM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Thu, Sep 21, 2017 at 5:56 PM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
>> On Thu, Sep 21, 2017 at 4:40 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>> On Thu, Sep 21, 2017 at 2:25 PM, Michael Paquier
>>> <michael(dot)paquier(at)gmail(dot)com> wrote:
>>>> +- Assert(XLogCtl->Insert.nonExclusiveBackups >= 0);
>>>> + if (XLogCtl->Insert.nonExclusiveBackups > 0)
>>>> + XLogCtl->Insert.nonExclusiveBackups--;
>>>> Hm, no, I don't agree. I think that instead you should just leave
>>>> do_pg_abort_backup() immediately if sessionBackupState is set to
>>>> SESSION_BACKUP_NONE. This variable is the link between the global
>>>> counters and the session stopping the backup so I don't think that we
>>>> should touch this assertion of this counter. I think that this method
>>>> would be safe as well for backup start as pg_start_backup_callback
>>>> takes care of any cleanup. Also because the counters are incremented
>>>> before entering in the PG_ENSURE_ERROR_CLEANUP block, and
>>>> sessionBackupState is updated just after leaving the block.
>>>
>>> I think that the assertion failure still can happen if the process
>>> aborts after decremented the counter and before setting to
>>> SESSION_BACKUP_NONE. Am I missing something?
>>
>> The process would stop at the next CHECK_FOR_INTERRUPTS() and trigger
>> the cleanup at this moment. So this happens when waiting for the
>> archives to be done, and the session flag is set to NONE at this
>> point.
>
> And actually, with two non-exclusive backups taken in parallel, it is
> still possible to fail on another assertion in do_pg_stop_backup()
> even with your patch. Imagine the following:
> 1) Two backups are taken, counter for non-exclusive backups is set at 2.
> 2) One backup is stopped, then interrupted. This causes the counter to
> be decremented twice, once in do_pg_stop_backup, and once when
> aborting. Counter is at 0, switching as well forcePageWrites to
> false..
> 3) The second backup stops, a new assertion failure is triggered.
> Without assertions the counter would get at -1.
You're right. I updated the patch so that it exits do_pg_abort_backup
if the state is NONE and setting the state to NONE in
do_pg_stop_backup before releasing the WAL insert lock.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Attachment | Content-Type | Size |
---|---|---|
fix_do_pg_abort_backup_v2.patch | application/octet-stream | 1.2 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Thomas Munro | 2017-09-22 02:37:38 | Re: [HACKERS] Re: pgsql: Make new crash restart test a bit more robust. |
Previous Message | Kyotaro HORIGUCHI | 2017-09-22 02:19:52 | Re: hash index on unlogged tables doesn't behave as expected |