Re: Problem with synchronous replication

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: "lingce(dot)ldm" <lingce(dot)ldm(at)alibaba-inc(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Problem with synchronous replication
Date: 2019-10-30 08:21:17
Message-ID: CAHGQGwFKC89NhV2Ab=VRUzYZHZKfZkQLBr=FVQ2QDNEN7S3cnA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Oct 30, 2019 at 4:16 PM lingce.ldm <lingce(dot)ldm(at)alibaba-inc(dot)com> wrote:
>
> On Oct 29, 2019, at 18:50, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
>
>
> Hello.
>
> At Fri, 25 Oct 2019 15:18:34 +0800, "Dongming Liu" <lingce(dot)ldm(at)alibaba-inc(dot)com> wrote in
>
>
> Hi,
>
> I recently discovered two possible bugs in synchronous replication.
>
> 1. SyncRepCleanupAtProcExit may delete an element that has already been deleted
> SyncRepCleanupAtProcExit first checks whether the queue element is detached; if it is not,
> it acquires SyncRepLock and deletes the element. If the walsender has already deleted the
> element in the meantime, it is deleted a second time and SHMQueueDelete crashes with a segmentation fault.
>
> IMO, like SyncRepCancelWait, we should lock the SyncRepLock first and then check
> whether the queue is detached or not.
>
>
> I think you're right here.

This change makes every exiting backend take the exclusive lock,
even when it is not in the SyncRep queue. Might this be problematic, for example,
when many backends are terminated at the same time? If so,
it might be better to keep the unlocked SHMQueueIsDetached() check and
check again after taking the lock. That is,

    if (!SHMQueueIsDetached(&(MyProc->syncRepLinks)))
    {
        LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
        if (!SHMQueueIsDetached(&(MyProc->syncRepLinks)))
            SHMQueueDelete(&(MyProc->syncRepLinks));
        LWLockRelease(SyncRepLock);
    }
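
For anyone who wants to try the check / lock / re-check pattern outside the
backend, here is a minimal standalone sketch. It is not PostgreSQL code: the
QueueLink type, the queue_* and link_* helpers, and the pthread mutex standing
in for SyncRepLock are all hypothetical names made up for illustration. The
assert in link_delete() plays the role of the failure seen when an element is
deleted twice.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the queue links kept in each PGPROC. */
typedef struct QueueLink
{
    struct QueueLink *prev;
    struct QueueLink *next;
} QueueLink;

/* Hypothetical stand-in for SyncRepLock. */
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;

/* An empty circular queue points at itself. */
static void queue_init(QueueLink *head) { head->prev = head->next = head; }

/* A detached element has NULL links. */
static void link_init(QueueLink *e) { e->prev = e->next = NULL; }
static bool link_is_detached(const QueueLink *e) { return e->prev == NULL; }

/* Insert e right after head. Caller must hold queue_lock if shared. */
static void queue_insert(QueueLink *head, QueueLink *e)
{
    e->next = head->next;
    e->prev = head;
    head->next->prev = e;
    head->next = e;
}

/* Unlink e. Deleting an already detached element is a bug. */
static void link_delete(QueueLink *e)
{
    assert(!link_is_detached(e));
    e->prev->next = e->next;
    e->next->prev = e->prev;
    e->prev = e->next = NULL;
}

/*
 * Exit-time cleanup. The first, unlocked check is only a cheap hint that
 * lets processes which are not queued skip the lock entirely. The second
 * check, made while holding the lock, is the one that decides whether to
 * delete. Returns true if this call removed the element.
 */
static bool cleanup_at_exit(QueueLink *e)
{
    bool removed = false;

    if (!link_is_detached(e))
    {
        pthread_mutex_lock(&queue_lock);
        if (!link_is_detached(e))
        {
            link_delete(e);
            removed = true;
        }
        pthread_mutex_unlock(&queue_lock);
    }
    return removed;
}
```

Calling cleanup_at_exit() on an element that was never queued returns false
without touching the lock, and calling it twice on a queued element removes it
only once. Whether the unlocked read is safe in the real code depends on how
syncRepLinks is updated by other processes, which this sketch does not model.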

Regards,

--
Fujii Masao
