Re: tablesync patch broke the assumption that logical rep depends on?

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: tablesync patch broke the assumption that logical rep depends on?
Date: 2017-04-13 17:31:32
Message-ID: CAHGQGwH2-Vp5tfZjhdhGx_Acs7kdPdWawOGw-ZPTS9d0i3z5sw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Apr 14, 2017 at 1:28 AM, Peter Eisentraut
<peter(dot)eisentraut(at)2ndquadrant(dot)com> wrote:
> On 4/10/17 13:28, Fujii Masao wrote:
>> src/backend/replication/logical/launcher.c
>> * Worker started and attached to our shmem. This check is safe
>> * because only launcher ever starts the workers, so nobody can steal
>> * the worker slot.
>>
>> The tablesync patch enabled even worker to start another worker.
>> So the above assumption is not valid for now.
>>
>> This issue seems to cause the corner case where the launcher picks up
>> the same worker slot that previously-started worker has already picked
>> up to start another worker.
>
> I think what the comment should rather say is that workers are always
> started through logicalrep_worker_launch() and worker slots are always
> handed out while holding LogicalRepWorkerLock exclusively, so nobody can
> steal the worker slot.
>
> Does that make sense?

No, unless I'm missing something.

logicalrep_worker_launch() picks up an unused worker slot (one whose
proc == NULL) while holding LogicalRepWorkerLock. But it releases the lock
before the slot is marked as used (i.e., before the slot's proc is set to
non-NULL). Only later does the newly launched worker call
logicalrep_worker_attach() and mark the slot as used.

So if another logicalrep_worker_launch() runs after LogicalRepWorkerLock
is released but before the slot is marked as used, it can pick up the same
slot because that slot still looks unused.
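
To make the window concrete, here is a minimal single-threaded model of the
interleaving described above. It is only a sketch, not the code in
launcher.c: WorkerSlot, launch_pick_slot(), worker_attach() and the boolean
"lock" are hypothetical stand-ins, and the only property modelled is that
the slot search happens under the lock while proc is set later by the
attach step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WORKERS 4

/* Hypothetical stand-in for a worker slot; only "proc" matters here. */
typedef struct WorkerSlot
{
    const void *proc;           /* NULL means the slot looks unused */
} WorkerSlot;

static WorkerSlot slots[MAX_WORKERS];
static bool lock_held = false;  /* models LogicalRepWorkerLock */

static void lock_acquire(void) { assert(!lock_held); lock_held = true; }
static void lock_release(void) { assert(lock_held); lock_held = false; }

static void
reset_slots(void)
{
    for (int i = 0; i < MAX_WORKERS; i++)
        slots[i].proc = NULL;
}

/*
 * Models the slot search in the launch path: find a slot with proc == NULL
 * under the lock, then release the lock WITHOUT marking the slot as used.
 */
static int
launch_pick_slot(void)
{
    int found = -1;

    lock_acquire();
    for (int i = 0; i < MAX_WORKERS; i++)
    {
        if (slots[i].proc == NULL)
        {
            found = i;
            break;
        }
    }
    lock_release();
    return found;
}

/* Models the attach step: the new worker marks its slot as used, later. */
static void
worker_attach(int slot, const void *proc)
{
    lock_acquire();
    slots[slot].proc = proc;
    lock_release();
}

/* Two launches with no attach in between: both see the same free slot. */
static bool
two_launches_collide(void)
{
    reset_slots();
    int a = launch_pick_slot();     /* e.g. the launcher */
    int b = launch_pick_slot();     /* e.g. a worker starting a tablesync worker */
    return a >= 0 && a == b;
}

/* If the first worker attaches before the second launch, no collision. */
static bool
launches_separated_by_attach_collide(void)
{
    static const int first_proc = 1;    /* dummy non-NULL "proc" */

    reset_slots();
    int a = launch_pick_slot();
    worker_attach(a, &first_proc);
    int b = launch_pick_slot();
    return a == b;
}
```

In this model every individual step holds the lock, yet
two_launches_collide() still reports a collision, because the slot's
used/unused state is not changed inside the same critical section that
selects it.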

Regards,

--
Fujii Masao
