Re: Truncate in synchronous logical replication failed

From: Japin Li <japinli(at)hotmail(dot)com>
To: Japin Li <japinli(at)hotmail(dot)com>
Cc: "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Truncate in synchronous logical replication failed
Date: 2021-04-10 14:52:10
Message-ID: MEYP282MB1669E53EDA1B6180182BDE92B6729@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM
Lists: pgsql-hackers


On Thu, 08 Apr 2021 at 19:20, Japin Li <japinli(at)hotmail(dot)com> wrote:
> On Wed, 07 Apr 2021 at 16:34, tanghy(dot)fnst(at)fujitsu(dot)com <tanghy(dot)fnst(at)fujitsu(dot)com> wrote:
>> On Wednesday, April 7, 2021 5:28 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote
>>
>>>Can you please check if the behavior is the same for PG-13? This is
>>>just to ensure that we have not introduced any bug in PG-14.
>>
>> Yes, same failure happens at PG-13, too.
>>
>
> I found that when we truncate a table in synchronous logical replication,
> LockAcquireExtended() [1] tries to take the lock via the fast path and
> fails (FastPathStrongRelationLocks->count[fasthashcode] = 1).
> However, it can acquire the lock in asynchronous logical replication.
> I'm not familiar with the locking code; any suggestions? What is the
> difference between sync and async logical replication with respect to locks?
>

After some analysis, I found that when the TRUNCATE finishes, it calls
SyncRepWaitForLSN(). For asynchronous logical replication, that function
exits early, and the backend then calls
ResourceOwnerRelease(RESOURCE_RELEASE_LOCKS) to release the locks, so the
walsender can acquire the lock.

But for synchronous logical replication, SyncRepWaitForLSN() waits for
the specified LSN to be confirmed, so the backend cannot release the lock
yet. Meanwhile, the walsender tries to acquire the same lock. It cannot,
because the lock is still held by the process that performed the TRUNCATE
command, and that process is in turn waiting for the walsender. This is
why TRUNCATE in synchronous logical replication is blocked.
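
To make the ordering above concrete, here is a toy model in Python. It is
not PostgreSQL code, only a sketch under assumptions: the names
run_truncate, backend and walsender are made up for illustration, a
threading.Lock stands in for the relation lock, and timeouts are used so
the script terminates (in the real case the wait has no timeout).

```python
import threading


def run_truncate(synchronous: bool, timeout: float = 0.5) -> bool:
    """Toy model: return True if the simulated walsender got the table lock."""
    table_lock = threading.Lock()   # stands in for the lock taken by TRUNCATE
    committed = threading.Event()   # the commit record is available to decode
    confirmed = threading.Event()   # the walsender has confirmed the LSN
    result = {"walsender_got_lock": False}

    def backend() -> None:
        table_lock.acquire()
        committed.set()
        if synchronous:
            # Models SyncRepWaitForLSN(): wait for confirmation while the
            # lock is still held. The timeout exists only to end the demo.
            confirmed.wait(timeout)
        # Models ResourceOwnerRelease(RESOURCE_RELEASE_LOCKS).
        table_lock.release()

    def walsender() -> None:
        committed.wait()
        # The walsender needs the same lock before it can send the change.
        if table_lock.acquire(timeout=timeout / 2):
            result["walsender_got_lock"] = True
            table_lock.release()
            confirmed.set()

    threads = [threading.Thread(target=backend),
               threading.Thread(target=walsender)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["walsender_got_lock"]


if __name__ == "__main__":
    print("async:", run_truncate(synchronous=False))
    print("sync: ", run_truncate(synchronous=True))
```

In the asynchronous run the backend releases the lock right away, so the
walsender gets it. In the synchronous run each side waits for the other
until the demo timeouts expire.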

I don't know if it makes sense to fix this. If so, how should we fix it?
Thoughts?

--
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.
