Re: Truncate in synchronous logical replication failed

From: Japin Li <japinli(at)hotmail(dot)com>
To: "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>
Cc: 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Truncate in synchronous logical replication failed
Date: 2021-04-14 02:38:13
Message-ID: MEYP282MB16691F27E96F23D6D4D5CDF5B64E9@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM
Lists: pgsql-hackers


On Tue, 13 Apr 2021 at 21:54, osumi(dot)takamichi(at)fujitsu(dot)com <osumi(dot)takamichi(at)fujitsu(dot)com> wrote:
> On Monday, April 12, 2021 3:58 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> On Mon, Apr 12, 2021 at 10:03 AM osumi(dot)takamichi(at)fujitsu(dot)com
>> <osumi(dot)takamichi(at)fujitsu(dot)com> wrote:
>> > but if we take a measure to fix the doc, we have to be careful with the
>> > description, because when we remove the primary keys of the 'test' tables in the
>> > scenario in [1], we don't have this issue.
>> > It means TRUNCATE in synchronous logical replication is not always
>> > blocked.
>> >
>>
>> The problem happens only when we try to fetch IDENTITY_KEY attributes
>> because pgoutput uses RelationGetIndexAttrBitmap() to get that information
>> which locks the required indexes. Now, because TRUNCATE has already
>> acquired an exclusive lock on the index, it seems to create a sort of deadlock
>> where the actual TRUNCATE operation waits for logical replication of the operation
>> to complete and logical replication waits for the actual TRUNCATE operation to finish.
>>
>> Do we really need to use RelationGetIndexAttrBitmap() to build IDENTITY_KEY
>> attributes? During decoding, we don't even lock the main relation, we just scan
>> the system table and build that information using a historic snapshot. Can't we
>> do something similar here?
> I think we can build the IDENTITY_KEY attributes with NoLock
> instead of calling RelationGetIndexAttrBitmap().
>
> When we trace back the caller side of logicalrep_write_attrs(),
> doing the thing equivalent to RelationGetIndexAttrBitmap()
> for INDEX_ATTR_BITMAP_IDENTITY_KEY impacts only pgoutput_truncate.
>
> OTOH, I can't find code similar to RelationGetIndexAttrBitmap()
> in the pgoutput_* functions or in relcache.c.
> Therefore, I'd like to discuss how to address the hang.
>
> My first idea is to extract the parts of RelationGetIndexAttrBitmap()
> that handle INDEX_ATTR_BITMAP_IDENTITY_KEY and implement them
> either in logicalrep_write_attrs() or as a new function.
> RelationGetIndexAttrBitmap() has a 'restart' label for a goto statement
> to ensure that it returns up-to-date attribute bitmaps, so
> I prefer having a new function if we choose this direction.
> Having that goto in logicalrep_write_attrs() would make it a little bit messy, I feel.
>
> The other direction might be to extend RelationGetIndexAttrBitmap()'s function definition
> to accept a lockmode, so that logicalrep_write_attrs() can pass NoLock.
> But this change affects several other callers, so it is not as good as the first direction above, I think.
>
> If someone has any better idea, please let me know.
>
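
To make the first idea a bit more concrete, here is a rough, standalone toy
model of a dedicated helper that derives the identity-key columns from
catalog-style rows alone, without opening (and therefore without locking) the
index. Everything in it is an assumption made for illustration: MockIndexRow,
MAX_INDEX_KEYS and identity_key_bitmap() are hypothetical names, not PostgreSQL
structures or APIs, and the 'restart' recheck mentioned above is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_INDEX_KEYS 32	/* hypothetical limit for this toy model */

/* Hypothetical stand-in for one catalog row describing an index. */
typedef struct MockIndexRow
{
	unsigned	index_oid;
	bool		is_replica_identity;
	int			nkeys;
	int			key_attnums[MAX_INDEX_KEYS];	/* 1-based; 0 means expression */
} MockIndexRow;

/*
 * Return a bitmask with bit (attnum - 1) set for each column of the
 * replica identity index.  Only the catalog-style rows are read; nothing
 * here opens or locks the index itself.
 */
uint64_t
identity_key_bitmap(const MockIndexRow *rows, size_t nrows)
{
	uint64_t	bitmap = 0;

	for (size_t i = 0; i < nrows; i++)
	{
		if (!rows[i].is_replica_identity)
			continue;

		for (int k = 0; k < rows[i].nkeys; k++)
		{
			int			attnum = rows[i].key_attnums[k];

			/* Skip expression columns and anything outside the mask. */
			if (attnum >= 1 && attnum <= 64)
				bitmap |= UINT64_C(1) << (attnum - 1);
		}
		break;					/* toy model: stop at the first identity index */
	}
	return bitmap;
}
```

With one non-identity index on column 3 and an identity index on columns 1
and 2, the helper returns a mask with only the two lowest bits set. A real
version would have to read this information from the catalogs and cope with
concurrent changes, which this sketch does not attempt.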

I think the first idea is better than the second. OTOH, could we release the
locks before SyncRepWaitForLSN(), since the commit has already been flushed to
the local WAL files by then?
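
The hang described above can be pictured as a cycle in a wait-for graph. The
following is only a toy model in plain C, not PostgreSQL code: WaitGraph,
has_deadlock() and truncate_wait_graph() are hypothetical names, and whether
the locks can really be released before the synchronous wait is exactly the
open question.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical participants in the hang. */
enum { BACKEND, WALSENDER, NPART };

/* waits_for[a][b] is true when participant a is blocked on participant b. */
typedef struct WaitGraph
{
	bool		waits_for[NPART][NPART];
} WaitGraph;

/* Depth-first search: can 'target' be reached by following wait edges? */
static bool
reaches(const WaitGraph *g, int from, int target, bool *seen)
{
	for (int next = 0; next < NPART; next++)
	{
		if (!g->waits_for[from][next])
			continue;
		if (next == target)
			return true;
		if (!seen[next])
		{
			seen[next] = true;
			if (reaches(g, next, target, seen))
				return true;
		}
	}
	return false;
}

/* A deadlock exists if any participant transitively waits for itself. */
bool
has_deadlock(const WaitGraph *g)
{
	for (int p = 0; p < NPART; p++)
	{
		bool		seen[NPART] = {false};

		if (reaches(g, p, p, seen))
			return true;
	}
	return false;
}

/* Model a TRUNCATE that commits under synchronous logical replication. */
WaitGraph
truncate_wait_graph(bool release_locks_before_sync_wait)
{
	WaitGraph	g;

	memset(&g, 0, sizeof(g));

	/* The backend waits for the walsender to confirm the commit. */
	g.waits_for[BACKEND][WALSENDER] = true;

	/* The walsender needs the index lock that the backend still holds. */
	if (!release_locks_before_sync_wait)
		g.waits_for[WALSENDER][BACKEND] = true;

	return g;
}
```

In this model, has_deadlock() reports a cycle when the locks are still held
during the wait and no cycle once they are released first. It says nothing
about whether releasing them that early is safe in the real commit path.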

--
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.
