RE: Truncate in synchronous logical replication failed

From: "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>
To: 'Japin Li' <japinli(at)hotmail(dot)com>
Cc: 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Truncate in synchronous logical replication failed
Date: 2021-04-14 08:22:25
Message-ID: OSBPR01MB4888A5294403523B136282BCED4E9@OSBPR01MB4888.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wednesday, April 14, 2021 11:38 AM Japin Li <japinli(at)hotmail(dot)com> wrote:
> On Tue, 13 Apr 2021 at 21:54, osumi(dot)takamichi(at)fujitsu(dot)com
> <osumi(dot)takamichi(at)fujitsu(dot)com> wrote:
> > On Monday, April 12, 2021 3:58 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
> >> On Mon, Apr 12, 2021 at 10:03 AM osumi(dot)takamichi(at)fujitsu(dot)com
> >> <osumi(dot)takamichi(at)fujitsu(dot)com> wrote:
> >> > but if we take a measure to fix the doc, we have to be careful for
> >> > the description, because when we remove the primary keys of 'test'
> >> > tables on the
> >> scenario in [1], we don't have this issue.
> >> > It means TRUNCATE in synchronous logical replication is not always
> >> blocked.
> >> >
> >>
> >> The problem happens only when we try to fetch IDENTITY_KEY attributes
> >> because pgoutput uses RelationGetIndexAttrBitmap() to get that
> >> information which locks the required indexes. Now, because TRUNCATE
> >> has already acquired an exclusive lock on the index, it seems to
> >> create a sort of deadlock where the actual Truncate operation waits
> >> for logical replication of operation to complete and logical replication waits
> for actual Truncate operation to finish.
> >>
> >> Do we really need to use RelationGetIndexAttrBitmap() to build
> >> IDENTITY_KEY attributes? During decoding, we don't even lock the main
> >> relation, we just scan the system table and build that information
> >> using a historic snapshot. Can't we do something similar here?
> > I think we can build the IDENTITY_KEY attributes with NoLock instead
> > of calling RelationGetIndexAttrBitmap().
> >
> > When we trace back the caller side of logicalrep_write_attrs(), doing
> > the thing equivalent to RelationGetIndexAttrBitmap() for
> > INDEX_ATTR_BITMAP_IDENTITY_KEY impacts only pgoutput_truncate.
> >
> > OTOH, I can't find codes similar to RelationGetIndexAttrBitmap() in
> > pgoutput_* functions and in the file of relcache.c.
> > Therefore, I'd like to discuss how to address the hang.
> >
> > My first idea is to extract some parts of RelationGetIndexAttrBitmap()
> > only for INDEX_ATTR_BITMAP_IDENTITY_KEY and implement those either
> in
> > a logicalrep_write_attrs() or as a new function.
> > RelationGetIndexAttrBitmap() has 'restart' label for goto statement in
> > order to ensure to return up-to-date attribute bitmaps, so I prefer
> > having a new function when we choose this direction.
> > Having that goto in logicalrep_write_attrs() makes it a little bit messy, I felt.
> >
> > The other direction might be to extend RelationGetIndexAttrBitmap's
> > function definition to accept lockmode to give NoLock from
> logicalrep_write_attrs().
> > But, this change impacts on other several callers so is not as good as the first
> direction above, I think.
> >
> > If someone has any better idea, please let me know.
> >
>
> I think the first idea is better than the second. OTOH, can we release the locks
> before SyncRepWaitForLSN(), since it already flush to local WAL files.
Thank you for your comments.
I didn't mean to change and touch TRUNCATE side to release the locks,
because I expected that its AccessExclusiveLock
protects any other operation (e.g. DROP INDEX) to the table
which affects IDENTITY KEY building. But, now as I said in another e-mail,
both ideas above can't work. Really sorry for making noises.

Best Regards,
Takamichi Osumi

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Dave Page 2021-04-14 08:40:06 Re: More sepgsql weirdness
Previous Message Michael Paquier 2021-04-14 08:13:30 File truncation within PostgresNode::issues_sql_like() wrong on Windows