Re: Truncate in synchronous logical replication failed

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>
Cc: "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Truncate in synchronous logical replication failed
Date: 2021-04-15 10:22:52
Message-ID: CAA4eK1JfDjgr1YckE4Z9=Xwk48hqb58PBP7+4_Ehdi7ja+MRvg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Apr 14, 2021 at 3:31 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Apr 13, 2021 at 8:07 PM Petr Jelinek
> <petr(dot)jelinek(at)enterprisedb(dot)com> wrote:
> >
> > > On 12 Apr 2021, at 08:58, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > The problem happens only when we try to fetch IDENTITY_KEY attributes
> > > because pgoutput uses RelationGetIndexAttrBitmap() to get that
> > > information which locks the required indexes. Now, because TRUNCATE
> > > has already acquired an exclusive lock on the index, it seems to
> > > create a sort of deadlock where the actual Truncate operation waits
> > > for logical replication of operation to complete and logical
> > > replication waits for actual Truncate operation to finish.
> > >
> > > Do we really need to use RelationGetIndexAttrBitmap() to build
> > > IDENTITY_KEY attributes? During decoding, we don't even lock the main
> > > relation, we just scan the system table and build that information
> > > using a historic snapshot. Can't we do something similar here?
> > >
> > > Adding Petr J. and Peter E. to know their views as this seems to be an
> > > old problem (since the decoding of Truncate operation is introduced).
> >
> > We used RelationGetIndexAttrBitmap because it already existed, no other reason.
> >
>
> Fair enough. But I think we should do something about it because using
> the same (RelationGetIndexAttrBitmap) just breaks the synchronous
> logical replication. I think this is broken since the logical
> replication of Truncate is supported.
>
> > I am not sure what exact locking we need but I would have guessed at least AccessShareLock would be needed.
> >
>
> Are you telling that we need AccessShareLock on the index? If so, why
> is it different from how we access the relation during decoding
> (basically in ReorderBufferProcessTXN, we directly use
> RelationIdGetRelation() without any lock on the relation)? I think we
> do it that way because we need it to process WAL entries and we need
> the relation state based on the historic snapshot, so, even if the
> relation is later changed/dropped, we are fine with the old state we
> got. Isn't the same thing applies here in logicalrep_write_attrs? If
> that is true then some equivalent of RelationGetIndexAttrBitmap where
> we use RelationIdGetRelation instead of index_open should work?
>

Today, again I have thought about this and don't see a problem with
the above idea. If the above understanding is correct, then I think
for our purpose in pgoutput, we just need to call RelationGetIndexList
and then build the idattr list for relation->rd_replidindex. We can
then cache it in relation->rd_idattr. I am not sure if it is really
required to do all the other work in RelationGetIndexAttrBitmap.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Japin Li 2021-04-15 10:30:14 Forget close an open relation in ReorderBufferProcessTXN()
Previous Message Luc Vlaming 2021-04-15 09:33:56 Re: "could not find pathkey item to sort" for TPC-DS queries 94-96