RE: Truncate in synchronous logical replication failed

From: "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>
To: 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>
Cc: Ajin Cherian <itsajin(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Truncate in synchronous logical replication failed
Date: 2021-04-23 09:03:02
Message-ID: OSBPR01MB48888CA6F58CCE11ABD1667AED459@OSBPR01MB4888.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

On Friday, April 23, 2021 3:43 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Fri, Apr 23, 2021 at 12:04 PM osumi(dot)takamichi(at)fujitsu(dot)com
> <osumi(dot)takamichi(at)fujitsu(dot)com> wrote:
> >
> > On Thursday, April 22, 2021 6:33 PM Amit Kapila
> <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > On Tue, Apr 20, 2021 at 9:00 AM osumi(dot)takamichi(at)fujitsu(dot)com
> > > <osumi(dot)takamichi(at)fujitsu(dot)com> wrote:
> > > >
> > > > On Tuesday, April 20, 2021 10:53 AM Ajin Cherian
> > > > <itsajin(at)gmail(dot)com>
> > > wrote:
> > > > >
> > > > > I reviewed the patch, ran make check, no issues. One minor
> comment:
> > > > >
> > > > > Could you add the comment similar to
> > > > > RelationGetIndexAttrBitmap() on why the redo, it's not very
> > > > > obvious to someone reading the code, why we are refetching the
> index list here.
> > > > >
> > > > > + /* Check if we need to redo */
> > > > >
> > > > > + newindexoidlist = RelationGetIndexList(relation);
> > > > Yeah, makes sense. Fixed.
> > >
> > > I don't think here we need to restart to get a stable list of
> > > indexes as we do in RelationGetIndexAttrBitmap. The reason is here
> > > we build the cache entry using a historic snapshot and all the later
> > > changes are absorbed while decoding WAL.
> > I rechecked this and I agree with this.
> > I don't see any problem to remove the redo check.
> > Based on this direction, I'll share my several minor comments.
> >
> > [1] a typo of RelationGetIdentityKeyBitmap's comment
> >
> > + * This is a special purpose function used during logical
> > + replication. Here,
> > + * unlike RelationGetIndexAttrBitmap(), we don't a acquire lock on
> > + the required
> >
> > We have "a" in an unnatural place. It should be "we don't acquire...".
> >
> > [2] suggestion to fix RelationGetIdentityKeyBitmap's comment
> >
> > + * later changes are absorbed while decoding WAL. Due to this reason,
> > + we don't
> > + * need to retry here in case of a change in the set of indexes.
> >
> > I think it's better to use "Because of" instead of "Due to".
> > This patch works to solve the deadlock.
> >
>
> I am not sure which one is better. I would like to keep it as it is unless you feel
> strongly about point 2.
>
> > [3] wrong comment in RelationGetIdentityKeyBitmap
> >
> > + /* Save some values to compare after building attributes */
> >
> > I wrote this comment for the redo check that has been removed already.
> > We can delete it.
> >
> > [4] suggestion to remove local variable relreplindex in
> > RelationGetIdentityKeyBitmap
> >
> > I think we don't need relreplindex.
> > We can just pass relation->rd_replidindex to RelationIdGetRelation().
> > There are no other references to the variable.
> >
> > [5] suggestion to fix the place to release indexoidlist
> >
> > I thought we could do its list_free() earlier, after checking if there
> > are no indexes.
> >
>
> Hmm, why? If there are no indexes then we wouldn't have allocated any
> memory.
>
> > [6] suggestion to remove period in one comment.
> >
> > + /* Quick exit if we already computed the result. */
> >
> > This comes from RelationGetIndexAttrBitmap (and my posted versions)
> > but I think we can remove the period for consistency with the
> > other comments in the function.
> >
>
> Can you please update the patch except for the two points to which I
> responded above?
I've updated the patch to v6, incorporating the accepted comments above into v5.

Just in case, I ran make check-world again, re-ran the clobber_cache_always
mode tests against v6, and ran 010_truncate.pl in a tight loop 100 times.
All of them passed with no failures.
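As background for readers of the archive: the redo check that the thread concludes is unnecessary here is the retry pattern RelationGetIndexAttrBitmap() uses, where the index list is fetched again after the computation and the work is redone if the set changed concurrently. The sketch below illustrates that pattern in a self-contained way; all names (get_index_list, compute_with_retry, the simulated concurrent change) are hypothetical stand-ins, not PostgreSQL code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for RelationGetIndexList(): a global array
 * simulates the relcache index list, and fetch_count lets us inject a
 * "concurrent CREATE INDEX" between the first and second fetch. */
static int current_indexes[4] = {100, 101};
static int current_nindexes = 2;
static int fetch_count = 0;

static int
get_index_list(int *out)
{
    fetch_count++;
    if (fetch_count == 2)               /* simulate a concurrent change */
        current_indexes[current_nindexes++] = 102;
    memcpy(out, current_indexes, sizeof(int) * current_nindexes);
    return current_nindexes;
}

/* The redo loop: fetch the index list, do the (expensive) computation,
 * re-fetch, and compare; if the set changed underneath us, start over.
 * The thread's point is that a function building its result under a
 * historic snapshot, with later changes absorbed while decoding WAL,
 * does not need this loop at all. */
static int
compute_with_retry(void)
{
    int before[4], after[4];
    int nbefore, nafter;

    for (;;)
    {
        nbefore = get_index_list(before);
        /* ... build attribute bitmaps from 'before' here ... */
        nafter = get_index_list(after);
        if (nafter == nbefore &&
            memcmp(before, after, sizeof(int) * nbefore) == 0)
            return nbefore;             /* stable list: done */
        /* index set changed concurrently; redo the computation */
    }
}
```

In this toy run the first pass sees two indexes, the re-fetch detects a third appearing concurrently, and the second pass completes with a stable list of three.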

Best Regards,
Takamichi Osumi

Attachment Content-Type Size
truncate_in_synchronous_logical_replication_v06.patch application/octet-stream 5.3 KB
