Re: Optionally automatically disable logical replication subscriptions on error

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, "Smith, Peter" <peters(at)fast(dot)au(dot)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Optionally automatically disable logical replication subscriptions on error
Date: 2022-02-21 22:53:07
Message-ID: CAHut+Pt+Mm=312rZ+YZkMdC0xZqUGfdLzt=V9burF6coW_vRkg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Feb 21, 2022 at 11:44 PM osumi(dot)takamichi(at)fujitsu(dot)com
<osumi(dot)takamichi(at)fujitsu(dot)com> wrote:
>
> On Monday, February 21, 2022 2:56 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> > Thanks for addressing my previous comments. Now I have looked at v19.
> >
> > On Mon, Feb 21, 2022 at 11:25 AM osumi(dot)takamichi(at)fujitsu(dot)com
> > <osumi(dot)takamichi(at)fujitsu(dot)com> wrote:
> > >
> > > On Friday, February 18, 2022 3:27 PM Peter Smith
> > <smithpb2250(at)gmail(dot)com> wrote:
> > > > Hi. Below are my code review comments for v18.
> > > Thank you for your review !
> > ...
> > > > 5. src/backend/replication/logical/worker.c -
> > > > DisableSubscriptionOnError
> > > >
> > > > + /*
> > > > + * We would not be here unless this subscription's disableonerr
> > > > + field was
> > > > + * true when our worker began applying changes, but check whether
> > > > + that
> > > > + * field has changed in the interim.
> > > > + */
> > > >
> > > > Apparently, this function might just do nothing if it detects some
> > > > situation where the flag was changed somehow, but I'm not 100% sure
> > > > that the callers are properly catering for when nothing happens.
> > > >
> > > > IMO it would be better if this function would return true/false to
> > > > mean "did disable subscription happen or not?" because that will
> > > > give the calling code the chance to check the function return and do
> > > > the right thing - e.g. if the caller first thought it should be disabled but then
> > it turned out it did NOT disable...
> > > I don't think we need to do something more.
> > > After this function, table sync worker and the apply worker just exit.
> > > IMO, we don't need to do additional work for already-disabled
> > > subscription on the caller sides.
> > > It should be sufficient to fulfill the purpose of
> > > DisableSubscriptionOnError or confirm it has been fulfilled.
> >
> > Hmmm - Yeah, it may be the workers might just exit soon after anyhow as you
> > say so everything comes out in the wash, but still, I felt for this case when
> > DisableSubscriptionOnError turned out to do nothing it would be better to exit
> > via the existing logic. And that is easy to do if the function returns true/false.
> >
> > For example, changes like below seemed neater code to me. YMMV.
> >
> > BEFORE (SyncTableStartWrapper):
> > if (MySubscription->disableonerr)
> > {
> > DisableSubscriptionOnError();
> > proc_exit(0);
> > }
> > AFTER
> > if (MySubscription->disableonerr && DisableSubscriptionOnError())
> > proc_exit(0);
> >
> > BEFORE (ApplyLoopWrapper)
> > if (MySubscription->disableonerr)
> > {
> > /* Disable the subscription */
> > DisableSubscriptionOnError();
> > return;
> > }
> > AFTER
> > if (MySubscription->disableonerr && DisableSubscriptionOnError()) return;
> Okay, so this return value works for better readability.
> Fixed.
>
>
> > ~~~
> >
> > Here are a couple more comments:
> >
> > 1. src/backend/replication/logical/worker.c - DisableSubscriptionOnError,
> > Refactor error handling
> >
> > (this comment assumes the above gets changed too)
> I think those are independent.

OK. I was only curious if the change #5 above might cause the error to
be logged 2x, if the DisableSubscriptionOnError returns false.
- firstly, when it logs errors within the function
- secondly, by normal error mechanism when the caller re-throws it.

But, if you are sure that won't happen then it is good news.

>
>
> > +static void
> > +DisableSubscriptionOnError(void)
> > +{
> > + Relation rel;
> > + bool nulls[Natts_pg_subscription];
> > + bool replaces[Natts_pg_subscription];
> > + Datum values[Natts_pg_subscription];
> > + HeapTuple tup;
> > + Form_pg_subscription subform;
> > +
> > + /* Emit the error */
> > + EmitErrorReport();
> > + /* Abort any active transaction */
> > + AbortOutOfAnyTransaction();
> > + /* Reset the ErrorContext */
> > + FlushErrorState();
> > +
> > + /* Disable the subscription in a fresh transaction */
> > + StartTransactionCommand();
> >
> > If this DisableSubscriptionOnError function decides later that actually the
> > 'disableonerr' flag is false (i.e. it's NOT going to disable the subscription after
> > all) then IMO it make more sense that the error logging for that case should just
> > do whatever it is doing now by the normal error processing mechanism.
> >
> > In other words, I thought perhaps the code to EmitErrorReport/FlushError state
> > etc be moved to be BELOW the if
> > (!subform->subdisableonerr) bail-out code?
> >
> > Please see what you think in my attached POC [1]. It seems neater to me, and
> > tests are all OK. Maybe I am mistaken...
> I had a concern that this order change of codes would have a negative
> impact when we have another new error during the call of DisableSubscriptionOnError.
>
> With the debugger, I raised an error in this function before emitting the original error.
> As a result, the original error that makes the apply worker go into the path of
> DisableSubscriptionOnError (in my test, duplication error) has vanished.
> In this sense, v19 looks safer, and the current order to handle error recovery first
> looks better to me.
>
> FYI, after the 2nd debugger error,
> the next new apply worker created quickly met the same type of error,
> went into the same path, and disabled the subscription with the log.
> But, it won't be advisable to let the possibility left.

OK - thanks for checking it.

Will it be better to put some comments about that? Something like --

BEFORE
/* Emit the error */
EmitErrorReport();
/* Abort any active transaction */
AbortOutOfAnyTransaction();
/* Reset the ErrorContext */
FlushErrorState();

/* Disable the subscription in a fresh transaction */
StartTransactionCommand();

AFTER
/* Disable the subscription in a fresh transaction */
AbortOutOfAnyTransaction();
StartTransactionCommand();

/*
* Log the error that caused DisableSubscriptionOnError to be called.
* We do this immediately so that it won't be lost if some other internal
* error occurs in the following code,
*/
EmitErrorReport();
FlushErrorState();

...
>
> > ------
> > [1] peter-v19-poc.diff - POC just to try some of my suggestions above to make
> > sure all tests still pass ok.
> Thanks ! I included you as co-author, because
> you shared meaningful patches for me.
>

Thanks!

------
Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2022-02-21 23:08:11 TAP output format in pg_regress
Previous Message Joel Jacobson 2022-02-21 22:35:39 Re: List of all* PostgreSQL EXTENSIONs in the world