Re: Minimal logical decoding on standbys

From: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Minimal logical decoding on standbys
Date: 2019-05-22 09:35:29
Message-ID: CAJ3gD9dC9oMGXqCAeisNOK2BkeaiOTYO2aTMFzi9SHnKtu4PJg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 9 Apr 2019 at 22:23, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
>
> On Sat, 6 Apr 2019 at 04:45, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
> > > index 006446b..5785d2f 100644
> > > --- a/src/backend/replication/slot.c
> > > +++ b/src/backend/replication/slot.c
> > > @@ -1064,6 +1064,85 @@ ReplicationSlotReserveWal(void)
> > > }
> > > }
> > >
> > > +void
> > > +ResolveRecoveryConflictWithSlots(Oid dboid, TransactionId xid)
> > > +{
> > > + int i;
> > > + bool found_conflict = false;
> > > +
> > > + if (max_replication_slots <= 0)
> > > + return;
> > > +
> > > +restart:
> > > + if (found_conflict)
> > > + {
> > > + CHECK_FOR_INTERRUPTS();
> > > + /*
> > > + * Wait awhile for them to die so that we avoid flooding an
> > > + * unresponsive backend when system is heavily loaded.
> > > + */
> > > + pg_usleep(100000);
> > > + found_conflict = false;
> > > + }
> > > +
> > > + LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
> > > + for (i = 0; i < max_replication_slots; i++)
> > > + {
> > > + ReplicationSlot *s;
> > > + NameData slotname;
> > > + TransactionId slot_xmin;
> > > + TransactionId slot_catalog_xmin;
> > > +
> > > + s = &ReplicationSlotCtl->replication_slots[i];
> > > +
> > > + /* cannot change while ReplicationSlotCtlLock is held */
> > > + if (!s->in_use)
> > > + continue;
> > > +
> > > + /* not our database, skip */
> > > + if (s->data.database != InvalidOid && s->data.database != dboid)
> > > + continue;
> > > +
> > > + SpinLockAcquire(&s->mutex);
> > > + slotname = s->data.name;
> > > + slot_xmin = s->data.xmin;
> > > + slot_catalog_xmin = s->data.catalog_xmin;
> > > + SpinLockRelease(&s->mutex);
> > > +
> > > + if (TransactionIdIsValid(slot_xmin) && TransactionIdPrecedesOrEquals(slot_xmin, xid))
> > > + {
> > > + found_conflict = true;
> > > +
> > > + ereport(WARNING,
> > > + (errmsg("slot %s w/ xmin %u conflicts with removed xid %u",
> > > + NameStr(slotname), slot_xmin, xid)));
> > > + }
> > > +
> > > + if (TransactionIdIsValid(slot_catalog_xmin) && TransactionIdPrecedesOrEquals(slot_catalog_xmin, xid))
> > > + {
> > > + found_conflict = true;
> > > +
> > > + ereport(WARNING,
> > > + (errmsg("slot %s w/ catalog xmin %u conflicts with removed xid %u",
> > > + NameStr(slotname), slot_catalog_xmin, xid)));
> > > + }
> > > +
> > > +
> > > + if (found_conflict)
> > > + {
> > > + elog(WARNING, "Dropping conflicting slot %s", s->data.name.data);
> > > + LWLockRelease(ReplicationSlotControlLock); /* avoid deadlock */
> > > + ReplicationSlotDropPtr(s);
> > > +
> > > + /* We released the lock above; so re-scan the slots. */
> > > + goto restart;
> > > + }
> > > + }
> > >
> > I think this should be refactored so that the two found_conflict cases
> > set a 'reason' variable (perhaps an enum?) to the particular reason, and
> > then only one warning should be emitted. I also think that LOG might be
> > more appropriate than WARNING - as confusing as that is, LOG is more
> > severe than WARNING (see docs about log_min_messages).
>
> What I have in mind is :
>
> ereport(LOG,
> (errcode(ERRCODE_INTERNAL_ERROR),
> errmsg("Dropping conflicting slot %s", s->data.name.data),
> errdetail("%s, removed xid %d.", conflict_str, xid)));
> where conflict_str is a dynamically generated string containing
> something like : "slot xmin : 1234, slot catalog_xmin: 5678"
> So for the user, the errdetail will look like :
> "slot xmin: 1234, catalog_xmin: 5678, removed xid : 9012"
> I think the user can figure out whether it was xmin or catalog_xmin or
> both that conflicted with removed xid.
> If we don't do this way, we may not be able to show in a single
> message if both xmin and catalog_xmin are conflicting at the same
> time.
>
> Does this message look good to you, or you had in mind something quite
> different ?

The above one is yet another point that needs to be concluded on. Till
then I will use the above way to display the error message in the
upcoming patch version.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Meskes 2019-05-22 10:07:35 Re: SQL statement PREPARE does not work in ECPG
Previous Message Amit Khandekar 2019-05-22 09:32:10 Re: Minimal logical decoding on standbys