Re: Synchronizing slots from primary to standby

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, shveta malik <shvetamalik(at)gmail(dot)com>
Subject: Re: Synchronizing slots from primary to standby
Date: 2023-12-12 09:14:28
Message-ID: CAJpy0uDaGMNpgmdxie-MgHmMhnD4ET_LDjQNEe76xJ+MLqRQ8Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Dec 11, 2023 at 7:12 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, Dec 11, 2023 at 2:41 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> >
> > >
> > > 5.
> > > +synchronize_slots(WalReceiverConn *wrconn)
> > > {
> > > ...
> > > ...
> > > + /* The syscache access needs a transaction env. */
> > > + StartTransactionCommand();
> > > +
> > > + /*
> > > + * Make result tuples live outside TopTransactionContext to make them
> > > + * accessible even after transaction is committed.
> > > + */
> > > + MemoryContextSwitchTo(oldctx);
> > > +
> > > + /* Construct query to get slots info from the primary server */
> > > + initStringInfo(&s);
> > > + construct_slot_query(&s);
> > > +
> > > + elog(DEBUG2, "slot sync worker's query:%s \n", s.data);
> > > +
> > > + /* Execute the query */
> > > + res = walrcv_exec(wrconn, s.data, SLOTSYNC_COLUMN_COUNT, slotRow);
> > > + pfree(s.data);
> > > +
> > > + if (res->status != WALRCV_OK_TUPLES)
> > > + ereport(ERROR,
> > > + (errmsg("could not fetch failover logical slots info "
> > > + "from the primary server: %s", res->err)));
> > > +
> > > + CommitTransactionCommand();
> > > ...
> > > ...
> > > }
> > >
> > > Where exactly in the above code, there is a syscache access as
> > > mentioned above StartTransactionCommand()?
> > >
> >
> > It is in walrcv_exec (libpqrcv_processTuples). I have changed the
> > comments to add this info.
> >
>
> Okay, I see that the patch switches context twice once after starting
> the transaction and the second time after committing the transaction,
> why is that required? Also, can't we extend the duration of the
> transaction till the remote_slot information is constructed?

If we extend duration, we have to extend till remote_slot information
is consumed and not only till it is constructed.

> I am
> asking this because the context used is TopMemoryContext which should
> be used only if we need something specific to be retained at the
> process level which doesn't seem to be the case here.
>

Okay, I understand your concern. But this needs more thoughts on shall
we have all the slots synchronized in one txn or is it better to have
it existing way i.e. each slot being synchronized in its own txn
started in synchronize_one_slot. If we go by the former, can it have
any implications? I need to review this bit more before concluding.
.
> I have noticed a few other minor things:
> 1.
> postgres=# select * from pg_logical_slot_get_changes('log_slot_2', NULL, NULL);
> ERROR: cannot use replication slot "log_slot_2" for logical decoding
> DETAIL: This slot is being synced from the primary server.
> ...
> ...
> postgres=# select * from pg_drop_replication_slot('log_slot_2');
> ERROR: cannot drop replication slot "log_slot_2"
> DETAIL: This slot is being synced from the primary.
>
> I think the DETAIL message should be the same in the above two cases.
>
> 2.
> +void
> +WalSndWaitForStandbyConfirmation(XLogRecPtr wait_for_lsn)
> +{
> + List *standby_slots;
> +
> + Assert(!am_walsender);
> +
> + if (!MyReplicationSlot->data.failover)
> + return;
> +
> + standby_slots = GetStandbySlotList(true);
> +
> + ConditionVariablePrepareToSleep(&WalSndCtl->wal_confirm_rcv_cv);
> ...
> ...
>
> Shouldn't we return if the standby slot names list is NIL unless there
> is a reason to do ConditionVariablePrepareToSleep() or any of the code
> following it?
>
> --
> With Regards,
> Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2023-12-12 09:15:01 Re: Add isCatalogRel in rmgrdesc
Previous Message shveta malik 2023-12-12 08:44:09 Re: How abnormal server shutdown could be detected by tests?