| From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
|---|---|
| To: | shveta malik <shveta(dot)malik(at)gmail(dot)com> |
| Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |
| Date: | 2025-12-08 07:36:20 |
| Message-ID: | CAD21AoByAJ-AkfeaNThCQnn7sfnEsvsmk=Ct=xjaV-oY-sP=wA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Fri, Dec 5, 2025 at 4:56 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Thu, Dec 4, 2025 at 11:13 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Thu, Dec 4, 2025 at 7:59 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> > >
> > > On Fri, Dec 5, 2025 at 12:23 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > > On Thu, Dec 4, 2025 at 1:30 AM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> > > > >
> > > > > On Thu, Dec 4, 2025 at 1:54 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > > > >
> > > > > >
> > > > > > I've attached the updated patch that incorporated the review comments
> > > > > > and is rebased to the current HEAD.
> > > > > >
> > > > >
> > > > > Thanks, a few things:
> > > > >
> > > > >
> > > > > 1)
> > > > > + The system automatically increases the
> > > > > + effective WAL level to <literal>logical</literal> when
> > > > > creating the first
> > > > > + logical replication slot, and decreases it back to
> > > > > <literal>replica</literal>
> > > > > + when dropping the last logical replication slot. The current
> > > > > effective WAL
> > > > > + level can be monitored through <xref
> > > > > linkend="guc-effective-wal-level"/>
> > > > > + parameter.
> > > > >
> > > > > Shouldn't we mention about invalidation of slot along with dropping of
> > > > > slot? I have suggested this earlier but I think it got missed to be
> > > > > addressed.
> > > >
> > > > Fixed. Sorry I missed it.
> > > >
> > > > >
> > > > > 2)
> > > > > + else if (sync_replication_slots)
> > > > > + {
> > > > > + /*
> > > > > + * Signal the postmaster to launch the slotsync worker.
> > > > > + *
> > > > > + * XXX: For simplicity, we keep the slotsync worker running
> > > > > + * even after logical decoding is disabled. A future
> > > > > + * improvement can consider starting and stopping the worker
> > > > > + * based on logical decoding status change.
> > > > > + */
> > > > > + kill(PostmasterPid, SIGUSR1);
> > > > > + }
> > > > > + }
> > > > > +
> > > > > + /* Update the status on shared memory */
> > > > > + UpdateLogicalDecodingStatus(logical_decoding, true);
> > > > >
> > > > > I see that we have moved 'Update' post slotsync's start attempt. This
> > > > > leaves a possibility that that slot-sync is started sooner than last
> > > > > 'Update' call and thus slotsync may exit with:
> > > > > replication slot synchronization requires "effective_wal_level" >=
> > > > > "logical" on the primary
> > > > >
> > > > > I see that update was prior to the slotsync step in earlier patches.
> > > > > Why have we moved it to a later stage?
> > > >
> > > > You're right. I've reconsidered the operation order and reverted the
> > > > previous change. On further thought, I think it should not happen to
> > > > decode the STATUS_CHANGE record because when applying the
> > > > STATUS_CHANGE record, we disable logical decoding first and then
> > > > invalidate the slots. There is no window where a logical slot can
> > > > creep in on the standby, and on the primary it cannot restart after
> > > > changing wal_level < replica if there is a pre-existing slot. If my
> > > > understanding is right, we should put a sanity check in xlog_decode().
> > > >
> > > > >
> > > > > 3)
> > > > > + /* Return if someone already started to enable logical decoding */
> > > > >
> > > > > Shall we update:
> > > > > Return if someone already started to enable logical decoding, or if it
> > > > > is already enabled.
> > > >
> > > > Fixed.
> > > >
> > > > I've attached the updated patches. The 0002 patch is a patch to
> > > > address the above review comments. If it looks good, I'll merge these
> > > > changes.
> > > >
> > >
> > > Thanks, I will review.
> > >
> > > While testing one of the scenarios on v33 patch, I observed a strange
> > > behaviour. I could not reproduce it again. I am not sure if it is due
> > > to the patch. Please have a look once.
> > > The scenario is:
> > >
> > > pub: create a slot to enable logical decoding, create table, insert few records
> > > standby: create slot and try to get-changes, get-changes hit some error:
> > >
> > > postgres=# SELECT pg_create_logical_replication_slot('slot13',
> > > 'pgoutput', false, false, false);
> > > pg_create_logical_replication_slot
> > > ------------------------------------
> > > (slot13,0/0306D448)
> > > (1 row)
> > >
> > > postgres=# select pg_logical_slot_get_binary_changes('slot13', NULL,
> > > NULL, 'proto_version', '4', 'publication_names', 'pub');
> > > ERROR: cannot query non-catalog table "pg_class" during logical decoding
> > >
> >
> > Hmm, ISTM that the root cause is that we don't synchronously update
> > the XLogLogicalInfo cache on the standby. A backend on the standby who
> > started when logical decoding is disabled keeps holding
> > XLogLogicalInfo = false until the promotion. So even if the startup
> > process enables logical decoding when applying the STATUS_CHANGE
> > record, logical decoding is enabled on the standby but the backend
> > process would start logical decoding with XLogLogicalInfo = false,
> > resulting in RelationIsAccessibleInLogicalDecoding() returns false. I
> > missed the fact that we check XLogLogicalInfoActive() even read paths.
> > Will fix it.
> >
>
> I noticed that the XLogLogicalInfoXactCache doesn't work as expected
> even on the primary; because the process keep using the
> XLogLogicalInfoXactCache in a transaction, sending
> PROCSIGNAL_BARRIER_UPDATE_XLOG_LOGICAL_INFO with a wait doesn't
> actually ensure that all processes are writing logical-WAL records
> before enabling logical decoding. XLogLogicalInfoXactCache could be
> cached even read paths without XID assignment, for example when HOT
> pruning happens while reading a table. So, if a process has been
> executing only read queries while logical decoding is disabled, it has
> XLogLogicalInfoXactCache=0 and can get an XID after logical decoding
> is enabled, writing non-logical WAL records. Such read-only
> transactions don't appear in xl_running_xacts so we cannot wait for
> them to finish during logical decoding initialization.
>
> I think there are two possible solutions to resolve this issue:
>
> 1. Revert the XLogLogicalInfoXactCache idea. While we can simplify the
> code, I'm concerned that it could be problematic if
> XLogLogicalInfoActive() could return different results whenever it's
> called. For instance, we should at least remove the assertion in
> ExecuteTruncateGuts(), but we could face a similar issue in the
> future.
I'm concerned that even this idea could be problematic without waiting
for all running transactions to finish before enabling logical
decoding, depending on how we write WAL records. If
XLogLogicalInfoActive() could return a different value within the same
command, we could write a WAL record mixiting logical-information and
non-logical information, because some functions use
XLogLogicalInfoActive() several times before writing a one WAL record.
Further, if there is a path where a transaction decides whether to
include logical information based on XLogLogicalInfoActive() *before*
getting an XID, the transaction might not appear in xl_running_xacts
record, resulting in that logical decoding that started before the
transaction decodes such WAL records in the CONSISTENT snapbuild
state. Therefore, I'm thinking that in addition to making
XLogLogicalInfoActive() return a consistent result within a single
command or in a transaction, we would need to wait for all running
transactions (including those who don't have an XID) to finish before
enabling logical decoding.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Bertrand Drouvot | 2025-12-08 07:43:34 | Re: More const-marking cleanup |
| Previous Message | Chao Li | 2025-12-08 07:35:54 | Re: Fix LOCK_TIMEOUT handling in slotsync worker |