Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
Date: 2025-08-21 17:04:52
Message-ID: CAD21AoAcS2f-i5zc+fxCgmrtg+PLRKX0AsiWL9vzcbPDF5X7DA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Aug 21, 2025 at 3:50 AM Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com> wrote:
>
> On Fri, 15 Aug 2025 at 04:38, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Tue, Aug 12, 2025 at 1:26 AM Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com> wrote:
> > >
> > >
> > > Hi Sawada-san,
> > >
> > > I have reviewed the patch and have few comments:
> >
> > Thank you for reviewing the patch!
> >
> > >
> > > 1. There are some spelling mistakes in logicaldecoding.sgml
> > > + and requires waiting for any concurrent transactions to finish, ensureing
> > > + system-wide conistency. Conversely, when the last logical replication slot
> > > ensureing -> ensuring
> > > conistency -> consistency
> >
> > Will fix.
> >
> > >
> > > 2. In publicationcmds.c:
> > > + errmsg("logical decoding needs to be enabled to
> > > publish logical changes"),
> > > + errhint("Set \"wal_level\" >= \"logical\" or create a
> > > logical replication slot with \"replica\" WAL level before creating
> > > subscriptions.")));
> > >
> > > Should we use something like this for errhint ?
> > > errhint("Set \"wal_level\" >= \"logical\" or create a logical
> > > replication slot before creating subscriptions when \"wal_level\" =
> > > \"replica\".")));
> >
> > The current message is:
> >
> > Set "wal_level" >= "logical" or create a logical replication slot with
> > "replica" WAL level before creating subscriptions.
> >
> > whereas the proposed message is:
> >
> > Set "wal_level" >= "logical" or create a logical replication slot
> > before creating subscriptions when "wal_level" = "replica".
> >
> > I don't see a big difference between the two sentences but your point
> > is to use 'when "wal_level" = "replica"' instead of 'with "replica"
> > WAL level'?
> >
> Yes. After reading the error messages again, I also think there is no
> big difference and I am fine with both.
>
> > >
> > >
> > > 3. In logical.c:
> > > + errmsg("logical decoding needs to be enabled on the primary"),
> > > + errhint("Set \"wal_level\" >= \"logical\" or create
> > > at least one logical slot with \"replica\" WAL level on the
> > > primary.")));
> > >
> > > Should we change the errhint message as below?
> > > errhint("Set \"wal_level\" >= \"logical\" or create at least one
> > > logical slot on the primary when \"wal_level\" = \"replica\".")));
> > >
> > > 4. In slotsync.c:
> > > + errmsg("replication slot synchronization requires
> > > logical decoding to be enabled"),
> > > + errhint("Set \"wal_level\" >= \"logical\" or create at
> > > least one logical slot with \"replica\" WAL level on the primary "));
> > >
> > > Should we change the errhint message as below?
> > > errhint("Set \"wal_level\" >= \"logical\" or create at least one
> > > logical slot on the primary when \"wal_level\" = \"replica\".")));
> >
> > Please see the above question.
> >
> > >
> > > ---------
> > >
> > > I have also tested the patch with creating multiple permanent/
> > > temporary slots in concurrent sessions and I did not find any issue. I
> > > also tested this patch with a physical replication setup.
> > > I have a doubt in this case:
> > > 1. Suppose we have a physical replication setup. wal_level on primary
> > > is set to 'replica'
> > > 2. Now we try to create a logical slot on standby, an error is thrown
> > > as wal_level is set to 'replica' as primary does not have any logical
> > > slot
> > > 3. Now we create a temporary logical slot on primary,
> > > effective_wal_level is set to logical.
> > > 4. Now we can create slots on standby as effective_wal_level is logical.
> > > 5. Now we exit the session of the primary server. The temporary slot
> > > is dropped. This will invalidate the slots on standby as the
> > > effective_wal_level will be set to 'replica'.
> > > So we can say that indirectly a temporary slot on primary can control
> > > the behaviour of permanent slots on standbys.
> > >
> > > I checked this behaviour in HEAD. In HEAD also the behaviour is the
> > > same. If we change the wal_level on primary from 'logical' to
> > > 'replica', all slots are invalidated on the standby.
> > > With patch this behaviour can be indirectly controlled by a temporary
> > > slot. Is it fine? Thoughts?
> >
> > Your understanding is correct. I've discussed whether we need a way to
> > keep auto-increased 'logical' WAL level on the primary when standbys
> > have logical slots. You mentioned temporary logical slots cases but I
> > think the same is true for the case where users accidentally drop the
> > last logical slot.
> >
> > My understanding is that it's fine that logical decoding availability
> > on standbys is controlled by primary's logical slots (including temp
> > slots) presence. This essentially is the same behavior as the current
> > one and users who are concerned about indirectly invalidating the
> > logical slots on standbys can set wal_level to 'logical' on the
> > primary. It's a separate discussion (and patch) whether we need to
> > provide a way for users to keep auto-increased 'logical' WAL level on
> > the primary.
> >
> Thanks for the explanation. I agree with you.
>
> Also,
> I did some testing with pg_createsubscriber and it is working fine. I
> have some more comments:
>
> 1. in 040_pg_createsubscriber.pl we have:
> # Check some unmet conditions on node P
> $node_p->append_conf(
> 'postgresql.conf', q{
> wal_level = replica
> max_replication_slots = 1
> max_wal_senders = 1
> max_worker_processes = 2
> });
> Comment says: "Check some unmet conditions on node P". But with this
> patch "wal_level = replica", will be a valid configuration, so it
> will be contradictory to comment. Should we remove "wal_level =
> replica" from the append_conf?
>
> 2. If we plan to change the above then we should also remove
> "wal_level = logical" in the following:
> # Restore default settings here but only apply it after testing standby. Some
> # standby settings should not be a lower setting than on the primary.
> $node_p->append_conf(
> 'postgresql.conf', q{
> wal_level = logical
> max_replication_slots = 10
> max_wal_senders = 10
> max_worker_processes = 8
> });

Thank you for testing and reviewing the patch! I agree with above
comments so I incorporated them into the latest version patch I've
just submitted[1].

Regards,

[1] https://www.postgresql.org/message-id/CAD21AoBNCf_Yr%3Db7FbVpMPS4Vt6x-uqcLT3ELtATRFB9jUC3QQ%40mail.gmail.com

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2025-08-21 17:22:10 Re: Logical Replication of sequences
Previous Message Masahiko Sawada 2025-08-21 17:03:23 Re: POC: enable logical decoding when wal_level = 'replica' without a server restart