Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
Date: 2026-01-31 12:10:47
Message-ID: CAEze2Wg9fWze9dA3GssLVP_TZNV0DqdNq2Td8XZ5XHJtqA1SDw@mail.gmail.com
Lists: pgsql-hackers

On Thu, 8 Jan 2026 at 06:16, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > NB. I'm not opposed to changing wal_level in a running cluster, and I
> > > do think that the current xact+checkpoint -based approach to selecting
> > > the local effective_wal_level is fine, as well as standby picking up
> > > the primary's current setting; it's the trigger condition for the
> > > decision to change effective_wal_level that I have problems with.
> > >
> >
> > Thank you for the comments.
> >
> > I understand the concern that users with the REPLICATION privilege can
> > now effectively control wal_level, potentially increasing system-wide
> > overhead. While the REPLICATION privilege already implies a high
> > degree of trust as we allow it to take a basebackup and create a
> > physical slot etc., I agree that this feature might elevate that power
> > further, and we may need a mechanism to address this.
> >
>
> If we allow taking the entire physical data via the REPLICATION
> privilege, then the user must already be highly privileged. Such a
> user is already allowed to read every byte of data in the database via
> physical streaming. Now, such a user influencing wal_level to be
> changed from 'replica' to 'logical' is of lesser harm.

I don't think the harm of changing wal_level from 'replica' to
'logical' is decreased, because the harm is in the distributed
performance impact, not the access to data. A physical replication
slot does not (need to) impact the write performance of other backends
if it's sufficiently partitioned from other workloads (not configured
for syncrep, etc.), but wal_level=logical cannot be partitioned from
write workloads as it adds a non-negotiable overhead to the write
workloads of other backends, as they now need to track more data
(replica identity columns) and must write more WAL.

> I agree that it
> can lead to some non-malicious impact, like disk space (due to
> increased WAL volume), and extra CPU consumption due to extra WAL
> volume. But I think REPLICATION privilege can already lead to extra
> CPU consumption due to wal_sender activity, and even disk space by not
> letting the slot advance, which can even crash the system.
>
> Since these users already have the power to access all data and cause
> a Denial of Service (DoS) via disk exhaustion, the ability to
> "upgrade" WAL logging from replica to logical can be seen as an
> incremental addition to an already highly trusted role. I think we can
> update the documentation of the REPLICATION privilege.

Replication slots that keep WAL from being recycled can be monitored
(and therefore, likely acted on) before the relevant problem (running
out of disk space) occurs, which is not the case with the current
effective_wal_level implementation. One moment your tps is normal, the
next moment it drops because a role with REPLICATION added a logical
slot, and you'll have to delete it and wait for a checkpoint to revert
to replica.
The difference here is reaction time until it starts impacting
transactions.
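
To illustrate the kind of monitoring meant here, below is a minimal
sketch of a check for slots that hold back too much WAL. It assumes the
caller has already fetched the current WAL position and each slot's
restart_lsn (for example from pg_replication_slots). The function names
and the 64 MiB budget are hypothetical and chosen only for this example;
only the textual LSN format ("high/low" in hexadecimal) comes from
PostgreSQL.

```python
# Hypothetical helper names; only the LSN text format and the idea of a
# per-slot restart_lsn come from PostgreSQL itself.

def lsn_to_int(lsn: str) -> int:
    """Convert a textual LSN such as '16/B374D848' to a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def slots_over_budget(current_lsn, slots, max_retained_bytes):
    """Return (slot_name, retained_bytes) for slots retaining too much WAL.

    `slots` is an iterable of (slot_name, restart_lsn) pairs, where
    restart_lsn may be None for a slot that has not reserved WAL yet.
    """
    current = lsn_to_int(current_lsn)
    flagged = []
    for name, restart_lsn in slots:
        if restart_lsn is None:
            continue  # nothing is retained on behalf of this slot
        retained = current - lsn_to_int(restart_lsn)
        if retained > max_retained_bytes:
            flagged.append((name, retained))
    return flagged


if __name__ == "__main__":
    # Made-up sample data: one healthy slot, one stalled slot, one unused slot.
    sample_slots = [
        ("standby_a", "0/5000000"),
        ("stalled_b", "0/1000000"),
        ("unused_c", None),
    ]
    budget = 64 * 1024 * 1024  # assumed alerting budget of 64 MiB
    for name, retained in slots_over_budget("0/6000000", sample_slots, budget):
        print(f"slot {name} retains {retained} bytes of WAL")
```

A check like this gives the operator a growing number to alert on well
before the disk fills up. There is no comparable early signal for the
switch to effective_wal_level=logical, which is the point above.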

> >
> > To address your concerns, I have come up with the following ideas:
> >
>
> I feel, If an administrator does not want to allow logical decoding,
> they can set max_replication_slots to a value that only covers their
> known physical replicas. So, they can still control the additional CPU
> consumption if they are worried that it can cause harm. The other
> possibility is to have a separate GUC for logical slots such as
> max_logical_replication_slots. So, still, an administrator can keep
> control.

As mentioned by Ashutosh, it's not strange to configure
max_replication_slots with some leeway; e.g. to allow for new
permanent replicas to be added, or to make scheduled failovers less
painful by being able to pre-provision the new secondary replica ahead
of time. max_logical_replication_slots could work as an extension to
this, but it feels like putting the cart before the horse: Instead of
not allowing REPLICATION users to change effective_wal_level, you now
don't allow them to create replication slots, which then has the side
effect of not triggering effective_wal_level=logical. I would
personally prefer a wal_level=dynamic or such, which could be put
between replica and logical.

Kind regards,

Matthias van de Meent
Databricks (https://www.databricks.com)
