From: | shveta malik <shveta(dot)malik(at)gmail(dot)com> |
---|---|
To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com> |
Subject: | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |
Date: | 2025-06-06 10:02:45 |
Message-ID: | CAJpy0uDW6BpNXLZ0AaP=_GU6pCsZf_7Sk2R0Ti+ov+EO6ruMkg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Jun 4, 2025 at 3:40 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> On Wed, Jun 4, 2025 at 6:41 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Tue, May 20, 2025 at 9:54 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > Yeah, I find the idea that the presence of a logical slot will allow
> > > the user to enable logical decoding/replication more appealing than
> > > this new alternative, leaving aside the challenges of realizing it.
>
> +1. This idea appears more user-friendly and easier to understand
> compared to other approaches, such as having multiple GUCs or using
> ALTER SYSTEM.
>
> > I've drafted this idea. Here are summary for attached two patches:
> >
> > 0001 patch allows us to create a logical slot without WAL reservation.
> >
> > 0002 patch is the main patch for dynamically enabling/disabling
> > logical decoding when wal_level is 'replica'.
>
> Thank You for the patches. I have done some initial testing, it seems
> to be working well. I will do more testing and review and will share
> further feedback.
I reviewed further and had few concerns:
1)
We now invalidate slots on standby if the primary (with
wal_level=replica) has dropped the last logical slot and internally
reverted its runtime (effective) wal_level back to replica. Consider
the following scenario involving a cascaded logical replication setup:
a) The publisher is configured with wal_level = replica and has
created a publication (pub1).
b) A subscriber server creates a subscription (sub1) to pub1. As part
of the slot creation for sub1, the publisher's effective wal_level is
switched to logical.
c) The publisher also has a physical standby, which in turn has its
own logical subscriber, named standby_sub1.
At this point, everything works as expected i.e. changes from the
publisher flow through the physical standby and are replicated to
standby_sub1. Now if the user drops sub1, the replication slot on the
primary is also dropped. Since this was the last logical slot, the
primary automatically switches its effective wal_level back to
replica. This change propagates to the standby, causing it to
invalidate the slot for standby_sub1. As a result, the standby logs
the following error:
STATEMENT: START_REPLICATION SLOT "standby_sub1" LOGICAL 0/0 (...)
ERROR: logical decoding needs to be enabled on the primary
Even if we manually recreate a logical slot on the primary afterward,
the standby_sub1 subscriber is not able to proceed:
ERROR: can no longer access replication slot "standby_sub1"
DETAIL: This replication slot has been invalidated due to
"wal_level_insufficient".
So the removal of the logical subscriber for the publisher has somehow
restricted the logical subscriber of standby to work. Is this
behaviour acceptable?
Without this feature, if I manually switch back wal_level to replica
on primary, then it will fail to start. This makes the issue obvious
and prevents misconfiguration.
FATAL: logical replication slot "sub2" exists, but "wal_level" < "logical"
HINT: Change "wal_level" to be "logical" or higher.
But the current behaviour is harder to diagnose, as the problem is
effectively hidden behind subscription/slot creation/deletion.
2)
'show effective_wal_level' shows output as 'logical' if a slot exists
on primary. But on physical standby, it still shows it as 'replica'
even in the presence of slots. Is this intentional?
3)
I haven’t tested this yet, but I’d like to discuss what the expected
behavior should be if a slot exists on the primary but is marked as
invalidated. Will an invalidated slot still cause the effective
wal_level to remain at logical, or will invalidating the only logical
slot trigger a switch back to replica?
There is a chance that a slot with un-reserved wal may be invalidated
due to time-out.
thanks
Shveta
From | Date | Subject | |
---|---|---|---|
Next Message | Tatsuro Yamada | 2025-06-06 10:03:27 | Re: Add enable_groupagg GUC parameter to control GroupAggregate usage |
Previous Message | Zhijie Hou (Fujitsu) | 2025-06-06 09:47:49 | RE: Conflict detection for update_deleted in logical replication |