From: | shveta malik <shveta(dot)malik(at)gmail(dot)com> |
---|---|
To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com> |
Subject: | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |
Date: | 2025-10-10 08:54:10 |
Message-ID: | CAJpy0uDA_5untUaMiSoWEfpFEzhwNMoepAp8+3u=-yDq2wnkwg@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, Oct 10, 2025 at 9:32 AM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> On Thu, Oct 9, 2025 at 4:14 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > Thank you for the comments.
> >
> > I've attached the updated patch. As we discussed, it now uses the lazy
> > deactivation in all cases and has the discussed points in the comments
> > atop logicalctl.c.
> >
>
> Thank You for making the changes. I am in the process of verifying the
> patch. Please find one concern.
>
> When I promote a standby, there's a brief period during
> recovery-completion when status_change_allowed is set to true and also
> RecoveryInProgress() is still true. If I try to create a logical
> replication slot on the standby during this window, it fails with the
> following error:
>
> postgres=# SELECT pg_create_logical_replication_slot('slot1',
> 'pgoutput', false, false, false);
> ERROR: logical decoding on standby requires "effective_wal_level" >=
> "logical" on the primary
> HINT: To enable logical decoding, set "wal_level" >= "logical" or
> create at least one logical slot when "wal_level" = "replica".
>
> Note that the primary already has logical slots. This issue occurs
> because logical decoding is disabled by
> UpdateLogicalDecodingStatusEndOfRecovery() due to no existing slots on
> standby while RecoveryInProgress() is still true. Should
> CheckLogicalDecodingRequirements() also consider
> 'status_change_allowed' on standby to handle this transition more
> gracefully?
>
Please find a few more comments:
1)
I am creating a subscription for publication present on standby, I get
this error:
postgres=# CREATE subscription sub1 connection 'dbname=postgres
host=localhost user=shveta port=5434' publication pub1;
ERROR: could not create replication slot "sub1": ERROR: logical
decoding on standby requires "effective_wal_level" >= "logical" on the
primary
HINT: To enable logical decoding, set "wal_level" >= "logical" or
create at least one logical slot when "wal_level" = "replica".
Should Hint also mention 'on primary' i.e. 'To enable logical decoding
on primary, set..' , otherwise it could be slightly confusing to
follow HINT.
2)
CheckLogicalDecodingRequirements():
Is there a reason for moving 'logical decoding on standby requires..'
error before 'logical decoding requires a database connection' error?
Shall we keep the same order as earlier?
3)
GetActiveWalLevelOnStandby() is not being used anywhere now.
4)
+ if (!old_pending_disable)
+ {
+ volatile PROC_HDR *procglobal = ProcGlobal;
+ ProcNumber checkpointerProc = procglobal->checkpointerProc;
+
+ /* Wake up the checkpointer */
+ if (checkpointerProc != INVALID_PROC_NUMBER)
+ SetLatch(&GetPGProcByNumber(checkpointerProc)->procLatch);
+ }
Why are we waking the checkpointer only when pending_disable changes
from false to true? When this function is called and
old_pending_disable is still true, wouldn’t that be an even stronger
reason to wake the checkpointer?
5)
+ /*
+ * Update shmem flags. We don't need to care about the order of setting
+ * global flag and writing the WAL record writes are not allowed yet.
+ */
Comment is somewhat problematic.
6)
ReportSlotInvalidation():
+ appendStringInfoString(&err_detail, _("Logical decoding on standby
requires \"wal_level\" >= \"logical\" or at least one logical slot on
the primary server."));
This message could be a little ambiguous. It is not completely clear
that that wal_level=logical is needed on standby or on primary. How
about below msg:
Logical decoding on standby requires the primary server to have either
wal_level >= logical or at least one logical replication slot created.
7)
+ "effective_wal_level got decreased to 'replica' on the primary to
invalidate stnadby's slots"
stnadby --> standby
8)
+ # Trigger promotion with no wait for the startup process to reach the
+ # injection point.
+ $standby5->safe_psql('postgres', qq[select pg_promote(false)]);
+ note('promote the standby and waiting for injection_point');
+ $standby5->wait_for_event('startup',
+ 'startup-logical-decoding-status-change-end-of-recovery');
+ note(
+ "injection_point
'startup-logical-decoding-status-change-end-of-recovery' is reached"
+ );
Is the comment correct here? We are waiting to reach the injection
point? But the comment says 'no wait for'?
9)
+ # Drop the logical slot, requesting to disable logical decoding to
the checkpointer.
+ # It has to wait for the recovery to complete before disabling
logical decoding.
Double space in 'It has'.
10)
+ # Drop the logical slot, requesting to disable logical decoding to
the checkpointer.
+ # It has to wait for the recovery to complete before disabling
logical decoding.
+ $standby5->safe_psql('postgres',
+ qq[select pg_drop_replication_slot('standby5_slot');]);
+
+ # Resume the startup process to complete the recovery.
+ $standby5->safe_psql('postgres',
+ qq[select injection_points_wakeup('startup-logical-decoding-status-change-end-of-recovery')]
+ );
I think between these 2 steps, we should check logs for 'waiting for
recovery completion to change logical decoding status'.
The test will become more clear then.
thanks
Shveta
From | Date | Subject | |
---|---|---|---|
Next Message | Aya Iwata (Fujitsu) | 2025-10-10 09:04:54 | RE: [PROPOSAL] Termination of Background Workers for ALTER/DROP DATABASE |
Previous Message | Chao Li | 2025-10-10 08:08:28 | Re: URLs in rbtree.c are broken |