| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
| Cc: | shveta malik <shveta(dot)malik(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |
| Date: | 2025-11-27 10:59:11 |
| Message-ID: | CAA4eK1LQUVbYfq5bdbJ5qHYtiAeQee5vM8n9nDT-iuT+W3DtiA@mail.gmail.com |
| Lists: | pgsql-hackers |
On Thu, Nov 27, 2025 at 2:32 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> I've squashed all fixup patches and attached the updated patch.
>
1.
<literal>wal_level_insufficient</literal> means that the
- primary doesn't have a <xref linkend="guc-wal-level"/> sufficient to
- perform logical decoding. It is set only for logical slots.
+ primary doesn't have a <xref linkend="guc-effective-wal-level"/>
+ to perform logical decoding.
The word "sufficient" is missing after "guc-effective-wal-level".
2.
+ * With 'minimal' WAL level, there are not logical replication slots
+ * during recovery.
s/not/no/ (typo).
3.
case XLOG_LOGICAL_DECODING_STATUS_CHANGE:
{
- xl_parameter_change *xlrec =
- (xl_parameter_change *) XLogRecGetData(buf->record);
+ bool logical_decoding;
- /*
- * If wal_level on the primary is reduced to less than
- * logical, we want to prevent existing logical slots from
- * being used. Existing logical slots on the standby get
- * invalidated when this WAL record is replayed; and further,
- * slot creation fails when wal_level is not sufficient; but
- * all these operations are not synchronized, so a logical
- * slot may creep in while the wal_level is being reduced.
- * Hence this extra check.
- */
- if (xlrec->wal_level < WAL_LEVEL_LOGICAL)
+ memcpy(&logical_decoding, XLogRecGetData(buf->record), sizeof(bool));
The patch removes this comment entirely, but I feel we should keep
something similar, especially for this part: "Existing logical
slots on the standby get invalidated when this WAL record is replayed;
and further, slot creation fails when wal_level is not sufficient; but
all these operations are not synchronized, so a logical slot may creep
in while the wal_level is being reduced. Hence this extra check." Did
anything change about this part of the comment?
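To illustrate the race that the removed comment described, here is a
toy model of the check. It is only a sketch and not PostgreSQL code:
the enum and the function name are hypothetical, and it assumes the
record payload is a single bool, as the memcpy() in the patch suggests.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Toy model only: none of these names exist in PostgreSQL.
 * The record payload is assumed to be a single bool.
 */
typedef enum
{
    DECODE_OK,          /* logical decoding is still enabled upstream */
    DECODE_MUST_FAIL    /* logical decoding was disabled upstream */
} DecodeAction;

/*
 * Existing logical slots on the standby are invalidated when this
 * record is replayed, and slot creation fails while logical decoding
 * is disabled. Those steps are not synchronized, so a slot may creep
 * in while the status is changing. Hence the decoder re-checks the
 * flag carried by the record itself.
 */
static DecodeAction
check_status_change(const char *payload)
{
    bool        logical_decoding;

    /* Copy the flag out of the record payload. */
    memcpy(&logical_decoding, payload, sizeof(bool));

    return logical_decoding ? DECODE_OK : DECODE_MUST_FAIL;
}
```

In the real code the failing branch would presumably raise an error, as
the old "wal_level < WAL_LEVEL_LOGICAL" check did.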
4.
WaitLSN "Waiting to read or update shared Wait-for-LSN state."
+LogicalDecodingControl "Waiting to access logical decoding status information."
Given the description just above, wouldn't it be more accurate to say:
"Waiting to read or update logical decoding status information."?
5. The newly added test took approximately 8s on my machine, whereas
other similar tests normally took 2-6s on the same machine, though
there are some exceptions, such as 035_standby_logical_decoding.pl.
See the results of some of the tests below:
-------
[10:03:37] t/028_pitr_timelines.pl ............... ok 2254 ms ( 0.00 usr 0.00 sys + 0.39 cusr 0.83 csys = 1.22 CPU)
[10:03:39] t/029_stats_restart.pl ................ ok 2915 ms ( 0.00 usr 0.00 sys + 0.34 cusr 0.42 csys = 0.76 CPU)
[10:03:42] t/030_stats_cleanup_replica.pl ........ ok 2282 ms ( 0.00 usr 0.00 sys + 0.42 cusr 0.66 csys = 1.08 CPU)
[10:03:45] t/031_recovery_conflict.pl ............ ok 2705 ms ( 0.00 usr 0.00 sys + 0.39 cusr 0.64 csys = 1.03 CPU)
[10:03:47] t/032_relfilenode_reuse.pl ............ ok 2611 ms ( 0.01 usr 0.00 sys + 0.37 cusr 0.61 csys = 0.99 CPU)
[10:03:50] t/033_replay_tsp_drops.pl ............. ok 4860 ms ( 0.00 usr 0.00 sys + 0.57 cusr 1.60 csys = 2.17 CPU)
[10:03:55] t/034_create_database.pl .............. ok 922 ms ( 0.00 usr 0.00 sys + 0.19 cusr 0.19 csys = 0.38 CPU)
[10:03:56] t/035_standby_logical_decoding.pl ..... ok 10899 ms ( 0.01 usr 0.00 sys + 1.13 cusr 2.21 csys = 3.35 CPU)
[10:04:07] t/036_truncated_dropped.pl ............ ok 1781 ms ( 0.00 usr 0.00 sys + 0.21 cusr 0.22 csys = 0.43 CPU)
[10:04:09] t/037_invalid_database.pl ............. ok 944 ms ( 0.00 usr 0.00 sys + 0.19 cusr 0.21 csys = 0.40 CPU)
[10:04:09] t/038_save_logical_slots_shutdown.pl .. ok 1562 ms ( 0.00 usr 0.00 sys + 0.21 cusr 0.36 csys = 0.57 CPU)
[10:04:11] t/039_end_of_wal.pl ................... ok 4638 ms ( 0.00 usr 0.00 sys + 0.48 cusr 0.66 csys = 1.14 CPU)
[10:04:16] t/040_standby_failover_slots_sync.pl .. ok 7418 ms ( 0.01 usr 0.00 sys + 0.81 cusr 1.82 csys = 2.64 CPU)
[10:04:23] t/041_checkpoint_at_promote.pl ........ ok 1535 ms ( 0.00 usr 0.00 sys + 0.29 cusr 0.51 csys = 0.80 CPU)
[10:04:25] t/042_low_level_backup.pl ............. ok 2842 ms ( 0.00 usr 0.00 sys + 0.37 cusr 0.66 csys = 1.03 CPU)
[10:04:27] t/043_no_contrecord_switch.pl ......... ok 1946 ms ( 0.00 usr 0.00 sys + 0.32 cusr 0.69 csys = 1.01 CPU)
[10:04:29] t/044_invalidate_inactive_slots.pl .... ok 603 ms ( 0.00 usr 0.00 sys + 0.19 cusr 0.17 csys = 0.36 CPU)
[10:04:30] t/045_archive_restartpoint.pl ......... ok 4324 ms ( 0.00 usr 0.00 sys + 0.97 cusr 0.66 csys = 1.63 CPU)
[10:04:34] t/046_checkpoint_logical_slot.pl ...... ok 3322 ms ( 0.00 usr 0.00 sys + 0.33 cusr 0.55 csys = 0.88 CPU)
[10:04:38] t/047_checkpoint_physical_slot.pl ..... ok 1919 ms ( 0.00 usr 0.00 sys + 0.28 cusr 0.43 csys = 0.71 CPU)
[10:04:40] t/048_vacuum_horizon_floor.pl ......... ok 1413 ms ( 0.01 usr 0.00 sys + 0.26 cusr 0.53 csys = 0.80 CPU)
[10:04:41] t/049_wait_for_lsn.pl ................. ok 6851 ms ( 0.00 usr 0.00 sys + 0.40 cusr 0.71 csys = 1.11 CPU)
[10:04:48] t/050_effective_wal_level.pl .......... ok 8106 ms ( 0.00 usr 0.00 sys + 0.83 cusr 1.79 csys = 2.62 CPU)
---------
I haven't investigated whether we can reduce the test's run time
without affecting coverage or functionality, but please see if it can
be reduced. If you think nothing can be done on this front without
compromising coverage, then I think we can live with it.
--
With Regards,
Amit Kapila.