| From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
|---|---|
| To: | shveta malik <shveta(dot)malik(at)gmail(dot)com> |
| Cc: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: pg_upgrade: optimize replication slot caught-up check |
| Date: | 2026-01-21 23:49:14 |
| Message-ID: | CAD21AoAqd8=7km_AENWtv2TddXfn3P2veiTmqdmeYh2bSC0owA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Mon, Jan 19, 2026 at 10:38 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> On Wed, Jan 14, 2026 at 11:24 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > I've attached the updated patch.
> >
>
> Thank You for the patch. I like the idea of optimization. Few initial comments:
Thank you for reviewing the patch!
>
> 1)
> + * The query returns the slot names and their caught-up status in
> + * the same order as the results collected by
> + * get_old_cluster_logical_slot_infos(). If this query is changed,
>
> I could not find the function get_old_cluster_logical_slot_infos(), do
> you mean get_old_cluster_logical_slot_infos_query()?
It seems an oversight in commit 6d3d2e8e541f0. I think it should be
get_db_rel_and_slot_infos().
>
> 2)
> " WHERE database = current_database() AND "
> " slot_type = 'logical' AND "
>
> Is there a reason why database = current_database() is placed before
> slot_type = 'logical'? I am not sure how the PostgreSQL optimizer and
> executor will order these predicates, but from the first look,
> slot_type = 'logical' appears cheaper and could be placed first,
> consistent with the ordering used at other places.
Changed.
>
> 3)
> Shouldn’t we add a sanity check inside
> get_old_cluster_logical_slot_infos_query() to ensure that when
> skip_caught_up_check is true, we are on PostgreSQL 18 or lower? This
> would make the function safer for future use if it's called elsewhere.
> I understand the caller already performs a similar check, but I think
> it's more appropriate here since we call
> binary_upgrade_logical_slot_has_caught_up() from inside, which doesn’t
> even exist on newer versions.
What kind of sanity check did you mean? We can have a check with
pg_fatal() but it seems almost the same to me even if pg_upgrade fails
with an error due to missing
binary_upgrade_logical_slot_has_caught_up().
>
> 4)
> +# Check the file content. While both test_slot1 and test_slot2 should
> be reporting
> +# that they have unconsumed WAL records, test_slot3 should not be reported as
> +# it has caught up.
>
> Can you please elaborate the reason behind test_slot3 not being
> reported? Also mention in the comment if possible.
We advance test_slot3 to the current WAL LSN before executing
pg_upgrade, so the test_slot3 should have consumed all pending WALs.
Please refer to the following changes:
# Preparations for the subsequent test:
-# 1. Generate extra WAL records. At this point neither test_slot1 nor
-# test_slot2 has consumed them.
+# 1. Generate extra WAL records. At this point none of slots has consumed them.
#
# 2. Advance the slot test_slot2 up to the current WAL location, but test_slot1
# still has unconsumed WAL records.
#
# 3. Emit a non-transactional message. This will cause test_slot2 to detect the
# unconsumed WAL record.
+#
+# 4. Advance the slot test_slots3 up to the current WAL location.
$oldpub->start;
$oldpub->safe_psql(
'postgres', qq[
CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
SELECT pg_replication_slot_advance('test_slot2', pg_current_wal_lsn());
- SELECT count(*) FROM pg_logical_emit_message('false',
'prefix', 'This is a non-transactional message');
+ SELECT count(*) FROM pg_logical_emit_message('false',
'prefix', 'This is a non-transactional message', true);
+ SELECT pg_replication_slot_advance('test_slot3', pg_current_wal_lsn());
I believe that the following new comment explains the reason well:
+# Check the file content. While both test_slot1 and test_slot2 should
be reporting
+# that they have unconsumed WAL records, test_slot3 should not be reported as
+# it has caught up.
I've attached the updated patch.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
| Attachment | Content-Type | Size |
|---|---|---|
| v6-0001-pg_upgrade-Optimize-replication-slot-caught-up-ch.patch | application/octet-stream | 19.6 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Michael Paquier | 2026-01-22 00:02:10 | Re: Flush some statistics within running transactions |
| Previous Message | Sami Imseih | 2026-01-21 23:41:13 | Re: Flush some statistics within running transactions |