From: | Japin Li <japinli(at)hotmail(dot)com> |
---|---|
To: | shveta malik <shveta(dot)malik(at)gmail(dot)com> |
Cc: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Excessive LOG messages from replication slot sync worker |
Date: | 2025-08-12 07:38:34 |
Message-ID: | ME0P300MB04452400335D1F925CCDC865B62BA@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Aug 12, 2025 at 12:24:10PM +0530, shveta malik wrote:
> On Tue, Aug 12, 2025 at 8:52 AM Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> >
> > Hi,
> >
> > While testing the replication slot synchronization feature, I set up
> > three instances on the same machine:
> >
> > - Physical replication primary (also Logical replication publisher)
> > - Physical replication standby
> > - Logical replication subscriber
> >
> > ----------------------------------------
> > initdb -D pri --encoding=UTF8 --locale=C
> > cat <<EOF >> pri/postgresql.conf
> > wal_level = logical
> > log_line_prefix = '%m [pri][%b] '
> > logging_collector = on
> > EOF
> > pg_ctl -D pri start
> > pgbench -i
> > cat <<EOF | psql
> > CREATE PUBLICATION mypub FOR ALL TABLES;
> > SELECT pg_create_physical_replication_slot('pslot');
> > EOF
> > cat <<EOF >> pri/postgresql.conf
> > synchronized_standby_slots = 'pslot'
> > EOF
> > pg_ctl -D pri reload
> >
> > pg_basebackup -D sby -c fast
> > touch sby/standby.signal
> > cat <<EOF >> sby/postgresql.conf
> > port = 5433
> > primary_conninfo = 'host=localhost port=5432 dbname=postgres'
> > primary_slot_name = 'pslot'
> > sync_replication_slots = on
> > hot_standby_feedback = on
> > log_line_prefix = '%m [sby][%b] '
> > EOF
> > pg_ctl -D sby start
> >
> > initdb -D sub --encoding=UTF8 --locale=C
> > cat <<EOF >> sub/postgresql.conf
> > port = 5434
> > wal_level = logical
> > log_line_prefix = '%m [sub][%b] '
> > logging_collector = on
> > EOF
> > pg_ctl -D sub start
> > pgbench -i -Itp -p 5434
> > cat <<EOF | psql -p 5434
> > CREATE SUBSCRIPTION mysub CONNECTION 'host=localhost port=5432
> > dbname=postgres' PUBLICATION mypub WITH (failover);
> > EOF
> > ----------------------------------------
> >
> >
> > I noticed that the replication slot sync worker on the standby generates
> > a large number of LOG messages, about two per 200–300 ms, during
> > pgbench runs (pgbench -c 10 -j 10 -T 60). For example,
> >
> > ----------------------------------------
> > 2025-08-12 11:48:21.067 JST [sby][slotsync worker] LOG: starting
> > logical decoding for slot "mysub"
> > 2025-08-12 11:48:21.067 JST [sby][slotsync worker] DETAIL: Streaming
> > transactions committing after 0/04F317B8, reading WAL from 0/04006FC0.
> > 2025-08-12 11:48:21.067 JST [sby][slotsync worker] LOG: logical
> > decoding found consistent point at 0/04006FC0
> > 2025-08-12 11:48:21.067 JST [sby][slotsync worker] DETAIL: There are
> > no running transactions.
> > 2025-08-12 11:48:21.296 JST [sby][slotsync worker] LOG: starting
> > logical decoding for slot "mysub"
> > 2025-08-12 11:48:21.296 JST [sby][slotsync worker] DETAIL: Streaming
> > transactions committing after 0/04F9DC90, reading WAL from 0/04006FC0.
> > 2025-08-12 11:48:21.297 JST [sby][slotsync worker] LOG: logical
> > decoding found consistent point at 0/04006FC0
> > 2025-08-12 11:48:21.297 JST [sby][slotsync worker] DETAIL: There are
> > no running transactions.
> > 2025-08-12 11:48:21.526 JST [sby][slotsync worker] LOG: starting
> > logical decoding for slot "mysub"
> > 2025-08-12 11:48:21.526 JST [sby][slotsync worker] DETAIL: Streaming
> > transactions committing after 0/0506A5C0, reading WAL from 0/04006FC0.
> > 2025-08-12 11:48:21.527 JST [sby][slotsync worker] LOG: logical
> > decoding found consistent point at 0/04006FC0
> > 2025-08-12 11:48:21.527 JST [sby][slotsync worker] DETAIL: There are
> > no running transactions.
> > 2025-08-12 11:48:21.762 JST [sby][slotsync worker] LOG: starting
> > logical decoding for slot "mysub"
> > 2025-08-12 11:48:21.762 JST [sby][slotsync worker] DETAIL: Streaming
> > transactions committing after 0/050C7F70, reading WAL from 0/04006FC0.
> > 2025-08-12 11:48:21.763 JST [sby][slotsync worker] LOG: logical
> > decoding found consistent point at 0/04006FC0
> > 2025-08-12 11:48:21.763 JST [sby][slotsync worker] DETAIL: There are
> > no running transactions.
> > 2025-08-12 11:48:21.999 JST [sby][slotsync worker] LOG: starting
> > logical decoding for slot "mysub"
> > 2025-08-12 11:48:21.999 JST [sby][slotsync worker] DETAIL: Streaming
> > transactions committing after 0/05111420, reading WAL from 0/04006FC0.
> > 2025-08-12 11:48:21.999 JST [sby][slotsync worker] LOG: logical
> > decoding found consistent point at 0/04006FC0
> > 2025-08-12 11:48:21.999 JST [sby][slotsync worker] DETAIL: There are
> > no running transactions.
> > ----------------------------------------
> >
> >
> > Is this expected behavior, a configuration mistake on my side, or perhaps
> > an oversight in the commit of the replication slot synchronization feature?
> >
> > This seems more like debug output left over from development rather than
> > something useful for users. If this is true, would it make sense to change
> > these messages from LOG to DEBUG?
> >
>
> Thank You for reporting the issue.
>
> It looks like commit 2633dae (mistakenly) introduced a change ([1]) in
> SnapBuildSnapshotExists(), altering the format used for snapshot file
> names during the search. However, SnapBuildSerialize still uses the
> old format ("%s/%X-%X.snap"), which led to the slot-sync worker being
> unable to locate existing snapshot files. Reverting the format
> resolved the issue for me. Could you please check if the attached
> patch works for you as well?
>
> [1]:
> SnapBuildSnapshotExists()
> - sprintf(path, "%s/%X-%X.snap",
> + sprintf(path, "%s/%08X-%08X.snap",
>
Sorry, it's my fault. I forgot to update the format in SnapBuildSerialize().
I'd prefer to update the format in SnapBuildSerialize() instead of reverting
this change.
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 8532bfd27e5..db7051170fc 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -1529,7 +1529,7 @@ SnapBuildSerialize(SnapBuild *builder, XLogRecPtr lsn)
* unless the user used pg_resetwal or similar. If a user did so, there's
* no hope continuing to decode anyway.
*/
- sprintf(path, "%s/%X-%X.snap",
+ sprintf(path, "%s/%08X-%08X.snap",
PG_LOGICAL_SNAPSHOTS_DIR,
LSN_FORMAT_ARGS(lsn));
@@ -1573,7 +1573,7 @@ SnapBuildSerialize(SnapBuild *builder, XLogRecPtr lsn)
elog(DEBUG1, "serializing snapshot to %s", path);
/* to make sure only we will write to this tempfile, include pid */
- sprintf(tmppath, "%s/%X-%X.snap.%d.tmp",
+ sprintf(tmppath, "%s/%08X-%08X.snap.%d.tmp",
PG_LOGICAL_SNAPSHOTS_DIR,
LSN_FORMAT_ARGS(lsn), MyProcPid);
--
Best regards,
Japin Li
ChengDu WenWu Information Technology Co., LTD.
From | Date | Subject | |
---|---|---|---|
Next Message | Japin Li | 2025-08-12 07:44:56 | Update the LSN format in the comment example |
Previous Message | Gavin Panella | 2025-08-12 07:32:43 | Re: `pg_ctl init` crashes when run concurrently; semget(2) suspected |