From: | shveta malik <shveta(dot)malik(at)gmail(dot)com> |
---|---|
To: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com> |
Subject: | Re: Excessive LOG messages from replication slot sync worker |
Date: | 2025-08-12 06:54:10 |
Message-ID: | CAJpy0uDYtHGLkcr1sDqKFUm-ef33Kdky9R8qdDC5JgeM7VdWNA@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Aug 12, 2025 at 8:52 AM Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>
> Hi,
>
> While testing the replication slot synchronization feature, I set up
> three instances on the same machine:
>
> - Physical replication primary (also Logical replication publisher)
> - Physical replication standby
> - Logical replication subscriber
>
> ----------------------------------------
> initdb -D pri --encoding=UTF8 --locale=C
> cat <<EOF >> pri/postgresql.conf
> wal_level = logical
> log_line_prefix = '%m [pri][%b] '
> logging_collector = on
> EOF
> pg_ctl -D pri start
> pgbench -i
> cat <<EOF | psql
> CREATE PUBLICATION mypub FOR ALL TABLES;
> SELECT pg_create_physical_replication_slot('pslot');
> EOF
> cat <<EOF >> pri/postgresql.conf
> synchronized_standby_slots = 'pslot'
> EOF
> pg_ctl -D pri reload
>
> pg_basebackup -D sby -c fast
> touch sby/standby.signal
> cat <<EOF >> sby/postgresql.conf
> port = 5433
> primary_conninfo = 'host=localhost port=5432 dbname=postgres'
> primary_slot_name = 'pslot'
> sync_replication_slots = on
> hot_standby_feedback = on
> log_line_prefix = '%m [sby][%b] '
> EOF
> pg_ctl -D sby start
>
> initdb -D sub --encoding=UTF8 --locale=C
> cat <<EOF >> sub/postgresql.conf
> port = 5434
> wal_level = logical
> log_line_prefix = '%m [sub][%b] '
> logging_collector = on
> EOF
> pg_ctl -D sub start
> pgbench -i -Itp -p 5434
> cat <<EOF | psql -p 5434
> CREATE SUBSCRIPTION mysub CONNECTION 'host=localhost port=5432
> dbname=postgres' PUBLICATION mypub WITH (failover);
> EOF
> ----------------------------------------
>
>
> I noticed that the replication slot sync worker on the standby generates
> a large number of LOG messages, about two per 200–300 ms, during
> pgbench runs (pgbench -c 10 -j 10 -T 60). For example,
>
> ----------------------------------------
> 2025-08-12 11:48:21.067 JST [sby][slotsync worker] LOG: starting
> logical decoding for slot "mysub"
> 2025-08-12 11:48:21.067 JST [sby][slotsync worker] DETAIL: Streaming
> transactions committing after 0/04F317B8, reading WAL from 0/04006FC0.
> 2025-08-12 11:48:21.067 JST [sby][slotsync worker] LOG: logical
> decoding found consistent point at 0/04006FC0
> 2025-08-12 11:48:21.067 JST [sby][slotsync worker] DETAIL: There are
> no running transactions.
> 2025-08-12 11:48:21.296 JST [sby][slotsync worker] LOG: starting
> logical decoding for slot "mysub"
> 2025-08-12 11:48:21.296 JST [sby][slotsync worker] DETAIL: Streaming
> transactions committing after 0/04F9DC90, reading WAL from 0/04006FC0.
> 2025-08-12 11:48:21.297 JST [sby][slotsync worker] LOG: logical
> decoding found consistent point at 0/04006FC0
> 2025-08-12 11:48:21.297 JST [sby][slotsync worker] DETAIL: There are
> no running transactions.
> 2025-08-12 11:48:21.526 JST [sby][slotsync worker] LOG: starting
> logical decoding for slot "mysub"
> 2025-08-12 11:48:21.526 JST [sby][slotsync worker] DETAIL: Streaming
> transactions committing after 0/0506A5C0, reading WAL from 0/04006FC0.
> 2025-08-12 11:48:21.527 JST [sby][slotsync worker] LOG: logical
> decoding found consistent point at 0/04006FC0
> 2025-08-12 11:48:21.527 JST [sby][slotsync worker] DETAIL: There are
> no running transactions.
> 2025-08-12 11:48:21.762 JST [sby][slotsync worker] LOG: starting
> logical decoding for slot "mysub"
> 2025-08-12 11:48:21.762 JST [sby][slotsync worker] DETAIL: Streaming
> transactions committing after 0/050C7F70, reading WAL from 0/04006FC0.
> 2025-08-12 11:48:21.763 JST [sby][slotsync worker] LOG: logical
> decoding found consistent point at 0/04006FC0
> 2025-08-12 11:48:21.763 JST [sby][slotsync worker] DETAIL: There are
> no running transactions.
> 2025-08-12 11:48:21.999 JST [sby][slotsync worker] LOG: starting
> logical decoding for slot "mysub"
> 2025-08-12 11:48:21.999 JST [sby][slotsync worker] DETAIL: Streaming
> transactions committing after 0/05111420, reading WAL from 0/04006FC0.
> 2025-08-12 11:48:21.999 JST [sby][slotsync worker] LOG: logical
> decoding found consistent point at 0/04006FC0
> 2025-08-12 11:48:21.999 JST [sby][slotsync worker] DETAIL: There are
> no running transactions.
> ----------------------------------------
>
>
> Is this expected behavior, a configuration mistake on my side, or perhaps
> an oversight in the commit of the replication slot synchronization feature?
>
> This seems more like debug output left over from development rather than
> something useful for users. If this is true, would it make sense to change
> these messages from LOG to DEBUG?
>
Thank You for reporting the issue.
It looks like commit 2633dae (mistakenly) introduced a change ([1]) in
SnapBuildSnapshotExists(), altering the format used for snapshot file
names during the search. However, SnapBuildSerialize still uses the
old format ("%s/%X-%X.snap"), which led to the slot-sync worker being
unable to locate existing snapshot files. Reverting the format
resolved the issue for me. Could you please check if the attached
patch works for you as well?
[1]:
SnapBuildSnapshotExists()
- sprintf(path, "%s/%X-%X.snap",
+ sprintf(path, "%s/%08X-%08X.snap",
thanks
Shveta
Attachment | Content-Type | Size |
---|---|---|
0001-Revert-wrong-change-in-SnapBuildSnapshotExists.patch | application/octet-stream | 805 bytes |
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Kapila | 2025-08-12 06:56:38 | Re: Parallel Apply |
Previous Message | jian he | 2025-08-12 06:37:10 | Re: implement CAST(expr AS type FORMAT 'template') |