| From: | Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec> |
|---|---|
| To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | could sent_lsn be lower than write/flush/replay_lsn? |
| Date: | 2025-12-26 17:49:33 |
| Message-ID: | CAJKUy5gayvisRsYFF0DWuO_5Hyb4No2-OU_k-eJZCWgUWAPbgQ@mail.gmail.com |
| Lists: | pgsql-hackers |
Hi,
We have a customer who, for the second time, has most of its logical
replicas (13 of 16) in catchup state. They had been working fine
for some time, and now suddenly the pg_stat_replication view shows
something like this for all of the replicas in catchup state:
"""
pid | 2667517
state | catchup
sent_lsn | 38B4/67C403A8
write_lsn | 38B7/D2C9C038
flush_lsn | 38B7/D2C9C038
replay_lsn | 38B7/D2C9C038
"""
This doesn't make sense to me: sent_lsn is behind write_lsn, flush_lsn
and replay_lsn. This is 16.9, btw.
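To put a number on how far sent_lsn trails the other columns, here is a minimal Python sketch. It relies only on the textual LSN format (two hexadecimal numbers separated by a slash, the high and low 32 bits of a 64-bit WAL position). The helper names parse_lsn and lsn_diff are made up for illustration; on the server, pg_wal_lsn_diff() gives the same kind of result.

```python
def parse_lsn(text: str) -> int:
    """Convert an LSN such as '38B4/67C403A8' into a 64-bit byte position."""
    high, low = text.strip().split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lsn_diff(a: str, b: str) -> int:
    """Return a - b in bytes (negative when a is behind b)."""
    return parse_lsn(a) - parse_lsn(b)


if __name__ == "__main__":
    # Values copied from the pg_stat_replication output above.
    sent = "38B4/67C403A8"
    flushed = "38B7/D2C9C038"
    gap = lsn_diff(sent, flushed)
    print(f"sent_lsn - flush_lsn = {gap} bytes ({gap / 2**30:.2f} GiB)")
```

A negative result means sent_lsn is behind flush_lsn, which is the situation described here.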
pg_stat_activity shows:
"""
wait_event_type | IO
wait_event | ReorderBufferWrite
state | active
backend_xid |
backend_xmin |
query_id |
query | START_REPLICATION SLOT "sub_down_tables" LOGICAL
38B7/CEBC9330 (proto_version '4', origin 'any', publication_names
'"pub_down_tables"')
backend_type | walsender
"""
And the logs keep showing this:
"""
2025-12-26 12:17:41.861 -05 [pid=2667517;l=1;tx=0] LOG: 38B7/CEBC9330
has been already streamed, forwarding to 38B7/D2C9C038
2025-12-26 12:17:41.861 -05 [pid=2667517;l=2;tx=0] STATEMENT:
START_REPLICATION SLOT "sub_down_tables_central_trx001" LOGICAL
38B7/CEBC9330 (proto_version '4', origin 'any', publication_names
'"pub_elipsys_cresio_down_tables"')
2025-12-26 12:17:41.867 -05 [pid=2667517;l=3;tx=0] LOG: starting
logical decoding for slot "sub_down_tables_central_trx001"
2025-12-26 12:17:41.867 -05 [pid=2667517;l=4;tx=0] DETAIL: Streaming
transactions committing after 38B7/D2C9C038, reading WAL from
38B0/2261B890.
2025-12-26 12:17:41.867 -05 [pid=2667517;l=5;tx=0] STATEMENT:
START_REPLICATION SLOT "sub_down_tables_central_trx001" LOGICAL
38B7/CEBC9330 (proto_version '4', origin 'any', publication_names
'"pub_elipsys_cresio_down_tables"')
2025-12-26 12:17:41.868 -05 [pid=2667517;l=6;tx=0] LOG: logical
decoding found consistent point at 38B0/2261B890
2025-12-26 12:17:41.868 -05 [pid=2667517;l=7;tx=0] DETAIL: Logical
decoding will begin using saved snapshot.
2025-12-26 12:17:41.868 -05 [pid=2667517;l=8;tx=0] STATEMENT:
START_REPLICATION SLOT "sub_down_tables_central_trx001" LOGICAL
38B7/CEBC9330 (proto_version '4', origin 'any', publication_names
'"pub_elipsys_cresio_down_tables"')
2025-12-26 12:30:35.953 -05 [pid=2678504;l=1;tx=0] ERROR: replication
slot "sub_down_tables_central_trx001" is active for PID 2667517
2025-12-26 12:30:40.959 -05 [pid=2678564;l=1;tx=0] ERROR: replication
slot "sub_down_tables_central_trx001" is active for PID 2667517
"""
Any idea what to check?
--
Jaime Casanova
SYSTEMGUARDS S.A.