| From: | Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec> |
|---|---|
| To: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> |
| Cc: | cca5507 <2624345507(at)qq(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: could sent_lsn be lower than write/flush/replay_lsn? |
| Date: | 2025-12-31 01:42:16 |
| Message-ID: | CAJKUy5hYMtjrNR+DNduNEXchNYJTsAREyXGv8mEjG=FqG5Loww@mail.gmail.com |
| Lists: | pgsql-hackers |
On Mon, Dec 29, 2025 at 2:13 AM Ashutosh Bapat
<ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
>
> On Sat, Dec 27, 2025 at 1:18 PM cca5507 <2624345507(at)qq(dot)com> wrote:
> >
> > The sent_lsn is just where the walsender is currently reading, so it could be lower than
> > write/flush/replay_lsn.
>
> +1.
>
> I guess the logical replication is restarting in a loop. If that's
> the case, you will find multiple errors happening in the loop. Another
> guess is that it's because of the walsender/receiver timeout. Do you see
> a timeout error from the corresponding background workers? What's
> downstream?
>
Thanks to both of you for clarifying this; it was actually a timeout
error. It seems that for some reason all the subscribers got disconnected
from the provider, and because of a problem we had some years ago (when
using pglogical for this same customer) wal_sender_timeout was set to 1
hour... which AFAIU kept the walsender processes alive for 1 hour
while the subscribers tried to reconnect and saw a walsender already
connected to another PID (the old one, which had already died).
We returned wal_sender_timeout to its original value and everything
started to flow...
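
For anyone hitting the same symptom, here is a rough sketch of the kind of
check and reset involved, run on the provider. It assumes the "original
value" is the built-in default (60s) and that the setting was changed via
ALTER SYSTEM rather than by editing postgresql.conf; adjust if that is not
your case.

```sql
-- Inspect the current timeout and the walsenders still registered.
SHOW wal_sender_timeout;

SELECT pid, application_name, state,
       sent_lsn, write_lsn, flush_lsn, replay_lsn, reply_time
FROM pg_stat_replication;

-- Assumption: the value was set with ALTER SYSTEM; this drops the override
-- so the server falls back to postgresql.conf or the default.
ALTER SYSTEM RESET wal_sender_timeout;

-- wal_sender_timeout only needs a configuration reload, not a restart.
SELECT pg_reload_conf();
```

A reply_time that stays far in the past for a given pid is a hint that the
walsender is still waiting on a subscriber connection that is already gone.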
--
Jaime Casanova
SYSTEMGUARDS S.A.