From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, abhishek(dot)bhola(at)japannext(dot)co(dot)jp |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: walsender timeout on logical replication set |
Date: | 2021-09-17 04:48:11 |
Message-ID: | CAA4eK1K56Ag8nuAH1t0Q0cH6TDkPQaQD5SsP_r2n90zJ56sshA@mail.gmail.com |
Lists: | pgsql-hackers |
On Mon, Sep 13, 2021 at 7:01 AM Kyotaro Horiguchi
<horikyota(dot)ntt(at)gmail(dot)com> wrote:
>
> Hello.
>
> As reported in [1], it seems that the walsender can time out in
> certain cases. It is not confirmed yet, but I suspect that
> there is a case where LogicalRepApplyLoop keeps running the innermost
> loop without receiving a keepalive packet for longer than
> wal_sender_timeout (not wal_receiver_timeout).
>
Why is that happening? In the previous investigation in this area [1],
your tests revealed that after reading a WAL page, we always send a
keepalive, so even if the transaction is large, we should send some
keepalives in between.
The other thing that I am not able to understand from Abhishek's reply
[2] is why increasing wal_sender_timeout/wal_receiver_timeout leads to
the removal of required WAL segments. As per my understanding, we
shouldn't remove WAL unless we get confirmation that the subscriber
has processed it.
[1] - https://www.postgresql.org/message-id/20210610.150016.1709823354377067679.horikyota.ntt%40gmail.com
[2] - https://www.postgresql.org/message-id/CAEDsCzjEHLxgqa4d563CKFwSbgBvvnM91Cqfq_qoZDXCkyOsiw%40mail.gmail.com
Note: I have added Abhishek to this thread to see if he can answer any of these questions.
--
With Regards,
Amit Kapila.