From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Fabrice Chapuis <fabrice636861(at)gmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Logical replication timeout problem |
Date: | 2021-09-21 09:52:29 |
Message-ID: | CAA4eK1JYB2eLNTs48szQ8oJXW9GLcrxT0yWETQS0gF2sVBxMYA@mail.gmail.com |
Lists: | pgsql-hackers |
On Tue, Sep 21, 2021 at 1:52 PM Fabrice Chapuis <fabrice636861(at)gmail(dot)com> wrote:
>
> If I understand, the instruction to send keep alive by the wal sender has not been reached in the for loop, for what reason?
> ...
> /* Check for replication timeout. */
> WalSndCheckTimeOut();
>
> /* Send keepalive if the time has come */
> WalSndKeepaliveIfNecessary();
> ...
>
Are you sure that these functions have not been called? Or is it the
case that they are called but, for some reason, the keep-alive is not
sent? IIUC, these are called after processing each WAL record, so I am
not sure how it is possible that they are not reached in your case.
> The data load is performed on a table which is not replicated, I do not understand why the whole transaction linked to an insert is copied to snap files given that table does not take part of the logical replication.
>
It is because we don't know until the end of the transaction (where we
start sending the data) whether the table will be replicated or not. I
think the new 'streaming' feature introduced in PostgreSQL 14 will help
here, specifically by avoiding writing the data of such tables to
snap/spill files. See the 'streaming' option in the CREATE SUBSCRIPTION
docs [1].
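As a rough sketch of what enabling that option could look like: the
subscription name, publication name, and connection string below are
placeholders I made up, and this assumes both publisher and subscriber
run PostgreSQL 14 or later.

```sql
-- Hypothetical names: mysub, mypub, and the connection string are placeholders.
-- Assumes PostgreSQL 14 or later on both the publisher and the subscriber.
CREATE SUBSCRIPTION mysub
    CONNECTION 'host=publisher.example.com dbname=appdb user=repuser'
    PUBLICATION mypub
    WITH (streaming = on);

-- For a subscription that already exists, the option can be switched on in place.
ALTER SUBSCRIPTION mysub SET (streaming = on);
```

As I understand it, with streaming enabled, in-progress transactions
that exceed logical_decoding_work_mem are sent to the subscriber
instead of being spilled to disk on the publisher.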
> We are going to do a test by modifying parameters wal_sender_timeout/wal_receiver_timeout from 1' to 5'. The problem is that these parameters are global and changing them will also impact the physical replication.
>
Do you mean you are planning to change them from 1 minute to 5 minutes?
I agree that these parameters are global, and I think your approach of
finding the root cause is a good one, because otherwise a similar or
heavier workload might lead to the same situation.
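For the test itself, something along these lines should work. This is
only a sketch: it assumes superuser access, and that wal_sender_timeout
is changed on the publisher and wal_receiver_timeout on the subscriber.

```sql
-- On the publisher: raise the walsender timeout from 1 minute to 5 minutes.
ALTER SYSTEM SET wal_sender_timeout = '5min';

-- On the subscriber: raise the receiver-side timeout to match.
ALTER SYSTEM SET wal_receiver_timeout = '5min';

-- On each server after changing its setting: reload the configuration
-- (no restart is needed for these two parameters).
SELECT pg_reload_conf();

-- Confirm the new value once the reload has been processed.
SHOW wal_sender_timeout;
```

Note that ALTER SYSTEM writes to postgresql.auto.conf, so the values
stay in effect until they are reset with ALTER SYSTEM RESET.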
> Concerning the walsender timeout, when the worker is started again after a timeout, it will trigger a new walsender associated with it.
>
Right, I know that, but I was curious to know whether the walsender
had exited before the walreceiver.
[1] - https://www.postgresql.org/docs/devel/sql-createsubscription.html
--
With Regards,
Amit Kapila.