Re: Time delayed LR (WAS Re: logical replication restrictions)

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: "Takamichi Osumi (Fujitsu)" <osumi(dot)takamichi(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Smith <smithpb2250(at)gmail(dot)com>
Subject: Re: Time delayed LR (WAS Re: logical replication restrictions)
Date: 2022-12-12 12:40:00
Message-ID: CAA4eK1Jg-KVAi5_WDj9pu1g_Uc_DHLxdU9ELnV_1ot-Sv6VvMg@mail.gmail.com
Lists: pgsql-hackers

On Mon, Dec 12, 2022 at 1:04 PM Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> This is a reply to the later part of your e-mail.
>
> > > (2) About the timeout issue
> > >
> > > When having a look at the physical replication internals,
> > > it conducts sending feedback and application of delay separately on different
> > > processes.
> > > OTOH, the logical replication needs to achieve those within one process.
> > >
> > > When we want to apply delay and avoid the timeout,
> > > we should not store all the transactions data into memory.
> > > So, one approach for this is to serialize the transaction data and after the delay,
> > > we apply the transactions data.
> > >
> >
> > It is not clear to me how this will avoid a timeout.
>
> First, the timeout occurs because, while it is delaying, the apply
> worker neither reads messages from the walsender nor replies to it.
> The worker's last_recv_timeout will not be updated because it does not receive
> messages. This leads to wal_receiver_timeout. Similarly, the walsender's
> last_processing will not be updated, and the walsender will exit due to the
> timeout because the worker does not reply to upstream.
>
> Based on the above, we thought that workers must receive and handle messages
> even if they are delaying applying transactions. In more detail, workers must
> iterate the outer loop in LogicalRepApplyLoop().
>
> If workers receive transactions but they need to delay applying, they must keep
> messages somewhere. So we came up with the idea that workers serialize changes
> once and apply later. Our basic design is as follows:
>
> * All transactions are serialized to files if min_apply_delay is set to non-zero.
> * After receiving the commit message and waiting out the delay, workers read and
> apply the spooled messages
>

I think this may be more work than required because in some cases
doing I/O just to delay xacts will later lead to more work. Can't we
send some ping to walsender to communicate that walreceiver is alive?
We already seem to be sending a ping in LogicalRepApplyLoop if we
haven't heard anything from the server for more than
wal_receiver_timeout / 2. Now, it is possible that the walsender is
terminated due to some other reason and we need to see if we can
detect that or if it will only be detected once the walreceiver's
delay time is over.
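
To make the idea concrete, here is a rough simulation of a delay wait
that wakes up to ping instead of sleeping straight through. It is only
a sketch on a simulated clock, not PostgreSQL code: the function name
ping_times_during_delay and its parameters are made up for
illustration, and the only thing taken from the existing loop is the
wal_receiver_timeout / 2 threshold.

```python
def ping_times_during_delay(delay_ms, wal_receiver_timeout_ms):
    """Return the simulated times (ms) at which a delaying worker would ping.

    Hypothetical model: the worker sleeps in slices so that it never stays
    silent for longer than wal_receiver_timeout / 2.
    """
    if wal_receiver_timeout_ms <= 0:
        # Timeout disabled: nothing to keep alive in this model.
        return []

    # Same threshold as mentioned above; at least 1 ms so the loop advances.
    ping_interval_ms = max(1, wal_receiver_timeout_ms // 2)

    now_ms = 0
    last_ping_ms = 0
    pings = []
    while now_ms < delay_ms:
        remaining_ms = delay_ms - now_ms
        until_ping_ms = last_ping_ms + ping_interval_ms - now_ms
        # Sleep until the delay ends or the next ping is due, whichever is first.
        now_ms += min(remaining_ms, until_ping_ms)
        if now_ms < delay_ms:
            # Still delaying: tell the sender we are alive.
            pings.append(now_ms)
            last_ping_ms = now_ms
    return pings


if __name__ == "__main__":
    # 100 s apply delay with a 60 s timeout.
    print(ping_times_during_delay(100_000, 60_000))
```

Note that this says nothing about the other question, i.e. whether a
walsender that died for some unrelated reason would be noticed before
the delay is over.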

--
With Regards,
Amit Kapila.
