RE: Time delayed LR (WAS Re: logical replication restrictions)

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>, "Takamichi Osumi (Fujitsu)" <osumi(dot)takamichi(at)fujitsu(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Smith <smithpb2250(at)gmail(dot)com>
Subject: RE: Time delayed LR (WAS Re: logical replication restrictions)
Date: 2022-12-12 07:34:49
Message-ID: TYAPR01MB58669394A67F2340B82E42D1F5E29@TYAPR01MB5866.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

Dear Amit,

This is a reply to the latter part of your e-mail.

> > (2) About the timeout issue
> >
> > When having a look at the physical replication internals,
> > it conducts sending feedback and application of delay separately on different
> processes.
> > OTOH, the logical replication needs to achieve those within one process.
> >
> > When we want to apply delay and avoid the timeout,
> > we should not store all the transactions data into memory.
> > So, one approach for this is to serialize the transaction data and after the delay,
> > we apply the transactions data.
> >
>
> It is not clear to me how this will avoid a timeout.

First, the timeout occurs because, while it is delaying, the apply worker
neither reads messages from the walsender nor replies to it. The worker's
last_recv_timeout is not updated because it receives no messages, which
eventually trips wal_receiver_timeout. Similarly, the walsender's
last_processing is not updated because the worker sends no reply upstream, so
the walsender exits due to its own timeout.
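
To make the reasoning above concrete, here is a toy model in Python. It is not
PostgreSQL code: the function name and the simple "blocked for the whole delay"
model are assumptions made only for illustration, and the exact boundary
condition in the real timeout checks may differ. The only facts borrowed from
PostgreSQL are the two GUC names and that a value of 0 disables each timeout.

```python
def timeouts_hit(delay_ms, wal_receiver_timeout_ms=60_000, wal_sender_timeout_ms=60_000):
    """Toy model: the worker blocks for delay_ms without receiving or replying.

    Returns the names of the timeouts that would expire during the block.
    A timeout of 0 means "disabled", as with the real GUCs.
    """
    hit = []
    # Subscriber side: nothing is received, so the last-receive time never advances.
    if 0 < wal_receiver_timeout_ms <= delay_ms:
        hit.append("wal_receiver_timeout")
    # Publisher side: no reply arrives, so the walsender sees a silent peer.
    if 0 < wal_sender_timeout_ms <= delay_ms:
        hit.append("wal_sender_timeout")
    return hit
```

In this model any delay at least as long as a timeout trips it, which is why the
worker has to keep receiving and replying while it delays.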

Based on the above, we thought that workers must receive and handle messages
even if they are delaying the application of transactions. In more detail,
workers must keep iterating the outer loop in LogicalRepApplyLoop().

If workers receive transactions but need to delay applying them, they must keep
the messages somewhere. So we came up with the idea that workers serialize the
changes first and apply them later. Our basic design is as follows:

* All transactions are serialized to files if min_apply_delay is set to a
  non-zero value.
* After receiving the commit message and waiting for the delay to elapse,
  workers read and apply the spooled messages.
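
The two points above can be sketched roughly as follows. This is only a toy
model in Python under stated assumptions, not the proposed patch: the class
SpoolingWorker, its method names, the per-transaction file layout, and the use
of pickle are all hypothetical, and times are plain integers in milliseconds
supplied by the caller.

```python
import os
import pickle
import tempfile


class SpoolingWorker:
    """Toy apply worker: spools every change to a file, applies it after the delay."""

    def __init__(self, min_apply_delay_ms, apply_fn):
        self.min_apply_delay_ms = min_apply_delay_ms
        self.apply_fn = apply_fn      # hypothetical callback that applies one change
        self.spool_dir = tempfile.mkdtemp(prefix="spool_")
        self.pending = []             # (apply_at_ms, xid) of committed transactions

    def _path(self, xid):
        return os.path.join(self.spool_dir, f"{xid}.changes")

    def handle_change(self, xid, change):
        # Append the change to the per-transaction spool file instead of applying it.
        with open(self._path(xid), "ab") as f:
            pickle.dump(change, f)

    def handle_commit(self, xid, commit_time_ms):
        # Make sure the spool file exists even for a transaction without changes.
        open(self._path(xid), "ab").close()
        # Remember when this transaction becomes eligible for apply.
        self.pending.append((commit_time_ms + self.min_apply_delay_ms, xid))

    def apply_due(self, now_ms):
        # Called on every loop iteration; replays transactions whose delay has elapsed.
        still_waiting = []
        for apply_at_ms, xid in self.pending:
            if apply_at_ms > now_ms:
                still_waiting.append((apply_at_ms, xid))
                continue
            with open(self._path(xid), "rb") as f:
                while True:
                    try:
                        change = pickle.load(f)
                    except EOFError:
                        break         # end of the spool file
                    self.apply_fn(change)
            os.remove(self._path(xid))
        self.pending = still_waiting
```

Because handle_change() and handle_commit() return immediately, the worker can
keep iterating its receive loop and call apply_due() on each pass, instead of
sleeping for the whole delay.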

> > However, this means if users adopt this feature,
> > then all transaction data that should be delayed would be serialized.
> > We are not sure if this sounds a valid approach or not.
> >
> > One another approach was to divide the time of delay in apply_delay() and
> > utilize the divided time for WaitLatch and sends the keepalive messages from
> there.
> >
>
> Do we anytime send keepalive messages from the apply side? I think we
> only send feedback reply messages as a response to the publisher's
> keep_alive message. So, we need to do something similar for this if
> you want to follow this approach.

Right. The mechanism above is what allows workers to process incoming messages
and send feedback replies in response to the publisher's keepalive messages.
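
For comparison, the alternative of dividing the delay and replying between the
pieces could look roughly like the sketch below. Again this is a toy model in
Python, not PostgreSQL code: delayed_wait and its three callbacks
(poll_keepalive, send_feedback, wait) are hypothetical stand-ins for reading
the stream, sending a feedback reply, and a WaitLatch-style sleep. slice_ms is
assumed to be positive and shorter than the timeouts.

```python
def delayed_wait(delay_ms, slice_ms, poll_keepalive, send_feedback, wait):
    """Wait delay_ms in slices, answering publisher keepalives between slices."""
    remaining = delay_ms
    while remaining > 0:
        step = min(slice_ms, remaining)
        wait(step)                    # sleep for at most one slice
        remaining -= step
        # Drain any keepalives that arrived and reply, so neither side goes silent.
        while poll_keepalive():
            send_feedback()
```

With a simulated clock, a 10 second delay split into 2.5 second slices answers
each keepalive at the first slice boundary after it arrives, so the longest
silence seen by the publisher is bounded by the slice length rather than by the
full delay.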

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
