Re: Time delayed LR (WAS Re: logical replication restrictions)

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, "Yu Shi (Fujitsu)" <shiy(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "andres(at)anarazel(dot)de" <andres(at)anarazel(dot)de>, "vignesh21(at)gmail(dot)com" <vignesh21(at)gmail(dot)com>, "shveta(dot)malik(at)gmail(dot)com" <shveta(dot)malik(at)gmail(dot)com>, "Takamichi Osumi (Fujitsu)" <osumi(dot)takamichi(at)fujitsu(dot)com>, "dilipbalaut(at)gmail(dot)com" <dilipbalaut(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, "euler(at)eulerto(dot)com" <euler(at)eulerto(dot)com>, "m(dot)melihmutlu(at)gmail(dot)com" <m(dot)melihmutlu(at)gmail(dot)com>, "marcos(at)f10(dot)com(dot)br" <marcos(at)f10(dot)com(dot)br>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject: Re: Time delayed LR (WAS Re: logical replication restrictions)
Date: 2023-05-11 03:19:37
Message-ID: CAA4eK1JTK464_v+YvK1dyOg2Gu=q=OwQ3GXZiwNj_MfhXjZGbA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, May 10, 2023 at 5:35 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Fri, Apr 28, 2023 at 2:35 PM Hayato Kuroda (Fujitsu)
> <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
> >
> > Dear hackers,
> >
> > I rebased and refined my PoC. Followings are the changes:
>
> Thanks.
>
> Apologies for being late here. Please bear with me if I'm repeating
> any of the discussed points.
>
> I'm mainly trying to understand the production level use-case behind
> this feature, and for that matter, recovery_min_apply_delay. AFAIK,
> people try to keep the replication lag as minimum as possible i.e.
> near zero to avoid the extreme problems on production servers - wal
> file growth, blocked vacuum, crash and downtime.
>
> The proposed feature commit message and existing docs about
> recovery_min_apply_delay justify the reason as 'offering opportunities
> to correct data loss errors'. If someone wants to enable
> recovery_min_apply_delay/min_apply_delay on production servers, I'm
> guessing their values will be in hours, not in minutes; for the simple
> reason that when a data loss occurs, people/infrastructure monitoring
> postgres need to know it first and need time to respond with
> corrective actions to recover data loss. When these parameters are
> set, the primary server mustn't be generating too much WAL to avoid
> eventual crash/downtime. Who would really want to be so defensive
> against somebody who may or may not accidentally cause data loss and
> enable these features on production servers (especially when these can
> take down the primary server) and live happily with the induced
> replication lag?
>
> AFAIK, PITR is what people use for recovering from data loss errors in
> production.
>

I think PITR is not a preferred way to achieve this because it can be
quite time-consuming. See how Gitlab[1] uses delayed replication in
PostgreSQL. This is one of the use cases I came across but I am sure
there will be others as well, otherwise, we would not have introduced
this feature in the first place.

Some of the other solutions like MySQL also have this feature. See
[2], you can also read the other use cases in that article. It seems
pglogical has this feature and there is a customer demand for the same
[3]

> IMO, before we even go implement the apply delay feature for logical
> replication, it's worth to understand if induced replication lags have
> any production level significance.
>

I think the main thing here is to come up with the right design to
implement this feature. In the last release, we found some blocking
problems with the proposed patch at that time but Kuroda-San came up
with a new patch with a different design based on the discussion here.
I haven't looked at it yet though.

[1] - https://about.gitlab.com/blog/2019/02/13/delayed-replication-for-disaster-recovery-with-postgresql/
[2] - https://dev.mysql.com/doc/refman/8.0/en/replication-delayed.html
[3] - https://www.postgresql.org/message-id/73b06a32-56ab-4056-86ff-e307f3c316f1%40www.fastmail.com

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hayato Kuroda (Fujitsu) 2023-05-11 04:19:40 RE: Time delayed LR (WAS Re: logical replication restrictions)
Previous Message Wei Wang (Fujitsu) 2023-05-11 03:18:21 RE: [PoC] pg_upgrade: allow to upgrade publisher node