Re: How to simulate sync/async standbys being closer/farther (network distance) to primary in core postgres?

From: SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: How to simulate sync/async standbys being closer/farther (network distance) to primary in core postgres?
Date: 2022-04-08 16:52:23
Message-ID: CAHg+QDewHQU6LHNw0hRmV2Ca0=BxD538V1wLnwuBwrwwQWNtQQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Apr 8, 2022 at 6:44 AM Bharath Rupireddy <
bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:

> On Wed, Apr 6, 2022 at 4:30 PM Ashutosh Bapat
> <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
> >
> > On Tue, Apr 5, 2022 at 9:23 PM Bharath Rupireddy
> > <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> > >
> > > Hi,
> > >
> > > I'm thinking if there's a way in core postgres to achieve $subject. In
> > > reality, the sync/async standbys can either be closer/farther (which
> > > means sync/async standbys can receive WAL at different times) to
> > > primary, especially in cloud HA environments with primary in one
> > > Availability Zone(AZ)/Region and standbys in different AZs/Regions.
> > > $subject may not be possible on dev systems (say, for testing some HA
> > > features) unless we can inject a delay in WAL senders before sending
> > > WAL.
>

Simulation will be helpful even for end customers to simulate faults in the
production environments during availability zone/disaster recovery drills.

> > >
> > > How about having two developer-only GUCs {async,
> > > sync}_wal_sender_delay? When set, the async and sync WAL senders will
> > > delay sending WAL by {async, sync}_wal_sender_delay
> > > milliseconds/seconds? Although, I can't think of any immediate use, it
> > > will be useful someday IMO, say for features like [1], if it gets in.
> > > With this set of GUCs, one can even add core regression tests for HA
> > > features.
>

I would suggest doing this at the slot level, instead of two GUCs that
control the behavior of all the slots (physical/logical). Something like
"pg_suspend_replication_slot and pg_Resume_replication_slot"?
Alternatively a GUC on the standby side instead of primary so that the wal
receiver stops responding to the wal sender? This helps achieve the same as
above but the granularity is now at individual replica level.

Thanks,
Satya

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2022-04-08 16:53:12 Re: avoid multiple hard links to same WAL file after a crash
Previous Message Peter Geoghegan 2022-04-08 16:44:12 Re: Lowering the ever-growing heap->pd_lower