Re: How to simulate sync/async standbys being closer/farther (network distance) to primary in core postgres?

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc: SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: How to simulate sync/async standbys being closer/farther (network distance) to primary in core postgres?
Date: 2022-04-22 14:23:44
Message-ID: CALj2ACWCMdzPVg-=MUTmzJ82vjfJXtrGx4CzeLp10-bx9Y1Qrg@mail.gmail.com
Lists: pgsql-hackers

On Sat, Apr 9, 2022 at 6:38 PM Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote:
>
> On Sat, Apr 09, 2022 at 02:38:50PM +0530, Bharath Rupireddy wrote:
> > On Fri, Apr 8, 2022 at 10:22 PM SATYANARAYANA NARLAPURAM
> > <satyanarlapuram(at)gmail(dot)com> wrote:
> > >
> > >> > <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> > >> > >
> > >> > > Hi,
> > >> > >
> > >> > > I'm thinking if there's a way in core postgres to achieve $subject. In
> > >> > > reality, the sync/async standbys can either be closer/farther (which
> > >> > > means sync/async standbys can receive WAL at different times) to
> > >> > > primary, especially in cloud HA environments with primary in one
> > >> > > Availability Zone(AZ)/Region and standbys in different AZs/Regions.
> > >> > > $subject may not be possible on dev systems (say, for testing some HA
> > >> > > features) unless we can inject a delay in WAL senders before sending
> > >> > > WAL.
> > >
> > > Simulation will be helpful even for end customers to simulate faults in the
> > > production environments during availability zone/disaster recovery drills.
> >
> > Right.
>
> I'm not sure that's actually helpful. If you want to do some realistic testing
> you need to fully simulate various network incidents and only delaying postgres
> replication is never going to be close to that. You should instead rely on
> tool like tc, which can do much more than what $subject could ever do, and do
> that for all your HA stack. At the very least you don't want to validate that
> your setup is working as expected by just simulating a faulty postgres
> replication connection but still having all your clients and HA agent not
> having any network issue at all.

I agree that external networking tools and commands can be used.
However, IMHO, not everyone is familiar with those tools, and they may
not be portable or reliable on every platform. Developers may also be
unable to use them to test some of the HA-related features (which may
require sync and async standbys to be closer to or farther from the
primary) that I or other postgres HA solution providers may develop.
Having a reliable way to do this within core would actually help.

On further thought, how about adding hooks to the WAL sender code
(perhaps passing information about the replication slot it manages,
among other things), so that one can implement an extension of their
choice, similar to how auth_delay uses ClientAuthentication_hook?
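
To illustrate the idea, here is a minimal standalone sketch of the hook
pattern. It is not PostgreSQL code and does not use PostgreSQL headers.
All names in it (WalSndHookInfo, walsnd_send_hook, delay_send_hook,
extension_init, send_wal, and the two delay settings) are hypothetical
and chosen only for illustration; the actual hook signature and the
information passed to it would need to be worked out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical: information a WAL sender could pass to the hook. */
typedef struct WalSndHookInfo
{
    const char *slot_name;  /* replication slot served by this WAL sender */
    int         is_sync;    /* nonzero if the standby is synchronous */
} WalSndHookInfo;

/* Hypothetical hook type and variable; NULL means no extension is loaded. */
typedef void (*walsnd_send_hook_type) (const WalSndHookInfo *info);
static walsnd_send_hook_type walsnd_send_hook = NULL;

/* ---- "extension" side, modelled loosely on how auth_delay is built ---- */

/* Assumed settings; a real extension would expose these as GUCs. */
static int sync_delay_ms = 5;
static int async_delay_ms = 50;

/* Previously installed hook, so that hooks can be chained. */
static walsnd_send_hook_type prev_walsnd_send_hook = NULL;

/* Pick the simulated network delay for this standby. */
static int
delay_for(const WalSndHookInfo *info)
{
    return info->is_sync ? sync_delay_ms : async_delay_ms;
}

/* Sleep for the given number of milliseconds. */
static void
sleep_ms(int ms)
{
    struct timespec ts;

    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (long) (ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);
}

/* Hook implementation: call any earlier hook, then inject the delay. */
static void
delay_send_hook(const WalSndHookInfo *info)
{
    assert(info != NULL);
    if (prev_walsnd_send_hook)
        prev_walsnd_send_hook(info);
    sleep_ms(delay_for(info));
}

/* Install the hook, remembering the previous one (like _PG_init would). */
static void
extension_init(void)
{
    prev_walsnd_send_hook = walsnd_send_hook;
    walsnd_send_hook = delay_send_hook;
}

/* ---- "core" side stand-in for the place where WAL is sent ---- */
static void
send_wal(const WalSndHookInfo *info)
{
    if (walsnd_send_hook)
        walsnd_send_hook(info);
    /* the actual send to the standby would happen here */
}
```

With something like this, a test could call extension_init() once and
then have send_wal() sleep longer for an async standby than for a sync
one, making the async standby appear farther away from the primary.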

Regards,
Bharath Rupireddy.
