Re: Multi-Master Logical Replication

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Multi-Master Logical Replication
Date: 2022-06-10 04:24:04
Message-ID: CAA4eK1J93-mKD7nRZGcPJChHjoRDtjkBm-RP-uJfhMbBLSR0GA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jun 9, 2022 at 6:04 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Thu, Apr 28, 2022 at 5:20 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> >
> > MULTI-MASTER LOGICAL REPLICATION
> >
> > 1.0 BACKGROUND
> >
> > Let’s assume that a user wishes to set up a multi-master environment
> > so that a set of PostgreSQL instances (nodes) use logical replication
> > to share tables with every other node in the set.
> >
> > We define this as a multi-master logical replication (MMLR) node-set.
> >
> > <please refer to the attached node-set diagram>
> >
> > 1.1 ADVANTAGES OF MMLR
> >
> > - Increases write scalability (e.g., all nodes can write arbitrary data).
> > - Allows load balancing
> > - Allows rolling updates of nodes (e.g., logical replication works
> > between different major versions of PostgreSQL).
> > - Improves the availability of the system (e.g., no single point of failure)
> > - Improves performance (e.g., lower latencies for geographically local nodes)
>
> Thanks for working on this proposal. I have a few high-level thoughts,
> please bear with me if I repeat any of them:
>
> 1. Are you proposing to use logical replication subscribers to be in
> sync quorum? In other words, in an N-masters node, M (M >= N)-node
> configuration, will each master be part of the sync quorum in the
> other master?
>

What exactly do you mean by sync quorum here? If you mean that each
master node can wait until the commit happens on all other nodes,
similar to how our current synchronous_commit and
synchronous_standby_names work, then yes, it could be achieved. I
think the patch currently doesn't support this, but it could be
extended to do so. Basically, one could set up async and sync nodes
in combination, depending on the use case.
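
To make the sync/async combination concrete, here is a minimal sketch
of the decision a master would make before returning from a commit. It
is only a model under my own assumptions, not code from the patch: the
function name, the peer names, and the "required" count are all
hypothetical.

```python
# Hypothetical model, not code from the patch: decide whether a commit on
# one master may return, given which peer nodes have confirmed it so far.

def commit_may_return(sync_peers, confirmed, required=None):
    """Return True once enough synchronous peers have confirmed the commit.

    sync_peers: names of peers configured as synchronous; any other peer
                is treated as asynchronous and never delays the commit.
    confirmed:  names of peers that have confirmed the commit so far.
    required:   how many sync peers must confirm; defaults to all of them.
    """
    sync_peers = set(sync_peers)
    if required is None:
        required = len(sync_peers)
    if not 0 <= required <= len(sync_peers):
        raise ValueError("required must be between 0 and the number of sync peers")
    # Only confirmations from synchronous peers count toward the quorum.
    return len(sync_peers & set(confirmed)) >= required
```

With sync_peers=["n2", "n3"], a confirmation from n2 alone is not
enough by default, while required=1 lets the commit return as soon as
either of them confirms.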

> 2. Is there any mention of reducing the latencies that logical
> replication will have generally (initial table sync and
> after-caught-up decoding and replication latencies)?
>

No, this won't change the underlying replication mechanism.

> 3. What if "some" postgres provider assures an SLA of very few seconds
> for failovers in typical HA set up with primary and multiple sync and
> async standbys? In this context, where does the multi-master
> architecture sit in the broad range of postgres use-cases?
>

I think this is one of the primary use cases of the n-way logical
replication solution, wherein there shouldn't be any noticeable wait
time when one or more of the nodes go down. All nodes can accept
writes, so the app just needs to connect to another node. I feel some
analysis is required to find out and state exactly how users can
achieve this, but it seems doable. The other use cases are discussed
in this thread and are summarized in emails [1][2].
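
As a rough illustration of "the app just needs to connect to another
node", the client-side logic could look like the sketch below. This is
an assumption about how an application might do it, not something the
proposal specifies: connect_to_any and the connect callable are
hypothetical names, and the real connection function would come from
whatever driver the application uses.

```python
# Hypothetical client-side sketch: try each writable node in order and use
# the first one that accepts a connection. `connect` is supplied by the
# caller (e.g., a thin wrapper around the application's database driver)
# and is expected to raise ConnectionError when a node is unreachable.

def connect_to_any(nodes, connect):
    """Return (node, connection) for the first node that accepts a connection."""
    errors = {}
    for node in nodes:
        try:
            return node, connect(node)
        except ConnectionError as exc:
            # Remember the failure and move on to the next node.
            errors[node] = exc
    raise ConnectionError(f"no node reachable, tried: {sorted(errors)}")
```

Since every node in the set accepts writes, the application does not
have to wait for a promotion; it only pays the cost of the failed
connection attempts.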

> 4. Can the design proposed here be implemented as an extension instead
> of a core postgres solution?
>

Yes, I think it could be. I think this proposal introduces some system
tables, so we need to analyze what to do about that. BTW, do you see
any advantages to doing so?

> 5. Why should one use logical replication for multi master
> replication? If logical replication is used, isn't it going to be
> something like logically decode and replicate every WAL record from
> one master to all other masters? Instead, can't it be achieved via
> streaming/physical replication?
>

The failover/downtime will be much lower in a solution based on
logical replication because all nodes are master nodes, and users can
write on other nodes instead of waiting for the physical standby to
become writable. It also allows more localized database access for
geographically distributed databases; see the email [3] for further
details. Finally, the benefiting scenarios include all the usual
benefits quoted for logical replication, e.g., version independence,
getting selective/required data, etc.

[1] - https://www.postgresql.org/message-id/CAA4eK1%2BZP9c6q1BQWSQC__w09WQ-qGt22dTmajDmTxR_CAUyJQ%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/TYAPR01MB58660FCFEC7633E15106C94BF5A29%40TYAPR01MB5866.jpnprd01.prod.outlook.com
[3] - https://www.postgresql.org/message-id/CAA4eK1%2BDRHCNLongM0stsVBY01S-s%3DEa_yjBFnv_Uz3m3Hky-w%40mail.gmail.com

--
With Regards,
Amit Kapila.
