Re: Multi-Master Logical Replication

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, vignesh C <vignesh21(at)gmail(dot)com>, "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Multi-Master Logical Replication
Date: 2022-06-01 14:03:18
Message-ID: Ypdxpke2NsaKTnLL@momjian.us
Lists: pgsql-hackers

On Wed, Jun 1, 2022 at 10:27:27AM +0530, Amit Kapila wrote:
> On Tue, May 31, 2022 at 7:36 PM Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> > Uh, thinking some more, why would anyone set things up this way ---
> > having part of a table being primary on one server and a different part
> > of the table be a subscriber. Seems it would be simpler and safer to
> > create two child tables and have one be primary on only one server.
> > Users can access both tables using the parent.
>
> Yes, users can choose to do it that way but still, to keep the nodes in
> sync and continuity of operations, it will be very difficult to manage
> the operations without the LRG APIs. Let us consider a simple two-node
> example where on each node there is Table T that has partitions P1 and
> P2. As far as I can understand, one needs to have the below kind of
> set-up to allow local operations on geographically distributed nodes.
>
> Node-1:
> node1 writes to P1
> node1 publishes P1
> node2 subscribes to P1 of node1
>
> Node-2:
> node2 writes to P2
> node2 publishes P2
> node1 subscribes to P2 on node2

Yes, that is how you would set it up.
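For reference, a minimal sketch of that setup using the existing logical
replication commands (table/partition names and connection strings are
illustrative, not from the thread):

```sql
-- On node-1: node-1 owns P1, publishes it, and subscribes to node-2's P2.
CREATE PUBLICATION pub_p1 FOR TABLE p1;
CREATE SUBSCRIPTION sub_p2
    CONNECTION 'host=node2 dbname=postgres'
    PUBLICATION pub_p2;

-- On node-2: the mirror image -- node-2 owns P2 and subscribes to P1.
CREATE PUBLICATION pub_p2 FOR TABLE p2;
CREATE SUBSCRIPTION sub_p1
    CONNECTION 'host=node1 dbname=postgres'
    PUBLICATION pub_p1;
```

Because each node publishes only the partition it writes to, replicated
rows land in a partition the receiving node never publishes, so they are
not sent back.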

> In this setup, we need to publish individual partitions, otherwise, we
> will face the loop problem where the data sent by node-1 to node-2 via
> logical replication will again come back to it causing problems like
> constraints violations, duplicate data, etc. There could be other ways
> to do this set up with current logical replication commands (for ex.
> publishing via root table) but that would require ways to avoid loops
> and could have other challenges.

Right, individual partitions.
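To illustrate the loop problem with publishing via the root table: if a
publication is defined on the parent with `publish_via_partition_root`
(a real publication parameter; the names here are illustrative), changes
to either partition replicate as changes to the parent, get routed into
the subscriber's locally-owned partition, and would be re-published back
to the origin unless something filters them out:

```sql
-- Problematic variant: both nodes publish the whole partitioned table.
-- Rows node-2 receives from node-1 are inserted through the parent t,
-- fall into a published partition, and are sent back to node-1,
-- causing duplicate-key / constraint violations there.
CREATE PUBLICATION pub_t FOR TABLE t
    WITH (publish_via_partition_root = true);
```

Avoiding this requires either per-partition publications as above, or
some origin-filtering mechanism, which is part of what the proposed LRG
work is about.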

> Now, in such a setup/scheme, consider a scenario (scenario-1), where
> node-2 went off (either it crashes, went out of network, just died,
> etc.) and comes up after some time. Now, one can either make node-2
> available by fixing the problem it has, or promote a standby in that
> location (if any) to become master; both might require some time. In
> the meantime, to continue operations (which provides a seamless
> seamless experience to users), users will be connected to node-1 to
> perform the required write operations. Now, to achieve this without
> LRG APIs, it will be quite complex for users to keep the data in sync.
> One needs various steps to recover the partition P2 data that went to
> node-1 while node-2 was unavailable. Node-1 has to publish the P2
> changes made during that window, using Create/Drop Publication
> commands. And when node-2 comes back, it has to create a subscription
> for that publication (say pub-2) to get the data, ensure both nodes
> are in sync, and then allow operations on node-2.

Well, you are going to need to modify the app so it knows it can write
to both partitions on failover anyway. I just don't see how adding this
complexity is wise.

My big point is that you should not be showing up with a patch but
rather have these discussions to get agreement that this is the
direction the community wants to go.

--
Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
EDB https://enterprisedb.com

Indecision is a decision. Inaction is an action. Mark Batterson
