Re: Multi-Master Logical Replication

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, vignesh C <vignesh21(at)gmail(dot)com>, "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Multi-Master Logical Replication
Date: 2022-06-01 04:57:27
Message-ID: CAA4eK1+jwH_TM3OQPmVVATaJixSYMdQKWe5LOWnDy2oMjKtC3Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, May 31, 2022 at 7:36 PM Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>
> On Wed, May 25, 2022 at 10:32:50PM -0400, Bruce Momjian wrote:
> > On Wed, May 25, 2022 at 12:13:17PM +0530, Amit Kapila wrote:
> > >
> > > It helps with setting up logical replication among two or more nodes
> > > (data flows both ways) which is important for use cases where
> > > applications are data-aware. For such apps, it will be beneficial to
> >
> > That does make sense, thanks.
>
> Uh, thinking some more, why would anyone set things up this way ---
> having part of a table being primary on one server and a different part
> of the table be a subscriber. Seems it would be simpler and safer to
> create two child tables and have one be primary on only one server.
> Users can access both tables using the parent.
>

Yes, users can choose to do that way but still, to keep the nodes in
sync and continuity of operations, it will be very difficult to manage
the operations without the LRG APIs. Let us consider a simple two-node
example where on each node there is Table T that has partitions P1 and
P2. As far as I can understand, one needs to have the below kind of
set-up to allow local operations on geographically distributed nodes.

Node-1:
node1 writes to P1
node1 publishes P1
node2 subscribes to P1 of node1

Node-2:
node2 writes to P2
node2 publishes P2
node1 subscribes to P2 on node2

In this setup, we need to publish individual partitions, otherwise, we
will face the loop problem where the data sent by node-1 to node-2 via
logical replication will again come back to it causing problems like
constraints violations, duplicate data, etc. There could be other ways
to do this set up with current logical replication commands (for ex.
publishing via root table) but that would require ways to avoid loops
and could have other challenges.

Now, in such a setup/scheme, consider a scenario (scenario-1), where
node-2 went off (either it crashes, went out of network, just died,
etc.) and comes up after some time. Now, one can either make the
node-2 available by fixing the problem it has or can promote standby
in that location (if any) to become master, both might require some
time. In the meantime to continue the operations (which provides a
seamless experience to users), users will be connected to node-1 to
perform the required write operations. Now, to achieve this without
LRG APIs, it will be quite complex for users to keep the data in sync.
One needs to perform various steps to get the partition P2 data that
went to node-1 till the time node-2 was not available. On node-1, it
has to publish P2 changes for the time node-2 becomes available with
the help of Create/Drop Publication APIs. And when node-2 comes back,
it has to create a subscription for the above publication pub-2 to get
that data, ensure both the nodes and in sync, and then allow
operations on node-2.

Not only this, but if there are more nodes in this set-up (say-10), it
has to change (drop/create) subscriptions corresponding to partition
P2 on all other nodes as each individual node is the owner of some
partition.

Another possibility is that the entire data center where node-2 was
present was gone due to some unfortunate incident in which case they
need to set up a new data center and hence a new node. Now, in such a
case, the user needs to do all the steps mentioned in the previous
scenario and additionally, it needs to ensure that it set up the node
to sync all the existing data (of all partitions) before this node
again starts receiving write changes for partition P2.

I think all this should be relatively simpler with LRG APIs wherein
for the second scenario user ideally just needs to use the lrg_attach*
API and in the first scenario, it should automatically sync the
missing data once the node-2 comes back.

Now, the other important point that we should also consider for these
LRG APIs is the ease of setup even in the normal case where we are
just adding a new node as mentioned by Peter Smith in his email [1]
(LRG makes setup easier). e.g. even if there are many nodes we only
need a single lrg_attach by the joining node instead of needing N-1
subscriptions on all the existing nodes.

[1] - https://www.postgresql.org/message-id/CAHut%2BPsvvfTWWwE8vkgUg4q%2BQLyoCyNE7NU%3DmEiYHcMcXciXdg%40mail.gmail.com

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2022-06-01 04:59:06 Re: convert libpq uri-regress tests to tap test
Previous Message Peter Eisentraut 2022-06-01 04:55:06 Re: [RFC] building postgres with meson -v8