Re: Multi-Master Logical Replication

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, vignesh C <vignesh21(at)gmail(dot)com>, "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Multi-Master Logical Replication
Date: 2022-05-25 06:43:17
Message-ID: CAA4eK1+DRHCNLongM0stsVBY01S-s=Ea_yjBFnv_Uz3m3Hky-w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, May 24, 2022 at 5:57 PM Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>
> On Sat, May 14, 2022 at 12:20:05PM +0530, Amit Kapila wrote:
> > On Sat, May 14, 2022 at 12:33 AM Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> > >
> > > Uh, without these features, what workload would this help with?
> > >
> >
> > To allow replication among multiple nodes when some of the nodes may
> > have pre-existing data. This work plans to provide simple APIs to
> > achieve that. Now, let me try to explain the difficulties users can
> > face with the existing interface. It is simple to set up replication
> > among various nodes when they don't have any pre-existing data but
> > even in that case if the user operates on the same table at multiple
> > nodes, the replication will lead to an infinite loop and won't
> > proceed. The example in email [1] demonstrates that and the patch in
> > that thread attempts to solve it. I have mentioned that problem
> > because this work will need that patch.
> ...
> > This will become more complicated when more than two nodes are
> > involved, see the example provided for the three nodes case [2]. Can
> > you think of some other simpler way to achieve the same? If not, I
> > don't think the current way is ideal and even users won't prefer that.
> > I am not telling that the APIs proposed in this thread is the only or
> > best way to achieve the desired purpose but I think we should do
> > something to allow users to easily set up replication among multiple
> > nodes.
>
> You still have not answered my question above. "Without these features,
> what workload would this help with?" You have only explained how the
> patch would fix one of the many larger problems.
>

It helps with setting up logical replication among two or more nodes
(data flows both ways) which is important for use cases where
applications are data-aware. For such apps, it will be beneficial to
always send and retrieve data to local nodes in a geographically
distributed database. Now, for such apps, to get 100% consistent data
among nodes, one needs to enable synchronous_mode (aka set
synchronous_standby_names) but if that hurts performance and the data
is for analytical purposes then one can use it in asynchronous mode.
Now, for such cases, if the local node goes down, the other master
node can be immediately available to use, sure it may slow down the
operations for some time till the local node come-up. For such apps,
later it will be also easier to perform online upgrades.

Without this, if the user tries to achieve the same via physical
replication by having two local nodes, it can take quite long before
the standby can be promoted to master and local reads/writes will be
much costlier.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2022-05-25 06:46:58 Re: "ERROR: latch already owned" on gharial
Previous Message Peter Eisentraut 2022-05-25 06:21:26 pg_upgrade test writes to source directory