Re: Notes on physical replica failover with logical publisher or subscriber

From: Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>
To: Craig Ringer <craig(dot)ringer(at)enterprisedb(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, Markus Wanner <markus(dot)wanner(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>
Subject: Re: Notes on physical replica failover with logical publisher or subscriber
Date: 2020-11-30 17:34:21
Lists: pgsql-hackers

Hi Craig,

On 2020-11-30 06:59, Craig Ringer wrote:

Thank you for sharing these notes. I have not dealt much with
physical/logical replication interoperability, so most of these problems
were new to me.

One point from the wiki page, which seems clear enough to me:

> Logical slots can fill pg_wal and can't benefit from archiving. Teach
> the logical decoding page read callback how to use the restore_command
> to retrieve WAL segs temporarily if they're not found in pg_wal...

It does not look like a big deal to teach the logical decoding process to
use restore_command, but I have some doubts about how everything will
perform once we start fetching WAL from the archive for decoding. If we
have had to fall back to restore_command, then the subscriber has already
lagged far enough to exceed max_slot_wal_keep_size. Given that fetching
WAL files from the archive has additional overhead and that the primary
keeps generating (and archiving) new segments, the primary may end up
doing this double duty forever: first archiving a WAL file and then
fetching it back for decoding when requested.
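
To make this concern concrete, here is a toy model in Python, not
PostgreSQL code. It assumes, purely for illustration, that the primary
generates WAL at a fixed rate and that a decoder reading through
restore_command consumes it at another fixed rate, both measured in
segments per time step. The function name and parameters are hypothetical.

```python
def simulate_lag(generate_rate, archive_decode_rate, initial_lag, steps):
    """Return the decoder's lag (in WAL segments) after each time step.

    Assumptions (illustrative only):
      generate_rate        segments the primary produces per step
      archive_decode_rate  segments a decoder can fetch from the archive
                           and decode per step
      initial_lag          segments the decoder is behind at the start
    """
    lag = initial_lag
    history = []
    for _ in range(steps):
        # Lag grows by what the primary writes and shrinks by what the
        # decoder manages to fetch and decode; it cannot go below zero.
        lag = max(0, lag + generate_rate - archive_decode_rate)
        history.append(lag)
    return history


# Decoder slower than the primary: the lag never closes.
falling_behind = simulate_lag(10, 8, 100, 3)

# Decoder faster than the primary: the lag eventually reaches zero.
catching_up = simulate_lag(10, 15, 20, 5)
```

In this model the decoder only gets off the archive if its fetch-and-decode
rate stays above the generation rate for long enough. Otherwise every
segment makes the round trip through the archive indefinitely.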

Another problem is that there may be several active decoders, IIRC, so
they had better coordinate in order to avoid fetching the same segment
twice.
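
As a rough sketch of what such coordination could look like, the Python
snippet below lets the first requester of a segment perform the fetch while
later requesters wait for its result. This is only an illustration with
threads in a single process. Real decoders are separate backends and would
need shared memory or some other cross-process mechanism. The class name
and the fetch callable are hypothetical.

```python
import threading


class SegmentFetchCoordinator:
    """Make sure each WAL segment is fetched at most once at a time."""

    def __init__(self, fetch):
        self._fetch = fetch      # hypothetical callable: segment name -> local path
        self._lock = threading.Lock()
        self._inflight = {}      # segment name -> Event set when the fetch ends
        self._paths = {}         # segment name -> path of the restored file

    def get(self, segment):
        with self._lock:
            if segment in self._paths:
                return self._paths[segment]
            event = self._inflight.get(segment)
            owner = event is None
            if owner:
                # First requester: register the fetch so others can wait on it.
                event = threading.Event()
                self._inflight[segment] = event

        if owner:
            try:
                path = self._fetch(segment)
                with self._lock:
                    self._paths[segment] = path
            finally:
                # Wake waiters even if the fetch raised.
                with self._lock:
                    del self._inflight[segment]
                event.set()
            return path

        event.wait()
        with self._lock:
            # None means the owner's fetch failed; the caller may retry.
            return self._paths.get(segment)
```

The sketch never evicts restored segments. A real implementation would
also have to decide when a temporarily restored file can be removed again.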

> I tried to address many of these issues with failover slots, but I am
> not trying to beat that dead horse now. I know that at least some
> people here are of the opinion that effort shouldn't go into
> logical/physical replication interoperation anyway - that we should
> instead address the remaining limitations in logical replication so
> that it can provide complete HA capabilities without use of physical
> replication. So for now I'm just trying to save others who go looking
> into these issues some time and warn them about some of the less
> obvious booby-traps.

Another point to add regarding the capabilities logical replication needs
in order to build a logical-only HA system: a logical equivalent of
pg_rewind. At least I did not notice anything like it after briefly
reading the wiki page. IIUC, currently there is no way to quickly return
an ex-primary (ex-logical publisher) to the HA cluster without doing a
pg_basebackup, is there? It seems that we would have the same problem here
as with physical replication: the ex-primary may accept some xacts after
the promotion of the new primary, so their histories diverge and the old
primary has to be rewound before being returned as a standby (subscriber).

Alexey Kondratov

Postgres Professional
Russian Postgres Company
