Re: Synchronizing slots from primary to standby

From: Andres Freund <andres(at)anarazel(dot)de>
To: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Synchronizing slots from primary to standby
Date: 2022-02-18 22:23:19
Message-ID: 20220218222319.yozkbhren7vkjbi5@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2022-02-11 15:28:19 +0100, Peter Eisentraut wrote:
> On 05.02.22 20:59, Andres Freund wrote:
> > On 2022-01-03 14:46:52 +0100, Peter Eisentraut wrote:
> > > From ec00dc6ab8bafefc00e9b1c78ac9348b643b8a87 Mon Sep 17 00:00:00 2001
> > > From: Peter Eisentraut<peter(at)eisentraut(dot)org>
> > > Date: Mon, 3 Jan 2022 14:43:36 +0100
> > > Subject: [PATCH v3] Synchronize logical replication slots from primary to
> > > standby
> > I've just skimmed the patch and the related threads. As far as I can tell this
> > cannot be safely used without the conflict handling in [1], is that correct?
>
> This or similar questions have been asked a few times about this or similar
> patches, but they always come with some doubt.

I'm certain it's a problem - the only reason I phrased it as a question was
that there could have been something clever in the patch preventing problems
that I missed because I just skimmed it.

> If we think so, it would be
> useful perhaps if we could come up with test cases that would demonstrate
> why that other patch/feature is necessary. (I'm not questioning it
> personally, I'm just throwing out ideas here.)

The patch as-is just breaks one of the fundamental guarantees necessary for
logical decoding: that no row versions can be removed that are still required
for logical decoding (signalled via catalog_xmin). So there needs to be an
explicit mechanism upholding that guarantee, but there is none right now from
what I can see.

One piece of the referenced patchset is that it adds information about removed
catalog rows to a few WAL records, and then verifies during replay that no
record can be replayed that removes resources that are still needed. If such a
conflict exists it's dealt with as a recovery conflict.
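
To make the detection step concrete, here is a minimal sketch in Python. It is
not code from the referenced patchset: the names LogicalSlot,
conflicting_slots, latest_removed_xid and removes_catalog_rows are made up for
illustration, and the "precedes or equals" conflict rule is my assumption about
how such a check would be phrased. It also ignores the special (non-normal)
xids.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LogicalSlot:
    """Hypothetical stand-in for a logical slot that exists on the standby."""
    name: str
    catalog_xmin: int  # oldest xid whose catalog row versions are still needed


def xid_precedes(a: int, b: int) -> bool:
    """Circular 32-bit xid comparison: True if a is logically older than b.

    Simplification: special (non-normal) xids are not handled.
    """
    return ((a - b) & 0xFFFFFFFF) >= 0x80000000


def conflicting_slots(slots: List[LogicalSlot],
                      latest_removed_xid: int,
                      removes_catalog_rows: bool) -> List[str]:
    """Return the slots that a WAL record being replayed would conflict with.

    Assumption: the record says whether it removed catalog rows and carries
    the newest xid whose row versions it removed. A slot conflicts if its
    catalog_xmin precedes or equals that xid.
    """
    if not removes_catalog_rows:
        return []  # removal of non-catalog rows cannot break decoding
    return [
        slot.name
        for slot in slots
        if slot.catalog_xmin == latest_removed_xid
        or xid_precedes(slot.catalog_xmin, latest_removed_xid)
    ]


slots = [LogicalSlot("a", 100), LogicalSlot("b", 200)]
print(conflicting_slots(slots, 150, True))
```

Any slot returned here would have to be handled as a recovery conflict before
replay continues; what exactly that handling looks like is outside this sketch.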

That itself doesn't provide prevention against removal of required row
versions, but it provides detection. The prevention against removal can then
be done using a physical replication slot with hot standby feedback or some
other mechanism (e.g. the slot syncing mechanism could maintain a
"placeholder" slot on the primary for all sync targets or something like that).
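
As a rough illustration of the prevention side, the sketch below (Python,
hypothetical names throughout, not PostgreSQL code) treats the primary's
catalog removal horizon as the oldest catalog_xmin reported by any source, be
that a physical slot fed by hot standby feedback or a "placeholder" slot. It
assumes a catalog row version may only be removed if the xid that deleted it
precedes that horizon, and it again ignores special xids.

```python
from typing import Iterable, Optional


def xid_precedes(a: int, b: int) -> bool:
    """Circular 32-bit xid comparison: True if a is logically older than b."""
    return ((a - b) & 0xFFFFFFFF) >= 0x80000000


def oldest_catalog_xmin(reported: Iterable[Optional[int]]) -> Optional[int]:
    """Oldest catalog_xmin over all slots; None means nothing is held back.

    Entries that are None stand for slots that report no catalog_xmin.
    """
    oldest: Optional[int] = None
    for xmin in reported:
        if xmin is None:
            continue
        if oldest is None or xid_precedes(xmin, oldest):
            oldest = xmin
    return oldest


def may_remove_catalog_row(deleter_xid: int,
                           reported: Iterable[Optional[int]]) -> bool:
    """Assumed rule: removal is allowed only for xids older than the horizon."""
    horizon = oldest_catalog_xmin(reported)
    return horizon is None or xid_precedes(deleter_xid, horizon)


# One slot without a catalog_xmin, two that hold back xids 300 and 120.
print(oldest_catalog_xmin([None, 300, 120]))
print(may_remove_catalog_row(100, [120, 300]))
```

The point is only that the standby's requirement has to be visible to the
primary in some form; without it, nothing holds the horizon back.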

Even if that infrastructure existed / was merged, the slot sync stuff would
still need some very careful logic to protect against problems due to
concurrent WAL replay and "synchronized slot" creation. But that's doable.

Greetings,

Andres Freund
