Re: Non-reserved replication slots and slot advancing

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Postgres hackers <pgsql-hackers(at)postgresql(dot)org>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>
Subject: Re: Non-reserved replication slots and slot advancing
Date: 2018-07-05 23:16:08
Message-ID: 20180705231608.GA2366@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jul 05, 2018 at 12:35:29PM -0400, Alvaro Herrera wrote:
> Do we use the term "reserved" anywhere else? It just doesn't click for
> me. Other than "All rights reserved", that is ...

This is used in a couple of places in the docs:
$ git grep -i slot | grep reserved
src/sgml/catalogs.sgml:
if the <literal>LSN</literal> of this slot has never been reserved.
src/sgml/func.sgml:
replication slot be reserved immediately; otherwise
src/sgml/protocol.sgml:
Drops a replication slot, freeing any reserved server-side resources

> As for the patch itself: why is the problem that the slot is "not
> reserved" in the first place? I think what we should be actually
> checking is that the target LSN is within valid limits, ie. the end
> state of the slot after the operation, rather than the initial state of
> the slot before the operation.
>
> If we made this code check the end state, we could make the error
> message be something like "target LSN is not within the allocated range"
> or something like that.

The code relies on the LSN used within the slot to define if it can be
advanced or not.

/*
* Check if the slot is not moving backwards. Physical slots rely simply
* on restart_lsn as a minimum point, while logical slots have confirmed
* consumption up to confirmed_lsn, meaning that in both cases data older
* than that is not available anymore.
*/
if (OidIsValid(MyReplicationSlot->data.database))
minlsn = MyReplicationSlot->data.confirmed_flush;
else
minlsn = MyReplicationSlot->data.restart_lsn;

And then is issues an error only if the LSN to move to is lower than the
minimum found.
if (moveto < minlsn)
{
ReplicationSlotRelease();
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot move slot to %X/%X, minimum is %X/%X",
(uint32) (moveto >> 32), (uint32) moveto,
(uint32) (minlsn >> 32), (uint32) minlsn)));
}

For a non-reserved slot, there is nothing smaller than 0, so this error
would not trigger, and even if we enforce it to trigger with minlsn ==
InvalidXLogRecPtr then it would make no sense. So the root of the
problem is that there is no lower-bound which can be used as a
comparison base. Well, logically that would be the LSN of the oldest
segment still present in pg_wal, but if you begin to authorize that then
you have race conditions with checkpoints running in parallel which
could recycle the requested position. So it seems to me that ERRORing
when there is no way to define correctly the bounds where the move can
be done is the safest solution.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2018-07-05 23:40:52 Re: Cache invalidation after authentication (on-the-fly role creation)
Previous Message David Rowley 2018-07-05 21:55:06 Re: documentation fixes for partition pruning, round three