Re: pg_receivewal starting position

From: Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, bharath(dot)rupireddyforpostgres(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, sawada(dot)mshk(at)gmail(dot)com
Subject: Re: pg_receivewal starting position
Date: 2021-09-06 06:50:21
Message-ID: 16991660.zIo0mqpRSP@aivenronan
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Le lundi 6 septembre 2021, 06:22:30 CEST Michael Paquier a écrit :
> On Fri, Sep 03, 2021 at 11:58:27AM +0200, Ronan Dunklau wrote:
> > 0002 for pg_receivewal tried to simplify the logic of information to
> > return, by using a dedicated struct for this. This accounts for Bahrath's
> > demands to return every possible field.
> > In particular, an is_logical field simplifies the detection of the type of
> > slot. In my opinion it makes sense to simplify it like this on the client
> > side while being more open-minded on the server side if we ever need to
> > provide a new type of slot. Also, GetSlotInformation now returns an enum
> > to be able to handle the different modes of failures, which differ
> > between pg_receivewal and pg_basebackup.
>
> + if (PQserverVersion(conn) < 150000)
> + return READ_REPLICATION_SLOT_UNSUPPORTED;
> [...]
> +typedef enum {
> + READ_REPLICATION_SLOT_OK,
> + READ_REPLICATION_SLOT_UNSUPPORTED,
> + READ_REPLICATION_SLOT_ERROR,
> + READ_REPLICATION_SLOT_NONEXISTENT
> +} ReadReplicationSlotStatus;
>
> Do you really need this much complication? We could treat the
> unsupported case and the non-existing case the same way: we don't know
> so just assign InvalidXlogRecPtr or NULL to the fields of the
> structure, and make GetSlotInformation() return false just on error,
> with some pg_log_error() where adapted in its internals.

I actually started with the implementation you propose, but changed my mind
while writing it because I realised it's easier to reason about like this,
instead of failing silently during READ_REPLICATION_SLOT to fail a bit later
when actually trying to start the replication slot because it doesn't exists.
Either the user expects the replication slot to exists, and in this case we
should retry the whole loop in the hope of getting an interesting LSN, or the
user doesn't and shouldn't even pass a replication_slot name to begin with.

>
> > There is still the concern raised by Bharath about being able to select
> > the
> > way to fetch the replication slot information for the user, and what
> > should or should not be included in the documentation. I am in favor of
> > documenting the process of selecting the wal start, and maybe include a
> > --start-lsn option for the user to override it, but that's maybe for
> > another patch.
>
> The behavior of pg_receivewal that you are implementing should be
> documented. We don't say either how the start location is selected
> when working on an existing directory, so I would recommend to add a
> paragraph in the description section to detail all that, as of:
> - First a scan of the existing archives is done.
> - If nothing is found, and if there is a slot, request the slot
> information.
> - If still nothing (aka slot undefined, or command not supported), use
> the last flush location.

Sounds good, I will add another patch for the documentation of this.

>
> As a whole, I am not really convinced that we need a new option for
> that as long as we rely on a slot with pg_receivewal as these are used
> to make sure that we avoid holes in the WAL archives.
>
> Regarding pg_basebackup, Daniel has proposed a couple of days ago a
> different solution to trap errors earlier, which would cover the case
> dealt with here:
> https://www.postgresql.org/message-id/0F69E282-97F9-4DB7-8D6D-F927AA6340C8@y
> esql.se

I will take a look.

> We should not mimic in the frontend errors that are safely trapped
> in the backend with the proper locks, in any case.

I don't understand what you mean by this ? My original proposal was for the
command to actually attach to the replication slot while reading it, thus
keeping a lock on it to prevent dropping or concurrent access on the server.
>
> While on it, READ_REPLICATION_SLOT returns a confirmed LSN when
> grabbing the data of a logical slot. We are not going to use that
> with pg_recvlogical as by default START_STREAMING 0/0 would just use
> the confirmed LSN. Do we have use cases where this information would
> help? There is the argument of consistency with physical slots and
> that this can be helpful to do sanity checks for clients, but that's
> rather thin.

If we don't, we should rename the command to READ_PHYSICAL_REPLICATION_SLOT
and error out on the server if the slot is not actually a physical one to
spare the client from performing those checks. I still think it's better to
support both cases as opposed to having two completely different APIs
(READ_(PHYSICAL)_REPLICATION_SLOT for physical ones on a replication
connection, pg_replication_slots view for logical ones) as it would enable
more for third-party clients at a relatively low maintenance cost for us.

--
Ronan Dunklau

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2021-09-06 07:07:29 Re: pg_receivewal starting position
Previous Message Drouvot, Bertrand 2021-09-06 06:47:57 Extension proposal to deal with existing orphaned files