Re: pg_receivewal starting position

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: ronan(dot)dunklau(at)aiven(dot)io
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: pg_receivewal starting position
Date: 2021-07-29 05:32:37
Message-ID: 20210729.143237.1445082330516607084.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At Wed, 28 Jul 2021 12:57:39 +0200, Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io> wrote in
> Le mercredi 28 juillet 2021, 08:22:30 CEST Kyotaro Horiguchi a écrit :
> > At Tue, 27 Jul 2021 07:50:39 +0200, Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>
> > wrote in
> > > I don't know if it should be the default, toggled by a command line flag,
> > > or if we even should let the user provide a LSN.
> >
> > *I* think it is completely reasonable (or at least convenient or less
> > astonishing) that pg_receivewal starts from the restart_lsn of the
> > replication slot to use. The tool already decides the clean-start LSN
> > a bit unusual way. And it seems to me that proposed behavior can be
> > the default when -S is specified.
> >
>
> As of now we can't get the replication_slot restart_lsn with a replication
> connection AFAIK.
>
> This implies that the patch could require the user to specify a maintenance-db
> parameter, and we would use that if provided to fetch the replication slot
> info, or fallback to the previous behaviour. I don't really like this approach
> as the behaviour changing wether we supply a maintenance-db parameter or not
> is error-prone for the user.
>
> Another option would be to add a new replication command (for example
> ACQUIRE_REPLICATION_SLOT <slot_name>) to set the replication slot as the
> current one, and return some info about it (restart_lsn at least for a
> physical slot).

I didn't thought in details. But I forgot that ordinary SQL commands
have been prohibited in physical replication connection. So we need a
new replication command but it's not that a big deal.

> I don't see any reason not to make it work for logical replication connections
> / slots, but it wouldn't be that useful since we can query the database in
> that case.

Ordinary SQL queries are usable on a logical replication slot so
I'm not sure how logical replication connection uses the command.
However, like you, I wouldn't bother restricting the command to
physical replication, but perhaps the new command should return the
slot type.

> Acquiring the replication slot instead of just reading it would make sure that
> no other process could start the replication between the time we read the
> restart_lsn and when we issue START_REPLICATION. START_REPLICATION could then
> check if we already have a replication slot, and ensure it is the same one as
> the one we're trying to use.

I'm not sure it's worth adding complexity for such strictness.
START_REPLICATION safely fails if someone steals the slot meanwhile.
In the first place there's no means to protect a slot from others
while idle. One possible problem is the case where START_REPLICATION
successfully acquire the slot after the new command failed. But that
case doesn't seem worse than the case someone advances the slot while
absence. So I think READ_REPLICATION_SLOT is sufficient.

> From pg_receivewal point of view, this would amount to:
>
> - check if we currently have wal in the target directory.
> - if we do, proceed as currently done, by computing the start lsn and
> timeline from the last archived wal
> - if we don't, and we have a slot, run ACQUIRE_REPLICATION_SLOT. Use the
> restart_lsn as the start lsn if there is one, and don't provide a timeline
> - if we still don't have a start_lsn, fallback to using the current server
> wal position as is done.

That's pretty much it.

> What do you think ? Which information should we provide about the slot ?

We need the timeline id to start with when using restart_lsn. The
current timeline can be used in most cases but there's a case where
the LSN is historical.

pg_receivewal doesn't send a replication status report when a segment
is finished. So after pg_receivewal stops just after a segment is
finished, the slot stays at the beginning of the last segment. Thus
next time it will start from there, creating a duplicate segment.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amul Sul 2021-07-29 05:47:16 Re: needless complexity in StartupXLOG
Previous Message kuroda.hayato@fujitsu.com 2021-07-29 05:12:57 [postgres_fdw] add local pid to fallback_application_name