Re: BUG #15591: pg_receivewal does not honor replication slots

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15591: pg_receivewal does not honor replication slots
Date: 2019-01-13 15:36:02
Message-ID: CAMkU=1zTe8toCD+df9isTs_JOhexd-3f2o8PS=oFEHmcmde=tQ@mail.gmail.com
Lists: pgsql-bugs

On Fri, Jan 11, 2019 at 1:50 PM Andres Freund <andres(at)anarazel(dot)de> wrote:

> Hi,
>
> On 2019-01-11 16:52:42 +0000, PG Bug reporting form wrote:
> > The following bug has been logged on the website:
> >
> > Bug reference: 15591
> > Logged by: Jeff Janes
> > Email address: jeff(dot)janes(at)gmail(dot)com
> > PostgreSQL version: 11.1
> > Operating system: all
> > Description:
> >
> > When you invoke pg_receivewal using --slot to give it the name of an
> > existing slot which has WAL reserved, and -D pointing to an empty
> > directory, it fast-forwards the slot's LSN reservation to the beginning
> > of the most recent WAL file on the server, and starts streaming from
> > there, rather than streaming from the LSN reservation point.
>
> ...

> > Does this not utterly destroy the main point of using slots? If I didn't
> > want to ensure a gapless WAL stream, why use slots in the first place?
>
> So the upstream server doesn't drop WAL that a standby (or something
> like that) still needs? It's pretty rare to randomly start to stream to
> a different place.
>

I don't want to start it randomly. I want to start it where the
pg_basebackup (or some other backup method) using the same slot name left
off, which is, not by coincidence, the same place as or later than where the
slot itself left off. I thought that that was the point of slots, or at
least the user-facing documentation implies it is, and I don't see that it
disclaims it for this particular case. It seems like pg_receivewal is a
second-class citizen: it doesn't count as either a standby or as
"something like that", at least not when you are first transitioning from
the base backup to it. If you are resuming an interrupted or lagging
pg_receivewal, then the slot does do its job. So slots appear to be
global on the surface, but functionally they are local to pg_receivewal.

The barrier to fixing it is that the replication protocol offers neither a
way to interrogate where a slot left off, nor a way to tell it to pick up
where a slot left off (regressed to the start of the WAL file). Other
users of slots have a way to figure that out for themselves, but
pg_receivewal (are there others?) does not.
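
To make "regressed to the start of the WAL file" concrete, here is a rough
sketch of the arithmetic, assuming the default 16MB segment size and timeline
1 (both are assumptions, and the helper names are made up for illustration).
The input LSN would have to come from somewhere outside the replication
protocol, for example the restart_lsn column of pg_replication_slots read
over an ordinary SQL connection.

```python
# Sketch only: map an LSN to the start of its WAL segment and to the
# segment's file name. Segment size and timeline are assumed defaults.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default 16MB segments


def parse_lsn(lsn: str) -> int:
    """Turn the textual 'X/Y' form into a 64-bit integer position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def format_lsn(value: int) -> str:
    """Turn a 64-bit integer position back into the textual 'X/Y' form."""
    return f"{value >> 32:X}/{value & 0xFFFFFFFF:X}"


def segment_start(lsn: str, seg_size: int = WAL_SEGMENT_SIZE) -> str:
    """Round an LSN down to the first byte of the segment containing it."""
    value = parse_lsn(lsn)
    return format_lsn(value - value % seg_size)


def segment_file_name(lsn: str, timeline: int = 1,
                      seg_size: int = WAL_SEGMENT_SIZE) -> str:
    """Name of the WAL segment file that contains the given LSN."""
    segno = parse_lsn(lsn) // seg_size
    segs_per_id = 0x100000000 // seg_size  # segments per 4GB "xlog id"
    return f"{timeline:08X}{segno // segs_per_id:08X}{segno % segs_per_id:08X}"


if __name__ == "__main__":
    print(segment_start("0/3000148"))      # 0/3000000
    print(segment_file_name("0/3000148"))  # 000000010000000000000003
```

This only shows where streaming would have to begin; it does not give
pg_receivewal a way to ask for that position.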

A workaround is to "seed" the directory about to be used by pg_receivewal
by copying the last WAL file from the backup's pg_wal into it. (And adding
.partial to the end? That probably isn't needed, as the end of backup does
a log switch.)
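
The seeding step could be scripted roughly as follows. This is a sketch, not
a tested procedure: the function name and both paths are hypothetical, it
assumes the backup's pg_wal holds segments from a single timeline, and it
copies the file under its original name without adding .partial, per the
guess above that the suffix isn't needed.

```python
# Sketch only: copy the newest WAL segment from a base backup's pg_wal
# into the (empty) directory that pg_receivewal -D will point at.
import re
import shutil
from pathlib import Path

# Completed WAL segments are named with exactly 24 uppercase hex digits.
# This skips *.backup history files, *.partial files and archive_status/.
WAL_SEGMENT_NAME = re.compile(r"^[0-9A-F]{24}$")


def seed_receivewal_dir(backup_pg_wal, receive_dir):
    """Copy the last WAL segment of the backup into receive_dir.

    Returns the path of the copied file.
    """
    segments = sorted(
        (p for p in Path(backup_pg_wal).iterdir()
         if p.is_file() and WAL_SEGMENT_NAME.match(p.name)),
        key=lambda p: p.name,  # names sort in WAL order within a timeline
    )
    if not segments:
        raise FileNotFoundError(f"no WAL segments found in {backup_pg_wal}")

    last = segments[-1]
    dest_dir = Path(receive_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(last, dest_dir / last.name))


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(seed_receivewal_dir("/backups/base/pg_wal", "/archive/wal"))
```

After that, pg_receivewal would be started with --slot and -D pointing at the
seeded directory.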

If this isn't a bug, then is there a way to document it so the end user
knows what is going on? Or is there existing documentation I am
overlooking? I guess the doc change would need to be in pg_receivewal, if
the problem is unique to it.

Cheers,

Jeff
